Bayesian auto-train feature does not appear to be updating the Bayesian dictionary


SYMPTOMS

The Bayesian dictionary does not appear to change in content, despite large amounts of spam being captured by 'honey pot' addresses.

CAUSE

Bayesian filtering dictionaries should be balanced, containing an equal (or close to equal) number of spam and ham messages. If the dictionary is not balanced (for example, if it contains far more spam than ham), MailEnable will be more likely to detect spam, but at an increased risk of false positives (i.e. legitimate mail being incorrectly detected as spam).

As such, when Bayesian auto-training is used, it will only add spam to the dictionary if the number of ham messages in the dictionary is equal to the number of spam messages in the dictionary.

If the dictionary does not change, the most likely reason is that not enough ham messages are being fed into the dictionary (and, as a consequence, none of the detected spam is being added).
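The auto-training rule described above can be illustrated with a small Python model. This is a hypothetical simplification, not MailEnable's implementation: `DictionaryCounts` and `auto_train` are illustrative names, ham is assumed to always be accepted, and spam is accepted only when the ham and spam counts are equal, as the rule is worded in this article.

```python
from dataclasses import dataclass


@dataclass
class DictionaryCounts:
    """Hypothetical in-memory counters for a Bayesian dictionary."""
    spam: int = 0
    ham: int = 0


def auto_train(counts: DictionaryCounts, is_spam: bool) -> bool:
    """Apply the balancing rule; return True if the message was added."""
    if is_spam:
        # Spam is only added while ham and spam counts are equal.
        if counts.ham == counts.spam:
            counts.spam += 1
            return True
        return False
    # Assumption: ham is always accepted into the dictionary.
    counts.ham += 1
    return True


counts = DictionaryCounts()
# 1000 honey-pot spam messages and no ham: only the first one is added.
added = sum(auto_train(counts, is_spam=True) for _ in range(1000))
print(added, counts)  # 1 DictionaryCounts(spam=1, ham=0)

# One ham message re-balances the counts, allowing exactly one more spam.
auto_train(counts, is_spam=False)
added = sum(auto_train(counts, is_spam=True) for _ in range(1000))
print(added, counts)  # 1 DictionaryCounts(spam=2, ham=1)
```

In this model, no matter how much spam the honey pot addresses collect, the spam count can only grow in step with the ham count, which matches the symptom of a dictionary that appears not to change.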

RESOLUTION

MailEnable 1.x and versions of MailEnable prior to 2.12 do not provide an easy way to determine the current balance of the in-memory dictionary. The status of current training cannot be determined without stopping the MTA or using the command utility to instruct the MTA to dump its dictionary to disk.

In later versions of MailEnable Professional or Enterprise Edition, the METray utility can be used to determine the current number and balance of spam and ham in the dictionary.

The current number of spam and ham messages contained in the dictionary can also be inspected through the following registry values:

Key Root: HKEY_LOCAL_MACHINE\SOFTWARE\Mail Enable\Mail Enable\Agents\MTA\Filters\MTAFILTER\Counters
Value Name: "Bayesian Dictionary Current Spam"
Value Name: "Bayesian Dictionary Current Ham"
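On the server, these two registry values could be read with Python's standard `winreg` module. The following is a sketch under assumptions: it assumes the counters are stored as numeric values (or numeric strings) under the key listed above, and the helper names `read_counters` and `describe_balance` are hypothetical, not part of MailEnable.

```python
import sys

# Key and value names as listed in this article.
COUNTERS_KEY = r"SOFTWARE\Mail Enable\Mail Enable\Agents\MTA\Filters\MTAFILTER\Counters"
SPAM_VALUE = "Bayesian Dictionary Current Spam"
HAM_VALUE = "Bayesian Dictionary Current Ham"


def read_counters() -> tuple[int, int]:
    """Read (spam, ham) counts from HKEY_LOCAL_MACHINE. Windows only."""
    import winreg  # standard library module, available on Windows only

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, COUNTERS_KEY) as key:
        spam, _ = winreg.QueryValueEx(key, SPAM_VALUE)
        ham, _ = winreg.QueryValueEx(key, HAM_VALUE)
    # Assumption: the stored type is not documented here; int() accepts
    # both numeric (REG_DWORD) and numeric-string (REG_SZ) values.
    return int(spam), int(ham)


def describe_balance(spam: int, ham: int) -> str:
    """Summarise the spam/ham balance of the dictionary."""
    if spam == 0 and ham == 0:
        return "dictionary is empty"
    if spam > ham:
        return (f"spam-heavy: {spam - ham} more spam than ham; "
                "more ham is needed before spam auto-training resumes")
    if ham > spam:
        return f"ham-heavy: {ham - spam} more ham than spam"
    return "balanced"


if __name__ == "__main__":
    if sys.platform != "win32":
        print("Not running on Windows; registry counters are unavailable.")
    else:
        try:
            spam, ham = read_counters()
            print(f"spam={spam} ham={ham}: {describe_balance(spam, ham)}")
        except OSError as exc:  # e.g. key or value not present
            print(f"Could not read MailEnable counters: {exc}")
```

If MailEnable is installed as a 32-bit application on 64-bit Windows, the key may only be visible in the 32-bit registry view; in that case, passing `access=winreg.KEY_READ | winreg.KEY_WOW64_32KEY` to `winreg.OpenKey` may be required.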

MORE INFORMATION

For more information on Bayesian filtering within MailEnable, please refer to the MailEnable Product Manual.

Product:MailEnable (ME-2.X)
Category:Operation
Article:ME020443
Module:MTA Filtering
Keywords:Bayesian,Auto,Training,spam,ham,dictionary,training,autotraining,autotrain
Class:TRB: Troubleshooting (Configuration or Environment)
Revised:Wednesday, May 4, 2016
Author:
Publisher:MailEnable