MailEnable Enterprise Guide
    Spam Training Utility

    MailEnable provides a command line utility that can be used to manage spam/non-spam dictionaries. This program is called MESPAMCMD.EXE and is located in the MailEnable BIN directory.

    The spam training utility works only on the dictionary files stored on the hard disk. Before updating a dictionary manually, disable the auto-training feature or stop the MTA service. Otherwise, when the MTA service is next stopped, it writes out the dictionary it has updated through training and overwrites the existing file, including your manual changes.

    MESPAMCMD -option [dictionary, paths]

    Available options:

    -c = Create dictionary

    -v = Verify messages in the specified folder against the nominated dictionary

    -s = Score a single message against the nominated dictionary

    -m = Merge Spam and NoSpam folders into nominated dictionary

    -r = Notifies the spam filter to reload the dictionary

    -w = Notifies the MTA service to write out the dictionary

    -p = Prunes the Dictionary to allow insertion of more words

    Example:

    MESPAMCMD -c C:\TEST\ME.TAB C:\TEST\SPAM C:\TEST\NOSPAM

    A full example command line for compiling a dictionary, using short-style paths, follows:

    MESPAMCMD -c C:\Progra~1\MailEn~1\Dictio~1\NewDic~1\MailEn~1.TAB  C:\Progra~1\MailEn~1\Dictio~1\NewDic~1\Spam C:\Progra~1\MailEn~1\Dictio~1\NewDic~1\NoSpam

    Note: The spam training command line utility requires short-style (8.3) file paths; the paths cannot contain spaces.
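    The restriction on spaces is easy to overlook when the command is generated by a script. The following is a minimal Python sketch, not part of MailEnable, that assembles the -c command line and rejects any path containing a space. The function name build_create_command and the default executable name are assumptions made for illustration.

```python
def build_create_command(dictionary, spam_dir, nospam_dir, exe="MESPAMCMD.EXE"):
    """Return the argument list for a -c (create dictionary) call.

    Hypothetical helper: it only builds and checks the arguments,
    it does not run the utility.
    """
    args = [exe, "-c", dictionary, spam_dir, nospam_dir]
    for path in args[2:]:
        if " " in path:
            # The utility needs short-style paths, so fail early.
            raise ValueError(f"path contains a space, use the short form: {path!r}")
    return args


cmd = build_create_command(r"C:\TEST\ME.TAB", r"C:\TEST\SPAM", r"C:\TEST\NOSPAM")
print(" ".join(cmd))
```

    On the mail server itself, the returned list could be passed to subprocess.run to execute the command.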

    Using XML or Tab delimited files

    Filtering dictionaries can be constructed as either XML or TAB delimited files.

    XML files are slower to load, but may be more desirable if externally managing the dictionary. Tab files are much more efficient (faster loading), so it is advisable to use the default TAB files. The filter determines whether the file is XML or TAB delimited by the file extension. The format for the XML files is:

    <ELEMENTS>

      <ENTRIES W="[number of ham emails]" B="[number of spam emails]">

      <E W="[number in ham emails]" B="[number in spam emails]">word</E>

      <E W="[number in ham emails]" B="[number in spam emails]">word</E>

      …

      …

      </ENTRIES>

    </ELEMENTS>
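    For external management of an XML dictionary, the structure above can be read with a standard XML parser. The following Python sketch assumes the layout shown above; the sample words and counts are invented for illustration, and the function name load_xml_dictionary is hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample data that follows the documented layout.
SAMPLE = """<ELEMENTS>
  <ENTRIES W="120" B="80">
    <E W="3" B="41">offer</E>
    <E W="57" B="2">meeting</E>
  </ENTRIES>
</ELEMENTS>"""


def load_xml_dictionary(xml_text):
    """Return (message totals, per-word counts) from an XML dictionary."""
    root = ET.fromstring(xml_text)
    entries = root.find("ENTRIES")
    # W holds the ham count and B the spam count, as in the format above.
    totals = {"ham": int(entries.get("W")), "spam": int(entries.get("B"))}
    words = {
        e.text: {"ham": int(e.get("W")), "spam": int(e.get("B"))}
        for e in entries.findall("E")
    }
    return totals, words


totals, words = load_xml_dictionary(SAMPLE)
print(totals)
print(words["offer"])
```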

    Verifying a dictionary

    The command line utility can be used to validate a directory of messages against the dictionary. This will provide a percentage probability of spam for each message in the folder.

    MESPAMCMD -v MailEn~1.TAB C:\Progra~1\MailEn~1\Dictio~1\NewDic~1\Test

    Scoring a message

    Scoring a single message is much like verifying a directory, except the second parameter is a message file rather than a directory.

    An example of scoring a message follows:

    MESPAMCMD -s MailEn~1.TAB C:\Progra~1\MailEn~1\Dictio~1\NewDic~1\Test\1A38DF23D30845E0B5FF51530A266.MAI

    Merging a dictionary

    Merging a dictionary is much like creating a new dictionary, except that messages in the Spam and NoSpam directories are appended to the dictionary rather than re-creating it. This is useful to add new messages to the dictionary to refine Spam detection.

    An example for merging new content with an existing spam dictionary follows:

    MESPAMCMD -m MailEn~1.TAB C:\Progra~1\MailEn~1\Dictio~1\NewDic~1\Spam C:\Progra~1\MailEn~1\Dictio~1\NewDic~1\NoSpam

    Reloading a dictionary

    If changes are made to a dictionary while the spam filter is running, the filter will not reload it automatically, because the dictionary is held in memory. Reload the dictionary either by restarting the MTA service or by using the -r option of MESPAMCMD to notify the spam filter.

    MESPAMCMD -r

    Pruning a dictionary

    Pruning a dictionary removes any items that cannot be used effectively to distinguish spam from non-spam. These are items that occur very rarely and items that occur almost equally often in spam and non-spam emails. To prune, provide the path and filename of a dictionary file. After pruning, this file is overwritten with the new dictionary.

    MESPAMCMD -p MailEn~1.TAB
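    The two pruning criteria can be illustrated with a short Python sketch. This is not the algorithm MESPAMCMD uses, which is not documented here; the thresholds min_total and min_skew, the function name prune, and the sample counts are all assumptions chosen for illustration. The sketch also compares raw counts and does not normalize by the number of spam and non-spam messages.

```python
def prune(words, min_total=3, min_skew=0.2):
    """Drop words that are very rare or nearly evenly split.

    words maps each word to a (ham_count, spam_count) tuple.
    Both thresholds are illustrative assumptions.
    """
    kept = {}
    for word, (ham, spam) in words.items():
        total = ham + spam
        if total < min_total:
            continue  # occurs too rarely to be informative
        spam_share = spam / total
        if abs(spam_share - 0.5) < min_skew:
            continue  # occurs almost equally in spam and non-spam
        kept[word] = (ham, spam)
    return kept


sample = {"rare": (1, 0), "the": (50, 48), "offer": (2, 40), "meeting": (30, 1)}
pruned = prune(sample)
print(sorted(pruned))
```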

    Saving the dictionary

    Dictionary updates from autotraining are saved to disk when the MTA service is stopped. You can also notify the MTA service to write out the dictionary by using the -w option. This applies only if autotraining is enabled, and the write is triggered only when a message next passes through the MTA service.

    MESPAMCMD -w

    Checking the dictionary

    To check the dictionary, open up the DIC.tab file in the following location using Notepad:

    C:\Program Files\Mail Enable\Dictionaries\DIC.tab

    To check the integrity of the file, make sure the first line shows the number of bad and good messages that have been added to the dictionary. The first number should equal the number of bad messages (spam) merged from the SPAM folder, and the second number should equal the number of good messages (ham) merged from the NOSPAM folder. The numbers on the lines after the first are the totals for the words/tokens found in the good and bad messages.
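    The first-line check can also be scripted. The Python sketch below assumes that the first line begins with two whitespace-separated integers, the spam count followed by the ham count; the exact layout of the TAB file is not specified here, so verify this against a real dictionary before relying on it. The function names are hypothetical.

```python
def read_message_counts(first_line):
    """Return (spam_count, ham_count) from the first line of a dictionary.

    Assumes the line starts with two whitespace-separated integers.
    """
    fields = first_line.split()
    if len(fields) < 2:
        raise ValueError("expected two counts on the first line")
    return int(fields[0]), int(fields[1])


def matches_folders(first_line, spam_files, nospam_files):
    """Compare the header counts with the number of merged messages."""
    return read_message_counts(first_line) == (spam_files, nospam_files)


# Invented header line: 80 spam messages and 120 ham messages.
print(matches_folders("80\t120", 80, 120))
```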