Product Specs

Sourceforge.net

IT Specifications

Table of contents

  1. Introduction
  2. Features
  3. System architecture

Introduction

MailWasher Server is an open-source, server-side junk mail filtering package. It differs from other open-source server anti-spam packages in two ways. First, it offers a polished, well-integrated web interface and built-in quarantine management facilities, which make the package easier for administrators to set up and manage and easier for users to work with on a day-to-day basis. Second, it has full support for Windows Server/Exchange Server systems, in addition to Unix-based systems such as Linux and Solaris.

This document is an introduction to MailWasher Server's architecture and functionality. It details the current feature set of the product, discusses the technologies and services used, and gives direction for the ongoing development efforts.

Features

Junk mail filtering

FirstAlert!

FirstAlert! is the first line of MailWasher Server's defence against junk mail. FirstAlert! is a known-message filtering service provided by Firetrust. It works by decoding the content of the message, computing a fully anonymized "signature", and then checking that anonymous signature against the FirstAlert! servers. The signature is similar in appearance to a cryptographic hash, but it is designed so that small variations in the message content have little effect on the signature, providing a carefully controlled amount of "fuzziness" in the message-matching process.
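
The FirstAlert! signature algorithm itself is not described in this document. As a rough illustration of how a "fuzzy" signature can behave, the sketch below uses SimHash, a well-known technique substituted here purely as an assumption; the function names, the 3-word shingle size, and the 64-bit width are all hypothetical.

```python
import hashlib
import re


def fuzzy_signature(text: str, bits: int = 64) -> int:
    """SimHash-style signature over 3-word shingles (illustrative only)."""
    # Normalise: lower-case and keep only alphanumeric words, so that
    # changes in case, spacing, or punctuation do not alter the signature.
    words = re.findall(r"[a-z0-9]+", text.lower())
    shingles = [" ".join(words[i:i + 3]) for i in range(max(1, len(words) - 2))]

    # Each shingle votes +1 or -1 on every bit position.
    counts = [0] * bits
    for shingle in shingles:
        digest = hashlib.sha256(shingle.encode("utf-8")).digest()
        value = int.from_bytes(digest[:bits // 8], "big")
        for bit in range(bits):
            counts[bit] += 1 if (value >> bit) & 1 else -1

    # A bit is set in the signature when the majority of shingles set it.
    return sum(1 << bit for bit in range(bits) if counts[bit] > 0)


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits; a small distance suggests similar content."""
    return bin(a ^ b).count("1")
```

With a scheme of this kind, two messages are treated as a match when the Hamming distance between their signatures falls below a chosen threshold; a tighter threshold trades recall for a lower false positive rate.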

As such, FirstAlert! is similar in concept to some other message-matching services such as the Distributed Checksum Clearinghouse, Pyzor, and the commercial (but free for personal use) Vipul's Razor. FirstAlert!, however, offers strong protection against false positives: messages are added to the database of junk mail only after they have been both submitted by users and reviewed by Firetrust's content administrators to confirm that they are definitely junk mail. This policy of reviewing all submissions helps prevent accidental or inappropriate submissions (for example, messages from popular legitimate mailing lists) from entering the database.

In addition, the tight match settings used by the FirstAlert! servers ensure that the system's false positive rate remains low even with hundreds of thousands of active messages in the content database; in fact, no false positive due to signature matching has ever been observed. Finally, the pipelined FirstAlert! protocol makes communications fast and efficient. Note also that FirstAlert! is supported on all MailWasher Server platforms (the open hash systems mentioned above have poor or no support for Windows).

FirstAlert! is provided by Firetrust as a paid service, but Firetrust keeps the cost as low as it can so that organisations of all sizes can take advantage of it.

Real-time blackhole lists

MailWasher Server supports both IP address RBLs and SMTP envelope sender domain RBLs (RHSBLs). Support for message-content URI blacklists (SURBLs) is planned for the near future.
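
As a rough illustration of how such lookups conventionally work, the sketch below builds the DNS names queried for an IP-based RBL and for an RHSBL. It is not MailWasher Server's code; the zone names and function names are hypothetical, and treating any 127.x.x.x answer as "listed" is an assumption based on common blacklist practice.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Reverse the IPv4 octets and append the blacklist zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def rhsbl_query_name(sender: str, zone: str) -> str:
    """Use the domain part of the envelope sender as the lookup key."""
    domain = sender.rsplit("@", 1)[-1].lower().rstrip(".")
    return f"{domain}.{zone}"


def is_listed(query_name: str, resolve=socket.gethostbyname) -> bool:
    """A name that resolves into 127.0.0.0/8 is treated as listed (assumption)."""
    try:
        answer = resolve(query_name)
    except OSError:  # includes socket.gaierror, i.e. no such record
        return False
    return answer.startswith("127.")
```

For example, `dnsbl_query_name("192.0.2.99", "rbl.example.org")` yields the name `99.2.0.192.rbl.example.org`, which is then resolved like any other hostname.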

Sender address blacklists and whitelists

MailWasher Server offers both global and, optionally, per-user email address whitelists and blacklists.
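
One way global and per-user lists could be combined is sketched below. The precedence rules (per-user lists consulted before global lists, and whitelist before blacklist within each) are assumptions made for illustration, as are all of the names.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class AddressLists:
    """A whitelist/blacklist pair; hypothetical structure."""
    whitelist: Set[str] = field(default_factory=set)
    blacklist: Set[str] = field(default_factory=set)


def classify_sender(sender: str,
                    global_lists: AddressLists,
                    user_lists: Optional[AddressLists] = None) -> str:
    """Return "allow", "block", or "unknown" for a sender address."""
    address = sender.strip().lower()
    # Assumed precedence: the user's own lists override the global ones.
    for lists in (user_lists, global_lists):
        if lists is None:
            continue
        if address in lists.whitelist:
            return "allow"
        if address in lists.blacklist:
            return "block"
    return "unknown"
```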

Statistical content analysis

MailWasher Server's statistical content analysis filter combines both traditional word-based "Bayesian" filtering and "trait-based" filtering.

"Traits" are characteristics often observed in junk mail captured in the wild that rarely or never occur in legitimate mail - usually because the characteristics are produced by the commercial junk mail applications used by many spammers, or written in by the spammers themselves in an attempt to evade other junk mail filters. MailWasher Server detects these traits and considers them in addition to the message's words when performing statistical analysis of the message.

Traits are a very useful tool because they combine the filtering power offered by manual rules in many other mail filtering systems together with the accuracy gained from having the traits trained on the actual mail received by the users - ensuring that, for example, the filter is quickly trained so that legitimate commercial mailings that also happen to express some traits are not undesirably filtered.
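
The idea of feeding traits into the same statistical model as words can be sketched as follows. This is a minimal naive Bayes illustration under stated assumptions, not the filter MailWasher Server ships: the two example traits, the "trait:" token prefix, and the add-one smoothing are all hypothetical choices.

```python
import math
import re
from collections import Counter


def extract_features(subject: str, body: str) -> set:
    """Return the message's words plus any trait tokens that fire."""
    features = set(re.findall(r"[a-z']+", f"{subject} {body}".lower()))
    letters = [c for c in subject if c.isalpha()]
    if letters and all(c.isupper() for c in letters):
        features.add("trait:subject_all_caps")
    # Letters separated by punctuation, e.g. "l.o.w.e.s.t".
    if re.search(r"(?:\w[.\-_*]){3,}\w", body):
        features.add("trait:obfuscated_word")
    return features


class TraitBayesFilter:
    """Naive Bayes over words and traits, trained on the users' own mail."""

    def __init__(self):
        self.counts = {"junk": Counter(), "legit": Counter()}
        self.totals = {"junk": 0, "legit": 0}

    def train(self, features: set, label: str) -> None:
        self.counts[label].update(features)
        self.totals[label] += 1

    def junk_probability(self, features: set) -> float:
        # Start from the smoothed prior odds, then add each feature's evidence.
        log_odds = math.log((self.totals["junk"] + 1) / (self.totals["legit"] + 1))
        for feature in features:
            p_junk = (self.counts["junk"][feature] + 1) / (self.totals["junk"] + 2)
            p_legit = (self.counts["legit"][feature] + 1) / (self.totals["legit"] + 2)
            log_odds += math.log(p_junk / p_legit)
        return 1 / (1 + math.exp(-log_odds))
```

Because traits are ordinary tokens in the model, a trait that also appears in mail the users have marked as legitimate loses weight automatically, without anyone editing a rule.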

Other features

Simple installation

Since MailWasher Server is implemented in native binary code, there are no interpreters or libraries that the systems administrator must install, no paths that need to be configured, and no scripts that need to be written. The daemons run automatically as system-wide services, natively on each platform. A GUI installer for Windows, and an executable installer for Linux and Unix, make setup as simple as possible.

Web-based configuration

Once the initial setup of the daemons has been completed, configuring the product is done completely through the web interface. This provides a user-friendly way to control the product and helps to minimize the size of the platform-specific installers that perform the steps above.

Integration of MailWasher Server into the mail server's mail processing, however, is done in the manner usual for that MTA (see below).

Integrated quarantine management

One of the most important features of MailWasher Server, and one that many other open-source mail filtering projects lack, is a fully integrated, configurable message quarantine system.

Message quarantine is simply a server-side store of junk messages, separate from the user's actual mailbox. There are a number of advantages to using a server-side quarantine store instead of, say, tagging the message and letting the end user's mail client filter the message into a separate folder. For example:

  • The message is stored only once, regardless of the number of recipients it was addressed to. As the overall percentage of incoming mail that is junk continues to increase, this becomes more and more important: frequently spammers will address each delivery to 10-20 recipients at the domain, and so storing the message separately for each recipient, as many MTAs do, greatly increases the storage requirements.
  • Filing the messages on the server reduces mail download time. If the user sets up filter rules in their mail client to move tagged messages into a local 'Junk' folder, the mail client must still download the message, wasting bandwidth and time.
  • Newly quarantined messages can be automatically summarised for users. MailWasher Server (optionally and configurably) sends automated emails summarising newly-quarantined messages. This reduces the burden on users to remember to scan over their quarantined messages and makes it easier for them to check only those messages they haven't already seen.
  • Providing a web interface to the quarantine store allows automated server-side retraining. Because the user accesses their quarantined messages via the web interface, when they "rescue" a message, MailWasher Server automatically learns from the user's decision, without the user having to separately submit the message back to the server to be retrained.
  • Message rescue can be disabled on a per-feature basis. System administrators can configure whether users may rescue or download messages quarantined by each filtering feature - and even each junk mail category, in the case of FirstAlert!. Many companies (in particular) need or want to prevent their users from rescuing certain kinds of messages (for example, pornographic spam, email scams, and viruses).
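
Two of the points above, single-copy storage and per-feature rescue control, can be sketched with a small in-memory model. The class and method names are hypothetical and the real store's layout is not described in this document; this is only meant to make the behaviour concrete.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass
class QuarantinedMessage:
    raw: bytes              # stored once, however many recipients there are
    filter_name: str        # which filtering feature quarantined the message
    recipients: Set[str]    # users who may see it in their quarantine listing


class QuarantineStore:
    def __init__(self, rescuable_filters: Iterable[str]):
        self._messages: Dict[int, QuarantinedMessage] = {}
        self._next_id = 1
        self._rescuable = set(rescuable_filters)  # administrator policy

    def quarantine(self, raw: bytes, filter_name: str,
                   recipients: Iterable[str]) -> int:
        message_id = self._next_id
        self._next_id += 1
        self._messages[message_id] = QuarantinedMessage(
            raw, filter_name, set(recipients))
        return message_id

    def listing_for(self, user: str) -> List[int]:
        return sorted(mid for mid, msg in self._messages.items()
                      if user in msg.recipients)

    def rescue(self, message_id: int, user: str) -> bytes:
        message = self._messages[message_id]
        if user not in message.recipients:
            raise PermissionError("user is not a recipient of this message")
        if message.filter_name not in self._rescuable:
            raise PermissionError("rescue is disabled for this filter")
        # Drop the user's reference; delete the copy when nobody is left.
        message.recipients.discard(user)
        if not message.recipients:
            del self._messages[message_id]
        return message.raw
```

A real implementation would also feed each rescue back into the statistical filter as a training event, as described in the retraining point above.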

Strong Windows support

MailWasher Server has had full support for Windows since the first version of the product. This is in contrast to a number of other open-source mail filtering packages that either do not support Windows at all, or support it only with a reduced feature set and/or a low-performance architecture (e.g. SpamAssassin, which lacks support for the faster 'daemon mode' on Windows and for which Razor/Pyzor/DCC are unsupported or unreliable on that platform).

Built-in statistics tracking

MailWasher Server logs statistics on what actions are taken on messages and for what reason, making it easy to see how well the different filters are performing.

The web interface provides simple summaries of these statistics, and allows users to export the statistics to XML or CSV, making it easy to import the data into spreadsheet and reporting packages.
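
A CSV export of this kind might look like the sketch below. The column names and the (action, reason) event shape are assumptions for illustration; the actual export format is not specified in this document.

```python
import csv
import io
from collections import Counter
from typing import Iterable, Tuple


def export_stats_csv(events: Iterable[Tuple[str, str]]) -> str:
    """Summarise (action, reason) events into CSV text, one row per pair."""
    totals = Counter(events)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["action", "reason", "count"])
    for (action, reason), count in sorted(totals.items()):
        writer.writerow([action, reason, count])
    return buffer.getvalue()
```

Output in this shape can be opened directly by most spreadsheet and reporting packages.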

Comprehensive online help

MailWasher Server comes with online help for all functionality built into the product.

Currently the help is maintained in a commercial documentation system, but we will be migrating to an open platform to make it easier for open-source developers to contribute.

System architecture

The MailWasher Server package consists of several different interoperating programs. The bulk of the system is contained in two daemons (see footnote 1), one providing the mail filtering and the other the web interface; the system is split into these two daemons for the sake of robustness, ensuring that any problems with one will have little impact on the other. Separate programs/libraries called "mail conduits" integrate MailWasher Server into the system's SMTP process.

The Mail Processing Daemon

The Mail Processing Daemon, or MPD, contains the core mail processing functionality, examining the messages passed to it by the mail conduit (see below), checking the messages and source details against both internal databases and external services and, when appropriate, quarantining the message. The MPD also performs a number of other tasks, such as sending the automated quarantined message summaries, and handling junk submissions from users.

The MPD is a multithreaded application, using one thread for each client connection and with a number of background threads that sleep the bulk of the time.
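
The thread-per-connection model can be illustrated with Python's standard socketserver module. This is only a sketch of the threading pattern: the one-line text protocol, the verdict strings, and all names are hypothetical, and the MPD itself is native code with its own protocol.

```python
import socketserver
import threading


def verdict_for(line: bytes) -> bytes:
    """Stand-in for the real filtering logic (purely illustrative)."""
    return b"QUARANTINE" if b"junk" in line.lower() else b"ACCEPT"


class ConduitHandler(socketserver.StreamRequestHandler):
    """Runs in its own thread for the lifetime of one client connection."""

    def handle(self) -> None:
        for line in self.rfile:
            self.wfile.write(verdict_for(line) + b"\n")


class ThreadPerConnectionServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True  # connection threads do not block process exit


def start_server(host: str = "127.0.0.1", port: int = 0) -> ThreadPerConnectionServer:
    """Start accepting connections on a background thread and return the server."""
    server = ThreadPerConnectionServer((host, port), ConduitHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A slow or stalled client therefore ties up only its own thread, at the cost of one thread per open connection.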

The Web Interface Daemon

The web interface to the system is provided by a separate daemon, referred to as the MWI (for historical reasons). The MWI has a simple built-in HTTP server and serves user requests from a combination of coded behaviour and simple editable page templates. The web interface allows administrators to configure the product and manage user accounts, and allows end users to manage their quarantined mail, and, where permitted by the administrators, to edit their own email address whitelists and blacklists and configure various aspects of the system's behaviour.

The MWI is a multithreaded application, using one thread for each client connection and with two background threads that sleep the bulk of the time.

Mail conduits

Mail conduits are small pieces of glue software written for particular MTAs (for example, qmail, or Exchange Server). They pass the message to the MPD, and take the actions specified by the MPD when it has finished processing the message.

Sendmail

The mail conduit-MPD interaction is implemented using the de-facto "milter" protocol, originally defined by Sendmail (and now supported by some other MTAs); this means that a mail conduit is not required for Sendmail.
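
For readers unfamiliar with milter, its wire format is commonly documented as a 4-byte big-endian length (which counts the command byte), a 1-byte command code, and then the command's data. The sketch below encodes and decodes that framing only; the function names are hypothetical, and it is not taken from the MailWasher Server sources.

```python
import struct
from typing import Optional, Tuple


def encode_milter_packet(command: bytes, payload: bytes = b"") -> bytes:
    """Frame one packet: uint32 length (network order), command, payload."""
    if len(command) != 1:
        raise ValueError("command must be exactly one byte")
    return struct.pack("!I", len(payload) + 1) + command + payload


def decode_milter_packet(buffer: bytes) -> Optional[Tuple[bytes, bytes, bytes]]:
    """Return (command, payload, remaining_bytes), or None if incomplete."""
    if len(buffer) < 4:
        return None
    (length,) = struct.unpack("!I", buffer[:4])
    if length < 1:
        raise ValueError("packet length must include the command byte")
    if len(buffer) < 4 + length:
        return None  # wait for more data from the socket
    return buffer[4:5], buffer[5:4 + length], buffer[4 + length:]
```

For instance, a HELO notification (command byte `H`) carrying a NUL-terminated hostname is framed with `encode_milter_packet(b"H", b"mail.example.org\0")`.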

qmail

The basic DJB qmail distribution offers very little configurability and has no built-in way to integrate external systems such as MailWasher Server. Instead, the qmail conduit follows the qmail design philosophy of small, simple programs that chain together to perform the overall operation. The qmail conduit is placed in front of the standard qmail-queue, which is renamed by the installer.
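
A program that sits in front of qmail-queue receives what qmail-queue itself would: the message on file descriptor 0 and the envelope on file descriptor 1. The envelope is encoded as "F" plus the sender and a NUL byte, then "T" plus a recipient and a NUL byte for each recipient, then a final NUL byte. The sketch below handles only that envelope encoding; the function names are hypothetical and it is not the conduit's actual code.

```python
from typing import List, Tuple


def build_qmail_envelope(sender: str, recipients: List[str]) -> bytes:
    """Encode an envelope in the form qmail-queue reads from descriptor 1."""
    parts = [b"F" + sender.encode("ascii") + b"\0"]
    parts += [b"T" + rcpt.encode("ascii") + b"\0" for rcpt in recipients]
    return b"".join(parts) + b"\0"


def parse_qmail_envelope(data: bytes) -> Tuple[str, List[str]]:
    """Decode the envelope into (sender, recipients)."""
    fields = data.split(b"\0")
    # A well-formed envelope ends with two NULs, giving two empty trailing fields.
    if len(fields) < 3 or fields[-1] != b"" or fields[-2] != b"":
        raise ValueError("envelope is not terminated correctly")
    sender_field, recipient_fields = fields[0], fields[1:-2]
    if not sender_field.startswith(b"F"):
        raise ValueError("envelope does not start with a sender field")
    if any(not f.startswith(b"T") for f in recipient_fields):
        raise ValueError("unexpected field in recipient list")
    sender = sender_field[1:].decode("ascii")
    return sender, [f[1:].decode("ascii") for f in recipient_fields]
```

Once the conduit has a verdict, it can pass the (possibly modified) message and the same envelope on to the renamed qmail-queue binary, or decline to do so.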

It is also possible to call MailWasher Server from a .qmail delivery rules file, using a separate conduit program. However, the qmail "straight paper path" architecture provides no way for programs invoked from the .qmail file to modify the message, and so unfortunately this conduit cannot add headers or perform other modifications that the MPD may request. Accordingly, this program is not considered an officially supported mail conduit, but it may be useful for administrators who wish to test MailWasher Server in a limited role and do not intend to configure the product to add headers.

Exchange

The Exchange conduit is a DLL that implements a COM interface that Microsoft has defined for such mail filtering systems. This DLL's COM object is passed each message as it is moved from the initial incoming message queue (where messages are stored during the SMTP conversation) to the message categorizer.

Note that, in contrast to the other MTAs, this means that MailWasher Server is not involved while the actual SMTP conversation is taking place.

Footnotes

  1. On Windows, these "daemons" are native Windows services; this document uses the term "daemons" to refer to both daemons on Unix platforms and services on Windows, unless stated otherwise.