Optimizing Operation and System Resources Usage

Thread pool control

Because Dr.Web MailD modules use a multithreaded model when receiving, processing or delivering messages, each module creates a certain number of processing threads. The more mail traffic there is to process and the more checks each message requires (for example, when the Drweb plug-in scans a large number of messages in the paranoid mode, or when a large number of Rules are checked for matching), the more threads each component creates. All created threads are organized in pools whose behavior is controlled by parameters of the PoolOptions type. These parameters also set the number of threads in each pool (both the minimum value, t_min, and the maximum value, t_max). By default, the auto value is set for all thread pools created by all of the modules.

warning

When the auto value is specified for a pool, t_min and t_max are set as follows:

For drweb-receiver and drweb-sender modules: t_min=2, t_max=500;

For other modules (drweb-maild, drweb-milter, drweb-notifier and others): t_min=2, t_max=1000.

The limit on the number of threads set for drweb-receiver not only prevents the module from creating more active threads than specified but also influences the module's behaviour during SMTP sessions. If the number of client connections exceeds the allowed number of threads in the pool, the module creates the maximum number of threads and queues all other connections for which a processing thread cannot be created. Once an active thread becomes free, it starts processing a queued connection. Because connections are processed asynchronously, the same thread can process several connections simultaneously. The length of the client connection queue is always limited to the maximum allowed number of threads in the drweb-receiver pool. Thus, drweb-receiver can handle no more than 2*t_max connections at a time, and some of them may be queued. Once the queue is full, drweb-receiver stops accepting new connections and responds to clients with the following error:

Server error: 421 3.8 Too many concurrent SMTP connections; please try again later

Under heavy load, not every new connection is rejected: active threads start processing queued connections as soon as they become free, so further new connections can be queued. Nevertheless, it is recommended to increase the maximum number of threads (t_max) in the drweb-receiver pool to avoid failing to process new connections. For other Dr.Web MailD modules, an increase in the number of threads does not influence module operation.
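The connection limits described above can be summarized in a small model. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes nothing beyond what the text states, namely that the queue is capped at t_max and that the total number of connections cannot exceed 2*t_max. It does not model the fact that one thread may serve several connections.

```python
def receiver_capacity(t_max):
    """Return the connection limits implied by a drweb-receiver pool size."""
    if t_max < 1:
        raise ValueError("t_max must be a positive integer")
    return {
        "max_threads": t_max,          # processing threads in the pool
        "max_queued": t_max,           # queue length is capped at t_max
        "max_connections": 2 * t_max,  # upper bound stated in the text
    }


def accepts_new_connection(current_connections, t_max):
    """True while the 2*t_max bound is not reached.

    Once the bound is reached, the client receives the 421 error instead.
    """
    return current_connections < receiver_capacity(t_max)["max_connections"]


if __name__ == "__main__":
    # t_max=500 is the value that auto sets for drweb-receiver.
    print(receiver_capacity(500))
```

With the auto setting for drweb-receiver (t_max=500), this simplified model gives an upper bound of 1000 simultaneous connections.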

To monitor the components (the number of active threads in pools and the queue length), it is recommended to periodically send the SIGUSR1 signal to all Dr.Web MailD processes.

warning

Dr.Web Monitor and Dr.Web Agent, which control operation of Dr.Web MailD components, do not process SIGUSR1 signal in the current version. Thus, SIGUSR1 signal, if sent to the components, causes them to terminate their operation!
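Because of this restriction, any script that sends SIGUSR1 should skip Dr.Web Monitor and Dr.Web Agent. The following Python sketch shows one way to filter the targets. The process names drweb-monitor and drweb-agent, the helper names and the way the PID-to-name mapping is obtained are all assumptions; verify the actual process names on your system before using anything like this.

```python
import os
import signal

# Assumed process names for Dr.Web Monitor and Dr.Web Agent (verify locally).
UNSAFE_FOR_SIGUSR1 = {"drweb-monitor", "drweb-agent"}


def sigusr1_targets(processes):
    """Return the PIDs that may receive SIGUSR1.

    `processes` maps PID -> process name; Monitor and Agent are skipped
    because SIGUSR1 causes them to terminate.
    """
    return sorted(pid for pid, name in processes.items()
                  if name not in UNSAFE_FOR_SIGUSR1)


def send_sigusr1(processes):
    """Send SIGUSR1 to every safe PID (defined here, but not called)."""
    for pid in sigusr1_targets(processes):
        os.kill(pid, signal.SIGUSR1)
```

The mapping is passed in explicitly so that the filtering logic can be checked without touching real processes.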

When Dr.Web MailD components receive the SIGUSR1 signal, they output (dump) statistics on thread pools. Statistics can be saved either to separate text files or to the log (at the Debug level). The location of the statistics files is controlled by the BaseDir parameter in the [General] section. For details on the statistics format, refer to Internal Statistics.

Statistics on the thread pools of the drweb-sender and drweb-receiver modules are saved to the sender_thr.txt and receiver_thr.txt files respectively. The statistics contain data on the current size of the pool, the number of active threads and the number of queued connections. It is recommended to increase the maximum allowed number of threads (t_max) when the number of queued connections (pending) approaches the number of active threads (active).
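The rule of thumb above can be expressed as a simple check. This Python sketch is only an illustration: the function name is hypothetical, the 0.8 ratio used to decide that pending "approaches" active is an assumed threshold rather than a documented value, and the two numbers are expected to be read from the statistics by hand or by your own tooling.

```python
def should_raise_t_max(active, pending, ratio=0.8):
    """Return True when queued connections approach the active thread count.

    `ratio` is an assumed threshold: with the default, the check fires
    once pending reaches 80% of active.
    """
    if active < 0 or pending < 0:
        raise ValueError("active and pending must be non-negative")
    return pending > 0 and pending >= ratio * active
```

For example, should_raise_t_max(100, 85) returns True, while should_raise_t_max(100, 10) returns False.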

warning

Actual number of threads (for example, if counted with ps aHx command) is always greater than the number specified in the pool settings. That is because pool settings define only the number of processing threads, but during operation helper threads are also created.

Increase the maximum number of threads with caution. Before you change the setting, estimate:

Amount of used memory;

Number of files and sockets to be open (that is, file descriptors);

CPU power.

The greater the t_min value (which determines the minimum number of threads in a pool), the more time Dr.Web MailD components need to start and establish connections. For example, if the Drweb anti-virus plug-in is used, threads from the plug-in thread pool establish connections with Dr.Web Daemon on plug-in startup; therefore, the time the MailD core needs to start increases. If the minimum number of threads in a component pool is too large, the time required to start the component can exceed the StartTimeout parameter value specified for the component in the Dr.Web Monitor settings. In this case, Dr.Web Monitor abnormally terminates both the component and the whole Dr.Web MailD software suite on startup.

Similarly, if the t_max value (the maximum number of threads in a pool) is too large, errors can occur on termination of the Dr.Web MailD suite when the time its components need to shut down exceeds the timeout value. In this case, Dr.Web Monitor terminates the suite abnormally.

It is not recommended to raise the maximum number of threads merely as a reserve. If that number is too large (about 1000 for the drweb-receiver and drweb-sender modules and about 2000 for others), creation of new threads may be delayed, leading to time-out errors during message processing. This can cause processing errors and message loss. If that happens, decrease the number of threads. If that is not possible, do the following:

1) Increase the time-out value of the IPC subsystem (controlled by the IpcTimeout parameter in the [General] section), for example, to 10 minutes;

2) Increase the maximum time to wait for a thread to close, which is used on Dr.Web MailD startup and shutdown (controlled by the MaxTimeoutForThreadActivity parameter in the [General] section), for example, to 3 minutes;

3) Increase the time-out for Dr.Web MailD components to start or shut down in the maild_<mta>.mmc control file of Dr.Web Monitor (a larger number of threads requires more time to stop).

In this case, it is also strongly recommended to adjust the parameters of the whole suite (see below).

Possible symptoms of system resources exhaustion

1) The next thread in a pool may fail to be created. If so, the following error is written to the log of the corresponding component:

ERROR <some description>: boost::thread_resource_error

In this case, decrease the number of active threads for the corresponding thread pool. When it is set automatically (auto), specify the thread number explicitly.

If the specified number is not sufficient and increasing the number of threads causes an error, increase server capacity, that is, install more RAM and increase the number of cores available to Dr.Web MailD.

2) Under heavy load, messages cannot be processed. If so, Dr.Web MailD logs the following error:

Too many open files

The error occurs because of exhaustion of file descriptors available for Dr.Web MailD (including socket descriptors).

To solve the problem on Solaris 10, define the LD_PRELOAD_32 environment variable before drweb-receiver starts and assign it the value /usr/lib/extendedFILE.so.1. You can do that:

directly in the console, if drweb-receiver is started not with the starting script, but from the console;

by "wrapping" the startup of drweb-receiver into the script wrapper which sets the required value to this environment variable;

by changing the start script for drweb-monitor (/etc/init.d/drweb-monitor) and adding the corresponding strings that change the system environment variable.

Note that in the last case the environment variable will be defined not only for drweb-receiver, but for all Dr.Web processes run by Dr.Web Monitor.

If that does not fix the problem, keep the changes you made and do the following:

increase the ulimit -n values;

add (or adjust, if already exist) the following lines in the /etc/system file:

set rlim_fd_max = 65335
set rlim_fd_cur = 65335

If this error occurs on another OS (FreeBSD or Linux), increase the limit on the number of file descriptors for the process/user and increase the ulimit -n values.
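Before and after changing the limits, it can be useful to check what a process actually gets. The following Python sketch reads the descriptor limits of the current process using the standard resource module (available on Unix-like systems). The helper names and the idea of comparing the limit with an expected descriptor count are assumptions made for illustration; run the check as the same user and in the same environment as Dr.Web MailD, because limits are applied per process.

```python
import resource


def fd_limits():
    """Return the (soft, hard) limits on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def has_fd_headroom(expected_descriptors):
    """True if the soft limit can accommodate the expected descriptor count.

    `expected_descriptors` is your own estimate of the files and sockets
    that will be open at peak load.
    """
    soft, _hard = fd_limits()
    return soft == resource.RLIM_INFINITY or expected_descriptors <= soft


if __name__ == "__main__":
    soft, hard = fd_limits()
    print("soft limit:", soft, "hard limit:", hard)
```

The soft limit is the value reported by ulimit -n; it can be raised up to the hard limit without administrator privileges.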

General recommendations on how to enhance performance

To enhance performance on heavy load, it is recommended to:

Use asynchronous mode of message processing, that is, assign the plug-ins to the queues as follows:

BeforeQueueFilters = headersfilter, vaderetro
AfterQueueFilters = drweb, modifier

warning

Plug-ins assigned to the BeforeQueueFilters queue can interact with the drweb-receiver module synchronously and process messages before they are moved to the database. But if most messages have a large attachment (or a large number of attachments), their processing by the plug-ins takes considerable time. In this case, it is not recommended to assign the plug-ins to the BeforeQueueFilters queue, as it can slow down interaction with the external MTA when transmitting messages.

Moreover, in this case, a problem can occur while checking messages if the timeout value specified in the IpcTimeout parameter is too small. The problem can cause message loss (the message will not be deleted and the sender will not be notified of that).

If Dr.Web MailD is integrated with an MTA, it is recommended to use synchronous mode (plug-ins must be assigned to the BeforeQueueFilters list). Otherwise, if it operates as an SMTP/LMTP proxy, it is recommended to use asynchronous mode (plug-ins must be assigned to the AfterQueueFilters list). Both parameters are located in the [Filters] section.

Increase timeout values:

o IPC subsystem (managed by the IpcTimeout parameter in the [General] section);

o Maximum time to wait for a thread to close, used on restart and shutdown of Dr.Web MailD (managed by the MaxTimeoutForThreadActivity parameter in the [General] section);

o Time to wait for Dr.Web MailD components to start or shut down, in the maild_<mta>.mmc control file of Dr.Web Monitor.

Increase the ulimit -n values.

Estimate load on the thread pools by gathering and analyzing statistics (see above). If required, adjust the limits for the corresponding thread pools.

Mount the %var_dir/msgs and %var_dir/infected directories on the tmpfs file system (using the command mount -t tmpfs tmpfs <directory>, where <directory> is the directory to mount).

warning

Mount directories to the tmpfs file system with caution. Note the following information:

The system must have sufficient RAM;

If power loss occurs, both external queues and content of Quarantine will be lost.

For all parameters that contain Lookup, use Lookups to files (file:, rfile:), regular expressions (regex:) or lists. Contacting an external DBMS or LDAP server while processing each message considerably reduces processing speed, and the success of the processing depends on the stability of the connection to the DBMS or LDAP server.

Set the value of the MoveAll parameter in the [Quarantine] section to No (especially if %var_dir/msgs and %var_dir/infected are not mounted to tmpfs).

Set the value of the SyncMode parameter in the [MailBase] section to No.

Increase the memory available for the internal DB. For that purpose, increase the MaxPoolSize parameter value in the [MailBase] section, which reduces the number of disk access requests.

Disable use of statistics and reports (by setting the following parameter values Detail=off and Send=no in the [Stat] section and [Reports] section respectively).

Configure logging to files instead of syslog.

Specify protected networks and domains as a list in the ProtectedNetworks and ProtectedDomains parameter values in the [Maild] section (see the note on Lookup usage mentioned above).

If no message processing Rule contains client-ip parameter in the conditional part, set the GetIpFromReceivedHeader parameter value in the [Maild] section to No.

Set the following parameter values: SkipDSNOnBlock = Yes (in the [Maild] section), SendDSN = No (in the [Sender] section), and try to avoid the notify and redirect optional actions in plug-in settings.

Disable Quarantine (remove quarantine action from plug-in settings, if the action is specified).

Limit the maximum size of messages checked by plug-ins (by setting required values to the MaxSizeBeforeQueueFilters and MaxSizeAfterQueueFilters parameters in the [Filters] section).

Values of the StalledProcessingInterval parameters in the [Sender] section and [Receiver] section must not be less than the defaults (10m).

If messages are sent with a delay during Dr.Web MailD operation and the number of queued connections grows for the drweb-sender thread pool, do one of the following (depending on the delivery method set in the Method parameter in the [Sender] section):

For SMTP delivery method:

o Decrease the timeout value set in the OtherCmdsTimeout parameter in the [Sender] section.

o If a value of the Router parameter in the [Sender] section is specified, avoid using Lookups that contact an external DBMS or LDAP (see the note on Lookup usage mentioned above).

o Check operation of the MTAs that receive messages from Dr.Web MailD: check the time required to receive an MTA response when the drweb-sender module attempts to establish a connection, as well as where errors occur during message delivery.

For Pipe delivery method:

o Check operation of the local MTA that receives messages from Dr.Web MailD. Its daemon that is responsible for local message delivery must be configured correctly.

If the number of messages in the delivery queue (located in the %var_dir/msgs/out directory) grows during Dr.Web MailD operation, it is recommended to send the SIGUSR2 signal to the drweb-sender module outside peak usage time.

Moreover, you can implement a cluster solution using internal proxying of requests from Sender and Receiver to several MailD core instances.

Recommendations on configuring Dr.Web MailD if the processed traffic primarily consists of large messages:

1. It is recommended to avoid using Dr.Web Modifier as well as any filtering based on content analysis (that is, avoid setting values for the RejectPartCondition, AcceptPartCondition and MissingHeader parameters of the Dr.Web HeadersFilter plug-in, for the RegexForChechecked parameter of the Drweb plug-in, etc.), because searching within MIME objects can significantly slow down message processing. The Vaderetro, Dr.Web Modifier and Drweb plug-ins store both the message body and headers in RAM while processing, so it is recommended to assign these plug-ins to the AfterQueueFilters queue (use asynchronous mode).

2. Increase the IPC timeout (the IpcTimeout parameter in the [General] section) to 5 minutes if the Drweb plug-in is used.

3. Increase the file scanning timeout in the Dr.Web Daemon settings as well as the value of the Timeout parameter in the Drweb plug-in settings (maximum 10 minutes).

4. Mount the directory with messages and the Quarantine directory (%var_dir/msgs and %var_dir/infected) on the tmpfs file system, but only if sufficient RAM is available to hold a large number of messages.

5. Set the maximum size of a message body saved to the internal DB to 1 KB (set MaxBodySizeInDB = 1k in the [MailBase] section).

6. Stop restricting message size and the amount of disk space available for Quarantine (set 0 as the value of the MaxSize and MaxNumber parameters in the [Quarantine] section). Note that if DBI is used, Quarantine with large messages will be saved to the DBMS, which will cause additional load on the server. It does not influence Dr.Web MailD operation directly, but can affect the average load on the server.

7. Restrict the time messages are stored in Quarantine if no special condition is required (controlled by the StoredTime parameter in the [Quarantine] section). Otherwise, they will consume disk space.

8. Monitor the disk space consumed by the %var_dir/msgs/out and %var_dir/msgs/out/failed directories, which makes it possible to detect problems with message delivery and keep free space on the disk.
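Monitoring the queue directories from the last recommendation can be scripted. The following Python sketch sums the size of all regular files under a directory and reports the free space on the file system that holds it. The function names are hypothetical, and the path must be supplied by you, because the actual value of %var_dir depends on the installation.

```python
import os
import shutil


def directory_size(path):
    """Return the total size in bytes of regular files below `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                # A message file may be delivered and removed during the scan.
                pass
    return total


def free_space(path):
    """Return the free bytes on the file system that contains `path`."""
    return shutil.disk_usage(path).free
```

Because the scan is recursive, calling directory_size on the msgs/out directory also counts the files in its failed subdirectory. A steadily growing result between runs suggests a delivery problem.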

For stable Dr.Web MailD operation, the server must have more RAM than the total volume of messages processed per second multiplied by the average processing time in seconds. The limit on the number of threads in a pool, specified in the component pool settings, in turn restricts the total number of messages per second. The average time required to process one message depends only on server capacity (with constant settings for plug-ins, processing Rules, etc.). Thus, the average time must be estimated from Dr.Web MailD logs for the current architecture. To provide stable operation, the pools of the drweb-receiver (drweb-milter), drweb-maild and drweb-sender modules must have the same number of threads. However, if drweb-sender uses routing configured by the Router parameter value, this module must have more threads in its pool than the other modules.
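The sizing rule in the paragraph above can be turned into a quick calculation. The Python sketch below is a rough estimate under stated assumptions, not a formula from the product: it treats the volume processed per second as message rate times average message size, and it assumes that each thread handles one message at a time when deriving a throughput bound from the pool size. All function and parameter names are hypothetical.

```python
def min_ram_bytes(messages_per_second, avg_message_bytes, avg_processing_seconds):
    """Lower bound on RAM: volume per second times average processing time."""
    return messages_per_second * avg_message_bytes * avg_processing_seconds


def max_messages_per_second(pool_threads, avg_processing_seconds):
    """Throughput bound implied by the pool size.

    Assumes one message per thread at a time (a simplification).
    """
    if avg_processing_seconds <= 0:
        raise ValueError("avg_processing_seconds must be positive")
    return pool_threads / avg_processing_seconds


if __name__ == "__main__":
    rate = max_messages_per_second(100, 2.0)
    print(rate, min_ram_bytes(rate, 200_000, 2.0))
```

The average processing time should come from your own Dr.Web MailD logs; the figures in the example are placeholders.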

If a time period of 5 seconds, specified in IpcTimeout, is not enough for message processing, balance the load and decrease the number of threads in pools (see above).