Engine configuration

The Connect Server cooperates with different engines to handle specific tasks. A DataMapper engine extracts data from a data file. A Merge engine merges the template and the data to create Email and Web output, or to create an intermediary file for Print output. The intermediary file is in turn used by a Weaver engine to prepare the Print output. (For more information see: Connect: a peek under the hood).
Settings for these engines are made in the Connect Server Configuration tool (see Server Configuration Settings). These 'Scheduling Preferences' allow you to control precisely how the PlanetPress Connect Server handles jobs.

Connect allows for the parallelization of jobs. This means you can allocate one or more engines to process jobs. The number of each type of engine is configurable, as well as the number of engines that can work together on the same job (determined by job size: small, medium or large) and at what maximum speed.

This gives you, as solution developer or application manager, full control of how to apply a machine's power. For example, you can share the available resources to process multiple jobs at once or allocate all resources to process one job as fast as possible, or anything in between.

Connect distinguishes five types of jobs:

  1. Small Print

  2. Medium Print

  3. Large Print

  4. Email

  5. Web

Connect categorizes incoming Print jobs by the number of records they contain. The boundaries between Small and Medium jobs and between Medium and Large jobs can be configured per server (see below, Allocating processing power to jobs).
There is no distinction between small, medium and large jobs for Email and Web output.

This topic explains all of these settings and the principles behind them, and it provides guidelines for letting the Server manage the workload in such a way as to achieve the highest possible output speeds.

Factors to take into account are:

  • Your licence, which imposes a speed quota (see Speed quota: PPM and speed units).

  • The processing power of your machine. How many cores it has determines how many engines can be launched (see Launching multiple engines).

  • The size and number of jobs of one kind that need to be handled, sequentially or simultaneously. In other words, your use case. By allocating processing power to jobs of different sizes you can make the setup match your usage situation (see Allocating processing power to jobs).

Other ways to enhance performance are described in another topic: Performance considerations.

Speed quota: PPM and speed units

The highest possible output speed depends first and foremost upon your licence.
With no Performance Pack, in PlanetPress Connect, one engine can generate output at 500 PPM (pages, emails or web pages per minute). Additional Performance Packs increase this quota.
The number of engines that are allowed to operate in parallel to create the same type of output is referred to as speed units.
PlanetPress Connect provides 6 speed units. Additional Performance Packs will increase this number.
So, with no Performance Pack, up to 6 engines can create the same type of output in parallel, which means the total maximum output speed is 3,000 PPM per output type (Print, Email, and Web).
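
The arithmetic behind this figure can be sketched in a few lines of Python. This is only an illustration: the constant and function names are hypothetical, and the values are the ones stated above for a licence with no Performance Pack.

```python
# Illustrative only: names are hypothetical, values come from the text above.
PPM_PER_SPEED_UNIT = 500   # output quota of one engine with no Performance Pack
SPEED_UNITS = 6            # speed units available with no Performance Pack


def max_output_ppm(speed_units: int = SPEED_UNITS,
                   ppm_per_unit: int = PPM_PER_SPEED_UNIT) -> int:
    """Return the maximum output speed (PPM) for one output type."""
    return speed_units * ppm_per_unit


print(max_output_ppm())  # 3000
```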

One engine needs at least one free speed unit to be able to create output.

It is important to note that only output operations are limited by this quota.

  • Weaver engines always require a speed unit to run.

  • Merge engines only require a speed unit when creating Email or Web output. Merge engines involved in a Print process don't need a speed unit in order to run.

  • DataMapper engines don't need speed units.

In situations where Print and Email and/or Web output are created at the same time, only the Merge engines that create Email/Web output count towards the maximum number of speed units for that type of output.
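
The rules above can be summarized as a small lookup. The following Python sketch is not part of Connect; the function name and the string labels for engines and output types are assumptions made for the sake of the example.

```python
def needs_speed_unit(engine: str, output: str) -> bool:
    """Tell whether an engine needs a free speed unit, per the rules above.

    engine: "datamapper", "merge" or "weaver" (hypothetical labels)
    output: "print", "email" or "web" (hypothetical labels)
    """
    if engine == "weaver":
        return True                        # Weaver engines always need one
    if engine == "merge":
        return output in ("email", "web")  # not needed in a Print process
    if engine == "datamapper":
        return False                       # data mapping is not output
    raise ValueError(f"unknown engine type: {engine}")
```

For example, needs_speed_unit("merge", "print") is False, while needs_speed_unit("merge", "web") is True.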

Spare speed units are distributed proportionally

Since the number of engines is configurable, and jobs may run concurrently, the number of engines in use may not match the exact number of available speed units.
When there are more speed units than there are engines in use, the Connect server distributes the speed units and the maximum 'pages' per minute to all running jobs in proportion to the number of engines they are using.
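
The exact calculation used by the Server is not given here, so the following Python sketch shows only one possible reading of "in proportion": each running job receives a share of the total quota equal to its share of the engines in use. The function name, the job names and the numbers in the example are hypothetical.

```python
def distribute_quota(total_ppm: float, engines_per_job: dict) -> dict:
    """Split a PPM quota over running jobs in proportion to their engines.

    Assumption: a job using n of the N engines in use gets n/N of the quota.
    """
    total_engines = sum(engines_per_job.values())
    if total_engines == 0:
        return {}
    return {job: total_ppm * engines / total_engines
            for job, engines in engines_per_job.items()}


# Hypothetical case: 3,000 PPM available, job A uses 2 engines, job B uses 1.
shares = distribute_quota(3000, {"A": 2, "B": 1})
print(shares)  # {'A': 2000.0, 'B': 1000.0}
```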

Output speed is the speed at which the output is created by the engine in question. Data mapping and other steps in a production process are not taken into account. The throughput speed is the speed of the entire production process. This will always be lower than the output speed.

Launching multiple engines

One single engine can only process a single job at a time and will run mostly single-threaded. In order to benefit from multi-core systems it is recommended that several engines run in parallel.
As a rule of thumb, you will want to run one less engine in total on a machine than the system has cores, leaving one CPU core for the Connect Server and the operating system to use.

Modern hardware typically has both full cores and hyper-threading or logical cores. The logical cores should not be counted as a full core when determining how many engines to use. As a guide, count logical cores for only 25%-50% of a full core.
For example: on an Intel i7 CPU that comes with 4 cores and 4 additional hyper-threading cores, Windows Task Manager will show 4 cores and 8 logical processors on its performance tab. On a CPU like this, 5 or 6 engines can be configured to run in parallel.
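
The guideline for counting logical cores can be written out as a short calculation. The Python sketch below uses a hypothetical function name and simply reproduces the i7 example above; it does not account for any core you may want to leave free for the Connect Server and the operating system.

```python
def effective_cores(full_cores: int, logical_processors: int,
                    logical_weight: float) -> float:
    """Estimate usable cores, counting each extra logical core partially.

    logical_weight is the fraction of a full core that a hyper-threading
    core is assumed to be worth (0.25 to 0.5 according to the guideline).
    """
    extra_logical = max(logical_processors - full_cores, 0)
    return full_cores + extra_logical * logical_weight


# The i7 example: 4 full cores, 8 logical processors.
print(effective_cores(4, 8, 0.25))  # 5.0
print(effective_cores(4, 8, 0.5))   # 6.0
```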

To configure the number of engines:

  1. Select Window > Preferences... from the menu.

  2. Under Scheduling, select a type of engine.

  3. Set Local engines launched to a number appropriate for your system. See Deciding how many engines of each type to launch.

  4. Click OK or Apply.

It is advised that you do not configure more engines than can be backed by actual processing power. This adds overhead while not adding processing power.

Deciding how many engines of each type to launch

When jobs run in parallel, different types of engines may run at the same time. Which type of engine has the biggest impact on performance depends on the usage situation.

The more operations of a kind, and the larger those operations, that need to be performed at the same time as smaller operations, the sooner you will see a performance increase when using multiple engines.

Merge engine

Generally, launching a relatively high number of Merge engines results in better performance, as Merge engines are involved in the creation of output of all kinds (Print, Email and Web) and because content creation is relatively time-consuming.

DataMapper engine

Adding DataMapper engines might be useful in the following circumstances:

  • When large data mapping operations have to be run simultaneously for many jobs.

  • When frequently using PDF or XML based input. Particularly in the case of XML input with large individual records.

  • When the All-In-One plugin is used often in Workflow configurations and there are more than two Merge engines running.

The Connect MySQL database needs a fast storage system (SSD or other fast devices) to be able to keep up with two or more DataMapper engines. When the database is installed on a system with a slow hard drive, adding a DataMapper engine may not increase the overall performance.

Weaver engine

Adding extra Weaver (Output) engine(s) might be useful when large Print jobs are to be run simultaneously with smaller Print jobs.

Memory per engine

By default, each engine is set to use up to a predetermined amount of RAM. To make optimum use of a machine's capabilities it might be useful to increase the amount of memory that the various engines can use.

  • DataMapper engines may perform better with greater memory when running jobs containing a lot of data.

  • For complex templates with a lot of pages per document, there is a chance that Merge engines will run better with more memory.

  • The maximum memory usage of a Weaver engine can be relevant for jobs with heavy graphics; or for jobs that use Cut & Stack impositioning; or for jobs using particular variables that entail page buffering (see Content variables).

The Maximum memory per engine setting is found in the Engine Setup preferences.

This setting only controls the maximum size of the Java heap memory that an engine can use; the total amount of memory that will be used by an engine is actually a bit higher.

Also keep in mind that the Connect Server and the operating system itself will need memory to keep running.
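
A rough memory budget can help to check whether a planned configuration fits the machine. The Python sketch below is only an estimate under stated assumptions: the function name is hypothetical, the 25% overhead on top of the Java heap is a guess (the actual overhead is only described as "a bit higher"), and the engine counts, heap sizes and reserve in the example are made-up values.

```python
def estimated_memory_mb(engines, reserve_mb: float,
                        overhead_factor: float = 1.25) -> float:
    """Estimate total memory needed by all engines plus a fixed reserve.

    engines: iterable of (engine_count, max_heap_mb) pairs
    reserve_mb: memory set aside for the Connect Server and the OS
    overhead_factor: assumed ratio of real memory use to Java heap size
    """
    engines_mb = sum(count * heap_mb * overhead_factor
                     for count, heap_mb in engines)
    return engines_mb + reserve_mb


# Hypothetical setup: 2 engines at 1024 MB, 3 at 2048 MB, 1 at 2048 MB.
needed = estimated_memory_mb([(2, 1024), (3, 2048), (1, 2048)],
                             reserve_mb=4096)
print(needed)           # 16896.0
print(needed <= 16384)  # False: this setup would not fit in 16 GB
```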

Allocating processing power to jobs

Which engine configuration is most efficient in your case depends on how Connect is used. What kind of output is needed: Print, Email, and/or Web? How often? How big are those jobs? Do they have to be handled at the same time or in sequence? Would it be useful to give priority to small, medium or large jobs, and/or to jobs of a certain kind?

Depending on the answers to these questions, you can allocate processing power to jobs in order to run them as fast as possible, and/or in the order of your preference.
The first step in this process is to define the size of small, medium and large jobs.

Job size

Connect lets you define job sizes by setting the maximum number of records in a small job, and the minimum number of records in a large job. Jobs that are neither small nor large are medium sized. (Note that the term 'records' refers to top-level records only. Detail records are not considered.)
Determining the size of small, medium and large jobs is important because you can assign more resources to medium and large jobs via the settings for Merge and Weaver engines.
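
The classification described above can be sketched as follows in Python. The function name and the threshold values in the example are hypothetical; only the logic (a maximum for small jobs, a minimum for large jobs, everything in between is medium) is taken from the text.

```python
def classify_print_job(records: int, max_small: int, min_large: int) -> str:
    """Classify a Print job by its number of top-level records."""
    if max_small >= min_large:
        raise ValueError("max_small must be lower than min_large")
    if records <= max_small:
        return "small"
    if records >= min_large:
        return "large"
    return "medium"


# Hypothetical thresholds: small up to 100 records, large from 1,000 records.
print(classify_print_job(50, 100, 1000))    # small
print(classify_print_job(500, 100, 1000))   # medium
print(classify_print_job(1000, 100, 1000))  # large
```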

There is no recommendation to make for the number of records in a small, medium or large job. This setting needs to be based on an assessment of the actual (or expected) workload of Connect. Job size is a relative concept: in a small service company a job may be considered large when it counts 1,000 records, whereas in a large insurance company the same job may be seen as small. Also take into account that jobs with fewer records could actually be medium or large if each individual record outputs 10,000 pages.

To set the job sizes:

  1. Select Window > Preferences... from the menu.

  2. Under Scheduling, enter the maximum number of records in a small job. Note that small jobs always get just one engine, so they should be easily handled by one engine.

  3. Enter the minimum number of records in a large job. The number of engines used for medium and large jobs is configurable (see below).

  4. Click OK or Apply.

Running a job as fast as possible

Number of parallel engines per Print job

Two or more engines of a kind can be combined to work on the same Print job. Generally jobs will run faster with more than one engine, because sharing the workload saves time.
However, running one job with multiple engines reduces the number of jobs that can be handled at the same time by that kind of engine, because there are only so many engines (and speed units) available.

To select a number of parallel engines per Print job size:

  1. Select Window > Preferences... from the menu.

  2. Under Scheduling, select Merge Engine.

  3. Set the Parallel engines per job for medium and large Print jobs; small jobs always get one engine. You cannot assign more engines than the total number of engines launched.

  4. Do the same for the Weaver engine.

  5. Click OK or Apply.

When each individual record in a job is composed of a very large number of pages, the Memory per engine setting and the machine's hard drive speed are probably more important than the number of Merge engines, since one record cannot be split over multiple cores (see Memory per engine).

Number of speed units per Print job

If a Print job of a specific size has more than one parallel speed unit assigned to it, its speed is multiplied. However, this reduces the number of Print jobs that can be run simultaneously.

When no other Print output operations run at the same time, a single job will get all available speed units, or the maximum number of speed units reserved for jobs of that size (see Dividing processing power over jobs).

To set a number of speed units per Print job:

  1. Select Window > Preferences... from the menu.

  2. Under Scheduling, select Weaver Engine.

  3. Set the Parallel speed units per job for medium and large Print jobs. Small jobs get one speed unit. You cannot assign more speed units than what you have available according to your licence.

    Keep in mind that when there aren't enough speed units available at the moment a job comes in, the job will have to wait.

  4. Click OK or Apply.

Number of speed units for Email and Web

Although assigning parallel speed units to Email and HTML jobs is possible (on the Merge Engine settings page), it is advised to use only one speed unit per job, firstly because these jobs are usually small.
Secondly, HTML jobs need to be handled as soon as possible, particularly if a request was made over the internet; you don't want those jobs to be kept waiting until the required number of speed units becomes available.
Email output doesn't benefit much from speed units, because most time is spent on communicating with an external SMTP server.

Dividing processing power over jobs

There are a number of ways in which you can divide processing power over output operations of a certain kind and/or size.

  • By reserving Merge engines for jobs of a certain kind (and size, in the case of a Print job). Note that reserved engines cannot be used by any other type of job. This means there will be fewer engines to handle other jobs. Consequently, the other jobs may take more time and may have to wait (or wait longer). However, if the server receives many web requests then having engines reserved for HTML output can help performance.

  • By setting the maximum number of Merge engines that can handle jobs of a certain kind, or (in case of Print jobs) size, concurrently. This setting is useful to ensure that there will always be some Merge engines available for jobs of another size or kind.

  • By requiring a number of parallel engines for Print jobs of a certain size (see Number of parallel engines per Print job). More parallel engines will make them run faster, but they will have to wait (longer) if the required number of engines - and speed units - isn't available when they come in.

  • By reserving speed units for Print jobs of a certain size.

  • By setting the maximum number of speed units to use for Print jobs of a certain size. This setting is useful to ensure that there will always be some speed units available for other Print jobs. It also limits the number of Print jobs of that size that can run concurrently: if, for example, large Print jobs require 4 parallel engines, and the maximum number of speed units for large Print jobs is also 4, only one large Print job can run at a time.
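
The effect of the last setting on concurrency can be worked out with integer division. This Python sketch uses a hypothetical function name and assumes that every parallel engine of a job occupies one speed unit, as in the example above.

```python
def max_concurrent_jobs(max_speed_units: int, speed_units_per_job: int) -> int:
    """Return how many jobs of one size can run at once under the cap."""
    if speed_units_per_job <= 0:
        raise ValueError("a job needs at least one speed unit")
    return max_speed_units // speed_units_per_job


# Large Print jobs need 4 speed units each and may use at most 4 in total.
print(max_concurrent_jobs(4, 4))  # 1
```

Raising the cap to 6 would still allow only one such job at a time, since a second job would need 8 speed units in total.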

All of these engine configuration settings are found in the Scheduling Preferences:

  1. Select Window > Preferences... from the menu.

  2. Under Scheduling, select Merge Engine, or Weaver Engine.

How the Server decides if a job can be handled

In summary, this is how jobs are handled when they can run in parallel.

  • Whenever a job comes in, the number of engines to use is determined. (For Print jobs, this is based on whether the operation is small, medium or large; see Job size.)

  • If there are enough reserved Merge engines for that type of job available then those engines will be used.

  • If there are not enough reserved Merge engines available, then any unreserved Merge engine that is available will be used.

  • If no, or not enough, Merge engines are available then the job will have to wait until the required number of appropriate Merge engines becomes available.

The following limitations apply at all times:

  • The maximum number of concurrent Merge engines working on jobs of the same kind or size may not be exceeded.

  • If no - or not enough - speed units are available for that type of output, the job must wait.
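
The decision steps and limitations above can be sketched as a single function. This Python code is not the Server's actual implementation: all names are hypothetical, and it assumes that a job may combine reserved and unreserved Merge engines when the reserved ones alone are not enough, which the summary does not state explicitly.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class MergePool:
    reserved_free: int     # idle Merge engines reserved for this job type
    unreserved_free: int   # idle Merge engines not reserved for any type
    busy_on_type: int      # Merge engines already working on this job type
    max_concurrent: int    # cap on Merge engines for this job type
    speed_units_free: int  # free speed units for this type of output


def plan_job(pool: MergePool, engines_needed: int,
             speed_units_needed: int) -> Optional[Tuple[int, int]]:
    """Return (reserved_used, unreserved_used), or None if the job must wait."""
    # Limitation 1: the concurrency cap for this kind or size of job.
    if pool.busy_on_type + engines_needed > pool.max_concurrent:
        return None
    # Limitation 2: enough speed units for this type of output.
    if speed_units_needed > pool.speed_units_free:
        return None
    # Prefer reserved engines, then fall back on unreserved ones.
    from_reserved = min(engines_needed, pool.reserved_free)
    from_unreserved = engines_needed - from_reserved
    if from_unreserved > pool.unreserved_free:
        return None
    return (from_reserved, from_unreserved)
```

With a pool of 2 reserved and 3 unreserved idle engines and a cap of 4, a job that needs 4 engines would take 2 reserved and 2 unreserved engines, while a job that needs 5 would have to wait.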

Examples

Here are a few examples of use cases and settings that would be appropriate in such cases.

Batch processing. In a batch processing situation, jobs don't have to be handled simultaneously. All jobs - whether they are big or small - are processed one after another. Every job should be handled as quickly as possible. It is therefore recommended to assign the maximum number of engines and speed units to all jobs. Do not reserve engines or speed units for certain jobs.

Web requests. In online communication, response times are critical. If the Server receives a lot of Web requests, it should handle as many as possible, as quickly as possible, at the same time. It is recommended to launch as many Merge engines as possible and to reserve most of them for HTML output. The jobs will generally be small and can do with just one Merge engine.

Mixed jobs that are processed in parallel. In a situation where small, medium and large jobs can come in at any time and should be handled in parallel, the challenge is to find a balance between how much power can be allocated to jobs (to minimize the time they take) and how long they can wait. No single job should require all of the processing power, unless it is acceptable for it to have to wait until the maximum number of engines finally becomes available - and then all other jobs will have to wait.