Engine configuration

The Connect Server cooperates with different engines to handle specific tasks. A DataMapper engine extracts data from a data file. A Merge engine merges the template and the data to create Email and Web output, or to create an intermediary file for Printed output. The intermediary file is in turn used by a Weaver engine to prepare the Print output. (For more information see: Connect: a peek under the hood).
Settings for these engines are made in the Connect Server Configuration tool (see Server Configuration Settings).

Connect allows for the parallelization of jobs. This means you can allocate 1 or more engines to process jobs.

The number of each type of engine is configurable, as well as the number of Merge engines that can work together on the same job (determined by job size: small, medium or large) and at what maximum speed.

The Parallel Processing preferences allow you to control precisely how the PlanetPress Connect Server handles jobs.

This gives you, as solution developer or application manager, full control of how to apply a machine's power. For example, you can share the available resources to process multiple jobs at once, allocate all resources to process one single job as fast as possible, or anything in between.

Connect distinguishes 5 types of jobs:

  1. Small Print
  2. Medium Print
  3. Large Print
  4. Email
  5. Web

Connect categorizes print jobs by the number of pages they will produce. What constitutes a Small, Medium or Large job can be configured per server (see below, in Allocating processing power to jobs).
There is no distinction between small, medium and large jobs for Email and Web output.

This topic explains all of these settings and the principles behind them, and it provides guidelines for letting the Server manage the workload in such a way as to achieve the highest possible output speeds.

Factors to take into account are:

  • Your licence, which imposes a speed quota (see Speed quota: Pages Per Minute).
  • The processing power of your machine. How many cores it has determines how many engines can be launched (see Launching multiple engines).
  • The size and number of jobs of one kind that need to be handled, sequentially or simultaneously. In other words, your use case. By allocating processing power to jobs of different sizes you can make the setup match your usage situation (see Allocating processing power to jobs).

Other ways to enhance performance are described in another topic: Performance considerations.

Speed quota: Pages Per Minute

The highest possible output speed depends first and foremost upon your licence.
In PlanetPress Connect, with no Performance Pack, one engine can generate output at 500 PPM (Pages Per Minute, where printed pages, emails and web pages each count as a 'page'). Additional Performance Packs increase this quota.
The number of engines that are allowed to operate in parallel to create the same type of output is referred to as the Licensed task limit.
PlanetPress Connect provides 6 Licensed tasks. Additional Performance Packs will increase this number.
So, with no Performance Pack, up to 6 engines can create the same type of output in parallel, which means the total maximum output speed is 3,000 PPM per output type (Print, Email, and Web).
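
The arithmetic behind this figure can be sketched in a few lines of Python. The constant and function names are illustrative only and are not Connect settings; the numbers are the ones quoted above for a licence without a Performance Pack.

```python
# Figures taken from the text; the names are illustrative, not Connect settings.
PPM_PER_ENGINE = 500   # output speed of one engine without a Performance Pack
LICENSED_TASKS = 6     # engines allowed to create the same output type in parallel


def max_output_ppm(licensed_tasks: int, ppm_per_engine: int) -> int:
    """Upper bound on output speed for one output type (Print, Email or Web)."""
    return licensed_tasks * ppm_per_engine


print(max_output_ppm(LICENSED_TASKS, PPM_PER_ENGINE))  # 3000
```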

One engine needs at least one free speed unit to be able to create output.

It is important to note that only output operations are limited by this quota.

  • Weaver engines always require a Licensed task to run.
  • Merge engines only require a Licensed task when creating Email or Web output. Merge engines involved in a Print process don't need a Licensed task in order to run.
  • DataMapper engines don't need Licensed tasks.

In situations where Print and Email and/or Web output are created at the same time, only the Merge engines that create Email/Web output count towards the maximum number of Licensed tasks for that type of output.

Spare speed units are distributed proportionally

Since the number of engines is configurable, and jobs may run concurrently, the number of engines in use may not match the exact number of available Licensed tasks.
When there are more Licensed tasks than there are engines in use, the Connect server distributes the speed units and the maximum 'pages' per minute to all running jobs in proportion to the number of engines they are using.
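
As a rough illustration of a proportional split, the sketch below divides a PPM quota over running jobs according to the number of engines each job uses. It is an assumption about the general principle only: the function name and job names are hypothetical, and how the Server actually rounds or assigns whole speed units is not described here.

```python
def share_quota(total_ppm: float, engines_per_job: dict[str, int]) -> dict[str, float]:
    """Split a PPM quota over running jobs in proportion to their engine counts."""
    total_engines = sum(engines_per_job.values())
    if total_engines == 0:
        # No engines in use, so nothing to distribute.
        return {job: 0.0 for job in engines_per_job}
    return {
        job: total_ppm * engines / total_engines
        for job, engines in engines_per_job.items()
    }


# Two running jobs use 3 engines in total, while the quota is 3,000 PPM.
shares = share_quota(3000, {"job_a": 2, "job_b": 1})
print(shares)  # {'job_a': 2000.0, 'job_b': 1000.0}
```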

Output speed is the speed at which the output is created by the engine in question. Data mapping and other steps in a production process are not taken into account. The throughput speed is the speed of the entire production process. This will always be lower than the output speed.

Launching multiple engines

One single engine can only process a single job at a time and will run mostly single-threaded.
In order to benefit from multi-core systems it is recommended that several engines run in parallel.

As a rule of thumb, you will want to run one less engine in total on a machine than the system has cores, leaving one CPU core free for the Connect Server and the operating system to use.

Modern hardware typically has both full cores and hyper-threading or logical cores. The logical cores should not be counted as full cores when determining how many engines to use. As a guide, count each logical core as only 25%-50% of a full core.
For example: on an Intel i7 CPU that comes with 4 cores and 4 additional hyper-threading cores, Windows Task Manager will show 4 cores and 8 logical processors on its performance tab. On a CPU like this, 5 or 6 engines can be configured to run in parallel.
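
The weighting in this example can be written out as a small calculation. This is a minimal sketch with a hypothetical function name; it only reproduces the range of 5 to 6 given for the i7 example and does not itself decide whether to keep a core free for the Connect Server and the operating system.

```python
def effective_cores(full_cores: int, logical_extra: int, weight: float) -> float:
    """Count each additional logical (hyper-threading) core as a fraction of a full core."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return full_cores + logical_extra * weight


# Intel i7 example from the text: 4 full cores plus 4 hyper-threading cores.
low = effective_cores(4, 4, 0.25)
high = effective_cores(4, 4, 0.50)
print(low, high)  # 5.0 6.0
```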

To configure the number of engines:

  1. Open the Connect Server Configuration utility tool (see Server Configuration Settings).
  2. Under Parallel Processing, go to the Content Creation tab and set the number of Merge engines for the various tasks.
  3. Go to the Output Creation tab and set the Reserved Weaver (Output) engines.
    See Deciding how many engines of each type to launch.
  4. Click Apply or Apply and Close.

It is advised that you do not configure more engines than can be backed by actual processing power. Doing so adds overhead without adding processing power.

Deciding how many engines of each type to launch

When jobs run in parallel, different types of engines may run at the same time. Which type of engine has the biggest impact on performance depends on the usage situation.

The more operations of a kind have to run simultaneously with smaller operations, and the larger those operations are, the sooner you will see a performance increase when using multiple engines.

Merge engine

Generally, launching a relatively high number of Merge engines results in better performance, as Merge engines are involved in the creation of output of all kinds (Print, Email and Web) and because content creation is relatively time-consuming.

DataMapper engine

Adding DataMapper engines might be useful in the following circumstances:

  • When large data mapping operations have to be run simultaneously for many jobs.
  • When frequently using PDF or XML based input, particularly XML input with large individual records.
  • When the All In One plugin is used often in Workflow configurations and there are more than two Merge engines running.

The Connect MariaDB database needs a fast storage system (SSD or other fast devices) to be able to keep up with two or more DataMapper engines.
When the database is installed on a system with a slow hard drive, adding a DataMapper engine may not increase the overall performance.

Weaver engine

Adding extra Weaver (Output) engine(s) might be useful when large Print jobs are to be run simultaneously with smaller Print jobs.

Memory per engine

By default, each engine is set to use up to a predetermined amount of RAM. To make optimum use of a machine's capabilities it might be useful to increase the amount of memory that the various engines can use.

  • DataMapper engines may perform better with greater memory when running jobs containing a lot of data.
  • For complex templates with a lot of pages per document, there is a chance that Merge engines will run better with more memory.
  • The maximum memory usage of a Weaver engine can be relevant for jobs with heavy graphics; or for jobs that use Cut & Stack impositioning; or for jobs using particular variables that entail page buffering (see Content variables).

The Maximum memory per engine setting is found in the Engines preferences.

These settings only control the maximum size of the Java heap memory that an engine can use; the total amount of memory that will be used by an engine is actually a bit higher.

Also keep in mind that the Connect Server and the operating system itself will need memory to keep running.
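
A simple way to check whether a planned configuration fits in the machine's RAM is to add up the heap limits, allow for some non-heap overhead per engine, and keep a reserve for the Connect Server and the operating system. The sketch below does this under stated assumptions: the overhead fraction, the reserve and all engine counts and heap sizes are placeholders chosen for illustration, not Connect defaults or measured values.

```python
def memory_budget_mb(engines: dict[str, tuple[int, int]],
                     overhead_fraction: float,
                     reserved_mb: int) -> float:
    """Estimate RAM needed: per-engine heap plus overhead, plus a reserve.

    engines maps an engine type to (count, max_heap_mb).
    overhead_fraction is an assumed allowance for memory used outside the Java heap.
    reserved_mb is an assumed reserve for the Connect Server and the operating system.
    """
    heap_total = sum(count * heap_mb for count, heap_mb in engines.values())
    return heap_total * (1.0 + overhead_fraction) + reserved_mb


# Hypothetical setup: every number below is a placeholder.
setup = {"datamapper": (1, 1024), "merge": (4, 1024), "weaver": (1, 2048)}
needed = memory_budget_mb(setup, overhead_fraction=0.25, reserved_mb=4096)
print(needed)  # 13056.0
```

Comparing the result with the installed RAM gives a first indication of whether the chosen heap limits are realistic for the number of engines.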

Allocating processing power to jobs

Which engine configuration is most efficient in your case depends on how Connect is used. What kind of output is needed: Print, Email, and/or Web? How often? How big are those jobs? Do they have to be handled at the same time or in sequence? Would it be useful to give priority to small, medium or large jobs, and/or to jobs of a certain kind?

Depending on the answers to these questions, you can allocate processing power to jobs in order to run them as fast as possible, and/or in the order of your preference.

The first step in this process is to define the size of small, medium and large jobs.

Job size

Connect lets you define job sizes by setting the maximum number of pages a job can have and still be considered small, and the minimum number of pages a job must have to be considered large. Jobs that fall between the two are medium jobs.
Defining small, medium and large jobs is important, as you can assign additional resources to jobs that are considered either medium or large, via the settings for Merge and Weaver engines.

There is no recommendation regarding what number of pages constitutes a small, medium or large job. Job size is a relative concept: in a small service company a job may be considered large when it outputs 1,000 pages, whereas that same job in a large insurance company might be seen as small. This setting needs to be based on an assessment of the actual (or expected) workload of Connect.

To set the job sizes:

  1. Open the Connect Server Configuration utility tool (see Server Configuration Settings).
  2. Under Parallel Processing, go to the Output Creation tab and enter the maximum number of pages in a small job.
  3. Enter the minimum number of pages in a large job.
  4. Click Apply or Apply and Close.

Medium jobs will be those that fall between the maximum pages of a small job, and the minimum pages of a large job.
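
The classification that follows from these two settings can be sketched as below. The function name and the threshold values in the usage lines are hypothetical; only the rule itself (at most the small maximum is small, at least the large minimum is large, everything in between is medium) comes from the description above.

```python
def classify_print_job(pages: int, small_max: int, large_min: int) -> str:
    """Classify a Print job by page count using the two configured thresholds."""
    if small_max >= large_min:
        raise ValueError("small_max must be lower than large_min")
    if pages <= small_max:
        return "small"
    if pages >= large_min:
        return "large"
    return "medium"


# Hypothetical thresholds: up to 100 pages is small, 1,000 pages or more is large.
for pages in (50, 100, 500, 1000):
    print(pages, classify_print_job(pages, small_max=100, large_min=1000))
```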

The number of engines used for small, medium and large jobs is configurable (see below).

Running a job as fast as possible

    Number of parallel engines per Print job

    Two or more engines of a kind can be combined to work on the same Print job. Generally jobs will run faster with more than one engine, because sharing the workload saves time.
    However, running one job with multiple engines reduces the number of jobs that can be handled at the same time by that kind of engine, because there are only so many engines (and speed units) available.
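
The trade-off can be made concrete with a small calculation. This is a sketch with a hypothetical function name and an example pool of 6 engines of one kind; it ignores speed units and reserved engines, which also limit how many jobs can run.

```python
def max_concurrent_jobs(available_engines: int, engines_per_job: int) -> int:
    """Number of jobs that can run at once when each job claims a fixed number of engines."""
    if engines_per_job < 1:
        raise ValueError("a job needs at least one engine")
    return available_engines // engines_per_job


# With 6 engines of one kind, more engines per job means fewer simultaneous jobs.
for per_job in (1, 2, 3):
    print(per_job, max_concurrent_jobs(6, per_job))
# 1 6
# 2 3
# 3 2
```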

    When each individual record in a job is composed of a very large number of pages, the Memory per engine setting and the machine's hard drive speed are probably more important than the number of Merge engines, since one record cannot be split over multiple cores (see Memory per engine).

    Target speed per Print job

    If a Print job of a specific size has more than one parallel speed unit assigned to it, that multiplies its speed. However, it also reduces the number of Print jobs that can be run simultaneously.

    When no other Print output operations run at the same time, a single job will use all available speed, or the maximum target speed reserved for jobs of that size (see Dividing processing power over jobs).

    To set a number of speed units per Print job:

    1. Open the Connect Server Configuration utility tool (see Server Configuration Settings).
    2. Under Parallel Processing, go to the Output Creation tab.
    3. Set the Target speed when running simultaneous jobs for small, medium and large Print jobs.
    4. Click OK or Apply.

    Dividing processing power over jobs

    There are a number of ways in which you can divide processing power over output operations of a certain kind and/or size.

    • By reserving engines for jobs of a certain kind (and size, in the case of a Print job). Note that reserved engines cannot be used by any other type of job. This means there will be fewer engines to handle other jobs. Consequently, the other jobs may take more time and may have to wait (or wait longer). However, if the server receives many web requests then having engines reserved for HTML output can help performance.
    • By reserving a number of parallel engines for Print jobs of a certain size (see Number of parallel engines per Print job). More parallel engines will make them run faster, but they will have to wait (longer) if the required number of engines isn't available when they come in.
    • By specifying target speeds for simultaneous Print jobs of a certain size.

    All of these engine configuration settings are found in the Parallel Processing Preferences:

    1. Open the Connect Server Configuration utility tool (see Server Configuration Settings).
    2. Under Parallel Processing, review the settings on both the Content Creation and Output Creation tabs.

    How the Server decides if a job can be handled

    In summary, this is how jobs are handled when they can run in parallel.

    • Whenever a job comes in, the number of engines to use is determined. (For Print jobs, this is based on whether the operation is small, medium or large; see Job size.)
    • If there are enough reserved Merge engines for that type of job available then those engines will be used.
    • If there are not enough reserved Merge engines available, then any unreserved Merge engine that is available will be used.
    • If no, or not enough, Merge engines are available then the job will have to wait until the required number of appropriate Merge engines becomes available.

    The following limitations apply at all times:

    • The maximum number of concurrent Merge engines working on jobs of the same kind or size may not be exceeded.
    • If no - or not enough - speed units are available for that type of output, the job must wait.
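
The rules and limitations above can be summarized in a short decision function. This is a simplified model, not the Server's actual scheduler: the Pool class, the function name and the returned labels are hypothetical, and the sketch assumes that reserved and unreserved engines may be combined when the reserved ones alone are not enough, which the description above does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """Free resources at the moment a job comes in (hypothetical model)."""
    reserved_free: int      # free Merge engines reserved for this type of job
    unreserved_free: int    # free Merge engines not reserved for any type
    speed_units_free: int   # free speed units for this type of output


def decide(pool: Pool, engines_needed: int, speed_units_needed: int,
           engines_in_use_for_kind: int, max_engines_for_kind: int) -> str:
    """Return which engines a job would use, or 'wait' if it cannot start yet."""
    # Limitation: the cap on concurrent Merge engines for this kind or size of job.
    if engines_in_use_for_kind + engines_needed > max_engines_for_kind:
        return "wait"
    # Limitation: enough speed units must be free for this type of output.
    if pool.speed_units_free < speed_units_needed:
        return "wait"
    # Prefer reserved engines when there are enough of them.
    if pool.reserved_free >= engines_needed:
        return "reserved"
    # Assumption: top up with unreserved engines when reserved ones fall short.
    if pool.reserved_free + pool.unreserved_free >= engines_needed:
        return "reserved+unreserved"
    return "wait"


pool = Pool(reserved_free=2, unreserved_free=3, speed_units_free=4)
print(decide(pool, 2, 1, 0, 4))  # reserved
print(decide(pool, 4, 1, 0, 4))  # reserved+unreserved
print(decide(pool, 6, 1, 0, 8))  # wait
```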

    Examples

    Here are a few examples of use cases and settings that would be appropriate in such cases.

    Batch processing. In a batch processing situation, jobs don't have to be handled simultaneously. All jobs - whether they are big or small - are processed one after another. Every job should be handled as quickly as possible. It is therefore recommended to assign the maximum number of engines and target speeds to all jobs. Do not reserve engines for certain jobs.

    Web requests. In online communication, response times are critical. If the Server receives a lot of Web requests, it should handle as many as possible, as quickly as possible, at the same time. It is recommended to launch as many Merge engines as possible and to reserve most of them for HTML output. The jobs will generally be small and can do with just one Merge engine.

    Mixed jobs that are processed in parallel. In a situation where small, medium and large jobs can come in at any time and should be handled in parallel, the challenge is to find a balance between how much power can be allocated to jobs (to minimize the time they take) and how long they can wait. No single job should require all of the processing power, unless it is acceptable for it to have to wait until the maximum number of engines finally becomes available - and then all other jobs will have to wait.