Working with Metadata

A set of special Workflow plugins lets you edit Metadata during a Workflow process (see Metadata and Metadata Tasks). This topic describes what you need to know about Metadata in order to use these plugins effectively.

How data and Metadata influence each other

When Metadata is created, it is based on a data file. However, modifying one of the two doesn't automatically change the other, and Metadata isn't reset by default in a Branch, Condition or Loop.

  • Modifying Metadata does not immediately modify the data. This is one of the benefits of Metadata: you can sort it, filter it, sequence it and add data to it without ever modifying the data file itself. Keep this in mind: if you, for instance, filter out certain data pages from the Metadata and then save your data file with the Send to Folder task, the full data file is saved, not the filtered one. However, in some cases Metadata does affect your output directly (see How Metadata affects the output).

  • Modifying data does not immediately modify the Metadata. So, if you have a PDF file with Metadata and you use a PDF splitter, the Metadata still reflects the original data, not the split. This can generally be resolved by running the Create Metadata plugin again.

  • Branches, Conditions and Loops (such as the PDF Splitter) do not reset the Metadata. This is important to know in cases where Metadata does affect your output (see How Metadata affects the output). Not handling the Metadata properly in such cases can cause confusing issues because the Metadata and the Data may become out of sync.
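The first two points above can be pictured with a small toy model. This is only an illustrative sketch in Python, not the Workflow API: the `Job` class and the functions `create_metadata`, `filter_metadata` and `split_data` are hypothetical names, and the Metadata is reduced to one "selected" flag per data page.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """Toy stand-in for a job: a data file plus a separate Metadata record."""
    data_pages: list
    metadata_selected: list = field(default_factory=list)


def create_metadata(job):
    """Rebuild the Metadata from the current data: one selected entry per page."""
    job.metadata_selected = [True] * len(job.data_pages)


def filter_metadata(job, page_index):
    """Unselect one page in the Metadata only; the data is left untouched."""
    job.metadata_selected[page_index] = False


def split_data(job, start, end):
    """Keep only a slice of the data; the Metadata is left untouched."""
    job.data_pages = job.data_pages[start:end]


job = Job(data_pages=["p1", "p2", "p3", "p4"])
create_metadata(job)

# Filtering the Metadata does not change what a plain save of the data contains.
filter_metadata(job, 0)
saved_pages = list(job.data_pages)

# Splitting the data does not update the Metadata: it still describes 4 pages.
split_data(job, 0, 2)
stale_entries = len(job.metadata_selected)

# Creating the Metadata again brings both back in sync.
create_metadata(job)
fresh_entries = len(job.metadata_selected)
```

In this sketch, `saved_pages` still holds all four pages after the filter, and the Metadata keeps four entries after the split until `create_metadata` is called again.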

How tasks influence Metadata

As a general rule, only Input tasks and Metadata related tasks modify Metadata. There are, however, a few notable exceptions:

How Metadata affects the output

By default, the data file is not affected when the Metadata is modified. There are, however, a few situations in which Metadata will or may affect the output.

Output issues caused by Metadata, and how to avoid them

A Branch, Loop (the PDF Splitter, for instance, or the Loop task) and Condition don't reset the Metadata. This can cause confusing issues if they are used in combination with a task that takes the Metadata into account.
To avoid such issues, either regenerate your Metadata inside the (condition) branch or loop as early as possible (see Create Metadata), or use the Metadata File Management to delete the active Metadata file and let the data file be taken into account instead of the Metadata.

Example

Here is an example of an issue that occurs when Metadata is not re-created in a Loop.

In the following process, the Job file is a PDF that contains several invoices. Some (but not all) of those invoices start with a separator page that you don't want to print. Invoices that don't have a separator page should be printed as-is.
The process would look something like this (by default):

  • Step 2 splits the PDF whenever it encounters a new Invoice Number on the Top Right corner of a page. From this point on, the rest of the process applies to each split (i.e. each invoice).

  • Step 3 checks if the first page is a separator (presumably by looking for some kind of keyword on the page).

  • If a separator page was found, step 4 creates Metadata for the split PDF…

  • ...and step 5 filters out the first page (which means the Metadata unselects the first Data Page, in effect "hiding" it from the Print Output task).

  • Step 6 prints the PDF to a printer. When printing a PDF file in passthrough mode, the Metadata is inspected to determine which pages should print or not. In this case, Page 1 is unselected in the Metadata, therefore the printer receives the job starting from Page 2, which is exactly what you want.

  • Step 7 prints the entire PDF since no separator page was found.

Now here comes the issue:

  • The process moves back up to step 2 in order to process the second split of the original PDF. The Metadata file still exists in the process! So far, it doesn't impact the rest of the process… but wait…

  • Let's say in step 3 no separator page is found on page 1 of the second split PDF.

  • Step 7 prints that second split PDF… but page 1 is unselected in the Metadata (because the Metadata was carried over from the previous split!) so at the very least, you will be missing one page. If the second split has more pages than the first one, the extra pages at the end will go missing as well, because the Metadata doesn't know about them. If it has fewer pages than the first one, the last pages will be blank.

To avoid running into the issue, you should use the Create Metadata task to re-create the Metadata immediately after every split, thus ensuring that the process cannot, in either branch of the condition, be using the Metadata from the previous split.
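The example can be replayed with a short simulation. This is a hedged sketch in Python, not Workflow code: `create_metadata`, `print_passthrough` and `run_process` are hypothetical names, the Metadata is reduced to one "selected" flag per page, and the way the print step handles a page count mismatch is an assumption modeled on the symptoms described above.

```python
def create_metadata(pages):
    """Create fresh Metadata: one selected entry per data page."""
    return [True] * len(pages)


def print_passthrough(pages, metadata):
    """Return the pages the printer would receive.

    Assumption: with no Metadata the whole data file prints. With Metadata,
    only selected entries print; an entry without a matching data page
    prints blank, and data pages without an entry are skipped.
    """
    if metadata is None:
        return list(pages)
    printed = []
    for index, selected in enumerate(metadata):
        if not selected:
            continue
        printed.append(pages[index] if index < len(pages) else "<blank>")
    return printed


def run_process(splits, recreate_after_split):
    """Loop over the splits (step 2) and print each one (steps 3 to 7)."""
    metadata = None  # the Metadata lives at process level and survives the loop
    output = []
    for pages in splits:
        if recreate_after_split:
            metadata = create_metadata(pages)  # the recommended fix
        if pages[0] == "SEPARATOR":            # step 3
            metadata = create_metadata(pages)  # step 4
            metadata[0] = False                # step 5: hide the separator
        output.append(print_passthrough(pages, metadata))  # step 6 or 7
    return output


splits = [
    ["SEPARATOR", "inv1-p1", "inv1-p2"],
    ["inv2-p1", "inv2-p2", "inv2-p3", "inv2-p4"],
    ["inv3-p1"],
]

buggy = run_process(splits, recreate_after_split=False)
fixed = run_process(splits, recreate_after_split=True)
```

Without the fix, the second invoice in `buggy` loses its first and last pages and the third invoice prints only blanks, because both reuse the three-entry Metadata of the first split. With the Metadata re-created after every split, `fixed` contains each invoice in full, minus the separator page. Resetting `metadata` to `None` at the start of each iteration would model the other remedy, deleting the active Metadata file.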