PDF File Settings

Here are the settings for PDF Data Samples.

Delimiters

PDF files have a natural, static delimiter in the form of pages, so the options here are settings for interpreting the text in the PDF file. Each value is a fraction of the average font size of the text in a data selection. For example, ".3" represents 30% of the character height or width.
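To illustrate how these fractions become actual distances, here is a minimal sketch that multiplies each setting by the average character width or height. It is not DataMapper code. The function name delimiterThresholds, its parameters and the sample 6 x 10 character size are hypothetical. The setting values are the defaults listed below.

```javascript
// Hypothetical sketch, not DataMapper code: converting the fractional
// Delimiters settings into absolute distances.
// avgCharWidth and avgCharHeight are the average character width and
// height of the selected text, in the same unit as the PDF coordinates.
function delimiterThresholds(avgCharWidth, avgCharHeight, settings) {
  return {
    // Minimum horizontal blank area that starts a new word.
    wordGap: settings.wordSpacing * avgCharWidth,
    // Minimum vertical distance that starts a new line.
    lineGap: settings.lineSpacing * avgCharHeight,
    // Minimum vertical distance that starts a new paragraph.
    paragraphGap: settings.paragraphSpacing * avgCharHeight
  };
}

// Default values, for text whose average character is 6 units wide
// and 10 units high.
var t = delimiterThresholds(6, 10, {
  wordSpacing: 0.3,
  lineSpacing: 1,
  paragraphSpacing: 1.5
});
// t.wordGap is about 1.8, t.lineGap is 10, t.paragraphGap is 15
```

With these numbers, a blank area of 2 units between two glyphs would start a new word, while a blank area of 1 unit would not.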

  • Word spacing: Determines the spacing between words. Because spacing in PDF text is often produced by positioning the text rather than by actual space characters, the position of the text is what is used to detect new words. This option determines what fraction of the average width of a single character must be empty before a new word is considered to have started. The default value is .3, meaning a space is assumed when a blank area of at least 30% of the width of the average character in the font is found.
  • Line spacing: Determines the spacing between lines of text. The default value is 1, meaning the space between lines must be at least equal to the average character height.
  • Paragraph spacing: Determines the spacing between paragraphs. The default value is 1.5, meaning the space between paragraphs must be at least 1.5 times the average character height to start a new paragraph.
  • Magic number: Determines the tolerance factor applied to all the values above. The tolerance is meant to avoid rounding errors: two values are considered the same when the smaller one is at least 70% of the larger one, and distinct otherwise. For example, if two characters are separated by a space of exactly one average character width, any space between .7 and about 1.43 (1 ÷ .7) times that width is considered equal to one space. A space of 1.44 is considered to be 2 spaces.
  • PDF file color space: Determines whether the PDF is displayed in color or monochrome in the Data Viewer. Monochrome display is faster in the Data Viewer, but this setting has no influence on actual data extraction or on data mapping performance.
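The tolerance rule described under Magic number can be sketched as a simple ratio comparison. This is an illustration of the rule as described on this page, not the DataMapper's internal implementation. The function name sameWithinTolerance is hypothetical, and the tolerance of .7 is the value used in the example.

```javascript
// Hypothetical sketch of the tolerance rule, not DataMapper code.
// Two values count as the same when the smaller one is at least
// `magic` times the larger one.
function sameWithinTolerance(a, b, magic) {
  var smaller = Math.min(a, b);
  var larger = Math.max(a, b);
  return smaller >= larger * magic;
}

var magic = 0.7;
// Gaps are expressed as multiples of the average character width.
var oneSpace = sameWithinTolerance(1.42, 1, magic);  // true: one space
var notOne = sameWithinTolerance(1.44, 1, magic);    // false: not one space
var twoSpaces = sameWithinTolerance(1.44, 2, magic); // true: two spaces
```

Comparing the ratio rather than a fixed difference means the same tolerance scales with the size of the values being compared.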

Boundaries

  • Record Limit: Defines how many Source Records are displayed in the Data Viewer. To disable the limit, use the value "0".
  • Type of rule: Defines the kind of rule that determines when a boundary is created, establishing a new record in the Data Sample (called a Source Record).
    • On page: Defines a boundary on a static number of pages.
      • Number of pages: Defines how many pages are in each Source Record.
    • On Text: Defines a boundary on a specific text comparison in the Source Record.
      • Operator: Selects the type of comparison that is done (for example, "contains").
      • Text Value: The text value to compare with the value in the Source Record.
      • Match case: Makes the text comparison case-sensitive.
      • Use Selected Text: Click to use the value of the current data selection as the Text Value.
      • Start Coordinates (x,y): Defines the left and top coordinates of the data selection to compare with the Text Value.
      • Stop Coordinates (x,y): Defines the right and bottom coordinates.
      • Use Selection: Click to set the start and stop coordinates to the current data selection in the Source Record.
      • Times condition found: Defines how many times the condition must be met before a boundary is created.
      • Pages before/after: Places the boundary a given number of pages before or after the current page. This is useful if the text that triggers the boundary is not located on the first page of each Source Record.
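The two rule types can be sketched as functions that group pages into Source Records. This is a simplified, hypothetical model rather than DataMapper code: each page is represented only by its extracted text, only the "contains" operator is shown, the Start and Stop coordinates, Times condition found and Pages before/after options are left out, and the function names splitOnPage and splitOnText are made up for illustration.

```javascript
// Hypothetical sketch of the two boundary rule types, not DataMapper code.
// Each page is represented only by its extracted text.

// "On page": every `pagesPerRecord` pages form one Source Record.
function splitOnPage(pages, pagesPerRecord) {
  var records = [];
  for (var i = 0; i < pages.length; i += pagesPerRecord) {
    records.push(pages.slice(i, i + pagesPerRecord));
  }
  return records;
}

// "On Text" with the "contains" operator: a page whose text contains
// `value` starts a new Source Record. `matchCase` mirrors Match case.
function splitOnText(pages, value, matchCase) {
  function norm(s) {
    return matchCase ? s : s.toLowerCase();
  }
  var needle = norm(value);
  var records = [];
  for (var i = 0; i < pages.length; i++) {
    // Assumption: the first page always opens a record, even without a match.
    if (i === 0 || norm(pages[i]).indexOf(needle) !== -1) {
      records.push([]);
    }
    records[records.length - 1].push(pages[i]);
  }
  return records;
}

var pages = ["Invoice 1001", "Terms", "INVOICE 1002", "Terms"];
var byPage = splitOnPage(pages, 2);                // 2 records of 2 pages
var byText = splitOnText(pages, "invoice", false); // 2 records of 2 pages
var strict = splitOnText(pages, "invoice", true);  // 1 record of 4 pages
```

The last call shows why Match case matters: with case-sensitive comparison, neither "Invoice" nor "INVOICE" matches "invoice", so no new record is ever started.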

Data Samples

The Data Samples box displays all the data sources that have been imported into the data mapping configuration. Right-clicking in the box brings up a control menu:

  • Add: Used to add a new Data Sample from an external Data Source. The new Data Sample will need to be of the same data type as the current one. For example, you can only add PDF files to a PDF data mapping configuration. In version 1.3 and higher, multiple files can be added simultaneously through the Add dialog.
  • Remove: Removes the current Data Sample from the data mapping configuration.
  • Replace: Opens a different data source and replaces the selected Data Sample with its contents.
  • Reload: Click to reload the currently selected Data Sample and get any changes that have been made to it.
  • Set as Active: Activates the selected Data Sample. The active data sample is shown in the Data Viewer after it has gone through the Preprocessor Step as well as the Delimiter and Boundary settings.

External JS Libraries

External JS libraries included in the Settings Pane can add more JavaScript functionality to your data mapping configuration. Any functions included in the JS library will be available in Preprocessor scripts, as well as Action tasks, Post Functions and JavaScript-based extraction steps.

For example, take the following JavaScript file:

// Returns the sum of the two arguments.
function myAddFunction(p1, p2) {
    return p1 + p2;
}

If this is saved as myFunction.js and imported, then the following would work anywhere in the configuration:

var result = myAddFunction(25, 12); // returns 37!
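Because every function in an imported library becomes available throughout the configuration, two libraries that each define a function with the same name may conflict, depending on how the libraries are loaded. One way to reduce that risk, sketched below as a suggestion rather than a DataMapper requirement, is to group a library's helpers under a single object. The file name myHelpers.js and the functions add and parseAmount are hypothetical.

```javascript
// Hypothetical external library, for example saved as myHelpers.js.
// Grouping helpers under one object keeps them from clashing with
// functions of the same name in other imported libraries.
var myHelpers = {
  // Adds two numbers, like myAddFunction.
  add: function (p1, p2) {
    return p1 + p2;
  },
  // Turns a string such as "1,234.50" into the number 1234.5.
  // Assumes "," is a thousands separator and "." the decimal mark.
  parseAmount: function (text) {
    return parseFloat(text.replace(/,/g, ""));
  }
};

// Anywhere in the configuration, once the library is imported:
var total = myHelpers.add(myHelpers.parseAmount("1,234.50"), 10); // 1244.5
```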

The External JS Libraries box displays all the libraries that have been imported into the data mapping configuration. Right-clicking in the box brings up a control menu, also available through the buttons on the right:

  • Add: Used to add a new external library. Opens the standard Open dialog to browse and open the .js file.
  • Remove: Removes the currently selected library from the data mapping configuration.
  • Replace: Opens a different .js file and replaces the selected library with its contents.
  • Reload: Click to reload the currently selected library and get any changes that have been made to it.
