Extract step properties

The Extract step takes information from the data source and places it in the record set that is the result of the extraction workflow. For more information see Extract step and Extracting data.

Description

This subsection is collapsed by default in the interface, to give more screen space to other important parts.

Name: The name of the step. This name will be displayed on top of the step's icon in the Steps pane.

Comments: The text entered here will be displayed in the tooltip that appears when hovering over the step in the Steps pane.

Extraction Definition

  • Data Table: Defines where the data will be placed in the extracted record. The root table is record, any other table inside the record is a detail table. For more information see Extracting transactional data.
  • Append values to current record: When the Extract step is inside a loop, check this to ensure that the extraction will be done in the same detail table as any previous extractions within the same loop. This ensures that, if multiple extracts are present, only one detail table is created.

Field Definition

The following field definition settings are identical for all fields.

  • Field List: The Field List displays each of the single fields that belong to the selected step in a drop-down. Fields can be re-ordered and re-named within the Order and rename fields dialog (see Order and rename fields dialog). Select one of the fields to make further settings for that field.
  • Add Unique ID to extraction field: Check to add a unique numerical set of characters to the end of the extracted value. This ensures no two values are identical in this field in the record set.
  • Mode: Determines the origin of the data. Fields always belong to an Extract step, but they don't necessarily contain extracted data. See Fields for more information.
    • Location: The contents of the data selection determine the value of the extracted field. The settings for location-based fields are listed separately, per file type:
    • JavaScript : The result of the JavaScript Expression written below the drop-down will be the value of the extracted field. If the expression contains multiple lines, the last value attribution (variable = "value";) will be the value. See DataMapper API.
      • Use JavaScript Editor: Click to display the Script Editor dialog.
      • Use selected text: Inserts the text in the current data selection in the JavaScript Expression. If multiple lines or elements are selected, only the first one is used.
      • Use selection: Click to use the value of the current data selection for the extraction.
        If the selection contains multiple lines, only the first line is selected.
    • Properties: The value of the property selected below will be the value of the selected field.
      • Property: This drop-down lists all the currently defined properties (including system properties).
        Custom properties can be defined in the Preprocessor step; see Preprocessor step.
        For an explanation of the objects to which the properties belong, see DataMapper Scripts API.
      • Choose a property button: Click this button to open a filter dialog that lets you find a property based on the first few letters that you type.
  • Type: The data type of the selected data; see Data types. Make sure that the data format that the DataMapper expects matches the actual format of the data in the data source; see Data Format.

Settings for location-based fields in a Text file

  • Left: Defines the start of the data selection to extract.
  • Right: Defines the end of the data selection to extract.
  • Top offset: The vertical offset from the current pointer location in the Data Viewer).
  • Height: The height of the selection box. When set to 0, this instructs the DataMapper to extract all lines starting from the given position until the end of the record and store them in a single field.
  • Use selection: Click to use the value (Left, Right, Top offset and Height) of the current data selection (in the Data Viewer) for the extraction.
    If the selection contains multiple lines, only the first line is extracted.
  • Post Function: Enter a JavaScript expression to be run after the extraction.

    A Post function script operates directly on the extracted data, and its results replace the extracted data. For example, the Post function script replace("-", ""); would replace the first dash character that occurs inside the extracted string.

    • Use JavaScript Editor: Click to display the Script Editor dialog.
  • Trim: Select to trim empty characters at the beginning or the end of the field.
  • Concatenation string:
  • Split: Separate the selection into individual fields based on the Concatenation string defined above.

Settings for location-based fields in a PDF File

These are the settings for location-based fields in a PDF file.

  • Left: Defines the start of the data selection to extract.
  • Right: Defines the end of the data selection to extract.
  • Top offset: The vertical offset from the current pointer location in the Data Sample (Viewer).
  • Height: The height of the selection box.
  • Use selection: Click to use the value (Left, Right, Top offset and Height) of the current data selection for the extraction.
    If the selection contains multiple lines, the lines are by default joined and extracted into one field. To split the lines, select the option Split lines (see below).
  • Post Function: Enter a JavaScript expression to be run after the extraction. For example replace("-","") would replace a single dash character inside the extracted string.
  • Trim: Select to trim empty characters at the beginning or the end of the field.
  • Type: The data type of the selected data; see Data types. If the selected data is split (see below), this setting is applied to the first extracted field. Make sure that the data format that the DataMapper expects matches the actual format of the data in the data source; see Data Format.
  • Split:
    • Split lines: Separate a multi-line selection into individual fields .
    • Join lines: Join the lines in the selection with the Concatenation string defined below.
    • Concatenation string: The (HTML) string used to concatenate lines when they are joined.

Settings for location-based fields in CSV and Database files

These are the settings for location-based fields in CSV and Database files.

  • Column: Drop-down listing all fields in the Data Sample, of which the value will be used.
  • Top offset: The vertical offset from the current pointer location in the Data Sample (Viewer).
    • Use selection: Click to use the value of the current data selection for the extraction.
      If the selection contains multiple lines, only the first line is selected.
  • Post Function: Enter a JavaScript expression to be run after the extraction. For example replace("-","") would replace a single dash character inside the extracted string.
    • Use JavaScript Editor: Click to display the Script Editor dialog.
  • Trim: Select to trim empty characters at the beginning or the end of the field.

Settings for location-based fields in an XML File

These are the settings for location-based fields in an XML file.

  • XPath: The path to the XML field that is extracted.
    • Use selection: Click to use the value of the current data selection for the extraction.
      If the selection contains multiple lines, only the first line is selected.
  • Post Function: Enter a JavaScript expression to be run after the extraction. For example replace("-","") would replace a single dash character inside the extracted string.
    • Use JavaScript Editor: Click to display the Script Editor dialog.
  • Trim: Select to trim empty characters at the beginning or the end of the field.

Data Format

Format settings can be defined in three places: in the user preferences (DataMapper preferences), the current data mapping configuration (Data format settings) and per field via the Step properties.
Any format settings specified per field are always used, regardless of the user preferences or data source settings.

Data format settings tell the DataMapper how to read and parse data from the data source. They don't determine how these data are formatted in the Data Model or in a template. In the Data Model, data are converted to the native data type. Dates, for example, are converted to a DateTime object. How they are displayed in the Data Model depends on the current operating system's regional settings.

  • Negative Sign Before : A negative sign will be displayed before any negative value.

  • Decimal Separator : Set the decimal separator for a numerical value.

  • Thousand Separator : Set the thousand separator for a numerical value.

  • Currency Sign : Set the currency sign for a currency value.

  • Date Format : Set the date format for a date value.

  • Date Language : Set the date language for a date value (ex: If English is selected, the term May will be identified as the month of May).

  • Treat empty as 0 : A numerical empty value is treated as a 0 value.