Setting boundaries using JavaScript

As soon as you select the On Script option as the trigger for establishing record boundaries (see Record boundaries), you instruct the DataMapper to read the source file sequentially and to trigger an event every time it hits a delimiter. (What counts as a delimiter depends on the source data and the settings for that data; see Input data settings (Delimiters).)

If you know, for instance, that a PDF file only contains documents that are 3 pages long, your script could keep count of the number of times it has been called since the last boundary was set (that is, the count of delimiters that have been encountered). Each time the count is a multiple of 3, it could set a new record boundary. This is basically what happens when setting the trigger to On Page and specifying 3 as the Number of Pages.

Remember that a boundary script is called on each new delimiter encountered by the DataMapper parsing algorithm. If, for instance, a database query returns a million records, the script will execute a million times. Craft your script so that it doesn't waste time examining all possible conditions; it should terminate as soon as any condition it is evaluating is false.
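The page-counting idea described above could be sketched roughly as follows. This is only an illustration under several assumptions: that boundaries.set() is the call that sets a record boundary at the current delimiter, that getVariable() returns null for a variable that has not been stored yet, and that the first record starts implicitly at the top of the file (so no boundary is set on the first call). The names everyThreePages, pagesSeen and simulatePages are hypothetical; simulatePages only stands in for the DataMapper engine so the logic can be tried in plain JavaScript. Check the actual calls against the DataMapper Scripts API.

```javascript
// The body of this function is the boundary script. In DataMapper the
// `boundaries` object is supplied by the engine.
function everyThreePages(boundaries) {
  // Pages seen since the last boundary (assumed: null when nothing is stored yet).
  var seen = boundaries.getVariable("pagesSeen");
  if (seen == null) {
    seen = 0;
  }
  // Three pages have gone by, so the current page starts a new record.
  if (seen === 3) {
    boundaries.set();
    seen = 0;
  }
  // Count the current page and remember the total for the next call.
  boundaries.setVariable("pagesSeen", seen + 1);
}

// Hypothetical stand-in for the DataMapper engine. It calls the script once
// per page and returns the page numbers on which a boundary was set.
function simulatePages(pageCount) {
  var vars = {};
  var page = 0;
  var starts = [];
  var boundaries = {
    getVariable: function (name) { return name in vars ? vars[name] : null; },
    setVariable: function (name, value) { vars[name] = value; },
    set: function () { starts.push(page); }
  };
  for (page = 1; page <= pageCount; page++) {
    everyThreePages(boundaries);
  }
  return starts;
}

console.log(simulatePages(7)); // boundaries are set on pages 4 and 7
```

In DataMapper itself, only the statements inside everyThreePages would be used as the script. Note that the script does a single comparison per call and nothing else, which keeps it cheap even when it is called for every delimiter in a large file.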
Accessing data

Data available inside each event

Every time a delimiter is encountered, an event is triggered and the script is executed. The event gives the script access to the data between the current location (the start of a row, line or page) and the next delimiter. So at the beginning of the process for a PDF or text file, you have access to the first page only, and for a CSV or for tabular data, that would be the first row or record. This means that you can:
To access this data in the script, use the get() function of the boundaries object. This function expects different parameters depending on the type of source file; see get().

Getting access to other data

Data that is not passed with the event, but that is necessary to define the record boundaries, can be stored in the boundaries object using the setVariable function (see boundaries and setVariable()). The data can be retrieved using the boundaries' getVariable function (see getVariable()).

For more information on the syntax, please refer to DataMapper Scripts API.

Examples

Basic example using a CSV file

Imagine you are a classic rock fan and you want to extract the data from a CSV listing of all the albums in your collection. Your goal is to extract records that change whenever the artist OR the release year changes. Here's what the CSV looks like:

"Artist","Album","Released"

The first line is just the header with the names of the CSV columns. The data is already sorted per year, per artist, and per album.
Your goal is to examine two values in each CSV record and to act when either changes. The DataMapper GUI allows you to specify an On Change trigger, but you can only specify a single field. So for instance, if you were to set the record boundary when the "Released" field changes, you'd get the first four lines together inside a single record. That's not what you want, since that would include albums from several different artists. And if you were to set it when the "Artist" field changes, the first few records would be OK, but near the end you'd get both the Led Zeppelin 3 and Led Zeppelin 4 albums inside the same record, even though they were released in different years.

Essentially, we need to combine both these conditions and set the record boundary when EITHER the year OR the artist changes. Here's what the script would look like:
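A minimal sketch of such a script, under stated assumptions: boundaries.get() is assumed to return an array of strings (hence the [0]), getVariable() is assumed to return null for a variable that has not been stored yet, and boundaries.set() is assumed to be the call that sets the record boundary. The script body is wrapped in a function named albumBoundary, and runOnRows is a hypothetical stand-in for the DataMapper engine, fed with made-up sample rows, so that the logic can be run outside DataMapper. Verify the real calls in the DataMapper Scripts API.

```javascript
// The body of this function is the boundary script. In DataMapper the
// `boundaries` and `region` objects are supplied by the engine.
function albumBoundary(boundaries, region) {
  // Read the two columns to compare (assumed: get() returns an array of strings).
  var band = boundaries.get(region.createRegion("Artist"))[0];
  var year = boundaries.get(region.createRegion("Released"))[0];

  // Skip the comparison on the very first row, when nothing is stored yet.
  var lastBand = boundaries.getVariable("lastBand");
  if (lastBand != null &&
      (band != lastBand || year != boundaries.getVariable("lastYear"))) {
    boundaries.set(); // either the artist or the year changed
  }

  // Remember the current values for the next call.
  boundaries.setVariable("lastBand", band);
  boundaries.setVariable("lastYear", year);
}

// Hypothetical stand-in for the DataMapper engine, for trying the logic in
// plain JavaScript. Returns the indexes of the rows that start a new record.
function runOnRows(rows) {
  var vars = {};
  var current = 0;
  var starts = [];
  var region = { createRegion: function (column) { return column; } };
  var boundaries = {
    get: function (column) { return [rows[current][column]]; },
    getVariable: function (name) { return name in vars ? vars[name] : null; },
    setVariable: function (name, value) { vars[name] = value; },
    set: function () { starts.push(current); }
  };
  for (current = 0; current < rows.length; current++) {
    albumBoundary(boundaries, region);
  }
  return starts;
}

// Made-up sample rows, already sorted per year and per artist.
var sampleRows = [
  { Artist: "Band A", Album: "One",   Released: "1969" },
  { Artist: "Band B", Album: "Two",   Released: "1969" },
  { Artist: "Band B", Album: "Three", Released: "1970" },
  { Artist: "Band B", Album: "Four",  Released: "1970" }
];
console.log(runOnRows(sampleRows)); // rows 1 and 2 start new records
```

In DataMapper, only the statements inside albumBoundary would go into the Expression field; the rest exists purely to exercise the logic here.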
You can try it yourself. Paste the data into the text editor of your choice and save the file to Albums.csv. Then create a new DataMapper configuration and load this CSV as your data file. In the Data Input Settings, make sure you specify the first row contains field names and set the Trigger to On script. Then paste the above JavaScript code in the Expression field and click the Apply button to see the result.

Basic example using a text file

This example is similar to the previous example, but now the data source is a plain text file that looks like this:
The purpose of the script, again, is to set the record boundary when EITHER the year OR the artist changes. The script would look like this:
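A minimal sketch of the text-file variant, under stated assumptions: the column positions passed to createRegion() (artist in columns 1 to 30, year in columns 61 to 64, each on line 1) are hypothetical and must be adapted to the actual layout of the file, boundaries.get() is assumed to return an array of strings, getVariable() is assumed to return null for a variable that has not been stored yet, and boundaries.set() is assumed to be the call that sets the record boundary. The functions runOnLines and makeLine are hypothetical helpers that stand in for the DataMapper engine and build made-up fixed-width sample lines.

```javascript
// The body of this function is the boundary script. In DataMapper the
// `boundaries` and `region` objects are supplied by the engine.
function albumBoundaryText(boundaries, region) {
  // createRegion(left, top, right, bottom); these positions are hypothetical.
  var band = boundaries.get(region.createRegion(1, 1, 30, 1))[0];
  var year = boundaries.get(region.createRegion(61, 1, 64, 1))[0];

  // Skip the comparison on the very first line, when nothing is stored yet.
  var lastBand = boundaries.getVariable("lastBand");
  if (lastBand != null &&
      (band != lastBand || year != boundaries.getVariable("lastYear"))) {
    boundaries.set(); // either the artist or the year changed
  }

  // Remember the current values for the next call.
  boundaries.setVariable("lastBand", band);
  boundaries.setVariable("lastYear", year);
}

// Hypothetical stand-in for the DataMapper engine: one line per event,
// with regions read as 1-based column ranges of the current line.
function runOnLines(lines) {
  var vars = {};
  var current = 0;
  var starts = [];
  var region = {
    createRegion: function (left, top, right, bottom) {
      return { left: left, top: top, right: right, bottom: bottom };
    }
  };
  var boundaries = {
    get: function (r) { return [lines[current].substring(r.left - 1, r.right)]; },
    getVariable: function (name) { return name in vars ? vars[name] : null; },
    setVariable: function (name, value) { vars[name] = value; },
    set: function () { starts.push(current); }
  };
  for (current = 0; current < lines.length; current++) {
    albumBoundaryText(boundaries, region);
  }
  return starts;
}

// Builds a made-up fixed-width line: 30 columns artist, 30 columns album, then the year.
function makeLine(artist, album, year) {
  return artist.padEnd(30) + album.padEnd(30) + year;
}

var sampleLines = [
  makeLine("Band A", "One", "1969"),
  makeLine("Band A", "Two", "1969"),
  makeLine("Band B", "Three", "1969"),
  makeLine("Band B", "Four", "1970")
];
console.log(runOnLines(sampleLines)); // lines 2 and 3 start new records
```

As with the CSV sketch, only the statements inside albumBoundaryText would be used in DataMapper.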
This script uses the exact same code as used for CSV files, with the exception of the parameters expected by the createRegion() method. The get() method adapts to the context (the data source file) and therefore expects different parameters to be passed in order to achieve the same thing. Since a text file does not contain column names as a CSV does, the API expects the text regions to be defined using physical coordinates. In this instance: Left, Top, Right, Bottom.

To try this code, paste the data into a text editor and save the file to Albums.txt. Then create a new DataMapper configuration and load this text file as your data file. In the Data Input Settings, specify On lines as the Page delimiter type with the number of lines set to 1. When you now set the boundary Trigger to On script, the file will be processed line by line (triggering the event on each line). Paste the above code in the JavaScript expression field and click the Apply button to see the result.

The PDF context also expects physical coordinates, just like the Text context does, but since PDF pages do not have a grid concept of lines and columns, the above parameters would instead be specified in millimeters relative to the upper left corner of each page. So for instance, to create a region for the Year, the code might look like this:
region.createRegion(190,20,210,25)

This would create a region located near the upper right corner of the page. That's the only similarity, though, since the script for a PDF would have to look through the entire page and probably make multiple extractions on each one, since it isn't dealing with single lines like the TXT example given here.

For more information on the API syntax, please refer to DataMapper Scripts API.