|
Boundaries Using JavaScriptAs soon as you select the On Script option as the trigger for establishing record boundaries, you are instructing the DataMapper to read the file sequentially and to trigger an event each and every time it hits a delimiter. In other words, the script that you'll be writing will be executed - by default - as many times as there are delimiters in your input data. If you know, for instance, that your PDF file only contains documents that are 3-pages, your script could keep count of the number of times it's been called since the last boundary was set (i.e. the count of delimiters it encountered) and each time the count is a multiple of 3, it could set a new record boundary. You would basically have recreated the same functionality that is already available through the GUI when setting the trigger to On Page and specifying 3 as the Number of Pages. Remember that your script is being called on each new delimiter encountered by the DataMapper parsing algorithm. If you are dealing with a DB Query that returns a million record, the script will be executed a million times! So you have to craft your script in such a way that it doesn't waste too much time examining all possible conditions. Instead, it should terminate as soon as any condition it's evaluating is false.
Data available inside each eventEvery time the event is triggered, it has access to the entire data between the current location and the next delimiter. So if you are just beginning the process for a PDF or text file, you have access to the first page only. For CSV/DB, it means you have access to the one record line at the current location. What this means?You can:
In fact, make all the comparisons you want as long as it's all located in the data between the current location and the next delimiter. What happens when the record boundaries depend on data found on different pages, within the same record?The API allows your script to "remember", across delimiters, the values that were evaluated in previous pages so you can easily set record boundaries that span over hundreds of delimiters (or more). ExamplesBasic example using a CSV fileIn this first example, don't focus on the actual syntax being used. You can take a look at the API reference later on for more information.
Imagine you are a classic rock fan and you want to extract the data from a CSV listing of all the albums in your collection. Your goal is to extract records that change whenever the artist OR the release year changes. Here's what the CSV looks like: "Artist","Album","Released" "Beatles","Abbey Road",1969 "Beatles","Yellow Submarine",1969 "Led Zeppelin","Led Zeppelin 1",1969 "Led Zeppelin","Led Zeppelin 2",1969 "Beatles","Let it be",1969 "Rolling Stones","Let it bleed",1969 "Led Zeppelin","Led Zeppelin 3",1970 "Led Zeppelin","Led Zeppelin 4",1971 "Rolling Stones","Sticky Fingers",1971 The first line is just the header with the names of the CSV columns. Obviously, the data is already sorted per year, per artist, and per album.
Your goal is to examine two values in each CSV record and to act when either changes. The DataMapper GUI allows you to specify a On Change trigger, but you can only specify a single field. So for instance, if you were to set the record boundary when the "Released" field changes, then you'd get the first four lines together inside a single record, but that's not what you want since that would include albums from several different artists. And if you were to set it when the "Artist" field changes, then the first few records would be OK but near the end, you'd get both the Led Zeppelin 3 and led Zeppelin 4 albums inside the same record, even though they were released on different years. So that's no good either. Essentially, we need to combine both these conditions and set the record boundary when EITHER the year OR the artist changes. Here's what the script would look like:
You can try it yourself. Paste the data into the text editor of your choice and save the file to Albums.csv. Then create a new DataMapper configuration and load this CSV as your data file. In the Data Input Settings, make sure you specify the first row contains field names and set the Trigger to On script. Then paste the above JavaScript code in the Expression field and click the Apply button to see the result. Same basic example using a text fileSo let's say we want to do the exact same thing, but this time around the Data Source is a plain text file that looks like this:
Then our script would look like this:
We're using the exact same code we as used for CSV files, with the exception of parameters expected by the createRegion() method. The API methods adapts to their context and therefore expect different parameters to be passed in order to achieve the same thing. Since a text file does not contain column names as a CSV does, the API expects the text regions to be defined using physical coordinates. In this instance: (Left, Top, Right, Bottom).
You can try this code as well. Paste the data into the text editor of your choice and save the file to Albums.txt. Then create a new DataMapper configuration and load this TXT as your data file. In the Data Input Settings, specify On lines as the Page delimiter type with the number of lines set to 1 so you can process the file line per line (i.e. triggering the event on each line). Then, set the boundary Trigger to On script and paste the above code in the JavaScript expression and click the Apply button to see the result. The PDF context also expects physical coordinates, just like the Text context does, but since PDF pages do not have a grid concept of lines and columns, the above parameters would instead be specified in millimeters relative to the upper left corner of each page. So for instance, to create a region for the Year, the code might look like this:
region.createRegion(190,20,210,25) which would create a region located near the upper right corner of the page. That's the only similarity, though, since the script for a PDF would have to look through the entire page and probably make multiple extractions on each one since it isn't dealing with single lines like the TXT example given here. For more information on the API syntax, please refer to DataMapper API. |
|