findRegExp()

Finds the first occurrence of a string that matches the given regular expression pattern, starting from the current position.

findRegExp(regexpToFind, flags, leftConstraint, rightConstraint): rectValueText

Finds the first match for a given regular expression pattern, starting from the current position. Regular expression flags (i, s, L, m, u, U, d) are specified in the flags parameter. The search can be constrained to a vertical column of characters located between the left and right constraints, each expressed in characters (in a text file) or millimeters (in a PDF file).
Partial matches are not allowed: the entire match for the regular expression pattern must lie between the two constraints.
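
The "no partial matches" rule can be pictured with a small stand-alone sketch for a single line of a text file. This is only a mental model, not the DataMapper implementation: the helper name firstMatchWithin is hypothetical, and it assumes that columns are counted from 1 and that both constraints are inclusive, which the description above does not state.

```javascript
// Hypothetical model of the column constraints for one line of a text file.
// Assumptions (not confirmed by the documentation): columns are 1-based
// and both constraints are inclusive.
function firstMatchWithin(line, pattern, leftConstraint, rightConstraint) {
  // Only the characters between the constraints are visible to the search.
  const windowText = line.slice(leftConstraint - 1, rightConstraint);
  // Rebuild the pattern without the g/y flags so exec() always starts at 0.
  const re = new RegExp(pattern.source, pattern.flags.replace(/[gy]/g, ""));
  const m = re.exec(windowText);
  if (m === null) {
    return null; // no match lies entirely between the constraints
  }
  const left = leftConstraint + m.index; // column of the first matched character
  return { text: m[0], left: left, right: left + m[0].length - 1 };
}

const sampleLine = "Ref 123-ABC    456-DEF";
const pattern = /\d{3}-[A-Z]{3}/;

// "123-ABC" occupies columns 5 to 11, so it fits between 1 and 11.
console.log(firstMatchWithin(sampleLine, pattern, 1, 11));
// With a right constraint of 8 the match would be cut off, so nothing is returned.
console.log(firstMatchWithin(sampleLine, pattern, 1, 8));
// A window further to the right finds the second code instead.
console.log(firstMatchWithin(sampleLine, pattern, 12, 30));
```

In this model, narrowing the window so that it cuts through a candidate yields null rather than a shortened match.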

The method returns null if the regular expression produces no match. Otherwise it returns a RectValueText object containing the Left, Top, Right and Bottom coordinates of the smallest possible rectangle that completely encloses the first match for the regular expression. The coordinates are expressed in characters (in a text file) or millimeters (in a PDF file), relative to the upper left corner of the current page.

Calling this method does not move the current position to the location where the match occurred. This allows you to use the method as a look-ahead function without disrupting the rest of the data mapping workflow.
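
A look-ahead check of this kind might be wrapped as follows. This is a minimal sketch: the helper name hasTotalAhead, the "Total" label and the 1 to 80 column range are illustrative assumptions, and the helper only tests the result for null without relying on any property of the returned object.

```javascript
// Hypothetical helper: reports whether a "Total:" label can be found from the
// current position. The pattern and the column range 1..80 are illustrative only.
function hasTotalAhead(data) {
  // The pattern is passed as a string, so the backslash is doubled.
  const hit = data.findRegExp("Total\\s*:", "i", 1, 80);
  // null means no match; anything else is the rectangle of the first match.
  return hit !== null;
}
```

Inside a DataMapper script this would be called as hasTotalAhead(data). Because findRegExp() leaves the current position untouched, later steps continue from where they were.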

This function evaluates the regular expression pattern against each line individually. Multiline regular expression patterns, i.e. patterns that contain line end tokens such as \n, are not supported.
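
The effect of line-by-line evaluation can be reproduced in plain JavaScript. The sketch below is a simplified model, not the DataMapper code: firstMatchingLine is a hypothetical helper that splits a block of text into lines and tests the pattern against each line separately, which is why a pattern containing \n can never match.

```javascript
// Hypothetical model: test a pattern against each line separately.
// Expects a pattern without the g or y flag.
function firstMatchingLine(text, pattern) {
  const lines = text.split(/\r?\n/);
  for (let i = 0; i < lines.length; i++) {
    const m = pattern.exec(lines[i]);
    if (m !== null) {
      return { lineIndex: i, text: m[0] };
    }
  }
  return null; // no single line contains a match
}

const page = "Invoice\nNumber 42";

// The pattern spans two lines, so no individual line can satisfy it.
console.log(firstMatchingLine(page, /Invoice\nNumber/)); // null
// A pattern that fits on one line is found on the second line.
console.log(firstMatchingLine(page, /Number \d+/).text); // Number 42
```

To locate content that spans several lines, one option is to search for each line's pattern separately.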

regexpToFind

Regular expression pattern to find.

flags

i: Enables case-insensitive matching. By default, case-insensitive matching assumes that only characters in the US-ASCII charset are being matched. Unicode-aware case-insensitive matching can be enabled by specifying the UNICODE_CASE flag (u) in conjunction with this flag.

s: Enables dotall mode. In dotall mode, the expression . matches any character, including a line terminator. By default this expression does not match line terminators.

L: Enables literal parsing of the pattern. When this flag is specified, the input string that specifies the pattern is treated as a sequence of literal characters. Metacharacters or escape sequences in the input sequence are given no special meaning. The CASE_INSENSITIVE (i) and UNICODE_CASE (u) flags retain their impact on matching when used in conjunction with this flag. The other flags become superfluous.

m: Enables multiline mode. In multiline mode, the expressions ^ and $ match just after or just before, respectively, a line terminator or the end of the input sequence. By default, these expressions only match at the beginning and the end of the entire input sequence.

u: Enables Unicode-aware case folding. When this flag is specified, then case-insensitive matching, when enabled by the CASE_INSENSITIVE flag (i), is done in a manner consistent with the Unicode Standard. By default, case-insensitive matching assumes that only characters in the US-ASCII charset are being matched.

U: Enables the Unicode version of Predefined character classes and POSIX character classes. When this flag is specified, the (US-ASCII only) Predefined character classes and POSIX character classes are in conformance with Unicode Technical Standard #18: Unicode Regular Expressions, Annex C: Compatibility Properties.

d: Enables Unix lines mode. In this mode, only the '\n' line terminator is recognized in the behavior of ., ^, and $.

leftConstraint

Number indicating the left limit from which the search is performed. This is expressed in characters for a text file, or in millimeters for a PDF file.

rightConstraint

Number indicating the right limit to which the search is performed. This is expressed in characters for a text file, or in millimeters for a PDF file.

Examples

data.findRegExp(/\d{3}-[A-Z]{3}/,"gi",50,100);

data.findRegExp("\\d{3}-[A-Z]{3}","gi",50,100);

Both expressions would match the following strings: 001-ABC, 678-xYz.

Note how in the second version, where the regular expression is specified as a string, each backslash has to be escaped with an additional backslash. This is standard in JavaScript, where the backslash is the escape character in string literals.
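
The escaping rule can be checked in any JavaScript engine. The sketch below uses the standard RegExp constructor only to compare the two notations. It says nothing about how findRegExp() interprets its flags, and the variable names are illustrative.

```javascript
// The same pattern written as a regular expression literal and as a string.
const fromLiteral = /\d{3}-[A-Z]{3}/i;
const fromString = new RegExp("\\d{3}-[A-Z]{3}", "i");

// Both notations produce the same pattern text.
console.log(fromLiteral.source === fromString.source); // true
console.log(fromString.test("678-xYz")); // true

// Without the extra backslash, the string literal silently drops it:
// "\d" becomes a plain "d" and the pattern no longer looks for digits.
const unescaped = "\d{3}-[A-Z]{3}";
console.log(unescaped); // d{3}-[A-Z]{3}
console.log(new RegExp(unescaped, "i").test("678-xYz")); // false
```

The last two lines show why the unescaped form fails without any error message: the string is still a valid pattern, just a different one.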