The latest version of PlanetPress Suite (7.5) gives PlanetPress Capture the ability to perform Intelligent Character Recognition (ICR). However, this technology comes with certain limitations. Successfully integrating ICR within a business requires all parties involved to apply best practices: the form designer, the workflow designer and the user.
This document lists recommended best practices. Each guideline aims to maximize the likelihood that characters are recognized and to minimize the risk of errors caused by an incorrect analysis.
You will find the following information, when applicable, for each best practice:
This section lists the best practices to implement, in no particular order of importance. Check the target audience of each practice to see whether it applies to you.
The following guidelines are applicable when configuring a PlanetPress Capture object that utilizes ICR:
If the collected data is expected to be a number, select the numeric mask type.
If the collected data is expected to consist of letters, select the alphabet mask type, along with the appropriate Case option:
If upper case letters are expected, select Upper case in the Case option menu. The captured characters are immediately converted to capital letters, i.e. the ICR engine recognizes a lower case a but displays it in upper case.
If lower case letters are expected, select Lower case in the Case option menu. As with upper case letters, the captured characters are converted to lower case and displayed as such.
If proper names or nouns are expected (i.e. only the first letter must be a capital letter), select Capitalization in the Case option menu. Only the first letter is converted to a capital letter.
If no specific format is expected, select None in the Case option menu. The letters are interpreted as written and no conversion is done, i.e. characters in lower case are displayed as such.
If the collected data is expected to be a combination of numbers and letters, select the alphanumeric mask type.
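The effect of the four Case options can be illustrated with a short Python sketch. This is not PlanetPress code: the function name apply_case_option and the option strings are hypothetical, and the sketch only mirrors the conversions described in this list.

```python
def apply_case_option(text: str, option: str) -> str:
    """Mimic the Case options described above (hypothetical helper, not a PlanetPress API)."""
    if option == "upper":
        # Every recognized letter is displayed as a capital letter.
        return text.upper()
    if option == "lower":
        # Every recognized letter is displayed in lower case.
        return text.lower()
    if option == "capitalization":
        # Only the first letter is converted; the rest stays as written.
        return text[:1].upper() + text[1:]
    if option == "none":
        # No conversion: the text is kept exactly as recognized.
        return text
    raise ValueError(f"unknown case option: {option!r}")


if __name__ == "__main__":
    for option in ("upper", "lower", "capitalization", "none"):
        print(option, "->", apply_case_option("smith", option))
```

In this sketch the Capitalization branch deliberately leaves the remaining letters untouched, since the guideline only states that the first letter is converted.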
Why: Reducing the number of expected characters increases the probability that the correct one is matched. For example, it prevents the letter l (a lowercase L) from being recognized as the digit 1 (one), and vice versa. If the mask type is alphanumeric, there is also a possibility that the letter a is recognized as a 2, because Capture also interprets how the stroke was traced.
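The reasoning can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the candidate scores are invented, the MASKS table only covers unaccented characters, and best_match is a hypothetical helper, not the way the ICR engine actually works internally.

```python
import string
from typing import Mapping

# Character sets allowed by each mask type (simplified: no accented letters).
MASKS = {
    "numeric": set(string.digits),
    "alphabet": set(string.ascii_letters),
    "alphanumeric": set(string.digits + string.ascii_letters),
}


def best_match(scores: Mapping[str, float], mask: str) -> str:
    """Return the highest-scoring candidate that the mask allows."""
    allowed = MASKS[mask]
    candidates = {ch: score for ch, score in scores.items() if ch in allowed}
    if not candidates:
        raise ValueError("no candidate is allowed by this mask")
    return max(candidates, key=candidates.get)


# Invented scores for a single vertical stroke that looks like both "1" and "l".
stroke_scores = {"1": 0.48, "l": 0.46, "I": 0.04}

if __name__ == "__main__":
    for mask in ("numeric", "alphabet", "alphanumeric"):
        print(mask, "->", best_match(stroke_scores, mask))
```

With the alphanumeric mask the two near-identical scores compete directly, so a small variation in the stroke can flip the result. With the numeric or alphabet mask, only one of the two candidates is ever considered.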
How: In the Capture options tab, use the Mask Type and Case option settings to filter the expected data.
The following diagram illustrates the available mask types. It is recommended to select the mask type that is the closest to the desired result. An alphanumeric field should be used as a last resort.
Why: To prevent ink marks from spilling over from one field to another. If fields A and B are too close together and the ink marks from field A spill over into field B, the marks captured in field B are treated as part of a character written in field B. For example, if a number such as 9, 1 or 7 spills over and is written across two fields, its bottom tip could be read as the number 1 in the second field. (Refer to the example below)
How: Make sure there is enough space between fields, and redesign the document if there is not. No minimum distance between two fields is required, apart from the 7 mm border that the Anoto digital pen needs in order to recognize the pattern being used.
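A form designer who keeps field coordinates in a list could check spacing with a sketch like the following. It is only an illustration: the Field class, the millimetre coordinates and the min_gap threshold are all assumptions. Since no minimum distance is prescribed, the threshold is whatever value the designer judges sufficient for the expected handwriting.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple


@dataclass(frozen=True)
class Field:
    """A rectangular ICR field; position and size in millimetres (hypothetical model)."""
    name: str
    x: float
    y: float
    width: float
    height: float


def gap_mm(a: Field, b: Field) -> float:
    """Shortest distance between two rectangles; 0.0 if they touch or overlap."""
    dx = max(b.x - (a.x + a.width), a.x - (b.x + b.width), 0.0)
    dy = max(b.y - (a.y + a.height), a.y - (b.y + b.height), 0.0)
    return (dx ** 2 + dy ** 2) ** 0.5


def crowded_pairs(fields: List[Field], min_gap: float) -> List[Tuple[str, str]]:
    """List the pairs of fields that are closer than the designer's chosen gap."""
    return [
        (a.name, b.name)
        for a, b in combinations(fields, 2)
        if gap_mm(a, b) < min_gap
    ]


if __name__ == "__main__":
    layout = [
        Field("A", 0, 0, 10, 10),
        Field("B", 11, 0, 10, 10),
        Field("C", 30, 0, 10, 10),
    ]
    print(crowded_pairs(layout, min_gap=3.0))
```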
WARNING! You must write on a flat and smooth surface, e.g. a delivery person should use a clipboard.
Why: Some characters are easily confused. A 7 can be interpreted as a 1 and vice versa. A letter i whose dot is drawn as a circle can also cause a conflict, because the circle can be read as an o.
How:
Write an additional line under the number 1.
Write an additional line across the number 7.
The ICR functionality of PlanetPress Capture cannot recognize dotted letters (such as i and j) whose dots are drawn as circles. Such a letter would be analyzed as an i and an o. Dots should therefore be drawn as dots, not circles.
In French, the ç is somewhat sensitive. Draw the letter carefully. In most cases it is recognized, but attention must be paid.
The number 8 is also sensitive. It is recommended to trace it in one movement rather than drawing two circles on top of each other.
Why: The filters that interpret the ink marks made with the Anoto digital pen let you select the engine language to use. Doing so gives the results that most closely match the captured data. Once the correct language is selected, ICR can interpret language-specific characters such as û, à and é.
How: This option is available from the Capture field processor task.
Why: An automated process can handle data incorrectly when a value has been misinterpreted. This should be minimized as much as possible.
How: Allow for a special process (possibly manual handling) for cases where the automated process did not reach a high confidence level in its analysis of the ink marks. Use the Capture condition plugin, which includes the ICRContent option. It can be configured to evaluate to true when the confidence level is greater than a certain value.
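The routing logic behind this practice can be sketched as follows. This is not the Capture condition plugin itself: IcrResult, route, the 0 to 100 confidence scale and the default threshold of 80 are all assumptions made for the illustration. The real scale and a suitable threshold should be taken from the product documentation and from your own tests.

```python
from dataclasses import dataclass


@dataclass
class IcrResult:
    """One recognized field value (hypothetical structure)."""
    field_name: str
    text: str
    confidence: float  # assumed scale: 0 (no confidence) to 100 (certain)


def route(result: IcrResult, threshold: float = 80.0) -> str:
    """Send a value to automated processing only if its confidence exceeds the threshold."""
    if result.confidence > threshold:
        return "automated"
    # Anything at or below the threshold goes to the special (possibly manual) process.
    return "manual_review"


if __name__ == "__main__":
    results = [
        IcrResult("quantity", "12", 96.0),
        IcrResult("postal_code", "H1V 2C8", 61.5),
    ]
    for r in results:
        print(r.field_name, "->", route(r))
```

Using a strict "greater than" comparison mirrors the condition described above, so a value that lands exactly on the threshold is still sent for review.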