Custom OCR
With the Anyline OCR Plugin, you can create your own OCR use case with very little effort. It offers a variety of parameters that let you adjust the scanning process to your use case.
This section describes the parameters in detail.
If you are looking for a How-To on loading the Anyline OCR Plugin on your platform, please refer to the following sections:
- iOS: Anyline OCR Plugin
- Android: Anyline OCR Plugin
- Cordova: Add the Anyline SDK Plugin to your Project
- React-Native: Add the Anyline SDK Plugin to your Project
- Xamarin.iOS: Implementing Anyline
- Xamarin.Android: Plugin-Specific Configurations
Simultaneous Barcode Scanning
Starting from SDK 3.8, Anyline supports simultaneous barcode scanning for any plugin. Additional information can be found under Simultaneous Barcode Scanning.
Examples
A couple of different examples can be found at Demos and Sample Code: Anyline OCR: AUTO, Demos and Sample Code: Anyline OCR: Grid and Demos and Sample Code: Anyline OCR: Line.
Parameters
scanMode
The scanMode provides the basis for the scanning experience. There are three options: AUTO, LINE or GRID.
Range | Unit | Type | Default | Mandatory |
---|---|---|---|---|
AUTO, LINE or GRID | - | String | - | ✓ |
Tip
As a rule of thumb: if you can place a grid on top of the text that you want to scan, use GRID mode; otherwise use LINE mode. AUTO automatically detects the valid text within the cutout.
scanMode.AUTO
New in version 3.11.
The AUTO mode automatically detects the text to be scanned if it is placed within the cutout.
It detects whether the text consists of one or multiple lines, upper- or lowercase characters and/or numbers, and adjusts the scan parameters accordingly.
Android/Cordova | iOS |
---|---|
AUTO | ALAuto |
In this mode, all parameters are optional. The parameters contained in the following table can be set in order to improve the scanning process.
Parameter | Advised | Remark |
---|---|---|
minCharHeight | - | The AUTO mode automatically detects the text height within the cutout. |
maxCharHeight | - | The AUTO mode automatically detects the text height within the cutout. |
tesseractLanguages | - | Set this parameter only if the font you are trying to scan differs from a standard sans-serif font. Otherwise there is no need. |
anyline_ocr_char_whitelist | ✓ | Helps to filter false positives, like the number 8 instead of the letter B. |
validationRegex | ✓ | Validates the result against the desired structure. This helps to avoid false scans, especially if the cutout is not placed on top of the text at the start of scanning. |
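The effect of a character whitelist can be illustrated with a small sketch. This is not Anyline code: the constant `CHAR_WHITELIST` and the helper `uses_only_whitelisted` are hypothetical names, and how the SDK applies the whitelist internally is not described here. The sketch only shows why a letters-only whitelist rules out a reading that contains the digit 8 in place of the letter B.

```python
import string

# Hypothetical whitelist for a code that contains only uppercase letters.
CHAR_WHITELIST = string.ascii_uppercase


def uses_only_whitelisted(text: str, whitelist: str = CHAR_WHITELIST) -> bool:
    """Return True if every character of `text` appears in `whitelist`."""
    allowed = set(whitelist)
    return all(ch in allowed for ch in text)
```

With this whitelist, a candidate reading such as `A88A` would be rejected, while `ABBA` would pass.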
Known Limitations
As of version 3.12, the AUTO mode automatically detects text of up to 30 characters per line in up to 7 lines.
Version 3.11 did not include multiline support, and lowercase character detection was performed by checking the anyline_ocr_char_whitelist. This was changed in version 3.12.
scanMode.LINE
The LINE mode is the best option for scanning one or more lines of text of arbitrary length.
This could be an IBAN code, or a mail header whose number of lines and line lengths are not known in advance.
Android/Cordova | iOS |
---|---|
LINE | ALLine |
Number of characters
The LINE mode requires at least 4 characters per line. If your use case has 3 or fewer characters per line, consider using the AUTO or GRID mode.
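The rule of thumb for choosing a scan mode, combined with the minimum line length of the LINE mode, can be summarised in a short sketch. The function `suggest_scan_mode` and its parameters are hypothetical and only encode the guidance given in this section; falling back to AUTO for short lines that do not fit a grid is an assumption of this sketch.

```python
def suggest_scan_mode(fits_grid: bool, min_chars_per_line: int) -> str:
    """Suggest a scanMode value based on the guidance in this section.

    fits_grid: True if a regular grid can be placed on top of the text.
    min_chars_per_line: the shortest line expected in the use case.
    """
    if fits_grid:
        return "GRID"
    if min_chars_per_line < 4:
        # LINE needs at least 4 characters per line, so fall back to AUTO.
        return "AUTO"
    return "LINE"
```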
minCharHeight
Defines the minimum height that symbols must have to be considered in the scanning process.
If, for example, you know that the text you are going to scan is rather big, setting this to a high value prevents smaller contours in the image from being taken into account.
Range | Unit | Type | Default | Mandatory |
---|---|---|---|---|
- | Pixel | Integer | 15 | ✗ |
maxCharHeight
Defines the maximum height that symbols may have to be considered in the scanning process.
If, for example, you know that the text you are going to scan is rather small, setting this to a low value prevents bigger contours in the image from being taken into account.
Range | Unit | Type | Default | Mandatory |
---|---|---|---|---|
- | Pixel | Integer | 60 | ✗ |
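To make the effect of minCharHeight and maxCharHeight concrete, the following sketch filters a list of contour heights using the documented defaults of 15 and 60 pixels. The function `filter_by_height` is a hypothetical illustration, not part of the SDK, and treating both bounds as inclusive is an assumption.

```python
from typing import List

MIN_CHAR_HEIGHT = 15  # documented default, in pixels
MAX_CHAR_HEIGHT = 60  # documented default, in pixels


def filter_by_height(
    heights: List[int],
    min_height: int = MIN_CHAR_HEIGHT,
    max_height: int = MAX_CHAR_HEIGHT,
) -> List[int]:
    """Keep only contour heights inside [min_height, max_height] (inclusive, by assumption)."""
    return [h for h in heights if min_height <= h <= max_height]
```

For example, of the heights 8, 15, 32, 60 and 75 pixels, only 15, 32 and 60 would remain with the default values.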
languages
New in version 3.20.
The OCR part of the SDK relies on language files, which are specific to a font and language.
This parameter tells the plugin which language file to use when performing the OCR.
Language files use filename extensions such as .traineddata or .any.
You can use one of the default traineddata files that come with the SDK bundle, like eng_no_dict or deu.
Range | Unit | Type | Default | Mandatory |
---|---|---|---|---|
- | - | String | - | ✓ |
.any files and scanMode
As of version 3.20, .any files can only be used with scanMode.AUTO.
languages vs tesseractLanguages on iOS and Android
On Android and iOS, unlike the deprecated tesseractLanguages, the languages parameter does not require a call to copyTrainedData. On iOS, the files still have to be included in the Xcode project.
tesseractLanguages
Deprecated since version 3.20: Use languages instead. This will be removed in the future.
The OCR part of the SDK relies on so-called traineddata files, which are specific to a font and language.
This parameter tells the plugin which traineddata file to use when performing the OCR.
You can use one of the default traineddata files that come with the SDK bundle, like eng_no_dict or deu.
Range | Unit | Type | Default | Mandatory |
---|---|---|---|---|
- | - | String | - | ✓ |
Load the traineddata file on Android
On Android, the traineddata files must first be copied via copyTrainedData.
validationRegex
Defines a Regular Expression which the detected result is validated against.
If a detected result does not match the validationRegex, it will not be returned.
The regular expression uses the ECMAScript pattern syntax.
Hint
As of version 3.12, the Anyline OCR Plugin provides predefined Regular Expressions for URL, EMAIL, ISBN, VIN, IMEI and PRICE.
Please see the iOS API Reference and the Android API Reference for further details.
Range | Unit | Type | Default | Mandatory |
---|---|---|---|---|
- | - | String | - | ✗ |
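The following sketch shows the idea of validating a detected result against a pattern. It uses Python's `re` module purely for illustration; the SDK itself expects ECMAScript syntax, although a simple pattern like the one below reads the same in both. The pattern, the name `VALIDATION_REGEX` and the helper `is_valid` are hypothetical, and requiring the whole result to match is an assumption of this sketch.

```python
import re

# Hypothetical pattern: two uppercase letters followed by six digits.
VALIDATION_REGEX = r"[A-Z]{2}[0-9]{6}"


def is_valid(result: str, pattern: str = VALIDATION_REGEX) -> bool:
    """Return True if the whole result matches the pattern."""
    return re.fullmatch(pattern, result) is not None
```

A result such as `AB123456` would be returned, whereas a partial read like `AB12345` or a misread like `A8123456` would be discarded.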
minConfidence
Defines a minimum confidence the SDK has to have in the result to consider it valid.
Confidence
The confidence describes how certain the SDK is that the detected result equals the target to scan.
Range | Unit | Type | Default | Mandatory |
---|---|---|---|---|
0 - 100 | - | Integer | 60 | ✗ |
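As a minimal sketch of how such a threshold behaves, the function below accepts a result only if its confidence reaches the configured minimum. The helper `is_confident` is hypothetical, and treating a confidence equal to the threshold as valid is an assumption.

```python
MIN_CONFIDENCE = 60  # documented default


def is_confident(confidence: int, min_confidence: int = MIN_CONFIDENCE) -> bool:
    """Return True if `confidence` reaches `min_confidence` (both on a 0-100 scale)."""
    if not 0 <= min_confidence <= 100:
        raise ValueError("min_confidence must be between 0 and 100")
    return confidence >= min_confidence
```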
Additional Settings in LINE Mode
minSharpness
Defines a minimum sharpness that is required of the image to be processed further in the SDK.
It is used to avoid time-consuming processing of blurry images which are unlikely to return a result.
Range | Unit | Type | Default | Mandatory |
---|---|---|---|---|
0 - 100 | - | Integer | 0 (=Off) | ✗ |
Experimental
This parameter is experimental. It is recommended to start with a sharpness of 50 and gradually increase the value until you get satisfying results.
Additional Settings in GRID Mode
charCountX
Defines the number of symbols in horizontal direction in the grid.
For example, if a code to scan consists of 2 rows with 4 symbols each, this would be set to 4.
Range | Unit | Type | Default | Mandatory |
---|---|---|---|---|
- | - | Integer | 1 | ✓ |
charCountY
Defines the number of symbols in vertical direction in the grid.
For example, if a code to scan consists of 2 rows with 4 symbols each, this would be set to 2.
Range | Unit | Type | Default | Mandatory |
---|---|---|---|---|
- | - | Integer | 1 | ✓ |
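The relationship between a sample code and the two grid parameters can be sketched as follows. The helper `grid_char_counts` is hypothetical; it simply derives charCountX and charCountY from the rows of an example code and rejects input that does not form a regular grid.

```python
from typing import List, Tuple


def grid_char_counts(rows: List[str]) -> Tuple[int, int]:
    """Return (charCountX, charCountY) for a sample code given as a list of rows."""
    if not rows:
        raise ValueError("at least one row is required")
    widths = {len(row) for row in rows}
    if len(widths) != 1:
        raise ValueError("all rows must have the same number of symbols")
    return len(rows[0]), len(rows)
```

For the example used above, 2 rows with 4 symbols each, this yields a charCountX of 4 and a charCountY of 2.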
charPaddingXFactor
Defines the average horizontal distance between two characters, measured as a percentage of the character width.
Range | Unit | Type | Default | Mandatory |
---|---|---|---|---|
- | Percent | Double | 1.0 | ✗ |
charPaddingYFactor
Defines the average vertical distance between two characters, measured as a percentage of the character height.
Range | Unit | Type | Default | Mandatory |
---|---|---|---|---|
- | Percent | Double | 1.0 | ✗ |
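As a rough worked example of the padding factors, the sketch below converts a factor into a pixel distance. It assumes that a factor of 1.0 corresponds to 100% of the character size, which the tables above suggest but do not state explicitly; the helper `padding_in_pixels` is hypothetical.

```python
def padding_in_pixels(char_size_px: float, padding_factor: float = 1.0) -> float:
    """Return the average distance between two characters in pixels.

    Assumes padding_factor is a multiplier of the character size,
    so 1.0 means the gap is as large as the character itself.
    """
    return char_size_px * padding_factor
```

Under this assumption, characters that are 20 pixels wide with a charPaddingXFactor of 0.5 would be about 10 pixels apart.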
Setting a Custom Command File
If your use case requires special optimisation, Anyline will provide you with a Custom Command File (.ale).
To load the custom command file, please refer to the platform-specific implementations.
Settings and Custom Command File
Note that the custom script will override all settings made to the Anyline OCR Config, so you don’t have to set the parameters manually, as they are already optimized for your use case.