Custom OCR

With the Anyline OCR Plugin, you can create your own OCR use case on the fly with almost no effort. It offers a variety of parameters that let you adjust the scanning process to your use case.

This section describes the parameters in detail.
If you are looking for a how-to on loading the Anyline OCR Plugin on your platform, please refer to the platform-specific sections.

Simultaneous Barcode Scanning

Starting with SDK 3.8, Anyline supports simultaneous barcode scanning for any plugin. Additional information can be found under Simultaneous Barcode Scanning.

Warning

Any *.ale or *.any file from a version lower than Anyline 8 is NOT compatible with this version (Anyline 42.3.0).

Parameters

scanMode

The scanMode provides the basis for the scanning experience. There are three options: AUTO, LINE, or GRID.

Range Unit Type Default Mandatory
AUTO, LINE or GRID - String -

Tip

As a rule of thumb: if you can place a grid on top of the text that you want to scan, use GRID mode; otherwise, use LINE mode. AUTO automatically detects the valid text within the cutout.

scanMode.AUTO

New in version 3.11.

The AUTO mode automatically detects the text to be scanned when it is placed within the cutout. It determines whether the text consists of one or multiple lines and whether it contains uppercase characters, lowercase characters, and/or numbers, and adjusts the scan parameters accordingly.

Android/Cordova iOS
AUTO ALAuto

In this mode, all parameters are optional. The parameters in the following table can be set to improve the scanning process.

Parameter Advised Remark
minCharHeight - The AUTO mode automatically detects the text height within the cutout.
maxCharHeight - The AUTO mode automatically detects the text height within the cutout.
tesseractLanguages - Only needs to be set if the font you are trying to scan differs from a standard sans-serif font.
anyline_ocr_char_whitelist Helps to filter false positives, such as the number 8 instead of the letter B.
validationRegex Validates the result against the desired structure and therefore helps to avoid false scans, especially if the cutout is not placed on top of the text at the start of scanning.
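
To illustrate how a character whitelist can rule out look-alike misreads such as 8 for B, here is a minimal Python sketch. It is not Anyline code: the function name allowed_by_whitelist, the LETTERS_ONLY constant, and the idea of checking a finished result against the whitelist are assumptions made for illustration, and the SDK may apply the whitelist differently internally.

```python
def allowed_by_whitelist(text: str, whitelist: str) -> bool:
    """Return True if every character of text appears in whitelist."""
    allowed = set(whitelist)
    return all(ch in allowed for ch in text)


# Hypothetical use case: codes that consist of uppercase letters only.
LETTERS_ONLY = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

print(allowed_by_whitelist("ABC", LETTERS_ONLY))  # True
print(allowed_by_whitelist("A8C", LETTERS_ONLY))  # False: 8 is not whitelisted
```

With a letters-only whitelist, a candidate that contains the digit 8 cannot be a valid result, which is the kind of false positive the remark in the table refers to.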

Known Limitations

As of version 3.12, the AUTO mode automatically detects text of up to 30 characters per line in up to 7 lines.
Version 3.11 did not include multiline support, and lowercase character detection was performed by checking the anyline_ocr_char_whitelist. This changed in version 3.12 and applies to all later versions.
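
The documented limits can be checked against a sample of the text you expect to scan before you commit to the AUTO mode. The following Python sketch is only an illustration: the function and constant names are hypothetical, and it assumes that every character in a line, including spaces, counts toward the 30-character limit, which the text above does not specify.

```python
MAX_LINES = 7            # documented limit as of version 3.12
MAX_CHARS_PER_LINE = 30  # documented limit as of version 3.12


def fits_auto_mode_limits(expected_text: str) -> bool:
    """Check a sample of the expected text against the AUTO mode limits."""
    lines = expected_text.splitlines()
    if len(lines) > MAX_LINES:
        return False
    # Assumption: all characters of a line, including spaces, are counted.
    return all(len(line) <= MAX_CHARS_PER_LINE for line in lines)


print(fits_auto_mode_limits("LINE ONE\nLINE TWO"))  # True
print(fits_auto_mode_limits("X" * 31))              # False: line too long
```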

scanMode.LINE

The LINE mode is the best option for scanning one or more lines of text of arbitrary length.

This could be an IBAN code, or a mail header whose number of lines and line lengths are not known in advance.

Android/Cordova iOS
LINE ALLine

Number of characters

The LINE mode requires at least 4 characters per line. If your use case has 3 or fewer characters per line, consider using the AUTO or GRID mode.
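
As a quick sanity check for this requirement, the sketch below tests whether every expected line is long enough for the LINE mode. It is a minimal illustration in Python, not part of the SDK: the function name is hypothetical, and it assumes that leading and trailing whitespace does not count toward the minimum.

```python
from typing import List

MIN_CHARS_PER_LINE = 4  # documented minimum for the LINE mode


def line_mode_suitable(expected_lines: List[str]) -> bool:
    """Return True if every expected line has at least 4 characters."""
    # Assumption: surrounding whitespace is ignored when counting.
    return all(len(line.strip()) >= MIN_CHARS_PER_LINE for line in expected_lines)


print(line_mode_suitable(["AT611904300234573201"]))  # True
print(line_mode_suitable(["AB1"]))                   # False: consider AUTO or GRID
```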

scanMode.GRID

With the GRID scan mode, you can scan text that is laid out evenly in a grid. One example would be loyalty codes on cans.

Other examples are bottle cap codes, Scrabble letters, or any other use case in which the text can be placed in an imaginary grid.

Android/Cordova iOS
GRID ALGrid

minCharHeight

Defines the minimum height that symbols must have to be considered in the scanning process.

If, for example, you know that the text you are going to scan is rather big, setting this to a high value prevents smaller contours in the image from being taken into account.

Range Unit Type Default Mandatory
- Pixel Integer 15

maxCharHeight

Defines the maximum height that symbols may have to be considered in the scanning process.

If, for example, you know that the text you are going to scan is rather small, setting this to a low value prevents bigger contours in the image from being taken into account.

Range Unit Type Default Mandatory
- Pixel Integer 60
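
The effect of minCharHeight and maxCharHeight can be pictured as a height filter on candidate contours. The Python sketch below is a simplified illustration rather than the SDK's actual algorithm: the function name and the list of contour heights are hypothetical, and treating both bounds as inclusive is an assumption. The default values 15 and 60 are taken from the tables above.

```python
from typing import List


def filter_by_height(
    heights_px: List[int],
    min_char_height: int = 15,  # documented default
    max_char_height: int = 60,  # documented default
) -> List[int]:
    """Keep only contour heights inside the configured range."""
    # Assumption: both bounds are inclusive.
    return [h for h in heights_px if min_char_height <= h <= max_char_height]


# Hypothetical contour heights in pixels: noise (8), text (15-60), a logo (90).
print(filter_by_height([8, 15, 32, 60, 90]))  # [15, 32, 60]
```

Raising min_char_height drops small noise, and lowering max_char_height drops large shapes such as logos or borders.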

languages

New in version 3.20.

The OCR part of the SDK relies on language files, which are specific to a font and language.
This parameter tells the plugin which language file to use when performing the OCR.

Examples of language filename extensions are .traineddata or .any files.

You can use one of the default traineddata files that come with the SDK bundle, like eng_no_dict or deu.

Range Unit Type Default Mandatory
- - String -

.any files and scanMode

As of version 3.20, .any files can only be used with scanMode.AUTO.

languages vs tesseractLanguages on iOS and Android

On Android and iOS, unlike the deprecated tesseractLanguages, this parameter does not require a call to copyTrainedData. On iOS, the files still have to be included in the Xcode project.

tesseractLanguages

Deprecated since version 3.20: Use languages instead. This will be removed in the future.

The OCR part of the SDK relies on so-called traineddata files, which are specific to a font and language.
This parameter tells the plugin which traineddata file to use when performing the OCR.

You can use one of the default traineddata files that come with the SDK bundle, like eng_no_dict or deu.

Range Unit Type Default Mandatory
- - String -

Load the traineddata file on Android

On Android, the traineddata files must first be copied via copyTrainedData.

validationRegex

Defines a regular expression against which the detected result is validated.

If a detected result does not match the validationRegex, it will not be returned.

The regular expression must use ECMAScript regular expression pattern syntax.

Hint

As of version 3.12, the Anyline OCR Plugin provides predefined regular expressions for
URL, EMAIL, ISBN, VIN, IMEI and PRICE.
Please see the iOS API Reference and the Android API Reference for further details.

Range Unit Type Default Mandatory
- - String -
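
The following sketch shows the idea of validating a detected result against a validationRegex. It is written in Python for illustration only, so it uses Python's re module rather than an ECMAScript engine; the pattern is restricted to constructs that behave the same way in both syntaxes. The pattern itself, the function name, and the assumption that the whole result must match are hypothetical and not taken from the SDK.

```python
import re
from typing import Optional

# Hypothetical structure: two uppercase letters followed by six digits.
# Explicit anchors make the intent clear regardless of how matching is applied.
VALIDATION_REGEX = r"^[A-Z]{2}[0-9]{6}$"


def validate(result: str, pattern: str = VALIDATION_REGEX) -> Optional[str]:
    """Return the result if it matches the pattern, otherwise None."""
    # Assumption: the entire detected result has to match the pattern.
    return result if re.fullmatch(pattern, result) else None


print(validate("AB123456"))  # AB123456
print(validate("A8123456"))  # None: second character is not a letter
```

A result that fails validation is simply not returned, which mirrors the behaviour described above.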

minConfidence

Defines the minimum confidence the SDK must have in a result to consider it valid.

Confidence

The confidence describes how certain the SDK is that the detected result equals the target to scan.

Range Unit Type Default Mandatory
0 - 100 - Integer 60
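
Conceptually, minConfidence acts as a threshold on each result. The minimal Python sketch below illustrates this under stated assumptions: the function name is hypothetical, and whether a result whose confidence exactly equals the threshold is accepted is not specified above, so the inclusive comparison is an assumption. The range 0 to 100 and the default of 60 come from the table.

```python
def accept_result(confidence: int, min_confidence: int = 60) -> bool:
    """Return True if the confidence reaches the configured minimum."""
    if not 0 <= min_confidence <= 100:
        raise ValueError("min_confidence must be between 0 and 100")
    # Assumption: a confidence equal to the threshold is accepted.
    return confidence >= min_confidence


print(accept_result(75))                     # True
print(accept_result(75, min_confidence=80))  # False
```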

Additional Settings in LINE Mode

minSharpness

Defines the minimum sharpness an image must have to be processed further by the SDK.

It is used to avoid time-consuming processing of blurry images, which are unlikely to return a result.

Range Unit Type Default Mandatory
0 - 100 - Integer 0 (=Off)

Experimental

This parameter is experimental. It is recommended to set an initial sharpness of 50 and gradually increase the value until you reach a threshold at which you get satisfying results.
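
The recommended tuning procedure can be sketched as a simple search that starts at 50 and raises the value step by step. This Python sketch is purely illustrative: the function name, the step size of 5, and the is_satisfying callback (a stand-in for whatever manual or automated test run you use to judge the results) are all assumptions and not part of the SDK.

```python
from typing import Callable, Optional


def tune_min_sharpness(
    is_satisfying: Callable[[int], bool],
    start: int = 50,     # recommended initial value
    step: int = 5,       # hypothetical increment
    maximum: int = 100,  # upper end of the documented range
) -> Optional[int]:
    """Return the first threshold that yields satisfying results, or None."""
    for threshold in range(start, maximum + 1, step):
        if is_satisfying(threshold):
            return threshold
    return None


# Stand-in evaluation: pretend results become satisfying from 65 upwards.
print(tune_min_sharpness(lambda threshold: threshold >= 65))  # 65
```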

Additional Settings in GRID Mode

charCountX

Defines the number of symbols in the horizontal direction of the grid.
For example, if a code to scan consists of 2 rows with 4 symbols each, this would be set to 4.

Range Unit Type Default Mandatory
- - Integer 1

charCountY

Defines the number of symbols in the vertical direction of the grid.
For example, if a code to scan consists of 2 rows with 4 symbols each, this would be set to 2.

Range Unit Type Default Mandatory
- - Integer 1
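
The relationship between charCountX, charCountY, and the scanned code can be shown with a small helper that arranges a flat string of symbols into grid rows. This is a Python sketch for illustration; the function name and the sample code are hypothetical, and it says nothing about how the SDK returns GRID results.

```python
from typing import List


def split_into_grid(text: str, char_count_x: int, char_count_y: int) -> List[str]:
    """Arrange a flat string of symbols into char_count_y rows of char_count_x."""
    expected = char_count_x * char_count_y
    if len(text) != expected:
        raise ValueError(f"expected {expected} symbols, got {len(text)}")
    return [
        text[row * char_count_x:(row + 1) * char_count_x]
        for row in range(char_count_y)
    ]


# A code with 2 rows of 4 symbols each: charCountX = 4, charCountY = 2.
print(split_into_grid("AB12CD34", 4, 2))  # ['AB12', 'CD34']
```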

charPaddingXFactor

Defines the average horizontal distance between two characters, measured as a percentage of the character width.

Range Unit Type Default Mandatory
- Percent Double 1.0

charPaddingYFactor

Defines the average vertical distance between two characters, measured as a percentage of the character height.

Range Unit Type Default Mandatory
- Percent Double 1.0
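
One way to derive a padding factor is to measure the average gap between characters and divide it by the character size. The Python sketch below rests on an assumption that the text above does not confirm: that the factor is a plain ratio, so the default of 1.0 would correspond to a gap as large as one full character. The function name and the pixel values are hypothetical; check the result against real scans before relying on it.

```python
def padding_factor(avg_gap_px: float, char_size_px: float) -> float:
    """Ratio of the average gap between characters to the character size."""
    if char_size_px <= 0:
        raise ValueError("char_size_px must be positive")
    # Assumption: a factor of 1.0 means the gap equals one character size.
    return avg_gap_px / char_size_px


# Horizontal: 10 px gaps between 20 px wide characters.
print(padding_factor(10, 20))  # 0.5
# Vertical: 30 px gaps between 30 px high characters.
print(padding_factor(30, 30))  # 1.0
```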

Setting a Custom Command File

If your use case requires special optimisation, Anyline will provide you with a Custom Command File (.ale).

To load the custom command file, please refer to the platform-specific implementations.

Settings and Custom Command File

Note that the custom script overrides all settings made in the Anyline OCR Config, so you do not have to set the parameters manually; they are already optimized for your use case.