Custom OCR

With the Anyline OCR Plugin, you can create your own OCR use case on the fly with almost no effort. It offers a variety of parameters to adapt the scanning process to your use case.

This section describes the parameters in detail.
If you are looking for a How-To on loading the Anyline OCR Plugin on your platform, please refer to the following sections:

iOS: Anyline OCR Plugin
Android: Anyline OCR Plugin
Cordova: Add the Anyline SDK Plugin to your Project
React-Native: Add the Anyline SDK Plugin to your Project
Xamarin.iOS: Implementing Anyline
Xamarin.Android: Plugin-Specific Configurations

Simultaneous Barcode Scanning

Starting with SDK 3.8, Anyline supports simultaneous barcode scanning for any plugin. Additional information can be found under Simultaneous Barcode Scanning.

Warning

Any *.ale or *.any files from versions earlier than Anyline 8 are NOT compatible with this version (Anyline 42.3.0).

Examples

A couple of different examples can be found at Demos and Sample Code: Anyline OCR: AUTO, Demos and Sample Code: Anyline OCR: Grid and Demos and Sample Code: Anyline OCR: Line.

Parameters

`scanMode`

The scanMode provides the basis for the scanning experience. There are three options: `AUTO`, `LINE` and `GRID`.

Range	Unit	Type	Default	Mandatory
`AUTO`, `LINE` or `GRID`	-	String	-	✓

Tip

As a rule of thumb: If you can place a grid on top of the text that you want to scan, use GRID mode, otherwise use LINE mode. AUTO automatically detects the valid text within the cutout.
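The rule of thumb above can be written down as a small decision helper. This is a hypothetical sketch, not part of the Anyline SDK: `ScanMode`, `TextLayout`, `chooseScanMode` and `platformNames` are illustrative names. Only the mode values and their platform-specific spellings are taken from this page.

```typescript
// Hypothetical helper (not an Anyline API) encoding the scanMode rule of thumb.
type ScanMode = "AUTO" | "LINE" | "GRID";

interface TextLayout {
  // Should the SDK locate the valid text inside the cutout by itself?
  autoDetect: boolean;
  // Can an imaginary grid be placed on top of the text?
  fitsGrid: boolean;
}

function chooseScanMode(layout: TextLayout): ScanMode {
  if (layout.autoDetect) {
    return "AUTO";
  }
  return layout.fitsGrid ? "GRID" : "LINE";
}

// Spelling of each mode per platform, as listed in the scanMode tables on this page.
const platformNames: Record<ScanMode, { androidCordova: string; ios: string }> = {
  AUTO: { androidCordova: "AUTO", ios: "ALAuto" },
  LINE: { androidCordova: "LINE", ios: "ALLine" },
  GRID: { androidCordova: "GRID", ios: "ALGrid" },
};

const mode = chooseScanMode({ autoDetect: false, fitsGrid: true });
console.log(mode, platformNames[mode].ios); // GRID ALGrid
```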

`scanMode.AUTO`

New in version 3.11.

The AUTO mode automatically detects the text to be scanned when it is placed within the cutout. It determines whether the text consists of one or multiple lines, uppercase or lowercase characters and/or numbers, and adjusts the scan parameters accordingly.

Android/Cordova	iOS
`AUTO`	`ALAuto`

In this mode, all parameters are optional. The parameters contained in the following table can be set in order to improve the scanning process.

Parameter	Advised	Remark
minCharHeight	-	The `AUTO` mode automatically detects the text height within the cutout
maxCharHeight	-	The `AUTO` mode automatically detects the text height within the cutout
tesseractLanguages	-	Only needs to be set if the font you are trying to scan differs from a standard sans-serif font
anyline_ocr_char_whitelist	✓	Helps to filter false positives, such as the number 8 instead of the letter B
validationRegex	✓	Validates the result against the desired structure and thereby helps to avoid false scans, especially if the cutout is not placed on top of the text when scanning starts
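As a sketch, an AUTO configuration that sets the two advised parameters might look like the object below. The object shape, the example whitelist and regex values, and the `missingAdvised` helper are assumptions for illustration only; the actual configuration format is described in the platform-specific documentation.

```typescript
// Hypothetical shape of an AUTO-mode configuration; the parameter names are
// taken from the table above, the object layout is an assumption.
interface AutoModeConfig {
  scanMode: "AUTO";
  minCharHeight?: number;
  maxCharHeight?: number;
  tesseractLanguages?: string;
  anyline_ocr_char_whitelist?: string;
  validationRegex?: string;
}

// Parameters marked as advised for AUTO mode in the table above.
const ADVISED: (keyof AutoModeConfig)[] = ["anyline_ocr_char_whitelist", "validationRegex"];

// Returns the advised parameters that a configuration leaves unset.
function missingAdvised(cfg: AutoModeConfig): string[] {
  return ADVISED.filter((key) => cfg[key] === undefined);
}

// Example values for a code made of uppercase letters and digits (illustrative only).
const autoConfig: AutoModeConfig = {
  scanMode: "AUTO",
  anyline_ocr_char_whitelist: "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789",
  validationRegex: "^[A-Z0-9]{8}$",
};

console.log(missingAdvised(autoConfig).length); // 0
console.log(missingAdvised({ scanMode: "AUTO" }).join(", ")); // anyline_ocr_char_whitelist, validationRegex
```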

Known Limitations

As of version 3.12, the AUTO mode automatically detects text of up to 30 characters per line in up to 7 lines.
Version 3.11 did not include multiline support, and lowercase character detection was determined by checking the anyline_ocr_char_whitelist. This behavior changed in version 3.12.
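The documented limits for version 3.12 and later can be expressed as a quick pre-check on sample text, for instance to decide up front whether AUTO is a good fit. The function `fitsAutoLimits` is hypothetical, and the assumption that lines are separated by `\n` is for illustration only, not SDK behavior.

```typescript
// Documented AUTO-mode limits as of version 3.12.
const AUTO_MAX_CHARS_PER_LINE = 30;
const AUTO_MAX_LINES = 7;

// Hypothetical pre-check (not an SDK function): does a sample of the expected
// text stay within the AUTO limits? Lines are assumed to be separated by "\n".
function fitsAutoLimits(sampleText: string): boolean {
  const lines = sampleText.split("\n");
  return (
    lines.length <= AUTO_MAX_LINES &&
    lines.every((line) => line.length <= AUTO_MAX_CHARS_PER_LINE)
  );
}

console.log(fitsAutoLimits("AB12 CD34\nEF56 GH78")); // true
console.log(fitsAutoLimits("X".repeat(31))); // false
```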

`scanMode.LINE`

The LINE mode is the best option for scanning one or more lines of text of arbitrary length.

This could be an IBAN, or a mail header with a previously unknown number of lines and line lengths.

Android/Cordova	iOS
`LINE`	`ALLine`

Number of characters

The LINE mode requires at least 4 characters per line. If your use case has 3 or fewer characters, consider using the AUTO or GRID mode.
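The minimum line length can be checked against sample data before committing to LINE mode. This is a hypothetical helper, not an SDK function; not counting whitespace toward the 4 characters is an assumption.

```typescript
// Documented minimum for LINE mode.
const LINE_MIN_CHARS = 4;

// Hypothetical helper (not an SDK function): checks that every expected line has
// at least 4 characters. Whitespace is not counted, which is an assumption.
function suitsLineMode(lines: string[]): boolean {
  return (
    lines.length > 0 &&
    lines.every((line) => line.replace(/\s/g, "").length >= LINE_MIN_CHARS)
  );
}

console.log(suitsLineMode(["AT61 1904 3002 3457 3201"])); // true
console.log(suitsLineMode(["A7C"])); // false
```

If the check fails, AUTO or GRID is the suggested alternative.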

`scanMode.GRID`

With the GRID scan mode, you can scan text that is evenly laid out in a grid. One example would be loyalty codes on cans.

This could be a bottlecap code, scrabble letters, or any other use case in which the text can be placed in an imaginary grid.

Android/Cordova	iOS
`GRID`	`ALGrid`

`minCharHeight`

Defines the minimum height that symbols must have to be considered in the scanning process.

If, for example, you know that the text you are going to scan is rather big, setting this to a high value prevents smaller contours in the image from being taken into account.

Range	Unit	Type	Default	Mandatory
-	Pixel	Integer	`15`	✗

`maxCharHeight`

Defines the maximum height that symbols may have to be considered in the scanning process.

If, for example, you know that the text you are going to scan is rather small, setting this to a low value prevents bigger contours in the image from being taken into account.

Range	Unit	Type	Default	Mandatory
-	Pixel	Integer	`60`	✗
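To illustrate how the two height bounds interact, the sketch below keeps only candidate contours between `minCharHeight` and `maxCharHeight`, using the documented defaults. It is a conceptual model, not the SDK's internal algorithm: the `Contour` type and `filterByCharHeight` are hypothetical, and treating both bounds as inclusive is an assumption.

```typescript
// Hypothetical model of a detected contour; only its height (in pixels) matters here.
interface Contour {
  height: number;
}

// Documented defaults.
const DEFAULT_MIN_CHAR_HEIGHT = 15;
const DEFAULT_MAX_CHAR_HEIGHT = 60;

// Keeps only contours whose height lies within [minCharHeight, maxCharHeight].
// Inclusive bounds are an assumption.
function filterByCharHeight(
  contours: Contour[],
  minCharHeight = DEFAULT_MIN_CHAR_HEIGHT,
  maxCharHeight = DEFAULT_MAX_CHAR_HEIGHT,
): Contour[] {
  if (minCharHeight > maxCharHeight) {
    throw new RangeError("minCharHeight must not exceed maxCharHeight");
  }
  return contours.filter((c) => c.height >= minCharHeight && c.height <= maxCharHeight);
}

const contours = [10, 15, 40, 60, 80].map((height) => ({ height }));
console.log(filterByCharHeight(contours).map((c) => c.height).join(",")); // 15,40,60
// Raising minCharHeight for large text discards the smaller contours:
console.log(filterByCharHeight(contours, 30).map((c) => c.height).join(",")); // 40,60
```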

`languages`

New in version 3.20.

The OCR part of the SDK relies on language files, which are specific to a font and language.
This parameter tells the plugin which language file to use when performing the OCR.

Examples of language file extensions are .traineddata and .any.

You can use one of the default traineddata files that come with the SDK bundle, such as eng_no_dict or deu.

Range	Unit	Type	Default	Mandatory
-	-	String	-	✓

.any files and scanMode

As of version 3.20, .any files can only be used with scanMode.AUTO.
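The restriction can be checked when assembling a configuration, before the SDK rejects it. The function `isLanguageFileAllowed` and the file name `custom_font.any` are hypothetical; the rule itself is the one stated above.

```typescript
type ScanMode = "AUTO" | "LINE" | "GRID";

// Hypothetical check (not an SDK function) mirroring the documented rule that,
// as of version 3.20, .any language files can only be used with scanMode AUTO.
function isLanguageFileAllowed(scanMode: ScanMode, languageFile: string): boolean {
  if (languageFile.toLowerCase().endsWith(".any")) {
    return scanMode === "AUTO";
  }
  return true; // e.g. .traineddata files are not restricted by this rule
}

console.log(isLanguageFileAllowed("AUTO", "custom_font.any")); // true
console.log(isLanguageFileAllowed("LINE", "custom_font.any")); // false
console.log(isLanguageFileAllowed("LINE", "eng_no_dict.traineddata")); // true
```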

languages vs tesseractLanguages on iOS and Android

On Android and iOS, the languages parameter does not require a call to copyTrainedData, unlike the deprecated tesseractLanguages. On iOS, the files still have to be included in the Xcode project.

`tesseractLanguages`

Deprecated since version 3.20: Use languages instead. This will be removed in the future.

The OCR part of the SDK relies on so-called traineddata files, which are specific to a font and language.
This parameter tells the plugin which traineddata file to use when performing the OCR.

You can use one of the default traineddata files that come with the SDK bundle, such as eng_no_dict or deu.

Range	Unit	Type	Default	Mandatory
-	-	String	-	✓

Load the traineddata file on Android

On Android, the traineddata files must first be copied via copyTrainedData.

`validationRegex`

Defines a Regular Expression which the detected result is validated against.

If a detected result does not match the validationRegex, it will not be returned.

The regular expression uses ECMAScript pattern syntax.

Hint

As of version 3.12, the Anyline OCR Plugin provides predefined regular expressions for
URL, EMAIL, ISBN, VIN, IMEI and PRICE.
Please see the iOS API Reference and the Android API Reference for further details.

Range	Unit	Type	Default	Mandatory
-	-	String	-	✗
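Because the pattern uses ECMAScript syntax, TypeScript's built-in `RegExp` is a convenient place to try a pattern against sample strings before putting it into the configuration, keeping in mind that newer language additions (such as lookbehind) might not be available in the SDK. The sketch filters candidate results the way the parameter is described; `validateResults` and the voucher-code pattern are hypothetical. `RegExp.prototype.test` also accepts partial matches, so the example anchors the pattern with `^` and `$`; whether the SDK requires a full match is not stated here.

```typescript
// Hypothetical helper (not an SDK function): keeps only results that match the
// validation regex, mirroring "results that do not match are not returned".
function validateResults(results: string[], validationRegex: string): string[] {
  const pattern = new RegExp(validationRegex); // no "g" flag, so test() is stateless
  return results.filter((result) => pattern.test(result));
}

// Example pattern for a hypothetical voucher code: 4 uppercase letters + 4 digits.
const voucherRegex = "^[A-Z]{4}[0-9]{4}$";

// "ABC81234" fails (8 is not a letter), "ABCD12345" fails (5 digits).
console.log(validateResults(["ABCD1234", "ABC81234", "ABCD12345"], voucherRegex).join(","));
// ABCD1234
```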

`minConfidence`

Defines the minimum confidence the SDK must have in a result to consider it valid.

Confidence

The confidence describes how certain the SDK is that the detected result matches the target to scan.

Range	Unit	Type	Default	Mandatory
`0` - `100`	-	Integer	`60`	✗
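Conceptually, the threshold acts as a filter on results. The sketch below uses the documented range and default; the `OcrResult` shape and `meetsMinConfidence` are hypothetical (the SDK's result objects differ per platform), and treating the threshold as inclusive is an assumption.

```typescript
// Hypothetical result shape; not the SDK's actual result type.
interface OcrResult {
  text: string;
  confidence: number; // 0-100
}

const DEFAULT_MIN_CONFIDENCE = 60;

// Accepts results at or above the threshold (inclusive bound is an assumption).
function meetsMinConfidence(result: OcrResult, minConfidence = DEFAULT_MIN_CONFIDENCE): boolean {
  if (!Number.isInteger(minConfidence) || minConfidence < 0 || minConfidence > 100) {
    throw new RangeError("minConfidence must be an integer between 0 and 100");
  }
  return result.confidence >= minConfidence;
}

console.log(meetsMinConfidence({ text: "AB12CD34", confidence: 72 })); // true
console.log(meetsMinConfidence({ text: "A812CD34", confidence: 41 })); // false
```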

Additional Settings in `LINE` Mode

`minSharpness`

Defines the minimum sharpness an image must have to be processed further by the SDK.

It is used to avoid time-consuming processing of blurry images, which are unlikely to return a result.

Range	Unit	Type	Default	Mandatory
`0` - `100`	-	Integer	`0` (=Off)	✗

Experimental

This parameter is experimental. It is recommended to start with a sharpness of 50 and gradually increase the value until you reach a threshold that gives satisfying results.
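The gate and the recommended tuning procedure can be sketched as follows. Both functions are hypothetical, not SDK calls; the step size of 10 is an assumption, and the way a frame's sharpness is measured is left to the SDK.

```typescript
// Hypothetical gate (not an SDK function): 0 disables the check, as documented.
function passesMinSharpness(frameSharpness: number, minSharpness: number): boolean {
  if (minSharpness === 0) {
    return true;
  }
  return frameSharpness >= minSharpness;
}

// Candidate thresholds for tuning: start at the recommended 50 and increase step
// by step. The step size of 10 is an assumption; finer steps work just as well.
function sharpnessCandidates(start = 50, step = 10, max = 100): number[] {
  if (step <= 0) {
    throw new RangeError("step must be positive");
  }
  const candidates: number[] = [];
  for (let value = start; value <= max; value += step) {
    candidates.push(value);
  }
  return candidates;
}

console.log(passesMinSharpness(20, 0)); // true
console.log(passesMinSharpness(45, 50)); // false
console.log(sharpnessCandidates().join(",")); // 50,60,70,80,90,100
```

Each candidate would be tried in turn, keeping the highest value that still yields satisfying results on your test material.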

Additional Settings in `GRID` Mode

`charCountX`

Defines the number of symbols in horizontal direction in the grid.
For example, if a code to scan consists of 2 rows with 4 symbols each, this would be set to 4.

Range	Unit	Type	Default	Mandatory
-	-	Integer	`1`	✓

`charCountY`

Defines the number of symbols in vertical direction in the grid.
For example, if a code to scan consists of 2 rows with 4 symbols each, this would be set to 2.

Range	Unit	Type	Default	Mandatory
-	-	Integer	`1`	✓
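Both counts follow directly from a sample of the code written out row by row. The helper `gridCounts` is hypothetical; it reproduces the example from the text (2 rows with 4 symbols each) and rejects samples whose rows differ in length, since such text does not fit a grid.

```typescript
interface GridCounts {
  charCountX: number;
  charCountY: number;
}

// Hypothetical helper (not an SDK function): derives charCountX and charCountY
// from a sample of the code, given as one string per grid row.
function gridCounts(rows: string[]): GridCounts {
  if (rows.length === 0) {
    throw new Error("at least one row is required");
  }
  const width = rows[0].length;
  if (!rows.every((row) => row.length === width)) {
    throw new Error("all rows of a grid must contain the same number of symbols");
  }
  return { charCountX: width, charCountY: rows.length };
}

// The example from the text: 2 rows with 4 symbols each.
console.log(gridCounts(["AB12", "CD34"])); // { charCountX: 4, charCountY: 2 }
```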

`charPaddingXFactor`

Defines the average horizontal distance between two characters, measured as a percentage of the character's width.

Range	Unit	Type	Default	Mandatory
-	`Percent`	Double	`1.0`	✗

`charPaddingYFactor`

Defines the average vertical distance between two characters, measured as a percentage of the character's height.

Range	Unit	Type	Default	Mandatory
-	`Percent`	Double	`1.0`	✗
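The tables give both factors as decimals (default `1.0`), while the descriptions speak of a percentage of the character size. A plausible reading, which is an assumption that this page does not confirm, is that `1.0` corresponds to an edge-to-edge gap as large as one character (100 %). Under that assumption, the factors can be estimated from pixel measurements of a sample image; `paddingFactor` is a hypothetical helper.

```typescript
// Assumption: factor = edge-to-edge gap / character size, so 1.0 means the gap
// equals one character width (X factor) or one character height (Y factor).
function paddingFactor(gapPx: number, charSizePx: number): number {
  if (charSizePx <= 0) {
    throw new RangeError("character size must be positive");
  }
  if (gapPx < 0) {
    throw new RangeError("gap must not be negative");
  }
  return gapPx / charSizePx;
}

// Example measurements: symbols 20 px wide with 10 px between them,
// and 30 px high with 30 px between rows.
const charPaddingXFactor = paddingFactor(10, 20);
const charPaddingYFactor = paddingFactor(30, 30);
console.log(charPaddingXFactor, charPaddingYFactor); // 0.5 1
```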

Setting a Custom Command File

If your use case requires special optimization, Anyline will provide you with a Custom Command File (.ale).

To load the custom command file, please refer to the platform-specific implementations:

iOS
Android

Settings and Custom Command File

Note that the custom command file overrides all settings made in the Anyline OCR config, so you don’t have to set the parameters manually; they are already optimized for your use case.
