Read text Version 10 (Python)
Action group: Text recognition
Description
The action recognizes text in an image and returns it as the result.
Parameters
Input parameters
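- Image - Path to the image file. Supported image formats: jpeg, jpg, bmp, png, tif, tiff.
- Expected languages of text in the image - The language (or combination of languages) the text in the image is expected to be in. Available options: Russian, English, Russian and English, Spanish, Portuguese.
- Content format - Expected text format. Available formats: Line, Block, Page.
- Options - Additional configuration options for text recognition (see "Special conditions of use").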
Output parameters
Result - The text (string) recognized in the image.
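The Options field takes Tesseract settings, so the action most likely uses Tesseract internally, although this page does not say how the language choice is passed on. A minimal sketch, assuming a Tesseract backend: each language option maps to a Tesseract language code (`rus`, `eng`, `spa`, `por`), and several languages are combined with `+`. The mapping table and the `tesseract_lang` helper are hypothetical.

```python
# Hypothetical mapping from the language options offered by the action to
# Tesseract language codes. Tesseract combines several languages with "+".
LANGUAGE_CODES = {
    "Russian language": "rus",
    "English language": "eng",
    "Russian and English language": "rus+eng",
    "Spanish language": "spa",
    "Portuguese language": "por",
}


def tesseract_lang(label: str) -> str:
    """Return the Tesseract language string for an option label (hypothetical helper)."""
    try:
        return LANGUAGE_CODES[label]
    except KeyError:
        raise ValueError(f"unsupported language: {label!r}") from None
```

For example, `tesseract_lang("Russian and English language")` returns `"rus+eng"`, a value that can be passed as the `lang` argument of `pytesseract.image_to_string`.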
Settings
Property | Description | Type | Filling example | Mandatory field |
---|---|---|---|---|
Parameters | ||||
Image | Path to the image file. Supported image formats: jpeg, jpg, bmp, png, tif, tiff | Robin.Image | C:\doc\img.png | Yes |
Expected languages of text in the image | Language(s) the text in the image is expected to be in | Robin.String | Russian language | Yes |
Content format | Expected text format: Line, Block, or Page | Robin.String | Line | Yes |
Options | Configuration options for text recognition (OCR) | Robin.String | --psm 3 | No |
Results | ||||
Result | Text (string) recognized in the image | Robin.String | | |
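Before passing a path to the "Image" parameter, a robot script may want to reject unsupported files early. The sketch below checks only the file extension against the supported formats listed in the table (case-insensitively); it does not verify that the file is a valid image. The helper name `is_supported_image` is hypothetical.

```python
from pathlib import Path

# Extensions listed as supported for the "Image" parameter.
SUPPORTED_EXTENSIONS = {".jpeg", ".jpg", ".bmp", ".png", ".tif", ".tiff"}


def is_supported_image(path: str) -> bool:
    """Return True if the file extension is a supported image format (hypothetical helper)."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```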
Special conditions of use
The default value of the "Options" field is --psm 3.
Parameters are separated by spaces, each written in the format --parameter value.
For the full list of parameters, see https://muthu.co/all-tesseract-ocr-options/.
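When calling Tesseract directly (for example through `pytesseract`'s `config` argument), only `--oem` and `--psm` are written as `--name value`; other Tesseract variables are passed as `-c name=value`. This page does not state whether the action performs such a translation internally. The following sketch, under that assumption, parses an Options string in the "--parameter value" format and converts it into a Tesseract config string. Both helper names are hypothetical.

```python
import shlex

# Tesseract accepts these two as dedicated command-line options;
# every other setting is a config variable passed as "-c name=value".
MAIN_OPTIONS = {"oem", "psm"}


def parse_options(options: str) -> dict[str, str]:
    """Parse space-separated "--parameter value" pairs into a dict (hypothetical helper)."""
    tokens = shlex.split(options)
    if len(tokens) % 2:
        raise ValueError("every --parameter needs a value")
    parsed = {}
    for name, value in zip(tokens[::2], tokens[1::2]):
        if not name.startswith("--"):
            raise ValueError(f"expected --parameter, got {name!r}")
        parsed[name[2:]] = value
    return parsed


def to_tesseract_config(options: str) -> str:
    """Convert an Options string into a Tesseract/pytesseract config string."""
    parts = []
    for name, value in parse_options(options).items():
        if name in MAIN_OPTIONS:
            parts.append(f"--{name} {value}")
        else:
            parts.append(f"-c {name}={value}")
    return " ".join(parts)
```

For example, `"--oem 3 --psm 6 --textord_tabfind_find_tables 0"` becomes `"--oem 3 --psm 6 -c textord_tabfind_find_tables=0"`.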
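Parameter | Default value | Description |
---|---|---|
Main parameters | ||
oem | 3 | OCR engine mode: 0 = legacy engine only; 1 = LSTM neural network engine only; 2 = legacy + LSTM engines; 3 = default, based on what is available |
psm | 3 | Page segmentation mode: 0 = orientation and script detection (OSD) only; 1 = automatic page segmentation with OSD; 2 = automatic page segmentation without OSD or OCR (not implemented); 3 = fully automatic page segmentation without OSD; 4 = single column of text of variable sizes; 5 = single uniform block of vertically aligned text; 6 = single uniform block of text; 7 = single text line; 8 = single word; 9 = single word in a circle; 10 = single character; 11 = sparse text; 12 = sparse text with OSD; 13 = raw line |
Additional parameters | ||
edges_min_nonhole | 14 | Minimum number of pixels for a potential character in a box |
textord_space_size_is_variable | 0 | If set to 1, word-separating spaces are assumed to have variable width, even if the characters are fixed-pitch |
textord_tabfind_find_tables | 1 | Run table detection |
textord_force_make_prop_words | 0 | Force proportional word segmentation on all rows |
textord_width_limit | 8 | Maximum width of blobs used to form rows |
tessedit_pageseg_mode | 6 | Page segmentation mode (internal variable; takes the same values as psm) |
textord_max_noise_size | 7 | Maximum noise size in pixels |
tessedit_dont_blkrej_good_wds | 0 | If set to 1, the word segmentation quality metric is used |
tessedit_char_blacklist | | Characters that must not be recognized |
tessedit_char_whitelist | | Characters to recognize; all others are ignored |
tessedit_char_unblacklist | | Characters that override tessedit_char_blacklist |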
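For example, to read a single line that contains only digits, one could combine psm 7 (single text line) with tessedit_char_whitelist. The hypothetical helper below composes an Options string in the documented "--parameter value" format and range-checks the main parameters (oem 0 to 3, psm 0 to 13); other parameters are passed through unchanged, so values must not contain spaces.

```python
# Valid ranges of Tesseract's main parameters:
# oem 0-3 (engine mode), psm 0-13 (page segmentation mode).
VALID_RANGES = {"oem": range(0, 4), "psm": range(0, 14)}


def build_options(**params: object) -> str:
    """Build an Options string in the "--parameter value" format (hypothetical helper).

    oem and psm are checked against their valid ranges; other parameters,
    such as tessedit_char_whitelist, are emitted as given.
    """
    parts = []
    for name, value in params.items():
        if name in VALID_RANGES and int(value) not in VALID_RANGES[name]:
            raise ValueError(f"{name}={value} is out of range")
        parts.append(f"--{name} {value}")
    return " ".join(parts)
```

`build_options(psm=7, tessedit_char_whitelist="0123456789")` returns `"--psm 7 --tessedit_char_whitelist 0123456789"`.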
Example of use
Task
Read the text in the image
Solution
Use the "Read text" action
Implementation
1. Drag the "Read text" action to the workspace.
2. Set the parameters of the "Read text" action:
   - "Image" parameter. Specify the path to the image file whose text will be recognized.
   - "Expected languages of text in the image" parameter. Select the expected language of the text in the image. Available options: "Russian language", "English language", "Russian and English language", "Spanish language", "Portuguese language". In this case, "Russian language".
   - "Content format" parameter. Select the expected text format. Available options: "Line", "Block", "Page". In this case, "Line".
   - "Result". The text recognized in the image. Write the result to the "Text" variable.
3. Click the "Start" button in the top panel.
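Outside the robot, roughly the same recognition can be reproduced with `pytesseract`, assuming the Tesseract binary with Russian language data is installed. This is a sketch, not the action's implementation: the code `rus` stands for "Russian language", and mapping the "Line" content format to `--psm 7` (treat the image as a single text line) is an assumption. The function names are hypothetical.

```python
import sys


def example_kwargs() -> dict[str, str]:
    """Arguments matching the example's settings (assumed mapping).

    "Russian language" -> Tesseract code "rus"; "Line" -> --psm 7.
    """
    return {"lang": "rus", "config": "--psm 7"}


def main(image_path: str) -> str:
    # Imported here so the settings above can be inspected without the
    # Tesseract binary, pytesseract, or Pillow being installed.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        text = pytesseract.image_to_string(img, **example_kwargs())
    return text  # corresponds to the "Text" variable in the example


if __name__ == "__main__" and len(sys.argv) > 1:
    print(main(sys.argv[1]))
```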
Result
The robot completed successfully, and the text was read from the image.