General principles of working with ROBIN OCR
Sending a document for recognition takes at least two requests. The first request creates a package and carries the package's only image, or its first image if there are several; it returns the GUID of the package. If the package contains several images, each remaining image is added to the package in its own subsequent request, one by one. The final request starts processing of the package. The GUID of the created package is passed in the second and all subsequent requests.
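The request sequence described above can be sketched as follows. This is only an illustration of the ordering, not the Soika API: the operation names `create_package`, `add_image` and `start_processing`, as well as the `Step` structure, are hypothetical placeholders, and no network call is made.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    """One request in the sequence. All operation names are hypothetical."""
    operation: str               # placeholder name, not a real Soika endpoint
    image: Optional[str] = None  # image sent with this request, if any
    needs_guid: bool = False     # True when the package GUID must be passed


def plan_requests(images: List[str]) -> List[Step]:
    """Return the ordered requests needed to recognize one package."""
    if not images:
        raise ValueError("a package needs at least one image")
    # Request 1 creates the package and carries the first (or only) image.
    steps = [Step("create_package", image=images[0])]
    # Each remaining image is added one by one, using the returned GUID.
    steps += [Step("add_image", image=img, needs_guid=True) for img in images[1:]]
    # The final request starts processing of the package.
    steps.append(Step("start_processing", needs_guid=True))
    return steps


for step in plan_requests(["page1.jpg", "page2.jpg"]):
    print(step.operation, step.image or "")
```

A package with a single image produces exactly two steps, which matches the stated minimum of two requests.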
The format of the result is configured in advance, in the script.
The user receives the result as a collection of JSON objects or XML contexts. Studio actions can work with the obtained results.
The user needs to know the list of package classes before running the action.
Package classes are configured in the system by an engineer; select the class that suits the images being processed. The package class name is the name of the configured project. The package class name must be specified when creating a package (mandatory). The package name must be specified in the request.
When the robot terminates with an error, the error text states the reason for the error.
If the status of the document is not "export", the robot cannot get the result and skips the document. The user must then move the document to the "export" status on the server: validate the file manually and send it for export by making and accepting changes in it.
Statuses:
- import - wait for status change
- recognize - wait for status change
- validation - manually change status in the Soika system
- export - ready for upload
- deleted - the package has been deleted
- inaccessible - the package is unavailable
- quality control - the document was sent with the wrong script; manually change the status in the Soika system
If the timeout expires before the recognized text is received, an empty result is returned and the action does not terminate with an error.
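The statuses above can be summarized as a lookup from status to the next step. This is a minimal sketch: the step labels ("wait", "manual", "ready", "stop") are illustrative names introduced here, and treating "deleted", "inaccessible" and unknown statuses as "stop" is an assumption rather than documented behavior.

```python
# What to do next for each package status listed above.
# The step labels are illustrative only; they are not Soika or studio terms.
NEXT_STEP = {
    "import": "wait",              # wait for status change
    "recognize": "wait",           # wait for status change
    "validation": "manual",        # change the status manually in Soika
    "quality control": "manual",   # change the status manually in Soika
    "export": "ready",             # the result can be fetched
    "deleted": "stop",             # assumption: nothing more to fetch
    "inaccessible": "stop",        # assumption: nothing more to fetch
}


def next_step(status: str) -> str:
    """Map a package status to the next step; unknown statuses stop the flow."""
    return NEXT_STEP.get(status.strip().lower(), "stop")


print(next_step("export"))
print(next_step("validation"))
```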
Soika interface.
Authentication
When connecting for the first time, you must log in through your browser; Chrome is preferred. Login and password: admin, admin.
Access to the REST service functions requires HTTP Basic authentication: the login and password are passed in the HTTP header, with the login in plaintext and the password hashed with MD5.
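A header of this kind could be built as in the sketch below. Two details are assumptions that the text does not specify: that the MD5 value is sent as a lowercase hex digest of the UTF-8 password, and that login and hash are joined as `login:hash` and Base64-encoded in the usual Basic scheme. The function name `basic_auth_header` is hypothetical.

```python
import base64
import hashlib


def basic_auth_header(login: str, password: str) -> dict:
    """Build an Authorization header: plaintext login, MD5-hashed password.

    Assumption: the hash is the lowercase hex digest of the UTF-8 password.
    """
    password_md5 = hashlib.md5(password.encode("utf-8")).hexdigest()
    pair = f"{login}:{password_md5}".encode("utf-8")
    token = base64.b64encode(pair).decode("ascii")
    return {"Authorization": f"Basic {token}"}


headers = basic_auth_header("admin", "admin")
print(headers["Authorization"])
```

Check the exact encoding against the Soika service before relying on it; a mismatch in digest format would show up as a failed login.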
You can save the account in the browser. Each user has access only to certain actions; this is set during registration.
Two tabs open two modules at once. Administrator module: http://localhost/administrator. Validation module: http://localhost/validation.
Authentication when connecting via a browser is mandatory for each user. A user's personal login and password are linked to that user's personal configured scripts.
Document review and validation
In the default view, hold down the Shift key to select an area and make edits to it.
When the edits are applied, the changes are reflected in the system.
When the user saves the changes, the document package is accepted by the user and the status of the document changes from "validation" to "export".
The file was validated manually.
The file has been given the status "Export".
Ways of processing the result
Built-in studio actions for working with JSON files.
An example of a composed chain of actions for processing the result. You can use Queues to parallelize image recognition processes.
The package classes available to the current user.
The default recognition profile is named default. Recognition profile configuration interface.
Example of a source file.
Example of the resulting file.
Make a sequence of actions to save the file to the computer.
The result is a JSON file. There is no need to save the resulting file; it can be processed immediately by studio actions.
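Outside the studio, the same kind of in-memory processing might look like the sketch below. The sample payload and the key names `field` and `value` are hypothetical: the real structure of the result depends on the configured script.

```python
import json

# Hypothetical result; the real structure depends on the configured script.
raw_result = (
    '[{"field": "number", "value": "A-17"},'
    ' {"field": "date", "value": "2021-03-01"}]'
)

# Parse the collection of JSON objects and index the values by field name.
objects = json.loads(raw_result)
values = {obj["field"]: obj["value"] for obj in objects}

print(values["number"])
```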
If the result is received as an XML file, you can use the "Get items by XPath" studio action to get the values needed for further work.
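To show what an XPath lookup over such a result does, here is a small sketch using Python's standard library rather than the studio action. The XML sample, the `field` element and its `name` attribute are hypothetical; the real layout depends on the configured script.

```python
import xml.etree.ElementTree as ET

# Hypothetical result; element and attribute names depend on the script.
raw_result = """
<document>
  <field name="number">A-17</field>
  <field name="date">2021-03-01</field>
</document>
""".strip()

root = ET.fromstring(raw_result)
# ElementTree supports a limited XPath subset, enough for attribute filters.
number = root.find("./field[@name='number']").text
all_values = [node.text for node in root.findall("./field")]

print(number, all_values)
```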
Send files for recognition action
The action sends the file to the external Soika application for recognition.
This is the first action in the Send+Receive bundle. Sending a file for recognition and receiving the recognition result are split into two actions for cases where the system takes a long time to process a large input file.
Settings
Property | Description | Type | Filling example | Mandatory field |
---|---|---|---|---|
Parameters | ||||
URL | Link for authentication on the Soika service. | Robin.String | http://localhost/administrator | Yes |
Login | Login | Robin.String | admin | Yes |
Password | Password | Robin.Password | admin | Yes |
Class | The package class whose script will process the file. | Robin.String | Package class name | Yes |
File | Path to the file from which you want to extract text. Supported file formats: JPEG, PDF, TIFF, BMP, PNG, DOCX, GIF. | Robin.FilePath | C:\Users\Документ\1.jpg | Yes |
Results | ||||
ID | Package identification number | Robin.String | ||
Special conditions of use
- You need to get the authentication details in advance from the Soika service.
- If the server does not respond within 120 seconds, it is considered unavailable.
- The robot will return an error if:
  - a file of an unsupported format is specified in the "File" field;
  - the login or password for the connection is incorrect (code 403);
  - the path is wrong (code 404).
Get recognition result action
Extract text from an image file using a pre-configured recognition profile.
Settings
Property | Description | Type | Filling example | Mandatory field |
---|---|---|---|---|
Parameters | ||||
URL | Link for authentication on the Soika service. | Robin.String | http://localhost/administrator | Yes |
Login | Login | Robin.String | admin | Yes |
Password | Password | Robin.Password | admin | Yes |
ID | Package identification number | Robin.String | 65434 | Yes |
Result type | The format in which the results will be presented. Drop-down list of elements: XML, JSON. Default value: XML. | Robin.String | JSON | No |
Profile | Recognition profile for results. Profiles are created in Soika itself, and the user knows in advance which one to choose. The default value is set by the system when creating a package class and is called default. | Robin.String | default | No |
Results | ||||
Result | A collection of JSON objects or XML contexts containing the recognized data. If document recognition is still in progress, the result is not filled in. | Robin.Collection | ||
Status | The status of document recognition. | Robin.String | ||
Special conditions of use
- Run the "Send for recognition" action before the "Get recognition result" action.
- To get the result, the file must be recognized by the system and moved to the "export" status. If the robot receives any status other than "export" when checking the document status, it returns the received status and an empty recognition result.
- If the server does not respond within 120 seconds, it is considered unavailable.
- The robot will return an error if:
  - the path is wrong (code 404);
  - the login or password for the connection is incorrect (code 403);
  - the selected processing script is not suitable (code 401);
  - an internal server error occurs (code 500);
  - the personal data and the document ID do not belong to the same user.
The robot will NOT return an error if:
- no text is found in the image;
- the text in the image is not recognized;
- the document is in the "quality control" or "validation" status, which must be changed manually: call the REST API to transfer the package to another module, or open the package in validation, correct the errors, and send it for export.
In all these cases, the robot returns an empty recognition result.
If the text language configured in the algorithm is wrong, the result will probably be a non-empty string made of characters matched from that language's alphabet.
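The distinction between error cases and empty-result cases can be condensed into one function. This is a sketch of the rules listed above, not the action's implementation: the name `interpret` and its arguments are hypothetical, and the ownership mismatch case is left out because the text gives no code for it.

```python
# Response codes that the list above treats as robot errors.
ERROR_REASONS = {
    401: "the selected processing script is not suitable",
    403: "incorrect login or password",
    404: "wrong path",
    500: "internal server error",
}


def interpret(code: int, status: str, result: list) -> list:
    """Raise on a listed error code; otherwise return the result or an empty list."""
    if code in ERROR_REASONS:
        raise RuntimeError(f"{code}: {ERROR_REASONS[code]}")
    # Any status other than "export" yields an empty result, not an error.
    return result if status == "export" else []


print(interpret(200, "validation", [{"text": "ignored"}]))
print(interpret(200, "export", [{"text": "kept"}]))
```

Callers therefore need to check the returned status as well as the result, since an empty result alone does not say whether recognition found no text or the package is still waiting for validation.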
Recognize files action
Extract text from a file.
Settings
Property | Description | Type | Filling example | Mandatory field |
---|---|---|---|---|
Parameters | ||||
URL | Link for authentication on the Soika service. | Robin.String | http://localhost/administrator | Yes |
Login | Login | Robin.String | admin | Yes |
Password | Password | Robin.Password | admin | Yes |
Class | The package class whose script will process the file. | Robin.String | Package class name | Yes |
File | Path to the file from which you want to extract text. Supported file formats: JPEG, PDF, TIFF, BMP, PNG, DOCX, GIF. | Robin.FilePath | C:\Users\Документ\1.jpg | Yes |
Result type | The format in which the results will be presented. Drop-down list of elements: XML, JSON. Default value: XML. | Robin.String | JSON | No |
Profile | Recognition profile for results. Profiles are created in Soika itself, and the user knows in advance which one to choose. The default value is set by the system when creating a package class and is called default. | Robin.String | default | No |
Timeout | Time in milliseconds during which the action will run. | Robin.Numeric | 1000000 | No |
Results | ||||
Result | A collection of JSON objects or XML contexts containing the recognized data. If document recognition is still in progress, the result is not filled in. | Robin.Collection | ||
Status | The status of document recognition. | Robin.String | ||
Special conditions of use
- The robot will return an error if:
  - the path is wrong (code 404);
  - the login or password for the connection is incorrect (code 403);
  - the selected processing script is not suitable (code 401);
  - an internal server error occurs (code 500);
  - the personal data and the document ID do not belong to the same user.
- If no timeout is specified, a single status request is made. If a timeout is specified and no result has been received when it expires, the last known status and an empty result are returned.
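The timeout rule can be illustrated with a small polling loop. This is a sketch under stated assumptions, not the action's actual code: the `poll` callable stands in for one status request, and the polling interval `interval_s` is a hypothetical parameter that the action does not expose.

```python
import time
from typing import Callable, List, Optional, Tuple

# One status request returns (status, result); the result stays empty
# until the package reaches the "export" status.
Poll = Callable[[], Tuple[str, List[dict]]]


def recognize(poll: Poll, timeout_ms: Optional[int] = None,
              interval_s: float = 1.0) -> Tuple[str, List[dict]]:
    """Poll until "export" or until the timeout expires.

    `poll` and `interval_s` are hypothetical stand-ins for illustration.
    """
    status, result = poll()
    if timeout_ms is None:
        # No timeout: exactly one status request is made.
        return status, (result if status == "export" else [])
    deadline = time.monotonic() + timeout_ms / 1000
    while status != "export" and time.monotonic() < deadline:
        time.sleep(interval_s)
        status, result = poll()
    # On expiry, return the last known status with an empty result.
    return status, (result if status == "export" else [])
```

Calling `recognize(poll)` without a timeout performs one request and returns whatever status the package has at that moment, so a timeout is usually needed for anything but very small files.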
Examples of use
Send + receive
Task
Recognize the text on the document and retain the ability to perform any other actions while the input document is being processed.
Solution
Use the actions "Send for recognition", "Get the result of the recognition".
Implementation
- Place the "Send for recognition" and "Get the result of the recognition" actions on the workspace in sequence.
- Fill in the action parameters with the correct data:
  - "Send for recognition" parameters;
  - "Get the result of the recognition" parameters.
- Launch the robot using the "Start" button in the top panel.
Result
The robot returns the processed files. The result is a collection of JSON objects or XML contexts, and the status is "export".
Recognize
Task
Recognize text in a document.
Solution
Use the action "Recognize".
Implementation
- Place the "Recognize" action on the workspace.
- Fill in the action parameters with the correct data.
- Launch the robot using the "Start" button in the top panel.
Result
The robot returns the processed files. The result is a collection of JSON objects or XML contexts, and the status is "export".