Extract data from a document

Action group: Text recognition


The action extracts data from documents (passport pages 2-3, passport page 5 with the registration record, and the SNILS personal insurance account number) and returns a dictionary of the extracted document data, along with an image showing the blocks from which the data was extracted. The action uses Dbrain services for data extraction, so it requires a vendor API key.

Settings

Parameters

Property | Description | Type | Example value | Required
File path | The path to the file to extract data from. Supported formats: jpg, jpeg, bmp, png. | Robin.FilePath | | Yes
API key | A unique key for accessing the Dbrain service. | Robin.String | | Yes
Cloud server | If «true», the action sends the request to the Dbrain cloud server. If «false», it sends the request to a local Dbrain server. | Robin.Boolean | true | No
Document type | The type of document to extract data from. | Robin.String | | Yes
Folder path | The folder where a copy of the source image is saved with the recognition blocks (the regions into which the action divides the image) overlaid on it. The copy is saved only if «File name» is also filled in. | Robin.FolderPath | C:\doc\img | No
File name | The name of the copy with overlaid blocks, without extension. The image is saved with the .png extension. The copy is saved only if «Folder path» is also filled in. | Robin.String | | No
Timeout | The maximum time, in milliseconds, allowed for extracting data from the document. Default: 120000 ms (2 minutes). | Robin.Numeric | | No
Overwrite | If «true» and a file with the same name and extension already exists in the specified folder, the new image with blocks overwrites it. If «false», the existing file is not overwritten and the action returns an error. | Robin.Boolean | true | No

Results

Property | Description | Type
Extracted text | The value of each document field extracted from the source image. | Robin.Dictionary
Image with blocks | The path to the image file with overlaid blocks. | Robin.FilePath
Recognition confidence | The recognition confidence for each field in the image, in the range 0 to 1. Each key is a field name; each value is the confidence for that field. | Robin.Dictionary
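
The interaction between «Folder path», «File name» and «Overwrite» can be modelled as a small decision procedure. The sketch below is a hypothetical Python helper (resolve_blocks_image_path is not part of Robin) that only mirrors the rules in the table: the copy is written only when both fields are filled in, it always gets the .png extension, and an existing file causes an error unless Overwrite is «true». It is an illustration of the documented behaviour, not Robin's implementation.

```python
from pathlib import Path
from typing import Optional


def resolve_blocks_image_path(folder_path: Optional[str],
                              file_name: Optional[str],
                              overwrite: bool) -> Optional[Path]:
    """Hypothetical model of the Folder path / File name / Overwrite rules.

    Returns the target path of the image with blocks, or None when no copy
    is saved. Raises FileExistsError when the file exists and overwrite is off.
    """
    # The copy is saved only when BOTH Folder path and File name are filled in.
    if not folder_path or not file_name:
        return None
    # File name is given without extension; the image is always a .png file.
    target = Path(folder_path) / f"{file_name}.png"
    # Overwrite = false and an existing file: the action returns an error.
    if target.exists() and not overwrite:
        raise FileExistsError(f"{target} already exists and Overwrite is false")
    return target
```

In this model, leaving either field empty simply skips saving the copy, which matches the note in both field descriptions that the other field must also be filled in.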

Special conditions of use

Connecting to Dbrain: https://doc.dbrain.io/podklyuchenie/podklyuchenie-k-oblaku

The neural network recognizes Russian documents only.

Each document type has a predefined set of fields that the robot searches for. If a field is not found in the image, it is still returned, but with an empty value. Keys are returned in Russian.
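
Because a field that is not found is returned with an empty value rather than omitted, a downstream step may need to detect such fields explicitly. The sketch below assumes the «Extracted text» dictionary has been exported to Python as a dict of strings; that export step, the helper name find_missing_fields, and the sample values are all hypothetical.

```python
from typing import Dict, List


def find_missing_fields(extracted: Dict[str, str]) -> List[str]:
    """Hypothetical helper: keys whose values came back empty (field not found).

    Assumes the Robin.Dictionary result was exported to a Python dict.
    """
    # Treat None, "" and whitespace-only strings as "not found".
    return [key for key, value in extracted.items() if not (value or "").strip()]


# Hypothetical sample: keys are in Russian, as the action returns them.
sample = {"Фамилия": "Иванов", "Имя": "", "Отчество": "  "}
print(find_missing_fields(sample))  # → ['Имя', 'Отчество']
```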

Passport keys (pages 2-3):

Registration keys (page 5):

SNILS keys:

The robot will return an error if:

Example of use

Task: recognize fields with document data from a file.

Solution: use the "Extract data from a document" action. 

Implementation:

A document to recognize:

  1. Drag the "Extract data from a document" action onto the workspace.
  2. Set the parameters of the "Extract data from a document" action.
  3. Click the "Start" button on the top panel.

Result

The robot completed successfully. The document fields were recognized.


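Recognition confidence:

Field (key) | Confidence
Номер СНИЛС (SNILS number) | 1
Фамилия (surname) | 1
Имя (first name) | 1
Отчество (patronymic) | 1
Дата рождения (date of birth) | 1
Место рождения (place of birth) | 1
Пол (sex) | 1
Дата регистрации (registration date) | 1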
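
A common follow-up is to route fields with low recognition confidence to manual review. The sketch below assumes the «Recognition confidence» dictionary has been exported to Python; the helper name fields_to_review and the 0.9 threshold are hypothetical choices, not values prescribed by Robin or Dbrain. With the example result, where every field has confidence 1, nothing needs review.

```python
from typing import Dict, List


def fields_to_review(confidence: Dict[str, float], threshold: float = 0.9) -> List[str]:
    """Hypothetical helper: field names with confidence below `threshold`.

    The 0.9 default is an arbitrary illustrative cut-off.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in the range 0 to 1")
    return sorted(name for name, score in confidence.items() if score < threshold)


# Confidence values from the example result: every field was recognized with 1.
example = {
    "Номер СНИЛС": 1, "Фамилия": 1, "Имя": 1, "Отчество": 1,
    "Дата рождения": 1, "Место рождения": 1, "Пол": 1, "Дата регистрации": 1,
}
print(fields_to_review(example))  # → []
```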


Image with superimposed blocks: