Extract data from a document Version 3 (Python)
Action group: Text recognition
Description
The action extracts data from documents (passport pages 2-3, passport page 5, SNILS personal insurance account number) and returns a dictionary containing the extracted document data, together with an image showing the blocks from which the data was extracted. The action uses Dbrain services for data extraction, so it requires a vendor API key.
Action icon
Parameters and their settings
Property | Description | Type | Filling example | Mandatory field |
---|---|---|---|---|
Parameters | | | | |
File path | The path to the file to extract the data from. Supported formats: jpg, jpeg, bmp, png. | Robin.FilePath | | Yes |
API key | A unique identifier for accessing the service. | Robin.String | | Yes |
Cloud server | If the value is «true», the action sends the request to the Dbrain cloud server. If «false», the action sends the request to the local Dbrain server. | Robin.Boolean | true | No |
Document type | The type of document to extract data from. | Robin.String | | Yes |
Folder path | The path to the folder where a copy of the source file will be saved, with the blocks into which the recognition action divides it superimposed. To save the copy, the «File name» field must also be filled in. | Robin.FolderPath | C:\doc\img | No |
File name | The name of the copy of the source file with superimposed blocks (without extension). The image is created with the *.png extension. To save the copy, the «Folder path» field must also be filled in. | Robin.String | | No |
Time out | The maximum time, in milliseconds, allowed for extracting data from the document. The default value is 120000 ms. | Robin.Numeric | | No |
Overwrite | If «true» and a file with the same name and extension already exists in the specified folder, the new file with blocks overwrites it. If «false», the file is not overwritten and the action returns an error. | Robin.Boolean | true | No |
Results | | | | |
Extracted text | The data of each field in the document extracted from the original image. | Robin.Dictionary | | |
Image with blocks | The path to the image file with superimposed blocks. | Robin.FilePath | | |
Recognition confidence | The recognition confidence of each field in the image, in the range from 0 to 1. The key is the field name; the value is the recognition confidence for that field. | Robin.Dictionary | | |
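The two result dictionaries share field names as keys, so they can be checked together after the action runs. The following is a minimal sketch, assuming both results are available as ordinary Python dictionaries; the function name `low_confidence_fields`, the 0.9 threshold, and the sample values are hypothetical and chosen only for illustration.

```python
def low_confidence_fields(extracted, confidence, threshold=0.9):
    """Return {field: (value, score)} for fields scored below the threshold.

    `extracted` stands in for the "Extracted text" result and `confidence`
    for the "Recognition confidence" result (both assumed to be plain dicts).
    """
    return {
        field: (extracted.get(field, ""), score)
        for field, score in confidence.items()
        if score < threshold
    }


# Illustrative sample data, not real action output.
extracted = {"Фамилия": "Иванов", "Имя": "Иван", "Дата рождения": "01.01.1990"}
confidence = {"Фамилия": 1.0, "Имя": 0.97, "Дата рождения": 0.62}

flagged = low_confidence_fields(extracted, confidence)
print(flagged)  # {'Дата рождения': ('01.01.1990', 0.62)}
```

Fields returned by such a check could be routed to manual review instead of being used directly.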
Special conditions of use
Connecting to Dbrain: https://doc.dbrain.io/podklyuchenie/podklyuchenie-k-oblaku
The neural network recognizes only Russian documents.
For each document type, a set of fields that the robot searches for is defined. If the image does not contain a field the robot is looking for, that field's value is returned empty. Keys are returned in Russian.
Passport keys (pp. 2-3):
- IssuedBy
- IssuedDate
- IssuedCode
- Signature
- LName
- FName
- MName
- Sex
- Photo
- BirthDate
- BirthPlace
- MRZ
- Number
Registration keys (p. 5):
- Полный адрес
- Дата регистрации
- Регион
- Район
- Пункт
- Р-н
- Улица
- Дом
- Строение
- Квартира
- Подразделение
- Код подразделения
Keys for SNILS:
- Number
- LName
- FName
- MName
- BirthDate
- BirthPlace
- Sex
- RegDate
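Because a field that is not found in the image comes back with an empty value, it can be useful to list which expected fields are empty before using the result. The following is a minimal sketch, assuming the extracted data is a plain Python dictionary keyed by the SNILS field names listed above (the actual keys may be returned in Russian, in which case the key list would need to be adjusted). The names `SNILS_KEYS` and `missing_fields` and the sample values are hypothetical.

```python
# Assumption: key names as listed in this section for SNILS.
SNILS_KEYS = [
    "Number", "LName", "FName", "MName",
    "BirthDate", "BirthPlace", "Sex", "RegDate",
]


def missing_fields(extracted, expected_keys):
    """Return the expected keys whose value is absent, None, or blank."""
    missing = []
    for key in expected_keys:
        value = extracted.get(key)
        if value is None or not str(value).strip():
            missing.append(key)
    return missing


# Illustrative sample data, not real action output.
sample = {
    "Number": "123-456-789 00",
    "LName": "Иванов",
    "FName": "Иван",
    "MName": "",
    "BirthDate": "01.01.1990",
    "BirthPlace": "  ",
    "Sex": "М",
}

print(missing_fields(sample, SNILS_KEYS))  # ['MName', 'BirthPlace', 'RegDate']
```

In this sketch a key that is absent from the dictionary is treated the same way as an empty value.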
The robot will return an error if:
- The "Folder path" field is filled in and the "File name" field is not.
- The "File name" field is filled in and the "Folder path" field is not.
- The file specified in the "File path" field has an unsupported format.
- A file with the specified name already exists at the specified path and the "Overwrite" field is set to false.
- An invalid API key is specified.
- The timeout expired and no result was received.
- The "Cloud server" option is disabled and no local server is deployed by the user on the host.
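Several of the error conditions above depend only on the input parameters, so they can be checked before the action is run. The following is a minimal sketch of such a pre-check, not the action's own validation logic; the function `validate_parameters`, its argument names, and the error messages are hypothetical. It does not cover the API key, timeout, or server availability, which can only be verified by calling the service.

```python
from pathlib import Path

# Input formats listed in the "File path" parameter description.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".bmp", ".png"}


def validate_parameters(file_path, folder_path=None, file_name=None, overwrite=True):
    """Return a list of problems found in the parameters (empty if none)."""
    errors = []

    # The input file must have one of the supported extensions.
    if Path(file_path).suffix.lower() not in SUPPORTED_EXTENSIONS:
        errors.append("unsupported input file format")

    # "Folder path" and "File name" must be filled in together.
    if bool(folder_path) != bool(file_name):
        errors.append('"Folder path" and "File name" must be filled in together')
    elif folder_path and file_name:
        # The copy with blocks is saved as <File name>.png in the folder.
        target = Path(folder_path) / f"{file_name}.png"
        if target.exists() and not overwrite:
            errors.append(f"{target} already exists and overwrite is disabled")

    return errors


print(validate_parameters("scan.pdf"))
print(validate_parameters("scan.jpg", folder_path="out"))
```

Returning a list rather than raising on the first problem lets a caller report all parameter issues at once.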
Example of use
Task
Recognize fields with document data from a file.
A document to recognize:
Solution
Use the "Extract data from a document" action.
Implementation
- Move the "Extract data from a document" action to the workspace.
- Set the parameters of the "Extract data from a document" action.
- Click on the "Start" button in the top panel.
Result
The software robot completed successfully. The data fields have been recognized.
Field | Recognition accuracy |
---|---|
Номер СНИЛС | 1 |
Фамилия | 1 |
Имя | 1 |
Отчество | 1 |
Дата рождения | 1 |
Место рождения | 1 |
Пол | 1 |
Дата регистрации | 1 |
Image with superimposed blocks: