Get text from PDF Version 11 (Python)
Action group: Text recognition
Description
This action is designed to recognize text from the specified page of a PDF document and save the recognized text to a variable.
Parameters
Input parameters:
PDF file path - Path to the PDF file to be recognized
Language of the text - Languages that the recognizer expects in the text
Page number - Number of the page from which the text will be read
Output parameters:
Result - Variable into which the recognized text will be saved
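Together, the Page number and Result parameters define a simple contract: read one page and return its text. The minimal Python sketch below models that contract. It includes the behavior stated for the Result setting, where a missing page yields a blank value. The function name `select_page_text` is hypothetical. The sketch assumes the page texts have already been recognized and that page numbers start at 1, as in the example where page 2 is requested. It is an illustration, not the action's own implementation.

```python
from typing import Sequence


def select_page_text(page_texts: Sequence[str], page_number: int) -> str:
    """Return the recognized text of one page, or "" if the page does not exist.

    Assumption (hypothetical helper): page_number is 1-based and page_texts
    holds the already recognized text of every page, in order.
    """
    # Valid page numbers run from 1 to len(page_texts); this check also stops
    # negative numbers from silently indexing from the end of the list.
    if 1 <= page_number <= len(page_texts):
        return page_texts[page_number - 1]
    # Mirrors the documented behavior: a missing page yields a blank value.
    return ""


pages = ["Title page", "Text of the second page", "Appendix"]
print(select_page_text(pages, 2))        # Text of the second page
print(repr(select_page_text(pages, 5)))  # ''
```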
Settings
Property | Description | Type | Filling example | Mandatory field |
---|---|---|---|---|
Parameters | ||||
PDF file path | The path to the PDF file for recognition | Robin.FilePath | | Yes |
Language of the text | Expected languages of the text in the PDF file, selected from a drop-down list. Default value: Russian | Robin.Collection | | No |
Page number | The number of the page from which the text will be read | Robin.Numeric | | No |
Additional language | An additional language required for document recognition, selected from a drop-down list. The default value is No. If the same option is selected in both the "Language of the text" and "Additional language" parameters, no error occurs and the duplicate is counted as one language | Robin.Collection | | No |
Trained model | Tesseract trained model file in .traineddata format. Lets you load your own model trained on the required fonts. If this parameter is set, it takes priority over the "Language of the text" and "Additional language" parameters | | | |
Results | ||||
Result | Text received from the specified page of the PDF file. If the document does not contain the specified page, an empty value is stored | Robin.Collection | | |
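A small sketch can illustrate how the language settings and the trained model interact. It converts them into options for the Tesseract engine. The helper `resolve_ocr_options` and its name-to-code table are hypothetical. The table lists only a few Tesseract codes, such as `rus` and `eng`, and does not reproduce the action's real drop-down contents. The sketch relies on two Tesseract conventions. First, several languages are joined with `+` for the `-l` option. Second, a custom model is referred to by its file name without the `.traineddata` extension, and its folder is passed with `--tessdata-dir`. This page does not document how the action itself passes these values to Tesseract.

```python
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical mapping from drop-down names to Tesseract language codes.
# Only a few entries are shown for illustration.
TESSERACT_CODES = {
    "Russian": "rus",
    "English": "eng",
    "German": "deu",
    "French": "fra",
}


def resolve_ocr_options(language: str = "Russian",
                        additional_language: str = "No",
                        trained_model: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Return (lang, tessdata_dir) for a Tesseract call.

    lang is the value for Tesseract's -l option; tessdata_dir is the folder
    to pass with --tessdata-dir, or None to use Tesseract's default folder.
    """
    if trained_model:
        # A custom model takes priority over both language parameters.
        model = Path(trained_model)
        # Tesseract names a model by its file name without ".traineddata".
        return model.stem, str(model.parent)
    codes = [TESSERACT_CODES[language]]
    if additional_language != "No":
        code = TESSERACT_CODES[additional_language]
        # The same language selected twice is counted as one language.
        if code not in codes:
            codes.append(code)
    # Tesseract combines several languages with "+", e.g. "rus+eng".
    return "+".join(codes), None


print(resolve_ocr_options("Russian", "English"))  # ('rus+eng', None)
```

With the pytesseract package, for example, the pair could be passed to `pytesseract.image_to_string` as `lang=lang` and, when a folder is returned, as `config=f"--tessdata-dir {tessdata_dir}"`.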
Special conditions of use
None.
Example of use
Task
There is a PDF document, and the text must be extracted from page 2 of the document.
Solution
Use the "Get text from PDF" action.
Implementation
- Move the "Get text from PDF" action to the workspace.
- Set the parameters of the "Get text from PDF" action.
- Click the "Start" button on the top panel.
Result
The robot program completed successfully. The text from page 2 of the document was retrieved.
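For comparison, the task from the example (text from page 2) can be approximated outside the robot with two command-line tools. Poppler's `pdftoppm` renders the page to an image, and the `tesseract` CLI recognizes the text in it. The sketch only builds the command lines. The helper name `build_page_ocr_commands`, the 300 dpi resolution, the `page` image name, and the default `rus` language are illustrative assumptions. The action is not necessarily implemented this way.

```python
from typing import List


def build_page_ocr_commands(pdf_path: str, page_number: int, lang: str = "rus",
                            image_base: str = "page") -> List[List[str]]:
    """Build the command lines that render one PDF page and recognize its text.

    Hypothetical helper: the commands are only built here, not executed.
    """
    page = str(page_number)
    render = [
        "pdftoppm",
        "-f", page, "-l", page,  # first and last page: render only this page
        "-r", "300",             # 300 dpi, a common resolution for OCR
        "-png", "-singlefile",   # one PNG named <image_base>.png, no page suffix
        pdf_path, image_base,
    ]
    ocr = [
        "tesseract", image_base + ".png",
        "stdout",                # print the recognized text instead of writing a file
        "-l", lang,
    ]
    return [render, ocr]


# The task from the example: text from page 2 of a PDF document.
for cmd in build_page_ocr_commands("document.pdf", 2):
    print(" ".join(cmd))
```

Each command could then be executed with `subprocess.run(cmd, check=True, capture_output=True, text=True)`. The recognized text from the second command is available in the result's `.stdout`. Both tools must be installed and on the PATH.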