Get text from PDF Version 11 (Python)

Action group: Text recognition


Description

This action recognizes the text on the specified page of a PDF document and saves the recognized text to a variable.


Parameters

Input parameters:  

PDF file path - Path to PDF file to be recognized

Language of the text - Languages that the recognizer expects in the text

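Page number - The number of the page of the file from which the text will be read

Additional language - An additional language required for document recognition

Trained model - Tesseract trained model file in .traineddata format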

Output parameters:

Result - Variable into which the recognized text will be saved

Settings

Property | Description | Type | Filling example | Mandatory field
Parameters
PDF file path

The path to the PDF file for recognition

Robin.FilePath
Yes
Language of the text

Expected languages of the text in the PDF file

A dropdown list of items:

  • Russian
  • English
  • Vietnamese
  • Arabic
  • Spanish
  • Portuguese
  • Indonesian
  • Persian
  • Turkish
  • Kazakh
  • Belarusian

Default value - Russian

Robin.Collection
No
Page number

The page number of the file from which the text will be read

Robin.Numeric
No
Additional language

An additional language required for document recognition

A dropdown list of items:

  • No
  • Russian
  • English
  • Vietnamese
  • Arabic
  • Spanish
  • Portuguese
  • Indonesian
  • Persian
  • Turkish
  • Kazakh
  • Belarusian

Default value - No

If the same option is selected in the "Language of the text" and "Additional language" parameters, no error occurs. The duplicate is counted as one language.

Robin.Collection
No
Trained model

Tesseract trained model file in .traineddata format.

Allows you to load your own model trained on the required fonts.

If this parameter is set, it takes priority over the "Language of the text" and "Additional language" parameters.




Results
Result

The text received from the specified page of the PDF file. If the document does not contain the specified page, an empty value is stored.

Robin.Collection
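The language rules in the settings above (a duplicate counts as one language, and a trained model overrides both language parameters) can be illustrated with a minimal Python sketch. This is not the action's actual implementation. The function name resolve_ocr_settings and the returned dictionary layout are hypothetical, and the mapping assumes the standard Tesseract language codes for the languages in the dropdown lists.

```python
from pathlib import Path
from typing import Optional

# Standard Tesseract codes for the languages offered in the dropdown lists.
TESSERACT_CODES = {
    "Russian": "rus",
    "English": "eng",
    "Vietnamese": "vie",
    "Arabic": "ara",
    "Spanish": "spa",
    "Portuguese": "por",
    "Indonesian": "ind",
    "Persian": "fas",
    "Turkish": "tur",
    "Kazakh": "kaz",
    "Belarusian": "bel",
}


def resolve_ocr_settings(
    language: str = "Russian",
    additional_language: str = "No",
    trained_model: Optional[str] = None,
) -> dict:
    """Hypothetical helper: pick the Tesseract language string for one run."""
    if trained_model:
        # A custom model takes priority over both language parameters.
        model = Path(trained_model)
        if model.suffix != ".traineddata":
            raise ValueError(f"Expected a .traineddata file, got: {model.name}")
        # Tesseract refers to a model by its file name without the extension.
        return {"lang": model.stem, "tessdata_dir": str(model.parent)}

    codes = [TESSERACT_CODES[language]]
    if additional_language != "No":
        extra = TESSERACT_CODES[additional_language]
        if extra not in codes:  # a duplicate is counted as one language
            codes.append(extra)
    # Tesseract accepts several languages joined with "+", for example "rus+eng".
    return {"lang": "+".join(codes), "tessdata_dir": None}
```

With the defaults the sketch yields the language string "rus". Selecting Russian plus English yields "rus+eng", and selecting English in both parameters yields just "eng".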

Special conditions of use

None.

Example of use

Task

There is a document in PDF format. The text from page 2 of the document needs to be retrieved.

Solution

Use the "Get text from PDF" action. 

Implementation

  1. Move the "Get text from PDF" action to the workspace. 

  2. Set the "Get text from PDF" action parameters: specify the PDF file path and set "Page number" to 2.

  3. Click on the "Start" button in the top panel. 

Result

The software robot completed successfully. The text from page 2 of the document has been retrieved.
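For readers who want to see roughly what such a step does outside the studio, the following is a minimal Python sketch under stated assumptions. It is not the action's source code. It assumes the third-party packages pdf2image (with Poppler) and pytesseract (with the Tesseract binary) are installed, and the function names get_text_from_pdf, _render_page and _recognize are hypothetical. The page renderer and the recognizer are passed in as arguments so that the logic can be tried without the OCR stack.

```python
from typing import Any, Callable, Optional


def _render_page(pdf_path: str, page_number: int) -> Optional[Any]:
    """Render one page to an image; return None if the page does not exist."""
    from pdf2image import convert_from_path  # third-party, needs Poppler

    # Assumption: a page number past the end of the file gives an empty list.
    pages = convert_from_path(pdf_path, first_page=page_number, last_page=page_number)
    return pages[0] if pages else None


def _recognize(image: Any, lang: str) -> str:
    """Recognize the text on a page image with Tesseract."""
    import pytesseract  # third-party, needs the Tesseract binary

    return pytesseract.image_to_string(image, lang=lang)


def get_text_from_pdf(
    pdf_path: str,
    page_number: int,
    lang: str = "rus",  # Russian is the documented default language
    render_page: Callable[[str, int], Optional[Any]] = _render_page,
    recognize: Callable[[Any, str], str] = _recognize,
) -> str:
    """Return the recognized text of one page, or "" if the page is missing."""
    if page_number < 1:
        return ""
    image = render_page(pdf_path, page_number)
    if image is None:
        # Mirrors the documented behavior: a missing page gives an empty result.
        return ""
    return recognize(image, lang)
```

For the task above, the call would be get_text_from_pdf(path, 2), where path is the location of the PDF file.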
