Classify text Version 1 (python)
Group "Robin AI", subgroup "Classifier (Robin)"
Description
The action determines which class a text belongs to using a trained classification model: it returns the probability that the text belongs to each class (rubric) the model was trained on.
The purpose of the action is to find the rubric the text is closest to, i.e. the rubric with the highest percentage, so that a decision can be made about how to process the text further.
Parameters and their settings
Property | Description | Type | Example value | Mandatory |
---|---|---|---|---|
Parameters | ||||
Text for classification | The text whose class needs to be determined. | Robin.String | | Yes |
Trained model | The path to the folder that contains the trained classification model. | Robin.FolderPath | C:\doc\img | Yes |
Results | ||||
Result | Dictionary where the key is the class name and the value is the percentage of belonging to that class. The dictionary is sorted by percentage in descending order. | Robin.Dictionary | | |
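The Result value can be pictured as an ordinary Python dictionary. The sketch below is only an illustration: the rubric names and percentages are hypothetical, and it assumes percentages are numbers from 0 to 100. It shows how an unsorted mapping of class scores becomes a dictionary ordered by percentage, highest first, relying on the fact that Python dictionaries preserve insertion order.

```python
def sort_by_percentage(scores: dict[str, float]) -> dict[str, float]:
    """Return a new dict ordered by percentage, highest first."""
    # sorted() orders the (class, percentage) pairs by percentage, descending;
    # dict() keeps that order because dicts preserve insertion order.
    return dict(sorted(scores.items(), key=lambda item: item[1], reverse=True))


# Hypothetical percentages for three hypothetical rubrics.
raw_scores = {"Invoices": 12.5, "Complaints": 71.0, "Contracts": 16.5}
result = sort_by_percentage(raw_scores)
print(result)  # {'Complaints': 71.0, 'Contracts': 16.5, 'Invoices': 12.5}
```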
Special conditions of use
1. The folder with the trained model must contain two files, which together make up the packaged machine learning model. The files are provided to the customer upon request.
2. If either file is missing or has been renamed, the action will fail with an error when it runs.
3. The robot will generate an error if:
- an incorrect path is specified in the "Trained model" field, or the folder does not contain a trained model (one or both files have been changed);
- an empty string is specified in the "Text for classification" field, or the action is unable to determine the class.
4. The robot will not generate an error if:
- the text is not in the language of the trained model; in this case, the percentage of match with each class will simply be low.
5. An existing trained model cannot be retrained. If classes need to be added, the model training action must be run again.
6. For reference, a model trained on 20,000 records classifies a text in 2-3 minutes.
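The conditions in items 2-4 can be checked from a script before and after the action runs. The helpers below are hypothetical and are not part of Robin. Because the names of the two model files are not documented here, the folder check only verifies that the folder exists and contains exactly two files. The 20% low-confidence threshold is also an assumption: the action raises no error for text in another language, so the caller has to detect the low percentages itself.

```python
from pathlib import Path


def check_inputs(text: str, model_folder: str) -> list[str]:
    """Collect problems that would make the action fail (hypothetical pre-check)."""
    problems = []
    if not text:
        problems.append("Text for classification is empty")
    folder = Path(model_folder)
    if not folder.is_dir():
        problems.append(f"Model folder not found: {folder}")
    else:
        # The file names are unknown, so only the count of files is checked.
        files = [p for p in folder.iterdir() if p.is_file()]
        if len(files) != 2:
            problems.append(f"Expected 2 model files, found {len(files)}")
    return problems


def is_low_confidence(result: dict[str, float], threshold: float = 20.0) -> bool:
    """Return True if no class reaches the (hypothetical) threshold percentage.

    An empty result is also treated as low confidence.
    """
    return not result or max(result.values()) < threshold
```

A result flagged by `is_low_confidence` could then be routed to manual review instead of being processed automatically.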
More information about text classification theory:
https://vas3k.blog/blog/machine_learning/
https://www.edureka.co/blog/classification-in-machine-learning/
Example of use
Task
Classify text based on a trained model.
Solution
Use the "Classify text" action.
Implementation
1. Add the "Classify text" action to the workspace.
2. Set the parameters of the "Classify text" action. Specify the text to classify in the "Text for classification" field.
3. Specify the path to the folder that contains the trained model in the "Trained model" field.
4. Click the "Start" button on the top panel.
Result
The robot completed successfully. The result is a dictionary in which each key is a rubric (category) name and each value is the percentage of belonging to that rubric. The dictionary is sorted by percentage in descending order.
To get the rubric the text most likely belongs to, use the "Get key collection" action, since the rubrics are stored in the keys and the percentages in the values. Because the dictionary is sorted in descending order, the element at index 0 of the resulting key collection is the rubric the text most likely belongs to; retrieve it with the "Get value by index" action.
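In Python terms, these two steps amount to taking the dictionary's keys and reading the first one. The sketch below assumes the result arrives as an ordinary dict already sorted by percentage, highest first; the function name, rubric names and values are hypothetical.

```python
def top_rubric(result: dict[str, float]) -> str:
    """Return the rubric the text most likely belongs to.

    Relies on the dictionary already being sorted by percentage,
    highest first.
    """
    keys = list(result.keys())  # equivalent of "Get key collection"
    if not keys:
        raise ValueError("The classification result is empty")
    return keys[0]  # equivalent of "Get value by index" with index 0


result = {"Complaints": 71.0, "Contracts": 16.5, "Invoices": 12.5}  # hypothetical
print(top_rubric(result))  # Complaints
```

If the order of the dictionary were not guaranteed, `max(result, key=result.get)` would return the same rubric without relying on sorting.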