Classify text Version 1 (python)

Group "Robin AI", subgroup "Classifier (Preferentum)"


Description

The action classifies the text according to the given indexes and determines its class.

Action icon


Parameters and their settings

Parameters

  • Context
    Description: Classifier context for the operation of the action.
    Type: Context
    Filling example: Open classifier.Classifier
    Mandatory field: Yes

  • Text
    Description: Text that needs to be classified.
    Type: String
    Filling example: When Wehner and colleagues performed a historical data analysis of hurricanes between 1980 and 2021, they found five storms that would fit into a Category 6 that have all occurred in the last nine years. It includes 2015’s Hurricane Patricia, which was the most powerful tropical cyclone that lashed Mexico with winds up to 215 mph. The other storms include Typhoon Haiyan in 2013, Typhoon Meranti in 2016, Typhoon Goni in 2020, and Typhoon Surigae in 2021.
    Mandatory field: Yes

  • Multiclass classification
    Description: If "false", the class with the highest probability percentage is determined for the text. If "true", several classes to which the text can belong are determined.
    Type: Boolean
    Filling example: True
    Mandatory field: No

  • Confidence threshold
    Description: A number from 1 to 100 that determines whether the classification result is accurate enough. It is used when only one class needs to be determined. The higher the specified number, the greater the difference between the two most likely classes must be. The parameter is taken into account only if "Multiclass classification" = false.
    Type: Numeric
    Filling example: 80
    Mandatory field: No

  • Number of classes
    Description: The maximum number of classes the action can return. If more classes were determined for the text during classification, the action returns only the specified number of classes.
    Type: Numeric
    Filling example: 5
    Mandatory field: No

Results

  • Classes
    Description: A dictionary with the classes to which the specified text can belong. The key is the class; the value is the probability, in percent, that the text belongs to the class.
    Type: Dictionary

  • Confident result
    Description: If "true", the classification result is sufficiently accurate. If "false", the classification result may be inaccurate.
    Type: Boolean

Classifier operation description

Guidelines for using the Preferentum classification system: https://preferentum.ru/wp-content/uploads/2022/04/preferentumclass_manual.pdf

In the classifier, a class is called a "Rubric" and the probability that the text belongs to the class is called a "Rank".

  • Algorithm when "Multiclass classification" = false:

The system classifies the text into possible rubrics and calculates the rank of each rubric. The two rubrics with the highest ranks are compared using the formula X*100/Y, where X is the rank of the first rubric and Y is the rank of the second rubric. The resulting number is compared to the value of the "Confidence threshold" parameter. If the number is greater than or equal to the threshold, the result is considered confident and "Confident result" = true. If the number is less than the threshold, the result is considered uncertain, because the most probable rubric may not be precisely determined, and "Confident result" = false. In both cases, the action returns a dictionary with one rubric (the one with the highest rank).
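The comparison can be illustrated with a minimal Python sketch. It is not the action's actual code: the function names are hypothetical, and because the formula does not state which of the two top rubrics is X, both ranks are passed in explicitly.

```python
def confidence_number(x: float, y: float) -> float:
    """Apply the documented formula X*100/Y to two ranks (y must be non-zero)."""
    return x * 100 / y


def is_confident(x: float, y: float, threshold: float) -> bool:
    """Return True when the formula result reaches the "Confidence threshold"."""
    return confidence_number(x, y) >= threshold


# Hypothetical ranks, chosen only to show the arithmetic.
print(confidence_number(40, 50))   # 40*100/50
print(is_confident(40, 50, 80))    # compared against a threshold of 80
print(is_confident(30, 50, 80))
```

Keeping the formula and the threshold comparison in separate functions makes it easy to inspect the intermediate number when tuning the threshold.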

  • Algorithm when "Multiclass classification" = true:

The system classifies the text into possible rubrics and calculates the rank of each rubric. Each pair of neighboring rubrics is compared using the formula X/Y, where X is the rank of the first rubric in the pair and Y is the rank of the next one. The highest number obtained during the comparison determines which rubrics are excluded from the resulting dictionary: the action returns the rubrics that are higher in the list than the rubric with the highest comparison number, together with that rubric itself.
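A minimal Python sketch of this selection is given below. It rests on several assumptions that the description does not confirm: the rubric list is ordered by descending rank, all ranks are positive, the first pair wins when several pairs share the highest comparison number, and "Number of classes" keeps the highest-ranked rubrics. The function name is hypothetical.

```python
from typing import Dict, Optional


def select_rubrics(ranks: Dict[str, float],
                   number_of_classes: Optional[int] = None) -> Dict[str, float]:
    """Keep the rubrics up to and including the largest neighbor ratio X/Y."""
    # Assumption: the "list" of rubrics is sorted by rank, highest first.
    ordered = sorted(ranks.items(), key=lambda item: item[1], reverse=True)
    if len(ordered) < 2:
        return dict(ordered)

    # Comparison number for each neighboring pair (assumes positive ranks).
    ratios = [ordered[i][1] / ordered[i + 1][1] for i in range(len(ordered) - 1)]

    # Index of the first rubric in the pair with the highest comparison number.
    cut = max(range(len(ratios)), key=lambda i: ratios[i])
    selected = ordered[:cut + 1]

    # Assumption: the limit keeps the highest-ranked rubrics.
    if number_of_classes is not None:
        selected = selected[:number_of_classes]
    return dict(selected)


# Hypothetical ranks: the largest drop is between "B" and "C".
print(select_rubrics({"A": 60, "B": 50, "C": 10, "D": 8}))
```

In the hypothetical data the ratios are 1.2, 5.0, and 1.25, so the cut falls after "B" and the rubrics "C" and "D" are excluded.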

Special conditions of use

  1. If "Multiclass classification" = false and the text was classified into classes with the same percentage of probability, the action will fail.

  2. If "Multiclass classification" = true, "Number of classes" is set to more than one class, and the text was classified into classes with the same percentage of probability, the action will fail.
    (Example: "Number of classes" = 2. The text was classified into three classes: two with the same probability percentage of 50 and a third with a probability percentage of 80. The action will end in an error.)

  3. If the text has not been classified into any class or the classifier has no classes, the action will fail.

  4. Currently, only Russian-language text can be classified.
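Conditions 1 to 3 can be sketched as a small pre-check in Python. This is an illustration only, not the action's actual code: the function name and the exception type are hypothetical, and condition 2 is read as a tie at the "Number of classes" boundary, which matches the example given for it but is an assumption. The language restriction in condition 4 is not covered.

```python
from typing import Dict, Optional


class ClassificationError(Exception):
    """Hypothetical error type standing in for a failed action."""


def check_special_conditions(ranks: Dict[str, float],
                             multiclass: bool,
                             number_of_classes: Optional[int] = None) -> None:
    """Raise ClassificationError in the cases where the action would fail."""
    # Condition 3: no classes at all.
    if not ranks:
        raise ClassificationError("text was not classified into any class")

    ordered = sorted(ranks.values(), reverse=True)

    if not multiclass:
        # Condition 1: the two most likely classes have the same probability.
        if len(ordered) > 1 and ordered[0] == ordered[1]:
            raise ClassificationError("top classes have the same probability")
    elif number_of_classes is not None and 1 < number_of_classes < len(ordered):
        # Condition 2 (assumed reading): the last class that fits within the
        # limit ties with the first class that does not fit.
        if ordered[number_of_classes - 1] == ordered[number_of_classes]:
            raise ClassificationError("tie at the Number of classes boundary")


# The example from condition 2: probabilities 80, 50, 50 with a limit of 2.
try:
    check_special_conditions({"A": 80, "B": 50, "C": 50}, True, 2)
except ClassificationError as error:
    print("Action failed:", error)
```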

Example of use

Task 1

Classify text based on the trained model, identifying the class with the highest percentage of probability.

Solution

Use the "Classify text" action. 

Implementation

Preface

The "Open classifier" action requires a trained classifier model. 
Training is performed using the "Create index" action.

  1. Transfer the "Open classifier" action to the workspace.

     

  2. Set the parameters of the "Open classifier" action.

    Specify the path to the folder that contains the trained model.

     

  3. Transfer the "Classify text" action to the workspace. 


  4. Set the parameters for the "Classify text" action. 

    1. Specify the context obtained in the "Open classifier" action

    2. Set the value in the "Confidence threshold" field

       

    3. In the "Text" field, specify the following text: 

       

  5. Click the "Start" button in the top panel.

Result

The software robot completed successfully.

A dictionary containing the class with the highest probability percentage, to which the specified text can belong, was obtained, together with confirmation that the classification result is accurate enough (the "Confident result" parameter is True).

Task 2

Classify text based on a trained model to determine the classes to which the text can belong.

Solution

Use the "Classify text" action. 

Implementation

Preface

The "Open classifier" action requires a trained classifier model. 
Training is performed using the "Create index" action.

  1. Repeat steps 1-3 of Task 1
  2. Set the parameters for the "Classify text" action. 
    1. Specify the context obtained in the "Open classifier" action
    2. Set the "Multiclass classification" checkbox 
    3. Add data to the "Number of classes" field


    4. In the "Text" field, specify the following text: 


  3. Click the "Start" button in the top panel.

Result

The software robot completed successfully.

A dictionary with the classes to which the specified text can belong was obtained, and the "Confident result" parameter is False.

Task 3

Get the results of the "Classify text" action.

Solution

Use the "Get keys", "Get value by index" and "Get value" actions.

Implementation

  1. Repeat steps 1-3 from Task 2.
  2. Transfer the "Get keys" action to the workspace. 

  3. Fill in the "Dictionary" parameter of the "Get keys" action.


  4. Transfer the "Get value by index" action to the workspace. 


  5. Set the parameters of the "Get value by index" action 
    1. Set the result of the "Get keys" action in the "Collection" field
    2. Set the collection index


  6. Transfer the "Get value" action to the workspace. 


  7. Set the parameters of the "Get value" action 
    1. Set the result of the "Classify text" action in the "Dictionary" field
    2. Set the key obtained from the "Get value by index" action


  8. Click the "Start" button in the top panel.

Result

The software robot completed successfully.

The following results were obtained:

  • dictionary keys: the classes
  • dictionary values: the probability, in percent, that the text belongs to each class
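For readers who think in Python terms, the three actions used in this task correspond roughly to ordinary dictionary operations. The sketch below is only an analogy, not the platform's code, and the class names and percentages are hypothetical sample data.

```python
# Hypothetical result of the "Classify text" action.
classes = {"Weather": 72.5, "Science": 21.0}

# "Get keys": collect the dictionary keys (the classes).
keys = list(classes.keys())

# "Get value by index": take one key from the collection by its index.
first_key = keys[0]

# "Get value": look up the probability percentage stored under that key.
probability = classes[first_key]

print(keys, first_key, probability)
```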

The value returned by the "Classify text" action depends on the value of the "Multiclass classification" parameter:

  • Multiclass classification = false: the result is the one class with the highest probability percentage to which the specified text can belong, together with the probability, in percent, that the text belongs to that class.
  • Multiclass classification = true: the result is the classes to which the specified text can belong, together with the probability, in percent, that the text belongs to each class.
