Train a classification model Version 1 (python)

Group "Robin AI", subgroup "Classifier (Robin)".

Description

This action trains a text classification model. The classification task is to determine which of two or more existing classes an object belongs to. Suitable classifier types are selected depending on the classification task. The action is used together with the "Classify text" action.

More details on classification methods can be found here: Overview of classification methods in machine learning with Scikit-Learn
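As a rough illustration of the kind of model such an action could produce, the sketch below trains a text classifier with Scikit-Learn and reports its accuracy as a percentage. It assumes a TF-IDF vectorizer followed by a RandomForest classifier, which the file names mentioned on this page (machine_model.pkl, tfidf_model.pk) suggest but do not confirm. The sample texts, class names, and split settings are invented for illustration, and the action's actual internals may differ.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical training texts for two invented classes.
texts = [
    "invoice for payment of goods", "payment invoice attached",
    "please pay the invoice", "invoice number and amount due",
    "contract for supply of goods", "signed supply contract",
    "contract terms and conditions", "the parties agree to the contract",
]
labels = ["invoice"] * 4 + ["contract"] * 4

# Hold out a quarter of the texts to estimate accuracy (assumed split).
train_x, test_x, train_y, test_y = train_test_split(
    texts, labels, test_size=0.25, random_state=0, stratify=labels
)

# Convert the texts into TF-IDF feature vectors.
vectorizer = TfidfVectorizer()
train_vec = vectorizer.fit_transform(train_x)
test_vec = vectorizer.transform(test_x)

# Train the classifier on the vectorized training texts.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(train_vec, train_y)

# Accuracy on the held-out texts, expressed as a percentage.
accuracy = accuracy_score(test_y, model.predict(test_vec)) * 100
print(f"Accuracy: {accuracy:.1f}%")
```

With so few texts the reported accuracy is not meaningful; the point is only the order of the steps: vectorize, train, evaluate.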

Action icon


Parameters and their settings

Property | Description | Type | Filling example | Mandatory field

Parameters
The path to the source folder | The path to the folder with the data for training the classification model. The folder contains subfolders, each named after a class. Each subfolder should contain txt files with texts that belong to that class. | Robin.FolderPath | C:\doc\img | Yes
Path to the resulting folder | The path to the folder where the trained classification model will be saved. | Robin.FolderPath | C:\doc\img | Yes
Method | The method that will be used to train the classification model. The default value is RandomForest. The available methods are listed below the table. | Robin.String | - | Yes
Overwrite | If the value is «true» and a file with the same name and extension already exists in the resulting folder, the file is overwritten. If «false», the file is not overwritten and the action returns an error. | Robin.Boolean | true | No
Custom Stop Words | The path to the txt file with stop words that are ignored when training the classification model. Each stop word must be written on a new line. | Robin.FilePath | - | No
Word combinations | The path to the txt file with phrases that must not be split into separate words during training, so that the meaning of the whole phrase is preserved. Each phrase must be written on a new line. | Robin.FilePath | - | No

Results
Result | The percentage of accuracy of the trained model. | Robin.Numeric | - | -

The following methods can be used to train the classification model:

    • Choose the most suitable
    • SVC - support vector classification
    • RandomForest - random forest classifier (an ensemble of decision trees)
    • GradientBoosting
    • AdaBoost
    • nTree - decision tree classifier
    • KNeighboors - k-nearest neighbors method
    • Naive Bayes - naive Bayes method
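
The folder layout expected by "The path to the source folder" (one subfolder per class, txt files inside each) can be read into texts and labels roughly as follows. This is a minimal sketch, not the action's own code: the function name load_training_data and the UTF-8 encoding are assumptions.

```python
from pathlib import Path


def load_training_data(source_folder):
    """Return (texts, labels) read from the class subfolders of source_folder."""
    texts, labels = [], []
    for class_dir in sorted(Path(source_folder).iterdir()):
        if not class_dir.is_dir():
            continue  # only subfolders define classes; loose files are skipped
        for txt_file in sorted(class_dir.glob("*.txt")):
            # Assumption: the training files are UTF-8 encoded.
            texts.append(txt_file.read_text(encoding="utf-8"))
            labels.append(class_dir.name)  # the subfolder name is the class
    return texts, labels
```

Each text is paired with the name of the subfolder it was found in, so adding a class only requires adding a subfolder with its txt files.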

Special conditions of use

In the list of training methods of the "Method" parameter, only the option "Choose the most suitable" depends on the studio language: when the studio language is switched to English, it is displayed as "Choose the most suitable". The names of the other options are in English regardless of the studio language.

The robot will generate the following error messages if the parameter check conditions are not met:

Condition | Exception | Text of error message

Checks for "The path to the source folder" parameter
If the permissible length is exceeded in the path name | ValidationError | Path name length limit exceeded "{folder_path}"
If invalid characters are used in the path name | ValidationError | Invalid characters in the path name "{folder_path}"
If the directory is not found | DirectoryNotFound | The directory "{folder_path}" was not found
If the path is not a directory | DirectoryNotFound | The resource "{folder_path}" is not a directory
If the folder cannot be accessed | DirectoryNotAvailable | Read access error to "{folder_path}"

Checks for "Path to the resulting folder" parameter
If the permissible length is exceeded in the path name | ValidationError | The length limit for the path name "{folder_path}" has been exceeded
If invalid characters are used in the path name | ValidationError | Invalid characters in the path name "{folder_path}"
If the directory is not found | DirectoryNotFound | The directory "{folder_path}" was not found
If the path is not a directory | DirectoryNotFound | The resource "{folder_path}" is not a directory
If the folder cannot be accessed | DirectoryNotAvailable | Error accessing "{folder_path}" for a record

Check for the file machine_model.pkl, which will be saved in the "Path to the resulting folder"
If the file already exists and "Overwrite" is not selected | FileAlreadyExists | The file at path: {result_file_path} already exists

Check for the file tfidf_model.pk, which will be saved in the "Path to the resulting folder"
If the file already exists and "Overwrite" is not selected | FileAlreadyExists | The file at path: {result_file_path} already exists

Checks for the "Custom Stop Words" parameter
If the permissible length is exceeded in the path name | ValidationError | Path name length limit exceeded "{folder_path}"
If invalid characters are used in the path name | ValidationError | Invalid characters in the path name "{folder_path}"
If the file is not found | FileNotFound | The file "{file_path}" was not found
If the path is not a file | FileNotFound | The resource "{folder_path}" is not a file
If the file cannot be accessed | FileNotAvailable | Read access error to "{folder_path}"
If the file cannot be read, e.g. because of a wrong encoding | ValidationError | Error reading .txt file at path {filepath}: {ex}
If the input file does not have a .txt extension | ValidationError | The file at path {list_words_path} has an invalid extension. Valid values: .txt

Checks for the "Word combinations" parameter
If the permissible length is exceeded in the path name | ValidationError | The length limit for the path name "{folder_path}" has been exceeded
If invalid characters are used in the path name | ValidationError | Invalid characters in the path name "{folder_path}"
If the file is not found | FileNotFound | The file "{file_path}" was not found
If the path is not a file | FileNotFound | The resource "{folder_path}" is not a file
If the file cannot be accessed | FileNotAvailable | Read access error to "{folder_path}"
If the file cannot be read, e.g. because of a wrong encoding | ValidationError | Error reading .txt file at path {filepath}: {ex}
If the input file does not have a .txt extension | ValidationError | The file at {list_words_path} has an invalid extension. Valid values: .txt
If no text is left for training after the folders are traversed and the texts are cleaned | ValidationError | Input data error: no suitable data or empty data
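
A few of the checks listed above can be sketched in Python as follows. The exception classes are hypothetical stand-ins named after the exceptions in the table, the function names are invented, and the order of the checks is an assumption; the message texts are taken from the table.

```python
from pathlib import Path


# Hypothetical exception classes named after the exceptions in the table.
class DirectoryNotFound(Exception):
    pass


class FileAlreadyExists(Exception):
    pass


class ValidationError(Exception):
    pass


def check_folder(folder_path):
    """Raise DirectoryNotFound if folder_path is missing or is not a directory."""
    path = Path(folder_path)
    if not path.exists():
        raise DirectoryNotFound(f'The directory "{folder_path}" was not found')
    if not path.is_dir():
        raise DirectoryNotFound(f'The resource "{folder_path}" is not a directory')


def check_result_file(result_folder, file_name, overwrite):
    """Raise FileAlreadyExists if the model file exists and overwrite is off."""
    result_file_path = Path(result_folder) / file_name
    if result_file_path.exists() and not overwrite:
        raise FileAlreadyExists(f"The file at path: {result_file_path} already exists")


def check_word_list_file(list_words_path):
    """Raise ValidationError if the word list file does not end in .txt."""
    if Path(list_words_path).suffix.lower() != ".txt":
        raise ValidationError(
            f"The file at path {list_words_path} has an invalid extension. "
            "Valid values: .txt"
        )
```

Running all checks before training starts means a wrong path or an existing model file is reported immediately rather than after the training has finished.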

Examples of how the action works

1. All parameters are set correctly:

"The path to the source folder" - a folder with the required data structure;

"Path to the resulting folder" - an existing folder.

Place the "Train a classification model" action from the Robin AI group on the workspace;

Set the action parameters correctly;

Launch the robot.

Result: The robot saves the trained model in the specified folder.

2. Wrong paths are specified in:

"The path to the source folder";

"Path to the resulting folder";

"Custom Stop Words";

"Word combinations".

Place the "Train a classification model" action from the Robin AI group on the workspace;

Set the remaining action parameters correctly;

Launch the robot.

Result: The robot returns an error about a non-existent folder or file path.

3. Files in an incorrect format are specified in:

"Custom Stop Words";

"Word combinations".

Place the "Train a classification model" action from the Robin AI group on the workspace;

Set the remaining action parameters correctly;

Launch the robot.

Result: The robot returns an error because it cannot read the files.

4. The trained model files already exist in the folder specified in "Path to the resulting folder", but "Overwrite" is turned off:

Place the "Train a classification model" action from the Robin AI group on the workspace;

Set the action parameters correctly, leaving "Overwrite" turned off;

Launch the robot.

Result: The robot returns an error because the files already exist and cannot be overwritten.

5. There are no files for training in the "Training data" folder:

Place the "Train a classification model" action from the Robin AI group on the workspace;

Set the action parameters correctly;

Launch the robot.

Result: The robot returns the error "Input data error: no suitable data or empty data".

Example of use

Task

Train the classification model.

Solution

Use the "Train a classification model" action.

Implementation

1. Before setting the parameters of the action, the user needs to prepare the training data:

Training data - a tree of objects consisting of two folders:

Source folder:

Class folders inside it:

Each class folder contains the txt files of that class:

Prepare word combinations.

A txt file with word combinations that must not be split into separate words during model training, so that the meaning of the whole phrase is preserved. Each word combination should be written on a new line as a whole phrase, without splitting it into words, for example: operations by receipt.
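
One possible way to keep a word combination together is to turn it into a single token before the text is split into words. The sketch below does this by joining the words of each phrase with underscores. The underscore convention and the function name join_word_combinations are illustrative assumptions, not the documented behavior of the action.

```python
import re


def join_word_combinations(text, phrases):
    """Replace each phrase found in text with one underscore-joined token."""
    # Longer phrases first, so a short phrase cannot break up a longer one.
    for phrase in sorted(phrases, key=len, reverse=True):
        words = phrase.split()
        if not words:
            continue  # skip blank lines from the word combinations file
        token = "_".join(words)
        # Match the words of the phrase separated by any whitespace.
        pattern = r"\b" + r"\s+".join(re.escape(w) for w in words) + r"\b"
        text = re.sub(pattern, lambda match: token, text, flags=re.IGNORECASE)
    return text


# In practice the phrases would be read from the txt file, one per line.
phrases = ["operations by receipt"]
print(join_word_combinations("List of operations by receipt for May", phrases))
# prints: List of operations_by_receipt for May
```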

Prepare stop words.

A txt file with stop words that are ignored when training the classification model. Each stop word should be written on a new line, for example:

  • Good morning!
  • Hello!
  • Sincerely,
  • tel:
  • email:
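
Stop words such as the ones above can be removed from a text before training. The following is a minimal sketch assuming simple case-insensitive substring removal; the function name remove_stop_words is invented, and the action itself may clean texts differently.

```python
import re


def remove_stop_words(text, stop_words):
    """Delete every stop word or phrase from text, ignoring case."""
    # Longer entries first, so a short entry cannot split a longer one.
    for stop in sorted(stop_words, key=len, reverse=True):
        stop = stop.strip()
        if stop:
            # Plain substring match: "tel:" would also match inside "hotel:".
            text = re.sub(re.escape(stop), " ", text, flags=re.IGNORECASE)
    return " ".join(text.split())  # collapse the whitespace left behind


# In practice the stop words would be read from the txt file, one per line.
stop_words = ["Good morning!", "Hello!", "Sincerely,", "tel:", "email:"]
print(remove_stop_words("Hello! Please send the report. tel: 12345 Sincerely, Ann", stop_words))
# prints: Please send the report. 12345 Ann
```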

2. Drag the "Train a classification model" action onto the workspace.

3. Set the parameters of the "Train a classification model" action.

4. Click on the "Start" button in the top panel.

Result

The software robot completed successfully.


