AutoML Job
AutoML simplifies the complicated process of algorithm modeling and model parameter tuning. It tunes hyperparameters automatically and then builds a high-precision model, which saves labor and lowers the barrier to machine learning.
Create a Job
Select "Training-->AutoML job" in the left navigation bar to open the AutoML job list page, then click the "Create job" button to start creating a job.
Logistic Regression
Logistic regression automatically selects the optimal training hyperparameters using the parameter-tuning training data and parameter-tuning test data supplied by the user, and then uses the model training data to output a trained model.
In one parameter-tuning experiment, the tuning algorithm selects a group of hyperparameters according to the user-specified parameter ranges and the platform's tuning rules. Using those hyperparameters and the tuning training and test data sets configured by the user, it trains and evaluates one model. The algorithm runs many such experiments and finally uses the hyperparameters of the best-performing experiment to train the model.
The parameter-tuning training data, the parameter-tuning test data, and the final model training data must each contain exactly 2 label classes. Automatic hyperparameter logistic regression currently supports only binary classification; if the data is multi-class, the job may fail.
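The experiment loop described above can be sketched as follows. This is only an illustration under stated assumptions: the platform's actual selection rules are not documented here, so plain random search is swapped in, and the names `run_experiments`, `train_and_score` and `toy_score` are hypothetical. Treating the iteration setting as a (low, high) pair is also an assumption.

```python
import math
import random

def sample_log_uniform(rng, low, high):
    """Draw a value between low and high, uniform on a log scale."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

def run_experiments(train_and_score, l1_range, l2_range, iter_range,
                    n_experiments, seed=0):
    """Run n_experiments experiments and return (best_params, best_score).

    train_and_score is a hypothetical callback: it trains a model with the
    given hyperparameters on the tuning training data and returns its score
    on the tuning test data (higher is better).
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_experiments):
        # Random search stands in for the platform's undocumented rules.
        params = {
            "l1": sample_log_uniform(rng, *l1_range),
            "l2": sample_log_uniform(rng, *l2_range),
            "iterations": rng.randint(*iter_range),  # inclusive on both ends
        }
        score = train_and_score(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(params):
    # Made-up stand-in for real training and evaluation;
    # it peaks at l1 = 1e-3 and l2 = 1e-2.
    return (-abs(math.log10(params["l1"]) + 3)
            - abs(math.log10(params["l2"]) + 2))

best_params, best_score = run_experiments(
    toy_score, l1_range=(1e-5, 0.5), l2_range=(1e-5, 0.5),
    iter_range=(10, 200), n_experiments=20,
)
print(best_params, best_score)
```

In a real job, the final step would train one more model on the model training data using `best_params`.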
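Because a data set with more or fewer than 2 label classes may cause the job to fail, it can help to check each data set before uploading it. The sketch below assumes the labels have already been extracted from the files; how to extract them depends on the input data format and is not covered here. The function name `check_binary_labels` is hypothetical.

```python
from collections import Counter

def check_binary_labels(labels):
    """Return the count per label class, or raise if there are not exactly 2."""
    counts = Counter(labels)
    if len(counts) != 2:
        found = sorted(str(label) for label in counts)
        raise ValueError(
            f"expected exactly 2 label classes, found {len(counts)}: {found}"
        )
    return dict(counts)

# Made-up labels for illustration.
print(check_binary_labels(["0", "1", "1", "0", "1"]))
```

Running the same check on all three data sets (tuning training, tuning test, and model training) covers the requirement stated above.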
Configuration Instructions:
Configuration name | Required | Description |
---|---|---|
Job name | Yes | May contain only digits, letters, - or _; must start with a letter and be fewer than 40 characters long |
Algorithm or framework | Yes | Select Logistic Regression |
Send SMS at the end of job | Yes | An SMS is sent by default |
L1 regularization coefficient range | Yes | Floating-point numbers greater than 0 and less than 1; scientific notation is supported |
L2 regularization coefficient range | Yes | Floating-point numbers greater than 0 and less than 1; scientific notation is supported |
Number of iterations in a single test | Yes | A positive integer from 10 to 200. In each test, the number of iterations of the algorithm is selected within this range. One pass over all the parameter-tuning training data is called a round, or an epoch |
Test times | Yes | A positive integer from 10 to 100; this many tests are run in total. Each test selects a group of hyperparameters, trains a model on the parameter-tuning training data, and evaluates it on the parameter-tuning test data. After all tests are completed, the platform selects the optimal hyperparameters and outputs the final model trained on the model training data |
Input data format | Yes | Options include: sparse without weight value, sparse with weight value, and dense data. See the algorithm format requirements on the page for details |
Training data path for debugging parameter | Yes | Path of the parameter-tuning training data. Each test trains a model on this data with one group of hyperparameters |
Parameter test data path | Yes | Path of the parameter-tuning test data. Each test uses this data to evaluate the model trained in that test |
Model training data path | Yes | Path of the model training data. AutoLR selects the group of hyperparameters with the best evaluation result across all tests and trains the final model on this data |
Output path | Yes | Path for storing the model and log. After the job succeeds, the model is stored under /{job_id}/model and the log under /{job_id}/log |
Computing resources | Yes | Currently, only BML clusters are supported |
Resource package | Yes | Currently, only CPU instance_8 core_32GB memory is supported |
Number of instances | Yes | 2-4 |
Maximum running time | Yes | If the job reaches the maximum running time, BML automatically forces it to stop, which may cause the job to fail |
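Several of the constraints in the table can be checked locally before a job is submitted. The following is a minimal sketch, not a platform API: the dictionary keys and the function name `validate_config` are hypothetical, and it assumes each range is entered as a (low, high) pair.

```python
import re

# Starts with a letter, then digits, letters, - or _, fewer than 40 characters.
JOB_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_-]{0,38}")

def validate_config(cfg):
    """Return a list of problems found in cfg (empty if none)."""
    errors = []
    if not JOB_NAME_RE.fullmatch(cfg["job_name"]):
        errors.append("job_name: invalid characters, first character or length")
    for key in ("l1_range", "l2_range"):
        low, high = cfg[key]
        if not (0 < low <= high < 1):
            errors.append(f"{key}: values must be greater than 0 and less than 1")
    low, high = cfg["iterations_range"]
    if not (10 <= low <= high <= 200):
        errors.append("iterations_range: values must be from 10 to 200")
    if not (10 <= cfg["test_times"] <= 100):
        errors.append("test_times: must be from 10 to 100")
    if not (2 <= cfg["instances"] <= 4):
        errors.append("instances: must be from 2 to 4")
    return errors

# Made-up configuration for illustration.
example = {
    "job_name": "susy-lr_01",
    "l1_range": (1e-5, 0.1),
    "l2_range": (1e-5, 0.1),
    "iterations_range": (10, 100),
    "test_times": 20,
    "instances": 2,
}
print(validate_config(example))
```

Returning a list of messages instead of raising on the first problem lets all issues be reported at once.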
Example configuration:
The training data is the SUSY data set downloaded from the Internet. A comma has been added at the beginning of each row (sed -i s/^/,/g yourfile), and the data has been split and stored on the public BOS. After downloading the data, you can split it into parameter-tuning training data, parameter-tuning test data, and model training data and store it on your own BOS, or you can train directly on our public BOS data:
- Input data format: Dense data
- Debugging parameter training data path: bos:/bml-public/automl-demo/data/susy-train/
- Parameter testing data path: bos:/bml-public/automl-demo/data/susy-test/
- Model training data path: bos:/bml-public/automl-demo/data/susy-all/
- Model output path and log output path: configure your own BOS path
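If sed is not available, the same preprocessing step (adding a comma at the beginning of every row) can be sketched in Python. The function name `prepend_comma` and the file names are hypothetical; unlike `sed -i`, this version writes to a new file instead of editing in place.

```python
def prepend_comma(src_path, dst_path):
    """Copy src_path to dst_path, adding a comma at the start of every row."""
    # newline="" keeps the original line endings unchanged.
    with open(src_path, encoding="utf-8", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write("," + line)
```

For example, `prepend_comma("yourfile", "yourfile.out")` would write the converted rows to `yourfile.out`.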
Click "OK" to submit the job.
Notes for Model Output Format:
- The output model mainly consists of the weight parameter for each feature dimension in the Logistic Regression model
- The output is in plain text. Each row represents one feature dimension and contains three space-separated fields: the weight parameter of the feature, the internal ID of the feature in the parameter-tuning algorithm, and the original name of the feature.
- Only features whose weight parameter is not 0 are output
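A model file in this format can be read with a few lines of Python. This is a minimal sketch based only on the three-field description above; the names `FeatureWeight`, `parse_model_lines` and `weight_of` are hypothetical, the sample rows are made up, and the internal ID is kept as a string because its type is not specified.

```python
from typing import NamedTuple

class FeatureWeight(NamedTuple):
    weight: float
    internal_id: str
    name: str

def parse_model_lines(lines):
    """Parse rows of 'weight internal_id feature_name' into a dict by name."""
    weights = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank rows
        # A row with fewer than three fields raises ValueError here.
        weight, internal_id, name = line.split(None, 2)
        weights[name] = FeatureWeight(float(weight), internal_id, name)
    return weights

def weight_of(weights, name):
    """Features missing from the file have weight 0, so default to 0.0."""
    entry = weights.get(name)
    return entry.weight if entry is not None else 0.0

# Made-up rows for illustration.
sample = ["0.25 0 feature_a", "-1.5 3 feature_b\n"]
model = parse_model_lines(sample)
print(weight_of(model, "feature_b"), weight_of(model, "feature_c"))
```

To read a downloaded model, pass an open file object to `parse_model_lines` in place of the sample list.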
Job List Operations
- Terminate: stop a job that is currently running or queued. After termination, the job results and job logs are not uploaded to the specified BOS path.
- Clone: copy the configuration of an existing job into the create job page.
- Delete: delete the job. If the job is still queued or running at the time of deletion, it is terminated first and then deleted. After deletion, the job disappears from the job list.
- View job details: click the job name to open the job details and view job information, parameter information and cluster information.
- View operation details: click the job name and select the operation details tab to view the operation status, start and end time, log details, operation curve, etc.
View Job Results
After the job runs successfully, the model and parameter-tuning log are stored at the BOS addresses given by the model output path and log output path specified during job configuration. Go to BOS to view or download the model and log.
The job model and job log may not be saved in the following cases:
- The job is terminated manually
- The job times out and is terminated automatically
- The job fails to run
A job may fail for the following reasons:
- The parameter-tuning training data, parameter-tuning test data, or final model training data does not match the selected data format
- The BOS address of the parameter-tuning training data, parameter-tuning test data, or final model training data does not exist or is not accessible
- The bucket for the output log or model does not exist or is not accessible
- Training times out