AutoML Job

Last Updated: 2020-10-16

AutoML simplifies the complicated process of algorithm modeling and hyperparameter tuning. It searches for hyperparameters automatically and then builds a high-precision model, which saves labor and lowers the barrier to entry for machine learning.

Create a Job

Select "Training-->AutoML job" in the left navigation bar to open the AutoML job list page, then click the "Create job" button to start the job creation process.

Logistic Regression

Logistic regression automatically selects the optimal training hyperparameters using the tuning ("debugging parameter") training data and tuning test data supplied by the user. It then trains the final model on the model training data and outputs it to the user.

In one tuning experiment, the tuning algorithm selects a group of hyperparameters according to the user-specified parameter ranges and the platform's tuning rules. Using these hyperparameters and the tuning training/test data sets provided by the user, it completes one round of model training and evaluation. The tuning algorithm runs many such experiments and finally selects the hyperparameters of the best-performing experiment to train the model.

The tuning training data, the tuning test data, and the final model training data must each contain exactly 2 label classes. Currently, automatic hyperparameter tuning for logistic regression supports only binary classification. If the data is multi-class, the job may fail.
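
The experiment loop described above can be sketched as follows. This is a minimal illustration, not the platform's implementation: the platform's actual selection rule is not documented here, so plain random search is swapped in, and `tune`, `train_fn`, `eval_fn`, and `sample_log_uniform` are hypothetical names.

```python
import math
import random


def sample_log_uniform(rng, low, high):
    """Draw a value between low and high, uniformly on a log scale."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def tune(train_fn, eval_fn, l1_range, l2_range, iter_range, n_trials, seed=0):
    """Run n_trials tuning experiments and return the best hyperparameters.

    train_fn(params) -> model   trains on the tuning training data (assumed)
    eval_fn(model)   -> float   scores on the tuning test data, higher is better
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        # Assumption: random search stands in for the platform's own rule.
        params = {
            "l1": sample_log_uniform(rng, *l1_range),
            "l2": sample_log_uniform(rng, *l2_range),
            "iterations": rng.randint(*iter_range),
        }
        score = eval_fn(train_fn(params))
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

The returned hyperparameters would then be used once more to train the final model on the model training data.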
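
Because a multi-class data set may cause the job to fail, it can help to count the label classes before uploading. A minimal sketch, assuming the labels have already been extracted from each data set into a sequence (`check_binary_labels` is a hypothetical helper, not part of the platform):

```python
def check_binary_labels(labels):
    """Return the set of distinct labels, or raise if there are not exactly 2."""
    distinct = set(labels)
    if len(distinct) != 2:
        raise ValueError(
            f"expected exactly 2 label classes, found {len(distinct)}"
        )
    return distinct
```

The same check would be run separately on the tuning training data, the tuning test data, and the model training data.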

Configuration Instructions:

Configuration name	Required	Description
Job name	Yes	May contain only letters, numbers, - or _; must start with a letter and be fewer than 40 characters long
Algorithm or framework	Yes	Select Logistic Regression
Send SMS at the end of job	Yes	A text message (SMS) is sent by default
L1 regularization coefficient range	Yes	Floating-point numbers greater than 0 and less than 1; scientific notation is supported
L2 regularization coefficient range	Yes	Floating-point numbers greater than 0 and less than 1; scientific notation is supported
Number of iterations in a single test	Yes	A positive integer from 10 to 200. In each test, the number of iterations of the algorithm is selected within this range. One pass over all the tuning training data is called a round, or an epoch
Test times	Yes	A positive integer from 10 to 100; this many tests are run in total. Each test selects a group of hyperparameters, trains a model on the tuning training data, and evaluates it on the tuning test data. After all tests are complete, the platform selects the optimal hyperparameters and trains the final model on the model training data
Input data format	Yes	Options: sparse without weight values, sparse with weight values, and dense data. See the algorithm format requirements on the page for details
Training data path for debugging parameter	Yes	Location of the tuning training data, which is used in each test to train a model with one group of hyperparameters
Parameter test data path	Yes	Location of the tuning test data, which is used in each test to evaluate the model trained in that test
Model training data path	Yes	Location of the model training data. AutoLR selects the group of hyperparameters with the best evaluation result across all tests and trains the final model on this data
Output path	Yes	The path where the model and log are stored. After the job succeeds, the model is stored under /{job_id}/model and the log under /{job_id}/log
Computing resources	Yes	Currently, only BML clusters are supported
Resource package	Yes	Currently, only the CPU instance with 8 cores and 32 GB memory is supported
Number of instances	Yes	2-4
Maximum running time	Yes	If the job reaches the maximum running time, BML automatically forces it to stop, which may cause the job to fail
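
The constraints in the table can be checked locally before a job is submitted. The following is a rough sketch only: the dictionary keys are hypothetical names chosen for illustration (the console does not expose them), and it assumes the coefficient and iteration settings are entered as (low, high) bounds.

```python
import re

# Letters, digits, - or _; starts with a letter; fewer than 40 characters.
JOB_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_-]{0,38}")


def validate_config(cfg):
    """Return the names of settings that violate the documented limits."""
    errors = []
    if not JOB_NAME_RE.fullmatch(cfg["job_name"]):
        errors.append("job_name")
    for key in ("l1_range", "l2_range"):
        low, high = cfg[key]
        # Coefficients must be greater than 0 and less than 1.
        if not (0 < low <= high < 1):
            errors.append(key)
    low, high = cfg["iterations_range"]
    if not (10 <= low <= high <= 200):
        errors.append("iterations_range")
    if not (10 <= cfg["test_times"] <= 100):
        errors.append("test_times")
    if not (2 <= cfg["instances"] <= 4):
        errors.append("instances")
    return errors
```

An empty list means every checked setting is within the limits listed above; paths, resource package, and running time are not checked here.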

Example configuration:

The training data is the SUSY data set downloaded from the Internet. A comma has been added at the beginning of each row (sed -i s/^/,/g yourfile), and the data has been split and stored on the public BOS. After downloading the data, you can split it into tuning training, tuning test, and model training data and store these on your own BOS, or you can use our public BOS data directly for training.

Input data format: Dense data
Training data path for debugging parameter: bos:/bml-public/automl-demo/data/susy-train/
Parameter test data path: bos:/bml-public/automl-demo/data/susy-test/
Model training data path: bos:/bml-public/automl-demo/data/susy-all/
Model output path and log output path: configure your own BOS path.
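
For readers preparing their own copy of the data, the preprocessing described above might look like the sketch below. It is an assumption-laden illustration: `prepend_comma` mirrors the sed command, while `split_rows`, the 20% test fraction, and the shuffling are hypothetical choices, since the split actually used for the public data is not stated.

```python
import random


def prepend_comma(lines):
    """Mirror `sed -i s/^/,/g`: add a comma at the start of every row."""
    return ["," + line for line in lines]


def split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle rows and split them into (tuning_train, tuning_test)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]
```

Each resulting list would then be written to a file and uploaded to the matching BOS path.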

Click "OK" to submit the job.

Notes for Model Output Format:

The output model consists mainly of the weight parameter for each feature dimension in the Logistic Regression model.
The output is in plain text format. Each row represents one feature dimension and has three space-separated fields: the weight parameter of the feature, the internal ID of the feature in the tuning algorithm, and the original name of the feature.
Only features whose weight parameter is not 0 are output.
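
A downloaded model file in this three-field format could be read with a few lines of code. This is a minimal sketch: `parse_model` is a hypothetical helper, and the internal ID is kept as a string because its type is not specified.

```python
def parse_model(lines):
    """Map each feature name to its (weight, internal_id) pair."""
    weights = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Fields: weight, internal ID, original feature name.
        weight, internal_id, name = line.split(None, 2)
        weights[name] = (float(weight), internal_id)
    return weights
```

Features missing from the result have a weight of 0, since zero-weight features are not written to the file.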

Job List Related Operation

Terminate: stop a job that is currently running or queued; it will no longer be queued or run. After termination, the job results and job logs are not uploaded to the specified BOS path.
Clone: copy the configuration of a job into the create job page.
Delete: delete the job. If the job is still queued or running at the time of deletion, it is terminated first and then deleted. After deletion, the job disappears from the job list.
View job details: click the job name to open the job details and view job information, parameter information, and cluster information.
View operation details: click the job name and select the operation details tab to view the operation status, start and end times, log details, operation curve, etc.

View Job Results

After the job runs successfully, the model and tuning log are stored at the corresponding BOS address, according to the model output path and log output path specified during job configuration. Go to BOS to view or download the job model and log.

The job model and job log are not saved in the following cases:

The job is terminated manually
The job times out and is terminated automatically
The job fails to run

A job may fail for the following reasons:

The tuning training data, tuning test data, or final model training data does not match the selected data format
The BOS address of the tuning training data, tuning test data, or final model training data does not exist or is not accessible
The bucket for the output log/model does not exist or is not accessible
Training times out
