          Baidu Machine Learning

          AutoML Job

          AutoML simplifies the complicated process of algorithm modeling and parameter tuning. It searches for hyper-parameters automatically and then builds a high-precision model, which saves labor and lowers the barrier to entry for machine learning.

          Create a Job

          Select "Training-->AutoML job" in the left navigation bar to open the AutoML job list page, then click the "Create job" button to start creating a job.

          Logistic Regression

          Logistic regression automatically selects the optimal training hyper-parameters using the parameter tuning training data and parameter tuning test data supplied by the user. It then trains a model on the model training data and outputs the trained model to the user.

          In one parameter tuning experiment, the tuning algorithm selects a group of hyper-parameters according to the user-specified parameter ranges and the platform's tuning rules. Using these hyper-parameters and the tuning training and test data sets configured by the user, it trains and evaluates one model. The tuning algorithm runs many such experiments and finally uses the hyper-parameters of the best-performing experiment to train the model.
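The tuning loop described above can be sketched as follows. This is only an illustration that assumes plain random search; the platform's own selection rules are not described in this document. The names `sample_log_uniform`, `tune`, and `train_and_evaluate` are hypothetical, and the scoring function is supplied by the caller.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw a value in [low, high], uniformly on a log scale."""
    value = math.exp(rng.uniform(math.log(low), math.log(high)))
    # Clamp to guard against floating-point rounding at the bounds.
    return min(max(value, low), high)


def tune(train_and_evaluate, l1_range, l2_range, iter_range, n_trials, seed=0):
    """Run n_trials experiments and return (best_score, best_params).

    train_and_evaluate(params) is a caller-supplied function (an assumption
    of this sketch) that trains on the tuning training data and returns a
    score on the tuning test data, where higher is better.
    """
    rng = random.Random(seed)
    best_score, best_params = None, None
    for _ in range(n_trials):
        # Pick one group of hyper-parameters inside the user-specified ranges.
        params = {
            "l1": sample_log_uniform(l1_range[0], l1_range[1], rng),
            "l2": sample_log_uniform(l2_range[0], l2_range[1], rng),
            "iterations": rng.randint(iter_range[0], iter_range[1]),
        }
        score = train_and_evaluate(params)
        if best_score is None or score > best_score:
            best_score, best_params = score, params
    return best_score, best_params
```

In this sketch, the returned `best_params` would then be used once more to train the final model on the model training data.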

          The parameter tuning training data, the parameter tuning test data, and the final model training data must each contain exactly 2 label classes. Automatic hyper-parameter logistic regression currently supports only binary classification. If the data has more than two classes, the job may fail.
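Because a multi-class data set may cause the job to fail, it can help to check the labels before uploading. The sketch below assumes the labels have already been extracted into a Python iterable; how they are read from a file depends on the chosen input data format, which is not covered here. `check_binary_labels` is a hypothetical helper name.

```python
def check_binary_labels(labels):
    """Return the set of distinct labels, or raise if there are not exactly two."""
    distinct = set(labels)
    if len(distinct) != 2:
        raise ValueError(
            "expected exactly 2 label classes, found %d" % len(distinct)
        )
    return distinct
```

Running the same check on all three data sets (tuning training, tuning test, and model training) covers the requirement stated above.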

          Configuration Instructions:

          Configuration name | Required | Description
          Job name | Yes | Can contain only numbers, letters, - or _, must start with a letter, and must be fewer than 40 characters long
          Algorithm or framework | Yes | Select Logistic Regression
          Send SMS at the end of job | Yes | An SMS is sent by default
          L1 regularization coefficient range | Yes | Floating-point numbers greater than 0 and less than 1; scientific notation is supported
          L2 regularization coefficient range | Yes | Floating-point numbers greater than 0 and less than 1; scientific notation is supported
          Number of iterations in a single test | Yes | A positive integer from 10 to 200. In each test, the number of iterations of the algorithm is selected within this range. One pass over all the parameter tuning training data is called a round, also known as an epoch
          Test times | Yes | A positive integer from 10 to 100; this is the total number of tests to run. Each test selects a group of hyper-parameters, trains a model with them on the parameter tuning training data, and evaluates that model on the parameter tuning test data. After all tests are completed, the platform selects the optimal hyper-parameters and outputs the final model trained on the model training data
          Input data format | Yes | Options: sparse without weight value, sparse with weight value, and dense data. See the algorithm format requirements on the page for details
          Training data path for debugging parameter | Yes | Path that stores the parameter tuning training data. Each test uses this data to train a model with one group of hyper-parameters
          Parameter test data path | Yes | Path that stores the parameter tuning test data. Each test uses this data to evaluate the model trained in that test
          Model training data path | Yes | Path that stores the model training data. AutoLR selects the group of hyper-parameters with the best evaluation result across all tests and trains the final model on this data
          Output path | Yes | Path that stores the model and log. After the job succeeds, the model is stored under /{job_id}/model and the log under /{job_id}/log
          Computing resources | Yes | Currently, only BML clusters are supported
          Resource package | Yes | Currently, only the CPU instance with 8 cores and 32 GB of memory is supported
          Number of instances | Yes | 2-4
          Maximum running time | Yes | If the job reaches the maximum running time, BML forcibly stops it, which may cause the job to fail
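The numeric and naming constraints in the configuration list can be checked locally before a job is submitted. The following is a minimal sketch, not part of BML: the dictionary keys (`job_name`, `l1_range`, `l2_range`, `iterations_range`, `trials`, `instances`) are hypothetical names chosen for illustration, and it assumes each range is given as a `(low, high)` pair.

```python
import re

# Starts with a letter, then letters, digits, "-" or "_"; fewer than 40 characters.
JOB_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_-]{0,38}")


def validate_config(cfg):
    """Return the list of configuration keys that violate the documented limits."""
    problems = []
    if not JOB_NAME_RE.fullmatch(cfg["job_name"]):
        problems.append("job_name")
    # Regularization coefficients: greater than 0 and less than 1.
    for key in ("l1_range", "l2_range"):
        low, high = cfg[key]
        if not (0 < low <= high < 1):
            problems.append(key)
    # Iterations per test: integers from 10 to 200.
    low, high = cfg["iterations_range"]
    if not (isinstance(low, int) and isinstance(high, int) and 10 <= low <= high <= 200):
        problems.append("iterations_range")
    # Number of tests: integer from 10 to 100.
    if not (isinstance(cfg["trials"], int) and 10 <= cfg["trials"] <= 100):
        problems.append("trials")
    # Number of instances: 2 to 4.
    if not (isinstance(cfg["instances"], int) and 2 <= cfg["instances"] <= 4):
        problems.append("instances")
    return problems
```

An empty list means the values fall within the documented limits; the platform itself may apply further checks that this sketch does not know about.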

          Example configuration:

          The training data is the SUSY data set downloaded from the Internet. A comma has been added at the beginning of each row of data (sed -i s/^/,/g yourfile), and the data has been split and stored on the public BOS. After downloading the data, you can split it into parameter tuning training data, parameter tuning test data, and model training data and store them on your own BOS, or you can use our public BOS data directly for training.

          • Input data format: Dense data
          • Debugging parameter training data path: bos:/bml-public/automl-demo/data/susy-train/
          • Parameter test data path: bos:/bml-public/automl-demo/data/susy-test/
          • Model training data path: bos:/bml-public/automl-demo/data/susy-all/
          • Model output path and log output path: configure your own BOS path
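For readers preparing their own copy of the data, the comma-prefix step and the split can be sketched in Python. This is an illustration only: the 20% test fraction, the fixed shuffle seed, and the choice to use all rows as the model training data are assumptions of this sketch, and `prepend_comma` and `split_rows` are hypothetical helper names.

```python
import random


def prepend_comma(rows):
    """Put a comma at the start of every row, like the sed command in the example."""
    return ["," + row for row in rows]


def split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle rows and return (tuning_train, tuning_test, all_rows)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    # The first n_test rows form the tuning test set; the rest are for tuning training.
    return shuffled[n_test:], shuffled[:n_test], shuffled
```

Each of the three returned lists would then be written to its own file and uploaded to the corresponding BOS path.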

          Click "OK" to submit the job.

          Notes for Model Output Format:

          • The output model consists mainly of the weight parameter for each feature dimension of the Logistic Regression model.
          • The output is in plain text format. Each row represents one feature dimension and contains three fields separated by spaces: the weight parameter of the feature, the internal ID of the feature in the parameter tuning algorithm, and the original name of the feature.
          • Only features whose weight parameter is not 0 are output.

          Job operations:

          • Terminate: terminate a job that is currently running or queued, so that it leaves the queue or stops running. After termination, the job results and job logs are not uploaded to the specified BOS path.
          • Clone: copy the configuration items of a job into the create job page.
          • Delete: delete the job. If the job is still queued or running at the time of deletion, it is first terminated and then deleted. After deletion, the job disappears from the job list.
          • View job details: click the job name to open the job details and view job information, parameter information, and cluster information.
          • View operation details: click the job name and select the operation details tab to view the operation status, start and end time, log details, operation curve, etc.

          View Job Results

          After the job runs successfully, the model and the parameter tuning log are stored at the corresponding BOS address, according to the model output path and log output path specified during job configuration. Users need to go to BOS to view or download the job model and log.
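Once the model file has been downloaded from BOS, its rows can be read using the three-field layout given in the notes on model output format (weight, internal ID, original feature name). The sketch below is a minimal parser under that description. `parse_model_lines` is a hypothetical name, and the internal ID is kept as a string because its exact type is not stated.

```python
def parse_model_lines(lines):
    """Parse rows of 'weight internal_id feature_name' into a list of dicts."""
    features = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank rows
        # Split on whitespace into at most three fields.
        weight, internal_id, name = line.split(None, 2)
        features.append(
            {"weight": float(weight), "internal_id": internal_id, "name": name}
        )
    return features
```

Sorting the result by `abs(feature["weight"])` is one way to see which features carry the largest weights in the trained model.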

          The job model and job log are not saved in the following cases:

          • The job was terminated manually
          • The job timed out and was terminated automatically
          • The job failed to run

          A job may fail for the following reasons:

          • The parameter tuning training data, parameter tuning test data, or final model training data does not match the selected input data format
          • The BOS address of the parameter tuning training data, parameter tuning test data, or final model training data does not exist or is not accessible
          • The bucket for the output log or model does not exist or is not accessible
          • Training timed out