Machine Learning (BML)

Machine Learning (BML) is an end-to-end machine learning platform for enterprises and AI developers. It provides one-stop data pre-processing, model training and evaluation, service deployment, and more.

Overview

BML, the AI development platform of Baidu AI Cloud, is an end-to-end AI development and deployment platform. With BML, users can carry out data pre-processing, model training and evaluation, service deployment, and other tasks in one place. The platform provides a high-performance cluster training environment, a wide range of algorithm frameworks and model examples, and easy-to-use prediction service tools, so users can focus on their models and algorithms and achieve strong model and prediction results.

Features

Working area

A fully hosted interactive programming environment for data processing and code debugging.

Click-to-run interactive Jupyter environment

The fully hosted Jupyter environment comes with several built-in algorithm frameworks and software libraries, so you can start using it with one click and no configuration. CPU instances also let you install third-party software libraries and customize the environment, giving you flexibility in how you work.

Provide GPU resources

The Jupyter environment in the working area provides GPU computing resources, so you can complete lightweight data processing and training quickly and efficiently while staying ready to scale up to large training jobs.

Automatically synchronize BOS data

Training data stored in Baidu Object Storage (BOS) is loaded automatically, and data in the container is synchronized back to BOS.
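Synchronizing container data to BOS can be pictured as deciding which local files differ from what the bucket already holds. The sketch below is an assumption about how such a check might work, not BML's actual mechanism: it compares MD5 digests of local files against a hypothetical remote_index snapshot ({object_key: md5_hex}) and returns the keys that would need uploading. The names plan_upload, file_md5, and remote_index are illustrative, and the real BOS listing and upload calls are deliberately left out.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plan_upload(local_dir: str, remote_index: dict[str, str]) -> list[str]:
    """Return object keys under local_dir that are missing from or differ in remote_index.

    remote_index is a hypothetical {object_key: md5_hex} snapshot of the bucket
    prefix; how it is fetched, and the upload itself, are out of scope here.
    """
    root = Path(local_dir)
    changed = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        key = path.relative_to(root).as_posix()  # object keys use "/" separators
        if remote_index.get(key) != file_md5(path):
            changed.append(key)
    return changed
```

Only the returned keys would then be uploaded, which keeps repeated synchronizations cheap when most files are unchanged.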

Training

Support for multiple deep learning and machine learning frameworks lets you launch large-scale training jobs with one click.

Support several deep learning/machine learning frameworks

Supports several deep learning frameworks, including TensorFlow, PyTorch, and PaddlePaddle, as well as the RAPIDS cuML machine learning library. Once your code is ready, you can launch a job with one click.

AutoDL/AutoML

Supports automatic image classification and hyperparameter optimization for logistic regression. You only need to provide the training data and parameters; the platform then trains and continuously optimizes the model, maximizing training efficiency and effectiveness.
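To make hyperparameter optimization concrete, here is a minimal random-search sketch for a one-feature logistic regression in plain Python. It is not BML's AutoML algorithm, which is not described here; the search space (learning rate and L2 strength, both sampled log-uniformly), the trial count, and all function names are assumptions for illustration. Each trial trains on one split, is scored by log loss on a held-out split, and the lowest-loss setting is kept.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(xs, ys, lr, l2, epochs=200):
    """Fit y ~ sigmoid(w*x + b) by batch gradient descent with an L2 penalty on w."""
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            gw += err * x
            gb += err
        w -= lr * (gw / n + l2 * w)
        b -= lr * gb / n
    return w, b


def log_loss(xs, ys, w, b, eps=1e-12):
    """Mean binary cross-entropy of the fitted model on (xs, ys)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(sigmoid(w * x + b), eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(xs)


def random_search(train, valid, n_trials=20, seed=0):
    """Sample hyperparameters, train on `train`, keep the lowest validation loss."""
    rng = random.Random(seed)  # seeded so the search is reproducible
    best_loss, best_params = math.inf, None
    for _ in range(n_trials):
        params = {
            "lr": 10 ** rng.uniform(-3, 0),  # learning rate in [1e-3, 1]
            "l2": 10 ** rng.uniform(-4, 0),  # L2 strength in [1e-4, 1]
        }
        w, b = train_logreg(*train, **params)
        loss = log_loss(*valid, w, b)
        if loss < best_loss:
            best_loss, best_params = loss, params
    return best_loss, best_params


train = ([-2, -1.5, -1, -0.5, 0.5, 1, 1.5, 2], [0, 0, 0, 0, 1, 1, 1, 1])
valid = ([-1.2, -0.3, 0.4, 1.3], [0, 0, 1, 1])
loss, params = random_search(train, valid)
```

Random search is used here only because it is the simplest correct tuner to show; managed AutoML services often use smarter strategies such as Bayesian optimization or early stopping.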

Large-scale distributed training

Provides a variety of CPU and GPU packages and supports multi-machine, multi-GPU training. A single machine can use up to 8 NVIDIA Tesla V100 GPUs.
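In a multi-machine, multi-card job, each GPU process usually gets a global rank derived from its machine (node) index and its local GPU index. The sketch below uses the common convention global_rank = node_rank * gpus_per_node + local_rank, as used for example by PyTorch's distributed launchers when every node runs the same number of processes; whether BML training jobs assign ranks exactly this way is an assumption. The 8-GPU cap mirrors the per-machine limit stated above, and the names Worker, assign_ranks, and global_batch_size are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

MAX_GPUS_PER_NODE = 8  # per-machine limit stated in the text above


@dataclass(frozen=True)
class Worker:
    node_rank: int    # index of the machine
    local_rank: int   # index of the GPU within that machine
    global_rank: int  # unique index across the whole job


def assign_ranks(num_nodes: int, gpus_per_node: int) -> list[Worker]:
    """Enumerate one worker per GPU, numbering node by node."""
    if num_nodes < 1:
        raise ValueError("need at least one node")
    if not 1 <= gpus_per_node <= MAX_GPUS_PER_NODE:
        raise ValueError(f"gpus_per_node must be in 1..{MAX_GPUS_PER_NODE}")
    return [
        Worker(node, gpu, node * gpus_per_node + gpu)
        for node in range(num_nodes)
        for gpu in range(gpus_per_node)
    ]


def global_batch_size(per_gpu_batch: int, num_nodes: int, gpus_per_node: int) -> int:
    """With data parallelism, each step consumes one micro-batch per GPU."""
    return per_gpu_batch * num_nodes * gpus_per_node
```

For example, two 8-GPU machines yield 16 workers, worker 9 is GPU 1 on machine 1, and a per-GPU batch of 32 gives a global batch of 512 samples per step.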

Prediction

Supports beta (canary) releases of models and provides efficient, low-latency prediction services.

Support several frameworks

Supports several prediction service frameworks, including TensorRT, PaddlePaddle, and Anakin (a prediction framework deeply optimized on the basis of PaddlePaddle).

Prediction model library

Pairs model data with its runtime environment (container image), and lets you manage (add, delete, and modify) deployable prediction models and their versions.
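A prediction model library essentially maps each model name and version to its model data plus the container image that can serve it. The in-memory class below is a hypothetical sketch of that bookkeeping (add, modify, and delete versions); ModelRegistry, its methods, and the example paths and image names are illustrative and are not BML's API. A real library would also persist this state.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ModelVersion:
    artifact_uri: str     # where the model data is stored
    container_image: str  # runtime environment able to load that data


class ModelRegistry:
    """In-memory catalog of deployable models and their versions."""

    def __init__(self) -> None:
        self._models: dict[str, dict[int, ModelVersion]] = {}
        self._last_version: dict[str, int] = {}  # version numbers are never reused

    def add_version(self, name: str, artifact_uri: str, container_image: str) -> int:
        version = self._last_version.get(name, 0) + 1
        self._last_version[name] = version
        self._models.setdefault(name, {})[version] = ModelVersion(artifact_uri, container_image)
        return version

    def update_version(self, name: str, version: int, *,
                       artifact_uri: str | None = None,
                       container_image: str | None = None) -> None:
        entry = self._models[name][version]  # KeyError for an unknown model/version
        if artifact_uri is not None:
            entry.artifact_uri = artifact_uri
        if container_image is not None:
            entry.container_image = container_image

    def delete_version(self, name: str, version: int) -> None:
        versions = self._models[name]
        del versions[version]
        if not versions:
            del self._models[name]  # drop the model once its last version is gone

    def get(self, name: str, version: int) -> ModelVersion:
        return self._models[name][version]

    def list_versions(self, name: str) -> list[int]:
        return sorted(self._models.get(name, {}))
```

Never reusing a deleted version number means a version ID always refers to one specific model build, which avoids confusion when versions are compared in A/B tests.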

Resource management

Configure cluster resources for service endpoints, monitor services in the production environment, and adjust service resources online without interrupting service availability.

A/B Test

An endpoint can serve different versions of a model, so customers can evaluate how effective each version is.

Load management

Controls how traffic is distributed across endpoints, providing an effective mechanism for beta testing new models, load balancing, and service quality control.
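Beta (canary) releases and A/B tests both come down to sending a configurable share of traffic to each model version. The function below is a minimal sketch of weighted routing, assuming integer weights per version and a stable request or user key; pick_version and the version names are hypothetical, and hashing the key with SHA-256 is one design choice among several, not necessarily how BML routes traffic.

```python
from __future__ import annotations

import hashlib
from bisect import bisect_right
from itertools import accumulate


def pick_version(routing_key: str, weights: dict[str, int]) -> str:
    """Map a routing key to a model version with probability proportional to its weight.

    Hashing the key (rather than drawing a random number) sends the same
    user or request ID to the same version every time, keeping A/B groups stable.
    """
    if not weights or any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-empty and non-negative")
    names = sorted(weights)  # fixed order so buckets map consistently
    cumulative = list(accumulate(weights[n] for n in names))
    total = cumulative[-1]
    if total == 0:
        raise ValueError("at least one weight must be positive")
    digest = hashlib.sha256(routing_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % total  # 0 <= bucket < total
    return names[bisect_right(cumulative, bucket)]  # zero-weight versions are never chosen
```

With weights {"v1": 90, "v2": 10}, roughly 10% of keys land on v2; raising v2's weight step by step shifts traffic to the new model gradually, and setting it back to 0 withdraws it.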

Product Positioning


Working area
User type: citizen data scientists, who focus on usability and duration
Pain points:
  • High cost: high-value resources are occupied for long periods.
  • Complex software environment: you must configure the development environment and install the appropriate software yourself.
Features:
  • A ready-to-use IDE development environment based on the latest JupyterLab.
  • The working area environment can be saved for extended periods and restarts within seconds.
  • Several code examples are provided for reference.

Training
User type: expert data scientists, who focus on performance and resource utilization
Pain points:
  • High cost and complex engineering: the hardware and systems are complicated to build and expensive, and configuring a high-performance cluster is complicated.
  • Fast asset depreciation: technology and systems are updated rapidly.
  • Low resource utilization: each person occupies dedicated physical resources that are hard to share, so utilization stays low.
Features:
  • A ready-built cluster training environment.
  • Job scheduling improves resource utilization.
  • Pay-as-you-go billing covers a range of GPU computing types, from high-end to low-end.

Prediction
User type: business management and operations personnel, who focus on performance and resource utilization
Pain points:
  • No well-proven tools or methodologies are available for deployment.
  • There is no effective way to manage the running status and mechanisms of multiple model versions.
  • The manual launch and deployment process is complicated.
Features:
  • Prediction model management.
  • Controlled model launch, with support for beta releases and traffic distribution.
  • Automated deployment of model configurations, with rollback in case of errors.
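The automated configuration deployment with rollback listed under Prediction can be sketched as apply, check, and revert. In the sketch below, apply and healthy are hypothetical callbacks standing in for whatever actually pushes a configuration to the serving cluster and probes it; the function name deploy_with_rollback and the dict-based configuration are assumptions, not BML's interface.

```python
from __future__ import annotations

from typing import Any, Callable, Dict

Config = Dict[str, Any]


def deploy_with_rollback(current: Config, new: Config,
                         apply: Callable[[Config], None],
                         healthy: Callable[[], bool]) -> Config:
    """Try `new`; if applying it or the health check fails, restore `current`.

    Returns the configuration that is live when the function finishes.
    """
    try:
        apply(new)
        if healthy():
            return new
    except Exception:  # any error while applying or checking counts as a failed rollout
        pass
    apply(current)  # roll back to the last known-good configuration
    return current
```

Keeping the last known-good configuration as an explicit argument makes the rollback target unambiguous even when several deployments happen in a row.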

Advantages

Easy to Get Started
The click-to-run Jupyter environment comes with several common frameworks built in, so there is no environment to configure. It also supports several automated (Auto) algorithms, eliminating tedious work such as programming and hyperparameter tuning.
One-stop Development and Deployment
By clicking in the console or calling the API, you can launch a training task, obtain the trained model, and start the prediction service in a single workflow that covers the whole AI development and deployment process.
Flexible
CPU instances let you install third-party software libraries and customize the environment, and pay-as-you-go billing supports GPU computing types ranging from high-end to low-end.
High Performance
Product resources run on container technology for fast startup and release. Multi-machine, multi-GPU distributed training and support for very large enterprise-scale datasets can significantly shorten development time.

Related Products

Cloud Container Engine

An elastic, highly available container cluster management platform

Object Storage

A stable, secure, efficient, and highly scalable cloud storage service

Cloud Disk Storage

A flexible, stable block storage service with easily scalable IOPS