          Baidu Machine Learning

          Data Set

          The data set module is used to upload, manage and pre-process the data used in modeling. It covers user data and public data. User data are the data you upload, and public data are the open-source data sets provided by the platform. Both will continue to be updated.

          Visual modeling requires data sets: to build a model by dragging and dropping components, you must first upload the data to a data set. Currently, only table-format data are supported. The platform converts CSV, TXT and TSV files into Parquet format, applies simple preprocessing, and saves the result in your BOS. You can then use the data in visual modeling.

          Note: The data set module itself is currently free of charge. However, because the data files are saved in BOS, BOS fees are incurred.

          User Data

          Data set list

          The data set list page displays the name, type, status, data volume (for table-format data, the number of rows), creation time, update time and available operations for each data set.

          Create a data set task and upload data

          Click the "Create Data Set" button to open the "Create Data Set" window. Fill in the data set name. The data storage path is filled in with a default value, but you can also select a different BOS path. Click "Confirm" to create the data set task.

          Upload the data to BOS in advance. This example uses the open-source iris data set. The iris.csv file has no header, and its column separator is the half-width comma (,).

          Then click the [Upload] button to go to the [Upload Data] page, and fill in the upload configuration: the upload option (supplement uploads new data or adds data of the same dimension; replacement replaces the existing data with data of a different dimension), the upload mode (currently, only uploading from BOS is supported), the upload path, the column separator, whether the file has a header, and the encoding format.
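          The upload settings above can be pictured with a small local sketch. It is not the platform's code or API: the UploadConfig class, its field names, the generated col_N column names and the reading of "same dimension" as "same number of columns" are all assumptions made for illustration. The two sample rows follow the headerless, comma-separated iris.csv layout.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class UploadConfig:
    """Hypothetical mirror of the Upload Data form fields (not a platform API)."""
    option: str = "supplement"   # "supplement" or "replacement"
    separator: str = ","         # column separator
    has_header: bool = False     # whether the first row holds column names
    encoding: str = "utf-8"      # encoding format of the file


def read_table(raw: bytes, cfg: UploadConfig):
    """Decode a delimited file and return (column_names, rows)."""
    text = raw.decode(cfg.encoding)
    reader = csv.reader(io.StringIO(text), delimiter=cfg.separator)
    rows = [row for row in reader if row]  # skip blank lines
    if cfg.has_header and rows:
        return rows[0], rows[1:]
    # No header: generate placeholder names (an assumption of this sketch).
    width = len(rows[0]) if rows else 0
    return [f"col_{i}" for i in range(width)], rows


def check_upload(existing_columns, new_columns, cfg: UploadConfig):
    """Supplement needs the same column count; replacement accepts any."""
    if cfg.option == "supplement" and existing_columns is not None:
        if len(existing_columns) != len(new_columns):
            raise ValueError("supplement needs data of the same dimension")
    return new_columns


sample = b"5.1,3.5,1.4,0.2,Iris-setosa\n4.9,3.0,1.4,0.2,Iris-setosa\n"
columns, rows = read_table(sample, UploadConfig())
print(columns, len(rows))
```

          In this sketch, a supplement upload with a different number of columns is rejected, while a replacement upload is accepted as-is.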

          Click [Next] to configure data preprocessing. Select how abnormal data should be handled. You can also modify the column names and data formats here.
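          As a rough illustration of what this preprocessing step does, the sketch below renames columns, converts each value to a chosen type, and applies a handling mode to rows that fail to convert. The page does not list the available handling modes, so the "drop" and "null" modes, the preprocess function and the column names are hypothetical, and the "n/a" value is made up to trigger the abnormal case.

```python
def preprocess(rows, names, types, on_error="drop"):
    """Cast each cell to its column type and label it with its column name.

    on_error is a hypothetical stand-in for the abnormal-data handling mode:
    "drop" skips a row containing a value that fails to convert,
    "null" keeps the row and stores None for that value.
    """
    cleaned = []
    for row in rows:
        converted = []
        bad = False
        for value, cast in zip(row, types):
            try:
                converted.append(cast(value))
            except ValueError:
                bad = True
                converted.append(None)
        if bad and on_error == "drop":
            continue
        cleaned.append(dict(zip(names, converted)))
    return cleaned


# Column names chosen for this example; iris.csv itself has no header.
names = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]
types = [float, float, float, float, str]
raw_rows = [
    ["5.1", "3.5", "1.4", "0.2", "Iris-setosa"],
    ["4.9", "n/a", "1.4", "0.2", "Iris-setosa"],  # made-up abnormal value
]

dropped = preprocess(raw_rows, names, types, on_error="drop")
nulled = preprocess(raw_rows, names, types, on_error="null")
print(len(dropped), len(nulled))
```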

          Details of data set

          When the data set status changes to successful, click the data set name to open the data set details page. Switch between the tabs to view the basic information, original data and statistical data.

          The statistical data are simple summary statistics of the data set, including the number of unique values, number of missing values, mean, variance and standard deviation. Drag the slider to view all of them.
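          The statistics listed above can be reproduced locally for a single numeric column, for example to sanity-check a file before uploading it. This is a minimal sketch with a hypothetical column_stats helper. It assumes that None marks a missing value and uses the population variance and standard deviation. The page does not say whether the platform uses the population or the sample formula.

```python
import statistics


def column_stats(values):
    """Summarize one numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    return {
        "unique": len(set(present)),
        "missing": len(values) - len(present),
        "mean": statistics.fmean(present),
        # Population formulas (divide by n); an assumption of this sketch.
        "variance": statistics.pvariance(present),
        "std": statistics.pstdev(present),
    }


stats = column_stats([1.0, 2.0, 2.0, None, 3.0])
print(stats)
```

          The helper expects at least one non-missing value, because the mean of an empty column is undefined.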

          Public Data

          Currently, the preset public data sets are iris and Boston Housing.

          Click the data set name to open its details page, where you can view the basic information, original data, statistical data and user data.

          Raw data:

          Granularity data:
