Baidu AI Cloud
中国站

百度智能云

Data Warehouse

SHOW-LOAD

SHOW LOAD

Description

This statement is used to show the execution of specified load task.

SHOW LOAD
[FROM db_name]
[
    WHERE 
    [LABEL [ = "your_label" | LIKE "label_matcher"]]
    [STATE = ["PENDING"|"LOADING"|"FINISHED"|"CANCELLED"|]]
]
[ORDER BY ...]
[LIMIT limit][OFFSET offset];
  • db_name

    View load jobs under specified database. If not specified, use the current database.

  • LABEL

    LLabel can be accurately matched by equivalent value, or fuzzy matched by Like.

  • STATE

    View load tasks in specified status.

    1. PENDING: load has been submitted, but has not been executed yet.
    2. LOADING: load is in progress.
    3. FINISHED: load is finished successfully.
    4. CANCELLED: load failed.
  • ORDER BY ... LIMIT [OFFSET]

    It supports ordering result sets by any column.

    ORDER BY createtime DESC LIMIT 10,20

The return result of the SHOW LOAD command is as follows:

mysql> show load order by createtime desc limit 1\G
*************************** 1. row ***************************
         JobId: 76391
         Label: label1
         State: FINISHED
      Progress: ETL:N/A; LOAD:100%
          Type: BROKER
       EtlInfo: unselected.rows=4; dpp.abnorm.ALL=15; dpp.norm.ALL=28133376
      TaskInfo: cluster:N/A; timeout(s):10800; max_filter_ratio:5.0E-5
      ErrorMsg: N/A
    CreateTime: 2019-07-27 11:46:42
  EtlStartTime: 2019-07-27 11:46:44
 EtlFinishTime: 2019-07-27 11:46:44
 LoadStartTime: 2019-07-27 11:46:44
LoadFinishTime: 2019-07-27 11:50:16
           URL: http://192.168.1.1:8040/api/_load_error_log?file=__shard_4/error_log_insert_stmt_4bb00753932c491a-a6da6e2725415317_4bb00753932c491a_a6da6e2725415317
    JobDetails: {"Unfinished backends":{"9c3441027ff948a0-8287923329a2b6a7":[10002]},"ScannedRows":2390016,"TaskNumber":1,"All backends":{"9c3441027ff948a0-8287923329a2b6a7":[10002]},"FileNumber":1,"FileSize":1073741824}

The follow mainly introduces that meaning of parameters in the result set return by view load command:

  • JobId

    unique ID of load task, JobId in different load tasks is different, which is automatically generated by the system. Unlike Label, JobId will never be the same, and Label can be duplicated after the load task fails.

  • Label

    Identification of load task.

  • State

    Current stage of the load task. There will be PENDING and LOADING status appearing in the loading process of Broker. If Broker load is in PENDING status, it means that the current load task is waiting to be executed. If in LOADING status, it indicates that it is being executed.

    There are two final stages of the load task: CANCELLED and FINISHED, when the Load job is in these two stages, the load is completed. In which CANCELLED means load failed and FINISHED means load succeeded.

  • Progress

    Description of load task progress. There are two kinds of progress: ETL and LOAD, corresponding to the two stages of the load process: ETL and LOADING. At present, Broker load has only LOADING stage, so ETL will always be displayed as N/A

    The progress range of LOAD is: 0~100%。

    LOAD progress = the number of tables currently loaded/the total number of tables designed for this load task * 100%

    If all load tables are loaded, then the progress of LOAD will show 99% and the load will enter the final effective stage. After the whole load is completed, the progress of LOAD will be changed to 100%.

    The progress of loading is not linear. Therefore, if the progress does not change within a period of time, it does not mean that the load is not being executed.

  • Type

    Type of task loaded. Broker load has only the type BROKER.

  • EtlInfo

    It mainly displays the loaded data quantity indicators unselected.rows , dpp.norm.ALL and dpp.abnorm.ALL. Users can judge how many rows are filtered by the where condition according to the first numerical value, and verify whether the error rate of the current load task exceeds max_filter_ratio by the latter two indicators.

    The sum of three indicators is the total row number of the original data.

  • TaskInfo

    It mainly displays the current load task parameters, that is, the load task parameters specified by the user when creating the Broker load task, including:clustertimeout andmax_filter_ratio.

  • ErrorMsg

    When the load task status is CANCELLED, the reason of failure will be displayed in two parts: type and msg, when the load task is successful, display N/A.

    The value meaning of type:

    USER_CANCEL:task cancelled by user
    ETL_RUN_FAIL:load task failed in ETL phase 
    ETL_QUALITY_UNSATISFIED:data quality is unqualified, that is, the error data rate exceeds max_filter_ratio
    LOAD_RUN_FAIL: load task TIMEOUT failed in LOADING phase:
    TIMEOUT:load task did not complete within timeout period
    UNKNOWN: unknown load error
  • CreateTime/EtlStartTime/EtlFinishTime/LoadStartTime/LoadFinishTime

    These values represent the load create time, ETL stage start time, ETL stage finish time, loading stage start time and the whole load task finish time.

    There is no ETL stage in Broker load, so its ETL starttime, ETL finishtime and loadstarttime are set to the same value.

    When the load task stays at CreateTime for a long time, while LoadStartTime is N/A, it indicates that load task is seriously piled up at present. Users can reduce the frequency of load submission.

    LoadFinishTime - CreateTime = time consumed by the whole load task 
    LoadFinishTime - LoadStartTime = Execution time of the whole Broker load task  = Time consumed by the whole load task-Waiting time of the load task
  • URL

    Load the error data sample of the task, and access the URL address to get the error data sample loaded this time. If there is no error data in this load, the URL field is N/A.

  • JobDetails

    Display the detailed running status of some jobs, including the number of loaded files, the total size (bytes), the number of subtasks, the number of processed original rows, the BE node Id of running subtasks, and the BE node Id of unfinished subtasks.

    {"Unfinished backends":{"9c3441027ff948a0-8287923329a2b6a7":[10002]},"ScannedRows":2390016,"TaskNumber":1,"All backends":{"9c3441027ff948a0-8287923329a2b6a7":[10002]},"FileNumber":1,"FileSize":1073741824}

    In which the number of processed original rows is updated every 5 seconds. The number of rows is only used to show the current progress, which does not represent the final actual number of rows processed. The actual number of rows processed shall be as shown in EtlInfo.

Example

  1. Show all load tasks of the default db

    SHOW LOAD;
  2. Show the load task of specified db, the label contains the string "2014_01_02".

    SHOW LOAD FROM example_db WHERE LABEL LIKE "2020_01_02" LIMIT 10;
  3. 展示指定 db 的导入任务,指定 label 为 "load_example_db_20140102" 并按 LoadStartTime 降序排序。

    SHOW LOAD FROM example_db WHERE LABEL = "load_example_db_20200102" ORDER BY LoadStartTime DESC;
  4. Show the load task of the specified db, specify the label as "load_example_db_20140102" and order in descending order of LoadStartTime

    SHOW LOAD FROM example_db WHERE LABEL = "load_example_db_20140102" AND STATE = "loading" ORDER BY LoadStartTime DESC;
  5. 展示指定 db 的导入任务 并按 LoadStartTime 降序排序,并从偏移量5开始显示10条查询结果

    SHOW LOAD FROM example_db ORDER BY LoadStartTime DESC limit 5,10;
    SHOW LOAD FROM example_db ORDER BY LoadStartTime DESC limit 10 offset 5;

Keywords

SHOW, LOAD

Best Practices

  1. Use in combination with LOAD command

    It is recommended that user view the status of load tasks submitted by polling by SHOW LOAD . For example, for an load task with a specified Label, the load status is polled at intervals of 10 seconds or longer, and the next operation is performed according to the status.

    For example, wait for the load task submitted in the previous batch to be completed before submitting in the next batch, or retry the failed task.

  2. Query multiple load tasks with the same Label.

    If the load task corresponding to a Label fails, the user can continue to use the Label to resubmit the task. In this case, querying the Label will return multiple load jobs (including newly submitted and previously failed). At this time, if to query the latest submission, you can use the following command:

    SHOW LOAD WHERE LABEl="my_label" ORDER BY createtime DESC LIMIT 1;
  3. View data quality problem

    If the load task fails and the error message shows ETL_QUALITY_UNSATISFIED , it indicates that there is load quality problem. A link will be given in URL field. The error data information given in this link can be viewed with the following command:

    SHOW LOAD WARNINGS ON "your_url";

    Here are some common data quality problems:

    • no partition for this tuple

      This row of data has no corresponding partition in the table. It is required to check whether the data falls within the partition range of the table.

    • null is not allowed for bitmap column

      Loading null values into bitmap type field is not allowed.

    • the length of input is too long than schema

      The length of string exceeds the length defined by the column. It is required to consider increasing the defined maximum length of VARCHAR column.

    • decimal value is not valid for definition

      Decimal type does not match in precision.

    • Content of HLL type column is invalid

      The value type corresponding to HLL column is not correct. It is required to check whether the hll_hash function is used to convert the data.

    • null value for not null column

      Try to load a null value into a column that is not nullable

    • Parse json data for JsonDoc failed

      Parsing Json data failed.

    • JSON Root not found

      Given json_root does not match.

    • JSON data is array-object, strip_outer_array must be TRUE.

      strip_outer_array is false, but the Json data is an array.

    • JSON data is not an array-object, strip_outer_array must be FALSE

      strip_outer_array is true, but the Json data is not an array.

    • Json value is null, but the column xx is not nullable

      While loading Json data, try to load null into a field that is not nullable.

    • The column xx is not nullable, but it's not found in jsondata.

      When loading Json data, no field matched, and the coloumn is not nullable.

    • All fields is null, this is a invalid row.

      When loading Json data, all field names in a row cannot match the data.

    • Empty json line

      Empty Json data.

    • data is not encoded by UTF-8

      Source data is not UTF-8 encoded.

    • actual column number is less than schema column number.

      The number of columns in the source data is less than that in the table.

    • actual column number is more than schema column number.

      The number of columns in the source data is more than that in the table.

    • column xx value is incorrect while strict mode is true

      In Strict Mode,the source data is converted to null.

    • column xx value is null while columns is not nullable

      Try to load null value into the coloumn is not nullable.

  4. View the progress of job execution

    As the progress displayed in Progress in the returned result is not linear, sometimes it is inconvenient to query progress through this field. At this time, we can also observe whether the job is running through ScannedRows in JobDetails field.

    Under normal circumstances, the ScannedRows of a running job will be updated every 5 seconds, indicating the number of rows of data that have been read. But sometimes, due to the data quality problems, all data rows have been filtered, resulting in the ScannedRows never increasing or being 0. At this time, All backends and Unfinished backends fields can also be viewed. The former represents all BE nodes involved in this load task, while the latter represents the BE nodes of unfinished tasks.

  5. About TaskNumber

    The TaskNumber field in JobDetails does not represent the concurrency of load jobs. This only represents the number of DATA INFILE clauses in the Load statement.

Previous
SHOW-RESOURCES
Next
SHOW-RESTORE