SHOW-LOAD
SHOW LOAD
Description
This statement is used to show the execution of specified load task.
SHOW LOAD
[FROM db_name]
[
WHERE
[LABEL [ = "your_label" | LIKE "label_matcher"]]
[STATE = ["PENDING"|"LOADING"|"FINISHED"|"CANCELLED"|]]
]
[ORDER BY ...]
[LIMIT limit][OFFSET offset];
-
db_name
View load jobs under specified database. If not specified, use the current database.
-
LABEL
LLabel can be accurately matched by equivalent value, or fuzzy matched by Like.
-
STATE
View load tasks in specified status.
- PENDING: load has been submitted, but has not been executed yet.
- LOADING: load is in progress.
- FINISHED: load is finished successfully.
- CANCELLED: load failed.
-
ORDER BY ... LIMIT [OFFSET]
It supports ordering result sets by any column.
ORDER BY createtime DESC LIMIT 10,20
The return result of the SHOW LOAD command is as follows:
mysql> show load order by createtime desc limit 1\G
*************************** 1. row ***************************
JobId: 76391
Label: label1
State: FINISHED
Progress: ETL:N/A; LOAD:100%
Type: BROKER
EtlInfo: unselected.rows=4; dpp.abnorm.ALL=15; dpp.norm.ALL=28133376
TaskInfo: cluster:N/A; timeout(s):10800; max_filter_ratio:5.0E-5
ErrorMsg: N/A
CreateTime: 2019-07-27 11:46:42
EtlStartTime: 2019-07-27 11:46:44
EtlFinishTime: 2019-07-27 11:46:44
LoadStartTime: 2019-07-27 11:46:44
LoadFinishTime: 2019-07-27 11:50:16
URL: http://192.168.1.1:8040/api/_load_error_log?file=__shard_4/error_log_insert_stmt_4bb00753932c491a-a6da6e2725415317_4bb00753932c491a_a6da6e2725415317
JobDetails: {"Unfinished backends":{"9c3441027ff948a0-8287923329a2b6a7":[10002]},"ScannedRows":2390016,"TaskNumber":1,"All backends":{"9c3441027ff948a0-8287923329a2b6a7":[10002]},"FileNumber":1,"FileSize":1073741824}
The follow mainly introduces that meaning of parameters in the result set return by view load command:
-
JobId
unique ID of load task, JobId in different load tasks is different, which is automatically generated by the system. Unlike Label, JobId will never be the same, and Label can be duplicated after the load task fails.
-
Label
Identification of load task.
-
State
Current stage of the load task. There will be PENDING and LOADING status appearing in the loading process of Broker. If Broker load is in PENDING status, it means that the current load task is waiting to be executed. If in LOADING status, it indicates that it is being executed.
There are two final stages of the load task: CANCELLED and FINISHED, when the Load job is in these two stages, the load is completed. In which CANCELLED means load failed and FINISHED means load succeeded.
-
Progress
Description of load task progress. There are two kinds of progress: ETL and LOAD, corresponding to the two stages of the load process: ETL and LOADING. At present, Broker load has only LOADING stage, so ETL will always be displayed as
N/A
The progress range of LOAD is: 0~100%。
LOAD progress = the number of tables currently loaded/the total number of tables designed for this load task * 100%
If all load tables are loaded, then the progress of LOAD will show 99% and the load will enter the final effective stage. After the whole load is completed, the progress of LOAD will be changed to 100%.
The progress of loading is not linear. Therefore, if the progress does not change within a period of time, it does not mean that the load is not being executed.
-
Type
Type of task loaded. Broker load has only the type BROKER.
-
EtlInfo
It mainly displays the loaded data quantity indicators
unselected.rows
,dpp.norm.ALL
anddpp.abnorm.ALL
. Users can judge how many rows are filtered by the where condition according to the first numerical value, and verify whether the error rate of the current load task exceedsmax_filter_ratio
by the latter two indicators.The sum of three indicators is the total row number of the original data.
-
TaskInfo
It mainly displays the current load task parameters, that is, the load task parameters specified by the user when creating the Broker load task, including:
cluster
,timeout
andmax_filter_ratio
. -
ErrorMsg
When the load task status is CANCELLED, the reason of failure will be displayed in two parts: type and msg, when the load task is successful, display
N/A
.The value meaning of type:
USER_CANCEL:task cancelled by user ETL_RUN_FAIL:load task failed in ETL phase ETL_QUALITY_UNSATISFIED:data quality is unqualified, that is, the error data rate exceeds max_filter_ratio LOAD_RUN_FAIL: load task TIMEOUT failed in LOADING phase: TIMEOUT:load task did not complete within timeout period UNKNOWN: unknown load error
-
CreateTime/EtlStartTime/EtlFinishTime/LoadStartTime/LoadFinishTime
These values represent the load create time, ETL stage start time, ETL stage finish time, loading stage start time and the whole load task finish time.
There is no ETL stage in Broker load, so its ETL starttime, ETL finishtime and loadstarttime are set to the same value.
When the load task stays at CreateTime for a long time, while LoadStartTime is N/A, it indicates that load task is seriously piled up at present. Users can reduce the frequency of load submission.
LoadFinishTime - CreateTime = time consumed by the whole load task LoadFinishTime - LoadStartTime = Execution time of the whole Broker load task = Time consumed by the whole load task-Waiting time of the load task
-
URL
Load the error data sample of the task, and access the URL address to get the error data sample loaded this time. If there is no error data in this load, the URL field is N/A.
-
JobDetails
Display the detailed running status of some jobs, including the number of loaded files, the total size (bytes), the number of subtasks, the number of processed original rows, the BE node Id of running subtasks, and the BE node Id of unfinished subtasks.
{"Unfinished backends":{"9c3441027ff948a0-8287923329a2b6a7":[10002]},"ScannedRows":2390016,"TaskNumber":1,"All backends":{"9c3441027ff948a0-8287923329a2b6a7":[10002]},"FileNumber":1,"FileSize":1073741824}
In which the number of processed original rows is updated every 5 seconds. The number of rows is only used to show the current progress, which does not represent the final actual number of rows processed. The actual number of rows processed shall be as shown in EtlInfo.
Example
-
Show all load tasks of the default db
SHOW LOAD;
-
Show the load task of specified db, the label contains the string "2014_01_02".
SHOW LOAD FROM example_db WHERE LABEL LIKE "2020_01_02" LIMIT 10;
-
展示指定 db 的导入任务,指定 label 为 "load_example_db_20140102" 并按 LoadStartTime 降序排序。
SHOW LOAD FROM example_db WHERE LABEL = "load_example_db_20200102" ORDER BY LoadStartTime DESC;
-
Show the load task of the specified db, specify the label as "load_example_db_20140102" and order in descending order of LoadStartTime
SHOW LOAD FROM example_db WHERE LABEL = "load_example_db_20140102" AND STATE = "loading" ORDER BY LoadStartTime DESC;
-
展示指定 db 的导入任务 并按 LoadStartTime 降序排序,并从偏移量5开始显示10条查询结果
SHOW LOAD FROM example_db ORDER BY LoadStartTime DESC limit 5,10; SHOW LOAD FROM example_db ORDER BY LoadStartTime DESC limit 10 offset 5;
Keywords
SHOW, LOAD
Best Practices
-
Use in combination with LOAD command
It is recommended that user view the status of load tasks submitted by polling by
SHOW LOAD
. For example, for an load task with a specified Label, the load status is polled at intervals of 10 seconds or longer, and the next operation is performed according to the status.For example, wait for the load task submitted in the previous batch to be completed before submitting in the next batch, or retry the failed task.
-
Query multiple load tasks with the same Label.
If the load task corresponding to a Label fails, the user can continue to use the Label to resubmit the task. In this case, querying the Label will return multiple load jobs (including newly submitted and previously failed). At this time, if to query the latest submission, you can use the following command:
SHOW LOAD WHERE LABEl="my_label" ORDER BY createtime DESC LIMIT 1;
-
View data quality problem
If the load task fails and the error message shows
ETL_QUALITY_UNSATISFIED
, it indicates that there is load quality problem. A link will be given inURL
field. The error data information given in this link can be viewed with the following command:SHOW LOAD WARNINGS ON "your_url";
Here are some common data quality problems:
-
no partition for this tuple
This row of data has no corresponding partition in the table. It is required to check whether the data falls within the partition range of the table.
-
null is not allowed for bitmap column
Loading null values into bitmap type field is not allowed.
-
the length of input is too long than schema
The length of string exceeds the length defined by the column. It is required to consider increasing the defined maximum length of VARCHAR column.
-
decimal value is not valid for definition
Decimal type does not match in precision.
-
Content of HLL type column is invalid
The value type corresponding to HLL column is not correct. It is required to check whether the hll_hash function is used to convert the data.
-
null value for not null column
Try to load a null value into a column that is not nullable
-
Parse json data for JsonDoc failed
Parsing Json data failed.
-
JSON Root not found
Given json_root does not match.
-
JSON data is array-object, strip_outer_array must be TRUE.
strip_outer_array
is false, but the Json data is an array. -
JSON data is not an array-object, strip_outer_array must be FALSE
strip_outer_array
is true, but the Json data is not an array. -
Json value is null, but the column xx is not nullable
While loading Json data, try to load null into a field that is not nullable.
-
The column xx is not nullable, but it's not found in jsondata.
When loading Json data, no field matched, and the coloumn is not nullable.
-
All fields is null, this is a invalid row.
When loading Json data, all field names in a row cannot match the data.
-
Empty json line
Empty Json data.
-
data is not encoded by UTF-8
Source data is not UTF-8 encoded.
-
actual column number is less than schema column number.
The number of columns in the source data is less than that in the table.
-
actual column number is more than schema column number.
The number of columns in the source data is more than that in the table.
-
column xx value is incorrect while strict mode is true
In Strict Mode,the source data is converted to null.
-
column xx value is null while columns is not nullable
Try to load null value into the coloumn is not nullable.
-
-
View the progress of job execution
As the progress displayed in
Progress
in the returned result is not linear, sometimes it is inconvenient to query progress through this field. At this time, we can also observe whether the job is running throughScannedRows
inJobDetails
field.Under normal circumstances, the
ScannedRows
of a running job will be updated every 5 seconds, indicating the number of rows of data that have been read. But sometimes, due to the data quality problems, all data rows have been filtered, resulting in theScannedRows
never increasing or being 0. At this time,All backends
andUnfinished backends
fields can also be viewed. The former represents all BE nodes involved in this load task, while the latter represents the BE nodes of unfinished tasks. -
About TaskNumber
The TaskNumber field in
JobDetails
does not represent the concurrency of load jobs. This only represents the number ofDATA INFILE
clauses in the Load statement.