          MapReduce

          Create a Step

          You can add Java and Streaming steps to clusters that use the Hadoop image, and Spark, Java, and Streaming steps to clusters that use the Spark image. After adding an application to a cluster, you can create steps of that application's type. For example, if you add the Hive application when creating a cluster, you can create Hive steps; if you add the Pig application, you can create Pig steps. To create a step:

          1. In "Product Service>MapReduce>Baidu MapReduce-Homework List" page, click "Create a step" to enter the step creation page.
          2. In step creation page, select step type and configure parameters according to such step type. Description of parameter configuration for all types of steps:

            • Streaming step

              • Step name: Enter a step name of no more than 255 characters.
              • Mapper: Splits the step input into several tasks and processes them. Enter the Mapper program's address in BOS.
              • Reducer: Aggregates the results of the divided tasks. Enter the Reducer program's address in BOS.
              • BOS input address: The address must exist, and you must have permission to read the files at it.
              • BOS output address: The path after the bucket must not exist, and you must have permission to write to the address; otherwise, the step cannot run.
              • Action after failure: Select what happens if the step fails: Continue (proceed to the next step) or Wait (check the step's running status and cancel the subsequent steps).
              • Application parameters: If the streaming program needs parameters other than the five above, enter them in the parameter input box, separated by spaces. Enter the parameter strings as-is; no escaping or URL encoding is needed. A minimal mapper/reducer sketch follows this procedure.
            • Java step

              • Step name: Enter a step name of no more than 255 characters.
              • Application location: Enter the JAR package's address in BOS.
              • Action after failure: Select what happens if the step fails: Continue (proceed to the next step) or Wait (check the step's running status and cancel the subsequent steps).
              • MainClass: Enter the fully qualified class name of the main program, for example org.apache.MyClass.
              • Application parameters: Enter the parameters to pass to the main function of MainClass; they are passed through unmodified. Enter the parameter strings separated by spaces; no escaping or URL encoding is needed.
            • Spark step

              • Step name: Enter a step name of no more than 255 characters.
              • Application location: Enter the JAR package's address in BOS.
              • Action after failure: Select what happens if the step fails: Continue (proceed to the next step) or Wait (check the step's running status and cancel the subsequent steps).
              • Spark-submit: Spark's system parameters. Spark applies the following rules to the Spark-submit input (see the sketch after this procedure):

                • If any of the following parameters is set multiple times, the last setting takes effect: --name, --driver-memory, --driver-java-options, --driver-library-path, --driver-class-path, --executor-memory, --executor-cores, --queue, --num-executors, --properties-file, --jars, --files, and --archives.
                • The following parameters cannot be specified: --master, --deploy-mode, --py-files, --driver-cores, --total-executor-cores, --supervise, and --help. Setting any of them has no effect.
                • Enter the parameter strings separated by spaces; no escaping or URL encoding is needed.
              • Application parameters: Enter your custom parameters.
            • Pig step

              • Step name: Enter a step name of no more than 255 characters.
              • BOS script address: Must be a valid BOS path to a Pig script.
              • BOS input address: The address must exist, and you must have permission to read the files at it. In the script, you can use ${INPUT} to refer to this address.
              • BOS output address: The path after the bucket must not exist, and you must have permission to write to the address; otherwise, the step cannot run. In the script, you can use ${OUTPUT} to refer to this address.
              • Action after failure: Select what happens if the step fails: Continue (proceed to the next step) or Wait (check the step's running status and cancel the subsequent steps).
              • Application parameters: Two forms are supported: -D key=value specifies a configuration item, and -p KEY=VALUE defines a variable. You can add custom parameters. Enter the parameter strings separated by spaces; no escaping or URL encoding is needed.
            • Hive step

              • Step name: Enter a step name of no more than 255 characters.
              • BOS script address: Must be a valid BOS path to a Hive script.
              • BOS input address: The address must exist, and you must have permission to read the files at it. In the script, you can use ${INPUT} to refer to this address.
              • BOS output address: The path after the bucket must not exist, and you must have permission to write to the address; otherwise, the step cannot run. In the script, you can use ${OUTPUT} to refer to this address.
              • Action after failure: Select what happens if the step fails: Continue (proceed to the next step) or Wait (check the step's running status and cancel the subsequent steps).
              • Application parameters: Only two forms are accepted: --hiveconf key=value and --hivevar key=value. The former overrides Hive's runtime configuration; the latter defines a custom variable, which the script can reference as ${KEY}. For example, --hivevar DAY=20240601 lets the script use ${DAY}. Enter the parameter strings separated by spaces; no escaping or URL encoding is needed.
          3. Select the cluster that will run the step.
          4. Click "Finish" to complete the creation of the step.
          5. The step status changes from "Waiting" to "Running", and then to "Finished" when the step completes.
          6. (Optional) You can cancel a step only while it is in the Waiting or Running state, by clicking "Cancel a step".
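          For the Streaming step, the Mapper and Reducer are ordinary programs that read from stdin and write to stdout. Below is a minimal word-count sketch, assuming Python is available on the cluster nodes; the file names mapper.py and reducer.py are illustrative. You would upload both files to BOS and enter their BOS addresses in the Mapper and Reducer fields.

```python
#!/usr/bin/env python
# mapper.py (illustrative): emit "word<TAB>1" for every word on stdin.
import sys

for line in sys.stdin:
    for word in line.split():
        print("%s\t%d" % (word, 1))
```

```python
#!/usr/bin/env python
# reducer.py (illustrative): sum the counts per word.
# Hadoop streaming delivers the mapper output sorted by key.
import sys

current_word, count = None, 0
for line in sys.stdin:
    word, value = line.rstrip("\n").split("\t", 1)
    if word != current_word:
        if current_word is not None:
            print("%s\t%d" % (current_word, count))
        current_word, count = word, 0
    count += int(value)
if current_word is not None:
    print("%s\t%d" % (current_word, count))
```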
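          The Spark-submit rules above can be read as "forbidden options are ignored, and the last occurrence of a repeatable option wins." The sketch below only illustrates that reading; it is a hypothetical helper, not the service's actual implementation.

```python
# Hypothetical illustration of the Spark-submit rules described above;
# not part of the BMR service.
LAST_WINS = {"--name", "--driver-memory", "--driver-java-options",
             "--driver-library-path", "--driver-class-path",
             "--executor-memory", "--executor-cores", "--queue",
             "--num-executors", "--properties-file", "--jars",
             "--files", "--archives"}
FORBIDDEN = {"--master", "--deploy-mode", "--py-files", "--driver-cores",
             "--total-executor-cores", "--supervise", "--help"}

def normalize(opts):
    """Drop forbidden options and keep only the last value of repeatable ones."""
    tokens = opts.split()          # plain space-separated input, no escaping
    last, passthru, i = {}, [], 0
    while i < len(tokens):
        flag = tokens[i]
        has_value = i + 1 < len(tokens) and not tokens[i + 1].startswith("--")
        value = tokens[i + 1] if has_value else None
        if flag in FORBIDDEN:
            pass                   # ignored, per the rules above
        elif flag in LAST_WINS:
            last[flag] = value     # a later occurrence overwrites an earlier one
        else:
            passthru.extend(tokens[i:i + (2 if has_value else 1)])
        i += 2 if has_value else 1
    out = passthru[:]
    for flag, value in last.items():
        out.extend([flag, value] if value is not None else [flag])
    return " ".join(out)

print(normalize("--name a --executor-memory 2g --name b --master yarn"))
# -> "--name b --executor-memory 2g"
```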

          Use Remote Files

          Use the "Additional Files" function if your step parameters are dependent on local files. Use the remote files directly by mapping them to the local path. For example, the -libjars parameter in hadoop only supports local files, and -libjars can use the files on BOS if you add the additional file parameters. The Hadoop step reads the files after you upload them to BOS.

          Note that the file names used in the application parameters must match those set as local file paths. For example, for an additional file whose remote path is bos://path/to/testA.jar and whose local path is testB.jar, the application parameter is -libjars testB.jar (see the sketch below).
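          As a sketch of this mapping (hypothetical, for illustration only): each additional file pairs a remote BOS path with a local name, and the application parameter is built from the local names.

```python
# Hypothetical illustration of the additional-file mapping described above.
# The step sees each remote BOS object under its chosen local name.
additional_files = {
    "bos://path/to/testA.jar": "testB.jar",  # remote path -> local path
}

# Application parameters must use the local names, not the remote paths.
libjars = ",".join(additional_files.values())
print("-libjars " + libjars)  # -> "-libjars testB.jar"
```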

          • Operating steps:

          On the step creation page, enter the "Remote Path" and "Local Path" of the "Additional File". The "Remote Path" is the file's path on BOS; the "Local Path" is the file name to use in the step parameters.

          • View details:

          The "Running Parameters" in the step details page show the remote path and local path of the additional file.

          The step list in the cluster details page also contains the information on additional files.

          • Note:

            Perform the above steps if you want to add additional files to the steps of a scheduled task.
