Hadoop-Manager

Last Updated: 2020-10-21

Overview

Hadoop Manager is the fine-grained Hadoop management platform that BMR provides to users. It supports cluster monitoring, service management, and Hadoop multi-tenant management.

Log in to the System

Select "Product Service>Data Analysis>Baidu MapReduce>Cluster" to enter the cluster list page.

Note: If no cluster is available, create one first. For more information about creating clusters, see Cluster Administration.
Click the "Administration" button in the "Action" column of the cluster list to enter the system page. System users are either system administrators or common users.
Notes:
- System administrator: A default administrator account exists (account/password: admin/admin). For security reasons, change the login password after the first login. The administrator can add other users with the system administrator role; by default, they hold the same privileges as the system administrator.
- Common user: An account for ordinary users, added and managed by the system administrator in Hadoop Manager.
After logging in, you see the Hadoop Manager page, which includes cluster monitoring, service management, the CVM server list, multi-tenant management, user management, and so on.

Notes:

Clusters created after November 21, 2019 support password-free login; clusters created before that date still require a login account and password.
New clusters create the admin and admin_onlyread accounts by default. For security reasons, contact customer service if you need a password.

Cluster Monitoring

As the following figure shows, you can click "Cluster Monitoring" to view monitoring information for the key cluster metrics. You can filter by metric and time range to view a specific metric.

Notes:

CPU: Trend in the average ratios of CPU system, CPU user, and CPU idle. Formula (taking CPU system as an example): (CPU system ratio of CVM server 1 + CPU system ratio of CVM server 2 + ... + CPU system ratio of CVM server N) / N
Cluster load: Trend in the cluster's average system load per minute and average number of running processes per minute.
Cluster memory: Trend in the total volume of cluster usage, shared memory, memory, cache memory, idle memory, and kernel cache.
Cluster network: Trend in the cluster's total network input bytes and total network output bytes per unit time.
HDFS disk usage: Trend in the total size and used size of the HDFS disks.
NameNode heap memory: Trend in the total capacity and used capacity of the NameNode heap memory.
YARN memory: Trend in the total used capacity and total available capacity of YARN memory.
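
The CPU averaging formula in the notes above can be sketched as follows. This is only an illustration of the arithmetic; the function name and the sample values are hypothetical and are not taken from Hadoop Manager.

```python
def cluster_average(per_server_ratios):
    """Average one CPU ratio (for example, CPU system) across all CVM servers."""
    if not per_server_ratios:
        raise ValueError("at least one CVM server is required")
    # (ratio of server 1 + ratio of server 2 + ... + ratio of server N) / N
    return sum(per_server_ratios) / len(per_server_ratios)


# Hypothetical CPU system ratios (percent) reported by three CVM servers.
cpu_system = [12.0, 8.0, 10.0]
print(cluster_average(cpu_system))  # 10.0
```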

Service Management

Service List Page

On this page, you can view the current status, number of components, and last operation time of each service installed on the current cluster, and enable, disable, or restart a particular service as required.

Notes:

If a service's configuration changes, you need to restart the service for the change to take effect. A restart prompt appears in the service list (for example, for the YARN service in the figure above).

No component can be operated while the service is being restarted or disabled.

Service Detail Page

Take the HDFS service detail page as an example.

The component list shows the status and number of all components of the current service. You can enable, disable, or restart any component.

Note:

Other components cannot be operated while any component is being restarted or disabled.

On the deployment list page, you can enable or disable any instance of a service component. For example, the DATANODE component of HDFS is installed on several CVM servers, and you can restart the DATANODE instance on any of them.

On the monitoring page, you can see the monitoring status of key HDFS metrics, such as disk and RPC delay.

The service monitoring page also supports managing monitoring items, so you can choose which monitoring items to display.

On the configuration page, you can modify selected HDFS configuration items, such as blocksize in the hdfs-site.xml configuration of HDFS.
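
For orientation, the sketch below builds the kind of hdfs-site.xml entry that a block size change corresponds to. It assumes that the "blocksize" item maps to the standard Hadoop property dfs.blocksize; the helper name and the 256 MB value are hypothetical, and Hadoop Manager itself writes the file when you submit the change.

```python
import xml.etree.ElementTree as ET


def hadoop_property(name, value):
    """Render one <property> element in the Hadoop *-site.xml layout."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(prop, encoding="unicode")


# Assumption: the UI item "blocksize" corresponds to dfs.blocksize (in bytes).
block_size = 256 * 1024 * 1024  # hypothetical value: 256 MB
snippet = hadoop_property("dfs.blocksize", block_size)
print(snippet)
```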

Note:

Currently, Hive parameters do not support configuration modification.

When submitting a modified configuration, you must enter modification remarks. You can check the remarks in the operating records.

As the following figure shows, click "Operating Record" at the upper right to enter the operating record page. The list shows an audit trail of all actions. Enabling or disabling a service takes time and does not complete instantly, so the operation list shows the status of each action.

CVM Server Monitoring

CVM server monitoring shows the information for all CVM servers of the current cluster, including server specification and disk usage rate; the CVM server detail page shows each server's monitoring data.

After selecting the "CVM server list" and entering the CVM server administration interface, you can view all CVM server instances of the cluster, including CVM server name, private IP, RAM, disk capacity, specification, and other properties.
Notes:
- An alarm icon appears to the right of the CVM server name if the CVM server raises an alarm. You can click the icon to view the alarm information.
- The disk usage rate turns red when the CVM server's disk usage reaches the configured alarm threshold.
You can click a CVM server's name to jump to its monitoring page, which shows CPU, disk, load, memory, network, number of processes, and other metrics. You can view the historical trend of any metric for a selected time range.

Notes:
- CPU: Trend in the ratio of CPU system, CPU user, CPU idle, and CPU I/O idle.
- Disk: Trend in total capacity and residual capacity of the disk.
- Load: Trend in the CVM server's 1-minute, 5-minute, and 15-minute system load.
- Memory: Trend in total memory, used memory, residual memory, and total kernel cache.
- Network: Trend in the CVM server's network input bytes and input packets per unit time, and network output bytes and output packets per unit time.
- Number of processes: Trend in the total number of CVM server processes and the total number of running processes.

Multi-tenant Management

Resources Management

Click the "Resources Management" option to enter the resources management page. The resources management covers the Yarn resources queue and HDFS storage resources.

Yarn

You can select the scheduler type on the "Yarn" page of the resources management page.

Two scheduler types are available: Capacity Scheduler and Fair Scheduler.

Capacity Scheduler: Multiple organizations share one Hadoop cluster, and each organization gets a part of the cluster resources. Each organization is assigned a dedicated queue, and each queue is configured to use a specific share of the cluster resources. Queues can be further divided hierarchically, so different users in an organization can share the resources assigned to its queue. Within a queue, applications are scheduled with a FIFO policy.

Fair Scheduler: This scheduler assigns resources fairly to all running applications. Fair sharing among queues works as follows. Assume that Users A and B each have their own queue. User A starts a job and, because User B has no demand, receives all available resources. User B then starts a job while User A's job is still running, and after some time each of the two jobs uses half of the cluster resources. If User B starts a second job while the other jobs are still running, the second job shares the resources of User B's queue fairly with User B's first job. In the end, User A has one job and User B has two jobs: each of User B's jobs gets a quarter of the cluster resources (half in total), and User A's job gets the other half. Resources are therefore shared fairly among users.
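
The arithmetic of the User A / User B example can be sketched as follows. This is a simplified model that assumes equal queue weights and equal sharing inside each queue; the function and job names are hypothetical, and the real scheduler also accounts for weights, minimum shares, and preemption.

```python
from collections import defaultdict


def fair_shares(jobs):
    """Split the cluster equally among active queues, then equally among
    the jobs of each queue.

    `jobs` is a list of (queue, job_name) pairs; the result maps each job
    name to its fraction of the whole cluster.
    """
    if not jobs:
        return {}
    by_queue = defaultdict(list)
    for queue, job in jobs:
        by_queue[queue].append(job)
    queue_share = 1.0 / len(by_queue)
    return {
        job: queue_share / len(queue_jobs)
        for queue_jobs in by_queue.values()
        for job in queue_jobs
    }


# User A runs one job; User B runs two jobs.
shares = fair_shares([("A", "a-job-1"), ("B", "b-job-1"), ("B", "b-job-2")])
print(shares)  # {'a-job-1': 0.5, 'b-job-1': 0.25, 'b-job-2': 0.25}
```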

Capacity Scheduler

How to create Capacity Scheduler resource pool:

Click the "Create Resource Pool" button when the Capacity Scheduler policy is used.

Configure the resource pool. After clicking "Create Resource Pool", you see the following configuration list. As the figure shows, you can create a queue named test that takes 33.3% of the resources, with a minimum of 10% of queue resources and a maximum of 75% of resources, and that allows a single user to take at most 50% of the queue's resources.

The configuration interpretation is as follows:

Configuration Parameter	Corresponding Hadoop YARN Parameter	Description
Name	-	Name of the resource pool queue; resource pools at the same level cannot have duplicate names (required)
Resource share	yarn.scheduler.capacity.<queue-path>.capacity	Resource share of the resource pool queue as a percentage, ranging from 0 to 100 (required)
miniUserLimit	yarn.scheduler.capacity.<queue-path>.minimum-user-limit-percent	Minimum limit of the resource pool: the minimum percentage of queue resources allocated to each user when resources are in demand
maximumCapacity	yarn.scheduler.capacity.<queue-path>.maximum-capacity	Maximum limit of the resource pool: the maximum percentage of resources that the queue can use
Single User limit ratio	yarn.scheduler.capacity.<queue-path>.user-limit-factor	Multiple of the queue's resources that a single user can use, ranging from 0 to 1.0 (1.0 by default, which ensures that a single user cannot exceed the resources assigned to the queue)
Maximum memory	yarn.scheduler.capacity.<queue-path>.maximum-allocation-mb	Maximum memory allocated to a container by the ResourceManager
Maximum cores	yarn.scheduler.capacity.<queue-path>.maximum-allocation-vcores	Maximum number of virtual cores allocated to a container by the ResourceManager
Maximum applications	yarn.scheduler.capacity.<queue-path>.maximum-applications	Maximum number of applications in the queue that can be in running or waiting status
AM maximum resource ratio	yarn.scheduler.capacity.<queue-path>.maximum-am-resource-percent	Maximum percentage of resources that can be used to run application masters
states	yarn.scheduler.capacity.<queue-path>.state	Operating state of the queue. If the queue is in the STOPPED state, new applications cannot be submitted to it or to its subqueues
Submit access control	yarn.scheduler.capacity.root.<queue-path>.acl_submit_applications	ACL that specifies which users/user groups can submit applications to the queue. If not specified, the ACL is inherited from the parent queue
Administer access control	yarn.scheduler.capacity.root.<queue-path>.acl_administer_queue	ACL that specifies which users/user groups can administer applications on the queue. If not specified, the ACL is inherited from the parent queue
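
As a rough illustration of how the table maps to YARN properties, the sketch below expands the example queue test into full property names. It assumes the queue sits directly under root (queue path root.test) and that the four example values correspond to the four properties shown; the helper name is hypothetical, and Hadoop Manager generates the real configuration itself.

```python
def capacity_queue_properties(queue_path, settings):
    """Expand per-queue settings into full Capacity Scheduler property names."""
    prefix = f"yarn.scheduler.capacity.{queue_path}."
    return {prefix + key: str(value) for key, value in settings.items()}


# Assumed mapping of the example values to properties for the queue "test".
props = capacity_queue_properties("root.test", {
    "capacity": 33.3,                  # resource share
    "minimum-user-limit-percent": 10,  # miniUserLimit
    "maximum-capacity": 75,            # maximumCapacity
    "user-limit-factor": 0.5,          # single user limit ratio
})
for name, value in props.items():
    print(f"{name}={value}")
```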

You can perform the "Edit", "Delete", and "Create Subpool" actions on a created resource pool.

Edit: Modify the configuration of the resource pool.
Delete: Delete the resource pool.
Create Subpool: Create a sub-pool under the current resource pool. A sub-pool is configured in the same way as a resource pool.

More setups

"More Setups" on the administration page sets the default and global configuration of resource pools; the values configured there serve as defaults for all resource pools.

The configuration options appear after you click the "More Setups" button.

The configuration interpretation is described as follows:

Configuration Parameter	Corresponding Hadoop YARN Parameter	Description
Maximum applications	yarn.scheduler.capacity.maximum-applications	Maximum number of applications that can be in running or waiting status
Maximum AM ratio	yarn.scheduler.capacity.maximum-am-resource-percent	Maximum percentage of resources that can be used to run application masters
Resource calculation class	yarn.scheduler.capacity.resource-calculator	Implementation used to compare and calculate resources. For example, org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator considers only memory.
Node delay wait times	yarn.scheduler.capacity.node-locality-delay	Number of missed scheduling opportunities after which the scheduler relaxes the node restriction and tries to match other nodes on the same rack.
Placement rules	yarn.scheduler.capacity.queue-mappings	Mapping between users/user groups and specific queues.
Queue mapping override	yarn.scheduler.capacity.queue-mappings-override.enable	Whether the queue mapping can override the queue specified by the user.
Submit access control	yarn.scheduler.capacity.root.acl_submit_applications	Default ACL that specifies which users/user groups can submit applications to the queue. If not specified, the ACL is inherited from the parent queue
Administer access control	yarn.scheduler.capacity.root.acl_administer_queue	Default ACL that specifies which users/user groups can administer applications on the queue. If not specified, the ACL is inherited from the parent queue

Fair Scheduler

How to create a resource pool:

Click the "Create a Resource Pool" button when the Fair Scheduler policy is used.

Configure the resource pool. After clicking the "Create a Resource Pool" button, you see the following configuration list. As the following figure shows, you can create a queue that: i) is named test; ii) has a weight of 25, minimum memory of 516M, a minimum of 4 virtual cores, maximum memory of 1024, and a maximum of 8 virtual cores; iii) runs at most 200 applications simultaneously; iv) takes "0.8 * queue resource share" as the maximum resources for application masters; v) adopts the DRF queue scheduling policy; vi) enables preemption mode.

The configuration interpretation is described as follows:

Configuration Parameter	Corresponding Hadoop YARN Parameter	Description
Name	-	Name of the resource pool queue; resource pools at the same level cannot have duplicate names (required)
Weight	weight	Weight of the resource share used by the resource pool queue
Minimum memory and virtual cores	minResources	Minimum resources available to the queue (memory and virtual cores must be set together or both left unset)
Maximum memory and virtual cores	maxResources	Maximum resources available to the queue (memory and virtual cores must be set together or both left unset)
Maximum running applications	maxRunningApps	Maximum number of applications that the queue can run simultaneously
Maximum share of Application Master	maxAMShare	Resource share for running application masters (note: a value of -1 disables this property, and the AM share is not checked)
Scheduling policy	schedulingPolicy	Scheduling policy of the queue
Preemption mode	allowPreemptionFrom	Whether to enable preemption mode
Fair share preemption threshold	fairSharePreemptionThreshold	After preemption mode is enabled, container preemption is triggered if the queue does not receive the resources corresponding to this threshold within the fair share preemption timeout.
Fair share preemption timeout	fairSharePreemptionTimeout	Timeout for obtaining the resources corresponding to the fair share preemption threshold after preemption mode is enabled. If not set, the value is inherited from the parent queue.
Minimum share preemption timeout	minSharePreemptionTimeout	Timeout for obtaining the promised minimum share of resources after preemption mode is enabled. If not set, the value is inherited from the parent queue.
Submit access control	aclSubmitApps	List of users/user groups who can submit applications to the queue.
Administer access control	aclAdministerApps	List of users/user groups who can administer the queue's applications. Currently, the only administrative action is terminating applications.
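
To show how these parameters fit together, the sketch below renders the example queue test as a Fair Scheduler allocation entry. It is a sketch under assumptions: the memory values are taken to be in MB, the resource strings use the "X mb,Y vcores" form of the Fair Scheduler allocation file, and the helper name is hypothetical. Hadoop Manager produces the actual file when the configuration is synchronized.

```python
import xml.etree.ElementTree as ET


def fair_queue_element(name, settings):
    """Build a <queue> element whose child elements are the given settings."""
    queue = ET.Element("queue", {"name": name})
    for tag, value in settings.items():
        ET.SubElement(queue, tag).text = str(value)
    return queue


allocations = ET.Element("allocations")
allocations.append(fair_queue_element("test", {
    "weight": 25,
    "minResources": "516 mb,4 vcores",   # assumed unit: MB
    "maxResources": "1024 mb,8 vcores",  # assumed unit: MB
    "maxRunningApps": 200,
    "maxAMShare": 0.8,
    "schedulingPolicy": "drf",
    "allowPreemptionFrom": "true",
}))
xml_text = ET.tostring(allocations, encoding="unicode")
print(xml_text)
```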

You can perform the "Edit", "Delete", and "Create a Subpool" actions on a created resource pool queue.

More setups

After clicking "More Setups", you can see the following configuration list.

The configuration interpretation is described as follows:

Configuration Parameter	Parameters Corresponding to Hadoop YARN	Description
Default schedule policy	defaultQueueSchedulingPolicy	Setup of queue's default schedule policy
Maximum running applications for resource pool	queueMaxAppsDefault	Setup of queue's maximum running applications
Default maximum share of Application Master	queueMaxAMShareDefault	Resource share for running application master
Fair share preemption threshold	defaultFairSharePreemptionThreshold	Setup of global fair share preemption threshold
Fair share preemption timeout	defaultFairSharePreemptionTimeout	Setup of global fair share preemption timeout
Minimum share preemption timeout	defaultMinSharePreemptionTimeout	Setup of global minimum share preemption timeout
Placement rules	queuePlacementPolicy	Placement rules that assign each application to a queue. Rules are evaluated in sequence until one matches. Create queue: Whether the rule may create the queue when it does not exist; if it may not, the next rule is tried. Rule meanings: Specified: Places the application in the specified queue. User: Places the application in the queue with the same name as the user; any "." in the name is replaced with "_dot_", so the user name "first.last" maps to the queue name "first_dot_last". primaryGroup: Places the application in the queue named after the user's primary Unix group; any "." in the name is replaced with "_dot_". secondaryGroupExistingQueue: Places the application in the queue named after the user's secondary Unix group; any "." in the name is replaced with "_dot_". secondaryGroupExistingQueueNestedUser: Differs from secondaryGroupExistingQueue in that the queue of the secondaryGroupExistingQueue rule must have root as its parent queue, whereas the queue of this rule can have any parent queue. specifiedNestedUser: Differs from Specified in that the queue of the Specified rule must have root as its parent queue, whereas the queue of this rule can have any parent queue. primaryGroupNestedUser: Differs from primaryGroup in that the queue of the primaryGroup rule must have root as its parent queue, whereas the queue of this rule can have any parent queue. Default: Fallback rule used when none of the preceding rules match; the application is placed in the configured queue. Reject: The application is rejected.
User limit	User	Configuration of maximum running applications for specific user
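
The name substitution used by the User and group placement rules can be sketched as follows. The function name is hypothetical; the replacement of "." with "_dot_" is the behavior described for these rules.

```python
def queue_name_for(identity):
    """Map a user or group name to a queue name by replacing each '.'."""
    return identity.replace(".", "_dot_")


print(queue_name_for("first.last"))  # first_dot_last
```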

Complete and save the configuration, and then click "Synchronize Configuration to Cluster". The synchronization has succeeded if "√" appears in the box after "Synchronize Configuration to Cluster", and the configuration takes effect after the YARN service is restarted. The synchronization has failed if "!" appears in the box.

HDFS

By selecting the "HDFS" option on the "Resources Management" page, you can see the directory and quotas of the created test user, and you can also view the user's current usage.

Namespace quota: The number of directories and files that can exist under the user directory; the default value is 200.
Space quota: The storage space for the user directory; the default value is 2000M.

Click the "Modify Quota" button to modify a user's quota. The system automatically synchronizes the configuration to the cluster every 10 minutes. The administrator can also click "Synchronize Configuration to Cluster" to synchronize immediately. After synchronization, the configuration takes effect without restarting the HDFS service.
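
For reference, the two quota types correspond to the standard hdfs dfsadmin quota commands. The sketch below only builds the command lines for the default values; it assumes the user directory is /user/<user name>, and it is not a description of how Hadoop Manager applies quotas internally. The function name is hypothetical.

```python
def quota_commands(user, namespace_quota=200, space_quota_mb=2000):
    """Return the hdfs dfsadmin argument lists that would set both quotas."""
    directory = f"/user/{user}"  # assumed location of the user directory
    return [
        # Limit on the number of directories and files under the directory.
        ["hdfs", "dfsadmin", "-setQuota", str(namespace_quota), directory],
        # Limit on the storage space, with the "m" suffix for megabytes.
        ["hdfs", "dfsadmin", "-setSpaceQuota", f"{space_quota_mb}m", directory],
    ]


for argv in quota_commands("test"):
    print(" ".join(argv))
```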

Resources Monitoring

Click the "Resources Monitoring" button on the administration page. Select the "Yarn Resources Queue" option to view the resource usage of every queue.

Click the target queue in the pie chart to view the details of that queue. The figure describes how to view the details of the queue sub_test1.

HDFS Resources Monitoring

Click the "Resources Monitoring" button in the administration page, and select the "HDFS" option to view the resources monitoring table. You can select the time granularity to display and query the specific directory table.

Submitted Tasks

If the current user is an administrator, all tasks submitted on the current cluster are shown; otherwise, only the tasks submitted by that user are shown. The user can check a task's details, follow its progress and execution result, and view its operating log. The most recent 20,000 records are kept by default, and cleanup runs every 12 hours.

Select "Multi-tenant Management>Submitted Task" to view the task list.
Click the task ID in "Task List" to view the task's details.
Click the "Log Administration" button in "Task Details" page to administer log information.

Note: You can view the task execution log after configuring the OpenVPN client. For configuration details, see Access to Cluster Through OpenVPN.
Click "Pull Log" to preview the log. You can also download the log or clear the log cache. Logs from the last week are kept by default, and cleanup runs every 12 hours.
In the "Task Details" page, you can click task execution ID to view container details.

Alarm Strategy

By adding an alarm strategy, you can set thresholds for the monitoring items in Hadoop Manager. A monitoring item that exceeds its threshold triggers an alarm, which is shown in the Alarm Record module.

Add Policy

Select "Alarm Management>Alarm Strategy>Add Strategy" to enter the page of adding strategy.
Fill in the following form; the symbol * indicates a required field.
Click "Save" to complete the addition.

Notes:

Select monitoring items: Single choice is supported. The service type and the monitoring item form a two-level linked selection.

Strategy group involved: Single choice is supported, and you can click "Create Strategy Group" if no strategy group is suitable.

Alarm rules: Based on the monitoring item selected above, the alarm rules for that item are imported automatically.

Disable/Enable Policy

When the monitoring item of a disabled alarm strategy exceeds its threshold, no alarm or alarm notice is sent, and no alarm record appears; when the alarm strategy is enabled, alarms and alarm notices are sent normally.
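
The interaction between the threshold and the enabled state can be sketched as follows. This is a hypothetical model for illustration only: the class, its fields, and the "greater than" comparison are assumptions, not Hadoop Manager's actual rule format.

```python
from dataclasses import dataclass


@dataclass
class AlarmStrategy:
    """Hypothetical alarm strategy: one monitoring item with one threshold."""
    name: str
    threshold: float
    enabled: bool = True

    def evaluate(self, value):
        """Return an alarm message, or None if disabled or within threshold."""
        if not self.enabled or value <= self.threshold:
            return None
        return f"{self.name}: {value} exceeds threshold {self.threshold}"


strategy = AlarmStrategy("hdfs_disk_usage", threshold=80.0)
print(strategy.evaluate(91.5))  # alarm message, because 91.5 > 80.0
strategy.enabled = False
print(strategy.evaluate(91.5))  # None, because the strategy is disabled
```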

Select "Alarm Management>Alarm Strategy" to enter the alarm strategy list page.
In the "Action" column of the alarm strategy list, click the "Enable" or "Disable" button to enable or disable the strategy.

Add Strategy Group

You can add different strategies into a strategy group to view strategy by groups and configure alarm notices.

Notes:

The same alarm strategy can be added to different strategy groups.

For the "Strategy Number" in the strategy group list, click "View" to show all alarm strategies in the current strategy group.

Set alarm notice: Add alarm notices to all strategies in the current strategy group.

How to add a strategy group:

Select "Alarm Management>Alarm Strategy" to enter the alarm strategy list page.
Click "Add Strategy Group" to enter the addition page.
Fill in the following form; the symbol * indicates a required field.
Click "Save" to complete the addition.

Note: Select strategy: The list shows all currently available monitoring items, and you can filter them by monitoring name.

Alarm Records

The alarm records include new alarm records and all alarm records. The new alarm records show the policies currently in alarm status and the alarm information; all alarm records show every alarm record in the cluster history.

New Alarm Records

Select "Alarm Management>Alarm Records>New Alarm Records" to enter the alarm record page.

Note: An alarm record disappears from the list once it returns to normal.
Click "View" in the "Alarm Records" column of the alarm record list to view the details of all alarm items in the current alarm record.
Notes:
- Alarm records can be filtered by "Severity".
- Alarm records can be sorted by "Alarm Occurrence Time".

Historical Alarm Records

The historical alarm records show all alarm records that have occurred, including records currently in alarm state and records that have returned to normal.

You can filter alarm records by selecting the time range and service type and by entering the record ID and CVM server.

Select "Alarm Management>Alarm Records>All Alarm Records" to enter the page of all alarm records.

Alarm Notices

An alarm notice pushes the alarm information of monitoring items that exceed their thresholds to the user, so that the user can learn the cause and resolve the cluster exception promptly.

Create Alarm Notices

Select "Alarm Management>Alarm Notice" to enter the management page of alarm notices.
Click the "Add Alarm Notice" button to enter the creation page of the alarm notice.
Enter the information on alarm notice, and the symbol * indicates a required field.
Click "Save" to complete the addition.

Notes:

Select strategy group: Multiple choices are supported. Selecting a strategy group means selecting all strategies in that group.
Select severity: Multiple choices are supported. The alarm notification is sent when an alarm matches at least one of the selected severities.
Notification method: Email and SMS; email is required.
Contact person: Multiple choices are supported. The contact person is limited to system users. The selected contact person must provide an E-mail address, or the notice cannot be sent.
When different strategies in the same notice simultaneously give alarms, only one alarm notice is sent, and it contains all alarm information.
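
The severity filter and the merging of simultaneous alarms into one notice can be sketched as follows. This is a hypothetical model: the severity names, the tuple layout, and the message format are assumptions, and the sketch takes "all alarm information" to mean the alarms that match the selected severities.

```python
def build_notice(alarms, selected_severities):
    """Merge simultaneous alarms into a single notice body.

    `alarms` is a list of (strategy, severity, message) tuples. Returns None
    when no alarm matches any selected severity, so no notice is sent.
    """
    matching = [a for a in alarms if a[1] in selected_severities]
    if not matching:
        return None
    return "\n".join(
        f"[{severity}] {strategy}: {message}"
        for strategy, severity, message in matching
    )


# Hypothetical alarms raised at the same time by two strategies.
alarms = [
    ("hdfs_disk_usage", "critical", "usage above threshold"),
    ("yarn_memory", "warning", "available memory low"),
]
print(build_notice(alarms, {"critical", "warning"}))
```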

View Alarm Records

Select "Alarm Management>Alarm Notice" to enter the management page of alarm notices.
Click the "View Records" button in the action bar of the alarm notice list, and you can see all alarm records in the current alarm notice.

Disable/Enable Alarm Notices

While an alarm notice is disabled, you are not informed when its strategies raise alarms. You can enable the alarm notice again to resume notifications.

How to disable/enable alarm notice:

Select "Alarm Management>Alarm Notice" to enter the management page of alarm notices.
When the alarm notice is in the "Enabled" state, click the "Disable" button in the action column of the alarm notice list to switch it to the "Disabled" state; conversely, when it is in the "Disabled" state, click the "Enable" button.

Notice Records

The notice records show all historical alarm notices, whether or not they were sent successfully. You can filter the records by selecting the time range and by entering the record ID, alarm notice name, and contact person.

User Administration

The users of the cluster management platform are administrators and common users. The administrator can add common users, and common users have limited privileges (the limits cover, for example, service start/stop, configuration modification, and multi-tenant management).

After clicking "User Details" at the upper right, the administrator sees a user management link that common users do not have. The administrator can click it to enter the user list and then edit users, disable users, and reset passwords.

Edit User

The administrator can modify user information, grant administration privileges, and decide whether the user can log in to the cluster. If "Whether to Log in to Cluster" is checked, an operating system account of the same name is created on every virtual machine of the cluster, with the same password as the administration system account. The HDFS directory is the user's directory on HDFS, and its capacity limit can be changed through resources management. If the cluster has client nodes, the client instance integration function can be used: check the client node instances to log in to, identified by the virtual machine's instance ID and private IP, to grant the user login privileges on those client nodes. The login account and password are the same as those of the administration system.

Add Users

Adding an administration system user is similar to editing a user; the only difference is that a directory of the same name is created in HDFS at the same time (for example, creating the test user creates the /user/test directory in HDFS).

Note: When Kerberos is enabled on the Hadoop cluster, a Kerberos authentication option is shown, and the user can choose to enable Kerberos as needed.
