          Impala

          Impala Introduction

          Impala is a query engine with an MPP (massively parallel processing) architecture whose development was led by Cloudera. It provides SQL semantics for fast queries over data stored in HDFS and HBase. Impala also uses the same metadata, SQL syntax, and ODBC driver as Hive.

          Create Clusters

          Log in to the Baidu AI Cloud console, select "Product Service -> Baidu MapReduce (BMR)", and click "Create Cluster" to open the cluster creation page. BMR 2.0.0 and later versions support the Impala component. Select the Impala component when purchasing the cluster, as shown in the figure below:

          image.png

          Usage Introduction

          1. Log in to the created cluster remotely.

            ssh root@$public_ip 
            Use the password you entered when creating the cluster. 
          2. Prepare the data (see Data Preparation for details), then upload the log file to HDFS.

            hadoop fs -get bos://datamart-gz/web-log-10k/accesslog-10k.log ./ 
            hadoop fs -put accesslog-10k.log /tmp/test 
          3. Create the table in impala-shell.

            • Start impala-shell from the command line.

              Note: By default, impala-shell connects to port 21000 of the impalad on localhost, and a BMR cluster installs the impalad service only on the core and task nodes.

              If you run impala-shell on the master node, use the -i <host:port> parameter to specify a host where impalad is installed. Run impala-shell -h to list the other available parameters.

            • Run the following table creation statement.

               CREATE EXTERNAL TABLE `access_logs`( 
               `remote_addr` string COMMENT 'client IP', 
               `time_local` string COMMENT 'access time', 
               `request` string COMMENT 'request URL', 
               `status` string COMMENT 'HTTP status', 
               `body_bytes_sent` string COMMENT 'size of response body', 
               `http_referer` string COMMENT 'referer', 
               `http_cookie` string COMMENT 'cookies', 
               `remote_user` string COMMENT 'client name', 
               `http_user_agent` string COMMENT 'client browser info', 
               `request_time` string COMMENT 'consumed time of handling request', 
               `host` string COMMENT 'server host', 
               `msec` string COMMENT 'consumed time of writing logs') 
               COMMENT 'web access logs' 
               ROW FORMAT DELIMITED 
               FIELDS TERMINATED BY '\t' 
               LOCATION '/tmp'; 
          4. After the table is created, you can query it with SQL statements. With the provided sample data and the table creation statement above, the query returns the following results.

            image.png
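
The query behind the result figure is not given in the text, so the following is only a minimal sketch of how a query could be run non-interactively from the master node. It assumes a hypothetical core node host name (core-node-1) and an illustrative query that counts requests per HTTP status; both are placeholders, not values from this guide. It relies on the -i option described in step 3 and on impala-shell's -q option, which passes a single query string. If impala-shell is not on the PATH, the script prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical host name: replace with a core or task node that runs impalad.
IMPALAD_HOST="${IMPALAD_HOST:-core-node-1}"
# Default impalad port used by impala-shell, as noted in step 3.
IMPALAD_PORT="${IMPALAD_PORT:-21000}"

# Illustrative query (an assumption): number of requests per HTTP status.
QUERY='SELECT status, COUNT(*) AS hits FROM access_logs GROUP BY status ORDER BY hits DESC'

# Build the host:port value passed to the -i option.
impalad_addr() {
  printf '%s:%s' "$IMPALAD_HOST" "$IMPALAD_PORT"
}

# Run the query if impala-shell is available; otherwise print the command.
run_query() {
  if command -v impala-shell >/dev/null 2>&1; then
    impala-shell -i "$(impalad_addr)" -q "$QUERY"
  else
    echo "impala-shell -i $(impalad_addr) -q \"$QUERY\""
  fi
}

run_query
```

Setting IMPALAD_HOST in the environment before running the script points it at a different node without editing the file. Every column in access_logs is declared as string, so any numeric comparison or aggregation on a field such as body_bytes_sent would need an explicit CAST first.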

          Reference

          1. Apache Impala Guide
          2. Impala Home