          Presto Introduction

          Presto is a distributed SQL query engine for querying large datasets spread across one or more data sources. It runs queries over massive data quickly and efficiently in a distributed manner, and provides a Web UI where you can view query details and the running status of the service.

          Create Clusters

          Log in to the Baidu AI Cloud console, select "Product Service -> Baidu MapReduce BMR", and click "Create Cluster" to open the cluster creation page. BMR 2.0.0 and later can integrate the Presto component, which you can select when purchasing the cluster, as the following figure shows:

          image.png

          Usage Introduction

          You can connect to Presto either locally, from a cluster machine, or remotely.

          Local Connection to Presto

          Log in to a cluster machine.

          Run presto --catalog XXX --schema XXX to enter the Presto interactive terminal. Here --catalog specifies the data source configuration to use, and --schema specifies the schema to query. For example:

          presto --catalog hive --schema test
          presto:test>

          Run a query, for example:

          presto:test> select * from t2;
             id
          ---------
           1
          (1 row)
           
          Query 20190517_075918_00019_hs9fa, FINISHED, 1 node
          Splits: 17 total, 17 done (100.00%)
          0:01 [1 rows, 208B] [1 rows/s, 324B/s]
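For scripted use, the Presto CLI can also run a single statement without opening the interactive terminal, through its --execute option. The following is a minimal sketch under stated assumptions: the function name run_query and the PRESTO_BIN variable are illustrative and not part of BMR, and the catalog, schema, and table are the ones from the example above.

```shell
# Name of the Presto CLI binary. PRESTO_BIN is a hypothetical variable
# introduced here so the binary can be swapped out, for example for a dry run.
PRESTO_BIN="${PRESTO_BIN:-presto}"

# run_query CATALOG SCHEMA SQL
# Runs one SQL statement non-interactively and prints the result to stdout.
run_query() {
  "$PRESTO_BIN" --catalog "$1" --schema "$2" --execute "$3"
}
```

On a cluster machine, run_query hive test 'select * from t2;' should behave like typing the same statement at the presto:test> prompt, except that the CLI exits once the statement finishes.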

          Remote Connection to Presto

          Copy presto-cli-0.219-executable.jar from the /opt/bmr/presto/bin directory on the cluster machine to your local machine.

          Configure OpenVPN

          Run the following command to connect to the Presto service: ./presto-cli-0.219-executable.jar --server internal_ip:8089 --catalog XXX --schema XXX
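The remote connection steps can be wrapped in a small helper so the server address is supplied once. This is a sketch, not a BMR-provided tool: remote_presto and PRESTO_CLI are hypothetical names, the jar is assumed to sit in the current directory, and port 8089 is the one used in this document.

```shell
# Path to the CLI jar copied from the cluster (assumed location).
PRESTO_CLI="${PRESTO_CLI:-./presto-cli-0.219-executable.jar}"

# remote_presto SERVER_IP CATALOG SCHEMA [extra CLI arguments...]
# Connects to the Presto service on SERVER_IP:8089 over the VPN.
remote_presto() {
  server="$1"; catalog="$2"; schema="$3"
  shift 3
  "$PRESTO_CLI" --server "${server}:8089" --catalog "$catalog" --schema "$schema" "$@"
}
```

Before the first run, the jar typically needs execute permission (chmod +x presto-cli-0.219-executable.jar). With OpenVPN connected, remote_presto internal_ip hive test opens the same interactive prompt as the local connection, where internal_ip stands for the internal IP address of the cluster node.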

          Presto UI Interface

          How to access:

          1. Configure OpenVPN
          2. Run the hostname command on the Master node to obtain the CVM server name
          3. Open http://hostname:port (that is, master_hostname:8089) to view the Presto monitoring page, where you can monitor the service and view query details:

          2.png

          4. Use the state options on the page to filter the query tasks:

          3.png

          5. Click a query ID to open its details page, where you can also view the logs of failed tasks.
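Before opening the Web UI in a browser, it can help to confirm that the service is reachable through the VPN. The sketch below assumes the Presto server exposes its usual /v1/info REST endpoint on the same port as the UI and that curl is installed locally; presto_info_url, check_presto, and PRESTO_PORT are hypothetical names introduced for illustration.

```shell
# Port of the Presto service; 8089 is the value used in this document.
PRESTO_PORT="${PRESTO_PORT:-8089}"

# presto_info_url HOSTNAME
# Prints the URL of the server information endpoint for the given host.
presto_info_url() {
  printf 'http://%s:%s/v1/info\n' "$1" "$PRESTO_PORT"
}

# check_presto HOSTNAME
# Succeeds (exit status 0) if the endpoint answers within 5 seconds,
# and prints the JSON document returned by the server.
check_presto() {
  curl --silent --fail --max-time 5 "$(presto_info_url "$1")"
}
```

For example, check_presto master_hostname returns a non-zero status when the VPN is down or the hostname does not resolve, which separates network problems from problems in the UI itself.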

          Notes

          Hive 3 fully controls managed tables, so Presto cannot query or modify Hive managed tables. To query a Hive table with Presto, create it in Hive as an external table. For background, see https://github.com/prestodb/presto/issues/12484. Examples are as follows:

          Create an external table in Hive and query it with Presto; the query succeeds:

          # Connect to Hive and create an external table
          hive> create external table test1 (id bigint) stored as textfile;
          hive> insert into test1 values(123);
          
          # Connect to Presto and run the query
          presto:test> select * from test1;
           id
          -----
           123
          (1 row)

          Create a managed table in Hive and query it with Presto; the query fails:

          # Connect to Hive and create a managed table
          hive> create table test2(id bigint) stored as textfile;
          
          # Connect to Presto and run the query
          presto:test> select * from test2;
          Query XXX failed: XXX
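To find out in advance whether an existing Hive table is managed or external, you can inspect the "Table Type" row that Hive prints for describe formatted. The helper below is a sketch under assumptions: table_type_from_describe and hive_table_type are hypothetical names, the hive command is assumed to accept -e on the cluster machine, and the parsing expects the plain text layout of the classic Hive CLI (a line starting with "Table Type:"), not a bordered table layout.

```shell
# table_type_from_describe
# Reads `describe formatted` output on stdin and prints the value of the
# "Table Type" row, for example MANAGED_TABLE or EXTERNAL_TABLE.
table_type_from_describe() {
  awk -F: '/^Table Type/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# hive_table_type TABLE
# Asks Hive for the table metadata and extracts the table type.
hive_table_type() {
  hive -e "describe formatted $1;" 2>/dev/null | table_type_from_describe
}
```

With the tables from the examples above, hive_table_type test1 is expected to report an external table and hive_table_type test2 a managed one, which indicates which of them Presto can query.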