Presto Usage Guide
Overview
Presto, a data query engine developed by Facebook, enables fast interactive analysis of massive datasets and supports multiple data sources like Hive and relational databases. Thanks to BOS’s advantages such as ultra-low cost, high performance, reliability, and high throughput, more enterprises are turning to BOS as their big data storage solution. This document provides a brief overview of using Presto with BOS.
Prerequisites
Refer to the Hive Usage Guide to install and configure Hive
Installation and configuration
The installed version is 349, and you can refer to the process in the document Presto Deployment. Among them, if the local machine also serves as a worker, set in config.properties:
1node-scheduler.include-coordinator=true #Need to be changed to true
In etc/catalog/hive.properties, configure as follows:
1connector.name=hive-hadoop2
2 hive.config.resources=/ssd2/hadoop-3.3.2/etc/hadoop/core-site.xml,/ssd2/hadoop-3.3.2/etc/hadoop/hdfs-site.xml #The address here must be correct
3hive.metastore.uri=thrift://127.0.0.1:9083
4hive.allow-drop-table=false
5hive.storage-format=ORC
6hive.metastore-cache-ttl=1s
7hive.metastore-refresh-interval=1s
8hive.metastore-timeout=35m
9hive.max-partitions-per-writers=1000
10hive.cache.enabled=true
11hive.cache.location=/opt/hive-cache
Copy the bos filesystem jar package to plugin/hive-hadoop2/, then run:
1./bin/launcher start
Start the presto-server,
1./presto-cli --server localhost:8881 --catalog hive --schema default
Run
1presto:default>use hive;
2 USE
3 presto:hive>select * from hive_test limit 10;
4 a | b
5-------+----------
6 11027 | "11345"
7 10227 | "24281"
8 32535 | "16409"
9 24286 | "24435"
10 2498 | "10969"
11 16662 | "16163"
12 5345 | "26005"
13 21407 | "5365"
14 30608 | "4588"
15 19686 | "11831"
16 (10 rows)
17Query 20230601_084130_00004_dzvjb, FINISHED, 1 node
18Splits: 18 total, 18 done (100.00%)
19[Latency: client-side: 0:02, server-side: 0:02] [59.4K rows, 831KB] [28.9K rows/s, 404KB/s]
In the example above, we query data stored in BOS through Presto.
Presto access based on S3
Presto accesses data stored in BOS only via Hive, but Hive can retrieve BOS data in two ways: the first uses the previously mentioned method based on bos-hdfs, while the second accesses BOS directly through the S3 protocol.
Hive configuration
Install metastore:
1wget "https://repo1.maven.org/maven2/org/apache/hive/hive-standalone-metastore/3.1.2/hive-standalone-metastore-3.1.2-bin.tar.gz"
2tar -zxvf hive-standalone-metastore-3.1.2-bin.tar.gz
3sudo mv apache-hive-metastore-3.1.2-bin /usr/local/metastore
4sudo chown user:user /usr/local/metastore
Download and use hive-standalone-metastore, and add the necessary jar packages:
1rm /usr/local/metastore/lib/guava-19.0.jar
2cp /usr/local/hadoop/share/hadoop/common/lib/guava-27.0-jre.jar \
3 /usr/local/metastore/lib/
4cp /usr/local/hadoop/share/hadoop/tools/lib/hadoop-aws-3.2.1.jar \
5 /usr/local/metastore/lib/
6cp /usr/local/hadoop/share/hadoop/tools/lib/aws-java-sdk-bundle-1.11.375.jar \
7 /usr/local/metastore/lib/
Then configure /usr/local/metastore/conf/metastore-site.xml.
1<property>
2 <name>javax.jdo.option.ConnectionURL</name>
3 <value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
4</property>
5<property>
6 <name>javax.jdo.option.ConnectionDriverName</name>
7 <value>com.mysql.jdbc.Driver</value>
8</property>
9<property>
10 <name>javax.jdo.option.ConnectionUserName</name>
11 <value>hive</value>
12</property>
13<property>
14 <name>javax.jdo.option.ConnectionPassword</name>
15 <value>hive</value>
16</property>
17<property>
18 <name>hive.metastore.event.db.notification.api.auth</name>
19 <value>false</value>
20</property>
21<property>
22 <name>fs.s3a.access.key</name>
23 <value>S3_ACCESS_KEY</value>
24</property>
25<property>
26 <name>fs.s3a.secret.key</name>
27 <value>S3_SECRET_KEY</value>
28</property>
29<property>
30 <name>fs.s3a.connection.ssl.enabled</name>
31 <value>false</value>
32</property>
33<property>
34 <name>fs.s3a.path.style.access</name>
35 <value>true</value>
36</property>
37<property>
38 <name>fs.s3a.endpoint</name>
39 <value>S3_ENDPOINT</value>
40</property>
fs.s3a.endpoint It refers to the S3 endpoint. The S3 endpoint of BOS can be found in BOS S3 Domain Name.
Start hive metastore:
1/usr/local/metastore/bin/start-metastore &
Presto configuration
The version of Presto remains unchanged here, and the content of hive.properties is changed to:
1connector.name=hive-hadoop2
2hive.metastore.uri=thrift://localhost:9083
3hive.s3.path-style-access=true
4hive.s3.endpoint=S3_ENDPOINT
5hive.s3.aws-access-key=S3_ACCESS_KEY
6hive.s3.aws-secret-key=S3_SECRET_KEY
7hive.s3.ssl.enabled=false
Start Presto
1/usr/local/trino/bin/launcher start
Create a Schema
1CREATE SCHEMA IF NOT EXISTS hive.iris
2 WITH (location = 's3a://my-bos-bucket/'); //Note that the location here starts with s3a. If the bucket is my-bos-bucket, the location should start with s3a://my-bos-bucket/
3 #Create a table
4CREATE TABLE IF NOT EXISTS hive.iris.iris_parquet (
5 sepal_length DOUBLE,
6 sepal_width DOUBLE,
7 petal_length DOUBLE,
8 petal_width DOUBLE,
9 class VARCHAR
10)
11WITH (
12 external_location = 's3a://my-bos-bucket/iris_parquet',
13 format = 'PARQUET'
14);
15SELECT
16 sepal_length,
17 class
18FROM hive.iris.iris_parquet
19LIMIT 10;
