
NOTE: The Presto initialization action has been deprecated. Please use the Presto Component instead.

The Presto Component is the recommended way to use Presto with Cloud Dataproc. To learn more about Dataproc Components, see the Dataproc Components documentation.

Presto

This initialization action installs the latest version of Presto on a Google Cloud Dataproc cluster and configures Presto to work with Hive on the cluster. The Cloud Dataproc master node acts as the Presto coordinator, and all Cloud Dataproc workers act as Presto workers.

Using this initialization action

⚠️ NOTICE: See the best practices for using initialization actions in production.

You can use this initialization action to create a new Dataproc cluster with Presto installed:

Use the gcloud command to create a new cluster with this initialization action:

REGION=<region>
CLUSTER_NAME=<cluster_name>
gcloud dataproc clusters create ${CLUSTER_NAME} \
    --region ${REGION} \
    --initialization-actions gs://goog-dataproc-initialization-actions-${REGION}/presto/presto.sh

Once the cluster has been created, Presto listens on port 8080 on the master node of the Cloud Dataproc cluster (you can change the port in the script). To connect to the Presto web interface, create an SSH tunnel and use a SOCKS5 proxy, as described in the Dataproc web interfaces documentation. You can also run queries from the Presto command line interface by using the presto command on the master node.
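The connection steps above can be sketched roughly as follows. This is a minimal sketch, not part of the original script: the cluster name, zone, local SOCKS port 1080, and the `RUN` dry-run prefix are all assumptions for illustration. It also assumes a single-master cluster, where the master node is named `<cluster_name>-m`. By default the commands are only printed; set `RUN=` (empty) to execute them.

```shell
# Assumed example values; replace with your own.
CLUSTER_NAME=presto-demo
ZONE=us-central1-a
SOCKS_PORT=1080      # local port for the SOCKS5 proxy (assumption)
PRESTO_PORT=8080     # default Presto HTTP port from this README

# Dry-run prefix (hypothetical helper): prints commands unless RUN is set to empty.
RUN=${RUN-echo}

MASTER="${CLUSTER_NAME}-m"
UI_URL="http://${MASTER}:${PRESTO_PORT}"

# 1. Open an SSH tunnel that exposes a SOCKS5 proxy on localhost.
#    When actually executed, this blocks, so run it in its own terminal.
$RUN gcloud compute ssh "${MASTER}" --zone "${ZONE}" -- -D "${SOCKS_PORT}" -N

# 2. Point a browser at the Presto UI through the proxy.
echo "Open ${UI_URL} using proxy socks5://localhost:${SOCKS_PORT}"

# 3. Alternatively, run a query with the presto CLI on the master node.
$RUN gcloud compute ssh "${MASTER}" --zone "${ZONE}" \
    --command "presto --catalog hive --schema default --execute 'SHOW SCHEMAS;'"
```

If you changed the Presto port, adjust `PRESTO_PORT` to match.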

You can find more information about using initialization actions with Dataproc in the Dataproc documentation.

Important notes

This script must be updated to match the Presto version you wish to install.
You may need to adjust the memory settings in jvm.config based on your needs.
Presto uses HTTP port 8080 by default; to change it, pass metadata such as --metadata presto-port=8060 when creating the cluster.
Only the Hive connector is configured by default.
High-availability configuration is discouraged because the coordinator is started only on m-0, leaving the other master nodes idle.
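As an illustration of the port note above, the cluster creation command with a non-default Presto port might look like the sketch below. The region, cluster name, and the `RUN` dry-run prefix are assumptions for illustration; only the `presto-port` metadata key and the initialization action path come from this README. By default the command is printed rather than executed; set `RUN=` (empty) to run it.

```shell
# Assumed example values; replace with your own.
REGION=us-central1
CLUSTER_NAME=presto-demo
PRESTO_PORT=8060

# Dry-run prefix (hypothetical helper): prints the command unless RUN is set to empty.
RUN=${RUN-echo}

INIT_ACTION="gs://goog-dataproc-initialization-actions-${REGION}/presto/presto.sh"
METADATA="presto-port=${PRESTO_PORT}"

# Create the cluster, passing the port override to the script as metadata.
$RUN gcloud dataproc clusters create "${CLUSTER_NAME}" \
    --region "${REGION}" \
    --initialization-actions "${INIT_ACTION}" \
    --metadata "${METADATA}"
```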
