- Install Python (version 3.7 or newer).
- Clone Superset repo.
git clone https://github.com/xxiao23/superset
cd superset
- Install Python Virtual Environment
pip install virtualenv
- Install and initialize the backend.
# Create a virtual environment and activate it (recommended)
python3 -m venv venv # setup a python3 virtualenv
source venv/bin/activate
# Install external dependencies
pip install -r requirements/local.txt
# Install Superset in editable (development) mode
pip install -e .
# Create an admin user in your metadata database
superset fab create-admin
# Initialize the database
superset db upgrade
# Create default roles and permissions
superset init
# Load some data to play with
superset load_examples
# Start the Flask dev web server from inside your virtualenv.
# Note that your page may not have CSS at this point.
# See the instructions below for how to build the front-end assets. Specify --host=0.0.0.0 to make the server reachable from other machines on the network.
FLASK_ENV=development superset run -p 8088 --with-threads --reload --debugger
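- Optional: override local settings with a superset_config.py module placed on your PYTHONPATH. A minimal sketch follows; the keys shown are standard Superset settings, but every value below is an illustrative placeholder for local development only.
# superset_config.py - picked up automatically when it is on PYTHONPATH.
# All values are placeholders for local development; replace them with your own.
SECRET_KEY = "change-me-to-a-long-random-string"  # used to sign session cookies
SQLALCHEMY_DATABASE_URI = "sqlite:////absolute/path/to/superset.db"  # metadata database
SUPERSET_WEBSERVER_PORT = 8088  # matches `superset run -p 8088` above
ROW_LIMIT = 5000  # cap on rows returned by queries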
- Prerequisites (nvm and Node.js).
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.37.0/install.sh | bash
cd superset-frontend
nvm install
nvm use
- Install dependencies.
# From the root of the repository
cd superset-frontend
# Install dependencies from `package-lock.json`
npm ci
- Build and run dev server.
# build assets in development mode
npm run build-dev
# Start the dev server at http://localhost:9000
npm run dev-server
- Convert the PO file into a JSON file.
npm install -g po2json
./scripts/po2json.sh
- Compile translation catalogs into binary MO files.
pybabel compile -d superset/translations
- Install the ODBC driver (Linux, macOS) and the pyodbc package, then use a SQLAlchemy URI of the following form. The driver name contains spaces, so it must be URL-encoded (see the Python sketch below):
pip install pyodbc
mssql+pyodbc://UserName:Password@HostIP,Port/DBName?driver=ODBC+Driver+17+for+SQL+Server
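Because building that URI by hand is error-prone, here is a minimal Python sketch using SQLAlchemy's odbc_connect form; the host, port, database, and credentials are placeholders, not values from this setup.
# build_mssql_uri.py - construct and test an engine for mssql+pyodbc.
# HostIP, Port, DBName, UserName, and Password are placeholders.
from urllib.parse import quote_plus
from sqlalchemy import create_engine, text

odbc_str = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=HostIP,Port;"
    "DATABASE=DBName;"
    "UID=UserName;"
    "PWD=Password"
)
engine = create_engine("mssql+pyodbc:///?odbc_connect=" + quote_plus(odbc_str))
with engine.connect() as conn:
    print(conn.execute(text("SELECT @@VERSION")).scalar())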
- Install Java SDK 11 and set JAVA_HOME.
export JAVA_HOME=$(/usr/libexec/java_home)
- Download Presto version 343 and follow the deployment instructions. Modify the node.data-dir value in etc/node.properties to a path that you can access.
- Download the Presto CLI jar and follow the CLI instructions to access Presto via its command-line interface.
- Download Hadoop 3.3.0.
- Start Hadoop in Pseudo-Distributed Mode.
- Configuration. Use the following:
etc/hadoop/core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
etc/hadoop/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
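To sanity-check the edited files, the property name/value pairs can be read back with a few lines of Python; this is just a convenience sketch and assumes you run it from $HADOOP_HOME.
# print_hadoop_conf.py - dump name/value pairs from the two site files above.
import xml.etree.ElementTree as ET

for conf_file in ("etc/hadoop/core-site.xml", "etc/hadoop/hdfs-site.xml"):
    print(conf_file)
    for prop in ET.parse(conf_file).getroot().findall("property"):
        print("  ", prop.findtext("name"), "=", prop.findtext("value"))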
- Set up passphraseless ssh. Check that you can ssh to localhost without a passphrase:
$ ssh localhost
If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
- Execution. The following instructions run a MapReduce job locally.
5.1. Format the filesystem:
bin/hdfs namenode -format
5.2. Start NameNode daemon and DataNode daemon:
sbin/start-dfs.sh
The Hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).
5.3. Browse the web interface for the NameNode; by default it is available at http://localhost:9870/.
5.4. Make the HDFS directories required to execute MapReduce jobs:
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/<username>
5.5. Copy the input files into the distributed filesystem:
$ bin/hdfs dfs -mkdir /input
$ bin/hdfs dfs -put etc/hadoop/*.xml /input
5.6. Run some of the examples provided:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.0.jar grep /input output 'dfs[a-z.]+'
5.7. Examine the output files: copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hdfs dfs -get output output
$ cat output/*
or:
$ bin/hdfs dfs -cat output/*
5.8. When you’re done, stop the daemons with:
$ sbin/stop-dfs.sh
No DataNode?
Find the DataNode log (you can access the log from localhost:5007) and check whether it contains any errors.
If you see an error about a temp folder, remove the temp folder and restart Hadoop:
$ $HADOOP_HOME/sbin/stop-dfs.sh
$ $HADOOP_HOME/sbin/start-dfs.sh
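As a quick check that HDFS is serving requests, you can list the root directory through the NameNode's WebHDFS REST API. A minimal sketch, assuming the pseudo-distributed setup above with the NameNode web UI on localhost:9870 and WebHDFS enabled (the default):
# webhdfs_check.py - list the HDFS root via WebHDFS on the NameNode.
import json
import urllib.request

url = "http://localhost:9870/webhdfs/v1/?op=LISTSTATUS"
with urllib.request.urlopen(url) as resp:
    listing = json.load(resp)

# Each entry describes a file or directory directly under "/".
for entry in listing["FileStatuses"]["FileStatus"]:
    print(entry["type"], entry["pathSuffix"])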
- Download Hive 2.3.7.
- Create /tmp and /user/hive/warehouse (aka hive.metastore.warehouse.dir) in HDFS and chmod them g+w before you can create a table in Hive.
$HADOOP_HOME/bin/hdfs dfs -mkdir /tmp
$HADOOP_HOME/bin/hdfs dfs -mkdir -p /user/hive/warehouse
$HADOOP_HOME/bin/hdfs dfs -chmod g+w /tmp
$HADOOP_HOME/bin/hdfs dfs -chmod g+w /user/hive/warehouse
- Copy hive-default.xml.template to hive-site.xml.
$ cd $HIVE_HOME/conf
$ cp hive-default.xml.template hive-site.xml
- Edit the following properties in hive-site.xml:
<property>
  <name>hive.exec.local.scratchdir</name>
  <value>/tmp/hive</value>
  <description>Local scratch space for Hive jobs</description>
</property>
<property>
  <name>hive.downloaded.resources.dir</name>
  <value>/tmp/hive</value>
  <description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
  <name>hive.querylog.location</name>
  <value>/tmp/hive</value>
  <description>Location of Hive run time structured log file</description>
</property>
- Initialize the Metastore schema.
$ $HIVE_HOME/bin/schematool -dbType derby -initSchema
- Run Hive.
$ cd $HIVE_HOME
$ bin/hive
- Run the Hive Metastore service.
$ cd $HIVE_HOME
$ bin/hive --service metastore
- Configuration. Create $PRESTO_HOME/etc/catalog/hive.properties with the following contents to mount the hive-hadoop2 connector as the hive catalog, with the correct host and port for your Hive metastore Thrift service:
connector.name=hive-hadoop2
hive.metastore.uri=thrift://localhost:9083
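Before starting Presto, it can help to confirm that the metastore Thrift endpoint referenced above is reachable. A small sketch, assuming the metastore service from the earlier step is running on localhost:9083:
# check_metastore.py - verify the Hive metastore Thrift port is listening.
import socket

host, port = "localhost", 9083
try:
    with socket.create_connection((host, port), timeout=5):
        print(f"Hive metastore is reachable at {host}:{port}")
except OSError as exc:
    print(f"Cannot reach {host}:{port}: {exc}")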
- HDFS username and permissions. By default, Presto accesses HDFS as the OS user running the Presto process. Override this username by setting the HADOOP_USER_NAME system property in the Presto JVM config, replacing hdfs_user with the appropriate username:
-DHADOOP_USER_NAME=<hdfs_user>
- Create a table in Hive.
CREATE TABLE pokes (foo INT, bar STRING);
- Start Presto.
- Connect to Presto/Hive.
./presto --server localhost:8080 --catalog hive --schema default
presto> show tables;
You should be able to see the pokes table that you created in Hive.
- Add AWS S3 credentials in $PRESTO_HOME/etc/catalog/hive.properties.
hive.s3.aws-access-key=<aws_access_key>
hive.s3.aws-secret-key=<aws_secret_key>
- Add the following lines to $HADOOP_HOME/etc/hadoop/core-site.xml.
<property>
  <name>fs.oss.impl</name>
  <value>org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem</value>
</property>
<property>
  <name>fs.oss.endpoint</name>
  <value>oss-us-west-1.aliyuncs.com</value>
</property>
<property>
  <name>fs.oss.accessKeyId</name>
  <value>aliyun-access-key-id</value>
</property>
<property>
  <name>fs.oss.accessKeySecret</name>
  <value>aliyun-access-key-secret</value>
</property>
- Download the Presto JDBC jar and put it on the CLASSPATH. Make sure the JDBC jar version matches your Presto server version.
export CLASSPATH=<path_to_presto_jdbc_jar>/presto-jdbc-344.jar
- Install JayDeBeApi so Python can use JDBC.
pip install jaydebeapi
- Start a Python shell and try the following code. You should be able to see the system tables in Presto.
>>> import jaydebeapi
>>> conn1 = jaydebeapi.connect("com.facebook.presto.jdbc.PrestoDriver",
... "jdbc:presto://localhost:8080/system/information_schema", ["root", ""])
>>> curs = conn1.cursor()
>>> curs.execute("select * from tables")
>>> curs.fetchall()
[('system', 'runtime', 'queries', 'BASE TABLE'), ('system', 'runtime', 'transactions', 'BASE TABLE'), ('system', 'information_schema', 'enabled_roles', 'BASE TABLE'), ('system', 'jdbc', 'types', 'BASE TABLE'), ('system', 'jdbc', 'udts', 'BASE TABLE'), ('system', 'metadata', 'column_properties', 'BASE TABLE'), ('system', 'jdbc', 'super_types', 'BASE TABLE'), ('system', 'information_schema', 'views', 'BASE TABLE'), ('system', 'information_schema', 'applicable_roles', 'BASE TABLE'), ('system', 'jdbc', 'procedure_columns', 'BASE TABLE'), ('system', 'information_schema', 'schemata', 'BASE TABLE'), ('system', 'jdbc', 'procedures', 'BASE TABLE'), ('system', 'information_schema', 'columns', 'BASE TABLE'), ('system', 'information_schema', 'table_privileges', 'BASE TABLE'), ('system', 'information_schema', 'roles', 'BASE TABLE'), ('system', 'jdbc', 'pseudo_columns', 'BASE TABLE'), ('system', 'jdbc', 'tables', 'BASE TABLE'), ('system', 'runtime', 'tasks', 'BASE TABLE'), ('system', 'metadata', 'analyze_properties', 'BASE TABLE'), ('system', 'metadata', 'catalogs', 'BASE TABLE'), ('system', 'jdbc', 'attributes', 'BASE TABLE'), ('system', 'jdbc', 'super_tables', 'BASE TABLE'), ('system', 'runtime', 'nodes', 'BASE TABLE'), ('system', 'information_schema', 'tables', 'BASE TABLE'), ('system', 'metadata', 'table_properties', 'BASE TABLE'), ('system', 'jdbc', 'schemas', 'BASE TABLE'), ('system', 'jdbc', 'catalogs', 'BASE TABLE'), ('system', 'jdbc', 'columns', 'BASE TABLE'), ('system', 'jdbc', 'table_types', 'BASE TABLE'), ('system', 'metadata', 'schema_properties', 'BASE TABLE')]
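The same JDBC driver can also point at the hive catalog configured earlier. A sketch, assuming Presto, the Hive metastore, and HDFS are all running, presto-jdbc-344.jar is still on the CLASSPATH, and the pokes table created above exists:
# query_pokes.py - query the Hive-backed catalog through the Presto JDBC driver.
import jaydebeapi

conn = jaydebeapi.connect(
    "com.facebook.presto.jdbc.PrestoDriver",
    "jdbc:presto://localhost:8080/hive/default",
    ["root", ""],
)
curs = conn.cursor()
try:
    curs.execute("SELECT * FROM pokes LIMIT 10")
    print(curs.fetchall())  # an empty list until rows are loaded into pokes
finally:
    curs.close()
    conn.close()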