Skip to content

vincent19974/automation

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Automating Hive Query

1. Login in EMR (with hive hadoop)
   > open terminal and make ssh access to your EMR in which cluster is made.
   e.g: ssh -i <location of emrkey in local machine> [email protected]

2. Make a two shell script to execute all the installation environment needed to run Hive:
   > First script is to create mini.sh to execute installation for minicoda: below is the content for mini.sh,(chmod +x mini.sh, to make the script executable).
   wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh
   bash ~/miniconda.sh -b -p $HOME/miniconda
   conda init bash
   eval "$(/home/hadoop/miniconda/bin/conda shell.bash hook)"
   
   > Second script is for automating installation for hive environment. In this case we create rec.sh (make sure to change mode to executable (chmod +x rec.sh)) . Rec.sh content:
   conda create --name py38 python=3.8
   conda init bash
   conda activate py38
   pip3 install pandas
   pip3 install pyhive
   pip3 install wheel
   pip3 install thrift
   sudo yum install python3-devel
   sudo yum install gcc-c++ python-devel.x86_64 cyrus-sasl-devel.x86_64
   pip3 install thrift-sasl
3. now you can both scripts assuming it is in executable mode(chmod +x <shell_name>) via
   > . ./mini.sh –  to run installation for minicoda
   > . ./rec.sh – to run installation for hive environment
   
4. copy the commmand to aws location:
   aws s3 cp /bucketpath/mini.sh
   aws s3 cp /bucketpath/rec.sh
   5. Now create python script to query hive database, below is the python script for data base query: (in this case we create test.py)
   
   from pyhive import hive
   import pandas
   
   conn = hive.Connection(host='localhost', port=10000,  database='default')
   cursor = conn.cursor()
   sql = 'show databases'
   cursor.execute(sql)
   print (cursor.fetchall())
   df = pandas.read_sql(sql, conn)
   print (df)
   

   6. Try to run test.py using command:
   > python3 test.py

END

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published