AnalyzeBigDataWithHadoop.md

File metadata and controls

63 lines (56 loc) · 2.73 KB

Create an Amazon S3 bucket

  • In the AWS Management Console, on the Services menu, click S3.
  • Click Create bucket.
  • For Bucket name, enter hadoop- followed by a random number.
  • Click Create.
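
The same bucket could also be created from a script. The following is a minimal sketch, not part of the original console walkthrough: it assumes boto3 is installed and AWS credentials are configured, and the function names and the us-west-2 default region are illustrative choices.

```python
import random


def make_bucket_name(prefix="hadoop-", rng=None):
    """Return the prefix followed by a random number."""
    rng = rng or random.Random()
    return f"{prefix}{rng.randint(1000, 99999)}"


def create_bucket_kwargs(bucket_name, region):
    """Build the keyword arguments for the S3 create_bucket call."""
    kwargs = {"Bucket": bucket_name}
    # us-east-1 is the default location and must not be sent as a constraint.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_bucket(region="us-west-2"):
    """Create the bucket. Needs boto3 and AWS credentials (assumption)."""
    import boto3  # imported lazily so the helpers above work without it

    name = make_bucket_name()
    s3 = boto3.client("s3", region_name=region)
    s3.create_bucket(**create_bucket_kwargs(name, region))
    return name
```

Calling create_bucket() returns the generated bucket name, which is needed again when the cluster and the Hive step are configured. Bucket names are global, so the call can fail if the random name is already taken.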

Launch an Amazon EMR cluster

  • On the Services menu, click EMR.
  • Click Create cluster.
  • In the General Configuration section, configure the following:
  • Cluster name: My cluster
  • Click the hadoop- bucket that you created earlier.
  • Click Select.
  • In the Hardware configuration section, configure the following:
  • Instance type: m4.large
  • Number of instances: 2
  • In the Security and access section, configure:
  • EC2 key pair: Proceed without an EC2 key pair
  • Permissions: Custom
  • EMR role: EMR_DefaultRole
  • EC2 instance profile: EMR_EC2_DefaultRole
  • Click Create cluster to launch your EMR cluster.
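
For reference, the console settings above map roughly onto the arguments of the boto3 EMR run_job_flow call. This is a sketch under several assumptions that the steps do not state: the release label, the list of applications, the use of the bucket as the log location, and the us-west-2 region are all illustrative, as are the function names.

```python
def cluster_request(log_bucket, release_label="emr-5.36.0"):
    """Build run_job_flow arguments that mirror the console settings."""
    return {
        "Name": "My cluster",
        # Assumption: the selected bucket is used as the log location.
        "LogUri": f"s3://{log_bucket}/",
        # Assumption: the walkthrough does not name a release.
        "ReleaseLabel": release_label,
        # Assumption: Hive is installed so the Hive step can run later.
        "Applications": [{"Name": "Hadoop"}, {"Name": "Hive"}],
        "Instances": {
            "MasterInstanceType": "m4.large",
            "SlaveInstanceType": "m4.large",
            "InstanceCount": 2,
            # Keep the cluster alive (Waiting) so steps can be added later.
            "KeepJobFlowAliveWhenNoSteps": True,
            # No Ec2KeyName entry: proceed without an EC2 key pair.
        },
        "ServiceRole": "EMR_DefaultRole",
        "JobFlowRole": "EMR_EC2_DefaultRole",
    }


def launch_cluster(log_bucket, region="us-west-2"):
    """Start the cluster and block until it is running or waiting."""
    import boto3  # needs boto3 and AWS credentials (assumption)

    emr = boto3.client("emr", region_name=region)
    cluster_id = emr.run_job_flow(**cluster_request(log_bucket))["JobFlowId"]
    emr.get_waiter("cluster_running").wait(ClusterId=cluster_id)
    return cluster_id
```

The two default roles must already exist in the account for this request to succeed, just as they must for the console flow.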

Process Your Sample Data by Running a Hive Script

  • Wait until the cluster shows a status of Waiting.
  • Click the Steps tab.
  • Click Add step.
  • In the Add step dialog, configure the following settings:
  • Step type: Hive program
  • Name: Process logs
  • Script S3 location: s3://us-west-2.elasticmapreduce.samples/cloudfront/code/Hive_CloudFront.q
  • Input S3 location: s3://us-west-2.elasticmapreduce.samples
  • Output S3 location: s3://hadoop-1995/
  • Arguments: -hiveconf hive.support.sql11.reserved.keywords=false
  • Click Add.
  • Wait for the status of the step to change to Completed.
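
The step settings above can be expressed as a step definition for the boto3 add_job_flow_steps call. The sketch below assumes the step is run through command-runner.jar with the hive-script runner, which is how such a step is commonly submitted outside the console; the ActionOnFailure value, the region, and the function names are assumptions rather than values taken from the walkthrough.

```python
SCRIPT = (
    "s3://us-west-2.elasticmapreduce.samples/cloudfront/code/Hive_CloudFront.q"
)
INPUT = "s3://us-west-2.elasticmapreduce.samples"


def hive_step(output_location):
    """Build a step definition that mirrors the Add step dialog."""
    return {
        "Name": "Process logs",
        "ActionOnFailure": "CONTINUE",  # assumption: not stated in the steps
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "hive-script", "--run-hive-script", "--args",
                "-f", SCRIPT,
                "-d", f"INPUT={INPUT}",
                "-d", f"OUTPUT={output_location}",
                "-hiveconf", "hive.support.sql11.reserved.keywords=false",
            ],
        },
    }


def run_step(cluster_id, output_location, region="us-west-2"):
    """Submit the step and block until it completes."""
    import boto3  # needs boto3 and AWS credentials (assumption)

    emr = boto3.client("emr", region_name=region)
    response = emr.add_job_flow_steps(
        JobFlowId=cluster_id, Steps=[hive_step(output_location)]
    )
    step_id = response["StepIds"][0]
    emr.get_waiter("step_complete").wait(ClusterId=cluster_id, StepId=step_id)
    return step_id
```

The output location should point at your own bucket, for example the s3://hadoop-1995/ value used in the dialog above.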

View the Results

  • On the Services menu, click S3.
  • Click the name of your hadoop- bucket.
  • Click the os_requests folder.
  • Select the 000000_0 file.
  • Click Download in the pop-up box and save the file to your computer.
  • Open the file using a text editor such as WordPad.
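
The result file can also be fetched and read programmatically. This sketch assumes the file is plain text with one row per line and that the two fields are separated by the Ctrl-A character (\x01), which is Hive's usual default when writing to a directory; if the file looks different in your editor, adjust the delimiter. The function names and the local file name are illustrative.

```python
def parse_hive_output(text, delimiter="\x01"):
    """Turn lines of 'name<delimiter>count' into (name, count) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, count = line.rsplit(delimiter, 1)
        rows.append((name, int(count)))
    return rows


def download_results(bucket, local_path="os_requests.txt"):
    """Download os_requests/000000_0 and return the parsed rows."""
    import boto3  # needs boto3 and AWS credentials (assumption)

    s3 = boto3.client("s3")
    s3.download_file(bucket, "os_requests/000000_0", local_path)
    with open(local_path, encoding="utf-8") as handle:
        return parse_hive_output(handle.read())
```

A text editor may show the Ctrl-A separator as an odd symbol or not at all, which is why the columns can appear to run together when the file is opened directly.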

Terminate your Amazon EMR Cluster

  • On the Services menu, click EMR.
  • Select My cluster.
  • Click Terminate.
  • In the Terminate cluster dialog, click Terminate.
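
Termination can be scripted as well, which helps avoid leaving a cluster running by accident. The sketch below looks up active clusters by name and terminates them with the boto3 EMR client; the function names and the region are assumptions, and only the first page of list_clusters results is inspected.

```python
ACTIVE_STATES = ["STARTING", "BOOTSTRAPPING", "RUNNING", "WAITING"]


def matching_cluster_ids(clusters, name="My cluster"):
    """Return the IDs of the cluster summaries whose name matches."""
    return [cluster["Id"] for cluster in clusters if cluster["Name"] == name]


def terminate_cluster(name="My cluster", region="us-west-2"):
    """Terminate active clusters with the given name and wait for shutdown."""
    import boto3  # needs boto3 and AWS credentials (assumption)

    emr = boto3.client("emr", region_name=region)
    # Only the first page of results is checked in this sketch.
    clusters = emr.list_clusters(ClusterStates=ACTIVE_STATES)["Clusters"]
    ids = matching_cluster_ids(clusters, name)
    if ids:
        emr.terminate_job_flows(JobFlowIds=ids)
        waiter = emr.get_waiter("cluster_terminated")
        for cluster_id in ids:
            waiter.wait(ClusterId=cluster_id)
    return ids
```

Because the lookup is by name, every active cluster called My cluster in that region is terminated, so use a more specific name if the account holds other clusters.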