
demo - Confluent, Kafka, Spark, Databricks

DISCLAIMER: This repo is based on Sean Coyne's GitHub repository and his YouTube tutorial, and is intended for exercise purposes only.

  1. Get Twitter API credentials.
  2. Create an env file named .env with the lines below, replacing XXX with your actual keys (the file is ignored via .gitignore). A Python loading sketch follows the listing.
    CONSUMER_KEY = "XXX"
    CONSUMER_SECRET = "XXX"
    ACCESS_TOKEN_KEY = "XXX"
    ACCESS_TOKEN_SECRET = "XXX"
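
For reference, a minimal sketch of how a script might load these keys, assuming the python-dotenv package is installed (the variable names match the .env above; this is not the repo's actual code):

    # sketch: load Twitter credentials from .env (assumes python-dotenv)
    import os
    from dotenv import load_dotenv

    load_dotenv()  # reads .env from the current working directory

    consumer_key = os.getenv("CONSUMER_KEY")
    consumer_secret = os.getenv("CONSUMER_SECRET")
    access_token_key = os.getenv("ACCESS_TOKEN_KEY")
    access_token_secret = os.getenv("ACCESS_TOKEN_SECRET")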
    
  3. Create a free Confluent Cloud account.
  4. Create a Kafka cluster in Confluent Cloud.
  5. Create a Kafka topic named streaming_test_6 with 6 partitions.
  6. Get API credentials (an API key and secret) for the cluster from the Confluent UI.
  7. Set up a Databricks Community Edition account.
  8. Create a Databricks cluster.
  9. Import the kafka_test notebook.
  10. In the first cell of the notebook, replace the XXX with your values for confluentApiKey, confluentSecret, and host, which you will find in the Confluent UI (step 6). A sketch of the notebook's read logic follows the config block in step 11.
  11. Create a Kafka config file by running vi ~/.confluent/python.config. In the file, replace HOST, API_KEY, and API_SECRET with the values from Confluent Cloud:
    #kafka
    bootstrap.servers={HOST}:9092
    security.protocol=SASL_SSL
    sasl.mechanisms=PLAIN
    sasl.username={API_KEY}
    sasl.password={API_SECRET}
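
For orientation, a minimal sketch of what the notebook's first cells might look like (step 10). The kafkashaded JAAS prefix is specific to Databricks runtimes; the actual kafka_test notebook may differ:

    # sketch: read the topic as a stream in a Databricks notebook
    # (spark and display() are predefined in Databricks notebooks)
    confluentApiKey = "XXX"  # from the Confluent UI (step 6)
    confluentSecret = "XXX"
    host = "XXX"             # the cluster's bootstrap host

    df = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", f"{host}:9092")
          .option("kafka.security.protocol", "SASL_SSL")
          .option("kafka.sasl.mechanism", "PLAIN")
          .option("kafka.sasl.jaas.config",
                  "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule "
                  f'required username="{confluentApiKey}" password="{confluentSecret}";')
          .option("subscribe", "streaming_test_6")
          .option("startingOffsets", "earliest")
          .load())

    display(df)  # view the raw key/value stream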
  12. Build and run the Docker container: bash run.sh. A sketch of the producer logic the container runs appears below.
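
The container runs the producer that pushes tweets into the topic. A minimal sketch of that flow, assuming the confluent-kafka Python package; read_config is an illustrative helper for the file from step 11, not the repo's actual code:

    # sketch: produce a message using the config file from step 11
    import os
    from confluent_kafka import Producer

    def read_config(path):
        # illustrative helper: parse key=value lines, skipping comments
        conf = {}
        with open(os.path.expanduser(path)) as f:
            for line in f:
                line = line.strip()
                if line and not line.startswith("#"):
                    key, _, value = line.partition("=")
                    conf[key.strip()] = value.strip()
        return conf

    producer = Producer(read_config("~/.confluent/python.config"))
    producer.produce("streaming_test_6", value="hello from the producer")
    producer.flush()  # block until delivery completes

Note that the properties in python.config (sasl.mechanisms, etc.) are librdkafka-style, which is what the confluent-kafka client expects.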
