
A project that ingests data from different APIs, stores it in HDFS, and analyzes it with PySpark.


callysthenes/music_industry_intelligence



🌟 About the Project

PlanIt* is a prototype that ingests data from Spotify and Twitter, analyzes it, and forecasts music trends (artists and songs) based on current demand. These forecasts are aimed at festival planners and the music touring industry.

The tool analyzes the music taste of an audience and identifies the artists and songs that best fit a given gig.

As a result, PlanIt* recommends opening acts based on the featured artist selection and playlists.

👾 Tech Stack

Here's a brief high-level overview of the tech stack the project uses:

  • For ingestion, we use Apache NiFi for both streaming and batch ingestion from our data sources, Spotify and Twitter.
  • HDFS is our primary storage system.
  • The processing layer relies mainly on Apache Spark. We also use ML and GraphFrames for advanced analytics.
  • The serving layer uses MariaDB for database storage.
  • Finally, analytics are done with Power BI. A proof-of-concept dashboard is available here: PlanIt Dashboard
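
To make the path from storage to the serving layer more concrete, here is a minimal sketch of reading JSON from HDFS with Spark and appending it to MariaDB over JDBC. It is not the project's actual pipeline: the function names, the column names (name, popularity), the default port, and the paths are assumptions for illustration, and the MariaDB Connector/J jar is assumed to be available to Spark.

```python
def mariadb_jdbc_options(host, database, table, user, password, port=3306):
    """Build the option dict for Spark's JDBC data source (hypothetical helper)."""
    return {
        "url": f"jdbc:mariadb://{host}:{port}/{database}",
        "driver": "org.mariadb.jdbc.Driver",  # class shipped with MariaDB Connector/J
        "dbtable": table,
        "user": user,
        "password": password,
    }


def hdfs_to_mariadb(spark, hdfs_path, options):
    """Read JSON records from HDFS, keep a few fields, and append them to MariaDB.

    `spark` is an existing SparkSession. The selected columns are assumed;
    adjust them to whatever the ingested files actually contain.
    """
    raw = spark.read.json(hdfs_path)
    artists = raw.select("name", "popularity").dropDuplicates(["name"])
    artists.write.format("jdbc").options(**options).mode("append").save()
    return artists


# Placeholder connection values, for illustration only.
options = mariadb_jdbc_options("localhost", "planit", "artists", "user", "secret")
```

With a running SparkSession named spark, a call such as hdfs_to_mariadb(spark, "hdfs:///path/to/artists.json", options) would run the whole step. Building the options separately keeps credentials and connection details out of the processing function.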


🎯 Features

  • Using data from the Spotify API, we combine and rank a selection of artists by analyzing top charts, frequency of appearance, followers, and popularity, and generate recommendations that support the decision making of tour managers and festival organizers.
  • We use the Spotify API and Twitter to better understand popularity trends and the social media positioning of music artists.
  • We recommend young bands as supporting acts for larger, more established performers, based on similarities in their songs' audio features and on social media trends.
  • We use fans' music taste to identify the best song line-up for concerts and music events.
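
The ranking and opening-act ideas above can be sketched in a few lines. This is an illustration in plain Python rather than the project's PySpark code. The weights, the sample records, the min-max normalization, and the use of cosine similarity are all assumptions. The audio feature names (danceability, energy, valence) are taken from Spotify's audio features, each in the range 0 to 1.

```python
from math import sqrt

# Hypothetical sample records; chart_appearances is an assumed count.
artists = [
    {"name": "Artist A", "chart_appearances": 12, "followers": 900_000, "popularity": 80},
    {"name": "Artist B", "chart_appearances": 3, "followers": 2_000_000, "popularity": 70},
    {"name": "Artist C", "chart_appearances": 8, "followers": 150_000, "popularity": 55},
]


def min_max(values):
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_artists(records, weights=(0.4, 0.3, 0.3)):
    """Return (name, score) pairs sorted by a weighted sum of normalized signals."""
    columns = ("chart_appearances", "followers", "popularity")
    normalized = [min_max([r[c] for r in records]) for c in columns]
    scored = [
        (r["name"], sum(w * normalized[j][i] for j, w in enumerate(weights)))
        for i, r in enumerate(records)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def suggest_openers(headliner, candidates, top_n=1):
    """Return the candidate names whose audio features are closest to the headliner's."""
    ranked = sorted(
        candidates,
        key=lambda name: cosine_similarity(headliner, candidates[name]),
        reverse=True,
    )
    return ranked[:top_n]


# Assumed vectors in the order: danceability, energy, valence.
headliner = [0.8, 0.7, 0.6]
candidates = {"Band X": [0.75, 0.72, 0.55], "Band Y": [0.2, 0.3, 0.9]}

ranking = rank_artists(artists)
openers = suggest_openers(headliner, candidates)
```

Normalizing each signal first keeps follower counts, which are orders of magnitude larger than popularity scores, from dominating the ranking. A social media signal derived from Twitter could be added as a further weighted column in the same way.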

📑 Documentation

You can find our documentation here, with the following tutorials:

  • Update NiFi: Changes made to the course environment that allowed us to stream data from Twitter using a NiFi processor.
  • Spotify Config Ingestion: Step-by-step guide to ingesting Spotify data with Postman and NiFi.
  • Tweepy tutorial: Step-by-step guide to ingesting data from Twitter with Tweepy and NiFi.
  • Two objects working with VM environment: A workaround for using two os.environ configurations in the same Jupyter notebook, which allowed us to work with both GraphFrames and MariaDB.
  • Connect Power BI to VM environment: Step-by-step guide to connecting the course environment to Power BI for analytics.
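
The linked tutorial describes the project's own workaround for the two environment configurations. As a rough illustration of the general idea only, the sketch below uses a context manager that temporarily overrides environment variables and restores them afterwards. The helper name temporary_env and both argument strings are hypothetical. PYSPARK_SUBMIT_ARGS is the variable PySpark reads when it launches its JVM, so an existing Spark session would most likely need to be stopped before a new value takes effect.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(**overrides):
    """Temporarily set environment variables, restoring the old values on exit."""
    saved = {key: os.environ.get(key) for key in overrides}
    os.environ.update(overrides)
    try:
        yield
    finally:
        for key, old in saved.items():
            if old is None:
                os.environ.pop(key, None)  # variable did not exist before
            else:
                os.environ[key] = old


# Placeholder argument strings; the package version and jar path are not from the project.
GRAPHFRAMES_ARGS = "--packages graphframes:graphframes:<version> pyspark-shell"
MARIADB_ARGS = "--jars /path/to/mariadb-java-client.jar pyspark-shell"

with temporary_env(PYSPARK_SUBMIT_ARGS=GRAPHFRAMES_ARGS):
    # A Spark session created here would see the GraphFrames configuration.
    graphframes_args_seen = os.environ["PYSPARK_SUBMIT_ARGS"]

with temporary_env(PYSPARK_SUBMIT_ARGS=MARIADB_ARGS):
    # A Spark session created here would see the MariaDB JDBC jar instead.
    mariadb_args_seen = os.environ["PYSPARK_SUBMIT_ARGS"]
```

Scoping each configuration to a with block makes it explicit which notebook cells run under which settings and avoids leaving a stale value behind.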

✨ Contributors

Thanks go to these wonderful people:

  • Dominik Roser 💻
  • Carlos Blazquez 💻
  • Diana Fernandez 💻
  • Christian Barba 💻
  • Hiba Shanaa 💻
  • Mark Hourany 💻
  • Pedro V. Esteban 💻
