A data pipeline with GCP Storage, Python, Compute Instance, Mage Data Pipeline Tool, BigQuery, and Looker Studio.
The goal of this project is to analyze Uber data using tools like GCP Storage, Python, Compute Instance, Mage Data Pipeline Tool, BigQuery, and Looker Studio. We aim to explore TLC Trip Record Data for yellow and green taxi trips, focusing on details like dates/times, locations, distances, fares, rate types, payment types, and passenger counts. The objective is to gain insights and perform data analytics for informed decision-making.
TLC Trip Record Data Yellow and green taxi trip records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts.
Here is the dataset used - Link
More info about dataset can be found here:
- Website - https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page
- Data Dictionary - https://www.nyc.gov/assets/tlc/downloads/pdf/data_dictionary_trip_records_yellow.pdf
-
Cloud - Google Cloud Platform
-
Transformation - Mage
-
Data Lake - Google Cloud Storage
-
Data Warehouse - BigQuery
-
Data Visualization - Looker Studio
-
Language - Python
If you already have a Google Cloud account, you can skip the pre-requisite steps.
- Google Cloud Platform.