
This repository describes the architecture we've used to implement a stocks analysis data pipeline on the Google Cloud Platform


Ali-Doggaz/GCP_Stocks_Data_Pipeline


Architecture Analysis


Data Sources

  • Finnhub API: Real-time stock trade data is ingested from the Finnhub API and streamed over a WebSocket connection, which keeps ingestion latency low.
  • Batch Stocks Data: Over several weeks, we collected our own stock data from various private sources and pre-processed it during collection (ETL). We then stored all of it in a Cloud Storage bucket. The "batch" part of this pipeline collects this data, post-processes it, and analyzes it alongside the real-time data.
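To make the real-time ingestion step concrete, here is a minimal sketch of the message handling on the WebSocket side. It assumes the message shapes from Finnhub's public WebSocket documentation: a subscribe message of the form {"type": "subscribe", "symbol": ...} and trade frames of the form {"type": "trade", "data": [...]}, where each trade carries "s" (symbol), "p" (price), "t" (UNIX time in milliseconds) and "v" (volume). The helper names and the flat output record are hypothetical; the README does not show the project's actual ingestion code, and opening the socket itself is left out.

```python
from __future__ import annotations

import json

# Assumption: message shapes follow Finnhub's public WebSocket docs.
# subscribe_message / parse_trade_frame are hypothetical helper names.


def subscribe_message(symbol: str) -> str:
    """Build the JSON text sent over the socket to subscribe to one symbol."""
    return json.dumps({"type": "subscribe", "symbol": symbol})


def parse_trade_frame(raw: str) -> list[dict]:
    """Turn one incoming WebSocket frame into flat trade records.

    Frames that are not trades (for example pings) yield an empty list.
    """
    frame = json.loads(raw)
    if frame.get("type") != "trade":
        return []
    return [
        {
            "symbol": t["s"],
            "price": float(t["p"]),
            "timestamp_ms": int(t["t"]),  # Finnhub trade time, milliseconds since epoch
            "volume": float(t["v"]),
        }
        for t in frame.get("data", [])
    ]
```

Flattening each frame into one record per trade makes it straightforward to publish individual trades downstream rather than whole frames.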

Processing

Real-Time Stream Processing

  • Cloud Functions: Triggered by incoming data; performs preliminary parsing and validation.

  • Pub/Sub: Acts as a message broker, decoupling data collection from processing. Supports scalable message queuing.

  • Dataflow (Stream): Processes and transforms the data stream, grouping records into fixed 5-second windows for efficient aggregation.

  • Error Handling: Errors are extracted, flattened, and stored in a BigQuery 'deadletter' table for later analysis.
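The validation and dead-letter steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the required fields and their types are hypothetical (the README does not publish the schema), and the dead-letter row layout ("error_message" plus the raw record serialized as a string) is one plausible way to "flatten" errors so they fit a BigQuery table. It is not the repository's actual Cloud Function or Dataflow code.

```python
from __future__ import annotations

import json

# Hypothetical schema; the README does not list the real fields.
REQUIRED_FIELDS = {
    "symbol": str,
    "price": (int, float),
    "timestamp_ms": int,
    "volume": (int, float),
}


def validate(record: dict) -> list[str]:
    """Return human-readable problems; an empty list means the record is valid."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in record:
            errors.append(f"missing field '{field}'")
        # bool is a subclass of int in Python, so reject it explicitly
        elif not isinstance(record[field], expected) or isinstance(record[field], bool):
            errors.append(f"field '{field}' has type {type(record[field]).__name__}")
    if not errors and record["price"] <= 0:
        errors.append("price must be positive")
    return errors


def route(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split records into (valid, dead_letter); each dead-letter row is flat."""
    valid, dead_letter = [], []
    for rec in records:
        errors = validate(rec)
        if errors:
            dead_letter.append({
                "error_message": "; ".join(errors),
                # keep the original payload as a string so any shape fits one column
                "raw_payload": json.dumps(rec, sort_keys=True),
            })
        else:
            valid.append(rec)
    return valid, dead_letter
```

Storing the raw payload as a string means malformed records of any shape still fit a fixed dead-letter schema, which keeps them available for later analysis.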
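The 5-second fixed windows used by the streaming Dataflow job can be illustrated without Beam. In an Apache Beam pipeline this grouping would typically be expressed as `beam.WindowInto(window.FixedWindows(5))` followed by a per-key combine. The sketch below assigns windows by hand so the arithmetic is visible. The record fields and the choice of aggregates (trade count, total volume, volume-weighted average price) are assumptions: the README does not say which statistics the pipeline computes. Watermarks and late data, which Dataflow handles, are ignored here.

```python
from __future__ import annotations

from collections import defaultdict

WINDOW_MS = 5_000  # fixed (tumbling) windows of 5 seconds


def window_start(timestamp_ms: int, size_ms: int = WINDOW_MS) -> int:
    """Start of the fixed window [start, start + size) that contains timestamp_ms."""
    return timestamp_ms - (timestamp_ms % size_ms)


def aggregate(trades: list[dict]) -> dict[tuple[str, int], dict]:
    """Group trades by (symbol, window start) and compute per-window statistics."""
    groups: dict[tuple[str, int], list[dict]] = defaultdict(list)
    for t in trades:
        groups[(t["symbol"], window_start(t["timestamp_ms"]))].append(t)

    result = {}
    for key, items in groups.items():
        total_volume = sum(t["volume"] for t in items)
        result[key] = {
            "trade_count": len(items),
            "total_volume": total_volume,
            # volume-weighted average price (a hypothetical metric for illustration)
            "vwap": (sum(t["price"] * t["volume"] for t in items) / total_volume
                     if total_volume else None),
        }
    return result
```

Because each window is keyed by its start time, a trade at 4,999 ms and one at 5,000 ms land in different windows even though they are 1 ms apart. That boundary behaviour is inherent to fixed windows.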

Batch Data Processing

  • Cloud Storage: Stores the batch data as JSON files, each containing transaction records.
  • Dataflow (Batch): Triggered manually to process the stored batch data, performing cleansing and transformation.
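A rough idea of the cleansing and transformation applied to each batch file is sketched below. It assumes newline-delimited JSON with hypothetical fields "symbol", "price", "timestamp_ms" and "volume". The README only says the files hold transactional data in JSON, so both the layout and the cleansing rules are illustrative. In the real job the file contents would be read from the Cloud Storage bucket by Dataflow rather than passed in as a string.

```python
from __future__ import annotations

import json

# Assumption: newline-delimited JSON with hypothetical field names.


def clean_batch_file(text: str) -> list[dict]:
    """Parse, cleanse and normalise the records in one batch file."""
    seen = set()
    cleaned = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            rec = json.loads(line)
            row = {
                "symbol": str(rec["symbol"]).strip().upper(),  # normalise ticker
                "price": float(rec["price"]),                  # accept "189.5" or 189.5
                "timestamp_ms": int(rec["timestamp_ms"]),
                "volume": float(rec["volume"]),
            }
        except (json.JSONDecodeError, KeyError, TypeError, ValueError):
            continue  # drop malformed records (a real job might dead-letter them)
        key = (row["symbol"], row["timestamp_ms"], row["price"], row["volume"])
        if key in seen:
            continue  # drop exact duplicates
        seen.add(key)
        cleaned.append(row)
    return cleaned
```

Coercing types and normalising tickers here lets batch rows match the streaming records before both are loaded into the same table.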

Storage

  • BigQuery: The central data warehouse, where streaming and batch-processed data are consolidated into a single table for analysis.
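Consolidating both paths into one table means every record, streamed or batch, has to be mapped onto the same row layout. The sketch below shows one way to do that. The column names and the extra "source" column are hypothetical, since the README does not publish the table schema. Timestamps are emitted as ISO-8601 strings, a form BigQuery generally accepts for TIMESTAMP columns.

```python
from __future__ import annotations

from datetime import datetime, timezone

# Hypothetical unified schema; the real column names are not given in the README.
UNIFIED_COLUMNS = ("symbol", "price", "volume", "trade_time", "source")


def to_unified_row(record: dict, source: str) -> dict:
    """Map a streamed or batch record onto the shared row layout."""
    if source not in ("stream", "batch"):
        raise ValueError(f"unknown source: {source!r}")
    trade_time = datetime.fromtimestamp(record["timestamp_ms"] / 1000, tz=timezone.utc)
    return {
        "symbol": record["symbol"],
        "price": float(record["price"]),
        "volume": float(record["volume"]),
        "trade_time": trade_time.isoformat(),  # e.g. 2023-11-14T22:13:20+00:00
        "source": source,  # lets reports filter or compare the two ingestion paths
    }
```

Tagging each row with its origin is a design choice: it keeps a single table for analysis while still allowing the real-time and batch data to be told apart.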

Data Analysis and Presentation

  • Looker Studio: Connects to BigQuery to visualize and report on the data. Reports are dynamically updated to reflect new data entries.
