
📡 Real-time data pipeline with Kafka, Flink, Iceberg, Trino, MinIO, and Superset. Ideal for learning data systems.


SonicEXEDVP/real-time-data-pipeline


🚀 Real-Time Data Pipeline with Kafka, Flink, Iceberg, Trino, MinIO, and Superset


Welcome to the Real-Time Data Pipeline repository! This project sets up a complete data system built from Apache Kafka, Apache Flink, Apache Iceberg, Trino, MinIO, and Apache Superset. It is intended as an environment for learning about data systems, big data processing, ETL strategies, and real-time analytics.

Features

🔹 Real-Time Processing: Uses Apache Kafka and Apache Flink for streaming data processing, enabling real-time data analysis and insights.

🔹 Data Lakehouse: Demonstrates the use of Apache Iceberg for managing tables in a data lake, providing versioning and efficient data management capabilities.

🔹 Advanced Analytics: Connects to Trino for SQL-based analytics on the processed data, making it easy to run queries and generate reports.

🔹 Data Visualization: Integrates with Apache Superset for data visualization and dashboard creation, allowing for easy exploration and presentation of insights.

🔹 Open Source Technologies: The entire setup is based on open-source technologies, making it accessible for learning and development purposes.
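To make the real-time processing feature more concrete, here is a minimal sketch of a tumbling-window aggregation, one of the basic operations a stream processor such as Flink performs on events read from Kafka. It is plain Python and does not use the Flink or Kafka APIs. The `Event` fields, the sensor names, and the 60-second window are assumptions chosen for illustration and are not taken from this repository.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    """A hypothetical stream record; the field names are illustrative only."""
    key: str        # e.g. a sensor or user id
    timestamp: int  # event time in seconds
    value: float


def tumbling_window_sums(
    events: Iterable[Event], window_seconds: int
) -> dict[tuple[str, int], float]:
    """Sum values per key within fixed, non-overlapping event-time windows."""
    sums: dict[tuple[str, int], float] = defaultdict(float)
    for event in events:
        # Align the timestamp to the start of the window that contains it.
        window_start = event.timestamp - (event.timestamp % window_seconds)
        sums[(event.key, window_start)] += event.value
    return dict(sums)


if __name__ == "__main__":
    sample = [
        Event("sensor-a", 0, 1.0),
        Event("sensor-a", 30, 2.0),
        Event("sensor-b", 45, 5.0),
        Event("sensor-a", 61, 4.0),
    ]
    # Print one line per (key, window start) pair.
    for (key, start), total in sorted(tumbling_window_sums(sample, 60).items()):
        print(key, start, total)
```

A real stream processor computes the same kind of result incrementally over an unbounded stream and also has to handle late or out-of-order events, which this sketch ignores.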

Getting Started

To use the Real-Time Data Pipeline setup, download the project files from here. After downloading the zip file, extract it and follow the instructions in the README file inside each component folder.
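The extract-and-read step can also be scripted. The following is a minimal sketch, assuming the download is an ordinary zip archive; the archive name `real-time-data-pipeline.zip` and the target directory are hypothetical, and the folder layout inside the archive is not documented here, so the function simply searches for any file whose name starts with "readme".

```python
import zipfile
from pathlib import Path


def extract_and_find_readmes(archive: Path, target_dir: Path) -> list[Path]:
    """Extract the archive and return every README file found, sorted by path."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
    # Match README, README.md, readme.txt, and similar names at any depth.
    return sorted(
        path
        for path in target_dir.rglob("*")
        if path.is_file() and path.name.lower().startswith("readme")
    )


if __name__ == "__main__":
    # Hypothetical file name; replace it with the archive actually downloaded.
    archive_path = Path("real-time-data-pipeline.zip")
    if archive_path.exists():
        for readme in extract_and_find_readmes(archive_path, Path("pipeline")):
            print(readme)
    else:
        print(f"{archive_path} not found")
```

Listing the README files first gives a quick overview of which component folders carry their own setup instructions.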

If the link does not work or the file needs to be launched, please check the "Releases" section of this repository for alternative download options.

Repository Topics

apache-flink, apache-iceberg, apache-kafka, apache-superset, big-data, data-engineering, data-pipeline, data-visualization, docker, etl, lakehouse, minio, open-source, real-time-data, sql-analytics, streaming-analytics, trino

Explore Further

If you are interested in diving deeper into the world of real-time data processing, streaming analytics, and data systems architecture, feel free to explore the different components of this repository. Each technology plays a crucial role in building a robust and efficient data pipeline for modern data-driven applications.

Don't hesitate to reach out if you have any questions or need assistance with setting up the Real-Time Data Pipeline. Happy data engineering!

Download Here