
LuisGustavoCorrea/End-To-End-Free-Tools-Data-Enginner


End-To-End-Free-Tools-Data-Enginner


Introduction

The goal of this project is to perform data analytics while ensuring data quality throughout the entire Extract, Transform, Load (ETL) process, using a set of free tools and technologies. It also aims to support data engineers in their daily work by monitoring data flow and data quality, helping to answer questions such as:

  • Is any data missing from the source?
  • Do the source data conform to the previously defined standards?
  • Did any error occur while reading, transforming, or loading the data into the report?
  • Was the transformation script changed?
  • Did a business rule change?
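The first question can be approached with a simple completeness check. The sketch below is an illustration, not code from this repository: it assumes a one-to-one load between source and destination (no filtering or deduplication), and the table names and row counts are hypothetical. In practice the counts would come from `COUNT(*)` queries against PostgreSQL.

```python
# Illustrative completeness check: compare per-table row counts between
# the source database and the destination. Table names and counts are
# hypothetical; real values would come from COUNT(*) queries.


def find_missing_rows(source_counts: dict[str, int],
                      target_counts: dict[str, int]) -> dict[str, int]:
    """Return, per table, how many more rows the source has than the target."""
    missing = {}
    for table, src_rows in source_counts.items():
        # A table that never reached the target counts as zero rows.
        dst_rows = target_counts.get(table, 0)
        if dst_rows < src_rows:
            missing[table] = src_rows - dst_rows
    return missing


if __name__ == "__main__":
    source = {"orders": 1200, "products": 85}   # hypothetical
    target = {"orders": 1195, "products": 85}   # hypothetical
    print(find_missing_rows(source, target))    # {'orders': 5}
```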

Technologies Used

  • Programming Languages - Python and SQL
  • Database - PostgreSQL
  • Ingestion - Airbyte
  • Data Quality - Great Expectations
  • Orchestration - Airflow
  • Data Visualization - Metabase
  • Environment - Docker

Data Model

  • Transactional database (source)
  • Data warehouse (final destination)

Ingestion

  • Airbyte: simplifies data ingestion by providing smooth integration between the source and the staging area, monitoring schema and column changes, and handling the initial full load and the subsequent incremental loads with Airbyte's built-in features.
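To make the "full load first, incremental loads afterwards" pattern concrete, here is a minimal sketch of cursor-based incremental loading, the general technique behind incremental sync modes. It is not Airbyte's implementation or API: the `sync` function, the `updated_at` cursor column and the dictionary-based state are hypothetical stand-ins.

```python
from datetime import datetime


def sync(rows: list[dict], state: dict) -> tuple[list[dict], dict]:
    """Select the rows to load and return them with the updated state.

    - First run (no cursor saved): full load of every row.
    - Later runs: only rows whose hypothetical `updated_at` cursor is
      newer than the cursor saved by the previous run.
    """
    cursor = state.get("cursor")
    if cursor is None:
        batch = list(rows)
    else:
        batch = [r for r in rows if r["updated_at"] > cursor]
    # Advance the cursor to the newest value seen; keep it if nothing new arrived.
    new_cursor = max((r["updated_at"] for r in batch), default=cursor)
    return batch, {"cursor": new_cursor}


if __name__ == "__main__":
    source = [
        {"id": 1, "updated_at": datetime(2024, 1, 1)},
        {"id": 2, "updated_at": datetime(2024, 1, 2)},
    ]
    batch, state = sync(source, {})           # full load
    print([r["id"] for r in batch])            # [1, 2]

    source.append({"id": 3, "updated_at": datetime(2024, 1, 3)})
    batch, state = sync(source, state)         # incremental load
    print([r["id"] for r in batch])            # [3]
```

A strict `>` comparison can skip rows that share the last saved cursor value but arrive later; production ingestion tools handle such edge cases more carefully.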

Data Quality

  • We set clear expectations for our data and verify its quality at every step. Each time data is moved or transformed, a check confirms that it matches what the business rules expect, so data quality is tied directly to how the business operates.
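The idea of checking the data after every move or transformation can be sketched as a gate that runs a set of checks on each step's output and halts the pipeline on failure. This is a plain-Python illustration of the pattern, not how Great Expectations is wired into this project; `run_step`, `DataQualityError` and `non_empty` are hypothetical names.

```python
from typing import Callable, Iterable

Rows = list[dict]
# A check inspects a batch and returns a list of failure messages (empty = pass).
Check = Callable[[Rows], list[str]]


class DataQualityError(Exception):
    """Raised when a step's output fails one or more checks."""


def run_step(name: str, step: Callable[[Rows], Rows], rows: Rows,
             checks: Iterable[Check]) -> Rows:
    """Run one pipeline step, then validate its output before passing it on."""
    output = step(rows)
    failures = [msg for check in checks for msg in check(output)]
    if failures:
        # Stop here so bad data never reaches the next step or the report.
        raise DataQualityError(f"{name}: " + "; ".join(failures))
    return output


def non_empty(rows: Rows) -> list[str]:
    """Hypothetical check: the step must produce at least one row."""
    return [] if rows else ["no rows produced"]


if __name__ == "__main__":
    raw = [{"id": 1}, {"id": 2}]
    staged = run_step("copy_to_staging", lambda r: list(r), raw, [non_empty])
    print(len(staged))  # 2
    try:
        run_step("filter_everything", lambda r: [], staged, [non_empty])
    except DataQualityError as err:
        print(err)  # filter_everything: no rows produced
```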

Examples

  • Checking for null values
  • Checking that the values in the Quantidade column match the business rules and the expected values
  • Checking that the values in the Nome_tipo column match the business rules and the expected values
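Checks like these could be expressed with Great Expectations built-in expectations such as `expect_column_values_to_not_be_null`, `expect_column_values_to_be_between` and `expect_column_values_to_be_in_set`. To show what each check evaluates without depending on a particular Great Expectations version, here is a plain-Python sketch. The actual business rules are not described here, so the bounds for `Quantidade` and the allowed `Nome_tipo` values are hypothetical placeholders.

```python
# Plain-Python versions of three column checks. Each returns the indices of
# failing rows, so an empty list means the check passed. The range and set
# checks skip nulls, so a missing value is reported only by the not-null check.

def not_null(rows: list[dict], column: str) -> list[int]:
    return [i for i, r in enumerate(rows) if r.get(column) is None]


def between(rows: list[dict], column: str, low, high) -> list[int]:
    return [i for i, r in enumerate(rows)
            if r.get(column) is not None and not low <= r[column] <= high]


def in_set(rows: list[dict], column: str, allowed: set) -> list[int]:
    return [i for i, r in enumerate(rows)
            if r.get(column) is not None and r[column] not in allowed]


# Hypothetical business rules (placeholders, not the project's real rules).
QUANTIDADE_MIN, QUANTIDADE_MAX = 1, 1000
NOME_TIPO_ALLOWED = {"TYPE_A", "TYPE_B", "TYPE_C"}

if __name__ == "__main__":
    rows = [
        {"Quantidade": 3, "Nome_tipo": "TYPE_A"},
        {"Quantidade": None, "Nome_tipo": "TYPE_B"},
        {"Quantidade": 0, "Nome_tipo": "UNKNOWN"},
    ]
    print(not_null(rows, "Quantidade"))                                  # [1]
    print(between(rows, "Quantidade", QUANTIDADE_MIN, QUANTIDADE_MAX))   # [2]
    print(in_set(rows, "Nome_tipo", NOME_TIPO_ALLOWED))                  # [2]
```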

Orchestration

  • Airflow efficiently and automatically orchestrates every step of the data workflow. From ingestion through transformation to delivery, it provides robust end-to-end management and monitoring of the data process. [image: Airflow flow executed]
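With its default trigger rule, Airflow runs tasks in dependency order and does not run a task whose upstream task failed. The sketch below imitates that behaviour with Python's standard-library `graphlib` (Python 3.9+) instead of Airflow, so it runs without an Airflow installation. The task names and the linear ingest, validate, transform, validate, publish chain are hypothetical; the real DAG in this project may be structured differently.

```python
from graphlib import TopologicalSorter
from typing import Callable


def run_pipeline(tasks: dict[str, Callable[[], None]],
                 upstream: dict[str, set[str]]) -> dict[str, str]:
    """Run tasks in dependency order and return each task's final state.

    A task runs only if all of its upstream tasks succeeded; otherwise it
    is marked "upstream_failed", similar to Airflow's default behaviour.
    """
    state: dict[str, str] = {}
    for name in TopologicalSorter(upstream).static_order():
        if any(state[u] != "success" for u in upstream.get(name, set())):
            state[name] = "upstream_failed"
            continue
        try:
            tasks[name]()
            state[name] = "success"
        except Exception:
            state[name] = "failed"
    return state


# Hypothetical task graph: each task lists the tasks it depends on.
UPSTREAM = {
    "ingest_with_airbyte": set(),
    "validate_staging": {"ingest_with_airbyte"},
    "transform": {"validate_staging"},
    "validate_warehouse": {"transform"},
    "refresh_dashboards": {"validate_warehouse"},
}

if __name__ == "__main__":
    def failing_check() -> None:
        raise ValueError("expectation failed")

    tasks = {name: (lambda: None) for name in UPSTREAM}
    tasks["validate_staging"] = failing_check
    print(run_pipeline(tasks, UPSTREAM))
```

Marking downstream tasks instead of running them keeps a failed quality check from pushing bad data into the warehouse or the dashboards.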

Data Visualization

  • Metabase turns the processed data into visualizations and actionable insights.

