I'm a data analyst/engineer with a strong foundation in statistics, currently transitioning into analytics engineering.
I enjoy designing tools that make working with data more intuitive, efficient, and reproducible. Whether it's optimizing pipelines, automating tedious processes, or building scalable validation frameworks, I aim to bridge the gap between data engineering and analytics.
- Big Data: PySpark, Polars, DuckDB
- Python: Pydantic, Dagster, dlt, Narwhals
- Automation & DevOps: pre-commit, Ruff, Material for MkDocs
- Data Validation: Custom-built PySpark validation tools inspired by tidylog & Pydantic (see the sketch after this list)
- Web Apps: Shiny for interactive analytics dashboards
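To give a flavor of the tidylog-inspired validation idea, here's a minimal sketch: a decorator that logs how each PySpark step changes the row count. Names like `log_rows` and `drop_null_ids` are hypothetical, not the actual tool's API.

```python
import functools
import logging

from pyspark.sql import DataFrame

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("validation")


def log_rows(func):
    """tidylog-style logging: report row-count changes around a step."""

    @functools.wraps(func)
    def wrapper(df: DataFrame, *args, **kwargs) -> DataFrame:
        before = df.count()  # rows going in
        out = func(df, *args, **kwargs)
        after = out.count()  # rows coming out
        logger.info("%s: %d -> %d rows (%+d)", func.__name__, before, after, after - before)
        return out

    return wrapper


@log_rows
def drop_null_ids(df: DataFrame) -> DataFrame:
    """Example step: drop rows with a null `id`."""
    return df.filter(df["id"].isNotNull())
```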
A declarative schema validation and transformation framework for PySpark, Polars, and DuckDB, built on Narwhals so the same validation logic runs across all three backends.
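A minimal sketch of the declarative idea (not the project's actual API): declare the expected schema once, then validate any Narwhals-supported frame against it. Shown here with Polars; `EXPECTED` and `validate_schema` are illustrative names.

```python
import narwhals as nw
import polars as pl

# Hypothetical declarative spec: column name -> expected Narwhals dtype.
EXPECTED = {"id": nw.Int64, "score": nw.Float64}


def validate_schema(native_df) -> None:
    """Raise if columns are missing or dtypes don't match the spec."""
    schema = nw.from_native(native_df).schema  # backend-agnostic schema
    for name, dtype in EXPECTED.items():
        if name not in schema:
            raise ValueError(f"missing column: {name!r}")
        if schema[name] != dtype:
            raise ValueError(f"{name!r}: expected {dtype}, got {schema[name]}")


# The same check works whether the frame came from Polars, pandas, etc.
validate_schema(pl.DataFrame({"id": [1, 2], "score": [0.5, 0.9]}))
```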
An ongoing effort to track and analyze my bowling scores.
A deep dive into Spotify’s API to analyze music trends, listening habits, and playlist dynamics.
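For a taste of the data pull behind it, here's a hedged sketch using the spotipy client (it assumes the standard `SPOTIPY_*` credentials are set in the environment; the analysis itself isn't shown):

```python
import polars as pl
import spotipy
from spotipy.oauth2 import SpotifyOAuth

# The OAuth flow reads SPOTIPY_CLIENT_ID / SECRET / REDIRECT_URI from the env.
sp = spotipy.Spotify(auth_manager=SpotifyOAuth(scope="user-read-recently-played"))

# Pull the 50 most recent plays and flatten the fields we care about.
items = sp.current_user_recently_played(limit=50)["items"]
plays = pl.DataFrame(
    [
        {
            "played_at": item["played_at"],
            "track": item["track"]["name"],
            "artist": item["track"]["artists"][0]["name"],
        }
        for item in items
    ]
)

# Quick look: most-played artists in the window.
print(plays.group_by("artist").len().sort("len", descending=True).head(10))
```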
I've taught courses at UIUC covering Python, data science, and big data. Some highlights:
- CS 105 (Intro to Python) – Co-led weekly lectures on Python applications in statistics and CS.
- STAT 430 (Data Science in Python) – Built an automated grading bot for GitHub submissions (sketched after this list).
- STAT 480 (Big Data Fundamentals) – Designed a testing suite for lab assignments.
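The grading bot's core loop is simple in outline. A minimal, hypothetical sketch (the repo URL and naming are illustrative, not the actual course setup): clone each submission and run its pytest suite.

```python
import subprocess
import tempfile


def grade(repo_url: str) -> bool:
    """Clone one submission repo and report whether its pytest suite passes."""
    with tempfile.TemporaryDirectory() as tmp:
        # Shallow clone keeps grading fast across a large cohort.
        subprocess.run(["git", "clone", "--depth", "1", repo_url, tmp], check=True)
        result = subprocess.run(["python", "-m", "pytest", "-q"], cwd=tmp)
        return result.returncode == 0


if __name__ == "__main__":
    # Hypothetical submission URL; the real bot iterated over a course org.
    print("PASS" if grade("https://github.com/example-org/student-hw1") else "FAIL")
```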
Beyond the classroom, I host weekly training sessions at work covering Python fundamentals and best practices in analytics engineering.
Always open to discussing data engineering, analytics, and workflow optimization!