I'm Mahdi, a Product and Data lead building products in the data space; I'm currently a Product Manager at Sifflet. Before transitioning to product, I spent seven years designing and building petabyte-scale data platforms, wearing different hats along the way (data engineer, tech lead, data architect, and MLOps engineer). I'm passionate about open-source projects and enjoy working with data and designing scalable solutions. You can also read my content on Medium and via the Data Espresso newsletter.
- Apache Spark (and the larger Databricks ecosystem): I used it on a daily basis for nearly five years, so we know each other pretty well.
- dbt: It's the tool I currently work with the most. At Zendesk, I added dbt to our data stack and worked on defining and implementing standards, frameworks, and automation to better leverage it at scale. (Article from the Zendesk Engineering blog)
- Snowflake: I was part of the core team that handled the transition from BigQuery to Snowflake at Zendesk.
- AWS Ecosystem: Worked with it for three years on various data and ML projects (mostly Glue, EMR, Athena, ECS, SageMaker, Redshift, and the AWS CI/CD stack).
- GCP Ecosystem: Worked with it for three years, mostly on BigQuery and GKE.
- Hadoop: Worked with Hadoop data lakes for two and a half years (it was the ecosystem that first introduced me to distributed systems and the paradigms/concepts behind them).
- Other notable projects/tools: Apache Superset, Apache Airflow, Apache Zeppelin, Apache Hive, Dremio, Jupyter, and D3.js.
- Languages I'm fluent in: Python, Java, and SQL.
- Other languages I've used in the past: C++, C#, JavaScript (Angular, Node.js), and HTML+CSS.
- IaC: Terraform and CloudFormation.
- End-to-End Batch Data Pipeline with Spark: A series of four projects that I authored for Manning Publications as part of their liveProjects platform. The series goes through the different steps of building an end-to-end Big Data pipeline. Learners get to use Apache Spark, Delta Lake, and Apache Superset.
- Building an End-to-End Open-Source Modern Data Platform: Proposes an exhaustive design (accompanied by the necessary Infrastructure-as-Code) to build a modern data platform solely using open-source projects and the resources offered by cloud providers.
- Writing design docs for data pipelines: Exploring the what, why, and how of design docs for data components — and why they matter.
- Navigating Your Career Transition in Tech: A Practical Roadmap: A practical guide to a successful career pivot in tech: from making the decision to thriving in your new role.
- Data Modeling Techniques for the Post-Modern Data Stack: A set of generic techniques and principles to design a robust, cost-efficient, and scalable data model for your post-modern data stack.
- Navigating Your Data Platform’s Growing Pains: A Path from Data Mess to Data Mesh: A set of strategies and guiding principles to efficiently scale your data platform while maximizing its business impact.
- A Simple (Yet Effective) Approach to Implementing Unit Tests for dbt Models: Proposes an innovative unit-testing approach for dbt models that relies on standards and dbt best practices.
- Creating Notebook-based Dynamic Dashboards: A design (accompanied by a POC) in which notebooks are leveraged to generate dynamic dashboards, to support a Google-like metadata search engine.
- Data Innovation Summit 2023: The Data Engineer's Guide to Data Quality Testing: The Fun, Easy, and Scalable Way
- Big Data Expo 2022: A Practical Case Study for Data Engineers: Performing Data Quality at Scale
- The Modern Data Show (S01E02): The third wave of data technologies