vivekvision/PySparkDataWrangling

PySparkDataWrangling

PySpark is the Python API for Apache Spark. It lets Python code work with Spark's Resilient Distributed Datasets (RDDs) and provides scalable, distributed processing for a variety of use cases, including exploratory data analysis, ETL, and machine learning pipelines.

This sample project collects some well-known data wrangling techniques, implemented using Google Colab.
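
As a rough illustration of the kind of wrangling meant here, the following is a minimal sketch and is not taken from the project's notebooks. The column names (`name`, `city`, `amount`), the sample rows, and the `wrangle` and `main` helpers are all hypothetical. It assumes PySpark and a Java runtime are available, as they typically are after `pip install pyspark` in a Colab session.

```python
def wrangle(df):
    """Apply a few common clean-up steps to a DataFrame.

    Assumes hypothetical columns: name (string), city (string), amount (double).
    """
    # Local import so this module can be loaded even where PySpark is absent.
    from pyspark.sql import functions as F

    return (
        df.dropDuplicates()                         # drop exact duplicate rows
        .withColumn("city", F.trim(F.col("city")))  # strip stray whitespace
        .fillna({"amount": 0.0})                    # treat missing amounts as zero
        .groupBy("city")
        .agg(F.sum("amount").alias("total_amount"))  # total amount per city
    )


def main():
    from pyspark.sql import SparkSession

    # Local session for experimentation; a cluster would use a different master.
    spark = (
        SparkSession.builder.master("local[*]")
        .appName("wrangling-sketch")
        .getOrCreate()
    )

    # Made-up sample data, for illustration only.
    rows = [
        ("alice", " Pune ", 10.0),
        ("alice", " Pune ", 10.0),  # exact duplicate of the row above
        ("bob", "Pune", None),      # missing amount
        ("carol", "Delhi", 5.0),
    ]
    df = spark.createDataFrame(rows, schema="name string, city string, amount double")

    wrangle(df).show()
    spark.stop()


if __name__ == "__main__":
    main()
```

Keeping the transformations in a function that takes and returns a DataFrame makes each step easy to rerun from a notebook cell against different inputs.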
