PySpark is the Python API for Apache Spark. It lets you work with Resilient Distributed Datasets (RDDs) from Python and provides scalable, distributed processing for a variety of use cases, including exploratory data analysis, ETL, and machine learning pipelines.
This sample project demonstrates some well-known data wrangling techniques, implemented in Google Colab.