This is the code repository for Apache Spark for Machine Learning, published by Packt.
Build and deploy high-performance big data AI solutions for large-scale clusters
Apache Spark for Machine Learning teaches you to use Spark for big data processing and to solve emerging big data challenges.
This book covers the following exciting features:
- Master Apache Spark for efficient, large-scale data processing and analysis
- Understand core machine learning concepts and their applications with Spark
- Implement data preprocessing techniques for feature extraction and transformation
- Explore supervised learning methods, including regression and classification algorithms
- Apply unsupervised learning for clustering tasks and recommendation systems
- Discover frequent pattern mining techniques to uncover data trends
If you feel this book is for you, get your copy today!
All of the code is organized into folders.
The code will look like the following:
```python
from pyspark.sql import SparkSession

# Create (or reuse) a SparkSession, the entry point for DataFrame operations
spark = SparkSession.builder \
    .appName("HDFS Read Example") \
    .getOrCreate()
```
Following is what you need for this book: This book is ideal for data scientists, ML engineers, data engineers, students, and researchers who want to deepen their knowledge of Apache Spark’s tools and algorithms. It’s a must-have for those struggling to scale models to real-world problems, and a valuable resource for preparing for interviews at Fortune 500 companies that focus on large dataset analysis, model training, and deployment.
With the following software and hardware list, you can run all code files present in the book (Chapters 1-9).
Chapter | Software required | OS required |
---|---|---|
1-9 | Python 3.x | Windows, macOS, or Linux |
1-9 | Apache Spark 3.x.x | Windows, macOS, or Linux |
1-9 | ECMAScript 11 | Windows, macOS, or Linux |
Deepak Gowda is a data scientist and AI/ML expert with over a decade of experience in leading innovative solutions across various industries, including supply chain, cybersecurity, and data center infrastructure. He holds over 30 granted patents, contributing to advancements in automation, predictive analytics, and AI-driven optimization. His work spans data engineering, machine learning, and distributed systems, focusing on building scalable and impactful products. A passionate inventor, mentor, author, and FAA-certified pilot, Deepak is also dedicated to content creation, sharing his expertise through writing, speaking, and mentoring. He continues to push the boundaries of technology, driving innovation across sectors.