# Customer Segmentation Using K-Means Clustering with HDFS, MySQL, and PySpark Integration

# Overview

This project implements customer segmentation using K-Means clustering and stores the results in both HDFS and a MySQL database. It uses PySpark for distributed processing and is designed to run in a Hadoop-based big data environment.
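The clustering step can be illustrated without any dependencies. The sketch below is a minimal, pure-Python version of the standard K-Means (Lloyd) iteration: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until no assignment changes. It is not the project's code, which uses PySpark; the function names and the deterministic farthest-first seeding are illustrative choices. Library implementations such as `pyspark.ml.clustering.KMeans` (default `k-means||` initialization) and scikit-learn's `KMeans` (default `k-means++`) use smarter seeding and optimized loops.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def farthest_first_init(points: Sequence[Point], k: int) -> List[Point]:
    """Pick k starting centroids: the first point, then repeatedly the point
    farthest from every centroid chosen so far (deterministic seeding)."""
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        farthest = max(points,
                       key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(tuple(farthest))
    return centroids


def kmeans(points: Sequence[Point], k: int,
           max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd's K-Means: return (cluster label per point, final centroids)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = farthest_first_init(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # no point changed cluster: converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return labels, centroids


# Hypothetical customers as (annual income, spending score) pairs.
customers = [(15.0, 39.0), (16.0, 81.0), (17.0, 40.0), (18.0, 77.0)]
labels, centroids = kmeans(customers, k=2)
print(labels)  # → [0, 1, 0, 1]
```

Each label is a segment index; the centroids describe the "average customer" of each segment.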

# Project Structure

- `data/`: contains the dataset `customer_data.csv`.
- `src/`: contains the implementation, `customer_segmentation.py`.
- `README.md`: project documentation.
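Before clustering, the rows of `data/customer_data.csv` have to be turned into numeric feature vectors. The README does not document the file's columns, so this minimal sketch takes the column names as a parameter; the names used in the usage comment (`Annual Income`, `Spending Score`) are hypothetical, as is the function name `load_features`. It uses only Python's `csv` module and accepts any iterable of lines, such as an open file.

```python
import csv
from typing import Iterable, List, Sequence, Tuple


def load_features(lines: Iterable[str],
                  columns: Sequence[str]) -> List[Tuple[float, ...]]:
    """Parse CSV text (header row first) into one float tuple per row,
    keeping only the requested columns, in the order given."""
    reader = csv.DictReader(lines)
    header = reader.fieldnames or []  # reading this consumes the header row
    missing = [c for c in columns if c not in header]
    if missing:
        raise KeyError(f"columns not in CSV header: {missing}")
    return [tuple(float(row[c]) for c in columns) for row in reader]


# Usage against the project's dataset (column names are hypothetical):
# with open("data/customer_data.csv", newline="", encoding="utf-8") as f:
#     features = load_features(f, ["Annual Income", "Spending Score"])
```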

# Installation

Clone the repository:
```
git clone <repository-link>
```

Install the required packages:
```
pip install pandas scikit-learn matplotlib mysql-connector-python hdfs pyspark
```

# Usage

Run the `customer_segmentation.py` script to perform the clustering and store the results:
```
python src/customer_segmentation.py
```


# Key Features

- The HDFS output path is set to `hdfs://localhost:50000/customer segmentation reult.csv`. Adjust the host and port to match your cluster.
- The code integrates with Hadoop and PySpark and is configured for an Ubuntu setup.
- The results are stored in both HDFS and MySQL.

This setup runs the whole pipeline, from clustering to storage, on your existing big data environment.
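The storage step can be sketched in the same spirit. The functions below serialize (customer ID, cluster) pairs to CSV text, write that text through an HDFS client, and insert the same pairs through a MySQL cursor. They are assumptions for illustration, not the project's code: the function names, the table name `customer_segments`, and its columns `customer_id` and `cluster` are hypothetical. The clients are passed in so the logic can be exercised without a running cluster; in practice `client` could be an `hdfs.InsecureClient` from the `hdfs` package and `cursor` a `mysql.connector` cursor.

```python
import csv
import io
from typing import Sequence


def results_to_csv(customer_ids: Sequence[str], labels: Sequence[int]) -> str:
    """Serialize (customer_id, cluster) pairs as CSV text with a header."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["customer_id", "cluster"])
    writer.writerows(zip(customer_ids, labels))
    return buf.getvalue()


def store_in_hdfs(client, hdfs_path: str, csv_text: str) -> None:
    """Write the CSV text to HDFS, replacing any existing file.
    `client` is assumed to expose HdfsCLI's Client.write() keywords."""
    client.write(hdfs_path, data=csv_text, overwrite=True, encoding="utf-8")


def store_in_mysql(cursor, customer_ids: Sequence[str],
                   labels: Sequence[int],
                   table: str = "customer_segments") -> int:
    """Insert one row per customer and return the number of rows sent.
    `table` is interpolated into the SQL, so it must be a trusted constant;
    the values themselves go through %s placeholders."""
    rows = list(zip(customer_ids, labels))
    cursor.executemany(
        f"INSERT INTO {table} (customer_id, cluster) VALUES (%s, %s)", rows)
    return len(rows)
```

With mysql-connector-python, autocommit is off by default, so call `commit()` on the connection after `store_in_mysql`. HdfsCLI talks to the NameNode over WebHDFS (HTTP), so its client URL is an `http://` address rather than an `hdfs://` URI.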


# License
This project is licensed under the MIT License.

Repository: `divithraju/divith-raju-Data-Mining`

Folders and files

Latest commit

History

Repository files navigation

Customer Segmentation Using K-Means Clustering with HDFS, MySQL, and PySpark Integration

Overview

Project Structure

Installation

Usage

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages