
This project focuses on customer segmentation using data mining techniques, specifically K-Means clustering, to classify customers into distinct groups based on their purchasing behaviors. The goal is to analyze customer data and segment them into clusters for targeted marketing strategies and better customer relationship management.


divithraju/divith-raju-Data-Mining


Customer Segmentation Using K-Means Clustering with HDFS, MySQL, and PySpark Integration

Overview

This project implements customer segmentation using K-Means clustering and stores the results in both HDFS and a MySQL database. PySpark handles the data processing, so the pipeline is designed to run in a Hadoop/Spark big data environment.
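At its core, K-Means repeats two steps until the cluster centres stop moving: assign each customer to the nearest centre, then move each centre to the mean of the customers assigned to it. The sketch below shows that loop (Lloyd's algorithm) in plain Python on made-up two-feature vectors. It is an illustration of the algorithm only, not the project's code, which relies on a library implementation (PySpark, per this README); the function names and the random initialisation are assumptions.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100,
           seed: Optional[int] = 0) -> Tuple[List[int], List[Point]]:
    """Hypothetical Lloyd's K-Means sketch; returns (labels, centroids)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    rng = random.Random(seed)
    # Initialise the centroids with k input points picked at random.
    centroids = [tuple(p) for p in rng.sample(list(points), k)]
    for _ in range(max_iter):
        # Assignment step: every point joins its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: every centroid moves to the mean of its members.
        new_centroids = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    # Final labels always match the returned centroids.
    labels = [nearest(p, centroids) for p in points]
    return labels, centroids


if __name__ == "__main__":
    # Made-up customers described by two numeric features each.
    customers = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
    labels, centroids = kmeans(customers, k=2)
    print(labels)
    print(centroids)
```

The cluster ids themselves (0 or 1) depend on the random initialisation; what matters is which customers share an id.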

Project Structure

  • data/: Contains the dataset customer_data.csv.
  • src/: Contains the implementation code customer_segmentation.py.
  • README.md: Project documentation.

Installation

  1. Clone the repository:
    git clone <repository-link>
  2. Install the required packages:
    pip install pandas scikit-learn matplotlib mysql-connector-python hdfs pyspark
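Several of these pip package names differ from the names used in `import` statements (`scikit-learn` is imported as `sklearn`, `mysql-connector-python` as `mysql.connector`). A small check such as the following can confirm that every dependency is visible to the interpreter you will run the script with. It is a hypothetical helper, not part of the repository, and it only checks that the Python packages are importable, not that HDFS, MySQL, or Spark are running.

```python
import importlib.util

# pip package name -> module name used in `import` statements.
REQUIRED = {
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "mysql-connector-python": "mysql.connector",
    "hdfs": "hdfs",
    "pyspark": "pyspark",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose modules cannot be found."""
    missing = []
    for pip_name, module_name in required.items():
        try:
            found = importlib.util.find_spec(module_name) is not None
        except ModuleNotFoundError:
            # For a dotted name, find_spec imports the parent package first,
            # so "mysql.connector" raises if "mysql" itself is absent.
            found = False
        if not found:
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    gaps = missing_packages()
    print("All packages found." if not gaps else "Missing: " + " ".join(gaps))
```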

Usage

Run the customer_segmentation.py script to perform clustering and store results:

python src/customer_segmentation.py


Key Features

- The HDFS output path is set to `hdfs://localhost:50000/customer segmentation reult.csv`.
- The code integrates with Hadoop and PySpark and targets an Ubuntu-based installation.
- The results are stored in both HDFS and MySQL.
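The HDFS output path is a URI: the scheme selects the HDFS filesystem, the authority names the NameNode host and port, and the remainder is the file path inside HDFS. The hypothetical helper below splits it with Python's standard `urllib.parse.urlsplit`, which is a quick way to check that the host and port match your NameNode's address (usually configured through `fs.defaultFS` in `core-site.xml`) before running the script.

```python
from urllib.parse import urlsplit


def describe_hdfs_uri(uri: str) -> dict:
    """Split an hdfs:// URI into NameNode host, port, and file path."""
    parts = urlsplit(uri)
    if parts.scheme != "hdfs":
        raise ValueError(f"not an hdfs:// URI: {uri!r}")
    return {"host": parts.hostname, "port": parts.port, "path": parts.path}


if __name__ == "__main__":
    # Output location as given in this README (note that it contains spaces).
    print(describe_hdfs_uri("hdfs://localhost:50000/customer segmentation reult.csv"))
```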

Together, these pieces run the clustering on an existing Hadoop and Spark installation and make the results available both as a file in HDFS and as a table in MySQL.
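On the MySQL side, storing the segmentation amounts to writing one row per customer with its assigned cluster. The sketch below shows that step under stated assumptions: the table `customer_segments` and its columns are hypothetical (the project's actual schema is not documented here), and Python's built-in `sqlite3` stands in for MySQL so the example runs without a database server.

```python
import sqlite3
from typing import Iterable, Tuple


def store_segments(conn, rows: Iterable[Tuple[int, int]]) -> None:
    """Write (customer_id, cluster_id) pairs to a hypothetical table."""
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE IF NOT EXISTS customer_segments ("
        "customer_id INTEGER PRIMARY KEY, cluster_id INTEGER NOT NULL)"
    )
    # sqlite3 uses "?" placeholders; mysql-connector-python uses "%s" instead.
    cur.executemany(
        "INSERT INTO customer_segments (customer_id, cluster_id) VALUES (?, ?)",
        list(rows),
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for a MySQL connection
    store_segments(conn, [(101, 0), (102, 1), (103, 0)])
    print(conn.execute("SELECT * FROM customer_segments").fetchall())
```

With MySQL, the connection would come from `mysql.connector.connect(host=..., user=..., password=..., database=...)` and the `?` placeholders would become `%s`; the rest of the logic carries over.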


License
This project is licensed under the MIT License.
