GitHub - VarIr/copac at 5c21c5728714e962d7b0355b37751aa48d00b449

Data Mining

Univie VU Data Mining course - Programming assignments

Assignment 1 - High dimensional data clustering with COPAC

We implement COPAC (Correlation Partition Clustering), which

- computes the local correlation dimensionality of each point from the largest eigenvalues of the covariance matrix of its k nearest neighbors,
- partitions the data set by this dimensionality,
- calculates a variant of the Euclidean distance weighted according to each point's local correlation dimensionality, called correlation distance, and
- further clusters the objects within each partition with Generalized DBSCAN, which requires a minimum number of objects within eps range of each core point.
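The four steps above can be sketched in NumPy. This is a minimal, illustrative sketch under stated assumptions, not the code in the cluster package: the function names local_correlation_dims, dbscan_precomputed and copac_sketch are hypothetical, distances are computed brute force in O(n²) memory, and the correlation distance follows our reading of Achtert et al. (2007), giving strong eigenvectors weight 0 and weak ones weight 1 and taking the larger of the two one-sided distances.

```python
import numpy as np


def local_correlation_dims(X, k=10, alpha=0.85):
    """Return each point's local correlation dimensionality and the
    matrix M_p = I - V_strong V_strong^T built from its k-NN covariance."""
    n, d = X.shape
    # Brute-force squared Euclidean distances: O(n^2) memory, sketch only.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    knn = np.argsort(sq, axis=1)[:, :k]  # k nearest neighbours, incl. the point
    dims = np.empty(n, dtype=int)
    M = np.empty((n, d, d))
    for i in range(n):
        nbrs = X[knn[i]]
        centered = nbrs - nbrs.mean(axis=0)
        cov = centered.T @ centered / len(nbrs)
        w, V = np.linalg.eigh(cov)                       # ascending eigenvalues
        w, V = np.clip(w[::-1], 0.0, None), V[:, ::-1]   # largest first
        explained = np.cumsum(w) / max(w.sum(), 1e-12)
        # Smallest number of eigenvalues explaining at least alpha of the variance.
        lam = min(int(np.searchsorted(explained, alpha)) + 1, d)
        dims[i] = lam
        strong = V[:, :lam]
        # Assumption: weight 0 along strong eigenvectors, 1 along weak ones.
        M[i] = np.eye(d) - strong @ strong.T
    return dims, M


def dbscan_precomputed(D, eps, min_pts):
    """Plain DBSCAN on a precomputed distance matrix; -1 marks noise."""
    n = D.shape[0]
    labels = np.full(n, -1)
    neighbors = [np.flatnonzero(D[i] <= eps) for i in range(n)]
    core = np.array([len(nb) >= min_pts for nb in neighbors])
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not core[i]:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:  # expand the cluster from core points only
            j = stack.pop()
            if not core[j]:
                continue
            for m in neighbors[j]:
                if labels[m] == -1:
                    labels[m] = cluster
                    stack.append(m)
        cluster += 1
    return labels


def copac_sketch(X, k=10, mu=5, eps=0.5, alpha=0.85):
    """Hypothetical end-to-end sketch: partition by dimensionality, then
    run DBSCAN on correlation distances within each partition."""
    dims, M = local_correlation_dims(X, k, alpha)
    labels = np.full(len(X), -1)
    next_label = 0
    for lam in np.unique(dims):
        idx = np.flatnonzero(dims == lam)
        diff = X[idx][:, None, :] - X[idx][None, :, :]  # (m, m, d)
        # dp[i, j] = (x_i - x_j)^T M_i (x_i - x_j)
        dp = np.einsum("ijk,ikl,ijl->ij", diff, M[idx], diff)
        # Correlation distance: the larger of the two one-sided distances.
        D = np.sqrt(np.maximum(np.maximum(dp, dp.T), 0.0))
        part = dbscan_precomputed(D, eps, mu)
        mask = part >= 0
        labels[idx[mask]] = part[mask] + next_label
        if mask.any():
            next_label += part.max() + 1
    return labels
```

Partitioning by dimensionality before running DBSCAN means that only points with the same local correlation dimensionality are compared, which is intended to keep, for example, a line-shaped cluster from merging with a plane it crosses.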

Installation

Make sure you have a working Python 3 environment (version 3.5 or later) with the numpy, scipy, and scikit-learn packages. Consider using Anaconda. You can then install COPAC from within the cloned directory with

python3 setup.py install

COPAC is then available through the cluster package.

Example

COPAC usage follows scikit-learn's cluster API.

from cluster import COPAC
# load some X here ...
copac = COPAC(k=10, mu=5, eps=.5, alpha=.85)
y_pred = copac.fit_predict(X)
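To try the example without a real data set, X can be any NumPy array of shape (n_samples, n_features). The helper below is a hypothetical sketch (make_toy_correlation_data is our name, not part of the cluster package) that builds a small synthetic set with two correlation clusters of different dimensionality: a noisy line and a noisy plane in 3-D.

```python
import numpy as np


def make_toy_correlation_data(n_per_cluster=100, noise=0.01, seed=0):
    """Hypothetical toy data: a 1-D line and a 2-D plane embedded in 3-D."""
    rng = np.random.default_rng(seed)
    # Correlation cluster 1: points along the direction (1, 2, 3) through the origin.
    t = rng.uniform(-1.0, 1.0, size=(n_per_cluster, 1))
    line = t * np.array([1.0, 2.0, 3.0])
    # Correlation cluster 2: points in the plane z = 5, offset from the line.
    uv = rng.uniform(-1.0, 1.0, size=(n_per_cluster, 2))
    plane = np.column_stack([uv + 5.0, np.full(n_per_cluster, 5.0)])
    X = np.vstack([line, plane])
    # Small isotropic Gaussian noise so the clusters are not perfectly flat.
    X += rng.normal(scale=noise, size=X.shape)
    y_true = np.repeat([0, 1], n_per_cluster)
    return X, y_true


X, y_true = make_toy_correlation_data()
print(X.shape, np.bincount(y_true))
```

Because the line is locally 1-dimensional and the plane 2-dimensional, one would expect COPAC to place them in different partitions; y_true can then be compared against the predicted labels, although the example's parameters (k, mu, eps, alpha) may need tuning for this data.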

Documentation

See the PDF documentation_group08.pdf in this repository.

Implementation

Published on GitHub: https://github.com/VarIr/data_mining

Citation

The original publication of COPAC.

    @inproceedings{Achtert2007,
         author = {Achtert, E. and B{\"o}hm, C. and Kriegel, H.-P. and Kr{\"o}ger, P. and Zimek, A.},
         title = {{Robust, Complete, and Efficient Correlation Clustering}},
         booktitle = {Proceedings of the Seventh SIAM International Conference on Data Mining},
         year = {2007},
         pages = {413--418}
    }

License

This work is free open source software licensed under GPLv3.
