Univie VU Data Mining course - Programming assignments
We implement COPAC (Correlation Partition Clustering), which
- computes the local correlation dimensionality based on the largest eigenvalues
- partitions the data set based on this dimension
- calculates a Euclidean distance variant weighted with the correlation dimension, called correlation distance
- further clusters objects within each partition with Generalized DBSCAN, requiring a minimum number of objects to be within eps range for each core point.
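The first two steps above can be sketched in plain NumPy. This is a minimal illustration, not the code shipped in this package. It assumes that the local correlation dimensionality of a point is the smallest number of eigenvalues of its k-nearest-neighbor covariance matrix that explain at least a fraction `alpha` of the total variance. The function names `local_correlation_dimension` and `partition_by_dimension` are hypothetical, and the brute-force neighbor search is chosen for brevity only.

```python
import numpy as np


def local_correlation_dimension(X, k=10, alpha=0.85):
    """Return one local correlation dimension per row of X (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    # Brute-force pairwise squared Euclidean distances (fine for small n).
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    # Indices of the k nearest neighbors; each point counts as its own neighbor here.
    knn = np.argsort(dist, axis=1)[:, :k]
    lam = np.empty(n, dtype=int)
    for i in range(n):
        nbrs = X[knn[i]]
        centered = nbrs - nbrs.mean(axis=0)
        cov = centered.T @ centered / len(nbrs)
        # eigvalsh returns ascending eigenvalues; reverse to get largest first.
        eigvals = np.clip(np.linalg.eigvalsh(cov)[::-1], 0.0, None)
        total = eigvals.sum()
        if total == 0.0:
            # Degenerate neighborhood: all neighbors coincide.
            lam[i] = 0
            continue
        explained = np.cumsum(eigvals) / total
        # Smallest count of leading eigenvalues whose share reaches alpha.
        lam[i] = min(int(np.searchsorted(explained, alpha)) + 1, d)
    return lam


def partition_by_dimension(lam):
    """Group point indices by their local correlation dimension."""
    return {int(dim): np.flatnonzero(lam == dim) for dim in np.unique(lam)}
```

For points lying on a straight line in three dimensions, every neighborhood has a single dominant eigenvalue, so all points land in the partition for dimension 1. Each partition would then be clustered separately in the later steps.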
Make sure you have a working Python 3 environment (version 3.5 or later) with the numpy, scipy, and scikit-learn packages. Consider using Anaconda. You can install COPAC from within the cloned directory with
python3 setup.py install
COPAC is then available through the cluster package.
COPAC usage follows scikit-learn's cluster API.
from cluster import COPAC
# load some X here ...
copac = COPAC(k=10, mu=5, eps=.5, alpha=.85)
y_pred = copac.fit_transform(X)
See the PDF.
Published on GitHub: https://github.com/VarIr/data_mining
The original publication of COPAC:
@article{Achtert2007,
author = {Achtert, E and B{\"o}hm, C and Kriegel, H P and Kr{\"o}ger, P and Zimek, A},
title = {{Robust, Complete, and Efficient Correlation Clustering}},
journal = {Proceedings of the Seventh SIAM International Conference on Data Mining},
year = {2007},
pages = {413--418}
}
This work is free open source software licensed under GPLv3.