Add files via upload
medxiaorudan authored Sep 26, 2023
1 parent c82ce6b commit e960ccb
Showing 5 changed files with 211 additions and 1 deletion.
27 changes: 27 additions & 0 deletions matlab/external_functions/waynezhanghk-gactoolbox-53508ce/LICENSE
@@ -0,0 +1,27 @@
Copyright (c) 2013, waynezhanghk
All rights reserved.

Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.

Redistributions in binary form must reproduce the above copyright notice, this
list of conditions and the following disclaimer in the documentation and/or
other materials provided with the distribution.

Neither the name of the {organization} nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
@@ -0,0 +1,74 @@
GACToolbox
==========

Graph Agglomerative Clustering (GAC) toolbox

Introduction
------------

GACToolbox summarizes our research on agglomerative clustering on a graph. Agglomerative clustering, which iteratively merges small clusters, is commonly used because it is conceptually simple and produces a hierarchy of clusters. Classical agglomerative clustering algorithms, such as average linkage and DBSCAN, have been widely used in many areas. Those algorithms, however, are not designed for clustering on a graph. This toolbox implements the following algorithms for agglomerative clustering on a directed graph.

1. Structural descriptor based algorithms (`gacCluster.m`). We define a cluster descriptor based on the graph structure, and each merge is chosen to maximize the increment of the descriptor. Two descriptors, the zeta function and the path integral, are implemented. You can also design a new descriptor (by creating functions similar to `gacPathEntropy.m` and `gacPathCondEntropy.m`) and develop new algorithms with our code.

2. Graph degree linkage (`gdlCluster.m`). A simple and effective algorithm that performs better than normalized cuts and spectral clustering and is also faster.
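
The path integral descriptor from item 1 can be illustrated with a small sketch. It assumes the form S_C = (1/|C|^2) 1^T (I - z P_C)^{-1} 1 from the Pattern Recognition paper listed under Citations, where P_C is the row-normalized transition matrix restricted to the cluster. `pathIntegralSketch` is a hypothetical helper written for illustration; it is not the toolbox's `gacPathEntropy.m`.

```matlab
% Hypothetical helper (an assumption, not part of the toolbox):
% path integral of one cluster, given its transition sub-matrix P_C
% and the weight z used in (I - z*P).
pathIntegralSketch = @(P_C, z) ...
    sum((eye(size(P_C, 1)) - z * P_C) \ ones(size(P_C, 1), 1)) / size(P_C, 1)^2;

% Toy 3-node directed graph; row-normalize so that each row sums to 1.
W = [0 1 1; 1 0 0; 1 1 0];
P = bsxfun(@times, 1 ./ sum(W, 2), W);

% Descriptor of the candidate cluster {1, 2}: keep only its rows and columns.
idx = [1 2];
P_C = P(idx, idx);
S = pathIntegralSketch(P_C, 0.01);
```

With z = 0 the value reduces to 1/|C|. A positive z adds the contribution of longer paths inside the cluster, so more densely connected clusters score higher.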

This toolbox is written and maintained by Wei Zhang (`wzhang009 at gmail.com`).
Please send me an email if you find any bugs or have any suggestions.

Examples
--------
Preparations:

1. Compile mex functions
2. Add 'gacfiles' and 'gdlfiles' to your MATLAB path
3. Calculate a pairwise distance matrix from your data
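
For step 3, a minimal sketch of one way to build the distance matrix is given here. It assumes Euclidean distance and an n-by-d data matrix `X` with one sample per row; both are assumptions, since the toolbox only requires some pairwise distance matrix. The random data is for illustration only.

```matlab
% X: n-by-d data matrix, one sample per row (random data, illustration only).
rng(0);
X = rand(50, 3);

% Squared Euclidean distances via ||x_i||^2 + ||x_j||^2 - 2 * x_i' * x_j.
sqNorms = sum(X.^2, 2);                              % n-by-1 squared row norms
D2 = bsxfun(@plus, sqNorms, sqNorms') - 2 * (X * X');
D2 = max(D2, 0);                                     % clamp round-off negatives

distance_matrix = sqrt(D2);
distance_matrix(1:size(X, 1)+1:end) = 0;             % exact zeros on the diagonal
```

The resulting `distance_matrix` is n-by-n and can be passed as the first argument in the calls below.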

```matlab
K = 20;
a = 1;
z = 0.01;
% path integral
clusteredLabels = gacCluster (distance_matrix, groupNumber, 'path', K, a, z);
% zeta function
clusteredLabels = gacCluster (distance_matrix, groupNumber, 'zeta', K, a, z);
% GDL-U algorithm
clusteredLabels = gdlCluster(distance_matrix, groupNumber, K, a, false);
% AGDL algorithm
clusteredLabels = gdlCluster(distance_matrix, groupNumber, K, a, true);
```

Citations
---------

Please cite the following papers if you find the code helpful.

* W. Zhang, D. Zhao, and X. Wang.
Agglomerative clustering via maximum incremental path integral.
Pattern Recognition, 46 (11): 3056-3065, 2013.

* W. Zhang, X. Wang, D. Zhao, and X. Tang.
Graph Degree Linkage: Agglomerative Clustering on a Directed Graph.
In Proceedings of the European Conference on Computer Vision (ECCV), 2012.

Additional Notes
----------------

1. How to compile mex files?

Precompiled mexw64 files are included. If you use a system other than 64-bit Windows, run compileMex.m to build the mex files.

2. We provide a MATLAB implementation of structural descriptor based clustering and a mixed MATLAB/C++ implementation of graph degree linkage. The MATLAB implementation is easier to understand but inefficient. In the future we will add a MATLAB implementation of graph degree linkage.

Speed ranking, fastest first: AGDL > GDL-U > path integral > zeta function

3. GDL-U and AGDL have similar performance. GDL-U is for small datasets and AGDL is for large datasets.

AGDL has an additional parameter Kc in gdlMergingKNN_c.m. The larger Kc is, the closer AGDL's performance is to that of GDL-U, and the slower the algorithm runs. The default Kc = 10 is a good trade-off for most datasets.
@@ -1 +1,12 @@

cd ./gdlfiles/
mex -O gacLlinks_c.cpp
mex -O gacOnelink_c.cpp
mex -O gacPartial_sort.cpp
mex -O gacPartialMin_knn_c.cpp
mex -O gacPartialMin_triu_c.cpp
mex -O gdlInitAffinityTable_c.cpp gdlComputeAffinity.cpp
mex -O gdlInitAffinityTable_knn_c.cpp gdlComputeAffinity.cpp
mex -O gdlAffinity_c.cpp gdlComputeAffinity.cpp
mex -O gdlDirectedAffinity_c.cpp gdlComputeDirectedAffinity.cpp
mex -O gdlDirectedAffinity_batch_c.cpp gdlComputeDirectedAffinity.cpp
cd ../
@@ -0,0 +1,49 @@
function clusteredLabels = gacCluster (distance_matrix, groupNumber, strDescr, K, a, z)
%% Graph Agglomerative Clustering toolbox
% Input:
% - distance_matrix: pairwise distances, d_{i -> j}
% - groupNumber: the final number of clusters
% - strDescr: structural descriptor, default: 'path'. The choice can be
%   - 'zeta': zeta function based descriptor
%   - 'path': path integral based descriptor
% - K: the number of nearest neighbors for KNN graph, default: 20
% - a: for covariance estimation, default: 1
%   sigma^2 = (\sum_{i=1}^n \sum_{j \in N_i^K} d_{ij}^2) * a
% - z: (I - z*P), default: 0.01
% Output:
% - clusteredLabels: clustering results
% by Wei Zhang (wzhang009 at gmail.com), June 8, 2011
%
% Please cite the following papers if you find the code helpful
%
% W. Zhang, D. Zhao, and X. Wang.
% Agglomerative clustering via maximum incremental path integral.
% Pattern Recognition, 46 (11): 3056-3065, 2013.
%
% W. Zhang, X. Wang, D. Zhao, and X. Tang.
% Graph Degree Linkage: Agglomerative Clustering on a Directed Graph.
% in Proceedings of European Conference on Computer Vision (ECCV), 2012.

%% parse inputs
disp('--------------- Graph Structural Agglomerative Clustering ---------------------');

if nargin < 2, error('GAC: input arguments are not enough!'); end
if nargin < 3, strDescr = 'path'; end
if nargin < 4, K = 20; end
if nargin < 5, a = 1; end
if nargin < 6, z = 0.01; end

%% initialization

disp('---------- Building graph and forming initial clusters with l-links ---------');
[graphW, NNIndex] = gacBuildDigraph(distance_matrix, K, a);
% from adjacency matrix to probability transition matrix
graphW = bsxfun(@times, 1./sum(graphW,2), graphW); % row sum is 1
initialClusters = gacNNMerge(distance_matrix, NNIndex);
clear distance_matrix NNIndex

disp('------------------- Structural descriptor merging -------------------');
clusteredLabels = gacMerging(graphW, initialClusters, groupNumber, strDescr, z);

end
@@ -0,0 +1,49 @@
function clusteredLabels = gdlCluster (distance_matrix, groupNumber, K, a, usingKcCluster, p)
%% Graph Agglomerative Clustering toolbox
% Input:
% - distance_matrix: pairwise distances, d_{i -> j}
% - groupNumber: the final number of clusters
% - K: the number of nearest neighbors for KNN graph, default: 20
% - a: for covariance estimation, default: 1
%   sigma^2 = (\sum_{i=1}^n \sum_{j \in N_i^3} d_{ij}^2) * a
% - usingKcCluster: true selects AGDL (gdlMergingKNN_c), false selects
%   GDL-U (gdlMerging_c), default: true
% - p: merging (p+1)-links in l-links algorithm, default: 1
% Output:
% - clusteredLabels: clustering results
% by Wei Zhang (wzhang009 at gmail.com), June 8, 2011
%
% Please cite the following papers if you find the code helpful
%
% W. Zhang, X. Wang, D. Zhao, and X. Tang.
% Graph Degree Linkage: Agglomerative Clustering on a Directed Graph.
% in Proceedings of European Conference on Computer Vision (ECCV), 2012.
%
% W. Zhang, D. Zhao, and X. Wang.
% Agglomerative clustering via maximum incremental path integral.
% Pattern Recognition, 46 (11): 3056-3065, 2013.

%% parse inputs
disp('--------------- Graph Agglomerative Clustering ---------------------');

if nargin < 2, error('GAC: input arguments are not enough!'); end
if nargin < 3, K = 20; end
if nargin < 4, a = 1; end
if nargin < 5, usingKcCluster = true; end
if nargin < 6, p = 1; end

%% initialization
disp('---------- Building graph and forming initial clusters with l-links ---------');
[graphW, NNIndex] = gacBuildDigraph_c(distance_matrix, K, a);
initialClusters = gacBuildLlinks_cwarpper(distance_matrix, p, NNIndex);
clear distance_matrix NNIndex

disp('-------------------------- GDL merging --------------------------');
if usingKcCluster
    % AGDL: merging restricted to the Kc nearest clusters
    clusteredLabels = gdlMergingKNN_c(graphW, initialClusters, groupNumber);
else
    % GDL-U
    clusteredLabels = gdlMerging_c(graphW, initialClusters, groupNumber);
end

end
