Sun Oct 9 2022
* mcl-22-282 released.
* Building mcl now requires installing the library cimfomfa. The easiest
way is to use the build script
https://raw.githubusercontent.com/micans/mcl/main/install-this-mcl.sh
* A new method for consensus clustering called RCL (restricted contingency
linkage) is introduced. It is implemented in rcl/rcl; the help
output for this should be sufficient to run it. {rcl} uses
{clm vol}, {clm close}, and {rcl-select.pl} to compute the RCL results.
RCL is completely independent from mcl, but can be used to integrate a
set of MCL outputs at different levels of inflation. Various items below
in this update reference RCL; a pre-print is forthcoming.
The script rcl-qc supports various modes to create QC plots as in
the preprint, as well as the cluster/marker-gene heatmap from the preprint.
* Mcl source is now hosted on github.com/micans/mcl .
Compiling mcl now requires installation of cimfomfa, which provides
the C utility library libtingea.
Cimfomfa is hosted on github.com/micans/cimfomfa .
Tar releases will for the foreseeable future still be hosted on
micans.org/mcl and micans.org/cimfomfa .
Previously cimfomfa was imported in the mcl source tree, but this
practice has stopped. The mcl source tree contains a script that
will download both cimfomfa and mcl tar archives and compile both.
* Various modes and programs were removed in a clean-up.
* Fixed bug in mcxarray in the case of NAs in data. The wrong N was used.
Additionally in the presence of NAs the spearman correlation coefficient
uses the ranks from the vectors pre-removal of the joint NA set. There is
now a mode --flexnasp that will re-compute ranks after removal of this
set in both vectors, separately for each pair of vectors.
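The difference between the two rank treatments can be sketched in Python.
This is an illustration only, not mcxarray's code: the function names, the
use of None for NA, and the use of average ranks for ties are all
assumptions made for the example.

```python
def ranks(values):
    """Average 1-based ranks of the non-None entries; None stays None."""
    idx = sorted((i for i, v in enumerate(values) if v is not None),
                 key=lambda i: values[i])
    out = [None] * len(values)
    pos = 0
    while pos < len(idx):
        end = pos
        while end + 1 < len(idx) and values[idx[end + 1]] == values[idx[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1          # mean of ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            out[idx[k]] = avg
        pos = end + 1
    return out


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y, rerank=False):
    """rerank=False: rank each full vector, then drop joint-NA positions.
    rerank=True (the --flexnasp idea): drop joint-NA positions, then rank."""
    keep = [i for i in range(len(x)) if x[i] is not None and y[i] is not None]
    if rerank:
        rx = ranks([x[i] for i in keep])
        ry = ranks([y[i] for i in keep])
    else:
        full_x, full_y = ranks(x), ranks(y)
        rx = [full_x[i] for i in keep]
        ry = [full_y[i] for i in keep]
    return pearson(rx, ry)
```

With x = [1, 2, 3, 4] and y = [1, None, 2, 3], re-ranking gives a
coefficient of exactly 1, whereas the pre-removal ranks of x keep a gap
(1, 3, 4) and give a slightly lower value.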
* Fixed bug in mcxload -etc <file> --expect-values; a logical cascade
triggered wrong parse branch.
* Fixed bug in mcxload -etc / -235 and derived options, where --stream-mirror
would cause a crash. Achieve the intended effect e.g. with '-ri max'
instead.
* clm vol can compute an rcl (Restricted Contingency Linkage) object,
encoding consensus clustering values.
* clm close can compute the single linkage tree of a network with --sl.
The output is an ordered list of join events that represent the tree.
Main intended use is to apply it to the rcl object described just above.
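A single linkage tree expressed as an ordered list of join events can be
sketched with a union-find structure. The sketch below is illustrative and
does not reproduce the clm close output format; the tuple layout and the
function name are assumptions, and higher edge weight is taken to mean
greater similarity.

```python
def single_linkage_joins(n_nodes, edges):
    """edges: iterable of (u, v, weight), higher weight = more similar.
    Returns join events (root_a, root_b, weight, merged_size) in join order."""
    parent = list(range(n_nodes))
    size = [1] * n_nodes

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    joins = []
    for u, v, w in sorted(edges, key=lambda e: -e[2]):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue                        # both ends already in one subtree
        if size[ru] < size[rv]:
            ru, rv = rv, ru                 # attach the smaller tree to the larger
        parent[rv] = ru
        size[ru] += size[rv]
        joins.append((ru, rv, w, size[ru]))
    return joins
```

Each event records which two subtrees were merged and at what weight, so
the list can be replayed to rebuild the tree or cut at a chosen level.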
* rcl-res.pl is a script that reads a single linkage tree from the
clm close join-order output in --sl mode. It accepts resolution parameters
as argument and will compute balanced flat clusterings from the tree
that are adjacent in cluster size to the resolutions specified.
It will also output a GraphViz dot file encoding the tree relationship
between clusters at different resolution levels.
* rcl implements the full RCL - Restricted Contingency Linkage -
method of consensus clustering, using clm vol, clm close, and rcl-res.pl,
resulting in the resolution clusterings described in the entry above this.
* The volatility measure reported by clm vol has had its scale reversed
to be compatible with rcl scores.
Low values are more volatile. Output values are per mille, on a scale
of 0-1000, where 1000 means a node is always consistently clustered across
all clusterings considered.
* Added --degree-adjust option to mcl; this adjusts edge weight A(i,j)
inversely proportional to deg(i)+deg(j), i.e. it decreases similarity
between highly connected nodes.
* After 40 iterations slightly perturb mcl inflation. This is to force convergence
in case a subgraph corresponds (proportionally) to the matrix
[ 4 1 1 ]
[ 1 4 1 ]
[ 1 1 4 ]
This sub-network happens to be a rare beast for which inflation and expansion
cancel each other out. Nearly all mcl processes start converging noticeably after
10 iterations, so 40 is a conservative number here.
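That this matrix is left unchanged by one round of expansion and inflation
can be checked with exact arithmetic. The sketch below assumes an inflation
value of 2 (the entry above does not state the value) and uses hypothetical
helper names; it is a check of the arithmetic, not mcl's implementation.

```python
from fractions import Fraction


def expand(m):
    """Expansion: square the (column-stochastic) matrix."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, r):
    """Inflation: raise entries to the power r, then rescale columns to sum 1."""
    n = len(m)
    powered = [[m[i][j] ** r for j in range(n)] for i in range(n)]
    col_sums = [sum(powered[i][j] for i in range(n)) for j in range(n)]
    return [[powered[i][j] / col_sums[j] for j in range(n)] for i in range(n)]


raw = [[4, 1, 1], [1, 4, 1], [1, 1, 4]]
m = [[Fraction(v, 6) for v in row] for row in raw]    # columns sum to 1
step = inflate(expand(m), 2)
print(step == m)    # True: expansion and inflation cancel out
```

Expansion turns the entries 4/6 and 1/6 into 1/2 and 1/4; squaring and
rescaling then gives 4/6 and 1/6 again. With any other inflation value the
matrix moves, which is why a small perturbation suffices.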
* mcxi has a new 'type' primitive.
It returns one of 'mx', 'int', 'dbl', 'str', or 'null'.
Can be used for easy iteration over a set of objects of a certain type,
and/or to detect end of input/empty stack.
Fri May 16 2014
* mcl-14-137 released. Fix build error under clang.
Fri May 16 2014
* mcl-14-136 released.
* mcl can subcluster a given clustering. This is achieved by supplying
the latter as argument to the new -icl option. Edges in the input
graph that cross clusters will be removed before using the MCL cluster
algorithm to compute the subclustering.
* clm vol can compute node volatility by comparing one set of clusterings
against another set of clusterings (rather than comparing all against all).
This is achieved by inserting a "--" separator in the list of filenames.
The set of clusterings in the files with names to the left are compared
to those with names to the right of the separator.
* clm dist obeys the same convention as clm vol above. That is, if
a separator "--" is present in the list of filenames, then all
clusterings specified to the left are compared with all clusterings
specified to the right.
* The mcx query interface has been made less idiosyncratic. Edge weight
attributes and summary statistics are no longer scaled.
* mcx ctty can now also compute edge centrality betweenness (--edge
option). It can be parallelised using threading across multiple CPUs
(with -t) and by using multiple machines (with -j and -J).
* clm has a new mode 'optics'. It does everything that the OPTICS
algorithm does short of actually producing a clustering.
- It outputs the ordered reachability scores. (coredistance)
- It can output the OPTICS landscape visualisation.
- It can visualise a given clustering (for example
one derived by MCL) in the OPTICS landscape by colouring
bars according to associated node cluster membership.
* Several programs have acquired the --write-binary option
to enable native binary formatted output.
* A graph can be subselected such that only the edges consistent with a clustering are retained:
mcx alter -imx NETWORKFILE -icl CLUSTERFILE --block
The edges that are not consistent with a clustering can be selected by
using the --blockc option instead.
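The selection performed by --block and --blockc amounts to splitting the
edge set by cluster membership. A minimal Python sketch follows; the data
layout and the function name are assumptions for illustration, not the mcx
alter implementation.

```python
def split_by_clustering(edges, cluster_of):
    """edges: list of (u, v, weight); cluster_of: dict node -> cluster id.
    Returns (within, across): edges whose endpoints share a cluster (the
    --block selection) and the remaining edges (the --blockc selection)."""
    within, across = [], []
    for u, v, w in edges:
        if cluster_of[u] == cluster_of[v]:
            within.append((u, v, w))
        else:
            across.append((u, v, w))
    return within, across
```

The 'within' part is also what remains after removing cluster-crossing
edges, as described for the -icl subclustering option of mcl above.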
* mcxload can now read binary format with the options
-packed <fname>
-pack-cnum <num>
-pack-rnum <num>
The scripts/ directory contains an example script packed-example.sh
documenting (by example C code) the binary input accepted by mcxload.
* mcx query can output degree histogram data with -hist-degrees <step>.
* mcxrand can now shuffle clusterings. It reads a given clustering
with the -icl option and outputs a clustering that preserves the cluster sizes
with nodes randomly redistributed.
* mcxrand can generate graphs using preferential attachment with -pa V/m
to create a graph with V nodes, m edges added at each step.
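The standard preferential attachment model can be sketched as follows.
This is a generic illustration and not mcxrand's code: the seed graph, the
function name and the sampling scheme (drawing from a list in which each
node appears once per incident edge) are assumptions.

```python
import random


def preferential_attachment(num_nodes, m, seed=0):
    """Grow a graph to num_nodes nodes; each new node attaches to m distinct
    existing nodes chosen with probability proportional to their degree."""
    if not 1 <= m < num_nodes:
        raise ValueError("need 1 <= m < num_nodes")
    rng = random.Random(seed)
    edges = []
    targets = list(range(m))    # assumed seed: first new node joins nodes 0..m-1
    endpoints = []              # each node listed once per incident edge
    for new in range(m, num_nodes):
        for t in targets:
            edges.append((new, t))
        endpoints.extend(targets)
        endpoints.extend([new] * m)
        # Degree-proportional sample of m distinct targets for the next node.
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(endpoints))
        targets = sorted(chosen)
    return edges
```

A call such as preferential_attachment(50, 3) yields 3 * (50 - 3) = 141
edges, since every node from index 3 onwards adds exactly 3 edges.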
* mcxarray has several new options:
    --angle-norm         normalised to [0, 1].
    --acute-angle-norm   normalised to [0, 1].
    --slow-cosine        cosine(0.5 * phi)
                         correlated 1, anti-correlated 0
    --slow-sine          sine(0.5 * phi)
                         correlated 1, anti-correlated 0
    --taxi               sum | xi - yi |
    --max                max | xi - yi |
    --euclid             Euclidean distance
    -minkowski <num>     Minkowski distance with power <num>
* Fixed bug in mcxdump -imx-tree; the case where trees were encoded
as a stack of flattened clusterings was not treated correctly.
* clm info2 is a new mode to clm. It is a streamlined update of clm info
that either outputs the efficiency for all nodes in a network relative
to a given clustering (with the --list option), or the average across
all nodes.
It can be parallelised using threads and job dispatching using the
customary -t <threads> -j <jid> -J <jnum> options also shared by
mcxarray, mcx diameter, mcx ctty, and mcx clcf.
* mcxdump can now transform objects before outputting them (-tf option).
* mcxi has a new 'threads' primitive to set the number of threads.
* The environment variable MCLEDGE_NCPUS is now interpreted by various
programs in mcl-edge. For certain compute-intensive tasks they will
split the work into this many different threads of computation.
Mon May 21 2012
* mcl-12-135 released.
* mcx ctty was sped up by trimming the implementation of Brandes' algorithm.
* A new transform was added: #n(). Use e.g. as #n(20). This takes the
top N (20 in this case) neighbours for each node by edge (arc) weight.
All selected arcs will be converted to (bidirectional) edges.
The existing #knn() transform has an identical selection stage, but then removes
all arcs for which the reciprocal arc was not selected.
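The two transforms share a selection stage and differ only in how selected
arcs become edges. A minimal Python sketch, with assumed data layout,
function names and tie-breaking (by node identifier), not taken from the
mcl source:

```python
def top_arcs(weights, n):
    """weights: dict node -> dict neighbour -> arc weight.
    Returns the set of arcs (u, v) where v is among u's n heaviest neighbours."""
    selected = set()
    for u, nbrs in weights.items():
        best = sorted(nbrs, key=lambda v: (-nbrs[v], v))[:n]
        selected.update((u, v) for v in best)
    return selected


def n_transform(weights, n):
    """#n()-style: an edge survives if either endpoint selected it."""
    return {frozenset(arc) for arc in top_arcs(weights, n)}


def knn_transform(weights, n):
    """#knn()-style: an edge survives only if both endpoints selected it."""
    arcs = top_arcs(weights, n)
    return {frozenset((u, v)) for u, v in arcs if (v, u) in arcs}
```

The result of knn_transform is always a subset of the result of
n_transform for the same N, since a mutual selection is also a one-sided
selection.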
* mcx query has a new mode -vary-n utilising the nearest neighbour transform
described above.
* mcl did not treat -t and -te (thread options) correctly when the number of
threads specified was larger than 64, and would crash. Fixed. mcl now
accepts any number of threads up to 65536.
Thu, 8 Mar 2012
* mcl-12-068 released.
* Stopgap release to fix a segmentation fault. Triggered in abc mode
due to absent initialisation of new SIF functionality. Now fixed.
* mcxrand has a new -pa <V>/<m> mode. This uses the standard preferential attachment
model to randomly generate a scale-free network.
Wed, 7 Dec 2011
* mcl-11-335 released.
* For a long time now MCL has shipped with many siblings in what is intended to
be a coherent set of network analysis tools. What this collection has lacked is
some way of referring to it without creating confusion. This void is now filled,
and MCL-edge is the new name. The unadorned name MCL will not go away, and the
source code will still be shipped using a name of the format mcl-YY-DDD.
* mcxload can now read SIF (Simple Interaction File) format with -sif <filename>.
The second column (relationship type) is ignored. To ensure
that an undirected graph is read, use the --stream-mirror option.
* mcxdump can now output SIF format with -dump-sif <relationship-type>.
The argument indicates the, yes, relationship type.
* mcl can now read SIF format with the --sif option.
With this option, the input file is assumed to be in SIF format.
* mcl can now read etc format with the --etc option. This is similar to
SIF format except that the second column, identifying the relationship
type, is dropped.
* mcx query now can read label format in its default mode. Not yet
in other modes.
* 'efficiency' criterion added to mcx query -vary-threshold,
-vary-knn, et cetera.
* Minor fix in clm enstrict; on some occasions it would not clean up
empty clusters left by the removal of overlap.
Fri, 20 Jul 2010
* mcl-10-201 released.
* The speed of the clustering program mcl was improved.
It now utilises vanilla matrix/vector multiplication where this is faster
than sparse matrix/vector multiplication. It chooses between an
O(k^2 + N) algorithm with very low hidden constant and an O(k^2) algorithm
with significantly higher hidden constant, using the algorithm with least
expected run-time. Here k is a constant that bounds the number of
resources allocated to a node in the graph.
* The speed optimization code acquired a more logical option for adjustment
(-sparse) with a sane default setting that should work satisfactorily
across a wide range of graphs.
* The memory management code for vectors and sets was improved to work
better under malloc implementations that do not cope well with high
memory churn and large shrinking realloc requests.
* mcx query acquired --edges, --edges-sorted, and -edges-hist options.
They respectively output an unsorted list, sorted list, and histogram of
edge weights. Despite 'edges' being part of the names of these options,
the output contains a weight for each arc. For undirected graphs each
edge will be represented twice.
* mcx query acquired --output-table option. With -vary-threshold or
--vary-correlation the output will be a tab-separated table with row
names and without key.
* Got rid of the version tag (i.e. 1.008). It never acquired any meaning.
* Documentation was fixed and updated.
Fri, 28 May 2010
* mcl-10-148 released.
* mcl has become faster. It chooses between different matrix/vector multiplication
algorithms depending on the sparsity level of the vector.
* mcx erdos now works for directed graphs, requiring the option --is-directed.
* clm adjust has a new option --force-connected. If
the input clustering does not induce connected subgraphs a subclustering
is output that does have that property.
* mcx clcf has been parallelized and now accepts -t <num> option.
* All -knn and -ceil-nb command line options are gone. The functionality
is still available in a more general fashion as new modes to the -tf
transformation option. As an example, '-knn 40' is now specified as -tf
'#knn(40)'. This is more general since the k-NN transformation
can be one among a list of transformations, where the user is
free to choose and order.
* All -tf style options allow new modes:
    #max()           make a graph symmetric using max
    #min()           make a graph symmetric using min
    #add()           make a graph symmetric using add
    #arcmax()        reduce two arcs to one arc using max
    #arcsub()        compute G[->] - G[<-]
    #arcmcl(<num>)   cluster a directed graph with inflation <num>
    #tug()           perturb edge weights to break ties (uses neighbourhood information)
    #shrug()         perturb edge weights to break ties (randomly)
    #mcl(<num>)      cluster an undirected graph with inflation <num>
    #knn(<num>)      reduce graph using k-Nearest-Neighbour selection
    #ceilnb(<num>)   reduce graph using ceil-nb selection
    #tp()            replace graph/matrix by inverse relationship
    #step(<num>)     replace graph by <num>-step relation
    #thread()        set thread count for parallelizable transformations (e.g. #knn)
Modes that start with the octothorpe (#) operate on entire graphs/matrices.
Other modes, e.g. ceil(), lt(), add(), operate on edges.
* A bug in clm enstrict was removed.
* clm dist has a new mode --index, in which it outputs the Rand index,
the adjusted (Hubert Arabie) Rand index, and the Jaccard index.
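The three indices are all derived from counts of node pairs. The sketch
below uses the textbook pair-counting definitions; it is not the clm dist
implementation, and the function name and input layout are assumptions.

```python
from collections import Counter
from math import comb


def pair_indices(labels_a, labels_b):
    """labels_a, labels_b: cluster labels of the same nodes, in the same order.
    Returns (rand, adjusted_rand, jaccard). Degenerate inputs (for example
    two clusterings consisting only of singletons) divide by zero here."""
    n = len(labels_a)
    total = comb(n, 2)                                   # all node pairs
    both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    in_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    in_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    # Agreements: pairs joined in both, plus pairs separated in both.
    rand = (total + 2 * both - in_a - in_b) / total
    expected = in_a * in_b / total
    max_index = (in_a + in_b) / 2
    adjusted = (both - expected) / (max_index - expected)
    jaccard = both / (in_a + in_b - both)
    return rand, adjusted, jaccard
```

Identical clusterings score 1 on all three indices. For the labels
[0, 0, 1, 1] against [0, 1, 0, 1] the Rand index is 1/3, the adjusted Rand
index is -0.5, and the Jaccard index is 0.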
* Esoteric options and logging code were removed from mcl, to improve
readability and maintainability of its source.
* mcx clcf, mcx diameter, and mcx ctty all output column headers.
mcx collect by default expects column headers.
mcx --paste concatenates rows from different tables, requiring that the
first column is identical.
* mcx query --node-attr outputs a table of network node attributes.
* mcx diameter, mcx ctty, mcx clcf, and mcxarray all accept the same parallelization
interface. Jobs can be split over multiple threads as well as over
multiple machines. The latter is done by the concept of 'thread group'.
- Use -t to specify the number of threads.
mcxarray -t 4
will run with 4 threads.
- Use -J to specify the number of groups/machines to use,
use -j to specify the group index.
It is important that all jobs use the same -t value, as all jobs
assume all other jobs use the same number while figuring out which
tasks they should run. The collection of tasks will only be
consistent if the jobs work from the same number of threads and
the same number of groups.
In the future different jobs will be able to run different numbers of
threads by having multiple group IDs.
mcxarray:
machine 1: mcxarray -t 3 -J 4 -j 0 -o d0.cor
machine 2: mcxarray -t 3 -J 4 -j 1 -o d1.cor
machine 3: mcxarray -t 3 -J 4 -j 2 -o d2.cor
machine 4: mcxarray -t 3 -J 4 -j 3 -o d3.cor
mcx collect -o result.cor d0.cor d1.cor d2.cor d3.cor
This last command combines the partial results and writes
to the file called result.cor .
mcx diameter/ctty:
machine 1: mcx diameter -t 8 -J 4 -j 0 -o d0.diam
machine 2: mcx diameter -t 8 -J 4 -j 1 -o d1.diam
machine 3: mcx diameter -t 8 -J 4 -j 2 -o d2.diam
machine 4: mcx diameter -t 8 -J 4 -j 3 -o d3.diam
mcx collect --two-column -o result.diameter d0.diam d1.diam d2.diam d3.diam
- The previous -start and -end options to mcx ctty and mcx diameter
have been removed.
- By default mcx collect expects matrix arguments. The two-column output
generated by mcx diameter/ctty should be specified using the
mcx collect --two-column option.
* mcxarray has seen many changes and improvements.
- It can use multiple cores. It uses pthreads and accepts the option -t
<num> to specify that <num> threads/cores should be used.
- It only accepts tab characters as separators. Spaces
no longer work.
- Parse errors are pinpointed precisely within the input file.
- It can handle missing data. Missing data is introduced
either by 'NA' or 'NaN' or 'inf' values in the tabular data, or by an
empty column. When computing correlations, rows are only compared on
those positions where neither of them has missing data.
- It has a new option --zero-as-na. With this, zeroes are treated
as NA (not available/applicable), and during the calculation of
correlations vectors are only considered on positions where neither is
NA. This works for modes --pearson (default) and --spearman. This
mode has very specialized uses. One example is when the input is
constructed using mcxload and read in mcxarray with -imx. In this
case missing data cannot be specified as 'NA' or empty columns, so
other means are necessary.
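Restricting a correlation to positions where neither row has missing data,
optionally treating zeroes as missing, can be sketched as follows. The
function names and the use of None, NaN and inf as missing markers in
Python are assumptions for illustration; this is not mcxarray's code.

```python
import math


def is_missing(v, zero_as_na=False):
    """None, NaN and +/-inf count as missing; zero too if zero_as_na is set."""
    if v is None or math.isnan(v) or math.isinf(v):
        return True
    return zero_as_na and v == 0


def pearson_pairwise(x, y, zero_as_na=False):
    """Pearson correlation over positions where neither row has missing data."""
    pairs = [(a, b) for a, b in zip(x, y)
             if not is_missing(a, zero_as_na) and not is_missing(b, zero_as_na)]
    n = len(pairs)              # N is the number of retained positions
    if n < 2:
        return float("nan")
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan")     # a constant row has no defined correlation
    return sxy / math.sqrt(sxx * syy)
```

Note that N is taken after filtering, so each pair of rows may be compared
on a different number of positions.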
* clm order should write an output tree in its default mode,
but it did not. Now fixed.
* mcxdump --dump-table would err for graphs with sparse domains. Now fixed.
* mcx query has mode -vary-knn, to analyse different levels of
knn-selection (k-Nearest-Neighbours).
* Code clean-up: the taurus library for integer set manipulation
was finally discarded.
* A bug in the mcxi max and min operators was fixed, and both max and min
can now operate on matrices.
Wed, 04 Nov 2009
* mcl-09-308 released.
* The mcl cluster interpretation function did not deal correctly with
graphs that were encoded in gappy representations. Fixed. This would not
have been an issue for normal mcl usage, as mcl itself never constructs
such graphs (e.g. from label input).
* The 'q' mode to mcx (invoked as 'mcx q') has been renamed as 'mcx query'.
The new mode --vary-correlation triggers analysis of a correlation graph
at a series of thresholds, e.g. the number of connected components,
statistics (median, average, iqr) on node degrees and edge weights,
and a graph plotting the log(k) / log(#nodes of degree >=k) R^2 value
(high for scale-free-ish networks).
* mcl has had many not-so-interesting options removed or hidden.
* mcl has a new option, -knn-mutual <num>.
This considers the <num> best neighbours for each node, then only keeps
edges that occur in both best-neighbour lists for the two incident nodes.
* clm has a new mode 'stable', which outputs a (possibly overlapping) clustering
derived from a set of input clusterings. Each output cluster has a
stability value associated with it. If it is high, it means that
the output cluster occurred in some form in many of the input clusterings.
Such a matrix can be dumped with inclusion of the stability score
using the mcxdump option --dump-vlines.
* clm dist has new options --chain and --sort. The first causes only
consecutive comparisons to be made, the second sorts the clusterings
in order of (descending) granularity.
* If automatic naming of output files is employed (by not using the -o
option), mcl will only use the trailing part of the input file name.
Output will accordingly be written in the current directory, rather than
the directory in which the input file resides, as was previously the
case. The latter behaviour can be obtained by using the new unary --d
option.
* clm order and mcxarray have received polishing and upgrades.
- mcxarray now preserves negative correlations and
makes them available for later transformation (i.e. either
discarding or absolute value replacement).
- clm order finally outputs sensible output, namely a clustering
ordered in a manner consistent with the received set of nesting
clusterings, with largest clusters first, recursively,
starting from the coarsest clustering. It is possible to
halt this procedure at any level in the hierarchy using
the new -level option.
- clm order used to accept just a cluster hierarchy (such as produced
by mclcm or by simply concatenating cluster files). It now accepts
multiple separate clusterings as well, and one can even mix hierarchies
and clusterings all at the same time.
* The mcxplotlines.R script shipped with mcl has been improved and can
display a coarse experiment ordering (derived from clustering experiments
rather than probes) such as provided by 'clm order' on the expression
plots. The script should be regarded as a template for further
customization, although it does accept a number of parameters.
The expression plots now plot expression for all probes in a given
cluster, as well as the median value of expression of all probes
across a given experiment.
It is possible to let mcxplotlines.R handle a secondary coarser
clustering encoded by 'clm order'. The script will group together
information for the clusters within each supercluster, and plot
the medians of expression for all the subclusters as well as the
median of expression across the supercluster.
* mcx alter and mcxarray have both acquired the -tf option common in
several other programs. Use this option to transform values; e.g.
mcxarray -data t.expr -co 0.7 -tf 'abs(),add(-0.7)' -o t.mci
takes expression data, computes the pearson correlation coefficient,
takes values >= 0.7 and <= -0.7, takes the absolute value,
and maps the interval [0.7-1.0] to [0-0.3].
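The effect of the cutoff followed by the 'abs(),add(-0.7)' transformation
on a single correlation value can be written out as a small Python
function. The function name and the use of None for discarded values are
assumptions for illustration.

```python
def cutoff_abs_shift(corr, cutoff=0.7):
    """Keep values with |corr| >= cutoff, take the absolute value, then
    subtract the cutoff. Returns None for values the cutoff discards."""
    if abs(corr) < cutoff:
        return None
    return abs(corr) - cutoff
```

A correlation of -0.9 is therefore kept and mapped to about 0.2, while a
correlation of 0.5 is dropped.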
* The mcl options --adapt-smooth and --adapt-local have been turned
to no-ops. The implementation of these experimental options was not
sufficiently supported and not sufficiently elegant.
* Fixed a bug in mcxload, where mcxload -etc-ai would consistently crash.
Fri, 18 Sep 2009
* mcl-09-261 released.
* And a bug it did have (cf entry Wed 12 Jul 2009). When a resulting
clustering contains overlap, mcl tries to split off the overlapping part.
However, since the last release it has been doing this in a botched way,
causing erratic results. Bug reported by Tao Yue, now fixed.
* The mcxarray --transpose option now transposes the input data matrix,
rather than the result matrix, and -write-tab finds and writes the
correct labels in the presence of --transpose. Bugs were reported by
Jose Afonso Guerra Assuncao. The mcxarray --teartp option has been
removed.
* Added -resource <num> flag to mcl. Throughout the process, each
node will only keep track of at most <num> neighbours.
Use -ceil-nb <num> if you want to reduce the input graph in
the same manner.
Wed, 12 Jul 2009
* mcl-09-182 released.
* Lint modes have been removed from mcl. Linting can now be achieved
using 'clm adjust'.
* Analysis modes (and lint modes, now removed) would crash when combined
with --abc. Bug reported by David MacIver, now fixed.
* The interpretation routines were rewritten to be more compact and at a
somewhat higher level of expressiveness, and should accordingly be
more understandable, maintainable, and extensible. They might have new
bugs too.
* mcl and 'mcx alter' and 'mcx erdos' have acquired the option -ceil-nb
to remove edges of lowest weight from highly connected nodes.
* Volatility measures reported by 'clm dist' were wrong. Fixed.
* Changed clm dist output to key=value format.
* 'mcx erdos' can now read label input with the -abc <fname> option.
* Added -ceil-nb <num> (cap neighbours) option to remove edges with
lowest weight from nodes with more than <num> edges. Consider it a poor
man's hub removal. Edges are removed in both directions, starting with
nodes that have the most neighbours and going down the list. This option
should help in obtaining more balanced clusterings. It reduces the
impact of sticky (having many neighbours) nodes, which generally have the
effect of pulling in many nodes, contributing to large clusters.
Breaking up those clusters otherwise requires increased inflation, which
increases cluster granularity throughout the entire graph. The -ceil-nb
option encodes a localized approach that should take the stick out of
sticky nodes.
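The neighbour cap described above can be sketched in Python. The sketch
follows the description (lowest weights first, both directions, busiest
nodes first) but is not the mcl implementation; the data layout, the fixed
visiting order and the tie-breaking by node identifier are assumptions.

```python
def ceil_nb(adj, cap):
    """adj: dict node -> dict neighbour -> weight (symmetric, edited in place).
    Visit nodes from highest to lowest initial degree; while a node has more
    than `cap` neighbours, drop its lightest edge in both directions."""
    order = sorted(adj, key=lambda u: (-len(adj[u]), u))
    for u in order:
        while len(adj[u]) > cap:
            v = min(adj[u], key=lambda x: (adj[u][x], x))
            del adj[u][v]
            del adj[v][u]
    return adj
```

Because removal is symmetric, trimming a hub also lowers the degree of its
weakest neighbours, which may leave some of them without any edges.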
* The output format of 'mcx erdos' was streamlined to some extent.
It is now in a pseudo s-expression syntax. mcx erdos can also read label
input with the customary -abc option. In interactive mode it is possible
to transform a graph in various ways, and additionally, to reread the
graph from file.
* Option -pp <num> (simple pre-pruning mode) has been removed seeing
that -ceil-nb should do a better job.
* mcx has a new mode, 'alter'. It currently supports -ceil-nb as described
above.
Fri, 07 Nov 2008
* mcl-08-312 released (fixes bug in mcxdeblast).
* Fixed a bug in mcxdeblast, reported by Zhenxiang Xi.
* clm and mcx have acquired a help mode, for example
clm help info
will invoke the manual page for the info mode. It is fully equivalent to
'man clminfo'. All the modes that have a manual page are listed if mcx or
clm is invoked without arguments. This does require that the manual
pages are installed either in a directory listed in MANPATH or in
a standard location known to the 'man' program.
* Both mcx ctty (betweenness centrality) and mcx diameter can be run in
multiple threads with the -t option. In addition, the computation can
be split among different machines (each machine optionally running
multiple threads). The correct result is obtained by adding the partial
results of all the distributed runs, using 'mcx collect' (for diameter
subsequently, the maximum has to be taken over the resulting values).
This implies that mcx ctty and mcx diameter can now be sped up
arbitrarily by increasing the computation resources. Example:
HOST1: mcx ctty -imx graph.mci -t 4 -start 0 -end 1000 > graph.ctty1
HOST2: mcx ctty -imx graph.mci -t 4 -start 1000 -end 2000 > graph.ctty2
HOST3: mcx ctty -imx graph.mci -t 4 -start 2000 -end 3000 > graph.ctty3
HOST4: mcx ctty -imx graph.mci -t 4 -start 3000 -end 4000 > graph.ctty4
mcx collect graph.ctty1 graph.ctty2 graph.ctty3 graph.ctty4 > graph.CTTY
* clm close has new modes of output: The number of components,
and the list of component sizes.
* clm close accepts label type input with the -abc option
similar to mcx diameter, mcx ctty, mcx clcf and others.
* Added reference to Ulrik Brandes' paper on centrality betweenness
update algorithm.
* Fixed a bug in mcxdump that caused --dump-upper, --dump-upperi, --dump-lower,
and --dump-loweri to be ignored.
* A small R script called mcxplotlines.R was added to the scripts
directory. Use it to visualize per-cluster expression profiles
for clusterings of networks derived from expression data.
* mcxdump in newick mode has a modality to output singleton
labels without enclosing parentheses; -newick S.
* The layer responsible for handling label input (including
the format where each line consists of LABEL1 LABEL2 WEIGHT) was
rewritten. It is now in a more maintainable state, although
work still needs to be done.
Thu, 05 Jun 2008
* mcl-08-057 released.
- mcxarray reads in gene expression data in table format and
converts it to an mcl input graph.
- mcl now uses a simplified way of adding loops to the input graph. The
loop edge weight for a node is now set to the maximum of the weights
of edges connecting the node to its neighbours. This may cause small
changes in clustering results. These changes should generally be of
the same (small) magnitude as changes resulting from perturbing the
input data (edge weights).
- Added a program to compute, in various modes, for each node its
clustering coefficient, its eccentricity, and its betweenness
centrality. Also, to compute the diameter of a graph (i.e. the
maximum eccentricity): mcx clcf, mcx diameter, and mcx ctty.
- The number of applications has decreased substantially.
See below.
- The mcl suite is moving towards a wider focus on general purpose
large scale graph utilities, with the emphasis so far on basic
measures and transformations.
* mcx diameter          compute diameter
* mcx ctty              compute centrality
* mcx clcf              compute clustering coefficient
* mcx erdos             compute shortest paths
* mcxrand               randomly shuffle, add, create, perturb edges
* mclcm                 hierarchical clustering with mcl
* clm dist              compute cluster distance
* clm meet              compute maximal joint subclustering
* clm close             compute (subgraph) connected components
* ... and more.
* mcxi (formerly mcx)   basic matrix operations
The binaries installed are
mcl mcxarray mcxrand clmformat
mclcm mcxassemble mcxsubs
clm mcxdump
mcx mcxload
mcxi mcxmap
The scripts installed are mclpipeline, along with mcxdeblast and
mclblastline if configure was instructed with --enable-blast.
Currently all programs in the mcl suite use one of the three
prefixes "mcl", "mcx", or "clm".
- first basic support for tree structures in the library.
- new --shadow-vl mcl preprocessing option.
- new mcl logging framework.
- speed ups in many applications.
- binary format can now be streamed over STDOUT/STDIN.
* Added mclcm, which implements hierarchical clustering with mcl.
It supports several modes:
contraction   progress from fine to coarse
subcluster    progress from coarse to fine
dispatch      compute and combine different
* The mcl option --shadow-vl aids in creating well-balanced hierarchies
by adding dummy (shadow) nodes to a graph, which throttles flow between
denser and sparser parts. This prevents rapid absorption of sparse parts
by dense parts.
Possibly useful in standalone mode as well.
* New experimental mcl options --adapt-local and --adapt-smooth. They
adapt inflation according to local density characteristics of the input
graph.
* The number of applications has decreased substantially.
Most of the clm**** applications are now dispatched by the new
program clm, and most of the mcx**** applications are now dispatched
by the program mcx:
clm MODE clm encapsulates
dist order vol mate meet imac info close residue
mcx MODE mcx encapsulates
convert clcf diameter
Use
clm dist [options] <files>
clm order [options] <files>
mcx convert [options] <files>
et cetera.
The functionality formerly in mcx is now offered by mcxi.
* Interchange format now uses scientific notation except within a
limited range around zero (by using the fprintf %g conversion
specifier). This makes interchange format less lossy.
* binary format can now be streamed over STDOUT/STDIN, implying
very fast and lossless communication between mcl programs. Binary
format is lossless compared to interchange format in that the text
representations used by the latter are currently not guaranteed to
result in the exact same value when read back.
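The difference between a text round trip and a binary round trip can be seen in a small sketch. It assumes the default precision of the %g conversion and a 64-bit double; the precision and value width actually used by the mcl programs may differ.

```python
import struct

value = 0.1234567891

# Text round trip through %g (default precision: 6 significant digits).
text = "%g" % value
from_text = float(text)

# Binary round trip: the exact bit pattern is written and read back.
packed = struct.pack("<d", value)
from_binary = struct.unpack("<d", packed)[0]

print(text, from_text == value, from_binary == value)
# Outside a limited range around zero, %g switches to scientific notation.
print("%g" % 1e-10, "%g" % 12345678.0)
```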
* Lots of optimization work on graph and set related operations.
Many operations have been sped up for canonical matrices. These speed
ups do not affect mcl itself. Sped up: clmclose.
* mcl verbosity output is now largely controlled by a new logging
framework. Use the -q option or set environment variable TINGEA_LOG_TAG.
Use -q x -V all to thoroughly silence mcl.
* mcl emits more graph and cluster-related quantities in its
progress/log output.
* mcxarray:
It can now read flat-file array files with the -data option.
+ Skipping leading rows and leading columns is supported
(-skipr/-skipc). Missing data is not yet supported.
+ Labels can be written to a tab file
* Renamed -cache-graph, -cache-graphx, -cache-tab to -write-graph,
-write-graphx, -write-tab. This is to avoid terminological confusion
with the process-level caching sometimes employed by mcl within a single
run to accommodate postprocessing.
Similarly the mcxload -cache-xxx options were all renamed to -write-xxx.
* mcl is accessible as a C library call. It is largely undocumented,
and an interface to build up a matrix is lacking. There are not yet
convenient installation tools.
* mcxdump by default reads from STDIN and -imx is no longer
required.
* Fixed bug in mcxassemble where it would crash when presented
with a corrupted format.
* Added -lint-k and -lint-l options. Either will reread the input matrix and
do postprocessing on the clustering, reallocating nodes that seem to
have siphoned the wrong way.
When applied to networks with inhomogeneously distributed edge density
characteristics the mcl process will sometimes cause smaller
clusters/sparse areas to suck in border nodes which 1) have only few
edges to that cluster/area and 2) seem to have been sucked out of a much
denser cluster into which they would fit beautifully. This is fully in
line with the flow characteristics of mcl but a largely unwanted
phenomenon. The postprocessing steps were added to remedy this.
-lint-l <num> considers all nodes in clusters of size not exceeding
<num> and optionally moves them to a larger cluster. Each
node is considered separately.
-lint-k <num> will try to have small clusters (up to a given size k)
assimilated in their entirety by a larger cluster if a suitable suitor
can be found.
* Fixed bug in mcxload -etc -etc-ai functionality. Singletons
would cause mayhem.
* The code underlying the analysis framework was largely reimplemented
and reorganized.
* --keep-overlap=y/n was removed and replaced by -overlap
<keep|remove|split>, remove being the default as before.
The split mode is new and causes all maximally consistent
overlapping fragments to be put in new clusters. This mode
is used in mclcm to cover theoretical fringe cases.
* If label data is tab-separated, labels may contain spaces.
The code switches to tab-separated values if it finds a tab
in the input.
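A minimal sketch of the switching behaviour described above. The function name split_label_line is hypothetical, and the sketch assumes the decision is made per line, which the entry does not state.

```python
def split_label_line(line):
    """Split one label line into fields.

    If the line contains a tab, fields are tab-separated and labels may
    contain spaces; otherwise fields are split on any white space.
    """
    line = line.rstrip("\n")
    if "\t" in line:
        return line.split("\t")
    return line.split()


print(split_label_line("gene A\tgene B\t0.5\n"))
print(split_label_line("geneA geneB 0.5\n"))
```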
* Fixed bug in label loading where transformed values set to zero
were kept.
* Changed default output format in clminfo.
* GRATUITOUS. Bumped the gratuitous version tag to 1.007.
Mon, 27 Feb 2006
* mcl-06-058 released.
* Added scripts/minimcl, a 200-line fully functional mcl
implementation in perl. It only accepts label input and
has no parameters except inflation. The implementation is
hash-based rather than array-based, which may or may not leverage
sparseness properties.
Sat, 21 Jan 2006
* mcl-06-021 released.
* This release flushes some work before embarking on a big
mcxsubs overwrite. Analysis and cache modes have been improved.
* mcxsubs accepts path(<index-list>) top-level spec. It sets
the domain to all nodes participating in all shortest paths
between all members of (the comma-separated) <index-list>.
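One way to compute such a domain is sketched below; it is not taken from the mcxsubs source. It assumes an unweighted, undirected graph and uses the fact that a node v lies on some shortest path between s and t exactly when d(s,v) + d(v,t) = d(s,t). The names bfs and path_domain are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs(adj, source):
    """Shortest-path distances (in edges) from source to reachable nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def path_domain(adj, index_list):
    """All nodes on some shortest path between two members of index_list."""
    dist = {s: bfs(adj, s) for s in index_list}
    domain = set()
    for s, t in combinations(index_list, 2):
        if t not in dist[s]:
            continue  # s and t are not connected
        for v in adj:
            if v in dist[s] and v in dist[t]:
                if dist[s][v] + dist[t][v] == dist[s][t]:
                    domain.add(v)
    return domain


# Chain 0 - 1 - 2 - 3 with a dangling node 4 attached to node 1.
adj = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [1]}
print(sorted(path_domain(adj, [0, 3])))
```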
* mcxsubs now works by default on the nil matrix. This makes it
easy to create domain templates with mcxsubs, e.g.
mcxsubs 'dom(cr, i(2,3,4))'
creates an empty matrix on the specified domains.
* mcxsubs did not recognize that --from-disk and 'ext(disc())'
specifications cannot be combined and would dereference a NULL pointer. Fixed.
* Fixed weed-related bug in mcxsubs (removing rows/columns).
* Cleaned up the postprocess/analysis/caching frameworks.
Exceptions, limitations, and user-second-guessing were removed. By
default mcl does not append the log (it used to do this *sometimes*).
* Analysis modes try to read a cached graph if it exists.
* Added -cache-graphx <fname>. This caches a graph after transformations
have been applied.
* Caching of the input graphs is now done before matrix transformations
have been applied, but necessarily after stream transformations (if any)
have been applied in case input is streamed.
* Added -etc option to mcxload to load simple graphs from label-data
in a line-based format.
Use -etc-ai to load matrices for which the column labels are not
specified (e.g. clusterings). mcxload will autoincrement the columns.
* Fixed some documentation errors; inserted -abc-tf, --abc-log and
--abc-neg-log where -stream-tf, --stream-log and --stream-neg-log were
erroneously used.
* Added rand(<pbb-keep>) transformation (the -tf and -abc-tf options).
Selects each matrix entry with probability <pbb-keep>.
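The effect of rand(<pbb-keep>) can be sketched as independent coin flips per entry. The sparse dict layout, the function name rand_keep, and the optional rng argument are assumptions made for illustration.

```python
import random


def rand_keep(matrix, pbb_keep, rng=None):
    """Keep each matrix entry independently with probability pbb_keep.

    matrix maps (row, column) -> value. rng is an optional
    random.Random instance, useful for reproducible runs.
    """
    if rng is None:
        rng = random.Random()
    # rng.random() is in [0, 1), so pbb_keep=1 keeps every entry
    # and pbb_keep=0 keeps none.
    return {key: val for key, val in matrix.items()
            if rng.random() < pbb_keep}


matrix = {(0, 1): 0.5, (1, 0): 0.5, (1, 2): 2.0, (2, 1): 2.0}
sample = rand_keep(matrix, 0.5, random.Random(42))
print(len(sample), "of", len(matrix), "entries kept")
```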
* mcl now prints a helpful reminder to cite the appropriate reference.
* Added configure-time check for
void* val = (void*) unsigned_number
idiom used by the stream interface.
Thu, 17 Nov 2005
* mcl-05-321 released.
* Focus: uniform transformation syntax across programs, improved
documentation, especially mcxio. Previous 'ascii' format is now called
interchange format throughout the documentation.
* mcl accepts the -abc-tf option to transform the input stream
and the -tf option to transform the input matrix (either constructed
from a stream with --abc or directly read from a stream).
-abc-tf 'pow(2), ceil(200), gt(20)'
This squares everything in the input stream, then truncates everything
larger than 200 to 200, and removes anything less than or equal to 20.
There are two special transform cases that appear as separate options.
--abc-log
--abc-neg-log
indicate that, as the first thing to do, the log or negative logarithm
should be taken. The reason is that probability scores can get quite low
and are best represented as doubles (64 bit values); however mcl's
internal floating point representation is by default float (32 bit
values).
This means that blast clustering can be done from columnar format
like this:
grep -v '^#' hsfsp.cblast |\
cut -f 1,2,11 |\
mcl - --abc --abc-neg-log -tf 'ceil(460), gt(10)' -o -
This will make a few people very happy, and bewilder the rest.
For sake of completeness, ceil(460) because 1e-200 (standard
BLAST p-value cut-off) corresponds to 1/e/-460.517019 where
/e/ is the REAL e, namely 2.718281828.
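The semantics of the transformations described above can be mimicked in a few lines. This is a sketch under the reading given in this entry (pow raises to a power, ceil caps values from above, gt keeps only strictly greater values, and the negative log is the natural one); apply_tf and its list-of-pairs spec are hypothetical and not the mcl parser.

```python
import math


def apply_tf(values, steps):
    """Apply (name, argument) steps in order; return surviving values."""
    out = []
    for v in values:
        keep = True
        for name, arg in steps:
            if name == "pow":
                v = v ** arg
            elif name == "ceil":
                v = min(v, arg)      # cap values larger than arg
            elif name == "gt":
                if not v > arg:      # drop values <= arg
                    keep = False
                    break
            else:
                raise ValueError("unknown step: %s" % name)
        if keep:
            out.append(v)
    return out


# 'pow(2), ceil(200), gt(20)': 3 -> 9 (removed), 5 -> 25, 20 -> 400 -> 200
squared = apply_tf([3, 5, 20], [("pow", 2), ("ceil", 200), ("gt", 20)])
print(squared)

# Negative natural log first, then 'ceil(460), gt(10)'.
pvalues = [1e-300, 1e-5, 1e-3]
scores = apply_tf([-math.log(p) for p in pvalues],
                  [("ceil", 460), ("gt", 10)])
print(scores)
print(-math.log(1e-200))  # roughly 460.517
```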
* All of
- mcxsubs 'val(<spec>)'
- mcl -tf <spec>, -abc-tf <spec>
- mcxload -tf <spec>, -stream-tf <spec>
- mcxassemble -raw-tf <spec> -prm-tf <spec> -sym-tf <spec>
now accept the same syntax, documented in mcxio(5).
* The mcxio manual has gained two sections, one on transformation
syntax, one on label input.
* mcl -cache-graph saves the graph after any transformations have
been applied to it.
* Throughout the documentation, environment variables, and
logging statements, replaced 'ascii' by 'interchange'.
MCLXIOASCIIDIGITS is now MCLXIOINTERCHANGEDIGITS. bliss.
There is still plenty-o-ascii in the ChangeLog below.
Fri, 11 Nov 2005
* mcl-05-314 released - major new features.
* GRATUITOUS. Bumped the gratuitous version tag to 1.006 - because of
mcl's new label input munging abilities.
* mcl can read label input.
mcl <fname> --abc [options]
will read a line based white-space separated label format:
label1 label2 [value]
The current default is to resolve repeated entries by taking the maximum
of the values.
--abc or --expect-abc
input is expected to be in label format.
--abc or --yield-abc
cluster output will be done with labels.
-cache-tab <fname> (assumes label input)
the name of the file mcl writes the tab file to.
-cache-graph <fname> (assumes label input)
the name of the file mcl writes the input matrix to.
-strict-tab <fname> (assumes label input)
makes MCL use the named tab file and die if labels
are not found.
-restrict-tab <fname> (assumes label input)
makes MCL use the named tab file and warn if labels
are not found.
-extend-tab <fname> (assumes label input)
makes MCL use the named tab file and extend it if labels
are not found.
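Reading the label format and resolving repeated entries by taking the maximum can be sketched as follows. The function name read_abc is hypothetical. Two further assumptions: a missing value defaults to 1.0, and only identical ordered label pairs count as repeats.

```python
def read_abc(lines):
    """Parse 'label1 label2 [value]' lines into {(label1, label2): value}.

    Repeated entries are resolved by keeping the maximum value.
    """
    edges = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        key = (fields[0], fields[1])
        # Assumed default weight when the optional value is absent.
        value = float(fields[2]) if len(fields) > 2 else 1.0
        edges[key] = max(value, edges.get(key, value))
    return edges


edges = read_abc(["x y 0.3", "x y 0.7", "y z", ""])
print(edges)
```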
* new utility mcxload with many custom options for reading in
label data and transforming the associated numerical values,
storing mappings in tab files and saving a graph in native mcl
input format.
* mcxdeblast acquired --abc-out to stream label input into mcl.