TLD: Track Learn Detect
=======================

What's TLD?
-----------

This algorithm, also known as the "Predator" algorithm, was developed by Zdenek Kalal. For
more information, please visit his homepage: http://info.ee.surrey.ac.uk/Personal/Z.Kalal/tld.html

How does it work?
-----------------

This is a long story; please read Zdenek's paper. If you compiled ccv with FFMPEG
support, here is how to run it from the command line, where `x y width height` is
the initial bounding box of the object to track:

    ./tld <Your Video> x y width height

It will output the tracking coordinates for each frame.

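A hedged sketch of validating the four box arguments before handing them to a tracker is shown below. `tld_box_t`, `parse_box_args` and `parse_nonneg_int` are hypothetical names for illustration only, not part of ccv. Reading `x y` as the top-left corner of the box is also an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical box type for this sketch; ccv's own rectangle type may differ. */
typedef struct {
	int x, y;          /* presumably the top-left corner */
	int width, height;
} tld_box_t;

/* Parse one non-negative decimal integer; returns 0 on success, -1 on error. */
static int parse_nonneg_int(const char* s, int* out)
{
	char* end = NULL;
	errno = 0;
	long v = strtol(s, &end, 10);
	if (errno != 0 || end == s || *end != '\0' || v < 0 || v > INT_MAX)
		return -1;
	*out = (int)v;
	return 0;
}

/* Expects argv laid out as: ./tld <video> x y width height.
 * Fills *box and returns 0 on success, -1 on malformed input. */
int parse_box_args(int argc, char** argv, tld_box_t* box)
{
	assert(box != NULL);
	if (argc != 6)
		return -1;
	if (parse_nonneg_int(argv[2], &box->x) || parse_nonneg_int(argv[3], &box->y) ||
		parse_nonneg_int(argv[4], &box->width) || parse_nonneg_int(argv[5], &box->height))
		return -1;
	/* An empty box has nothing to track. */
	if (box->width == 0 || box->height == 0)
		return -1;
	return 0;
}
```

For example, the arguments `motocross.mpg 10 20 30 40` would yield a 30x40 box at (10, 20). A non-numeric or zero-sized argument is rejected before any video is decoded.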
What about performance?
-----------------------

TLD is implemented closely following Zdenek's paper, but it still differs from the
original in quite a few significant aspects. I've done extensive tests to make sure
its performance, in terms of both accuracy and speed, matches the original implementation.

Accuracy-wise:

TLD is a randomized algorithm, so results can vary from run to run. I ran ccv's TLD
implementation on the test videos with "rotation == 0" and the default parameters.
By doing 3 runs per video and picking the median, I was able to generate some
meaningful data to analyze.

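Each video was run 3 times and the median kept. Below is a minimal C sketch of picking the median of an odd number of runs. The text does not say which per-run score the median was taken over (the f-measure is one plausible choice), and `median_of_runs` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for doubles, ascending. */
static int cmp_double(const void* a, const void* b)
{
	double x = *(const double*)a, y = *(const double*)b;
	return (x > y) - (x < y);
}

/* Median of n per-run scores (n odd, e.g. 3 runs).
 * Sorts a caller-provided scratch copy so the input order is preserved. */
double median_of_runs(const double* runs, double* scratch, size_t n)
{
	assert(n > 0 && n % 2 == 1);
	for (size_t i = 0; i < n; i++)
		scratch[i] = runs[i];
	qsort(scratch, n, sizeof(double), cmp_double);
	return scratch[n / 2];
}
```

Taking the median rather than the mean keeps a single unusually good or bad run from skewing the reported number.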
On motocross.mpg:

    detections         : 774
    true detections    : 1412
    correct detections : 710
    precision          : 0.917313
    recall             : 0.502833
    f-measure          : 0.649588

The result on the same video reported in Zdenek Kalal, Jiri Matas and Krystian Mikolajczyk, "Online Learning of Robust Object Detectors during Unstable Tracking":

    precision : 0.96
    recall    : 0.54

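The motocross.mpg figures are consistent with the standard definitions: precision is correct detections over detections (710 / 774), recall is correct detections over true detections (710 / 1412), and the f-measure is their harmonic mean. A small C sketch of that arithmetic follows. `tld_scores_t` and `tld_scores` are hypothetical names, and this is not the evaluate_vis.py code itself.

```c
#include <assert.h>

typedef struct {
	double precision; /* correct / detections */
	double recall;    /* correct / true detections */
	double f_measure; /* harmonic mean of precision and recall */
} tld_scores_t;

tld_scores_t tld_scores(int detections, int true_detections, int correct)
{
	assert(detections > 0 && true_detections > 0 && correct >= 0);
	tld_scores_t s;
	s.precision = (double)correct / detections;
	s.recall = (double)correct / true_detections;
	s.f_measure = (s.precision + s.recall > 0)
		? 2 * s.precision * s.recall / (s.precision + s.recall)
		: 0;
	return s;
}
```

`tld_scores(774, 1412, 710)` reproduces the table above (precision about 0.917313, recall about 0.502833, f-measure about 0.649588). Applying the same harmonic mean to the paper's rounded precision and recall (0.96, 0.54) gives about 0.69.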
On pedestrian3.mpg:

    Failed to recover after the 69th frame (out of 140 frames)

The result on the same video reported in Zdenek Kalal, Jiri Matas and Krystian Mikolajczyk, "P-N Learning: Bootstrapping Binary Classifiers by Structural Constraints":

    Failed to recover after the 27th frame (out of 140 frames)

Note that in a few runs I got results that outperformed Zdenek's implementation, but I
chose to ignore those.

All these results were obtained with alantrrs' evaluate_vis.py script in https://github.com/alantrrs/OpenTLD/blob/master/datasets/evaluate_vis.py and the dataset in
that repository. Thanks alantrrs!

Speed-wise:

By enabling the "rotation" technique, you can achieve near real-time performance on
QVGA video with minor accuracy loss. With "rotation == 1" (the default parameter), TLD
spends around 15ms on tracking, 50ms on detection and 50ms on learning for 320x240
video on a single thread of an i7-2620M at 2.7GHz.

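As a rough worked estimate, assume the three stages run one after another on every frame. The text does not say whether learning happens on every frame, so this is an upper bound on the per-frame cost. Under that assumption, the timings above add up as follows:

```latex
t_{\text{frame}} \approx t_{\text{track}} + t_{\text{detect}} + t_{\text{learn}}
  = 15 + 50 + 50 = 115\ \text{ms}
\qquad\Rightarrow\qquad
\text{fps} \approx \frac{1000\ \text{ms/s}}{115\ \text{ms}} \approx 8.7
```

That works out to roughly 8 to 9 frames per second on a single thread for 320x240 video.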
Under the hood
--------------

ccv's TLD implementation differs from Zdenek's original Matlab implementation in
several significant ways:

1). Tracking:

Zdenek's implementation uses a smaller Lucas-Kanade (LK) window (5x5), whereas
ccv's implementation uses a 15x15 window.

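To make the window-size difference concrete, here is a hedged sketch in C of a single-step, single-level Lucas-Kanade estimate where the window is a parameter: `r = 2` gives a 5x5 window and `r = 7` gives 15x15. `gray_image_t` and `lk_step` are hypothetical names, not ccv's API. A real tracker also iterates this step and runs it over an image pyramid.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical grayscale image type: row-major floats, rows x cols. */
typedef struct {
	int rows, cols;
	const float* data;
} gray_image_t;

static double pixel(const gray_image_t* im, int x, int y)
{
	return im->data[(size_t)y * (size_t)im->cols + (size_t)x];
}

/* One LK step: estimate the displacement (u, v) of point (cx, cy) from image I
 * to image J using a (2r+1)x(2r+1) window. Returns 0 on success, -1 if the
 * window does not fit or the 2x2 system is (near) singular. */
int lk_step(const gray_image_t* I, const gray_image_t* J, int cx, int cy, int r,
	double* u, double* v)
{
	assert(I->rows == J->rows && I->cols == J->cols && r > 0);
	/* The window plus a 1-pixel border for central differences must fit. */
	if (cx - r - 1 < 0 || cy - r - 1 < 0 || cx + r + 1 >= I->cols || cy + r + 1 >= I->rows)
		return -1;
	double sxx = 0, sxy = 0, syy = 0, sxt = 0, syt = 0;
	for (int y = cy - r; y <= cy + r; y++)
		for (int x = cx - r; x <= cx + r; x++) {
			/* Central-difference spatial gradients and the temporal difference. */
			double ix = 0.5 * (pixel(I, x + 1, y) - pixel(I, x - 1, y));
			double iy = 0.5 * (pixel(I, x, y + 1) - pixel(I, x, y - 1));
			double it = pixel(J, x, y) - pixel(I, x, y);
			sxx += ix * ix;
			sxy += ix * iy;
			syy += iy * iy;
			sxt += ix * it;
			syt += iy * it;
		}
	/* Solve [sxx sxy; sxy syy] [u v]^T = -[sxt syt]^T by Cramer's rule.
	 * The absolute threshold is crude; a scale-aware test would be better. */
	double det = sxx * syy - sxy * sxy;
	if (det <= 1e-12)
		return -1;
	*u = (sxy * syt - syy * sxt) / det;
	*v = (sxy * sxt - sxx * syt) / det;
	return 0;
}
```

A larger window sums gradients over more pixels. That generally makes the 2x2 system better conditioned (less prone to the aperture problem), at the cost of more computation and a stronger assumption that every pixel in the window moves together.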
2). Ferns Detection (Random Forest):

Zdenek's implementation uses a random forest for object detection (in short, the
posterior probabilities of the individual ferns are added up), whereas ccv's
implementation uses ferns for object detection (the probabilities are multiplied,
a.k.a. a semi-naive Bayes classifier). To compensate for this choice, ccv's
implementation uses 40 ferns with 18 features per fern (the default parameters),
and its default ferns threshold is 0.

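A hedged C sketch contrasting the two ways of combining fern outputs is shown below. Each fern turns its binary features into a leaf index; with 18 features that is one of 2^18 leaves. The additive rule averages per-fern posteriors, and the multiplicative rule multiplies per-fern likelihood ratios. Several things here are assumptions for illustration: the pixel-pair comparison features, all names, and the reading that a threshold of 0 applies in log-ratio space.

```c
#include <assert.h>
#include <stdint.h>

/* Defaults stated for ccv: 40 ferns, 18 features each (names are hypothetical). */
enum { TLD_NFERNS = 40, TLD_NFEATURES = 18 };

/* One binary feature: is the pixel at p1 brighter than the pixel at p2? */
typedef struct {
	int p1, p2; /* indices into a flattened grayscale patch */
} fern_feature_t;

/* Concatenate n binary tests into a leaf index (first test = most significant bit). */
uint32_t fern_code(const unsigned char* patch, const fern_feature_t* f, int n)
{
	assert(n > 0 && n <= 32);
	uint32_t code = 0;
	for (int i = 0; i < n; i++)
		code = (code << 1) | (uint32_t)(patch[f[i].p1] > patch[f[i].p2]);
	return code;
}

/* Additive (random-forest style): average of each fern's P(object | leaf). */
double combine_additive(const double* posterior, int nferns)
{
	assert(nferns > 0);
	double s = 0;
	for (int i = 0; i < nferns; i++)
		s += posterior[i];
	return s / nferns;
}

/* Multiplicative (semi-naive Bayes): product of P(leaf | object) / P(leaf | background).
 * A value above 1 favours "object"; in log space this becomes a sum compared against 0. */
double combine_multiplicative(const double* p_obj, const double* p_bg, int nferns)
{
	assert(nferns > 0);
	double r = 1;
	for (int i = 0; i < nferns; i++)
		r *= p_obj[i] / p_bg[i];
	return r;
}
```

With the defaults, `fern_code(patch, features, TLD_NFEATURES)` returns a value below `1u << TLD_NFEATURES`. Each fern therefore needs a table of 262144 leaf statistics.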
3). Nearest-neighbor Classifier:

Zdenek's implementation uses aspect-ratio-normalized examples (15x15); these examples
are normalized so that a simple dot product yields the correlation confidence.
ccv's implementation uses aspect-aware examples (constrained to an area of 400 pixels);
the examples are left as they are, and a normalized correlation coefficient is computed
to get the confidence score.

4). Pseudo-random Number Generator:

Zdenek's implementation uses srand() for random number generation and seeds it
with 0. ccv's implementation uses a Mersenne Twister random number generator with
an environment-dependent seed.
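A fixed seed such as srand(0) makes every run replay the same random sequence. An environment-dependent seed does not, which presumably is also why ccv's results vary between runs. Below is a compact MT19937 (32-bit Mersenne Twister) sketch in C. `environment_seed` mixes `time(NULL)` and `clock()`; this is only an assumption about what "environment-dependent" means, not ccv's actual seeding code.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define MT_N 624
#define MT_M 397

typedef struct {
	uint32_t state[MT_N];
	int index;
} mt19937_t;

void mt_seed(mt19937_t* mt, uint32_t seed)
{
	assert(mt != NULL);
	mt->state[0] = seed;
	for (int i = 1; i < MT_N; i++)
		mt->state[i] = 1812433253u * (mt->state[i - 1] ^ (mt->state[i - 1] >> 30)) + (uint32_t)i;
	mt->index = MT_N; /* force a twist on the first draw */
}

uint32_t mt_next(mt19937_t* mt)
{
	assert(mt != NULL);
	if (mt->index >= MT_N) {
		/* Regenerate the whole block of 624 words in place ("twist"). */
		for (int i = 0; i < MT_N; i++) {
			uint32_t y = (mt->state[i] & 0x80000000u) | (mt->state[(i + 1) % MT_N] & 0x7fffffffu);
			uint32_t next = mt->state[(i + MT_M) % MT_N] ^ (y >> 1);
			if (y & 1u)
				next ^= 0x9908b0dfu;
			mt->state[i] = next;
		}
		mt->index = 0;
	}
	/* Tempering improves the equidistribution of the output bits. */
	uint32_t y = mt->state[mt->index++];
	y ^= y >> 11;
	y ^= (y << 7) & 0x9d2c5680u;
	y ^= (y << 15) & 0xefc60000u;
	y ^= y >> 18;
	return y;
}

/* Hypothetical environment-dependent seed: wall-clock time mixed with CPU ticks. */
uint32_t environment_seed(void)
{
	return (uint32_t)time(NULL) ^ ((uint32_t)clock() << 16);
}
```

Seeding with a constant (for example, the reference default 5489, whose first output is 3499211612) gives reproducible runs like Zdenek's. Seeding with `environment_seed()` gives a different sequence on each run.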