Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lower performance compared to exhaustive search ? #17

Open
FanWan opened this issue May 24, 2021 · 4 comments
Open

lower performance compared to exhaustive search ? #17

FanWan opened this issue May 24, 2021 · 4 comments

Comments

@FanWan
Copy link

FanWan commented May 24, 2021

Hello, I got a performance problem about the ac machine of this project, and the following was my job.

First i constructed a trie tree with 2700+ patterns and did something about string-matching.

@FanWan
Copy link
Author

FanWan commented May 24, 2021

#include
#include
#include
#include
#include
#include

#include "aho_corasick.hpp"

std::chrono::steady_clock::time_point time_start_point() {
return std::chrono::steady_clock::now();
}

double time_duration(const std::chrono::steady_clock::time_point& start) {
auto end = time_start_point();
auto span = std::chrono::duration_caststd::chrono::microseconds(end - start).count();
return ((double)span) / 1000.0;
}

int main(int argc, const char * argv[])
{
auto time_start_0 = time_start_point();
std::string match_str = "quick 百度 red fox,paddlecloud相关的文档有吗?jumps over lazy brown dog that owns brown house";

// create instance
aho_corasick::trie automation;
std::ifstream in("events.txt");
char word[256];
while(!in.eof()){
in.getline(word, 256);
//std::cout << "hello:" << word << std::endl;
automation.insert(word);
//high_user.insert(buffer);
}
in.close();
// add patterns to search for
automation.insert("quick");
auto construct_time = time_duration(time_start_0);
std::cout << "construct time: " << construct_time << std::endl;

// begin searching
auto time_start = time_start_point();
auto results = automation.parse_text(match_str);

auto prepare_time = time_duration(time_start);
std::cout << "size of results: "<< results.size() << std::endl;
for (const auto& m: results) {

  //std::cout << m << std::endl;

}
std::cout << "match cost time: " << prepare_time << std::endl;

}

@FanWan
Copy link
Author

FanWan commented May 24, 2021

the outputs were as follows:

construct time: 28.581
size of results: 2
match cost time: 53.299

@FanWan
Copy link
Author

FanWan commented May 24, 2021

Next, i did an experiment about exhaustive search using Python:

`import os
import time
cur_dir = os.path.dirname(os.path.abspath(file)) + '/'
data_dir = "/Users/wanfan01/workspace/cpp/learning/ac_machine/AhoCorasick/events.txt"

if name == "main":
words_ = list()
for line in open(data_dir, 'r', encoding='utf8'):
line = line.strip()
if line:
words_.append(line)
words_.append("quick")

begin_time = time.time() * 1000
sentence = "quick 百度 red fox,paddlecloud相关的文档有吗?jumps over lazy brown dog that owns brown house"

match_res = []
for w in words_:
    if w in sentence:
        match_res.append(w)
end_time = time.time() * 1000
cost_time = end_time - begin_time
print("match costing time: %.3f" % cost_time)

`

the output was:

match costing time: 0.347

Obviously, ac machine (53.299 ms) performed worse than exhaustive search (0.347ms) under the same conditions.

I am so confused, and wonder if there was any mistake about my experiments ??

@yqynju
Copy link

yqynju commented Jun 5, 2021

please do performance after a few random search

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants