congy/AutoType

AutoType: Benchmark Data for Type Detection in Tables

Overview 

This benchmark data set contains example data values for 112 semantic data types (e.g., phone, email, address, isbn, upc), collected from the public web. The benchmark was compiled to evaluate the precision and recall of type-detection algorithms when they are given a small number of positive examples, as described in the AutoType paper [1]. Details of the data set can also be found in the paper.
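As a rough illustration of the kind of evaluation described above, the sketch below computes precision and recall for a single type detector over labeled values. It is not the evaluation code from the paper: the function `precision_recall`, the toy detector `looks_like_email`, and the inline sample values are all hypothetical, and the values are not taken from the benchmark files.

```python
import re
from typing import Callable, Sequence, Tuple


def precision_recall(
    detector: Callable[[str], bool],
    positives: Sequence[str],
    negatives: Sequence[str],
) -> Tuple[float, float]:
    """Score a detector against values labeled as in-type and out-of-type."""
    tp = sum(1 for v in positives if detector(v))  # in-type values accepted
    fn = len(positives) - tp                       # in-type values rejected
    fp = sum(1 for v in negatives if detector(v))  # out-of-type values accepted
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical, deliberately naive detector for an "email" type.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def looks_like_email(value: str) -> bool:
    return EMAIL_RE.match(value) is not None


# Made-up sample values for illustration only (not from the data set).
positives = ["alice@example.com", "bob@example.org", "carol@localhost"]
negatives = ["555-0100", "978-3-16-148410-0", "me@2.5x"]

precision, recall = precision_recall(looks_like_email, positives, negatives)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy setup the detector misses one in-type value and accepts one out-of-type value, so both scores fall below 1.0. A real evaluation would load the values for each of the 112 types from the benchmark files instead of using inline lists.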

We hope this data set can facilitate research on detecting semantic data types in tabular data and serve as a common benchmark for future work in this area.

License

This data set is released under the Computational Use of Data Agreement v1.0.

Reference

[1] Auto-Type: Synthesizing Type-Detection Logic for Rich Semantic Data Types using Open-source Code. Cong Yan and Yeye He. In SIGMOD 2018. https://dl.acm.org/doi/10.1145/3183713.3196888
