AutoType: Benchmark Data for Type Detection in Tables

Overview

This benchmark data set contains example data values for 112 semantic data types (e.g., phone, email, address, isbn, upc), collected from the public web. The benchmark was compiled to evaluate the precision and recall of type-detection algorithms when they are given a small number of positive examples, as described in the AutoType paper [1]. Details of the data set can also be found in the paper.

We hope this data set can facilitate research on detecting semantic data types in tabular data and serve as a common benchmark for future work in this area.
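As a rough illustration of the precision/recall evaluation mentioned above, the following minimal sketch scores a detector for one type against labeled values. It is not the evaluation code from the paper: the function names, the toy detector, and the sample values are all hypothetical, and it assumes that values of other types can serve as negative examples. The on-disk format of the files under data_release is not described here, so loading is left out.

```python
from typing import Callable, Sequence, Tuple


def precision_recall(
    detector: Callable[[str], bool],
    positives: Sequence[str],
    negatives: Sequence[str],
) -> Tuple[float, float]:
    """Score a boolean detector for one type against labeled values."""
    tp = sum(1 for v in positives if detector(v))  # true positives
    fn = len(positives) - tp                       # missed positives
    fp = sum(1 for v in negatives if detector(v))  # false alarms
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def is_email_like(value: str) -> bool:
    """Hypothetical, deliberately loose detector used only for illustration."""
    return "@" in value and "." in value


# Hypothetical sample values, not taken from the benchmark.
positives = ["alice@example.com", "bob@example.org", "carol@example.net"]
negatives = ["555-0100", "@twitter.handle", "978-3-16-148410-0"]

precision, recall = precision_recall(is_email_like, positives, negatives)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy run the loose detector accepts every positive but also accepts one negative, so recall is perfect while precision drops. A real evaluation would repeat this per type over the benchmark values.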

License

This data set is released under the Computational Use of Data Agreement v1.0.

Reference

[1] Auto-Type: Synthesizing Type-Detection Logic for Rich Semantic Data Types using Open-source Code. Cong Yan and Yeye He. In SIGMOD 2018. https://dl.acm.org/doi/10.1145/3183713.3196888
