What is DeltaParse?

DeltaParse is a simple template-based data extraction tool, designed to be one part of a tool chain.

How it works

Given a template string containing tokens that match {{\w*?}}, data is extracted using Google's diff-match-patch:

string text = "The quick brown fox jumps over the lazy dog";
string template = "The quick {{colour}} fox jumps over the lazy {{animal}}";

var parser = new Parser<SimpleResult>(template, new StandardProcessor(), new SimpleOutputBuilder());
var results = parser.Parse(text);

Results in

{
  "ParsedValues": {
    "colour": [
      "brown"
    ],
    "animal": [
      "dog"
    ]
  },
  "Difference": 0
}

Difference is a rough measure, on a scale from 0 to 1, of how much the template and the text differ, excluding any matched data.
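
To make the token syntax concrete, the sketch below lists the token names found in a template using .NET's Regex class. It is an illustration only, not DeltaParse's own code: the TokenNames helper is hypothetical, and the pattern is the one quoted in this section with the braces escaped and a capture group added.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TokenSketch
{
    // Hypothetical helper, not part of DeltaParse: returns the names of all
    // {{token}} placeholders in a template, in order of appearance.
    public static List<string> TokenNames(string template)
    {
        var names = new List<string>();
        // Braces are escaped for clarity; (\w*?) captures the token name.
        foreach (Match match in Regex.Matches(template, @"\{\{(\w*?)\}\}"))
        {
            names.Add(match.Groups[1].Value);
        }
        return names;
    }

    public static void Main()
    {
        string template = "The quick {{colour}} fox jumps over the lazy {{animal}}";
        // Prints the token names separated by ", ".
        Console.WriteLine(string.Join(", ", TokenNames(template)));
    }
}
```

Because \w only matches letters, digits and underscores, a placeholder containing spaces or punctuation would not be picked up by this pattern.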

What is DeltaParse not?

DeltaParse is not a full data extraction library. While it can handle a wide range of inputs, it can only parse them naively; it does not support more advanced features such as parsing inputs with a variable number of tokens to extract.

For example, parsing a data table with n rows, where n cannot be determined when the template is created, is not feasible. Instead, chaining DeltaParse with other text processing tools such as Regex or HTML selectors will give better results.
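
One way to picture this kind of chaining is to split the input into rows first and then run a fixed, single-row parse on each row. The sketch below is a minimal illustration under stated assumptions, not DeltaParse's own code: ParseRows and the sample table are hypothetical, and the parseRow delegate stands in for a call such as parser.Parse(row) on a parser built from a single-row template.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RowChainingSketch
{
    // Hypothetical helper: splits the input into non-empty lines and hands
    // each one to parseRow. In real use, parseRow would wrap a DeltaParse
    // parser built from a single-row template.
    public static List<TResult> ParseRows<TResult>(string table, Func<string, TResult> parseRow)
    {
        return Regex.Split(table, @"\r?\n")
            .Where(row => row.Trim().Length > 0)
            .Select(parseRow)
            .ToList();
    }

    public static void Main()
    {
        // Made-up sample input; the number of rows is only known at run time.
        string table = "apple | 3\npear | 7\nplum | 12\n";
        // Stand-in for the per-row parse so the sketch runs without DeltaParse.
        var names = ParseRows(table, row => row.Split('|')[0].Trim());
        Console.WriteLine(string.Join(", ", names));
    }
}
```

The template then only has to describe one row, so the number of tokens it extracts stays fixed however many rows the input contains.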

StandardProcessor vs MungeProcessor

StandardProcessor uses regex to find and prepare template tokens for parsing and should work fine in the majority of situations.

MungeProcessor guarantees that only tokens in the template text are considered when extracting data, so token-like text in the input is treated as ordinary data. It does, however, require more memory and processing time.

Using MungeProcessor as follows:

string text = "The quick {{adjective}} brown fox {{verb}} over the lazy dog";
string template = "The quick {{adjective}} brown fox {{verb}} over the lazy dog";

var parser = new Parser<SimpleResult>(template, new MungeProcessor(), new SimpleOutputBuilder());
var results = parser.Parse(text);

Results in

{
  "ParsedValues": {
    "adjective": [
      "{{adjective}}"
    ],
    "verb": [
      "{{verb}}"
    ]
  },
  "Difference": 0
}
