Skip to content

CityOfDetroit/pgsql-addressing-dictionary

 
 

Repository files navigation

TSearch Address Dictionaries

Motivation

Trying to use full-text indexing for address data with the default language configurations doesn't work well: if you use an 'english' configuration, you end up getting street names stemmed by the english dictionary. Using the 'simple' configurations is better but still not optimal: there is no handling of obvious abbreviations, so you might fail to match "12 N Oak st" to "12 North Oak Street", or vice versa.

This extension provides an 'addressing_en' configuration that has common (Canada Post enumerated) street type abbreviations, direction tokens (n, s, e, w, etc) and numeric street variants (first/1st) handled. There's sure to be piles more, as well as many cases that cannot be handled with a simple full-text tokenizing strategy (st = street or saint?). However, using full-text features for basic geocoding is too convenient to ignore.

Using full-text indexing for nationwide address data might still be a bad idea, since there are so many possible false or multiple matches to find, but for a single city, county or even state, this approach can generate a quick'n'dirty address lookup routine perfect to tying to an autocomplete form field on a web page.

Currently there is only support for English (really, North American English) addressing -- pull requests gratefully received.

Installation

Clone the repository, ensure that pg_config is on your path, and run make install to copy the dictionary files into place.

Examples

> CREATE EXTENSION addressing_dictionary;

> SELECT to_tsvector('addressing_en', '1234 n main st');

                 to_tsvector                  
----------------------------------------------
 '1234':1 'main':4 'n':2 'north':3 'street':5

Caveats

If using v1.1, then cardinal directions are tokenized as both the single letter abbreviation, and the word. This is to avoid ambiguity. See example above.

About

TSearch dictionaries for addresses

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Makefile 100.0%