Stars
Looker Extension GenAI - using LLMs to make exploration easier and get dashboard insights
Rapid fuzzy string matching in Python using various string metrics
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
CloudFormation templates from Key2Market to set up live data streaming from RDS Postgres to Redshift
Imports the discogs.com monthly XML dumps into databases
PDF and support scripts for shebang PostgreSQL talk
Simple, easily customised trigger-based auditing for PostgreSQL (Postgres). See also pgaudit.
Project migrated to: https://gitlab.com/Oslandia/audit_trigger
Provides utilities for Postgres database schema versioning.
Set of functions and operators for executing similarity queries
👪 A Python library for parsing unstructured Western names into name components.
Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK.
Scalable in-database machine learning with PL/Python: Postgres Open SV 2017 talk
🆔 Examples for using the dedupe library
A powerful and modular toolkit for record linkage and duplicate detection in Python
Port of Google's language-detection library to Python.
Utilities for tracing program execution line-by-line
Duke is a fast and flexible deduplication engine written in Java
Elasticsearch entity resolution plugin based on Duke
🔍 A Chrome extension that lets you inspect a website's framework and libraries
Open Source Tutorial For Analyzing & Visualizing 60 Million Police Stops Using Python
OpenRefine is a free, open source power tool for working with messy data and improving it