- Lancaster University
- Lancaster
- https://www.lancaster.ac.uk/history/about/people/katherine-mcdonough
Stars
Text mining of English children's literature 1789-1914 for the representation of insects and other creepy crawlies.
Teaching materials for the Applied Data Analysis course at DHOxSS. Data science methods to analyse humanities data.
A collection of Jupyter notebooks in many human and computer languages for doing digital humanities. PRs welcome!
BookNLP, a natural language processing pipeline for books
Locolligo is a single-page, browser-based JavaScript application to facilitate the formatting, linking, and geolocation of datasets, with a particular focus on Cultural Heritage.
A Python package for working with data in the Linked Places Format (LPF).
Class info and labs for Fall 2024 ENG 790/QTM 490 on data, archives, and the US founding
UrbanOccupationsOETR / Automatic-Road-Extraction-from-Historical-Maps-using-Deep-Learning-Techniques
This repository contains code for the paper "Automatic Road Extraction from Historical Maps using Deep Learning Techniques: A Regional Case Study of Turkey in a German World War II map"
Code for 'LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders'
A curated list of practical guide resources of LLMs (LLMs Tree, Examples, Papers)
Interact, analyze and structure massive text, image, embedding, audio and video datasets
Natural Earth data in GeoJSON
Experimental protocol and results for the paper "Linear Object Detection in Document Images using Multiple Object Tracking" accepted at ICDAR 2023 by Bernet et al.
[arXiv preprint] Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation
Example notebooks and tutorials from Constellate, the text analysis service from ITHAKA.
A Transformer-based object-centric approach for date estimation of historical photographs
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
AdelaiDet is an open source toolbox for multiple instance-level detection and recognition tasks.
Layout analysis to find layout elements in documents (similar to P2PaLA)
OCR, layout analysis, reading order, table recognition in 90+ languages
Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024)
Entity linking system for Wikidata updated by your edits in real time
A cross-platform streaming parser for the ESRI Shapefile spatial data format.
An extension of GeoJSON that encodes topology! 🌐
Repository for benchmarking modern historical map vectorization processes
All the material (paper, code, dataset, results) of our DAS 2022 paper (OCR+NER benchmark)
Supplementary files for "Infrastructural semantics: postal networks and Statistical Accounts in Scotland, 1790-1845"