Rhapsode Consulting LLC
- @[email protected]
- in/tim-allison-5a6722
commons-compress Public
Forked from apache/commons-compress
Mirror of Apache Commons Compress
Java Apache License 2.0 Updated May 30, 2024
tika-gui-v2 Public
Unofficial user interface for Apache Tika
incubator-stormcrawler Public
Forked from apache/incubator-stormcrawler
A scalable, mature and versatile web crawler based on Apache Storm
HTML Apache License 2.0 Updated May 3, 2024
tika-detector-stormcrawler Public
Forked from DigitalPebble/tika-detector-stormcrawler
Wraps the charset detection logic from StormCrawler as a Tika module
Java Apache License 2.0 Updated Feb 2, 2024
nutch Public
Forked from apache/nutch
Apache Nutch is an extensible and scalable web crawler
Java Apache License 2.0 Updated Nov 6, 2023
tika-docker Public
Forked from apache/tika-docker
Convenience Docker images for Apache Tika Server
Shell Apache License 2.0 Updated Nov 6, 2023
awesome-digital-preservation Public
Forked from digipres/awesome-digital-preservation
Carefully curated list of awesome digital preservation resources.
commoncrawl-fetcher-lite Public
Simplified version of a common crawl fetcher
SimpleCommonCrawlExtractor Public
Simple wrapper around IIPC Web Commons to take a literal warc.gz and extract standalone binaries
opensearch-java Public
Forked from opensearch-project/opensearch-java
Java Client for OpenSearch
Java Apache License 2.0 Updated Jun 13, 2023
commons-io Public
Forked from apache/commons-io
Apache Commons IO
Java Apache License 2.0 Updated Jun 12, 2023
any23 Public
Forked from apache/any23
Apache Anything To Triples (Any23) is a library, a web service and a command line tool that extracts structured data in RDF format from a variety of Web documents.
HTML Apache License 2.0 Updated May 24, 2023
file-observatory Public
Single server/laptop grade file-observatory
tika-arlington-pdf-model Public
Simple wrapper around the Arlington PDF model's TestGrammar
Dockerfile Apache License 2.0 Updated Feb 10, 2023
apachestuff Public
Forked from chrismattmann/apachestuff
Python Apache License 2.0 Updated Feb 6, 2023
quaerite Public
Forked from mitre/quaerite
Search relevance evaluation toolkit
droid Public
Forked from digital-preservation/droid
DROID (Digital Record and Object Identification)
Java BSD 3-Clause "New" or "Revised" License Updated Aug 11, 2022
nanite Public
Forked from openpreserve/nanite
Nanite - a friendly swarm of format-identifying robots.
Java Updated Aug 11, 2022
language-detector Public
Forked from optimaize/language-detector
Language Detection Library for Java
junrar Public
Forked from junrar/junrar
plain java unrar util (former sf project)
Java Other Updated Jun 22, 2022
metadata-extractor Public
Forked from drewnoakes/metadata-extractor
Extracts Exif, IPTC, XMP, ICC and other metadata from image files