# web-archiving

Here are 122 public repositories matching this topic...

ArchiveBox / ArchiveBox

🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

Updated May 19, 2025
Python

webrecorder / pywb

Core Python Web Archiving Toolkit for replay and recording of web archives

python web-archiving wayback web-archives pywb

Updated Aug 18, 2025
JavaScript

Rhizome-Conifer / conifer

Collect and revisit web pages.

python docker archives warc web-archiving wayback webrecorder pywb

Updated Jan 11, 2025
Python

webrecorder / archiveweb.page

A high-fidelity web archiving extension for Chrome and Chromium-based browsers.

extension archiving chromium browser-extension warc web-archiving webrecorder wacz

Updated Jul 25, 2025
TypeScript

gildas-lormeau / single-file-cli

CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)

nodejs cli dockerfile crawler web-crawler archiving web-scraper web-scraping web-archiving scraping-websites single-file deno

Updated Jun 2, 2025
JavaScript

bellingcat / auto-archiver

Automatically archive links to videos, images, and social media content from Google Sheets (and more).

python docker service scraping archive web-archiving open-source-research

Updated Aug 11, 2025
Python

Ray-D-Song / web-archive

Free web archiving and sharing service that runs on Cloudflare.

serverless self-hosted cloudflare free web-archiving d1 hono web-archive cloudflare-pages

Updated Jun 6, 2025
TypeScript

webrecorder / browsertrix-crawler

Run a high-fidelity browser-based web archiving crawler in a single Docker container

crawler web-crawler crawling warc web-archiving webrecorder wacz

Updated Jul 31, 2025
TypeScript

webrecorder / replayweb.page

Serverless replay of web archives directly in the browser

service-worker warc web-archiving wayback-machine web-archive replay-web-page web-replay wacz

Updated Jul 25, 2025
TypeScript

oduwsdl / ipwb

InterPlanetary Wayback: A distributed and persistent archive replay system using IPFS

python docker service-worker ipfs memento warc web-archiving wayback memento-rfc

Updated May 19, 2025
Python

akamhy / waybackpy

Wayback Machine API interface & a command-line tool

osint internet-archive web-archiving wayback-machine webarchiving cdx-api internet-archiving savepagenow archive-webpage archive-webpages wayback-machine-api wayback-machine-python

Updated Feb 26, 2024
Python

harvard-lil / perma

Indelible links

libraries web-archiving

Updated Aug 11, 2025
JavaScript

webrecorder / webrecorder-player

Webrecorder Player for Desktop (OSX/Windows/Linux). (Built with Electron + Webrecorder)

electron warc web-archiving webrecorder pywb

Updated Sep 17, 2020
JavaScript

rahiel / archiveror

Archiveror will help you preserve the webpages you love. 💾

javascript chrome-extension bookmark archiving webextension firefox-extension browser-extension mhtml linkrot web-archiving

Updated Oct 18, 2019
JavaScript

webrecorder / warcio

Streaming WARC/ARC library for fast web archive IO

python warc web-archiving web-archives pywb

Updated Dec 10, 2024
Python

oduwsdl / archivenow

A Tool To Push Web Resources Into Web Archives

internet-archive web-archiving

Updated Jan 23, 2024
Python

Florents-Tselai / WarcDB

WarcDB: Web crawl data as SQLite databases.

cli database sqlite crawling warc web-archiving web-data

Updated Jul 13, 2024
Python

machawk1 / wail

🐋 Web Archiving Integration Layer: One-Click User Instigated Preservation

python gui warc web-archiving pyinstaller wayback heritrix openwayback

Updated Mar 12, 2025
Roff

ArchiveBox / archivebox-browser-extension

Official ArchiveBox browser extension: automatically/manually preserve your browsing history using ArchiveBox.

chrome-extension archiving svelte firefox-extension browser-extension web-archiving digital-preservation digipres internet-archiving archivebox

Updated May 3, 2025
JavaScript

webrecorder / browsertrix

Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!

kubernetes cloud archiving warc web-archiving webrecorder web-archive wacz

Updated Aug 20, 2025
TypeScript
