Mercator 2 is a crawler based on Mercator, but it has only one design goal: ease of use.
docker run -p 8082:8082 ghcr.io/dnsbelgium/mercator:latest
open localhost:8082
Important differences compared to the current version of Mercator
- Zero required dependencies to deploy it.
- Can be run as a single docker image
- No longer requires a PhD in Kubernetes in order to deploy it ;-)
- Heck, it doesn't even need Kubernetes at all.
- Does not require any AWS services (but can optionally save its output on Amazon S3).
- Uses an embedded DuckDB database and writes its output as Parquet files
- Uses an embedded ActiveMQ to distribute the work over multiple threads
- Only one JavaScript dependency: htmx
- Multi-platform Docker images published to Docker Hub (x86 and aarch64), so it also works on Apple Silicon machines
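To illustrate the general idea of distributing work over multiple threads through a queue, here is a minimal sketch. It is not Mercator's implementation: a plain `java.util.concurrent.BlockingQueue` stands in for the embedded ActiveMQ broker, and the class name `QueueWorkSketch`, the method `processAll`, and the `STOP` marker are all hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Illustrative only: an in-memory queue stands in for the embedded ActiveMQ broker.
class QueueWorkSketch {

    // Hypothetical stop marker; it is not a valid domain name.
    private static final String STOP = "__STOP__";

    // Puts every domain on a shared queue and lets 'workers' threads consume them.
    static List<String> processAll(List<String> domains, int workers) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Queue<String> results = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        for (int i = 0; i < workers; i++) {
            pool.execute(() -> {
                try {
                    while (true) {
                        String domain = queue.take(); // blocks until work arrives
                        if (domain.equals(STOP)) {
                            return;
                        }
                        // Placeholder for the real crawl of one domain.
                        results.add("crawled " + domain);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        queue.addAll(domains);
        for (int i = 0; i < workers; i++) {
            queue.put(STOP); // one stop marker per worker
        }
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            pool.shutdownNow();
        }

        List<String> sorted = new ArrayList<>(results);
        Collections.sort(sorted); // completion order across workers is not deterministic
        return sorted;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(processAll(List.of("example.be", "example.org", "example.com"), 2));
    }
}
```

Each worker takes one domain at a time, so a slow domain only blocks the thread that picked it up. A real broker adds features this sketch leaves out, such as persistence and redelivery.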
curl -s "https://get.sdkman.io" | bash
sdk install java 21.0.5-tem
sdk install maven 3.9.9
mvn package
This will compile the sources and run all (enabled) tests. To run the tests, you need a Docker environment.
To run without tests and vulnerability scanning, use:
mvn package -DskipTests -Dsnyk.skip
mvn spring-boot:run
To use a specific profile:
mvn spring-boot:run -Dspring-boot.run.profiles=local
Note: using the 'local' profile will start Mercator on port 8090 instead of 8082.
java -Dspring.profiles.active=local -jar target/mercator-*-SNAPSHOT.jar
Note: you need to run mvn package first.
Since Lombok is not yet compatible with JDK 23, we compile the sources with Java 21. Once compiled, it is possible to run the application with Java 23.
Build container image:
mvn jib:dockerBuild
- Store a username in ~/.env.grafana-username
- Store a password in ~/.env.grafana-password
cd ./observability
docker compose up --renew-anon-volumes --remove-orphans -d
docker compose logs monocator
The Mercator UI should be available at http://localhost:8082
Metrics are available on http://localhost:3000
Will follow soon.
Mercator will do the following for each submitted domain name:
- Fetch a configurable set of DNS resource records (SOA, A, AAAA, NS, MX, TXT, CAA, HTTPS, SVCB, DS, DNSKEY, CDNSKEY, CDS, ...)
- Fetch one or more HTML pages
- Extract features from all collected HTML pages
- Record conversations with all configured SMTP servers
- Check the TLS versions (and cipher suites) supported on port 443
- Find work by scanning a configurable folder for parquet files with domain names
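As a rough illustration of the last step, the sketch below lists the Parquet files found directly inside a folder using only the JDK. It is an assumption about how such a scan could look, not Mercator's actual code: the class name `ParquetWorkScanner`, the method `findParquetFiles`, and the default folder name `work` are hypothetical, and reading the domain names out of the files is left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Mercator: finds Parquet files waiting in a work folder.
class ParquetWorkScanner {

    // Returns the regular files directly inside 'folder' whose name ends in ".parquet",
    // sorted by path. Sub-folders are not visited.
    static List<Path> findParquetFiles(Path folder) throws IOException {
        try (Stream<Path> entries = Files.list(folder)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".parquet"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // The default folder name is an assumption; pass the real one as the first argument.
        Path folder = Path.of(args.length > 0 ? args[0] : "work");
        if (!Files.isDirectory(folder)) {
            System.out.println("Not a directory: " + folder);
            return;
        }
        for (Path file : findParquetFiles(folder)) {
            System.out.println("Found work file: " + file);
        }
    }
}
```

The stream returned by `Files.list` holds an open directory handle, which is why it is wrapped in try-with-resources.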
Other features:
- Publish metrics for Prometheus
- A docker-compose file to start Prometheus & Grafana, with custom Grafana dashboards for the most important metrics
- Export output files to S3
- Optionally receive work via SQS
- Show fetched data in the web UI