Each crawler goes through a source and extracts only the useful data - text. It does not depend on file extensions and is easily customizable.
Supported file types: text, html, doc/docx, xls/xlsx, pdf, archives, exe/bin, eml/msg, images, sounds.
You can easily add your own file types (GNU power).
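For example, a new extractor is usually just a small shell wrapper that turns one file into plain text on stdout; a hypothetical handler for OpenDocument (.odt) files could look like this (the function name and the way crawl.sh dispatches by file type are assumptions here, not the actual interface):
extract_odt()
{
    # .odt is a zip archive; the readable text lives in content.xml,
    # so unpack that member and strip the XML tags
    unzip -p "$1" content.xml | sed -e 's/<[^>]*>/ /g'
}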
Depends:
- lynx, uchardet - html
- catdoc - doc
- xls2csv - xls
- unzip - docx,xlsx
- pdf2txt - pdf
- rabin2 - exe,dll
- 7z - archives
- identify, tesseract - images
- vosk-transcriber - audio
- msgconvert, munpack, mu - emails
- binwalk - disk images
sudo apt install sqlite3 cifs-utils
sudo apt install file xdg-utils uchardet cifs-utils lynx catdoc unzip python3-pdfminer radare2 p7zip-full
sudo apt install maildir-utils mpack libemail-outlook-message-perl libemail-sender-perl binwalk
sudo apt install graphicsmagick-imagemagick-compat tesseract-ocr tesseract-ocr-eng tesseract-ocr-rus ffmpeg
sudo pip3 install vosk
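After installation, a quick shell loop can confirm that every external tool is available on PATH (the list mirrors the dependency list above; exact binary names such as pdf2txt or mu may differ between distributions):
for t in lynx uchardet catdoc xls2csv unzip pdf2txt rabin2 7z identify tesseract vosk-transcriber msgconvert munpack mu binwalk; do
    command -v "$t" >/dev/null || echo "missing: $t"
done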
sudo docker build -t crawl .
sudo docker run --cap-add SYS_ADMIN --cap-add DAC_READ_SEARCH --cap-add NET_BIND_SERVICE --cap-add CAP_SYSLOG -u 1000 -p 8080:8080 --name crawl -it crawl
Mounting a network drive locally and crawling it:
mount.cifs "//10.10.10.10/Docs" /mnt/Docs -o ro,dom=corp.net,user=username,pass=password
./crawl.sh /mnt/Docs -size -10M
It will create a Docs.csv index file.
Depends:
- wget with controllable download limit (https://yurichev.com/wget.html)
Mirroring site content locally and crawling it:
./spider.sh --limit-size=500k http://target.com/
./crawl.sh target.com/
It will create a target.com.csv index file.
Mirroring FTP content locally and crawling it:
./spider.sh --limit-size=500k ftp://target.com/
./crawl.sh target.com/
It will create a target.com.csv index file.
After crawling, the extracted text is stored in csv files.
Data can be searched using a simple grep:
grep -ia -o -P ".{0,100}password..{0,100}" *.csv | grep -ai --color=auto "password"
Or use fuzzy search (tolerant of misspellings):
tre-agrep -i -E 2 passw *.csv
Data can be converted into a sqlite3 database with full-text search support:
./import.sh INBOX.csv
Searching for data in the database is now more convenient:
./search.sh INBOX.db 's3cr3t'
./search.sh INBOX.db 'password' -c 10 -o 20
./search.sh INBOX.db 'password' -m 'admin'
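The result is an ordinary SQLite database with a full-text index, so it can also be inspected directly with the sqlite3 shell; a minimal sketch (the table name produced by import.sh is not assumed here, so discover it first):
sqlite3 INBOX.db '.tables'
# assuming the FTS table found above is named 'docs' (a hypothetical name):
sqlite3 INBOX.db "SELECT * FROM docs WHERE docs MATCH 'password' LIMIT 10;"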
Depends:
sudo apt install nodejs npm openjdk-17-jre
cd www && npm install
sudo npm install -g bower && bower install && mv bower_components static
wget https://artifacts.opensearch.org/releases/bundle/opensearch/2.11.0/opensearch-2.11.0-linux-x64.tar.gz -O /tmp/opensearch.tar.gz && tar xvf /tmp/opensearch.tar.gz -C /opt/
JAVA_LIBRARY_PATH=/opt/opensearch/plugins/opensearch-knn/lib /opt/opensearch/opensearch-tar-install.sh
Searching for data using OpenSearch:
JAVA_LIBRARY_PATH=/opt/opensearch/plugins/opensearch-knn/lib /opt/opensearch/bin/opensearch
./opensearch.py localhost:9200 -i test -init
./opensearch.py localhost:9200 -i test -import INBOX.csv
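To verify that the import worked, the index can also be queried directly through the standard OpenSearch REST API (the index name test matches the -i option above):
curl -s 'http://localhost:9200/test/_search?q=password&pretty'
# if the security plugin is enabled, use https and pass the admin credentials, e.g.:
# curl -sk -u admin:admin 'https://localhost:9200/test/_search?q=password&pretty'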
cd www && node index.js
chrome http://localhost:8080/test/
Continuous crawling (your own Google on a local network): just use a few simple cron scripts, see cron/README.md.
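A hypothetical crontab entry (the paths are placeholders; the actual scripts shipped for this are described in cron/README.md):
# re-crawl the mounted share every night at 02:00 and rebuild the sqlite3 index (hypothetical paths)
0 2 * * * /opt/crawl/crawl.sh /mnt/Docs -size -10M && /opt/crawl/import.sh /opt/crawl/Docs.csv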