RegexHarvester is a command-line tool written in Go that searches the files in a directory for a regular expression and extracts every match. This makes it useful for data mining and text processing tasks.
- Pattern Matching: Utilizes regular expressions to find and extract specific patterns from files.
- Directory Scanning: Scans all files within a specified directory (recursively).
- File Extension Filtering: Only processes files with a specified extension.
- Unique Matches: Ensures that only unique matches are returned.
- Sorting: Sorts the results alphabetically for easy readability.
To install RegexHarvester, you need to have Go installed on your system. Follow these steps:
- Clone the repository:
  git clone https://github.com/toxyl/regex-harvester.git
- Navigate to the project directory:
  cd regex-harvester
- Build the project:
  go build
- Run the executable:
  ./regex-harvester
RegexHarvester requires three command-line arguments:
- File Extension: The extension of the files you want to process (e.g., eml).
- Directory: The directory containing the files you want to scan.
- Regular Expression: The regular expression pattern you want to match.
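As a rough sketch of how these three positional arguments might be validated, the snippet below checks the argument count and compiles the pattern before any scanning would start. The `parseArgs` helper and the usage message are hypothetical and are not taken from RegexHarvester's source. Only the argument order (extension, directory, pattern) comes from the list above.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
)

// parseArgs is a hypothetical helper. It expects exactly three positional
// arguments in the documented order: extension, directory, pattern.
func parseArgs(args []string) (ext, dir string, re *regexp.Regexp, err error) {
	if len(args) != 3 {
		return "", "", nil, fmt.Errorf("expected 3 arguments, got %d", len(args))
	}
	// Compile early so an invalid pattern is reported before any file is read.
	re, err = regexp.Compile(args[2])
	if err != nil {
		return "", "", nil, fmt.Errorf("invalid regular expression: %w", err)
	}
	return args[0], args[1], re, nil
}

func main() {
	args := os.Args[1:]
	if len(args) == 0 {
		// Fall back to illustrative values so the sketch runs without arguments.
		args = []string{"eml", "/emails/", `\bfoo[bar|]\b`}
	}
	ext, dir, re, err := parseArgs(args)
	if err != nil {
		fmt.Fprintln(os.Stderr, "usage: regex-harvester <extension> <directory> <regex>")
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("extension=%s directory=%s pattern=%s\n", ext, dir, re)
}
```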
./regex-harvester eml /emails/ '\bfoo[bar|]\b'
This command will:
- Scan all .eml files in the /emails/ directory (recursively).
- Extract and print all unique matches of the pattern \bfoo[bar|]\b.
The output will be a list of unique matches, sorted alphabetically, printed line by line.
Contributions are welcome! If you have any ideas, suggestions, or bug reports, please open an issue or submit a pull request.
This project is released into the public domain under the UNLICENSE.