CLI Application written in Kotlin to generate strings from regular expression

knok16/regrunch

Regrunch

Regrunch is a multiplatform Kotlin library to:

  • parse regular expressions;
  • convert regular expressions into finite automata;
  • list strings accepted by finite automata;
  • check some decision properties of finite automata.

A CLI application built on top of the library generates strings from a regex pattern and can be used for wordlist generation.
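
As a rough illustration of the "list strings accepted by finite automata" step, here is a minimal sketch in plain Kotlin. The `Dfa` class, `listAccepted`, and the hand-built automaton are hypothetical names made up for this example; they are not Regrunch's actual API, and the library may enumerate strings differently.

```kotlin
// Hypothetical DFA representation for illustration only (not Regrunch's API).
data class Dfa(
    val start: Int,
    val accepting: Set<Int>,
    // transitions[state][symbol] = next state; a missing entry means "reject".
    val transitions: Map<Int, Map<Char, Int>>
)

// Lists every accepted string of length <= maxLength, depth-first in symbol order.
fun listAccepted(dfa: Dfa, maxLength: Int): List<String> {
    val result = mutableListOf<String>()

    fun visit(state: Int, prefix: String) {
        if (state in dfa.accepting) result += prefix
        if (prefix.length == maxLength) return
        val outgoing = dfa.transitions[state] ?: return
        for (symbol in outgoing.keys.sorted()) {
            visit(outgoing.getValue(symbol), prefix + symbol)
        }
    }

    visit(dfa.start, "")
    return result
}

// Hand-built DFA for the regex [1-9][0-9]* (state 0 = start, state 1 = accepting).
val positiveNumbers = Dfa(
    start = 0,
    accepting = setOf(1),
    transitions = mapOf(
        0 to ('1'..'9').associateWith { 1 },
        1 to ('0'..'9').associateWith { 1 }
    )
)
```

Calling `listAccepted(positiveNumbers, 3)` yields 999 strings starting with `1`, `10`, `100`, `101` and ending with `999`. The length bound is what keeps the traversal finite for an automaton that contains a loop.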

Regrunch CLI

Regrunch is a wordlist generator that builds wordlists from a regex you specify. It converts the regex into a deterministic finite automaton and extracts every string that the automaton accepts. In addition to the regex, you can limit the number of characters and specify the set of characters to use.
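
The set of characters matters most for constructs that are defined relative to "everything else", such as a negated character class. The sketch below shows one plausible reading, assuming a negated class is resolved against the chosen symbol set; `alphabet` and `negatedClass` are hypothetical names, and the text above does not confirm that Regrunch works exactly this way.

```kotlin
// Assumed symbol set, standing in for a digits-only alphabet.
val alphabet: Set<Char> = ('0'..'9').toSet()

// Symbols a negated class such as [^0248] could match:
// everything in the alphabet except the listed symbols.
fun negatedClass(alphabet: Set<Char>, excluded: Set<Char>): Set<Char> =
    alphabet - excluded

// With the digits alphabet, [^0248] leaves the symbols 1, 3, 5, 6, 7, 9.
val remaining: Set<Char> = negatedClass(alphabet, "0248".toSet())
```

Under this assumption a smaller symbol set directly shrinks the generated wordlist, because every "any character" position expands to fewer alternatives.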

How to build

Prerequisites

Regrunch uses Gradle as its build tool. The Gradle wrapper ships with the project, so the only requirement is an installed Java JDK, version 8 or higher.

Build instructions

git clone --depth 1 --branch latest-release https://github.com/knok16/regrunch.git
cd regrunch
./gradlew linkReleaseExecutableNative
cp ./build/bin/native/releaseExecutable/regrunch.kexe regrunch
./regrunch --help

Basic help

$ ./regrunch --help   
Usage: regrunch [OPTIONS] REGEX

  Generate strings from regex

Options:
  -m, --max-length INT             Limits length of strings generated from
                                   regex, required in case regex defines
                                   infinite amount of strings
  -a, --alphabet [ascii|ascii-printable|ascii-digits]
                                   Symbols set used for string generation
  -s, --symbols TEXT               Symbols used for string generation
  -i, --case-insensitive / --case-sensitive
                                   Enables/disables i (case insensitive) flag
                                   for regex
  -h, --help                       Show this message and exit

Arguments:
  REGEX  Regex to generate strings
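
The help text says `--max-length` is required when the regex defines an infinite number of strings. Deciding that is a classic decision property of finite automata: the language is infinite exactly when a cycle exists among states that are both reachable from the start state and able to reach an accepting state. The sketch below illustrates that check; `Dfa` and `isInfinite` are hypothetical names, not Regrunch's API.

```kotlin
// Hypothetical DFA representation for illustration only (not Regrunch's API).
data class Dfa(
    val start: Int,
    val accepting: Set<Int>,
    val transitions: Map<Int, Map<Char, Int>>
)

// Breadth-first search: every node reachable from the given roots.
fun reachableFrom(roots: Set<Int>, graph: Map<Int, Set<Int>>): Set<Int> {
    val seen = roots.toMutableSet()
    val queue = ArrayDeque(roots)
    while (queue.isNotEmpty()) {
        for (next in graph[queue.removeFirst()].orEmpty()) {
            if (seen.add(next)) queue.addLast(next)
        }
    }
    return seen
}

fun isInfinite(dfa: Dfa): Boolean {
    // Forget the symbols: only the state-to-state edges matter here.
    val edges: Map<Int, Set<Int>> = dfa.transitions.mapValues { it.value.values.toSet() }
    val reversed = mutableMapOf<Int, MutableSet<Int>>()
    for ((from, targets) in edges) {
        for (to in targets) reversed.getOrPut(to) { mutableSetOf() }.add(from)
    }

    // "Useful" states lie on some path from the start state to an accepting state.
    val useful = reachableFrom(setOf(dfa.start), edges) intersect
        reachableFrom(dfa.accepting, reversed)

    // Depth-first cycle detection restricted to useful states.
    val mark = mutableMapOf<Int, Int>() // 1 = on the current path, 2 = finished
    fun hasCycle(node: Int): Boolean {
        mark[node] = 1
        for (next in edges[node].orEmpty()) {
            if (next !in useful) continue
            when (mark[next]) {
                1 -> return true
                null -> if (hasCycle(next)) return true
            }
        }
        mark[node] = 2
        return false
    }
    return useful.any { mark[it] == null && hasCycle(it) }
}

// [1-9][0-9]* loops on its accepting state, so its language is infinite.
val unbounded = Dfa(0, setOf(1), mapOf(
    0 to ('1'..'9').associateWith { 1 },
    1 to ('0'..'9').associateWith { 1 }
))

// [01]{2} has no loop, so its language is finite.
val bounded = Dfa(0, setOf(2), mapOf(
    0 to mapOf('0' to 1, '1' to 1),
    1 to mapOf('0' to 2, '1' to 2)
))
```

Restricting the search to useful states is the important design choice: a loop inside a dead state that can never lead to acceptance does not make the language infinite.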

Supported regex constructs

Name                                              Example                                    Is supported  Note
Concatenation                                     abc                                        Yes
Union                                             a|b, a|, |a                                Yes
Quantifiers
Kleene star                                       a*                                         Yes
Plus quantifier (1 or more)                       a+                                         Yes
Question mark (0 or 1)                            a?                                         Yes
Exact repeats                                     a{2}, a{3,}, a{4,6}                        Yes
Lazy quantifiers                                  a*?, a+?, a??, a{2}?, a{3,}?, a{4,6}?      Yes
Possessive quantifiers                            a*+, a++, a?+, a{2}+, a{3,}+, a{4,6}+      No
Shorthand character classes
Dot character                                     .                                          Yes
Digit character                                   \d                                         Yes
Non-digit character                               \D                                         Yes
Whitespace character                              \s                                         Yes
Non-whitespace character                          \S                                         Yes
Word character                                    \w                                         Yes
Non-word character                                \W                                         Yes
Codepoint from unicode category/script/block      \p{Letter}, \p{Sm}, \p{Thai}, \p{InLimbu}  No
Codepoint not from unicode category/script/block  \P{M}, \P{Control}, \P{Greek}, \P{InLao}   No
Any unicode grapheme                              \X                                         No
Hexadecimal notation                              \x3e                                       Yes
Hexadecimal notation                              \x{003E}                                   No
Unicode notation                                  \u003e                                     Yes
Octal notation                                    \044                                       No            \044 is treated as a plain concatenation of the characters 044
Control character notation                        \cA, \cb, \cf                              Yes
\ escape                                          \\                                         Yes
Non-printable character
Tab character (0x09)                              \t                                         Yes
Carriage return character (0x0D)                  \r                                         Yes
Line feed character (0x0A)                        \n                                         Yes
Vertical tab                                      \v                                         Yes
Horizontal tab                                    \h                                         Yes
Bell character (0x07)                             \a                                         Yes
Backspace character (0x08)                        \b                                         Yes           \b corresponds to the backspace character only when used in character class notation: [\b]
Escape character (0x1B)                           \e                                         Yes
Form feed character (0x0C)                        \f                                         Yes
Character classes                                 [0248]                                     Yes
Negation in character classes                     [^0248]                                    Yes
Ranges in character classes                       [0-36-9b-d]                                Yes
Shorthands in character classes                   [ab\d]                                     Yes
Character class subtraction                       [0-9-[0-6-[0-3]]]                          No
Character class intersection                      [0-9&&[0-6]&&[4-9]]                        No
Capturing groups                                  (\d\d)-(\d\d)                              Yes
Non-capturing groups                              (?:a|b)c                                   Yes
Lookahead groups                                  a(?=b), a(?!b)                             No
Lookbehind groups                                 (?<=a)b, (?<!a)b                           No
Atomic groups                                     a(?>bc|b)c                                 No
Anchors                                                                                      No            The parser parses the regular expression properly, but the string generator reports that it does not support anchors
Start and end of line                             ^abc$                                      No
Start and end of string                           \Aabc\Z                                    No
End of string only                                abc\z                                      No
Previous match                                    \Gabc                                      No
Word boundary                                     .+\b.+                                     No            \b corresponds to the backspace character when used in character class notation: [\b]
Non-word boundary                                 .+\B.+                                     No
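
One observation about the quantifier rows: laziness changes which match an engine prefers when searching, but not the set of strings that match in full, so for string generation a lazy quantifier accepts the same strings as its greedy form. That may be why lazy quantifiers are listed as supported, although the table does not say so. The sketch below checks the equivalence with Kotlin's standard `Regex` class rather than with Regrunch; `sameAcceptedSet` and `candidates` are hypothetical names.

```kotlin
// True when both patterns accept exactly the same candidates as whole-string matches.
fun sameAcceptedSet(first: String, second: String, candidates: List<String>): Boolean =
    candidates.all { Regex(first).matches(it) == Regex(second).matches(it) }

// A small hand-picked sample; this is a spot check, not a proof.
val candidates = listOf("", "a", "aa", "aaa", "b", "ab")

val plusAgrees = sameAcceptedSet("a+", "a+?", candidates)            // greedy vs lazy plus
val rangeAgrees = sameAcceptedSet("a{1,2}", "a{1,2}?", candidates)   // greedy vs lazy range
val starDiffers = !sameAcceptedSet("a+", "a*", candidates)           // differ on the empty string
```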

Examples of usage

Use case: All 3-digit combinations
Regular expression: \d{3}
CLI command: ./regrunch '\d{3}'
Output:
000
001
002
...
998
999

Use case: All 3-digit combinations
Regular expression: [0-9]{3}
CLI command: ./regrunch '[0-9]{3}'
Output:
000
001
002
...
998
999

Use case: All 3-digit combinations of Arabic-Indic digits
Regular expression: [٠-٩]{3}
CLI command: ./regrunch '[٠-٩]{3}'
Output:
٠٠٠
٠٠١
٠٠٢
...
٩٩٨
٩٩٩

Use case: Georgian alphabet
Regular expression: [ა-ჰ]
CLI command: ./regrunch '[ა-ჰ]'

Use case: All positive numbers of up to 3 digits
Regular expression: [1-9][0-9]{0,2}
CLI command: ./regrunch '[1-9][0-9]{0,2}'
Output:
1
10
100
101
...
998
999

Use case: All positive numbers of up to 3 digits
Regular expression: [1-9][0-9]*
CLI command: ./regrunch --max-length 3 '[1-9][0-9]*'
Output:
1
10
100
101
...
998
999

Use case: All IP addresses in 192.168.0.0/16
Regular expression: 192\.168(\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])){2}
CLI command: ./regrunch '192\.168(\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])){2}'
Output:
192.168.0.0
192.168.0.1
192.168.0.10
192.168.0.100
...
192.168.220.240
...
192.168.99.98
192.168.99.99

Use case: All 6-bit masks
Regular expression: [01]{6}
CLI command: ./regrunch '[01]{6}'
Output:
000000
000001
000010
000011
...
111110
111111

Use case: All words of 4, 5, or 6 letters followed by 2 digits and one of the special characters !, %, or #
Regular expression: [a-zA-Z]{4,6}\d\d[!%#]
CLI command: ./regrunch '[a-zA-Z]{4,6}\d\d[!%#]'
Output:
AAAA00!
AAAA00#
AAAA00%
AAAA01!
...
zzzzzz99#
zzzzzz99%

Use case: All combinations of letter case for the word 'taylor' followed by 2 digits and an exclamation mark
Regular expression: [Tt][Aa][Yy][Ll][Oo][Rr][0-9]{2}!
CLI command: ./regrunch '[Tt][Aa][Yy][Ll][Oo][Rr][0-9]{2}!'
Output:
TAYLOR00!
TAYLOR01!
TAYLOR02!
TAYLOR03!
...
taYLOr55!
...
taylor98!
taylor99!

Use case: All combinations of letter case for the word 'taylor' followed by 2 digits and an exclamation mark
Regular expression: taylor[0-9]{2}! (with the i flag)
CLI command: ./regrunch -i 'taylor[0-9]{2}!'
Output:
TAYLOR00!
TAYLOR01!
TAYLOR02!
TAYLOR03!
...
taYLOr55!
...
taylor98!
taylor99!
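
Before generating a wordlist it can help to estimate its size, since some of the patterns above expand to very large outputs. The sketch below works out the counts by hand for several of the examples, multiplying the number of choices at each position; `pow` and the value names are hypothetical helpers written for this illustration and are not part of Regrunch.

```kotlin
// Integer power by repeated multiplication; adequate for the small exponents used here.
fun pow(base: Long, exponent: Int): Long {
    var result = 1L
    repeat(exponent) { result *= base }
    return result
}

// \d{3}: ten choices at each of three positions.
val threeDigits = pow(10, 3)

// [1-9][0-9]{0,2}: nine leading digits, then zero, one, or two further digits.
val upToThreeDigits = 9 * (pow(10, 0) + pow(10, 1) + pow(10, 2))

// 192.168.x.y: each of the two free octets matches the 256 values 0..255.
val subnetAddresses = pow(256, 2)

// [01]{6}: two choices at each of six positions.
val bitMasks = pow(2, 6)

// [Tt][Aa][Yy][Ll][Oo][Rr][0-9]{2}!: two cases per letter, then two digits.
val taylorVariants = pow(2, 6) * pow(10, 2)

// [a-zA-Z]{4,6}\d\d[!%#]: 52 letters over lengths 4..6, two digits, three symbols.
val wordCandidates = (pow(52, 4) + pow(52, 5) + pow(52, 6)) * pow(10, 2) * 3
```

The last pattern expands to roughly six trillion strings, which is a reminder that a short regex can describe a wordlist too large to store or iterate in practice.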
