feat: regexp extension
nalgeon committed Feb 3, 2023
1 parent 2b14136 commit fcea3d8
Showing 45 changed files with 45,822 additions and 25 deletions.
12 changes: 6 additions & 6 deletions Makefile
@@ -24,7 +24,7 @@ compile-linux:
gcc -fPIC -shared src/sqlite3-ipaddr.c -o dist/ipaddr.so
gcc -fPIC -shared src/sqlite3-json1.c -o dist/json1.so
gcc -fPIC -shared src/sqlite3-math.c -o dist/math.so -lm
gcc -fPIC -shared src/sqlite3-re.c src/re.c -o dist/re.so
gcc -fPIC -shared -DPCRE2_CODE_UNIT_WIDTH=8 -DLINK_SIZE=2 -DHAVE_CONFIG_H src/sqlite3-regexp.c src/regexp/regexp.c src/regexp/pcre2/*.c -o dist/regexp.so
gcc -fPIC -shared src/sqlite3-stats.c -o dist/stats.so -lm
gcc -fPIC -shared src/sqlite3-text.c -o dist/text.so
gcc -fPIC -shared src/sqlite3-unicode.c -o dist/unicode.so
@@ -41,7 +41,7 @@ compile-windows:
gcc -shared -I. src/sqlite3-fuzzy.c src/fuzzy/*.c -o dist/fuzzy.dll
gcc -shared -I. src/sqlite3-json1.c -o dist/json1.dll
gcc -shared -I. src/sqlite3-math.c -o dist/math.dll -lm
gcc -shared -I. src/sqlite3-re.c src/re.c -o dist/re.dll
gcc -shared -DPCRE2_CODE_UNIT_WIDTH=8 -DLINK_SIZE=2 -DHAVE_CONFIG_H -I. src/sqlite3-regexp.c src/regexp/regexp.c src/regexp/pcre2/*.c -o dist/regexp.dll
gcc -shared -I. src/sqlite3-stats.c -o dist/stats.dll -lm
gcc -shared -I. src/sqlite3-text.c -o dist/text.dll
gcc -shared -I. src/sqlite3-unicode.c -o dist/unicode.dll
@@ -59,7 +59,7 @@ compile-macos:
gcc -fPIC -dynamiclib -I src src/sqlite3-ipaddr.c -o dist/ipaddr.dylib
gcc -fPIC -dynamiclib -I src src/sqlite3-json1.c -o dist/json1.dylib
gcc -fPIC -dynamiclib -I src src/sqlite3-math.c -o dist/math.dylib -lm
gcc -fPIC -dynamiclib -I src src/sqlite3-re.c src/re.c -o dist/re.dylib
gcc -fPIC -dynamiclib -DPCRE2_CODE_UNIT_WIDTH=8 -DLINK_SIZE=2 -DHAVE_CONFIG_H -I src src/sqlite3-regexp.c src/regexp/regexp.c src/regexp/pcre2/*.c -o dist/regexp.dylib
gcc -fPIC -dynamiclib -I src src/sqlite3-stats.c -o dist/stats.dylib -lm
gcc -fPIC -dynamiclib -I src src/sqlite3-text.c -o dist/text.dylib
gcc -fPIC -dynamiclib -I src src/sqlite3-unicode.c -o dist/unicode.dylib
@@ -75,7 +75,7 @@ compile-macos-x86:
gcc -fPIC -dynamiclib -I src src/sqlite3-ipaddr.c -o dist/x86/ipaddr.dylib -target x86_64-apple-macos10.12
gcc -fPIC -dynamiclib -I src src/sqlite3-json1.c -o dist/x86/json1.dylib -target x86_64-apple-macos10.12
gcc -fPIC -dynamiclib -I src src/sqlite3-math.c -o dist/x86/math.dylib -target x86_64-apple-macos10.12 -lm
gcc -fPIC -dynamiclib -I src src/sqlite3-re.c src/re.c -o dist/x86/re.dylib -target x86_64-apple-macos10.12
gcc -fPIC -dynamiclib -DPCRE2_CODE_UNIT_WIDTH=8 -DLINK_SIZE=2 -DHAVE_CONFIG_H -I src src/sqlite3-regexp.c src/regexp/regexp.c src/regexp/pcre2/*.c -o dist/x86/regexp.dylib -target x86_64-apple-macos10.12
gcc -fPIC -dynamiclib -I src src/sqlite3-stats.c -o dist/x86/stats.dylib -target x86_64-apple-macos10.12 -lm
gcc -fPIC -dynamiclib -I src src/sqlite3-text.c -o dist/x86/text.dylib -target x86_64-apple-macos10.12
gcc -fPIC -dynamiclib -I src src/sqlite3-unicode.c -o dist/x86/unicode.dylib -target x86_64-apple-macos10.12
@@ -91,7 +91,7 @@ compile-macos-arm64:
gcc -fPIC -dynamiclib -I src src/sqlite3-ipaddr.c -o dist/arm64/ipaddr.dylib -target arm64-apple-macos11
gcc -fPIC -dynamiclib -I src src/sqlite3-json1.c -o dist/arm64/json1.dylib -target arm64-apple-macos11
gcc -fPIC -dynamiclib -I src src/sqlite3-math.c -o dist/arm64/math.dylib -target arm64-apple-macos11 -lm
gcc -fPIC -dynamiclib -I src src/sqlite3-re.c src/re.c -o dist/arm64/re.dylib -target arm64-apple-macos11
gcc -fPIC -dynamiclib -DPCRE2_CODE_UNIT_WIDTH=8 -DLINK_SIZE=2 -DHAVE_CONFIG_H -I src src/sqlite3-regexp.c src/regexp/regexp.c src/regexp/pcre2/*.c -o dist/arm64/regexp.dylib -target arm64-apple-macos11
gcc -fPIC -dynamiclib -I src src/sqlite3-stats.c -o dist/arm64/stats.dylib -target arm64-apple-macos11 -lm
gcc -fPIC -dynamiclib -I src src/sqlite3-text.c -o dist/arm64/text.dylib -target arm64-apple-macos11
gcc -fPIC -dynamiclib -I src src/sqlite3-unicode.c -o dist/arm64/unicode.dylib -target arm64-apple-macos11
@@ -110,7 +110,7 @@ test-all:
make test suite=ipaddr
make test suite=json1
make test suite=math
make test suite=re
make test suite=regexp
make test suite=stats
make test suite=text
make test suite=unicode
2 changes: 1 addition & 1 deletion README.md
@@ -21,7 +21,7 @@ Think of them as the extended standard library for SQLite:
- [ipaddr](docs/ipaddr.md): IP address manipulation
- [json1](docs/json1.md): JSON functions
- [math](docs/math.md): math functions
- [re](docs/re.md): regular expressions
- [regexp](docs/regexp.md): regular expressions
- [stats](docs/stats.md): math statistics
- [text](docs/text.md): string functions
- [unicode](docs/unicode.md): Unicode support
2 changes: 2 additions & 0 deletions docs/re.md
@@ -1,5 +1,7 @@
# re: Regular expressions in SQLite

**⛔️ This extension is deprecated. Use [regexp](regexp.md) instead.**

Regexp search and replace functions.
Adapted from [regexp.old](https://github.com/garyhouston/regexp.old) by Henry Spencer.

121 changes: 121 additions & 0 deletions docs/regexp.md
@@ -0,0 +1,121 @@
# regexp: Regular Expressions in SQLite

Regexp search and replace functions. Based on the [PCRE2](https://github.com/pcre2project/pcre2) engine, this extension supports all major regular expression features (see the section on syntax below).

Provides the following functions:

### `REGEXP` operator

Checks if the source string matches the pattern.

```
sqlite> select true where 'the year is 2021' regexp '[0-9]+';
1
```
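SQLite has no regexp engine of its own: it rewrites `X REGEXP Y` into a call to a user function `regexp(Y, X)` and raises a "no such function" error unless an extension or the application registers one. The sketch below shows that wiring with Python's `sqlite3` module, using Python's `re` as a stand-in engine. This is an assumption for illustration only; the extension itself uses PCRE2, and the two engines differ in places.

```python
import re
import sqlite3


def regexp(pattern, source):
    # For "X REGEXP Y", SQLite calls regexp(Y, X): the pattern comes first.
    # Stand-in engine: Python's re module, not PCRE2.
    if pattern is None or source is None:
        return None
    # re.search is unanchored, so the pattern may match anywhere in the source.
    return 1 if re.search(pattern, source) else 0


conn = sqlite3.connect(":memory:")
conn.create_function("regexp", 2, regexp)

row = conn.execute("select 1 where 'the year is 2021' regexp '[0-9]+'").fetchone()
print(row)  # (1,)
```

Note the reversed argument order: a function registered under the name `regexp` receives the pattern first, which is the opposite of `regexp_like(source, pattern)`.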

### `regexp_like(source, pattern)`

Checks if the source string matches the pattern.

```
sqlite> select regexp_like('the year is 2021', '[0-9]+');
1
sqlite> select regexp_like('the year is 2021', '2k21');
0
```
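To make the semantics concrete, here is a hypothetical Python stand-in for `regexp_like(source, pattern)`, registered with `sqlite3`'s `create_function`. It uses Python's `re` rather than PCRE2 and is a sketch of the observable behavior in the examples above, not the extension's implementation.

```python
import re
import sqlite3


def regexp_like(source, pattern):
    # Hypothetical stand-in: 1 if the pattern matches anywhere in source, else 0.
    # re.search (not re.match) is used because re.match anchors at the start.
    return 1 if re.search(pattern, source) else 0


conn = sqlite3.connect(":memory:")
conn.create_function("regexp_like", 2, regexp_like)

results = {}
for pattern in ("[0-9]+", "2k21"):
    (result,) = conn.execute(
        "select regexp_like('the year is 2021', ?)", (pattern,)
    ).fetchone()
    results[pattern] = result
print(results)  # {'[0-9]+': 1, '2k21': 0}
```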

### `regexp_substr(source, pattern)`

Returns the first substring of the source string that matches the pattern, or NULL if there is no match.

```
sqlite> select regexp_substr('the year is 2021', '[0-9]+');
2021
sqlite> select regexp_substr('the year is 2021', '2k21');
(null)
```
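The same behavior can be sketched in Python as a hypothetical `regexp_substr` stand-in built on `re.search`, which finds the leftmost match. Returning `None` from a Python function registered with `sqlite3` yields SQL NULL, matching the `(null)` result above.

```python
import re
import sqlite3


def regexp_substr(source, pattern):
    # Hypothetical stand-in: return the first (leftmost) match,
    # or None (SQL NULL) when the pattern does not match.
    m = re.search(pattern, source)
    return m.group(0) if m else None


conn = sqlite3.connect(":memory:")
conn.create_function("regexp_substr", 2, regexp_substr)

found = conn.execute("select regexp_substr('the year is 2021', '[0-9]+')").fetchone()[0]
missing = conn.execute("select regexp_substr('the year is 2021', '2k21')").fetchone()[0]
print(found, missing)  # 2021 None
```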

### `regexp_replace(source, pattern, replacement)`

Replaces all matching substrings with the replacement string.

```
sqlite> select regexp_replace('the year is 2021', '[0-9]+', '2050');
the year is 2050
sqlite> select regexp_replace('the year is 2021', '2k21', '2050');
the year is 2021
```
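A hypothetical Python stand-in for `regexp_replace` illustrates the "replace all" semantics. It uses `re.sub` (whose default `count=0` replaces every non-overlapping match) instead of PCRE2, and passes the replacement through a function so Python does not interpret backslashes in it. Capture-group references are deliberately not handled in this sketch.

```python
import re
import sqlite3


def regexp_replace(source, pattern, replacement):
    # Hypothetical stand-in: re.sub's default count=0 replaces every
    # non-overlapping match; the lambda inserts the replacement literally.
    return re.sub(pattern, lambda _m: replacement, source)


conn = sqlite3.connect(":memory:")
conn.create_function("regexp_replace", 3, regexp_replace)

rows = [
    conn.execute(sql).fetchone()[0]
    for sql in (
        "select regexp_replace('the year is 2021', '[0-9]+', '2050')",
        "select regexp_replace('the year is 2021', '2k21', '2050')",
        "select regexp_replace('1 2 3', '[0-9]', 'x')",
    )
]
print(rows)  # ['the year is 2050', 'the year is 2021', 'x x x']
```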

Supports backreferences to captured groups `$1` through `$9` in the replacement string:

```
sqlite> select regexp_replace('the year is 2021', '([0-9]+)', '$1 or 2050');
the year is 2021 or 2050
```
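One way to picture the `$1`..`$9` expansion is the Python sketch below, which rewrites each `$N` in the replacement with the text of capture group N for every match. It uses Python's `re`, not PCRE2, and the function names are hypothetical. How the extension treats a reference to a group the pattern does not have is not stated here; this sketch simply keeps such a reference literally, which is an assumption.

```python
import re


def expand_refs(match, replacement):
    # Substitute $1..$9 in the replacement with the corresponding captured group.
    # Assumptions: a reference to a group the pattern lacks is kept as-is,
    # and a group that did not participate in the match expands to "".
    def one_ref(ref):
        n = int(ref.group(1))
        if n > match.re.groups:
            return ref.group(0)
        return match.group(n) or ""

    return re.sub(r"\$([1-9])", one_ref, replacement)


def regexp_replace(source, pattern, replacement):
    # Replace every match, expanding $N references per match.
    return re.sub(pattern, lambda m: expand_refs(m, replacement), source)


print(regexp_replace("the year is 2021", "([0-9]+)", "$1 or 2050"))
# the year is 2021 or 2050
print(regexp_replace("2023-02-03", r"(\d+)-(\d+)-(\d+)", "$3.$2.$1"))
# 03.02.2023
```

Such a function can be registered with `sqlite3`'s `Connection.create_function("regexp_replace", 3, regexp_replace)` like any other scalar function.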

## Supported syntax

Basic expressions:

```
. any character except newline
a the character a
ab the string ab
a|b a or b
\ escapes a special character
```

Quantifiers:

```
* 0 or more
+ 1 or more
? 0 or 1
{n} exactly n
{n,m} between n and m
{n,} n or more
```
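The quantifiers above use the same syntax in Python's `re` module, so a quick way to experiment with them is the sketch below. It is an illustration with Python's engine, not PCRE2. Each pattern is checked with `re.fullmatch`, so the quantifier has to account for the whole string.

```python
import re

# (pattern, text) pairs; fullmatch requires the pattern to cover the entire text.
cases = [
    (r"ab*c", "ac"),        # * : zero or more b
    (r"ab+c", "ac"),        # + : one or more b
    (r"colou?r", "color"),  # ? : zero or one u
    (r"\d{4}", "2021"),     # {n} : exactly four digits
    (r"\d{2,3}", "2021"),   # {n,m} : two to three digits
    (r"\d{2,}", "2021"),    # {n,} : two or more digits
]
results = [bool(re.fullmatch(pattern, text)) for pattern, text in cases]
for (pattern, text), ok in zip(cases, results):
    print(f"{pattern!r:12} {text!r:8} {ok}")
```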

Groups:

```
(...) capturing group
(?:...) non-capturing group
(?>...) atomic group
\N match the Nth captured group
```
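Capturing groups, backreferences, and non-capturing groups can be tried out with Python's `re`, which shares this syntax with PCRE2 (a stand-in for illustration, not the extension). Atomic groups `(?>...)` are left out because Python's `re` only supports them from Python 3.11.

```python
import re

# Capturing group + backreference \1: find a doubled word such as "the the".
doubled = re.search(r"\b(\w+) \1\b", "it was the the best")
print(doubled.group(0))  # the the

# Non-capturing group (?:...) repeats "ab" as a unit without creating group 1.
repeated = re.fullmatch(r"(?:ab)+", "ababab")
print(repeated is not None, repeated.re.groups)  # True 0
```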

Character classes:

```
[ab-d] one character of: a, b, c, d
[^ab-d] one character except: a, b, c, d
\d one digit
\D one non-digit
\s one whitespace
\S one non-whitespace
\w one word character
\W one non-word character
```
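The character classes behave the same way in Python's `re`, with one caveat worth checking: for `str` patterns Python's `\d` and `\w` are Unicode-aware, whereas PCRE2 matches only ASCII for them unless the pattern is compiled with the `PCRE2_UCP` option (whether this extension sets it is not stated here). The sketch passes `re.ASCII` to mimic PCRE2's default and shows the difference on a non-ASCII digit.

```python
import re

text = "id: A-17, b_2"

ranges = re.findall(r"[ab-d]", "abcdef")        # a, b, and the range c-d
negated = re.findall(r"[^ab-d]", "abcdef")      # everything else
digits = re.findall(r"\d", text, re.ASCII)      # ASCII digits only
words = re.findall(r"\w+", text, re.ASCII)      # letters, digits, underscore
non_space = re.findall(r"\S+", text)            # runs of non-whitespace

# U+0663 ARABIC-INDIC DIGIT THREE: a digit only under Unicode semantics.
unicode_digit = re.findall(r"\d", "\u0663")
ascii_digit = re.findall(r"\d", "\u0663", re.ASCII)

print(ranges, negated, digits, words, non_space, unicode_digit, ascii_digit)
```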

Assertions:

```
^ start of string
$ end of string
\b word boundary
\B non-word boundary
(?=...) positive lookahead
(?!...) negative lookahead
```
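Assertions match a position rather than characters. The sketch below exercises them with Python's `re`, which uses the same syntax as PCRE2 for these constructs (an illustration with Python's engine, not the extension).

```python
import re

# ^ and $ anchor the whole pattern to the start and end of the string.
whole = bool(re.search(r"^\d+$", "2021"))
partial = bool(re.search(r"^\d+$", "year 2021"))

# \b requires a word boundary; \B requires its absence.
standalone = re.findall(r"\bcat\b", "cat concat cats cat")
inside = re.findall(r"\Bcat\b", "cat concat cats")  # only the "cat" in "concat"

# Lookahead checks what follows without consuming it.
prices = "10 USD, 20 EUR, 30 USD"
usd = re.findall(r"\d+(?= USD)", prices)
not_usd = re.findall(r"\b\d+\b(?! USD)", prices)

print(whole, partial, standalone, inside, usd, not_usd)
```

The `\b...\b` around the negative lookahead matters: with a bare `\d+(?! USD)`, the engine would backtrack and match the `1` of `10`, because `1` is followed by `0` rather than ` USD`.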

## Usage

```
sqlite> .load ./regexp
sqlite> select regexp_like('abcdef', 'b.d');
```

[⬇️ Download](https://github.com/nalgeon/sqlean/releases/latest)
[✨ Explore](https://github.com/nalgeon/sqlean)
[🚀 Follow](https://twitter.com/ohmypy)
83 changes: 83 additions & 0 deletions src/regexp/pcre2/LICENSE
@@ -0,0 +1,83 @@
## PCRE2 LICENCE

PCRE2 is a library of functions to support regular expressions whose syntax
and semantics are as close as possible to those of the Perl 5 language.

Releases 10.00 and above of PCRE2 are distributed under the terms of the "BSD"
licence, as specified below, with one exemption for certain binary
redistributions. The documentation for PCRE2, supplied in the "doc" directory,
is distributed under the same terms as the software itself. The data in the
testdata directory is not copyrighted and is in the public domain.

The basic library functions are written in C and are freestanding. Also
included in the distribution is a just-in-time compiler that can be used to
optimize pattern matching. This is an optional feature that can be omitted when
the library is built.

## THE BASIC LIBRARY FUNCTIONS

Written by: Philip Hazel
Email local part: Philip.Hazel
Email domain: gmail.com

Retired from University of Cambridge Computing Service,
Cambridge, England.

Copyright (c) 1997-2022 University of Cambridge
All rights reserved.

## PCRE2 JUST-IN-TIME COMPILATION SUPPORT

Written by: Zoltan Herczeg
Email local part: hzmester
Email domain: freemail.hu

Copyright(c) 2010-2022 Zoltan Herczeg
All rights reserved.

## STACK-LESS JUST-IN-TIME COMPILER

Written by: Zoltan Herczeg
Email local part: hzmester
Email domain: freemail.hu

Copyright(c) 2009-2022 Zoltan Herczeg
All rights reserved.

## THE "BSD" LICENCE

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notices,
this list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright
notices, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.

* Neither the name of the University of Cambridge nor the names of any
contributors may be used to endorse or promote products derived from this
software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.

## EXEMPTION FOR BINARY LIBRARY-LIKE PACKAGES

The second condition in the BSD licence (covering binary redistributions) does
not apply all the way down a chain of software. If binary package A includes
PCRE2, it must respect the condition, but if package B is software that
includes package A, the condition is not imposed on package B unless it uses
PCRE2 independently.

End
1 change: 1 addition & 0 deletions src/regexp/pcre2/README.md
@@ -0,0 +1 @@
Extracted from the [PCRE2-10.42](https://github.com/PCRE2Project/pcre2/releases/tag/pcre2-10.42) release.