
Helper methods to cross-compile Unicode regular expressions in Haxe.


skial/regex


regex

Helper methods to cross-compile Unicode regular expressions.

Note

Currently, all the code in this repo has been pulled out of the rxpattern rewrite.

The regenerate.js library was created by @mathiasbynens. Its core functionality was ported to Haxe; see utf16/RangeUtil.hx.
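The UTF-16 fallback that this kind of range utility produces can be sketched roughly as follows. This is a hypothetical illustration, not the API of `utf16/RangeUtil.hx`: the class `RangeSketch` and its `toPattern` function are made up here, and the sketch assumes the input ranges are sorted, non-overlapping and contain no lone surrogates (U+D800-U+DFFF). BMP code points go into one character class; each astral code point is split into a high and a low surrogate, so an astral range becomes one to three `high low` alternatives that work without the `u` flag.

```haxe
/**
    Hypothetical sketch (not the RangeUtil.hx API): turns inclusive
    code point ranges into a UTF-16 pattern that needs no `u` flag.
    Assumes sorted, non-overlapping ranges without lone surrogates.
**/
class RangeSketch {
    // ASCII letters and digits stay literal; other code units are escaped.
    static function esc(cp:Int):String {
        if ((cp >= 0x30 && cp <= 0x39) || (cp >= 0x41 && cp <= 0x5A) || (cp >= 0x61 && cp <= 0x7A))
            return String.fromCharCode(cp);
        return cp <= 0xFF ? "\\x" + StringTools.hex(cp, 2) : "\\u" + StringTools.hex(cp, 4);
    }

    // One class item: a single code unit or an `a-b` span.
    static function item(a:Int, b:Int):String
        return a == b ? esc(a) : esc(a) + "-" + esc(b);

    // UTF-16 surrogate halves of an astral code point.
    static function high(cp:Int):Int return 0xD800 + ((cp - 0x10000) >> 10);
    static function low(cp:Int):Int return 0xDC00 + ((cp - 0x10000) & 0x3FF);

    // A bracketed class of code units, or the bare unit if there is only one.
    static function unitClass(a:Int, b:Int):String
        return a == b ? esc(a) : "[" + item(a, b) + "]";

    // An astral range becomes one to three `high low` alternatives:
    // a partial first high surrogate, full middle ones, a partial last one.
    static function astral(start:Int, end:Int):Array<String> {
        var hs = high(start), he = high(end);
        if (hs == he) return [esc(hs) + unitClass(low(start), low(end))];
        var out = [esc(hs) + unitClass(low(start), 0xDFFF)];
        if (he - hs > 1) out.push(unitClass(hs + 1, he - 1) + unitClass(0xDC00, 0xDFFF));
        out.push(esc(he) + unitClass(0xDC00, low(end)));
        return out;
    }

    public static function toPattern(ranges:Array<Array<Int>>):String {
        var bmp:Array<String> = [];
        var alts:Array<String> = [];
        for (r in ranges) {
            var start = r[0], end = r[1];
            // The BMP part of a range goes into the shared character class.
            if (start <= 0xFFFF) {
                bmp.push(item(start, end < 0xFFFF ? end : 0xFFFF));
                start = 0x10000;
            }
            // Anything above U+FFFF is expressed as surrogate pairs.
            if (end >= 0x10000) for (a in astral(start, end)) alts.push(a);
        }
        if (bmp.length > 0) alts.unshift("[" + bmp.join("") + "]");
        return alts.join("|");
    }

    static function main() {
        // A few lowercase-letter ranges, echoing the fallback in the Usage section.
        trace(toPattern([[0x61, 0x7A], [0xB5, 0xB5], [0xDF, 0xF6], [0x1E922, 0x1E943]]));
    }
}
```

For the sample ranges in `main`, `toPattern` returns `[a-z\xB5\xDF-\xF6]|\uD83A[\uDD22-\uDD43]`, the same shape as the fallback output in the Usage example. regenerate.js does considerably more (for example merging and optimising ranges); this sketch does not.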

Install

lix install gh:skial/regex

Dependencies

  • seri - Unicode blocks, scripts, classes & range information.
  • unifill - Haxe library for Unicode UTF{8/16/32} support

Tested Platforms

  • Tested ✅
  • Untested ➖
Php Python Java JVM C# Js/Node Interp Neko HashLink Lua CPP Flash

Usage

package ;

import be.Regex;

class Main {

    public static function main() {
        /**
            Returns either a regular expression category `\p{Ll}` or
            the equivalent range of codepoints, depending on the target.
        **/
        var Ll = Regex.category('Ll');
        /**
            Why `²-¹⁰-⁹`?
            `²-¹` is `\u00B2-\u00B9` and `⁰-⁹` is `\u2070-\u2079`. Superscript
            one, two and three live in Latin-1, so `⁰-⁹` alone would only
            include `⁰`, `⁴`, `⁵`, `⁶`, `⁷`, `⁸`, `⁹`.
            ---
            See https://codepoints.net/search?gc=No for more info.
        **/
        var term = '(' + Ll + Regex.pattern('[²-¹⁰-⁹]?') + ')';
        var repeat = Regex.pattern('(?:[ +]*)');

        /**
            The `u` Unicode flag is required. If you omit it, some
            targets can throw an exception.
        **/
        var regexp = new EReg(term + repeat, 'u');

        /**
            For regexp engines that support categories:
            - (\p{Ll}[²-¹⁰-⁹]?), (?:[ +]*)

            For those that don't (a few ranges are skipped here rather
            than showing all 1900+ codepoints):
            - [a-z\\xB5\\xDF-\\xF6\\xF8-\\xFF\\u0101\\u0103\\u0105...|\\uD83A[\\uDD22-\\uDD43]
            
        **/
        trace( term, repeat );

        trace( regexp.match("a⁴ + b³+c²") ); // true

        // "a⁴ + ", including the trailing space matched by `[ +]*`
        trace( regexp.matched(0) );
    }

}
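The reasoning behind `²-¹⁰-⁹` comes down to where the superscript digits sit: ¹, ² and ³ are U+00B9, U+00B2 and U+00B3 in Latin-1 Supplement, while ⁰ and ⁴ to ⁹ are U+2070 and U+2074-U+2079 in Superscripts and Subscripts. The hypothetical `SuperscriptRanges` class below (not part of this library) lists those code points and reports which digits a given code point range covers.

```haxe
/**
    Hypothetical check, not part of the library: which superscript
    digits fall inside a given inclusive code point range?
**/
class SuperscriptRanges {
    // Code points of the superscript digits 0 to 9, in digit order.
    public static var DIGITS = [0x2070, 0x00B9, 0x00B2, 0x00B3, 0x2074, 0x2075, 0x2076, 0x2077, 0x2078, 0x2079];

    // Digits (0-9) whose superscript form lies within lo..hi.
    public static function covered(lo:Int, hi:Int):Array<Int> {
        return [for (d in 0...10) if (DIGITS[d] >= lo && DIGITS[d] <= hi) d];
    }

    static function main() {
        // The `⁰-⁹` range covers digits 0, 4, 5, 6, 7, 8 and 9.
        trace(covered(0x2070, 0x2079));
        // The `²-¹` range covers digits 1, 2 and 3.
        trace(covered(0x00B2, 0x00B9));
    }
}
```

Note that U+00B2-U+00B9 also contains U+00B4-U+00B8 (´ µ ¶ · ¸), so `[²-¹]` matches a few characters that are not superscript digits.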