TYVRNET / SRL

  • Wednesday, 31 August 2016, 03:14:23
https://github.com/TYVRNET/SRL

PHP
Simple Regex Language

We all know Regular Expressions are hard to read. Once you've written one, you're happy if you never have to touch that line of code again, because you'll have a hard time understanding what you wrote.

Before we get started, a short note on how to use SRL: You can either use this project directly, or, if you don't work with PHP or don't want to include a library, you can build your query online and use the generated Regular Expression elsewhere:

https://simple-regex.com/build

An Example

Do Regular Expressions have to be bulky? No, they don't! Just have a look at this:

begin with any of (digit, letter, one of "._%+-") once or more,
literally "@",
any of (digit, letter, one of ".-") once or more,
literally ".",
letter at least 2,
must end, case insensitive

Or, if you like, an implementation in code itself:

$query = SRL::startsWith()
    ->anyOf(function (Builder $query) {
        $query->digit()
            ->letter()
            ->oneOf('._%+-');
    })->onceOrMore()
    ->literally('@')
    ->anyOf(function (Builder $query) {
        $query->digit()
            ->letter()
            ->oneOf('.-');
    })->onceOrMore()
    ->literally('.')
    ->letter()->atLeast(2)
    ->mustEnd()->caseInsensitive();

Yes, indeed, both examples are definitely longer than the corresponding regular expression:

/^([A-Z0-9._%+-])+@[A-Z0-9.-]+\.[A-Z]{2,}$/i

However, the above is much easier to read and definitely easier to maintain, isn't it? And to top it off: it's much harder to forget to escape something like a dot in SRL.

Let's go through it real quick:

  1. First, we require the matching string to start. This way, we make sure the match won't begin in the middle of something.
  2. Now, we're matching either a digit, a letter, or one of the literal characters ., _, %, + or -. We expect there to be one or more of them.
  3. We now expect exactly one @. Looks like an email address.
  4. Again, either digits, letters or . or -, once or multiple times.
  5. A dot. It separates the domain name from the TLD.
  6. At the end, we expect two or more letters for the TLD.
  7. We require the string to end now, to avoid matching stuff like invalid@email.com123.
  8. And of course, all of that should be case insensitive, since it's an email address.
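
The steps above map onto the regular expression piece by piece. As a rough sketch, the pattern can be assembled by hand in plain PHP, one fragment per step. This does not use SRL; the variable names are made up for illustration, and the capture group of the generated expression is left out for brevity.

```php
<?php
// One fragment per step of the walkthrough (hand-written, not SRL output).
$steps = [
    '^',               // 1. the string must start here
    '[A-Z0-9._%+-]+',  // 2. digits, letters or . _ % + -, once or more
    '@',               // 3. exactly one @
    '[A-Z0-9.-]+',     // 4. digits, letters, . or -, once or more
    '\.',              // 5. a literal dot
    '[A-Z]{2,}',       // 6. two or more letters for the TLD
    '$',               // 7. the string must end here
];

// 8. The trailing "i" flag makes the whole pattern case insensitive.
$emailRegex = '/' . implode('', $steps) . '/i';

$valid   = preg_match($emailRegex, 'sample@email.com');      // 1
$invalid = preg_match($emailRegex, 'invalid@email.com123');  // 0
```

The second call fails because of step 7: the digits after the TLD prevent the string from ending where the pattern requires it.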

Features

Using the Language

Above you can see two examples. The first one uses the language itself, the second one the Query Builder. Since writing the language is more fluent than chaining builder calls, we wanted to make things as easy as possible for you: just pass the query string to the SRL class.

$srl = new SRL('literally "colo", optional "u", literally "r"');
preg_match($srl, 'color'); // 1
$srl->isMatching('colour'); // true
$srl->isMatching('soup'); // false
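
For comparison, a hand-written pattern with the same meaning as that query might look like the sketch below. The exact expression SRL generates is not shown in this document, so `/colou?r/` is an assumption about an equivalent, not SRL's actual output.

```php
<?php
// Hand-written equivalent of: literally "colo", optional "u", literally "r"
$pattern = '/colou?r/';

$color  = preg_match($pattern, 'color');   // 1
$colour = preg_match($pattern, 'colour');  // 1
$soup   = preg_match($pattern, 'soup');    // 0
```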

Everything below applies to both the SRL language and the Query Builder.

Matching

SRL is as simple as the example above states. To retrieve the built Regular Expression which can be used by external tools like preg_match, either use the ->get() method, or just let it cast to a string:

preg_match($query, 'sample@email.com');

Of course, you may use the built-in match methods for an even easier approach:

$query->isMatching('sample@email.com'); // true
$query->isMatching('invalid-email.com'); // false
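
Passing the query object straight to preg_match works because PHP converts an object to a string when the object defines __toString(). The sketch below shows that mechanism with a hypothetical stand-in class, TinyPattern. It is not SRL's implementation, only an illustration of how get() and the string cast can yield the same expression.

```php
<?php
// Hypothetical stand-in for a built query; not part of SRL.
class TinyPattern
{
    private $regex;

    public function __construct(string $regex)
    {
        $this->regex = $regex;
    }

    // Explicit access to the built expression.
    public function get(): string
    {
        return $this->regex;
    }

    // Lets PHP cast the object wherever a string is expected.
    public function __toString(): string
    {
        return $this->get();
    }
}

$query = new TinyPattern('/^[a-z]+@[a-z]+\.[a-z]{2,}$/i');

$viaGet  = preg_match($query->get(), 'sample@email.com');  // explicit string
$viaCast = preg_match($query, 'sample@email.com');         // implicit cast
```

Both calls give the same result. Note that in a file using declare(strict_types=1) the implicit cast is not performed, so an explicit (string) cast or get() is needed there.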

Capture Groups

Since regular expressions aren't only used for validation, SRL supports capture groups as well. After defining the Regular Expression just like before, simply add a capture group that matches the query defined in the lambda function. Optionally, you can give that capture group a name (here, color):

// Using SRL
$regEx = new SRL('literally "color:", whitespace, capture (letter once or more) as "color", literally "."');

// Using the query builder
$regEx = SRL::literally('color:')->whitespace()->capture(function (Builder $query) {
    $query->letter()->onceOrMore();
}, 'color')->literally('.');

$matches = $regEx->getMatches('Favorite color: green. Another color: yellow.');

echo $matches[0]->get('color'); // green
echo $matches[1]->get('color'); // yellow

Each match is passed to an SRL\Match object, from which the captured values can be read with get().
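
In plain PCRE terms, a named capture like the one above roughly corresponds to a named group collected with preg_match_all. The pattern below is a hand-written approximation (the exact expression SRL builds is not shown here), and it returns plain arrays rather than SRL\Match objects.

```php
<?php
// Hand-written approximation of:
// literally "color:", whitespace, capture (letter once or more) as "color", literally "."
$pattern = '/color:\s(?<color>[a-z]+)\./';

$subject = 'Favorite color: green. Another color: yellow.';

// PREG_SET_ORDER groups the results per match instead of per capture group.
$count = preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);

$first  = $matches[0]['color'];  // green
$second = $matches[1]['color'];  // yellow
```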

Additional PCRE functions

Feel free to use all the available PCRE PHP functions in combination with SRL. But why bother? We've got wrappers for all common functions, with additional features. As above, simply call one of the following methods directly on the SRL or Builder object:

  • isMatching() - Validate if the expression matches the given string.
  • getMatches() - Get all matches for supplied capture groups.
  • getMatch() - Get first match for supplied capture groups.
  • replace() - Replace data using the expression.
  • split() - Split string into array through expression.
  • filter() - Filter items using the expression.
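
For orientation, here is what the underlying operations look like with PHP's native PCRE functions and a hand-written pattern. The source does not spell out how the wrappers are implemented, so the pairing named in the comments is an assumption, and the wrappers' additional features are not reproduced.

```php
<?php
$pattern = '/\d+/';  // hand-written example pattern: one or more digits

// Validate a string (compare isMatching()).
$isMatching = preg_match($pattern, 'room 42') === 1;   // true

// Collect all matches (compare getMatches()).
preg_match_all($pattern, 'a1b22c333', $all);
$found = $all[0];                                      // ['1', '22', '333']

// Replace every match (compare replace()).
$replaced = preg_replace($pattern, '#', 'a1b22c333');  // 'a#b#c#'

// Split on every match (compare split()).
$parts = preg_split($pattern, 'a1b22c333');            // ['a', 'b', 'c', '']

// Keep only matching array items (compare filter()); keys are preserved.
$kept = preg_grep($pattern, ['abc', 'a1', '22']);      // [1 => 'a1', 2 => '22']
```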

Lookarounds

In case you want some regular expressions to only apply in certain conditions, lookarounds are probably what you're searching for.

With queries like:

// SRL:
new SRL('capture (literally "foo") if followed by (literally "bar")');

// Query Builder:
SRL::capture(function (Builder $query) {
    $query->literally('foo');
})->ifFollowedBy(function (Builder $query) {
    $query->literally('bar');
});

you can easily capture 'foo', but only if this match is followed by 'bar'.

But to be honest, the Query Builder version is quite a lot of code for such a simple thing, right? No problem! We don't just support anonymous functions for sub-expressions; strings and Builder objects are supported as well. Isn't that great? Just have a look at one possible example:

SRL::capture('foo')->ifFollowedBy(SRL::literally('bar'));

If desired, lookbehinds are possible as well. Using ifAlreadyHad() you can validate a certain condition only if the preceding text contained a specific pattern.
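
In regex terms, these conditions are lookaround assertions. The sketch below uses hand-written patterns and plain preg_match. It assumes that ifFollowedBy() corresponds to a positive lookahead and ifAlreadyHad() to a positive lookbehind; the expressions SRL actually generates are not shown in this document.

```php
<?php
// Positive lookahead: capture "foo" only if "bar" follows.
$lookahead = '/(foo)(?=bar)/';

preg_match($lookahead, 'foobar', $m);
$captured = $m[1];                             // 'foo'
$whole    = $m[0];                             // 'foo' ("bar" is checked, not consumed)
$noMatch  = preg_match($lookahead, 'foobaz');  // 0

// Positive lookbehind: match "foo" only if "bar" comes directly before it.
$lookbehind = '/(?<=bar)foo/';

$after   = preg_match($lookbehind, 'barfoo');  // 1
$without = preg_match($lookbehind, 'bazfoo');  // 0
```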

Performance

The built Regular Expression will be cached, so you don't have to worry about it being created every time you call the match-method. And, since it's a normal Regular Expression under the hood, performance won't be an issue.
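
To illustrate what caching means here, the following is a minimal sketch of a wrapper that builds its expression once and reuses it afterwards. LazyPattern and its buildCount property are hypothetical names invented for this example; SRL's real caching code may look quite different.

```php
<?php
// Hypothetical memoizing wrapper; not SRL's actual implementation.
class LazyPattern
{
    private $builder;
    private $cached = null;
    public $buildCount = 0;  // only here to make the caching visible

    public function __construct(callable $builder)
    {
        $this->builder = $builder;
    }

    // Build the expression on first use, then return the cached string.
    public function get(): string
    {
        if ($this->cached === null) {
            $this->buildCount++;
            $this->cached = ($this->builder)();
        }
        return $this->cached;
    }

    public function isMatching(string $subject): bool
    {
        return preg_match($this->get(), $subject) === 1;
    }
}

$pattern = new LazyPattern(function () {
    return '/colou?r/';  // stands in for a more expensive build step
});

$a = $pattern->isMatching('color');
$b = $pattern->isMatching('colour');
$builds = $pattern->buildCount;  // 1: the second call reused the cached string
```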

Of course, building the expression may take some time, but in real life applications this shouldn't be noticeable. But if you like, you can build the expression somewhere else and just use the result in your app. If you do that, please keep the code for that query somewhere and link to it, otherwise the Regular Expression will be unreadable just as before.
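
One way to follow that advice is to store the generated expression in a constant and keep the SRL query right next to it in a comment. The sketch below uses the email expression from the example at the top; EMAIL_REGEX and isEmail() are names made up for this illustration.

```php
<?php
/**
 * Generated from this SRL query (kept here so the pattern stays readable):
 *
 *   begin with any of (digit, letter, one of "._%+-") once or more,
 *   literally "@",
 *   any of (digit, letter, one of ".-") once or more,
 *   literally ".",
 *   letter at least 2,
 *   must end, case insensitive
 */
const EMAIL_REGEX = '/^([A-Z0-9._%+-])+@[A-Z0-9.-]+\.[A-Z]{2,}$/i';

// Uses the prebuilt expression directly; SRL is not needed at runtime.
function isEmail(string $value): bool
{
    return preg_match(EMAIL_REGEX, $value) === 1;
}
```

If the query changes, regenerate the expression and update the comment and the constant together.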

Usage

Add the package to the require section of your composer.json file and update your project.

"require": {
    "tyvrnet/srl": "0.1.x-dev"
}

composer update

Things to do

We're definitely not done yet. There's much more to come. A short list of what's planned:

  • More functionality
  • More documentation
  • Variable support
  • Rule the world

License

SRL is published under the MIT license. See LICENSE for more information.

Contribution

Like this project? Want to contribute? Awesome! Feel free to open some pull requests or just open an issue.