The Nerdary

Finally, a place where web developers blog.

PREG_SPLIT_DELIM_CAPTURE!

By Mark Huot, August 26th, 2010

I find myself splitting strings quite often. While some may argue that you can use regular expressions to achieve more or less the same thing, I often need to iterate over bits of a string with procedural code, and stuffing all of that into a single regex is just not feasible. For example, take the following string:

type_id = 4 AND location_id='6' AND assigned=3

What I’d love to do is work through this bit of pseudo SQL and automatically prefix each field with a table alias. Ideally the end result would be something like:

table.type_id = 4 AND table.location_id='6' AND joined_table.id=3

Now, just for fun, notice the last field is not attached to our primary table, but a joined table. To do this I will need to check each field and add the table alias if the field exists in our primary table or add a joined alias if the field exists in another table.
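As a rough sketch of that check, assume a hand-written list of which columns belong to which alias. The $columns_by_alias array and the qualify_field() helper below are made up for illustration and are not part of any library:

```php
<?php
// Hypothetical schema: which columns live under which table alias.
// Both the aliases and the column lists are invented for this example.
$columns_by_alias = array(
    'table'        => array('type_id', 'location_id'),
    'joined_table' => array('assigned'),
);

/**
 * Return "alias.field" for the first alias that owns $field,
 * or $field unchanged if no table claims it.
 */
function qualify_field($field, array $columns_by_alias)
{
    foreach ($columns_by_alias as $alias => $columns) {
        if (in_array($field, $columns, true)) {
            return $alias . '.' . $field;
        }
    }
    return $field;
}

echo qualify_field('type_id', $columns_by_alias), "\n";   // table.type_id
echo qualify_field('assigned', $columns_by_alias), "\n";  // joined_table.assigned
```

This sketch keeps the column name as-is, so assigned becomes joined_table.assigned. Mapping it to a differently named column, such as the id in the target string above, would need an extra lookup.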

The regular expression for this would look something like:

/(\S)(\s*(?:<=|<|>=|>|=|LIKE|NOT LIKE|IN|NOT IN)\s*?)['"]?\w['"]?/ie

But even that doesn’t hold up, because it assumes every value is a plain word-character value, optionally wrapped in quotes, and that’s not guaranteed in this situation. So instead I resort to splitting the string. The first split I’ll do is on my logical operators (AND or OR), like so:

preg_split('/\s+(AND|OR)\s+/i', $str, -1);

Pretty standard, and gives us:

[0] => type_id = 4
[1] => location_id='6'
[2] => assigned=3

Awesome, right?

Sort of. It’s lost the AND operator, and when I go to reassemble this string I’ll have no idea whether to put it back together with ANDs or ORs. Here’s where the PREG_SPLIT_DELIM_CAPTURE flag comes in: it tells preg_split to also return whatever the parenthesized part of the delimiter pattern matched. Run that same call like so:

preg_split('/\s+(AND|OR)\s+/i', $str, -1, PREG_SPLIT_DELIM_CAPTURE);

And you’re now left with:

[0] => type_id = 4
[1] => AND
[2] => location_id='6'
[3] => AND
[4] => assigned=3

Now when I’m reassembling things I can reassemble it with the proper operator.
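To make the reassembly step concrete, here is a small sketch. It changes each condition in a visible but harmless way (wrapping it in parentheses, which is only a stand-in for real processing) and then glues everything back together with implode(). The mixed AND/OR input string is an invented example:

```php
<?php
// Example input with mixed operators (made up for this illustration).
$str = "type_id = 4 AND location_id='6' OR assigned=3";

$parts = preg_split('/\s+(AND|OR)\s+/i', $str, -1, PREG_SPLIT_DELIM_CAPTURE);

// With one capture group, even indexes are conditions and
// odd indexes are the captured AND/OR operators.
foreach ($parts as $i => $part) {
    if ($i % 2 === 0) {
        $parts[$i] = '(' . $part . ')';
    }
}

// The whitespace around each operator was consumed by the split,
// so it is put back here as a single space.
$rebuilt = implode(' ', $parts);
echo $rebuilt, "\n";
// (type_id = 4) AND (location_id='6') OR (assigned=3)
```

Because the operators come back in their original positions, the AND and the OR each land exactly where they started.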

With that, I’ll leave you with the little loop I’m using to operate on this string. It’s not perfect and doesn’t handle parentheses yet, but it’ll do…

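$first_split = preg_split('/\s+(AND|OR)\s+/i', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
$result = array();
foreach ($first_split as $i => $split) {
    // Odd indexes hold the captured AND/OR operators; keep those as plain strings.
    // \b stops LIKE and IN from matching inside a field name such as "min_price".
    $result[] = ($i % 2) ? $split
        : preg_split('/\s*(<=|<|>=|>|=|\bLIKE\b|\bIN\b)\s*/i', $split, -1, PREG_SPLIT_DELIM_CAPTURE);
}
print_r($result);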

That leaves us with:

[0] => Array
    [0] => type_id
    [1] => =
    [2] => 4
[1] => AND
[2] => Array
    [0] => location_id
    [1] => =
    [2] => '6'
[3] => AND
[4] => Array
    [0] => assigned
    [1] => =
    [2] => 3
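
Putting the two splits together, one possible way to finish the job is sketched below. The prefix_fields() function and the $alias_of lookup are hypothetical names invented for this example. To keep it short it only handles the symbolic operators (no LIKE or IN), and like the loop above it ignores parentheses:

```php
<?php
// Hypothetical lookup of field name => table alias (not from the original post).
$alias_of = array(
    'type_id'     => 'table',
    'location_id' => 'table',
    'assigned'    => 'joined_table',
);

function prefix_fields($str, array $alias_of)
{
    $out = array();
    $clauses = preg_split('/\s+(AND|OR)\s+/i', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($clauses as $i => $clause) {
        if ($i % 2 === 1) {
            // Captured AND/OR: pass it through, restoring one space per side.
            $out[] = ' ' . $clause . ' ';
            continue;
        }
        // Split once on the first comparison operator. The capture group
        // includes the surrounding whitespace so the spacing survives.
        $bits = preg_split('/(\s*(?:<=|>=|<|>|=)\s*)/', $clause, 2, PREG_SPLIT_DELIM_CAPTURE);
        $field = $bits[0];
        if (isset($alias_of[$field])) {
            $bits[0] = $alias_of[$field] . '.' . $field;
        }
        $out[] = implode('', $bits);
    }
    return implode('', $out);
}

echo prefix_fields("type_id = 4 AND location_id='6' AND assigned=3", $alias_of), "\n";
// table.type_id = 4 AND table.location_id='6' AND joined_table.assigned=3
```

Fields missing from the lookup are left untouched, and the limit of 2 on the inner split means an operator character inside a value does not trigger a second split.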

Comments

  • Gil Gomez said…

    Exactly what I was looking for.  You rock Mark!

    Posted at 10:59 AM on October 11, 2011