Pyparsing has a really nice feature that I want in PLY. I want to

specify a list of strings and have them converted to a regular

expression.

A Perl module which does an aggressively optimizing job of this is

Regexp::List -

http://www.yqcomputer.com/ ~dankogai/Regexp-Optimizer-0.15/lib/Regexp/List.pm

I really dont care if the expression is optimal. So the goal is

something like:

vowel_regexp = oneOf("a aa i ii u uu".split()) # yielding r'(aa|a|uu|

u|ii|i)'

Is there a public module available for this purpose?

I was about to start a thread about a very similar problem; hope I'm

not hijacking this thread. However I am interested in a solution that

(1) scales to a potentially large number of alternate strings

(hundreds or thousands), which will hit, sooner or later, some max

limit in the builtin regexps and (2) is not limited to strings but can

be used for arbitrary objects. Therefore I am looking for a different

approach, perhaps a specialization of the general regexp search

algorithm.

More formally, given (a) an input sequence I and (b) a set of

'censored' sequences S, find and remove all the sequences in S that

appear in I. For instance, if

S = set([(a,), (a,b), (a,c), (c,), (d,a)])

and

I = (x, a, b, c, y, d, a, b), the filtered sequence would be:

O = (x, y, b)

i.e. the censored subsequences are (a,b), (c,), and (d,a).

As with regexps, if sequence `x` is a prefix of `y`, the longer

sequence `y` wins when both match (e.g. if (a,b) matches, it

supersedes (a,)).

For extra bonus, the algorithm should remove overlapping subsequences

too, i.e. for the input I above, the output would be (x, y) since both

(d,a) and (a,b) would match for (d,a,b).

With respect to complexity, I am mainly interested in len(S); len(I)

is small for my application, typically no more than 10. Of course, an

algorithm that scales decently in both len(S) and len(I) would be even

better. Any ideas or relevant links ?

George

PS: This is not a homework ;)

> >>>> I really dont care if theexpressionis optimal. So the goal is> > >>>> something lik>: >>

>>>> vowel_regexp = oneOf("a aa i ii u uu".split()) yielding r'(aa|>|uu>> > >> u>ii>i)'>>>

> >> Is there a public module available for this>pu>pose?

>

> Check Ka-Ping Yee's rx> m>dule:

>

> ttp://lfw.org/pyth<n/

Ok < http://www.yqcomputer.com/ ;thon/rxb.py>

suffers from the possibility of putting shorter match before longer

one:

def either(*alternatives):

options = []

for option in alternatives:

options.append(makepat(option).regex)

return Pattern('\(' + string.join(options, '|') +>'\)')

>lso, check PyPI>to see if

> someone has already updated rxb for use with re.

No one has - http://www.yqcomputer.com/ %3Aaction=search&term=rxb&submit=search

no results returned

yes, thank you very much:

import re

def oneOf(s):

alts = sorted(s.split(), reverse=True)

alts = [re.escape(s) for s in alts]

return "|".join(alts)

1. Need help creating regular expression for html lists...

2. Regular expression from end of another regular expression till end of line

3. Customizing Regular Expression Editor for Regular Expression Validator Control

4. Regular Expression Regex/Match fails if regular expression returns a null

5. Regular expression from end of another regular expression till end of line

6. Add custom regular expressions to the validation list of available expressions

7. regular expressions in javascript (was regular expressions in java)

9. Parsing (X)HTML with regular expressions, or not (was: Help with a regular expression)

10. C++ regular expression Vs. Perl regular Expression

11. Dynamically changing the regular expression of Regular Expression validator

12. Regular expression to match regular expressions?

13. Add custom regular expressions to the validation list of available expressions

14. Alternate Regular Expressions?

15. Regular Expressions to Split Lists Into Sub-Lists

4 post • Page:**1** of **1**