creating an (inefficient) alternating regular expression from a list of options


Post by Larry Bate » Wed, 10 Sep 2008 22:19:04



Perhaps I'm missing something, but your function call oneOf(...) is longer
than actually specifying the result.

You can certainly write quite easily:

    def oneOf(s):
        return "|".join(s.split())

-Larry
 
 
 


Post by skip » Wed, 10 Sep 2008 22:23:11


>> I really don't care if the expression is optimal. So the goal is
>> something like:

>> vowel_regexp = oneOf("a aa i ii u uu".split()) # yielding r'(aa|a|uu|u|ii|i)'

>> Is there a public module available for this purpose?

Check Ka-Ping Yee's rxb module:

http://www.yqcomputer.com/

It needs translating from the old regex module to the current re module, but
that shouldn't take a lot of effort. You can find the regex docs in old
Python distributions (probably through 2.1 or 2.2?). Also, check PyPI to see
if someone has already updated rxb for use with re.

Skip

 
 
 


Post by Marc 'Blac » Wed, 10 Sep 2008 22:39:20


I'd throw `re.escape()` into that function just in case `s` contains
something with special meaning in regular expressions.

Ciao,
Marc 'BlackJack' Rintsch
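
As a minimal sketch of that suggestion, escaping each option before joining keeps characters such as "." and "+" literal. The name one_of and the sample options are illustrative assumptions, not something posted in the thread:

```python
import re

def one_of(options):
    # Hypothetical helper: escape every option so regex metacharacters
    # are matched literally, then join the options with "|".
    return "|".join(re.escape(opt) for opt in options)

options = ["a.b", "c+"]          # sample data, chosen for illustration
escaped = one_of(options)
unescaped = "|".join(options)    # the unescaped join, for comparison

# Unescaped, "." matches any character, so "axb" is accepted by mistake.
print(bool(re.fullmatch(unescaped, "axb")))
# Escaped, only the literal strings are accepted.
print(bool(re.fullmatch(escaped, "axb")))
print(bool(re.fullmatch(escaped, "a.b")))
```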
 
 
 


Post by Nick Craig » Wed, 10 Sep 2008 23:36:24


I wrote one of these in Perl a while ago:

http://www.yqcomputer.com/

It recursively transforms the regular expression into more efficient
ones, and it uses regular expressions to do that. I considered porting
it to Python, but looking at the regular expressions made me feel weak
at the knees ;-)

--
Nick Craig-Wood < XXXX@XXXXX.COM > -- http://www.yqcomputer.com/
 
 
 


Post by Fredrik Lu » Thu, 11 Sep 2008 01:42:28


"|" works strictly from left to right, so that doesn't quite work since
it doesn't place "aa" before "a". A simple reverse sort will take care
of that:

    return "|".join(sorted(s.split(), reverse=True))

You may also want to apply re.escape to all the words, to avoid surprises
when the choices contain special characters.

</F>
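
Putting the reverse sort and the escaping together might look like the sketch below. It assumes the options arrive as a list, as in the original request, and it wraps the result in a group the way the requested r'(...)' output did. The name one_of is hypothetical. The reverse sort works here because a string always sorts after any of its own prefixes, so reversing the order puts "aa" ahead of "a".

```python
import re

def one_of(options):
    # Hypothetical helper combining the two suggestions in the thread.
    # Reverse lexicographic order puts a longer option before any option
    # that is a prefix of it, e.g. "aa" before "a".
    ordered = sorted(options, reverse=True)
    # Escape each option so special characters are matched literally.
    return "(" + "|".join(re.escape(opt) for opt in ordered) + ")"

vowel_regexp = one_of("a aa i ii u uu".split())
print(vowel_regexp)  # (uu|u|ii|i|aa|a)

# With "a" listed first, the shorter alternative wins on "aab".
short_first = re.match("a|aa", "aab").group()
# With the sorted pattern, the longer alternative is tried first.
long_first = re.match(vowel_regexp, "aab").group()
print(short_first, long_first)  # a aa
```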