Regular expression to match only strings NOT containing particular words

Post by Dylan Nicholson » Sat, 20 Oct 2007 14:00:28


I can write a regular expression that will only match strings that are
NOT the word apple:

^(.{0,4}|[^a].*|a[^p].*|ap[^p].*|app[^l].*|appl[^e].*|apple.+)$
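An enumerated pattern has to cover every way a string can differ from "apple". That includes strings shorter than five characters and strings that diverge only at the last letter. The sketch below uses Python's re module as a stand-in for the Perl and .NET engines discussed in this thread; for these particular constructs the syntax is the same. It brute-forces every string of up to six characters over a small alphabet and compares two patterns against a plain != test: an enumerated pattern and a negative look-ahead version. The alphabet and the length limit are arbitrary choices.

```python
import itertools
import re

# Enumerated "anything but apple": every string of length 0-4, plus every
# longer string that diverges from "apple" somewhere or runs past it.
ENUMERATED = re.compile(r"^(.{0,4}|[^a].*|a[^p].*|ap[^p].*|app[^l].*|appl[^e].*|apple.+)$")

# The same condition expressed with a negative look-ahead.
LOOKAHEAD = re.compile(r"^(?!apple$).*$")

def all_strings(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            yield "".join(chars)

# Collect any string where a pattern disagrees with the plain comparison.
mismatches = [
    s for s in all_strings("aplex", 6)
    if bool(ENUMERATED.match(s)) != (s != "apple")
    or bool(LOOKAHEAD.match(s)) != (s != "apple")
]
print(mismatches)  # []
```

If the `.{0,4}` or `appl[^e].*` alternative is dropped, strings such as "app" or "applx" show up in `mismatches`.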

But is there a neater way, and how would I do it to match strings that
are NOT the word apple OR banana? Then what would be needed to match
only strings that do not CONTAIN the word "apple" or "banana" or
"cherry"?

I'd love it if the following worked:

^[^(apple)(banana)(cherry)]*$

But it appears the parentheses are not treated as grouping at all, as

^[(apple)(banana)(cherry)]*$

simply matches any string that consists entirely of the characters
a, b, c, e, h, l, n, p, r, y and the parentheses themselves.
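Inside square brackets, parentheses have no grouping meaning. A bracket expression is just a set of single characters, and [^...] is its complement. The small sketch below uses Python's re as a stand-in; Perl and .NET treat bracket expressions the same way. The test words are arbitrary.

```python
import re

# Inside [...], '(' and ')' are ordinary members of a set of single characters.
ONLY_FRUIT_LETTERS = re.compile(r"^[(apple)(banana)(cherry)]*$")
# A '^' right after '[' complements that set.
NO_FRUIT_LETTERS = re.compile(r"^[^(apple)(banana)(cherry)]*$")

for word in ["pear", "(berry)", "cabbage", "fig"]:
    print(word, bool(ONLY_FRUIT_LETTERS.match(word)), bool(NO_FRUIT_LETTERS.match(word)))
```

"pear" contains none of the three words. Yet the first pattern accepts it and the second rejects it, because both only look at individual letters.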

Post by Jürgen Exne » Sat, 20 Oct 2007 14:03:23


!(/apple/ or /banana/ or /cherry/)
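In Perl, each /.../ in that expression is matched against $_. The whole expression is therefore true when none of the three patterns matches. A rough Python equivalent keeps the searches separate and combines them with ordinary boolean logic; the function name has_no_fruit is made up for illustration.

```python
import re

FORBIDDEN = ["apple", "banana", "cherry"]

def has_no_fruit(line):
    """Rough mirror of Perl's !(/apple/ or /banana/ or /cherry/) on one line."""
    return not any(re.search(word, line) for word in FORBIDDEN)

for line in ["pineapple juice", "plain yoghurt", "cherry pie"]:
    print(line, has_no_fruit(line))
```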

jue


Post by Stephane C » Sat, 20 Oct 2007 17:03:59

2007-10-18, 22:00(-07), Dylan Nicholson:

With perl regexps:

perl -ne 'print if /^(?:(?!apple|banana).)*$/'
or probably better:
perl -ne 'print if /^(?!.*(?:apple|banana))/'

But then, why not

perl -ne 'print if !/apple|banana/'
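All three tests should accept and reject the same lines. One quick way to check this is to transcribe them to Python's re module, which uses the same syntax for (?:...) and (?!...). The sample lines are arbitrary and are given without the trailing newline that perl -n would leave on them.

```python
import re

TEMPERED = re.compile(r"^(?:(?!apple|banana).)*$")  # look-ahead before every character
ANCHORED = re.compile(r"^(?!.*(?:apple|banana))")   # one look-ahead at the start
PLAIN = re.compile(r"apple|banana")                 # match, then negate the result

lines = ["apple pie", "banana split", "pear tart", "crab apple", ""]
for line in lines:
    results = (
        bool(TEMPERED.match(line)),
        bool(ANCHORED.match(line)),
        not PLAIN.search(line),
    )
    print(repr(line), results)
```

The last form is the simplest. It is also the only one that does not depend on look-ahead support in the engine.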

Note that vim's regexps have an equivalent negative look-ahead
operator, \@!.

--
Stéphane

Post by Michele Do » Sat, 20 Oct 2007 20:11:29

On Thu, 18 Oct 2007 22:00:28 -0700, Dylan Nicholson

The general answer is that you should use separate regexen and logical
operators, or an explicit !~, but the subject of negating regexen is
discussed in some depth in the following thread at PerlMonks:

http://www.yqcomputer.com/


Michele
--
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
.'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,

Post by Jürgen Exne » Sun, 21 Oct 2007 01:40:25


Actually, come to think of it, there is no good reason to use an RE in the
first place, because you are only looking for literal substrings, without
any of the meta-functionality of REs. The proper tool for that much
simpler task is index().

jue

Post by Dylan Nicholson » Sat, 27 Oct 2007 10:18:45


Sure, except the regular expression mechanism is already in place as a
feature of the application. I was just curious if it could be used to
solve a particular problem.

Unfortunately "!(/apple/ or /banana/ or /cherry/)" doesn't work with
Microsoft's .NET regex library.

Thanks anyway,

Dylan

Post by Jürgen Exne » Sat, 27 Oct 2007 11:18:55


And index() is a function of native Perl itself.


Well, it is native Perl code; there is no need for some .NET regex library.

jue

Post by Jesse Houwing » Mon, 29 Oct 2007 04:45:49

Hello Dylan,

It isn't ideal, but this will do the trick:

^((?!\b(cherry|banana|apple)\b).)*$

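Make sure you set the Singleline option (so that . also matches newlines)
and unset the Multiline option (so that ^ and $ anchor to the whole string
rather than to each line) when the input can contain line breaks. If the
application is under your control, it would probably be easier to add a
checkbox that inverts the match result from success to failure.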
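The Singleline/Multiline advice concerns newlines in the input. Python's re.DOTALL roughly corresponds to .NET's RegexOptions.Singleline, and re.MULTILINE to RegexOptions.Multiline. The effect can therefore be sketched by analogy in Python; this is not .NET code. The pattern is the \b-free variant that comes up later in the thread.

```python
import re

# Forbidden-word check; re.DOTALL stands in for .NET's Singleline option.
PATTERN = r"^((?!cherry|banana|apple).)*$"

def accepts(text, flags=0):
    return re.match(PATTERN, text, flags) is not None

clean = "pear\nplum"
dirty = "pear\napple"

print(accepts(clean))                # False: '.' stops at the newline
print(accepts(clean, re.DOTALL))     # True
print(accepts(dirty, re.MULTILINE))  # True: '$' matches at the end of line 1
print(accepts(dirty, re.DOTALL))     # False
```

With the wrong options, a clean two-line string is rejected. Worse, a string containing "apple" on its second line is accepted.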

Though, as Jue pointed out, it's probably faster and easier to maintain
if you implement a "bad words" list and use IndexOf to check whether any
of them occurs in the string. For whole-word checks you might even use
\bword\b in a regex for each entry.
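A "bad words" check along those lines might look like the following Python sketch. The list, the function name and the whole_words switch are all hypothetical. str.find plays the role of .NET's String.IndexOf. In the \bword\b form, re.escape keeps any regex metacharacters in a word literal.

```python
import re

BAD_WORDS = ["apple", "banana", "cherry"]  # hypothetical list

def contains_bad_word(text, whole_words=False):
    """Return True if any entry of BAD_WORDS occurs in text."""
    for word in BAD_WORDS:
        if whole_words:
            # \b...\b only matches the word when it is not part of a longer word.
            if re.search(r"\b" + re.escape(word) + r"\b", text):
                return True
        elif text.find(word) != -1:  # plain substring test, like IndexOf
            return True
    return False

print(contains_bad_word("pineapple"))                         # True
print(contains_bad_word("pineapple", whole_words=True))       # False
print(contains_bad_word("an apple a day", whole_words=True))  # True
```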

Jesse Houwing
jesse.houwing at sogeti.nl

Post by Dylan Nicholson » Mon, 29 Oct 2007 06:04:53

On Oct 28, 6:45 am, Jesse Houwing < XXXX@XXXXX.COM >

Thanks...works great...why do you say it's not ideal? I removed the
\b's though, as I need to exclude any string that contains "apple",
regardless of whether it's a separate word.
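Removing the \b's changes which strings are rejected. With them, a forbidden word only counts when it stands on its own. Without them, any occurrence counts, even inside a longer word. Below is a Python sketch of the two variants, with re.DOTALL standing in for .NET's Singleline option. The sample strings are arbitrary.

```python
import re

WITH_B = re.compile(r"^((?!\b(cherry|banana|apple)\b).)*$", re.DOTALL)
WITHOUT_B = re.compile(r"^((?!(cherry|banana|apple)).)*$", re.DOTALL)

for text in ["bananas", "a banana", "crabapple jelly"]:
    # True means the string is accepted (no forbidden word found).
    print(repr(text), bool(WITH_B.match(text)), bool(WITHOUT_B.match(text)))
```

"bananas" and "crabapple jelly" pass the \b version but are rejected once the \b's are gone. "a banana" is rejected by both.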


Yes, we'll probably do something similar for the next version.

If the regex does the job, it's more than adequate for now.

Post by Jesse Houwing » Tue, 30 Oct 2007 09:02:42

Hello Dylan,

My guess is that it isn't the fastest solution, since the look-ahead has
to be re-evaluated at every character position.


OK, I didn't understand that from the original post. You can then also
remove the additional parentheses:

^((?!cherry|banana|apple).)*$
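With the whole string treated as one line, this pattern should accept exactly the strings in which none of the three words occurs as a substring. Below is a Python sketch that cross-checks it against a plain substring test. It uses re.DOTALL as the analogue of Singleline and randomly assembled test strings; the seed and the fragment list are arbitrary.

```python
import random
import re

# Final pattern; re.DOTALL stands in for .NET's Singleline option.
PATTERN = re.compile(r"^((?!cherry|banana|apple).)*$", re.DOTALL)
WORDS = ("cherry", "banana", "apple")

def regex_says_clean(text):
    return PATTERN.match(text) is not None

def substring_says_clean(text):
    return not any(word in text for word in WORDS)

# Build test strings from fruit fragments so that forbidden words appear often.
rng = random.Random(0)
pieces = ["app", "le", "ban", "ana", "cher", "ry", "x", "\n", " "]
samples = ["".join(rng.choice(pieces) for _ in range(rng.randint(0, 6)))
           for _ in range(5000)]

disagreements = [s for s in samples if regex_says_clean(s) != substring_says_clean(s)]
print(len(disagreements))  # 0
```

The two functions agree, but the regex runs its look-ahead once per character position. A plain substring search avoids that work, which is one reason to expect it to be faster.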


Good. Glad I was of help.

Jesse Houwing
jesse.houwing at sogeti.nl