Use grep to match files not containing a string?

Post by luca » Sat, 18 Oct 2008 07:41:00


Hello everybody,
I am trying to use regular expressions with egrep to do something
like:

string1 and string2 but not string3

while searching in a list of files.

I have seen that the "|" operator implements a logical OR, but what
about a logical AND? I ask that because my idea was creating an
expression like

string1 | string2 "&" ^string3 <------ "&" is just a dummy
symbol to get the idea.

How can I do that using regular expressions?

Many thanks for your help

Luca P. Lavorante
 
 
 


Post by Janis Papa » Sat, 18 Oct 2008 07:49:09


You should clarify the semantics of the 'AND' operator. Match on the
same line or anywhere in the file? The same for the NOT operator. And
don't forget to define the requested associativity of the operators.

If the pattern shall match if all subpatterns are on the same line, a
typical quick way to achieve that is

egrep 'string1.*string2|string2.*string1' | grep -v string3

If the subpatterns string1..3 can be on different lines of the file
things get complicated; then I'd use awk to solve that task.

Janis
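
The same-line pipeline above can be tried on a small sample file. This is only a sketch: the file contents are made up for illustration, the three patterns are the placeholders from the thread, and grep -E is used as the POSIX spelling of egrep.

```shell
#!/bin/sh
# Hypothetical sample data; string1..3 are the thread's placeholder patterns.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.txt" <<'EOF'
string1 then string2
string2 then string1 and string3
only string1 here
string2 before string1
EOF

# Keep lines containing both string1 and string2 (in either order),
# then drop those lines that also contain string3.
same_line_matches=$(grep -E 'string1.*string2|string2.*string1' "$tmpdir/sample.txt" | grep -v 'string3')

printf '%s\n' "$same_line_matches"
rm -rf "$tmpdir"
```

Only the first and last sample lines survive: the second line has both wanted strings but also string3, and the third lacks string2.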

 
 
 


Post by luca » Sat, 18 Oct 2008 08:01:52

Hi Janis, many thanks for your quick reply!

Well, I have to match the strings anywhere in the file, so I end up
having to use awk.
Would you please send me an example of how you would do that with awk?
Can a single awk command line do the job or do I have to write a
program? And which awk command should I use?

Thank you again

Luca

 
 
 


Post by Janis Papa » Sat, 18 Oct 2008 08:38:07


[Please don't top-post!]


Whatever command you like. Here's one possibility that might work
for non-binary files if you want to evaluate the return code...

awk 'BEGIN{RS=SUBSEP}
/string1/&&/string2/&&!/string3/{rc=1}END{exit(rc)}' your_file

Or if you want to print the matching filenames...

awk 'BEGIN{RS=SUBSEP}
/string1/&&/string2/&&!/string3/{print FILENAME}' your_files...


Janis
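
A quick way to check the filename-printing variant is to run it against a few throwaway files. The sketch below assumes that none of the files contains the SUBSEP character (so each file is read as a single record) and uses made-up file names and contents.

```shell
#!/bin/sh
# Hypothetical test files: one that qualifies, one excluded by string3,
# and one that is missing string2.
tmpdir=$(mktemp -d)
printf 'string1\nfiller\nstring2\n' > "$tmpdir/match.txt"
printf 'string1\nstring2\nstring3\n' > "$tmpdir/excluded.txt"
printf 'string1 only\n' > "$tmpdir/partial.txt"

# RS=SUBSEP makes awk treat each whole file as one record, so the three
# pattern tests apply to the file contents rather than to single lines.
matching_files=$(cd "$tmpdir" && awk 'BEGIN { RS = SUBSEP }
  /string1/ && /string2/ && !/string3/ { print FILENAME }' excluded.txt match.txt partial.txt)

printf '%s\n' "$matching_files"
rm -rf "$tmpdir"
```

Here only match.txt should be reported, even though string1 and string2 sit on different lines of that file.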
 
 
 


Post by luca » Sat, 18 Oct 2008 10:46:21

Hi Janis, the awk command worked perfectly!

Once again, thanks for your help,

Luca
 
 
 


Post by Ed Morto » Sat, 18 Oct 2008 20:12:36


For the OP: the above is fine unless your files are very large, in which case
you probably won't want to read the entire file into memory and you'll want to
exit ASAP:

awk '/string1/{f1=1} /string2/{f2=1} /string3/{f3=1;exit}
END{exit(f1&&f2&&!f3)}' your_file

awk 'FNR==1{if(f1&&f2&&!f3) print fname; f1=f2=f3=0; fname=FILENAME}
/string1/{f1=1} /string2/{f2=1} /string3/{f3=1;nextfile}
END{if(f1&&f2&&!f3) print fname}' your_files

Remove the ";nextfile" if you aren't using GNU awk.

Regards,

Ed.
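
Note that the single-file command above exits with status 1 when the file qualifies, which is the reverse of the usual shell convention where 0 means success. A minimal sketch of wrapping it for use in an if statement follows; has_wanted_strings is a hypothetical helper name and the sample files are invented for the demonstration.

```shell
#!/bin/sh
# Hypothetical wrapper: returns 0 (true) when the file contains string1 and
# string2 but not string3, by checking for the awk exit status of 1.
has_wanted_strings() {
    awk '/string1/{f1=1} /string2/{f2=1} /string3/{f3=1; exit}
         END{exit(f1 && f2 && !f3)}' "$1"
    [ "$?" -eq 1 ]
}

tmpdir=$(mktemp -d)
printf 'string1\nstring2\n' > "$tmpdir/match.txt"
printf 'string1\nstring3\nstring2\n' > "$tmpdir/excluded.txt"

results=""
for f in "$tmpdir/match.txt" "$tmpdir/excluded.txt"; do
    if has_wanted_strings "$f"; then
        results="$results${f##*/}:yes "
    else
        results="$results${f##*/}:no "
    fi
done

printf '%s\n' "$results"
rm -rf "$tmpdir"
```

Testing for exactly 1, rather than simply negating the status, keeps other non-zero exits (for example an unreadable file) from being counted as a match.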
 
 
 


Post by luca » Sun, 19 Oct 2008 04:47:46


Thanks Ed. My files are not large. But I wonder if I could perform a
recursive search with awk.
Can I do that?

Thank you

Luca
 
 
 


Post by Janis Papa » Sun, 19 Oct 2008 06:05:59


If you mean to recursively traverse the filesystem and apply the
awk program to a subset of files in the file hierarchy; yes, that
is possible but not the domain of awk, use find(1) and xargs(1)
for that. Something like... (see manual pages for details)

find <dir> <selection-criteria> | xargs awk '...'


Janis
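
As one possible concrete form of the recursive search, the sketch below uses find with -exec ... {} + in place of the find | xargs pipeline, since that variant passes file names containing spaces through safely. The directory layout, the '*.txt' selection and the file contents are all assumptions made up for the example; the awk program is the whole-file variant from earlier in the thread.

```shell
#!/bin/sh
# Hypothetical directory tree with matching and non-matching files.
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/sub/deeper"
printf 'string1\nstring2\n' > "$tmpdir/top.txt"
printf 'string2\nstring1\n' > "$tmpdir/sub/deeper/nested.txt"
printf 'string1\nstring2\nstring3\n' > "$tmpdir/sub/excluded.txt"

# find walks the tree and hands batches of file names to awk;
# awk prints each file that has string1 and string2 but not string3.
found=$(cd "$tmpdir" && find . -type f -name '*.txt' -exec awk 'BEGIN { RS = SUBSEP }
  /string1/ && /string2/ && !/string3/ { print FILENAME }' {} + | sort)

printf '%s\n' "$found"
rm -rf "$tmpdir"
```

The sort is only there to make the output order predictable, because find does not promise any particular traversal order.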
 
 
 


Post by Ed Morto » Sun, 19 Oct 2008 22:04:38

On 10/17/2008 2:47 PM, luca wrote:

See Janis' response for, I think, the most likely interpretation of your
question. Alternatively, if you want to, say, recursively include files and
search for some pattern in the result, then you could do the following:

This script will not only expand all the lines that say "include subfile", but
by writing the result to a tmp file, resetting ARGV[1] (the highest level input
file) and not resetting ARGV[2] (the tmp file), it then lets awk do any normal
record parsing on the result of the expansion since that's now stored in the
tmp file. If you don't need that, just do the "print" to stdout and remove any
other references to a tmp file or ARGV[2].

awk 'function read(file) {
    while ( (getline < file) > 0) {
        if ($1 == "include") {
            read($2)
        } else {
            print > ARGV[2]
        }
    }
    close(file)
}
BEGIN{
    read(ARGV[1])
    ARGV[1]=""
    close(ARGV[2])
}1' a.txt tmp

The result of running the above given these 3 files in the current directory:

a.txt            b.txt            c.txt
-----            -----            -----
1                3                5
2                4                6
include b.txt    include c.txt
9                7
10               8

would be to print the numbers 1 through 10 and save them in a file named "tmp".
Just change the final "1" to "/pattern/" to search for a pattern across all the
files.

Ed.