Extracting one record from multiple-records textfile

Post by tom » Thu, 29 Nov 2007 10:37:58



Sure, but you can probably do the job more simply. You could collect a
line at a time until you've reached the end of a document, then pass the
whole document to the subroutine at once. I'm imagining a program
looking roughly like this:

my @document;
while (<INPUT>) {
    # start collecting at DOC-START, keep collecting once started
    push @document, $_ if @document or /DOC-START/;
    if (/DOC-END/) {
        process(@document);  # or whatever you need
        @document = ();      # empty again
    }
}
warn "unexpected EOF" if @document;
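
For anyone who wants to try the idea without a real input file, here is a minimal self-contained sketch of the same line-collecting loop. The sub name extract_documents and the sample text are made up for illustration, and the in-memory filehandle assumes Perl 5.8 or later; it returns the documents instead of calling process().

```perl
use strict;
use warnings;

# Collect the lines of each document, DOC-START through DOC-END inclusive.
# Returns a list of array refs, one per complete document.
sub extract_documents {
    my ($fh) = @_;
    my (@documents, @document);
    while (my $line = <$fh>) {
        # Start collecting at DOC-START; keep collecting once started.
        push @document, $line if @document or $line =~ /DOC-START/;
        if (@document and $line =~ /DOC-END/) {
            push @documents, [@document];
            @document = ();    # empty again for the next document
        }
    }
    warn "unexpected EOF\n" if @document;
    return @documents;
}

# Hypothetical sample input, read through an in-memory filehandle.
my $sample = "junk\nDOC-START\nfirst\nDOC-END\n\n"
           . "DOC-START\nsecond\nmore\nDOC-END\n";
open my $in, '<', \$sample or die "cannot open in-memory file: $!";
my @docs = extract_documents($in);
close $in;

printf "%d documents, %d lines in the last one\n",
    scalar @docs, scalar @{ $docs[-1] };
```

Lines outside a DOC-START/DOC-END pair (the "junk" line and the blank line here) are simply skipped, so blank lines inside or between documents do no harm with this approach.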

Good luck with it!

--Tom Phoenix
Stonehenge Perl Training

Post by pang » Thu, 29 Nov 2007 10:44:30

Hello,

Just to show another way to do it.

use strict;

# read one blank-line-separated block at a time
local $/="\n\n";
while(<DATA>) {
    next unless /^DOC-START\n(.*?)\nDOC-END$/sm;
    my $content = $1;
    parse($content);
}

sub parse {
    my $c = shift;
    print length($c),"\n";
}


__DATA__
DOC-START
content of the document
DOC-END

DOC-START
some text
DOC-END

DOC-START
rtreytgfbvb
DOC-END

DOC-START
sdfdf fdff ee
DOC-END

__END__


Post by krahn » Thu, 29 Nov 2007 11:07:53


Hello,



# Set the Input Record Separator
$/ = "\nDOC-END\n";

while ( my $doc = <FILE_HANDLE> ) {
    parsing_function( $doc );
}
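
A slightly fuller sketch of the same record-separator idea, in case it helps. The sub name read_documents and the sample text are hypothetical; it also assumes you want only the text between the markers, so it chomps the separator and strips everything up to DOC-START.

```perl
use strict;
use warnings;

# Read documents terminated by a DOC-END line; return the text between
# the DOC-START and DOC-END markers of each one.
sub read_documents {
    my ($fh) = @_;
    local $/ = "\nDOC-END\n";    # restored automatically when the sub returns
    my @bodies;
    while (my $doc = <$fh>) {
        # chomp removes the current value of $/ and returns the number of
        # characters removed; 0 means trailing text after the last DOC-END.
        next unless chomp $doc;
        # Drop everything up to and including the DOC-START line.
        $doc =~ s/\A.*?^DOC-START\n//ms or next;
        push @bodies, $doc;
    }
    return @bodies;
}

# Hypothetical sample input, read through an in-memory filehandle.
my $sample = "DOC-START\ncontent of the document\nDOC-END\n\n"
           . "DOC-START\nsome text\nDOC-END\n";
open my $in, '<', \$sample or die "cannot open in-memory file: $!";
my @bodies = read_documents($in);
close $in;

print length($_), "\n" for @bodies;
```

Using local inside a sub keeps the changed $/ from leaking into other code that reads line by line.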




John
--
use Perl;
program
fulfillment


Post by giuseppega » Fri, 30 Nov 2007 07:05:27


Thank you all for your answers. In particular, Jeff, I'm trying to
use your code. What does

local $/="\n\n";

do?

I've noticed that if there are some newlines between DOC_START, text,
and DOC_END, as in

DOC_START


gfghdfghdfghfgh

DOC_END

by doing

local $/="\n\n\n";

I get output (if I don't do that, $content is empty). Can you tell me
why?

Regards Giuseppe


Post by krahn » Fri, 30 Nov 2007 09:46:31


It sets the Input Record Separator to the string "\n\n" instead of the
default "\n".

perldoc perlvar



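Because you are changing the Input Record Separator and you have empty
lines inside the document: with "\n\n" as the separator, each "\n\n"
sequence ends a record, so no single record holds both DOC-START and
DOC-END. Choose one of the other methods posted.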
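
To see the effect for yourself, here is a small sketch that splits one document containing blank lines with two different separators. The helper name records_for and the sample text are made up for illustration, and the markers are written with a hyphen as in the earlier posts.

```perl
use strict;
use warnings;

# Split $text into records using $separator as the Input Record Separator.
sub records_for {
    my ($text, $separator) = @_;
    local $/ = $separator;
    open my $fh, '<', \$text or die "cannot open in-memory file: $!";
    my @records = <$fh>;
    close $fh;
    return @records;
}

# One document with blank lines between the markers and the text.
my $doc = "DOC-START\n\n\ngfghdfghdfghfgh\n\nDOC-END\n";

my @by_blank_line = records_for($doc, "\n\n");
my @by_end_marker = records_for($doc, "\nDOC-END\n");

# Count the records that contain both markers.
my $whole_blank = grep { /DOC-START/ && /DOC-END/ } @by_blank_line;
my $whole_end   = grep { /DOC-START/ && /DOC-END/ } @by_end_marker;

printf "separator \\n\\n:        %d records, %d with both markers\n",
    scalar @by_blank_line, $whole_blank;
printf "separator \\nDOC-END\\n: %d records, %d with both markers\n",
    scalar @by_end_marker, $whole_end;
```

With "\n\n" the document is cut into pieces at the blank lines, so a regex that needs both markers in one record cannot match; with "\nDOC-END\n" the whole document arrives as one record.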



John
--
use Perl;
program
fulfillment