There's an implicit contradiction in what you're asking for. You want to
process the data by lines, but some lines are too big to be processed.
You're going to have to give up one of these. Before you make that decision,
however, you should ask yourself a couple of questions.
1) Is the OutOfMemoryError really being caused by an input line that is
too large? Will such lines be common or expected, and must your program
defend against them? Are the lines supposed to be shorter than a particular
length, such that a very long one constitutes an invalid input file?
2) How important is it that your data be processed by lines? Are you
scanning for something in particular, or are you just counting lines as you
go? Is each line parsed independently or scanned for data? As in part 1,
will there never be valid data after a particular length?
3) You say that reading and searching is too slow, but are you using a
BufferedReader? Also, what do you mean by "slow"? Your tests that use
readLine simply fail with an exception; perhaps the file is so large that
"slow" is normal.
I would guess that, realistically, you're going to have to give up the idea
of processing data by lines in order to protect your program from input
files that consist of 2.4 GB of data with no carriage returns at all.
To do this you have to change your input system so that it is not line
oriented but instead uses some other structure, such as words or phrases.
You say you've tried but that it takes too much time to search for the end
of line. Consider this: the readLine method must also search for (stop at)
the end of line, and if it can do so with reasonable performance, so can
you; the answer is probably in how you buffer the data. I would recommend
you look at a design centered on reading (buffering) a large chunk and
tokenizing it according to whatever you're looking for. This tokenizer
would refill the buffer when it runs low and handle the two unpleasant
cases of a token (a line, or whatever you're looking for) either spanning
multiple blocks or there being several within one block. It may even be
enough for your tokenizer simply to read a character at a time from a
BufferedReader and scan for what you're looking for.
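A minimal sketch of that last, character-at-a-time approach (I'm assuming
whitespace-delimited tokens here; the buffer size is a guess, and you would
substitute whatever delimiter your data actually uses):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class WordTokenizer {
    // Reads one character at a time from a BufferedReader (the buffering
    // makes per-character reads cheap) and emits whitespace-delimited
    // tokens. Memory use is bounded by the longest token, not the longest
    // line, so a huge file with no line terminators can still be scanned.
    private final BufferedReader in;

    WordTokenizer(Reader raw) {
        // 64 KB is an arbitrary, illustrative buffer size.
        this.in = new BufferedReader(raw, 64 * 1024);
    }

    // Returns the next token, or null at end of input.
    String next() throws IOException {
        int c;
        // Skip leading whitespace, including any line terminators.
        while ((c = in.read()) != -1 && Character.isWhitespace(c)) { }
        if (c == -1) {
            return null;
        }
        StringBuilder token = new StringBuilder();
        do {
            token.append((char) c);
        } while ((c = in.read()) != -1 && !Character.isWhitespace(c));
        return token.toString();
    }

    public static void main(String[] args) throws IOException {
        WordTokenizer t = new WordTokenizer(new StringReader("one two\nthree"));
        String w;
        while ((w = t.next()) != null) {
            System.out.println(w); // prints one, two, three on separate lines
        }
    }
}
```

Because the BufferedReader refills its own buffer, the two awkward cases
(a token spanning two buffered blocks, or several tokens inside one block)
are handled for you; the tokenizer only ever sees a stream of characters.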
Matt Humphrey XXXX@XXXXX.COM http://www.yqcomputer.com/