CR-NL, NL and ftell

Post by Martin Joh » Mon, 21 Feb 2005 13:49:26


Hello

When opening a CR-NL file, ftell returns the length of the file with each
CR-NL counted as two bytes. Is it supposed to do so?

I am comparing two file-sizes, one CR-NL and one NL using ftell to get the
filesize. Any alternative suggestion is welcomed.

Thanks
- Martin Johansen
 
 
 

CR-NL, NL and ftell

Post by infobah » Mon, 21 Feb 2005 15:04:14


No. It is not ftell's job to return the length of a file.


What do you mean by file size? The number of disk clusters occupied by
the file, multiplied by the cluster size? The number of bytes that
while((ch = getc(fp)) != EOF) { ++count; } would count when the file
is opened in binary mode? Or text mode? The number of tape blocks the
file occupies multiplied by the tape block size?

Until the C world can agree on what "file size" means, there will
continue to be no standard way of finding out.

 
 
 

CR-NL, NL and ftell

Post by Chris Crou » Mon, 21 Feb 2005 21:09:21

On Sun, 20 Feb 2005 05:49:26 +0100, Martin Johansen



I assume that you did a seek to the end of the file first.

Yes, it can.

If the file is in binary mode, then yes, it will report the number of
characters in the file. This will be the number of characters which you
can read using getc() and counting them (which may be different from the
allocated space on disk or whatever). If it's in text mode, the only
thing guaranteed about ftell() is that it returns a value which can be
used later by fseek() to get to the same position; the value may have no
other relation to the size of the file at all.
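
A minimal sketch of the seek-then-tell approach described above, assuming a
binary stream on a system where seeking to SEEK_END is meaningful (the
Standard does not promise that). The helper name size_by_seek is made up
for illustration and does not come from the thread.

```c
#include <stdio.h>

/* Hypothetical helper: returns the position ftell() reports after seeking
 * to the end of a binary stream, or -1L if anything fails. */
long size_by_seek(const char *path)
{
    FILE *fp = fopen(path, "rb");      /* binary mode: no CR-LF translation */
    long pos = -1L;

    if (fp == NULL)
        return -1L;
    if (fseek(fp, 0L, SEEK_END) == 0)  /* not guaranteed meaningful everywhere */
        pos = ftell(fp);               /* ftell() itself returns -1L on error */
    fclose(fp);
    return pos;
}
```

On a system with byte-stream files, a file containing "a\r\nb\n" would
typically report 5 here, because the '\r' is counted like any other
character in binary mode.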


If you actually want the filesize, you'll have to use operating system
specific functions (on many systems, look for stat() and fstat()). If
you want to know the number of characters which can be read from a file
opened in text mode, the only way is to read it and count them. You
can't even guarantee that the value will be less than that returned by
opening it in binary mode.
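
As a rough illustration of the operating-system route, here is a sketch
using POSIX stat(). It assumes a POSIX system; stat() is not part of
standard C, and the helper name size_by_stat is hypothetical.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper, POSIX only: asks the operating system for the
 * size it has recorded for the file. Returns -1 on failure. */
long long size_by_stat(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return (long long)st.st_size;   /* size in bytes as recorded by the OS */
}
```

For a regular file this is the number of bytes the file system records,
which need not match the number of characters a text stream would deliver.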

Chris C
 
 
 

CR-NL, NL and ftell

Post by infobah » Mon, 21 Feb 2005 23:40:43


"A binary stream need not meaningfully support fseek calls with a
whence value of SEEK_END."
 
 
 

CR-NL, NL and ftell

Post by Thomas Mat » Tue, 22 Feb 2005 03:09:36


The best method to find the size of a file is to use
a platform specific method; not very portable though.

To get the number of characters in a file
open the file in binary mode, which disables
translations, and read each character using fread
while incrementing a counter.

As others have said, the ftell function returns the
current position in the file, which may not reflect
the number of characters in the file.
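
A sketch of the read-and-count method, assuming it is acceptable to read in
chunks rather than one character per fread call (the total is the same).
The helper name count_bytes and the 4096-byte buffer size are arbitrary
choices for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: counts the characters readable from a file opened
 * in binary mode. Returns -1L if the file cannot be opened or read. */
long count_bytes(const char *path)
{
    unsigned char buf[4096];
    size_t n;
    long total = 0;
    FILE *fp = fopen(path, "rb");   /* binary mode disables translation */

    if (fp == NULL)
        return -1L;
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        total += (long)n;           /* add the characters actually read */
    if (ferror(fp))                 /* loop ended on an error, not end-of-file */
        total = -1L;
    fclose(fp);
    return total;
}
```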

--
Thomas Matthews

 
 
 

CR-NL, NL and ftell

Post by Bart » Tue, 22 Feb 2005 12:16:48


..
..

I already know that feof does not work as one would expect.

Now I'm learning that ftell may not give an accurate file position and that
there is no guaranteed way to find the size of a normal disk file. You'd
think these are fairly basic file operations but C seems to want to make
life difficult.

Why doesn't feof work as expected, ie return True when positioned at
end-of-file? Is coding 'currfileposition' >= 'filesize' really that
difficult?

Why is it so awkward to get the size of a file anyway? What do cluster sizes
(on a modern OS) have to do with it?

And why bother with text mode with all it's pitfalls; reading or writing
cr-lf explicitly is not that hard is it?

Bart
 
 
 

CR-NL, NL and ftell

Post by Ben Pfaf » Tue, 22 Feb 2005 12:35:34

"Bart C" < XXXX@XXXXX.COM > writes:


The problem is that your suggested formulation requires feof() to
predict the future. If another process comes along and appends
to the file, then a read might succeed that you thought would
fail. If the stream is connected to an interactive device, such
as a keyboard, you'd have to know whether the user was going to
enter EOF next or not, and that's impossible in general.
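
To make the point concrete, here is a small sketch (the helper name drain is
hypothetical) showing that feof() only reports end-of-file after a read has
already failed. The loop tests the result of getc() rather than asking
feof() in advance.

```c
#include <stdio.h>

/* Hypothetical helper: reads the stream to exhaustion and records what
 * feof() said before the first read and after the last one. */
long drain(FILE *fp, int *eof_before, int *eof_after)
{
    long count = 0;
    int ch;                          /* int, not char, so EOF is distinguishable */

    *eof_before = feof(fp) != 0;     /* no read attempted yet */
    while ((ch = getc(fp)) != EOF)   /* the read itself tells us when to stop */
        ++count;
    *eof_after = feof(fp) != 0;      /* set only once a read hit end-of-file */
    return count;
}
```

Even on a freshly opened empty file, the first feof() call reports false;
the indicator is set only after getc() has tried to read and found nothing.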
--
"I don't have C&V for that handy, but I've got Dan Pop."
--E. Gibbons
 
 
 

CR-NL, NL and ftell

Post by Peter Nils » Tue, 22 Feb 2005 13:03:51


feof is no different to any other function, be it a standard function or
not. It will do what its specification says it will do. No more, no less.

If you program by guesswork and assumptions, without reading the
relevant specifications, then you can _expect_ programs to fail
in (often capricious) ways.


It will give an accurate file position in most cases, particularly those
where you need it to.


That's because C does not assume 'normal disk files'.

No. C is just more generic than most people wish it to be. But if you
only ever want to program vanilla machines, then you are free to do so.


How often do you *need* it to? Think about this carefully before
answering.


What if you don't (and can't) know what filesize is?


Again, how often do you *need* it?


On some file systems of old, there was no recording of 'logical' file
size, merely 'physical' file size. Files were precisely as big as the
space occupied on the disk. C allows null bytes to fill the trailing
disk space.

Hence, fseek-ing to the end of a binary file doesn't always get you
to the 'logical' end.


Again, you show your naivety. Some old file systems didn't use
end-of-line characters. Instead, they used fixed width records, padded
with either null or space characters. A text mode is needed for C
programs to operate consistently across a range of file systems.

All that said, most of what you think you *need* is available in
POSIX. So there's no need to get too upset. ;)

--
Peter
 
 
 

CR-NL, NL and ftell

Post by Jack Klei » Tue, 22 Feb 2005 13:41:16

On Sun, 20 Feb 2005 17:15:06 -0000, SM Ryan
< XXXX@XXXXX.COM > wrote in
comp.lang.c:



True for files opened in text mode, but:

"The ftell function obtains the current value of the file position
indicator for the stream pointed to by stream. For a binary stream,
the value is the number of characters from the beginning of the file."

So for a binary file, it is not a magic cookie but an actual value, as
long as the current file position is within the range of a positive
signed long.

--
Jack Klein
 
 
 

CR-NL, NL and ftell

Post by Jack Klei » Tue, 22 Feb 2005 13:46:43

On Sun, 20 Feb 2005 05:49:26 +0100, "Martin Johansen" < XXXX@XXXXX.COM >
wrote in comp.lang.c:


That depends. If the files are opened in binary mode, ftell() is
supposed to return a file position which is the exact number of
characters from the beginning of the file. There are no special
characters at all in binary mode, so most certainly every '\r' is
counted as a character whether or not it is immediately followed by a
'\n'.

But there is no way to guarantee that you are at the end of a binary
file unless you have read every single character in the file. fseek()
is not guaranteed to work for binary files in the way you expect.

For files opened in text mode, the value returned by ftell() is not
guaranteed to be useful for anything other than passing to fseek() to
return to the same point in the file. It need not have any
relationship to the size of the file in any meaningful way that your
program can use.
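
A sketch of the one use that is guaranteed for a text stream: treating the
ftell() result as an opaque bookmark and handing it back to fseek(). The
helper name reread_matches and the 128-character line buffers are
assumptions made for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line, bookmarks the position, reads the
 * next line, seeks back to the bookmark and reads again. Returns 1 if the
 * two reads produced the same line, 0 otherwise. */
int reread_matches(const char *path)
{
    char first[128], a[128], b[128];
    long mark;
    int ok = 0;
    FILE *fp = fopen(path, "r");     /* text mode */

    if (fp == NULL)
        return 0;
    if (fgets(first, sizeof first, fp) != NULL
        && (mark = ftell(fp)) != -1L            /* opaque bookmark */
        && fgets(a, sizeof a, fp) != NULL
        && fseek(fp, mark, SEEK_SET) == 0       /* return to the bookmark */
        && fgets(b, sizeof b, fp) != NULL)
        ok = strcmp(a, b) == 0;
    fclose(fp);
    return ok;
}
```

The value of mark is never inspected or used in arithmetic; it is only
passed back to fseek() with SEEK_SET on the same stream.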


If two text files, containing one or more lines, differ in the fact
that one contains only "\n" at the end of each line and the other
contains "\r\n", then they are indeed different sizes. What do you
expect?

--
Jack Klein
 
 
 

CR-NL, NL and ftell

Post by Eric Sosma » Tue, 22 Feb 2005 22:53:56


Yes. Keep in mind that feof() et al. operate on FILE*
streams, which may be connected to data sources (and sinks)
that are not fixed-size files. Explain, if you will, how
you would implement a "predictive" feof() on a stream taking
data from a TCP/IP socket, or even from your keyboard.


"There are more things in heaven and earth, Horatio,
Than are dreamt of in your philosophy."

Your experience with different file formats is clearly
not very extensive. Here are a few of the byte sequences
you might find in a file after puts("Hello") -- all of these
are from my own experience and none is a fabrication, although
I may have mis-remembered a detail here and there:

H e l l o \n
H e l l o \r
H e l l o \r \n
\005 \000 H e l l o \000
H e l l o \040 \040 \040... (75 spaces all told)
\006 \000 \001 H e l l o
H e l l o \n \032... (plus 121 garbage characters)

Thought question: Would you prefer to learn all the rules of
these (and many other) file formats and write that knowledge
into all your programs, or would it make more sense to use a
text stream to mediate between these and a standardized format?

<off-topic>

In a way, your inexperience can be seen as a Good Thing
and a sign of progress: You have seen few file formats because
the industry has learned to value simplicity, and inventing
strange new formats is not the cottage industry it once was.
In truth, simplicity is not the answer to all things; complex
formats have their purposes and can handle some circumstances
better than simple alternatives. Yet, it has turned out that
the simple designs are more widely applicable than was thought
(in large part because today's computers can afford to spend
more memory and processing power on interpreting them), so the
more complex formats are marginalized to the special purposes
where their strengths are indispensable. The casual and even
not-so-casual user encounters only the simple formats, and
begins to believe no others exist.

May I introduce you to my pet crow, "Whitey?"

</off-topic>

--
Eric Sosman
XXXX@XXXXX.COM
 
 
 

CR-NL, NL and ftell

Post by Bart » Wed, 23 Feb 2005 00:52:29


I wouldn't. I would treat disk files differently from devices such as
keyboards or i/o ports. The two kinds of data are different enough to
warrant a separate set of functions.

You may want to use C to implement another language and to emulate the
behaviour of that language's equivalent of feof(). C, being general purpose,
should be up to the job but sometimes it's not that easy.

I also found some time back on this newsgroup that reading a single key from
the keyboard was not part of standard C! This is a problem I remember from
mainframes in the 70s. It went away with microcomputers in the 80s, and now
with C it's come back again. 2 major revisions of the C standard and
something so basic is not in?


My specs for puts() say that '\n' is appended after the string argument.
Whether that means cr-lf, cr or lf I'm not sure, but it's best to assume any
of these when reading such a file. If you're getting all this extra garbage
after your data (I don't mean padding bytes to fill up a disk sector) then
I'd complain.


I've invented plenty of file formats. But to the OS or the C runtime, my
file should be just a bunch of data, namely a set of N bytes. And a text
file is a set of bytes sprinkled with cr and/or lf characters. The total size
of N bytes should (naturally) include those characters.

If the OS, disk controller, modem, whatever wants to add extra bytes to
that, that's fine provided they are transparent.

Bart
 
 
 

CR-NL, NL and ftell

Post by infobah » Wed, 23 Feb 2005 01:46:21

Bart C wrote:

<snip>


On the other hand, the stream model (complete with stdin, stdout,
and stderr) is extremely convenient much of the time. On platforms
where it makes sense to have a separate set of functions for
disk I/O or the console, a good implementation will typically
provide such functions as an extension. That way, you get the
best of both worlds - you can ignore console I/O in favour of
stream I/O if you need portability, or you can take advantage
of console I/O (at the expense of portability). The choice is
yours.


Take Pascal (where the equivalent of feof() is predictive). How
does it know whether the user is about to terminate keyboard
input? Unless Pascal can read minds, it simply can't know this.
So it can't do predictive feof on stdin except in cases where
it knows for sure that the input is being redirected from a
data source of known size.


Correct. You can read a single key from stdin, of course, but that
might not be attached to the keyboard. Or if your system's stream
I/O is line-buffered, you might not get the instant response for
which you may have been hoping.


Line-buffered I/O.


Well, in a way it is (see above). IIRC the Standard does not
/insist/ on line-buffered I/O for stdin. That's just the way
it normally turns out. On some systems (eg Linux) you get to
choose. But the microcomputer BASICs of the 80s didn't have
to worry about portability to all kinds of bizarre systems.
Programs only had to work "RIGHT HERE"; so unbuffered I/O was
easily supplied. Similarly, you can have that with C on
platforms where it makes sense. Witness the getch() of Borland
and Microsoft, the rather different getch() of ncurses, the
Conin() (if I remember rightly) of the Atari, and so on - all
available in C programs for their respective platforms and
implementations.

<snip>


That wouldn't do much good. The reality is that the world is wider
than many programmers realise, and there's a lot of variety out there.

<snip>

In text mode, they are. In binary mode, C *must* be able to see
every byte. That's part of its job!
 
 
 

CR-NL, NL and ftell

Post by Eric Sosma » Wed, 23 Feb 2005 01:55:18

Bart C wrote:


IMHO such a step would be a backwards step. The first
programming language I used had different I/O verbs for
different devices: "READ CARD," "PUNCH PAPER TAPE," and so
on. A program that had been written to generate output on
punched cards could not be persuaded to write a magnetic
tape instead; a program capable of both card and tape output
needed conditional logic at every output-generating point to
decide which verb was appropriate, which (as you can imagine)
was both ugly and bug-prone.

C's uniform I/O model relieves the programmer of such
headaches by pushing the details of handling different device
types out of the application arena and into the implementation.
Devices differ, of course, and this has two consequences: first,
some of the differences get "filed off" in the sense that the
unified I/O model can't exploit them (e.g., there's no portable
way for a C program to stream data to your sound card), and
second, some of the differences obtrude themselves through the
uniformity in ugly ways (e.g., setvbuf(), fflush() ...). Still
and all, I feel it's a pretty good trade most of the time.


"The keyboard" itself is not part of standard C.


C has a perfectly good way of reading a single character
(or getting an EOF or error indication) from any input source:
it's called getc(). However, different input sources have
different ideas about how and when to provide their characters.
Users of systems with keyboards often like to be able to edit
their input while composing it, so most platforms gather keyed
input in a "batch and commit" mode. Some platforms provide a
way to change the mode, but since different platforms do it
differently, the C language Standard cannot legislate one
platform's solution over the others'.

Remember always that the C Standard carries no authority
beyond that voluntarily granted it by organizations that choose
to adopt and/or require it. If the Standard had required a
discipline for keyboard handling that was difficult for some
systems to support, the users of those systems would have been
less inclined to adopt the Standard.


All my examples have the '\n' present. You can't see
it in some of them because the '\n' is encoded differently
in different storage schemes, but that doesn't mean it's
missing.


The point is that you *don't* get "all this extra garbage"
if you use a text stream to read the data back again: the C
library understands the local conventions for files, and mediates
between their idiosyncrasies and a simpler convention.


The text stream makes them so. And that's reason enough
(referring to your earlier message) to "bother with text mode
and all it's [sic] pitfalls."

--
Eric Sosman
XXXX@XXXXX.COM
 
 
 

CR-NL, NL and ftell

Post by Flash Gord » Wed, 23 Feb 2005 02:18:10

Bart C wrote:

That would make it far harder to implement programs that can take input
from a number of possible sources including a file based on command line
switches.


There is no language in which everything is easy.


Actually it never went away.

> 2 major revisions of the C standard and

I agree that it would be useful, but it was not added to the standard.


It adds a \n which then gets translated by the C implementation to
whatever the file system wants to indicate a new line.

> If you're getting all this extra garbage

Without all that other stuff if you loaded your "text" file in to a text
editor on the system, or passed it to anything else expecting a text
file, it would not work. That is because text files on those systems are
*defined* as using fixed length space padded lines, or lines where the
line length is indicated by the first byte of the line record or whatever.

Trying to write a simple text processing application that works on all
systems, including those with strange (to you) native text file formats,
*without* having the implementation take care of the details would be
a *major* problem.


*Your* file formats are. Just open them in binary mode and that is what
you get.

> And a text

No, a text file is whatever the OS defines a text file as being, which
can be a *lot* more complex.


The whole point of the way text streams are handled in C is that it
*does* make it transparent. You don't have to worry about whether it is
CR, CR/LF, LF, explicit length records, padded fixed length lines or
what. However, this means the file size on disk is *not* always the
number of characters you will read if you read all the way through it.
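
A sketch of how one might observe this, assuming a hypothetical helper
count_chars that counts what getc() delivers in a given fopen mode. Write a
line through a text stream, then count it once through a text stream and
once through a binary stream.

```c
#include <stdio.h>

/* Hypothetical helper: counts the characters getc() delivers when the
 * file is opened with the given mode ("r" for text, "rb" for binary).
 * Returns -1L if the file cannot be opened. */
long count_chars(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);
    long n = 0;
    int ch;

    if (fp == NULL)
        return -1L;
    while ((ch = getc(fp)) != EOF)
        ++n;
    fclose(fp);
    return n;
}
```

After writing "Hello\n" through a text stream, count_chars(path, "r")
should give 6 wherever the line round-trips unchanged. The "rb" count
depends on how the platform stores text: typically 6 where a line ends in
LF alone and 7 where it ends in CR-LF, and other conventions are possible.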
--
Flash Gordon
Living in interesting times.
Although my email address says spam, it is real and I read it.