Visual C++ 2005 basic_istream::get(buf, size, delim) bug?

Post by Terry » Thu, 05 Oct 2006 03:30:24


Here's a seemingly simple program that doesn't work when built with Visual C++
2005, although it works as expected with g++ 3.4.4.
Perhaps I wandered into "undefined" territory somewhere.

==================================
File: try_get.cpp
==================================
#include <iostream>
#include <limits>

int main() {
    static char Line[1024];
    while (std::cin.get(Line, sizeof(Line), '\n')) {
        std::cin.ignore(std::numeric_limits<int>::max(), '\n');
        std::cout << ':' << Line << std::endl;
    }
} // main
===================================
Compiling: cl -EHsc try_get.cpp
Running: try_get < try_get.cpp
yields:
===================================
:#include <iostream>
:#include <limits>
===================================
Compiling: g++ try_get.cpp
Running: a < try_get.cpp
Yields:
===================================
:#include <iostream>
:#include <limits>
:
:int main() {
: static char Line[1024];
: while (std::cin.get(Line, sizeof(Line), '\n')) {
: std::cin.ignore(std::numeric_limits<int>::max(), '\n');
: std::cout << ':' << Line << std::endl;
: }
:} // main



--
[ See http://www.yqcomputer.com/ ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
 
 
 


Post by Alberto Ga » Thu, 05 Oct 2006 09:35:06

Terry G wrote:

There's nothing undefined: Visual C++ is correct and g++ is buggy. About
get(), §27.6.1.3/8 says: "If the function stores no characters, it calls
setstate(failbit)". So when the program reaches the empty line, failbit
shall be set and the loop shall end immediately.
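
A minimal sketch of that rule, assuming a conforming library and using std::istringstream in place of std::cin so that the input is fixed. The name lines_before_get_fails is made up for this illustration. The loop is the one from the original program, except that ignore() is given std::numeric_limits<std::streamsize>::max().

```cpp
#include <cstddef>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: runs the loop from try_get.cpp over `text`
// and returns how many lines were read before get() failed.
std::size_t lines_before_get_fails(const std::string& text) {
    std::istringstream in(text);
    char line[1024];
    std::size_t count = 0;
    // get() stops in front of '\n' without extracting it. If it stores
    // no characters at all, it sets failbit and the loop ends.
    while (in.get(line, sizeof(line), '\n')) {
        in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
        ++count;
    }
    return count;
}
```

Called with "a\nb\n\nc\n", a conforming library should return 2: the empty third line stops the loop, so "c" is never reached.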

Ganesh


Post by denise.kle » Thu, 05 Oct 2006 09:52:42

Hello Terry!


I'd say it is exactly the other way around. Of course, my expectation
takes the standard into account rather than wishful thinking. That is,
the first behavior is correct, the second wrong - assuming that both
implementations indeed use the same end of line sequence: if there is
actually a CR/LF sequence at the end of the line, both implementations
can conceivably be correct.


Nope: all well defined (well, if you discount the minor detail of not
including <istream> as is, strictly speaking, required by the standard
to get a definition of any of the stream classes rather than just the
declaration of the 8 standard stream objects that <iostream> provides).


For the above line, get(Line, sizeof(Line), '\n') will not store any
characters and thus sets failbit. If the line is terminated by a CR/LF
sequence, this will be transformed into a single '\n' character on
Windows-aware systems but remain a "\r\n" sequence on non-Windows systems, thus
storing one character ('\r') and not setting failbit. I would consider
both approaches to CR/LF handling to be valid since the standard is
silent about the details of how std::cin is implemented. In particular,
it does not spell out that it has to use a std::filebuf and, if so,
whether it uses text or binary mode: the CR/LF -> '\n' conversion is only
recommended for std::filebuf in text mode.
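
The difference an untranslated CR makes can be sketched as follows. This is only an illustration, not how std::cin is implemented: std::istringstream performs no end-of-line translation, so it stands in for a stream that leaves "\r\n" alone. The name first_get_fails is invented for the example.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: reports whether a single get() call on `text`
// sets failbit, i.e. whether it stored no characters.
bool first_get_fails(const std::string& text) {
    std::istringstream in(text);
    char line[1024];
    in.get(line, sizeof(line), '\n');
    return in.fail();
}
```

With "\nnext\n" the first call stores nothing and fails. With "\r\nnext\n" it stores the single '\r' and succeeds, which is the situation described above for a non-translating implementation.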


Note that the above line will always extract at most one character!
Only if the call to get() already hit EOF without failing (e.g. because
the last line is not properly terminated by an end of line character)
might the above line extract a different number of characters than one:
in this case it would extract no characters. See 27.6.1.3
(lib.istream.unformatted)/8-9 and /24-25 for details.

Good luck, Denise



Post by Terry » Thu, 05 Oct 2006 21:48:48

Hello Denise!

Yes, the standard is quite clear. I must have missed that in Josuttis'
excellent book.

This is the second post I've seen that suggests <istream> should strictly be
included.
I've never included <istream> before. Will you elaborate as to why
<iostream> isn't enough?

terry




Post by kanz » Thu, 05 Oct 2006 22:11:48

XXXX@XXXXX.COM wrote:


Formally speaking, both could be correct no matter what. It's
implementation defined what the implementation considers "end of
line".

In practice, I'd say that quality of implementation requires
that the usual conventions for the system be respected: an
implementation under Windows which considers the byte sequence
0x0d, 0x0a as a '\r' character, followed by a line separator (or
terminator---I'm not sure which CRLF is supposed to be under
Windows) is just as wrong as an implementation under Unix which
requires the two bytes. As far as the standard goes, if they've
documented the behavior as such, they are "correct", but from a
quality of implementation point of view...

Some implementations may accept additional sequences: at least
one Unix based implementation I know will accept either CRLF or
just LF as a line terminator. (Under Unix, the convention is
clear, and the LF is a terminator, and not a separator.) I'd
say that there's no problem with the quality of implementation
there, as long as they do accept the usual terminator (and they
are being "reasonable", of course---accepting the sequence
's',LF as a terminator, and replacing it with a single '\n'
would not be reasonable, IMHO, whereas accepting CR,LF is).





Unless, of course, the line contained some trailing spaces we
can't see:-).


Implementation defined, and at least one Unix system does accept
CRLF as a line terminator. (At least, that's what one of the
authors of the compiler told me; I've never tried it.) Also,
most Windows implementations will also treat a solitary LF as a
line separator (or terminator). (At least, those I've tested
all do. Since I maintain my files on Unix systems, my Windows
code does have to deal with isolated LF's, and I've never had a
problem with it in code I've compiled myself.)


I think you're talking about the compiler here. The standard
does require that std::cin be open in text mode. (At least, the
C standard required this of stdin. I can't imagine C++ being
different in this regard, but I'm too lazy to look it up.)

I think we really should take quality of implementation issues
into account, too. As far as the standard is concerned, an
implementation could define '@' as the line terminator; in which
case, his whole file would be a single, unterminated line, which
the implementation is then free to ignore. I wouldn't use such
an implementation, of course, and I suspect that I'm not alone.



What happens if the program encounters a line of more than 1023
characters?


--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

 
 
 


Post by Pete Becke » Thu, 05 Oct 2006 22:28:41



Because the standard doesn't say that it's enough. Look up the
requirements for <iostream>.

--

-- Pete

Author of "The Standard C++ Library Extensions: a Tutorial and
Reference." For more information about this book, see
www.petebecker.com/tr1book.


Post by denise.kle » Fri, 06 Oct 2006 06:26:16

Hello Terry!


The standard requires that inclusion of <iostream> declares the eight
standard stream objects. Nothing less but also nothing more. Now, you
can declare these objects without even defining their type, you only
need to declare their type. Except for an issue with template default
arguments (*), a valid implementation of <iostream> could look like
this:

#if !defined(_IOSTREAM)
#define _IOSTREAM
namespace std
{
template <typename> struct char_traits;
template <typename, typename> class basic_istream;
template <typename, typename> class basic_ostream;

extern basic_istream<char, char_traits<char> > cin;
extern basic_ostream<char, char_traits<char> > cout;
extern basic_ostream<char, char_traits<char> > cerr;
extern basic_ostream<char, char_traits<char> > clog;

extern basic_istream<wchar_t, char_traits<wchar_t> > wcin;
extern basic_ostream<wchar_t, char_traits<wchar_t> > wcout;
extern basic_ostream<wchar_t, char_traits<wchar_t> > wcerr;
extern basic_ostream<wchar_t, char_traits<wchar_t> > wclog;
}
#endif

I'm not aware of any implementation which really does this but there
are valid reasons to implement it this way. It may appear to result in
a bad user experience but once you have ported a major piece of code
from a heavily extended standard library to another platform you will
appreciate the benefit of a library providing only the bare minimum!

(*) Only the first declaration of a template using default template
arguments is allowed to provide them. Since basic_istream and
basic_ostream are defaulted on the second argument, the first
declaration actually has to provide these. Since a user can either
include <iostream> or <istream> first, some preprocessor logic is
necessary to make sure that the template default arguments go to the
right place.
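
The point that an object can be declared while its type is only declared can be tried out on a small scale. The sketch below uses an invented class demo::counter rather than the real stream classes, so it only illustrates the language rule, not any actual <iostream> header.

```cpp
// "Header" part: only declarations, the type stays incomplete here.
namespace demo {
    class counter;            // declared, not defined
    extern counter global;    // fine: extern declaration of an incomplete type
    // global.next();         // would not compile here: counter is incomplete
}

// "Full definition" part, playing the role of <istream> or <ostream>.
namespace demo {
    class counter {
        int value_;
    public:
        counter() : value_(0) {}
        int next() { return ++value_; }
    };
    counter global;           // the actual definition of the object
}

// Member access compiles only now that the class is complete.
int use_counter() { return demo::global.next(); }
```

A program that sees only the first part can name demo::global but cannot call anything on it, which is the situation a minimal <iostream> would leave you in for std::cin and std::cout.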

Good luck, Denise.



Post by denise.kle » Fri, 06 Oct 2006 06:27:55

Hello James!




You are right: in this case the ignore might skip over more than one
character. I'd use std::string with std::getline() anyway.

Concerning the mode of std::cin, the C++ standard is rather vague and
doesn't really say anything (27.3.1/1): "The object cin controls input
from a stream buffer associated with the object stdin, declared in
<cstdio>." That doesn't mean anything by itself. QoI, of course, sets
some expectations.

Good luck, Denise.



Post by Terry » Fri, 06 Oct 2006 08:32:23


static char Line[1024];
while (std::cin.get(Line, sizeof(Line), '\n')) {
    std::cin.ignore(std::numeric_limits<int>::max(), '\n');
    // Process line.
}



My intent was to only process the first 1023 characters and then ignore the
rest.

terry




Post by kanz » Fri, 06 Oct 2006 22:33:41


Which is exactly what the code does; I understand that. I was
responding to the claim that ignore would never ignore more than
one character, not questioning your code.



Post by kanz » Fri, 06 Oct 2006 22:44:35


That's generally what I'd do as well. On the other hand, my
programs don't read data from possibly hostile "foreign"
sources; I could imagine something like Terry's code being used
to explicitly limit resource use, for example, to prevent a DoS
attack.


Actually, it means quite a bit. The C standard requires that
stdin be open in text mode. (§7.19.3/7: "At program startup,
three *text* [emphasis added] streams are predefined and need
not be opened explicitly[...]".)



Post by Denise Kle » Sat, 07 Oct 2006 12:37:43

Hello James!




What is the impact of this statement on std::cin, however? It says that
std::cin is "associated" with stdin. Because I'm responding to articles
written by you, this forms some association but this doesn't tell
whether my behavior is anything similar to yours! Likewise, there is
not much specification of what it means that std::cin is associated with
stdin. There is probably something about reading characters
alternately from std::cin and stdin but I doubt there is much more.
In particular, I don't see anything which clearly imposes the
requirement that std::cin uses text mode in the same way stdin does.
QoI and consistency would surely imply that it does but if the standard
does not spell it out, it is not required.

Good luck, Denise
 
 
 


Post by cbarron » Sat, 07 Oct 2006 13:03:18


Another approach is to create a class holding all the needed
data [except the stream] and to provide an operator >> () for the class
that does exactly what you want with streambuf functions, setting eof and
fail upon actually reading 0 chars.
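// line_reader.h
#ifndef LINE_READER_H_
#define LINE_READER_H_
#include <cstddef>  // std::size_t
#include <iosfwd>   // forward declaration of std::istream

class line_reader
{
    char *buf;
    std::size_t len;  // maximum number of characters stored (size - 1)
    char delim;
public:
    // a: destination buffer, b: its size (must be at least 1), c: delimiter
    line_reader(char *a, std::size_t b, char c) : buf(a), len(b - 1), delim(c) {}
    std::istream & do_read(std::istream &in);
};

inline std::istream & operator >> (std::istream &in, line_reader &line)
{
    return line.do_read(in);
}
#endif
// line_reader.cpp
#include "line_reader.h"
#include <cstdio>     // EOF
#include <streambuf>
#include <istream>

std::istream & line_reader::do_read(std::istream &in)
{
    if(!in)
        return in;
    std::streambuf *sb = in.rdbuf();
    char *p = buf;
    char *q = buf + len;
    int c;

    // Store characters until EOF, the delimiter, or a full buffer.
    // The delimiter is tested before the buffer limit, so a line of
    // exactly size-1 chars does not swallow the line that follows it.
    while((c=sb->sbumpc())!=EOF && c!=delim && p!=q)
    {
        *p++ = c;
    }
    *p = '\0';
    // Buffer full and the line continues: discard the rest of the line.
    if(c!=EOF && c!=delim)
    {
        while((c=sb->sbumpc())!=EOF && c!=delim) {}
    }
    if(c==EOF)
    {
        // Fail only if nothing was stored, so that a final line without
        // a terminating delimiter is still delivered to the caller.
        in.setstate(p==buf ? std::ios_base::eofbit|std::ios_base::failbit
                           : std::ios_base::eofbit);
    }
    return in;
}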

A little 'low level' perhaps, but it reads up to size-1 chars or until a
delim is found, whichever occurs first. If size-1 chars are read without
a '\n', it eats everything up to and including the next delim unless it
hits end of file...

Usage is simple:
char buf[1024];
line_reader reader(buf,1024,'\n');
while(in_stream >> reader)
    process(buf);



Post by kanz » Sat, 07 Oct 2006 21:26:42


In this context, association has a bit stronger meaning than in
a more general context. But I agree, it's not a 100% guarantee.
At best, it is a very strong indicator of intent. (Note too
that for any reasonable meaning of "associate" here, most
systems will not be able to have the two streams open in
different modes, because any association would mean using the
same open file at the system level.)


Strictly speaking, no. I suppose one could imagine an
implementation where stdio used CR,LF as line separators, and
iostream used a simple LF as a line terminator. Practically...


Yes, and the expectations concerning QoI and consistency are
based on statements in the standard like the one you cited.
Just because a statement in the standard doesn't set a strict
requirement doesn't mean that it "doesn't really say anything".
(Although in this case, quite frankly, I can't see any reason
for not having a strict requirement.)

