beginner question about storing binary data in buffers, seeing binary data in a variable, etc

Post by darre » Sun, 06 Jul 2008 06:17:09


Hi there

I'm working on an assignment that has me store data in a buffer to be
sent over the network. I'm ignorant about how C++ stores data in an
array, and types in general.

If I declare an array of chars that is, say, 10 bytes long:
char buff[10];
does this mean that I can safely store 80 bits of data?

When I think of an array of chars, I think of each spot in the array as
a sequence of eight 1's or 0's. Is this a correct visualization? I guess
my question here is: why do most buffers seem to be implemented as char
arrays? Can any binary value between 0 and 255 (00000000 to 11111111)
be safely put into a char array slot? Why not implement a buffer
using uint8_t?

Obviously I have a very loose grasp on how buffers are saving data,
and how a receiver gets this data on their end. I understand the
sockets stuff, just not the buffer-specific stuff. Any enlightenment
would be most appreciated.

thanks.
 
 
 

Post by Daniel T » Sun, 06 Jul 2008 08:19:20


Use unsigned char for buffers instead of char. Yes, you can store at
least 80 bits of data, possibly more. The exact number of bits you can
store is 10 * CHAR_BIT.
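As a rough sketch of the 10 * CHAR_BIT arithmetic, assuming a C++11 or later compiler (constexpr and static_assert were not available when this thread was written), and with bits_in_buffer being a hypothetical helper name used only for illustration:

```cpp
#include <climits>   // CHAR_BIT
#include <cstddef>   // std::size_t

// Number of bits that an array of n chars (or unsigned chars) can hold.
// Hypothetical helper, not part of any library.
constexpr std::size_t bits_in_buffer(std::size_t n) {
    return n * CHAR_BIT;
}

// The standard guarantees CHAR_BIT >= 8, so a 10-element char array
// always holds at least 80 bits, and exactly 80 on most machines.
static_assert(CHAR_BIT >= 8, "guaranteed minimum width of a char");
static_assert(bits_in_buffer(10) >= 80, "char buff[10] holds at least 80 bits");
```

On the usual 8-bit-byte platforms bits_in_buffer(10) is exactly 80; it is larger only on machines with wider chars.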


Close. There may be more than 8 bits, and the first bit may be treated
specially if the char is signed. Better to use unsigned char.
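To actually see the 1's and 0's in one buffer slot, something like the following sketch may help. The function name to_bits is made up for this example; it takes an unsigned char so that no sign bit gets in the way, and it prints CHAR_BIT digits rather than assuming 8.

```cpp
#include <cassert>
#include <climits>   // CHAR_BIT
#include <string>

// Return the bits of one byte as text, most significant bit first.
// Hypothetical helper for illustration only.
inline std::string to_bits(unsigned char byte) {
    std::string bits;
    for (int i = CHAR_BIT - 1; i >= 0; --i) {
        bits += ((byte >> i) & 1u) ? '1' : '0';
    }
    assert(bits.size() == static_cast<std::string::size_type>(CHAR_BIT));
    return bits;
}
```

For instance, to_bits(5) ends in "00000101" and to_bits(255) ends in "11111111"; on a machine with 8-bit chars those are the whole strings.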


I don't think they are, I think most are implemented as unsigned char
arrays.


Look in limits.h. A 'char' is guaranteed to hold at least from 0 to 127
inclusive. A 'signed char' is guaranteed to hold at least from -127 to
127 inclusive. An 'unsigned char' is guaranteed to hold at least from 0
to 255 inclusive. So again, of the three, unsigned char is your best
choice.
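The guaranteed ranges above can be checked at compile time. This is only a sketch, assuming C++11 or later; holds_all_octets is a hypothetical name introduced here, not something from the standard library.

```cpp
#include <climits>   // CHAR_MAX, SCHAR_MIN, SCHAR_MAX, UCHAR_MAX
#include <limits>    // std::numeric_limits

// True if type T can represent every value from 0 to 255 inclusive.
// Hypothetical helper for illustration only.
template <typename T>
constexpr bool holds_all_octets() {
    return std::numeric_limits<T>::min() <= 0 &&
           std::numeric_limits<T>::max() >= 255;
}

// Minimum guarantees quoted in the post.
static_assert(CHAR_MAX >= 127, "plain char holds at least 0..127");
static_assert(SCHAR_MIN <= -127 && SCHAR_MAX >= 127, "signed char range");
static_assert(UCHAR_MAX >= 255, "unsigned char holds at least 0..255");

// Only unsigned char is guaranteed to pass this on every implementation.
static_assert(holds_all_octets<unsigned char>(), "safe for raw bytes");
```

Whether holds_all_octets<char>() is true depends on whether plain char is signed on the implementation in question, which is why unsigned char is the safer choice.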


No such type exists in the language.

 
 
 

Post by James Kanz » Sun, 06 Jul 2008 16:44:11


Most probably don't say anything about it, because it's
unspecified, except for the specification of how pointer
arithmetic works within arrays.

[...]


Transmission buffers, yes. char[] or unsigned char[] are really
your only two choices. (I generally use unsigned char, but the
C++ standard does try to make char viable as well. And on most
typical architectures, where converting between char and
unsigned char doesn't change the bit pattern, both work equally
well in practice.)
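For the original question about filling a transmission buffer, here is a minimal sketch of writing a 32-bit value into an unsigned char array and reading it back. The names put_u32_be and get_u32_be and the big-endian ("network order") layout are assumptions made for this example; the thread does not specify a wire format. unsigned long is used because it is guaranteed to have at least 32 bits.

```cpp
#include <cassert>

// Store the low 32 bits of value in out[0..3], most significant byte first.
// Hypothetical helper; the caller must provide at least 4 bytes.
inline void put_u32_be(unsigned char* out, unsigned long value) {
    assert(out != nullptr);
    out[0] = static_cast<unsigned char>((value >> 24) & 0xFFu);
    out[1] = static_cast<unsigned char>((value >> 16) & 0xFFu);
    out[2] = static_cast<unsigned char>((value >> 8) & 0xFFu);
    out[3] = static_cast<unsigned char>(value & 0xFFu);
}

// Rebuild the 32-bit value from in[0..3]; this is what a receiver would do.
inline unsigned long get_u32_be(const unsigned char* in) {
    assert(in != nullptr);
    return (static_cast<unsigned long>(in[0]) << 24) |
           (static_cast<unsigned long>(in[1]) << 16) |
           (static_cast<unsigned long>(in[2]) << 8) |
            static_cast<unsigned long>(in[3]);
}
```

Because only shifts and masks on unsigned values are used, the result does not depend on the byte order of the sending or receiving machine. With unsigned char buff[10], put_u32_be(buff, 0x12345678UL) leaves 0x12, 0x34, 0x56, 0x78 in the first four slots.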





On the other hand, I think in theory, char could be signed 1's
complement, and assigning a negative 0 (0xFF) could force it to
positive (which would mean that you could never get 0xFF by
assignment---but you could memcpy it in). I think: I'm too lazy
to verify in the standard, and of course, any implementation
that actually did this would break so much code as to be
unviable.



It's still a viable alternative.

It's possible to write code for binary network protocols in a
perfectly portable manner. It's rarely worth it, since it
entails some very complex workarounds for what are, in the end,
very rare and *** machines that most of us don't have to deal
with. Thus, I know that much of the networking software I write
professionally will fail on a machine with an unusual conversion
of unsigned to signed (i.e. which isn't 2's complement, and
doesn't just use the underlying bit pattern).
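One example of the kind of workaround meant here, as a sketch only: decoding a 16-bit two's complement value received from the wire without relying on the implementation-defined unsigned-to-signed conversion. The function names and the 16-bit width are assumptions for illustration, and the code assumes C++11 or later for constexpr.

```cpp
// Interpret the low 16 bits of raw as a two's complement number.
// Uses only arithmetic on in-range values, so the result does not depend
// on how the machine represents negative integers. Hypothetical helper.
constexpr long decode_i16(unsigned int raw) {
    return (raw & 0xFFFFu) < 0x8000u
        ? static_cast<long>(raw & 0xFFFFu)                      // 0..32767
        : -static_cast<long>(0xFFFFu - (raw & 0xFFFFu)) - 1;    // -32768..-1
}

// The opposite direction is easy: signed-to-unsigned conversion is defined
// by the standard as reduction modulo 2^N, which yields two's complement.
constexpr unsigned int encode_i16(long value) {
    return static_cast<unsigned int>(value) & 0xFFFFu;
}
```

The shortcut most code uses instead is a plain cast of the unsigned value to a signed type, which works on two's complement machines but is implementation defined.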

James Kanze (GABI Software) email: XXXX@XXXXX.COM
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
 
 
 

Post by Alf P. Ste » Sun, 06 Jul 2008 17:48:43

James Kanze:

I'm sorry, but while data storage is not completely specified, it's not unspecified.

The rules for pointer arithmetic are part of that specification.

The standard also imposes requirements on what order data are stored in arrays
and structures.



Again, sorry, but it's not always necessary to use char buffers. At some level
the data will be treated as just bytes, but that level need not be your C++ code
(e.g. MFC serialization is a counter-example). I think perhaps you had in mind
three unmentioned constraints, namely (1) that this is a lowest level that's
implemented in C++, (2) portable code, and (3) heterogeneous network with
arbitrary client on other end.



In that case any implementation for 1's complement that used 1's complement also
for signed 'char' would be unviable... :-)

It makes an interesting case for dropping that support in the standard, and
going for a requirement of two's complement for all signed integral types.



Yes, and the main reason for "why not" is that it's not a standard C++ type.



Tune up the warning level, perhaps? <g>


Cheers,

- Alf

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
 
 
 

Post by James Kanz » Sun, 06 Jul 2008 22:58:55

On Jul 5, 10:48 am, "Alf P. Steinbach" < XXXX@XXXXX.COM > wrote:
[...]


As it happens, in the two implementations I'm aware of where
signed integers are not 2's complement, plain char is unsigned,
thus avoiding the problem. (If one of the goals of plain char
is to contain text characters, then it really should be unsigned
anyway. Historically, however, making char unsigned had a
non-negligible runtime cost on a PDP-11, and since back then,
all the world was PDP-11, and the only character set which
counted was ASCII, which only uses the lower 7 bits...)


Perhaps a better choice would be to require that plain char be
unsigned, so that you could safely use it with the results of
e.g. istream::get() or fgetc(). (Or both, but I don't think
you'll find much support for either in the committee.)







What I'm waiting for is a machine which will core dump if the
conversion fails (i.e. doesn't result in the same value). I
don't really expect to see it, however, given that one standard
idiom in C is things like:

    int ch = fgetc( input ) ;
    while ( ch != EOF && someOtherConditions( ch ) ) {
        *p ++ = ch ;            // Where p is a char*...
        ch = fgetc( input ) ;   // Read the next character.
    }

It's a bit surprising that something this widespread is
implementation defined, and may result in an implementation
defined signal (according to the C standard---the C++ standard
still has the imprecisions of C90). Because it is so
widespread, however, I don't expect to see a compiler which
doesn't support it anytime soon. (As I said, all of the
"exotic" architectures that I know make plain char unsigned,
which effectively removes the "implementation defined" here.)
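A variant of that idiom which sidesteps the implementation-defined conversion is to store into unsigned char instead of plain char. This is a sketch using istream::get(); read_bytes is a hypothetical name, and the vector-based interface is just one possible choice.

```cpp
#include <cassert>
#include <cstddef>   // std::size_t
#include <istream>
#include <sstream>   // std::istringstream, for trying the function out
#include <string>    // std::char_traits
#include <vector>

// Read at most max bytes from in. get() returns an int that is either
// the character converted to unsigned char, or the end-of-file marker,
// so the cast to unsigned char below never changes the value.
inline std::vector<unsigned char> read_bytes(std::istream& in, std::size_t max) {
    std::vector<unsigned char> out;
    while (out.size() < max) {
        const int ch = in.get();
        if (ch == std::char_traits<char>::eof()) {
            break;
        }
        out.push_back(static_cast<unsigned char>(ch));
    }
    assert(out.size() <= max);
    return out;
}
```

Reading a stream that contains the byte 0xFF gives an element equal to 255 regardless of whether plain char is signed; storing the same int into a plain signed char is the implementation-defined step discussed above.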

James Kanze (GABI Software) email: XXXX@XXXXX.COM
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34