big vector <bool> unusable, causes asserts


Post by 440gt » Tue, 23 Feb 2010 16:27:04


On Win32, creating a large vector<bool> is problematic. Note that
max_size() is reported as 0xffffffff (4GB-1).

// problem 1: create 4GB-1 vector
vector <bool> v(0xffffffff); // assert!

// problem 2: create 2GB vector
vector <bool> v(0x80000000); // ok
v.clear(); // assert!

// problem 3: create vector just under 2GB
vector <bool> v(0x7fffffff); // ok
v.clear(); // ok
v.resize(0x7fffffff); // unusable: takes days to complete in a debug build
 
 
 


Post by Geof » Wed, 24 Feb 2010 05:44:52


One could say the vector template is 32-bit aware without regard to
platform limitations. The max_size() member returns the value -1 /
sizeof(type) without regard for any limits imposed by the CRT.
Presumably the same thing will happen in Win64 with -1 being
0xFFFFFFFFFFFFFFFF.

If you attempt to instantiate a vector of max_size you will bump into
the CRT heap limit for the platform in question. In this case you are
hitting the 2GB per user process virtual address space limitation in
Win32. But do not be dismayed, it also SIGABRTs in OS X on x86
platforms. :)

There is no way to determine a "safe" maximum size at compile time.
The maximum usable vector will depend on the total size of the program
and other data of the process. You should use vector::push_back() or
insert() to resize your vector and not try to use vector as you would
have used malloc(). This way you will be using the minimum amount of
space necessary to perform the work.
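
Since the usable limit is only known at run time, one option is to attempt the allocation and handle failure. The following is a minimal sketch, not code from this thread: try_resize_bits is a hypothetical helper name, and it assumes the library reports failure by throwing std::length_error or std::bad_alloc. A debug-build assertion like the ones reported above is not an exception and would not be caught this way.

```cpp
#include <cstddef>
#include <new>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not from the thread): try to resize v to n elements.
// Returns true on success, false if the library rejected the request.
bool try_resize_bits(std::vector<bool>& v, std::size_t n) {
    try {
        v.resize(n);
        return true;
    } catch (const std::length_error&) {
        // n exceeds what the container says it can ever hold
        return false;
    } catch (const std::bad_alloc&) {
        // the allocator could not supply the memory right now
        return false;
    }
}
```

A caller could start with the size it wants and fall back to a smaller size or a different data structure when the function returns false.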

I notice that resizing a vector<bool> takes much longer than resizing a
vector<char>. I am not sure why. You also don't say how much RAM your
system has. I ran my tests on a Win32 XP system with 4GB of RAM and got
about 500,000 page faults, and the process maxed out at about 2068MB.
If your system only has 1 or 2 GB you might end up thrashing the
pagefile, and this could make it take "days".

 
 
 


Post by 440gt » Wed, 24 Feb 2010 09:49:53

vector <bool> is a special case because it is bit packed. So a 4GB
vector will only occupy 512MB of memory. Thus, this should not hit any
process space limits.
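
The arithmetic behind that figure can be sketched as follows. packed_bytes is a hypothetical helper name, and it assumes exactly one bit per element, ignoring any word-size padding or allocator overhead a real implementation adds.

```cpp
#include <cstdint>

// Hypothetical helper (not from the thread): bytes needed to hold `bits`
// boolean values at one bit each, rounded up to a whole byte.
// Ignores word-size padding and allocator overhead.
constexpr std::uint64_t packed_bytes(std::uint64_t bits) {
    return bits / 8 + (bits % 8 != 0 ? 1 : 0);
}
```

Under that assumption, packed_bytes(0xffffffff) is 536870912 bytes (512MB) and packed_bytes(0x80000000) is 268435456 bytes (256MB).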

Even when a smaller size stays under the assert radar, resize() is so
slow that it is as good as broken. I have a modern system with
4GB RAM. resize() does not touch the hard disk. It is CPU bound for
days. For comparison, sizing it to the same size using the constructor
is instantaneous. Your proposed workaround of using push_back
takes a day or two to complete.
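
Given that the constructor is fast, one possible way around the slow resize() is to build a new vector at the target size and swap it in. This is a sketch only, with a hypothetical helper name, and it has not been timed on Visual Studio 2008. It discards the old contents, so it only fits cases like problem 3 where the vector was cleared first.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the thread): give v exactly n elements,
// all false, by constructing a fresh vector and swapping it in.
// Any previous contents of v are discarded.
void resize_via_constructor(std::vector<bool>& v, std::size_t n) {
    std::vector<bool> fresh(n);  // sized by the constructor, all false
    fresh.swap(v);               // v now owns the new storage
}                                // the old storage is released here
```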

This is Visual Studio 2008 with the latest packs.
 
 
 


Post by Geof » Wed, 24 Feb 2010 20:58:26


It's obviously not packed or you would not be hitting the assertion.
In my tests I see the elements as bytes in spite of the documentation.
Do you have a snippet that correctly packs?
 
 
 


Post by Geof » Wed, 24 Feb 2010 21:57:05

I see what you mean. The resize takes forever apparently due to the
initialization.

#include <vector>
#include <iostream>

using namespace std;

int main(int argc, char* argv[])
{
    vector<bool> v;
    vector<bool>::iterator it;

    v.resize(1000000000);            // the slow call: a billion bits
    cout << v.capacity() << endl;
    cout << v.size() << endl;
    for (it = v.begin(); it != v.end(); ++it)
        *it = true;                  // set every bit, one at a time
    return 0;
}
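
For comparison, the same end state (n elements, all true) can be reached without resize() and without the per-element loop. This is a sketch with hypothetical helper names. Whether it is actually faster in a VS2008 debug build has not been measured here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the thread): build a vector<bool> of n
// elements that are all true, using the fill constructor.
std::vector<bool> make_all_true(std::size_t n) {
    return std::vector<bool>(n, true);
}

// Hypothetical helper: set every element of an existing vector to true,
// keeping its current size.
void set_all_true(std::vector<bool>& v) {
    v.assign(v.size(), true);
}
```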
 
 
 


Post by 440gt » Thu, 25 Feb 2010 10:18:57

> It's obviously not packed

Wrong. Read the spec or better yet look at task manager memory usage
before and after creating such a vector.


Wrong. This is thousands of times quicker.


So in summary up to this point you have given a workaround that
doesn't work, claimed to know the source of the problem that actually
has nothing to do with it, and made several other statements that are
just plain false. I was hoping someone from Microsoft might come
across this thread and see something they might wish to fix, but your
comments are adding a lot of useless noise that is counterproductive
to the purely technical problem reported at the beginning.
 
 
 


Post by Geof » Thu, 25 Feb 2010 12:13:02


I have supplied you with this code, not as a workaround but as a
demonstration of your point. If your code performs differently, show
it. It does, in fact, pack the data in VS2008.


Then post your code. This code is still too slow to be useful for
gigabit bools.

I gave up on the process in debug mode after 27 minutes, which was
already a hell of a long time to initialize a billion bits IMO. I also
was able to see that in VC 2008 it IS packed. It's not packed in VC 6
which implemented vector<bool> as an array of bytes.

This is a public discussion group, not a private line to MS. I was
hoping you would shed some light on the problem but I can see you are
just another Internet know-it-all with a bad attitude.

Yes, it is slow because vector::resize() calls a protected insert
member. Go read the source in the vector implementation.


The implementation is flawed: it fails on
v.resize(v.max_size()/2). I expect the entire class fails in the same
way on all platforms, not just VS2008. Saying it is "not scalable" is
being generous. I haven't seen a genuine MS employee post here in two
years, and certainly not one from the VS development team.

If you had bothered to investigate the class you would know the origin
of the STL and know that it is STANDARD and not original with MS. The
class fails in the same way on other platforms. (I tested Linux and OS
X here.)

Have a nice day.