Performance problems between vc6 and vc7.1 STL

Post by RWFzeUtld » Fri, 11 Nov 2005 02:27:04


While moving all our Visual Studio 6 C++ projects over to Visual Studio
.NET 2003 Enterprise Architect, we noticed that these applications no
longer performed as well.

On further investigation it appears to be a problem in the new STL
libraries, namely XTREE. I have written a small test harness to illustrate
the issues we are seeing, which I compiled and ran under VC6 and then under
VC7.1. The results are below (and the test harness source is available if
needed). It simply does 1 million iterations of each step, covering the
common operations: inserting, looking up values and deleting. std::list
seems to be OK, but std::map and std::set (both built on XTREE) are up to
10 times slower on some steps.

Has anyone else seen similar issues, or know if there is anything we can do
to increase the performance under Visual Studio .Net 2003?

Results from compilation under vc6:

std::list test 1000000 iterations
Results: Overall 625 millisecs.
Step 1 (insertions) took 297 ms
Step 2 (deletions) took 328ms
std::set test 1000000 iterations
Results: Overall 1172 millisecs.
Step 1 (insertions) took 594 ms
Step 2 (lookups) took 156ms
Step 3 (deletions) took 422 ms
std::map test 1000000 iterations
Lookup result = 1783293664
Results: Overall 1187 millisecs.
Step 1 (insertions) took 609 ms
Step 2 (lookups) took 156ms
Step 3 (deletions) took 422 ms

Results from compilation under vc7.1:

std::list test 1000000 iterations
Results: Overall 672 millisecs.
Step 1 (insertions) took 359 ms
Step 2 (deletions) took 313ms
std::set test 1000000 iterations
Results: Overall 4640 millisecs.
Step 1 (insertions) took 2906 ms
Step 2 (lookups) took 1328ms
Step 3 (deletions) took 406 ms
std::map test 1000000 iterations
Lookup result = 1783293664
Results: Overall 4984 millisecs.
Step 1 (insertions) took 3063 ms
Step 2 (lookups) took 1562ms
Step 3 (deletions) took 359 ms

Post by Tom Widmer » Fri, 11 Nov 2005 03:01:51


The first obvious question: did you compare release build to release
build? Debug build performance isn't really a big concern.

Tom


Post by Arnaud Deb » Fri, 11 Nov 2005 06:42:35


Be aware that the VC6 version is buggy in several ways when used in a
multithreaded context (basically, containers using <xtree> are not
thread-safe even when they are only being read). There is even a patch
from Dinkumware to fix the header ( http://www.yqcomputer.com/ ).
VC2003 does not exhibit this behaviour, but that may mean that its
implementation is slightly slower. It would be interesting to test
VC6 with the Dinkumware patches against VC7.1.

Arnaud
MVP - VC

Post by RWFzeUtld » Fri, 11 Nov 2005 17:16:06

I have tried comparing release to release and debug to debug, and I see the
same performance degradation between Visual Studio versions for these tests.
The results in my initial question were taken from release builds of the code.

Post by RWFzeUtld » Fri, 11 Nov 2005 17:30:02

We have already applied a Dinkumware update to the xtree file for use under
VC6, although I have also tried this test harness with the original and see
about the same performance under VC6. It is the version shipped with VC7.1
that seems to cause the problems (and the STL libraries in the VS 2005
release candidate have shown the same performance problems). My biggest
concern in the results is the lookups into the std::map, which are almost
exactly 10 times slower. If this is only due to fixes made for
multiprocessor machines then it is rather concerning.

Post by RWFzeUtld » Fri, 11 Nov 2005 17:33:03

Here is the code from my test harness so other people can try this for
themselves. It may be that the test harness code could be optimised, or that
I am missing something about the changes in VC7.1 that calls for the STL to
be used in a different way to get it to perform better, but this same code
performs vastly differently when compiled on the two Visual Studio platforms.

The test harness was written as a very basic console application, compiled
and run from the command prompt for each Visual Studio version.

#include "stdafx.h"
#include <windows.h>

#include <iostream>
#include <list>
#include <map>
#include <set>
#include <conio.h>

int main(int argc, char* argv[])
{
    DWORD dwStart, dwEnd, dwStep1, dwStep2;
    DWORD dwIterations = 1000000;
    DWORD dwCount;

    ////////////////////////////////////////////////////////////////////
    // std::list
    ////////////////////////////////////////////////////////////////////
    std::cout << "std::list test " << dwIterations << " iterations\n";
    std::list<DWORD> lstLongs;

    dwStart = GetTickCount();

    // Step 1, add things to a list
    for (dwCount = 0; dwCount < dwIterations; dwCount++)
    {
        lstLongs.push_back(dwCount);
    }
    dwStep1 = GetTickCount();

    // Step 2 remove things from the list
    while (lstLongs.size() > 0)
    {
        dwCount = lstLongs.front();
        lstLongs.pop_front();
    }

    dwEnd = GetTickCount();
    std::cout << "Results: Overall " << dwEnd - dwStart << " millisecs.\n"
              << "Step 1 (insertions) took " << dwStep1 - dwStart << " ms\n"
              << "Step 2 (deletions) took " << dwEnd - dwStep1 << "ms\n";
    ////////////////////////////////////////////////////////////////////

    ////////////////////////////////////////////////////////////////////
    // std::set
    ////////////////////////////////////////////////////////////////////
    std::cout << "std::set test " << dwIterations << " iterations\n";
    std::set<DWORD> setLongs;
    std::set<DWORD>::iterator iterSetVal;

    dwStart = GetTickCount();

    // Step 1, add things to a set
    for (dwCount = 0; dwCount < dwIterations; dwCount++)
    {
        setLongs.insert(dwCount);
    }
    dwStep1 = GetTickCount();

    // Step 2 lookup items in the set
    for (dwCount = dwIterations + 1; dwCount > 0; dwCount--)
    {
        iterSetVal = setLongs.find(dwCount - 1);
    }
    dwStep2 = GetTickCount();

    // Step 3 remove things from the set
    while (setLongs.size() > 0)
    {
        iterSetVal = setLongs.begin();
        setLongs.erase(iterSetVal);
    }
    dwEnd = GetTickCount();

    std::cout << "Results: Overall " << dwEnd - dwStart << " millisecs.\n"
              << "Step 1 (insertions) took " << dwStep1 - dwStart << " ms\n"
              << "Step 2 (lookups) took " << dwStep2 - dwStep1 << "ms\n"
              << "Step 3 (deletions) took " << dwEnd - dwStep2 << " ms\n";
    ////////////////////////////////////////////////////////////////////

    ////////////////////////////////////////////////////////////////////
    // std::map
    ////////////////////////////////////////////////////////////////////
    std::cout << "std::map test " << dwIterations << " iterations\n";
    std::map<DWORD, DWORD> mapLongs;
    std::map<DWORD, DWORD>::iterator iterMapVal;
    DWORD dwTotal = 0;
    dwStart = GetTickCo

Post by Ulrich Eck » Fri, 11 Nov 2005 18:07:46


[...]

Bad idea, use list::empty() instead.

[...]

Same here.

[...]

And here.

The point is that implementations are allowed to cache the number of
elements but aren't forced to. Since you usually don't need that number
anyway and you can use empty() to check if they are empty, many
implementations avoid this overhead. Therefore, size() really has to count
the elements - you can imagine the amount of time it takes.

Uli
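For anyone adapting the harness, the drain loops could be written against empty() along these lines. This is only a sketch: the function names drain_list and drain_set are made up for illustration, and unsigned long stands in for DWORD so that it builds without windows.h. Whether size() actually walks the container depends on the library in use; since C++11 the standard requires it to be constant time, so on current compilers the two forms should cost about the same.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <set>

// Hypothetical helper: empties a list from the front, testing empty()
// instead of size() > 0. Returns the number of elements removed.
std::size_t drain_list(std::list<unsigned long>& values)
{
    std::size_t removed = 0;
    while (!values.empty())
    {
        values.pop_front();
        ++removed;
    }
    assert(values.empty());
    return removed;
}

// Hypothetical helper: same idea for a set, erasing the smallest element
// on each pass, as the original harness does.
std::size_t drain_set(std::set<unsigned long>& values)
{
    std::size_t removed = 0;
    while (!values.empty())
    {
        values.erase(values.begin());
        ++removed;
    }
    assert(values.empty());
    return removed;
}
```

empty() is guaranteed to be constant time on every standard container, which is why it is the safer test when the only question is whether anything is left.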

Post by RWFzeUtld » Fri, 11 Nov 2005 18:31:05

I agree the test harness code is not optimal, and would certainly benefit
from your suggestion, but actually the deletions are not too bad on
performance (which is the only place this piece of code is used). The
performance issues are worse in areas not using the size() function at all.

Post by Tom Widmer » Fri, 11 Nov 2005 23:09:58


I think allocator differences are the main issue here. Make sure you
compare multithreaded DLL CRT to multithreaded DLL CRT. Certainly the
allocator has changed quite a lot between versions, and probably offers
less fragmentation, etc. in return for less performance under some
circumstances (and possibly less cache locality, giving worse std::map
lookup performance too). Consider using a small object allocator or a
better general allocator if allocation performance is your bottleneck.

Tom
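To make the small object allocator suggestion concrete, here is a rough sketch of one possible shape. It is not anything shipped with either compiler: FreeListAllocator and PooledMap are names invented here, the code assumes a C++11 compiler (so it would not build under VC6 or VC7.1 as written), it is not thread-safe, and recycled blocks are never handed back to the heap. It only shows the idea of keeping freed tree nodes on a free list so that later insertions skip the general-purpose heap.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <new>
#include <utility>

// Hypothetical allocator: recycles single-object blocks through a
// per-type free list. Not thread-safe; cached blocks are never released.
template <typename T>
struct FreeListAllocator
{
    typedef T value_type;

    FreeListAllocator() {}
    template <typename U>
    FreeListAllocator(const FreeListAllocator<U>&) {}

    T* allocate(std::size_t n)
    {
        if (n != 1)
        {
            // Array requests bypass the pool (no overflow check in this sketch).
            return static_cast<T*>(::operator new(n * sizeof(T)));
        }
        if (free_head() != nullptr)
        {
            // Reuse the most recently freed block.
            Block* block = free_head();
            free_head() = block->next;
            return reinterpret_cast<T*>(block);
        }
        return static_cast<T*>(::operator new(sizeof(Block)));
    }

    void deallocate(T* p, std::size_t n)
    {
        if (n != 1)
        {
            ::operator delete(p);
            return;
        }
        // Push the block onto the free list instead of freeing it.
        Block* block = ::new (static_cast<void*>(p)) Block;
        block->next = free_head();
        free_head() = block;
    }

private:
    // Large enough for either a T or the free-list link.
    union Block
    {
        Block* next;
        alignas(T) unsigned char storage[sizeof(T)];
    };

    static Block*& free_head()
    {
        static Block* head = nullptr;
        return head;
    }
};

// All instances share the per-type free list, so they compare equal.
template <typename T, typename U>
bool operator==(const FreeListAllocator<T>&, const FreeListAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const FreeListAllocator<T>&, const FreeListAllocator<U>&) { return false; }

// Hypothetical typedef matching the harness's map of DWORD to DWORD.
typedef std::map<unsigned long, unsigned long, std::less<unsigned long>,
                 FreeListAllocator<std::pair<const unsigned long, unsigned long> > >
    PooledMap;
```

Swapping PooledMap in for std::map<DWORD, DWORD> in the harness would show how much of the slowdown is allocation cost. If the numbers barely move, the allocator is probably not the bottleneck.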

Post by P.J. Plaug » Fri, 11 Nov 2005 23:14:48


I know of absolutely no reason why list, set, and map should have slowed
down, certainly not to the extent you're describing. For a time, Microsoft
was supplying a V6 version of <xtree> (for set and map) that went overboard
on locking and was probably *really* slow, but the replacement <xtree>
from our web site avoids those problems. We've also fixed some subtle
bugs in <xtree> since V6, so sets and maps are way more robust now;
but those fixes shouldn't cause any performance hit.

I have reason to believe that Microsoft is taking this report seriously, and
that better folks than I are studying the matter. If they locate any
problems in the STL code proper, we'll of course supply a rapid fix. But
meanwhile, I defer to Microsoft to report on any problems and fixes as they
find them.

P.J. Plauger
Dinkumware, Ltd.
http://www.yqcomputer.com/

Post by RWFzeUtld » Sat, 12 Nov 2005 00:11:11

Thanks for your reply. The comparisons were definitely done like with like
(I did try many combinations though).
We have found other postings regarding general slowness issues between the
two versions of Visual Studio (rather than STL specifically) and one of the
suggestions was to use the API call _set_sbh_threshold to re-enable the
small-block heap. We have tried this and do now get performance close to VC6
(and even a slight improvement over VC6), although I am not sure what
consequences there are for its use.

Kev
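For reference, the call would look roughly like the sketch below, made once at startup before any containers are populated. Several things here are assumptions rather than anything stated in the thread: the 1016-byte value is what I understand the VC6 default threshold to have been, the _MSC_VER < 1600 guard reflects my understanding that later CRTs no longer provide the function, and enable_small_block_heap and kSmallBlockThreshold are names made up for the example.

```cpp
#include <cassert>
#include <cstddef>

#if defined(_MSC_VER) && _MSC_VER < 1600
#include <malloc.h> // declares _set_sbh_threshold in these CRT versions
#endif

// Assumed value: believed to match the VC6 CRT's default threshold.
const std::size_t kSmallBlockThreshold = 1016;

// Hypothetical wrapper: asks the CRT to serve requests up to the
// threshold from its small-block heap. Returns true on success and
// false where the function is unavailable or the call fails.
bool enable_small_block_heap()
{
#if defined(_MSC_VER) && _MSC_VER < 1600
    // _set_sbh_threshold returns 1 on success and 0 otherwise.
    return _set_sbh_threshold(kSmallBlockThreshold) == 1;
#else
    return false;
#endif
}
```

Calling enable_small_block_heap() as the first statement of main() keeps the setting in place for every allocation the harness makes.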

Post by RWFzeUtld » Sat, 12 Nov 2005 00:20:10

Thanks for your reply. It seems the problems are not with the STL itself. I
have now tried the STL test harness with the Dinkumware STL 3.08 version, as
this can be made to compile under both VC6 and VC7.1. I see the same poor
results for VC7.1 and good results for VC6 (now using the same STL), which
means the problem is not the STL itself. It would appear not to be the VC7.1
compiler either, as I have also tried Intel's compiler (version 9.0) and see
the same poor results from code compiled in the VS7.1 IDE using Intel's
compiler.

We have found other postings regarding general slowness issues between the
two versions of Visual Studio (rather than STL specifically) and one of the
suggestions was to use the API call _set_sbh_threshold to re-enable the
small-block heap. We have tried this and do now get performance close to VC6
(and even a slight improvement over VC6), although I am not sure what
consequences there are for its use.

Kev

Post by ensdor » Sun, 04 Dec 2005 00:22:32

I have observed similar slowness with the set and map classes in vc7,
though I was not aware there was a slowdown from vc6. Based on your
comment, would it be reasonable to conclude that the difference is the
C run-time implementation? I believe it is different between the two
versions, and I am assuming that the Dinkumware version of the STL
includes only the C++ libraries and not the C run-time library.

The interesting thing (I thought) when I observed this was that the
data structures would get slower after the first use. I ran a test
where I would create and populate a map and delete it, and then repeat
these steps N times, re-creating the map object from scratch each time.
It was always significantly slower on iterations 2 through N than
on iteration 1.

I will try your suggestion of using the small block heap.

-Ken
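A test of the kind described above might look something like this. It is a reconstruction under assumptions, not the code that was actually run: time_map_rounds is a made-up name, the key and value types are guesses, and it uses std::chrono from C++11 rather than GetTickCount so that it builds outside Windows.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper: builds and destroys a map `rounds` times and
// returns the elapsed time of each round in milliseconds.
std::vector<double> time_map_rounds(std::size_t rounds, unsigned long elements)
{
    typedef std::chrono::steady_clock Clock;

    std::vector<double> elapsed_ms;
    elapsed_ms.reserve(rounds);

    for (std::size_t round = 0; round < rounds; ++round)
    {
        const Clock::time_point start = Clock::now();
        {
            // A fresh map every round, as in the described test.
            std::map<unsigned long, unsigned long> values;
            for (unsigned long i = 0; i < elements; ++i)
            {
                values[i] = i;
            }
            assert(values.size() == elements);
        } // The map is destroyed here, so teardown is part of the timing.
        const Clock::time_point stop = Clock::now();

        elapsed_ms.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }
    return elapsed_ms;
}
```

Printing the returned values for, say, five rounds of a million elements makes the pattern visible directly: if rounds 2 through N are consistently slower than round 1, the heap state left behind by the first teardown is the likely suspect rather than the container code.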

EasyKev wrote: