I am afraid your approach is basically wrong.
You should clearly realize that, *IN ACTUALITY*, your threads cannot
all run simultaneously - the maximal *THEORETICALLY* possible number
of threads that may run at the same time equals the number of CPUs
on the target machine.
Therefore, by taking the above approach, you waste A LOT of memory
(each thread has its own stack, plus don't forget about the ETHREAD
structure that has to be allocated for every thread), and you make the
thread dispatcher waste time on unnecessary context switches.
In other words, by creating 40 threads you degrade, rather than
improve, your performance, and waste system resources.
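
To make the point concrete, here is a minimal sketch of how the worker
count could be capped at the number of CPUs instead of being hard-coded
to 40. It is only an illustration, not a drop-in fix: the function names
(cpu_count, worker_count, excess_stack_bytes) are hypothetical, it
assumes user-mode code (GetSystemInfo on Windows, with
sysconf(_SC_NPROCESSORS_ONLN) as a fallback elsewhere), and the
per-thread stack size is a parameter you have to supply yourself.

```c
#include <stddef.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <unistd.h>
#endif

/* Hypothetical helper: number of logical CPUs visible to this process.
   Returns at least 1 even if the query fails. */
unsigned cpu_count(void)
{
#ifdef _WIN32
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    return si.dwNumberOfProcessors ? (unsigned)si.dwNumberOfProcessors : 1u;
#else
    /* Assumption: the platform supports _SC_NPROCESSORS_ONLN. */
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? (unsigned)n : 1u;
#endif
}

/* Hypothetical helper: never start more workers than there are CPUs,
   since no more than that can be running at the same instant. */
unsigned worker_count(unsigned requested, unsigned cpus)
{
    return requested < cpus ? requested : cpus;
}

/* Hypothetical helper: stack memory consumed by the threads that exceed
   the CPU count. stack_bytes is an assumed per-thread stack size. */
size_t excess_stack_bytes(unsigned requested, unsigned cpus,
                          size_t stack_bytes)
{
    return requested > cpus ? (size_t)(requested - cpus) * stack_bytes : 0;
}
```

For example, worker_count(40, cpu_count()) yields 4 on a 4-CPU machine,
and excess_stack_bytes(40, 4, 1024 * 1024) shows that the other 36
threads would tie up 36 MB of stack space, assuming 1 MB per thread,
without ever adding parallelism.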