question about memory visibility on Windows

Post by avik.ghos » Sun, 15 Jun 2008 22:02:54


I have a process where memory is sometimes not updated across threads
in Windows. The description follows:

The process has two threads - an 'engine' thread that runs an event
loop using select, and a user thread that periodically makes requests
to the engine thread. The request is done by writing a byte to a
socket which is in the fd list for select. When the engine thread has
completed the task requested, it signals the user thread by writing a
byte on the write end of a pipe (not a named pipe). The user thread,
which had been blocking on the read end of that pipe, releases the
mutex it holds (held to prevent other user threads from entering
simultaneously) and returns once it reads the ack byte.

In short, the user thread:
1) locks the mutex,
2) prepares a request by setting some variables (private variables of
an object that was allocated by the new operator),
3) sends a signal byte over a socket,
4) does a blocking read on a pipe for the ack byte, and
5) releases the mutex.

The engine thread:
1) reads the signal byte from the socket (woken up from select),
2) processes the request after reading the parameters off the private
variables,
3) resets the private variables, and
4) writes an ack byte on the pipe.

The above is built on Linux (where it uses pthreads) and on Windows
where it uses Windows threads and mutexes.

On Linux, the process is well behaved.

On Windows, when a large number of requests are fired rapidly, once in
a while I see that the parameters set by the user thread are not seen
in the engine thread. What the engine thread sees is the older values,
from its point of view.

I was under the impression that system calls like socket and pipe read/
writes, as well as mutex calls would activate memory barriers.

Is it possible that this is not true on Windows, at least for pipes?
(An older version of the code, where the ack byte is sent over the
same socket, does not exhibit this behaviour).

If it is indeed the pipes, I am considering using Windows
synchronization methods like SetEvent to handle the signalling - is
that considered safe?

On Windows I compile using MSVC 6.0, but I have tried MSVC 2003 as
well.



question about memory visibility on Windows

Post by David Schw » Wed, 18 Jun 2008 08:06:10

That's true, but it doesn't help you. The problem has nothing to do
with memory barriers. Consider:

read(fd, buf, len);   /* some blocking system call */
if (i == 7)           /* here */

If the compiler can prove that the call to 'read' cannot change the
value of 'i', then it does not need to re-read 'i' from memory for
that comparison.



question about memory visibility on Windows

Post by Torsten Ro » Thu, 19 Jun 2008 01:10:12

And your engine thread doesn't lock the mutex? If not, what is the
mutex good for?

regards Torsten


question about memory visibility on Windows

Post by avik.ghos » Tue, 01 Jul 2008 13:18:06

Hello David,

Thanks for the help!

I apologise for not responding as soon as I resolved the problem.

It turns out that I was, in this instance, blaming Windows unfairly.
I had a race condition in my code (introduced in a later version),
which for some reason never got triggered on Linux.

Now my code actually does what I have described (basically signal
using pipes instead of condition variables) and there are no problems.


question about memory visibility on Windows

Post by avik.ghos » Tue, 01 Jul 2008 13:23:22

Hello Torsten,

Thanks for looking into my problem.

As I have mentioned in the previous reply, the problem was a race
condition that I introduced in my code in a later version. Now that
it is fixed (and the code does what I have described), things work fine.

The mutex that I have described is for the user object. It is there to
provide mutual exclusion amongst multiple user threads calling methods
on the same user object. There is only one engine thread, and it
operates only on its data, hence it does not need to use this mutex.


question about memory visibility on Windows

Post by David Schw » Wed, 02 Jul 2008 03:36:58

Something is really wrong in your design. Any time you have the
following logic, something is most likely wrong:

1) Acquire a mutex.
2) Do blocking I/O.
3) Release the mutex.

It is almost never correct to do blocking I/O while holding a mutex.
It is bad enough when you have to block one thread on I/O, but to
force any thread that tries to acquire a mutex to become blocked on
that I/O is almost never right.


question about memory visibility on Windows

Post by avik.ghos » Thu, 03 Jul 2008 04:40:04

Hi David,

Let me try to explain my requirement, and then my solution. I thought
what I have is at least correct, if not optimal, but I may be wrong...

There is an object instance 'A' on which various methods may be
invoked. This instance may be operated on from possibly multiple user
threads.

When any of these methods are called, they need to operate on internal
data structures and then need to signal the single, unique engine
thread. The engine thread does some other things and signals back when
it is done. At this point the method on object A returns.

Basically, a method call on A operates on its own data, calls a remote
procedure on the engine thread and then returns. No other method on A
may be called while one is in progress.

To achieve the above, this is what I do (almost) :

Any method call on A

i) locks the mutex

ii) signals the engine thread to execute the remote procedure, by
sending it a message (a single byte) on a dedicated socket which is
part of its select fds.

iii) waits on a condition variable to be woken up when the engine
thread has completed the task.

The engine thread signals completion by signalling the condition
variable.
The only difference between what I have described above, and what I
actually do right now is that the signalling between A and the engine
thread is not done via a condition variable, but by reading and
writing a single byte on a dedicated pipe. The object A waits on the
read end of the pipe, and the engine thread writes on the write end.

(By the way, when I was investigating my problem, I did away with the
pipes by using actual condition variables, but did not see any
performance improvement. I therefore let it stay as it is.)
Do you think there is something wrong in the above? Could it be done
differently to make it faster or simpler?



question about memory visibility on Windows

Post by David Schw » Thu, 03 Jul 2008 06:21:32

On Jul 1, 12:40 pm, XXXX@XXXXX.COM wrote:

I believe your method is technically illegal but will work on every
platform I know of. The problem is that you have no *memory*
synchronization between the thread that invokes the engine thread and
the engine thread itself. Writing a byte in one thread and reading it
in another thread is not defined to synchronize their view of memory.
So you have a case where the engine thread is reading data written by
another thread with no synchronization.

However, in practice, on every platform I know of, having one thread
write to a socket or pipe and having another thread read it will
"accidentally" synchronize memory.

It's not performance, it's correctness. You must synchronize your
threads using a mechanism that is defined to have the semantics you
need. The semantics you need is that the first thread writes some data
and the second thread must read that data, not stale data. Why do you
think pipes are required to provide this?

I think mutexes and condition variables should always be what you use
first, because they're designed and optimized for this specific
purpose. More importantly, they are defined to synchronize memory.

It also seems that you are tying up the user threads for no reason.
Unless they have to do something immediately after the engine
finishes, there is no reason they should be waiting for the engine to
finish. It's generally a poor design to have a large number of threads
that wind up having to wait for a single thread to do the "real work".
Of course, if that single thread is only doing a small amount of the
work, then that's fine.

I wouldn't suggest ripping your whole program apart and redesigning it
just to make it better when there's no problem and it's perfectly
suitable for its task. But in the future, it's probably better not to
have an engine thread. Just have one or more "engine locks" that
protect any shared data, and let any thread be the "engine thread"
when it has work to do that you would normally have the engine thread
do. (Note that this isn't always possible or always best, but it's
usually possible and it's usually best.)

Presumably there's some reason all the user threads can't run the
engine code at the same time. Whatever that reason is, protect *it*
with a mutex. Then any thread (while it holds the mutex) can be the
engine thread. This will minimize context switches.


question about memory visibility on Windows

Post by avik.ghos » Thu, 03 Jul 2008 07:21:02

What a prompt reply!

On Jul 1, 5:21 pm, David Schwartz <XXXX@XXXXX.COM> wrote:

I remember a (very long) discussion on memory barriers in this group a
while ago - I came away with the notion that system calls will invoke
memory barriers. Is that not required, defined behaviour?

The reason I used pipes was one of expediency - the code could then be
the same in Windows as in Linux. The current, development version of
the code uses condition variables to achieve the signalling. From what
you say that will be a safe and legal implementation.

I believe I have valid reasons for there being one engine thread
(mostly to do with the fact that it is based on a legacy single-threaded
messaging library). I do realise that I could save context switches if
I allowed any thread to do the required processing, but I will have to
think a bit about how to achieve that without modifying things too
much.

If I have some further design questions I will post them in a new
thread.

Thanks for all your help,



question about memory visibility on Windows

Post by David Schw » Thu, 03 Jul 2008 11:07:45

On Jul 1, 3:21 pm, XXXX@XXXXX.COM wrote:

I do not believe that it is required, defined behavior. If one thread
is currently running on one CPU and one thread is currently running on
another, nothing requires the system call to do anything but what it's
defined to do.

In practice, the only things that will mess you up are prefetched
reads and posted writes. No CPU currently in existence has a
prefetcher or a write posting buffer nearly large enough to survive
through a system call.

However, I don't know of any standard that actually guarantees that it
will work.

Also note that writing to a pipe might not really be a system call on
Windows. It might be implemented purely in a DLL. But in practice, the
same thing will happen: any posted writes will be long gone by the
time the write is visible to another thread.

Note that this might not apply to "trivial" system calls like

That is definitely safe on any POSIX threads platform. POSIX
specifically guarantees that if one thread signals a condition
variable and another thread is woken by that same condition variable,
it will see any changes made (largely because returning from
pthread_cond_wait acquires the mutex).

It may not be worth the effort. It depends how much work the engine
thread is doing compared to the other threads and how much parallelism
you could possibly gain if you moved that into the user threads.

Again, there's no reason to rip a working design apart just because
there's a better design. And if that better design is more fragile or
complicated, and its advantages are unlikely to be seen in practice,
it isn't really better.

You're welcome.