In article < XXXX@XXXXX.COM >,
: > In article < XXXX@XXXXX.COM >,
: > : As it stands now, I would create a shared memory file, and memory map a
: > : semaphore to that file. Different programs on different computers would
: > : do this in a very similar fashion.
: > Are you trying to use a shared memory zone across multiple computers?
: > There's a number of very good reasons why that won't work.
: The shared file is on a shared drive.
There is really only one situation where you should be using a file on a shared
drive to transfer data between cluster nodes. That is when you have large
amounts of bulk data, and you're using a clustered file system on a SAN that's
significantly faster than your network.
You should never attempt to use such a shared file as a primary communications
channel. It should only be considered a side channel, used much like the "data"
channel in an FTP connection.
As an example:
Cluster nodes A, B and C connect to server node X via TCP sockets. All cluster
nodes share a CXFS file system mounted as /clusterdata.
Node X fills the file "/clusterdata/NodeA-Raw" with raw data from some data
source. Once filled, node X then sends a message over the TCP socket to node A
telling it to read and process "/clusterdata/NodeA-Raw".
Node A processes "/clusterdata/NodeA-Raw" into "/clusterdata/NodeA-Cooked". Once
completed, node A sends a message over the TCP socket to node X telling it to
read and integrate "/clusterdata/NodeA-Cooked", and to refill
"/clusterdata/NodeA-Raw" with new data.
Meanwhile, node X also fills "/clusterdata/NodeB-Raw" and "/clusterdata/NodeC-Raw"
with data, separately signals B and C via TCP to start processing, and so on.
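
To make the hand-off concrete, here is a rough Python sketch of one round trip
between node X and node A. It is only an illustration under some assumptions:
the "PROCESS" and "DONE" message names, the newline framing, and all function
names are made up for this example, and shared_dir stands in for /clusterdata.

```python
import os
import socket
import tempfile

def send_msg(sock, text):
    """Send one newline-terminated control message."""
    sock.sendall(text.encode("utf-8") + b"\n")

def recv_msg(sock):
    """Read one newline-terminated control message (blocking)."""
    buf = bytearray()
    while not buf.endswith(b"\n"):
        chunk = sock.recv(1)
        if not chunk:
            raise ConnectionError("peer closed the control channel")
        buf += chunk
    return buf[:-1].decode("utf-8")

def write_complete(path, data):
    """Write the whole file and flush it before anyone is told about it."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())

def node_x_dispatch(sock, shared_dir, node, raw):
    """Node X: fill the raw file first, then signal the worker over the socket."""
    raw_path = os.path.join(shared_dir, "Node%s-Raw" % node)
    write_complete(raw_path, raw)
    send_msg(sock, "PROCESS " + raw_path)

def node_worker_step(sock, shared_dir, node, transform):
    """Worker: wait for the signal, cook the raw file, then report back."""
    verb, raw_path = recv_msg(sock).split(" ", 1)
    if verb != "PROCESS":
        raise ValueError("unexpected message: " + verb)
    with open(raw_path, "rb") as f:
        raw = f.read()
    cooked_path = os.path.join(shared_dir, "Node%s-Cooked" % node)
    write_complete(cooked_path, transform(raw))
    send_msg(sock, "DONE " + cooked_path)

def node_x_collect(sock):
    """Node X: wait for the worker's signal, then read the cooked file."""
    verb, cooked_path = recv_msg(sock).split(" ", 1)
    if verb != "DONE":
        raise ValueError("unexpected message: " + verb)
    with open(cooked_path, "rb") as f:
        return f.read()

if __name__ == "__main__":
    # Local stand-ins: a socket pair for the TCP link, a temp dir for the share.
    x_sock, a_sock = socket.socketpair()
    with tempfile.TemporaryDirectory() as shared_dir:
        node_x_dispatch(x_sock, shared_dir, "A", b"raw data")
        node_worker_step(a_sock, shared_dir, "A", bytes.upper)
        print(node_x_collect(x_sock))
    x_sock.close()
    a_sock.close()
```

The important part is the ordering: a file is always completely written before
the message naming it goes out, so the socket is the only thing either side
ever waits on.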
You could also supplement this arrangement with *read only* shared data, like
"/clusterdata/Shared-TransformMatrix", which would *only* be updated by Node X,
and would *only* be updated when all nodes are idle; either by waiting until all
nodes have processed and returned all current data, or by signalling through the
TCP channel that the nodes should abort current processing and wait for new data.
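
The "only update when all nodes are idle" rule can be enforced with a little
bookkeeping on node X. The following Python sketch is one hypothetical way to
do it; the SharedDataGate class and its method names are invented for this
example, and it only covers the waiting variant, not the abort signal.

```python
import os

class SharedDataGate:
    """Tracks which nodes are busy so shared data is only rewritten when none are."""

    def __init__(self, nodes):
        self.busy = {node: False for node in nodes}

    def dispatched(self, node):
        """Call when node X has told a node to start processing."""
        self.busy[node] = True

    def completed(self, node):
        """Call when a node has reported its results back over TCP."""
        self.busy[node] = False

    def all_idle(self):
        return not any(self.busy.values())

    def update_shared(self, path, data):
        """Rewrite the shared file only if every node is idle.

        Returns True if the file was written, False if the update was refused.
        """
        if not self.all_idle():
            return False
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        return True
```

Node X would call dispatched() and completed() as it sends and receives the
TCP messages, and simply retry update_shared() once the last node reports in.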
Now, if you don't have large amounts of bulk data, or if your network is
fast enough to carry the data itself, then I would strongly encourage you to
use a network-only approach. I would also strongly suggest a network-only
approach if you don't have a clustered FS SAN at all and are using a network
filesystem like CIFS/SMB or NFS.
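
For the network-only case, the data itself travels over the same TCP socket
instead of through a file. A minimal Python sketch, assuming a simple
length-prefixed framing (an 8-byte big-endian size followed by the payload);
the framing and the send_blob/recv_blob names are my own choice for
illustration, not any standard:

```python
import socket
import struct

HEADER = struct.Struct("!Q")  # 8-byte unsigned length, network byte order

def send_blob(sock, payload):
    """Send the payload length, then the payload itself."""
    sock.sendall(HEADER.pack(len(payload)) + payload)

def recv_exact(sock, n):
    """Read exactly n bytes, looping because recv() may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        buf += chunk
    return bytes(buf)

def recv_blob(sock):
    """Read one length-prefixed payload."""
    (size,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, size)
```

Node X would call send_blob() with the raw data and the worker would answer
with send_blob() carrying the cooked result, so no shared file is involved.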
If you need a high level of inter-node communication (for example, if node
calculations are interdependent on the results of other node calculations) then
you'll need a different approach. You should post more details of the problem
you're trying to calculate if this is the case.
Cheers - Tony 'Nicoya' Mantler :)
Tony 'Nicoya' Mantler -- Master of Code-fu -- XXXX@XXXXX.COM