RFC - New Object-Oriented Method of Parallel Programming (Shorter Version)

Post by Slawomir J » Tue, 11 Apr 2006 22:08:32


For those who can spare the time to look at it - I would appreciate all
comments and suggestions.



Now it includes a very short - programmer model only - specification:

http://www.yqcomputer.com/





Thanks in advance



XXXX@XXXXX.COM
 
 
 

RFC - New Object-Oriented Method of Parallel Programming (Shorter Version)

Post by Stephen Fu » Fri, 28 Apr 2006 02:46:43


Thanks. That made it much easier for me to get an idea of what you are
getting at, and made me able to get something from (at least skimming) the
rest of the material. I now feel I can ask (what I hope are) intelligent
questions.


1. You seem to say this, but I wanted to make sure. If atom A calls atom
B, as far as A is concerned, it makes no difference to it at all whether the
call comes from A's stressed or relaxed section. That is, A's execution is
stopped until B is available and completes its stressed section independent
of whether the call came from A's stressed or relaxed section. Is that
correct?

2. You don't list any restrictions, so should I assume that long-lasting
operations such as I/O can be done from either an atom's stressed or relaxed
section? Obviously, there is a performance issue when you have it in the
stressed section, and a potential correctness issue when you have it in the
relaxed section (as with any I/O that is potentially done in parallel), but
it appears there are no additional issues. Is that correct?

3. You indicate that an atom can call multiple other atoms in a single
(dare I say atomic?) action. Is this implemented with some special syntax,
or is your carefully worded "isn't allowed to change its state" until the
called atom completes its stressed section meant to indicate that the system
simply lets you execute another call without completing the first (i.e. the
call doesn't count as a "state change")?

4. You talk about implementation as if you have done at least a
prototype implementation. That is, there is actual working software. Can
you describe the implementation status of the project?

--
- Stephen Fuld
e-mail address disguised to prevent spam

 
 
 

RFC - New Object-Oriented Method of Parallel Programming (Shorter Version)

Post by Slawomir J » Fri, 28 Apr 2006 09:42:21

"Stephen Fuld" wrote:

That's correct. A call to another atom works exactly the same way from both
stressed and relaxed sections at the low level. Functionally, cascading
calls from consecutive stressed sections allow linking actions performed by
many objects into one strictly organized operation.
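For illustration only, here is a rough plain-C++ sketch (not the actual
stressflow syntax or implementation; the Atom class and its members are
invented for this example): the atom's access mechanism is approximated by a
mutex, the stressed section runs while the caller holds it, and the relaxed
section continues on a detached mini-thread after the caller is released.

#include <mutex>
#include <thread>

// Hypothetical plain-C++ approximation of a single stressflow atom.
// The caller is blocked only while the stressed section runs; the
// relaxed section then continues on its own mini-thread.
class Atom {
    std::mutex access;              // only one task at a time in the stressed section
public:
    void call(int value) {
        std::unique_lock<std::mutex> lock(access);
        // ---- stressed section: update state, make further atom calls ----
        int staged = value * 2;
        lock.unlock();              // the stressed/relaxed barrier
        // ---- relaxed section: runs in parallel with the caller ----
        std::thread([staged] {
            // longer-running follow-up work would go here
            (void)staged;
        }).detach();
    }
};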



> 2. You don't list any restrictions, so should I assume that long lasting

The ideas leading to this project came from writing a lot of embedded
real-time code. Hardware I/O is one area where this essentially very simple
scheme allows designing all interactions in a data-flow-like fashion, without
any processing loops. It also allows complex message/data
composition/decomposition to be done very naturally, in easy-to-comprehend,
simple code.



On the input side, a stressflow atom invoked by a hardware interrupt nicely
describes all that is necessary for defining the proper response. While the
stressed section is being executed, a repeat of the same interrupt is
disallowed. A series of such hardware interrupts can be collecting parts of
the same, larger message. Only when a complete message is assembled is a
single call made to the next stressflow atom in the program tree, which
can now be assembling higher-level messages/events or an interrelated
sequence of them.

Similarly, on the output side, multiple calls to the same atom will
assemble complete blocks. Once a block is completed and ready to send, a
single call is made that (depending on hardware) can transfer everything at
once or in sequential elements driven by, say, a timer or output hardware
interrupts.
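For illustration only, a rough plain-C++ sketch of the input-side pattern
described above (not stressflow syntax; ByteAssembler, on_interrupt and the
end-of-message convention are invented for this example): each interrupt
delivers one fragment into the assembler's stressed section, and only a
complete message triggers the single call to the next atom up the tree.

#include <cstdint>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

class ByteAssembler {
    std::mutex access;                      // a repeat of the "interrupt atom" is disallowed
    std::vector<std::uint8_t> buffer;       // message assembled across many interrupts
    std::function<void(std::vector<std::uint8_t>)> next_atom;
public:
    explicit ByteAssembler(std::function<void(std::vector<std::uint8_t>)> next)
        : next_atom(std::move(next)) {}

    // Called once per hardware interrupt with one fragment of the message.
    void on_interrupt(std::uint8_t byte) {
        std::lock_guard<std::mutex> lock(access);
        buffer.push_back(byte);
        if (byte == 0x0A) {                 // invented end-of-message marker
            next_atom(std::move(buffer));   // single call up the program tree
            buffer.clear();
        }
    }
};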



This all sounds completely obvious, but without such an approach, all RT code
design was anything but obvious. Without being able to easily organize
dataflow-type dependencies between tasks running in parallel, the interrupts
would often collect the low-level data and everything else was done by some
centralized loop or loops.




Yes, a single expression is meant to be a single action and this is how the
multi-step process of calling a group of atoms together is driven. Special
syntax could be used for it, but it is cleaner without it.



Example:

Call_A + Call_B; // Single expression

Call_A, Call_B; // Also single expression using a comma operator

Call_A; Call_B; // Two separate expressions



The comma operator isn't often used, but it is nevertheless part of many
languages - and it can be used here for obtaining the new functionality. In
the examples above, if Call_A and Call_B are both stressflow atoms, this
means that the whole thing can only start when access to both called atoms is
obtained together. Internally, this is accomplished by the call to a
stressflow atom being a multi-step process and by placing the reservation
step for all calls within the same expression into a single prolog of the
whole expression.
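As a rough analogy in plain C++ (again, not the actual implementation; the
names are invented), the single prolog can be thought of as acquiring every
named atom's access mechanism together before either stressed section runs,
for example with std::scoped_lock, which takes all locks as one
all-or-nothing step:

#include <mutex>

struct AtomA { std::mutex access; void stressed(int x) { (void)x; /* ... */ } };
struct AtomB { std::mutex access; void stressed(int y) { (void)y; /* ... */ } };

// Hypothetical prolog for the compound expression "Call_A, Call_B":
// both access mechanisms are reserved together, as one all-or-nothing
// step, before either stressed section starts.
void compound_call(AtomA& a, AtomB& b, int x, int y) {
    std::scoped_lock both(a.access, b.access);   // deadlock-free joint reservation
    a.stressed(x);
    b.stressed(y);
}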



What is done and working is the low-level execution engine that allows
experimenting with and proving all these concepts. To begin with, a system
with two dual-core processors (4 fairly separate physical processors) was
emulated to appear as 100 or more "virtual" processors for running all
the stressflow programs. The standard OS multitasking interface was no good
and had to be, sort of, bypassed to allow an efficient implementation of this
model.



This approach allowed proving of all the concepts, although in a little
 
 
 

RFC - New Object-Oriented Method of Parallel Programming (Shorter Version)

Post by Stephen Fu » Sat, 29 Apr 2006 14:26:35

"Slawomir J." < XXXX@XXXXX.COM > wrote in message
news:e2p423$sg3$ XXXX@XXXXX.COM ...

snip Q & A #1 Thanks, I got it.


OK. I was coming at it from a different angle, that is, as a way of helping
(primarily application) programmers deal with a future of ubiquitous
multi-thread/multi-core processors without getting into what are today messy
alternatives. So I wasn't thinking embedded particularly. (As an aside, are
there that many embedded applications that use lots of parallel processors?)
So when I was thinking I/O, I was talking about disk/file I/O, expressed
with typical language or OS call semantics. I'm not sure that changes too
much, but since a lot of systems use synchronous I/O, it isn't clear to me
that issuing the request in one atom and having another atom attached to the
completion is a comfortable mechanism for most application programmers.

snip


Isn't that what your connector structure does? I thought the idea here was
to be able to have Call_B proceed even if Call_A hadn't finished, or
even started, its stressed section. In the example on the web page, this
would allow the start of the calculation of Y**2 by Call_B even if Call_A
wasn't yet available to start on the calculation of X**2.

snip


If you are talking about atoms as small as the ones given in your examples, I
can certainly see that. Was that purely for exposition, or are you expecting
atoms with only a few instructions? In the environment I was thinking
about, you would certainly want larger atoms to reduce the
creation/switching/destruction overhead, and you would probably only have,
say, tens of processors anyway, at least at first.



I certainly agree.

--
- Stephen Fuld
e-mail address disguised to prevent spam


 
 
 

RFC - New Object-Oriented Method of Parallel Programming (Shorter Version)

Post by Slawomir J » Sun, 30 Apr 2006 19:19:07

"Stephen Fuld" wrote:

The nice thing about the idea is that the same scheme can be used for both
low- and high-level design. This allows building objects that consist of many
stressflow atoms but are presented to the high-level designer as a single
stressflow atom.



Ultimately, most complex systems do not behave like control-flow devices,
because reality forces them to be autonomous, or at least semi-autonomous,
systems. But we are used to forcing ourselves to look at them as control-flow
devices which appear as if they could do the entire requested complex task
in a single, instantly executed step. A disk controller is a good example, as
disk operations are anything but single-step, instant operations. Pretending
that they are makes a mess of things. If we were to use that model in real
life, it would mean that whenever we, for example, order a book to come by
mail, we freeze until the book comes and are unable to do anything else.



Similarly, code that does a lot of disk I/O from a single thread becomes
unresponsive, which in professional code necessitates the fairly ugly, rarely
structural main-thread/worker-threads model. Also, in such a model, error
handling/reporting and timing out are very messy. Software exceptions
simplify that issue quite a bit, but a dataflow-type layout could deal with
it much better.



This is where the other side of the coin shows up. The dataflow model could
do a better job, but we are not used to doing it that way. You were
absolutely right to notice it as likely uncomfortable in many situations.
This is exactly why dataflow languages (in spite of being hardware- and
parallelism-friendly by nature) have failed to find wide acceptance in
mainstream computing. The culprit (in my opinion) lies in the
everything-or-nothing attitude of these languages.



The answer, I think, lies in a hybrid-model approach that fundamentally
allows natural coexistence of tasks/steps that need to be
sequential/synchronized with steps/tasks that are data-driven. Among many
other advantages, this would allow gradually transforming code written and
seen as entirely sequential/control-flow into code that can take advantage
of a large number of processors.



To deal with the situation where it is still necessary to write a routine
that has a single entry point and returns the result(s) to the caller, rather
than passing them on down the graph, the additional "socket" language tool
was provided in the stressflow definition. Using the same
access/synchronization mechanisms, it allows defining the body of a
stressflow atom as part of another routine or stressflow atom. This allows
writing a routine from which calculations are decomposed into many
parallel-running tasks, with the results coming back to the original routine
that initiated all the calculations. An example of this is provided in the
specification under the "Construct for Interfacing with Legacy Code" heading.



http://www.stressflow.com/sw/sw_socket.htm
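Purely as a loose analogue in standard C++ (this is not the socket mechanism
itself, just the general shape of the pattern), std::async illustrates the
single-entry-point case: the routine fans the work out into parallel tasks,
and the results come back to the routine that initiated them.

#include <future>

// Stand-in for a parallel-running task.
int square(int v) { return v * v; }

// Loose standard-C++ analogue of the single-entry-point case: the routine
// fans the work out, and the results come back to this same routine.
int sum_of_squares(int x, int y) {
    auto fx = std::async(std::launch::async, square, x);
    auto fy = std::async(std::launch::async, square, y);
    return fx.get() + fy.get();
}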





The connector simply allows distributing a single call to many recipients,
which add themselves to the connector. Compound expressions do something
similar, but by explicitly indicating by name the several atoms to be called
as part of the same operation.



But the rule still is that the stressed section of the same routine of the
same object can only be executed by one task at a time. This means that in
both situations the caller is st
 
 
 

RFC - New Object-Oriented Method of Parallel Programming (Shorter Version)

Post by Stephen Fu » Tue, 02 May 2006 03:20:45

"Slawomir J." < XXXX@XXXXX.COM > wrote in message
news:e2vejd$4m9$ XXXX@XXXXX.COM ...

snip


Yes, I get that and it is indeed nice. It maintains the "decomposability"
of subroutines/libraries/objects into the parallel processing domain.

snip discussion of philosophy with which I generally agree.


Yes, but, unless I am missing something, your example isn't directly
applicable. In the example you give, you maintain a continuous "thread of
control" throughout the "long running process". This is not the case with
I/O. So you have the problem of "associating" the result of the I/O with
the particular process that requested it. (You typically don't have this
in embedded applications.) Let me give an example.

Let's assume that there are no conflict, locking, transaction, etc. issues,
to make this simple. Two users request data from a file. So you have two
independent and parallel processes, each of which wants to read data from a
different place in a file. So each one calls some process that initiates
the I/O, passing to the OS (or the library) things like the address of the
memory location into which to read the data, the file-relative address of
where to read, and the length. Both I/Os get initiated, and the process
either goes on to do other things or terminates. At some time in the
future, one of the I/Os finishes, but, in general, you don't know which
one. So you have some process that "catches" the completion event from the
OS. Now you have the problem of knowing how to ensure that the results of
the I/O get back to the correct user. But you have lost the "thread of
control" or any information that would allow you to associate which "user"
process to activate with which I/O result. Unless you, as the user, create
some global table and fill it in before doing the I/O, then have the I/O
completion process look up in this table what to do or what process to call
(pretty ugly), I don't see how you solve this.


OK, so in the example we have been discussing, even though the calculations
are independent, if there is a queue for Call_A (which calculates X**2), you
won't begin the calculation of Y**2 until the routine that calculates X**2
has completed its stressed section for all previous results on the queue.
ISTM that this is not performance-optimal, but it may not be a big hit, and
it may be the price you pay for the other benefits.

--
- Stephen Fuld
e-mail address disguised to prevent spam


 
 
 

RFC - New Object-Oriented Method of Parallel Programming (Shorter Version)

Post by Slawomir J » Wed, 03 May 2006 01:35:51

"Stephen Fuld" wrote:



Yes, I misunderstood the question. Sorry.



What you are describing is the very problem that often made dataflow
languages impractical for much of complex systems programming. The key
underlying idea of dataflow was building a program/computing system out of
nodes organized into some graph/diagram. Each node gets fired when the set of
data for it arrives. This is the critical issue - for a node to be waiting
for data to fire it, it had, by definition, to be there before the data to be
fed to it got generated. This forced a certain "globalization" of the
execution graph, not only in the method/calculation performed, but also in
the data and execution thread context ("static dataflow").



Therefore, in your example, if a programmer were to define some
disk-data-reading node, this node would have to be completely allocated
before he started sending data to it. This would most often mean that there
would be one such node, data- and execution-context wise. If two operations
were now fed to it, read(A) and read(B), the results could only be fed to a
common output and, when available, they would have to be somehow correctly
identified and fed back to the places where they were needed - as you
indicated.



This is the most serious problem with dataflow processing graphs. In
such a "static" dataflow machine, new data could not be fed in before the
previous data got completely processed by a node, or else we would risk
mismatching sets of data.



But this is NOT what the proposed method forces us to do. In such a case,
the proposed method can be used to replicate a "dynamic dataflow" machine,
and if this is done in conjunction with the object-oriented programming
methodology, it can be done without any messy "tagging". The "this" pointer,
in a roundabout way, serves the purpose of a tag.



The proposed method is far more "dynamic" than dynamic dataflow. Each
firing creates a new execution context (a new mini-thread), as dynamic
dataflow does. But, in addition, this can be done with the same or a
different data context, while a specific node's "execution dynamicity" is
fully regulated by the placement of the stressed/relaxed barrier to allow any
needed synchronization.



Going back to your example: we need to perform two separate read-data
operations. This means that we have two separate objects (A and B). Someone
calls A.read and someone else calls B.read. It does not matter what the read
operation does or how long it takes, or whether A.read happens much faster
than B.read. They are separate operations, both execution- and data-context
wise. They do not interfere with one another in any way. If there is some
completion function to be called, say "done", then A.read will call A.done
and B.read will call B.done. There is no need to tag the data or build any
routing tables. The only way A.read and B.read would call the same completion
routine would be if that were specifically desired from the standpoint of
program topography - meaning only if we specifically want the results to
be fed to the same target.



In your specific example, the "read" method can call a regular "synchronous"
OS I/O routine. It does not matter that this will freeze for a while,
because each instance of "read" is always a separate mini-thread and,
access-wise, is regulated by its data context - meaning A.read being frozen
will never in any shape or form stop B.read from sta
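For illustration only, a rough plain-C++ sketch of the above (not stressflow
syntax; the Reader class and its members are invented for this example):
each instance is its own data context, the blocking synchronous read runs on
its own mini-thread, and the result reaches the right "done" simply because
the object identity is, in effect, the tag.

#include <cstddef>
#include <fstream>
#include <string>
#include <thread>
#include <vector>

class Reader {
    std::string path;
    std::vector<char> data;
public:
    explicit Reader(std::string p) : path(std::move(p)) {}

    // Each call runs as its own mini-thread; blocking here never affects
    // any other Reader instance. (In a real program the Reader object
    // must outlive the detached thread.)
    void read(std::size_t offset, std::size_t length) {
        std::thread([this, offset, length] {
            std::ifstream file(path, std::ios::binary);
            file.seekg(static_cast<std::streamoff>(offset));
            data.resize(length);
            file.read(data.data(), static_cast<std::streamsize>(length));
            done();                  // A.read calls A.done, B.read calls B.done
        }).detach();
    }

    void done() {
        // consume this->data here (completion target bound to this object)
    }
};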
 
 
 

RFC - New Object-Oriented Method of Parallel Programming (Shorter Version)

Post by David DiNu » Fri, 05 May 2006 11:30:52

Slawomir J. wrote:


Excuse me for interrupting, but...huh? You say "by definition" the node
has to be "there" waiting for the data? Why? Where? A dataflow graph
is just a data structure that tells how data and operations are related,
so that operations can fire when their needed data is available. Yes,
one way to implement such a data structure might be to statically embed
operations on the platform and to have each of them wait (perhaps even
poll) for its input data, but criticizing dataflow as a whole because
you don't like one possible implementation of it won't get you far.

For the most part, dataflow certainly doesn't dictate "where" an
operation needs to execute, nor that the operation itself should consume
any resources at all until all of its input data is ready. These are
advantages of dataflow over threaded programming, and even apparently
over your proposal, where a node must maintain internal run state while
blocked waiting for completion of other nodes. In fact, with your
pre-reservation calls in your x^2+y^2 example, you clearly dictate that
the end node (D) maintain state all the time it's waiting for its inputs
to come down the line. And, the end node apparently must then manually
decide each time an x^2 or y^2 comes down the line whether its pair
exists so that it can perform the addition and release the reservation.
Stuff like that is done automatically in mature dataflow languages.

Because a dataflow graph is just a data structure, decisions of when and
where the operations will execute can be made dynamically by a runtime
system, taking into account not only which operations are ready to fire,
but also dynamic state of the platform such as processor load, faults,
network weather, etc. The stressflow implementations you recommend
apparently have no such flexibility.

Then again, you apparently try to make this sound like an advantage.
For example, on the web page, you say:

===
The key implementation rule of stressflow is this: if atom A
communicates with atoms B and C, then all that is necessary to assure
synchronization and data cohesion between A, B, and C is a direct
connection between A, B, and C. No execution layers or threads that
queue or sequence connections between A, B, and C together with any
other connections, and no "middlemen" in between. Being able to
accomplish this, stressflow becomes a superior internal implementation
method for any "software wiring" method.
===

I will assume that you believe ScalPL (formerly Software Cabling)
qualifies as a "software wiring method", making your final sentence
false. If one wants to use a sufficiently restricted subset of ScalPL
(though still as powerful as Stressflow), then one does not *need* a
runtime system to facilitate the interactions, and as I have mentioned
in the past, early implementations of Software Cabling ("F-Nets" or
"LGDF2") didn't use one. A runtime system is nice to have, though, for
(for example) exploiting platform dynamics as mentioned above.

There are other advantages to an execution layer. In real life, the A,
B, and C of your statement may not be so clearcut, and may not be known
until runtime. For example, some A may be related (at different times)
to any of B1, B2, B3, ..., B400000, and to any of C1, C2, .....C546000.
While it is often technically possible to treat all of these
dependences statically and to test and enforce each (combinati
 
 
 

RFC - New Object-Oriented Method of Parallel Programming (Shorter Version)

Post by Stephen Fu » Sat, 06 May 2006 05:08:22


snip


Not a problem.

snip your reply to save space

So the answer to my original question is: yes, ordinary synchronous I/O is
allowed within a single atom. I understand.



After thinking about your response, rereading the web site material, and more
contemplation, I think I understand what I was missing. Let me try to state
something in my own words, and please tell me if I got it right.

We are talking about a general situation where A calls B and C. I am going
to discuss three ways that can happen. In each of them, the "rest of A",
that is, the part after the calls, will not execute until the last of B or C
finishes its stressed section. The difference is in the potential
execution of B and C. In each case, to illustrate the differences, I will
assume that there is a queue for B but not for C.

1. Both calls are in a single statement (i.e., call B, call C in your
notation).

Result: B will start executing as soon as the queue is exhausted; C will
start executing immediately.

2. The calls are in sequential statements within A (i.e., call B; call C in
your notation).

Result: B will start executing as soon as the queue is exhausted. When it
reaches the relaxed section, A gets control back in parallel and immediately
calls C. C then starts execution immediately (though, in fact, after B has
finished its stressed section, due to the sequentiality of the calls).

3. The calls are in a connector. That is, A calls a connector, which calls
B and C.

Result: B will not start executing until its queue is exhausted. Since C
must wait for B to start (I am guessing this from the wording in section 1
of your web site), it will start as soon as B does, but not before.

I admit that I am still somewhat confused about connectors, so this last one
may be wrong.

Comments?

--
- Stephen Fuld
e-mail address disguised to prevent spam
 
 
 

RFC - New Object-Oriented Method of Parallel Programming (Shorter Version)

Post by Stephen Fu » Sat, 06 May 2006 05:16:20


snip


No, you are most welcome to chime in. I'm glad you did since, as a matter
of fact, I was beginning to feel as if no one else was interested. I am
glad that is not true, as I feel that providing ways to more easily develop
parallel programs is very important for the progress of our field. I don't
pretend to know the "correct" answer, nor what might be commercially
successful, but we have to do better than the limited applicability of
automatic vectorization and the more generally applicable but much
harder to use shared-memory and message-passing paradigms.

--
- Stephen Fuld
e-mail address disguised to prevent spam
 
 
 

RFC - New Object-Oriented Method of Parallel Programming (Shorter Version)

Post by Slawomir J » Sun, 07 May 2006 08:46:03

"David DiNucci" wrote:




I specifically criticized the static dataflow model, nothing else. This
wasn't even my original criticism, because I simply recited commonly held
views on the limitations of static dataflow. The issue appeared over the
discussion of how many similarities there were between the static dataflow
model and the model I have proposed (there aren't any).






Again, this is completely incorrect. Stressflow (in terms of dataflow
terminology) is dynamic whenever that is desired, meaning that, specifically,
by definition it allows many instances of the same node to execute in
parallel within their relaxed sections. This in turn creates certain
pitfalls of dynamic-dataflow-type situations, which is also a very well
known issue. In dynamic dataflow implemented in hardware, the problem is
resolved by token tagging, and in emulated dataflow by centralized scheduling
queues/supervisors.



This issue is extremely important for the following reasons. First, it is
easy not to see the issue at first, because many parallel computation
problems do not run into it. The problem only occurs when a computation is
decomposed into many parallel-running nodes with the results having to be
recomposed into a single node/result in a way where the sequence of the final
results being recomposed is of paramount importance.



For example, in matrix multiplication routines this issue does not occur,
because the separate results are simply merged into different places of the
result matrix, and the order in which results are placed into the result
matrix is unimportant. The x^2+y^2 example was specifically defined as a
situation where the calculation was fully dynamic and the results had to be
recomposed into a single node, in the right order. If the separate
results were to be stored in an array, rather than fed sequentially into a
single node, the issue would again be completely avoided. However, this
issue pops up in many real-life problems and has to be dealt with.



Token tagging could work in this case, but with certain complications,
which, in my view, are completely avoided with the method I have proposed.



I would be very interested to know how this issue is dealt with in other
methods. As far as all the commercially available products that I have
examined are concerned, the answer is that they don't deal with it, which
makes them either incapable of non-static dataflow, or incapable of any real
hardware parallelism altogether.

> Because a dataflow graph is just a data structure, decisions of when and
> where the operations will execute can be made dynamically by a runtime
> system, taking into account not only which operations are ready to fire,
> but also dynamic state of the platform such as processor load, faults,
> network weather, etc. The stressflow implementations you recommend
> apparently have no such flexibility.



That is also incorrect. The issue of load assignment and balancing without
the use of a centralized scheduler is addressed in great detail in the
specification.



http://www.stressflow.com/im/im_parallel_hw.htm



It is my belief (supported by experimentation) that load balancing is best
done the same way water floods an area, and not by any pre-planning and
guessing. Meaning, just as the only requirement for water to quickly
spill over a large empty area is a lack of obstacles in the way, load
balancing can spill over naturally as w
 
 
 

RFC - New Object-Oriented Method of Parallel Programming (Shorter Version)

Post by David Hopw » Sun, 07 May 2006 11:11:56


I think you're misreading what it probably intended to say: that *if* we have
a single processor [core], a different implementation is appropriate. Anyway,
I fixed it by changing the last sentence to:

On machines with a single processor core where an implementation designed
for parallel operation would simply introduce overhead, this overhead can
be removed completely by using a different runtime.

--
David Hopwood < XXXX@XXXXX.COM >
 
 
 

RFC - New Object-Oriented Method of Parallel Programming (Shorter Version)

Post by David DiNu » Sun, 07 May 2006 13:59:02

Stephen Fuld wrote:

The issue is not really whether anyone is interested, but whether they
are interested enough to do something about it. I've been making
progress only because I recognized long ago, after trying long enough to
obtain funding, that that funding was going to the wrong places and that
this wasn't likely to change soon. The funding sources were criticizing my
proposals for the very things that made them more valuable than the
proposals being funded--e.g. that they weren't MPI, or that they had a
formal underpinning. So, if you can't join 'em, beat 'em, and that's
the tack I've taken ever since. I do not ponder whether my technology
is valuable, just when people will realize its value, and how best to
help them exploit it.

So, while I'm not surprised that the field has stagnated, I am somewhat
surprised that others are surprised, or that it took them so long to
even notice. A few weeks ago, Justin Rattner was quoted
(http://news.taborcommunications.com/msgget.jsp?mid=629783&xsl=story.xsl):
"A few years ago, when Intel asked me to look at what we should be doing
in HPC, I was struck by how little progress had been made on the
programming front," says Rattner. "That's my big disappointment. The
technologies that were popular a decade or more ago are still in
widespread use today. We're still programming in MPI and still working
on technologies like OpenMP. I had hoped and expected that after a
decade or more we really would have made some fundamental advancements
on the software side."

A few weeks before, Greg Pfister from IBM made a similar observation
(http://www.ieeetcsc.org/newsletters/2006-01/why_all_mpi_discussion.html):
"With just an hour or so of web surfing, I amassed a list of 80-100
different parallel language efforts, and those are mostly current,
active, efforts. ... There are probably at least another 100 pre-web.
So why, as a first approximation, are none used? Sure, some are used
somewhat, in some cases. But go to IPDPS or the like, and all you hear
about is (a) MPI - mostly; (b) OpenMP - much less so. Not even much on
auto-parallelization, recently."

The end of the latter link shows at least part of the answer: those who
influence funding (at least in the US) apparently believe that MPI is
good enough. I have seen direct evidence of this myself. In '93 or so,
I proposed to NASA (my then employer), for a second or third time, that
they fund a Software Cabling project. While their paid consultant (a
Stanford faculty member) was for it, to be "fair" they asked many NASA sites
to rank several of the proposed parallelism projects, SC included. The
highest rankings went to projects like MPI debugging tools; the lowest went
to new programming approaches (like SC). Of the programming approaches, the
most popular was (by far, apparently) HPF (akin to OpenMP).


Why? Who cares? Until recently, only a few niches cared. I think that
the questions we're starting to hear from Intel and IBM and others are
indications that somebody does now care, and the reasons are not secret.
We're bumping up against heat and power, and the difference between
today's chips and tomorrow's will not so much be in the single-thread
speed, but the number of threads/cores supported. If software can't
exploit the cores, then tomorrow's chips are not much more valuable for
running software than today's---not a good spot for the chip companies
to
 
 
 

RFC - New Object-Oriented Method of Parallel Programming (Shorter Version)

Post by David DiNu » Sun, 07 May 2006 14:25:27

Slawomir J. wrote:
<huge snip>

I'll address other parts of your post later, but briefly:

> example:
>
> +++++++++++++++++++++++++++++++++++++++++++++
>
> http://en.wikipedia.org/wiki/Dataflow_language
>
> Dataflow programs are generally represented very differently inside the
> computer as well. A traditional program is just what it seems, a series of
> instructions that run one after the other. A dataflow program is constructed
> as a big hash table instead, with uniquely identified inputs as the keys,
> and pointers to the code as data. When any operation completes, the program
> scans down the list until it finds the first operation where all of the
> inputs are currently valid, and runs it. When that operation finishes it
> will typically put data into one or more outputs, thereby making some other
> operation become valid.
>
> For parallel operation only the list needs to be shared; the list itself is
> the state of the entire program. Thus the task of maintaining state is
> removed from the programmer and given to the language's runtime instead.
> Better yet, the overhead of parallel execution, the need to explicitly
> share the state between processors, can be removed completely on machines
> with a single processor simply by using a different runtime.
>
> ++++++++++++++++++++++++++++++++++++++++++++++

Yecchhh! I suppose I'm not allowed to complain too much about
Wikipedia, because if I care enough, I should be willing to change it
myself, but I completely agree that this totally and completely misses
the entire point. Even terms like "scanning" or "the state of the
entire program" imply serialization and locality, to mention nothing of
the idea that the use of a hash table is somehow intrinsic to dataflow.
None of these have anything to do with the concept of dataflow as I see it.
> When I read something like that Wikipedia article, it simply makes me go
> ballistic, because this is like saying that dataflow is best implemented on
> a single processor because it will not have the parallel overhead. I will
> agree, however, that trying to edit that Wikipedia definition might have
> been a simpler way to protest the whole silly nonsense. On the other hand,
> this heresy spilled over to many other publications I could not edit. So I
> am trying to prove otherwise.

I agree that this is frustrating, and I think we probably agree in
general far more than we disagree. However, I think there are also
important differences in our approaches, and it will probably benefit us
both to understand why we have each gone in the directions we have.

[There are two other groups which seem well suited to this discussion,
comp.parallel and comp.distributed. Of those, I recommend at least
cross-posting to comp.distributed for two reasons: (1) because we are
both very concerned with the decentralized aspects, and that's what
comp.distributed is all about, and (2) because (like comp.arch and
unlike comp.parallel) it is not moderated, so posts have fewer potential
bottlenecks to clear.]

-Dave
---
http://www.elepar.com

 
 
 

RFC - New Object-Oriented Method of Parallel Programming (Shorter Version)

Post by Slawomir J » Sun, 07 May 2006 23:18:08


This is somewhat imprecise, and if left at that it could lead to the
impression that the scheme creates deadlocks rather than prevents them. A
specific instance of such a compound (B,C) call executes only when it grabs
ownership of both the B and C access mechanisms together, and it never grabs
either of them unless it can grab both.



More specifically: this type of call is only used when B and C have to happen
strictly together, as inseparable parts of the same larger operation.
Meaning, when B reaches the state where it could execute its part of
such a specific waiting instance of the compound B,C call (what I assume you
mean by exhausting the B queue), it tries to reserve both the B and C access
mechanisms. If it fails, it loses its place at B and reserves one at C.
Meaning, if it is C that is currently most contested, all the requests
needing C will automatically line up there. At the same time, releasing B
now means that some compound expression needing B and, say, the currently
available F, can now execute.








Yes, except that the call to C is also subject to C's ownership restrictions,
the same as the call to B, if they are both stressflow atoms.







The compound-type connector (it would be possible to build others) simply
allows defining the compound expression as a list to which target atoms add
themselves, rather than being called explicitly by name. The call to B and C
now happens by calling the list storing them (which has the same interface as
a single atom). But both B and C have to be reserved together for the
execution of both to begin, as was the case with the explicit compound call.
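For illustration only, a rough plain-C++ sketch of such a compound-type
connector (invented names, not the actual implementation): targets add
themselves to a list, and a single call to the connector reserves and fires
all of them together. A real implementation would need the all-or-nothing
reservation described above (release and re-queue on failure) rather than
the naive in-order locking shown here.

#include <mutex>
#include <vector>

struct Target {
    std::mutex access;
    void stressed(int v) { (void)v; /* ... */ }
};

// Hypothetical compound-type connector: targets add themselves to a list,
// and a single call to the connector fires all of them together.
class Connector {
    std::vector<Target*> targets;
public:
    void add(Target& t) { targets.push_back(&t); }

    void call(int v) {
        // Naive in-order locking; the real scheme releases and re-queues
        // on failure so the whole group is reserved all-or-nothing.
        for (Target* t : targets) t->access.lock();
        for (Target* t : targets) t->stressed(v);
        for (Target* t : targets) t->access.unlock();
    }
};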



Very Best Regards



Slawomir J. XXXX@XXXXX.COM