Real-time rw-locks ( [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15)

Post by Bill Hue » Wed, 29 Dec 2004 06:10:09



Doesn't the NVidia driver use its own version of DRM/DRI?
If so, did you tell it to use the Linux kernel versions of that
driver?


I was just having a discussion about this last night with a friend
of mine and I'm going to pose this question to you and others.

Is a real-time enabled kernel still relevant for high performance
video even with GPUs being as fast as they are these days?

The context that I'm working with is that I was told (been out of
*** for a long time now) that GPUs are so fast these days that
shortage of frame rate isn't a problem any more. An RTOS would be
able to deliver data/instructions to the GPU within a much tighter
time window and could deliver better, more consistent frame rates.

Does this assertion still apply or not? Why? (For either answer.)

bill

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to XXXX@XXXXX.COM
More majordomo info at http://www.yqcomputer.com/
Please read the FAQ at http://www.yqcomputer.com/
 
 
 

Post by Valdis.Kle » Wed, 29 Dec 2004 07:00:16

On Mon, 27 Dec 2004 13:06:14 PST, Bill Huey said:


More to the point - can an RT kernel help *last* year's model? My
laptop only has a GeForce4 440Go (which is closer to a GeForce2 in
reality) and a 1.6GHz Pentium 4. So it isn't any problem at all
to find xscreensaver GL hacks that bring it to its knees.

Even the venerable 'glxgears' drops down to about 40FPS in an 800x600
window. I'm sure the average game has a *lot* more polygons in it than
glxgears does. xscreensaver's 'sierpinski3d' drops down to 18FPS when it
gets up to 16K polygons.

Linux has long been renowned for its ability to Get Stuff Done on much older
and less capable hardware than the stuff from Redmond. Got an old box that
crawled under W2K and Win/XP won't even install? Toss the current RedHat or
Suse on it, and it goes...

Would be *really* nice if we could find similar tricks on the graphics side. ;)


Shortage of frame rate is *always* a problem. No matter how many
millions of polygons/sec you can push out the door, somebody will want
to do N+25% per second. Ask yourself why SGI was *EVER* able to sell
a machine with more than one InfiniteReality graphics engine on it, and
then ask yourself what the people who were using 3 IR pipes 5-6 years
ago are looking to use for graphics *this* year.

 
 
 

Post by Andrew McG » Thu, 06 Jan 2005 09:20:09


It is if you want to do any audio at the same time, as you usually do.

Andrew


Post by Ingo Molna » Sat, 29 Jan 2005 16:50:08


correct.


no, it's not a big scalability problem. rwlocks are really a mistake -
if you want scalability and spinlocks/semaphores are not enough then one
should either use per-CPU locks or lockless structures. rwlocks/rwsems
are very unlikely to help much.


yes, the complexity of getting it to perform in a deterministic manner
is why i introduced this (major!) simplification of locking. It turns
out that most of the time the actual use of rwlocks matches these
simplified 'owner-recursive exclusive lock' semantics, so we are lucky.
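
To make the 'owner-recursive exclusive lock' semantics concrete, here is a
minimal userspace sketch in Python. It is not the PREEMPT_RT code: the class
and method names are hypothetical, there is no priority inheritance, and
threading.RLock stands in for the kernel primitive. It only shows the idea
that read-lock and write-lock both map onto one exclusive lock that the
owning thread may re-acquire.

```python
import threading


class OwnerRecursiveRWLock:
    """rwlock-shaped API backed by a single exclusive, owner-recursive lock.

    Hypothetical model: readers exclude each other just like writers do,
    but the thread that already owns the lock may take it again.
    """

    def __init__(self):
        # RLock is exclusive between threads and recursive for its owner.
        self._lock = threading.RLock()

    def read_lock(self):
        self._lock.acquire()

    def read_unlock(self):
        self._lock.release()

    def write_lock(self):
        self._lock.acquire()

    def write_unlock(self):
        self._lock.release()

    def try_read_lock(self):
        # Non-blocking attempt; returns True if the lock was taken.
        return self._lock.acquire(blocking=False)
```

In this model a nested read_lock() by the same thread succeeds, while a
second thread's try_read_lock() fails until the owner has released every
level it took. That is the trade being described: reader parallelism is
given up in exchange for a lock with a single owner.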

look at what kind of worst-case scenarios there may already be with
multiple spinlocks (blocker.c). With rwlocks that just gets insane.

Ingo

Post by William Le » Sat, 29 Jan 2005 21:00:18


I wouldn't be so sure about that. SGI is already implicitly relying on
the parallel holding of rwsems for the lockless pagefaulting, and
Oracle has been pushing on mapping->tree_lock becoming an rwlock for a
while, both for large performance gains.






tasklist_lock is one large exception; it's meant for concurrency there,
and it even gets sufficient concurrency to starve the write side.

Try test_remap.c on mainline vs. -mm to get a microbenchmark-level
notion of the importance of mapping->tree_lock being an rwlock (IIRC
you were cc:'d in at least some of those threads).

net/ has numerous rwlocks, which appear to frequently be associated
with hashtables, and at least some have some relevance to performance.

Are you suggesting that lockless alternatives to mapping->tree_lock,
mm->mmap_sem, and tasklist_lock should be pursued now?


-- wli

Post by Ingo Molna » Sun, 30 Jan 2005 00:40:10


i don't really buy it. Any rwlock-type of locking causes global cacheline
bounces. It can make a positive scalability difference only if the
read-lock hold time is large, at which point RCU could likely have
significantly higher performance. There _may_ be an intermediate locking
pattern that is long-held and also has a higher mix of write-locking,
where rwlocks/rwsems may have a performance advantage over RCU or
spinlocks.

Also this is about PREEMPT_RT, mainly aimed towards embedded systems,
and at most aimed towards small (dual-CPU) SMP systems, not the really
big systems.

But, the main argument wrt. PREEMPT_RT stands and is independent of any
scalability properties: rwlocks/rwsems have such bad deterministic
behavior that they are almost impossible to implement in a sane way.

Ingo

Post by William Le » Sun, 30 Jan 2005 01:10:09


The performance relative to mutual exclusion is quantifiable and very
reproducible. These results have baffled people who make arguments
similar to yours above. Systems as small as 4 logical cpus feel these
effects strongly, and it appears to scale almost linearly with the
number of cpus. It may be worth consulting an x86 processor architect
or similar to get an idea of why the counterargument fails. I'm rather
interested in hearing why as well, as I believed the cacheline bounce
argument until presented with incontrovertible evidence to the contrary.

As far as performance relative to RCU goes, I suspect cases where
write-side latency is important will arise for these. Other lockless
methods are probably more appropriate, and are more likely to dominate
rwlocks as expected. For instance, a reimplementation of the radix
trees for lockless insertion and traversal (c.f. lockless pagetable
patches for examples of how that's carried out) is plausible, where RCU
memory overhead in struct page is not.




I suppose if it's not headed toward mainline the counterexamples don't
really matter. I don't have much to say about the RT-related issues,
though I'm aware of priority inheritance's infeasibility for the
rwlock/rwsem case.


-- wli

Post by Ingo Molna » Sun, 30 Jan 2005 01:20:10


yes, i don't doubt the results - my point is that it's not proven that
the other, more read-friendly types of locking underperform rwlocks.
Obviously spinlocks and rwlocks have the same cache-bounce properties,
so rwlocks can outperform spinlocks if the read path overhead is higher
than that of a bounce, and reads are *** . But it's still a poor
form of scalability. In fact, when the read path is really expensive
(larger than say 10-20 usecs) an rwlock can produce the appearance of
linear scalability, when compared to spinlocks.
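
The arithmetic behind that last point can be sketched with a toy model.
Everything here is an assumption made for illustration: one serialized
cacheline transfer per lock acquisition (bounce_us), a read-only workload,
and no writers. The function names and the numbers are hypothetical, not
measurements from this thread.

```python
def spinlock_ops_per_us(n_cpus, hold_us, bounce_us):
    """Spinlock: acquire and hold are fully serialized, so n_cpus adds nothing."""
    return 1.0 / (bounce_us + hold_us)


def rwlock_read_ops_per_us(n_cpus, hold_us, bounce_us):
    """rwlock with readers only: the holds overlap across CPUs, but every
    acquire still needs the lock's cacheline exclusively, which caps the
    total rate at 1 / bounce_us."""
    return min(n_cpus / (bounce_us + hold_us), 1.0 / bounce_us)


def rwlock_speedup(n_cpus, hold_us, bounce_us):
    """How much faster the rwlock looks than the spinlock in this model."""
    return (rwlock_read_ops_per_us(n_cpus, hold_us, bounce_us)
            / spinlock_ops_per_us(n_cpus, hold_us, bounce_us))
```

In this model the speedup is min(n_cpus, (bounce_us + hold_us) / bounce_us).
With an assumed 0.2 usec bounce, a 20 usec read path tracks the CPU count
up to roughly 100 CPUs, which looks like linear scaling, while a 0.1 usec
read path is capped at 1.5x no matter how many CPUs are added.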


yeah.

Ingo

Post by Ingo Molna » Sun, 30 Jan 2005 05:20:10


it seems the most scalable solution for this would be a global flag plus
per-CPU spinlocks (or per-CPU mutexes) to make this totally scalable and
still support the requirements of this rare event. An rwsem really
bounces around on SMP, and it seems very unnecessary in the case you
described.

possibly this could be formalised as an rwlock/rwsem implementation
that scales better. brlocks were such an attempt.
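
A brlock-style scheme can be sketched in userspace as follows. This is a
model under stated assumptions, not kernel code: "slots" stand in for CPUs,
threading.Lock stands in for a per-CPU spinlock, the class name is
hypothetical, and the global flag mentioned above is left out. A reader
touches only its own slot's lock, so the common path shares nothing between
slots; the rare writer pays by taking every slot's lock.

```python
import threading
from contextlib import contextmanager


class BigReaderLock:
    """Per-slot locks: cheap local read side, expensive global write side."""

    def __init__(self, n_slots):
        self._slots = [threading.Lock() for _ in range(n_slots)]

    @contextmanager
    def read(self, slot):
        # Reader: take only the lock that belongs to the caller's slot.
        lock = self._slots[slot]
        lock.acquire()
        try:
            yield
        finally:
            lock.release()

    @contextmanager
    def write(self):
        # Writer: take every slot lock, always in the same order so that
        # two writers cannot deadlock against each other.
        for lock in self._slots:
            lock.acquire()
        try:
            yield
        finally:
            for lock in reversed(self._slots):
                lock.release()
```

Note that two readers using the same slot serialize against each other in
this sketch. The write side cost grows with the number of slots, which is
acceptable only if writes really are the rare event.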


nono, i have no such plans.

Ingo

Post by Bill Hue » Sun, 30 Jan 2005 08:40:09


As I understand it, you'll have to have a global structure to
denote an exclusive operation and then take some additional cpumask_t
representing the set of spinlocks and use it to iterate over them when
doing a PI chain operation.

Locking of each individual parametric typed spinlock might require
a raw_spinlock to manipulate list structures, which, added up, is rather
heavyweight.

Not only that, you'd have to introduce a notion of it being counted,
since it could also be acquired/preempted by another higher priority
thread on that same processor. Not having this semantic would make the
thread in that specific circumstance effectively non-preemptable (PI
scheduler indeterminacy), where the multiple readers portion of a
real read/write (shared-exclusive) lock would have permitted this.

http://www.yqcomputer.com/ ~bhuey/rt-share-exclusive-lock/rtsem.tgz.1208

This is our attempt at getting real shared-exclusive lock semantics in a
blocking lock; it may still be incomplete and buggy. Igor is still
working on this and this is the latest that I have of his work. Getting
comments on this approach would be a good thing, as I/we (me/Igor)
believed from the start that this approach is correct.

Assuming that this is possible with the current approach, optimizing
it to avoid CPU ping-ponging is an important next step.

bill


Post by Kyle Moffe » Tue, 01 Feb 2005 09:10:09

For anybody who wants a good executive summary of RCU, see these:
http://www.yqcomputer.com/
http://www.yqcomputer.com/ #WHATIS



Well, RCU is nice because as long as there are no processes attempting to
modify the data, the performance is as though there was no locking at all,
which is better than the cacheline-bouncing for rwlock read-acquires,
which must modify the rwlock data every time you acquire. It's only when
you need to modify the data that readers or other writers must repeat
their calculations when they find out that the data's changed. In the
case of a reader and a writer, the performance reduction is the same as a
cmpxchg and the reader redoing their calculations (if necessary).
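
The basic read-copy-update pattern can be sketched in userspace like this.
It is a simplified model, not the kernel API: the class name is
hypothetical, writers serialize on an ordinary lock here instead of a
cmpxchg retry loop (Python has no cmpxchg), and reclaiming old versions is
left to the garbage collector rather than to grace periods.

```python
import threading


class RcuCell:
    """Holds a snapshot that is never mutated after it is published."""

    def __init__(self, value):
        self._current = dict(value)
        self._writer_lock = threading.Lock()

    def read(self):
        # Read side: a single reference load, no lock and no shared write.
        return self._current

    def update(self, **changes):
        # Write side: copy the current snapshot, modify the private copy,
        # then publish it by rebinding the reference.
        with self._writer_lock:
            new = dict(self._current)
            new.update(changes)
            self._current = new
```

A reader that loaded a snapshot before an update keeps a consistent old
view, and readers that arrive afterwards see the new one:

```python
cell = RcuCell({"a": 1})
snap = cell.read()
cell.update(a=2, b=3)
print(snap, cell.read())
```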


With RCU the high priority task (unlikely to be preempted) gets to run all
the way through with its calculation, and any low priority tasks are the
ones that will probably need to redo their calculations.


Yeah, unfortunately it's harder to write good reliable RCU code than good
reliable rwlock code, because the semantics of RCU WRT memory access are
much more difficult, so more people write rwlock code that needs to be
cleaned up. It's not like normal locking is easily comprehensible either.
:-\

Cheers,
Kyle Moffett

