Real-time rw-locks ( [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15)

Real-time rw-locks ( [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15)

Post by Esben Niel » Tue, 28 Dec 2004 23:50:05


haven't seen much traffic on real-time preemption lately. Is it due
to Christmas or lost interest?

I noticed that you changed rw-locks to behave quite diferently under
real-time preemption: They basicly works like normal locks now. I.e. there
can only be one reader task within each region. This can can however lock
the region recursively. I wanted to start looking at fixing that because
it ought to hurt scalability quite a bit - and even on UP create a few
unneeded task-switchs. However, the more I think about it the bigger the
problem:

First, let me describe how I see a read-write lock. It has 3 states:
a) unlocked
b) locked by n readers
c) locked by 1 writer
There can either be 1 writer within the protected region or n>=0
readers within the region. When a writer wants to take the lock,
calling down_write(), it has to wait until the read count is 0. When a
reader wants to take the lock, calling down_read(), he has only to wait
until the the writer is done - there is no need to wait for the other
readers.

Now in a real-time system down_X() ought to have a deterministic
blocking time. It should be easy to make down_read() deterministic: If
there is a writer let it inherit the calling readers priority.
However, down_write() is hard to make deterministic. Even if we assume
that the lock not only keeps track of the number of readers but keeps a
list of all the reader threads within the region it can traverse the list
and boost the priority of all those threads. If there is n readers when
down_write() is called the blocking time would be
O(ceil(n/#cpus))
time - which is unbounded as n is not known!

Having a rw-lock with deterministic down_read() but non-deterministic
down_write() would be very usefull in a lot of cases. The characteritic is
that the data structure being protected is relative static, is going
to be used by a lot of RT readers and the updates doesn't have to be done
with any real-time requirements.
However, there is no way to know in general which locks in the kernel can
be allowed to work like that and which can't. A good compromise would be
limit the number of readers in a lock by the number of cpu's on the
system. That would make the system scale over several CPUs without hitting
unneeded congestions on read-locks and still have a determnistic
down_write().

down_write() shall then do the following: Boost the priority of all the
active readers to the priority of the caller. This will in turn distribute
the readers over the cpu's of the system assuming no higher priority RT
tasks are running. All the reader tasks will then run to up_read() in
time O(1) as they can all run in parellel - assuming there is no ugly
nested locking ofcourse!
down_read() should first check if there is a writer. If there is
boost it and wait. If there isn't but there isn't room for another reader
boost one of the readers such it will run to up_read().

An extra bonus of not having the number of readers bounded: The various
structures needed for making the list of readers can be allocated once.
There is no need to call kmalloc() from within down_read() to get a list
element for the lock's list of readers.

I don't know wether I have time for coding this soon. Under all
circumstances I do not have a SMP system so I can't really test it if I
get time to code it :-(

Esben



On Tue, 14 Dec 2004, Ingo Molnar wrote:


-
To unsubscribe from this list: send the line "unsubs
 
 
 

Real-time rw-locks ( [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15)

Post by Esben Niel » Wed, 29 Dec 2004 01:30:19


Oh, I thought I would have fun with some kernel programming in my
vacation! :-)

I agree with you that it ought to be configureable. But if you set it to
something like 100 it is _not_ deterministic in practise. I.e. during your
tests you have a really hard time making 100 readers at the critical
point. Most likely you only have a handfull. Then when you deploy the
system where you might have a webserver presenting data read under a
rw-lock is overloaded spawns 100 worker tasks. Bang! Your RT application
doesn't run!
It doesn't help you to have something bounded if the bound is insanely
high such you never reach it in tests. And if you try to calculate the
worst case scenarious for such a system you would find it doesn't
schedule. I.e. you have to buy a bigger CPU or do some other drastic
thing!

A limit like 1 reader per CPU on the other hand behaves like a
normal mutex priority inheritance wrt. determinism. And it scales like the
stock Linux rw-lock which in practise is also limited to 1 task per CPU as
preemption is switched off :-)


Well, these kind of things isn't something you want to combine with 3d
graphics right away anyway!
I have even had problems with crashing on 2.4.xx with a NVidia and
hyperthreading on a machine I helped install :-(
I am afraid NVidia cards and preempt realtime is far in the future :-(


Esben

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to XXXX@XXXXX.COM
More majordomo info at http://www.yqcomputer.com/
Please read the FAQ at http://www.yqcomputer.com/

 
 
 

Real-time rw-locks ( [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15)

Post by Esben Niel » Tue, 01 Feb 2005 07:20:10


I agree that RCU ought to do the trick in a lot of cases. Unfortunately,
people haven't used RCU in a lot of code but an rwlock. I also like the
idea of rwlocks: Many readers or just one writer. I don't see the need to
take that away from people. Here is an examble which even on a UP will
give problems without it:
You have a shared datastructure, rarely updated with many readers. A low
priority task is reading it. That is preempted a high priority task which
finds out it can't read it -> priority inheritance, task switch. The low
priority task finishes the job -> priority set back, task switch. If it
was done with a rwlock two task switchs would have been saved.


Yes it does. But one could make a compromise: The up_write() should _not_
be deterministic. In that case it would be very simple to implement.
up_read() could still be deterministic as it would only involve boosting
one writer in the rare case such exists. That kind of locking would be
very usefull in many real-time systems. Ofcourse, RCU can do the job as
well, but it puts a lot of contrains on the code.

However, as Linux is a general OS there is no way to know wether a
specific lock needs to be determnistic wrt. writing or not as the actual
application is not known at the time the lock type is specified.


Esben

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to XXXX@XXXXX.COM
More majordomo info at http://www.yqcomputer.com/
Please read the FAQ at http://www.yqcomputer.com/