[RFC] cpuset: remove sched domain hooks from cpusets

Post by Paul Jackson » Fri, 20 Oct 2006 18:30:11


From: Paul Jackson < XXXX@XXXXX.COM >

Remove the cpuset hooks that defined sched domains depending on the
setting of the 'cpu_exclusive' flag.

The cpu_exclusive flag can only be set on a child if it is set on
the parent.

This made that flag painfully unsuitable for use as a flag defining
a partitioning of a system.

It was entirely unobvious to a cpuset user what partitioning of sched
domains they would be causing when they set that one cpu_exclusive bit
on one cpuset, because it depended on what CPUs were in the remainder
of that cpuset's siblings and child cpusets, after subtracting out
other cpu_exclusive cpusets.

Furthermore, there was no way on production systems to query the
result.

Using the cpu_exclusive flag for this was simply wrong from the get-go.

Fortunately, it was sufficiently borked that, so far as I know, no
one has made much use of this feature, past the simplest case of
isolating some CPUs from scheduler balancing. A future patch will
propose a simple mechanism for this simple case.

Furthermore, since there was no way on a running system to see what
one was doing with sched domains, this change will be invisible to
any code using this feature. Unless they have deep insight into the
scheduler's load balancing choices, users will be unable to detect
that this change has been made in the kernel's behaviour.

Signed-off-by: Paul Jackson < XXXX@XXXXX.COM >

---

 Documentation/cpusets.txt |   17 ---------
 include/linux/sched.h     |    3 -
 kernel/cpuset.c           |   84 +---------------------------------------------
 kernel/sched.c            |   27 --------------
4 files changed, 2 insertions(+), 129 deletions(-)

--- 2.6.19-rc1-mm1.orig/kernel/cpuset.c 2006-10-19 01:47:50.000000000 -0700
+++ 2.6.19-rc1-mm1/kernel/cpuset.c 2006-10-19 01:48:10.000000000 -0700
@@ -754,68 +754,13 @@ static int validate_change(const struct
}

/*
- * For a given cpuset cur, partition the system as follows
- * a. All cpus in the parent cpuset's cpus_allowed that are not part of any
- * exclusive child cpusets
- * b. All cpus in the current cpuset's cpus_allowed that are not part of any
- * exclusive child cpusets
- * Build these two partitions by calling partition_sched_domains
- *
- * Call with manage_mutex held. May nest a call to the
- * lock_cpu_hotplug()/unlock_cpu_hotplug() pair.
- * Must not be called holding callback_mutex, because we must
- * not call lock_cpu_hotplug() while holding callback_mutex.
- */
-
-static void update_cpu_domains(struct cpuset *cur)
-{
- struct cpuset *c, *par = cur->parent;
- cpumask_t pspan, cspan;
-
- if (par == NULL || cpus_empty(cur->cpus_allowed))
- return;
-
- /*
- * Get all cpus from parent's cpus_allowed not part of exclusive
- * children
- */
- pspan = par->cpus_allowed;
- list_for_each_entry(c, &par->children, sibling) {
- if (is_cpu_exclusive(c))
- cpus_andnot(pspan, pspan, c->cpus_allowed);
- }
- if (!is_cpu_exclusive(cur)) {
- cpus_or(pspan, pspan, cur->cpus_allowed);
- if (cpus_equal(pspan, cur->cpus_allowed))
- return;
- cspan = CPU_MASK_NONE;
- } else {
- if (cpus_empty(pspan))
- return;
- cspan = cur->cpus_allowed;
- /*
- * Get all cpus from current cpuset's cpus_allowed not part
- * of exclusive children
- */
- list_for_each_entry(c, &cur->children, sibling) {
- if (is_cpu_exclusive(c))
- cpus_andnot(cspan, cspan, c->cpus_allowed);
 
 
 

[RFC] cpuset: remove sched domain hooks from cpusets

Post by Nick Piggin » Fri, 20 Oct 2006 19:30:16



Before we chuck the baby out...


Sigh, it isn't. That's simply how cpusets tried to use it.


As far as a user is concerned, cpusets is the interface. Domain
partitioning is an implementation detail that just happens to make
it work better in some cases.


You shouldn't need to, assuming cpusets doesn't mess it up.

Here is an untested patch. Apparently takes care of CPU hotplug too.

--
SUSE Labs, Novell Inc.

 
 
 

[RFC] cpuset: remove sched domain hooks from cpusets

Post by Paul Jackson » Sat, 21 Oct 2006 04:10:07


I'm guessing we're agreeing that the routines update_cpu_domains()
and related code in kernel/cpuset.c are messing things up.

I view that code as a failed intrusion of some sched domain code into
cpusets, and apparently you view that code as a failed attempt to
manage sched domains coming from cpusets.

Oh well ... finger pointing is such fun ;).

(Fortunately I've forgotten who wrote these routines ... best
I don't know. Whoever you are, don't take it personally. It
was nice clean code, caught between the rock and the flood.)



So ... instead of throwing the baby out, you want to replace it
with a puppy. If one attempt to overload cpu_exclusive didn't
work, try another.

I have two problems with this.

1) I haven't found any need for this, past the need to mark some
CPUs as isolated from the scheduler balancing code, which we
seem to be agreeing on, more or less, on another patch.

Please explain why we need this or any such mechanism for user
space to affect sched domain partitioning.

2) I've had better luck with the cpuset API by adding new flags
when I needed some additional semantics, rather than overloading
existing flags. So once we figure out what's needed and why,
then odds are I will suggest a new flag, specific to that purpose.

This new flag might well logically depend on the cpu_exclusive
setting, if that's useful. But it would probably be a separate
flag or setting.

I dislike providing explicit mechanisms via implicit side effects.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson < XXXX@XXXXX.COM > 1.925.600.0401
 
 
 

[RFC] cpuset: remove sched domain hooks from cpusets

Post by Nick Piggin » Sat, 21 Oct 2006 04:30:09

Paul Jackson wrote:

At the moment they are, yes.


:)

I don't know about finger pointing, but the sched-domains partitioning
works. It does what you ask of it, which is to partition the
multiprocessor balancing.


It isn't overloading anything. Your cpusets code has assigned a
particular semantic to cpu_exclusive. It so happens that we can
take advantage of this knowledge in order to do a more efficient
implementation.

It doesn't suddenly become a flag to manage sched-domains; its
semantics are completely unchanged (modulo bugs). The cpuset
interface semantics have no connection to sched-domains.

Put it this way: you don't think your code is currently
overloading the cpuset cpus_allowed setting in order to set the
task's cpus_allowed field, do you? You shouldn't need a flag to
tell it to set that, it is all just the mechanism behind the
policy.


Until very recently, the multiprocessor balancing could easily be very
stupid when faced with cpus_allowed restrictions. This is somewhat
fixed, but it is still suboptimal compared to a sched-domains partition
when you are dealing with disjoint cpusets.

It is mostly SGI who seem to be running into these balancing issues, so
I would have thought this would be helpful for your customers primarily.

I don't know of anyone else using cpusets, but I'd be interested to know.


There is no new semantic beyond what is already specified by
cpu_exclusive.


This is more like providing a specific implementation for a given
semantic.

--
SUSE Labs, Novell Inc.
 
 
 

[RFC] cpuset: remove sched domain hooks from cpusets

Post by Martin Bligh » Sat, 21 Oct 2006 05:00:15


We (Google) are planning to use it to do some partitioning, albeit on
much smaller machines. I'd really like to NOT use cpus_allowed from
previous experience - if we can get it to partition using separated
sched domains, that would be much better.

From my dim recollections of previous discussions when cpusets was
added in the first place, we asked for exactly the same thing then.
I think some of the problem came from the fact that "exclusive"
to cpusets doesn't actually mean exclusive at all, and they're
shared in some fashion. Perhaps that issue is cleared up now?
/me crosses all fingers and toes and prays really hard.

M.
 
 
 

[RFC] cpuset: remove sched domain hooks from cpusets

Post by Paul Jackson » Sat, 21 Oct 2006 09:20:11


Are you saying that you wished that cpusets was not implemented using
cpus_allowed, but -instead- implemented using sched domain partitioning?

Well, as you likely can guess by now, that's unlikely.

Cpusets provides hierarchically nested sets of CPU and Memory Nodes,
especially useful for managing nested allocation of processor and
memory resources on large systems. The essential mechanism at the core
of cpusets is manipulating the cpus_allowed and mems_allowed masks in
each task.
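
To make that concrete, here is a minimal sketch of that mechanism,
simplified from what attach_task() in kernel/cpuset.c boils down to
in kernels of this era (locking and the memory-side handling omitted;
this is not the actual code):

        /*
         * Sketch only: attaching a task to a cpuset amounts to copying
         * the cpuset's mask into the task via the scheduler's
         * set_cpus_allowed().
         */
        static void cpuset_attach_sketch(struct cpuset *cs,
                                         struct task_struct *tsk)
        {
                cpumask_t cpus = cs->cpus_allowed;

                set_cpus_allowed(tsk, cpus);    /* scheduler migrates tsk */
                /* mems_allowed is picked up lazily on later allocations */
        }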

Cpusets have also been dabbling in the business of driving the sched
domain partitioning, but I am getting more inclined as time goes on to
think that was a mistake.



What are you asking for again? ;).

Are you asking for a decent interface to sched domain partitioning?

Perhaps cpusets are not the best way to get that.

I hear tell from my colleague Christoph Lameter that he is considering
trying to make some improvements, that would benefit us all, to the
sched domain partitioning code - smaller, faster, simpler, better and
all that good stuff. Perhaps you guys at Google should join in that
effort, and see to it that your needs are met as well. I would
recommend providing whatever kernel-user API's you need for this, if
any, separately from cpusets.

So far, the requirements that I am aware of on such an effort:
1) Somehow support isolated CPUs (no load balancing to or from them).
For example, at least one real-time project needs these.
2) Whatever you were talking about above that Google is planning, some
sort of partitioning.
3) Somehow, whether by magic or by implicit or explicit partitioning
of the system's CPUs, ensure that its load balancing scales to cover
my employer's (SGI) big CPU count systems.
4) Hopefully smaller, less #ifdef'y and easier to understand than the
current code.
5) Avoid poor-fit interactions with cpusets, which have a different
shape (naturally hierarchical), internal mechanism (allowed bitmasks
rather than scheduler balancing domains), scope (combined processor
plus memory) and natural API style (a full-fledged file system to
name these sets, rather than a few bitmasks and flags).

Good luck.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson < XXXX@XXXXX.COM > 1.925.600.0401
 
 
 

[RFC] cpuset: remove sched domain hooks from cpusets

Post by Nick Piggin » Sun, 22 Oct 2006 01:10:12


The problem, I believe, is that an exclusive cpuset can have an exclusive parent
and exclusive children, which obviously all overlap one another, and
thus you have to do the partition only at the top-most exclusive cpuset.

Currently, cpusets is creating partitions in cpu_exclusive children as
well, which breaks balancing for the parent.

The patch I posted previously should (modulo bugs) only do partitioning
in the top-most cpuset. I still need clarification from Paul as to why
this is unacceptable, though.

--
SUSE Labs, Novell Inc.
 
 
 

[RFC] cpuset: remove sched domain hooks from cpusets

Post by Paul Jackson » Sun, 22 Oct 2006 04:10:13

> The patch I posted previously should (modulo bugs) only do partitioning
> in the top-most cpuset.

That patch partitioned on the children of the top cpuset, not the
top cpuset itself.

There is only one top cpuset - and that covers the entire system.

Consider the following example:

/dev/cpuset cpu_exclusive=1, cpus=0-7, task A
/dev/cpuset/a cpu_exclusive=1, cpus=0-3, task B
/dev/cpuset/b cpu_exclusive=1, cpus=4-7, task C

We have three cpusets - the top cpuset and two children, 'a' and 'b'.

We have three tasks, A, B and C. Task A is running in the top cpuset,
with access to all 8 cpus on the system. Tasks B and C are each in
a child cpuset, with access to just 4 cpus.

By your patch, the cpu_exclusive cpusets 'a' and 'b' partition the
sched domains in two halves, each covering 4 of the system's 8 cpus.
(That, or I'm still a sched domain idiot - quite possible.)

As a result, task A is screwed. If it happens to be on any of cpus
0-3 when the above is set up and the sched domains become partitioned,
it will never be considered for load balancing on any of cpus 4-7.
Or vice versa, if it is on any of cpus 4-7, it has no chance of
subsequently running on cpus 0-3.

If your patch had been just an implicit optimization, benefiting
sched domains, by optimizing for smaller domains when it could do so
without any noticeable harm, then it would at least be neutral, and
we could continue the discussion of that patch to ask if it provided
an optimization that helped enough to be worth doing.

But that's not the case, as the above example shows.

I do not see any way to harmlessly optimize sched domain partitioning
based on a system's cpuset configuration.

I am not aware of any possible cpuset configuration that defines a
partitioning of the system's cpus. In particular, the top cpuset
always covers all online cpus, and any task in that top cpuset can
run anywhere, so far as cpusets is concerned.

So ... what can we do?

What -would- be a useful partitioning of sched domains?

Not being a sched domain wizard, I can only hazard a guess, but I'd
guess it would be a partitioning that significantly reduced the typical
size of a sched domain below the full size of the system (apparently it
is quicker to balance several smaller domains than one big one), while
not cutting off any legitimate load balancing possibilities.

The static cpuset configuration doesn't tell us this (see the top
cpuset in the example above), but if one combined that with knowledge
of which cpusets had actively running jobs that should be load
balanced, then that could work.

I doubt we could detect this (which cpusets did or did not need to be
load balanced) automatically. We probably need to have user code tell
us this. That was the point of my patch that started this discussion
several days ago, adding explicit 'sched_domain' flag files to each
cpuset, so user code could mark the cpusets needing to be balanced.

Since proposing that patch, I've changed my recommendation. Instead
of using cpusets to drive sched domain partitioning, better to just
provide a separate API, specific to the needs of sched domains, by
which user code can partition sched domains. That, or make the
balancing fast enough, even on very large domains, that we don't need
to partition. If we do have to partition, it would basically be for
performance reasons, and since I don't see any automatic way to
correctly partition sched domains, I think that partitioning will
have to be driven explicitly by user code.
 
 
 

[RFC] cpuset: remove sched domain hooks from cpusets

Post by Paul Jackson » Sun, 22 Oct 2006 04:30:10


See the reply I just posted to Nick on this.

His patch didn't partition at the top cpuset, but at its children.
It could not have done any better than that.

The top cpuset covers all online cpus on the system, which is the
same as the default sched domain partition. Partitioning there
would be a no-op, producing the same one big partition we have now.

Partitioning at any lower level, even just the immediate children
of the root cpuset as Nick's patch does, breaks load balancing for
any tasks in the top cpuset.

And even if for some strange reason that weren't a problem, still
partitioning at the level of the immediate children of the root cpuset
doesn't help much on a decent proportion of big systems. Many of my
big systems run with just two cpusets right under the top cpuset, a
tiny cpuset (say 4 cpus) for classic Unix daemons, cron jobs and init,
and a huge (say 1020 out of 1024 cpus) cpuset for the batch scheduler
to slice and dice, to sub-divide into smaller cpusets for the various
jobs and other needs it has.

These systems would still suffer from any performance problems we had
balancing a huge sched domain. Presumably the pain of balancing a
1020 cpu partition is not much less than it is for a 1024 cpu partition.

So, regrettably, Nick's patch is both broken and useless ;).

Only a finer grain sched domain partitioning, that accurately reflects
the placement of active jobs and tasks needing load balancing, is of
much use here.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson < XXXX@XXXXX.COM > 1.925.600.0401
 
 
 

[RFC] cpuset: remove sched domain hooks from cpusets

Post by Dinakar Guniguntala » Sun, 22 Oct 2006 05:40:15

Hi Paul,

This mail seems to be as good as any to reply to, so here goes

On Fri, Oct 20, 2006 at 12:00:05PM -0700, Paul Jackson wrote:

ok I see the issue here, although the above has been the case all along.
I think the main issue here is that most of the users don't have to
do more than one level of partitioning (having to partition a system
with not more than 16 - 32 cpus, mostly less) and it is fairly easy to keep
track of exclusive cpusets and task placements, so this is not such
a big problem at all. However I can see that with 1024 cpus it is no
longer trivial to remember all of the partitioning, especially if the
partitioning is more than 2 levels deep, and that it gets unwieldy.

So I propose the following changes to cpusets

1. Have a new flag that takes care of sched domains (say, sched_domain).
Although I still think that we can tag sched domains onto the
back of exclusive cpusets, I think it best to separate the two
and maybe even add a separate CONFIG option for this. This way we
can keep any complexity arising out of this, such as hotplug/sched
domains, all under the config.
2. The main change is that we don't allow tasks to be added to a cpuset
if it has child cpusets that also have the sched_domain flag turned on
(maybe return an -EINVAL if the user tries to do that; a rough sketch
follows below).
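
As an illustration only - the sched_domain flag and this helper are
hypothetical, not existing kernel code - the check in (2) might look
something like:

        /*
         * Hypothetical sketch of proposal (2): refuse to attach a task
         * to a cpuset whose children define sched domain partitions.
         * Would run from attach_task() with manage_mutex held.
         */
        static int check_attach_allowed(struct cpuset *cs)
        {
                struct cpuset *c;

                list_for_each_entry(c, &cs->children, sibling) {
                        if (is_sched_domain(c))  /* proposed flag test */
                                return -EINVAL;
                }
                return 0;
        }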

Clearly one issue remains: tasks that are already running in the top cpuset.
Unless these are manually moved down to the correct cpuset hierarchy, they
will continue to have the problem as before. I still don't have a simple
enough solution for this at the moment, other than to document it.
But I still think on smaller systems this should be a fairly easy task
for the administrator, if they really know what they are doing. And the
fact that we have a separate flag to indicate the sched domain partitioning
should make it harder for them to shoot themselves in the foot.
Maybe there are other better ways to resolve this?

One point I would argue against is to completely decouple cpusets and
sched domains. We do need a way to partition sched domains and doing
it along the lines of cpusets seems to be the most logical. This is
also much simpler in terms of additional lines of code needed to support
this feature. (as compared to adding a whole new API just to do this)

-Dinakar
 
 
 

[RFC] cpuset: remove sched domain hooks from cpusets

Post by Paul Jackson » Sun, 22 Oct 2006 06:50:11


I take it you are looking for some reasonable and acceptable
constraints to place on cpusets, sufficient to enable us to
make it impossible (or at least difficult) to botch the
load balancing.

You want to make it difficult to split an active cpuset, so as
to avoid the undesirable limiting of load balancing across such
partition boundaries.

I doubt we can find a way to do that. We'll have to let our
users make a botch of it.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson < XXXX@XXXXX.COM > 1.925.600.0401
 
 
 

[RFC] cpuset: remove sched domain hooks from cpusets

Post by Paul Jackson » Sun, 22 Oct 2006 06:50:15

> One point I would argue against is to completely decouple cpusets and
> sched domains.

The "simpler" (fewer code lines) point I can certainly agree with.

The "most logical" point I go back and forth on.

The flat partitions, forming a complete, non-overlapping cover, needed
by sched domains can be mapped to selected cpusets in their nested
hierarchy, if we impose the probably reasonable constraint that for
any cpuset across which we require load balancing, we would want that
cpuset's cpus to be entirely contained within a single sched domain
partition.
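
In code terms, that constraint is just a containment test. A sketch,
assuming a hypothetical array of partition masks and using the cpumask
operators of this era:

        /*
         * Sketch only: a cpuset can be safely load balanced only if
         * its cpus fall entirely within one sched domain partition.
         */
        static int cpus_in_one_partition(cpumask_t cs_cpus,
                                         const cpumask_t *parts, int nparts)
        {
                int i;

                for (i = 0; i < nparts; i++)
                        if (cpus_subset(cs_cpus, parts[i]))
                                return 1;  /* wholly inside one partition */
                return 0;  /* split: balancing across the cut is lost */
        }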

Earlier, such as last week and before, I had been operating under the
assumption that sched domain partitions were hierarchical too, so that
just because a partition boundary ran right down the middle of my most
active cpuset didn't stop load balancing across that boundary, but just
perhaps slowed load balancing down a bit, as it would only occur at some
higher level in the partition hierarchy, which presumably balanced less
frequently. Apparently this sched domain partition hierarchy was a
figment of my overactive imagination, along with the tooth fairy and
Santa Claus.

Anyhow, if we consider that constraint (don't split or cut an active
cpuset across partitions) not only reasonable, but desirable to impose,
then integrating the sched domain partitioning with cpusets, as you
describe, would indeed seem "most logical."


This I would not like. It's ok to have tasks in cpusets that are
cut by sched domain partitions (which is what I think you were getting
at), just so long as one doesn't mind that they don't load balance
across the partition boundaries.

For example, we -always- have several tasks per-cpu in the top cpuset.
These are the per-cpu kernel threads. They have zero interest in
load balancing, because they are pinned on a cpu, for their life.

Or, for a slightly more interesting example, one might have a sleeping
job (batch scheduler sent SIGPAUSE to all its threads) that is in a
cpuset cut by the current sched domain partitioning. Since that job is
not running, we don't care whether it gets good load balancing services
or not.

I still suspect we will just have to let the admin partition their
system as they will, and if they screw up their load balancing,
the best we can do is to make all this as transparent and simple
and obvious as we can, and wish them well.

One thing I'm sure of. The current (ab)use of the 'cpu_exclusive' flag
to define sched domain partitions is flunking the "transparent, simple
and obvious" test ;).


Could you (or some sched domain wizard) explain to me why we would even
want sched domain partitions on such 'small' systems? I've been operating
under the (mis?)conception that these sched domain partitions were just
a performance band-aid for the humongous systems, where load balancing
across say 1024 CPUs was difficult to do efficiently.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson < XXXX@XXXXX.COM > 1.925.600.0401
 
 
 

[RFC] cpuset: remove sched domain hooks from cpusets

Post by Dinakar Guniguntala » Sun, 22 Oct 2006 07:40:16

On Fri, Oct 20, 2006 at 02:41:53PM -0700, Paul Jackson wrote:

I cannot think of any reason why this change would affect per-cpu tasks.


ok here's when I think a system administrator would want to partition
sched domains. If there is an application that is very sensitive to
performance and latencies and would have very low tolerance for
interference from any other code running on the cpus, then the
admin would partition the sched domain and separate this application
from the rest of the system. (per-cpu threads obviously will
continue to run in the same domain as the app)

So in this example, clearly there is no sense in letting a batch job
run in the same sched domain as our application. Now let's say our
performance and latency sensitive application only runs during the
day, then the admin can turn off the sched domain flag and tear down
the sched domain for the night. This will then enable the batch job
running in the parent cpuset to get a chance to run on all the cpus.

Returning -EINVAL when trying to attach a job to the top cpuset when
it has child cpuset(s) that have the sched_domain flag turned on would
mean that the administrator would know that s/he does not have all of
the cpus in that cpuset for their use. However, attaching jobs
(such as the batch job in your example) to the top cpuset before
doing any sched domain partitioning would mean that they make the best
use of resources as well (sort of a backdoor). However if you feel
that this puts too much of a restriction on the admin for creating
tasks such as the batch job, then we would have to do without it
(just documenting the sched_domain flag and its effects).


I think this is a case of one set of folks talking about <32 cpu systems
and another set talking about >512 cpu systems.


Well it makes a difference for applications that have a RT/performance
sensitive component that needs a sched domain of its own

-Dinakar
 
 
 

[RFC] cpuset: remove sched domain hooks from cpusets

Post by Siddha, Suresh » Sun, 22 Oct 2006 08:40:06

How about something like a use_cpus_exclusive flag in cpusets?

And whenever a child cpuset sets this use_cpus_exclusive flag, remove
that set of child cpuset cpus from the parent cpuset and also from the
tasks which were running in the parent cpuset. We can probably allow this
to happen as long as the parent cpuset has at least one cpu.

And if this use_cpus_exclusive flag is cleared in a cpuset, its pool of cpus will
be returned to the parent. We can perhaps have cpus_owned in addition to
cpus_allowed to reflect what is being exclusively used
and owned (which combines all the exclusive cpus used by the parent and children).

So effectively, a sched domain partition will get defined for each
cpuset having 'use_cpus_exclusive'.

And this is mostly in line with what anyone can expect from exclusive
cpu usage in a cpuset, right?
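
A rough sketch of those semantics - everything below is hypothetical,
illustrating the proposal rather than existing code:

        /*
         * Hypothetical: setting use_cpus_exclusive steals this cpuset's
         * cpus from its parent and from the tasks running there;
         * clearing the flag would give them back.
         */
        static int take_cpus_exclusive(struct cpuset *cs)
        {
                struct cpuset *parent = cs->parent;
                cpumask_t remaining;

                cpus_andnot(remaining, parent->cpus_allowed, cs->cpus_allowed);
                if (cpus_empty(remaining))
                        return -EINVAL;  /* parent keeps at least one cpu */

                parent->cpus_allowed = remaining;
                /* ... then walk the tasks attached to parent, calling
                 * set_cpus_allowed(p, remaining) on each ... */
                return 0;
        }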

Job manager/administrator/owner of the cpusets can set/reset the flags
depending on what cpusets/jobs are active.

Paul, will this address your needs?

thanks,
suresh