[PATCH] cpuset confine pdflush to its cpuset

Post by Paul Jacks » Tue, 25 Oct 2005 09:30:10


This patch keeps pdflush daemons on the same cpuset as their
parent, the kthread daemon.

Some large NUMA configurations confine as many kernel threads,
and as much of the other classic Unix load, as they can to what's
called a bootcpuset, keeping the rest of the system free for
dedicated jobs.

This effort is thwarted by pdflush, which dynamically destroys
and recreates pdflush daemons depending on load.

It's easy enough to force the originally created pdflush daemons
into the bootcpuset at system boot time. But the pdflush
threads created later were allowed to run freely across the
system, because of this necessary line in kthread(), which starts them:

set_cpus_allowed(current, CPU_MASK_ALL);

By simply coding pdflush to start its threads with the
cpus_allowed restrictions of its cpuset (inherited from kthread,
its parent) we can ensure that dynamically created pdflush
threads are also kept in the bootcpuset.

On systems w/o cpusets, or w/o a bootcpuset implementation,
the following will have no effect, leaving pdflush to run on
any CPU, as before.
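
The affinity handling described above can be sketched with a small
user-space model. This is not kernel code: the toy_* names, the 64-bit
mask type and the two helper functions are all hypothetical and exist
only for illustration. The sketch assumes, as the description says, that
a thread started through kthread() stays in its parent's cpuset but has
its cpus_allowed widened to every CPU, and that the fix copies the
cpuset's CPUs back into cpus_allowed.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: one bit per CPU, at most 64 CPUs. Not kernel code. */
typedef uint64_t toy_cpumask;

#define TOY_CPU_MASK_ALL UINT64_MAX

struct toy_task {
	toy_cpumask cpuset_cpus;  /* CPUs of the cpuset the task belongs to */
	toy_cpumask cpus_allowed; /* CPUs the scheduler may run the task on */
};

/*
 * Models thread startup via kthread(): the child stays in the parent's
 * cpuset, but its affinity is widened to all CPUs (the effect of
 * set_cpus_allowed(current, CPU_MASK_ALL) in the real code).
 */
static struct toy_task toy_kthread_spawn(const struct toy_task *parent)
{
	struct toy_task child = *parent;

	child.cpus_allowed = TOY_CPU_MASK_ALL;
	return child;
}

/*
 * Models the two lines the patch adds to pdflush(): cut cpus_allowed
 * back to the CPUs of the task's own cpuset.
 */
static void toy_confine_to_cpuset(struct toy_task *task)
{
	assert(task->cpuset_cpus != 0); /* a cpuset with no CPUs is invalid here */
	task->cpus_allowed = task->cpuset_cpus;
}
```

In this model, a parent confined to CPUs 0-1 (mask 0x3) spawns a child
whose cpus_allowed is all ones until toy_confine_to_cpuset() is called,
after which it is 0x3 again. If cpuset_cpus is itself all ones, which is
how this model represents a system without cpusets or a bootcpuset, the
call changes nothing.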

Signed-off-by: Paul Jackson < XXXX@XXXXX.COM >

---

mm/pdflush.c | 13 +++++++++++++
1 files changed, 13 insertions(+)

--- 2.6.14-rc4-mm1-cpuset-patches.orig/mm/pdflush.c 2005-10-17 22:39:41.033879927 -0700
+++ 2.6.14-rc4-mm1-cpuset-patches/mm/pdflush.c 2005-10-23 17:17:03.720802617 -0700
@@ -20,6 +20,7 @@
 #include <linux/fs.h>		// Needed by writeback.h
 #include <linux/writeback.h>	// Prototypes pdflush_operation()
 #include <linux/kthread.h>
+#include <linux/cpuset.h>
 
 
 /*
@@ -170,12 +171,24 @@ static int __pdflush(struct pdflush_work
 static int pdflush(void *dummy)
 {
 	struct pdflush_work my_work;
+	cpumask_t cpus_allowed;
 
 	/*
 	 * pdflush can spend a lot of time doing encryption via dm-crypt. We
 	 * don't want to do that at keventd's priority.
 	 */
 	set_user_nice(current, 0);
+
+	/*
+	 * Some configs put our parent kthread in a limited cpuset,
+	 * which kthread() overrides, forcing cpus_allowed == CPU_MASK_ALL.
+	 * Our needs are more modest - cut back to our cpusets cpus_allowed.
+	 * This is needed as pdflush's are dynamically created and destroyed.
+	 * The boottime pdflush's are easily placed w/o these 2 lines.
+	 */
+	cpus_allowed = cpuset_cpus_allowed(current);
+	set_cpus_allowed(current, cpus_allowed);
+
 	return __pdflush(&my_work);
 }


--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson < XXXX@XXXXX.COM > 1.650.933.1373
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to XXXX@XXXXX.COM
More majordomo info at http://www.yqcomputer.com/
Please read the FAQ at http://www.yqcomputer.com/


Post by Hirokazu T » Tue, 25 Oct 2005 15:10:03

Hi Paul,

I realized that cpusets have another problem related to pdflush.

One cpuset may dirty most of its pages while the others dirty few.
In that case pdflush may not start, because the ratio of dirty pages
across the whole box can stay below the watermark, which is defined
globally. That could make it hard to allocate pages from that cpuset
or from the nodes it depends on. The same would be a problem on a NUMA
machine without cpusets.
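
A small sketch with made-up numbers may make the concern concrete. The
toy_* names, the page counts and the percentage arithmetic below are
illustrative assumptions, not the kernel's actual accounting: the point
is only that a machine-wide dirty ratio can stay low while one node is
almost entirely dirty.

```c
#include <assert.h>
#include <stddef.h>

/* Toy per-node page accounting. Hypothetical, not a kernel structure. */
struct toy_node {
	unsigned long total_pages;
	unsigned long dirty_pages;
};

/* Dirty percentage of a single node, rounded down. */
static unsigned long toy_node_dirty_pct(const struct toy_node *node)
{
	if (node->total_pages == 0)
		return 0;
	return node->dirty_pages * 100 / node->total_pages;
}

/*
 * Dirty percentage over all nodes, rounded down. This stands in for a
 * check against a single, globally defined watermark.
 */
static unsigned long toy_global_dirty_pct(const struct toy_node *nodes,
					  size_t count)
{
	unsigned long total = 0, dirty = 0;
	size_t i;

	assert(nodes != NULL || count == 0);
	for (i = 0; i < count; i++) {
		total += nodes[i].total_pages;
		dirty += nodes[i].dirty_pages;
	}
	if (total == 0)
		return 0;
	return dirty * 100 / total;
}
```

With eight nodes of 1000 pages each, and 900 dirty pages all on node 0,
the node-local figure is 90% while the global figure is 11%. Against a
hypothetical global watermark of, say, 20%, writeback would not be
triggered even though node 0 has almost no clean pages left.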

Do you have any plans to address it?



Thanks,
Hirokazu Takahashi.

Post by Andrew Mor » Tue, 25 Oct 2005 15:50:10


Per-zone dirty thresholds (quite messy), per-zone writeback (horrific,
linear searches or data structure proliferation everywhere).

Let's see a (serious) workload/testcase first, hey? vmscan.c writeback off
the LRU is a bit slow, but we should be able to make it suffice.


"page dirty"? It's what bdflush became when writeback went from
being block-based to being page-based.

Post by Paul Jacks » Tue, 25 Oct 2005 16:00:16


A reasonable request.



Ah - thanks.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson < XXXX@XXXXX.COM > 1.925.600.0401

Post by Hirokazu T » Tue, 25 Oct 2005 16:30:15

Hi Paul,



Could you do this?
I assume you have access to a large NUMA machine.



Post by Paul Jacks » Tue, 25 Oct 2005 16:40:03

Takahashi-san replied to pj:

In theory, yes. I certainly have access to large NUMA machines.

However, it is likely not a priority for me. My focus is on work that
will benefit workloads that do not depend on pdflush (except to want to
be sure that pdflush is -not- running in a cpuset containing a dedicated
job.)

That seems to keep me busy enough (and keep my employer paying me),
so I might never get to this problem. I might, but the odds are
not good.

Sorry.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson < XXXX@XXXXX.COM > 1.925.600.0401