[patch 3/3] NUMA slab locking fixes -- fix cpu down and up locking

This fixes locking and bugs in the cpu_down and cpu_up paths of the NUMA slab
allocator. Sonny Rao < XXXX@XXXXX.COM > reported problems sometime back
on POWER5 boxes, when the last cpu on a node was being offlined. We could
not reproduce the same on x86_64 because the cpumask (node_to_cpumask) was not
being updated on cpu down. Since that issue is now fixed, we can reproduce
Sonny's problems on x86_64 NUMA, and here is the fix.

The problem earlier was that on CPU_DOWN, if it was the last cpu on the node to
go down, the array_caches (shared, alien) and the kmem_list3 of the node were
being freed (kfree) with the kmem_list3 lock held. If the l3 or the
array_caches came from the same cache being cleared, we hit badness.
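
To see the hazard, here is a rough sketch of the old CPU_DEAD teardown (an
illustration of the pattern only, not the exact pre-patch code):

	/*
	 * Illustrative sketch only: the shared and alien array_caches (and
	 * eventually the l3 itself) were kfree()d while l3->list_lock was
	 * held.  If any of those objects were allocated from the very cache
	 * being torn down, kfree() re-enters that cache while its node list
	 * is locked and half dismantled.
	 */
	spin_lock_irq(&l3->list_lock);
	kfree(l3->shared);		/* kfree() under the list_lock */
	l3->shared = NULL;
	free_alien_cache(l3->alien);	/* more freeing under the lock */
	l3->alien = NULL;
	spin_unlock_irq(&l3->list_lock);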

This patch cleans up the locking in the cpu_up and cpu_down paths.
We cannot really free l3 on cpu down because there is no node offlining yet,
and even though a cpu is not yet up, node local memory can be allocated
for it. So l3s are usually allocated at kmem_cache_create and destroyed at
kmem_cache_destroy. Hence, we don't need cachep->spinlock protection to get
to cachep->nodelists[nodeid] either.
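
The cpu_down path now only detaches the shared and alien arrays under
l3->list_lock and frees them after dropping it, roughly along these lines
(a simplified sketch, not the literal patch hunks; object draining is
omitted). This is also why drain_alien_cache() now takes the alien array
itself instead of the l3:

	struct array_cache *shared;
	struct array_cache **alien;

	spin_lock_irq(&l3->list_lock);
	shared = l3->shared;		/* detach under the lock ... */
	l3->shared = NULL;
	alien = l3->alien;
	l3->alien = NULL;
	spin_unlock_irq(&l3->list_lock);

	/* ... and free outside it, so kfree() can safely re-enter the cache */
	kfree(shared);
	if (alien) {
		drain_alien_cache(cachep, alien);
		free_alien_cache(alien);
	}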

Patch survived onlining and offlining on a 4 core, 2 node Tyan box with a
4 process dbench running all the time.

Signed-off-by: Alok N Kataria < XXXX@XXXXX.COM >
Signed-off-by: Ravikiran Thirumalai < XXXX@XXXXX.COM >

Index: linux-2.6.16-rc2/mm/slab.c
===================================================================
--- linux-2.6.16-rc2.orig/mm/slab.c 2006-02-03 15:10:04.000000000 -0800
+++ linux-2.6.16-rc2/mm/slab.c 2006-02-03 15:18:51.000000000 -0800
@@ -884,14 +884,14 @@ static void __drain_alien_cache(struct k
 	}
 }
 
-static void drain_alien_cache(struct kmem_cache *cachep, struct kmem_list3 *l3)
+static void drain_alien_cache(struct kmem_cache *cachep, struct array_cache **alien)
 {
 	int i = 0;
 	struct array_cache *ac;
 	unsigned long flags;
 
 	for_each_online_node(i) {
-		ac = l3->alien[i];
+		ac = alien[i];
 		if (ac) {
 			spin_lock_irqsave(&ac->lock, flags);
 			__drain_alien_cache(cachep, ac, i);
@@ -902,7 +902,7 @@ static void drain_alien_cache(struct kme
 #else
 #define alloc_alien_cache(node, limit) do { } while (0)
 #define free_alien_cache(ac_ptr) do { } while (0)
-#define drain_alien_cache(cachep, l3) do { } while (0)
+#define drain_alien_cache(cachep, alien) do { } while (0)
 #endif
 
 static int __devinit cpuup_callback(struct notifier_block *nfb,
@@ -936,6 +936,9 @@ static int __devinit cpuup_callback(stru
 				l3->next_reap = jiffies + REAPTIMEOUT_LIST3 +
 					((unsigned long)cachep) % REAPTIMEOUT_LIST3;
 
+				/* The l3s don't come and go as cpus come and
+				   go. cache_chain_mutex is sufficient
+				   protection here */
 				cachep->nodelists[node] = l3;
 			}
 
@@ -949,27 +952,39 @@ static int __devinit cpuup_callback(stru
 		/* Now we can go ahead with allocating the shared array's
 		   & array cache's */
 		list_for_each_entry(cachep, &cache_chain, next) {
-			struct array_cache *nc;
+			struct array_cache *nc, *shared, **alien;
 
-			nc = alloc_arraycache(node, cachep->limit,
-						cachep->batchcount);
-			if (!nc)
+			if (!(nc = alloc_arraycache(node,
+				cachep->limit, cachep->batchcount)))
 				goto bad;
+			if (!(shared = alloc_arraycache(node,
+				cachep->shared*cachep->batchcount, 0xbaadf00d)))
+				goto bad;