[stable] Soft lockups since stable kernel upgrade to 2.6.23.8

Post by Greg KH » Mon, 19 Nov 2007 04:50:16



Can you see if the problem showed up in 2.6.23.2 or .3 to help narrow
this down?

thanks,

greg k-h
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to XXXX@XXXXX.COM
More majordomo info at http://www.yqcomputer.com/
Please read the FAQ at http://www.yqcomputer.com/

Post by David » Mon, 19 Nov 2007 05:10:09


This is the culprit; reverting it fixes the issue.

Cheers
David

--- a/kernel/softlockup.c
+++ b/kernel/softlockup.c
@@ -80,10 +80,11 @@ void softlockup_tick(void)
 	print_timestamp = per_cpu(print_timestamp, this_cpu);

 	/* report at most once a second */
-	if (print_timestamp < (touch_timestamp + 1) ||
-		did_panic ||
-			!per_cpu(watchdog_task, this_cpu))
+	if ((print_timestamp >= touch_timestamp &&
+			print_timestamp < (touch_timestamp + 1)) ||
+			did_panic || !per_cpu(watchdog_task, this_cpu)) {
 		return;
+	}

 	/* do not print during early bootup: */
 	if (unlikely(system_state != SYSTEM_RUNNING)) {
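
For readers following the diff, here is a minimal standalone sketch (not kernel code) of the two forms of the early-return test in softlockup_tick(). The names skip_report_old() and skip_report_new() are made up for illustration, and the did_panic and watchdog_task terms are left out because the diff does not change them. Ignoring wraparound of touch_timestamp + 1, the two tests differ only when print_timestamp is smaller than touch_timestamp: the old test returns early in that case, the new one does not.

```c
#include <stdbool.h>

/* Hypothetical helper: the early-return test before the patch. */
bool skip_report_old(unsigned long print_timestamp,
		     unsigned long touch_timestamp)
{
	return print_timestamp < (touch_timestamp + 1);
}

/* Hypothetical helper: the early-return test after the patch. */
bool skip_report_new(unsigned long print_timestamp,
		     unsigned long touch_timestamp)
{
	return print_timestamp >= touch_timestamp &&
	       print_timestamp < (touch_timestamp + 1);
}
```

With print_timestamp = 0 and touch_timestamp = 5, skip_report_old() returns true while skip_report_new() returns false, so only the patched code goes on to the later checks for those inputs. When the two timestamps are equal, both tests return early.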


Post by Greg KH » Mon, 19 Nov 2007 05:40:13


Great, thanks for tracking this down.

Ingo, this corresponds to changeset
a115d5caca1a2905ba7a32b408a6042b20179aaa in mainline. Is that patch
incorrect? Should this patch in the -stable tree be reverted?

thanks,

greg k-h

Post by Ingo Molnar » Mon, 19 Nov 2007 10:00:14

* Greg KH < XXXX@XXXXX.COM > wrote:


hm, there are no such problems in .24 and the cpu_clock() and other
fixes i did were not picked up. Find the missing fixes below. They
should work just fine in .23 as it has the cpu_clock() functionality
too.

[ NOTE: the most robust thing is to make the .23 version match the .24
version of kernel/softlockup.c, so i included two other harmless
changes in this diff as well. ]

Ingo

----------->
commit a5f2ce3c6024a5bb895647b6bd88ecae5001020a
Author: Ingo Molnar < XXXX@XXXXX.COM >
Date: Tue Oct 16 23:26:08 2007 -0700

softlockup watchdog: style cleanups

kernel/softirq.c grew a few style uncleanlinesses in the past few
months, clean that up. No functional changes:

   text    data     bss     dec     hex filename
   1126      76       4    1206     4b6 softlockup.o.before
   1129      76       4    1209     4b9 softlockup.o.after

( the 3 bytes .text increase is due to the "<1>" appended to one of
the printk messages. )

Signed-off-by: Ingo Molnar < XXXX@XXXXX.COM >
Signed-off-by: Andrew Morton < XXXX@XXXXX.COM >
Signed-off-by: Linus Torvalds < XXXX@XXXXX.COM >

commit 43581a10075492445f65234384210492ff333eba
Author: Ingo Molnar < XXXX@XXXXX.COM >
Date: Tue Oct 16 23:26:08 2007 -0700

softlockup: improve debug output

Improve the debuggability of kernel lockups by enhancing the debug
output of the softlockup detector: print the task that causes the lockup
and try to print a more intelligent backtrace.

The old format was:

BUG: soft lockup detected on CPU#1!
[<c0105e4a>] show_trace_log_lvl+0x19/0x2e
[<c0105f43>] show_trace+0x12/0x14
[<c0105f59>] dump_stack+0x14/0x16
[<c015f6bc>] softlockup_tick+0xbe/0xd0
[<c013457d>] run_local_timers+0x12/0x14
[<c01346b8>] update_process_times+0x3e/0x63
[<c0145fb8>] tick_sched_timer+0x7c/0xc0
[<c0140a75>] hrtimer_interrupt+0x135/0x1ba
[<c011bde7>] smp_apic_timer_interrupt+0x6e/0x80
[<c0105aa3>] apic_timer_interrupt+0x33/0x38
[<c0104f8a>] syscall_call+0x7/0xb
=======================

The new format is:

BUG: soft lockup detected on CPU#1! [prctl:2363]

Pid: 2363, comm: prctl
EIP: 0060:[<c013915f>] CPU: 1
EIP is at sys_prctl+0x24/0x18c
EFLAGS: 00000213 Not tainted (2.6.22-cfs-v20 #26)
EAX: 00000001 EBX: 000003e7 ECX: 00000001 EDX: f6df0000
ESI: 000003e7 EDI: 000003e7 EBP: f6df0fb0 DS: 007b ES: 007b FS: 00d8
CR0: 8005003b CR2: 4d8c3340 CR3: 3731d000 CR4: 000006d0
[<c0105e4a>] show_trace_log_lvl+0x19/0x2e
[<c0105f43>] show_trace+0x12/0x14
[<c01040be>] show_regs+0x1ab/0x1b3
[<c015f807>] softlockup_tick+0xef/0x108
[<c013457d>] run_local_timers+0x12/0x14
[<c01346b8>] update_process_times+0x3e/0x63
[<c0145fcc>] tick_sched_timer+0x7c/0xc0
[<c0140a89>] hrtimer_interrupt+0x135/0x1ba
[<c011bde7>] smp_apic_timer_interrupt+0x6e/0x80
[<c0105aa3>] apic_timer_interrupt+0x33/0x38
[<c0104f8a>] syscall_call+0x7/0xb
=======================

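Note that in the old format we only knew that some system call locked
up, we didnt know _which_. With the new format we know that it's at a
specific place in sys_prctl(). [...]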
 
 
 


Post by Greg KH » Wed, 21 Nov 2007 08:30:07

On Sat, Nov 17, 2007 at 04:34:56PM -0800, Jeremy Fitzhardinge wrote:

Can you try applying the patch below to see if that solves the problem
for you?

thanks,

greg k-h

-------------

From: Ingo Molnar < XXXX@XXXXX.COM >
Date: Sun, 18 Nov 2007 01:55:38 +0100
Subject: softlockup watchdog fixes and cleanups
To: Greg KH < XXXX@XXXXX.COM >
Cc: David < XXXX@XXXXX.COM >, Jeremy Fitzhardinge < XXXX@XXXXX.COM >, XXXX@XXXXX.COM , Javier Kohen < XXXX@XXXXX.COM >, Andrew Morton < XXXX@XXXXX.COM >, XXXX@XXXXX.COM , XXXX@XXXXX.COM
Message-ID: < XXXX@XXXXX.COM >
Content-Disposition: inline

From: Ingo Molnar < XXXX@XXXXX.COM >


This is a merge of commits a5f2ce3c6024a5bb895647b6bd88ecae5001020a and
43581a10075492445f65234384210492ff333eba in mainline to fix a warning in
the 2.6.23.3 kernel release.

softlockup watchdog: style cleanups

kernel/softirq.c grew a few style uncleanlinesses in the past few
months, clean that up. No functional changes:

   text    data     bss     dec     hex filename
   1126      76       4    1206     4b6 softlockup.o.before
   1129      76       4    1209     4b9 softlockup.o.after

( the 3 bytes .text increase is due to the "<1>" appended to one of
the printk messages. )

Signed-off-by: Ingo Molnar < XXXX@XXXXX.COM >
Signed-off-by: Andrew Morton < XXXX@XXXXX.COM >
Signed-off-by: Linus Torvalds < XXXX@XXXXX.COM >


softlockup: improve debug output

Improve the debuggability of kernel lockups by enhancing the debug
output of the softlockup detector: print the task that causes the lockup
and try to print a more intelligent backtrace.

The old format was:

BUG: soft lockup detected on CPU#1!
[<c0105e4a>] show_trace_log_lvl+0x19/0x2e
[<c0105f43>] show_trace+0x12/0x14
[<c0105f59>] dump_stack+0x14/0x16
[<c015f6bc>] softlockup_tick+0xbe/0xd0
[<c013457d>] run_local_timers+0x12/0x14
[<c01346b8>] update_process_times+0x3e/0x63
[<c0145fb8>] tick_sched_timer+0x7c/0xc0
[<c0140a75>] hrtimer_interrupt+0x135/0x1ba
[<c011bde7>] smp_apic_timer_interrupt+0x6e/0x80
[<c0105aa3>] apic_timer_interrupt+0x33/0x38
[<c0104f8a>] syscall_call+0x7/0xb
=======================

The new format is:

BUG: soft lockup detected on CPU#1! [prctl:2363]

Pid: 2363, comm: prctl
EIP: 0060:[<c013915f>] CPU: 1
EIP is at sys_prctl+0x24/0x18c
EFLAGS: 00000213 Not tainted (2.6.22-cfs-v20 #26)
EAX: 00000001 EBX: 000003e7 ECX: 00000001 EDX: f6df0000
ESI: 000003e7 EDI: 000003e7 EBP: f6df0fb0 DS: 007b ES: 007b FS: 00d8
CR0: 8005003b CR2: 4d8c3340 CR3: 3731d000 CR4: 000006d0
[<c0105e4a>] show_trace_log_lvl+0x19/0x2e
[<c0105f43>] show_trace+0x12/0x14
[<c01040be>] show_regs+0x1ab/0x1b3
[<c015f807>] softlockup_tick+0xef/0x108
[<c013457d>] run_local_timers+0x12/0x14
[<c01346b8>] update_process_times+0x3e/0x63
[<c0145fcc>] tick_sched_timer+0x7c/0xc0
[<c0140a89>] hrtimer_interrupt+0x135/0x1ba
[<c011bde7>] smp_apic_timer_interrupt+0x6e/0x80
[<c0105aa3>] apic_timer_interrupt+0x33/0x38
[<c0104f8a>] syscall_call+0x7/0xb
=======================

Note that in the old format we only knew that some system call locked
up, we didnt know _which_. With the new format we know that it's at a
specific place in sys_prctl(). [which was where i created an artificial
kernel lockup to t
 
 
 


Post by Chuck Ebbe » Wed, 21 Nov 2007 09:40:07


Those are just cosmetic / cleanup changes.

Don't you need commit a3b13c23f186ecb57204580cc1f2dbe9c284953a ??

Post by Jeremy Fitzhardinge » Wed, 21 Nov 2007 10:50:10


I don't think this patch will help; it only has cosmetic changes in
addition to the original message printing fix. I think it also needs
change a3b13c23f186ecb57204580cc1f2dbe9c284953a:

diff -r 79f0ea1e0e70 -r 06f060ab58aa kernel/softlockup.c
--- a/kernel/softlockup.c Tue Oct 09 21:00:40 2007 +0000
+++ b/kernel/softlockup.c Wed Oct 17 08:42:46 2007 -0700
@@ -40,14 +40,16 @@ static struct notifier_block panic_block
  * resolution, and we don't need to waste time with a big divide when
  * 2^30ns == 1.074s.
  */
-static unsigned long get_timestamp(void)
+static unsigned long get_timestamp(int this_cpu)
 {
-	return sched_clock() >> 30;  /* 2^30 ~= 10^9 */
+	return cpu_clock(this_cpu) >> 30;  /* 2^30 ~= 10^9 */
 }

 void touch_softlockup_watchdog(void)
 {
-	__raw_get_cpu_var(touch_timestamp) = get_timestamp();
+	int this_cpu = raw_smp_processor_id();
+
+	__raw_get_cpu_var(touch_timestamp) = get_timestamp(this_cpu);
 }
 EXPORT_SYMBOL(touch_softlockup_watchdog);

@@ -91,7 +93,7 @@ void softlockup_tick(void)
 		return;
 	}

-	now = get_timestamp();
+	now = get_timestamp(this_cpu);

 	/* Wake up the high-prio watchdog task every second: */
 	if (now > (touch_timestamp + 1))


J
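
The comment in the first hunk above says a shift by 30 stands in for a divide by 10^9. Here is a small sketch of that arithmetic, written outside the kernel. The name ns_to_watchdog_units() and the WATCHDOG_UNIT_NS macro are made up for illustration, and the input is any nanosecond count, not a real sched_clock() or cpu_clock() reading.

```c
#include <stdint.h>

/* One watchdog unit is 2^30 ns = 1,073,741,824 ns, about 1.074 s. */
#define WATCHDOG_UNIT_NS (UINT64_C(1) << 30)

/* Hypothetical helper mirroring the ">> 30" in get_timestamp(). */
unsigned long ns_to_watchdog_units(uint64_t ns)
{
	return (unsigned long)(ns >> 30);
}
```

Under this conversion, 10^9 ns (one second) reads as 0 units and 10^10 ns (ten seconds) reads as 9 units, so a threshold such as touch_timestamp + 1 is counted in units that are each about 7% longer than a second.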

Post by Ingo Molnar » Wed, 21 Nov 2007 15:10:13


yes, it does need the cpu_clock() changes as i mentioned.

commit a3b13c23f186ecb57204580cc1f2dbe9c284953a
Author: Ingo Molnar < XXXX@XXXXX.COM >
Date: Tue Oct 16 23:26:06 2007 -0700

softlockup: use cpu_clock() instead of sched_clock()

sched_clock() is not a reliable time-source, use cpu_clock() instead.

but we only have cpu_clock() from v2.6.23 onwards - so we should not
apply the original patch to v2.6.22. (we should not have applied your
patch that started the mess to begin with - but that's another matter.)

Ingo
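
The thread does not spell out how an unreliable time source misleads the watchdog, so the following is only an illustrative sketch under an assumed failure mode: a clock reading that jumps ahead of the true elapsed time. It reuses the now > (touch_timestamp + 1) comparison from the cpu_clock() patch quoted earlier in the thread. The names watchdog_is_overdue() and read_clock(), and the error term, are hypothetical.

```c
#include <stdbool.h>

/* Same comparison as the "wake up the watchdog task" test in the patch. */
bool watchdog_is_overdue(unsigned long now, unsigned long touch_timestamp)
{
	return now > (touch_timestamp + 1);
}

/*
 * Hypothetical clock model: the true time plus an error term.
 * A reliable source has error 0; a non-zero error is an assumption
 * used only to illustrate a forward jump.
 */
unsigned long read_clock(unsigned long true_units, unsigned long error_units)
{
	return true_units + error_units;
}
```

If the watchdog was last touched at unit 100 and one unit has really elapsed, read_clock(101, 0) does not trip the comparison, but read_clock(101, 5) does, even though the CPU was idle for the same amount of real time in both cases.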

Post by Ingo Molnar » Wed, 21 Nov 2007 15:10:14


yes:


i just forgot to attach the cpu_clock() changes - they are in a3b13c23.

Ingo

Post by Greg KH » Thu, 22 Nov 2007 02:20:09


Ok, I've now added that patch too :)

Hopefully this is all straightened out now, I'll go cut a -rc for the
next stable so people can test...

thanks,

greg k-h

Post by Greg KH » Thu, 22 Nov 2007 02:20:09


Well, I can easily back that one out, if that is easier than adding 2
more patches to try to fix up the mess here.

Let me know if you feel that would be best.

thanks,

greg k-h

Post by Ingo Molnar » Thu, 22 Nov 2007 05:50:09


i'd leave it alone - doing that we have in essence the softlockup
detector turned off. Reverting to the older version might trigger false
positives that need the new stuff.

Ingo

Post by Greg KH » Thu, 22 Nov 2007 06:20:07


Ok, I'll see if the current round of patches fix up everyone's complaints
:)

thanks for sending these,

greg k-h

Post by Ingo Molnar » Thu, 22 Nov 2007 07:00:09


so just to reiterate, to make sure we have the same plans: let's leave
v2.6.22 and earlier kernels alone - and let's strive for the latest
patches and code for v2.6.23 (and v2.6.24, evidently).

Ingo