SunBlade 1500 lockup event - Please Help!

Post by Scott Abfa » Sat, 11 Jun 2005 23:46:32



I am running a number of SunBlade 1500 workstations loaded
with Solaris 8. (Yes, I know, it's an old version but I have no
choice).

Recently there has been a series of events on some of the workstations
where they lock up. Here are the symptoms:

- The screen is black.

- The system is unresponsive to the mouse and keyboard.

- The power button is lit.

- The LED to the ethernet and an LED on the motherboard are lit.

- The fan cooling the RAM is running.

- I cannot ping the machine.

The only recourse I have is to hard reset the machine by holding down
the power off button. Then the machine comes up again.

It's happened within a set of a few machines in the last month; prior
to that there were two incidents in the last six months that may or may
not have been the same. It definitely feels like a new event.

After the reboot I see nothing out of the ordinary in /var/adm.

We also collect ps, vmstat, iostat, kvmstat, etc. every ten minutes and
log them. The system appeared more or less idle and there was nothing
unusual happening.
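
Since the statistics are logged every ten minutes, the logs themselves can narrow down when each machine stopped: the lockup falls somewhere after the last sample that was written. The following is only a rough sketch in Python of that idea. The function name, the two-minute slack, and the sample times are all made up for illustration; the real log format would need its own timestamp parsing.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, interval=timedelta(minutes=10),
              slack=timedelta(minutes=2)):
    """Return (last_seen, next_seen) pairs where two consecutive samples
    are further apart than interval + slack."""
    ordered = sorted(timestamps)
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > interval + slack:
            gaps.append((earlier, later))
    return gaps


if __name__ == "__main__":
    # Hypothetical sample times, not taken from any real log.
    fmt = "%Y-%m-%d %H:%M"
    samples = [datetime.strptime(s, fmt) for s in (
        "2005-06-10 02:00", "2005-06-10 02:10", "2005-06-10 02:20",
        "2005-06-10 08:40", "2005-06-10 08:50")]
    for last_seen, next_seen in find_gaps(samples):
        print("no samples between", last_seen, "and", next_seen)
```

Running this across all of the workstations would show whether the lockups cluster at a particular time of day or after a similar stretch of idle time.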

As far as I can tell, the system just stops working and sits there.

I am completely baffled and have no idea what to do. Any suggestions
would be very much appreciated.

Scott Allen Abfalter
AirNet Communications, Inc.
XXXX@XXXXX.COM ; XXXX@XXXXX.COM
(321) 953-6818
 
 
 

SunBlade 1500 lockup event - Please Help!

Post by Thomas Mai » Sat, 11 Jun 2005 23:52:44


You don't get a crash dump, do you? Did you try to hook up a serial
console to one of the machines and take a look at what it is saying
before it dies? Is it still accessible via the serial console? Does
Stop-A still work? If Stop-A does not work anymore, I would guess you
have a hardware defect. But it sounds strange that multiple machines
have the same problem. Did you patch the systems recently? What
patches do you have installed? Is your OBP up to date?

Tom

 
 
 

SunBlade 1500 lockup event - Please Help!

Post by Scott Abfa » Sat, 11 Jun 2005 23:58:11


Tom,

Thanks for your thoughts.

Stop-A does not work. It is completely unresponsive.

It is Solaris 8 HW 05/03 with no patches (yes, yes, I know!).

In any event, we have not patched recently.

The output-device and input-device NVRAM settings point at the screen
and not a tty, so I assumed there would be nothing on the serial port.
I can try it.

They are using whatever OBP was on there when they shipped (all within
the last year), but that gives me another thing to check.

Anyone else have anything else I can check on as well?

Scott
 
 
 

SunBlade 1500 lockup event - Please Help!

Post by Scott Abfa » Sat, 11 Jun 2005 23:58:59


Oh, and, I should note that not ALL of the systems are doing it, even
though they are all installed from the same OS CD-ROM and are all
identically configured except for hostname/ip type stuff.
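
One way to hunt for whatever separates the machines that lock up from the ones that don't is to tabulate a few attributes per host (OBP version, power management setting, which slot a card is in) and look for attributes whose values never overlap between the two groups. A rough Python sketch of that comparison follows. The host names, attribute names, and values are placeholders invented for illustration, not data from these systems.

```python
def discriminating_attributes(inventory, affected_hosts):
    """Return attributes whose values in the affected group never occur
    in the healthy group, mapped to (affected_values, healthy_values)."""
    affected = set(affected_hosts)
    keys = set()
    for attrs in inventory.values():
        keys.update(attrs)

    result = {}
    for key in sorted(keys):
        bad = {attrs.get(key) for host, attrs in inventory.items()
               if host in affected}
        good = {attrs.get(key) for host, attrs in inventory.items()
                if host not in affected}
        # Only interesting if both groups exist and share no value.
        if bad and good and bad.isdisjoint(good):
            result[key] = (bad, good)
    return result


if __name__ == "__main__":
    # Placeholder inventory; every name and value here is hypothetical.
    inventory = {
        "ws1": {"obp": "rev-A", "power_mgmt": "on"},
        "ws2": {"obp": "rev-B", "power_mgmt": "on"},
        "ws3": {"obp": "rev-A", "power_mgmt": "off"},
    }
    for key, (bad, good) in discriminating_attributes(
            inventory, ["ws1", "ws2"]).items():
        print(key, "affected:", sorted(bad), "healthy:", sorted(good))
```

An attribute that shows up here is only a candidate, not a diagnosis, but it gives a short list of things to change on one affected machine.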
 
 
 

SunBlade 1500 lockup event - Please Help!

Post by B.M. Wrigh » Sun, 12 Jun 2005 06:31:05


...

Did you turn on the power management/sleep mode? If so, turn
that off, see if it resolves your problem. This is just a guess, but I
remember using it on a few desktops and they would "sleep" but never
come out of it correctly. You said the OS is unpatched so whatever bug
caused that is probably not resolved on these.
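
To check many workstations quickly, the power management settings can be read from each machine's configuration file. The Python sketch below assumes the settings live in /etc/power.conf and use the `autopm` and `autoshutdown` keywords in the layout described in the power.conf(4) man page; that should be verified against the man page on the actual release before relying on it. The sample text is made up.

```python
def power_settings(text):
    """Extract the autopm and autoshutdown behaviors from power.conf-style
    text. Assumes 'autopm <behavior>' and
    'autoshutdown <idle> <start> <finish> <behavior>' lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        if fields[0] == "autopm" and len(fields) >= 2:
            settings["autopm"] = fields[1]
        elif fields[0] == "autoshutdown" and len(fields) >= 5:
            settings["autoshutdown"] = fields[4]
    return settings


if __name__ == "__main__":
    # Hypothetical file contents, not copied from a real system.
    sample = """\
# example only
autopm          default
autoshutdown    30 9:00 9:00 noshutdown
"""
    print(power_settings(sample))
```

Comparing the result between machines that lock up and machines that don't would show whether power management is a plausible difference.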
 
 
 

SunBlade 1500 lockup event - Please Help!

Post by x » Mon, 13 Jun 2005 19:59:03


This sounds a lot like a device is getting into an invalid state, and ***
the system's bus. I've seen this hung so badly the processor couldn't
even access the bus, since the card was *** it (an older UPA
system like an Ultra60, where graphics cards sat on the processor bus).

This could be something like a graphics card, or some other device.
(It's likely doing a DMA transfer, getting stuck, and *** everything.)

I'd switch your console to the serial port, as someone else mentioned.
If things really get messed up, it can often print to the serial port,
when it can't talk to a graphics card. This is especially true if
the problem IS the graphics card.

Also, you might note which slots the cards are in. Sometimes this matters.
Not all of the SB1500 PCI slots are on the same PCI bus.
Some handle power management slightly differently, too.
(Disabling PM, mentioned below, is also a good thing.)

>> Did you turn on the power management/sleep mode? If so, turn
>> that off, see if it resolves your problem. This is just a guess, but I
>> remember using it on a few desktops and they would "sleep" but never
>> come out of it correctly. You said the OS is unpatched so whatever bug
>> caused that is probably not resolved on these.