SQL 2000 Problems on One Node Only

Post by Jerr » Tue, 11 Oct 2005 09:41:35


We cannot figure out why our new SQL Server 2000 installation only
operates correctly on one of the nodes. When running on Node2 it works
fine, but it will not start completely when failed over to Node1. Both
nodes have been upgraded to SP4, but Node1 has never run correctly. The
ERRORLOG looks fine. The Event log looks like this:
Last good message:
Event Type: Information
Event Source: MSSQLSERVER
Event Category: (2)
Event ID: 17055
Date: 10/9/2005
Time: 4:53:27 PM
User: N/A
Computer: VIRTUALNAME
Description:
17126 :
SQL Server is ready for client connections

First bad message:
Event Type: Error
Event Source: MSSQLSERVER
Event Category: (3)
Event ID: 17052
Date: 10/9/2005
Time: 4:53:27 PM
User: N/A
Computer: NODE1
Description:
[sqsrvres] ODBC sqldriverconnect failed

Second bad message:
Event Type: Error
Event Source: MSSQLSERVER
Event Category: (3)
Event ID: 17052
Date: 10/9/2005
Time: 4:53:27 PM
User: N/A
Computer: NODE1
Description:
[sqsrvres] checkODBCConnectError: sqlstate = 28000; native error =
4814; message = [Microsoft][ODBC SQL Server Driver][SQL Server]Login
failed for user '(null)'. Reason: Not associated with a trusted SQL
Server connection.

Those two messages then repeat many times before the cluster finally
gives up and fails SQL Server. I can ping the virtual server name from
both nodes, as well as each node's own name. The MSDTC service is set
up in the cluster and running fine. Everything else in the cluster
fails over without issue. Anyone have any ideas? Thanks!

(for good measure, here is the SQL ERRORLOG):
SQL ERRORLOG:
2005-10-09 17:09:02.82 server Microsoft SQL Server 2000 - 8.00.2039 (Intel X86)
May 3 2005 23:18:38
Copyright (c) 1988-2003 Microsoft Corporation
Enterprise Edition on Windows NT 5.2 (Build 3790: Service Pack 1)

2005-10-09 17:09:02.82 server Copyright (C) 1988-2002 Microsoft Corporation.
2005-10-09 17:09:02.82 server All rights reserved.
2005-10-09 17:09:02.82 server Server Process ID is 2212.
2005-10-09 17:09:02.82 server Logging SQL Server messages in file 'E:\Data\SQL\DB\MSSQL\log\ERRORLOG'.
2005-10-09 17:09:02.82 server SQL Server is starting at priority class 'normal'(2 CPUs detected).
2005-10-09 17:09:02.90 server SQL Server configured for thread mode processing.
2005-10-09 17:09:02.91 server Using dynamic lock allocation. [2500] Lock Blocks, [5000] Lock Owner Blocks.
2005-10-09 17:09:02.94 server Attempting to initialize Distributed Transaction Coordinator.
2005-10-09 17:09:04.18 spid4 Starting up database 'master'.
2005-10-09 17:09:04.30 spid4 Server name is 'VIRTUALNAME'.
2005-10-09 17:09:04.30 spid4 Starting up database 'msdb'.
2005-10-09 17:09:04.30 server Using 'SSNETLIB.DLL' version '8.0.2039'.
2005-10-09 17:09:04.30 spid5 Starting up database 'model'.
2005-10-09 17:09:04.30 server SQL server listening on correct.virtual.ip.address: 1433.
2005-10-09 17:09:04.33 server SQL server listening on TCP, Shared Memory.
2005-10-09 17:09:04.33 server SQL Server is ready for client connections
2005-10-09 17:09:04.37 spid5 Clearing tempdb database.
2005-10-09 17:09:04.93 spid5 Starting up database 'tempdb'.
2005-10-09 17:09:05.02 spid4 Recovery complete.
2005-10-09 17:09:05.02 spid4 SQL global counter collection task is created.

Post by SQL M » Tue, 11 Oct 2005 09:57:10

Check that the SQL Server service account and the Cluster service account
are in the same NT groups on Node1 as they are on Node2.

It looks like the Cluster service cannot log in to check that SQL Server
is up and running.

Regards
--------------------------------
Mike Epprecht, Microsoft SQL Server MVP
Zurich, Switzerland

IM: XXXX@XXXXX.COM

MVP Program: http://www.microsoft.com/mvp

Blog: http://www.msmvps.com/epprecht/

Post by Jerr » Tue, 11 Oct 2005 12:41:36

Thank you for replying! I checked both the cluster account (a domain
account that is in the local Administrators group on both nodes) and
the SQL Server account (currently running under my Domain Admin account
until we get this working, so it is also in the local Administrators
group at this point), and they are set up identically. The SQL Server
service does start up for a short period of time, but then it starts
getting those errors and the cluster takes it back down. The GPO we
apply is exactly the same for both nodes, so I don't think the problem
is in there, but I checked it over to be sure.

Any further help is greatly appreciated! Thanks...

Post by Anthony Th » Wed, 12 Oct 2005 00:10:29

The fact that the error message reports "Login failed for user '(null)'"
means that the Cluster service is trying to log in with Windows
Authentication, but that it has lost its access token.
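Anthony's reading of the error can be turned into a quick triage check. The sketch below is a hypothetical Python helper, not a Microsoft tool: it assumes the `checkODBCConnectError` event text has the `sqlstate = ...; native error = ...; message = ...` layout quoted in the first post, and it encodes his rule of thumb that SQLSTATE 28000 (the ODBC "invalid authorization specification" class) together with user '(null)' points at a Windows-authenticated login that arrived without a usable identity.

```python
import re
from typing import NamedTuple, Optional

# Layout assumed from the sqsrvres event text quoted in this thread.
_ODBC_ERROR = re.compile(
    r"checkODBCConnectError:\s*sqlstate\s*=\s*(?P<state>\w+);\s*"
    r"native error\s*=\s*(?P<native>\w+);\s*"
    r"message\s*=\s*(?P<message>.*)",
    re.DOTALL,
)


class OdbcError(NamedTuple):
    sqlstate: str
    native_error: str
    message: str


def parse_sqsrvres(text: str) -> Optional[OdbcError]:
    """Pull SQLSTATE, native error and message out of one event description."""
    match = _ODBC_ERROR.search(text)
    if match is None:
        return None
    # Undo the line wrapping that copying from Event Viewer introduces.
    message = " ".join(match.group("message").split())
    return OdbcError(match.group("state"), match.group("native"), message)


def is_null_windows_login(error: OdbcError) -> bool:
    """Heuristic from this thread: SQLSTATE 28000 plus user '(null)' suggests a
    Windows-authenticated login reached SQL Server without a usable identity."""
    return error.sqlstate == "28000" and "user '(null)'" in error.message
```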

Make sure that node 1 can reach the Domain Controllers and that the Windows
Time is synchronizing correctly.

Since both the Cluster and SQL Server service accounts are local
Administrators, there shouldn't be a problem at the NTFS or registry level;
however, in your GPOs, have you granted all of the user rights either to
the local Administrators group or to these service accounts explicitly?

Also, how is SQL Server set up to allow local Administrators to log in?
Did you leave the BUILTIN\Administrators login in place? You shouldn't;
instead, give explicit access to these two service accounts.
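Anthony's suggestion to replace BUILTIN\Administrators with explicit logins can be scripted. A minimal sketch, assuming SQL Server 2000's `sp_grantlogin` and `sp_addsrvrolemember` system procedures and using hypothetical account names (`CORP\svc-sql`, `CORP\svc-cluster`): the Python helper below only builds a T-SQL script for review in Query Analyzer and does not connect to anything. Revoking BUILTIN\Administrators (`sp_revokelogin`) is deliberately left out until the explicit logins are confirmed to work.

```python
def tsql_literal(value: str) -> str:
    """Quote a value as a T-SQL Unicode string literal, doubling single quotes."""
    return "N'" + value.replace("'", "''") + "'"


def grant_script(sql_service_account: str, cluster_service_account: str) -> str:
    """Build a T-SQL script granting explicit logins to both service accounts.

    The SQL Server service account becomes a sysadmin; the Cluster service
    account only gets a login, which is all its IsAlive check requires.
    """
    statements = [
        f"EXEC sp_grantlogin {tsql_literal(sql_service_account)}",
        f"EXEC sp_addsrvrolemember {tsql_literal(sql_service_account)}, N'sysadmin'",
        f"EXEC sp_grantlogin {tsql_literal(cluster_service_account)}",
    ]
    # Terminate each statement with GO so it runs as its own batch.
    return "".join(f"{stmt}\nGO\n" for stmt in statements)


if __name__ == "__main__":
    # Hypothetical domain accounts; substitute the real service accounts.
    print(grant_script(r"CORP\svc-sql", r"CORP\svc-cluster"))
```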

The SQL Server service account will need to be a SQL Server system
administrator; however, the Cluster service account only needs to log in,
with no other privileges. It only runs SELECT @@SERVERNAME as the IsAlive
check.
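The health check Anthony describes (a Windows-authenticated login followed by SELECT @@SERVERNAME) explains why SQL Server can be running and still be failed by the cluster: if the login fails, the check fails. The sketch below is a simplified Python model, not the actual sqsrvres resource DLL; it assumes a caller-supplied, hypothetical `connect` factory that returns a DB-API 2.0 connection (for example one opened with `Trusted_Connection=yes`), so the logic can be exercised without a server.

```python
from typing import Any, Callable


def is_alive(connect: Callable[[], Any]) -> bool:
    """Simplified IsAlive model: log in, run SELECT @@SERVERNAME, expect a row.

    `connect` is a hypothetical factory returning a DB-API 2.0 connection that
    uses Windows authentication. Any exception, including a failed login such
    as "Login failed for user '(null)'", makes the check report False.
    """
    try:
        conn = connect()
    except Exception:
        return False  # login failed: the service may be up, but the check fails
    try:
        cursor = conn.cursor()
        cursor.execute("SELECT @@SERVERNAME")
        return cursor.fetchone() is not None
    except Exception:
        return False
    finally:
        conn.close()
```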

Sincerely,


Anthony Thomas


--

Post by Jerr » Wed, 12 Oct 2005 02:37:54

Thanks for the reply... Node1 can reach the DC without issue (there is
no firewalling between the two), and the event log shows successful time
syncs. The time on the two nodes is identical. I also checked the Local
Security Policy on both machines; they are identical. I have not done
anything inside SQL Server yet, so the BUILTIN\Administrators login is
still intact.

After a while, SQL Server stopped running with the exact same errors
when failed over to Node2 as well. So something in the install was
unhappy, and I am attempting to install again.

At this point, WINS is required, right? We don't have WINS at all here,
so it's frustrating that it has to be set up just for this SQL instance
;-)

Post by Geoff N. H » Wed, 12 Oct 2005 03:40:17

You don't need WINS if you have a working DNS infrastructure. You do need
a domain of some type. Also, if you change the security accounts for SQL,
use Enterprise Mangler; changing the service accounts for SQL in the
Services applet won't work. Also, make sure the cluster service account is
a member of the local Administrators group on each server.
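One way to confirm Geoff's point that DNS is enough is to check that the virtual server name and both node names resolve from each node. A minimal Python sketch using only the standard `socket` module; note that the operating system resolver may also consult the hosts file and, on Windows, NetBIOS, so a success here does not prove that DNS alone answered.

```python
import socket
from typing import Dict, List


def resolve_all(names: List[str]) -> Dict[str, List[str]]:
    """Resolve each name to its IPv4 addresses via the operating system resolver.

    An empty list means the name did not resolve from this machine.
    """
    results: Dict[str, List[str]] = {}
    for name in names:
        try:
            _, _, addresses = socket.gethostbyname_ex(name)
        except OSError:  # socket.gaierror and socket.herror both derive from OSError
            addresses = []
        results[name] = addresses
    return results
```

Run it on each node with the sanitized names from this thread, e.g. `resolve_all(["VIRTUALNAME", "NODE1", "NODE2"])`, and compare the addresses with the cluster's IP address resources.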

--
Geoff N. Hiten
Senior Database Administrator
Microsoft SQL Server MVP

Post by Jerr » Wed, 12 Oct 2005 04:16:31

Our DNS infrastructure is working fine, but I installed WINS just to be
safe, and it's not helping. SQL Server really does not want to work on
this cluster at all! I have now removed and reinstalled it, and the
install sort of succeeds (except for the end, when setup tries to bring
the resource online, which fails). SQL Server will not start on either
node now, giving that same set of errors.

I checked: the cluster account and the SQL Server service account are
both in the local Administrators group. I installed SQL Server under my
domain account (which is in the local Administrators group), with the
same problems. I cannot even get it fully installed, because the SQL
Server service does not start on either node in the final step of setup.

Since the only errors I can get are those event log errors, I'm not
sure where to go from here... Anyone have any other thoughts?

Post by Geoff N. H » Wed, 12 Oct 2005 04:40:24

Since it is a setup error, there will be a setup log file in the windows
directory. Post it back here, please.

Is this a named instance on Win 2003?
Installation of a named instance of SQL Server 2000 virtual server on a
Windows 2003-based cluster fails
http://www.yqcomputer.com/ ;en-us;815431

--
Geoff N. Hiten
Senior Database Administrator
Microsoft SQL Server MVP

Post by Jerr » Wed, 12 Oct 2005 05:24:18

This is the default instance on Win2003, not a named instance... The
setup logs look like this:

sqlclstr.log:
ClusterGroupEnum(6): enter
ClusterGroupEnum(6): [SQL Server]
~~~ XXX OnlineClusterResource starts for SQL Server
~~~ ClusterResourceStart... tick=0, state=129
~~~ ClusterResourceStart... tick=1, state=129
**through**
~~~ ClusterResourceStart... tick=180, state=129
~~~ ClusterResourceStart... tick=181, state=4
~~~ XXX OnlineClusterResource failed
[sqlclusterSetup.cpp:1785] : 0xffffffff (-1): <<<FormatMessage failed: 317>>>
[sqlclusterSetup.cpp:1478] : 50049 (0xc381): <<<FormatMessage failed: 317>>>

dasetup.log:
...
State after Install:
Setup was Successful: 1
Setup Requires Reboot: 0
Setup Will Reboot the Machine: 0
Exiting: Setup is shutting down..

sqlstp16.log:
11:50:09 Begin Setup
11:50:09 8.00.194
11:50:09 Mode = Silent
11:50:09 ModeType = CLUSTER
11:50:09 Cluster node.
11:50:09 g_szIssPath=C:\WINDOWS\setup~0.iss
11:50:09 GetDefinitionEx returned: 0, Extended: 0x0
11:50:09 ValueFTS returned: 1
11:50:09 ValuePID returned: 1
11:50:09 ValueLic returned: 1
11:50:09 System: Windows NT Enterprise Server
11:50:09 SQL Server ProductType: Enterprise Edition [0x3]
11:50:09 IsNTCluster returned: 1
11:50:09 Begin Action: SetupInitialize
11:50:09 End Action SetupInitialize
11:50:09 Begin Action: SetupInstall
11:50:09 Reading Software\Microsoft\Windows\CurrentVersion\CommonFilesDir ...
11:50:09 CommonFilesDir=C:\Program Files\Common Files
11:50:09 Windows Directory=C:\WINDOWS\
11:50:09 Program Files=C:\Program Files\
11:50:09 TEMPDIR=C:\DOCUME~1\adminname\LOCALS~1\Temp\1\
11:50:09 Begin Action: SetupInstall
11:50:09 digpid size : 256
11:50:09 digpid size : 164
11:50:09 Begin Action: CheckFixedRequirements
11:50:09 Platform ID: 0xf00000
11:50:09 Version: 5.2.3790
11:50:09 File Version - C:\WINDOWS\system32\shdocvw.dll: 6.0.3790.2480
11:50:09 End Action: CheckFixedRequirements
11:50:10 Begin Action: ShowDialogs
11:50:10 Initial Dialog Mask: 0x183000f7, Disable Back=0x1
11:50:10 Begin Action ShowDialogsHlpr: 0x1
11:50:10 Begin Action: DialogShowSdWelcome
11:50:10 End Action DialogShowSdWelcome
11:50:10 Dialog 0x1 returned: 1
11:50:10 End Action ShowDialogsHlpr
11:50:10 ShowDialogsGetDialog returned: nCurrent=0x2,index=1
11:50:10 Begin Action ShowDialogsHlpr: 0x2
11:50:10 Begin Action: DialogShowSdMachineName
11:50:10 [DlgMachine]
11:50:10 Result = 1
11:50:10 Type = 268435466
11:50:10 Name = Node1
11:50:10 ShowDlgMachine returned: 1
11:50:10 Name = Node1, Type = 0x1000000a
11:50:10 Begin Action: CheckRequirements
11:50:10 Processor Architecture: x86 (Pentium)
11:50:10 Service Pack: 256
11:50:10 ComputerName: Node1
11:50:10 User Name: adminname
11:50:10 IsAllAccessAllowed returned: 1
11:50:10 OS Language: 0x409
11:50:10 End Action CheckRequirements
11:50:10 This combination of Package and Operating System allows a full product install.
11:50:10 End Action DialogShowSdMachineName
11:50:10 begin ShowDialogsUpdateMask
11:50:10 nFullMask = 0x183000f7, nCurrent = 0x2, nDirection = 0
11:50:10 Updated Dialog Mask: 0xbf3c037, Disable Back = 0x1
11:50:10 Dialog 0x2 returned: 0
11:50:10 End Action ShowDialogsHlpr
11:50:10 ShowDialogsGetDialog returned: nCurrent=0x4,index=2
11:50:10 Begin Action ShowDialogsHlpr: 0x4
11:50:10 Begin Action: DialogShowSdInstallMode
11:50:10 [DlgInstallMode]
11:50:10 Result = 1
11:50:1
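The numeric states in the sqlclstr.log excerpt appear to line up with the CLUSTER_RESOURCE_STATE enumeration in the Windows cluster API (clusapi.h): 129 is ClusterResourceOnlinePending and 4 is ClusterResourceFailed. In other words, the SQL Server resource sat in "online pending" until tick 181 and was then marked failed. A small Python parser, assuming the `ClusterResourceStart... tick=N, state=M` line format shown in that log, reports just the transitions:

```python
import re
from typing import List, Optional, Tuple

# CLUSTER_RESOURCE_STATE values from the Windows cluster API (clusapi.h).
RESOURCE_STATES = {
    -1: "ClusterResourceStateUnknown",
    0: "ClusterResourceInherited",
    1: "ClusterResourceInitializing",
    2: "ClusterResourceOnline",
    3: "ClusterResourceOffline",
    4: "ClusterResourceFailed",
    128: "ClusterResourcePending",
    129: "ClusterResourceOnlinePending",
    130: "ClusterResourceOfflinePending",
}

_TICK = re.compile(r"ClusterResourceStart\.\.\. tick=(\d+), state=(-?\d+)")


def state_transitions(log_text: str) -> List[Tuple[int, str]]:
    """Return (tick, state name) for the first poll and every later state change."""
    transitions: List[Tuple[int, str]] = []
    previous: Optional[int] = None
    for match in _TICK.finditer(log_text):
        tick, state = int(match.group(1)), int(match.group(2))
        if state != previous:
            name = RESOURCE_STATES.get(state, f"unknown ({state})")
            transitions.append((tick, name))
            previous = state
    return transitions
```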

Post by Jerr » Wed, 12 Oct 2005 05:32:44

FYI, I also created the alias suggested in the article you linked; it
did not help :-(

Thanks...

Post by Geoff N. H » Wed, 12 Oct 2005 05:40:38

Hmm. Looks like the install went just fine. Try moving the resource group
to one node, bringing the disk, IP address, and Network Name resources
online, then starting SQL Server manually from the command line. Then try
connecting and see how it runs. After that, you can stop SQL Server
manually and restart it using the Cluster tool. Repeat on the other node.
It sounds like the Cluster service is not able to manage the SQL Server
service correctly, and that strongly suggests either a connection issue
or a permissions issue. Again, check the Cluster service account's group
membership on both nodes.
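Geoff's isolation test can be written down as a command sequence so it is repeated the same way on each node. This is a sketch under assumptions: it uses the Windows Server 2003 `cluster.exe` tool (`group ... /moveto:`, `resource ... /online`) and `net start`/`net stop` for the default instance's MSSQLSERVER service, and the group and resource names are hypothetical placeholders. The Python helper only builds and prints the commands; check the switches with `cluster /?` before running any of them.

```python
import subprocess
from typing import List


def isolation_test_plan(group: str, node: str, base_resources: List[str],
                        sql_resource: str, service: str = "MSSQLSERVER") -> List[List[str]]:
    """Build (but do not run) the commands for one pass of the manual test."""
    plan = [["cluster", "group", group, f"/moveto:{node}"]]
    # Bring the disk, IP address and network name online without SQL Server.
    plan += [["cluster", "resource", res, "/online"] for res in base_resources]
    # Start SQL Server outside the cluster, test connections, then stop it.
    plan.append(["net", "start", service])
    plan.append(["net", "stop", service])
    # Finally let the cluster start it and watch for the sqsrvres errors.
    plan.append(["cluster", "resource", sql_resource, "/online"])
    return plan


if __name__ == "__main__":
    steps = isolation_test_plan(
        group="SQL Group",  # hypothetical names; use the ones in Cluster Administrator
        node="NODE1",
        base_resources=["Disk E:", "SQL IP Address", "SQL Network Name"],
        sql_resource="SQL Server",
    )
    for cmd in steps:
        print(subprocess.list2cmdline(cmd))
```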

--
Geoff N. Hiten
Senior Database Administrator
Microsoft SQL Server MVP

Post by Jerr » Wed, 12 Oct 2005 06:03:22

Yes, good ideas... The services run fine if I kill off setup and
manually start the service on whichever node currently owns the disk
group. So it does look like clustering itself isn't happy with SQL.
If the MSSQL service is started and I try to bring it online in
clustmgr, it starts spewing out those event log errors and then stops
the service. I double-checked the Cluster service account, and it is in
the Administrators group... Any other ideas?

Post by Anthony Th » Wed, 12 Oct 2005 11:56:03

Services that are managed by cluster resources should not be running
outside the cluster's control. In fact, and I would check this for the
SQL Server services, all cluster resource services should be set to
start Manually. Then bring them online from Cluster Administrator.

Have you tried tracing SQL Server while you bring it online? The only
thing the Cluster does is log in and run SELECT @@SERVERNAME.

If it is coming through as user (null), then there is something wrong
with the Cluster service account.

Sincerely,


Anthony Thomas


--

Post by Jerr » Wed, 12 Oct 2005 14:05:28

The services are all set to Manual; I was just trying to prove whether
SQL Server can run at all. If the cluster thinks it has failed and I
start the service manually, it starts and works fine. If the service
is stopped and I start it with Cluster Manager, I get those errors;
it starts, but eventually gets shut down by Cluster Manager.
This happens on either node...

That's great info (what SQL Server sees when the cluster tries to start
it). I'll get Profiler running tomorrow and check that the cluster
service account has the proper permissions to log in to SQL Server. I
have not yet modified anything in SQL Server, though, so since the
cluster service account is a local admin, I think it should already
have permission to log in...

Thanks for everyone's help... Any other ideas, please let me know!

Post by Geoff N. H » Wed, 12 Oct 2005 22:33:29

Log in to the console as the cluster service account and try to connect
to the SQL Server. Make sure your IP address and network settings are
correct. Setup uses a local named pipe to connect for its configuration,
but the cluster service uses IP to resolve the network name for the
LooksAlive() and IsAlive() functions. Bring the SQL Server service online
manually and see if you can connect as the cluster service account from
both the local node and the remote node. As you noted, you are testing
parts of the SQL setup to see what works and what doesn't. It may feel
frustrating, but we are making progress here.
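Geoff's distinction matters because a TCP path to the virtual IP on port 1433 (the port the ERRORLOG shows SQL Server listening on) can work while the login still fails. A minimal Python sketch of the network half of the check, using only the standard `socket` module; it cannot test the Windows-authenticated login itself, which still has to be tried while logged on as the Cluster service account.

```python
import socket


def can_reach_sql(host: str, port: int = 1433, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`.

    This only proves the network path and the listener; it says nothing about
    whether a Windows-authenticated login would then succeed.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or name not resolved
        return False
```

Run it from both nodes against the virtual server name, e.g. `can_reach_sql("VIRTUALNAME")`, while SQL Server is running.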

BTW, if this process is too slow, you can always open a PSS case. This one
may be worth it.

--
Geoff N. Hiten
Senior Database Administrator
Microsoft SQL Server MVP