Problem
A client called wanting to find out
why a database had failed over to the DR server (this environment was
configured to mirror a few OLTP databases). I was caught off guard, but gave him
the usual reasons and asked to have a look at the server to be precise. Then,
to my amazement, the client mentioned that only one database had failed over and the
other databases were still sitting on the primary server.
What I did
1.) Investigate the SQL Server error log
2.) Investigate the Windows error log
Steps
· Found the time the failover happened from the
SQL Server error log.
Execute the following statement on
the SQL Server instance: sp_readerrorlog 0, 1, 'Failover';
See the sp_readerrorlog documentation for more information on this stored procedure.
· Now that I had found the time window of the failure,
I reviewed the events leading up to the failover.
As observed, database mirroring for one database was
inactive (row 1841), and row 1842 provides a high-level reason: it seemed to be a connectivity issue.
This generally implies a disruption of connectivity between the primary
and witness servers.
· Now let’s have a look at the Windows error log
on the witness.
As the message reads, the witness decided to fail over
because of a 10-second delay in response.
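The log search and a per-database check for the steps above can be sketched in T-SQL as follows. This is a minimal sketch, not the exact commands used during the investigation: the search string 'failover' is an assumption about what the relevant log entries contain, and the query against sys.database_mirroring is an addition to confirm which databases have changed roles and which have not.

```sql
-- Search the current SQL Server error log (log 0, type 1 = SQL Server)
-- for entries containing the word 'failover'.
EXEC sp_readerrorlog 0, 1, N'failover';

-- List every mirrored database with its current role and state,
-- to confirm which databases have moved and which have not.
SELECT DB_NAME(database_id)         AS database_name,
       mirroring_role_desc,
       mirroring_state_desc,
       mirroring_witness_state_desc,
       mirroring_partner_instance
FROM sys.database_mirroring
WHERE mirroring_guid IS NOT NULL;   -- only databases that are mirrored
```

Running the second query on both partners shows at a glance that each database carries its own role, which is why a single database can fail over while the rest stay put.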
Note
- Not all databases will fail over because of a network
glitch, only the database that experienced the response delay.
Assume you decide to change the default failover timeout from 10 seconds to 60 seconds. From a database high availability perspective, the system will then be unavailable for a further 50 seconds before a failover is triggered. This may not be acceptable from an application high availability perspective and should be discussed with the business owner.
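As a hedged illustration of the timeout change discussed above, the mirroring timeout can be inspected and changed per database. The database name SalesDB is hypothetical, and the 60-second value is only the example from the note, not a recommendation.

```sql
-- Show the current mirroring timeout (in seconds) for each mirrored database.
SELECT DB_NAME(database_id) AS database_name,
       mirroring_connection_timeout
FROM sys.database_mirroring
WHERE mirroring_guid IS NOT NULL;

-- Raise the timeout to 60 seconds for one database.
-- Run this on the principal server; [SalesDB] is a hypothetical name.
ALTER DATABASE [SalesDB] SET PARTNER TIMEOUT 60;
```

Because the setting is per database, it would need to be applied to each mirrored database that should tolerate longer response delays.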