
Automatic Database Failover



Problem
A client called wanting to know why a database had failed over to the DR server (this environment was configured to mirror a few OLTP databases). I was caught off guard, but gave him the usual reasons and asked to have a look at the server to be precise. Then, to my amazement, the client mentioned that only one database had failed over; the other databases were still sitting on the primary server.

What I did
1.)    Investigated the SQL Server error log
2.)    Investigated the Windows error log

Steps
·         Found the time the failover happened from the SQL Server error log
Execute the following statement on the SQL Server instance: sp_readerrorlog 0, 1, 'failover'; here 0 is the current log file, 1 selects the SQL Server error log, and 'failover' is the text to search for.
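As a minimal sketch, the search can be run as below. The second call, with the search string 'mirroring', is my own addition (an assumption, not part of the original investigation) in case the failover messages alone do not show the mirroring state changes.

```sql
-- Search the current SQL Server error log for failover messages.
-- Arguments: 0 = current log file,
--            1 = SQL Server error log (2 would be the SQL Agent log),
--            third argument = text to search for.
EXEC sp_readerrorlog 0, 1, N'failover';

-- Assumption: a broader search string that may also surface
-- the mirroring state-change messages around the same time.
EXEC sp_readerrorlog 0, 1, N'mirroring';
```

If the failover happened before the last log rollover, increasing the first argument (1, 2, ...) reads the older archived log files.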

·         Now that I had found the time window of the failure, I reviewed the events leading up to the failover

As observed, database mirroring for one database was inactive (row 1841), and row 1842 provides a high-level reason: it seemed to be a connectivity issue. This generally implies a disruption of connectivity between the primary server and the witness server.
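One way to narrow the log down to the failover window is the extended procedure xp_readerrorlog, which accepts a start and end time. This is only a sketch: xp_readerrorlog is undocumented, and the two timestamps below are hypothetical placeholders, to be replaced with the window found in the previous step.

```sql
-- Read the current SQL Server error log for a given time window.
-- Arguments: log file number, log type (1 = SQL Server),
--            search string 1, search string 2,
--            start time, end time, sort order.
-- The timestamps are hypothetical placeholders.
EXEC master.dbo.xp_readerrorlog
     0, 1, NULL, NULL,
     N'2013-01-01 10:00:00',
     N'2013-01-01 10:30:00',
     N'asc';
```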

·         Now let's have a look at the Windows error log on the witness
As the message reads, the witness decided to fail over because of a 10-second delay in responding.

Note
-          Not all databases will fail over due to a network glitch, just the database that experienced the response delay
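To see this per-database behaviour for yourself, a query along the following lines against the sys.database_mirroring catalog view lists the role, state and timeout of each mirrored database. This is a sketch of how one might check, not the query used in the original investigation.

```sql
-- List every mirrored database on this instance with its current
-- role (PRINCIPAL / MIRROR), mirroring state and partner timeout.
SELECT DB_NAME(database_id)          AS database_name,
       mirroring_role_desc,
       mirroring_state_desc,
       mirroring_partner_instance,
       mirroring_witness_name,
       mirroring_witness_state_desc,
       mirroring_connection_timeout  -- in seconds
FROM sys.database_mirroring
WHERE mirroring_guid IS NOT NULL;    -- only databases with mirroring configured
```

Running it on both partners shows which databases have swapped roles and which are still in their original role.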
               

Comments

  1. Suppose you decide to change the default failover timeout from 10 seconds to 60 seconds. From a database high availability perspective, the system would then be unavailable for a further 50 seconds before failing over. This may not be acceptable from an application high availability perspective, so the business owner should be consulted.
