As recently as last week, I ran into a baffling situation where an async node in our MsSQL HA solution was out of sync for a few hours. Even though the high watermark for synchronization between the primary and the async node kept increasing, the latency kept growing significantly. By the way, the HA solution is hosted in AWS and the nodes are spread across multiple regions.
The impact of having a node not in sync:
- The transaction log on the primary will grow uncontrollably
- CDC (Change Data Capture) will not work
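One way to catch this early is to watch the log send queue for each secondary. The sketch below is only an illustration: the T-SQL reads the standard `sys.dm_hadr_database_replica_states` and `sys.availability_replicas` views, while the helper `lagging_replicas`, its row format (a list of dicts) and the threshold value are assumptions of mine, not something from our setup.

```python
# T-SQL to run on the primary. The DMV and column names are standard SQL Server ones;
# how you execute it (pyodbc, sqlcmd, a monitoring tool) is up to you.
REPLICA_LAG_QUERY = """
SELECT ar.replica_server_name,
       DB_NAME(drs.database_id) AS database_name,
       drs.synchronization_state_desc,
       drs.log_send_queue_size,   -- KB of log not yet sent to the secondary
       drs.redo_queue_size        -- KB of log not yet redone on the secondary
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
  ON ar.replica_id = drs.replica_id
WHERE drs.is_local = 0;
"""


def lagging_replicas(rows, max_send_queue_kb=1_000_000):
    """Return the names of replicas whose log send queue exceeds the threshold.

    `rows` is assumed to be a list of dicts keyed by the column names above.
    The default threshold (about 1 GB) is a hypothetical value, not a recommendation.
    """
    lagging = []
    for row in rows:
        queue_kb = row.get("log_send_queue_size")
        # The column can be NULL (None here), for example when a replica is disconnected.
        if queue_kb is not None and queue_kb > max_send_queue_kb:
            lagging.append(row["replica_server_name"])
    return lagging
```

A growing `log_send_queue_size` for the async node would have been the first sign of the problem described here.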
The impact on the transaction log is a well-known fact, but what caught me off guard was the impact on CDC. It only occurred to me later that CDC reads its changes from the transaction log through the log reader, and the log reader does not process log records until they are hardened on all the HA nodes in the availability group. This led to more concerns about what needs to be done in a catastrophic situation, i.e. when the primary node in an HA configuration is compromised and the sync secondary takes over the primary role. More details on such a case are available here.
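To make the dependency concrete, here is a toy model of it. It is a simplification under my own assumptions: LSNs are treated as plain integers, and the function names and replica names are hypothetical. It is not how SQL Server implements the log reader, only the rule described above: CDC can be no further ahead than the slowest node.

```python
def cdc_readable_upto(primary_end_lsn, hardened_lsn_by_replica):
    """Highest LSN the log reader may process in this toy model.

    It is capped by the replica that has hardened the least log.
    """
    return min([primary_end_lsn, *hardened_lsn_by_replica.values()])


def cdc_backlog(primary_end_lsn, hardened_lsn_by_replica):
    """Amount of log (in LSN units) that CDC cannot read yet."""
    return primary_end_lsn - cdc_readable_upto(primary_end_lsn, hardened_lsn_by_replica)


# The sync secondary is caught up, the async node is far behind:
# CDC is held back to the async node's position.
hardened = {"sync-secondary": 1000, "async-secondary": 400}
print(cdc_readable_upto(1000, hardened))
print(cdc_backlog(1000, hardened))
```

In this model a single lagging async node is enough to stall CDC, even when the sync secondary is fully caught up, which matches what we saw.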
FYI, the latency between the primary and the async node was largely caused by a combination of mishaps: a known workload was still running, index optimization was in progress, and, finally, the bandwidth between the two nodes had degraded.
Hope this helps someone out in the brave world of HA in MsSQL.