Clustering in SQL Server

Clustering in SQL Server is a Windows level high availability option to increase the availability of physical server hardware, operating system, and SQL server instances. If any of these three fails on one server, The Other server in cluster automatically takes task of failed SQL server instance  

Clustering in SQL Server

Clustering in SQL Server is Service level high availability feature. If in case one node down the SQL Server services are moved to node2. In this feature data loss is zero.

             A Windows failover cluster can support up to 16 nodes in SQL server 2012. Cluster reduce downtime to minimum. Most clustering deployment is between two servers only. The cluster service called as nodes     

How Clustering in SQL Server works

How clustering works
A single SQL server 2012 instance can run on only a single at a time.  In clustering the server on which SQL services running called as active node other server on which the service is not running called as passive node. If active node fails passive node becomes active and continues user requests of the server.  Active node access the database on a shared disk and sends Heartbeat signals over private network and passive node receive them. If active node stops passive node notices that Heartbeat stopped passive node assumes that the active node has failed and initiates a failover. Passive node now takes over the control of shared disk.

SQL server cluster assigns own virtual name and virtual IP address. These details shared by both nodes users connects to cluster using this virtual name, if a fail over occurs, the cluster will still retain on same virtual name and IP address although a new node will be responding to user requests.

Advantages

  • cluster reduces downtime in upgrading hardware or install a new service pack on SQL server cluster
  • provides high availability to Physical server hardware, operating system and SQL server

Disadvantages

  • Clustering requires expensive hardware and software
  • Geographical dispersed clusters SAN hardware and software for data transfer

Architecture of Clustering

Architecture of Clustering

virtual name is shared by both Servers, Active node connect to storage disk and access SQL databases, MSDTC and the quorum transactions goes to which ever server is active, when Heartbeat signal between NIC fails automatic fails happens using internal Heartbeat mechanism

Types of Clustering

1.Active-Active cluster

2.active-passive cluster


Active/Active cluster is to make better use of available hardware. Instead of keeping passive node ideal there are two instances, one on each physical node of cluster to save hardware costs so we can use active-active cluster. When a failover occurs two instances will run on single server. Each instance needs it own logical shared disk. The disks of 2 SQL Server instances will be different, disks cannot be shared among SQL server instances.

     when two instances run on single server the performance may affect. We need to configure both servers with high configuration to meet active-active cluster requirements.

 Active-passive cluster

 The node SQL server instance running is called as active node and the other server SQL server not running referred as passive node, it is in a state of readiness. If a failover occurs passive node automatically becomes active and starts serving user requests, so formally active node becomes a passive node. Here both nodes connected to external storage disk, only active nodes access the SQL databases MSDTC and quorum

Prerequisites to configure clustering

  • virtual server.
  • nodes
  • shared disk
  • quorum
  • public and private network  MSDTC

virtual name, both servers is using virtual name of cluster. User connects to SQL cluster using this virtual name and IP address. Primary node will respond to user requests. If a failover occurs to secondary node still cluster will retain the same virtual name and IP address. Users may not notice that the failover happens inside the cluster

Nodes

Nodes referred as physical servers in clustering. These nodes should have same access to shared storage. Each node talks to one another via a network. If one node does not communicate to other node in cluster the secondary node, will it take ownership of user transactions.

Automatic Failover: one server stops another node becomes owner of cluster.

Manual Failover: DBA intentionally failover for maintenance

Shared Disk:

Clustered instances store data on shared disk. Both nodes physically connect to disk, but only active node can access the instance databases. This shared disk has at least 2 logical partitions. One partition is used for storing the clustered instances SQL Server databases and another is used for quorum.

Quorum

Quorum is a log file similar T-log file, records any change made on active node, when passive node becomes active it reads quorum log file and applies changes on newly active node. This file quorum.log resides in the quorum disk. Quorum is main interpreter between nodes. It stores latest cluster configuration and helps other nodes to take ownership when one node fails.

Voting Mechanism
Quorum is designed to handle split brain scenario when nodes unable to communicate each other, each node assumes and bring resources online. When same SQL resources brought online on more nodes at same time will lead to data corruption. This scenario is called split brain

Example

we have 4 nodes, one instance of SQL running on each node. Node 1 and 2 lost communications with node 3 and 4. In this scenario nodes 1 and 2 try to bring SQL instance resources owned by Nodes 3 and 4, in the same way nodes 3 and 4 will try to bring SQL resource owned by node 1 and 2, which will lead to disk corruption and other issues.  Quorum is designed to handle the scenario.  The cluster will force the cluster service to stop in one of the nodes to ensure only one owner for a particular SQL server resource group

Voting Process in Quorum

Quorum is based on voting algorithm, where more than half (above 50%) must be online to communicate each other. Cluster knows how many nodes available. Each node will have vote if number of votes drop below majority cluster service will stop.

Cluster requires more than half of total votes to achieve quorum. This is to avoid tie between nodes. In 3 node cluster 2 voters must be online to have quorum.

To achieve majority votes, we have 4 quorum models.

  • node majority
  • node/disk majority
  • file share
  • no majority

Public and Private Network

Each node of a cluster must have at least 2 network cards. One network card will be connected to public network and other to a private network.

Public network is the network where users are connected and communicate to clustered SQL Server instance.

Private network is used for communications between the nodes of cluster. It is used mainly for what is called the heartbeat signal. 2 forms of communications are executed.

LookAlive: Verifies whether SQL Server service runs on the online every 5 seconds by default.

Isalive: Verifies that SQL Server accepts connections by executing SP_Server_Disgnostics command

MSDTC

Microsoft distributed transaction coordinator is used for distributed transactions between clustered nodes and other remote servers If a transaction in a query have distributed transactions, we need MSDTC as cluster resource

Other Disaster Recovery option

Log shipping

Leave a Comment

Your email address will not be published.