Click here to Skip to main content
15,886,110 members
Articles / Productivity Apps and Services / Sharepoint
Tip/Trick

Improving Backup Performance

Rate me:
Please Sign up or sign in to vote.
5.00/5 (5 votes)
1 Oct 2018CPOL4 min read 5.5K   1  
Improving backup performance

Introduction

We had a Sharepoint production environment consisting of 2 instances hosted on a active active Windows Server 2012 cluster. One of the instances contained around 15 TB of data whereas the other instance consisted of around 8 TB of data. We had a separate VLAN adapter configured so that the backups can flow dedicatedly through it without any issues. The n/w throughput of this particular VLAN adapter is around 10 GBPS. Both the nodes of the cluster had around 266 GB RAM and 40 CPUs each. Both the SQL instances are allocated 120 GB RAM. When each of the SQL instances is on a dedicated node; the backup jobs for both the instances completes within 8 hours and 5 hours respectively. We take the backup of the database onto a backup appliance named data domain. This speed is obvious because the nodes have a separate VLAN adapter whose n/w throughput is a maximum of 10 GBPs. But when the SQL instances do a failover and both of them reside on the same node of the cluster; the database backup all of a sudden starts taking more time to complete; which obviously is a matter of concern. Since the instance was on SQL 2014 version, we were performing the backups through backup compression utility.

When we started our investigations, the very first suspect was the Hardware Configuration on the server and we decided to have a look into it.

  1. Drive Latency – No issues at all.

    When we executed the below T-SQL against each of the Instances; we found that the Overall Latency of all the drives is less than 5 which means that there is No Problem with respect to the Latency. Please note that the below T-SQL has been taken from Glenn Berry’s Blog Post, I have always found it useful in cases related to Performance Tuning.

    SQL
    SELECT [Drive],
                   CASE 
                                  WHEN num_of_reads = 0 THEN 0 
                                  ELSE (io_stall_read_ms/num_of_reads) 
                   END AS [Read Latency],
                   CASE 
                                  WHEN io_stall_write_ms = 0 THEN 0 
                                  ELSE (io_stall_write_ms/num_of_writes) 
                   END AS [Write Latency],
                   CASE 
                                  WHEN (num_of_reads = 0 AND num_of_writes = 0) THEN 0 
                                  ELSE (io_stall/(num_of_reads + num_of_writes)) 
                   END AS [Overall Latency],
                   CASE 
                                  WHEN num_of_reads = 0 THEN 0 
                                  ELSE (num_of_bytes_read/num_of_reads) 
                   END AS [Avg Bytes/Read],
                   CASE 
                                  WHEN io_stall_write_ms = 0 THEN 0 
                                  ELSE (num_of_bytes_written/num_of_writes) 
                   END AS [Avg Bytes/Write],
                   CASE 
                                  WHEN (num_of_reads = 0 AND num_of_writes = 0) THEN 0 
                                  ELSE ((num_of_bytes_read + num_of_bytes_written)/_
                                       (num_of_reads + num_of_writes)) 
                   END AS [Avg Bytes/Transfer]
    FROM (SELECT LEFT(UPPER(mf.physical_name), 2) AS Drive, SUM(num_of_reads) AS num_of_reads,
                            SUM(io_stall_read_ms) AS io_stall_read_ms, _
                            SUM(num_of_writes) AS num_of_writes,
                            SUM(io_stall_write_ms) AS io_stall_write_ms, _
                            SUM(num_of_bytes_read) AS num_of_bytes_read,
                            SUM(num_of_bytes_written) AS num_of_bytes_written, _
                            SUM(io_stall) AS io_stall
          FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
          INNER JOIN sys.master_files AS mf WITH (NOLOCK)
          ON vfs.database_id = mf.database_id AND vfs.file_id = mf.file_id
          GROUP BY LEFT(UPPER(mf.physical_name), 2)) AS tab
    ORDER BY [Overall Latency] OPTION (RECOMPILE);
  2. The CPU utilization during the period when the backup executed was just under 10% which means that there is no issue at all.
  3. The memory utilization was 70% and around 30% memory was available for the Operating System.
  4. Storage team confirmed that there is no issue with respect to the storage on the Data Domain.
  5. Network team confirmed that there is no problem on the N/W Side; nor any blockages seen with respect to Firewall.
  6. Server Support Team confirmed that all the configuration settings look OK.

Taking the investigating a little ahead, we noticed the following behavior:

  • Backup job for one instance starts at 4:30 PM and for the other instance, it starts at 10 PM.
  • Whenever the Backup Jobs executes, it starts the backups in ascending order of the database names. Just FYI, the database backup jobs are created using maintenance plans.
  • When the DB backups for instance 2 starts at 10 PM, the backups for instance 1 is already in progress and during that period if a huge backup is going on for both the instances, then it slows down the entire operation.
  • In order to overcome this problem, we decided to perform the database backups for instance 1 in decreasing order of size and for instance 2 in increasing order. Just FYI, I took a decision to perform the database backups in decreasing order of size on instance1 because instance 1 was holding huge set of databases and majority of them were pretty huge in size, i.e., around 700 GB – 1.5 TB. When the database backups start at 4:30 PM in decreasing order of their size; it completes backups of majority of the huge databases by 10 PM and when the backups for instance 2 start at 10 PM, there is no major congestion seen.

Database Names in Decreasing Order of Backup Size

SQL
select 
a.name, 
SUM(((size*8)/1024)) [Size_MB] 
FROM sys.databases a,sys.master_files b
where a.database_id=b.database_id
and a.name not in('tempdb',)
and DATABASEPROPERTYEX(a.name, 'Status') = 'ONLINE'
group by a.name
order by 2 desc

The above query can be used for Increasing Order of Backup Size by just using order by as Order by 2.

You can write a While Loop, use the above Query in a Dynamic SQL and take the Database Backups.

In order to further boost the Performance; we decided to use MAXTRANSFERSIZE option in the Backup Command and specified the Value of the same as 4194304.

MAXTRANSFERSIZE: As per Microsoft Books Online, this parameter specifies the largest unit of transfer in bytes to be used between SQL Server and the backup media. The possible values are multiples of 65536 bytes (64 KB) ranging up to 4194304 bytes (4 MB).

With all the above changes; the DB Backup Time for Instance One reduced from around 23 hours to 16.5 hours and for Instance Two it came from 13 hours 58 minutes to 9 hours 38 minutes.

 

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Architect CapGemini India Private Limited
India India
Satnam Singh is a DBA Manager with Capgemini in India. Satnam has around 14 years of experience on Microsoft SQL Server Technology. His main area of expertise is T-SQL, High Availability solutions, Migrations, Upgradations, Performance Tuning etc. During his free time when he is not talking about SQL, Satnam loves spending time with his family. Satnam Lives along with his Parents in Mumbai,India.

Comments and Discussions

 
-- There are no messages in this forum --