Why backup
In the opening chapter of Craig Mullin's book, Database Administration, he says "In many ways, business today is data." The role within most organizations responsible for protecting data is the database administrator� you. That's right; the entire business is in your capable hands, running on that server that doesn't ever crash, with all those end users that don't ever make mistakes using applications built by those developers who right faultless code every time the first time, not to mention the able assistance of that new co-op that has 'sa' privileges thanks to your boss. OK. Stop crying. There are things you can do to protect the data under your care and one of the most important is running regular backups of that data.
Backups
Microsoft, in the SQL Server Books Online defines backups as "a copy of data that is used to restore and recover data after a system failure" which is pretty much what we're looking for. Backups can be created a number of ways and can incorporate all or some of the data, as well as some part of the transaction log. While this article is focused on 2005 syntax, most of the concepts are applicable to 2000. This is a huge topic. At best, I'm going to scratch the surface and give you enough information so you won't start crying again. After reading this, you should be able to set up a reasonable set of backups for your system.
In order to begin working on backups, the business needs define a recovery model. A recovery model is essentially what you're going to do with the transaction log data. The two most often used recovery models are pretty easy to define; Simple or Full. Before you assume that, of course you need Full to protect your data, you need to know what they do. The business is going to tell you if you need to recover to a point in time or if you simply need the last full backup. It's going to define if your data is recoverable by other means or if you have to protect as much as possible as it comes across the wire. You use Simple recovery if you can lose some data in the transaction log, you can restore all secondary read/write file groups when you restore the primary and you just don't need recovery to a point in time. These are mostly secondary databases that are not an absolute vital part of the enterprise or reporting systems with read only access so there isn't a transaction log to worry about anyway. You use Full if every bit of the data is vital, you need to recover to a point in time, or, usually in the case of very large databases (VLDB) you need to restore individual files and file groups independently of other files and file groups. With both you can now run a Copy-Only backup which allows you to copy the database to a backup file, but doesn't affect the log, differential backup schedules or impact recovery to a point in time. I'll try to drill down on as many of these topics as possible through the article, but not the files & filegroups.
Working With Simple Recovery
Enough talk. Let's get down to running backups. Let's assume that we're in Simple recovery on a small to mid-sized database. Your simplest backup strategy is to simply run a regular full backup of the database.
BACKUP DATABASE AdventureWorks
TO DISK = 'C:\Backups\AdventureWorks.BAK'
What's with all the typing you ask? Don't we have GUI tools to handle the work for us? Yes, most simple backups can be performed using SQL Server Management Studio. However, if you want to learn and understand what Management Studio is doing for you, or if you want some fine grained control over what is backed up, how and where, then you're going to have to break out the keyboard and put away the mouse.
The above is a basic backup to disk. Most DBA's I know backup to file and then scrape the files onto a tape or some other media. This is because files on disk are simple & quick to recover whereas media can sometimes be a bit of a pain. For example, we generally have two to three days worth of backups on our file systems for immediate recovery. We only go to the tape systems if we need to run restores for older backups.
What did that script do? It made a copy of all the committed data in the database. It also copied uncommitted log entries. These are used during recovery to either commit or rollback changes that occurred to the data during the backup process.
Normally backing up a database affects other backup & restore processes. For example after running the statement above, any differential backups (a backup that only copies data changed since the last backup) would be using this as the starting point for data changes, not the backup you ran last night. SQL 2005 introduces a concept to backups that allow us to keep from interrupting the cycle, copy only backups.
BACKUP DATABASE AdventureWorks
TO DISK = 'C:\Backups\AdventureWorks.bak'
WITH COPY_ONLY;
Already we've found one of those more granular moments when the Management Studio wouldn't help you. If you want a copy only backup, you have to use the command line.
Let's assume for a moment, that we're still in simple recovery, but we're dealing with a larger database, say something above 100gb in size. Full backups can actually start to slow down the process a bit. Instead, after consultation with the business, we've decided to do a weekly full backup and daily incremental backups. Incremental backups only backup the data pages that have changed since the last full backup.
BACKUP DATABASE AdventureWorks
TO DISK = 'C:\backups\AdventureWorks.bak'
WITH DIFFERENTIAL;
Now, if we had to restore this database, we'd first go to the last full backup, restore that, and then restore the differential backups in order (more on that later).
Note that after running all these sample backups (assuming you're typing along as you read) that we have not been changing the file names. Yet we haven't run into any errors. When running backups as we've done, SQL Server treats the file like a backup device and simply keeps appending the backups to the file.
If you want to simply overwrite the existing file you'll need to modify your backup statements:
BACKUP DATABASE Adventureworks
TO DISK = 'C:\backups\AdventureWorks.bak'
WITH INIT;
There are a number of other backup options that I won't be detailing here. Read the books online to see details on BLOCKSIZE
, EXPIREDATE
, RETAINDAYS
, PASSWORD
, NAME
, STATS
, etc.
You can also run a statement that will check the integrity of a database backup. It doesn't check the integrity of the data within a backup, but it does verify that the backup is formatted correctly and accessible.
RESTORE VERIFYONLY
FROM DISK = 'C:\backups\Adventureworks.bak'
Full Recovery and Log Backups
We've primarily been working on a database that was in set to Simple in it's recovery (this used to be called Truncate Log on Checkpoint) which doesn't maintain the transaction logs for later recovery. Every backup under this mechanism is a database backup. Log backups are simply not possible. However, you've only protected the data as of the last good backup. Let's change our assumptions. Now we're dealing with a mission critical application & database. We want to be able to recover this database up to the latest minute, but we can live with no more than 15 minutes worth of lost data. Now we're in Full recovery and the transaction logs are building up between full backups. Let's further assume that we've got a larger database. First, on a scheduled basis, in this case every 15 minutes, based either on data needs or business requirements, we'll backup the logs:
BACKUP LOG Adventureworks
TO DISK = 'C:\backups\AdventureWorks_Log.bak';
This script will backup committed transactions from the transaction log. It has markers in the file that show the start & stop time. It will truncate the log when it successfully completes, cleaning out the committed transactions that have been written to the backup. In an emergency, you can use the WITH NO_TRUNCATE
statement to capture data from the transaction log regardless of the state of the system. This is for emergencies only.
Note that we are not using the INIT
statement. When doing log backups, you've got options. First option, run all the backups to a single file, where they'll stack and all you have to do, on restore (covered later) is cycle through them. Second option, name the backups uniquely, probably using date & time in the string. In that case, safety says, use INIT
. We've used both approaches in our systems for different reasons. You can decide what is best for your technology & business requirements.
Most of the options available to the database backup are included in Log backup including COPY_ONLY
. This would allow you to capture a set of transaction data without affecting the log or the next scheduled log backup. This would be handy for taking production data to another system for troubleshooting etc.
If you have your database set to FULL
Recovery, you need to run log backups. Sometimes, people forget and the log fills up disk drives. In this case, you can run:
BACKUP LOG AdventureWorks WITH NO_LOG;
This is absolutely not recommended because it breaks the log chain, the series of log backups from which you would recover your database to a point in time. Microsoft recommends running a full backup immediately after using this statement.
Restoring Databases
As important as backups are, and they are vital, without the ability to restore the database, the backups are useless. Restoring a full database backup is as simple as it was to create:
RESTORE DATABASE Adventureworks
FROM DISK = 'C:\Backup\AdventureWorks.bak';
It's really that simple. Unless, as we did originally, we were backing up everything to a single file. In that case, you'll need to specify which file you're accessing. If you don't know which file, you'll need to generate a list:
RESTORE HEADERONLY
FROM DISK = 'C:\Backup\Adventureworks.bak';
This will give you the same list as I showed above from Management Studio. So now, if we wanted to restore the second file in the group, the COPY_ONLY
backup, we would
RESTORE DATABASE AdventureWorks
FROM DISK = 'C:\Backup\Adventureworks.bak'
WITH FILE = 2;;
Unfortunately, if you're following along, you may find that you just generated this error:
Msg 3159, Level 16, State 1, Line 1
The tail of the log for the database "AdventureWorks" has not been backed up. Use BACKUP LOG WITH NORECOVERY
to backup the log if it contains work you do not want to lose. Use the WITH REPLACE
or WITH STOPAT
clause of the RESTORE
statement to just overwrite the contents of the log.
Msg 3013, Level 16, State 1, Line 1
RESTORE DATABASE is terminating abnormally.
What this means is, that your database is in full recovery mode, but you haven't backed up it's log recently. You can overwrite the log if you change the syntax to this:
RESTORE DATABASE AdventureWorks
FROM DISK = 'C:\Backups\Adventureworks.bak'
WITH FILE = 2,
REPLACE;
That's the first time we've stacked the WITH
clauses, but it won't be the last. Read through the books online. Most of the WITH clause statements can be used in combination with the others.
What happens if we want to restore to a different database? For example, we made a copy of our database in the second backup. Maybe that was to move it down to a production support server where we were going to do some work on it, separate from other copies of the database. If we take the simple approach, well, try this:
RESTORE DATABASE AdventureWorks_2
FROM DISK = 'C:\Backups\Adventureworks.bak'
WITH FILE = 2;
In this case you should have a whole series of errors relating to files not being overwritten. You really can create new databases from backups, but if you're doing it on a server with the existing database, you'll need to change the location of the physical files using the logical names. In order to know the logical names of the files for a given database, run this prior to attempting to move the files:
RESTORE FILELISTONLY
FROM DISK = 'C:\Backups\Adventureworks.bak'
WITH FILE = 2;
This can then be used to identify the appropriate logical names in order to generate this script:
RESTORE DATABASE AdventureWorks_2
FROM DISK = 'C:\Backups\Adventureworks.bak'
WITH FILE = 2,
MOVE 'AdventureWorks_Data' TO 'C:\backups\aw2_log.mdf',
MOVE 'AdventureWorks_Log' TO 'C:\backups\aw2_log.ldf';
These are actually all the easy ways of running a restore for a database. The last method is to apply the differential backup. This requires two steps. First, we'll restore the database, but with a twist.
RESTORE DATABASE AdventureWorks
FROM DISK = 'C:\Backups\Adventureworks.bak'
WITH FILE = 1,
NORECOVERY,
REPLACE;
RESTORE DATABASE AdventureWorks
FROM DISK = 'C:\Backups\AdventureWorks.bak'
WITH FILE = 3;
Most of this is probably self-explanatory based on what we've already gone over. The one wrinkle is the inclusion of the phrase NORECOVERY
. Very simply, during a restore, transactions may have started during the backup process. Some of them complete and some don't. At the end of a restore, completed transactions are rolled forward into the database and incomplete transactions are rolled back. Setting NORECOVERY
keeps transactions open. This allows for the next set of transactions to be picked up from the next backup in order.
We're mainly dealing with simple backups & restores in this article, but a more advanced restore in 2005 allows secondary file groups to be restored while the database is online. It's primary file group must be online during the operation. This will be more helpful for very large database systems.
Restoring Logs
Restoring logs is not much more difficult than the incremental database restore that we just did. There's just quite a bit more involved in getting to a moment in time. Assuming you're backing up your logs to a single file or device:
RESTORE HEADERONLY
FROM DISK = 'C:\Backups\Adventureworks_log.bak';
Otherwise, you simply go and get the file names you need. First run the database restore, taking care to leave it in a non-recovered state. Follow this up with a series of log restores, this time going to a point in time.
RESTORE DATABASE AdventureWorks
FROM DISK = 'C:\Backups\Adventureworks.bak'
WITH FILE = 1,
NORECOVERY,
REPLACE,
STOPAT = 'Oct 23, 2006 14:30:29.000';
RESTORE LOG AdventureWorks
FROM DISK = 'C:\Backups\Adventureworks_log.bak'
WITH FILE = 1,
NORECOVERY,
STOPAT = 'Oct 23, 2006 14:30:29.000';
RESTORE LOG AdventureWorks
FROM DISK = 'C:\Backups\Adventureworks_log.bak'
WITH FILE = 2,
NORECOVERY,
STOPAT = 'Oct 23, 2006 14:30:29.000';
RESTORE LOG AdventureWorks
FROM DISK = 'C:\Backups\Adventureworks_log.bak'
WITH FILE = 3,
NORECOVERY,
STOPAT = 'Oct 23, 2006 14:30:29.000';
RESTORE LOG AdventureWorks
FROM DISK = 'C:\Backups\Adventureworks_log.bak'
WITH FILE = 4,
STOPAT = 'Oct 23, 2006 14:30:29.000';
Now what we have is a database that is up to the exact, last committed transaction at 14:30:29 on the 23rd of October. Just in case you forget to leave the
NORECOVERY
off, or simply stop in the middle and would like to bring the database back online, you can use this statement to complete the process
RESTORE DATABASE Adventureworks
WITH RECOVERY;
Database Snapshots
SQL Server 2005 introduced the concept of a snapshot, or a read-only, static view of a database. Snapshots are primarily created in order to supply a read-only version of a database for reporting purposes. However, they do function similar to backups. The one primary difference is, all transactions are rolled back. There is no option for rolling forward, capturing logs, etc., that backups provide. Because of this they are much faster than backups both to create and restore. A good use of them, in addition to reporting, might be to run a snapshot prior to maintenance after you've already removed all the active users (and their transactions) from the system. These are stored on the server, so you must make sure you've got adequate storage.
The syntax is different because you're not backing up a database, you're creating a new one:
CREATE DATABASE Adventureworks_ss1430
ON (NAME = AdventureWorks_Data,
FILENAME = 'C:\Backups\AdventureWorks_data_1430.ss')
AS SNAPSHOT OF AdventureWorks;
Now it will be accessible for read only. Since we're primarily concerned with using this as a backup mechanism, let's include the method for reverting a database to a database snapshot.
First, identify the snapshot you wish to use. If there is more than one on any database that you're going to revert, you'll need to delete all except the one you are using.
DROP DATABASE Adventureworks_ss1440;
Then you can revert the database by running a RESTORE
statement (mixed metaphors, not good)
RESTORE DATABASE Adventureworks
FROM DATABASE_SNAPSHOT = Adventureworks_ss1430;
That's it. On my system, running the database snapshots of Adventureworks took 136 ms. The full backup took 5,670 ms. The restore of the snapshot took 905ms and the database restore took 13,382ms. Incorporating this into a production rollout process could result in significant benefits
Again, it's worth noting that there are some caveats to using the snapshot. You have to have enough disk space for a second copy of the database. You need to be careful dealing with snapshots since most of the syntax is similar to that used by databases themselves. Last, while there are snapshots attached to a database you can not run a restore from a database backup on that database.
Best Practices
Database backups should not be a technical decision. They should be dictated by the business. Small systems with low transaction rates and/or reporting systems that are loaded regularly will only ever need a full database backup. Medium sized systems and large systems become dependent on the type of data managed to determine what types of backups.
For a medium sized system, a daily backup with log backups during the day, probably would answer most data requirements in a timely manner.
For a large database the best approach is to mix & match the backups to ensure maximum recoverability in minimum time. For example, run a weekly full backup. Twice a day during the week, run an incremental backup. Every 10 minutes during the day, run a log backup. This gives you a large number of recovery mechanisms.
For very large databases, you'll need to get into running filegroup and file backups because doing a full backup or even a differential backup of the full database may not be possible. A number of additional functions are available to help out in this area, but I won't be going into them here.
You should take the time to develop some scripts for running your backups and restores. A naming convention so you know what database, from which server, from which date, in what specific backup and format will be very conducive to your sanity. A common location for backups, log, full or incremental, should be defined. Everyone responsible should be trained in both backup and recovery and troubleshooting the same.
The real test is to run your backup mechanisms and then run a restore. Then try a different type of restore, and another, and another. Be sure that, not only have you done due diligence in defining how to backup the system, but that you've done the extra step of ensuring that you can recover those backups. If you haven't practiced this and documented the practice and then tested the document, in effect, you're not ready for a disaster.
Summary
Backups within your enterprise should be like voting in Chicago, early and often. Setting up basic backups is quite simple. Adding on log backups & differentials is easy as well. Explore the options to see how to add in file and file group backups and restores to increase the speed of your backups and restores both of which will increase system availability and up time. Keep a common naming standard. Be careful when using snapshots, but certainly employ them. Store your files in a standard location between servers. Practice your recoveries. Finally, to really make your backups sing, pick up a copy of RedGate's SQL Backup, which speeds up backups and compresses them, using less disk space and time.