The Most Important Task for DBAs

It's what matters most for a Database Administrator: being able to recover databases with as little loss of data as possible. In order to accomplish this you need two things of equal importance:

  1. A well-executed backup strategy
  2. A tested process of recovery

What this Article Covers

In this article we'll be focusing on gaining insight into just how well your backup strategy is being executed: are you covering yourself for an acceptable amount of data loss for databases of varying criticality? The primary measurement of that is your backup history. We will leave the subjects of storing backups in a secondary location to protect against machine and site failures and the testing of backups for another day. What I want to concentrate on is simply the following:

  1. How old are my most recent backups?
  2. Am I backing up all my databases (that require recovery)?
  3. Does the backup age meet my requirements/standards?
  4. Am I backing up correctly for the recovery model of my SQL database?

I can satisfy these questions all with a single query.

Querying for Backup Age (and More)

Backup metadata is stored inside of the SQL Server msdb system database. There are four main system views storing the backup metadata and for this query we only need to make use of two of them:

  • msdb.dbo.backupset: this view provides the base metadata for backup history covering when the backup occurred, who performed the backup, the type of backup, size of the backup and whether compression was employed, the LSN (log sequence number) information for the backup to allow for recreating the backup chain for recovery order, and much more.
  • msdb.dbo.backupmediafamily: provides information about the location of the backup files created in the backup process

I'm also utilizing the sys.databases system view from the master database in order to identify the recovery model for the database (is it logging transactions to allow for point-in-time recovery or not?) I also need to use a LEFT JOIN between the sys.databases view and the backup history views in order to identify databases existing on the instance of SQL Server but without any backup history which signifies databases at risk of loss.

Here is the query I've built to address the four questions above:

WITH full_backups AS
                                PARTITION BY BS.database_name,
                                ORDER BY BS.database_name ASC,
                                        BS.backup_finish_date DESC
                        ) AS [Row Number], AS [database_name],
        BS.type AS backup_type,
FROM master.sys.databases AS D
        LEFT JOIN msdb.dbo.[backupset] AS BS
                ON = BS.database_name
--WHERE BS.[type] = '<backup_type,,D>'
--WHERE BS.[name] = '<database_name,,Foo_DB>'

        CASE FB.backup_type
                WHEN 'D' THEN 'Data'
                WHEN 'I' THEN 'Differential'
                WHEN 'L' THEN 'Transaction Log'
        END AS backup_type_desc,
        DATEDIFF(hour,FB.backup_finish_date, GETDATE()) AS backup_hours,
        DATEDIFF(minute,FB.backup_finish_date, GETDATE()) AS backup_minutes
FROM full_backups FB
 LEFT JOIN msdb.dbo.[backupset] BS ON FB.backup_set_id = BS.backup_set_id
 LEFT JOIN msdb.dbo.backupmediafamily BMF ON BS.media_set_id = BMF.media_set_id
 WHERE FB.[Row Number] = 1
ORDER BY FB.database_name, FB.[Row Number], FB.backup_type

At the head of the query is a CTE (or Common Table Expression) that essentially creates a virtualized result set that can be re-used throughout the remainder of the script. This CTE stores all database name and recovery model information from sys.databases and joins those results to dbo.backupset in order to present backup type and date using the available backup metadata that is stored in msdb. I employ the ROW_NUMBER windowing function in order to order the results by the most recent to oldest backup date and reset that value for each combination of database name and type of backup. I also want to return the backup_set_id because I need it in order to join to msdb.dbo.backup_media_family later in the script.

Note that I've commented out two template parameter values in optional WHERE clauses in this CTE. If you're interested in only a few databases or only certain backup types you could remove commenting and use either or craft them into a single WHERE clause to filter the results. Depending on how diligent you are with maintaining your msdb data this could save considerable resources when running. I'm also a big fan of only returning the information you need and that includes the intermediate steps of queries as well. Not just the end results.

Now that we have a repository of backup (and possibly lack of backup) information ordered by the newest to oldest backup for each database and backup type we can use the remainder of this query to filter to just the most-recent backup for each database and backup type (data, differential, transaction log) and employ some functions to calculate the age of the backups in both hours and minutes.

I've exported the results to Excel for formatting purposes and to highlight items you may wish to watch:

The end result is a recordset showing each database and it's last data, differential, and log backup from the history the instance currently has inside the msdb database. Remember the four questions I originally identified earlier in the article?

  1. How old are my most recent backups?
  2. Am I backing up all my databases (that require recovery)?
  3. Does the backup age meet my requirements/standards?
  4. Am I backing up correctly for the recovery model of my SQL database?

We have the answers to Question 1 in the last two columns in the results provided in terms of both hours and minutes.

The answers to Questions 2 and 3 depend upon your company standards. If you don't have or don't know what those standards are I recommend you determine those after reading this article and evaluating if your backup strategy meets those standards.

I did not filter out tempdb from the results because I wanted to show that the query will return results for any databases that have no backup history. If you had a database that required backups (specifically addressing Question 2) and no backup history existed it would still show in the result set of the query. Note that tempdb does not need to be backed up as it's recreated whenever the SQL Server services are started or restarted.

Finally Question 4: are you backing up your databases appropriately for the recovery model of the database. I've highlighted an example in yellow above that shows where that is not happening. The Registration database is in Full recovery mode meaning that each transaction involving a delete, update, or insert is being appended to the end of the transaction log. That appending process will continue even if the transaction log fills if you've set auto-grow on for the transaction log file. The file will continue to grow until it's consumed all available space on the drive it's hosted on or the auto-growth setting exceeds the amount of available space on the drive. That is unless you are being sure to take transaction log backups for any database in Full recovery. A transaction log backup is the only process that keeps the transaction log in check: marking portions of the log for re-use once the log has been backed up. Once the logging process reaches the end of the physical log file the log is re-used by cycling back to the earliest portion of the transaction log marked for re-use. As you can see from the results above we're performing data and differential backups against that database but not transaction log backups.


This is a good base query for collecting backup history. I'm going to be using it over the next month to show how you can use it to also address issues around identifying compression savings, backup history for a single database, and other questions along the way.