A Simple Way to Archive Data

We needed a way to archive data. I have seen this request multiple times in my career, and the most common solutions use one of the following:

  1. INSERT data into the archive table, then DELETE data from the original table, or
  2. SSIS, or
  3. Table partitioning

All of these options are great, but they all have drawbacks that we weren’t happy with.

We needed our process to meet the following criteria:

  1. Archive anything older than 1 year
  2. Store archive data in a separate database
  3. Run the archive process daily
  4. Do not interfere with other database transactions
  5. Minimal administrative overhead (isn't this always the case? :) )

Once again, all of the options I mentioned in the first paragraph could have met these criteria, and I’m sure there are many other options as well.  However, I came across an article that presented exactly what I needed:

https://www.mssqltips.com/sqlservertip/2259/sql-server-2008-consume-output-directly-from-the-output-command/

It’s not anything new, as it was introduced in SQL Server 2008, but it is pretty handy. I like this option because it touches the table you are archiving from only once (as opposed to option #1 above, which reads it for the INSERT and again for the DELETE), and it comes with very low administrative overhead (unlike options #2 and #3 above).
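The technique the article describes is "composable DML": the DELETE's OUTPUT clause is used as the source of an INSERT, so rows are removed and archived in one statement. A minimal sketch, assuming hypothetical `dbo.Orders` / `ArchiveDB.dbo.Orders` tables and an `OrderDate` column (not the author's actual schema):

```sql
-- Move up to 1000 rows older than a year into the archive database.
-- The inner DELETE's OUTPUT feeds the outer INSERT directly, so the
-- source table is read and modified in a single statement.
INSERT INTO ArchiveDB.dbo.Orders (OrderID, CustomerID, OrderDate)
SELECT d.OrderID, d.CustomerID, d.OrderDate
FROM (
    DELETE TOP (1000) FROM dbo.Orders
    OUTPUT DELETED.OrderID, DELETED.CustomerID, DELETED.OrderDate
    WHERE OrderDate < DATEADD(YEAR, -1, GETDATE())
) AS d;
```

Note that SQL Server places restrictions on the outer INSERT target when composing DML this way (for example, it cannot have triggers or participate in foreign key relationships), so check the documentation against your archive tables.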

I created a stored proc which uses dynamic SQL to build archive statements that utilize the method from the article. The proc is called from a SQL Server Agent job, which runs every 10 seconds during a 2-hour window every night. We had to find the "sweet spot" between how many records to archive at a time and how often to run the job: archive too many records per batch and we start blocking user queries; archive too few, or run the job too infrequently, and we don't keep up with the volume of data that needs to be archived. The other good thing about this method is that if the proc is still running when the next scheduled execution comes up, the Agent simply skips that execution and tries again 10 seconds later; in our case, missing a few runs is not a big deal.
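A minimal sketch of such a proc, with the batch archived via sp_executesql so the batch size can be passed as a parameter. The proc name, table names, columns, and default batch size are all assumptions standing in for the dynamic loop over whatever list of tables the real proc drives:

```sql
CREATE PROCEDURE dbo.usp_ArchiveOldRows
    @BatchSize INT = 1000
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @sql NVARCHAR(MAX);

    -- In the real proc this statement would be built dynamically per
    -- table; one hard-coded example stands in for that loop here.
    SET @sql = N'
        INSERT INTO ArchiveDB.dbo.Orders (OrderID, CustomerID, OrderDate)
        SELECT d.OrderID, d.CustomerID, d.OrderDate
        FROM (
            DELETE TOP (@n) FROM dbo.Orders
            OUTPUT DELETED.OrderID, DELETED.CustomerID, DELETED.OrderDate
            WHERE OrderDate < DATEADD(YEAR, -1, GETDATE())
        ) AS d;';

    EXEC sp_executesql @sql, N'@n INT', @n = @BatchSize;
END;
```

Capping each execution at a fixed batch size is what keeps individual transactions short, which is how the job stays out of the way of user queries.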

The other thing to notice is that DEADLOCK PRIORITY is set to LOW. This makes the proc the preferred deadlock victim whenever it deadlocks with a normal-priority user query, so it is the archive batch that gets rolled back and retried rather than users' work.
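In T-SQL this is a single session-level setting at the top of the proc body:

```sql
-- Sessions at LOW deadlock priority are chosen as the victim over
-- sessions at the default (NORMAL) priority, so user queries win.
SET DEADLOCK_PRIORITY LOW;
```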

Anyway, here is the link to the project.

Enjoy!

-Clint
