[Techtalk] Data archiving

Sat Nov 30 14:12:08 EST 2002

Hey Michelle,

I've run into this problem a number of times and here's the way I usually
approach it:

1. Is the data tied to any kind of time period such as month or year?  If
so, take a look at the business considerations and determine how long the
data is relevant to daily operations as opposed to historical reporting.  If
there is a clean break point between daily and historical needs, it makes
the most sense to archive the data into historical tables rather than the
daily-use tables.

2. If there are no clean breaks by time period, you might consider if there
are some kind of quotas that you can establish for various tables.  For
example, if you're tracking site usage history, you might only keep that
last 25 visits for each user instead of their entire history.

3. For data that you remove from the primary tables, give some thought to
how the user may actually need to access the data.  If it's only for the
rare report, you might archive the data into delimited files, but provide a
mechanism for reloading the data into temporary tables for reporting
purposes.  If the data is needed more frequently, then the archive tables in
the database will probably be your better choice.

4. If historical reporting is needed frequently, you might also consider
setting up a second database server that operates as a data warehouse.  This
option has the big advantage of allowing you to distribute the load across
servers.  I've seen many companies where the database administrator flatly
refuses to allow users to run ad-hoc queries against a production server
because of load concerns.  In these cases, they are generally much more
willing to support a data warehouse or 'reporting' database rather than
having users beat the production database to death.  If you take this
option, you can also add additional tables for summary data such as daily,
weekly, monthly or yearly totals.  When you transfer the data to the
warehouse, you update the summary data on the way in, and users have the
benefit of lower overhead for reports.

In one particular application, we were dealing with information that is used
for telephone call routing analysis.  The system generated hundreds of
thousands of records every day.  Since we had a clean break each day, we
simply created an instance of the database schema for each day.  That
allowed us to easily drop databases after a predetermined period such as a
month or year to reduce the overall database size.  But, there was some
added complexity when it came to cross-database reporting.  In the end, we
found the most frequent requirement was for daily reporting so this worked
quite well.

In another application, I was tracking daily, weekly, monthly and yearly
sales totals for a retail store with 36,000 items. We setup one table each
for daily, weekly, monthly and yearly values.  As the sales were processed
for each day, we would update all of the summary tables in addition to the
daily detail.  After a few months, we were able to start archiving out the
daily numbers on a monthly basis, and working with the weekly, monthly and
yearly numbers.  At the end of the year, we then had the option of archiving
the other tables in the same fashion, or we could keep the history for
comparing numbers across years.

Hope this helps...

William D. 'Diggy' Bell
Principal
DB Software Development
http://www.dbsoftdev.com

----- Original Message -----
From: "Michelle Murrain" <tech at murrain.net>
To: <techtalk at linuxchix.org>
Sent: Saturday, November 30, 2002 12:31 PM
Subject: [Techtalk] Data archiving

> I've been writing web-based database stuff for a while - but it's all
> been in the "small" scale of databases - in the small hundreds of
> records or less.
>
> Well, one particular installation is getting bigger, and the way that
> I've written the application, it's getting very unwieldy for them to
> deal with the output of tables, and they don't want to see anything
> more than 1 month old. They want to delete this stuff, but I shudder
> to think of that.
>
> So, I'm thinking of 3 different strategies for archiving:
>
> 1) Take the old stuff out of the database and stick it in a delimited
> text file so they can get it later, if they want.
> 2) Take the old stuff and toss it in an "archive" table within the
> same database
> 3) leave it in the table, and provide an easier front-end to choose
> the range of data they look at, and perhaps choose a default or
> something.
>
> I'm leaning toward either 1 or 3, I might actually implement both.
> But any suggestions in terms of performance as well as overall db
> design for people who've done this sort of thing would be helpful.
>
> Thanks.
> --
> .Michelle
>
> --------------------------
> Michelle Murrain, Technology Consulting
> tech at murrain.net     http://www.murrain.net
> 413-253-2874 ph
> 413-222-6350 cell
> 413-825-0288 fax
> AIM:pearlbear0 Y!:pearlbear9 ICQ:129250575
>
> "A vocation is where the world's hunger & your great gladness meet."
> Frederick Buechner
> _______________________________________________
> Techtalk mailing list
> Techtalk at linuxchix.org
> http://mailman.linuxchix.org/mailman/listinfo/techtalk
>