Tuesday, June 13, 2006

Refreshing Data

Moving data from one database to another is a very common task for a database administrator. So common, in fact, that I have never bothered to write about it, because most DBAs already have their own preferred method.

If you haven't chosen your approach yet, allow me to show you the general flow, some points to keep in mind, and some of the decisions you'll have to make based upon the nature of your data and the tools at your disposal.

1. Lose Your Integrity

What do you do when you have one tables that relies on another, through a foreign key? In order to avoid having your import fail, you would need to have the tables loaded in the right order. Or, you could just disable all constraints.

Why all constraints? Why not just the foreign keys? Well, checking constraints takes time. Presumably this is data you're grabbing from an environment where these constraints have already been checked, why check them again? If you don't trust that data, don't worry, you'll catch the errors when you try to re-enable the constraints later on.

You may want to disable triggers while you're at it. Presumably those have already fired when the data was first entered, and those can be time consuming as well.

While you're in the disabling mood, drop your indexes. It will just take time to update them with every row you insert. Just drop them now and re-create them when you're done. Much faster over-all.

One more thing, you want to do these and all subsequent steps with logging turned off. Why keep a redo log of each transaction? That's just taking time and space. Chances are your mentality is that either all the data gets in, or all of it doesn't.

2. Wipe Out Your Data

You may already have data in the target database that you want to replace. You could insert your data in such a way that it checks if the data already exists before adding it. But that takes time and effort. Instead, you may just want to wipe out and re-insert all your data.

What is the best way to eliminate your data? Don't use "delete." Delete is for selectively deleting a subset of rows, or if you want to be able to rollback your work. Instead, in this case, truncate your data. No logging: much faster.

By the way, when you're truncating, you'll be glad that you disabled your foreign key constraints. Otherwise you'd have to be very careful in the order you truncated your tables.

You may want to consider using "reuse storage" which will keep the space reserved for the data. Since you're probably going to be inserting a roughly equal or superior amount of data, this will save you from having to expand your extents during the import.

An alternative to all this is to drop all the tables and then have your import process re-create the tables with the data involved. That option is only appealing if you're not 100% confident the table structure is the same, otherwise I don't see the point.

Before you wipe out your data, I'm assuming either you already have a backup, or you don't care about this data. Otherwise, you should have done this first:

3. Dump Your Data

Which is outside the scope of this article. :)

4. Copy Your Dump

5. Import the Data

There are many different tools you can use to import your data, and each one has many different methods and options. Here is a brief list of your choices which, of course, depend on how you dumped your data.

SQL*Loader/SQLLDR (Check the Oracle Server Utilities Guide, Part 2)
IMP utility
Oracle 10g Data Pump Export and Import (DBMS_DATAPUMP)
Oracle Data Manager (Oracle Enterprise Manager Console)
Your own custom application (possibly using Oracle Call Interface/OCI).

If your data import is still ridiculously slow, ask yourself if you're using RAID 5. Are you? RAID 5 is slow, man. No getting around that, that's the nature of the beast.

6. Restore Your Integrity

Now that the job is done, it is time to restore your integrity (and lose your faith in man). Re-enable your constraints, and triggers, and restore your indexes. This may take some time, so get a coffee.

7. Recalculate Statistics

In order for the optimizer to work effectively you'll need to re-calculate your statistics (DBMS_STATS.GATHER_SCHEMA_STATISTICS). If it knows how big the tables are, the optimizer will be able to make the best choices in query plans. You can import the statistics, if they were kept up-to-date in your production system, but might as well gather them now.

That's it!

This was by no means meant to be a comprehensive step-by-step guide with an exhaustive list of everything to consider. But at least now I've made an attempt to draw out the basic flow, and some of the things you should keep in mind. Hopefully this will help you design a solution of your own, most appropriate for your data and how you use it.

Comments:
Rob,

Good outline. I think a lot of people will find it useful. It's one of those things we all have to do but no one ever seems to write about.

LewisC
 
Regarding "reuse storAGE", HOW does append mode insert combine with this?

I am a bit confused:

Append mode (as in insert /*+ append */) implies blocks are directly formatted on the data files. I am not sure if this means all data is created AFTER the HWM. If everythinhg goes after HWM, then reuse storage would probably not work - right?

Please comment.

Thanks.
 
I'm not sure I understand your question, and I'm also not sure I could, especially without a test case.

I also think your best bet is to post the question in a forum or newsgroup where several others could join me in having the opportunity to help.

Thanks,
Robert
 
Hi Robert
Thanks for sharing this information.
I need to ask two things
1. Is all the mentioned involves in the data refreshing
2. could describe the difference between data refresh, capacity planning, cloning ,data integrity.
 
Post a Comment

<< Home

This page is powered by Blogger. Isn't yours?