Recovering Database with Missing Archived Logs

Introduction

In this paper we will present a method for retrieving data from an Oracle database that cannot be opened normally because one or more datafiles is inconsistent with the others. An example of a scenario where you would find yourself in this situation is as follows:

A disk failed on our server and we lost a datafile. We restored the datafile from a hot backup taken a week ago, but it turns out we are missing a few redo logs archived since then. Oracle is complaining because we can’t produce the required archived redo logs, and we can’t open the database. The datafile in question contains our most important tables. Is there any way we can salvage our data?

Every DBA should know that there is a problem here. The missing archived redo logs contain transactions which affect the data in the database. So it’s a given that you’re going to lose some data, but the question is, “How much?” Oracle takes a hard-line position and will not let you open the database normally because a data integrity issue exists. However, you may be able to retrieve much of your data if you use non-traditional means to get Oracle to drop its hard-line attitude. Retrieving the data that can be salvaged with the understanding that some data will be lost could be a whole lot better than losing all of the data because a subset of it has been corrupted.

In the next section of this paper, we will look at an overview of how you might go about salvaging data from an inconsistent datafile and get the database back up and running properly again. In the succeeding sections of this paper, we’ll dig into each step of the process in greater detail.

Overview

If you’ve lost a datafile that contained only indexes for heap-organized tables, or other data that is easily recreated, then your best bet might be to drop the tablespace and recreate and repopulate it from scratch. But if you’ve lost a datafile that contained important data not easily recreated, and all you have is an old backup of the file without all of the intervening archived redo logs, then you will want to extract what data you can from the problem tablespace, drop the tablespace, and then recreate and repopulate the tablespace.

Although the exact steps will vary depending on the particular situation, the general steps involved are:

Taking a cold backup of what you have now.
Restoring the lost datafile from a backup and applying the archived redo logs that you do have.
Setting an undocumented instance parameter which will allow you to open the database in its current state.
Doing exports and selects to retrieve what data you can from the problem tablespace.
Restoring the entire database from the cold backup taken earlier.
Taking the damaged datafile offline.
Doing exports and selects to retrieve additional data not salvaged in step 4.
Restoring again from the cold backup.
Dropping the problem tablespace.
Recreating the problem tablespace.
Rebuilding the data in the problem tablespace with the data extracted in steps 4 and 7.

Some of these steps can be quite tedious and time-consuming. You may choose to skip a few or even several of the steps depending on how much disk space, tedium, and database down time you are willing to pay in exchange for potentially salvaging more of the lost data.

As we discuss each of the steps in greater detail, we will walk through an example case where a datafile called ordtab03.dbf in tablespace ORDTAB was lost due to a disk crash. This datafile contained many extents of the ORDERS table. The datafile was restored from a hot backup taken July 4, 2004, but some of the archived redo logs between July 4 and the present day have been lost.

Step 1: Backup the Database

The first thing you should do is take a cold backup of whatever datafiles, online redo logs, and control files you currently have. If rebuilding everything from existing backups, then make sure you have a place to keep the backup files handy, as you’ll probably want them more than once. It’s best if you have cold backups. If you’ve just lost one or a few datafiles and the database is still open, make a hot backup of each of the remaining datafiles and save this somewhere (and make sure you keep the archived redo logs generated during and after the hot backup).

In a later step we will be using an undocumented and unsupported Oracle feature in order to make the best of a bad situation. Taking a backup of the database now, before we “cross the line” into unsupported territory, allows us to return to this side of the line in a later step.

After you’ve created the backup, it’s time to work on the database. Before you shut the database down, create a backup control file script:

        ALTER DATABASE BACKUP CONTROLFILE TO TRACE RESETLOGS;

This will create a script in your user_dump_dest directory to create a new control file. The name of the file will be similar to that of other trace files-something like [instance_name]_ora_[PID].trc. Rename the file to newcontrol.sql.

Now we want to edit the file. Open it with a text editor such as vi or emacs. There will be about 15 lines at the top of the file with version and connection information. We don’t want these messing up our script, so delete all the lines above the first line beginning with a pound sign (#). (Instead of deleting these lines, you could make them comments by putting a pound sign at the beginning of each.)

Then delete the line that begins “RECOVER DATABASE…” toward the end of the file and save the file.

Step 2: Restore the Lost Datafile and Apply Archived Redo Logs

At this point you should restore the lost datafile from a backup and apply what archived redo logs you have in order to roll the datafile contents forward as far as possible. You will have to stop at the first missing archived redo log. In our example, we restored the ordtab03.dbf datafile from the July 4, 2004 hot backup and applied the archived redo logs that we had available.

If you were to try to open the database normally at this point, you would get an ORA-01589 error:

        ORA-01589: must use RESETLOGS or NORESETLOGS option for database open

If you then tried an ALTER DATABASE OPEN RESETLOGS command, you would get an ORA-01195 error:

        ORA-01195: online backup of file %s needs more recovery to be consistent

This is where Oracle is taking its hard-line approach. The datafile restored from the backup was not recovered to a point in time consistent with the other datafiles in the database. Therefore data corruption may exist and Oracle will not let you open the database normally.

Step 3: Set an Undocumented Instance Parameter and Open the Database

At this point we leave the world of standard practice and cross the line into unsupported activity. It’s time to edit the database’s init.ora file or spfile. First, you’ll want to set job_queue_processes to 0 if it’s not already, as you don’t need jobs running while you’re extracting data. Then you need to set the parameter:

        _allow_resetlogs_corruption=TRUE

This parameter is a “hidden” or undocumented parameter-one of those which you’re never supposed to use unless told to do so by Oracle Support. We probably should have pointed out earlier that you shouldn’t be doing this entire exercise unless you’ve failed with everything Oracle Support has told you to do. The description of this parameter reads “allow resetlogs even if it will cause corruption”. Some of the caveats to be found on Metalink regarding this parameter include:

Steps as mentioned here beneath are only applicable in a situation that no restore/recovery can be performed of the database in question. Applying beneath steps means that data will be exported from a database being in an inconsistent state (no instance recovery can be performed), this is true for user data as well for the data dictionary.

You should NOT use unsupported parameters without the advice from support.

Since you are using these undocumented parameters without specific instruction from Oracle Support, we can’t support this database. If someone did instruct you to set these parameters, then you may want to continue discussion on this issue with that individual.

If you are able to startup this database, it would be only to do a full export and recreate it. If you want to pursue this possibility, please log an itar. This issue cannot be handled through this forum.

That said, if you want to continue, after changing the parameter, then move to the directory where you saved your newcontrol.sql script in the first step. (If you didn’t move the script, it will still be in your user_dump_dest directory.) Connect to the database as SYSDBA and run the newcontrol.sql script.

Your database is now open (though NOT consistent). You can run queries and sometimes everything will appear perfectly normal:

        SQL> SELECT COUNT(*) FROM OE.orders;

          COUNT(*)
        ----------
            403439

The database is in an inconsistent state and you are skating on thin ice. The above query worked fine because Oracle didn’t need to access any data blocks that were corrupt or referenced inconsistent undo entries. The query could have just as easily failed with an ORA-00600 error such as:

        SQL> SELECT COUNT(*) FROM OE.orders;

        SELECT COUNT(*) FROM OE.orders
                                *
        ERROR at line 1:
        ORA-00600: internal error code, arguments: [kcfrbd_2], [14],
        [19081], [8], [1280], [1280], [], []

Step 4: Do Exports and Selects to Retrieve Data

As we saw in the previous step, some queries will work fine and some will fail right away. Still others will return partial results, quitting when an inconsistency is found:

        SQL> SELECT order_id FROM OE.orders
          2  WHERE entry_date > TO_DATE ('04-JUL-2004');

          ORDER_ID
        ----------
            496103
            496104
            496105
               ...
            511325
            511326
            511327
            511328
        ERROR:
        ORA-00600: internal error code, arguments: [kcfrbd_2], [14],
        [19081], [8], [1280], [1280], [], []

        15225 rows selected.

We can use this ability to “pick around” problem data blocks:

        SQL> SELECT order_id FROM OE.orders
          2  WHERE order_id > 511400;

          ORDER_ID
        ----------
            511401
            511402
            511403
              ...
            513398
            513399
            513400
            513401

        2001 rows selected.

Database exports are now possible too, though some errors will occur:

        About to export specified users ...
        . exporting object type definitions for user OE 
        EXP-00090: cannot pin type "OE"."ORDER_ELEMENT_TYPE"
        EXP-00056: ORACLE error 22303 encountered
        OCI-22303: type "OE"."ORDER_ELEMENT_TYPE" not found
          ...
        . about to export OE's tables via Conventional Path ...
        . . exporting table     BATCH_JOBSS       4382 rows exported
          ...
        . . exporting table  CUSTOMER_TYPES
        EXP-00056: ORACLE error 600 encountered
        ORA-00600: internal error code, arguments: [4146], [45144], [45124], [], [], [], [], []
        . . exporting table  DEFAULT_VALUES        391 rows exported

Even for the tables with errors, some data will likely be extracted and written to the export file. Moreover you can easily determine which tables you’re able to export all rows from, so you won’t have to make further extraction efforts with them.

Step 5: Restore the Database from Backup

This step, along with the next two, is optional. Together these three steps present another approach that may allow you to retrieve more of your data. Restoring the database from a backup at this point effectively undoes any damage caused by the use of the undocumented _allow_resetlogs_corruption instance parameter. This time through, we will not make any attempt to recover the lost datafile.

Step 6: Take the Damaged Datafile Offline

In this step you take the damaged datafile offline. The purpose here is to get the database to a point where everything is completely consistent, and the data that would be inconsistent is simply deemed unavailable.

This is fairly straightforward:

        ALTER DATABASE DATAFILE '/u07/oradata/PRD/ordtab03.dbf' OFFLINE;

Step 7: Do Exports and Selects to Retrieve Additional Data

At this point you may be able to retrieve additional data for salvage that you were not able to get at earlier. For example, you may be able to fetch useful data from indexes belonging to tables that are damaged. If you inadvertently try to access the damaged datafile, you’ll get an ORA-00376 error:

        ORA-00376: file 39 cannot be read at this time
        ORA-01110: data file 39: '/u07/oradata/PRD/ordtab03.dbf'

Step 8: Restore the Database from Backup

Now you restore the database from backup for the last time. This step officially rolls the database back to a point in time before the use of the undocumented instance parameter, and therefore returns the database to a supported state. Note that if you restored the database from backup in step 5 and have not updated any data in the database since then, you may be able to skip this step.

Step 9: Drop the Problem Tablespace

First you’ll need to determine whether there are any referential integrity constraints from tables outside the problem tablespace which refer to primary or unique keys of tables inside the problem tablespace. You can use a query such as the following:

        SELECT CR.constraint_name
        FROM   dba_constraints CR, dba_constraints CP, 
               dba_tables TP, dba_tables TR
        WHERE  CR.r_owner = CP.owner
        AND    CR.r_constraint_name = CP.constraint_name
        AND    CR.constraint_type = 'R'
        AND    CP.constraint_type IN ('P', 'U')
        AND    CP.table_name = TP.table_name
        AND    CP.owner = TP.owner
        AND    CR.table_name = TR.table_name
        AND    CR.owner = TR.owner
        AND    TR.tablespace_name <> 'ORDTAB'
        AND    TP.tablespace_name = 'ORDTAB';

If there were any such constraints, you would need to create scripts to recreate them (if you don’t already have them). If you are using export dumps to rebuild the data (in step 11), the constraints can possibly be restored from the export files.

Drop the tablespace containing the damaged datafile with a statement like:

        DROP TABLESPACE ordtab INCLUDING CONTENTS CASCADE CONSTRAINTS;

Step 10: Recreate the Problem Tablespace

Here you simply want to recreate the tablespace so that in the next step you can repopulate it with the data you extracted earlier. You could consult an old export file in order to recreate the tablespace exactly as it was, or you could take this opportunity to switch to a locally managed tablespace, enable the autoallocate feature, adjust storage parameters, and so on.

Step 11: Rebuild the Data in the Problem Tablespace

At this time you are ready to reload the tablespace with the data that you salvaged earlier. If you used the export utility to extract data from some or all tables, you can use import to restore the schema objects and data. If export encountered an error while reading a table, import should still be able to recreate the rows that were successfully extracted before the error occurred. If you used queries to extract data into flat files, then you can use SQL*Loader or any number of other tools to put the data back into the database.

Now you will have a functional database with as much data as you were able to extract from the damaged database. If you are lucky you were able to get almost all of your data. If not, at least you were probably able to get some data and get your database working again.

The first thing you should do immediately after repopulating the tablespace is take a backup of this repaired database. The second thing you should do is implement proper procedures for backing up the database, storing the backups and archived redo logs for appropriate retention periods, and testing the backup/recovery process on a regular basis. Your data

Recovering Database with Missing Archived Logs

How to check if an rpm package is installed on a Red Hat Linux server

How to change the SYSADMIN password

Silent Installation of Oracle database

Installing Oracle 11g RAC on virtual servers using VMware.

HOW-TO: Mount USB Thumbdrives in Solaris

fatal: libm.so.2: open failed: No such file or directory

Leave a Reply Cancel reply

Innovation & Knowledge Hub

Useful Links

Get In Touch With Me

Similar Posts

Leave a Reply Cancel reply

Innovation & Knowledge Hub

Useful Links

Get In Touch With Me