Some time ago i’ve encountered a problem with a database of a customer. It is Oracle 9.2.0.1 on Windows 2000 with Oracle Fail Safe.
On Alert.log we found:
KCF: write/open error block=0x4351 online=1
file=2 O:\ORACLE\ORADATA\GEOP\UNDOTBS01.DBF
error=27070 txt: 'OSD-04016: Error queuing an asynchronous I/O request.
O/S-Error: (OS 2) The system cannot find the file specified.'
Automatic datafile offline due to write error on
file 2: O:\ORACLE\ORADATA\GEOP\UNDOTBS01.DBF
Tue Jul 10 02:08:42 2007
Errors in file o:\oracle\admin\geop\udump\geop_ora_844.trc:
ORA-00376: file 2 cannot be read at this time
ORA-01110: data file 2: 'O:\ORACLE\ORADATA\GEOP\UNDOTBS01.DBF'
ORA-00372: file 2 cannot be modified at this time
ORA-01110: data file 2: 'O:\ORACLE\ORADATA\GEOP\UNDOTBS01.DBF'
where the two lines
ORA-00376: file 2 cannot be read at this time
ORA-01110: data file 2: 'O:\ORACLE\ORADATA\GEOP\UNDOTBS01.DBF'
were repeated thousands of times.
at same time in windows event viewer:
Event Type: Warning
Event Source: Ftdisk
Event Category: None
Event ID: 50
Date: 07/07/2007
Time: 07:55:15
User: N/A
Computer: GEOCALL2
Description:
{Lost Delayed-Write Data} The system was attempting to transfer file data from buffers to \Device\HarddiskVolume5. The write operation failed, and only some of the data may have been written to the file.
Data:
0000: 00 00 04 00 02 00 56 00 ......V.
0008: 00 00 00 00 32 00 04 80 ....2..
0010: 00 00 00 00 00 00 00 00 ........
0018: 00 00 00 00 00 00 00 00 ........
0020: 00 00 00 00 00 00 00 00 ........
0028: 0e 00 00 c0 ...À
and repeated messages
Event Type: Warning
Event Source: Disk
Event Category: None
Event ID: 51
Date: 07/10/2007
Time: 02:21:01
User: N/A
Computer: GEOCALL2
Description:
An error was detected on device \Device\Harddisk3\DR3 during a paging operation.
Data:
0000: 04 00 22 00 01 00 72 00 .."...r.
0008: 00 00 00 00 33 00 04 80 ....3..
0010: 2d 01 00 00 0e 00 00 c0 -......À
0018: 00 00 00 00 00 00 00 00 ........
0020: 00 00 00 00 00 00 00 00 ........
0028: 04 00 00 00 03 00 00 00 ........
0030: 00 00 00 00 2a 00 00 00 ....*...
0038: 00 08 00 00 00 00 00 00 ........
0040: 2a 00 02 4f db 2f 00 00 *..OÛ/..
0048: 08 00 ..
I’ve to say that this Oracle installation is not very lucky, messages of disk problems in event viewer sometimes returns, but hardware vendor tell us that there are no problems on the hardware.
Another thing that i’ve to remember is to read with my eyes the alert.log. In fact i was called by a collaborator and i did not see the line
Automatic datafile offline due to write error on
Immediately i’ve thought a corruption on the file, i’ve created a new UNDO tablespace, i’ve changed UNTOTBS parameter to point to the new tablespace. Then we tried to remove the old tablespace but we got a message that a rollback segment was “active”. In V$TRANSACTION there was no records. I was not able to understand why Oracle was telling us that. The database could be opened but the application on one step was still given an error message by Oracle stating that old undo tablespace datafile were not available. So we decided to restore from backup the tablespace, we recovered. After that i onlined the tablespace, i made a “select count(*)” from a table (the table used by application that had given the error). After that i’ve been able to drop tablespace with datafile.
Conclusion
My description has been confused but the conclusion, and the lesson i’ve learned is that UNDO TABLESPACE may contain data needed to the integrity of the database. I think that is the case of “delayed block cleanout”. If there are active trasactions that is obvious and it is visible by V$TRANSACTION system view, but in the case of “delayed block cleanout” i think that information is not easily available.