Re:求救，orcal数据块损坏

精华区

当前位置：网易精华区>>讨论区精华>>编程开发>>● 数据库技术>>Oracle产品>>Re:求救，orcal数据块损坏

主题：Re:求救，orcal数据块损坏

发信人: qxszjj(绝对红色)
整理人: pengxiaohui(2001-09-26 10:50:02), 站内信件

可以跳过坏块，进行数据挽救,步骤还是比较烦琐的！！！

我看见过Oracle的杂志上有怎么对ORA-1578的处理，

WebIV:View NOTE

Article-ID: <Note:28814.1>
Circulation: PUBLISHED (EXTERNAL)
Folder: server.Rdbms.DBA.Corruption
Topic: Handling Block corruptions (Oracle 7)
Platform: GENERIC Generic issue
Subject: TECH: ORA-1578 and Data Block Corruption
Author: RPOWELL (BASED ON OLS ARTICLE)
Creation-Date: 20-SEP-1996 12:50:45
Modified-Date: 09-SEP-1998 08:16:14
Document-Type: BULLETIN
Impact: MEDIUM
Skill-Level: ACCOMPLISHED
Component: RDBMS Server 07
Content-Type: TEXT/PLAIN
Attachments: NONE

(1) Introduction
~~~~~~~~~~~~~~~~
This short article discusses the oracle error ORA-1578 and describes
the main actions to take to deal with it. Please read the complete
article before taking any action. You are advised to contact Oracle
support with as much information as possible for advice on which
option is applicable to your scenario.

Throughout the article there are 'Evidence>>' tags which show what
information you should collect. All SQL statements are for use in
either SQLDBA or Server Manager when connected 'internal'.

In some cases you may be sent this article in connection with
errors other than ORA 1578. In these cases Oracle support will inform

you of the 'FILE' and 'BLOCK' numbers to use in conjunction with the
steps here.

(2) What is an ORA 1578 ?
~~~~~~~~~~~~~~~~~~~~~~~~~
An ORA-1578 error is raised when Oracle sees a data block which it
thinks is corrupt. The error should always provide you with the file

number and the block number of the block in question. For the
purposes of this article we shall use:

<F> to denote the file number reported in the error
to denote the block number reported in the error

The original cause of the error could be any of:
- Bad IO hardware / firmware
- Operating system IO or caching problems
- Memory or paging problems
- Oracle incorrectly attempting to access an unformatted block
- Part of the datafile overwritten
- Disk repair utilities
- any of a number of other causes.

The point in time when the error is raised may be much later than
when any corruption actually occurred.

(3) Initial Actions:
~~~~~~~~~~~~~~~~~~~~
Whenever such an error occurs note down the FULL error message/s
and look in the instance alert log and trace files for any associated

errors. It is important to do this first to assess whether this is
a single block corruption or a more severe issue.

Once you have determined that it appears to be a single block problem

then the steps below can be used to help determine what action
can be taken.

IMPORTANT: If there are multiple errors OR you also have operating
system level errors against the affected file there is
little point proceeding until the underlying problem
has been addressed. Get your hardware vendor to check
the system over and contact Oracle Support with details
of all errors.

Evidence>> - Record the original error in full, along with details of

the application which encountered the error.
- Save an extract from the alert log from a few hours before
the FIRST recorded problem up to the current point in time.
- Save any tracefiles mentioned in the alert log.
- Record any recent OS problems you have encountered.
- Note if you are using any special features - Eg: ASYNC IO,
fast write disk options etc..
- Record your current BACKUP position (Dates, Type etc...)
- Note if your database is in ARCHIVELOG mode or not
Eg: Issue "ARCHIVE LOG LIST" in SQLDBA.

(4) Which File and Object was the Error Reported Against ?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The first step to addressing a reported corruption is to find out whic
h
file the corruption occurs on:

SELECT * FROM v$datafile WHERE file#= <F>;

The next step is to find out which object the error was reported
against. The following query will tell you the object name , owner
and type:

SELECT owner, segment_name, segment_type
FROM dba_extents
WHERE file_id = <F>
AND  BETWEEN block_id AND block_id + blocks - 1;

Based on the 'segment type' returned:

- If the segment type is TEMPORARY or CACHE or no rows are returned
recheck you have entered the SQL and parameters correctly.
If you get the same result contact Oracle support.

- If the segment is a ROLLBACK segment contact Oracle support.
The database is likely to require recovery.

- If the segment is an INDEX determine which TABLE it belongs to:
Eg:
SELECT table_owner, table_name
FROM dba_indexes
WHERE index_name='name-of-segment';

- If the segment is a CLUSTER determine which tables it contains.
Eg:
SELECT owner, table_name
FROM dba_tables
WHERE cluster_name='name-of-segment';

- If the segment is a TABLE just note the name and owner.

- If the segment type is none of the above contact Oracle support.

If the table involved is a data dictionary table contact Oracle
support. The database is likely to require recovery.

Evidence>> You should now have recorded:
- The filename of the bad file
- The corrupt object name, owner and type
- Any tables directly associated with this object

(5) Is there a Permanent Corruption ?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Before deciding on the best action it is important to determine if
there is a permanent corruption on the object or not. The 'ANALYZE'
command can be used to help check this.

Eg: For either an INDEX or TABLE issue the following command
for the relevant 'owner.tablename':

ANALYZE TABLE owner.tablename VALIDATE STRUCTURE CASCADE;

For a cluster issue the command:

ANALYZE CLUSTER owner.clustername VALIDATE STRUCTURE CASCADE;

The possible outcomes are:
ORA 1578 with same arguments as the original error
ORA 1578 with different arguments to the original error
ORA 1498 error
No errors.
Some other error.

Whatever the outcome it is advisable to repeat the ANALYZE command a

second time. This helps show if there is a consistent repeatable
problem (most likely permanent corruption on disk) or if the problem
is either intermittant / random (more likely memory or intermittant
hardware fault). Note that ANALYZE will stop at the first severe
corruption so it is possible that the FILE and BLOCK number reported
as bad will differ from the initial error if there is more than
one corrupt block on the object. This is why it is advisable to
run ANALYZE at least twice.

The next actions depend on the outcome of the 2 (or more) analyze
commands:

ORA 1578 with repeatable arguments:
This is most likely a physically corrupt block.

ORA 1578 with different arguments each time:
Get your hardware vendor to check out the
memory, page space, and IO subsystems before
proceeding.

ORA 1498 error:
This means we have not seen the same corruption
before but the object has an internal corruption
of some form.
Your first step should be to contact the hardware
vendor to check your memory, page space and IO
subsystems for intermittant problems.
In addition you should contact Oracle Support with the
trace files produced by the ANALYZE command. The
content of the files may be completely unrelated to
the original ORA 1578 error.

No errors:
This means we have not seen the same corruption
as originally recorded.
You should still contact the hardware vendor to check
your memory, page space and IO subsystems for
intermittant problems.
It is also advisable to export the problem table as
ANALYZE will not detect all types of error.

Some other error:
Contact Oracle Support.

Evidence>> - Analyze commands entered and resulting error/s (if any).

- Check if there is a trace file in the USER_DUMP_DEST directory
and include this also.

(6) Recovering from Hardware Errors:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If you have encountered potential hardware errors on a particular
disk or controller it is advisable to relocate the files.
This article does NOT cover full details of recovering from hardware
errors (Eg: Disk controller returning different blocks to those
requested) but in summary if the file reporting the corruption error

is affected by hardware problems you should relocate it to a good disk

and repeat the tests in section 5.
If you are in ARCHIVELOG mode, as is recommended for production
systems, you should just need to:

OFFLINE the affected data file,
Restore its last backup on a GOOD disk,
Rename the datafile to the NEW location,
Recover the datafile (using the archived logs),
Online the file.

The difficulty here is to know which backup (if any) does not
contain the corruption/s.

If you are NOT in ARCHIVELOG mode you cannot do this. You can offline

the bad file, copy it elsewhere, rename it and online it but this
assumes that a copy command will read the file without error which
may not be true.

If you relocate any files you should now repeat section 5.

(7) A Physical Corruption:
~~~~~~~~~~~~~~~~~~~~~~~~~~
To get to this section you should be encountering repeatable corruptio
n
errors on an ANALYZE ... VALIDATE ... command. You should also have
either moved the file/s off of any suspect hardware or replaced such
hardware if possible.

If the FILE and BLOCK errors from the ANALYZE command are different to

those originally reported substitute them into the SELECT statement in

section (4) above to determine the actual corrupt object.

If this object is a user INDEX you can simply drop and re-create it.

If this object is a TABLE or CLUSTER it must be understood that the
data within the corrupt block is lost. The rest of the table can
usually be salvaged as described in the following sections.

Evidence>>
- Before trying to salvage data from the corrupt table you should
collect the following physical evidence if there is a need to try
and identify the actual cause of the corruption:
i) An operating system HEX dump of the bad block
On UNIX:
dd if=FILENAME bs=DB_BLOCK_SIZE skip= count=3 | od -x
^^^^^^^^ ^^^^^^^^^^^^^ ^^^
Eg:
dd if=ts11.dbf bs=4k skip=1223 count=3 | od -x

On VMS:
DUMP/BLOCKS=(start:XXXX,end:YYYY)/out=file.out dbfile.dbf

Where XXXX=Operating system block number (in 512 byte chunks)
To calculate this multiply the block number reported by
'DB_BLOCK_SIZE/512'.

ii) If you are in ARCHIVELOG mode make a safe copy of the archived
log files around the time of the error, and preferably for a few
hours before the error was reported. These may be required later.

(8) Going back in Time ?
~~~~~~~~~~~~~~~~~~~~~~~~
Whatever sort of block the problem occurred on one possible option
to be considered while looking at alternatives is to recover
the whole database to a point in time BEFORE the corruption appeared.

The difficulty with this option is that it is not always possible
to know when the problem first appeared.

Evidence>> From this section on there are no more Evidence tags as the

focus is on salvaging data. However, you are strongly advised
to keep a track of exactly what you do should things go wrong.

(9) Salvaging data from a table:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Before attempting to salvage the rest of the data check:

- Do you have a recent export from which you could rebuild
the table completely ?

- Do you have a backup copy of the database from which
you could create such an export ?

If you need to retrieve data from the table itself you effectively
need to select out the data avoiding the bad block/s.
There are several ways you can achieve this:

- In Oracle 7.1 onwards you can use a ROWID range scan.

- If there is a primary key you can select via this index.

- If there is a non-unique key you can select some of the
data via this index.

- For certain kinds of corrupt block you can set an Oracle
event to SKIP the block. This only applies to one particular
kind of corruption and is unlikely to be of use unless
you are on Oracle7 release 7.2 or higher. This option
is NOT discussed in this article. See <Note:33405.1> for
details.

Each of these is discussed below. Before looking at extracting the
data you need to understand what a 'ROWID' is. This is discussed
first.

(10) ROWID:
~~~~~~~~~~~
Each row in a table in the database can be uniquely referenced by
a 'ROWID' psuedo-column. A rowid is a hexadecimal representation of
of a rows location in the database thus:

BLOCK.ROW-SLOT.FILE

Eg: If we issued SELECT EMPNO, ROWID FROM EMP we would get something
like:

EMPNO ROWID
----- -----
100 00000003.0000.000A
101 00000003.0001.000A
... etc

Hence these 2 rows are in file 0xA which is 10 decimal, block 3.

For any bad block we can make a template ROWID by converting <F> and
from the error into hexadecimal and using the above ROWID format:

Eg: For file 11, block 22 we would have:

File: 11 == B hex
Block: 22 == 16 hex

Hence the rowid's for rows in this block could range from

00000016.0000.000B through to 00000016.7FFF.000B

as the maximum row-slot number is 7FFF hex.

(11-A) Oracle 7.1 Rowid Range Scans:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If you are on a release prior to 7.1 see section 11-B.
In Oracle 7.1 onwards you can select data from the base table using a

ROWID range scan hint. This allows you to select AROUND the corrupt
block.

The syntax of the command is simply:

SELECT /*+ ROWID(tablename) */ column list
FROM tablename
WHERE rowid ...

The WHERE clause has to do a ROWID range comparison such as:

WHERE rowid BETWEEN 'low-rowid' and 'high-rowid';

We can use this type of statement several times to select data around

the bad blocks and insert it into a SALVAGE table. Note that when
constructing a where clause we must not reference a ROWID in the
corrupt block directly but use the ROWIDs either side of the corrupt

block.

Eg:
SQL> SELECT count(*) FROM bigemp;
ERROR:
ORA-01578: ORACLE data block corrupted (file # 8, block # 8147)
ORA-01110: data file 8: '/oracle/m4/v714/instance/dbs/usr3714.dbf'

Corrupt File ID: 8 == 8 Hex
Corrupt Block ID: 8147 == 1FD3 Hex

Hence:
First ROWID in the corrupt block: 00001FD3.0000.0008
Last possible ROWID before this block: 00001FD2.7FFF.0008
First ROWID after this block: 00001FD4.0000.0008

So we can:

SQL> CREATE TABLE salvage AS SELECT * FROM bigemp WHERE 1 = 2;

SQL> INSERT INTO salvage
SELECT /*+ ROWID(bigemp) */ * FROM bigemp
WHERE rowid <= '00001FD2.7FFF.0008';

SQL> INSERT INTO salvage
SELECT /*+ ROWID(bigemp) */ * FROM bigemp
WHERE rowid >= '00001FD4.0000.0008';

Note we are using <= and >= with rowid's either side of the corrupt
block.

(11-B) Pulling Data Out Of Tables Through Indexes:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This option is really for use in versions before Oracle 7.1 where ROWI
D
range scans do not exist. Suppose you have the same corrupted table
BIGEMP with columns EMPNO, ENAME and DEPTNO and that you have a
unique index on the EMPNO column. Assuming the index is also not
corrupt and that we are looking at a not null column we can select
data from the table via the index as the index holds the ROWIDs for
each row in the table.

Eg: SELECT rowid, empno FROM bigemp WHERE empno>=0;

ROWID EMPNO
------------------ ----------
00001FC3.000C.0008 177
00001FC3.000D.0008 178
00001FC3.0010.0008 181
00001FC3.0011.0008 182
...

We use EMPNO>=0 to enable the index. This does not error as the
above query is satisfied wholly from the INDEX.

Note that if the index were on a character column we would change
the above where clause to something like WHERE EMPNO > '';

As the index holds the ROWID we can use a filter in the WHERE clause
to identify indexed column values that reside in the bad block. We use

a where clause of the form:
ROWIDTOCHAR( ROWID ) LIKE 'BLOCK.%.FILE'

Eg: Using the previous values for the corrupt block and file:

SQL> SELECT empno from BIGEMP
WHERE EMPNO >= 0
AND ROWIDTOCHAR(ROWID) LIKE '00001FD3.%.0008';

EMPNO
----------
821
822
....
861

If there are only a few rows or they are contiguous values you can
select the data out using a clause like:

WHERE empno <821

Eg: Create a duplicate table:

CREATE TABLE TEMP AS SELECT * FROM BIGEMP WHERE 1 = 2;

and then insert into that table around the corruption:

INSERT INTO TEMP SELECT * FROM BIGEMP WHERE EMPNO < 821;
INSERT INTO TEMP SELECT * FROM BIGEMP WHERE EMPNO > 861;

If there are a lot of rows in the corrupt block OR the indexed values

are not contiguous you need to use a more complex method of extraction
:

Eg: To get a list of key values that we want to extract:

CREATE TABLE pk_table AS
SELECT empno FROM bigemp
WHERE empno >= 0
AND ROWIDTOCHAR(ROWID) NOT LIKE '00001FD3.%.0008';

Now build the destination table:

CREATE TABLE temp AS SELECT * FROM bigemp WHERE 1 = 2;

And insert the data:

INSERT INTO temp SELECT bigemp.* FROM bigemp, pk_table
WHERE bigemp.empno >= 0
AND bigemp.empno = pk_table.empno;

(12) Possible Problems:
~~~~~~~~~~~~~~~~~~~~~~~
Chained Rows:
It is possible for a row to be either "chained" or "migrated".
Such a row spans more than one Oracle block. The 'ROWID' is the
location of the first portion (or 'head') of the row. If there is
a corruption on a block containing a piece of a chained row other
than its 'head' block then you will need to adjust the above
procedures to exclude not only the corrupt block but also any rows
which have part of their row-piece in the corrupted block.

Such rows can be identified by selecting 'ROWID , TABLE.*' from the
table using a relevant WHERE clause to avoid the known corrupt block.

If you get an error then the last row returned by this query should
show the ROWID before the row which has part of its contents in the
corrupt block. You should adjust the relevant query above to avoid
this "chained" or "migrated" row also.

It may require several attempts at building 'salvage' queries before

it is possible to extract all available data without error.

Long Columns:
LONG columns have restrictions in that only certain SQL operations
are allowed on them. You can use the same methods as described in
section 11-A or 11-B to extract the data but you may need to use the
sqlplus 'COPY' command to actually copy the data. Note that sqlplus
COPY has a port specific limit on the maximum size of LONG you can
copy. Refer to the SQLPLUS User Guide and your port specific
documentation to determine if this is feasible or not.

(13) What data have I lost ?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If you have an index on the corrupt table you can often determine
something about the data you have lost using the same method as
in section 11-B.
Suppose you have the same corrupted table BIGEMP with columns EMPNO,

ENAME and DEPTNO and that you have a unique index on the EMPNO column.

You can select data from the index for the corrupt block thus:

SQL> SELECT empno from BIGEMP
WHERE EMPNO >= 0
AND ROWIDTOCHAR(ROWID) LIKE '00001FD3.%.0008';

Where "EMPNO >= 0" forces an index range scan
and "ROWIDTOCHAR(ROWID) LIKE '00001FD3.%.0008'" restricts the results

to those entries which would fall in the bad block.

You can use this approach on all indexes on the table to get as much
information as possible about the missing data.

如何Load 数据到Oracle table的long raw字段中

Loading image into long raw is tedious.
You have to use sqlldr, and manually calculate something.
Therefore it is recommended to use blob instead.

1) SQL*Loader method (v7 & v8 using LONG RAW)

We need to use the "fix" keyword.  The max record length for long raw
is 64K,
and thus the "fix" option cannot exceed 65535 (=64K).  To load a 198K
image file, we can use the following
control file:

load data
infile '...' "fix 33066"
append concatenate 6 into table ...
(name constant 'test',
picture raw(198396))

This control file means break the 198396 into 6 chunks of size 33066,
and
concatentate them to a signle record.  The numbers must be manually ca
lculated.


2) Using BLOB in v8 ( a more convenient and programmable way)

Following is an example to load the JPEG "/u16/TAR/man.jpg" (~200K)
into database.

create table testlob (id number,   pic blob);
create directory IMG_DIR as '/u16/TAR';

-- The main script to load image
set serveroutput on
declare
image  blob;
ifile  bfile := bfilename ('IMG_DIR', 'man.jpg');
len    number(38);
image_id  number := 1;
begin
dbms_lob.fileopen(ifile); -- open in read only mode

if (dbms_lob.fileisopen(ifile) = 1)
then
   -- initialize blob object record
   insert into testlob values (image_id, empty_blob());
   commit;

   dbms_output.put_line('Image loading start at ' ||
to_char(sysdate,'DD-MON-YYYY HH24:MI:SS'));

   -- load image as internal BLOB
   len := dbms_lob.getlength(ifile);
   select pic into image from testlob where id=image_id for update;
   dbms_lob.loadfromfile(image,ifile,len);
   commit;
   dbms_lob.fileclose(ifile);

   dbms_output.put_line('Image loading complete at ' ||
to_char(sysdate,'DD-MON-YYYY HH24:MI:SS'));

else
   dbms_output.put_line('Cannot read input image');
end if;
end;
/

[关闭][返回]