11g RAC:由于VMware快照引起的数据库无法启动
在我的环境中,有一个RHEL上的11g RAC环境,在做出进一步修改之前,我曾经在集群、数据库状态正常的时候,做了一个VMware的快照,然而,当我恢复该快照之后,数据库异常了。
快照是这样的:
两个节点都无法启动实例:
节点一的情况是这样的:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
[root@rac1 ~]# su - oracle [oracle@rac1 ~]$ sqlplus / as sysdba SQL*Plus: Release 11.2.0.4.0 Production on Sat Apr 9 20:24:30 2016 Copyright (c) 1982, 2013, Oracle. All rights reserved. Connected to an idle instance. SQL> startup ORACLE instance started. Total System Global Area 1603411968 bytes Fixed Size 2253664 bytes Variable Size 1056967840 bytes Database Buffers 536870912 bytes Redo Buffers 7319552 bytes Database mounted. ORA-03113: end-of-file on communication channel Process ID: 37265 Session ID: 1 Serial number: 5 SQL> |
节点二的情况是这样的:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
[oracle@rac2 ~]$ sqlplus / as sysdba SQL*Plus: Release 11.2.0.4.0 Production on Sat Apr 9 20:22:39 2016 Copyright (c) 1982, 2013, Oracle. All rights reserved. Connected to an idle instance. SQL> startup ORACLE instance started. Total System Global Area 1603411968 bytes Fixed Size 2253664 bytes Variable Size 1056967840 bytes Database Buffers 536870912 bytes Redo Buffers 7319552 bytes Database mounted. ORA-16038: log 1 sequence# 21 cannot be archived ORA-00742: Log read detects lost write in thread %d sequence %d block %d ORA-00312: online log 1 thread 1: '+DATA01/sara/onlinelog/group_1.261.907768799' SQL> |
关于上述两个报错,alert日志的呈现是这样的:
节点一:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 |
Sat Apr 09 20:32:48 2016 alter database open This instance was first to open Picked broadcast on commit scheme to generate SCNs Sat Apr 09 20:32:48 2016 LGWR: STARTING ARCH PROCESSES Sat Apr 09 20:32:48 2016 ARC0 started with pid=29, OS id=38720 ARC0: Archival started LGWR: STARTING ARCH PROCESSES COMPLETE ARC0: STARTING ARCH PROCESSES Sat Apr 09 20:32:49 2016 ARC1 started with pid=33, OS id=38729 Sat Apr 09 20:32:50 2016 ARC2 started with pid=35, OS id=38731 ARC1: Archival started ARC2: Archival started ARC1: Becoming the 'no FAL' ARCH ARC1: Becoming the 'no SRL' ARCH ARC2: Becoming the heartbeat ARCH Sat Apr 09 20:32:50 2016 ARC3 started with pid=36, OS id=38733 ARCH: All Archive destinations made inactive due to error 742 ARCH: Closing local archive destination LOG_ARCHIVE_DEST_1: '/u01/app/oracle/product/11.2.0/dbhome_1/dbs/arch1_21_907768799.dbf' (error 742) (sara1) Committing creation of archivelog '/u01/app/oracle/product/11.2.0/dbhome_1/dbs/arch1_21_907768799.dbf' (error 742) ARC3: Archival started ARC0: STARTING ARCH PROCESSES COMPLETE Errors in file /u01/app/oracle/diag/rdbms/sara/sara1/trace/sara1_ora_37978.trc: ORA-16038: log 1 sequence# 21 cannot be archived ORA-00742: Log read detects lost write in thread %d sequence %d block %d ORA-00312: online log 1 thread 1: '+DATA01/sara/onlinelog/group_1.261.907768799' Sat Apr 09 20:32:51 2016 System state dump requested by (instance=1, osid=37978), summary=[abnormal instance termination]. System State dumped to trace file /u01/app/oracle/diag/rdbms/sara/sara1/trace/sara1_diag_37898_20160409203251.trc USER (ospid: 37978): terminating the instance due to error 16038 Dumping diagnostic data in directory=[cdmp_20160409203251], requested by (instance=1, osid=37978), summary=[abnormal instance termination]. Instance terminated by USER, pid = 37978 |
节点二:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 |
Sat Apr 09 20:31:26 2016 alter database open This instance was first to open Picked broadcast on commit scheme to generate SCNs ARCH: STARTING ARCH PROCESSES Sat Apr 09 20:31:27 2016 ARC0 started with pid=34, OS id=39160 ARC0: Archival started ARCH: STARTING ARCH PROCESSES COMPLETE ARC0: STARTING ARCH PROCESSES Sat Apr 09 20:31:28 2016 ARC1 started with pid=35, OS id=39187 Sat Apr 09 20:31:28 2016 ARC2 started with pid=36, OS id=39189 ARC1: Archival started ARC2: Archival started ARC1: Becoming the 'no FAL' ARCH ARC1: Becoming the 'no SRL' ARCH ARC2: Becoming the heartbeat ARCH Sat Apr 09 20:31:28 2016 ARC3 started with pid=37, OS id=39192 ARCH: All Archive destinations made inactive due to error 742 ARCH: Closing local archive destination LOG_ARCHIVE_DEST_1: '/u01/app/oracle/product/11.2.0/dbhome_1/dbs/arch1_21_907768799.dbf' (error 742) (sara2) Committing creation of archivelog '/u01/app/oracle/product/11.2.0/dbhome_1/dbs/arch1_21_907768799.dbf' (error 742) ARC3: Archival started ARC0: STARTING ARCH PROCESSES COMPLETE Abort recovery for domain 0 Errors in file /u01/app/oracle/diag/rdbms/sara/sara2/trace/sara2_ora_39031.trc: ORA-16038: log 1 sequence# 21 cannot be archived ORA-00742: Log read detects lost write in thread %d sequence %d block %d ORA-00312: online log 1 thread 1: '+DATA01/sara/onlinelog/group_1.261.907768799' ORA-16038 signalled during: alter database open... ARC0: All Archive destinations made inactive due to error 742 ARC0: Closing local archive destination LOG_ARCHIVE_DEST_1: '/u01/app/oracle/product/11.2.0/dbhome_1/dbs/arch1_21_907768799.dbf' (error 742) (sara2) Committing creation of archivelog '/u01/app/oracle/product/11.2.0/dbhome_1/dbs/arch1_21_907768799.dbf' (error 742) ARCH: Archival error occurred on a closed thread. Archiver continuing ORACLE Instance sara2 - Archival Error. Archiver continuing. |
从上面的日志可以看到,出问题的是“arch1,21”。
所以,关闭所有的节点的数据库实例,并仅在一个节点上,启动实例到mount阶段,查看关于联机日志(Redo Log)的更详细内容:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 |
[oracle@rac1 ~]$ sqlplus / as sysdba SQL*Plus: Release 11.2.0.4.0 Production on Sat Apr 9 21:10:08 2016 Copyright (c) 1982, 2013, Oracle. All rights reserved. Connected to an idle instance. SQL> startup mount; ORACLE instance started. Total System Global Area 1603411968 bytes Fixed Size 2253664 bytes Variable Size 1056967840 bytes Database Buffers 536870912 bytes Redo Buffers 7319552 bytes Database mounted. SQL> SQL> set linesize 500 SQL> SQL> select group#,thread#,sequence#,members,bytes/1024/1024 "MB",status,archived from v$log; GROUP# THREAD# SEQUENCE# MEMBERS MB STATUS ARC ---------- ---------- ---------- ---------- ---------- ---------------- --- 1 1 21 1 50 INACTIVE NO 2 1 22 1 50 CURRENT NO 3 2 13 1 50 CURRENT NO 4 2 12 1 50 INACTIVE YES SQL> SQL> col member for a50 SQL> select * from gv$logfile order by inst_id,group#; INST_ID GROUP# STATUS TYPE MEMBER IS_ ---------- ---------- ------- ------- -------------------------------------------------- --- 1 1 ONLINE +DATA01/sara/onlinelog/group_1.261.907768799 NO 1 2 ONLINE +DATA01/sara/onlinelog/group_2.262.907768803 NO 1 3 ONLINE +DATA01/sara/onlinelog/group_3.266.907769255 NO 1 4 ONLINE +DATA01/sara/onlinelog/group_4.267.907769257 NO 2 1 ONLINE +DATA01/sara/onlinelog/group_1.261.907768799 NO 2 2 ONLINE +DATA01/sara/onlinelog/group_2.262.907768803 NO 2 3 ONLINE +DATA01/sara/onlinelog/group_3.266.907769255 NO 2 4 ONLINE +DATA01/sara/onlinelog/group_4.267.907769257 NO 8 rows selected. SQL> SQL> select * from gv$logfile where member like '%907768799' order by inst_id,group#; INST_ID GROUP# STATUS TYPE MEMBER IS_ ---------- ---------- ------- ------- -------------------------------------------------- --- 1 1 ONLINE +DATA01/sara/onlinelog/group_1.261.907768799 NO 2 1 ONLINE +DATA01/sara/onlinelog/group_1.261.907768799 NO SQL> |
清理联机日志文件:
1 2 3 4 5 6 7 8 9 10 |
SQL> alter database clear unarchived logfile group 1; Database altered. SQL> SQL> alter database clear logfile group 1; Database altered. SQL> |
这个过程的alert日志:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
Sat Apr 09 21:22:04 2016 alter database clear unarchived logfile group 1 WARNING! CLEARING REDO LOG WHICH HAS NOT BEEN ARCHIVED. BACKUPS TAKEN BEFORE 04/04/2016 01:43:46 (CHANGE 1507859) CANNOT BE USED FOR RECOVERY. Clearing online log 1 of thread 1 sequence number 21 Completed: alter database clear unarchived logfile group 1 Sat Apr 09 21:23:45 2016 alter database clear logfile group 1 Clearing online log 1 of thread 1 sequence number 0 Completed: alter database clear logfile group 1 |
然后,再次查看REDO LOG的状态:
1 2 3 4 5 6 7 8 9 10 |
SQL> select group#,thread#,sequence#,members,bytes/1024/1024 "MB",status,archived from v$log; GROUP# THREAD# SEQUENCE# MEMBERS MB STATUS ARC ---------- ---------- ---------- ---------- ---------- ---------------- --- 1 1 0 1 50 UNUSED YES 2 1 22 1 50 CURRENT NO 3 2 13 1 50 CURRENT NO 4 2 12 1 50 INACTIVE YES SQL> |
这样,数据库就可以启动(OPEN)了:
1 2 3 4 5 6 7 8 9 10 11 12 |
SQL> alter database open; Database altered. SQL> SQL> select instance_name,status from v$instance; INSTANCE_NAME STATUS ---------------- ------------ sara1 OPEN SQL> |
这时候,节点二应该也正常了:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
SQL> select instance_name,status from v$instance; INSTANCE_NAME STATUS ---------------- ------------ sara2 MOUNTED SQL> SQL> alter database open; Database altered. SQL> SQL> select instance_name,status from v$instance; INSTANCE_NAME STATUS ---------------- ------------ sara2 OPEN SQL> |
最后,本文中的“alter database clear logfile group 1;”,如果是生产系统,请慎重。
——————————————————————————
Done。