Cluster Installation Fails [message #603735] Thu, 19 December 2013 00:13
burasami
Messages: 20
Registered: April 2010
Junior Member
Hi All,

I started a fresh installation.

At one stage the installer asked me to run root.sh, but then it started to throw the following error.

INFO: /opt/app/oracle/product/10.2.0/db_1/root.sh #On nodes rubikon120,rubikon121  
INFO: To execute the configuration scripts:  
    1. Open a terminal window  
    2. Log in as "root"  
    3. Run the scripts in each cluster node  
    4. Return to this window and click "OK" to continue  
Note: Do not run the scripts simultaneously on the listed nodes.  
INFO: Starting to execute configuration assistants  
INFO: Command = /opt/app/oracle/product/10.2.0/db_1/bin/racgons add_config rubikon120.xxx.com.yyy:6200  rubikon121.xxx.com.yyy:6200   
Command = /opt/app/oracle/product/10.2.0/db_1/bin/racgons has failed   
 Execution Error : WARNING: rubikon120.xxx.com.yyy:6200 already configured.  
WARNING: rubikon121.xxx.com.yyy:6200 already configured.  
INFO: Configuration assistant "Oracle Notification Server Configuration Assistant" failed   
-----------------------------------------------------------------------------  
*** Starting OUICA ***  
Oracle Home set to /opt/app/oracle/product/10.2.0/db_1  
Configuration directory is set to /opt/app/oracle/product/10.2.0/db_1/cfgtoollogs. All xml files under the directory will be processed  
INFO: The "/opt/app/oracle/product/10.2.0/db_1/cfgtoollogs/configToolFailedCommands" script contains all commands that failed, were skipped or were cancelled. This file may be used to run these configuration assistants outside of OUI. Note that you may have to update this script with passwords (if any) before executing the same.  
-----------------------------------------------------------------------------  
SEVERE: OUI-25031:Some of the configuration assistants failed. It is strongly recommended that you retry the configuration assistants at this time. Not successfully running any "Recommended" assistants means your system will not be correctly configured.  
1. Check the Details panel on the Configuration Assistant Screen to see the errors resulting in the failures.  
2. Fix the errors causing these failures.  
3. Select the failed assistants and click the 'Retry' button to retry them.  
INFO: User Selected: Yes/OK  
INFO: Starting to execute configuration assistants  
INFO: Command = /opt/app/oracle/product/10.2.0/db_1/bin/racgons add_config rubikon120.xxx.com.yyy:6200  rubikon121.xxx.com.yyy:6200   
Command = /opt/app/oracle/product/10.2.0/db_1/bin/racgons has failed   
 Execution Error : WARNING: rubikon120.xxx.com.yyy:6200 already configured.  
WARNING: rubikon121.xxx.com.yyy:6200 already configured.  



I stopped the installation midway, searched Google, and tried several suggestions from the web, such as checking the permissions on the voting disk and OCR,

but the files have the correct permissions.


OCR has root:oinstall rw-r----

Voting disk has oracle:oinstall rw-r----
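
The ownership and mode listed above can be re-checked on each node with a short script. This is only a minimal sketch, not something from the original post: it assumes Python 3 on a Linux host, and the helper names `owner_group_mode` and `format_entry` are made up for illustration. The mode quoted in the post looks like `rw-r-----` (0640) with one dash dropped, which is an assumption.

```python
import grp
import os
import pwd
import stat
import sys


def owner_group_mode(path):
    """Return (owner name, group name, permission bits) for a path."""
    st = os.stat(path)
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        owner = str(st.st_uid)  # uid without a passwd entry
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = str(st.st_gid)  # gid without a group entry
    return owner, group, stat.S_IMODE(st.st_mode)


def format_entry(path):
    """Format a path as 'owner:group rw-r-----', similar to the post."""
    owner, group, _ = owner_group_mode(path)
    # stat.filemode() returns e.g. '-rw-r-----'; drop the file-type character.
    perms = stat.filemode(os.stat(path).st_mode)[1:]
    return f"{owner}:{group} {perms}"


if __name__ == "__main__":
    # Paths are passed on the command line; none are hard-coded here.
    for path in sys.argv[1:]:
        print(path, format_entry(path))
```

It could be run as, for example, `python3 check_perms.py /oracrsfiles/oracrs/vote.crs` (the voting file path that appears in the alert log later in this thread; the script name is hypothetical). The OCR path is not given in the thread, so it would have to be supplied by whoever runs the check.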

After that I ran the following command:


oracle@rubikon120:~/software/clusterware/clusterware/cluvfy> ./runcluvfy.sh stage -post crsinst -n rubikon120,rubikon121 -verbose  
Performing post-checks for cluster services setup  
Checking node reachability...  
Check: Node reachability from node "rubikon120"  
  Destination Node                      Reachable?  
  ------------------------------------  ------------------------  
  rubikon121                            yes  
  rubikon120                            yes  
Result: Node reachability check passed from node "rubikon120".  
Checking user equivalence...  
Check: User equivalence for user "oracle"  
  Node Name                             Comment  
  ------------------------------------  ------------------------  
  rubikon121                            passed  
  rubikon120                            passed  
Result: User equivalence check passed for user "oracle".  
Checking Cluster manager integrity...  
Checking CSS daemon...  
  Node Name                             Status  
  ------------------------------------  ------------------------  
  rubikon121                            running  
  rubikon120                            running  
Result: Daemon status check passed for "CSS daemon".  
Cluster manager integrity check passed.  
Checking cluster integrity...  
  Node Name  
  ------------------------------------  
  rubikon120  
  rubikon121  
Cluster integrity check failed. This check did not run on the following nodes(s):  
        rubikon121  
Checking OCR integrity...  
Checking the absence of a non-clustered configuration...  
WARNING:  
CSS is probably working with a non-clustered, local-only configuration on nodes:  
        rubikon121  
Verification will proceed with nodes:  
        rubikon120  
Uniqueness check for OCR device passed.  
Checking the version of OCR...  
OCR of correct Version "2" exists.  
Checking data integrity of OCR...  
Data integrity check for OCR passed.  
OCR integrity check failed.  
Checking CRS integrity...  
Checking daemon liveness...  
Check: Liveness for "CRS daemon"  
  Node Name                             Running  
  ------------------------------------  ------------------------  
  rubikon121                            no  
  rubikon120                            no  
Result: Liveness check failed for "CRS daemon".  
Checking daemon liveness...  
Check: Liveness for "CSS daemon"  
  Node Name                             Running  
  ------------------------------------  ------------------------  
  rubikon121                            yes  
  rubikon120                            yes  
Result: Liveness check passed for "CSS daemon".  
Checking daemon liveness...  
Check: Liveness for "EVM daemon"  
  Node Name                             Running  
  ------------------------------------  ------------------------  
  rubikon121                            yes  
  rubikon120                            no  
Result: Liveness check failed for "EVM daemon".  
Liveness of all the daemons  
  Node Name     CRS daemon                CSS daemon                EVM daemon  
  ------------  ------------------------  ------------------------  ----------  
  rubikon121    no                        yes                       yes  
  rubikon120    no                        yes                       no  
CRS integrity check failed.  
Checking node application existence...  
Checking existence of VIP node application  
  Node Name     Required                  Status                    Comment  
  ------------  ------------------------  ------------------------  ----------  
  rubikon121    yes                       unknown                   failed  
  rubikon120    yes                       unknown                   failed  
Result: Check failed.  
Checking existence of ONS node application  
  Node Name     Required                  Status                    Comment  
  ------------  ------------------------  ------------------------  ----------  
  rubikon121    no                        unknown                   ignored  
  rubikon120    no                        unknown                   ignored  
Result: Check ignored.  
Checking existence of GSD node application  
  Node Name     Required                  Status                    Comment  
  ------------  ------------------------  ------------------------  ----------  
  rubikon121    no                        unknown                   ignored  
  rubikon120    no                        unknown                   ignored  
Result: Check ignored.  
Post-check for cluster services setup was unsuccessful on all the nodes.  
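
The cluvfy report above is long, and the key result is in the "Liveness of all the daemons" table. As a rough aid for reading such output, the sketch below pulls out which daemons are reported as not running on each node. It assumes the table layout shown in this post (columns in the order CRS, CSS, EVM); the function name `down_daemons` is hypothetical and not part of any Oracle tool.

```python
DAEMONS = ("CRS", "CSS", "EVM")  # column order as printed in the report above


def down_daemons(report):
    """Map each node to the daemons that the liveness summary marks 'no'."""
    lines = report.splitlines()
    result = {}
    for i, line in enumerate(lines):
        if line.strip() != "Liveness of all the daemons":
            continue
        # The two lines after the title are the column header and the dashes.
        for row in lines[i + 3:]:
            fields = row.split()
            if len(fields) != 4 or not set(fields[1:]) <= {"yes", "no"}:
                break  # end of the table
            node, states = fields[0], fields[1:]
            result[node] = [d for d, s in zip(DAEMONS, states) if s == "no"]
        break
    return result
```

Fed the output saved from the `runcluvfy.sh` run, this would show at a glance that the CRS daemon is reported down on both nodes, which matches the "CRS integrity check failed" line.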


Now I am stuck with no other option to proceed. Kindly help me resolve this RAC installation.

Thanks & Regards
Sami
Re: Cluster Installation Fails [message #603743 is a reply to message #603735] Thu, 19 December 2013 01:30
trantuananh24hg
Messages: 744
Registered: January 2007
Location: Ha Noi, Viet Nam
Senior Member
First, please post more information:

- What is the platform type?
- What is the shared-storage type?
- List the network configuration (/etc/hosts; /etc/hostname.NIC for Solaris; /etc/network-config for Linux).
- Are you using multipathing? If yes, for which: shared storage, bounce network, or both? With bounce network, what type of multipathing: Active-Active or Active-Passive?
- List the devices used for the OCR and voting disk. On Solaris they might be slices; on Linux they might be slices, with or without raw binding.
- Are you using mknode?
- Finally, please post the contents of the error log here.
Re: Cluster Installation Fails [message #603760 is a reply to message #603743] Thu, 19 December 2013 03:13
burasami
Hi All,

Thanks for your reply.

What's platform type? 


Suse Enterprise Edition 10

What's shared-storage type?


OCFS2

Are you using mknode? 


mkfs.ocfs2

shared-storage, bounce network or both of them? With bounce network,


oracle@rubikon120:/> ssh rubikon121 date
Thu Dec 19 14:49:16 NPT 2013
oracle@rubikon120:/>
oracle@rubikon121:~>  ssh rubikon120 date
Thu Dec 19 14:50:02 NPT 2013
oracle@rubikon121:~>



ocssd log


[    CSSD]2013-12-16 15:10:19.477 [1199630656] >TRACE:   clssgmReconfigThread:  completed for reconfig(1), with status(1)
[    CSSD]2013-12-16 15:10:19.588 [1140881728] >TRACE:   clssgmClientConnectMsg: Connect from con(0x2aaaaad22250) proc(0x2aaaaad271c0) pid() proto(10:2:1:1)
[    CSSD]2013-12-16 15:10:19.589 [1140881728] >TRACE:   clssgmClientConnectMsg: Connect from con(0x2aaaaad27bf0) proc(0x2aaaaad2a2d0) pid() proto(10:2:1:1)
[    CSSD]2013-12-16 15:10:19.589 [1140881728] >TRACE:   clssgmClientConnectMsg: Connect from con(0x2aaaaad24db0) proc(0x2aaaaad2a000) pid() proto(10:2:1:1)
[    CSSD]2013-12-16 15:10:20.393 [1191237952] >TRACE:   clssnmWaitForAcks: done, msg type(15)
[    CSSD]2013-12-16 15:10:20.393 [1191237952] >TRACE:   clssnmDoSyncUpdate: Sync Complete!
[    CSSD]2013-12-16 15:10:20.393 [1132489024] >TRACE:   clssnmSendFatalOn: req to syncLeader(1)
[    CSSD]2013-12-16 15:10:20.414 [1124096320] >TRACE:   clssnmFatalThread: Fatal mode enabled
[    CSSD]2013-12-16 15:13:27.239 [1107310912] >TRACE:   clssnmReadDskHeartbeat: node(2) is down. rcfg(1) wrtcnt(1) LATS(5101196) Disk lastSeqNo(1)
[    CSSD]2013-12-16 15:13:28.825 [1132489024] >TRACE:   clssnmConnComplete: connected to node 2 (con 0x752c20), state 1 birth 0, unique 1387186106/1387186106  prevConuni(0)
[    CSSD]2013-12-16 15:13:28.949 [1191237952] >TRACE:   clssnmDoSyncUpdate: Initiating sync 2
[    CSSD]2013-12-16 15:13:28.949 [1191237952] >TRACE:   clssnmSetupAckWait: Ack message type (11) 
[    CSSD]2013-12-16 15:13:28.949 [1191237952] >TRACE:   clssnmSetupAckWait: node(1) is ALIVE
[    CSSD]2013-12-16 15:13:28.949 [1191237952] >TRACE:   clssnmSetupAckWait: node(2) is ALIVE
[    CSSD]2013-12-16 15:13:28.949 [1191237952] >TRACE:   clssnmSendSync: syncSeqNo(2)
[    CSSD]2013-12-16 15:13:28.949 [1132489024] >TRACE:   clssnmHandleSync: Acknowledging sync: src[1] srcName[rubikon120] seq[6] sync[2]
[    CSSD]2013-12-16 15:13:28.949 [1191237952] >TRACE:   clssnmWaitForAcks: Ack message type(11), ackCount(2)
[    CSSD]2013-12-16 15:13:29.020 [2131051872] >USER:    NMEVENT_SUSPEND [00][00][00][02]
[    CSSD]2013-12-16 15:13:29.953 [1191237952] >TRACE:   clssnmWaitForAcks: done, msg type(11)
[    CSSD]2013-12-16 15:13:29.953 [1191237952] >TRACE:   clssnmDoSyncUpdate: node(0) missCount(193) state(0)
[    CSSD]2013-12-16 15:13:29.953 [1191237952] >TRACE:   clssnmDoSyncUpdate: node(2) is transitioning from joining state to active state
[    CSSD]2013-12-16 15:13:29.953 [1191237952] >TRACE:   clssnmSetupAckWait: Ack message type (13) 
[    CSSD]2013-12-16 15:13:29.953 [1191237952] >TRACE:   clssnmSetupAckWait: node(1) is ACTIVE
[    CSSD]2013-12-16 15:13:29.953 [1191237952] >TRACE:   clssnmSetupAckWait: node(2) is ACTIVE
[    CSSD]2013-12-16 15:13:29.953 [1191237952] >TRACE:   clssnmSendVote: syncSeqNo(2)
[    CSSD]2013-12-16 15:13:29.953 [1132489024] >TRACE:   clssnmSendVoteInfo: node(1) syncSeqNo(2)
[    CSSD]2013-12-16 15:13:29.953 [1191237952] >TRACE:   clssnmWaitForAcks: Ack message type(13), ackCount(1)
[    CSSD]2013-12-16 15:13:30.956 [1191237952] >TRACE:   clssnmWaitForAcks: done, msg type(13)
[    CSSD]2013-12-16 15:13:30.957 [1191237952] >TRACE:   clssnmCheckDskInfo: Checking disk info...
[    CSSD]2013-12-16 15:13:31.961 [1191237952] >TRACE:   clssnmEvict: Start
[    CSSD]2013-12-16 15:13:31.961 [1191237952] >TRACE:   clssnmWaitOnEvictions: Start
[    CSSD]2013-12-16 15:13:31.961 [1191237952] >TRACE:   clssnmWaitOnEvictions: Node(0) down, LATS(0),timeout(5105916)
[    CSSD]2013-12-16 15:13:31.961 [1191237952] >TRACE:   clssnmSetupAckWait: Ack message type (15) 
[    CSSD]2013-12-16 15:13:31.961 [1191237952] >TRACE:   clssnmSetupAckWait: node(1) is ACTIVE
[    CSSD]2013-12-16 15:13:31.961 [1191237952] >TRACE:   clssnmSetupAckWait: node(2) is ACTIVE
[    CSSD]2013-12-16 15:13:31.961 [1191237952] >TRACE:   clssnmSendUpdate: syncSeqNo(2)
[    CSSD]2013-12-16 15:13:31.961 [1132489024] >TRACE:   clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
[    CSSD]2013-12-16 15:13:31.961 [1132489024] >TRACE:   clssnmDeactivateNode: node 0 () left cluster

[    CSSD]2013-12-18 02:19:06.104 [1174452544] >TRACE:   clssnmPollingThread: node rubikon121 (2) missed(2) checkin(s)
[    CSSD]2013-12-18 02:19:08.112 [1174452544] >TRACE:   clssnmPollingThread: node rubikon121 (2) missed(2) checkin(s)
[    CSSD]2013-12-18 11:59:52.844 [1174452544] >TRACE:   clssnmPollingThread: node rubikon121 (2) missed(2) checkin(s)
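
The last three lines of the excerpt show node 1 logging missed check-ins from rubikon121. To see how often and when this happens across a full ocssd.log, the lines can be filtered with a small script. The following is a sketch under the assumption that the log lines have exactly the format pasted above; `missed_checkins` is a hypothetical helper name.

```python
import re
from collections import Counter
from datetime import datetime

# Matches e.g.:
# [    CSSD]2013-12-18 02:19:06.104 [1174452544] >TRACE:   clssnmPollingThread: node rubikon121 (2) missed(2) checkin(s)
PATTERN = re.compile(
    r"CSSD\](\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) .*?"
    r"clssnmPollingThread: node (\S+) \((\d+)\) missed\((\d+)\) checkin"
)


def missed_checkins(lines):
    """Return (timestamp, node name, missed count) for each polling warning."""
    events = []
    for line in lines:
        m = PATTERN.search(line)
        if m:
            when = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
            events.append((when, m.group(2), int(m.group(4))))
    return events


def count_by_node(events):
    """Count how many missed-check-in warnings were logged per node."""
    return Counter(node for _, node, _ in events)
```

Calling `missed_checkins(open(path))` on the ocssd.log named in the alert log, then `count_by_node` on the result, gives a per-node tally. The timestamps make it easier to line the warnings up with interconnect or storage events at the same time.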


alert log from node 1

2013-12-16 15:10:16.273
[cssd(23486)]CRS-1605:CSSD voting file is online: /oracrsfiles/oracrs/vote.crs. Details in /opt/app/oracle/product/10.2.0/db_1/log/rubikon120/cssd/ocssd.log.
2013-12-16 15:10:19.477
[cssd(23486)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rubikon120 .
2013-12-16 15:10:20.832
[crsd(23116)]CRS-1012:The OCR service started on node rubikon120.
2013-12-16 15:10:20.898
[evmd(23359)]CRS-1401:EVMD started on node rubikon120.
2013-12-16 15:10:21.277
[crsd(23116)]CRS-1201:CRSD started on node rubikon120.
2013-12-16 15:13:32.060
[cssd(23486)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rubikon120 rubikon121 .
2013-12-16 17:33:13.782
[evmd(11079)]CRS-1401:EVMD started on node rubikon120.
2013-12-16 17:33:14.333
[crsd(10814)]CRS-1012:The OCR service started on node rubikon120.
2013-12-16 17:41:19.945
[cssd(16116)]CRS-1605:CSSD voting file is online: /oracrsfiles/oracrs/vote.crs. Details in /opt/app/oracle/product/10.2.0/db_1/log/rubikon120/cssd/ocssd.log.



OCFS2 status from Node 1

rubikon120:/ #  /etc/init.d/o2cb status
Module "configfs": Loaded
Filesystem "configfs": Mounted
Module "ocfs2_nodemanager": Loaded
Module "ocfs2_dlm": Loaded
Module "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster ocfs2: Online
Heartbeat dead threshold = 31
  Network idle timeout: 30000
  Network keepalive delay: 2000
  Network reconnect delay: 2000
Checking O2CB heartbeat: Active
rubikon120:/ #


OCFS2 status from Node 2

rubikon121:/ # /etc/init.d/o2cb status
Module "configfs": Loaded
Filesystem "configfs": Mounted
Module "ocfs2_nodemanager": Loaded
Module "ocfs2_dlm": Loaded
Module "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster ocfs2: Online
Heartbeat dead threshold = 31
  Network idle timeout: 30000
  Network keepalive delay: 2000
  Network reconnect delay: 2000
Checking O2CB heartbeat: Active
rubikon121:/ #



Thanks & Regards
Sami
Re: Cluster Installation Fails [message #603857 is a reply to message #603760] Fri, 20 December 2013 00:06
burasami
Hi All,

Kindly let me know how to proceed.

Thanks & Regards
Sami