Host gets disconnected from VC and VMs go into a not responding state. HA did not move the VMs as expected

Hostname: ESXi.happy.com 

Observations:   
- As per vmksummary, the ESXi host was powered off at 24-03-2015 16:33:20 UTC and booted back up at 24-03-2015 17:02:30 UTC.
Version: ESXi 5.1.0 build-914609



vmksummary.log
2015-03-24T16:00:02Z heartbeat: up 118d2h28m30s, 40 VMs; [[6936104 vmx 22283044kB] [5776072 vmx 24316588kB] [6181469 vmx 26300548kB]] [[18131 sfcb-vmware_raw 23%max] [18093 sfcb-vmware_int 38%max] [3647013 vmx 73%max]]

2015-03-24T16:33:20Z bootstop: Host is powering off 

2015-03-24T17:00:01Z heartbeat: up 0d0h18m1s, 0 VMs; [[17201 localcli 2904kB] [16865 wdog-16866 4268kB] [16866 vmsyslogd 6720kB]] [[17204 python 0%max] [17205 hostd-probe 0%max] [16898 vobd 1%max]]

2015-03-24T17:02:30Z bootstop: Host has booted 

2015-03-24T18:00:01Z heartbeat: up 0d1h18m1s, 34 VMs; [[22220 vmx 8290648kB] [22224 vmx 12580604kB] [22190 vmx 54428644kB]] [[18368 sfcb-vmware_raw 5%max] [18362 sfcb-vmware_bas 16%max] [18374 sfcb-pycim 18%max]]
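
The outage window can be cross-checked directly from the two bootstop entries. The following is a minimal Python sketch and not part of the original analysis: the function name outage_windows is made up for illustration, and it assumes that each vmksummary entry sits on a single line and that only the two bootstop messages shown above need to be recognised.

```python
import re
from datetime import datetime

# Matches vmksummary lines such as
# "2015-03-24T16:33:20Z bootstop: Host is powering off"
BOOTSTOP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z bootstop: (.+?)\s*$")


def outage_windows(lines):
    """Return (powered_off, booted, duration) tuples found in vmksummary lines."""
    windows = []
    down_at = None
    for line in lines:
        match = BOOTSTOP.match(line)
        if not match:
            continue  # heartbeat lines are ignored
        when = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
        message = match.group(2)
        if message == "Host is powering off":
            down_at = when
        elif message == "Host has booted" and down_at is not None:
            windows.append((down_at, when, when - down_at))
            down_at = None
    return windows


# Sample lines taken from the vmksummary excerpt (heartbeat line shortened).
sample = [
    "2015-03-24T16:33:20Z bootstop: Host is powering off",
    "2015-03-24T17:00:01Z heartbeat: up 0d0h18m1s, 0 VMs",
    "2015-03-24T17:02:30Z bootstop: Host has booted",
]

for powered_off, booted, duration in outage_windows(sample):
    print(powered_off, booted, duration)
```

For the entries above, the gap between power-off and boot completion is 29 minutes 10 seconds.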


- Further, we notice that the ESXi host's IP address is repeatedly being claimed by another device with MAC address 00:25:b5:2a:2a:7d, as is evident in the vmkernel, vobd and hostd logs.

vmkernel.log   
2015-03-24T16:24:48.849Z cpu7:4416205)Tcpip_Vmk: 112: arp: 00:25:b5:2a:2a:7d is using my IP address 10.101.1.60 on vmk0!

2015-03-24T16:24:48.849Z cpu7:4416205)Tcpip_Vmk: 112: arp: 00:25:b5:2a:2a:7d is using my IP address 10.101.1.60 on vmk0!


vobd.log   
2015-03-24T16:24:48.849Z: [netCorrelator] 10205599890523us: [vob.net.vmknic.ip.duplicate] A duplicate IP address was detected for 10.101.1.60 on interface vmk0. The current owner is 00:25:b5:2a:2a:7d. 

2015-03-24T16:24:48.849Z: [netCorrelator] 10205673694189us: [esx.problem.net.vmknic.ip.duplicate] Duplicate IP address detected for 10.101.1.60 on interface vmk0, current owner being 00:25:b5:2a:2a:7d.

2015-03-24T16:24:48.849Z: [netCorrelator] 10205599890555us: [vob.net.vmknic.ip.duplicate] A duplicate IP address was detected for 10.101.1.60 on interface vmk0. The current owner is 00:25:b5:2a:2a:7d.

2015-03-24T16:24:48.849Z: [netCorrelator] 10205673694471us: [esx.problem.net.vmknic.ip.duplicate] Duplicate IP address detected for 10.101.1.60 on interface vmk0, current owner being 00:25:b5:2a:2a:7d.


hostd.log   
2015-03-24T16:24:48.907Z [46F80B90 info 'ha-eventmgr'] Event 55540 : A duplicate IP address was detected for 10.101.1.60 on the interface vmk0. The current owner is 00:25:b5:2a:2a:7d.

2015-03-24T16:24:50.356Z [3D0C0B90 info 'VmkVprobSource'] VmkVprobSource::Post event: (vim.event.EventEx) {
-->    dynamicType = <unset>,
-->    key = 1108541272,
-->    chainId = 0, 
-->    createdTime = "1970-01-01T00:00:00Z", 
-->    userName = "", 
-->    datacenter = (vim.event.DatacenterEventArgument) null, 
-->    computeResource = (vim.event.ComputeResourceEventArgument) null, 
-->    host = (vim.event.HostEventArgument) { 
-->       dynamicType = <unset>, 
-->       name = "ESXI-A.ROOT.LOCAL", 

  
Due to the IP conflict, with the other MAC address being seen as the owner of IP address 10.101.1.60, the host lost network connectivity and was therefore declared HA isolated.
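
To see how often the conflict was reported, and whether it always involved the same MAC address, the arp messages in vmkernel.log can be tallied. The following is a rough Python sketch under stated assumptions: the helper name conflict_counts is hypothetical, and the pattern is based only on the message format visible in the vmkernel excerpt above.

```python
import re
from collections import Counter

# Based on the vmkernel message shown above:
# "arp: <mac> is using my IP address <ip> on <vmknic>!"
ARP_CONFLICT = re.compile(
    r"arp: ((?:[0-9a-f]{2}:){5}[0-9a-f]{2}) is using my IP address (\S+) on (\w+)!",
    re.IGNORECASE,
)


def conflict_counts(lines):
    """Count duplicate-IP reports per (ip, vmknic, mac) combination."""
    counts = Counter()
    for line in lines:
        match = ARP_CONFLICT.search(line)
        if match:
            mac, ip, nic = match.groups()
            counts[(ip, nic, mac.lower())] += 1
    return counts


# The two vmkernel.log entries from the excerpt, each on one line.
sample = [
    "2015-03-24T16:24:48.849Z cpu7:4416205)Tcpip_Vmk: 112: arp: "
    "00:25:b5:2a:2a:7d is using my IP address 10.101.1.60 on vmk0!",
    "2015-03-24T16:24:48.849Z cpu7:4416205)Tcpip_Vmk: 112: arp: "
    "00:25:b5:2a:2a:7d is using my IP address 10.101.1.60 on vmk0!",
]

for (ip, nic, mac), hits in conflict_counts(sample).items():
    print(ip, nic, mac, hits)
```

A single (IP, interface, MAC) key with a high count points to one misconfigured device; several different MAC addresses would suggest a wider addressing problem.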

The chronology of HA events is as follows:

fdm.log 
  
- The host was declared isolated by the Cluster since it failed to contact the other ESXi hosts and the gateway.
  
2015-03-24T16:25:13.594Z [FFE81B90 verbose 'Cluster' opID=SWI-1851eefc] [ClusterManagerImpl::CheckHostNetworkIsolation] Waited 5 seconds for isolation icmp ping reply. Isolated 

2015-03-24T16:25:13.594Z [FFE81B90 info 'Policy' opID=SWI-1851eefc] [LocalIsolationPolicy::Handle(IsolationNotification)] host isolated is true 

2015-03-24T16:25:13.595Z [FFE81B90 info 'Policy' opID=SWI-1851eefc] [LocalIsolationPolicy::Handle(IsolationNotification)] Disabling execution of isolation policy by 30 seconds. 

2015-03-24T16:25:14.585Z [71FA8B90 verbose 'Election' opID=SWI-1728bb5e] [ClusterElection::MasterStateFunc]Am isolated! Dropping to STARTUP!   


At this point, the isolation response policy is checked and, based on it, the VMs are selected for the configured response, which is shutdown as per the HA settings in the existing environment.

2015-03-24T16:26:18.641Z [46F80B90 info 'TaskManager' opID=SWI-48ebf0e7] Task Created : 
haTask-1154-vim.VirtualMachine.shutdownGuest-538802824


Total number of VMs selected for shutdown by the host isolation response:

grep -ic 'Isolation response for VM' fdm.log

All the VMs residing on the ESXi host were powered off based on the HA isolation response policy.

2015-03-24T16:28:29.865Z [45E80B90 info 'ha-eventmgr'] Event 55798 : VM-A on ESXI-A.ROOT.LOCAL in ha-datacenter is powered off

2015-03-24T16:28:32.447Z [3D07FB90 info 'ha-eventmgr'] Event 55799 : VM-B on ESXI-A.ROOT.LOCAL in ha-datacenter is powered off
  
Total number of VMs which were shut down by HA due to host isolation:
  
zgrep -c "ESXI-A.ROOT.LOCAL in ha-datacenter is powered off" hostd.0.gz 
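
A count alone does not show which VMs were affected. As a complement to the zgrep command, the following Python sketch lists the VM names from the "is powered off" events. It is only an illustration: the names powered_off_vms and read_log are hypothetical, the pattern is based only on the two hostd events quoted above, and each event is assumed to sit on a single line.

```python
import gzip
import re

# Based on the hostd event text quoted above:
# "Event <id> : <vm> on <host> in ha-datacenter is powered off"
POWERED_OFF = re.compile(
    r"Event \d+ : (.+?) on (\S+) in ha-datacenter is powered off"
)


def powered_off_vms(lines, host):
    """Return the distinct VM names reported as powered off on the given host."""
    names = []
    for line in lines:
        match = POWERED_OFF.search(line)
        if match and match.group(2) == host and match.group(1) not in names:
            names.append(match.group(1))
    return names


def read_log(path):
    """Read a plain or gzip-compressed log file as a list of text lines."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", errors="replace") as handle:
        return handle.readlines()


# The two hostd events from the excerpt, each on one line.
sample = [
    "2015-03-24T16:28:29.865Z [45E80B90 info 'ha-eventmgr'] Event 55798 : "
    "VM-A on ESXI-A.ROOT.LOCAL in ha-datacenter is powered off",
    "2015-03-24T16:28:32.447Z [3D07FB90 info 'ha-eventmgr'] Event 55799 : "
    "VM-B on ESXI-A.ROOT.LOCAL in ha-datacenter is powered off",
]

print(powered_off_vms(sample, "ESXI-A.ROOT.LOCAL"))
```

Against the real bundle, the same function could be fed with read_log("hostd.0.gz"), and the resulting list compared with the number of isolation responses counted in fdm.log.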

- A manual power-off of the ESXi host was done at 24-03-2015 16:33:20 UTC.
 

hostd.log 

2015-03-24T16:33:16.629Z [3F640B90 verbose 'SoapAdapter'] Responded to service state request 
2015-03-24T16:33:20.478Z [46F80B90 info 'ha-host'] Recvd. ACPI power event from the vmkernel 
2015-03-24T16:33:20.537Z [3D101B90 info 'Vmomi'] Result: 
--> (vim.fault.NotAuthenticated) { 
-->    dynamicType = <unset>, 
-->    faultCause = (vmodl.MethodFault) null, 
-->    object = 'vim.HostSystem:ha-host', 
-->    privilegeId = "Host.Config.Maintenance", 
-->    msg = "", 
--> } 

2015-03-24T16:33:20.538Z [46F80B90 error 'ha-host'] Unknown exception during shutdown 
2015-03-24T16:33:20.539Z [46F80B90 info 'SysCommandPosix'] ForkExec(/sbin/poweroff)  7094218 
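
Since the evidence is spread over several logs, it can help to merge the relevant entries into one timeline. The sketch below is one possible way to do this in Python and is not part of the original analysis: the names parse_stamp and merged_timeline are made up for illustration, and it assumes every entry starts with a UTC timestamp in the formats seen above (with or without fractional seconds). The timestamps are parsed rather than compared as strings, because "16:33:20Z" and "16:33:20.478Z" do not sort correctly as plain text.

```python
import re
from datetime import datetime

# Leading timestamp, with optional fractional seconds, as seen in the excerpts.
STAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z")


def parse_stamp(line):
    """Return the leading UTC timestamp of a log line, or None if there is none."""
    match = STAMP.match(line)
    if not match:
        return None
    when = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    if match.group(2):
        # Pad or trim the fraction to microseconds.
        when = when.replace(microsecond=int(match.group(2)[:6].ljust(6, "0")))
    return when


def merged_timeline(logs):
    """logs maps a log name to its lines; returns (time, name, line) sorted by time."""
    events = []
    for name, lines in logs.items():
        for line in lines:
            when = parse_stamp(line)
            if when is not None:
                events.append((when, name, line.rstrip()))
    return sorted(events, key=lambda event: event[0])


# One or two entries per log, copied from the excerpts in this post.
sample_logs = {
    "vmksummary.log": ["2015-03-24T16:33:20Z bootstop: Host is powering off"],
    "hostd.log": [
        "2015-03-24T16:24:48.907Z [46F80B90 info 'ha-eventmgr'] Event 55540 : "
        "A duplicate IP address was detected for 10.101.1.60 on the interface vmk0.",
        "2015-03-24T16:33:20.478Z [46F80B90 info 'ha-host'] "
        "Recvd. ACPI power event from the vmkernel",
    ],
    "fdm.log": [
        "2015-03-24T16:25:13.594Z [FFE81B90 info 'Policy' opID=SWI-1851eefc] "
        "[LocalIsolationPolicy::Handle(IsolationNotification)] host isolated is true",
    ],
    "vmkernel.log": [
        "2015-03-24T16:24:48.849Z cpu7:4416205)Tcpip_Vmk: 112: arp: "
        "00:25:b5:2a:2a:7d is using my IP address 10.101.1.60 on vmk0!",
    ],
}

for when, name, line in merged_timeline(sample_logs):
    print(when.isoformat(), name, line)
```

With the sample entries, the merged order is: duplicate IP in vmkernel.log, the matching hostd event, the isolation notification in fdm.log, and finally the power-off entries in vmksummary.log and hostd.log.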

  
Recommendations:
 
  Identify the device with MAC address 00:25:b5:2a:2a:7d that caused the IP conflict on the ESXi vmk0 interface.
