hiho guys
I have a two-node cluster setup with drbd storage mirroring and iscsi-ha managing all of it. ha-lizard is doing great. Alas - although everything is neat and shiny, there are tiny issues that once a week make me tremble and sweat.
Today - and the week before as well - we had this in our log: "check_ip_health: [10.10.10.3] response = FAIL". The server itself doesn't show any problems so far, but all VMs (Ubuntu Linux) on it have a storage error, and their root file systems switched to read-only.
iscsi-ha status shows no drbd problem: all UpToDate/UpToDate, storage in perfect sync.
Trying to fsck the root partition fails with an I/O error, which prevents any successful restore of the virtual machine.
A reboot doesn't help either; it simply hangs.
The next step was to reboot the host where the crashed VM was running...
Trying to connect scp-center to the pool via the OTHER, still working host didn't succeed. That host didn't claim itself as master, so the pool stayed in an undefined state.
After its reboot, host 2 (where the crashed VM was residing) didn't return to the pool automatically. After logging on, I found out that the services ha-lizard and iscsi-ha hadn't started (FAILED, not OK).
Restarting both of them succeeded. Then the cluster was back again, and the crashed VM booted again and was alive.
I would like to know how I can make these servers failsafe, so that all services start successfully at boot. Any ideas? I cannot go online after every boot and restart the HA services by hand.
Any ideas what I can check to find the configuration error?
Kind regards
Christoph