In homelab life, the weekend is normally “fix it” time. If I can find time outside of the weekend I’ll try to get to it, because my OCD just drives me nuts seeing that red. Usually, though, it falls on the weekend.
This weekend the errors I looked at are the following:
PostgreSQL Archiver error
For this one I did some research on VMTN and a lot of other blogs. The key pointer was what I found in the vCenter log:
In /var/log/vmware/vpostgres/pg_archiver.log-[n].stderr, you see an error similar to:
2018-05-22T10:27:36.133Z ERROR pg_archiver could not receive data from WAL stream: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
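If you want a quick way to check whether your appliance is logging the same error, a rough sketch like the one below could help. The function name count_wal_errors is my own invention for illustration, and the wildcard in the usage comment is an assumption about how the [n] part of the file name expands on your system.

```shell
#!/bin/sh
# Sketch only: count log lines that contain the WAL stream error.
# count_wal_errors is a hypothetical helper, not a VMware tool.
count_wal_errors() {
    # Concatenate every file passed in and print the number of matching lines.
    # Unreadable or missing files are skipped silently.
    cat "$@" 2>/dev/null | grep -c 'could not receive data from WAL stream'
}

# Usage on the vCenter appliance (the glob is an assumption about the [n] suffix):
#   count_wal_errors /var/log/vmware/vpostgres/pg_archiver.log-*.stderr
```

A count above zero only tells you the archiver lost its connection at some point. It does not tell you why, which is where the watchdog check comes in.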
This KB points to an HA or timeout error, and I was seeing those errors, but the actual issue was the watchdog service. After some more Google-fu I came across Magander3’s post, which gets all the credit for this fix. Basically, here is what I did:
On each ESXi host in your cluster, log in via SSH and run “/etc/init.d/sfcbd-watchdog status”. This should return “sfcbd is not running”.
- If this is the case, try “/etc/init.d/sfcbd-watchdog start” just to see if that will help.
- If it starts, great! Run the start command on each ESXi host.
- If not, run “esxcli system wbem get”. You should get a fair amount of data, but you’re looking for “Enabled:false”.
- If you see that, run “esxcli system wbem set --enable true” (there are two “-” before enable).
- Run the previous “esxcli system wbem get” command again and you should see “Enabled:true” now.
- Run “/etc/init.d/sfcbd-watchdog status” again and it should show that sfcbd is running.
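The steps above can be strung together in a small script. This is only a sketch of how I would do it, not a tested tool: the function names wbem_is_disabled and fix_sfcbd are made up for illustration, the exact spacing of the “Enabled” line in the esxcli output is an assumption (so the match is deliberately loose), and the only host commands used are the ones listed above.

```shell
#!/bin/sh
# Sketch only: meant to be run on an ESXi host over SSH.

# Succeeds when the text on stdin contains an "Enabled: false" line.
# The spacing after the colon is an assumption, so any whitespace is accepted.
wbem_is_disabled() {
    grep -Eiq 'Enabled:[[:space:]]*false'
}

# Hypothetical helper that walks through the steps in order.
fix_sfcbd() {
    status=$(/etc/init.d/sfcbd-watchdog status 2>&1)
    case "$status" in
        *"not running"*) ;;                 # watchdog is down, keep going
        *) echo "$status"; return 0 ;;      # anything else: nothing to fix
    esac

    # First try a plain start.
    /etc/init.d/sfcbd-watchdog start

    # If WBEM is disabled, enable it and show the setting again.
    if esxcli system wbem get | wbem_is_disabled; then
        esxcli system wbem set --enable true
        esxcli system wbem get
    fi

    # Final check of the watchdog state.
    /etc/init.d/sfcbd-watchdog status
}

# Nothing runs unless the script is called with --run.
if [ "${1:-}" = "--run" ]; then
    fix_sfcbd
fi
```

Keeping the work inside a function behind a --run flag means you can read through or source the script without it touching the host.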
After I ran this on each ESXi host, the failing service started on vCenter and I could reset the alarm.
PSC Service Health Alarm
This was a poser, and there are still a lot of different PSC errors reported on VMTN for 6.7u2. I did a vCenter upgrade and some other housecleaning to see if that would clear the error. Then I found the following KB, which basically tells you to sync the time across the ESXi hosts and vCenter.
But the issue for me was how the sync was set up. I set everything to the same NTP server and the same time zone, but nothing worked. For some reason the PSC service still would not start.
I fixed it by turning off NTP on the vCenter server and setting it to sync its time to the host instead. Once I set that up, BOOM, it started.
Hope this helps someone. Sorry for the lack of pictures, as I know those help. Have a good Monday!