Weekend vCenter Error Fixing

In homelab life, the weekend is normally “fix it” time. If I can find time during the week I’ll try to get to it, because seeing that red drives me nuts, but usually it falls to the weekend.

This weekend I looked at two errors: the PostgreSQL Archiver error and the PSC Service Health Alarm.

PostgreSQL Archiver error

For this one I did some research on VMTN and a number of other blogs. The key clue was what I found in the vCenter log:

In /var/log/vmware/vpostgres/pg_archiver.log-[n].stderr, you see an error similar to:

2018-05-22T10:27:36.133Z ERROR  pg_archiver could not receive data from WAL stream: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
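To see how often this error shows up, the archiver logs can be scanned from a shell on the vCenter appliance. This is only a rough sketch: the `count_wal_errors` helper and the `LOG_DIR` variable are names I made up for illustration, and the search string is simply copied from the log excerpt above.

```shell
#!/bin/sh
# Count WAL stream disconnect errors in each pg_archiver stderr log.
# LOG_DIR defaults to the directory quoted above; override it to look elsewhere.
LOG_DIR="${LOG_DIR:-/var/log/vmware/vpostgres}"
PATTERN='could not receive data from WAL stream'

# $1: path to one log file. Prints the number of lines containing PATTERN.
count_wal_errors() {
    grep -c "$PATTERN" "$1"
}

for f in "$LOG_DIR"/pg_archiver.log-*.stderr; do
    # Skip the unexpanded glob when no log files exist.
    [ -f "$f" ] || continue
    printf '%s: %s\n' "$f" "$(count_wal_errors "$f")"
done
```

A count that keeps growing after the fix below would suggest the archiver is still losing its connection.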

This KB points to an HA or timeout error, and I was seeing those errors, but the real issue was the sfcbd watchdog service on the ESXi hosts. After some more Google-fu I came across Magander3’s post, which gets all the credit for this fix. Here is what I did:

On each ESXi host in your cluster, log in via SSH and run “/etc/init.d/sfcbd-watchdog status”. If you are hitting the same problem, this returns “sfcbd is not running”.

  • If this is the case, try “/etc/init.d/sfcbd-watchdog start” just to see if that helps.
  • If it starts, great! Run the start command on each ESXi host.
  • If not, run “esxcli system wbem get”. You will get a fair amount of output, but you’re looking for “Enabled:false”.
  • If you see that, run “esxcli system wbem set --enable true” (there are two “-” characters before enable).
  • Run the previous “wbem get” command again and you should now see “Enabled:true”.
  • Run “/etc/init.d/sfcbd-watchdog status” again and it should now report that sfcbd is running.
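The steps above can be strung together into a small script to run in an SSH session on each host. Treat it as a sketch rather than a tested tool: the function names `wbem_is_enabled` and `fix_sfcbd` are my own, the script only uses the commands already listed above, and it assumes the host's shell provides a `grep` that understands `-E` and POSIX character classes.

```shell
#!/bin/sh
# Reads `esxcli system wbem get` output on stdin.
# Succeeds only if the "Enabled" field is true (with or without a space after the colon).
wbem_is_enabled() {
    grep -Eiq '^[[:space:]]*Enabled:[[:space:]]*true'
}

fix_sfcbd() {
    # Nothing to do if the watchdog does not report "not running".
    /etc/init.d/sfcbd-watchdog status | grep -q 'not running' || return 0

    # First attempt: just start it.
    /etc/init.d/sfcbd-watchdog start

    # Still down? Make sure WBEM is enabled, then try again.
    if /etc/init.d/sfcbd-watchdog status | grep -q 'not running'; then
        if ! esxcli system wbem get | wbem_is_enabled; then
            esxcli system wbem set --enable true
        fi
        /etc/init.d/sfcbd-watchdog start
    fi

    # Show the final state so it can be checked by eye.
    /etc/init.d/sfcbd-watchdog status
}

# Only act when esxcli is present, i.e. when running on an ESXi host.
if command -v esxcli >/dev/null 2>&1; then
    fix_sfcbd
fi
```

Keeping the “Enabled” check in its own function means it can be tried against pasted `wbem get` output before running anything on a real host.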

After I ran this on each ESXi host, the failing service started on vCenter and I could reset the alarm.

PSC Service Health Alarm

This one was a poser, and there are still a lot of different PSC errors reported on VMTN for 6.7 U2. I did a vCenter upgrade and some other housecleaning to see if that would clear the error. Then I found a KB which basically tells you to sync the time across the ESXi hosts and vCenter.

But the issue for me was how the sync was set up. I set everything to the same NTP server and the same time zone, but nothing worked. For some reason the PSC service still would not work.

I fixed it by turning off NTP on the vCenter server and setting it to sync its time to the host instead. Once I set that up, BOOM, it started.
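If you want to confirm that clocks really are the problem before changing the sync mode, a quick drift check can help. This is a minimal sketch: `clock_drift` is a helper name I made up, the host name in the commented usage lines is a placeholder, and it assumes `date -u +%s` works in the shell on both ends.

```shell
#!/bin/sh
# Prints the absolute difference, in seconds, between two epoch timestamps.
clock_drift() {
    d=$(( $1 - $2 ))
    if [ "$d" -lt 0 ]; then
        d=$(( 0 - d ))
    fi
    echo "$d"
}

# Example usage from the vCenter appliance shell (esxi01.lab.local is a placeholder):
#   local_now=$(date -u +%s)
#   host_now=$(ssh root@esxi01.lab.local date -u +%s)
#   echo "drift: $(clock_drift "$local_now" "$host_now") seconds"
```

The SSH round trip adds a second or so of noise, so only a drift well beyond that is worth worrying about.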

Hope this helps someone. Sorry for the lack of pictures, as I know those help. Have a good Monday!