Issue
I apologize for the cross-posting, as this question also exists over at Unix & Linux Stack Exchange, but I need to find an answer for this.
Our VMs run on VMware. I am not certain of the version, but I want to say it is ESX 6 (or vSphere 6). The guests I support and administer all run CentOS 7.
Today, I went through and extended both a physical and a logical volume to use recently added disk space. The physical partition had previously been extended using fdisk, but I had not completed the remaining steps to resize the LVM volumes. The allocation and resizing went fine with no issues. I then restarted the VM to make sure the changes took effect and that no errors occurred.
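For reference, a resize like the one described above usually comes down to growing the physical volume and then the logical volume together with its filesystem. The following is only a minimal sketch under stated assumptions, not the exact commands that were run: the device `/dev/sda2` and the logical volume `/dev/centos/root` are assumed names, and the helpers `run` and `extend_lvm` are hypothetical. It prints the commands and executes them only when `RUN=1` is set.

```shell
#!/bin/sh
# Sketch: grow an LVM physical volume, then the logical volume and its filesystem.
# Assumed names (not from the original post): /dev/sda2 and /dev/centos/root.

# Print each command; execute it only when RUN=1 is set in the environment.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# extend_lvm PV_DEVICE LV_PATH
extend_lvm() {
    pv="$1"
    lv="$2"
    # Make LVM aware that the underlying partition has grown.
    run pvresize "$pv"
    # Give all free extents to the LV; -r also resizes the filesystem on it.
    run lvextend -r -l +100%FREE "$lv"
    # Show the resulting physical and logical volume sizes.
    run pvs
    run lvs
}

# Dry run by default: this only prints what would be executed.
extend_lvm /dev/sda2 /dev/centos/root
```

Running it as is only lists the four commands. Setting `RUN=1` would execute them as root, so the device and volume names should be checked against `pvs` and `lvs` output first.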
This is where I have my issue. After the reboot I can no longer ssh into the VM. I have access to the VMs through the vSphere client, and the status of the VM looked good: it was in a running state, I could see all IP addresses being used (we run several docker containers, so a number of IP addresses display), and there was CPU and memory usage. However, it seems as though none of the services started, such as ssh and docker (we have several web apps deployed in containers, and I cannot access them).
After the reboot, I tried to ssh into the VM and the connection timed out. So I opened a remote console from the vSphere client and tried logging in with two different admin users. Neither worked; I was returned to the login prompt after what seemed like 30 seconds. I've restarted the VM trying different options, such as using the rescue kernel and going into grub to try various settings (root, linux kernel, etc.). Each time the VM starts and loads to the login prompt, and that's it.
I would assume that if there were an error, I would see it and be taken to an emergency/crash shell, but no such thing happened. From all appearances, the VM seems to have started correctly.
On our vCenter, I only have minimal rights. So my questions are:
- Is there any way to bypass the login, from the console only, to be able to review the boot records? I assume no as that would be a huge security risk.
- Is there any way to see if anything has been reported from the guest to the host? Again, I assume there is, and I do not have the access to view the output.
- Is there any way to get the boot records from the guest, given that there is no way to log into it? I assume there is not.
- I have mounted an ISO, but I cannot change the setting to force a BIOS Setup, so is there a way, using the grub command line, to fake the VM into not having an installed OS? I want to do this in an attempt to repair the install, boot record, etc.
Solution
I was able to figure out how to get into a recovery prompt and determine the issue.
I restarted the VM and at the grub menu I simply pressed "c" to enter a console. I then attached an ISO to the virtual CDROM and exited the grub command line. That allowed the VM to boot from the ISO, which let me enter recovery mode.
I was then able to review the message log, and I saw where I resized the fs and did the pv and lv checks, followed by the reboot. The log showed the system booting normally each time, but once it reached a running state there were all kinds of strange items listed, such as:
- iptables drops
- Docker not being able to start, pull, or otherwise work with containers
- network interfaces entering a disabled state
- rsyslog exceptions
and the list goes on. I also realized something else. I saw firehol starting, and I remembered I had installed it but never finished configuring it, and I thought I had uninstalled it. So I went through and removed firehol and ipranges via yum. I also cleaned out /var/lib/docker of all images, containers, and volumes. I then restarted the VM, and I was able to ping the IP address and ssh into the VM. All seems to be working.
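As a rough illustration of how the message log could be triaged for the symptoms listed above, here is a minimal sketch. The helper `scan_log` is hypothetical, the search patterns are guesses at typical wording rather than strings taken from the actual log, and the `/mnt/sysimage` path is an assumption about where a CentOS rescue environment mounts the installed system.

```shell
#!/bin/sh
# Sketch: count log lines that match each symptom category.
# The patterns are assumed wording, not copied from the real log.

# scan_log LOGFILE
# Prints one "pattern<TAB>count" line per category.
scan_log() {
    log="$1"
    for pattern in 'iptables|DROP' 'docker' 'entered disabled state' 'rsyslog' 'firehol'; do
        # grep -c prints the number of matching lines; -i ignores case,
        # -E enables the | alternation. "|| true" keeps a zero-match
        # result from aborting a script that runs under "set -e".
        count=$(grep -ciE "$pattern" "$log" || true)
        printf '%s\t%s\n' "$pattern" "$count"
    done
}

# Example call from a rescue environment (path is an assumption):
# scan_log /mnt/sysimage/var/log/messages
```

A high count for a category only points at where to look; the matching lines themselves still have to be read to see what actually failed.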
Answered By - Paul Stoner