Vagrant VM not saving any changes; creates new VM upon every `halt/suspend + up` (Windows Vagrant, VirtualBox, RailsDevBox) - ruby-on-rails

I am trying to use Virtual Box + Vagrant + Rails Dev Box on a Windows machine. I am able to run vagrant up and vagrant ssh to get into the virtual machine but none of the changes that I make are being saved, such as installing rails. Every time I halt Vagrant and start again, it's as if I loaded it for the first time. The process goes through all the steps that make the first vagrant up.
I have noticed, however, that the one change I made to the Vagrantfile (synced folders location) is persisting. Also, if I use vagrant reload, the changes seem to persist. So it looks like the problem only happens if I use vagrant halt or vagrant suspend followed by vagrant up. But I wasn't under the impression that vagrant halt (and especially not vagrant suspend) is supposed to destroy a VM.
And when I open VirtualBox Manager I can see a bunch of rails-dev-box VMs that have been created.
I found an issue that looks like this in the Vagrant Github issues site but honestly the discussion is totally over my head and I wasn't able to understand the resolution even though that thread is closed.
If anyone knows what is causing this and could explain it in "newbie" terms or, if my problem is too opaque, could walk me through the next steps to debug, I would really appreciate it!
EDIT
After writing this out and thinking about it more, I realized the problem is not actually that "changes aren't being saved." They're being saved...but I'm just being taken to a brand new VM any time I run vagrant up.

Have you tried vagrant resume instead of vagrant up?

I have been having this same issue, and have a couple possible solutions.
The file Vagrant uses to keep track of the default VM is .vagrant/machines/default/virtualbox/id. (The contents of that file will be the UUID of the VM, which can be found in the VirtualBox.xml file.) If there is anything wrong with the VM referenced there, then Vagrant apparently just deletes the file and tells you to start over.
One GitHub issue reported this same problem; in that case the VM was in an "invalid state" in VirtualBox. The problem was solved by deleting a corrupted save state.
That may be the issue that you are having, though it was not mine. If and when I figure out why I am having this issue, I will update this answer.
Edit: I booted up my vagrant VM manually from VirtualBox Manager (though I was not able to use it for development, because the shared folder wasn't set up by Vagrant). I didn't see any major problems, so I shut it back down. Next I replaced the id file manually, as described above. I did vagrant up again and it worked. Note that in order to keep it from being re-provisioned, you also need to re-create the action_provision file in that same folder. The contents of mine are: 1.5:<uuid>, where <uuid> is the machine's UUID. (I am not sure what the 1.5 means..) I have since done both vagrant suspend and vagrant halt (each followed by vagrant up) multiple times, and haven't had any further problems. I still don't know what caused this issue. If I have any other problems, or find out why this happened, I will again report back here.
Edit 2: I have finally figured out the problem. First, to help track things down, I did VAGRANT_LOG=debug vagrant status. This produced a lot of output, but as I dug through it, I found one line in particular of interest: VBoxManage.exe: error: could not find a registered machine with uuid <uuid>. I then did some manual debugging, and found that VBoxManage.exe list vms returned blank. It turns out that VirtualBox had lost track of my .VirtualBox folder, because I was doing all of this in Cygwin. Some more debugging, and I found that the environment variable that VirtualBox was looking for was HOME. I set that to my Windows home directory (/cygdrive/c/Users/<me>) in my .bashrc, and it fixed the problem, this time more permanently.
TL;DR:
Open the VirtualBox GUI and make sure there are no errors related to your VM.
If you are in Cygwin, ensure that HOME is set to the right value.
Otherwise, prepend VAGRANT_LOG=debug to any Vagrant command to debug further.
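The check described above (does the UUID in Vagrant's id file still belong to a machine that VirtualBox has registered?) could be sketched in Python roughly as follows. This is only an illustration, not part of Vagrant: the function names are made up here, and it assumes that VBoxManage is on the PATH and that VBoxManage list vms prints one "name" {uuid} pair per line.

```python
import re
import subprocess
from pathlib import Path

# Location of the id file as given in the answer above, relative to the project.
ID_FILE = Path(".vagrant/machines/default/virtualbox/id")


def parse_vbox_list(output):
    """Map VM name -> UUID from lines shaped like: "name" {uuid}."""
    vms = {}
    for line in output.splitlines():
        match = re.match(r'^"(.*)"\s+\{([0-9a-fA-F-]+)\}\s*$', line.strip())
        if match:
            vms[match.group(1)] = match.group(2).lower()
    return vms


def vagrant_id_is_registered(id_text, vbox_output):
    """True if the UUID stored by Vagrant is among the registered VMs."""
    return id_text.strip().lower() in parse_vbox_list(vbox_output).values()


def check_project(project_dir="."):
    """Hypothetical helper: run VBoxManage and report on one Vagrant project."""
    id_file = Path(project_dir) / ID_FILE
    if not id_file.is_file():
        return "no id file: vagrant up will create a new VM"
    listing = subprocess.run(
        ["VBoxManage", "list", "vms"], capture_output=True, text=True
    ).stdout
    if vagrant_id_is_registered(id_file.read_text(), listing):
        return "VM is registered with VirtualBox"
    return "VirtualBox does not know this UUID (in Cygwin, check HOME)"
```

Calling check_project() from the directory that contains the Vagrantfile returns one of the three messages. An empty listing from VBoxManage matches the symptom described in Edit 2.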

First, get the state of the VM to where you want it i.e. all things installed and configured.
While the VM is running, run vagrant global-status to get the 'id' of the box, then run vagrant package {id} to package it into a local package.box file.
Running vagrant box add package.box --name {whatever-name} should then add your box to the list of available ones.
Then you can either create a new Vagrantfile or modify the existing one to load the vm with this new name.
Hope this helps.

Related

docker desktop wsl 2 backend has stopped unexpectedly

I am facing an issue with Docker Desktop: it crashes after 15-20 minutes.
Docker Desktop 4.3.2
Docker 20.10.11
Windows 10 PRO
reference link: https://github.com/docker/for-win/issues/12477
I ran into that problem too, but only when starting a relatively big image/container, and I didn't see many useful solutions in the community.
One tutorial I followed suggests adding a config file that limits the RAM used by WSL 2 and then restarting the PC. That seems to work.
Use Win+R, then enter '%UserProfile%' to open the user directory in Windows. Create a file named '.wslconfig' there and edit it as follows:
[wsl2]
memory=8GB
swap=0
localhostForwarding=true
The 'memory' value should be adjusted to suit your machine; I used 4GB on my laptop. Then restart the PC.
I'm new to Docker and don't know whether it's the right solution, but it seems to work on my laptop :D Hope that helps someone who hits the same problem.
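For anyone who prefers to generate the file, here is a minimal Python sketch that renders the same [wsl2] section with the standard configparser module. The function name and its defaults are made up for illustration; the keys and values are the ones shown above.

```python
import configparser
import io


def render_wslconfig(memory="4GB", swap="0", localhost_forwarding=True):
    """Return the text of a .wslconfig file with a single [wsl2] section."""
    parser = configparser.ConfigParser()
    # configparser lower-cases keys by default; keep "localhostForwarding" as-is.
    parser.optionxform = str
    parser["wsl2"] = {
        "memory": memory,
        "swap": swap,
        "localhostForwarding": "true" if localhost_forwarding else "false",
    }
    buffer = io.StringIO()
    # Write key=value without spaces, matching the file shown above.
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()
```

The returned text can then be saved, for example with pathlib.Path.home().joinpath(".wslconfig").write_text(render_wslconfig("8GB")), assuming the home directory Python reports is the same folder that %UserProfile% opens.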

Docker for windows doesn't work. What could I do with it?

Yeah, I know, there are some questions similar to this, but none of them helped me. My Docker for Windows suddenly stopped working. Now I can't even uninstall it, because it doesn't appear in the list of applications. I can't reinstall it either: the installation always gets stuck at the same step. I tried deleting all the files and installing again, and I tried stopping all the processes related to Docker. I tried everything. I cannot uninstall, cannot reinstall, cannot run... cannot do anything with it... help!
I finally solved that. I manually deleted all the directories related to Docker (ProgramFiles, ProgramData, [user root directory], Appdata/Roaming etc.), and this way the installer finally finished. There was another problem: the WSL2 engine tried to start, but couldn't. WSL was enabled and the engine start-up ran for about an hour, but nothing happened and I couldn't even change the engine. So I reinstalled once again, this time without installing the WSL component. It started automatically with Hyper-V and now it works fine.

Running Docker locally in browser on Windows10: Error - IP Not Found

Ok, so here's my background first. I'm a noob in the world of commandline interfaces, but have been building websites off and on for a long time (10 yrs) using GUI's. So, I'm trying to make the switch to CLI's while also learning Docker.
Right now I'm stuck at trying to get Docker to load anything into my browser window. Here's what I've done successfully:
Installed Docker CE on my Windows 10 Machine
Setup a new virtual switch in Hyper-V Manager per these instructions and restarted the computer.
Created a new machine using this line of code docker-machine create -d hyperv --hyperv-virtual-switch "Primary Virtual Switch" manager1
Now here's where things get a little interesting. When I run the above command (I've done it twice now) it stalls on the line: Waiting for host to start...
I waited for five minutes to see if it would do anything, before killing the operation. (Oh, did I mention I'm running PowerShell in Administrator mode: right-click on the icon and choose "Run as administrator".)
So when I re-open PowerShell to check if the new "manager1" machine has been created it comes back affirmative, but with this:
PS C:\WINDOWS\system32> docker-machine ls
NAME       ACTIVE   DRIVER   STATE     URL   SWARM   DOCKER    ERRORS
manager1   *        hyperv   Running                 Unknown   IP not found
As far as I can tell these are the steps that I need to take to get Docker to run locally in my browser window, but for the life of me, I'm lost!
Oh, I did downgrade my docker-machine version per a suggestion that I read in a git forum comment, but that was to remedy an issue with the docker-machine create command. Part of me wonders if I'm doing too much. But I honestly don't know what to do next.
UPDATE:
I don't know that this is progress, but in the Virtual Switch Manager, I did switch the external network device from the "Ethernet Connection" to the "Dual Band Wireless" option. Then I restarted my machine. Now I'm showing that the state of the machine is "Timeout". I've also started and stopped my "docker-machine manager1". It sits on the (manager1) Waiting for host to start... line for about half a minute, then proceeds to Waiting for SSH to be available... where it just sits.
Ok, my confusion was a result of looking beyond the documentation and getting a little confused. I still don't have this fully figured out but everything I was attempting to accomplish is detailed in this getting started with Windows 10 pro article.

Jenkins Slave Service Not Running: How to Debug

I have a Jenkins set-up where there is a master running on OS X with a Windows slave running on the same box as VM.
On many occasions when the VM is restarted, the Jenkins service appears either not to start or possibly to encounter an error.
The set-up of the service looks correct and the VM is configured to log in automatically as the Jenkins user. When it's started manually everything seems to work fine, so I can only assume the problem occurs on start-up of the box.
I have two questions:
Are there any well-known gotchas that can cause this?
Does anyone have some good strategies for debugging this? I'm assuming the answer will be somewhere in the Windows Logs but finding it is proving difficult (since the box and the user both contain the word Jenkins a simple find isn't helpful).

Erlang machine stopped instantly (distribution name conflict?). The service is not restarted as OnFail is set to ignore

I am using RabbitMQ. For some reason the rabbitMQ service stops as soon as you start it. I saw following error in the event log:
RabbitMQ: Erlang machine stopped instantly (distribution name conflict?). The service is not restarted as OnFail is set to ignore.
Someone told me to run this command: erl -sname rabbit
This command generates following output:
{(no error logger present")i neirtr otre: r"mEirnraotri nign ipnr odcoe_sbso o<
t0".,2{.b0a>d awrigt,h[ {eexrilt_p rviaml_uleo:a d{ebra,dcahregc,k[_{feirlle__pr
reismu_llto,a3d,e[r{,fcihleec,k"_efrill_e_prreismu_llto,a3d,e[r{.feirlle",}\,"{e
lriln_ep,r29i3m}_]l}o,a{dienri.te,rgle\t"_}b,o{olti,n1e,,[2{9f3i}l]e},,"{iinniit
t.,egrelt"_}b,o{olti,n1e,,[78{9f}i]l}e,,{\i"niinti,tg.eetr_lb\o"o}t,,{2l,i[n{ef,
i7l8e9,}"]i}n,i{ti.neirtl,"g}e,t{_lbionoet,,7762},][}{,f{iilnei,t\,"dion_ibto.oe
tr,l3\,"[}{,f{illien,e",i77n6i}t].}e,r{li"n}i,t{,ldion_eb,o74o3t},]3},][}{}f
ile,\"init.erl\"},{line,743}]}]}\n"
I am not sure how to interpret this output. I wonder whether the error is specific to RabbitMQ or to Erlang.
I have no idea how to proceed. Please suggest.
I have just run into this problem setting up RabbitMq as a service up on a new Windows server. The only thing I can think of that broke it for me is renaming the new windows box after installing the RabbitMq service, but before testing it for the first time.
First off I noticed it ran as an application fine. I solved it by installing the service again using the command from the manual install instructions:
rabbitmq-service install
Assuming that you have your path variables included for the RabbitMq sbin directory.
The only thing that worked for me was to clear the directory C:\Users\xxxxx\AppData\Roaming\RabbitMQ.
(cf. https://groups.google.com/forum/#!topic/rabbitmq-users/138RHzzsORU)
In my scenario, there were two Erlang directories with different versions under C:\Program Files. I uninstalled one of the versions, also removed the RabbitMQ service from the Windows services list, and restarted the system.
I then ran the RabbitMQ setup again, and the RabbitMQ service was set up successfully.
I ran into the same issue when installing RabbitMQ 3.7.17 via Chocolatey on a Windows Server 2016.
After trying most of the suggested solutions, the one that worked for me was:
rabbitmq-service remove
rabbitmq-service install
rabbitmq-service start
PS: if your PATH is not configured for RabbitMQ, this is the folder you need to run the commands from: C:\Program Files\RabbitMQ Server\rabbitmq_server-3.7.17\sbin (if your version is also 3.7.17).
For anyone else looking up this error: double check your config files and SSL files. I ran into this issue when I had specified the ssl_options.cacertfile with ca.pem but the file was mistyped as ca-pem in the directory. Unfortunately RabbitMQ wasn't smart enough to catch the missing file and was dumping with no logs.
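A quick way to catch this kind of typo before starting the service is to check that every file path named in the config actually exists. Below is a minimal Python sketch. It assumes the newer key = value style of rabbitmq.conf and simply treats any key ending in "file" (such as ssl_options.cacertfile) as a path. The function name is made up for illustration and is not part of RabbitMQ.

```python
import os


def missing_files(conf_text, exists=os.path.isfile):
    """Return (key, path) pairs for settings ending in 'file' whose path is missing.

    Assumes one `key = value` pair per line, with `#` starting a comment.
    """
    missing = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key.endswith("file") and not exists(value):
            missing.append((key, value))
    return missing
```

Passing the contents of the config file to missing_files returns a list that is empty when every referenced file is present. In the situation described above, it would list ssl_options.cacertfile together with the misspelled path.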
I found a name conflict with an environment variable that I have used for years, which means this was not a problem with the previous version.
I have a "Logs" variable pointing to a directory that apps write into, usually in their own subdirectories. RMQ uses the same variable name but expects a plain filename.
So using "C:\Users\rabbit\AppData\Roaming\RabbitMQ\log\log" made it work for me. This is set in the rabbit user's private environment, so the global settings are no longer seen by rabbit. Uff. It looks like this really is meant as a filename: after I changed it again to "rabbit#c4711-node.log", it writes logs like the earlier version did. The service now starts for me, but this was really messy and I don't trust it at the moment ;-)
From my perspective, one should run such a service under its own account. If the service is already there, create a local user account (I used "rabbit") and give it a password. I gave the account admin rights, but I currently don't know whether this is needed. It should not be; I will check this later. Once you have the account credentials, go to the service manager and open the properties for the service. On the second tab ("Log On"), check "This account" and enter the username and password. If you have an account for the service, you should be able to log in with that user.
Then you can specify environment variables with user scope.
To do this, logon with the user you created. Go to ControlPanel/System and click "advanced":
In the Environment UI, enter user-specific variables in the top panel:
Note: This was not my rabbit user, because I currently cannot login there. The variables, I entered - not guaranteed, it is correct - are the following:
RABBITMQ_BASE=C:\Users\rabbit\AppData\Roaming\RabbitMQ
RABBITMQ_CONFIG_FILE=C:\Users\rabbit\AppData\Roaming\RabbitMQ\rabbitmq
RABBITMQ_LOGS=C:\Users\rabbit\AppData\Roaming\RabbitMQ\log
RABBITMQ_LOG_BASE=C:\Users\rabbit\AppData\Roaming\RabbitMQ\log
RABBITMQ_NODE_IP_ADDRESS=192.168.26.3
This works for me.
The last time I installed it, some years ago, it was easier to understand. This time, sorry, I don't understand it.
But I made it work.
According to the RabbitMQ Install on Windows guide:
Troubleshooting When Running as a Service
In the event that the Erlang VM crashes whilst RabbitMQ is running as
a service, rather than writing the crash dump to the current directory
(which doesn't make sense for a service) it is written to an
erl_crash.dump file in the base directory of the RabbitMQ server (set
by the RABBITMQ_BASE environment variable, defaulting to
%APPDATA%\%RABBITMQ_SERVICENAME% - typically %APPDATA%\RabbitMQ
otherwise).
Basically, this means adding an environment variable named RABBITMQ_BASE with the value %APPDATA%\RabbitMQ
This fixed my problem.
I ran into this issue and the only way I could solve it was by uninstalling RabbitMQ, uninstalling Erlang, rebooting the server and installing a clean Erlang and a clean RabbitMQ.
After all this, I could finally install and start the RabbitMQ instance as a windows service.
Tried all the solutions in this post and nothing worked.
Lucky for me it was in our development server, so the loss was acceptable.
The downside to this approach is that you lose all configs (all users, virtual hosts, etc).
It's all gone and you have to reconfigure the RabbitMQ instance from scratch.
Checking in from 2021:
None of this worked for me, the problem was actually that I had another instance of RabbitMQ running inside my WSL Ubuntu distro.
I had the same issue; I just downloaded the latest versions of Erlang and RabbitMQ, and this resolved the issue for me.
I got the same error, and the root cause for me seems related to the Erlang cookie. I fixed it by doing the following:
Create a folder to store cookie, for example I am using C:\erl-23.2\home .
Add new system environment variable HOMEDRIVE, set the value to C:\
Add new system environment variable HOMEPATH, set the value to erl-23.2\home
This is making use of the rule:
%HOMEDRIVE%%HOMEPATH%\.erlang.cookie (usually C:\Users\%USERNAME%\.erlang.cookie for user %USERNAME%) if both the HOMEDRIVE and HOMEPATH environment variables are set
Since I was doing a migration when the error popped up, I still had my original .erlang.cookie in C:\Users\Me, but the new installation generated a new .erlang.cookie during installation in C:\Windows\System32\config\systemprofile. After making them equal again and performing these steps from the sbin dir, it worked again.
rabbitmq-service remove
rabbitmq-service install
rabbitmq-service start
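Since a cookie mismatch is easy to miss by eye, the comparison of the two .erlang.cookie files mentioned above could be sketched in Python like this. The helper names are made up for illustration, and the sketch assumes each cookie file holds a single short text value.

```python
from pathlib import Path


def read_cookie(path):
    """Return the cookie text without surrounding whitespace, or None if absent."""
    cookie_file = Path(path)
    return cookie_file.read_text().strip() if cookie_file.is_file() else None


def cookies_match(user_cookie_path, service_cookie_path):
    """True only if both cookie files exist and hold the same value."""
    user_cookie = read_cookie(user_cookie_path)
    service_cookie = read_cookie(service_cookie_path)
    return user_cookie is not None and user_cookie == service_cookie
```

With the locations from the answer above, the call would be cookies_match(r"C:\Users\Me\.erlang.cookie", r"C:\Windows\System32\config\systemprofile\.erlang.cookie"). Reading the second file may need an elevated prompt.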
I had this today trying to install rabbitmq 3.8.0 with erlang 22.0 (64Bit).
Even completely re-installing both Erlang and RabbitMQ and deleting all directories and registry entries did not help at all. I also tried setting the needed PATH variables for Erlang manually and re-installing the service each time.
The only solution that worked for me was installing another version of Erlang. In my specific case I used the 32-bit version of Erlang 21.3.
After that, no manual action was necessary and RabbitMQ was up and running (after re-installing the service).
