AWS instance crashes after I run a build job on Jenkins

The build job shows that it's running, but the instance eventually crashes ('eventually' because I have created several instances, and they all end up crashing when I try to deploy the application). After it becomes unresponsive, I'm no longer able to log in to the EC2 instance; I get the following error: "port 22: Operation timed out". I'm using the free-tier Linux AMI and the t2.micro instance type. I tried restarting the instance, but it's still unresponsive.
Does this happen to anybody else?
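One way to get information out of an instance you can no longer SSH into is its console output, which often shows out-of-memory or kernel messages from just before the freeze. A minimal sketch, assuming the AWS CLI is configured (the instance ID is a placeholder):

    # dump the instance's console log; replace the placeholder instance ID with yours
    aws ec2 get-console-output --instance-id i-0123456789abcdef0 --output text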

Related

Why does a self-made (.NET) Worker Service running as a Windows Service give "Error 1067: The process terminated unexpectedly" when starting the service?

I created a worker service in .NET and planned to deploy it to a Windows machine, so I created it as a Windows service.
When I started deploying the service (using sc.exe), everything went smoothly until I tried to start it up and got the error: "Error 1067: The process terminated unexpectedly".
It did start up on my dev machine, so the problem must be related to the target machine.
Answers from "googling" didn't help me much and were rather scattered.
What might cause this error?
For me the solution was simple. My target machine was missing a directory that my service was trying to use. I added the proper directory and folder to the target machine and the error was gone.
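As an illustration of that kind of fix, a minimal sketch from an elevated command prompt; the service name and paths are hypothetical placeholders, not from the original question:

    rem "MyWorker" and the paths below are placeholders; adjust to your own service
    sc.exe create MyWorker binPath= "C:\Services\MyWorker\MyWorker.exe"
    rem create the directory the service expects at startup, then start it
    mkdir "C:\Services\MyWorker\logs"
    sc.exe start MyWorker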

Composer instance freeze, metadata.google.internal authentication error

Our Composer instance dropped all its active workers in the middle of the day. Node memory and CPU utilization disappeared for 2 out of 3 nodes.
First errors were:
_mysql_exceptions.OperationalError: (2006, "Can't connect to MySQL server on 'airflow-sqlproxy-service.default.svc.cluster.local' (110))"
Restarting the Composer instance (by setting a dummy env variable) does not help and errors out. Killing the GKE workers that are in an error state does not help either. Stackdriver has this:
ERROR: (gcloud.container.clusters.describe) You do not currently have an active account selected.
And another error seems to point to a problem in Google's internal authentication service:
ERROR: (gcloud.container.clusters.get-credentials) There was a problem refreshing your current auth tokens: Unable to find the server at metadata.google.internal
The Composer storage bucket seems to have 'Storage Legacy Bucket ...' permissions for some service accounts. Are there changes going on in the authentication backend, or what else could be the underlying cause of this sudden and strange freeze?
Versions are composer-1.8.2 and airflow-1.10.3.
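Since both errors point at metadata.google.internal, one quick check is whether the nodes can still reach the GCE metadata server, which is where gcloud gets its credentials. A minimal sketch, run from an affected node or pod:

    # query the metadata server directly; a timeout here matches the auth errors above
    curl -H "Metadata-Flavor: Google" \
      "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/email"
    # check whether gcloud currently has an active account selected
    gcloud auth list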

How to debug ElasticBeanstalk error "X% of the requests are failing with HTTP 5xx"

My problem is similar to "AWS: None of the Instances are sending data", but with a slightly different error message.
I have a Rails application running on ElasticBeanstalk, and it appears to be running correctly. Periodically, Enhanced Health Monitoring sends me error messages such as:
Environment health has transitioned from Ok to Degraded. 20.0 % of the requests are failing with HTTP 5xx.
where the percentage varies, up to 100%. Even though I've made no changes, a minute later I get a follow-up message telling me that everything is back to normal:
Environment health has transitioned from Degraded to Ok.
I've downloaded the full logs from ElasticBeanstalk but I don't know exactly where to look (there are around 20 different log files in various directories).
I'm currently using the free AWS tier with the smallest instances of database, server, etc. Could this be the cause? Which of the log files should I be looking in, and what should I be looking for?
I run Rails apps on Elastic Beanstalk and have found it helpful to think of Beanstalk as a computer (in this case an Amazon EC2 instance) running your Rails app and a web server (either Passenger or Puma). When you get a 500 error, it could be because your Rails app didn't deploy properly, in which case Passenger or Puma will return an error, or because your app deployed properly but encountered an error at runtime, just as it might on your local machine.
In either case, to diagnose an error, download the full logs from your AWS console (open the correct app environment and then choose Logs > Request Logs > Full logs > Download). Deployment errors are harder to diagnose, but I recommend starting by looking in var-XX/logs/log/eb-activity.log. I suspect your error is coming from your Rails app itself, in which case I recommend looking in var-XX/app/support/logs/passenger.log and production.log. To find a 500 error, search for "500 Internal" and then treat the error like you would any other Rails error.
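For example, once the log bundle is unzipped you can search it from a shell (var-XX is the placeholder directory name used above):

    # find 500s across the bundle, then read the surrounding Rails stack trace
    grep -rn "500 Internal" var-XX/app/support/logs/
    less var-XX/app/support/logs/production.log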
You can go to the EC2 instance and run the application just as you would on your local machine, and watch the logs.
You can SSH into your EC2 instance using the command eb ssh and go to the /opt/python/ directory (it will be different for Ruby or other languages).
/opt/python/run is the directory where you will find the version of your application that is run from the EC2 instance. Look for the venv and app directories inside the run directory.
Note: the folder structure above is for Python, but a similar post-deployment folder structure can be found for any other language. Just look up the standard deployment directory structure for your platform.
For Python:
/opt/python: Root of where your application will end up.
/opt/python/current/app: The current application that is hosted in the environment.
/opt/python/on-deck/app: The app is initially put in on-deck and then, after all the deployment is complete, it is moved to current. If you are getting failures in your container_commands, check out the on-deck folder, not the current folder.
/opt/python/current/env: All the env variables that eb will set up for you. If you are trying to reproduce an error, you may first need to source /opt/python/current/env to get things set up as they would be when eb deploy is running.
/opt/python/run/venv: The virtual env used by your application; you will also need to run source /opt/python/run/venv/bin/activate if you are trying to reproduce an error (see the sketch after this list).
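Putting those two steps together, a minimal sketch for reproducing an app error on the instance; the entry-point file name is a placeholder for whatever your app actually uses:

    source /opt/python/current/env            # load the env variables eb sets up
    source /opt/python/run/venv/bin/activate  # activate the app's virtualenv
    cd /opt/python/current/app
    python application.py                     # hypothetical entry point; use your app's own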
I know it is a little late, but I wanted to share the trick I use to find the error: I connect via SSH and then, once in the app directory, I try to run "rails console". It usually fails, but it normally shows the error you're making. This little trick has saved my life several times. Hope it helps!
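A sketch of that workflow, assuming a Ruby platform where the app lands in /var/app/current:

    eb ssh
    cd /var/app/current          # typical app path on Elastic Beanstalk's Ruby platform
    bundle exec rails console    # if the app can't boot, this prints the underlying error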

Deploy is always in progress on AWS OpsWorks

The deploy never finishes and eventually fails. I tried stopping the instance, but all operations remain in progress. What do I need to do?
This usually happens when the instances on which you are running the command do not have a way to connect to the internet. On rare occasions, this could also indicate that the OpsWorks agent is not running, but that is less likely.
Check the firewall settings and outbound internet access. SSH into the machines and try to ping something on the internet.
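For example, once you have SSHed in (the OpsWorks agent service name is an assumption; verify it on your stack):

    ping -c 3 amazon.com                                     # raw outbound connectivity
    curl -sI https://aws.amazon.com >/dev/null && echo "outbound HTTPS OK"
    sudo service opsworks-agent status                       # is the OpsWorks agent running?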
If you are deploying your app to a private VPC, then you need to add NAT instances so that the instances have internet access.

After upgrading from micro to small Amazon EC2 instance, I cannot deploy any new code

I upgraded from a micro instance to a small instance on Amazon EC2.
When I then tried to deploy new code, the deploy failed with:
** [deploy:update_code] exception while rolling back: Capistrano::ConnectionError, connection failed for: ELASTIC_IP (Errno::ETIMEDOUT: Operation timed out - connect(2))
connection failed for: ELASTIC_IP (Errno::ETIMEDOUT: Operation timed out - connect(2))
So it looks like the upgrade ignored the old Elastic IP. I therefore created a new Elastic IP, assigned it to the new instance, and this error went away.
But when I access www.my_project.com, or 11.22.33.44 (the Elastic IP), or the public DNS (ec2-11-222-333-444.compute-1.amazonaws.com), I still get an empty page instead of my application.
The code is deployed via Capistrano without any error. On the old micro instance I used nginx. Is that nginx setup also available on the new instance, or do I need to set it up and install it again?
How can I make my app accessible?
Thank you
If I had to guess, it's that the SSH host key (not the EC2 key pair, but the key the server itself presents) has changed, and by default, SSH on your local machine will block the connection for security reasons.
If you're using a Mac or Linux machine, you can look inside ~/.ssh/known_hosts, remove the entry for your Elastic IP, save the changes, and try to SSH into the machine again to confirm the connection.
Not sure of the right path in Windows, but you'd make the same changes.
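On Mac/Linux, ssh-keygen can remove the stale entry for you; this uses the placeholder Elastic IP from the question:

    ssh-keygen -R 11.22.33.44                            # delete the old host key entry
    ssh -i /path/to/your-key.pem ec2-user@11.22.33.44    # user name depends on your AMI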
AWS needs some manual investigation when you end up with issues like this.
While you were upgrading your instance, which approach did you take? Either you:
created an AMI from the instance and its volumes and then launched the AMI as a fresh small instance, or
detached the EBS volume, attached it to a small instance, and made the required configuration changes.
SSH into the instance and check for the following (see the sketch after this list):
Whether you can deploy the code manually.
If it's a git repo, you can pull and push changes directly.
Whether all the processes related to nginx, the database, etc. are running.
Where the default home page for the instance points, e.g. the DocumentRoot in apache.conf.
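For example, assuming nginx (the exact process and service names depend on your stack):

    ps aux | grep -E "nginx|mysql|puma|passenger" | grep -v grep   # are the processes up?
    sudo nginx -t                                                  # validate the nginx config
    grep -R "root" /etc/nginx/                                     # where the server root points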
I cannot rule out the possibility of a key mismatch, though the error doesn't point to that.