I have a rails app on AWS Elastic Beanstalk. I process background tasks using delayed_job. I set up an eb worker instance to handle this. It works but shows as failed (red) in the dashboard. I believe this is because of the following error that I get every few seconds:
error: AWS::SQS::Errors::AccessDenied: Access to the resource https://sqs.us-west-2.amazonaws.com/xxx...xxxx is denied
I tried to remove sqs by means of the following to no avail:
services:
  sysvinit:
    aws-sqsd:
      enabled: false
      ensureRunning: false
How do I stop sqs? Ideally it would never be installed in the first place. If I can't modify the install configuration is there a way to prevent this error from affecting the status of my environment?
You are launching a worker tier environment, which is why an SQS queue is being created for your environment. If you do not want a worker environment, launch a "Web Server" environment instead. A worker tier environment in Elastic Beanstalk periodically polls an SQS queue for messages.
Read more about worker tier environments here:
http://aws.amazon.com/blogs/aws/background-task-handling-for-aws-elastic-beanstalk/
Read more about environment tiers here:
http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features-managing-env-tiers.html
If you want to use a worker tier environment with SQS enabled, you can get rid of the AccessDenied exception by granting the environment's IAM instance profile access to SQS, as explained here:
https://stackoverflow.com/a/24880344/161628
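As a rough sketch of what such a grant could look like, the Python below builds an IAM policy document for a single queue, which you could attach to the role behind the instance profile. The queue ARN is a placeholder, and the list of actions is an assumption about what a polling worker needs; treat the linked answer and the AWS documentation as the authoritative source.

```python
import json


def sqs_access_policy(queue_arn):
    """Build an IAM policy document granting basic access to one SQS queue."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Assumed set of actions for polling and acknowledging messages.
                "Action": [
                    "sqs:ReceiveMessage",
                    "sqs:DeleteMessage",
                    "sqs:ChangeMessageVisibility",
                    "sqs:GetQueueAttributes",
                    "sqs:GetQueueUrl",
                    "sqs:SendMessage",
                ],
                "Resource": queue_arn,
            }
        ],
    }


# Placeholder ARN; substitute the ARN of the queue attached to your environment.
EXAMPLE_QUEUE_ARN = "arn:aws:sqs:us-west-2:123456789012:example-worker-queue"
policy_json = json.dumps(sqs_access_policy(EXAMPLE_QUEUE_ARN), indent=2)
print(policy_json)
```

The printed JSON can be pasted into an inline policy on the instance profile's role.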
Related
I'm trying to set up a sample Elastic Beanstalk environment for multi-container Docker, but the environment is not created because of an error.
environment tier: web-server
other configuration info: https://i.stack.imgur.com/gKwBn.png
Expected: a sample Elastic Beanstalk environment for multi-container Docker is created.
Actual: the environment is not created.
Here is the error output:
WARN Environment health has transitioned from Pending to Severe.
Initialization in progress (running for 15 minutes). None of the instances are sending data.
ERROR Stack named 'awseb-e-at4dw9xg2u-stack' aborted operation.
Current state: 'CREATE_FAILED' Reason: The following resource(s) failed to create: [AWSEBInstanceLaunchWaitCondition].
ERROR LaunchWaitCondition failed.
The expected number of EC2 instances were not initialized within the given time.
Rebuild the environment. If this persists, contact support.
This issue was solved by allowing inbound and outbound traffic in the default network ACL.
I deployed Airflow webserver, scheduler, worker, and flower on my kubernetes cluster using Docker images.
Airflow version is 1.8.0.
Now I want to send worker logs to S3, so I did the following:
1. Created an S3 connection in Airflow from the Admin UI (just set S3_CONN as the conn id and s3 as the type; because my Kubernetes cluster runs on AWS and all nodes have S3 access roles, that should be sufficient).
2. Set the Airflow config as follows:
remote_base_log_folder = s3://aws-logs-xxxxxxxx-us-east-1/k8s-airflow
remote_log_conn_id = S3_CONN
encrypt_s3_logs = False
First I tried a DAG that just raises an exception immediately after it starts running. This works; the log can be seen on S3.
Then I modified the DAG so that it creates an EMR cluster and waits for it to be ready (waiting status). To deploy this change, I restarted all 4 Airflow Docker containers.
Now the DAG appears to work: a cluster is started and, once it's ready, the DAG is marked as success. But I see no logs on S3.
There is no related error in the worker or web server logs, so I cannot even tell what may be causing this. The logs are simply not sent.
Does anyone know if there is some restriction for remote logging of Airflow, other than this description in the official documentation?
https://airflow.incubator.apache.org/configuration.html#logs
In the Airflow Web UI, local logs take precedence over remote logs. If
local logs can not be found or accessed, the remote logs will be
displayed. Note that logs are only sent to remote storage once a task
completes (including failure). In other words, remote logs for running
tasks are unavailable.
I didn't expect this, but are logs not sent to remote storage when a task succeeds?
The boto version installed with Airflow is 2.46.1, and that version doesn't use IAM instance roles.
Instead, you will have to add an access key and secret for an IAM user that has access, in the Extra field of your S3_CONN connection, like so:
{"aws_access_key_id":"123456789","aws_secret_access_key":"secret12345"}
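Because the Extra field has to be valid JSON, it can help to generate the string rather than type it by hand. This is only a small sketch: the helper name is made up for illustration, and the credentials are the placeholders from the example above.

```python
import json


def s3_conn_extra(access_key_id, secret_access_key):
    """Return the JSON string for the connection's Extra field."""
    return json.dumps(
        {
            "aws_access_key_id": access_key_id,
            "aws_secret_access_key": secret_access_key,
        }
    )


# Placeholder credentials from the example above; never commit real ones.
extra = s3_conn_extra("123456789", "secret12345")
print(extra)
```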
I'm creating a Rails app with both a web and a worker tier in Elastic Beanstalk.
The web environment starts up fine with ELB settings inside ebextensions. But I cannot start the worker environment with the same settings because it doesn't have an ELB.
Is there any way to use separate ebextensions for each environment without creating another branch for this?
See the second answer in this thread:
https://forums.aws.amazon.com/message.jspa?messageID=685803
Amazon Web Services now has a worker tier in Elastic Beanstalk. But it still confuses those of us who come from the days of the Heroku worker dyno.
As a comparison, in Heroku one can configure two kinds of dynos (something like a processor?), one for web and one for worker. The web dyno serves any request and normally times out at 15 seconds. So if a request takes longer than that, it simply times out, although it is not terminated per se. In that case you should use a worker, and your web dyno should poll an endpoint several times per minute (maybe) to check whether there is a result to bring back to the user. To create either a worker or a web dyno, you just move the slider and you are good to go. Sometimes you may need a Procfile. But there is nothing fancy, really difficult, or confusing about it.
In AWS Elastic Beanstalk, from the first time you run eb init, you are asked whether the environment is Standard or Worker. Once you choose Standard, there seems to be no way to make it a worker as well.
In our situation, the worker and the standard web server live in one application. So how can we use an Elastic Beanstalk instance for both worker and standard? Our worker uses Sidekiq and Redis. Please point us to any guidance on this.
AWS Elastic Beanstalk has two types of Environments - Web tier and Worker tier.
Web tier environments are meant for web applications - http/https request processing. You get one or more EC2 instances behind a load balancer. You can get other resources like database per your requirement. You can choose the platform you wish e.g. Ruby, Python, Java, Node.js, PHP, Docker.
Worker environments are meant for asynchronous message processing. A worker environment has no load balancer. All of its EC2 instances are in an autoscaling group, and each instance runs a daemon that polls a single SQS queue for messages. When the daemon pulls a message from the SQS queue, it sends an HTTP POST request to localhost:80. You can configure the port, but the important thing is that the daemon posts the message as an HTTP request on localhost. Your worker application is therefore a web application that receives the POST request and processes the message. After the message is processed successfully, the daemon expects your web application running on localhost to return an HTTP 200 OK response. The daemon then deletes the message from the SQS queue. You can write your worker application for any platform, just like standard web server applications: Ruby, Python, Java, Node.js, PHP, Docker.
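To make the contract with the daemon concrete, here is a minimal sketch of a worker application using only the Python standard library. The message format (a JSON object with a "task" key) and the process_message function are assumptions for illustration; a real Rails or Python app would use its normal web framework instead.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def process_message(body):
    """Hypothetical job logic; replace with your own processing."""
    payload = json.loads(body)
    return isinstance(payload, dict) and "task" in payload


class WorkerHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            ok = process_message(body)
        except ValueError:
            # Malformed JSON (or undecodable bytes) counts as a failure.
            ok = False
        # 200 tells the daemon the message was handled, so it deletes it.
        # Any other status means the daemon does not delete the message.
        self.send_response(200 if ok else 500)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the example quiet


def make_server(port=0):
    """Bind to localhost only; port 0 asks the OS for a free port."""
    return HTTPServer(("127.0.0.1", port), WorkerHandler)
```

Calling make_server(80).serve_forever() would listen on the default port the daemon posts to, assuming the process is allowed to bind it.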
Based on my understanding of your use case, I would recommend creating two Elastic Beanstalk environments: one Standard and one Worker. The Standard web server receives HTTP requests and processes them synchronously, putting the relevant data in an SQS queue. The second environment is a worker, and the daemon running on it polls this SQS queue for messages. Your second environment is a web application that is NOT open to the internet. The worker daemon posts the messages as HTTP requests to your worker environment. Thus you can process long-running workloads asynchronously using this second worker environment.
With worker environments you can use your own queues or Elastic Beanstalk can generate a queue for you. You can configure parameters like message visibility timeout, http connections based on your requirements or you can use the defaults.
Below are some links that may be useful for you:
http://aws.amazon.com/blogs/aws/background-task-handling-for-aws-elastic-beanstalk/
http://blogs.aws.amazon.com/application-management/post/Tx1Y8QSQRL1KQZC/Elastic-Beanstalk-Video-Tutorial-Worker-Tier
https://stackoverflow.com/a/23942498/161628
Does this meet your requirements? Please let me know if you have further questions.
Update
You need to upload your source code in two places: once for the worker environment and once for the web server environment. Someone starting from scratch might have two separate code bases, but in your case it should be perfectly fine to have a single code base shared between the two environments. Suppose your web request arrives at '/register'; then the register() method in your application can post a message to an SQS queue and be done with the HTTP request. Your worker environment will poll the SQS queue and post the message over HTTP on localhost to the URL '/async_register', which will invoke a method async_register() in your application and do the asynchronous processing. These two methods can live in the same source code bundle, which can be shared by both the worker and the web server environment. The code path taken by each will be different: web server environments will invoke register() and worker environments will invoke async_register().
Another caveat is that HTTP requests sent by the worker daemon on localhost will contain an HTTP header - "User-Agent": "aws-sqsd/1.1". Read more here. So in your web application you can have a single listener to post requests on "/register" and depending on whether this header is present or not you invoke register() or async_register() methods internally.
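A minimal sketch of that single-listener idea follows. The bodies of register() and async_register() are stubs, and matching on the "aws-sqsd" prefix rather than the exact string "aws-sqsd/1.1" is my own assumption, made so that a different daemon version would still be recognized.

```python
SQSD_USER_AGENT_PREFIX = "aws-sqsd"


def register(data):
    """Web path: would enqueue the work to SQS and return at once (stubbed)."""
    return "queued"


def async_register(data):
    """Worker path: would do the slow processing (stubbed)."""
    return "processed"


def handle_register(headers, data):
    """Route a POST on /register according to the User-Agent header."""
    user_agent = headers.get("User-Agent", "")
    if user_agent.startswith(SQSD_USER_AGENT_PREFIX):
        # Request came from the worker daemon on localhost.
        return async_register(data)
    # Request came from an ordinary web client.
    return register(data)
```

Here headers is a plain dict; in a real framework you would pass the request's header mapping instead.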
Also, if you want to share the code base between the two environments, you can upload it in only one place. Your environments are logically grouped into applications, so you can have a single application. You upload your source code to this application using the "CreateApplicationVersion" API call. Suppose you upload an application version with the label 'v1'. You can now create a worker environment and a web server environment under the same application. When you create an environment you need to provide a version to deploy to it. In this case you can deploy v1 to both environments, so you will be sharing the same source code between them. When you have a new version "v2", you upload it and then update both environments, changing their version to "v2".
The same version of the source code can be deployed to both environments. They will be running on different EC2 instances because one environment is dedicated for responding to web requests and one environment is dedicated for responding to asynchronous web requests (worker).
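The upload-once, deploy-twice flow could be sketched as below. The function takes a client object so it can be tried without AWS access; with boto3 you would pass boto3.client("elasticbeanstalk"), whose create_application_version and update_environment calls correspond to the API operations described above. The application, bucket, key, and environment names are placeholders.

```python
def deploy_version(eb_client, app_name, version_label, bucket, key, env_names):
    """Register one application version, then point every environment at it."""
    eb_client.create_application_version(
        ApplicationName=app_name,
        VersionLabel=version_label,
        # The source bundle is assumed to be uploaded to S3 already.
        SourceBundle={"S3Bucket": bucket, "S3Key": key},
    )
    for env_name in env_names:
        eb_client.update_environment(
            EnvironmentName=env_name,
            VersionLabel=version_label,
        )


# Example call (placeholder names, requires boto3 and AWS credentials):
# import boto3
# deploy_version(
#     boto3.client("elasticbeanstalk"),
#     "my-app", "v2", "my-bucket", "my-app-v2.zip",
#     ["my-app-web", "my-app-worker"],
# )
```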
Is it possible to watch (tail) the logs from an app hosted on Elastic Beanstalk from the command line?
I know this is possible in Heroku using heroku logs -t, but am unsure if there is a way to do this with Beanstalk. If not, any suggestions or tips on how to best manage the logs?
You can SSH into an Elastic Beanstalk instance and tail any log. Alternatively, you can publish logs to S3.
For example, the following logs are available for a Python environment:
/opt/python/log/httpd.out
/var/log/httpd/error_log
/var/log/cfn-hup.log
/opt/python/log/supervisord.log
/var/log/eb-tools.log
/var/log/httpd/access_log
/var/log/eb-cfn-init-call.log
/var/log/eb-publish-logs.log
/var/log/cfn-init.log
You can find the list of logs available for your environment in the web console: Logs > Snapshot Logs > View log file
You will be better off using services like Loggly and Splunk. You can watch the logs from multiple servers live and simultaneously, right in the browser.
Free plans are also available.