Why is my containerized Selenium application failing only in AWS Lambda? - docker

I'm trying to get a function that uses Selenium with Firefox/geckodriver to run in AWS Lambda. I've decided to go the route of creating a container image and then uploading and running that, instead of using a pre-configured runtime. I was able to create a Dockerfile that correctly installs Firefox and Python, downloads geckodriver, and installs my test code:
FROM alpine:latest
RUN apk add firefox python3 py3-pip
RUN pip install requests selenium
RUN mkdir /app
WORKDIR /app
RUN wget -qO gecko.tar.gz https://github.com/mozilla/geckodriver/releases/download/v0.28.0/geckodriver-v0.28.0-linux64.tar.gz
RUN tar xf gecko.tar.gz
RUN mv geckodriver /usr/bin
COPY *.py ./
ENTRYPOINT ["/usr/bin/python3","/app/lambda_function.py"]
The Selenium test code:
#!/usr/bin/env python3
import util
import os
import sys
import requests


def lambda_wrapper():
    api_base = f'http://{os.environ["AWS_LAMBDA_RUNTIME_API"]}/2018-06-01'
    response = requests.get(api_base + '/runtime/invocation/next')
    request_id = response.headers['Lambda-Runtime-Aws-Request-Id']
    try:
        result = selenium_test()
        # Send result back
        requests.post(api_base + f'/runtime/invocation/{request_id}/response', json={'url': result})
    except Exception as e:
        # Error reporting
        import traceback
        requests.post(api_base + f'/runtime/invocation/{request_id}/error', json={'errorMessage': str(e), 'traceback': traceback.format_exc(), 'logs': open('/tmp/gecko.log', 'r').read()})
        raise


def selenium_test():
    from selenium.webdriver import Firefox
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument('-headless')
    options.add_argument('--window-size 1920,1080')
    ffx = Firefox(options=options, log_path='/tmp/gecko.log')
    ffx.get("https://google.com")
    url = ffx.current_url
    ffx.close()
    print(url)
    return url


def main():
    # For testing purposes, currently not using the Lambda API even in AWS so that
    # the same container can run on my local machine.
    # Call lambda_wrapper() instead to get geckodriver logs as well (not informative).
    selenium_test()


if __name__ == '__main__':
    main()
I'm able to successfully build this container on my local machine with docker build -t lambda-test . and then run it with docker run -m 512M lambda-test.
However, the exact same container crashes with an error when I try to upload it to Lambda and run it there. I set the memory limit to 1024M and the timeout to 30 seconds. The traceback says that Firefox was unexpectedly killed by a signal:
START RequestId: 52adeab9-8ee7-4a10-a728-82087ec9de30 Version: $LATEST
/app/lambda_function.py:29: DeprecationWarning: use service_log_path instead of log_path
ffx = Firefox(options=options, log_path='/tmp/gecko.log')
Traceback (most recent call last):
File "/app/lambda_function.py", line 45, in <module>
main()
File "/app/lambda_function.py", line 41, in main
lambda_wrapper()
File "/app/lambda_function.py", line 12, in lambda_wrapper
result = selenium_test()
File "/app/lambda_function.py", line 29, in selenium_test
ffx = Firefox(options=options, log_path='/tmp/gecko.log')
File "/usr/lib/python3.8/site-packages/selenium/webdriver/firefox/webdriver.py", line 170, in __init__
RemoteWebDriver.__init__(
File "/usr/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 157, in __init__
self.start_session(capabilities, browser_profile)
File "/usr/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 252, in start_session
response = self.execute(Command.NEW_SESSION, parameters)
File "/usr/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
self.error_handler.check_response(response)
File "/usr/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: Process unexpectedly closed with status signal
END RequestId: 52adeab9-8ee7-4a10-a728-82087ec9de30
REPORT RequestId: 52adeab9-8ee7-4a10-a728-82087ec9de30 Duration: 20507.74 ms Billed Duration: 21350 ms Memory Size: 1024 MB Max Memory Used: 131 MB Init Duration: 842.11 ms
Unknown application error occurred
I had it upload the geckodriver logs as well, but there wasn't much useful information in there:
1608506540595 geckodriver INFO Listening on 127.0.0.1:41597
1608506541569 mozrunner::runner INFO Running command: "/usr/bin/firefox" "--marionette" "-headless" "--window-size 1920,1080" "-foreground" "-no-remote" "-profile" "/tmp/rust_mozprofileQCapHy"
*** You are running in headless mode.
How can I even begin to debug this? The fact that the exact same container behaves differently depending upon where it's run seems fishy to me, but I'm not knowledgeable enough about Selenium, Docker, or Lambda to pinpoint exactly where the problem is.
Is my docker run command not accurately recreating the environment in Lambda? If so, then what command would I run to better simulate the Lambda environment? I'm not really sure where else to go from here, seeing as I can't actually reproduce the error locally to test with.
If anyone wants to take a look at the full code and try building it themselves, the repository is here - the lambda code is in lambda_function.py.
As for prior research, this question a) is about ChromeDriver and b) is over a year old with no answers. The link from that one only has information about how to run a container in Lambda, which I'm already doing. This answer is almost my problem, but I know there isn't a version mismatch, because the container works on my laptop just fine.

I have exactly the same problem and a possible explanation.
I think what you want is not possible for the time being.
According to the AWS DevOps Blog, Firefox relies on the fallocate system call and /dev/shm.
However, AWS Lambda does not mount /dev/shm, so Firefox will crash when trying to allocate memory. Unfortunately, this behavior cannot be disabled for Firefox.
However, if you can live with Chromium, chromedriver has an option, --disable-dev-shm-usage, that disables the use of /dev/shm and writes shared memory files to /tmp instead.
chromedriver works fine for me on AWS Lambda, if that is an option for you.
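For reference, here is a minimal sketch of how that flag is usually wired up through Selenium's Chrome options. The Chromium and chromedriver paths below are placeholders that depend on how your image is built, and the surrounding flags are the ones commonly needed in Lambda, not anything specific to your setup:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument('--headless')
options.add_argument('--no-sandbox')                 # Lambda doesn't provide the privileges Chromium's sandbox needs
options.add_argument('--disable-dev-shm-usage')      # write shared memory files to /tmp instead of /dev/shm
options.add_argument('--single-process')
options.add_argument('--user-data-dir=/tmp/chrome')  # /tmp is the only writable path in Lambda
options.binary_location = '/opt/chromium/chrome'     # placeholder: wherever your image puts the Chromium binary

# Selenium 3 style; with Selenium 4 you'd pass a Service object instead of executable_path.
driver = webdriver.Chrome(executable_path='/opt/chromedriver', options=options)
driver.get('https://google.com')
print(driver.current_url)
driver.quit()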
According to the AWS DevOps Blog, you can also use AWS Fargate to run Firefox/geckodriver.
There is an entry in the AWS forum from 2015 requesting that /dev/shm be mounted in Lambdas, but nothing has happened since then.

Related

Cannot import HOL in Isabelle batch mode from Docker

I'm trying to use HOL in Isabelle in batch mode from Docker, but it can't seem to find HOL.
If I have this My.thy file
theory My
imports HOL
begin
end
and then run this to process the file in batch mode
docker run --rm -it -v $PWD/My.thy:/home/isabelle/My.thy makarius/isabelle:Isabelle2022_ARM process -T My
I get
*** No such file: "/home/isabelle/HOL.thy"
*** The error(s) above occurred for theory "Draft.HOL" (line 2 of "~/My.thy")
*** (required by "Draft.My")
Exception- TOPLEVEL_ERROR raised
However, I can import Main. In more detail, if I change My.thy to be
theory My
imports Main
begin
end
then running the same Docker command as above to run the batch process results in
Loading theory "Draft.My"
### theory "Draft.My"
### 0.039s elapsed time, 0.078s cpu time, 0.000s GC time
val it = (): unit
How can I import HOL in Isabelle's batch mode in Docker?

Running usd_from_gltf in an AWS Lambda

I'm trying to run Google's usd_from_gltf utility inside AWS Lambda, using a custom Docker image. The setup seems to be working locally but when executing the same Lambda in AWS, it fails for certain input files.
Minimal app
https://github.com/petrbroz/glb-to-usdz-test
This is a minimalistic AWS SAM app with a Lambda function called GlbToUsdzFunction that downloads a Glb file from a specified URL and converts it to Usdz. The Lambda function uses a custom Docker image (https://github.com/leon/docker-gltf-to-udsz) and Python's subprocess to run the usd_from_gltf tool to handle the conversion.
Sample file URLs
https://petrbroz.s3.us-west-1.amazonaws.com/glb-to-usdz-issues/snowmobile.glb
https://petrbroz.s3.us-west-1.amazonaws.com/glb-to-usdz-issues/wall-e.glb
When running locally
The Lambda function succeeds for both snowmobile.glb and wall-e.glb. Here's an example output for the former:
$ sam build
$ echo "{ \"url\": \"https://petrbroz.s3.us-west-1.amazonaws.com/glb-to-usdz-issues/snowmobile.glb\" }" | sam local invoke "GlbToUsdzFunction" --event -
Reading invoke payload from stdin (you can also pass it from file with --event)
Invoking Container created from glbtousdzfunction:glb-to-usdz-lambda
Building image.................
Skip pulling image and use local one: glbtousdzfunction:rapid-1.46.0-x86_64.
START RequestId: 720b6b49-e36c-4429-96fb-9e0e5c02c09b Version: $LATEST
Downloading file
Converting file
Warning: extensionsUsed: Extension is in extensionsUsed but not actually referenced: KHR_texture_transform [GLTF_WARN_EXTENSION_UNREFERENCED]
END RequestId: 720b6b49-e36c-4429-96fb-9e0e5c02c09b
REPORT RequestId: 720b6b49-e36c-4429-96fb-9e0e5c02c09b Init Duration: 0.22 ms Duration: 19997.59 ms Billed Duration: 19998 ms Memory Size: 1024 MB Max Memory Used: 1024 MB
{"status": "success"}
When running in AWS
The Lambda function succeeds for snowmobile.glb but fails for wall-e.glb. Here's the output for the latter:
START RequestId: b1bdc496-ec12-430e-a641-2574af354d60 Version: $LATEST
Downloading file
Converting file
ERROR: USD: Insufficient permissions to write to destination directory '/var/tmp' (Replace) [UFG_ERROR_USD]
ERROR: USD: Failed to map '/var/tmp/output.usdc': No such file or directory (AddFile) [UFG_ERROR_USD]
Warning: USD: Failed to add temporary layer at '/var/tmp/output.usdc' to the package at path 'output.usdz'. (_CreateNewUsdzPackage) [UFG_WARN_USD]
ERROR: Cannot write USD: "/tmp/output.usdz" [UFG_ERROR_IO_WRITE_USD]
Command '['usd_from_gltf', '/tmp/input.glb', '/tmp/output.usdz']' returned non-zero exit status 255.
END RequestId: b1bdc496-ec12-430e-a641-2574af354d60
REPORT RequestId: b1bdc496-ec12-430e-a641-2574af354d60 Duration: 2039.96 ms Billed Duration: 5166 ms Memory Size: 1024 MB Max Memory Used: 101 MB Init Duration: 3125.71 ms
Has anyone run into this? Am I doing something wrong here, or is this perhaps a bug on the AWS side, or on the usd_from_gltf side?
Some USD conversions cause the library to write intermediate files and it looks like it is built to use /var/tmp for this purpose. Since Lambdas can only write to /tmp, the workaround we came up with is to link /var/tmp to /tmp:
in the glb-to-usdz Dockerfile, add a line like RUN rm -rf /var/tmp && ln -s /tmp /var/tmp
This allows your second example to succeed.
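For context, the handler itself only writes to /tmp; it is usd_from_gltf that falls back to /var/tmp for intermediate files, which is why the fix has to be baked into the image. A rough sketch of the handler flow implied by the logs above (the function name and download helper are illustrative, not the exact code from the repo):
import subprocess
import urllib.request

def handler(event, context):
    # Download the input glb into /tmp, the only writable path in Lambda
    print('Downloading file')
    urllib.request.urlretrieve(event['url'], '/tmp/input.glb')

    # usd_from_gltf additionally writes intermediate files to /var/tmp, so the
    # image needs the `RUN rm -rf /var/tmp && ln -s /tmp /var/tmp` line above
    print('Converting file')
    subprocess.run(['usd_from_gltf', '/tmp/input.glb', '/tmp/output.usdz'], check=True)

    return {'status': 'success'}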

Cannot find files that should be inside my running docker container

I'm doing some work with the reverse engineering tool angr, and I'm trying to run it in a container.
My current directory looks like this:
ask#Garsy:~/Notes/ethHack/wetransfer-85179d/Export$ ls
angry.py impossible_password_location.csv report.md
impossible_password.bin impossible_password_strings.txt test.txt
I then run a specific angr image like so:
ask#Garsy:~/Notes/ethHack/wetransfer-85179d/Export$ sudo docker run -it --rm -v $pwd:/local angr/angr
where I believe that using $pwd:/local should give me access to the files shown above inside the container (following [this][1] guide [5:40]).
I run the container, and try to write some python:
(angr) angr#38b067fffc2d:~$ ipython3
Python 3.8.5 (default, Jan 27 2021, 15:41:15)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.22.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import angr
In [2]: angr.Project("/impossible_password.bin")
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
<ipython-input-2-3f794a899665> in <module>
----> 1 angr.Project("/impossible_password.bin")
~/angr-dev/angr/angr/project.py in __init__(self, thing, default_analysis_mode, ignore_functions, use_sim_procedures, exclude_sim_procedures_func, exclude_sim_procedures_list, arch, simos, engine, load_options, translation_cache, support_selfmodifying_code, store_function, load_function, analyses_preset, concrete_target, **kwargs)
124 self.loader = cle.Loader(thing, **load_options)
125 elif not isinstance(thing, str) or not os.path.exists(thing) or not os.path.isfile(thing):
--> 126 raise Exception("Not a valid binary file: %s" % repr(thing))
127 else:
128 # use angr's loader, provided by cle
Exception: Not a valid binary file: '/impossible_password.bin'
where it can't find the file. The same goes for "local/impossible_password.bin". How do I get the files of my current directory to be available when I spin up the container?
[1]: https://www.youtube.com/watch?v=9dQFM5O4KFk

dataflow failed to set up worker

I tested my pipeline on DirectRunner and everything works fine.
Now I want to run it on DataflowRunner. It doesn't work: it fails before it even enters my pipeline code, and I'm totally overwhelmed by the logs in Stackdriver - I just don't understand what they mean and really don't have any clue about what's wrong.
The execution graph looks like it loaded fine.
The worker pool starts, and 1 worker tries to run through the setup process, but it never seems to succeed.
Some logs that I guess might provide useful information for debugging:
AttributeError:'module' object has no attribute 'NativeSource'
/usr/bin/python failed with exit status 1
Back-off 20s restarting failed container=python pod=dataflow-fiona-backlog-clean-test2-06140817-1629-harness-3nxh_default(50a3915d6501a3ec74d6d385f70c8353)
checking backoff for container "python" in pod "dataflow-fiona-backlog-clean-test2-06140817-1629-harness-3nxh"
INFO SSH key is not a complete entry: .....
How should I tackle this problem?
Edit:
My setup.py is here if it helps (copied from [here]; I only modified REQUIRED_PACKAGES and the setuptools.setup section):
from distutils.command.build import build as _build
import subprocess
import setuptools


# This class handles the pip install mechanism.
class build(_build):  # pylint: disable=invalid-name
    """A build command class that will be invoked during package install.

    The package built using the current setup.py will be staged and later
    installed in the worker using `pip install package'. This class will be
    instantiated during install for this specific scenario and will trigger
    running the custom commands specified.
    """
    sub_commands = _build.sub_commands + [('CustomCommands', None)]


# Some custom command to run during setup. The command is not essential for this
# workflow. It is used here as an example. Each command will spawn a child
# process. Typically, these commands will include steps to install non-Python
# packages. For instance, to install a C++-based library libjpeg62 the following
# two commands will have to be added:
#
#     ['apt-get', 'update'],
#     ['apt-get', '--assume-yes', 'install', 'libjpeg62'],
#
# First, note that there is no need to use the sudo command because the setup
# script runs with appropriate access.
# Second, if apt-get tool is used then the first command needs to be 'apt-get
# update' so the tool refreshes itself and initializes links to download
# repositories. Without this initial step the other apt-get install commands
# will fail with package not found errors. Note also --assume-yes option which
# shortcuts the interactive confirmation.
#
# The output of custom commands (including failures) will be logged in the
# worker-startup log.
CUSTOM_COMMANDS = [
    ['echo', 'Custom command worked!']]


class CustomCommands(setuptools.Command):
    """A setuptools Command class able to run arbitrary commands."""

    def initialize_options(self):
        pass

    def finalize_options(self):
        pass

    def RunCustomCommand(self, command_list):
        print 'Running command: %s' % command_list
        p = subprocess.Popen(
            command_list,
            stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        # Can use communicate(input='y\n'.encode()) if the command run requires
        # some confirmation.
        stdout_data, _ = p.communicate()
        print 'Command output: %s' % stdout_data
        if p.returncode != 0:
            raise RuntimeError(
                'Command %s failed: exit code: %s' % (command_list, p.returncode))

    def run(self):
        for command in CUSTOM_COMMANDS:
            self.RunCustomCommand(command)


# Configure the required packages and scripts to install.
# Note that the Python Dataflow containers come with numpy already installed
# so this dependency will not trigger anything to be installed unless a version
# restriction is specified.
REQUIRED_PACKAGES = ['apache-beam==2.0.0',
                     'datalab==1.0.1',
                     'google-cloud==0.19.0',
                     'google-cloud-bigquery==0.22.1',
                     'google-cloud-core==0.22.1',
                     'google-cloud-dataflow==0.6.0',
                     'pandas==0.20.2']

setuptools.setup(
    name='geotab-backlog-dataflow',
    version='0.0.1',
    install_requires=REQUIRED_PACKAGES,
    packages=setuptools.find_packages(),
)
Worker-startup log (it ended with the following exception):
I /usr/bin/python failed with exit status 1
I /usr/bin/python failed with exit status 1
I AttributeError: 'module' object has no attribute 'NativeSource'
I class ConcatSource(iobase.NativeSource):
I File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/concat_reader.py", line 26, in <module>
I from dataflow_worker import concat_reader
I File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/maptask.py", line 31, in <module>
I from dataflow_worker import maptask
I File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 26, in <module>
I from dataflow_worker import executor
I File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 63, in <module>
I from dataflow_worker import batchworker
I File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/start.py", line 26, in <module>
I exec code in run_globals
I File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
I "__main__", fname, loader, pkg_name)
I File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
I AttributeError: 'module' object has no attribute 'NativeSource'
I class ConcatSource(iobase.NativeSource):
You seem to be using incompatible requirements in your REQUIRED_PACKAGES directive, i.e. you specify "apache-beam==2.0.0" and "google-cloud-dataflow==0.6.0", which conflict with each other. Can you try removing / uninstalling the "apache-beam" package and installing / including the "google-cloud-dataflow==2.0.0" package instead?
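In other words, the dependency list would end up looking something like this (the other pins are just carried over from your setup.py; the key change is dropping apache-beam and bumping google-cloud-dataflow):
REQUIRED_PACKAGES = [
    'google-cloud-dataflow==2.0.0',  # pulls in the matching apache-beam release as a dependency
    'datalab==1.0.1',
    'google-cloud==0.19.0',
    'google-cloud-bigquery==0.22.1',
    'google-cloud-core==0.22.1',
    'pandas==0.20.2',
]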

apache_beam.runners.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed:

I set up a Google Cloud project in Cloud Shell, and tried to run this tutorial script https://github.com/GoogleCloudPlatform/cloudml-samples/blob/master/flowers/sample.sh
Ran into this error:
***#***:~/git/cloudml-samples/flowers$ ./sample.sh
Your active configuration is: [cloudshell-4691]
Using job id: flowers_***_20170113_162148
python trainer/preprocess.py \
--input_dict "$DICT_FILE" \
--input_path "gs://cloud-ml-data/img/flower_photos/eval_set.csv" \
--output_path "${GCS_PATH}/preproc/eval" \
--cloud
WARNING:root:Using fallback coder for typehint: Any.
WARNING:root:Using fallback coder for typehint: Any.
WARNING:root:Using fallback coder for typehint: Any.
DEPRECATION: pip install --download has been deprecated and will be removed in the future. Pip now has a download command that should be used instead.
Collecting google-cloud-dataflow==0.4.4
Using cached google-cloud-dataflow-0.4.4.zip
Saved /tmp/tmpSoHiTi/google-cloud-dataflow-0.4.4.zip
Successfully downloaded google-cloud-dataflow
# Takes about 30 mins to preprocess everything. We serialize the two
Traceback (most recent call last):
File "trainer/preprocess.py", line 436, in <module>
main(sys.argv[1:])
File "trainer/preprocess.py", line 432, in main
run(arg_dict)
File "trainer/preprocess.py", line 353, in run
p.run()
File "/home/slalomconsultingsf/.local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 159, in run
return self.runner.run(self)
File "/home/slalomconsultingsf/.local/lib/python2.7/site-packages/apache_beam/runners/dataflow_runner.py", line 195, in run
% getattr(self, 'last_error_msg', None), self.result)
apache_beam.runners.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed:
(b85b0a598a3565cb): Workflow failed.
I was not able to get any clue about what I was doing wrong from the error log of Google Cloud Dataflow.
I'd appreciate any answers and help with troubleshooting.
Enable the Dataflow API. In the Pantheon (Cloud Console) top search box, typing "dataflow api" will take you to a window where you can click "Enable API".
I think this will fix it for you. I disabled my Dataflow API and got the same error as you; when it was re-enabled, the problem went away.
