Why is my Dockerized Browserless-Chrome hanging when a Mediawiki Selenium test causes a new page to be loaded?

Background
I have a makefile which fetches and spins up a Dockerized Mediawiki instance on my local machine, from scratch, with a single make command:
https://gitlab.wikimedia.org/mhurd/mediawiki-docker-make
This works fairly well. Please try it!
Goal
I'd also like to be able to run Mediawiki's Selenium tests (as-is) in Docker containers and watch them execute "live", with the makefile, again, making it easy to kick this process off.
Approach
This work-in-progress branch adds a container for running Mediawiki Selenium tests, and a browserless-chrome container so you can watch and debug the tests "live", as they run:
https://gitlab.wikimedia.org/mhurd/mediawiki-docker-make/-/tree/selenium
As you can see from the top of this branch's readme, after running make to spin everything up, running make runseleniumtests kicks the tests off. A browser window then opens automatically so you can watch the Selenium tests running in the browserless-chrome container. This works... up to a point.
Problem
Unfortunately, after a few tests have visibly run, I see this error:
Protocol error (Input.dispatchMouseEvent): Target closed.

I suspect the way the browserless-chrome container manages its Chrome sessions may be to blame: the error appears to happen when a test causes a new page to be loaded, but I'm not sure.
Any ideas appreciated, and please try it yourself - the whole point of the makefile is to make spinning this up from scratch super simple. It only takes a couple of commands, and you should be able to see the tests running and the resulting error. Thanks!
Misc
Running curl -s http://127.0.0.1:3000/sessions | python3 -m json.tool on the host machine after attempting to run the tests shows there are multiple browserless-chrome sessions, but I'm unsure how to make it behave more like a non-dockerized setup, which has no problem with tests that cause page loads.
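For reference, a minimal Python sketch of the same check (assuming the requests library is available; the /sessions endpoint is the one the curl command above hits):

import requests

# List the browserless sessions and report how many are open. Each
# entry is a DevTools target description, so the exact keys may vary
# by browserless version.
sessions = requests.get("http://127.0.0.1:3000/sessions", timeout=5).json()
print(f"{len(sessions)} browserless session(s) open")
for s in sessions:
    print(s.get("id"), s.get("type"))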
In email communication with Browserless support, they mention seeing similar issues but haven't been able to track down why:
It seems that the root cause may be that "click" events aren't
working properly when you're viewing the live debugger. Can you confirm
that if you don't open the live viewer, it allows the test to progress
and eventually throws new errors regarding selectors not being found? (Edit: I confirmed it does.)
I've also run into this live debugger behavior before. In my case
I stopped watching the live debugger and, since I couldn't
see which selector it wasn't finding, I exported a screenshot. In
that particular case, it ended up being an issue with my viewport:
the viewport in the remote session was smaller, so the styles
rendered differently than they did locally on my machine. When I set the
viewport size, the selectors were found - but then again, that was my
particular issue; it might not be yours.
But yes, for some odd reason, when you view live sessions you can run
into this issue, which we haven't been able to understand.
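Since the support reply points at viewport size, one workaround worth trying is pinning the viewport up front when the session is created. Here is a minimal sketch using Python's selenium bindings, purely as an illustration (Mediawiki's tests actually use WebdriverIO, and the /webdriver path is an assumption based on browserless's Selenium compatibility - check the docs for your browserless version):

from selenium import webdriver

# Pin the window size so the remote session renders the same layout as
# a local run; a smaller remote viewport can shift styles and hide the
# elements the tests try to click.
opts = webdriver.ChromeOptions()
opts.add_argument("--window-size=1280,1024")

driver = webdriver.Remote(
    command_executor="http://127.0.0.1:3000/webdriver",  # assumed browserless endpoint
    options=opts,
)
try:
    driver.set_window_size(1280, 1024)  # belt and braces
    driver.get("http://localhost:8080/wiki/Main_Page")  # hypothetical local wiki URL
    print(driver.title)
finally:
    driver.quit()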

Related

Request for working configurations of openresty + mobdebug + EmmyLua remote debugging in PHPStorm

I'm at my wits' end. I'm trying to set up remote debugging of Lua code in dockerized openresty. I use PHPStorm with the EmmyLua extension, and the mobdebug library on the Lua end. I have been reading and hearing reports of this working for people, but for me, stopping on a breakpoint (or immediately after mobdebug.start()) works about 15% of the time (evidence that I am not completely misconfiguring the thing), including exactly 0% of the places in my code that I actually want to debug.
I will not be debugging this issue. I intend to work around it by using an exact setup that is known to work, so I need someone for whom it does work to tell me what their setup is:
OS version
openresty version
mobdebug version
any custom patches or hacks you might have applied to get the debugging working
luasocket version (probably relevant)
PHPStorm version
EmmyLua version
docker and docker-compose version, if applicable
whatever you may suspect to be relevant
I am willing to completely raze my development environment and rebuild it exactly to the working spec, just to have working Lua debugging.
EDIT: for those interested, here are my detailed symptoms:
I can't stop at actual breakpoints, ever (i.e. after I initially stop after mobdebug.start() and then "Resume program", execution does not stop when a line with a breakpoint is hit)
I can stop after mobdebug.start() in code executed from init_by_lua_block, i.e. once per server start / config reload
I can't stop after mobdebug.start() in any code executed during request handling, i.e. ssl_certificate_by_lua_block, rewrite_by_lua_block, etc. This is probably understandable, because coroutines are involved
All my attempts at enabling coroutine debugging in request handling code either error out or have no effect:
mobdebug.coro() in init_worker_by_lua_block() errors out with "API disabled in current context" somewhere in mobdebug.lua
mobdebug.on() in the function I want to debug either has no effect or errors out with "attempt to yield across C-call boundary"; I haven't discerned the pattern yet.
Stopping after mobdebug.start() should work under all circumstances, except when there is a connection already established to the same debugger controller, so the fact that it doesn't usually points to a system that tries to establish multiple debugging sessions to the same controller/IDE (or to no connection being established at all).
Similarly, there are several reasons why breakpoints may not be triggered, but if they work in a file as part of a specific setup, then I'd expect them to always work in that case. Some of the reasons are listed in the documentation: https://studio.zerobrane.com/doc-faq#why-breakpoints-are-not-triggered.
mobdebug provides a command line-based controller, so for troubleshooting purposes it may be easier to use that instead of a more complex setup.

Locally-run tests pass, but Jenkins tests fail; why, and how can I fix this?

I'm running a fairly large suite of Python-based tests, with a much larger number of steps, on an Ubuntu Linux VM. When I run them manually (via the console), by any number of methods, they all run and pass just fine.
After I ported them to a Jenkins server, four out of the thirty fail. I tried the usually recommended fix - increasing the wait time for keywords to 1s before every single click - so I'm fairly certain it isn't a timing issue. The site loads a lot faster than that even on Windows, which I know is slower than Jenkins on Linux.
After Googling around a little for an answer, I found that apparently no one has come up with an accepted answer, either on this site or other Q/A sites.
Here are the error messages I'm getting from Jenkins:
ElementNotVisibleException: Message: element not visible
(Session info: chrome=61.0.3163.79)
(Driver info: chromedriver=2.26.436382 (70eb799289ce4c2208441fc057053a5b07ceabac),platform=Linux 4.10.0-33-generic x86_64)
WebDriverException: Message: unknown error: Cannot read property 'innerHTML' of undefined
(Session info: chrome=61.0.3163.79)
(Driver info: chromedriver=2.26.436382 (70eb799289ce4c2208441fc057053a5b07ceabac),platform=Linux 4.10.0-33-generic x86_64)
The other two failures are element-not-visible exceptions identical to the first, and both happen on a Click Button keyword that is not the first Click Button keyword in the test suite. The first failure happens on a Click Element keyword that has worked perfectly since I wrote it, and the last one happens on a tried-and-true JavaScript call to get the text of an element.
Why would something work locally on two different operating systems and then fail on Jenkins?
The most common reason might be that the Jenkins system is running slower, and your tests aren't being hyper-vigilant about waiting for pages to finish loading before trying to interact with them. Jenkins boxes are often under heavy load, and if both the client and server are running on the same box, either or both could be contributing to the problem.
Another reason could be that you're running different versions of the browsers and/or Selenium drivers on the Jenkins box.
Another reason could be that the resolution of the (virtual?) displays is different, causing elements to be shifted to a different position.
The browsers on the Jenkins box could have different profiles, resulting in a different set of plugins or antivirus software running. These can affect the speed at which a page renders, or cause unwanted popups that cover portions of the screen.
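Since most of these causes show up as timing problems, the standard remedy is to replace fixed sleeps with explicit waits. A minimal sketch using Python's selenium bindings (the element IDs are hypothetical):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("http://example.com/app")  # hypothetical page under test

# Instead of sleeping a fixed 1s before each click, wait (up to 30s)
# until the element is actually visible and clickable; this absorbs
# slowdowns on a loaded Jenkins box without padding fast local runs.
wait = WebDriverWait(driver, 30)
wait.until(EC.element_to_be_clickable((By.ID, "submit-button"))).click()

# The same idea guards the innerHTML lookup: wait for the element to be
# present and visible before asking for its text.
element = wait.until(EC.visibility_of_element_located((By.ID, "result")))
print(element.text)
driver.quit()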

Gitlab Pages delivers random content

I am experiencing weird behavior with the Pages feature of GitLab Omnibus package running on an Ubuntu 16.04 virtual machine. Some projects use Pages with Jekyll built by GitLab CI, which has been working as expected since it was first published with Gitlab CE.
For a couple of days now, visiting any of the homepages of those sites shows the content of just one of the projects. Each of them should, of course, show different content, but they all show the same page. Even stranger: the content shown on each of the sites changes over time to that of one of the other projects, and I cannot tell whether this is deterministic.
Restarting the build processes of each of the projects did not fix this, neither did gitlab-ctl reconfigure, stop and start, nor rebooting the entire VM.
To investigate the issue, I edited what I assume is the resulting file of the build process, at /var/opt/gitlab/gitlab-rails/shared/pages/www/www.domain.org/public/index.html. Not immediately, but later on, during the "rotating" of content described above, the edits showed up on the webpage.
So what is going on there? Is this some caching issue? Is it a misconfiguration? Is it a bug? Please help me find and fix the problem, as these are production websites.
Looks like this is actually an issue

Running Geb + spock tests headless

I have a number of Geb functional tests for a Grails application.
The tests work as expected when executed from the terminal or an IDE.
However, the tests need to be executed by Hudson, so they are run in headless mode using Xvfb.
The problem is that the tests keep failing, or behaving unexpectedly, returning errors like RequiredPageContentNotPresent and StaleElementReferenceException in places that don't make sense.
For example:
(at LicencePage is verified above, and the page hasn't changed)
when:
addDocument(Data.Test_Doc_name,Data.Test_Doc_file)
sometimes throws
Failure: Add Actual Licence (HomePageSpec)
| geb.error.RequiredPageContentNotPresent: The required page content 'addDocument - SimplePageContent (owner: LicencePage, args: [Functional Test Doc, /var/lib/hudson/jobs/KB-Functional_Tests/workspace/app/../manual_test_data/so_v3/os_test_1], value: null)' is not present
at geb.content.TemplateDerivedPageContent.require(TemplateDerivedPageContent.groovy:61)
at geb.content.PageContentTemplate.create_closure1(PageContentTemplate.groovy:63)
at geb.content.PageContentTemplate.create(PageContentTemplate.groovy:82)
at geb.content.PageContentTemplate.get(PageContentTemplate.groovy:54)
at geb.content.NavigableSupport.getContent(NavigableSupport.groovy:45)
at geb.content.NavigableSupport.methodMissing(NavigableSupport.groovy:121)
at geb.Browser.methodMissing(Browser.groovy:194)
at geb.spock.GebSpec.methodMissing(GebSpec.groovy:51)
at HomePageSpec.Add Actual Licence (HomePageSpec.groovy:228)
The method addDocument() is defined on an 'abstract' page, which LicencePage extends. In most cases like this, if I copy the method code directly into my Spec, it works, although doing so ruins all the structure I have in my test pages.
Does anyone have experience running Geb tests with Xvfb? Have you faced these issues?
All tests pass when executed locally, and this is not a data issue, as the DB is always cleared.
Also, without making any changes, the tests behave non-deterministically (on Hudson), so the above exception is not always thrown. Without any changes at all, the tests sometimes succeed and sometimes fail.
The description you gave seems to be the symptom of a flaky test suite. We were facing this problem as well some time ago. A good starting point is this presentation (around minute 35) and the Geb documentation about waiting.
If you think it could have something to do with Xvfb (which I have no experience with), you could try using PhantomJS as the test runner and check whether it works correctly.
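In Geb itself the waiting support lives in waitFor {} blocks and the wait: true option on content definitions; purely as an illustration of the underlying retry idea, here is a sketch in Python's selenium bindings (the locator and helper name are hypothetical):

from selenium.common.exceptions import StaleElementReferenceException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

# Retry-on-stale pattern: re-locate the element on every attempt instead
# of holding one reference across a page update, which is what triggers
# StaleElementReferenceException on slow or repainting pages.
def click_when_stable(driver, locator, timeout=30):
    def attempt(d):
        try:
            d.find_element(*locator).click()
            return True
        except StaleElementReferenceException:
            return False  # DOM changed under us; wait and retry
    WebDriverWait(driver, timeout).until(attempt)

# Usage (hypothetical id): click_when_stable(driver, (By.ID, "add-document"))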

Can I see what happens in PhantomJS?

Sometimes, when running automated tests in PhantomJS using Cucumber for Rails 4, it would be really, really useful to sit in front of my screen, look at a window, and see exactly what the browser is doing.
There are times when your code is right and your test is right, but testing fails intermittently nonetheless. It's often because of a script, or an animation, or some CSS that gets in the way. But seeing a screenshot, drilling into a DOM inspector, or using the debugger is not enough to catch those edge cases.
Is there any way to have a window showing what PhantomJS is doing in the background? It could be something in X Window, or running in a VNC server, etc. Anything visual would greatly help with debugging, especially with those finicky details.
I found a little program called PhantomVNC, but its documentation doesn't tell me much about how it works. It looks like something that just feeds a series of screenshots through VNC.
I tried PhantomJS and Capybara-WebKit, but neither of those headless browsers offers a "head" option. Selenium-WebDriver seems complicated to set up and only seems to work with a full browser like Firefox, which may cause more problems than it solves.
If you have any ideas, please let me know. Thank you in advance.
I'm not sure about Cucumber/Rails specifically, but I keep a second Chrome/WebDriver virtual machine for exactly this. I use PhantomJS for headless automated acceptance testing, but when there is an issue with the tests that I'm trying to diagnose, it can be helpful to view the browser live.
In other words, spin up an Ubuntu desktop VM, install Chrome, and install Selenium Server and the Chrome WebDriver. Log in to the VM, open Chrome, make sure Selenium is running, then configure your test suite to connect to the Selenium WebDriver service (usually port :4444) on this instance. Run your tests, and you should be able to watch the Chrome instance on the VM.
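For instance, a minimal connection sketch in Python's selenium bindings (the VM hostname is hypothetical; the /wd/hub path matches the classic Selenium standalone server):

from selenium import webdriver

# Point the suite at the Selenium server on the desktop VM instead of a
# local headless browser; the browser window then opens on the VM's
# display, where you can watch it over VNC or the console.
driver = webdriver.Remote(
    command_executor="http://selenium-vm.local:4444/wd/hub",  # hypothetical host
    options=webdriver.ChromeOptions(),
)
driver.get("http://example.com")  # replace with your app under test
print(driver.title)
driver.quit()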
