I have a simple Rails application written for scraping a web page. The controller calls the scraper utility, in which I use Firefox in headless mode via watir-webdriver. The application works and returns the results properly. I call the application with a URL like this:
http://somedomain.com:3000/scrapers.json?session=1349426645_562&l=test
and it returns a json string.
It takes about 15 seconds for the scraper to complete. While one request is in progress, if I launch another request, it is queued until the previous one completes. I am not sure whether this is a limitation of the Rails application, of watir-webdriver, or of the headless gem using Xvfb.
Any pointers would help.
Thanks,
Sridhar
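One common cause of this symptom (an assumption about this setup, not a diagnosis) is that something in the stack handles one request at a time: a single-threaded app server, or a global lock around each request such as the one Rack::Lock provides. The plain-Ruby sketch below does not use Rails or Watir. It only shows the effect, running two slow "requests" with and without a shared mutex and reporting how many were in flight at once. `run_requests` is a hypothetical helper.

```ruby
# Runs `count` simulated requests in parallel threads and returns the peak
# number that were in progress at the same time.
def run_requests(count, lock: nil)
  active = 0
  max_active = 0
  counter = Mutex.new # protects the two counters above

  handler = lambda do
    counter.synchronize do
      active += 1
      max_active = [max_active, active].max
    end
    sleep 0.2 # stand-in for the ~15 second scrape
    counter.synchronize { active -= 1 }
  end

  threads = Array.new(count) do
    # With a lock, every request must acquire it first (as with Rack::Lock);
    # without one, requests run side by side.
    Thread.new { lock ? lock.synchronize(&handler) : handler.call }
  end
  threads.each(&:join)
  max_active
end

puts "with a global lock: #{run_requests(2, lock: Mutex.new)}"
puts "without a lock:     #{run_requests(2)}"
```

With the lock, the peak is 1 and the second request waits for the first, which matches the queuing described above; without it, both requests should overlap.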
There are better libraries for screen scraping, such as Mechanize. In fact, there are some applications made just for scraping.
I found out that I can use the CGI module to do my work. Since I didn't need a Rails application, I used Ruby CGI to call the Ruby script and pass parameters through the URL. With this approach I can also launch multiple instances in a non-blocking fashion. I had to rewrite my controller as a standalone Ruby program to do this.
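A minimal sketch of such a standalone script, assuming the scraping code is wrapped in a method (`scrape` here is a hypothetical placeholder that returns fake data). It reads the `session` and `l` parameters from the `QUERY_STRING` environment variable, which is how a CGI-capable web server hands over the query string, and prints a JSON response. It parses with `URI.decode_www_form` instead of the `cgi` library, so the logic can be exercised without a web server.

```ruby
require "json"
require "uri"

# Hypothetical stand-in for the scraper the Rails controller used to call.
def scrape(session, label)
  { "session" => session, "label" => label, "status" => "done" }
end

# Builds a full CGI response (header, blank line, body) from a query string
# such as "session=1349426645_562&l=test".
def cgi_response(query_string)
  params = URI.decode_www_form(query_string.to_s).to_h
  body = JSON.generate(scrape(params["session"], params["l"]))
  "Content-Type: application/json\r\n\r\n#{body}"
end

# Under CGI the web server passes the query string in an environment variable,
# starts one process per request, and relays stdout back to the client.
print cgi_response(ENV["QUERY_STRING"]) if __FILE__ == $PROGRAM_NAME
```

Because the web server starts a separate process for each request, a slow scrape in one request does not block another.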
I want to use Cypress as a testing tool with the cypress-on-rails plugin.
However during a cypress scenario I want to enable/wrap all rails backend requests with vcr so all requests are captured and replayed.
Typically you would tag an RSpec or Cucumber file, which essentially wraps an entire block of code to do this. Cypress, however, is completely client/JavaScript driven, and a scenario plays out as multiple Ajax requests from the client.
The insert/eject methods on which the use_cassette method is built are publicly available.
https://www.rubydoc.info/gems/vcr/VCR#insert_cassette-instance_method
VCR.insert_cassette('my_cassette')
# do stuff
VCR.eject_cassette
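Because insert and eject are separate calls, they don't have to happen in the same block: one call can insert a cassette when a Cypress scenario starts and a later call can eject it when the scenario ends. A minimal sketch, assuming you expose the two calls to Cypress somehow (for example through cypress-on-rails app commands; that wiring is not shown). `CassetteSwitch` is a hypothetical helper; in the app, `recorder` would be the `VCR` module itself, and any object with `insert_cassette(name)` and `eject_cassette` works for testing.

```ruby
# Hypothetical helper that lets two separate calls control one cassette:
# start when a Cypress scenario begins, stop when it ends.
class CassetteSwitch
  attr_reader :current

  # recorder: the VCR module in the app, or any object that responds to
  # insert_cassette(name) and eject_cassette.
  def initialize(recorder)
    @recorder = recorder
    @current = nil
  end

  def start(name)
    stop if @current # don't leave a previous scenario's cassette inserted
    @recorder.insert_cassette(name)
    @current = name
  end

  def stop
    return unless @current
    @recorder.eject_cassette
    @current = nil
  end
end

# In the app (assumption): CASSETTES = CassetteSwitch.new(VCR)
```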
Many things on the web seem to suggest that VCR can be used with Capybara.
I have three problems:
1. This doesn't make much sense to me, because the test driver and the application code don't share memory.
2. I'm not finding full recipes on how to set this up.
3. I'm finding bits and pieces of how people have set this up, but outside the context of Rails 5.1, which does the Capybara setup behind the scenes.
How do I configure a Rails 5.1 app, Capybara, and VCR to work together for system tests?
(My headless browser is PhantomJS, driven by Poltergeist. But I don't need to intercept requests from the browser, only server-side requests. If I needed to intercept requests from the browser, I would probably use a full HTTP proxy server, like puffing-billy.)
I'm assuming you mean Rails 5.1 since Rails 5 doesn't have system tests.
The copy of the application that Capybara runs for testing is run in a separate thread, not a separate process. This means they do have access to the same memory and loaded classes.
There is nothing special required for configuring WebMock or VCR beyond what their READMEs already provide.
The setup of Capybara and how Rails handles it is irrelevant to the configuration of WebMock or VCR. Additionally, even when using Rails 5.1 system tests all of the normal Capybara configuration options are still usable.
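A toy illustration of the thread point, using only core Ruby: WebMock and VCR work by patching HTTP client libraries inside the Ruby process, so a patch applied by the test code is visible to application code running in another thread of that process. `WeatherClient` is a hypothetical stand-in for an HTTP client and `RecordedResponses` plays the role of the stub.

```ruby
# Hypothetical application-side HTTP client.
class WeatherClient
  def fetch(city)
    raise "real network call for #{city}" # what the tests must never reach
  end
end

# Test-side stub, applied once to the class (WebMock/VCR likewise patch
# HTTP libraries process-wide rather than per thread).
module RecordedResponses
  RESPONSES = { "Oslo" => "-3C" }.freeze

  def fetch(city)
    RESPONSES.fetch(city) { super(city) }
  end
end
WeatherClient.prepend(RecordedResponses)

# The "app server" thread, like the one Capybara boots for system tests.
app_thread = Thread.new { WeatherClient.new.fetch("Oslo") }
puts app_thread.value # prints -3C
```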
That all being said, there are a couple of things to be aware of here. Firstly, WebMock/VCR can only deal with requests made by your app (not from the browser which you stated you don't need) and it's generally better to use faked services (if possible) rather than WebMock/VCR when doing end to end system tests since there is less interference with the code under test.
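A minimal sketch of the faked-service approach, using only the standard library: the test starts a tiny HTTP server on localhost that returns canned JSON, and the application code is simply configured with that server's base URL. `RatesClient`, the `/rates` path, and the response shape are hypothetical; a real fake is often a small Rack or Sinatra app, but a raw `TCPServer` keeps the example self-contained.

```ruby
require "json"
require "net/http"
require "socket"

# Hypothetical fake of a third-party rates API: a real HTTP server on
# localhost that ignores the request path and always returns the same JSON.
def start_fake_rates_service
  server = TCPServer.new("127.0.0.1", 0) # port 0 lets the OS pick a free port
  port = server.addr[1]
  thread = Thread.new do
    loop do
      client = server.accept # one connection at a time is enough here
      # Consume the request line and headers, up to the blank line.
      while (line = client.gets) && line != "\r\n"
      end
      body = JSON.generate({ "usd_eur" => 0.9 })
      client.write("HTTP/1.1 200 OK\r\n" \
                   "Content-Type: application/json\r\n" \
                   "Content-Length: #{body.bytesize}\r\n" \
                   "Connection: close\r\n\r\n#{body}")
      client.close
    end
  end
  [port, thread, server]
end

# Application code: it only knows a base URL, which the test points at the fake.
class RatesClient
  def initialize(base_url)
    @base_url = base_url
  end

  def usd_eur
    JSON.parse(Net::HTTP.get(URI("#{@base_url}/rates"))).fetch("usd_eur")
  end
end

port, thread, server = start_fake_rates_service
rate = RatesClient.new("http://127.0.0.1:#{port}").usd_eur
puts rate
thread.kill
server.close
```

The application code under test is unchanged apart from its configured base URL, which is why this interferes less than stubbing the HTTP library.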
If this doesn't answer your issues, post a question with a specific issue you're having, the code that's causing your issue, and the error you're getting.
I would like to use PhantomJS as part of my main application lifecycle to take screenshots of a remote URL submitted by the user.
I'm familiar with using Poltergeist in conjunction with Capybara/Rspec. But how would I go about initializing the page object manually?
To initialize a Capybara session in your app, you can do something like:
session = Capybara::Session.new(:poltergeist)
(as documented here), and then, rather than using page, call Capybara methods on session. One thing to note is that if you're also going to test the app with Capybara, you will probably want to register separate drivers for the app and for testing: https://github.com/jnicklas/capybara#configuring-and-adding-drivers . Also, since Capybara's config is not thread-safe, changing any of Capybara's settings could affect both the test session and the in-app session.
A far better solution may be to set up a separate Node.js service which runs PhantomJS; in fact, quite a few projects provide a ready-made screen capture web server or console command.
Capybara is a testing tool, and invoking a JavaScript runtime via Ruby adds a lot of overhead as well as not being thread-safe. The fact that it is not designed to be run in production is also a pretty big concern.
Instead you would simply call your screenshot service via HTTP or by running a shell command from Ruby.
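A minimal sketch of the shell-command route, assuming a `phantomjs rasterize.js URL OUTPUT` style command (rasterize.js is one of the example scripts shipped with PhantomJS; substitute whatever screenshot tool you actually run). Since the URL comes from a user, the sketch accepts only http(s) URLs and passes arguments to `Open3.capture3` as an array, so no shell parses the user's input. `screenshot_command` and `take_screenshot` are hypothetical names.

```ruby
require "open3"
require "uri"

# Builds the argument vector for the screenshot command. The default runner
# (phantomjs with its rasterize.js example script) is an assumption.
def screenshot_command(url, out_path, runner: %w[phantomjs rasterize.js])
  uri = URI.parse(url)
  unless %w[http https].include?(uri.scheme)
    raise ArgumentError, "only http(s) URLs are allowed: #{url}"
  end
  [*runner, uri.to_s, out_path]
end

# Runs the command without a shell: each argument is passed as-is, so
# characters like ; or $( ) in a user-submitted URL are not interpreted.
def take_screenshot(url, out_path, **opts)
  stdout, stderr, status = Open3.capture3(*screenshot_command(url, out_path, **opts))
  raise "screenshot command failed: #{stderr}" unless status.success?
  stdout
end
```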
I really like PhantomJS in a Rails app.
My suggestion is to use:
watir (https://github.com/watir/watir)
phantomjs (http://phantomjs.org/download.html)
You can take a screenshot very easily by following this: http://watir.github.io/docs/screenshots/
And if you want to use a page object, I think you should look at PageObject here: https://github.com/watir/watir/wiki/Page-Objects
I am building a pool of PhantomJS instances, and I am trying to make it so that each instance is autonomous (it fetches next job to be done).
My concern is to choose between these two:
Right now I have a Rails app that can give to PhantomJS which URL needs to be parsed next. So, I could do an HTTP get call from PhantomJS to my Rails app and Rails would respond with a URL that is pending to be done (most likely Rails would get that from a queue).
I am thinking of building a standalone Redis server that PhantomJS would access via Webdis, so Rails would push the jobs there, and PhantomJS instances would fetch from it directly.
I am trying to work out which would be the right decision in terms of performance: PhantomJS hitting the Rails server (so Rails needs to get the job from the queue and send it to PhantomJS), or just having PhantomJS access a Redis server directly.
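Whichever backend is chosen, the pull model itself is the same: each worker asks for its next job and stops when there is none. A Ruby stand-in for that loop, assuming threads in place of PhantomJS instances and a `Thread::Queue` in place of Redis or the Rails endpoint (the URLs are placeholders):

```ruby
# Jobs the Rails app would enqueue (placeholder URLs).
jobs = Queue.new
%w[https://example.com/a https://example.com/b https://example.com/c].each { |url| jobs << url }
jobs.close # once drained, pop returns nil instead of blocking

results = Queue.new
workers = Array.new(2) do |id|
  Thread.new do
    # Each worker pulls its own next job until none are left.
    while (url = jobs.pop)
      results << [id, url] # a real worker would render/parse the page here
    end
  end
end
workers.each(&:join)

processed = []
processed << results.pop until results.empty?
processed.each { |id, url| puts "worker #{id} handled #{url}" }
```

The performance question then reduces to what sits behind `jobs.pop`: one hop to Redis, or a hop to Rails plus Rails' own hop to the queue.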
Maybe I need more info, but why isn't the performance answer obvious? PhantomJS hitting the Redis server directly means fewer hops.
I'd consider developing whatever is easier to maintain. What's the ballpark req/minute? What sort of company (how funded / resource-strapped are you)?
There are also more out-of-the-box solutions, like IronMQ, that may ease the pain.
I have an edge case, although a very customer-visible one, where Tomcat begins processing requests before all dependencies are properly loaded for a Ruby on Rails stack running underneath JRuby.
Once Tomcat is restarted, something similar to the following happens:
undefined method `utc_offset' for nil:NilClass
[RAILS_ROOT]/gems/gems/activesupport-2.3.8/lib/active_support/values/time_zone.rb:206:in `<=>'
This happens when the following code is invoked on one of my services:
#timezones = ActiveSupport::TimeZone.all
If you wait a few more seconds and refresh the requesting page, it'll load no problem.
Is there a way to ensure that Tomcat does not start processing these requests until the entire stack, ActiveSupport, ActiveRecord etc is loaded? Has anyone experienced any similar symptoms?
This sounds like a possible bug in JRuby-Rack, assuming that's what you're using to run your Rails app in Tomcat. JRuby-Rack is supposed to load all of config/environment.rb before it processes requests, so I'm not sure how this could happen to you, but perhaps I've overlooked something. Could you report it, with more data (or, ideally, code or an app that reproduces the issue) about how you induced the error, at http://kenai.com/jira/browse/JRUBY_RACK or http://bugs.jruby.org?
I'm not sure whether Tomcat has something like that built in, but you can write a javax.servlet.Filter that intercepts all requests and denies them until your application is loaded. When the application is fully loaded, you tell the filter to stop denying requests. (This isn't a pure Ruby solution, though.)
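The same gating idea can be sketched one layer up as Rack middleware. This is a swapped-in technique, not the servlet Filter described above: it only helps if the middleware itself is loaded before the part of the stack that isn't ready, which a Filter in front of JRuby-Rack guarantees and a Rack middleware may not. `ReadinessGate` and `mark_ready!` are hypothetical names; you would call `mark_ready!` once your warm-up (for example, a first successful call to `ActiveSupport::TimeZone.all`) has completed.

```ruby
# Hypothetical readiness gate: answers 503 until mark_ready! is called.
class ReadinessGate
  def initialize(app)
    @app = app
    @ready = false
    @lock = Mutex.new # the flag is written by one thread and read by many
  end

  def mark_ready!
    @lock.synchronize { @ready = true }
  end

  def ready?
    @lock.synchronize { @ready }
  end

  # Standard Rack interface: env in, [status, headers, body] out.
  def call(env)
    return @app.call(env) if ready?

    [503, { "content-type" => "text/plain", "retry-after" => "5" },
     ["Starting up, please retry shortly\n"]]
  end
end
```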