Need automated Rails gem alternative to Mechanize for non-testing purposes - ruby-on-rails

I am writing a script in my Rails app that automates the completion of a web form using the form entries given on the client side. However, the target site uses JavaScript, so Mechanize is out of the question.
Everything I've read about Mechanize's alternatives (Watir WebDriver, Selenium, Capybara Webkit) seems to focus exclusively on testing. My Rails web app, however, would take in form entries from users and then enter them into another website using one of these tools. For example, I would need to upload an image (e.g. :image) and enter different text (e.g. :city) into form fields on that website.
So my first question: Can I use any Mechanize alternatives for something besides testing? And second: Can anyone point to code examples on the web for non-testing uses of any of the above automation tools?

I don't have any concrete examples of javascript-enabled alternatives used in non-testing contexts, but I do have a suggestion: if you know the website that you will be submitting the form info to, it's probably better to find out what the javascript is doing and mimic that instead. Dig into the site's javascript code and figure out what type of data is being submitted to what URL, and just mimic that using standard HTTP operations -- skip the javascript rendering/interaction part altogether.
There is a lot of overhead incurred when rendering a page with javascript, which is why these tools (Watir, Selenium, Capybara and the like) are not generally used in actual client-facing application contexts.
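As a rough sketch of the "mimic the request" idea, assuming the site's JavaScript ends up POSTing URL-encoded fields to a single endpoint: the URL http://example.com/listings and the field name city below are hypothetical stand-ins for whatever the site's own code actually submits. It uses only Ruby's standard Net::HTTP, and the request is built separately from being sent so you can inspect it first.

```ruby
require 'net/http'
require 'uri'

# Build (but do not send) a POST that imitates what the site's JavaScript submits.
# The endpoint and field names are hypothetical; read the real ones from the
# site's JavaScript or from the browser's network inspector.
def build_form_post(endpoint, fields)
  uri = URI.parse(endpoint)
  request = Net::HTTP::Post.new(uri)
  request.set_form_data(fields) # encodes as application/x-www-form-urlencoded
  request
end

# Send a prepared request and return the Net::HTTPResponse.
def send_request(request)
  uri = request.uri
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
end

request = build_form_post('http://example.com/listings', 'city' => 'Boston')
puts request.body # the encoded form fields
# response = send_request(request)  # uncomment to actually submit
```

For a file field such as :image, the request would need to be multipart instead; Net::HTTP::Post#set_form with 'multipart/form-data' is the standard-library route for that. Whether plain HTTP is enough depends on the site: if it relies on tokens or cookies set by JavaScript, you would have to reproduce those too.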

Watir has a headless gem (search for "watir headless"). You can give it a try.

You should be able to use watir-webdriver to take the data (image, city) from one site and upload it to the other site. Below is a brief code sample to help you get started.
require 'watir-webdriver'
# Read the value from the first site. :chrome opens a visible browser;
# PhantomJS (http://phantomjs.org/) can be used for headless runs.
$browser1 = Watir::Browser.new :chrome
$browser1.goto 'http://website1.com'
city_field = $browser1.text_field(:id => 'city')
city = city_field.value
# Enter the value into the same field on the second site.
$browser2 = Watir::Browser.new :chrome
$browser2.goto 'http://website2.com'
city_field_site2 = $browser2.text_field(:id => 'city')
city_field_site2.set city

Related

Several PhantomJS calls in a RoR application

I have a RoR application that, given a set of N URLs to parse, performs N shell calls to a PhantomJS (actually CasperJS) script.
Right now I have something like this:
urls_to_parse = ['first.html', 'second.html',...]
urls_to_parse.each do |url|
  parse_results = `casperjs parse_urls.js '#{url}'`
end
I have never launched shell scripts from a RoR/Ruby application before, so I am wondering whether this is a good approach and what alternatives I may have. So, why do I use PhantomJS in combination with RoR?
I basically have an API (RoR app) that keeps receiving urls that need to be parsed. They need to be parsed in a headless browser manner. The page actually needs to be rendered (that's why I don't use Nokogiri or any other HTML parser).
I am concerned about putting this into production performance-wise, and before going forward I would like to know whether I am doing this correctly or whether I can do it in a better way.
It's possible. I thought about doing the same thing, but even with a headless browser I would be really concerned about the speed and bandwidth your server is going to need. I use Casper in conjunction with Python and it works very well for me. I read the stdout sent back from the Casper scripts, but I don't parse and scrape on the fly like you're talking about doing. I would imagine it's okay, but ideally you would already have a cached database of results when people search. If it is a very basic search you may be okay, but I don't know.
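On the mechanics of the shell call itself, here is a small sketch using Open3 from the standard library instead of backticks. Passing the URL as a separate argument means it never goes through a shell, so quotes or other special characters in a URL cannot break the command. The method name parse_urls and the stand-in command are my own for illustration; in the question's setup the command would be ['casperjs', 'parse_urls.js'].

```ruby
require 'open3'
require 'rbconfig'

# Run an external command once per URL, passing the URL as a separate argument
# so it never goes through a shell (no quoting or injection problems).
# Returns a hash of url => stdout; raises if the command exits non-zero.
def parse_urls(command, urls)
  urls.each_with_object({}) do |url, results|
    stdout, stderr, status = Open3.capture3(*command, url)
    raise "#{command.first} failed for #{url}: #{stderr}" unless status.success?
    results[url] = stdout
  end
end

# Stand-in command so the sketch runs without CasperJS installed; in the
# question's setup this would be ['casperjs', 'parse_urls.js'].
echo_command = [RbConfig.ruby, '-e', 'puts "parsed #{ARGV[0]}"']
results = parse_urls(echo_command, ['first.html', 'second.html'])
puts results['first.html']
```

This only makes the call safer and surfaces failures; it is still sequential and still starts one process per URL, so the performance concern above is unchanged.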

Submitting dynamic forms on another website

I'm trying to submit input to the form, and parse the results in a RoR app. I've tried using mechanize, but it has some trouble with the way the page dynamically updates the results. It doesn't help that most fields are hidden.
Is there any way to get Mechanize to do what I'm looking for, or are there any alternatives to Mechanize that I can use?
Whenever I want to do something like this, I go with the selenium-webdriver gem. It spawns a real browser (all major browsers are supported) and lets you control it with Ruby code. You can do almost everything a real user could do. In addition, you have access to the rendered DOM, so JavaScript-generated content is not a problem.
Performance is much slower than with pure library clients, so it's not a good fit for use in a web request cycle.
http://rubygems.org/gems/selenium-webdriver
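To give an idea of what that looks like outside a test suite, here is a minimal sketch. The helper name fill_and_submit, the URL, and the element ids (city, image, submit) are all hypothetical. The helper only relies on the driver responding to get and find_element, which is how a Selenium::WebDriver driver behaves; the commented usage shows it with a real browser.

```ruby
# Fill in fields by element id, then click a submit button.
# `driver` is expected to behave like a Selenium::WebDriver driver: it must
# respond to #get and #find_element. The ids passed in are hypothetical.
def fill_and_submit(driver, url, fields, submit_id)
  driver.get(url)
  fields.each do |element_id, value|
    # For <input type="file">, sending the file's path selects it for upload.
    driver.find_element(id: element_id).send_keys(value)
  end
  driver.find_element(id: submit_id).click
end

# Usage with a real browser (requires the selenium-webdriver gem and a driver
# binary such as chromedriver):
#
#   require 'selenium-webdriver'
#   driver = Selenium::WebDriver.for :chrome
#   begin
#     fill_and_submit(driver, 'http://example.com/form',
#                     { 'city' => 'Boston', 'image' => '/tmp/photo.png' }, 'submit')
#     puts driver.page_source   # rendered DOM after JavaScript has run
#   ensure
#     driver.quit
#   end
```

Given the performance note above, something like this is probably better run from a background job than inside the request that receives the user's form.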

How to use Cucumber to test non-Ruby, non-Rack API's

I use cucumber for lots of things. I really like it as a BDD environment.
So I'd like to use it as an external tool to test an API. I'd like to do things like:
Scenario: Hit api /info path and get info back
When I visit the API path '/info'
Then I should see the following text "Here's info on the API"
or something similar. I mainly want to treat the API as a black box and test only inputs and outputs. I don't plan on inspecting anything inside the API.
Most of the libraries I've looked at that work with Cucumber (for example Capybara) seem to be designed around Rack-based applications. I'd like something similar to that but with no dependency on Rack.
What gems, if any, exist that have no Rack dependencies? Or is there a way to use Capybara to test an API that's on a remote server?
I wouldn't use Capybara to test a remote API, because Capybara is made for testing applications with an HTML UI (as Aslak points out in the comments).
Instead, I would use Cucumber* in conjunction with something like HTTParty which would be the tool used to make the HTTP requests and parse them neatly. Here's an idea:
require 'httparty'
When /^I visit the API path '(.*?)'/ do |path|
  # path already starts with '/', e.g. '/info'
  @result = HTTParty.get("http://theapi.com#{path}")
end
Then /^I should see the following result:$/ do |result|
  @result.should == result
end
The final step here you would use like this:
Then I should see the following result:
"""
{ success: true }
"""
* I would actually use RSpec personally, I find the syntax less clumsy.
I've been using cucumber against a Drupal application for some time now. It's working out well.
This helped me set up capybara with selenium
https://github.com/thuss/standalone-cucumber
If you want to use mechanize, it's a bit buggy. I had to use 0.3.0-rc3 as there were some issues following redirects etc. There are still a few issues with submitting forms with field names containing "[]" characters. I can't quite remember as another person on my team discovered that bug.

sample Rails Application that includes email support page with captcha

What's the quickest / easiest starting point for a simple Rails application that has a main page, and an email "contact us" page, with captcha support? Is there a popular base Rails app that I could download that would already have this functionality as a starting point?
(e.g. for just a basic informational web site, but with the ability for the user to send requests to support via a web page with captcha)
thanks
IMHO you shouldn't use Rails, or any other framework, for a task like that. For a simple contact form you could put a standalone PHP page plus some static HTML pages on your server and you're done.
If you don't know Rails yet (or any other web framework written in any language), it would be a pain to set up such a structure only to display a contact form. It's like taking a gun to kill a fly.
BTW, to come to your question: I don't know of any project that does what you're asking for. You may want to build it yourself; it's pretty simple. What you need is ActionMailer and a captcha plugin.
Just my two cents.

How do I get content from a website using Ruby / Rails?

I want to copy some specific content from a website using ruby/rails.
The content I need is inside a marquee html tag, divided by divs.
How can I get access to this content using ruby?
To be more precise - I want to use some kind of ruby gui (Preferably shoes).
How do I do it?
This isn't really a Rails question. It's something you'd do using Ruby, then possibly display using Rails, or Sinatra or Padrino - pick your poison.
There are several different HTTP clients you can use:
Open-URI comes with Ruby and is the easiest. Net::HTTP comes with Ruby and is the standard toolbox, but it's lower-level so you'd have to do more work. HTTPClient and Typhoeus+Hydra are capable of threading and have both high-level and low-level interfaces.
I recommend using Nokogiri to parse the returned HTML. It's very full-featured and robust.
require 'nokogiri'
require 'open-uri'
# URI.open is the open-uri entry point; a bare open(url) no longer fetches URLs on Ruby 3.0+.
doc = Nokogiri::HTML(URI.open('http://www.example.com'))
puts doc.to_html
If you need to navigate through login screens or fill in forms before you get to the page you need to parse, then I'd recommend looking at Mechanize. It relies on Nokogiri internally so you can ask it for a Nokogiri document and parse away once Mechanize retrieves the desired URL.
If you need to deal with Dynamic HTML, then look into the various WATIR tools. They drive various web browsers then let you access the content as seen by the browser.
Once you have the content or data you want, you can "repurpose" it into text inside a Rails page.
If I understand correctly, you want a GUI interface to a website scraper. If that's so, you might have to build one yourself.
The easiest way to scrape a website is with the Nokogiri or Mechanize gems. Basically, you fetch the page (Mechanize does this itself; Nokogiri needs an HTTP client such as open-uri) and then use XPath to select the text out of the DOM.
https://github.com/sparklemotion/nokogiri
https://github.com/sparklemotion/mechanize (for the documentation)

Resources