Save screenshot with Watir - ruby-on-rails

I am using Watir with Ruby on Rails.
I need to save screenshots of a couple of pages using Watir. I have managed to open the page I want in a browser, but I cannot save the screenshot yet. Here's my code:
@browser =
folios_screenshot_path = Rails.root.join('screenshots/')
@page = Page.find(5)
cur_url = root_url + 'pages/' +
@browser.goto cur_url
@browser.div(:id => "page").wait_until_present
@browser.driver.save_screenshot(pagess_screenshot_path + '/' + + '.png')
In the page that I load, there's a div element with id 'page', and I am trying to make Watir wait till that element is loaded in the Watir browser. But in my main browser, I get the error Unable to load page within 10 seconds, and the screenshot doesn't get saved either. Any idea on what's wrong?

There are several Watir gems: watir (drives IE on Windows), safariwatir (drives Safari on Mac), and watir-webdriver (drives all popular browsers except Safari on all popular operating systems).
You are using the safariwatir gem, but you are trying to save the screenshot with watir-webdriver's driver.save_screenshot. I would suggest taking the screenshot with Firefox instead.
Just install the watir-webdriver gem and change
@browser =
to
@browser = Watir::Browser.new :ff
For more information, read the free version of my Watir book.
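Separately from the choice of driver, the path expression in the question's last line is worth a second look. Rails.root.join returns a Pathname, and Pathname#+ treats an argument that starts with '/' as an absolute path, so adding '/' discards the screenshots directory. Below is a minimal sketch using only Ruby's standard library; the directory, the page id 5 and the helper name screenshot_file are assumptions made for illustration, not part of the original code.

```ruby
require 'pathname'

# Hypothetical helper: builds "<dir>/<name>.png" without adding '/' by hand.
def screenshot_file(dir, name)
  Pathname.new(dir).join("#{name}.png").to_s
end

# Stand-in for Rails.root.join('screenshots/'); the location is made up.
dir = Pathname.new('/tmp/app/screenshots/')

# Mirrors the pattern in the question: the '/' resets the path to the root,
# so the result is no longer under the screenshots directory.
broken = (dir + '/' + '5' + '.png').to_s
puts broken

# Joining a plain file name keeps the directory.
puts screenshot_file(dir, 5)
```

The resulting string could then be passed to save_screenshot once the browser object comes from watir-webdriver.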

Try the following browser class; it works for me.


How to use Watir::Browser to show dynamic site content for Nokogiri to scrape

I created a scraper that finds jobs on various career sites.
On about 80% of the sites it works but I have a hard time making it work on the rest of the pages.
I think the reason is that some of the pages use JavaScript to generate dynamic content, which makes the scraper fail. So I tried Watir as well as Mechanize, but it still does not work. is an example URL. Can anyone scrape it?
Here is my Watir scraper:
def watirscraper
  require 'nokogiri'
  require 'watir'
  puts "starting newscraper"
  opts = {
    headless: true
  }
  # if (chrome_bin = ENV.fetch('GOOGLE_CHROME_SHIM', nil))
  #   opts.merge!(options: {binary: chrome_bin})
  # end
  browser = Watir::Browser.new :chrome, opts
  browser.goto self.career_url
  company = self
  job_url = self.career_url
  # Parse whatever HTML the browser holds right now (no waiting yet)
  html_doc = Nokogiri::HTML.parse(browser.html)
  # Innermost elements whose text contains 'Developer'
  jobtitle = html_doc.css(":contains('Developer'):not(:has(:contains('Developer')))").map(&:text)
  puts jobtitle
end
You'll need to wait for the page to stabilize before you can pull the content. Many client-side applications need at least a few seconds to boot up, some more.
One way to refactor this:
def wait_for_content(browser, selector)
  loop do
    html_doc = Nokogiri::HTML.parse(browser.html)
    return if html_doc.css(selector).first
    # May want to have a limit here so it doesn't spin forever
    sleep 1
  end
end
Where you can call it like:
wait_for_content(browser, ":contains('Developer'):not(:has(:contains('Developer')))")
jobtitle = ...
Or something along those lines.
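As the comment about a limit hints, an unbounded polling loop can hang if the content never appears. Here is a rough sketch of a bounded variant using only the standard library. The names wait_until and WaitTimeout, and the default timeout and interval, are made up for this example and are not part of Watir or Nokogiri.

```ruby
# Hypothetical error raised when the condition never becomes true in time.
class WaitTimeout < StandardError; end

# Calls the block repeatedly until it returns a truthy value, which is then
# returned. Raises WaitTimeout once +timeout+ seconds have passed.
def wait_until(timeout: 10, interval: 0.5)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise WaitTimeout, "condition not met within #{timeout} seconds"
    end
    sleep interval
  end
end

# With the scraper above it might be used like this (not run here, since it
# needs a live browser):
#   wait_until(timeout: 15) do
#     Nokogiri::HTML.parse(browser.html).css(selector).first
#   end
```

Returning the block's value lets the caller reuse the matched node instead of parsing the page a second time.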
First of all, you are using a standalone Nokogiri statement, Nokogiri::HTML.parse(browser.html), inside Watir code. When you parse the HTML this way, you can't call methods on Watir elements.
All you have to do here is install the watigiri gem, which is an add-on for Watir. Once you have installed it, you can call the method text! on an element object, which uses Nokogiri internally. But this method doesn't wait for the page to be loaded completely.
If the page is still loading while you are scraping it, you have to use text on the element instead.
Watir uses Nokogiri when you write:
b.element(name: "something").text!
Watir uses Selenium when you write:
b.element(name: "something").text
For more info see Watigiri.

How to scrape images from eBay and Amazon using XPath in Nokogiri from JSON

I'm trying to scrape images from websites using Nokogiri and XPath, so far with limited success. For a typical website whose HTML has img and src, I can use:
tmp2 = Nokogiri::HTML(open(site_url))
tmp2.xpath("//img/@src").each do |src| whatever
However, some sites like Amazon and eBay only trigger certain images with JavaScript. If I look at the code I can see the data in arrays. For example, from Amazon:
<script type="text/javascript">
P.when('jQuery', 'cf').execute(function($, cf){
P.when('A', 'jQuery', 'ImageBlockATF', 'cf').register('ImageBlockBTF', function(A, $, imageBlockATF, cf){
var data = {"indexToColor":[],"burjImageBlock":0,"isSwatchHoverConsistent":1,"heroFocalPoint":null,"visualDimensions":["color_name"],"productGroupID":"apparel_display_on_website","newVideoMissing":0,"useIV":0,"useClickZoom":null,"useChildVideos":0,"numColors":7,"logMetrics":0,"defaultColor":"initial","airyConfig":{"enableContinuousPlay":null,"installFlashButtonText":"Install Flash Player","contentTitle":null,"autoplayCutOffTimeSeconds":null,"ageGate":{"monthNames":["January","February","March","April","May","June","July","August","September","October","November","December"],"deniedPrompt":"We're sorry. You are not old enough to watch this video.","submitText":"Submit","prompt":"This video is not intended for all audiences. What date were you born?"},"videoAds":null,"videoUnsupportedPrompt":"Sorry, this video is unsupported on this browser.","desiredMode":null,"swfUrl":"","isAutoplayEnabled":null,"installFlashPrompt":"Adobe Flash Player is required to watch this video.","isLiveStream":null,"regionCode":"NA","contentId":null,"playbackErrorPrompt":"Sorry, an error has occurred while attempting video playback. 
Please try again later.","contentMinAge":null,"isForesterTrackingDisabled":null,"streamingUrls":null,"parentId":null,"foresterMetadataParams":{"client":"Dpx","requestId":"1MX7VHFRVAS6TWY64BXC","marketplaceId":"ATVPDKIKX0DER","session":"182-9511970-7757812","method":"Apparel.ImageBlock"},"jsUrl":""},"mainImageMaxSizes":null,"staticStrings":{"playVideo":"Click to play video","rollOverToZoom":"Roll over image to zoom in","images":"Images","video":"video","clickToZoom":"Click on image to zoom in","touchToZoom":"Touch the image to zoom in","videos":"Videos","close":"Close","pleaseSelect":"Please select","clickToExpand":"Click to open expanded view","allMedia":"All Media"},"notThumbnailClickImmersiveView":1,"gIsNewTwister":1,"title":"Threads 4 Thought Women's Tabitha Basic Tank Top","ivRepresentativeAsin":{"6":"B00T46V76W","4":"B00WM3O7ES","1":"B00T46YZES","3":"B00WM3NLPE","2":"B00T46VD16","5":"B00T46VGXQ"},"mainImageSizes":[[342,445],[385,500],[425,550],[466,606],[522,679]],"isQuickview":0,"ipadVideoSizes":[[340,444],[384,500]],"colorToAsin":{"Coral Dreams":{"asin":"B00T46V76W"},"Heather Grey":{"asin":"B00WM3NLPE"},"Black":{"asin":"B00T46YZES"},"White":{"asin":"B00T46VGXQ"},"Deep Blue Sea":{"asin":"B00T46VD16"},"Sea Glass":{"asin":"B00WM3O7ES"}},"thumbExperimentEnabledValue":1,"showLITBOnClick":0,"videoSizes":[[342,445],[384,500]],"stretchyGoodnessWidth":[1280,1440,1640,1800],"autoplayVideo":0,"hoverZoomIndicator":"","sitbReftag":"","useHoverZoom":1,"staticImages":{"zoomOut":"","hoverZoomIcon":"","zoomIn":"","zoomLensBackground":"","videoThumbIcon":",0,0,38,50_.gif","spinner":"","zoomInCur":"","videoSWFPath":"","arrow":"","zoomOutCur":""},"videos":[],"gPreferChildVideos":0,"altsOnLeft":1,"ivImageSetKeys":{"Coral Dreams":"6","Heather Grey":"3","Black":"1","initial":0,"White":"5","Deep Blue Sea":"2","Sea 
Glass":"4"},"useHoverZoomIpad":"","isUDP":1,"alwaysIncludeVideo":0,"widths":[1280,1440,1640,1800],"maxAlts":7,"useChromelessVideoPlayer":1,"mainImageHeightPartitions":null};
data["customerImages"] = eval('[]');
data["colorImages"] = {"Coral Dreams":[{"large":"","variant":"MAIN","hiRes":"","thumb":",50_.jpg","main":{"":["466","606"],"":["522","679"],"":["423","550"],"":["342","445"],"":["385","500"]}},{"large":"","variant":"BACK","hiRes":"","thumb":",50_.jpg","main":{"":["385","500"],"":["522","679"],"":["342","445"],"":["466","606"],"":["423","550"]}}],"Heather Grey":[{"large":"","variant":"MAIN","hiRes":"","thumb":",50_.jpg","main":{"":["466","606"],"":["385","500"],"":["423","550"],"":["522","679"],"":["342","445"]}},{"large":"","variant":"BACK","hiRes":"","thumb":",50_.jpg","main":{"":["342","445"],"":["423","550"],"":["385","500"],"":["522","679"],"":["466","606"]}}],"Black":[{"large":"","variant":"MAIN","hiRes":"","thumb":",50_.jpg","main":{"":["423","550"],"":["342","445"],"":["522","679"],"":["385","500"],"":["466","606"]}},{"large":"","variant":"BACK","hiRes":"","thumb":",50_.jpg","main":{"":["385","500"],"":["522","679"],"":["342","445"],"":["466","606"],"":["423","550"]}}],"White":[{"large":"","variant":"MAIN","hiRes":"","thumb":",50_.jpg","main":{"":["423","550"],"":["522","679"],"":["385","500"],"":["342","445"],"":["466","606"]}},{"large":"","variant":"BACK","hiRes":"","thumb":",50_.jpg","main":{"":["466","606"],"":["342","445"],"":["522","679"],"":["385","500"],"":["423","550"]}}],"Deep Blue Sea":[{"large":"","variant":"MAIN","hiRes":"","thumb":",50_.jpg","main":{"":["342","445"],"":["522","679"],"":["423","550"],"":["385","500"],"":["466","606"]}},{"large":"","variant":"BACK","hiRes":"","thumb":",50_.jpg","main":{"":["342","445"],"":["385","500"],"":["522","679"],"":["466","606"],"":["423","550"]}}],"Sea Glass":[{"large":"","variant":"MAIN","hiRes":"","thumb":",50_.jpg","main":{"":["342","445"],"":["522","679"],"":["466","606"],"":["385","500"],"":["423","550"]}},{"large":"","variant":"BACK","hiRes":"","thumb":",50_.jpg","main":{"":["385","500"],"":["342","445"],"":["522","679"],"":["466","606"],"":["423","550"]}}]};
data["heroImage"] = {};
data["landingAsinColor"] = 'Coral Dreams';
data["shouldApplyResizeFix"] = false;
return data;
The filenames I want to grab are not in src attributes. In this case, they are in the array called data["colorImages"]. But I can't hard-code anything because the same thing happens on eBay.
The filenames I need here are in enImgCarousel.
On a side note, when I use the following JavaScript bookmarklet for each URL to get images, I'm able to get the correct images:
var a = '';
for (var b = 0; b < document.images.length; b++) {
  a += '<img src=' + document.images[b].src + '><br>';
}
if (a != '') { document.write(a); } else { alert('No images!'); }
Back to Nokogiri and XPath, I've also tried:
tmp2.xpath("//img").each do |src|...
tmp2.xpath("html//img").each do |src|
Any ideas how I should do this or which direction to go in?
Here is an alternative way to get what you want: use Capybara with Poltergeist.
With this approach you should not have to dig into the JavaScript yourself.
If you do a lot of scraping, I recommend considering Capybara with Poltergeist; there are many references available.
This is the code I tried:
require 'capybara'
require 'capybara/dsl'
require 'capybara/poltergeist'
require 'nokogiri'
include Capybara::DSL
Capybara.register_driver :poltergeist_debug do |app|
  Capybara::Poltergeist::Driver.new(app, inspector: true)
end
Capybara.javascript_driver = :poltergeist_debug
Capybara.current_driver = :poltergeist_debug
# Amazon case: visit the product page first so that page.html holds the rendered DOM
doc_amazon = Nokogiri::HTML.parse(page.html)
doc_amazon.xpath("//img/@src").each do |src|
  p src.value
end
# eBay case: visit the item page first, then parse the same way
doc_ebay = Nokogiri::HTML.parse(page.html)
doc_ebay.xpath("//img/@src").each do |src|
  p src.value
end
Are you trying to generate a database of competitors items with pricing, etc.?
Are you trying to grab entire categories or individual sellers?
The reason why I ask is you can get an RSS feed of items each seller lists if they have turned that feature on. This way, you do not have to waste time scraping a page when you can get the central data from an RSS feed.
When parsing webpages, depending upon where you are in the webpage (you mentioned carousel) the indices you are encountering are from the stash of thumbnails representing the larger images.
I recommend looking at the eBay API and the Amazon API and finding the RSS feeds for the sellers first.
As for getting past the JavaScript issues: the page loads its rotating slideshows and carousels dynamically, so you will need a tool that drives a real browser, such as Selenium, to get fully rendered pages in which all images are in a scrapable state. Mechanize (as RAJ suggested above) and Beautiful Soup parse HTML but do not execute JavaScript themselves.
Feel free to post your source if there is anything else I can help with.
Sorry, I am posting this answer from a mobile phone, so I can't write the full code right away, but I can point you in a direction. You should use Mechanize together with selenium-webdriver and Watir instead of only Nokogiri.
With a driven browser you will be able to handle elements generated by JavaScript. You can simulate the actual actions in the browser, i.e. you can script clicks on links and buttons, wait for an image to load and then scrape it. All of this can be done quite easily with these tools.
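Since the question shows the image data sitting in an inline script as data["colorImages"] = {...};, another option that needs no browser is to pull that assignment out of the raw HTML and parse it as JSON. What follows is only a sketch under assumptions: the sample HTML is a shortened, made-up stand-in with example.com URLs, the helper name color_images is hypothetical, and the regular expression is naive (it stops at the first "};"), so real pages may need something sturdier.

```ruby
require 'json'

# Made-up, shortened stand-in for the inline script shown in the question.
SAMPLE_HTML = <<~'PAGE'
  <script type="text/javascript">
  data["customerImages"] = eval('[]');
  data["colorImages"] = {"Black":[{"large":"https://example.com/black-main.jpg","variant":"MAIN"},{"large":"https://example.com/black-back.jpg","variant":"BACK"}]};
  data["heroImage"] = {};
  </script>
PAGE

# Hypothetical helper: returns the parsed colorImages hash, or {} if the
# assignment is not found in the given HTML.
def color_images(html)
  json = html[/data\["colorImages"\]\s*=\s*(\{.*?\});/m, 1]
  json ? JSON.parse(json) : {}
end

# Collect the "large" URL of every image of every color.
urls = color_images(SAMPLE_HTML).values.flatten.map { |image| image['large'] }
puts urls
```

The same idea would apply to eBay only if its data is also embedded as valid JSON; that is an assumption to verify against the actual page source.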

Having Difficulty Using Nokogiri to Pull <li> Element

I'm trying to develop a scraper to pull in content from NewEgg. I installed Nokogiri in my Ruby on Rails app and, as far as I can tell, it's working. However, I'm having difficulty pulling in the specific element that holds the pricing information, and I'm not sure why. The code below should look for list items with the class "price-current " and print every instance. Instead, I get no results.
require 'rubygems'
require 'open-uri'
require 'nokogiri'
page = Nokogiri::HTML(open(""))
page.xpath('//li[@class="price-current "]').each do |item|
  puts item
end
I've been tearing my hair out for the last two hours trying to figure this out with no success. Any insight would be much appreciated!
EDIT: So, @MarkReed was right: the information I'm looking for is generated by JS. Looking through the code, a lot of the detail appears to be in a hash. Is it possible to use a regex with Nokogiri to pull that information out?
var utag_data = {
page_breadcrumb:'Home > Computer Hardware > Memory > Desktop Memory > Team Group > Item#:N82E16820313436',
page_tab_name:'Computer Hardware',
product_subcategory_name:['Desktop Memory'],
product_title:['Team Zeus Yellow 8GB (2 x 4GB) 240-Pin DDR3 SDRAM DDR3 1600 (PC3 12800) Desktop Memory Model TZYD38G1600HC9DC01'],
product_manufacture:['Team Group'],
search_scope:jQuery('#haQuickSearchStore option:selected').text(),
You appear to be searching for DOM elements which are dynamically added by Javascript in the browser after the page loads. They do not exist in the HTML originally fetched from the URL, and so are not accessible to Nokogiri.
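On the follow-up in the EDIT: the utag_data literal is plain text inside a script element, so once you have that text (for example from Nokogiri's script nodes), an ordinary Ruby regular expression can pick fields out of it. The sketch below uses only the standard library. The sample text is copied from the excerpt in the question with a closing brace added, and the helper name utag_value is made up. It also assumes values are single-quoted and contain no apostrophes.

```ruby
# Sample text based on the utag_data excerpt in the question.
SCRIPT_TEXT = <<~JS
  var utag_data = {
  page_tab_name:'Computer Hardware',
  product_subcategory_name:['Desktop Memory'],
  product_manufacture:['Team Group'],
  };
JS

# Hypothetical helper: returns the first single-quoted value stored under
# +key+, whether it is written as key:'value' or key:['value'].
# Returns nil when the key is not present.
def utag_value(script_text, key)
  match = script_text.match(/\b#{Regexp.escape(key)}:\s*\[?'([^']*)'/)
  match && match[1]
end

puts utag_value(SCRIPT_TEXT, 'product_manufacture')
puts utag_value(SCRIPT_TEXT, 'page_tab_name')
```

This only helps for data that is already present in the fetched HTML; anything the browser adds later still needs a tool that runs JavaScript.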

Feedzirra is not updating the article content

I'm using Feedzirra to get some rss content. This content can be updated by a third person. In my controller I have the following code:
my_feed = Feedzirra::Feed.fetch_and_parse RSS_FEED
@my_feed_text = my_feed.entries.first.sanitize! if ( my_feed && my_feed != 0 )
Everything works fine until someone updates the content of the only article in the my_category category; then I keep getting the old content again and again. If I put the RSS_FEED URL in a browser I get the new content, but in my application (I also tried the Rails console) I keep getting the old content.
Any hint?
Apparently the problem was with how often WordPress refreshes its RSS feed. Today it is working fine; a change becomes visible within 5-10 minutes.

I can't get rails plugin wicked_pdf to work

I wanted to create PDFs for my rails application using wkhtml2pdf and wicked_pdf.
I downloaded and extracted wkhtml2pdf beta 4 and placed it in /usr/local/bin/wkhtml2pdf
I tried running it on a web site and it gave a nice result.
In my rails application (2.3.4) I installed wicked_pdf:
script/plugin install git://
script/generate wicked_pdf
Everything seemed to be ok.
Inside script/console I ran the following (with the output shown after each =>):
wp = WickedPdf.new
=> #<WickedPdf:0xb62f2c70 @exe_path="/usr/local/bin/wkhtmltopdf">
HTML_DOCUMENT = "<html><body>Hello World</body></html>"
=> "<html><body>Hello World</body></html>"
pdf = wp.pdf_from_string HTML_DOCUMENT
=> "/usr/local/bin/wkhtmltopdf - - -q"
=> "\n\n\n\n\n\n\n\n\n\n"
Of course this isn't good. According to the test, the result of my last command should start with "%PDF-1.4".
Any idea what I can do?
Having the same problem. Removed the -q option from the wicked_pdf.rb file on line 19 and then was able to get the proper string on the console.
=> "%PDF-1.4\n1 0 obj\n<<\n/Title ...
This also seems to have solved other problems. The PDF still didn't render correctly when using it from the web site - embedded font issue - on to the next issue now.
Hopefully this will work for you.
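When trying this from the console, a quick way to tell good output from the blank result in the question is to check for the PDF header. This is a small sketch with a made-up helper name, pdf_output?; it only checks the leading "%PDF-" marker and does not validate the rest of the file.

```ruby
# Hypothetical helper: true when the data starts with the PDF header marker.
def pdf_output?(data)
  data.is_a?(String) && data.start_with?('%PDF-')
end

puts pdf_output?("%PDF-1.4\n1 0 obj\n<<\n/Title ...")  # output like the one in this answer
puts pdf_output?("\n\n\n\n\n\n\n\n\n\n")               # blank output like the one in the question
```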
