Rails + Amazon API + Integration - ruby-on-rails

I'm a rails newb - in a little over my head and could use some help.
I have an existing Rails app, and I'm trying to integrate the Amazon Product Advertising API using the "ruby-aaws" gem, i.e., place items inside a model, show them in the view, etc.
I've never worked with an external API before, so I'm not sure where to start with the integration. Any help at all is much appreciated!
Here is some of the code that I've used to pull data with the API:
require 'amazon/aws'
require 'amazon/aws/search'
include Amazon::AWS
include Amazon::AWS::Search
is = ItemSearch.new( 'Watches', { 'Keywords' => 'Gucci' } )
rg = ResponseGroup.new( 'Large' )
req = Request.new
req.locale = 'us'
resp = req.search( is, rg )
items = resp.item_search_response[0].items[0].item
# Available properties for first item:
#
puts items[0].properties
items.each do |item|
  attribs = item.item_attributes[0]
  puts attribs.label
  if attribs.list_price
    puts attribs.title, attribs.list_price[0].formatted_price, item.medium_image, ''
  end
end

I am also a newbie and trying to do something similar. I found this example on GitHub that looks really promising.
https://github.com/hundredwatt/Amazon-Product-Search-Example
But there are also some great related questions here that have answers for you:
Ruby Amazon book search
Good luck!
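One way to get from the script in the question to "items inside a model, shown in the view" is to move the API call into a plain Ruby class that returns simple objects, and let the controller hand those objects to the view. The sketch below assumes that shape; ProductSearch, Product and FakeFetcher are hypothetical names, and the fetcher stands in for the ruby-aaws calls (ItemSearch, Request#search, ...) so the class can be tried without an API key or network access.

```ruby
# Plain value object the view can use without knowing anything about the API.
Product = Struct.new(:title, :price, :image_url)

# Hypothetical search model. `fetcher` is any object with #search(keywords)
# returning an array of hashes; in a real app it would wrap the ruby-aaws code.
class ProductSearch
  def initialize(fetcher)
    @fetcher = fetcher
  end

  # Mirrors the question's loop: items without a list price are skipped.
  def call(keywords)
    @fetcher.search(keywords)
            .map { |raw| Product.new(raw[:title], raw[:price], raw[:image_url]) }
            .reject { |product| product.price.nil? }
  end
end

# Stand-in fetcher with made-up data, so the class runs offline.
class FakeFetcher
  def search(_keywords)
    [
      { title: 'Watch A', price: '$650.00', image_url: 'a.jpg' },
      { title: 'Watch B', price: nil, image_url: 'b.jpg' }
    ]
  end
end

products = ProductSearch.new(FakeFetcher.new).call('Gucci')
products.each { |product| puts "#{product.title}: #{product.price}" }
```

In a Rails app the class could live in, for example, app/models/product_search.rb. A controller action would then do something like @products = ProductSearch.new(AmazonFetcher.new).call(params[:keywords]), where AmazonFetcher is a hypothetical wrapper around the ItemSearch/Request code, and the view simply loops over @products. Keeping the API call out of the view also makes it easier to cache or test.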

Related

Is there a way to parse external RSS Feeds with Jekyll?

I have several websites and would like to show their content (like headlines) via RSS in a Jekyll project. Is it possible to parse external RSS feeds with Jekyll and use them?
Yes. You'd either want to create a plugin to fetch and parse the external feeds during jekyll build or, plan B, you could always fetch and parse the feeds client-side with AJAX. Since you asked for a Jekyll answer, here's a rough approximation of the former approach:
# Runs during jekyll build (e.g. saved under _plugins/)
class RssFeedCollector < Jekyll::Generator
  safe true
  priority :high
  def generate(site)
    # TODO: Insert code here to fetch RSS feeds into an array of hashes
    # with :title, :content and :guid keys
    rss_item_coll = []
    # Create a new on-the-fly Jekyll collection called "external_feed"
    jekyll_coll = Jekyll::Collection.new(site, 'external_feed')
    site.collections['external_feed'] = jekyll_coll
    # Add fake virtual documents to the collection
    rss_item_coll.each do |item|
      title = item[:title]
      content = item[:content]
      guid = item[:guid]
      path = site.in_source_dir("_rss/#{guid}.md")
      doc = Jekyll::Document.new(path, { :site => site, :collection => jekyll_coll })
      doc.data['title'] = title
      doc.data['feed_content'] = content
      jekyll_coll.docs << doc
    end
  end
end
You can then access the collection in your template like so:
{% for item in site.collections['external_feed'].docs %}
<h2>{{ item.title }}</h2>
<p>{{ item.feed_content }}</p>
{% endfor %}
There are a lot of possible variations on the theme but that's the idea.
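To fill in the TODO in the generator, the feed items need to end up as an array of hashes with :title, :content and :guid keys. Here is a minimal sketch that does this with REXML, assuming plain RSS 2.0 feeds (Atom feeds use different element names). The rss_items and child_text helpers are hypothetical names, and fetching the XML is left out so the parsing can be tested on its own. REXML became a bundled gem in Ruby 3.0, so under Bundler it may need a Gemfile entry.

```ruby
require 'rexml/document'

# Text of the first child element called `name`, or '' when missing or empty.
def child_text(element, name)
  child = element.elements[name]
  child ? child.text.to_s : ''
end

# Convert an RSS 2.0 document (given as a string) into the hashes the
# generator iterates over. Items without a <guid> fall back to their <link>.
def rss_items(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//channel/item').map do |item|
    guid = child_text(item, 'guid')
    guid = child_text(item, 'link') if guid.empty?
    {
      title: child_text(item, 'title'),
      content: child_text(item, 'description'),
      guid: guid
    }
  end
end
```

Inside generate you could then use something like rss_item_coll = rss_items(Net::HTTP.get(URI(feed_url))) for each feed (feed_url being whichever feed you want). Note that guid values are often full URLs, so sanitize them before using them in the _rss/... path.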
Well, I don't think Jekyll per se can do that, because Jekyll is more of a static site generator than a general-purpose tool. However, Jekyll is written in Ruby, and I believe you can easily run Ruby/Rake tasks alongside it, so you should probably do the fetching in a Ruby script.

In need of an explanation of Web scraping with Nokogiri in Rails

I am utterly confused and lost with Nokogiri and web scraping in Rails. I need someone to explain to me how I can get article titles from a website to list in a view in my Rails application. I can retrieve the data in irb; however, I have no clue how to get that same data displayed in a view I made.
I have watched a number of tutorials and read documentation, and the thing that confuses me most is this: when they require nokogiri or open-uri in their example Ruby file, what directory is that Ruby file supposed to be placed in? Also, is that file associated with any controller so that it can be displayed in the particular view I made?
I hope I am explaining my issue as clearly as possible, without confusing myself any more than I already am.
What I am trying to do is create an application where users can register and sign in. After signing in, they are redirected to a page with 3 links: Audi, BMW and Mercedes-Benz. Depending on which link is clicked, the user is then directed to another page that shows a list of articles mentioning their choice.
I hope this explanation was helpful and I really hope someone can offer to help or give me some kind of documentation that will benefit me.
Thank you!
This is what I did in irb:
2.1.1 :001 > require 'rubygems'
=> false
2.1.1 :002 > require 'nokogiri'
=> true
2.1.1 :003 > require 'open-uri'
=> true
2.1.1 :004 > page = Nokogiri::HTML(open("http://www.dtm.com/de/News/Archiv/index.html"))
I then got this returned:
=> #<Nokogiri::HTML::Document:0x814e3b40 name="document" children=[#<Nokogiri::XML::DTD:0x814e37f8 name="HTML">, #<Nokogiri::XML::Element:0x814e358c name="html" children=[#<Nokogiri::XML::Text:0x814e3384 "\r\n">, #<Nokogiri::XML::Element:0x814e32d0 name="head" children=[#<Nokogiri::XML::Text:0x814e30f0 "\r\n">, #<Nokogiri::XML::Element:0x814e3028 name="title" children=[#<Nokogiri::XML::Text:0x814e2e48 "DTM | Newsarchiv">]>, #<Nokogiri::XML::Text:0x814e2c90 "\r\n">, #<Nokogiri::XML::Element:0x814e2bc8 name="meta" attributes=[#<Nokogiri::XML::Attr:0x814e2b64 name="charset" value="utf-8">]>, #<Nokogiri::XML::Text:0x814e2718 "\r\n">, #<Nokogiri::XML::Element:0x814e2664 name="meta" ...
(I got more but just put up a few lines of what was returned) I am assuming this is the raw data from the page.
I then put:
2.1.1 :008 > puts page
This returned the raw HTML content.
Finally I entered:
2.1.1 :014 > page.css("a")
This returned all the links on the page.
I am hoping to help you with a real-world example. Let's get some data from Reuters.
In your console try this:
# require your tools (install the gem first with: gem install nokogiri)
pry(main)> require 'nokogiri'
pry(main)> require 'open-uri'
# set the url
pry(main)> url = "http://www.reuters.com/finance/stocks/overview?symbol=0005.HK"
# load the page and assign it to a variable (URI.open needs Ruby 2.5+; in Ruby 3 plain open no longer fetches URLs)
pry(main)> doc = Nokogiri::HTML(URI.open(url))
# select the elements with the CSS class .sectionQuote (you can select by id too)
pry(main)> quote = doc.css(".sectionQuote")
Now if you look inside quote, you will see that it holds Nokogiri elements. Let's have a look:
pry(main)> quote.size
=> 6
pry(main)> quote.first
=> #(Element:0x43ff468 {
name = "div",
attributes = [ #(Attr:0x43ff404 { name = "class", value = "sectionQuote nasdaqChange" })],
children = [
#(Text "\n\t\t\t"),
#(Element:0x43fef18 {
name = "div",
attributes = [ #(Attr:0x43feeb4 { name = "class", value = "sectionQuoteDetail" })],
children = [
#(Text "\n\t\t\t\t"),
#(Element:0x43fe9c8 { name = "span", attributes = [ #(Attr:0x43fe964 { name = "class", value = "nasdaqChangeHeader" })], children = [ #(Text "0005.HK on Hong Kong Stock")] }),
.....
}),
#(Text "\n\t\t")]
})
You can see that nokogiri has essentially encapsulated each DOM element, so that you can search and access it quickly.
If you just want to display this div element, you can do:
pry(main)> quote.first.to_html
=> "<div class=\"sectionQuote nasdaqChange\">\n\t\t\t<div class=\"sectionQuoteDetail\">\n\t\t\t\t<span class=\"nasdaqChangeHeader\">0005.HK on Hong Kong Stock</span>\n\t\t\t\t<br class=\"clear\"><br class=\"clear\">\n\t\t\t\t<span style=\"font-size: 23px;\">\n\t\t\t\t82.85</span><span>HKD</span><br>\n\t\t\t\t<span class=\"nasdaqChangeTime\">14 Aug 2014</span>\n\t\t\t</div>\n\t\t</div>"
You can output that HTML directly in a view of a Rails application (wrap it in raw or call html_safe, since Rails escapes strings in views by default).
If you want to be more specific, you can loop over the quote node set and look at each element one level down:
pry(main)> quote.each{|p| puts p.inspect}
Or be very specific and get the value of a single element, i.e. the name of the stock in our example:
pry(main)> quote.at_css(".nasdaqChangeHeader").content
=> "0005.HK on Hong Kong Stock"
This is a very useful link: http://nokogiri.org/tutorials/searching_a_xml_html_document.html
Really hope this helps
PS: A tip for looking inside objects
(http://ruby-doc.org/core-2.1.1/Object.html#method-i-inspect)
puts quote.inspect
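First, add nokogiri to the Gemfile of your Rails app; with that in place Bundler requires it for you, so you don't need to require it yourself. open-uri ships with Ruby's standard library, so just require 'open-uri' in the file that uses it.
Your flow to scrape the sites should be: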
# put this code in your controller action
web_site = params[:web_site] # could be http://www.bmw.com/com/en/ (validate user-supplied URLs before fetching them)
@doc = Nokogiri::HTML(URI.open(web_site)) # URI.open needs Ruby 2.5+
# then you can iterate over the document in your view
<% @doc.css('.standardTeaser').each do |teaser_bmw| %>
  <p><%= teaser_bmw.css('.headline').text %></p>
  <%# other content of the teaser can be searched here %>
<% end %>
So, to scrape a website you fetch its HTML and then find the content you want to grab.
If you know the basics of CSS selectors, it is very easy to do. My example doesn't take into account saving the data in a database, but if you want that, you just need to create a table with the fields you need to save and then create a record after parsing the HTML.
Does that make sense?

How to read the Rails API

I'm having a difficult time understanding the Rails API. I am trying to figure out a way to understand what I can call from certain points inside Rails, such as when I'm in a controller, so I wrote something to tell me all the methods that are available sorted by what Module/Class they fall under:
last_sig = ""
self.methods.each do |method|
  #i_am = self.method(method).owner
  #puts i_am.class
  #places.push(self.method(method).owner)
  m = self.method(method)
  sig = "#{m.owner.class}: #{m.owner}"
  if sig != last_sig
    last_sig = sig
    puts sig
  end
  puts "  #{method}"
end
For example (just using this as an easy example), I found out that I can use the render() method and that it is located in ActionController::Instrumentation, so I looked at the render() function there and it says:
render(*args)
# File actionpack/lib/action_controller/metal/instrumentation.rb, line 38
def render(*args)
  render_output = nil
  self.view_runtime = cleanup_view_runtime do
    Benchmark.ms { render_output = super }
  end
  render_output
end
That is all it says, and I don't understand how I could learn how it works from this. After some more searching I discovered, by "luck", that it is documented in ActionView, and I wonder how I could have known that. Any tips on how to read the API would be appreciated. Many of the things in the API don't seem to be documented for users, and I can't tell whether they are meant for users or for the developers of Rails. I'm used to documentation like jQuery's, where it seems much easier to discover functionality.
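Much of the confusion comes from super: the Instrumentation#render quoted above is only a wrapper, and the real work happens in whichever render comes next in the ancestor chain. Ruby's reflection methods can follow that chain for you. The sketch below rebuilds the same pattern in plain Ruby (BaseRenderer, Instrumentation and DemoController are made-up stand-ins, not Rails classes) and shows Method#owner, Method#super_method (Ruby 2.2+), Method#source_location and Module#ancestors, plus a group_by version of the method-listing loop from the question (transform_values needs Ruby 2.4+). In a Rails console, calling super_method repeatedly on a controller's method(:render) walks the real chain the same way.

```ruby
# Stand-in for the class that does the real work ...
class BaseRenderer
  def render(*args)
    "rendered #{args.join(', ')}"
  end
end

# ... and a wrapper shaped like the Instrumentation#render quoted above:
# it times the call and hands the real work to `super`.
module Instrumentation
  attr_reader :view_runtime

  def render(*args)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    output = super
    @view_runtime = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    output
  end
end

class DemoController < BaseRenderer
  include Instrumentation
end

controller = DemoController.new
m = controller.method(:render)

p DemoController.ancestors.first(3) # => [DemoController, Instrumentation, BaseRenderer]
p m.owner                           # => Instrumentation (the definition Ruby finds first)
p m.super_method.owner              # => BaseRenderer (where `super` goes next)
p m.source_location                 # [file, line] of Instrumentation#render
p controller.render(:show)          # => "rendered show"

# A group_by version of the method-listing loop from the question.
# Each method is listed only under its first owner; super_method finds the rest.
def methods_by_owner(obj)
  obj.methods.group_by { |name| obj.method(name).owner }
     .transform_values(&:sort)
end

methods_by_owner(controller).each do |owner, names|
  next unless [DemoController, Instrumentation, BaseRenderer].include?(owner)
  puts "#{owner.class}: #{owner}"
  names.each { |name| puts "  #{name}" }
end
```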

Nokogiri Timeout::Error when scraping own site

Nokogiri works fine for me in the console, but if I put it anywhere in the app (model, view, or controller), it times out.
I'd like to use it in one of two ways:
Controller
def show
  @design = Design.find(params[:id])
  doc = Nokogiri::HTML(open(design_url(@design)))
  images = doc.css('.well img') ? doc.css('.well img').map{ |i| i['src'] } : []
end
or...
Model
def first_image
  doc = Nokogiri::HTML(open("http://localhost:3000/blog/#{self.id}"))
  image = doc.css('.well img')[0] ? doc.css('.well img')[0]['src'] : nil
  self.update_attribute(:photo_url, image)
end
Both result in a timeout, though they work perfectly in the console.
When you run your Nokogiri code from the console, you're referencing your development server at localhost:3000. Thus, there are two instances running: one making the call (your console) and one answering the call (your server).
When you run it from within your app, you are referencing the app itself. If the server handles one request at a time, this is effectively a deadlock: the only instance that could answer the request is the one blocked waiting for the answer, so the call just hangs until it times out. So you would need to be running multiple instances with something like Unicorn (or simply another localhost instance on a different port), and you would need at least one of those instances to be free to answer the Nokogiri request.
If you plan to run this in production, just know that this setup will require an available resource to answer the Nokogiri request, so you're essentially tying up 2 instances with each call. So if you have 4 instances and all 4 happen to make the call at the same time, your whole application is screwed. You'll probably experience pretty severe degradation with only 1 or 2 calls at a time as well...
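The deadlock described above can be reproduced without Rails. The sketch below starts a deliberately single-threaded HTTP server using only Ruby's standard library (socket and net/http); while handling /outer it makes an HTTP request back to itself, just like fetching design_url from inside a controller action. The operating system accepts the inner connection, but nobody ever answers it, so it ends in Net::ReadTimeout, a subclass of Timeout::Error. The paths, the one-second read_timeout and the handle method are illustration choices, not anything from the original app; max_retries= needs Ruby 2.5+.

```ruby
require 'socket'
require 'net/http'

# Handle one connection: read the request, answer it, close the socket.
def handle(client, port)
  request_line = client.gets.to_s
  # Skip the remaining request headers (they end with a blank line).
  while (line = client.gets) && line != "\r\n"
  end
  body =
    if request_line.start_with?('GET /outer')
      # Call back into this same server while it is busy with /outer.
      http = Net::HTTP.new('127.0.0.1', port)
      http.read_timeout = 1 # seconds; short so the demo finishes quickly
      http.max_retries = 0  # don't silently retry the idempotent GET
      begin
        http.get('/inner').body
      rescue Net::ReadTimeout
        'inner request timed out'
      end
    else
      'inner response'
    end
  client.write("HTTP/1.1 200 OK\r\nContent-Length: #{body.bytesize}\r\n" \
               "Connection: close\r\n\r\n#{body}")
rescue IOError, SystemCallError
  # The other side already gave up on this connection.
ensure
  client.close
end

server = TCPServer.new('127.0.0.1', 0) # port 0: let the OS choose a free port
port = server.addr[1]

# One thread handling one connection at a time: the server's only "worker".
Thread.new { loop { handle(server.accept, port) } }

result = Net::HTTP.get(URI("http://127.0.0.1:#{port}/outer"))
puts result # => inner request timed out
```

If the accept loop instead started a new thread per connection (loop { client = server.accept; Thread.new { handle(client, port) } }), the inner request would be answered and /outer would return "inner response", which is the "have another instance free" fix from the answer in miniature.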
I'm not sure what the default timeout value is, but you can set explicit timeout values like this:
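require 'net/http'
require 'nokogiri'
http = Net::HTTP.new('localhost', 3000) # dev server port; without it Net::HTTP uses port 80
http.open_timeout = 100 # seconds to wait for the connection to open
http.read_timeout = 100 # seconds to wait for the response
Nokogiri::HTML(http.get("/blog/#{self.id}").body)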
Once you control the timeout values, it becomes easier to find out what the problem is.
So, with Tyler's advice I dug into what I was doing a bit more. Because of the disconnect between CKEditor and the images (due to CarrierWave and S3), I can't get any info directly from the uploader (at least it seems that way to me).
Instead, I'm sticking with Nokogiri, and it's working wonderfully. I realized what I was actually doing with the open() command, and it was completely unnecessary. Nokogiri parses HTML, and I can give it HTML in the form of @design.content! Duh, on my part.
So, this is how I'm scraping my own site, to get the images associated with a blog entry:
designs_controller.rb
def create
  params[:design][:photo_url] = Nokogiri::HTML(params[:design][:content]).css('img').map{ |i| i['src']}[0]
  @design = Design.new(params[:design])
  if @design.save
    flash[:success] = "Design created"
    redirect_to designs_url
  else
    render 'designs/new'
  end
end
def show
  @design = Design.find(params[:id])
  @categories = @design.categories
  @tags = @categories.map {|c| c.name}
  @related = Design.joins(:categories).where('categories.name' => @tags).reject {|d| d.id == @design.id}.uniq
  set_meta_tags og: {
    title: @design.name,
    type: 'article',
    url: design_url(@design),
    image: Nokogiri::HTML(@design.content).css('img').map{ |i| i['src']},
    article: {
      published_time: @design.published_at.to_datetime,
      modified_time: @design.updated_at.to_datetime,
      author: 'Alphabetic Design',
      section: 'Designs',
      tag: @tags
    }
  }
end
The Update action has the same code for Nokogiri as the Create action.
Seems kind of obvious now that I'm looking at it, lol. I dwelled on this for longer than I'd like to admit...

ActionController::Base.param_parsers Alternative

I have found several websites pointing to using the following code to add support for custom parameter formats:
ActionController::Base.param_parsers[Mime::PLIST] = lambda do |body|
  str = StringIO.new(body)
  plist = CFPropertyList::List.new({:data => str.string})
  CFPropertyList.native_types(plist.value)
end
This one here is for the Apple plist format, which is what I am looking to do. However, using Rails 3.2.1, the dev server won't start, saying that param_parsers is undefined. I cannot seem to find any documentation saying it was deprecated, or any alternative to use; only that it is included in the 2.x documentation and not in the 3.x documentation.
Is there any other way in Rails 3 to support custom parameter formats in POST and PUT requests?
The params parsing moved to a Rack middleware. It is now part of ActionDispatch.
To register new parsers, you can either redeclare the use of the middleware like so:
MyRailsApp::Application.config.middleware.delete "ActionDispatch::ParamsParser"
MyRailsApp::Application.config.middleware.use(ActionDispatch::ParamsParser, {
  Mime::PLIST => lambda do |body|
    str = StringIO.new(body)
    plist = CFPropertyList::List.new({:data => str.string})
    CFPropertyList.native_types(plist.value)
  end
})
or you can change the constant containing the default parsers, like so:
ActionDispatch::ParamsParser::DEFAULT_PARSERS[Mime::PLIST] = lambda do |body|
  str = StringIO.new(body)
  plist = CFPropertyList::List.new({:data => str.string})
  CFPropertyList.native_types(plist.value)
end
The first variant is probably the cleanest, but be aware that if the middleware is redeclared more than once, the last declaration wins.
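One step both snippets take for granted is that Mime::PLIST exists; Rails only defines that constant once the MIME type is registered. Below is a sketch of a single Rails 3.x initializer, assuming clients send plist bodies as application/x-plist (check what yours actually send) and that the CFPropertyList gem is in the Gemfile. It uses config.middleware.swap, which keeps ParamsParser at its original position in the stack instead of deleting it and appending it at the end; as in the first variant, ParamsParser merges the hash you pass into its DEFAULT_PARSERS, so JSON and XML parsing keep working. The file name is only an example.

```ruby
# config/initializers/plist_params.rb (Rails 3.x)
#
# Registering the type is what creates the Mime::PLIST constant.
# "application/x-plist" is an assumption; use the Content-Type your clients send.
Mime::Type.register "application/x-plist", :plist

# Replace ParamsParser in place, passing the extra plist parser.
MyRailsApp::Application.config.middleware.swap(
  ActionDispatch::ParamsParser, ActionDispatch::ParamsParser,
  Mime::PLIST => lambda { |body|
    plist = CFPropertyList::List.new(:data => body) # body is the raw POST string
    CFPropertyList.native_types(plist.value)
  }
)

# Rails 5+ no longer uses the ParamsParser middleware; the equivalent hook there
# is ActionDispatch::Request.parameter_parsers (check the guides for your version).
```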
