Nokogiri Scraping Multiple URLS on Same Domain - ruby-on-rails

I am trying to scrape multiple urls with Nokogiri that are on the same domain. For example cltampa.com/potlikker and cltampa.com/artbreaker. For both urls I am looking for the same two elements, the main header image for every blog post and the url of the headline. I have code that is working for grabbing images, but it is very inefficient and definitely not DRY. I need to grab the associated headline hrefs as well, so I can then wrap the images with them in my view.
My controller currently looks like this
def index
  doc = Nokogiri::HTML(open('http://cltampa.com/blogs/potlikker'))
  potlikker = doc.xpath('//*[@class="contentImageCenter"]/img/@src')
  doc = Nokogiri::HTML(open('http://cltampa.com/blogs/artbreaker'))
  artbreaker = doc.xpath('//*[@class="contentImageCenter"]/img/@src')
  @images = potlikker + artbreaker
end
My view looks like this
<div id="container" class="container">
  <% @images.each do |img| %>
    <div class="item">
      <img src="http://www.cltampa.com<%= img %>">
    </div>
  <% end %>
</div>
My first question: what is the most efficient way to parse multiple URLs? What I have now clearly is not it. Should I create a separate method for that? Any help on this would be awesome.
My next question is how would I grab the headline href at the same time as grabbing the image url. I have the xpath to grab both of them separately, but putting them together and then rendering them in my view is confusing me.
I've been referencing this answer Iterating through multiple URLs to parse HTML with Nokogori but haven't had any luck yet.
Thanks in advance.
PROGRESS UPDATE
def index
  urls = %w[http://cltampa.com/blogs/potlikker http://cltampa.com/blogs/artbreaker http://cltampa.com/blogs/politicalanimals http://cltampa.com/blogs/earbuds http://cltampa.com/blogs/bedpost http://cltampa.com/blogs/dailyloaf]
  @final_images = []
  @final_urls = []
  urls.each do |url|
    blog = Nokogiri::HTML(open(url))
    images = blog.xpath('//*[@class="contentImageCenter"]/img/@src')
    images.each do |image|
      @final_images << image
    end
  end
  urls.each do |url|
    blog = Nokogiri::HTML(open(url))
    story_path = blog.xpath('//*[@class="postTitle"]/a/@href')
    story_path.each do |path|
      @final_urls << path
    end
  end
end
The above code is technically giving me what I need, just not sure on how to tie them together in the view. I need to wrap the final_urls around the final_images. I am sure there is a better way to do this, any info again is appreciated.
I should also add that I am experiencing a timeout on Heroku, so any advice on speeding this up, moving to a background task etc. would be very much appreciated. I am looking into it now, but not exactly sure the best route to take.
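One possible way to tie the two lists together is to pair each headline href with its image before handing them to the view. The sketch below is plain Ruby and leaves Nokogiri out so it runs on its own; the `Post` struct, the `pair_posts` method, and the sample paths are hypothetical names made up for illustration. It assumes every post yields exactly one title link and one header image, in the same order on the page.

```ruby
# Hypothetical container for one blog post's link and header image.
Post = Struct.new(:href, :image_src)

# Pairs the nth href with the nth image src.
# Assumes both lists come from the same page and are in the same order.
def pair_posts(hrefs, image_srcs)
  hrefs.zip(image_srcs).map { |href, src| Post.new(href.to_s, src.to_s) }
end

# Sample values standing in for the results of the two xpath queries.
hrefs = ['/blogs/potlikker/post-1', '/blogs/potlikker/post-2']
srcs  = ['/images/one.jpg', '/images/two.jpg']

posts = pair_posts(hrefs, srcs)
posts.each { |post| puts "#{post.href} -> #{post.image_src}" }
```

With something like this, a single `urls.each` loop could run both xpath queries against one parsed document, so each page is fetched once rather than twice, and the view can iterate over the pairs and wrap each image in its link. If a page has a post without an image, `zip` fills the gap with `nil`, so the one-to-one assumption is worth checking against the real markup.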

Related

Asynchronously call controller action to return URL to use with image_tag

I currently have the following view that loads a bunch of images, one for each character in an array. The problem is that get_character_thumbnail calls an API to get each image's URL, and that is done synchronously when the user requests the page.
<% @comic.characters.each do |character| %>
  <div class = "col-sm-6 col-md-3">
    <div class = "thumbnail">
      <%= image_tag(get_character_thumbnail(character)) %>
    </div>
    <div class = "caption">
      <p><%= character['name'] %></p>
    </div>
  </div>
<% end %>
I want to enforce that each call to get_character_thumbnail(character) will be done asynchronously and the page doesn’t get stuck.
def get_character_thumbnail(character)
response = HTTParty.get(character['resourceURI'], get_basic_api_options)
response['data']['results'][0]['thumbnail']['path'] + '/portrait_large.jpg'
end
Since I’m pretty new to ruby on rails I’m struggling a little bit to setup an ajax call to do the trick. Does anyone have a suggestion or a link that could help me?
Your routes should always return a response immediately. Here's one approach in which you're writing a custom endpoint to serve a single character image. Your html page would load quickly, then you can set the img src attributes to point at your image route.
Here's some example code to clarify:
some html.erb template
<% @comic.characters.each do |character| %>
  <img src='/character_image/<%= character.id %>'>
<% end %>
a new route, get '/character_image/:id'
def character_image
  @char = Character.find(params[:id])
  img_path = "tmp/char_img_#{@char.id}.jpg"
  unless File.exist?(img_path)
    img_url = get_char_img(@char) # hit your API to get the url
    # pass arguments separately so the URL is never interpreted by a shell
    system('wget', img_url, '-O', img_path)
  end
  send_file img_path, type: 'image/jpeg', disposition: 'inline'
end
This code will cache the image to avoid duplicate API requests if, say, your html page were refreshed.
I'm using tmp/ here because it's a write-enabled location on Heroku (which blocks filesystem writes to other locations). On other environments (locally, for example), you could choose to save the images to public/, which Rails serves statically by default.
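The cache-on-first-request idea can be sketched in plain Ruby, independent of Rails. This is only an illustration: `fetch_cached` is a hypothetical helper name, and the block stands in for the API call and download, which are not reproduced here.

```ruby
require 'tmpdir'

# Hypothetical helper: returns the cached file's path, calling the block
# to produce the bytes only when the file does not exist yet.
def fetch_cached(path)
  File.binwrite(path, yield) unless File.exist?(path)
  path
end

calls = 0
Dir.mktmpdir do |dir|
  path = File.join(dir, 'char_img_1.jpg')
  2.times do
    fetch_cached(path) do
      calls += 1            # stands in for the API call plus download
      'fake image bytes'
    end
  end
end
puts calls                  # counts how often the expensive block ran
```

The second request finds the file on disk and skips the block, which is the behaviour the controller action relies on when the page is refreshed.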
You should make an AJAX call and get all the images in the response.
Then store them in a JavaScript object.
Reason for the page freeze:
You might not be using JavaScript closures. When multiple image files are loaded or read in the browser inside a loop, the loop can jump to the next iteration before the first file has finished loading, which sometimes causes a few random image files not to load and freezes the browser.
You can check my Image Preview library (https://github.com/palash-kulkarni/image_uploader/blob/master/image-uploader-1.0.0.js). It's still in progress, but I know it will definitely help you.
Link to Demo (https://github.com/palash-kulkarni/image_uploader/tree/master/Demo)
$.each(files, function (_, file) {
if (image.types.test(file.type)) {
that.displayPreview(file, categoryName, image);
that.bindSingleDeleteEvent(categoryName, image);
} else {
that.clientMessage.display('#flash', image.failureMessage.invalidFileType);
}
});
// Iterate object of images
reader = new FileReader();
reader.onload = (function (img) {
return function (event) {
img.attr('src', event.target.result);
};
})(img);
reader.readAsDataURL(imageFile);
// End Iterator

rails navigation and partial refresh

Thanks for your time!
I have some report data on hand and I want to present it on the web. The HTML view will be divided into two parts, left and right. The left part has a tree view consisting of the report names. The right part presents the contents of the selected report.
What I want to achieve: when I click a report name in the left part, it calls an action in the controller and passes the report name as a parameter. The action fetches the data from the database and presents it in the right part. I am stuck on how to build this kind of view.
I've Googled a lot and found that Frameset, Partials, or Ajax may be capable of this. I've never developed web applications before and I'm also new to Rails, so can anyone give me some advice or suggestions?
Things I already know:
I've used Frameset to accomplish a view like this. But I found it needs a lot of .html files and all these .html files are static. And many people don't suggest it at all.
Then I've Googled Partials. But it seems Partials don't call the Action. It directly loads the _partials.html.erb to the main view. And besides, how can I control the layout? Using CSS?
I've never used Ajax.
If you want a fluid, seamless transition between one report and another, you should use both AJAX and Partials.
The way that it works is something like:
Make a left column in the html that has some links
Make the right column inside a partial
Assign the links to jQuery listeners to call the AJAX.
I'll put a bit of code here to show how it works:
Controller:
def index
  # instance variables, so the views and partials can read them
  @reports = Report.all
  if params[:report_id]
    @reports = Report.find(params[:report_id])
  end
  respond_to do |format|
    format.html
    format.js { render :template => "update_reports" }
  end
end
update_reports.js.erb (in the same folder as the report views):
$('#report_viewer').html('<%= escape_javascript render :partial => "report_detail" %>');
In your view:
<div style="float:left">
  <ul>
    <li><%= link_to "Some report", "#", :class => "ajax" %></li>
  </ul>
</div>
<div style="float:right" id="report_viewer">
  <%= render :partial => "report_detail" %>
</div>
<script type='text/javascript'>
  $(document).ready(function() {
    $(".ajax").click(function(e) {
      e.preventDefault();
      // dataType "script" makes jQuery evaluate the returned js.erb
      $.ajax({ url: "your route to the action", dataType: "script" });
    });
  });
</script>
I think it's basically this, now let me explain a few things:
I don't remember if you have to do this, but in my case I created a new custom route to force the call to the action to be a JavaScript (js) call instead of an HTML one. You can do this by adding :format => "js" to your route.
You must name all your partials like "_yourname.html.erb". Rails won't recognize partials without the leading underscore.
In the controller, everything that comes after "format.js" is optional, you don't need to specify the template name, and if you don't Rails will look for the file index.js.erb.
The update_reports.js.erb file is basically a callback javascript that executes to update the current page. It finds the div where the partial is, and updates it rendering a new partial with the new report.
In the view, the link to change the report doesn't need to be a link at all if you're using the jQuery.click listener, but if it is a link, it must have the href as "#", or else the browser will just try to redirect to that location.
There are several ways to hook your link to the ajax function. I just chose the one I like best, but you could also have a named function and call it in the html tag with "onClick='yourFunction()'".
You need jQuery to call ajax like this. If you're using Rails 3.0 or lower, you should replace the default Prototype with jQuery, because it's much better (IMHO), but I think Prototype also has some ajax features.
It may seem complicated, but once you get the idea it'll become as simple as writing any other action.
In the js callback file you could also add an animation to smooth the transition, such as a fade. Look for the jQuery fade functions for more info on this.
This is quite an open question so don't take this answer verbatim, but merely as a guide.
app/views/layouts/reports.html.erb
<!DOCTYPE html>
<html itemscope itemtype="http://schema.org/">
<head>
# omitted
</head>
<body>
<div id="body-container">
<div id="left-column">
<ul id="reports-list">
<% for report in @reports %>
<li><%= link_to report.name, report %></li>
<% end %>
</ul>
</div>
<div id="right-column">
<%= yield %>
</div>
</div>
</body>
</html>
app/controllers/reports_controller.rb
class ReportsController < ApplicationController
  before_filter { @reports = Report.all }
  def index
  end
  def show
    @report = Report.find(params[:id])
  end
  def edit
    @report = Report.find(params[:id])
  end
  def new
    @report = Report.new
  end
  def update
    # omitted
  end
  def create
    # omitted
  end
  def destroy
    @report = Report.find(params[:id])
  end
end
routes.rb
YourApp::Application.routes.draw do
resources :reports
root to: "reports#index"
end
This would achieve the effect you're after using just Rails. Of course, adding Ajax could give a better user experience.

Rails3 helpers and dynamic content

I've been working to pull dynamic data from last.fm using youpy's "lastfm" gem. Getting the data works great; however, rails doesn't seem to like the dynamic portion. Right now, I have added the code to a helper module called "HomeHelper" (generated during the creation of the rails app) found in the helper folder:
module HomeHelper
  @@lastfm = Lastfm.new(key, secret)
  @@wesRecent = @@lastfm.user.get_recent_tracks(:user => 'weskey5644')
  def _album_art_helper
    trackHash = @@wesRecent[0]
    medAlbumArt = trackHash["image"][3]
    if medAlbumArt["content"] == nil
      html = "<img src=\"/images/noArt.png\" height=\"auto\" width=\"150\" />"
    else
      html = "<img src=#{medAlbumArt["content"]} height=\"auto\" width=\"150\" />"
    end
    html.html_safe
  end
  def _recent_tracks_helper
    lfartist1 = @@wesRecent[0]["artist"]["content"]
    lftrack1 = @@wesRecent[0]["name"]
    lfartist1 = @@wesRecent[1]["artist"]["content"]
    lftrack1 = @@wesRecent[1]["name"]
    htmltrack = "<div class=\"lastfm_recent_tracks\">
    <div class=\"lastfm_artist\"><p>#{lfartist1 = @@wesRecent[0]["artist"]["content"]}</p></div>
    <div class=\"lastfm_trackname\"><p>#{lftrack1 = @@wesRecent[0]["name"]}</p></div>
    <div class=\"lastfm_artist\"><p>#{lfartist2 = @@wesRecent[1]["artist"]["content"]}</p></div>
    <div class=\"lastfm_trackname\"><p>#{lftrack2 = @@wesRecent[1]["name"]}</p></div>
    </div>
    "
    htmltrack.html_safe
  end
end
I created a partial for each and added them to my Index page:
<div class="album_art"><%= render "album_art" %></div>
<div id="nowplayingcontain"><%= render "recent_tracks" %></div>
Great, this gets the data I need and displays on the page like I want; however, it seems that when the song changes, according to last.fm, it doesn't on my site unless I restart the server.
I've tested this using Phusion Passenger and also WEBrick, and it happens on both. I had thought this might be an issue with caching of this particular page, so I tried a couple of caching hacks to expire the page and reload. This didn't help.
I then came to the conclusion that sticking this code in a helper file might not be the best solution. I don't know how well helpers handle dynamic content such as this. If anyone has any insight on this, awesome!! Thanks everyone!
Your problem isn't that you're using a helper; the problem is that you're using class variables:
module HomeHelper
  @@lastfm = Lastfm.new(key, secret)
  @@wesRecent = @@lastfm.user.get_recent_tracks(:user => 'weskey5644')
that are initialized when the module is first read. In particular, @@wesRecent will be initialized once and then it will stay the same until you restart the server or happen to get a new server process. You should be able to call get_recent_tracks when you need it:
def _album_art_helper
  trackHash = @@lastfm.user.get_recent_tracks(:user => 'weskey5644').first
  #...
Note that this means that your two helpers won't necessarily be using the same track list.
You might want to add a bit of "only refresh the tracks at most once every minute" logic as well.
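A minimal sketch of that kind of throttled refresh, in plain Ruby: the `TimedCache` class and its injectable clock are hypothetical and exist only for illustration, and the blocks stand in for the real `get_recent_tracks` call.

```ruby
# Hypothetical cache: re-runs the block only when the stored value is
# at least ttl seconds old. The clock is injectable so it can be tested.
class TimedCache
  def initialize(ttl, clock = -> { Time.now.to_f })
    @ttl = ttl
    @clock = clock
    @fetched_at = nil
    @value = nil
  end

  def fetch
    now = @clock.call
    if @fetched_at.nil? || now - @fetched_at >= @ttl
      @value = yield          # the expensive call, e.g. the last.fm request
      @fetched_at = now
    end
    @value
  end
end

fake_time = 0.0
api_calls = 0
cache = TimedCache.new(60, -> { fake_time })

first  = cache.fetch { api_calls += 1; "tracks v#{api_calls}" }
second = cache.fetch { api_calls += 1; "tracks v#{api_calls}" }  # still fresh
fake_time = 61.0                                                 # a minute later
third  = cache.fetch { api_calls += 1; "tracks v#{api_calls}" }
```

Both helpers could share one such cache so they see the same track list. Note that a cache held at class level lives per server process and this sketch is not thread-safe, so it is a starting point rather than a finished solution.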

Template path in Rails 3

Let's say, I connected the route / to WelcomeController's index action.
Inside of the index.html.erb-Template I want to display the path of the template from Rails.root upwards, ie.
<h1> We are rendering: <%= how_do_i_do_this? %></h1>
to render to
<h1> We are rendering: app/views/presentation/index.html.erb</h1>
In Rails 2 I could access template.path, but this doesn't work anymore
Any ideas?
Because of how template rendering works in Rails, you will now be able to use __FILE__ for this instead. This works for me:
<%= __FILE__.gsub(Rails.root.to_s, "") %>
There may be a better way to do this, but I couldn't find one when I went looking.
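As an alternative sketch, Ruby's standard `Pathname` can compute the relative path directly, which avoids the leading slash that the `gsub` version leaves behind. The helper name `relative_template_path` and the sample paths are made up for illustration; in a template it would presumably be called with `__FILE__` and `Rails.root`.

```ruby
require 'pathname'

# Hypothetical helper: returns the path of `file` relative to `root`.
def relative_template_path(file, root)
  Pathname.new(file).relative_path_from(Pathname.new(root)).to_s
end

# Sample paths standing in for __FILE__ and Rails.root.
path = relative_template_path('/srv/app/app/views/welcome/index.html.erb', '/srv/app')
puts path
```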
Ryan's answer works. If you also want to put your method in a helper, use Kernel#caller. Here is a method I'm using to do something similar:
def has_page_comment? code = nil
if code.nil?
# grab caller file, sanitize
code = caller.first.split(':').first.gsub(Rails.root.to_s,'').gsub('.html.erb','')
end
...
end

Basecamp API Rails

I was wondering if someone could do me a massive favour.
I really don't understand how to make use of APIs, so I was wondering if, using Basecamp as an example, someone could talk me through the basics.
So far I have an application with a dashboard controller/view, I have put basecamp.rb into my /lib directory, added the following to my application_controller:
def basecamp_connect
  Basecamp.establish_connection!('XXXXXX.basecamphq.com', 'USER', 'PASS', false)
  @basecamp = Basecamp.new
end
Obviously changing the required parts to my credentials.
Next up I have added the following to my dashboard_controller:
def index
Basecamp::TodoList.find(:all)
end
Next I presume I have to somehow list the Todos on the dashboard using some sort of loop.
Am I doing the right thing, if so - how on earth do I display all the todo items and if not - what am I doing wrong/missing.
It doesn't have to be todos, anything from Basecamp or any other popular API service would be a good start. It's just that I happen to have a basecamp account!
Thanks,
Danny
Your view expects to have some variables defined. You can loop through those variables and display their content as you want.
So you could do, in your action :
def index
  @list = Basecamp::TodoList.find(:all)
end
Then in your view you have access to the @list variable and you can do the following:
<ul>
  <% @list.each do |item| %>
    <li><%= item.to_json %></li>
  <% end %>
</ul>
Replace the JSON dump with the elements you actually want to display, of course.
You might want to read the Rails guides for a lot more information.

Resources