I am currently working on a Rails app.
I want to go to a website(http://alt19.com/) and select a set of options, then click a button which triggers the download of a CSV file. Then I want to take the file and parse it.
I have found a gem for parsing CSV files.
However, I don't know if there is a gem for navigating to another website, selecting a set of options, downloading several files and saving them somewhere where my app can process them.
Is there anything like this?
If not, are there any alternative solutions?
You can use mechanize gem to scrap the page. Mechanize uses nokogiri as one of the dependency which is responsible for scraping and mechanize has added feature of clicking elements from the page.
As you can see the CSV Generator from makes a post with some params.
Just do the same with 'net/https' and 'open_uri'
Example :
require "uri"
require "net/http"
params = {'box1' => 'Nothing is less important than which fork you use. Etiquette is the science of living. It embraces everything. It is ethics. It is honor. -Emily Post',
'button1' => 'Submit'
}
x = Net::HTTP.post_form(URI.parse('http://www.interlacken.com/webdbdev/ch05/formpost.asp'), params)
puts x.body
Example source: Submitting POST data from the controller in rails to another website
Related
I have a Rails 3.2.13 site that needs to scrape another website to get a product description. What is the best way to do this in Rails 3?
I've heard that nokogiri is fast. Should I use nokogiri? And if I use nokogiri is it possible that I don't have to save the scraped data anymore? I imagine it as just like getting json data from an API, is it like that?
I'd recommend a combination of Nokogiri and open-uri. Require both gems, and then just do something along the lines of doc = Nokogiri::HTML(open(YOUR_URL)). Then find the element you want to capture (using developer tools in chrome (or the equivalent) or something like Selector Gadget. Then you can use doc.at_css(SELECTOR) for a single element, or doc.search(SELECTOR) for multiple selectors. Calling the text method the response should get you the product description you're looking for. No need to save anything to the database (unless you want to) Hope that helps!
mechanize is a wonderful gem for scraping data from other websites as html. It is simple, robust and using nokogiri gem as result wrapper.
the following snippet will show you how you can fetch needed data being seen as Safari browser from url:
require 'htmlentities'
require "mechanize"
a = Mechanize.new { |agent|
agent.user_agent_alias = 'Mac Safari'
}
#resultHash = {}
a.get(url) do |page|
parsedPage = page.parser
#resultHash[:some_data_name] = parsedPage.at_xpath("//h1[#class='any-class']").text.split(/\s+/).join(" ")
end
I search links via css form page = agent.get('http://www.print-index.ru/default.aspx?p=81&gr=198') and after that I have in page variable a lot of links but I don't know how use them, how click on them via Mechanize. I found on stackoverflow this method:
page = agent.get "http://google.com"
node = page.search ".//p[#class='posted']"
Mechanize::Page::Link.new(node, agent, page).click
but it works for only one link so how can I use this method for many links.
If I should post additional information, please say it.
If your goal is simply to make it to the next page and then scrape some info off of it, then all you really care about are:
Page content (For scraping your data)
The URL to the next page you need to visit
The way you get to the page content could be done by using Mechanize OR something else, like OpenURI (which is part of Ruby standard lib). As a side note, Mechanize uses Nokogiri behind the scenes; when you start to dig into elements on the parsed page you will see they come back as Nokogiri related objects.
Anyways, if this were my project I'd probably go the route of using OpenURI to get at the page's content and then Nokogiri to search it. I like the idea of using a Ruby standard library instead of requiring an additional dependency.
Here is an example using OpenURI:
require 'nokogiri'
require 'open-uri'
printing_page = Nokogiri::HTML(open("http://www.print-index.ru/default.aspx?p=81&gr=198"))
# ...
# Your code to scrape whatever you want from the Printing Page goes here
# ...
# Find the next page to visit. Example: You want to visit the "About the project" page next
about_project_link_in_navbar_menu = printing_page.css('a.graymenu')[4] # This is a overly simple finder. Nokogiri can do xpath searches too.
about_project_link_in_navbar_menu_url = "http://www.print-index.ru#{about_project_link_in_navbar_menu.attributes["href"].value}" # Get the URL page
about_project_page = Nokogiri::HTML(open(about_project_link_in_navbar_menu_url)) # Get the About page's content
# ....
# Do something...
# ....
Here's an example using Mechanize to get the page content (they are very similar):
require 'mechanize'
agent = Mechanize.new
printing_page = agent.get("http://www.print-index.ru/default.aspx?p=81&gr=198")
# ...
# Your code to scrape whatever you want from the Printing Page goes here
# ...
# Find the next page to visit. Example: You want to visit the "About the project" page next
about_project_link_in_navbar_menu = printing_page.search('a.graymenu')[4] # This is a overly simple finder. Nokogiri can do xpath searches too.
about_project_link_in_navbar_menu_url = "http://www.print-index.ru#{about_project_link_in_navbar_menu.attributes["href"].value}" # Get the URL page
about_project_page = agent.get(about_project_link_in_navbar_menu_url)
# ....
# Do something...
# ....
PS I used google to translate Russian to english.. if the variable names are incorrect, i'm sorry! :X
I coded a little Ruby script that would parse a remote XML file and extract some data from it using Nokogiri. Now I'm trying to code a more advanced version as a Rails application.
I have my code inside of a controller. It's similar to the code that I used in my Ruby script, however it's not working. I believe the error is because it's trying to load the XML locally rather then externally.
Here is the error that Rails is giving me:
No such file or directory - http://mal-api.com/anime/10?format=xml
Here is a sample of the code in my controller: (I can provide the whole thing if needed, but it's mainly just the default Rails scaffold code.)
def create
require 'nokogiri'
#anime = Anime.new(params[:anime])
doc = Nokogiri::XML(open("http://mal-api.com/anime/#{#anime.mal_id}?format=xml"))
#Title
title = doc.css("anime english_title").inner_html
#Snipped rails scaffold code
end
mal_id is passed in through a form. Nokogiri is added in my Gemfile.
Is there something I'm missing or that I've done wrong?
Any help is appreciated.
By default the open method in ruby is used to open files. If you want to directly open an URL you need to require 'open-uri'. More information can be found in the docs: http://www.ruby-doc.org/stdlib-1.9.3/libdoc/open-uri/rdoc/OpenURI.html
This might be (read: probably is) a dumb question, but here goes...
Is there a simple, preferably non third-party, way to display Rails logs in the browser? It gets kind of old opening a log file, then closing it and re-opening it on the next error. It would be superfantastic to just to go domain.tld/logs, which would require user authentication, of course.
For reference, there is a very simple way to do this. However, you would want to protect this page with some sort of authentication/authorization.
Controller:
lines = params[:lines]
if Rails.env == "production"
#logs = `tail -n #{lines} log/production.log`
else
#logs = `tail -n #{lines} log/development.log`
end
Log File View:
<pre><%= #logs %></pre>
View to display link:
<%= link_to "log file", log_path(lines: 100) %>
Routes:
get 'logs/:lines' => "pages#log", as: "log"
All you need is to open log file and put its content into browser.
Realted topic: ruby: all ways to read from file.
Also you should know that your logs grow very fast, and it's not a good idea to show whole log in browser. You can just open last 100-200 rows. Related topic: Reading the last n lines of a file in Ruby?
Also you can try this solution: http://newrelic.com/. It is more complex and little oftopic, but quite useful.
I created a for this purpose called browserlog.
Installing is just a matter of adding the gem to the Gemfile:
gem 'browserlog'
Afterwards all you need is to mount the engine on the route you want:
MyApp::Application.routes.draw do
mount Browserlog::Engine => '/logs'
end
Once that's set up, accessing /logs/development (or production, test, etc) will show a window like this:
(source: andredieb.com)
Pimp my Log can be used to visualization many types of logs including Ruby on Rails.
It also includes the management of users and notification.
[Edited]
By default, the PimpMyLog does not support Ruby on Rails logs. It's necessary to implement the regex to works with it.
[/Edited]
I have paper_clip installed on my Rails 3 app, and can upload a file - wow that was fun and easy!
Challenge now is, allowing a user to upload multiple objects.
Whether it be clicking select fileS and being able to select more than one. Or clicking a more button and getting another file upload button.
I can't find any tutorials or gems to support this out of the box. Shocking I know...
Any suggestions or solutions. Seems like a common need?
Thanks
Okay, this is a complex one but it is doable. Here's how I got it to work.
On the client side I used http://github.com/valums/file-uploader, a javascript library which allows multiple file uploads with progress-bar and drag-and-drop support. It's well supported, highly configurable and the basic implementation is simple:
In the view:
<div id='file-uploader'><noscript><p>Please Enable JavaScript to use the file uploader</p></noscript></div>
In the js:
var uploader = new qq.FileUploader({
element: $('#file-uploader')[0],
action: 'files/upload',
onComplete: function(id, fileName, responseJSON){
// callback
}
});
When handed files, FileUploader posts them to the server as an XHR request where the POST body is the raw file data while the headers and filename are passed in the URL string (this is the only way to upload a file asyncronously via javascript).
This is where it gets complicated, since Paperclip has no idea what to do with these raw requests, you have to catch and convert them back to standard files (preferably before they hit your Rails app), so that Paperclip can work it's magic. This is done with some Rack Middleware which creates a new Tempfile (remember: Heroku is read only):
# Embarrassing note: This code was adapted from an example I found somewhere online
# if you recoginize any of it please let me know so I pass credit.
module Rack
class RawFileStubber
def initialize(app, path=/files\/upload/) # change for your route, careful.
#app, #path = app, path
end
def call(env)
if env["PATH_INFO"] =~ #path
convert_and_pass_on(env)
end
#app.call(env)
end
def convert_and_pass_on(env)
tempfile = env['rack.input'].to_tempfile
fake_file = {
:filename => env['HTTP_X_FILE_NAME'],
:type => content_type(env['HTTP_X_FILE_NAME']),
:tempfile => tempfile
}
env['rack.request.form_input'] = env['rack.input']
env['rack.request.form_hash'] ||= {}
env['rack.request.query_hash'] ||= {}
env['rack.request.form_hash']['file'] = fake_file
env['rack.request.query_hash']['file'] = fake_file
if query_params = env['HTTP_X_QUERY_PARAMS']
require 'json'
params = JSON.parse(query_params)
env['rack.request.form_hash'].merge!(params)
env['rack.request.query_hash'].merge!(params)
end
end
def content_type(filename)
case type = (filename.to_s.match(/\.(\w+)$/)[1] rescue "octet-stream").downcase
when %r"jp(e|g|eg)" then "image/jpeg"
when %r"tiff?" then "image/tiff"
when %r"png", "gif", "bmp" then "image/#{type}"
when "txt" then "text/plain"
when %r"html?" then "text/html"
when "js" then "application/js"
when "csv", "xml", "css" then "text/#{type}"
else 'application/octet-stream'
end
end
end
end
Later, in application.rb:
config.middleware.use 'Rack::RawFileStubber'
Then in the controller:
def upload
#foo = modelWithPaperclip.create({ :img => params[:file] })
end
This works reliably, though it can be a slow process when uploading a lot of files simultaneously.
DISCLAIMER
This was implemented for a project with a single, known & trusted back-end user. It almost certainly has some serious performance implications for a high traffic Heroku app and I have not fire tested it for security. That said, it definitely works.
The method Ryan Bigg recommends is here:
https://github.com/rails3book/ticketee/commit/cd8b466e2ee86733e9b26c6c9015d4b811d88169
https://github.com/rails3book/ticketee/commit/982ddf6241a78a9e6547e16af29086627d9e72d2
The file-uploader recommendation by Daniel Mendel is really great. It's a seriously awesome user experience, like Gmail drag-and-drop uploads. Someone wrote a blog post about how to wire it up with a rails app using the rack-raw-upload middleware, if you're interested in an up-to-date middleware component.
http://pogodan.com/blog/2011/03/28/rails-html5-drag-drop-multi-file-upload
https://github.com/newbamboo/rack-raw-upload
http://marc-bowes.com/2011/08/17/drag-n-drop-upload.html
There's also another plugin that's been updated more recently which may be useful
jQuery-File-Upload
Rails setup instructions
Rails setup instructions for multiples
And another one (Included for completeness. I haven't investigated this one.)
PlUpload
plupload-rails3
These questions are highly related
Drag-and-drop file upload in Google Chrome/Chromium and Safari?
jQuery Upload Progress and AJAX file upload
I cover this in Rails 3 in Action's Chapter 8. I don't cover uploading to S3 or resizing images however.
Recommending you buy it based solely on it fixing this one problem may sound a little biased, but I can just about guarantee you that it'll answer other questions you have down the line. It has a Behaviour Driven Development approach as one of the main themes, introducing you to Rails features during the development of an application. This shows you not only how you can build an application, but also make it maintainable.
As for the resizing of images after they've been uploaded, Paperclip's got pretty good documentation on that. I'd recommend having a read and then asking another question on SO if you don't understand any of the options / methods.
And as for S3 uploading, you can do this:
has_attached_file :photo, :styles => { ... }, :storage => :s3
You'd need to configure Paperclip::Storage::S3 with your S3 details to set it up, and again Paperclip's got some pretty awesome documentation for this.
Good luck!