Parsing a document in a table - ruby-on-rails

How do I parse a document stored in a table and send it as a JSON file to another database?
Detailed description:
I have crawled websites with Anemone and stored the data in a table. I now need to parse it and transfer it as a JSON file to another server. I think I will first have to convert the document in the table into a Nokogiri document, which can then be parsed and converted to a JSON file. Any idea how I can convert the doc into a Nokogiri document, or does anyone have another way to parse it and send it as a JSON file?

Nokogiri is your best bet for the HTML parsing, but as for converting it to JSON you're on your own from what I can tell.
Once you have it parsed via Nokogiri it shouldn't be terribly hard to extract the elements you need and generate JSON that represents them. What you're doing isn't a very common task, so you'll have to bridge the gap between Nokogiri and whichever gem you're using to generate the JSON.

Okay, I found the answer a long time back. I basically made use of REST to send a message from one application to another; I sent it across as a hash. And the obvious one: I used Nokogiri for parsing the table.
def post_me
@page_hash = page_to_hash
res = Net::HTTP.post_form(URI.parse('http://127.0.0.1:3007/element_data/save.json'), @page_hash)
end
For sending the hash from one application to another using net/http.
def page_to_hash
require 'rubygems'
require 'nokogiri'
require 'open-uri'
require 'domainatrix'
#page = self.page.sub(/^<!DOCTYPE html(.*)$/, '<!DOCTYPE html>')
hash={}
doc = Nokogiri::HTML(self.page)
doc.search('*').each do |n|
puts n.name
end
Using Nokogiri for parsing the page table in my model. The page table held the whole body of a webpage.
file_type = []
file_type_data = doc.xpath('//a/@href[contains(. , ".pdf") or contains(. , ".doc")
or contains(. , ".xls") or contains(. , ".cvs") or contains(. , ".txt")]')
file_type_data.each do |href|
if href[1] == "/"
href = "http://" + website_url + href
end
file_type << href
end
file_type_str = file_type.join(",")
hash ={:head => head,:title => title, :body => self.body,
:image => images_str, :file_type => file_type_str, :paragraph => para_str, :description => descr_str,:keyword => key_str,
:page_url=> self.url, :website_id=>self.parent_request_id, :website_url => website_url,
:depth => self.depth, :int_links => @int_links_arr, :ext_links => @ext_links_arr
}
A simple parsing example and how I formed my hash.
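Since the question asked for JSON, here is a minimal sketch of posting the hash as a JSON body rather than as form fields. It assumes the receiving application accepts a JSON request body, which the answer above does not confirm. The helper names build_json_post and send_json are hypothetical.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical helper: wraps a Ruby hash in a JSON POST request.
# Nothing is sent until the request is handed to Net::HTTP#request.
def build_json_post(url, payload)
  uri = URI.parse(url)
  request = Net::HTTP::Post.new(uri)
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate(payload)
  request
end

# Hypothetical helper: sends the request and returns the response.
# Call this only when the receiving server is actually running.
def send_json(url, payload)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port) do |http|
    http.request(build_json_post(url, payload))
  end
end
```

Keeping the request construction separate from the network call makes it possible to inspect the body before anything goes over the wire.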

Related

Read and write JSON data from form to file

I am trying to work with JSON data in Rails.
We need to save the countries we support in a JSON file. We have created a form in which a user can create a new country/state/pincode pair, and this form will append the pair to the JSON file. After that, we need to read the JSON file and print which countries are supported.
We know how to read data from the JSON file, but we are having trouble writing the data in the proper format.
This is the code for reading the data:
@data = JSON.parse(IO.read("public/dealer.json"))
How can I write data to a file from the form in JSON format?
Given a Ruby object, you can generate a file with text in JSON format like so:
require 'json'
data = { "foo" => "bar" }
File.open("output.json", "w+") do |f|
f.write(JSON.generate(data))
end
require 'json'
data = [{ "foo" => "bar" } , { "foo1" => "bar1" }]
File.open("output.json", "w+") do |f|
f.write(JSON.generate(data))
end
Try this...!
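The snippets above overwrite the file on every write. For the append case in the question, a minimal sketch could read the existing array, add the new entry, and write the whole array back. The helper name append_to_json_file and the sample keys are hypothetical, and the sketch assumes the file holds a JSON array.

```ruby
require 'json'

# Hypothetical helper: appends one entry to a JSON array stored in a file.
# A missing or empty file is treated as an empty array.
def append_to_json_file(path, entry)
  entries = File.exist?(path) && !File.zero?(path) ? JSON.parse(File.read(path)) : []
  entries << entry
  File.write(path, JSON.pretty_generate(entries))
  entries
end
```

In a controller this might be called with the form values, for example append_to_json_file("public/dealer.json", { "country" => "India", "state" => "Kerala", "pincode" => "682001" }). Concurrent writes are not handled here.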

how to read uploaded files

I'm giving users the opportunity to upload their files. However, the application does not save those files to the database; it only needs to extract information from them.
So from a form which looks like this:
= simple_form_for @channel, method: :post do |f|
= f.input :name
= f.input :configuration_file, as: :file
= f.submit
comes params[:channel][:configuration_file]:
#<ActionDispatch::Http::UploadedFile:0xc2af27c @original_filename="485.csv", @content_type="text/csv", @headers="Content-Disposition: form-data; name=\"channel[configuration_file]\"; filename=\"485.csv\"\r\nContent-Type: text/csv\r\n", @tempfile=#<File:/tmp/RackMultipart20140822-6972-19sqxq2>>
How exactly can I read from this thing? I tried simply
File.open(params[:channel][:configuration_file])
but it returns an error:
!! #<TypeError: can't convert ActionDispatch::Http::UploadedFile into String>
PS
Additional solutions for xml and csv would be much appreciated!
According to the Rails docs:
http://api.rubyonrails.org/classes/ActionDispatch/Http/UploadedFile.html
an uploaded file supports the following instance methods, among others:
open()
path()
read(length=nil, buffer=nil)
you could try:
my_data = params[:channel][:configuration_file].read
to get a string of the file contents?
or even:
my_data = File.read params[:channel][:configuration_file].path
Also, if the file can be long, you may want to open the file and read line by line. A few solutions here:
How to read lines of a file in Ruby
If you want to read a CSV file, you could try:
require 'csv'
CSV.foreach(params[:channel][:configuration_file].path, :headers => true) do |row|
row_hash = row.to_hash
# Do something with the CSV data
end
Assuming you have headers in your CSV of course.
For XML I recommend the excellent Nokogiri gem:
http://nokogiri.org/
At least partly because it uses an efficient C library for navigating the XML. (This can be a problem if you're using JRuby). Its use is probably out of scope of this answer and adequately explained in the Nokogiri docs.
From the documentation
The actual file is accessible via the tempfile accessor, though some
of its interface is available directly for convenience.
You can change your code to:
file_content = params[:channel][:configuration_file].read
or if you want to use the File API:
file_content = File.read params[:channel][:configuration_file].path
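Because the uploaded file responds to read, the parsing step can be written against any readable object and tried outside a controller. Below is a minimal sketch that assumes the CSV has a header row. The helper name csv_rows and the sample data are hypothetical, and a StringIO stands in for the upload.

```ruby
require 'csv'
require 'stringio'

# Hypothetical helper: accepts any object that responds to #read
# (an uploaded file, a File, a StringIO) and returns one hash per CSV row.
def csv_rows(upload)
  CSV.parse(upload.read, headers: true).map(&:to_h)
end

# Stand-in for params[:channel][:configuration_file]
upload = StringIO.new("name,port\nalpha,485\nbeta,232\n")
rows = csv_rows(upload)
rows.each { |row| puts "#{row['name']}: #{row['port']}" }
```

This reads the whole file into memory, so for long files the line-by-line CSV.foreach approach is probably the better fit.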

Parsing feed with SimpleRSS in Rails, numeric codes appear, can't encode them properly

I am using the SimpleRSS gem to parse a WordPress RSS feed. The only problem is that many characters in that feed are encoded using numeric codes, e.g.
&#8217;
instead of
'
Files
*rss_helper.rb*
module RssHelper
require 'rubygems'
require 'simple-rss'
require 'open-uri'
def rss
rss = SimpleRSS.parse open('http://example.com/feed/')
end
end
show.html.slim
...
-rss.entries.each do |entry|
=entry.title
With the entry.title, I have tried:
=entry.title.encode("UTF-8")
=entry.title.encode(Encoding::UTF_8, :invalid => :replace, :undef => :replace, :replace => '')
Neither has worked. I found a lot of resources regarding the iconv gem, but from what I understand it is deprecated now.
I also tried the .force_encoding method instead of .encode, but no matter what I choose it always displays the numeric code directly from the feed.
How do I force it to render the proper character?
EDIT: Here is my final helper using the gem suggested by the selected answer, included here so anyone who views this can see what I did.
*rss_helper.rb*
def decode(string)
coder = HTMLEntities.new
return coder.decode(string)
end
show.html.slim
...
decode(entry.title)
...
Run it through HTMLEntities.
HTMLEntities.new.decode(rss_feed_content)
This'll translate the entity-encoded characters into their literal equivalents.
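If adding a gem is not an option, Ruby's standard library can decode numeric character references too. This is a different technique from the HTMLEntities gem: CGI.unescapeHTML handles numeric references and a handful of named ones (such as &amp;amp; and &amp;lt;), but not the full set of named HTML entities. A minimal sketch, with a hypothetical helper name:

```ruby
require 'cgi'

# Hypothetical helper: decodes numeric character references (e.g. &#8217;)
# using the standard library. The input string is assumed to be UTF-8.
def decode_numeric_entities(text)
  CGI.unescapeHTML(text)
end

puts decode_numeric_entities('It&#8217;s here &amp; ready')
```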

Rails form => URL => JSON => Save params

This is basically what I want to do: with the params given in a form, I want to make a GET/POST request to a site. The site expects a specific URL like http://site.com/user=XXX&size=XXX and returns JSON. I want to parse and save the data from this JSON in my Rails app when the form is submitted.
I am totally lost on this matter; any help would be much appreciated.
Rails Form Data => Build the URL => Do a GET/Post request => Catch JSON => Parse => Save
For a REST API you can use ActiveResource in your application:
http://api.rubyonrails.org/classes/ActiveResource/Base.html
If it's something very specific, you can use Net::HTTP to make the requests and then parse the JSON into Ruby objects yourself.
Examples of using it: http://www.rubyinside.com/nethttp-cheat-sheet-2940.html
For decoding JSON you can use
JSON or ActiveSupport::JSON.decode or this: https://github.com/flori/json
I guess you want to make a request to a site other than your own to get the response. So you can install the curb gem (a curl wrapper for Ruby), use it to make the request to the other site, and then parse the JSON into a hash with standard RoR tools like JSON.
Following http://www.rubyinside.com/nethttp-cheat-sheet-2940.html, you can do the following.
At the top of your file add:
require "net/http"
require "uri"
require 'json'
then in your controller or helper:
# set the URI
uri = URI.parse("http://my.site.com/uri")
# set the POST params and get the response
response = Net::HTTP.post_form(uri, {"first_param" => "my param", "second_param" => "another param"})
# get the JSON info
data = JSON.parse(response.body)
# save the result to an ActiveRecord model (maybe there is a better way to do this; I guess it depends on the response you get)
@something = Mymodel.new
@something.name = data["name"]
...
@something.save
Hope it helps!
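For the GET variant of the pipeline in the question (form data, URL, request, JSON, save), a minimal sketch might split the work into three small steps. The endpoint, the parameter names and all three helper names are hypothetical, and the sketch assumes the remote site takes its parameters as a query string.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Builds the request URI from form parameters, escaping the values.
def build_uri(base_url, params)
  uri = URI.parse(base_url)
  uri.query = URI.encode_www_form(params)
  uri
end

# Fetches the URI and parses the JSON body; raises unless the response is 2xx.
def fetch_json(uri)
  response = Net::HTTP.get_response(uri)
  raise "Request failed: #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  JSON.parse(response.body)
end

# Keeps only the keys the model is expected to store.
def permitted_attributes(data, keys)
  data.select { |key, _value| keys.include?(key) }
end
```

In a controller action this could be chained as permitted_attributes(fetch_json(build_uri(url, form_params)), ["name"]), with the result passed to the model before saving.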

POSTing File Attachments over HTTP via JSON API

I have a model called Book, which has_many :photos (file attachments handled by paperclip).
I'm currently building a client which will communicate with my Rails app through JSON, using Paul Dix's Typhoeus gem, which uses libcurl.
POSTing a new Book object was easy enough. To create a new book record with the title "Hello There" I could do something as simple as this:
require 'rubygems'
require 'json'
require 'typhoeus'
class Remote
include Typhoeus
end
p Remote.post("http://localhost:3000/books.json",
{ :params =>
{ :book => { :title => "Hello There" }}})
My problems begin when I attempt to add the photos to this query. Simply POSTing the file attachments through the HTML form creates a query like this:
Parameters: {"commit"=>"Submit", "action"=>"create", "controller"=>"books", "book"=>{"title"=>"Hello There", "photo_attributes"=>[{"image"=>#<File:/var/folders/1V/1V8Kw+LEHUCKonqJ-dp3oE+++TI/-Tmp-/RackMultipart20090917-3026-i6d6b9-0>}]}}
And so my assumption is I'm looking to recreate the same query in the Remote.post call.
I'm thinking that I'm letting the syntax of the array of hashes within a hash get the best of me. I've been attempting to do variations of what I was expecting would work, which would be something like:
p Remote.post("http://localhost:3000/books.json",
{ :params =>
{ :book => { :title => "Hello There",
:photo_attributes => [{ :image => "/path/to/image/here" }] }}})
But this seems to concatenate into a string what I'm trying to make into a hash, and returns (no matter what I do in the :image => "" hash):
NoMethodError (undefined method `stringify_keys!' for "image/path/to/image/here":String):
But I also don't want to waste too much time figuring out what is wrong with my syntax here if this isn't going to work anyway, so I figured I'd come here.
My question is:
Am I on the right track? If I clear up this syntax to post an array of hashes instead of an oddly concatenated string, should that be enough to pass the images into the Book object?
Or am I approaching this wrong?
Actually, you can't post files over XHR; there is a security precaution in JavaScript that prevents it from handling any files at all. The trick to get around this is to post the file to a hidden iframe; the iframe does a regular post to the server, avoiding the full page refresh. The technique is detailed in several places; possibly try this one (they are using PHP, but the principle remains the same, and there is a lengthy discussion which is helpful):
Posting files to a hidden iframe
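On the nested-parameter syntax from the question: Rails parses bracketed field names such as book[photo_attributes][][image] back into an array of hashes, so one way to sidestep the client's own serialization is to flatten the hash into those names first. The sketch below uses a hypothetical helper, flatten_params. Whether Typhoeus accepts the flattened pairs is an assumption, and the image itself would still have to be sent as multipart file data rather than as a path string.

```ruby
# Hypothetical helper: flattens a nested hash into Rails-style field names.
# Returns an array of [name, value] pairs because names can repeat.
def flatten_params(value, prefix = nil, out = [])
  case value
  when Hash
    value.each do |key, inner|
      flatten_params(inner, prefix ? "#{prefix}[#{key}]" : key.to_s, out)
    end
  when Array
    value.each { |inner| flatten_params(inner, "#{prefix}[]", out) }
  else
    out << [prefix, value]
  end
  out
end

pairs = flatten_params(
  book: { title: 'Hello There', photo_attributes: [{ image: '/path/to/image/here' }] }
)
pairs.each { |name, value| puts "#{name} = #{value}" }
```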

Resources