Nokogiri+Paperclip = XML parse with img from url


I need to parse some params into my DB, along with an image from a URL.
I use Paperclip for the image.
In the Rails console I can add an image to a new post with this code:
image = Image.new
image.image_from_url "http://yug-avto.ru/files/image/tradein/hyundai/877_VOLKSWAGEN_FAETON_2011_2_1366379491.jpg"
image.watermark = true
image.save!
In my Image model I have:
require "open-uri"
.......
def image_from_url(img_url)
self.image = open(img_url)
end
And everything works. But when I use Nokogiri, this code doesn't work:
rake aborted!
No such file or directory -
http://yug-avto.ru/files/image/tradein/peugeot/1027_Peugeot_308_2011_2_1370850441.jpg
My rake task for the Nokogiri parse:
doc.xpath("//item").each do |ad|
img = ad.at("image").text
img1 = Image.new
img1.image = open("#{img}")
img1.watermark = true
img1.save!
end
In the rake task I have require 'nokogiri' and require 'open-uri'.
What should I do?

This is a code snippet from my parser. I guess where you went wrong is calling open(url) on the raw string instead of running it through URI.parse(url) first.
picture = Picture.new(
realty_id: realty.id,
position: position,
hashcode: realty.hashcode
)
# picture.image = URI.parse(url), edit: added open() as this worked for Savroff
picture.image = open(URI.parse(url))
picture.save!
Additionally, it would be a good idea to check whether the image really exists:
picture_array.each do |url|
# checks if the Link works
res = Net::HTTP.get_response(URI.parse(url))
# if so, it will add the Picture Link to the verified Array
if res.code.to_i >= 200 && res.code.to_i < 400 # good codes are between 200 and 399
verified_array << url
end
end
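Here is a self-contained variant of that check, offered as a sketch. Instead of comparing res.code numerically, it uses Net::HTTP's response classes: Net::HTTPSuccess covers 2xx and Net::HTTPRedirection covers 3xx. It also treats unparseable or unreachable URLs as missing. The fetcher: keyword is a hypothetical addition that lets you try the filter without network access. By default it performs the same Net::HTTP.get_response GET as above.

```ruby
require 'net/http'
require 'uri'

# True for 2xx and 3xx responses, the same range as the 200..399 check above.
def good_response?(res)
  res.is_a?(Net::HTTPSuccess) || res.is_a?(Net::HTTPRedirection)
end

# Keep only the URLs whose response is good. The fetcher (hypothetical) can be
# swapped out for testing; by default it does a real GET via Net::HTTP.
def verified_urls(urls, fetcher: ->(uri) { Net::HTTP.get_response(uri) })
  urls.select do |url|
    begin
      good_response?(fetcher.call(URI.parse(url)))
    rescue URI::InvalidURIError, SocketError, SystemCallError
      false # unparseable or unreachable URLs are treated as missing images
    end
  end
end
```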

Thanks TheChamp, you led me in the right direction.
You first need to parse the URL and then open it:
image = Image.new
ad_image_url = URI.parse("#{img}")
image.image = open(ad_image_url)
image.watermark = true
image.save!
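On current Ruby the same idea needs a small adjustment. Since Ruby 3.0, open-uri no longer hooks Kernel#open, so a URL has to be opened with URI.open or with the #open method that open-uri adds to URI::HTTP.

This sketch also strips the node text first. That rests on an assumption the question does not confirm: that the XML's formatting left whitespace or a newline around the URL. Open-uri's old Kernel#open only treated strings that start with a scheme (such as "http://") as URLs. A string with leading whitespace would therefore fall through to a local file open, which would match the "No such file or directory" error. The helper name image_uri_from is hypothetical.

```ruby
require 'uri'
require 'open-uri' # adds URI::HTTP#open and URI.open

# Hypothetical helper: turn the raw text of an <image> node into a URI object.
# strip removes whitespace/newlines left over from the XML formatting.
def image_uri_from(raw_text)
  uri = URI.parse(raw_text.to_s.strip)
  raise ArgumentError, "not an http(s) URL: #{raw_text.inspect}" unless uri.is_a?(URI::HTTP)
  uri
end

# Usage inside the rake task (needs network access and Paperclip):
#   img1 = Image.new
#   img1.image = image_uri_from(ad.at("image").text).open
#   img1.watermark = true
#   img1.save!
```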

Related

Access ActiveStorageBlob or ActiveStorageAttachment like it would be a native model

Is it possible to access ActiveStorageBlob or ActiveStorageAttachment as if they were native models?
E.g.
I want to do ActiveStorageBlob.first to access the first record of this model/table,
or ActiveStorageAttachment.all.as_json to generate JSON-formatted output.
The background idea is to find a way to dump the content of these ActiveStorage-related tables as JSON-formatted files, then change something in these files and load them back.
---- Extended after getting the correct answer ----
Thank you very much, Sarah Marie.
Do you also know how to load the JSON data back into these tables?
I have tried this:
dump_file_path = File.join(Rails.root, "backup", active_storage_blobs_file)
load_json = JSON.parse(File.read(dump_file_path))
load_json.each do |j|
ActiveStorage::Blob.create(j)
end
But that's not working:
ActiveModel::UnknownAttributeError (unknown attribute
'attachable_sgid' for ActiveStorage::Blob.)

The models are namespaced under ActiveStorage:
ActiveStorage::Blob.first
ActiveStorage::Attachment.all.as_json
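---- For the second (extended) question ----
# j comes from JSON.parse, so its keys are strings
ActiveStorage::Blob.create_before_direct_upload!(
filename: j["filename"],
content_type: j["content_type"],
byte_size: j["byte_size"],
checksum: j["checksum"]
)
# or, keeping only the keywords the method accepts
ActiveStorage::Blob.create_before_direct_upload!(**j.symbolize_keys.slice(:filename, :content_type, :byte_size, :checksum))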
Reference: https://github.com/rails/rails/blob/5f3ff60084ab5d5921ca3499814e4697f8350ee7/activestorage/app/controllers/active_storage/direct_uploads_controller.rb#L8-L9
https://github.com/rails/rails/blob/098fd7f9b3d5c6f540911bc0c17207d6b48d5bb3/activestorage/app/models/active_storage/blob.rb#L113-L120

Now I have a complete solution for dumping and loading the ActiveStorage tables as JSON files.
...dump it
active_storage_blobs_file = "active_storage_blob.json"
active_storage_attachments_file = "active_storage_attachment.json"
puts("...dump active_storage_blob")
dump_file_path = File.join(Rails.root, "backup", active_storage_blobs_file)
File.write(dump_file_path, JSON.pretty_generate(ActiveStorage::Blob.all.as_json))
puts("...dump active_storage_attachment")
dump_file_path = File.join(Rails.root, "backup", active_storage_attachments_file)
File.write(dump_file_path, JSON.pretty_generate(ActiveStorage::Attachment.all.as_json))
...load it back
puts("...load active_storage_blob")
dump_file_path = File.join(Rails.root, "backup", active_storage_blobs_file)
abort("File does not exist (" + dump_file_path + ") > abort <") unless File.exist?(dump_file_path)
load_json = JSON.parse(File.read(dump_file_path))
load_json.each do |j|
j = j.except("attachable_sgid")
result = ActiveStorage::Blob.create(j)
if (not result.errors.empty?)
puts(result.errors.full_messages.to_s)
puts(j.inspect)
exit(1)
end
end
puts("...load active_storage_attachment")
dump_file_path = File.join(Rails.root, "backup", active_storage_attachments_file)
abort("File does not exist (" + dump_file_path + ") > abort <") unless File.exist?(dump_file_path)
load_json = JSON.parse(File.read(dump_file_path))
load_json.each do |j|
result = ActiveStorage::Attachment.create(j)
if (not result.errors.empty?)
puts(result.errors.full_messages.to_s)
puts(j.inspect)
exit(1)
end
end
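The dump and load steps above reduce to a JSON round trip that you can try outside Rails. This is a minimal sketch that assumes the records are plain attribute hashes, which is what as_json returns. dump_records and load_records are hypothetical helper names. In the Rails task, each loaded hash would be passed to ActiveStorage::Blob.create or ActiveStorage::Attachment.create as above.

```ruby
require 'json'

# Write attribute hashes (e.g. ActiveStorage::Blob.all.as_json) as pretty-printed JSON.
def dump_records(records, path)
  File.write(path, JSON.pretty_generate(records))
end

# Read the hashes back, dropping keys that are not database columns.
# "attachable_sgid" is the key that caused ActiveModel::UnknownAttributeError above.
# File.read raises Errno::ENOENT if the dump file is missing.
def load_records(path, drop_keys: ["attachable_sgid"])
  JSON.parse(File.read(path)).map do |attrs|
    attrs.reject { |key, _value| drop_keys.include?(key) }
  end
end

# Usage in the Rails task (not runnable outside Rails):
#   load_records(dump_file_path).each { |attrs| ActiveStorage::Blob.create!(attrs) }
```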

Trouble with low level caching in rails

I have some issues with caching in Rails. I can't figure out how to set it up.
Here's the code:
submit_key = nil
pairs_email = Hash.new
pairs_type = Rails.cache.fetch("cache_typeform", :expires_in => 1.day) do
(0..9).each do
if submit_key.present?
url = "https://api.typeform.com/forms/#{typeform_id}/responses?page_size=1000&until=#{submit_key}"
response = RestClient.get url, {:Authorization => 'Bearer XXXXXXXXXXX'}
parsed = JSON.parse(response.body)
else
response = RestClient.get "https://api.typeform.com/forms/#{typeform_id}/responses?page_size=1000", {:Authorization => 'Bearer XXXXXXXXXXXXXXX'}
parsed = JSON.parse(response.body)
end
parsed['items'].each do |item|
pairs_email[item['hidden']['email']] = item['token'] if item['hidden']['email'].present?
end
submit_key = parsed['items'][-1]['submitted_at'].chop
end
end
It should then return pairs of an email and an ID, and these pairs are used afterwards to get more information. However, nothing is returned.
Can someone tell me what I've done wrong in my code? Am I missing something somewhere?
UPDATE
I want to use my cache for getting information from the Typeform API:
results = Hash.new
if pairs_email[email].present?
url = "https://api.typeform.com/v1/form/#{typeform_id}?key=#{ENV['TYPEFORM_API_KEY']}&token=#{pairs_email[email]}"
response = RestClient.get(url)
parsed = JSON.parse(response.body)
results["email"] = parsed["responses"][0]["hidden"]["email"] # Email
results["first_name"] = parsed["responses"][0]["answers"]["textfield_25078009"] # first name
results["last_name"] = parsed["responses"][0]["answers"]["textfield_25078014"] # last name
results["phone_number"] = parsed["responses"][0]["answers"]["textfield_25444504"] # phone number
results["job"] = parsed["responses"][0]["answers"]["textfield_24904749"] # job
results["status_legal"] = parsed["responses"][0]["answers"]["list_24904751_choice"] # legal status
results["birthdate"] = parsed["responses"][0]["answers"]["date_24904754"] # date of birth
results["zipcode"] = parsed["responses"][0]["answers"]["number_24904755"] # postal code
results["has_partner"] = parsed["responses"][0]["answers"]["yesno_53894471"] # has_partner
results["children"] = parsed["responses"][0]["answers"]["list_53894494_choice"] # number of children
results["optical_option"] = parsed["responses"][0]["answers"]["list_24904752_choice_32209601"] # optical_option
results["dental_option"] = parsed["responses"][0]["answers"]["list_24904752_choice_32209602"] # dental_option
results["sick_15d"] = parsed["responses"][0]["answers"]["list_24904752_choice_32209603"] # Sick_15d
results["target_year"] = parsed["responses"][0]["answers"]["list_24905736_choice"] # target_year
results["monthly_income"] = parsed["responses"][0]["answers"]["number_24904756"] # monthly_income
results["independent"] = parsed["responses"][0]["answers"]["yesno_53895024"] # independent_1_year
#results["subject_to_discuss"] = parsed["responses"][0]["answers"]["textarea_24904759"] # Do you have any topics you want to discuss?
end
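The long list of parsed["responses"][0]["answers"][...] lookups above could also be driven by a table. This sketch assumes the same response shape. ANSWER_FIELDS is a hypothetical name, and it repeats only three of the field ids from the code above; the rest would follow the same pattern. Missing answers or responses come back as nil instead of raising.

```ruby
# Hypothetical lookup table: result key => Typeform answer field id (ids copied from above).
ANSWER_FIELDS = {
  "first_name" => "textfield_25078009",
  "last_name" => "textfield_25078014",
  "zipcode" => "number_24904755"
}.freeze

def extract_results(parsed)
  response = parsed.dig("responses", 0) || {}
  answers = response["answers"] || {}
  results = { "email" => response.dig("hidden", "email") }
  ANSWER_FIELDS.each { |key, field_id| results[key] = answers[field_id] }
  results
end
```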

Here's something you should try before getting caching right. I'm attaching a screenshot from my machine.
Also, in the development environment you need to enable caching to see the effect. You could add config.action_controller.perform_caching = true and config.cache_store = :memory_store, { size: 64.megabytes } to your config/environments/development.rb file to enable caching.
This is just an illustration of how caching works and how to check that it really works. It should help you get going with your task.

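Rails.cache.fetch stores the value returned by the block passed to it (if there is one, of course). In your example, the block returns the (0..9) range, because that is what (0..9).each returns. It does not return the [email, id] pairs you actually built. Make pairs_email the last expression in the block, and use the value returned by fetch (pairs_type) afterwards. On a cache hit the block doesn't run, so the local pairs_email stays empty.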
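Here is a sketch that shows the difference without Rails or the Typeform API. TinyCache is a hypothetical stand-in that mimics only the part of Rails.cache.fetch that matters here: it runs the block on a miss and stores the block's return value. fetch_page is a hypothetical callable that stands in for the RestClient request. The block ends with pairs_email, so the hash is what gets cached. Paging stops when a page comes back empty, instead of calling chop on nil.

```ruby
# Hypothetical stand-in for Rails.cache: runs the block only on a miss and
# stores whatever the block returns.
class TinyCache
  def initialize
    @store = {}
  end

  def fetch(key)
    return @store[key] if @store.key?(key)
    @store[key] = yield
  end
end

# fetch_page (hypothetical) takes the `until` cursor (nil for the first page)
# and returns the parsed JSON of one page of responses.
def email_token_pairs(cache, fetch_page)
  cache.fetch("cache_typeform") do
    pairs_email = {}
    submit_key = nil
    10.times do
      items = fetch_page.call(submit_key)["items"]
      break if items.empty? # no more pages: avoid calling chop on nil
      items.each do |item|
        email = item.dig("hidden", "email")
        pairs_email[email] = item["token"] if email && !email.empty?
      end
      submit_key = items[-1]["submitted_at"].chop
    end
    pairs_email # the block's last expression is what gets cached
  end
end
```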

Is it possible to add DataUrl Image in RBPDF?

I am making a PDF file from a data URL image in Ruby on Rails.
I chose RBPDF to produce the PDF file on the server side.
But this code gives me the following error:
@pdf.Image(object["src"], object["left"], object["top"], object["width"], object["height"])
Here object["src"] is DataUrl Image.
RuntimeError (RBPDF error: Missing image file:
data:image/jpeg;base64,/9j/4REORXhp...
Is it impossible to add an image to RBPDF from a data URL?
I don't think adding files dynamically is efficient.

You can monkey patch the original method.
I use the data_uri gem to parse the image data:
require 'data_uri'
require 'rmagick'
require 'rbpdf' # RBPDF must be loaded so that getimagesize exists to alias
module Rbpdf
alias_method :old_getimagesize, :getimagesize
# @param [String] data_url
def getimagesize(data_url)
if data_url.start_with? 'data:'
uri = URI::Data.new data_url
image_from_blob = Magick::Image.from_blob(uri.data)
origin_process_image(image_from_blob[0])
else
old_getimagesize data_url
end
end
# this method is extracted, without comments, from the original implementation of getimagesize
def origin_process_image(image)
out = Hash.new
out[0] = image.columns
out[1] = image.rows
case image.mime_type
when "image/gif"
out[2] = "GIF"
when "image/jpeg"
out[2] = "JPEG"
when "image/png"
out[2] = "PNG"
when "image/vnd.wap.wbmp"
out[2] = "WBMP"
when "image/x-xpixmap"
out[2] = "XPM"
end
out[3] = "height=\"#{image.rows}\" width=\"#{image.columns}\""
out['mime'] = image.mime_type
case image.colorspace.to_s.downcase
when 'cmykcolorspace'
out['channels'] = 4
when 'rgbcolorspace', 'srgbcolorspace' # Mac OS X : sRGBColorspace
if image.image_type.to_s == 'GrayscaleType' and image.class_type.to_s == 'PseudoClass'
out['channels'] = 0
else
out['channels'] = 3
end
when 'graycolorspace'
out['channels'] = 0
end
out['bits'] = image.channel_depth
out
end
end
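If you would rather not add the data_uri gem, you can do the decoding step the patch relies on (uri.data) with core Ruby. This minimal sketch assumes base64-encoded data URLs, like the data:image/jpeg;base64,... string in the question. decode_data_url is a hypothetical name. Its second return value is the raw image bytes, the same thing the patch passes to Magick::Image.from_blob.

```ruby
# Hypothetical helper: split a base64 data URL into its MIME type and raw bytes.
# Only the ";base64" form is handled; anything else raises ArgumentError.
def decode_data_url(data_url)
  header, payload = data_url.split(",", 2)
  unless header&.start_with?("data:") && payload
    raise ArgumentError, "not a data URL"
  end
  params = header.delete_prefix("data:").split(";")
  raise ArgumentError, "only base64 data URLs are supported" unless params.last == "base64"
  mime = params.first.to_s.empty? ? "text/plain" : params.first # RFC 2397 default type
  [mime, payload.unpack1("m")] # "m" decodes base64 via String#unpack1
end

# Usage inside the patch instead of URI::Data:
#   _mime, bytes = decode_data_url(data_url)
#   image_from_blob = Magick::Image.from_blob(bytes)
```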

Ruby on Rails open URI issue with broken file source

I am having a hell of a problem here.
I'm using Ruby on Rails:
ruby 1.8.7 (2011-12-10 patchlevel 356)
rails 2.3.14
I'm trying a simple open with open-uri on the following address:
http://jollymag.net/n/10390-летни-секс-пози-във-водата.html (link is NSFW)
However, reading the resulting file produces a weird (broken) string.
I also tested this on Ruby 1.9.3 and Rails 3.2.x.
require 'open-uri'
url = 'http://jollymag.net/n/10390-летни-секс-пози-във-водата.html'
url = URI.encode(url)
file = open(url)
doc = file.collect.to_s # <- the document is broken
document = Nokogiri::HTML.parse(doc,nil,"utf8")
puts document # <- the document after nokogiri has one line of content
I tried Iconv and other things, but nothing works. The code above is more or less a minimal isolated case of the exact problem.
I'd appreciate any help, since I've been trying to figure out this bug for a couple of days now.
Regards,
Yavor

So the problem was a tricky one for me.
It appears that some servers return only gzip-encoded responses.
So to read the body, you of course have to decompress it accordingly.
I decided to post my whole crawl code so people might find a more complete solution to such problems. This is part of a bigger class, so it often refers to self.
Hope it helps!
SHINSO_HEADERS = {
'Accept' => '*/*',
'Accept-Charset' => 'utf-8, windows-1251;q=0.7, *;q=0.6',
'Accept-Encoding' => 'gzip,deflate',
'Accept-Language' => 'bg-BG, bg;q=0.8, en;q=0.7, *;q=0.6',
'Connection' => 'keep-alive',
'From' => 'support@xenium.bg',
'Referer' => 'http://svejo.net/',
'User-Agent' => 'Mozilla/5.0 (compatible; Shinso/1.0;'
}
def crawl(url_address)
self.errors = Array.new
begin
begin
url_address = URI.parse(url_address)
rescue URI::InvalidURIError
url_address = URI.decode(url_address)
url_address = URI.encode(url_address)
url_address = URI.parse(url_address)
end
url_address.normalize!
stream = ""
Timeout.timeout(10) { stream = url_address.open(SHINSO_HEADERS) }
if stream.size > 0
url_crawled = URI.parse(stream.base_uri.to_s)
else
self.errors << "Server said status 200 OK but document file is zero bytes."
return
end
rescue Exception => exception
self.errors << exception
return
end
# extract information before html parsing
self.url_posted = url_address.to_s
self.url_parsed = url_crawled.to_s
self.url_host = url_crawled.host
self.status = stream.status
self.content_type = stream.content_type
self.content_encoding = stream.content_encoding
self.charset = stream.charset
if stream.content_encoding.include?('gzip')
document = Zlib::GzipReader.new(stream).read
elsif stream.content_encoding.include?('deflate')
document = Zlib::Inflate.inflate(stream.read) # decompressing needs Inflate; Deflate compresses
#elsif stream.content_encoding.include?('x-gzip') or
#elsif stream.content_encoding.include?('compress')
else
document = stream.read
end
self.charset_guess = CharGuess.guess(document)
# convert only when a charset was guessed and it is not already UTF-8
if !self.charset_guess.blank? &&
!['utf-8', 'utf8'].include?(self.charset_guess.downcase)
document = Iconv.iconv("UTF-8", self.charset_guess, document).join
end
document = Nokogiri::HTML.parse(document,nil,"utf8")
document.xpath('//script').remove
document.xpath('//SCRIPT').remove
for item in document.xpath('//*[translate(@src, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz")]')
item.set_attribute('src',make_absolute_address(item['src']))
end
document = document.to_s.gsub(/<!--(.|\s)*?-->/,'')
#document = document.to_s.gsub(/\<![ \r\n\t]*(--([^\-]|[\r\n]|-[^\-])*--[ \r\n\t]*)\>/,'')
self.content = Nokogiri::HTML.parse(document,nil,"utf8")
end
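Here is the decompression step on its own, as a self-contained sketch. decode_body is a hypothetical name. It takes the raw body bytes (for example stream.read) plus the array that open-uri's content_encoding returns. For deflate, it tries the zlib-wrapped format first and falls back to raw deflate, since some servers send the latter.

```ruby
require 'zlib'
require 'stringio'

# Decompress an HTTP body according to its Content-Encoding values.
def decode_body(raw, encodings)
  encodings = encodings.map(&:downcase)
  if encodings.include?("gzip") || encodings.include?("x-gzip")
    Zlib::GzipReader.new(StringIO.new(raw)).read
  elsif encodings.include?("deflate")
    begin
      Zlib::Inflate.inflate(raw) # zlib-wrapped deflate
    rescue Zlib::DataError
      # raw deflate stream without the zlib header
      inflater = Zlib::Inflate.new(-Zlib::MAX_WBITS)
      begin
        inflater.inflate(raw) + inflater.finish
      ensure
        inflater.close
      end
    end
  else
    raw # identity: nothing to undo
  end
end
```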

Spec for File creation and writing into File

I am new here. I am working on a project with some tests, and I have some problems writing a spec for a class. I have finished some simple specs, but I have no clue how to write one for this class. Any help will be highly appreciated.
My class
class Writer
def initialize(filepath)
@filepath = RAILS_ROOT + filepath
@xml_document = Nokogiri::XML::Document.new
end
def open
File.open(@filepath,"w") do |f|
@gz = Zlib::GzipWriter.new(f)
@gz.write(%[<?xml version="1.0" encoding="UTF-8"?>\n])
@gz.write(%[<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n])
yield self
@gz.write(%[</urlset>])
@gz.close
end
end
def write_entry_to_xml(entry)
node = Nokogiri::XML::Node.new( "url" , @xml_document )
node["loc"] = entry.loc
node["changefreq"] = entry.changefreq
node["priority"] = entry.priority
node["lastmod"] = entry.lastmod
@gz.write(node.to_xml)
end
end
What I have written so far:
describe "writer" do
before :each do
@time = Time.now
@filepath = RAILS_ROOT + "/public/sitemap/test/sitemap_test.xml.gz"
File.open(@filepath,"w") do |f|
@gz = Zlib::GzipWriter.new(f)
end
@xml_document = Nokogiri::XML::Document.new
@entry = Sitemap::Entry.new("location", "monthly", "0.8", @time)
end
describe "open" do
it "should create a file and write xml entries to it" do
end
end
describe "write_entry_to_xml" do
it "should format an entry as an xml node and write it" do
node = Nokogiri::XML::Node.new( "url" , @xml_document )
node["loc"].should == @entry.loc
node["changefreq"].should == @entry.changefreq
node["priority"].should == @entry.priority
node["lastmod"].should == @entry.lastmod
end
end
end
Can anyone help me write complete specs for this class?
Thanks in advance.

I don't have time to do all of this for you, but here are examples of how I test my code:
actual code
its spec
Notice Ropet::Config.expects(:new).returns(config); the same technique can be used for your Nokogiri::XML::Node.new.
My specs use RSpec and Mocha. I like the simplicity of this setup and what can be done with those simple tools.
Edit: rough spec for
def write_entry_to_xml(entry)
node = Nokogiri::XML::Node.new( "url" , @xml_document )
node["loc"] = entry.loc
node["changefreq"] = entry.changefreq
node["priority"] = entry.priority
node["lastmod"] = entry.lastmod
@gz.write(node.to_xml)
end
It could be something like this, though I don't know the purpose of your code:
it 'writes entry to xml' do
content = double('output')
node = double('node'); node.should_receive(:to_xml).and_return(content)
gz = double('gz'); gz.should_receive(:write).with(content)
w = Writer.new("some_path")
w.instance_variable_set(:@gz, gz) # @gz is normally assigned inside #open, so set it directly instead of opening a real file
entry = Sitemap::Entry.new("location", "monthly", "0.8", Time.now) # entry as built in the question's spec
Nokogiri::XML::Node.stub(:new).and_return(node)
# node["loc"] = value calls the []= method, so that is what to expect
node.should_receive(:[]=).with("loc", entry.loc)
node.should_receive(:[]=).with("changefreq", entry.changefreq)
node.should_receive(:[]=).with("priority", entry.priority)
node.should_receive(:[]=).with("lastmod", entry.lastmod)
w.write_entry_to_xml(entry)
end
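For the empty open spec in the question, one approach is to let open write a real file into a temporary directory and read it back with Zlib::GzipReader. The sketch below runs with the standard library only, so it uses stand-ins rather than the asker's classes:

- SitemapWriter is a hypothetical stand-in for Writer that renders the url element with String#encode(xml: :attr) instead of Nokogiri.
- SitemapEntry is a hypothetical Struct.
- The writer calls finish rather than close, so the gzip footer is written and File.open's block closes the file.

```ruby
require 'zlib'
require 'tmpdir'

SitemapEntry = Struct.new(:loc, :changefreq, :priority, :lastmod)

class SitemapWriter
  def initialize(filepath)
    @filepath = filepath
  end

  def open
    File.open(@filepath, "wb") do |f|
      @gz = Zlib::GzipWriter.new(f)
      @gz.write(%[<?xml version="1.0" encoding="UTF-8"?>\n])
      @gz.write(%[<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n])
      yield self
      @gz.write(%[</urlset>])
      @gz.finish # writes the gzip footer; File.open closes f when the block ends
    end
  end

  # Same attribute layout as the question's write_entry_to_xml, built by hand.
  # encode(xml: :attr) escapes &, <, >, " and wraps the value in double quotes.
  def write_entry_to_xml(entry)
    attrs = %w[loc changefreq priority lastmod].map do |name|
      "#{name}=#{entry.public_send(name).to_s.encode(xml: :attr)}"
    end
    @gz.write("<url #{attrs.join(' ')}/>\n")
  end
end

# What a spec for #open could do:
#   Dir.mktmpdir do |dir|
#     path = File.join(dir, "sitemap_test.xml.gz")
#     SitemapWriter.new(path).open { |w| w.write_entry_to_xml(SitemapEntry.new("location", "monthly", "0.8", "2013-06-10")) }
#     xml = Zlib::GzipReader.open(path) { |gz| gz.read }
#     # then expect xml to include '<url loc="location" changefreq="monthly" ...'
#   end
```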
