Ruby on Rails open URI issue with broken file source - ruby-on-rails

I am having a hell of a problem here.
I'm using Ruby on Rails:
ruby 1.8.7 (2011-12-10 patchlevel 356)
rails 2.3.14
I'm trying a simple open with open-uri on the following address:
http://jollymag.net/n/10390-летни-секс-пози-във-водата.html (link is NSFW)
However, reading the resulting file produces a weird (broken) string.
I get the same result on Ruby 1.9.3 and Rails 3.2.x.
require 'open-uri'
require 'nokogiri'
url = 'http://jollymag.net/n/10390-летни-секс-пози-във-водата.html'
url = URI.encode(url)
file = open(url)
doc = file.read # <- the document is broken
document = Nokogiri::HTML.parse(doc,nil,"utf8")
puts document # <- the document after nokogiri has one line of content
I tried Iconv and other approaches, but nothing works. The code above is more or less a minimal isolated case of the exact problem.
I'd appreciate any help, since I've been trying to figure out this bug for a couple of days now.
Regards,
Yavor

The problem was a tricky one for me.
It appears that some servers return only gzip-compressed responses.
So to read the document, you have to decompress it accordingly.
I decided to post my whole crawl code so people might find a more complete solution to such problems. It is part of a bigger class, so it refers to self in many places.
Hope it helps!
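require 'open-uri'
require 'timeout'
require 'zlib'
require 'iconv' # in the standard library for Ruby 1.8/1.9; removed in Ruby 2.0
require 'nokogiri'
SHINSO_HEADERS = {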
'Accept' => '*/*',
'Accept-Charset' => 'utf-8, windows-1251;q=0.7, *;q=0.6',
'Accept-Encoding' => 'gzip,deflate',
'Accept-Language' => 'bg-BG, bg;q=0.8, en;q=0.7, *;q=0.6',
'Connection' => 'keep-alive',
'From' => 'support@xenium.bg',
'Referer' => 'http://svejo.net/',
'User-Agent' => 'Mozilla/5.0 (compatible; Shinso/1.0;'
}
def crawl(url_address)
self.errors = Array.new
begin
begin
url_address = URI.parse(url_address)
rescue URI::InvalidURIError
url_address = URI.decode(url_address)
url_address = URI.encode(url_address)
url_address = URI.parse(url_address)
end
url_address.normalize!
stream = ""
Timeout.timeout(10) { stream = url_address.open(SHINSO_HEADERS) }
if stream.size > 0
url_crawled = URI.parse(stream.base_uri.to_s)
else
self.errors << "Server said status 200 OK but document file is zero bytes."
return
end
rescue Exception => exception
self.errors << exception
return
end
# extract information before html parsing
self.url_posted = url_address.to_s
self.url_parsed = url_crawled.to_s
self.url_host = url_crawled.host
self.status = stream.status
self.content_type = stream.content_type
self.content_encoding = stream.content_encoding
self.charset = stream.charset
# open-uri returns Content-Encoding as a downcased array of strings
if (stream.content_encoding & ['gzip', 'x-gzip']).any?
document = Zlib::GzipReader.new(stream).read
elsif stream.content_encoding.include?('deflate')
# inflate (decompress), not deflate; assumes zlib-wrapped data
document = Zlib::Inflate.inflate(stream.read)
#elsif stream.content_encoding.include?('compress')
else
document = stream.read
end
self.charset_guess = CharGuess.guess(document)
# convert only when a non-UTF-8 charset was detected
if !self.charset_guess.blank? &&
!['utf-8', 'utf8'].include?(self.charset_guess.downcase)
# Iconv.conv returns a String (Iconv.iconv returns an Array)
document = Iconv.conv("UTF-8", self.charset_guess, document)
end
document = Nokogiri::HTML.parse(document,nil,"utf8")
document.xpath('//script').remove
document.xpath('//SCRIPT').remove
for item in document.xpath('//*[translate(@src, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz")]')
item.set_attribute('src',make_absolute_address(item['src']))
end
document = document.to_s.gsub(/<!--(.|\s)*?-->/,'')
#document = document.to_s.gsub(/\<![ \r\n\t]*(--([^\-]|[\r\n]|-[^\-])*--[ \r\n\t]*)\>/,'')
self.content = Nokogiri::HTML.parse(document,nil,"utf8")
end
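For readers on a current Ruby, the decompression and transcoding steps above can be sketched with only the standard library. This is a minimal sketch, not the author's code: decode_body is a hypothetical helper that assumes you already have the raw body bytes and the Content-Encoding list, it treats "deflate" as zlib-wrapped data, and it uses String#encode (Ruby 1.9+) in place of Iconv, which left the standard library in Ruby 2.0.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical helper: decompress an HTTP body according to its
# Content-Encoding list, then transcode it from `charset` to UTF-8.
def decode_body(raw, encodings, charset = 'utf-8')
  body =
    if (encodings & %w[gzip x-gzip]).any?
      Zlib::GzipReader.new(StringIO.new(raw)).read
    elsif encodings.include?('deflate')
      # Assumes zlib-wrapped deflate data; raw deflate would need a different window size.
      Zlib::Inflate.inflate(raw)
    else
      raw.dup # copy so the caller's string is not relabelled
    end
  # Label the bytes with the declared charset, then convert to UTF-8.
  body.force_encoding(charset).encode('UTF-8')
end

# Example with an in-memory gzipped body (Zlib.gzip needs Ruby 2.4+).
p decode_body(Zlib.gzip("<p>hello</p>"), ['gzip'])
```

In the crawler this would stand in for both the decompression branch and the Iconv call, for example decode_body(stream.read, stream.content_encoding, stream.charset || 'utf-8'), since open-uri's charset can be nil when the server omits it.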

Related

Access ActiveStorageBlob or ActiveStorageAttachment as if it were a native model

Would it be possible to access ActiveStorageBlob or ActiveStorageAttachment as if it were a native model?
E.g.
I want to do ActiveStorageBlob.first to access the first record of this model/table,
or ActiveStorageAttachment.all.as_json to generate JSON-formatted output.
The underlying idea is to find a way to dump the content of these ActiveStorage-related tables as JSON-formatted files, change something in these files, and load them back.
---- Extended after getting the correct answer ----
Thank you very much, Sarah Marie.
Do you also know how to load the JSON data back into these tables?
I have tried this:
dump_file_path = File.join(Rails.root, "backup", active_storage_blobs_file)
load_json = JSON.parse(File.read(dump_file_path))
load_json.each do |j|
ActiveStorage::Blob.create(j)
end
But that's not working:
ActiveModel::UnknownAttributeError (unknown attribute 'attachable_sgid' for ActiveStorage::Blob.)
ActiveStorage::Blob.first # the models live under the ActiveStorage namespace
ActiveStorage::Attachment.all.as_json
---- For the second, extended question ----
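# j comes from JSON.parse, so its keys are Strings
ActiveStorage::Blob.create_before_direct_upload!(
filename: j["filename"],
content_type: j["content_type"],
byte_size: j["byte_size"],
checksum: j["checksum"]
)
# or (keep only the accepted keywords; extra columns would be rejected)
ActiveStorage::Blob.create_before_direct_upload!(**j.symbolize_keys.slice(:filename, :content_type, :byte_size, :checksum))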
Reference: https://github.com/rails/rails/blob/5f3ff60084ab5d5921ca3499814e4697f8350ee7/activestorage/app/controllers/active_storage/direct_uploads_controller.rb#L8-L9
https://github.com/rails/rails/blob/098fd7f9b3d5c6f540911bc0c17207d6b48d5bb3/activestorage/app/models/active_storage/blob.rb#L113-L120
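Two details in the snippets above are easy to trip over: JSON.parse returns hashes with String keys, so j[:filename] is nil, and create_before_direct_upload! takes keyword arguments, so splatting a whole dumped row also passes columns such as id and key. A minimal sketch, assuming a hypothetical fake_create_before_direct_upload! that only echoes a keyword-only signature in the style of the Rails method (the real keyword list varies between Rails versions), and made-up row data:

```ruby
require 'json'

# A dumped blob row as JSON.parse returns it: every key is a String.
row = JSON.parse('{"id":7,"key":"abc123","filename":"a.png","content_type":"image/png","byte_size":42,"checksum":"c2FtcGxl"}')

# Keywords this sketch forwards; anything else in the row is dropped.
UPLOAD_KEYS = %i[filename content_type byte_size checksum].freeze

# Hypothetical stand-in with a keyword-only signature in the style of
# ActiveStorage::Blob.create_before_direct_upload!; it just echoes its input.
def fake_create_before_direct_upload!(filename:, byte_size:, checksum:, content_type: nil)
  { filename: filename, content_type: content_type, byte_size: byte_size, checksum: checksum }
end

# Symbolize the keys, then keep only the accepted keywords.
def upload_args(row)
  row.transform_keys(&:to_sym).slice(*UPLOAD_KEYS)
end

p row[:filename] # nil: the key is "filename", not :filename
blob = fake_create_before_direct_upload!(**upload_args(row))
p blob
```

Without the slice step, the id and key entries are passed as unknown keywords and Ruby raises ArgumentError before the method body runs.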
Now I have a complete solution for dumping the ActiveStorage tables to JSON files and loading them back.
...dump it
active_storage_blobs_file = "active_storage_blob.json"
active_storage_attachments_file = "active_storage_attachment.json"
puts("...dump active_storage_blob")
dump_file_path = File.join(Rails.root, "backup", active_storage_blobs_file)
File.write(dump_file_path, JSON.pretty_generate(ActiveStorage::Blob.all.as_json))
puts("...dump active_storage_attachment")
dump_file_path = File.join(Rails.root, "backup", active_storage_attachments_file)
File.write(dump_file_path, JSON.pretty_generate(ActiveStorage::Attachment.all.as_json))
...load it back
puts("...load active_storage_blob")
dump_file_path = File.join(Rails.root, "backup", active_storage_blobs_file)
abort("File does not exist (" + dump_file_path + ") > abort <") unless File.exist?(dump_file_path)
load_json = JSON.parse(File.read(dump_file_path))
load_json.each do |j|
j = j.except("attachable_sgid")
result = ActiveStorage::Blob.create(j)
if (not result.errors.empty?)
puts(result.errors.full_messages.to_s)
puts(j.inspect)
exit(1)
end
end
puts("...load active_storage_attachment")
dump_file_path = File.join(Rails.root, "backup", active_storage_attachments_file)
abort("File does not exist (" + dump_file_path + ") > abort <") unless File.exist?(dump_file_path)
load_json = JSON.parse(File.read(dump_file_path))
load_json.each do |j|
result = ActiveStorage::Attachment.create(j)
if (not result.errors.empty?)
puts(result.errors.full_messages.to_s)
puts(j.inspect)
exit(1)
end
end
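The key step in the load code is removing attributes that are not real columns before calling create. A more general variant keeps only known columns instead of removing one name at a time. This is a minimal sketch, assuming a hypothetical hard-coded BLOB_COLUMNS list (in a Rails app, ActiveStorage::Blob.column_names would be the natural source) and made-up sample data:

```ruby
require 'json'
require 'tmpdir'

# Hypothetical column list; in a Rails app you could use
# ActiveStorage::Blob.column_names instead of hard-coding it.
BLOB_COLUMNS = %w[id key filename content_type metadata byte_size checksum created_at].freeze

# Dump an array of attribute hashes (e.g. Model.all.as_json) as pretty JSON.
def dump_records(path, records)
  File.write(path, JSON.pretty_generate(records))
end

# Load the hashes back, keeping only keys that are real columns, so
# extras such as "attachable_sgid" never reach create.
def load_records(path, columns)
  abort("File does not exist (#{path}) > abort <") unless File.exist?(path)
  JSON.parse(File.read(path)).map { |row| row.slice(*columns) }
end

# Example round trip in a throwaway directory with made-up data.
Dir.mktmpdir do |dir|
  path = File.join(dir, "active_storage_blob.json")
  dump_records(path, [{ "id" => 1, "filename" => "a.png", "attachable_sgid" => "sgid" }])
  p load_records(path, BLOB_COLUMNS)
end
```

Filtering by an allow-list means a future virtual attribute in as_json will not break the import the way attachable_sgid did.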

Trouble with low level caching in rails

I'm having some issues with low-level caching in Rails, and I can't figure out how I should set it up.
Here's the code:
submit_key = nil
pairs_email = Hash.new
pairs_type = Rails.cache.fetch("cache_typeform", :expires_in => 1.day) do
(0..9).each do
if submit_key.present?
url = "https://api.typeform.com/forms/#{typeform_id}/responses?page_size=1000&until=#{submit_key}"
response = RestClient.get url, {:Authorization => 'Bearer XXXXXXXXXXX'}
parsed = JSON.parse(response.body)
else
response = RestClient.get "https://api.typeform.com/forms/#{typeform_id}/responses?page_size=1000", {:Authorization => 'Bearer XXXXXXXXXXXXXXX'}
parsed = JSON.parse(response.body)
end
parsed['items'].each do |item|
pairs_email[item['hidden']['email']] = item['token'] if item['hidden']['email'].present?
end
submit_key = parsed['items'][-1]['submitted_at'].chop
end
end
It should return pairs of an email and an ID, and these pairs are then used to get more information. However, nothing is returned.
Can someone tell me what I've done wrong in my code? Am I missing something somewhere?
UPDATE
I want to use my cache to get information from the Typeform API:
results = Hash.new
if pairs_email[email].present?
url = "https://api.typeform.com/v1/form/#{typeform_id}?key=#{ENV['TYPEFORM_API_KEY']}&token=#{pairs_email[email]}"
response = RestClient.get(url)
parsed = JSON.parse(response.body)
first_response = parsed["responses"][0]
answers = first_response["answers"]
results["email"] = first_response["hidden"]["email"] # Email
results["first_name"] = answers["textfield_25078009"] # first name
results["last_name"] = answers["textfield_25078014"] # last name
results["phone_number"] = answers["textfield_25444504"] # phone number
results["job"] = answers["textfield_24904749"] # job
results["status_legal"] = answers["list_24904751_choice"] # legal status
results["birthdate"] = answers["date_24904754"] # date of birth
results["zipcode"] = answers["number_24904755"] # postal code
results["has_partner"] = answers["yesno_53894471"] # has_partner
results["children"] = answers["list_53894494_choice"] # number of children
results["optical_option"] = answers["list_24904752_choice_32209601"] # optical_option
results["dental_option"] = answers["list_24904752_choice_32209602"] # dental_option
results["sick_15d"] = answers["list_24904752_choice_32209603"] # Sick_15d
results["target_year"] = answers["list_24905736_choice"] # target_year
results["monthly_income"] = answers["number_24904756"] # monthly_income
results["independent"] = answers["yesno_53895024"] # independent_1_year
#results["subject_to_discuss"] = answers["textarea_24904759"] # Do you have any topics you'd like to discuss?
end
Here's something you should try before getting caching right. I've attached a screenshot from my machine.
Also, if you are in the development environment, you need to enable caching to see the effect. You can add config.action_controller.perform_caching = true and config.cache_store = :memory_store, { size: 64.megabytes } to your development.rb config file to enable it.
This just gives you an idea of how caching works and lets you check that it really does; it should help you get going with your task.
Rails.cache.fetch stores the value returned by the block passed to it (if there is one, of course). In your example, the block returns the (0..9) range, because (0..9).each returns its receiver, instead of the [email, id] pairs you actually built.
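The point about the block's return value can be reproduced without Rails or the Typeform API. A minimal sketch, assuming a hypothetical TinyCache class that mimics only the basic fetch behaviour of Rails.cache (no expiry, no options) and made-up email/token data: the first block ends with (0..9).each, so the range is what gets cached; the second ends with the hash. In the question's code the equivalent fix is to make pairs_email the last expression inside the Rails.cache.fetch block.

```ruby
# Minimal stand-in for Rails.cache.fetch (hypothetical, no expiry):
# on a miss it runs the block and stores whatever the block returns.
class TinyCache
  def initialize
    @store = {}
  end

  def fetch(key)
    return @store[key] if @store.key?(key)
    @store[key] = yield
  end
end

cache = TinyCache.new

# Broken: the block's last expression is (0..9).each, which returns the range.
broken = cache.fetch("broken") do
  pairs = {}
  (0..9).each { |i| pairs["user#{i}@example.com"] = "token#{i}" }
end

# Fixed: the hash is the block's last expression, so the hash is cached.
fixed = cache.fetch("fixed") do
  pairs = {}
  (0..9).each { |i| pairs["user#{i}@example.com"] = "token#{i}" }
  pairs
end

p broken.class
p fixed.size
```

A second fetch with the same key returns the stored hash without running the block, which is the behaviour the question relies on to avoid repeating the paged API calls.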

Rspec Ruby on Rails Test File System in model

I have a model that has a method that looks through the filesystem starting at a particular location for files that match a particular regex. This is executed in an after_save callback. I'm not sure how to test this using Rspec and FactoryGirl. I'm not sure how to use something like FakeFS with this because the method is in the model, not the test or the controller. I specify the location to start in my FactoryGirl factory, so I could change that to a fake directory created by the test in a set up clause? I could mock the directory? I think there are probably several different ways I could do this, but which makes the most sense?
Thanks!
def ensure_files_up_to_date
files = find_assembly_files
add_files = check_add_assembly_files(files)
errors = add_assembly_files(add_files)
if errors.size > 0 then
return errors
end
update_files = check_update_assembly_files(files)
errors = update_assembly_files(update_files)
if errors.size > 0 then
return errors
else
return []
end
end
def find_assembly_files
start_dir = self.location
files = Hash.new
if ! File.directory? start_dir then
errors.add(:location, "Directory #{start_dir} does not exist on the system.")
abort("Directory #{start_dir} does not exist on the system for #{self.inspect}")
end
Find.find(start_dir) do |path|
filename = File.basename(path).split("/").last
FILE_TYPES.each { |filepart, filehash|
type = filehash["type"]
vendor = filehash["vendor"]
if filename.match(filepart) then
files[type] = Hash.new
files[type]["path"] = path
files[type]["vendor"] = vendor
end
}
end
return files
end
def check_add_assembly_files(files=self.find_assembly_files)
add = Hash.new
files.each do |file_type, file_hash|
# returns an array
file_path = file_hash["path"]
file_vendor = file_hash["vendor"]
filename = File.basename(file_path)
af = AssemblyFile.where(:name => filename)
if af.size == 0 then
add[file_path] = Hash.new
add[file_path]["type"] = file_type
add[file_path]["vendor"] = file_vendor
end
end
if add.size == 0 then
logger.error("check_add_assembly_files did not find any files to add")
return []
end
return add
end
def check_update_assembly_files(files=self.find_assembly_files)
update = Hash.new
files.each do |file_type, file_hash|
file_path = file_hash["path"]
file_vendor = file_hash["vendor"]
# returns an array
filename = File.basename(file_path)
af = AssemblyFile.find_by_name(filename)
if !af.nil? then
if af.location != file_path or af.file_type != file_type then
update[af.id] = Hash.new
update[af.id]['path'] = file_path
update[af.id]['type'] = file_type
update[af.id]['vendor'] = file_vendor
end
end
end
return update
end
def add_assembly_files(files=self.check_add_assembly_files)
if files.size == 0 then
logger.error("add_assembly_files didn't get any results from check_add_assembly_files")
return []
end
asm_file_errors = Array.new
files.each do |file_path, file_hash|
file_type = file_hash["type"]
file_vendor = file_hash["vendor"]
logger.debug "file type is #{file_type} and path is #{file_path}"
logger.debug FileType.find_by_type_name(file_type)
file_type_id = FileType.find_by_type_name(file_type).id
header = file_header(file_path, file_vendor)
if file_vendor == "TBA" then
check = check_tba_header(header, file_type, file_path)
software = header[TBA_SOFTWARE_PROGRAM]
software_version = header[TBA_SOFTWARE_VERSION]
elsif file_vendor == "TBB" then
check = check_tbb_header(header, file_type, file_path)
if file_type == "TBB-ANNOTATION" then
software = header[TBB_SOURCE]
else
software = "Unified"
end
software_version = "UNKNOWN"
end
if check == 0 then
logger.error("skipping file #{file_path} because it contains incorrect values for this filetype")
asm_file_errors.push("#{file_path} cannot be added to assembly because it contains incorrect values for this filetype")
next
end
if file_vendor == "TBA" then
xml = header.to_xml(:root => "assembly-file")
elsif file_vendor == "TBB" then
xml = header.to_xml
else
xml = ''
end
filename = File.basename(file_path)
if filename.match(/~$/) then
logger.error("Skipping a file with a tilde when adding assembly files. filename #{filename}")
next
end
assembly_file = AssemblyFile.new(
:assembly_id => self.id,
:file_type_id => file_type_id,
:name => filename,
:location => file_path,
:file_date => creation_time(file_path),
:software => software,
:software_version => software_version,
:current => 1,
:metadata => xml
)
assembly_file.save! # exclamation point forces it to raise an error if the save fails
end # end files.each
return asm_file_errors
end
Quick answer: you can stub out model methods like any others. Either stub a specific instance of a model and then stub find (or whatever lookup you use) to return that instance, or stub out any_instance if you don't want to worry about which instance is involved. Something like:
it "does something" do
foo = Foo.create! some_attributes
foo.should_receive(:some_method).and_return(whatever)
Foo.stub(:find).and_return(foo)
end
The real answer is that your code is too complicated to test effectively. Your models should not even know that a filesystem exists. That behavior should be encapsulated in other classes, which you can test independently. Your model's after_save can then just call a single method on that class, and testing whether or not that single method gets called will be a lot easier.
Your methods are also very difficult to test because they try to do too much. All that conditional logic and all those external dependencies mean you'll have to do a whole lot of mocking to get at the various bits you might want to test.
This is a big topic, and a good answer is well beyond the scope of this one. Start with the Wikipedia article on SOLID and read from there for some of the reasoning behind separating concerns into individual classes and using tiny, composed methods. To give you a ballpark idea, a method with more than one branch or more than 10 lines of code is too big; a class that is more than about 100 lines of code is too big.
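To make the separation concrete, here is a minimal sketch under stated assumptions: AssemblyFileFinder is a hypothetical class holding the filesystem scan (a simplified take on find_assembly_files that skips directories and raises instead of calling abort), and Assembly is a plain-Ruby stand-in for the model that receives the finder. A test can then point the finder at a temporary directory, or hand the model a fake object with a find method, without stubbing Find or File.

```ruby
require 'find'
require 'fileutils'
require 'tmpdir'

# Hypothetical class that owns the filesystem scan. Patterns map a
# filename regex to {"type" => ..., "vendor" => ...}, like FILE_TYPES.
class AssemblyFileFinder
  def initialize(start_dir, patterns)
    @start_dir = start_dir
    @patterns = patterns
  end

  # Returns { type => { "path" => ..., "vendor" => ... } }.
  # Raises instead of calling abort, so callers and tests can handle it.
  def find
    raise ArgumentError, "Directory #{@start_dir} does not exist" unless File.directory?(@start_dir)

    files = {}
    Find.find(@start_dir) do |path|
      next unless File.file?(path) # skip directories
      name = File.basename(path)
      @patterns.each do |pattern, info|
        files[info["type"]] = { "path" => path, "vendor" => info["vendor"] } if name.match?(pattern)
      end
    end
    files
  end
end

# Plain-Ruby stand-in for the model: the finder is injected, so the
# model only calls one method and never touches the disk itself.
class Assembly
  def initialize(finder)
    @finder = finder
  end

  def assembly_files
    @finder.find
  end
end

# Example: a test can build a throwaway directory instead of mocking Find.
Dir.mktmpdir do |dir|
  FileUtils.touch(File.join(dir, "sample.fa"))
  patterns = { /\.fa\z/ => { "type" => "FASTA", "vendor" => "TBA" } } # hypothetical pattern
  p AssemblyFileFinder.new(dir, patterns).find.keys
end
```

This also covers the question's idea of pointing the factory at a fake directory: Dir.mktmpdir gives each test its own directory, which is removed when the block exits.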

twitter update_with_media and carrierwave

I am trying to post updates with media to Twitter from Heroku, using CarrierWave.
http://rdoc.info/gems/twitter/Twitter/API/Tweets#update_with_media-instance_method
I can do it without media, but when I try media, I keep running into problems.
Twitter.update_with_media("message", File.new(picture.picture_url.to_s))
I get the error:
Errno::ENOENT (No such file or directory - https://amazonlinktopicture)
Any ideas? I also tried File.open, and it didn't work.
Just for the benefit of others:
> Source
require 'twitter'
require 'open-uri'
require 'yaml'
config = YAML.load_file('twitter.yml')
Twitter.configure do |c|
c.consumer_key = config['consumer_key']
c.consumer_secret = config['consumer_secret']
c.oauth_token = config['oauth_token']
c.oauth_token_secret = config['oauth_token_secret']
end
# Tempfile
begin
uri = URI.parse('https://dev.twitter.com/sites/default/files/images_terms/tweet-display-guidelines-20110405.png')
media = uri.open
media.instance_eval("def original_filename; '#{File.basename(uri.path)}'; end")
p Twitter.update_with_media(Time.now.to_s, media)
rescue => e
p e
end
# StringIO
begin
uri = URI.parse('http://a3.twimg.com/a/1315421129/images/logos/twitter_newbird_blue.png')
media = uri.open
media.instance_eval("def original_filename; '#{File.basename(uri.path)}'; end")
p Twitter.update_with_media(Time.now.to_s, media)
rescue => e
p e
end
require 'open-uri'
Twitter.update_with_media("message", open(picture.picture_url.to_s) {|f| f.read })
begin
twitter_client = Twitter::REST::Client.new do |c|
c.consumer_key = config['consumer_key']
c.consumer_secret = config['consumer_secret']
c.oauth_token = config['oauth_token']
c.oauth_token_secret = config['oauth_token_secret']
end
twitter_client.update_with_media(message, open(picture.picture_url))
rescue Exception => exc
#message = exc.message
end
begin
twitter_client = Twitter::REST::Client.new do |client|
client.consumer_key = config['consumer_key']
client.consumer_secret = config['consumer_secret']
client.oauth_token = config['oauth_token']
client.oauth_token_secret = config['oauth_token_secret']
end
twitter_client.update_with_media(message, open(picture.picture_url))
rescue Exception => exc
#message = exc.message
end
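The error in the question comes from File.new, which opens only local paths, so the S3 URL is treated as a missing file name. A minimal sketch of both halves of the workaround, using only the standard library: local_file_error and with_original_filename are hypothetical helpers, and a StringIO stands in for what open-uri returns (a StringIO for small bodies, a Tempfile for larger ones). define_singleton_method is used here in place of the string-based instance_eval from the answer; both attach an original_filename method to a single object before it is passed to update_with_media.

```ruby
require 'stringio'

# File.new only opens local paths, so a URL is treated as a missing
# file name and raises Errno::ENOENT, as in the question.
def local_file_error(url)
  File.new(url).close
  nil
rescue Errno::ENOENT => e
  e
end

# Give one IO object an original_filename method without evaluating a string.
def with_original_filename(io, name)
  io.define_singleton_method(:original_filename) { name }
  io
end

p local_file_error("https://example.com/picture.png").class
media = with_original_filename(StringIO.new("fake image bytes"), "tweet.png")
p media.original_filename
```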

RubyAmf and Rails 3

I have recently been trying to upgrade my app from Rails 2.3.8 to the newly released Rails 3.
After fixing a few Rails 3 issues, RubyAMF still doesn't seem to work:
>>>>>>>> RubyAMF >>>>>>>>> #<RubyAMF::Actions::PrepareAction:0x1649924> took: 0.00017 secs
The action '#<ActionDispatch::Request:0x15c0cf0>' could not be found for DaysController
/Users/tammam56/.rvm/gems/ruby-1.9.2-p0/gems/actionpack-3.0.0/lib/abstract_controller/base.rb:114:in `process'
/Users/tammam56/.rvm/gems/ruby-1.9.2-p0/gems/actionpack-3.0.0/lib/abstract_controller/rendering.rb:40:in `process'
It doesn't seem to be able to find the proper controller. It might have to do with the changes to the router in Rails 3. Do you know how to go about finding the root cause of the problem and/or fixing it?
I'm pasting the RubyAMF code where this happens (the exception is raised at the line @service.process(req, res)):
#invoke the service call
def invoke
begin
# RequestStore.available_services[@amfbody.service_class_name] ||=
@service = @amfbody.service_class_name.constantize.new #handle on service
rescue Exception => e
puts e.message
puts e.backtrace
raise RUBYAMFException.new(RUBYAMFException.UNDEFINED_OBJECT_REFERENCE_ERROR, "There was an error loading the service class #{@amfbody.service_class_name}")
end
if @service.private_methods.include?(@amfbody.service_method_name.to_sym)
raise RUBYAMFException.new(RUBYAMFException.METHOD_ACCESS_ERROR, "The method {#{@amfbody.service_method_name}} in class {#{@amfbody.service_class_file_path}} is declared as private, it must be defined as public to access it.")
elsif !@service.public_methods.include?(@amfbody.service_method_name.to_sym)
raise RUBYAMFException.new(RUBYAMFException.METHOD_UNDEFINED_METHOD_ERROR, "The method {#{@amfbody.service_method_name}} in class {#{@amfbody.service_class_file_path}} is not declared.")
end
#clone the request and response and alter it for the target controller/method
req = RequestStore.rails_request.clone
res = RequestStore.rails_response.clone
#change the request controller/action targets and tell the service to process. THIS IS THE VOODOO. SWEET!
controller = @amfbody.service_class_name.gsub("Controller","").underscore
action = @amfbody.service_method_name
req.parameters['controller'] = req.request_parameters['controller'] = req.path_parameters['controller'] = controller
req.parameters['action'] = req.request_parameters['action'] = req.path_parameters['action'] = action
req.env['PATH_INFO'] = req.env['REQUEST_PATH'] = req.env['REQUEST_URI'] = "#{controller}/#{action}"
req.env['HTTP_ACCEPT'] = 'application/x-amf,' + req.env['HTTP_ACCEPT'].to_s
#set conditional helper
@service.is_amf = true
@service.is_rubyamf = true
#process the request
rubyamf_params = @service.rubyamf_params = {}
if @amfbody.value && !@amfbody.value.empty?
@amfbody.value.each_with_index do |item,i|
rubyamf_params[i] = item
end
end
# put them by default into the parameter hash if they opt for it
rubyamf_params.each{|k,v| req.parameters[k] = v} if ParameterMappings.always_add_to_params
begin
#One last update of the parameters hash, this will map custom mappings to the hash, and will override any conflicting from above
ParameterMappings.update_request_parameters(@amfbody.service_class_name, @amfbody.service_method_name, req.parameters, rubyamf_params, @amfbody.value)
rescue Exception => e
raise RUBYAMFException.new(RUBYAMFException.PARAMETER_MAPPING_ERROR, "There was an error with your parameter mappings: {#{e.message}}")
end
@service.process(req, res)
#unset conditional helper
@service.is_amf = false
@service.is_rubyamf = false
@service.rubyamf_params = rubyamf_params # add the rubyamf_args into the controller to be accessed
result = RequestStore.render_amf_results
#handle FaultObjects
if result.class.to_s == 'FaultObject' #catch returned FaultObjects - use this check so we don't have to include the fault object module
e = RUBYAMFException.new(result['code'], result['message'])
e.payload = result['payload']
raise e
end
#amf3
@amfbody.results = result
if @amfbody.special_handling == 'RemotingMessage'
@wrapper = generate_acknowledge_object(@amfbody.get_meta('messageId'), @amfbody.get_meta('clientId'))
@wrapper["body"] = result
@amfbody.results = @wrapper
end
@amfbody.success! #set the success response uri flag (/onResult)
end
The best suggestion is to try rails3-amf. It is currently severely lacking in features compared to RubyAMF, but it does work, and I'm adding new features as they are requested or as I have time.
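The backtrace points at abstract_controller/base.rb in `process', and the action name it reports is the request object itself. One plausible reading, offered as an assumption rather than a confirmed diagnosis: in Rails 3, AbstractController::Base#process expects the action name as its first argument, so the Rails 2.3-style call @service.process(req, res) makes Rails look for an action named after req.to_s. The classes below are hypothetical stand-ins, not Rails code; they only reproduce that failure mode.

```ruby
# Hypothetical stand-ins, not Rails code: FakeController mimics the
# contract process(action, *args), where the first argument is the action name.
class ActionNotFound < StandardError; end

class FakeController
  def process(action, *args)
    name = action.to_s
    unless self.class.public_instance_methods(false).include?(name.to_sym)
      raise ActionNotFound, "The action '#{action}' could not be found for #{self.class.name}"
    end
    public_send(name, *args)
  end

  def index
    "index rendered"
  end
end

FakeRequest = Struct.new(:path)

controller = FakeController.new
p controller.process(:index) # an action name dispatches normally
begin
  controller.process(FakeRequest.new("/days/index")) # a request object is not an action name
rescue ActionNotFound => e
  puts e.message
end
```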
