I'm trying to write a method that deletes files from an S3 bucket, but I get an AWS::S3::Errors::NoSuchKey: No Such Key error when I call .head or .read on an object.
app/models/file_item.rb
def thumbnail
{
exists: thumbnailable?,
small: "http://#{bucket}.s3.amazonaws.com/images/#{id}/small_thumb.png",
large: "http://#{bucket}.s3.amazonaws.com/images/#{id}/large_thumb.png"
}
end
lib/adapters/amazons3/accessor.rb
module Adapters
module AmazonS3
class Accessor
S3_BUCKET = AWS::S3.new.buckets[ENV['AMAZON_BUCKET']]
...
def self.delete_file(thumbnail)
prefix_pattern = %r{http://[MY-S3-HOST]-[a-z]+.s3.amazonaws.com/}
small_path = thumbnail[:small].sub(prefix_pattern, '')
large_path = thumbnail[:large].sub(prefix_pattern, '')
small = S3_BUCKET.objects[small_path]
large = S3_BUCKET.objects[large_path]
binding.pry
S3_BUCKET.objects.delete([small, large])
end
end
end
end
example url1
"http://projectname-staging.s3.amazonaws.com/images/994/small_thumb.png"
example url2
"http://projectname-production.s3.amazonaws.com/images/994/large_thumb.png"
Assuming AWS SDK v1 for Ruby:
small = S3_BUCKET.objects[small_path]
does not actually fetch any objects; it only builds a reference to one.
From https://docs.aws.amazon.com/AWSRubySDK/latest/AWS/S3/Bucket.html:
bucket.objects['key'] #=> makes no request, returns an S3Object
bucket.objects.each do |obj|
puts obj.key
end
So you would need to alter your code to something like:
to_delete = []
# with_prefix lists the keys that actually exist under the given prefix
S3_BUCKET.objects.with_prefix(small_path).each do |obj|
  to_delete << obj.key
end
S3_BUCKET.objects.with_prefix(large_path).each do |obj|
  to_delete << obj.key
end
S3_BUCKET.objects.delete(to_delete)
I wrote this quickly, so treat it as a sketch of the idea; you may need to correct or polish it a bit.
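As a side note on turning the thumbnail URLs into keys: a minimal sketch, assuming the URLs are virtual-hosted-style like the examples above (bucket name in the hostname), is to take the path component with the standard `uri` library instead of stripping the host with a regex. The helper name `s3_key_from_url` is made up for illustration.

```ruby
require 'uri'

# Derive the S3 object key from a public URL by taking its path component.
# Assumes virtual-hosted-style URLs (bucket in the hostname), so the path
# without its leading slash is the key.
def s3_key_from_url(url)
  URI.parse(url).path.sub(%r{\A/}, '')
end

key = s3_key_from_url('http://projectname-staging.s3.amazonaws.com/images/994/small_thumb.png')
puts key
```

This avoids depending on the exact bucket naming scheme, which is the part most likely to differ between environments.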
I was able to come up with a slightly different solution thanks to @Mircea's answer above.
def self.delete_file(thumbnail)
folder = thumbnail[:small].match(/(\d+)(?!.*\d)/)
to_delete = []
S3_BUCKET.objects.with_prefix("images/#{folder}").each do |thumb|
to_delete << thumb.key
end
# binding.pry
S3_BUCKET.objects.delete(to_delete)
end
So I have this code in my index action, would love to move it to a model, just a little confused on how to do it.
Original Code
def index
  urls = %w[http://cltampa.com/blogs/potlikker http://cltampa.com/blogs/artbreaker http://cltampa.com/blogs/politicalanimals http://cltampa.com/blogs/earbuds http://cltampa.com/blogs/dailyloaf http://cltampa.com/blogs/bedpost]
  @final_images = []
  @final_urls = []
  urls.each do |url|
    blog = Nokogiri::HTML(open(url))
    images = blog.xpath('//*[@class="postBody"]/div[1]//img/@src')
    images.each do |image|
      @final_images << image
    end
    story_path = blog.xpath('//*[@class="postTitle"]/a/@href')
    story_path.each do |path|
      @final_urls << path
    end
  end
end
I tested this code in my model and it works perfectly for one url, just not sure how to integrate all of the urls like the original code.
New Code
Model
class Photocloud < ActiveRecord::Base
  attr_reader :url, :data
  def initialize(url)
    @url = url
  end
  def data
    @data ||= Nokogiri::HTML(open(url))
  end
  def get_elements(path)
    data.xpath(path)
  end
end
Controller
def index
  @scraper = Photocloud.new('http://cltampa.com/blogs/artbreaker')
  @photos = @scraper.get_elements('//*[@class="postBody"]/div[1]//img/@src')
  @story_urls = @scraper.get_elements('//*[@class="postBody"]/div[1]//img/@src')
end
My main questions are how would I initialize multiple urls and loop through them like my original code. I have tried different things but feel like I have hit a wall. I need to save them to the database, but would like to get this working first. Any help is greatly appreciated.
Updated Controller - WIP
def index
  start_urls = %w[http://cltampa.com/blogs/potlikker
                  http://cltampa.com/blogs/artbreaker
                  http://cltampa.com/blogs/politicalanimals
                  http://cltampa.com/blogs/earbuds
                  http://cltampa.com/blogs/dailyloaf
                  http://cltampa.com/blogs/bedpost]
  @scraper = Photocloud.new(start_urls)
  @images =
  @paths =
end
Need some help with this part...
It seems that you don't persist the scraped images and paths to the database, so Photocloud doesn't need to inherit from ActiveRecord::Base; it can be just a plain old Ruby object (PORO):
class Photocloud
  attr_reader :start_urls
  attr_accessor :images, :paths
  def initialize(start_urls)
    @start_urls = start_urls
    @images = []
    @paths = []
  end
  def scrape
    start_urls.each do |start_url|
      blog = Nokogiri::HTML(open(start_url))
      scrape_images(blog)
      scrape_paths(blog)
    end
  end
  private
  # Collect the src attribute of every image in the first div of each post body.
  def scrape_images(blog)
    blog.xpath('//*[@class="postBody"]/div[1]//img/@src').each do |image|
      images << image
    end
  end
  # Collect the href of every post title link.
  def scrape_paths(blog)
    blog.xpath('//*[@class="postTitle"]/a/@href').each do |path|
      paths << path
    end
  end
end
In controller:
scraper = Photocloud.new(start_urls)
scraper.scrape
@images = scraper.images
@paths = scraper.paths
This is only one of several ways you could structure the code, of course.
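To show just the "loop over many URLs and accumulate" part in isolation, here is a rough sketch in which the page fetch is injected as a callable, so the aggregation can be tried without network access or Nokogiri. The class name `MultiPageCollector`, the `fetcher` argument and the example.com data are all hypothetical; in the real scraper the callable would wrap the Nokogiri calls.

```ruby
# Hypothetical variant: the fetch step is injected so the aggregation logic
# can be exercised without touching the network.
class MultiPageCollector
  attr_reader :images, :paths

  # fetcher is any callable: url -> { images: [...], paths: [...] }
  def initialize(start_urls, fetcher)
    @start_urls = start_urls
    @fetcher = fetcher
    @images = []
    @paths = []
  end

  # Visit every start URL and append its results to the running lists.
  def scrape
    @start_urls.each do |url|
      page = @fetcher.call(url)
      @images.concat(page[:images])
      @paths.concat(page[:paths])
    end
    self
  end
end

# Made-up sample data standing in for scraped pages.
fake_pages = {
  'http://example.com/a' => { images: ['a1.png'], paths: ['/a/story'] },
  'http://example.com/b' => { images: ['b1.png', 'b2.png'], paths: ['/b/story'] }
}
collector = MultiPageCollector.new(fake_pages.keys, ->(url) { fake_pages.fetch(url) }).scrape
puts collector.images.inspect
```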
Image files should be renamed as [variant-name]-[underscored-option-type].jpg for variants. I have come this far.
Updated Code
Spree::Image.class_eval do
after_save :change_file_name
private
def change_file_name
if self.viewable.kind_of? Spree::Variant
product_name = self.viewable.product.name.downcase.gsub(" ","_")
underscored_option_types = get_underscored_option_types
random_number = rand(10000...1000000)
extension = File.extname(self.attachment_file_name).downcase
attachment_file_name = product_name+"-"+underscored_option_types+"-"+"#{random_number}"+"#{extension}"
self.update_column(:attachment_file_name, attachment_file_name)
end
end
end
This code only updates the attachment_file_name column. How do I rename the image file itself? Even self.save won't work, assuming I avoid the recursive callback loop.
I had to rename the files at their respective locations, because each image is stored in several styles (versions), and every version has to be renamed where it lives. I hope the following code helps someone. Cheers :)
Spree::Image.class_eval do
after_save :change_file_name
private
def change_file_name
@skip_change_file_name ||= false
return if @skip_change_file_name
if self.viewable.kind_of? Spree::Variant
product_name = self.viewable.product.name.downcase.gsub(" ","_")
underscored_option_types = get_underscored_option_types
random_number = rand(10000...1000000)
extension = File.extname(self.attachment_file_name).downcase
new_file_name = product_name+"-"+underscored_option_types+"-"+"#{random_number}"+"#{extension}"
(self.attachment.styles.keys+[:original]).each do |style|
FileUtils.move(self.attachment.path(style), File.join(File.dirname(self.attachment.path(style)), new_file_name))
end
self.attachment_file_name = new_file_name
@skip_change_file_name = true
self.save!
end
end
end
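The per-style rename can be tried outside Spree with plain files. This is only a sketch under an assumed layout of one directory per style under a common root (the real Paperclip paths depend on your configuration); the helper name `rename_in_each_style` and the file names are invented for the example.

```ruby
require 'fileutils'
require 'tmpdir'

# Rename the same file in every style directory and return the new paths.
# Assumes a layout of <root>/<style>/<file name>, which is an illustration only.
def rename_in_each_style(root, styles, old_name, new_name)
  styles.map do |style|
    source = File.join(root, style.to_s, old_name)
    target = File.join(root, style.to_s, new_name)
    FileUtils.mv(source, target)
    target
  end
end

# Build a throwaway directory tree to run the helper against.
root = Dir.mktmpdir
styles = [:original, :small]
styles.each do |style|
  FileUtils.mkdir_p(File.join(root, style.to_s))
  FileUtils.touch(File.join(root, style.to_s, 'old.jpg'))
end

renamed = rename_in_each_style(root, styles, 'old.jpg', 'shirt-red_large-12345.jpg')
puts renamed
```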
I have a model that has a method that looks through the filesystem starting at a particular location for files that match a particular regex. This is executed in an after_save callback. I'm not sure how to test this using Rspec and FactoryGirl. I'm not sure how to use something like FakeFS with this because the method is in the model, not the test or the controller. I specify the location to start in my FactoryGirl factory, so I could change that to a fake directory created by the test in a set up clause? I could mock the directory? I think there are probably several different ways I could do this, but which makes the most sense?
Thanks!
def ensure_files_up_to_date
files = find_assembly_files
add_files = check_add_assembly_files(files)
errors = add_assembly_files(add_files)
if errors.size > 0 then
return errors
end
update_files = check_update_assembly_files(files)
errors = update_assembly_files(update_files)
if errors.size > 0 then
return errors
else
return []
end
end
def find_assembly_files
start_dir = self.location
files = Hash.new
if ! File.directory? start_dir then
errors.add(:location, "Directory #{start_dir} does not exist on the system.")
abort("Directory #{start_dir} does not exist on the system for #{self.inspect}")
end
Find.find(start_dir) do |path|
filename = File.basename(path).split("/").last
FILE_TYPES.each { |filepart, filehash|
type = filehash["type"]
vendor = filehash["vendor"]
if filename.match(filepart) then
files[type] = Hash.new
files[type]["path"] = path
files[type]["vendor"] = vendor
end
}
end
return files
end
def check_add_assembly_files(files=self.find_assembly_files)
add = Hash.new
files.each do |file_type, file_hash|
# returns an array
file_path = file_hash["path"]
file_vendor = file_hash["vendor"]
filename = File.basename(file_path)
af = AssemblyFile.where(:name => filename)
if af.size == 0 then
add[file_path] = Hash.new
add[file_path]["type"] = file_type
add[file_path]["vendor"] = file_vendor
end
end
if add.size == 0 then
logger.error("check_add_assembly_files did not find any files to add")
return []
end
return add
end
def check_update_assembly_files(files=self.find_assembly_files)
update = Hash.new
files.each do |file_type, file_hash|
file_path = file_hash["path"]
file_vendor = file_hash["vendor"]
# returns an array
filename = File.basename(file_path)
af = AssemblyFile.find_by_name(filename)
if !af.nil? then
if af.location != file_path or af.file_type != file_type then
update[af.id] = Hash.new
update[af.id]['path'] = file_path
update[af.id]['type'] = file_type
update[af.id]['vendor'] = file_vendor
end
end
end
return update
end
def add_assembly_files(files=self.check_add_assembly_files)
if files.size == 0 then
logger.error("add_assembly_files didn't get any results from check_add_assembly_files")
return []
end
asm_file_errors = Array.new
files.each do |file_path, file_hash|
file_type = file_hash["type"]
file_vendor = file_hash["vendor"]
logger.debug "file type is #{file_type} and path is #{file_path}"
logger.debug FileType.find_by_type_name(file_type)
file_type_id = FileType.find_by_type_name(file_type).id
header = file_header(file_path, file_vendor)
if file_vendor == "TBA" then
check = check_tba_header(header, file_type, file_path)
software = header[TBA_SOFTWARE_PROGRAM]
software_version = header[TBA_SOFTWARE_VERSION]
elsif file_vendor == "TBB" then
check = check_tbb_header(header, file_type, file_path)
if file_type == "TBB-ANNOTATION" then
software = header[TBB_SOURCE]
else
software = "Unified"
end
software_version = "UNKNOWN"
end
if check == 0 then
logger.error("skipping file #{file_path} because it contains incorrect values for this filetype")
asm_file_errors.push("#{file_path} cannot be added to assembly because it contains incorrect values for this filetype")
next
end
if file_vendor == "TBA" then
xml = header.to_xml(:root => "assembly-file")
elsif file_vendor == "TBB" then
xml = header.to_xml
else
xml = ''
end
filename = File.basename(file_path)
if filename.match(/~$/) then
logger.error("Skipping a file with a tilda when adding assembly files. filename #{filename}")
next
end
assembly_file = AssemblyFile.new(
:assembly_id => self.id,
:file_type_id => file_type_id,
:name => filename,
:location => file_path,
:file_date => creation_time(file_path),
:software => software,
:software_version => software_version,
:current => 1,
:metadata => xml
)
assembly_file.save! # exclamation point forces it to raise an error if the save fails
end # end files.each
return asm_file_errors
end
Quick answer: you can stub out model methods like any others. Either stub a specific instance of a model and then stub find (or whatever loads it) to return that instance, or stub any_instance if you don't want to worry about which instance is involved. Something like:
it "does something" do
foo = Foo.create! some_attributes
foo.should_receive(:some_method).and_return(whatever)
Foo.stub(:find).and_return(foo)
end
The real answer is that your code is too complicated to test effectively. Your models should not even know that a filesystem exists. That behavior should be encapsulated in other classes, which you can test independently. Your model's after_save can then just call a single method on that class, and testing whether or not that single method gets called will be a lot easier.
Your methods are also very difficult to test, because they are trying to do too much. All that conditional logic and external dependencies means you'll have to do a whole lot of mocking to get to the various bits you might want to test.
This is a big topic, and a full treatment is well beyond the scope of this answer. Start with the Wikipedia article on SOLID and read from there for some of the reasoning behind separating concerns into individual classes and using tiny, composed methods. To give you a ballpark idea, a method with more than one branch or more than 10 lines of code is too big; a class that is more than about 100 lines of code is too big.
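As a rough illustration of moving the filesystem knowledge out of the model, the directory walk could live in a small collaborator that is tested against a temporary directory. The class name `AssemblyFileFinder` and the sample file names are hypothetical; the pattern hash mirrors the shape the question's FILE_TYPES constant appears to have.

```ruby
require 'find'
require 'fileutils'
require 'tmpdir'

# Hypothetical collaborator: it knows about the filesystem so the model
# does not have to.
class AssemblyFileFinder
  # file_types maps a pattern to { "type" => ..., "vendor" => ... }
  def initialize(start_dir, file_types)
    @start_dir = start_dir
    @file_types = file_types
  end

  # Walk the tree and return { type => { "path" => ..., "vendor" => ... } }.
  def call
    found = {}
    Find.find(@start_dir) do |path|
      next unless File.file?(path)
      name = File.basename(path)
      @file_types.each do |pattern, info|
        next unless name.match(pattern)
        found[info["type"]] = { "path" => path, "vendor" => info["vendor"] }
      end
    end
    found
  end
end

# Exercise the finder against a throwaway directory instead of real data.
dir = Dir.mktmpdir
FileUtils.touch(File.join(dir, 'sample.annotation.txt'))
FileUtils.touch(File.join(dir, 'notes.md'))
types = { /annotation/ => { "type" => "TBB-ANNOTATION", "vendor" => "TBB" } }
result = AssemblyFileFinder.new(dir, types).call
puts result.keys.inspect
```

The model's after_save would then only need to call the collaborator, which is straightforward to stub in a model spec.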
I have an XPath query that takes an array of elements and writes the results out using Axlsx. I need to tidy up my output for certain conditions, one of which is the 'Software included' row.
My xpath scrapes the following URL http://h10010.www1.hp.com/wwpc/ie/en/ho/WF06b/321957-321957-3329742-89318-89318-5186820-5231694.html?dnr=1
A sample of my code is below:
clues = Array.new
clues << 'Optical drive'
clues << 'Pointing device'
clues << 'Software included'
selector = "//td[text()='%s']/following-sibling::td"
data = clues.map do |clue|
xpath = selector % clue
[clue, doc.at(xpath).text.strip]
end
Axlsx::Package.new do |p|
p.workbook.add_worksheet do |sheet|
data.each { |datum| sheet.add_row datum }
end
p.serialize 'output.xlsx'
end
My Current output formatting
My Desired output formatting
If you can rely on the data always using ';' for separators, have a go at this:
data = []
clues.each do |clue|
xpath = selector % clue
details = doc.at(xpath).text.strip.split(';')
data << [clue, details.pop]
details.each { |detail| data << ['', detail] }
end
to generate the data before the Axlsx::Package.new block
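The row expansion can be checked on its own with plain strings before wiring it to Nokogiri and Axlsx. This is a sketch rather than the answer's exact code: it keeps the first detail on the labelled row (the snippet above uses pop, which puts the last one there), and the helper name `expand_rows` and the sample values are made up.

```ruby
# Turn [label, "a; b; c"] pairs into spreadsheet rows: the label appears on
# the first row only, and each further detail gets its own row below it.
def expand_rows(pairs, separator = ';')
  pairs.flat_map do |clue, text|
    first, *rest = text.split(separator).map(&:strip)
    [[clue, first]] + rest.map { |detail| ['', detail] }
  end
end

# Made-up sample values, not scraped data.
rows = expand_rows([
  ['Software included', 'Recovery Manager; Support Assistant'],
  ['Optical drive', 'DVD']
])
rows.each { |row| puts row.inspect }
```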
In answer to your comment/question: you do it with something like this ;)
require 'rubygems'
require 'nokogiri'
require 'open-uri'
require 'axlsx'
class Scraper
def initialize(url, selector)
@url = url
@selector = selector
end
def hooks
@hooks ||= {}
end
def add_hook(clue, p_roc)
hooks[clue] = p_roc
end
def export(file_name)
Scraper.clues.each do |clue|
if detail = parse_clue(clue)
output << [clue, detail.pop]
detail.each { |datum| output << ['', datum] }
end
end
serialize(file_name)
end
private
def self.clues
@clues ||= ['Operating system', 'Processors', 'Chipset', 'Memory type', 'Hard drive', 'Graphics',
'Ports', 'Webcam', 'Pointing device', 'Keyboard', 'Network interface', 'Chipset', 'Wireless',
'Power supply type', 'Energy efficiency', 'Weight', 'Minimum dimensions (W x D x H)',
'Warranty', 'Software included', 'Product color']
end
def doc
@doc ||= begin
Nokogiri::HTML(open(@url))
rescue
raise ArgumentError, 'Invalid URL - Nothing to parse'
end
end
def output
@output ||= []
end
def selector_for_clue(clue)
@selector % clue
end
def parse_clue(clue)
if element = doc.at(selector_for_clue(clue))
call_hook(clue, element) || element.inner_html.split('<br>').map(&:strip)
end
end
def call_hook(clue, element)
if hooks[clue].is_a? Proc
value = hooks[clue].call(element)
value.is_a?(Array) ? value : [value]
end
end
def package
@package ||= Axlsx::Package.new
end
def serialize(file_name)
package.workbook.add_worksheet do |sheet|
output.each { |datum| sheet.add_row datum }
end
package.serialize(file_name)
end
end
scraper = Scraper.new("http://h10010.www1.hp.com/wwpc/ie/en/ho/WF06b/321957-321957-3329742-89318-89318-5186820-5231694.html?dnr=1", "//td[text()='%s']/following-sibling::td")
# define a custom action to take against any elements found.
os_parse = Proc.new do |element|
element.inner_html.split('<br>').each(&:strip!).each(&:upcase!)
end
scraper.add_hook('Operating system', os_parse)
scraper.export('foo.xlsx')
And the FINAL answer is... a gem.
http://rubydoc.info/gems/ninja2k/0.0.2/frames
I do the following
my_hash = Hash.new
my_hash[:children] = Array.new
I then have a function that calls itself a number of times, each time writing to children:
my_hash[:children] = my_replicating_function(some_values)
How do I write to it without overwriting the data that is already there?
This is what the entire function looks like
def self.build_structure(candidates, reports_id)
structure = Array.new
candidates.each do |candidate, index|
if candidate.reports_to == reports_id
structure = candidate
structure[:children] = Array.new
structure[:children] = build_structure(candidates, candidate.candidate_id)
end
end
structure
end
Maybe this:
structure[:children] << build_structure(candidates, candidate.candidate_id)
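Another way to avoid overwriting is to have each call return the list of nodes for one parent, so nothing is ever reassigned. A minimal sketch, assuming candidates respond to candidate_id and reports_to as in the question; the Struct, the name field and the sample people are invented for illustration.

```ruby
# Stand-in for the real candidate model; name is a made-up extra field.
Candidate = Struct.new(:candidate_id, :reports_to, :name)

# Return every candidate reporting to reports_id, each with its own
# recursively built :children array.
def build_structure(candidates, reports_id)
  candidates
    .select { |candidate| candidate.reports_to == reports_id }
    .map do |candidate|
      {
        id: candidate.candidate_id,
        name: candidate.name,
        children: build_structure(candidates, candidate.candidate_id)
      }
    end
end

candidates = [
  Candidate.new(1, nil, 'Ann'),
  Candidate.new(2, 1, 'Bob'),
  Candidate.new(3, 1, 'Cy'),
  Candidate.new(4, 2, 'Di')
]
tree = build_structure(candidates, nil)
puts tree.inspect
```

Because every level builds a fresh array with select and map, siblings accumulate naturally instead of replacing one another.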