This is my code:
#!/usr/bin/env ruby
# OptionParser
require 'optparse'
options = {}
optparse = OptionParser.new do|opts|
opts.banner = '...'
# This option inputs ...
options[:Lap1] = []
opts.on('-1', '--Lap1 filepath1,width1,height1,a1,first1,last1', String, '...') do|l1|
options[:Lap1] = l1.split(',')
end
end
optparse.parse!
My goal is to get an array of the comma-separated inputs. However, this code only outputs the first value, filepath1.
The output of:
puts(options[:Lap1])
and
puts(options[:Lap1][0])
is just the first variable, filepath1.
puts(options[:Lap1][1])
is nil, when it should be the variable width1.
Any suggestions or potential fixes would be helpful, thank you.
You need to call the parse! method on your OptionParser object after declaring all the options.
optparse.parse!
I'd write that code a bit differently:
require 'optparse'
require 'pp'
options = {
:Lap1 => []
}
optparse = OptionParser.new do |opts|
opts.banner = '...'
# This option inputs ...
opts.on(
'-1', '--Lap1 filepath1,width1,height1,a1,first1,last1',
Array,
'...'
) { |l1| options[:Lap1] = l1 }
end.parse!
pp options
Running that at the command-line:
ruby test.rb -1 filepath1,width1,height1,a1,first1,last1
Results in:
{:Lap1=>["filepath1", "width1", "height1", "a1", "first1", "last1"]}
Running it with the long option --Lap1 instead:
ruby test.rb --Lap1 filepath1,width1,height1,a1,first1,last1
Gives the same result:
{:Lap1=>["filepath1", "width1", "height1", "a1", "first1", "last1"]}
In my opinion, you should set your defaults when you define your options hash.
options = {
:Lap1 => []
}
Also, notice the use of Array instead of String. OptionParser will automatically split a comma-delimited string into an array of individual elements if you use Array, saving you that split step, and avoiding a bit of confusing code. See the documentation for OptionParser's make_switch method for more information.
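To see the Array coercion in isolation, here is a minimal sketch that parses an explicit argument list instead of ARGV. The method name parse_lap1 and the sample values are made up for illustration; only the OptionParser calls come from the answer above.

```ruby
require 'optparse'

# Hypothetical helper: parses the given argument list and returns the options hash.
def parse_lap1(argv)
  options = { Lap1: [] } # default set where the hash is defined
  OptionParser.new do |opts|
    # Array tells OptionParser to split the comma-delimited argument for us.
    opts.on('-1', '--Lap1 LIST', Array, 'comma-separated values') do |list|
      options[:Lap1] = list
    end
  end.parse!(argv.dup) # dup so the caller's array is left untouched
  options
end

p parse_lap1(%w[--Lap1 a.txt,640,480])[:Lap1] # => ["a.txt", "640", "480"]
p parse_lap1([])[:Lap1]                       # => []
```

Note that every element stays a String; converting width and height to integers is still up to you.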
I have a method to capture the extension as a group using a regex:
def test(str)
word_match = str.match(/\.(\w*)/)
word_scan = str.scan(/\.(\w*)/)
puts word_match, word_scan
end
test("test.rb")
So it will return:
.rb
rb
Why would I get a different answer?
The reason is that match and scan return different objects. match returns a MatchData object (or nil when nothing matches), while scan returns an Array. You can see this by calling the class method on your variables:
puts word_match.class # => MatchData
puts word_scan.class # => Array
If you take a look at the to_s method on MatchData you'll notice it returns the entire matched string, rather than the captures. If you wanted just the captures you could use the captures method.
p word_match.captures # => ["rb"]
puts word_match.captures.class # => Array
If you were to pass a block to the match method, you would get the block's return value back, here the array of captures, which is close to what scan gives you.
word_match = str.match(/\.(\w*)/) { |m| m.captures } # => ["rb"]
puts word_scan.inspect # prints [["rb"]]
puts word_match # prints rb
More information on these methods and how they work can be found in the ruby-doc for the String class.
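As a compact illustration of the difference, the sketch below collects the three views of the same pattern side by side. The method name extension_views is hypothetical; the pattern is the one from the question.

```ruby
# Hypothetical helper: shows what match and scan each expose for one pattern.
def extension_views(filename)
  pattern = /\.(\w*)/
  match = filename.match(pattern) # MatchData, or nil when nothing matches
  {
    whole_match: match && match[0],   # the entire matched text, same as match.to_s
    first_capture: match && match[1], # only the first capture group
    scan_result: filename.scan(pattern) # one inner array of captures per match
  }
end

views = extension_views('test.rb')
p views[:whole_match]   # => ".rb"
p views[:first_capture] # => "rb"
p views[:scan_result]   # => [["rb"]]
```

puts prints a MatchData via to_s, which is why the question's output starts with the dot, while the scan result only ever contains the captured group.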
Don't write your own code for this, take advantage of Ruby's own built-in code:
File.extname("test.rb") # => ".rb"
File.extname("a/b/d/test.rb") # => ".rb"
File.extname(".a/b/d/test.rb") # => ".rb"
File.extname("foo.") # => "."
File.extname("test") # => ""
File.extname(".profile") # => ""
File.extname(".profile.sh") # => ".sh"
You're missing some cases. Compare the above to the output of your attempts:
fnames = %w[
test.rb
a/b/d/test.rb
.a/b/d/test.rb
foo.
test
.profile
.profile.sh
]
fnames.map { |fn|
fn.match(/\.(\w*)/).to_s
}
# => [".rb", ".rb", ".a", ".", "", ".profile", ".profile"]
fnames.map { |fn|
fn.scan(/\.(\w*)/).to_s
}
# => ["[[\"rb\"]]",
# "[[\"rb\"]]",
# "[[\"a\"], [\"rb\"]]",
# "[[\"\"]]",
# "[]",
# "[[\"profile\"]]",
# "[[\"profile\"], [\"sh\"]]"]
The documentation for File.extname says:
Returns the extension (the portion of file name in path starting from the last period).
If path is a dotfile, or starts with a period, then the starting dot is not dealt with the start of the extension.
An empty string will also be returned when the period is the last character in path.
On Windows, trailing dots are truncated.
The File class has many more useful methods to pick apart filenames. There's also the Pathname class which is very useful for similar things.
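A short sketch of a few of those methods, assuming a POSIX-style path; the helper name describe_path is made up for illustration.

```ruby
require 'pathname'

# Hypothetical helper: splits a path into its usual parts using only the standard library.
def describe_path(path)
  {
    dirname: File.dirname(path),         # everything before the last component
    basename: File.basename(path),       # last component, extension included
    stem: File.basename(path, '.*'),     # last component with any extension stripped
    extname: Pathname.new(path).extname  # same rules as File.extname
  }
end

info = describe_path('a/b/d/test.rb')
p info[:dirname]  # => "a/b/d"
p info[:basename] # => "test.rb"
p info[:stem]     # => "test"
p info[:extname]  # => ".rb"
```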
I have a Rake task in my Rails app which looks into a folder for an XML file, parses it, and saves it to a database. The code works OK, but I have about 2100 files totaling 1.5GB, and processing is very slow, about 400 files in 7 hours. There are approximately 600-650 contracts in each XML file, and each contract can have 0 to n attachments. I did not paste all values, but each contract has 25 values.
To speed up the process I use the activerecord-import gem, so I build an array per file and, once the whole file is parsed, do a mass import to Postgres. Only if a record is found is it directly updated and/or a new attachment inserted, but this happens for about 1 in 100000 records. This helps a little compared with creating a new record per contract, but now I see that the slow part is XML parsing. Can you please check whether I am doing something wrong in my parsing?
When I tried printing the arrays I am building, nothing was printed until the whole file had been loaded and parsed; only then did the arrays print one by one. That's why I assume the speed problem is in parsing, since Nokogiri loads the whole XML before it starts.
require 'nokogiri'
require 'pp'
require "activerecord-import/base"
ActiveRecord::Import.require_adapter('postgresql')
namespace :loadcrz2 do
desc "this task load contracts from crz xml files to DB"
task contracts: :environment do
actual_dir = File.dirname(__FILE__).to_s
Dir.foreach(actual_dir+'/../../crzfiles') do |xmlfile|
next if xmlfile == '.' or xmlfile == '..' or xmlfile == 'archive'
page = Nokogiri::XML(open(actual_dir+"/../../crzfiles/"+xmlfile))
puts xmlfile
cons = page.xpath('//contracts/*')
contractsarr = []
#c =[]
cons.each do |contract|
name = contract.xpath("name").text
crzid = contract.xpath("ID").text
procname = contract.xpath("procname").text
conname = contract.xpath("contractorname").text
subject = contract.xpath("subject").text
dateeff = contract.xpath("dateefficient").text
valuecontract = contract.xpath("value").text
attachments = contract.xpath('attachments/*')
attacharray = []
attachments.each do |attachment|
attachid = attachment.xpath("ID").text
attachname = attachment.xpath("name").text
doc = attachment.xpath("document").text
size = attachment.xpath("size").text
arr = [attachid,attachname,doc,size]
attacharray.push arr
end
@con = Crzcontract.find_by_crzid(crzid)
if @con.nil?
@c=Crzcontract.new(:crzname => name,:crzid => crzid,:crzprocname=>procname,:crzconname=>conname,:crzsubject=>subject,:dateeff=>dateeff,:valuecontract=>valuecontract)
else
@con.crzname = name
@con.crzid = crzid
@con.crzprocname=procname
@con.crzconname=conname
@con.crzsubject=subject
@con.dateeff=dateeff
@con.valuecontract=valuecontract
@con.save!
end
attacharray.each do |attar|
attachid=attar[0]
attachname=attar[1]
doc=attar[2]
size=attar[3]
@at = Crzattachment.find_by_attachid(attachid)
if @at.nil?
if @con.nil?
@c.crzattachments.build(:attachid=>attachid,:attachname=>attachname,:doc=>doc,:size=>size)
else
@a=Crzattachment.new
@a.attachid = attachid
@a.attachname = attachname
@a.doc = doc
@a.size = size
@a.crzcontract_id=@con.id
@a.save!
end
end
end
if @c.present?
contractsarr.push @c
end
#p @c
end
#p contractsarr
puts "done"
if contractsarr.present?
Crzcontract.import contractsarr, recursive: true
end
FileUtils.mv(actual_dir+"/../../crzfiles/"+xmlfile, actual_dir+"/../../crzfiles/archive/"+xmlfile)
end
end
end
There are a number of problems with the code. Here are some ways to improve it:
actual_dir = File.dirname(__FILE__).to_s
Don't use to_s. dirname is already returning a string.
actual_dir+'/../../crzfiles', with and without a trailing path delimiter is used repeatedly. Don't make Ruby rebuild the concatenated string over and over. Instead define it once, but take advantage of Ruby's ability to build the full path:
File.absolute_path('../../bar', '/path/to/foo') # => "/path/bar"
So use:
# __dir__ is this file's directory, so this matches File.dirname(__FILE__) + '/../../crzfiles'
actual_dir = File.absolute_path('../../crzfiles', __dir__)
and then refer to actual_dir only:
Dir.foreach(actual_dir)
This is unwieldy:
next if xmlfile == '.' or xmlfile == '..' or xmlfile == 'archive'
I'd do:
next if (xmlfile[0] == '.' || xmlfile == 'archive')
or even:
next if xmlfile[/^(?:\.|archive)/]
Compare these:
'.hidden'[/^(?:\.|archive)/] # => "."
'.'[/^(?:\.|archive)/] # => "."
'..'[/^(?:\.|archive)/] # => "."
'archive'[/^(?:\.|archive)/] # => "archive"
'notarchive'[/^(?:\.|archive)/] # => nil
'foo.xml'[/^(?:\.|archive)/] # => nil
The pattern returns a truthy value if the name starts with '.' or starts with 'archive' (so it would also skip a file such as 'archive.xml'). It's not as readable, but it's compact. I'd recommend the compound conditional test though.
In some places, you're concatenating xmlfile, so again let Ruby do it once:
xml_filepath = File.join(actual_dir, xmlfile)
which will honor the file path delimiter for whatever OS you're running on. Then use xml_filepath instead of concatenating the name:
xml_filepath = File.join(actual_dir, xmlfile)
page = Nokogiri::XML(open(xml_filepath))
[...]
FileUtils.mv(xml_filepath, File.join(actual_dir, "archive", xmlfile))
join is a good tool so take advantage of it. It's not just another name for concatenating strings, because it's also aware of the correct delimiter to use for the OS the code is running on.
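Putting the filename filter and File.join together, a minimal sketch could look like the following. The method name xml_paths and the demo file names are assumptions for illustration; the temporary directory stands in for the crzfiles folder.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: full paths of the entries to process in dir,
# skipping dotfiles ('.', '..', hidden files) and the 'archive' folder.
def xml_paths(dir)
  Dir.foreach(dir)
     .reject { |name| name.start_with?('.') || name == 'archive' }
     .sort
     .map { |name| File.join(dir, name) } # build each full path exactly once
end

# Demo against a throwaway directory.
Dir.mktmpdir do |dir|
  FileUtils.touch(File.join(dir, 'a.xml'))
  FileUtils.touch(File.join(dir, '.hidden'))
  FileUtils.mkdir(File.join(dir, 'archive'))
  puts xml_paths(dir).map { |path| File.basename(path) } # prints a.xml
end
```

Building the list up front also keeps the parsing loop free of path handling.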
You use a lot of instances of:
xpath("some_selector").text
Don't do that. xpath, along with css and search return a NodeSet, and text when used on a NodeSet can be evil in a way that'll hurtle you down a very steep and slippery slope. Consider this:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<root>
<node>
<data>foo</data>
</node>
<node>
<data>bar</data>
</node>
</root>
EOT
doc.search('//node/data').class # => Nokogiri::XML::NodeSet
doc.search('//node/data').text # => "foobar"
The concatenation of the text into 'foobar' can't be split easily and it's a problem we see here in questions too often.
Do this if you expect getting a NodeSet back because of using search, xpath or css:
doc.search('//node/data').map(&:text) # => ["foo", "bar"]
It's better to use at, at_xpath or at_css if you're after a specific node because then text will work as you'd expect.
See "How to avoid joining all text from Nodes when scraping" also.
There's a lot of replication that could be DRY'd. Instead of this:
name = contract.xpath("name").text
crzid = contract.xpath("ID").text
procname = contract.xpath("procname").text
You could do something like:
name, crzid, procname = [
'name', 'ID', 'procname'
].map { |s| contract.at(s).text }
I have a hash mapping in my YAML file as below. How can I iterate through it in a simple Ruby script? I would like to store the key in one variable and the value in another during the iteration.
source_and_target_cols_map:
-
com_id: community_id
report_dt: note_date
sitesection: site_section
visitor_cnt: visitors
visit_cnt: visits
view_cnt: views
new_visitor_cnt: new_visitors
The way I am getting the data from the YAML file is below:
#!/usr/bin/env ruby
require 'yaml'
config_options = YAML.load_file(file_name)
@source_and_target_cols_map = config_options['source_and_target_cols_map']
puts @source_and_target_cols_map
The YAML.load_file method should return a ruby hash, so you can iterate over it in the same way you would normally, using the each method:
require 'yaml'
config_options = YAML.load_file(file_name)
config_options.each do |key, value|
# do whatever you want with key and value here
end
Given your YAML file, you should get the Hash below from the line config_options = YAML.load_file(file_name):
config_options = { 'source_and_target_cols_map' =>
[ { 'com_id' => 'community_id',
'report_dt' => 'note_date',
'sitesection' => 'site_section',
'visitor_cnt' => 'visitors',
'visit_cnt' => 'visits',
'view_cnt' => 'views',
'new_visitor_cnt' => 'new_visitors' }
]}
Then to iterate through you can take the below approach:
config_options['source_and_target_cols_map'][0].each { |key, value| puts "#{key} => #{value}" }
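For a version that runs without a file on disk, here is a sketch that parses an inline YAML string shaped like the one in the question (shortened to three columns). The constant YAML_TEXT and the method each_column_pair are hypothetical names; YAML.safe_load is used in place of load_file only because the text is inline.

```ruby
require 'yaml'

# Inline stand-in for the YAML file from the question.
YAML_TEXT = <<~YAML
  source_and_target_cols_map:
    - com_id: community_id
      report_dt: note_date
      sitesection: site_section
YAML

# Hypothetical helper: yields each source/target column pair from the first map entry.
def each_column_pair(yaml_text)
  config = YAML.safe_load(yaml_text)
  # The value is an Array containing one Hash, so take the first element before iterating.
  config['source_and_target_cols_map'].first.each do |source_col, target_col|
    yield source_col, target_col
  end
end

each_column_pair(YAML_TEXT) do |source_col, target_col|
  puts "#{source_col} -> #{target_col}"
end
```

The step that is easy to miss is the array wrapper: the leading "-" in the YAML makes the value a list, so iterating it directly would yield whole hashes rather than key/value pairs.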
In Python, I can do this:
>>> import urlparse, urllib
>>> q = urlparse.parse_qsl("a=b&a=c&d=e")
>>> urllib.urlencode(q)
'a=b&a=c&d=e'
In Ruby[+Rails] I can't figure out how to do the same thing without "rolling my own," which seems odd. The Rails way doesn't work for me -- it adds square brackets to the names of the query parameters, which the server on the other end may or may not support:
>> q = CGI.parse("a=b&a=c&d=e")
=> {"a"=>["b", "c"], "d"=>["e"]}
>> q.to_params
=> "a[]=b&a[]=c&d[]=e"
My use case is simply that I wish to muck with some of the values in the query-string portion of the URL. It seemed natural to lean on the standard library and/or Rails, and write something like this:
uri = URI.parse("http://example.com/foo?a=b&a=c&d=e")
q = CGI.parse(uri.query)
q.delete("d")
q["a"] << "d"
uri.query = q.to_params # should be to_param or to_query instead?
puts Net::HTTP.get_response(uri)
but only if the resulting URI is in fact http://example.com/foo?a=b&a=c&a=d, and not http://example.com/foo?a[]=b&a[]=c&a[]=d. Is there a correct or better way to do this?
In modern ruby this is simply:
require 'uri'
URI.encode_www_form(hash)
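Applied to the question's use case, a sketch using only the standard library might look like this. The method name rewrite_query is made up; URI.decode_www_form and URI.encode_www_form work on ordered key/value pairs, so repeated keys survive the round trip without brackets.

```ruby
require 'uri'

# Hypothetical helper: drops the "d" parameter and appends another "a" value.
def rewrite_query(url)
  uri = URI.parse(url)
  pairs = URI.decode_www_form(uri.query || '') # [["a", "b"], ["a", "c"], ["d", "e"]]
  pairs.reject! { |key, _value| key == 'd' }
  pairs << ['a', 'd']
  uri.query = URI.encode_www_form(pairs)       # repeated keys are emitted as-is
  uri.to_s
end

puts rewrite_query('http://example.com/foo?a=b&a=c&d=e')
# prints http://example.com/foo?a=b&a=c&a=d

# A hash with array values is expanded the same way:
puts URI.encode_www_form('a' => %w[b c], 'd' => 'e') # prints a=b&a=c&d=e
```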
Quick Hash to a URL Query Trick :
"http://www.example.com?" + { language: "ruby", status: "awesome" }.to_query
# => "http://www.example.com?language=ruby&status=awesome"
Want to do it in reverse? Use CGI.parse:
require 'cgi'
# Only needed for IRB, Rails already has this loaded
CGI::parse "language=ruby&status=awesome"
# => {"language"=>["ruby"], "status"=>["awesome"]}
Here's a quick function to turn your hash into query parameters:
require 'uri'
def hash_to_query(hash)
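# URI.encode was removed in Ruby 3.0, so encode each key and value as a form component instead
hash.map { |k, v| "#{URI.encode_www_form_component(k)}=#{URI.encode_www_form_component(v)}" }.join("&")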
end
The way rails handles query strings of that type means you have to roll your own solution, as you have. It is somewhat unfortunate if you're dealing with non-rails apps, but makes sense if you're passing information to and from rails apps.
As a simple plain Ruby solution (or RubyMotion, in my case), just use this:
class Hash
def to_param
self.to_a.map { |x| "#{x[0]}=#{x[1]}" }.join("&")
end
end
{ fruit: "Apple", vegetable: "Carrot" }.to_param # => "fruit=Apple&vegetable=Carrot"
It only handles simple hashes, though.
ActiveSupport offers the nice method to_sentence. Thus,
require 'active_support'
[1,2,3].to_sentence # gives "1, 2, and 3"
[1,2,3].to_sentence(:last_word_connector => ' and ') # gives "1, 2 and 3"
It's good that you can change the last word connector, because I prefer not to have the extra comma. But it takes so much extra text: 44 characters instead of 11!
The question: what's the most Ruby-like way to change the default value of :last_word_connector to ' and '?
Well, it's localizable so you could just specify a default 'en' value of ' and ' for support.array.last_word_connector
See:
from: conversion.rb
def to_sentence(options = {})
...
default_last_word_connector = I18n.translate(:'support.array.last_word_connector', :locale => options[:locale])
...
end
Step by step guide:
First, create a Rails project:
rails i18n
Next, edit your en.yml file: vim config/locales/en.yml
en:
support:
array:
last_word_connector: " and "
Finally, it works:
Loading development environment (Rails 2.3.3)
>> [1,2,3].to_sentence
=> "1, 2 and 3"
As an answer to how to override a method in general, a post here gives a nice way of doing it. It doesn't suffer from the same problems as the alias technique, as there isn't a leftover "old" method.
Here is how you could use that technique with your original problem (tested with Ruby 1.9):
class Array
old_to_sentence = instance_method(:to_sentence)
define_method(:to_sentence) { |options = {}|
options[:last_word_connector] ||= " and "
old_to_sentence.bind(self).call(options)
}
end
You might also want to read up on UnboundMethod if the above code is confusing. Note that old_to_sentence goes out of scope after the end statement, so it isn't a problem for future uses of Array.
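The same technique can be tried without ActiveSupport. The sketch below uses a made-up Greeter class and a :punctuation option purely for illustration; the instance_method, define_method and bind calls are the ones used above.

```ruby
# Hypothetical class standing in for Array#to_sentence.
class Greeter
  def greet(options = {})
    "Hello#{options.fetch(:punctuation, '.')}"
  end
end

# Reopen the class and change the default without leaving an alias behind.
class Greeter
  old_greet = instance_method(:greet) # UnboundMethod captured in a local variable
  define_method(:greet) do |options = {}|
    options = { punctuation: '!' }.merge(options) # new default, caller's value still wins
    old_greet.bind(self).call(options)
  end
end

puts Greeter.new.greet                   # prints Hello!
puts Greeter.new.greet(punctuation: '?') # prints Hello?
```

Because old_greet is a local variable captured by the block, nothing named old_greet is added to the class, unlike with alias_method.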
class Array
alias_method :old_to_sentence, :to_sentence
def to_sentence(args={})
a = {:last_word_connector => ' and '}
a.update(args) if args
old_to_sentence(a)
end
end