Checking if hash value has a text - ruby-on-rails

I have a hash:
universityname = e.university
topuniversities = CSV.read('lib/assets/topuniversities.csv',{encoding: "UTF-8", headers:true, header_converters: :symbol, converters: :all})
hashed_topuniversities = topuniversities.map {|d| d.to_hash}
hashed_topuniversities.any? {|rank, name| name.split(' ').include?(universityname) }.each do |s|
if s[:universityrank] <= 10
new_score += 10
elsif s[:universityrank] >= 11 && s[:universityrank] <= 25
new_score += 5
elsif s[:universityrank] >= 26 && s[:universityrank] <= 50
new_score += 3
elsif s[:universityrank] >= 51 && s[:universityrank] <= 100
new_score += 2
end
Basically, this looks through the hash and checks whether a hash value contains the university name the user entered.
For example, the user input can be "Oxford University" while the hash stores it as "Oxford". At the moment the user has to type the name exactly as it is stored in the hash to be assigned a score, but I want the hash value "Oxford" to be selected when the user types "oxford university", so that the scoring goes through.
Everything else works fine, but the .include? check does not work correctly: I still need to type the exact word.

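hashed_topuniversities = topuniversities.map(&:to_hash)
# Words the user typed, downcased, e.g. ["oxford", "university"].
input_words = universityname.downcase.split(' ')
# Assumes the name column is headed :universityname (hypothetical; adjust to your CSV).
univ = hashed_topuniversities.detect do |row|
  input_words.include?(row[:universityname].to_s.downcase)
end
# univ is nil when nothing matched, which falls through to the else branch.
new_score += case univ && univ[:universityrank]
             when -Float::INFINITY..10 then 10
             when 11..25 then 5
             when 26..50 then 3
             when 51..100 then 2
             else 0
             end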
Besides some changes to make the code more idiomatic Ruby, the main change is that downcase is called on both the university name and the user input, so they are now compared case-insensitively.
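To make the case-insensitive, word-level comparison concrete, here is a minimal self-contained sketch. The sample rows and the :universityname key are assumptions standing in for the CSV data (only :universityrank appears in the question), and the helper names are made up for illustration.

```ruby
# Sample rows standing in for the parsed CSV. The :universityname key is an
# assumption; use whatever header your CSV actually has.
rows = [
  { universityrank: 1,  universityname: "Oxford" },
  { universityrank: 30, universityname: "Kyoto" }
]

# Returns the first row whose stored name appears among the words the user
# typed (ignoring case), or nil when nothing matches.
def find_university(rows, input)
  words = input.downcase.split
  rows.detect { |row| words.include?(row[:universityname].downcase) }
end

# Maps a rank to the bonus used in the question; nil or unknown ranks score 0.
def rank_bonus(rank)
  case rank
  when 1..10   then 10
  when 11..25  then 5
  when 26..50  then 3
  when 51..100 then 2
  else 0
  end
end

match = find_university(rows, "oxford university")
bonus = rank_bonus(match && match[:universityrank])
puts bonus # prints 10
```

Note that this only works when the stored name is a single word; multi-word stored names would need a different comparison.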

I don't think your approach will work (in real life, anyway). "University of Oxford" is an easy one: just look for the presence of the word "Oxford". What about "University of Kansas"? Would you merely try to match "Kansas"? What about "Kansas State University"?
Also, some universities are customarily referred to by well-known acronyms or shortened names, such as "LSE", "UCLA", "USC", "SUNY", "LSU", "RPI", "Penn State", "Georgia Tech", "Berkeley" and "Cal Tech". You also need to think about punctuation and "little words" (e.g., "at", "the", "of") in university names (e.g., "University of California, Los Angeles").
For any serious application, I think you need to construct a list of all commonly-used names for each university and then require an exact match between those names and the given university name (after punctuation and little words have been removed). You can do that by modifying the hash hashed_top_universities, perhaps like this:
hashed_top_universities
#=> { "University of California at Berkeley" =>
# { rank: 1, names: ["university california", "berkeley", "cal"] },
# "University of California at Los Angeles" =>
# { rank: 2, names: ["ucla"] },
# "University of Oxford" =>
# { rank: 3, names: ["oxford", "oxford university"] }
# }
Names of some universities contain non-ASCII characters, which is a further complication (that I will not address).
Here's how you might code it.
Given a university name, the first step is to construct a hash (reverse_hash) that maps university names to ranks. The names consist of the elements of the value of the key :names in the inner hashes in hashed_top_universities, together with the complete university names that comprise the keys in that hash, after they have been downcased and punctuation and "little words" have been removed.
PUNCTUATION = ",."
EXCLUSIONS = %w| of for the at u |
SCORE = { 1=>10, 3=>7, 25=>5, 50=>3, 100=>2, Float::INFINITY=>0 }
# Defined first because reverse_hash below calls it.
# Downcases, drops punctuation and "little words", and tidies the spacing.
def simplify(str)
  str.downcase.delete(PUNCTUATION).
    gsub(/\b#{Regexp.union(EXCLUSIONS)}\b/,'').
    squeeze(' ').strip
end
reverse_hash = hashed_top_universities.each_with_object({}) { |(k,v),h|
  (v[:names] + [simplify(k)]).each { |name| h[name] = v[:rank] } }
#=> {"university california"=>1, "berkeley"=>1, "cal"=>1,
# "university california berkeley"=>1,
# "ucla"=>2, "university california los angeles"=>2,
# "oxford"=>3, "oxford university"=>3, "university oxford"=>3}
def score(name, reverse_hash)
  # Unknown names get an infinite rank and therefore a score of 0.
  rank = reverse_hash.fetch(simplify(name), Float::INFINITY)
  SCORE.find { |k,_| rank <= k }.last
end
Let's try it.
score("University of California at Berkeley", reverse_hash)
#=> 10
score("Cal", reverse_hash)
#=> 10
score("UCLA", reverse_hash)
#=> 7
score("Oxford", reverse_hash)
#=> 7

Related

How to apply multiple regular expression on single field

Currently I have a regular expression for zip-codes for the U.S.:
validates :zip,
presence: true,
format: { with: /\A\d{5}(-\d{4})?\z/ }
I want to use different regular expressions for other countries on the same zip-code so the regular expression should be used according to the country:
For Australia 4 digits
For Canada 6 digits alphanumeric
For UK 6-7 digits alphanumeric
Can someone suggest how I can fulfill this requirement?
You can give a lambda that returns a Regexp as the :with option for the format validator (see :with), which makes this nice and clean:
ZIP_COUNTRY_FORMATS = {
'US' => /\A\d{5}(-\d{4})?\z/,
'Australia' => /\A\d{4}\z/,
# ...
}
validates :zip, presence: true,
format: { with: ->(record){ ZIP_COUNTRY_FORMATS.fetch(record.country) } }
Note that this uses Hash#fetch instead of Hash#[], so that if a country that doesn't exist is given it will raise a KeyError, as a sanity check. Alternatively you could return a default Regexp that matches anything:
ZIP_COUNTRY_FORMATS.fetch(record.country, //)
...or nothing:
ZIP_COUNTRY_FORMATS.fetch(record.country, /.\A/)
...depending on the behavior you want.
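The lookup itself can be tried outside Rails. The following is a plain-Ruby sketch of the same idea, not the validator itself: POSTAL_FORMATS, NEVER_MATCHES and postal_code_ok? are hypothetical names, and the Canadian pattern is a simplified assumption (letter-digit-letter, optional space, digit-letter-digit).

```ruby
# Hypothetical table of per-country patterns. The Canadian pattern is a
# simplified assumption, not an authoritative postal-code rule.
POSTAL_FORMATS = {
  'US'        => /\A\d{5}(-\d{4})?\z/,
  'Australia' => /\A\d{4}\z/,
  'Canada'    => /\A[A-Z]\d[A-Z] ?\d[A-Z]\d\z/i
}.freeze

# A pattern that can never match, used as the default for unknown countries.
NEVER_MATCHES = /.\A/

# True when zip fits the country's pattern; unknown countries are rejected.
def postal_code_ok?(country, zip)
  POSTAL_FORMATS.fetch(country, NEVER_MATCHES).match?(zip)
end

puts postal_code_ok?('US', '98230-1346')   # prints true
puts postal_code_ok?('Australia', '98230') # prints false
puts postal_code_ok?('Narnia', '12345')    # prints false
```

Swapping NEVER_MATCHES for // would accept any value for an unknown country instead.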
You would want to write a method to help you:
validates :zip, presence: true
validate :zip_validator
def zip_validator
case country
when 'AU'
# some regex or fail
when 'CA'
# some other regex or fail
when 'UK'
# some other regex or fail
else
# should this fail?
end
end
Suppose we give examples of valid postal codes for each country in a hash such as the following.
example_pcs = {
US: ["", "98230", "98230-1346"],
CAN: ["*", "V8V 3A2"],
OZ: ["!*", "NSW 1130", "ACT 0255", "VIC 3794", "QLD 4000", "SA 5664",
"WA 6500", "TAS 7430", "NT 0874"]
}
where the first element of each array is a string of codes that will be explained later.
We can construct a regex for each country from this information. (The information would undoubtedly be different in a real application, but I am just presenting the general idea.) For each country we construct a regex for each example postal code, using in part the above-mentioned codes. We then take the union of those regexes to obtain a single regex for that country. Here's one way the regex for an example postal code might be constructed.
def make_regex(str, codes='')
rstr = str.each_char.chunk do |c|
case c
when /\d/ then :DIGIT
when /[[:alpha:]]/ then :ALPHA
when /\s/ then :WHITE
else :OTHER
end
end.
map do |type, arr|
case type
when :ALPHA
if codes.include?('!')
arr
elsif arr.size == 1
"[[:alpha:]]"
else "[[:alpha:]]\{#{arr.size}\}"
end
when :DIGIT
(arr.size == 1) ? "\\d" : "\\d\{#{arr.size}\}"
when :WHITE
case codes
when /\*/ then "\\s*"
when /\+/ then "\\s+"
else (arr.size == 1) ? "\\s" : "\\s\{#{arr.size}\}"
end
when :OTHER
arr
end
end.
join
Regexp.new("\\A" << rstr << "\\z")
end
Because [[:alpha:]] matches letters of either case, the generated character classes are case-insensitive for letters (literal letters kept by the "!" code, such as NSW, are still matched exactly), but that could of course be changed. Also, for some countries, the regex produced may have to be tweaked manually and/or some pre- or post-processing of postal code strings may be called for. For example, some combinations may have the correct format but nonetheless are not valid postal codes. In Australia, for example, the four digits following each region code must fall within specified ranges that vary by region.
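One way to change how letter case is handled is the Regexp::IGNORECASE flag of Regexp.new. This small sketch is not part of make_regex; it only contrasts the two behaviours on a literal pattern of the kind produced for "NSW 1130" with the "!" code.

```ruby
# Same pattern source, built once without and once with the ignore-case flag.
source  = "\\ANSW\\s*\\d{4}\\z"
strict  = Regexp.new(source)
relaxed = Regexp.new(source, Regexp::IGNORECASE)

puts strict.match?("nsw 1130")  # prints false
puts relaxed.match?("nsw 1130") # prints true
puts relaxed.match?("NSW1130")  # prints true
```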
Here are some examples.
make_regex("12345")
#=> /\A\d{5}\z/
make_regex("12345-1234")
#=> /\A\d{5}-\d{4}\z/
Regexp.union(make_regex("12345"), make_regex("12345-1234"))
#=> /(?-mix:\A\d{5}\z)|(?-mix:\A\d{5}-\d{4}\z)/
make_regex("V8V 3A2", "*")
#=> /\A[[:alpha:]]\d[[:alpha:]]\s*\d[[:alpha:]]\d\z/
make_regex("NSW 1130", "!*")
# => /\ANSW\s*\d{4}\z/
Then, for each country, we take the union of the regexes for each example postal code, saving those results as values in a hash whose keys are country codes.
h = example_pcs.each_with_object({}) { |(country, (codes, *examples)), h|
h[country] = Regexp.union(examples.map { |s| make_regex(s, codes) }.uniq) }
#=> {:US=>/(?-mix:\A\d{5}\z)|(?-mix:\A\d{5}-\d{4}\z)/,
# :CAN=>/\A[[:alpha:]]\d[[:alpha:]]\s*\d[[:alpha:]]\d\z/,
# :OZ=>/(?-mix:\ANSW\s*\d{4}\z)|(?-mix:\AACT\s*\d{4}\z)|(?-mix:\AVIC\s*\d{4}\z)|(?-mix:\AQLD\s*\d{4}\z)|(?-mix:\ASA\s*\d{4}\z)|(?-mix:\AWA\s*\d{4}\z)|(?-mix:\ATAS\s*\d{4}\z)|(?-mix:\ANT\s*\d{4}\z)/}
"12345" =~ h[:US]
#=> 0
"12345-1234" =~ h[:US]
#=> 0
"1234" =~ h[:US]
#=> nil
"12345 1234" =~ h[:US]
#=> nil
"V8V 3A2" =~ h[:CAN]
#=> 0
"V8V  3A2" =~ h[:CAN]
#=> 0
"V8v3a2" =~ h[:CAN]
#=> 0
"3A2 V8V" =~ h[:CAN]
#=> nil
"NSW 1132" =~ h[:OZ]
#=> 0
"NSW  1132" =~ h[:OZ]
#=> 0
"NSW1132" =~ h[:OZ]
#=> 0
"NSW113" =~ h[:OZ]
#=> nil
"QLD" =~ h[:OZ]
#=> nil
"CAT 1132" =~ h[:OZ]
#=> nil
The steps performed in make_regex for
str = "V8V 3A2"
codes = "*+"
are as follows.
e = str.each_char.chunk do |c|
case c
when /\d/ then :DIGIT
when /[[:alpha:]]/ then :ALPHA
when /\s/ then :WHITE
else :OTHER
end
end
#=> #<Enumerator: #<Enumerator::Generator:0x007f9ff201a330>:each>
We can see the values that will be generated by this enumerator by converting it to an array.
e.to_a
#=> [[:ALPHA, ["V"]], [:DIGIT, ["8"]], [:ALPHA, ["V"]], [:WHITE, [" "]],
# [:DIGIT, ["3"]], [:ALPHA, ["A"]], [:DIGIT, ["2"]]]
Continuing,
a = e.map do |type, arr|
case type
when :ALPHA
if codes.include?('!')
arr
elsif arr.size == 1
"[[:alpha:]]"
else "[[:alpha:]]\{#{arr.size}\}"
end
when :DIGIT
(arr.size == 1) ? "\\d" : "\\d\{#{arr.size}\}"
when :WHITE
case codes
when /\*/ then "\\s*"
when /\+/ then "\\s+"
else (arr.size == 1) ? "\\s" : "\\s\{#{arr.size}\}"
end
when :OTHER
arr
end
end
#=> ["[[:alpha:]]", "\\d", "[[:alpha:]]", "\\s*", "\\d", "[[:alpha:]]", "\\d"]
rstr = a.join
#=> "[[:alpha:]]\\d[[:alpha:]]\\s*\\d[[:alpha:]]\\d"
t = "\\A" << rstr << "\\z"
#=> "\\A[[:alpha:]]\\d[[:alpha:]]\\s*\\d[[:alpha:]]\\d\\z"
puts t
#=> \A[[:alpha:]]\d[[:alpha:]]\s*\d[[:alpha:]]\d\z
Regexp.new(t)
#=> /\A[[:alpha:]]\d[[:alpha:]]\s*\d[[:alpha:]]\d\z/

How can you sort an array in Ruby starting at a specific letter, say letter f?

I have a text array.
text_array = ["bob", "alice", "dave", "carol", "frank", "eve", "jordan", "isaac", "harry", "george"]
text_array = text_array.sort would give us a sorted array.
However, I want a sorted array with f as the first letter for our order, and e as the last.
So the end result should be...
text_array = ["frank", "george", "harry", "isaac", "jordan", "alice", "bob", "carol", "dave", "eve"]
What would be the best way to accomplish this?
Try this:
result = (text_array.select{ |v| v =~ /^[f-z]/ }.sort + text_array.select{ |v| v =~ /^[a-e]/ }.sort).flatten
It's not the prettiest but it will get the job done.
Edit per comment. Making a more general piece of code:
before = []
after = []
text_array.sort.each do |t|
if t > term
after << t
else
before << t
end
end
return (after + before).flatten
This code assumes that term is the string at which you want to split the sorted array. Any array value equal to term ends up in the second group, at the end.
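The same split can be sketched with Enumerable#partition. The method name sort_starting_at is made up for illustration, and this variant deliberately uses >= so that a word equal to term comes first rather than last.

```ruby
# Sorts words, then moves everything that sorts at or after term to the front.
def sort_starting_at(words, term)
  after, before = words.sort.partition { |w| w >= term }
  after + before
end

text_array = ["bob", "alice", "dave", "carol", "frank", "eve",
              "jordan", "isaac", "harry", "george"]
result = sort_starting_at(text_array, "f")
p result
# prints ["frank", "george", "harry", "isaac", "jordan", "alice", "bob", "carol", "dave", "eve"]
```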
You can do that using a hash:
alpha = ('a'..'z').to_a
#=> ["a", "b", "c",..."x", "y", "z"]
reordered = alpha.rotate(5)
#=> ["f", "g",..."z", "a",...,"e"]
h = reordered.zip(alpha).to_h
# => {"f"=>"a", "g"=>"b",..., "z"=>"u", "a"=>"v",..., "e"=>"z"}
text_array.sort_by { |w| w.gsub(/./,h) }
#=> ["frank", "george", "harry", "isaac", "jordan",
# "alice", "bob", "carol", "dave", "eve"]
A variant of this is:
a_to_z = alpha.join
#=> "abcdefghijklmnopqrstuvwxyz"
f_to_e = reordered.join
#=> "fghijklmnopqrstuvwxyzabcde"
text_array.sort_by { |w| w.tr(f_to_e, a_to_z) }
#=> ["frank", "george", "harry", "isaac", "jordan",
# "alice", "bob", "carol", "dave", "eve"]
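The tr variant generalises to any starting letter. This is a sketch under the assumption that first_letter is a single lowercase letter from a to z and that the words are lowercase; sort_from is a hypothetical helper name.

```ruby
# Sorts words as if the alphabet started at first_letter and wrapped around.
def sort_from(words, first_letter)
  alpha   = ('a'..'z').to_a
  rotated = alpha.rotate(alpha.index(first_letter))
  # Translate each word into the rotated alphabet's "normal" position,
  # then sort by that translated key.
  words.sort_by { |w| w.tr(rotated.join, alpha.join) }
end

names = ["bob", "alice", "dave", "carol", "frank", "eve",
         "jordan", "isaac", "harry", "george"]
from_f = sort_from(names, 'f')
p from_f
# prints ["frank", "george", "harry", "isaac", "jordan", "alice", "bob", "carol", "dave", "eve"]
```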
I think the easiest would be to rotate the sorted array:
sorted = text_array.sort
sorted.rotate(sorted.find_index { |e| e >= 'f' } || 0)
Combining Ryan K's answer and my previous answer, this is a one-liner you can use without any regex:
text_array = text_array.sort!.select {|x| x.first >= "f"} + text_array.select {|x| x.first < "f"}
If I understand your question correctly, you want to create a sorted list that is biased by predefined patterns, i.e. you want to define specific text patterns that can completely change the sort order of the array elements.
Here is my proposal. The code could certainly be improved, but this is what my tired brain came up with for now:
an_array = ["bob", "alice", "dave", "carol", "frank", "eve", "jordan", "isaac", "harry", "george"]
# Define your patterns with scores so that the sorting result can vary accordingly
# It's full fledged Regex so you can put any kind of regex you want.
patterns = {
/^f/ => 100,
/^e/ => -100,
/^g/ => 60,
/^j/ => 40
}
# Sort the array with our preferred sequence
sorted_array = an_array.sort do |left, right|
# Find score for the left string
left_score = patterns.find{ |p, s| left.match(p) }
left_score = left_score ? left_score.last : 0
# Find the score for the right string
right_score = patterns.find{ |p, s| right.match(p) }
right_score = right_score ? right_score.last : 0
# Create the comparison score to prepare the right order
# 1 means replace with right and -1 means replace with left
# and 0 means remain unchanged
score = if right_score > left_score
1
elsif left_score > right_score
-1
else
0
end
# For debugging purpose, I added few verbose data
puts "L#{left_score}, R:#{right_score}: #{left}, #{right} => #{score}"
score
end
# Original array
puts an_array.join(', ')
# Biased array
puts sorted_array.join(', ')
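In the comparison above, words with equal scores are left in whatever order sort happens to produce. If alphabetical tie-breaking is wanted, one possible sketch is to sort by a compound key of negated score and word. PATTERN_SCORES and pattern_score are hypothetical names; the scores are the ones from the example.

```ruby
PATTERN_SCORES = { /^f/ => 100, /^e/ => -100, /^g/ => 60, /^j/ => 40 }.freeze

# Score of the first matching pattern, or 0 when no pattern matches.
def pattern_score(word)
  found = PATTERN_SCORES.find { |pattern, _score| word.match?(pattern) }
  found ? found.last : 0
end

names = ["bob", "alice", "dave", "carol", "frank", "eve",
         "jordan", "isaac", "harry", "george"]
# Higher scores first; ties fall back to alphabetical order.
biased = names.sort_by { |w| [-pattern_score(w), w] }
puts biased.join(', ')
# prints frank, george, jordan, alice, bob, carol, dave, harry, isaac, eve
```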

rails, inserting into hash then counting occurrences logic

I need to create a hash/array where two elements are stored: the country code, and the number of times the country occurred.
I want to vet some conceptual logic: I want to create a helper method that takes a list of countries. Then I loop through each country and merge the country code into the hash through a series of if statements:
@map_country = Hash.new
if country == "United States"
@map_country.merge(:us => ??)
I'm not quite sure how I can add a counter to push into the hash. Can anyone help? Basically, I want to achieve how many times "United States" shows up.
Also, once I have this Hash completed, I want to do something different to each country based on the count. How do I go about picking out the value from the key? Moreover, how do I get just the key?
<% if @map_country[:country] > 5 %>
... do this with @map_country...
Thanks! Apologies if this is confusing, but really could use some help here. Thanks!
To me it sounds like you're trying to count occurrences which you can do with the #inject method:
[1] pry(main)> countries = ["United States", "Canada", "United States", "Mexico"]
=> ["United States", "Canada", "United States", "Mexico"]
[2] pry(main)> countries.inject({}) { |hash, ctr| hash[ctr] = hash[ctr].to_i + 1; hash }
=> {"United States"=>2, "Canada"=>1, "Mexico"=>1}
Then say you want to do something with that hash, you could loop through it like this:
[3] pry(main)> occ = countries.inject({}) { |hash, ctr| hash[ctr] = hash[ctr].to_i + 1; hash }
=> {"United States"=>2, "Canada"=>1, "Mexico"=>1}
[4] pry(main)> occ.each do |country, val|
[4] pry(main)* if val == 2
[4] pry(main)* puts "There are two occurrences of #{country}"
[4] pry(main)* end
[4] pry(main)* end
There are two occurrences of United States
If you're set on using a Hash (rather than a custom class) for this, then just use a default_proc to auto-vivify entries with zeros, and a simple increment is all you need:
@map_country = Hash.new { |h, k| h[k] = 0 }
if country == 'United States'
  @map_country[:us] += 1
end
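Putting the counting and the lookups together, here is a minimal sketch without the chain of if statements. COUNTRY_CODES and count_by_code are hypothetical names, and the table only covers the countries from the example above.

```ruby
# Hypothetical name-to-code table; extend it for the countries you handle.
COUNTRY_CODES = {
  "United States" => :us,
  "Canada"        => :ca,
  "Mexico"        => :mx
}.freeze

# Counts how often each known country code occurs; unknown names are skipped.
def count_by_code(countries)
  countries.each_with_object(Hash.new(0)) do |name, counts|
    code = COUNTRY_CODES[name]
    counts[code] += 1 if code
  end
end

counts = count_by_code(["United States", "Canada", "United States", "Mexico"])
p counts[:us]   # value for one key; prints 2
p counts.keys   # just the keys; prints [:us, :ca, :mx]

# Keys whose count exceeds a threshold.
frequent = counts.select { |_code, n| n > 1 }.keys
p frequent      # prints [:us]
```

On Ruby 2.7 or later, Enumerable#tally does the plain counting step when no name-to-code mapping is needed.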

Natural sorting of 3-4 characters in Rails

I have an unsorted list of area postcodes as follows:
["E1", "E1C", "E1D", "E10", "E11", "E12", "E2", "E3", "E4", "EC1", "EC1A", "EC1M", "EC1N",
"EC1R", "EC1V", "EC1Y", "EC2", "EC2A", "EC2M", "EC2N", "N1", "N10", "N11", "N12",
"N13", "N2", "NW1", "NW10", "NW2" etc]
I'd like to sort them as follows:
["E1", "E1C", "E1D", "E2", "E3", "E4", "E10", "E11", "E12", "EC1", "EC1A", "EC1M", "EC1N",
"EC1R", "EC1V", "EC1Y", "EC2", "EC2A", "EC2M", "EC2N", "N1", "N2", "N10", "N11", "N12",
"N13", "NW1", "NW2", "NW10" etc]
So to sum up the order of the formats for postcodes beginning with E would be:
E1
E1C
E11
EC1
EC1V
Same order for postcodes beginning with N, etc.
What would be the recommended way of sorting such strings? In this case the format of the string is always known, i.e. it will always be 2-4 alphanumeric characters, the first always being a letter.
Should I order the strings by length first and then order within each length group, or is there a more elegant method?
I'd use
array.sort_by do |str|
  # Leading letters, then the number, then any trailing letters.
  /\A(\D+)(\d+)(\D*)\z/ === str
  [$1, $2.to_i, $3]
end
or, if you have arbitrary sequences of alternating letters and digits,
array.sort_by do |str|
/\A(\D*)(\d*)(\D*)(\d*)\Z/.match(str)[1..-1].reject(&:blank?).collect do |item|
/\d/ === item ? item.to_i : item
end
end
Kind of a weird way of doing it, but I think this should work:
array.sort do |a, b|
a = a.dup
b = b.dup
regex = /(\d+)/
a.match(regex)
a_num = $1.to_i
b.match(regex)
b_num = $1.to_i
if a_num > b_num
a.gsub!(regex, "1")
b.gsub!(regex, "0")
elsif a_num < b_num
a.gsub!(regex, "0")
b.gsub!(regex, "1")
end
a <=> b
end
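For comparison, here is a plain-Ruby sketch of the alternating letters-and-digits idea that needs no Rails helpers. It splits each string into runs of digits and non-digits with String#scan and compares digit runs as integers. The name natural_key is made up, and the approach assumes every string starts with a letter, so that keys always compare like with like.

```ruby
# Splits "EC1A" into ["EC", 1, "A"] so numeric parts compare as numbers.
def natural_key(str)
  str.scan(/\d+|\D+/).map { |token| token.match?(/\A\d/) ? token.to_i : token }
end

postcodes = %w[E1 E1C E1D E10 E11 E12 E2 E3 E4 EC1 EC1A EC1M EC1N
               EC1R EC1V EC1Y EC2 EC2A EC2M EC2N N1 N10 N11 N12
               N13 N2 NW1 NW10 NW2]

natural = postcodes.sort_by { |pc| natural_key(pc) }
puts natural.join(' ')
```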

ruby change content of a string

I'm doing data processing, and one task is to get statistics on how people are distributed. Say the people named "john doe" fall in different states (ca, ar, and ny) and in different age groups (twenties, thirties, etc.). {1,2} or {3} are the people's ids.
"john doe" => "ca:tw#2{1,2}:th#1{3};ar:tw#1{4}:fi#1{5};ny:tw#1{6};"
Now if I want to get the id of john doe in ca with age tw, how should I get them? Maybe using Regex? And if I want to add a new id to it, say 100, now it becomes
"john doe" => "ca:tw#3{1,2,100}:th#1{3};ar:tw#1{4}:fi#1{5};ny:tw#1{6};"
how should I do that?
Thanks!
If you want to stick with string manipulation, you can use regex and gsub.
Here is one way to do it. It could use some clean up (eg error handling, re-factoring, etc.), but I think it would get you started.
def count(details, location, age_group)
  location_details = /#{location}(.+?);/.match(details)[1]
  # Search only this location's segment, not the whole string.
  /#{age_group}#(\d+)\{/.match(location_details)[1].to_i
end
def ids(details, location, age_group)
  location_details = /#{location}(.+?);/.match(details)[1]
  # Again, look inside the location's segment only.
  /#{age_group}#\d+\{(.+?)\}/.match(location_details)[1]
end
def add(details, location, age_group, new_id)
location_details = /#{location}(.+?);/.match(details)[1]
new_count = count(details, location, age_group) + 1
new_ids = ids(details, location, age_group) + ',' + new_id
location_details.gsub!(/#{age_group}#\d+\{(.+?)\}/, "#{age_group}##{new_count}{#{new_ids}}")
details.gsub!(/#{location}(.+?);/, "#{location}#{location_details};")
end
You can see it produces the results you wanted (at least functionally, not sure about performance):
names = {"john doe" => "ca:tw#2{1,2}:th#1{3};ar:tw#1{4}:fi#1{5};ny:tw#1{6};"}
puts count(names["john doe"], 'ca', 'tw')
#=> 2
puts ids(names["john doe"], 'ca', 'tw')
#=> 1,2
names["john doe"] = add(names["john doe"], 'ca', 'tw', '100')
puts names["john doe"]
#=> ca:tw#3{1,2,100}:th#1{3};ar:tw#1{4}:fi#1{5};ny:tw#1{6};
It doesn't make sense to use a string for this inside the program. You may read the data from a string as it is stored, or write it back out that way, but you should store it in a manner that's easy to manipulate. For instance:
data = {
"john doe" => {
"ca" => {
"tw" => [1,2],
"th" => [3]
},
"ar" => {
"tw" => [4],
"fi" => [5]
},
"ny" => {
"tw" => [6]
}
}
}
Given that, the ids of the California John Does in their 20s are data['john doe']['ca']['tw']. The number of such John Does is data['john doe']['ca']['tw'].length; the first id is data['john doe']['ca']['tw'][0], and the second is data['john doe']['ca']['tw'][1]. You could add id 100 to it with data['john doe']['ca']['tw'] << 100; 100 would then be the value of data['john doe']['ca']['tw'][2].
If I were writing this, though, I would probably use actual numbers for the age-range keys (20, 30, 50) instead of those obscure letter prefixes.
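If the data still has to be read from and written back to the string format, the conversion in both directions can be sketched as follows. parse_details and dump_details are hypothetical helpers, and the sketch assumes every record is well formed ("location:age#count{ids}" groups, each location terminated by ";").

```ruby
# Turns "ca:tw#2{1,2}:th#1{3};..." into { "ca" => { "tw" => [1, 2], ... }, ... }.
def parse_details(str)
  str.split(';').each_with_object({}) do |segment, result|
    location, *groups = segment.split(':')
    result[location] = groups.each_with_object({}) do |group, ages|
      age, ids = group.match(/\A(\w+)#\d+\{([\d,]*)\}\z/).captures
      ages[age] = ids.split(',').map(&:to_i)
    end
  end
end

# Writes the nested hash back out, recomputing each count from the id list.
def dump_details(data)
  data.map do |location, ages|
    groups = ages.map { |age, ids| "#{age}##{ids.size}{#{ids.join(',')}}" }
    "#{([location] + groups).join(':')};"
  end.join
end

details = parse_details('ca:tw#2{1,2}:th#1{3};ar:tw#1{4}:fi#1{5};ny:tw#1{6};')
details['ca']['tw'] << 100
puts dump_details(details)
# prints ca:tw#3{1,2,100}:th#1{3};ar:tw#1{4}:fi#1{5};ny:tw#1{6};
```

Because the count is derived from the id list when writing, it cannot drift out of step with the ids.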
