Ruby/Rails dictionary app - 6 letter words finder that are built of two concatenated smaller words - ruby-on-rails

I need to create functionality which is going to process the dictionary (dictionary.txt file). The goal is to find all six-letter words that are built of two concatenated smaller words e.g.:
con + vex => convex
tail + or => tailor
we + aver => weaver
Of course, there may be some words inside the file that are not 6 letters long, but these can be easily sifted out using a simple method:
def cleanup_file
  file_data = File.read('dictionary.txt').split
  file_data.reject! { |word| word.size < 6 }
end
But now comes the problem: how do I find out whether the other strings in the array are made of two connected smaller words?
[Edit]
Sample dictionary.txt file here

Thinking only in pseudocode, you could:
Iterate every line of the dictionary and store the words in 6 different arrays by the length of each word.
Make sure that all words are downcased, there are no duplicates and all the values are sorted, so later you could properly use .bsearch in the arrays.
Iterate over the length-6 array (for example convex) and look for the first character of the current word in the length-1 array (c in this example) and for the remainder in the length-5 array (onvex). If both match, save the words.
Then look in the length-2 and length-4 arrays (co and nvex respectively) and save any match.
Next, look for both halves of the string in the length-3 array (con and vex) and save any match.
Finally, repeat with the lengths swapped (4 + 2 and 5 + 1), since a word such as tailor splits as tail + or.
Look for the next 6 characters string until you've finished.
Most likely there are better ways to solve this. For instance, the first iteration could insert each word into its corresponding array using .bsearch_index, which keeps the array sorted and skips duplicates in the same pass. Most of the workload is in the second iteration, though, and binary searches run in O(log n) time, so I guess it should work quickly enough.
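The steps above can be sketched roughly as follows. This is only a sketch under stated assumptions: compound_splits and in_bucket are hypothetical names, and the words array stands in for the lines of dictionary.txt. It tries all five split points (1+5 through 5+1) so that 4+2 words such as tailor are found.

```ruby
# Hypothetical helper; `words` stands in for the lines of dictionary.txt.
def compound_splits(words)
  # Downcase, dedupe, keep lengths 1..6, bucket by length, sort each bucket.
  by_len = words.map(&:downcase).uniq
                .select { |w| (1..6).cover?(w.length) }
                .group_by(&:length)
  by_len.each_value(&:sort!)
  by_len.default = [] # a length with no words behaves like an empty bucket

  # Exact-match binary search in a sorted bucket.
  in_bucket = ->(bucket, word) { bucket.bsearch { |w| w >= word } == word }

  # For every 6-letter word, try the 1+5, 2+4, 3+3, 4+2 and 5+1 splits.
  by_len[6].flat_map do |word|
    (1..5).map do |i|
      head = word[0, i]
      tail = word[i..-1]
      [head, tail, word] if in_bucket.call(by_len[i], head) &&
                            in_bucket.call(by_len[6 - i], tail)
    end.compact
  end
end

sample = %w[con vex convex tail or tailor we aver weaver animal Tailor]
splits = compound_splits(sample)
splits.each { |head, tail, word| puts "#{head} + #{tail} => #{word}" }
# con + vex => convex
# tail + or => tailor
# we + aver => weaver
```

Each result keeps both halves, so the output matches the format used in the question.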

Suppose the dictionary is as follows.
dictionary = [
"a", "abased", "act", "action", "animal", "ape", "apeman",
"art", "assertion", "bar", "barbed", "barhop", "based", "be",
"become", "bed", "come", "hop", "ion", "man"
]
I assume that, like most dictionaries, dictionary is sorted.
First compute the following hash.
by_len = dictionary.each_with_object({}) do |w,h|
  len = w.length
  (h[len] ||= []) << w if len < 7
end
#=> {1=>["a"],
# 6=>["abased", "action", "animal", "apeman", "barbed",
# "barhop", "become"],
# 3=>["act", "ape", "art", "bar", "bed", "hop", "ion", "man"],
# 5=>["based"],
# 2=>["be"],
# 4=>["come"]}
Each key is a word length (1-6) and each value is an array of words from dictionary whose length is the value of the key.
Next I will define a helper function that returns true or false depending on whether a given array of words (list) contains a given word (word).
def found?(list, word)
  w = list.bsearch { |w| w >= word }
  w && w == word
end
For example:
found?(by_len[3], "art")
#=> true
found?(by_len[3], "any")
#=> false
See Array#bsearch.
We now extract the words of interest:
by_len[6].select do |w|
  (1..5).any? { |i| found?(by_len[i], w[0,i]) && found?(by_len[6-i], w[i..-1]) }
end
#=> ["abased", "action", "apeman", "barbed", "barhop", "become"]
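If the dictionary is not guaranteed to be sorted, or might contain no words at all of some length, a Set lookup is one possible alternative to the binary search. This is a different technique from the bsearch approach above, shown here only as a sketch on the same sample data; the names words and compounds are illustrative.

```ruby
require 'set'

# Same sample dictionary as above.
dictionary = [
  "a", "abased", "act", "action", "animal", "ape", "apeman",
  "art", "assertion", "bar", "barbed", "barhop", "based", "be",
  "become", "bed", "come", "hop", "ion", "man"
]

words = dictionary.to_set # membership tests no longer depend on sort order

compounds = dictionary.select do |w|
  w.length == 6 &&
    (1..5).any? { |i| words.include?(w[0, i]) && words.include?(w[i..-1]) }
end
#=> ["abased", "action", "apeman", "barbed", "barhop", "become"]
```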

Related

What does parameter a and b mean in an array of elements in Ruby?

I'm practicing the sort function, the target is an array (ary) of a sentence. An example method I have seen is to build and use a block, and finally arrange the elements (words) in the array from short to long, and from a to z.
But I don't understand why there are two parameters a and b in this example, why should we find out a.length and b.length first? This is the original code:
ary = %w(
Ruby is a open source programming language with a afocus on simplicity and productivity.
)
call_num = 0
sorted = ary.sort do |a, b|
call_num += 1
a.length <=> b.length
end
puts "Sort results: #{sorted}" #=>["a", "a", "on", "is", "and", "Ruby", "open", "with", "afocus", "source", "language", "simplicity", "programming", "productivity."]
puts "Number of array elements: #{ary.length}" #=> 14
puts "Number of calls to blocks: #{call_num}" #=>30
To sort the following array of words by their length Ruby has to basically compare each word in the array to each other word (Note that this is not exactly how sorting works internally but in the context of this example we can assume that sorting works like this).
ary = %w(
Ruby is a open source programming language with a afocus on simplicity and productivity.
)
That means in the first step Ruby will need to compare the words Ruby and is and has to decide how to sort those two words, then is and a, then a and open.
Those two words in each step of the comparison are the two block parameters a and b. a.length <=> b.length will then tell Ruby how to sort those two parameters (words).
See Comparable and Enumerable#sort
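To make the role of a and b visible, here is a small sketch with made-up sample words. It records every pair the block receives and breaks length ties alphabetically by comparing [length, word] arrays. The exact pairs and their count depend on Ruby's internal sorting algorithm, so they are not listed here.

```ruby
# <=> returns -1, 0 or 1 depending on how the left side compares to the right.
"is".length <=> "Ruby".length #=> -1 (2 is less than 4, so "is" sorts first)

words = %w[pear fig apple kiwi]
comparisons = []

sorted = words.sort do |a, b|
  comparisons << [a, b]            # a and b are the two words being compared
  [a.length, a] <=> [b.length, b]  # shorter first, then a to z
end

sorted #=> ["fig", "kiwi", "pear", "apple"]
comparisons.each { |a, b| puts "compared #{a.inspect} with #{b.inspect}" }
```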

Without iterating an array, how to check whether a value exists inside the loop or not?

I have an array.
key_values =
["loc=june 1 2 jubli's, Captain tim, BI",
"locSlug=june-1-",
"-2-jubli's_Captain-tim_BI",
"lat=29.404153823852539",
"long=-54.88862609863281",
"status=7",
"pg=10",
"pgsz=15",
"sprefix=/kings_search",
"city=Captain tim",
"neighborhood=june 1 ",
" 2 jubli's",
"state_id=BI",
"county_fips=15045"]
This is my array. I iterate over it to the end. When the current value includes "locSlug" or "neighborhood", I check whether the next value contains "=". If it does, I do nothing; otherwise I join the current value and the next value with "&". But I don't want to iterate over all the values. How can I do this without iterating over the whole array? I have written the code shown below.
def check_for_special_character(key_values)
  key_values.each_with_index do |val,index|
    unless index+1 >= key_values.length
      if (val.include?("locSlug=") || val.include?("neighborhood=")) && !key_values[index+1].include?("=")
        key_values[index] = [val, key_values[index+1]].join("&")
        key_values.delete_at(index+1)
      end
    end
  end
end
The above code works fine, but I don't want to do it this way. I need your suggestions.
If you need any further clarification, please ask.
Convert Array to Hash, Then Access By Key
There may be a more efficient way of doing this if you have access to the original data object (e.g. a cookie, JSON string, or database query). However, assuming that you can't access the source data directly for some reason, or that you're just working with an array that you can't otherwise control, you can convert your data into a Ruby Hash object and then access the values you want by key. For example:
# Split your array elements on "=", and then convert the resulting array of
# arrays to a hash so you can return values by key. You need to have some
# sort of guard in place for sub-arrays that don't have exactly two elements.
hash = key_values.map { |e| e.split('=') }.select { |a| a.size == 2 }.to_h
hash.fetch 'locSlug'
#=> "june-1-"
hash.fetch 'neighborhood'
#=> "june 1 "
The string you are retrieving from the cookie looks like this:
str = "loc=june 1 2 jubli's, Captain tim, BI&locSlug=june-1-&-2-jubli's_Capta"\
"in-tim_BI&lat=29.404153823852539&long=-54.88862609863281&status=7&pg=10"\
"&pgsz=15&sprefix=/kings_search&city=Captain tim&neighborhood=june 1 & 2"\
" jubli's&state_id=BI&county_fips=15045"
As you can see, & is used as both a delimiter and a regular character:
"loc=june 1 2 jubli's, Captain tim, BI&locSlug=june-1-&-2-jubli's_Captain-..."
#                                     ^               ^
#                                 delimiter       character
In other words: your string is broken. You should fix this by writing a properly escaped string to the cookie in the first place. For a query string, you'd use percent encoding. It should look like this:
str = "loc=june+1++2+jubli%27s%2C+Captain+tim%2C+BI&locSlug=june-1-%26-2-jubli"\
"%27s_Captain-tim_BI&lat=29.404153823852539&long=-54.88862609863281&stat"\
"us=7&pg=10&pgsz=15&sprefix=%2Fkings_search&city=Captain+tim&neighborhoo"\
"d=june+1+%26+2+jubli%27s&state_id=BI&county_fips=15045"
This can be parsed via:
require 'rack/utils' # already loaded in a Rails app
Rack::Utils.parse_query(str)
#=> {
# "loc"=>"june 1 2 jubli's, Captain tim, BI",
# "locSlug"=>"june-1-&-2-jubli's_Captain-tim_BI",
# "lat"=>"29.404153823852539",
# "long"=>"-54.88862609863281",
# "status"=>"7",
# "pg"=>"10",
# "pgsz"=>"15",
# "sprefix"=>"/kings_search",
# "city"=>"Captain tim",
# "neighborhood"=>"june 1 & 2 jubli's",
# "state_id"=>"BI",
# "county_fips"=>"15045"
# }
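On the writing side, Ruby's standard library can produce such an escaped string. A minimal sketch, assuming the values are available as a hash before they are written to the cookie (the params hash below is illustrative and holds only the two problematic keys):

```ruby
require 'uri'

# Illustrative subset of the data; the real cookie has more keys.
params = {
  'locSlug'      => "june-1-&-2-jubli's_Captain-tim_BI",
  'neighborhood' => "june 1 & 2 jubli's"
}

# & inside a value becomes %26, so it can no longer be mistaken for a delimiter.
encoded = URI.encode_www_form(params)
#=> "locSlug=june-1-%26-2-jubli%27s_Captain-tim_BI&neighborhood=june+1+%26+2+jubli%27s"

# Decoding restores the original values.
decoded = URI.decode_www_form(encoded).to_h
decoded == params #=> true
```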

Ruby method returns hash values in binary

I wrote a method that takes six names then generates an array of seven random numbers using four 6-sided dice. The lowest value of the four 6-sided dice is dropped, then the remainder is summed to create the value. The value is then added to an array.
Once seven numbers have been generated, the array is then ordered from highest to lowest and the lowest value is dropped. Then the array of names and the array of values are zipped together to create a hash.
This method ensures that the first name in the array of names receives the highest value, and the last name receives the lowest.
This is the result of calling the method:
{:strength=>1, :dexterity=>1, :constitution=>0, :intelligence=>0, :wisdom=>0, :charisma=>1}
As you can see, all the values I receive are either "1" or "0". I have no idea how this is happening.
Here is the code:
module PriorityStatGenerator
  def self.roll_stats(first_stat, second_stat, third_stat, fourth_stat, fifth_stat, sixth_stat)
    stats_priority = [first_stat, second_stat, third_stat, fourth_stat, fifth_stat, sixth_stat].map(&:to_sym)
    roll_array = self.roll
    return Hash[stats_priority.zip(roll_array)]
  end
  private
  def self.roll
    roll_array = []
    7.times {
      roll_array << Array.new(4).map{ 1 + rand(6) }.sort.drop(1).sum
    }
    roll_array.reverse.delete_at(6)
  end
end
This is how I'm calling the method while I'm testing:
render plain: PriorityStatGenerator.roll_stats(params[:prioritize][:first_stat], params[:prioritize][:second_stat], params[:prioritize][:third_stat], params[:prioritize][:fourth_stat], params[:prioritize][:fifth_stat], params[:prioritize][:sixth_stat])
I added require 'priority_stat_generator' where I'm calling the method, so it is properly calling it.
Can someone help me make it return proper values between 1 and 18?
Here's a refactoring to simplify things. It also swaps rand for SecureRandom, although Ruby's built-in rand is adequate for dice rolls, so that part is optional:
require 'securerandom'
module PriorityStatGenerator
  def self.roll_stats(*stats)
    Hash[
      stats.map(&:to_sym).zip(self.roll(stats.length).reverse)
    ]
  end
  private
  def self.roll(n = 7)
    (n + 1).times.map do
      4.times.map { 1 + SecureRandom.random_number(6) }.sort.drop(1).inject(:+)
    end.sort.last(n)
  end
end
This makes use of inject(:+) so it works in plain Ruby, no ActiveSupport required.
The use of *stats makes the roll_stats function far more flexible. Your version takes a rigid number of parameters, which is confusing and often awkward to use. Treating the arguments as an array removes the hard-coded expectation that there are six of them.
As a note, it's not clear why you're making N+1 rolls and then discarding the last. That's the same as generating N and discarding none. Maybe you meant to sort them and take the N best?
Update: Added sort and reverse to properly map in terms of priority.
You need to learn to use IRB or PRY to test snippets of your code, or better, learn to use a debugger. They give you insight into what your code is doing.
In IRB:
>> [7,6,5,4,3,2,1].delete_at(6)
=> 1
In other words, delete_at(6) is doing what it's supposed to (it returns the deleted element, not the remaining array), but that's not what you want. Instead, perhaps slicing the array will behave more like you expect:
>> [7,6,5,4,3,2,1][0..-2]
[
[0] 7,
[1] 6,
[2] 5,
[3] 4,
[4] 3,
[5] 2
]
Also, in your code, an explicit return is not necessary when that operation is the last logical step in a method. Ruby returns the value of the last expression evaluated:
Hash[stats_priority.zip(roll_array)]
As amadan said, I can't see how you are getting the results you are, but there is a definite bug in your code.
The last line in self.roll is the return value.
roll_array.reverse.delete_at(6)
That returns the value that was deleted. You need to return the array instead of the delete_at value. You are also not sorting your array before removing the last item, which gives you the wrong values as well.
def self.roll
  roll_array = []
  7.times {
    roll_array << Array.new(4).map{ 1 + rand(6) }.sort.drop(1).sum
  }
  # sort ascending, drop the lowest roll, then return highest first
  roll_array.sort.drop(1).reverse
end
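Putting the pieces from these answers together, a self-contained sketch could look like the following. The names roll_stat, roll_stats and STAT_NAMES are hypothetical and are not the asker's module. A seeded Random is passed in only to make the run repeatable, and Array#sum assumes Ruby 2.4 or later.

```ruby
# Hypothetical helper: roll four six-sided dice, drop the lowest, sum the rest.
def roll_stat(rng)
  Array.new(4) { rng.rand(1..6) }.sort.drop(1).sum
end

# Roll one more stat than there are names, drop the lowest roll,
# and hand out the rest from highest to lowest.
def roll_stats(names, rng = Random.new)
  rolls = Array.new(names.length + 1) { roll_stat(rng) }
  names.map(&:to_sym).zip(rolls.sort.drop(1).reverse).to_h
end

STAT_NAMES = %w[strength dexterity constitution intelligence wisdom charisma]
stats = roll_stats(STAT_NAMES, Random.new(42))
stats.each { |name, value| puts "#{name}: #{value}" }
```

Every value falls between 3 and 18, and the first name always receives the highest roll.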

Sorting an array in Ruby (Special Case)

I have an array in Ruby which has values as follows
xs = %w(2.0.0.1
2.0.0.6
2.0.1.10
2.0.1.5
2.0.0.8)
and so on. I want to sort the array so that the final result looks something like this:
ys = %w(2.0.0.1
2.0.0.6
2.0.0.8
2.0.1.5
2.0.1.10)
I have tried using the array.sort function, but it places "2.0.1.10" before "2.0.1.5". I am not sure why that happens.
Using a Schwartzian transform (Enumerable#sort_by), and taking advantage of the lexicographical order defined by an array of integers (Array#<=>):
sorted_ips = ips.sort_by { |ip| ip.split(".").map(&:to_i) }
Can you please explain a bit more elaborately
You cannot compare strings containing numbers and expect numeric order: "2" > "1", yes, but "11" < "2", because strings are compared lexicographically, like words in a dictionary. Therefore, you must convert the IP into something that can be compared numerically (an array of integers): ip.split(".").map(&:to_i). For example, "1.2.10.3" is converted to [1, 2, 10, 3]. Let's call this transformation f.
You could now use Enumerable#sort: ips.sort { |ip1, ip2| f(ip1) <=> f(ip2) }, but always check whether the higher abstraction Enumerable#sort_by can be used instead. In this case: ips.sort_by { |ip| f(ip) }. You can read it as "take the ips and sort them by the order defined by the f mapping".
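Applied to the array from the question, the difference between the two orderings can be checked directly. A small sketch; the variable names plain and numeric are illustrative:

```ruby
xs = %w[2.0.0.1 2.0.0.6 2.0.1.10 2.0.1.5 2.0.0.8]

# Plain string sort: "10" sorts before "5" because "1" < "5".
plain = xs.sort
#=> ["2.0.0.1", "2.0.0.6", "2.0.0.8", "2.0.1.10", "2.0.1.5"]

# Sort by arrays of integers: [2, 0, 1, 5] comes before [2, 0, 1, 10].
numeric = xs.sort_by { |x| x.split('.').map(&:to_i) }
#=> ["2.0.0.1", "2.0.0.6", "2.0.0.8", "2.0.1.5", "2.0.1.10"]
```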
Split your data into chunks by splitting on '.'. There is no standard function to do it as such so you need to write a custom sort to perform this.
And the behaviour you said about 2.0.1.10 before 2.0.1.5 is expected because it is taking the data as strings and doing ASCII comparisons, leading to the result that you see.
arr1 = "2.0.0.1".split('.')
arr2 = "2.0.0.6".split('.')
Compare both arr1 and arr2 element by element, for all the data in your input.

Walking over strings to guess a name from an email based on dictionary of names?

Let's say I have a dictionary of names (a huge CSV file). I want to guess a name from an email that has no obvious parsable points (., -, _). I want to do something like this:
dict = ["sam", "joe", "john", "parker", "jane", "smith", "doe"]
word = "johnsmith"
x = 0
y = word.length-1
name_array = []
for i in x..y
  match_me = word[x..i]
  dict.each do |name|
    if match_me == name
      name_array << name
    end
  end
end
name_array
# => ["john"]
Not bad, but I want "John Smith" or ["john", "smith"]
In other words, I recursively loop through the word (i.e., unparsed email string, "johndoe@gmail.com") until I find a match within the dictionary. I know: this is incredibly inefficient. If there's a much easier way of doing this, I'm all ears!
If there's no better way of doing it, then show me how to fix the example above, for it suffers from two major flaws: (1) how do I set the length of the loop (see problem of finding "i" below), and (2) how do I increment "x" in the example above so that I can cycle through all possible character combinations given an arbitrary string?
Problem of finding the length of the loop, "i":
for an arbitrary word, how can we derive "i" given the pattern below?
for a (i = 1)
a
for ab (i = 3)
a
ab
b
for abc (i = 6)
a
ab
abc
b
bc
c
for abcd (i = 10)
a
ab
abc
abcd
b
bc
bcd
c
cd
d
for abcde (i = 15)
a
ab
abc
abcd
abcde
b
bc
bcd
bcde
c
cd
cde
d
de
e
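For what it's worth, the counts in the pattern above (1, 3, 6, 10, 15) are the triangular numbers: a word of n characters has n(n+1)/2 contiguous substrings. A minimal sketch that enumerates them in the same order as the listing, with substrings as a hypothetical helper name:

```ruby
# Hypothetical helper: every contiguous substring, grouped by start position.
def substrings(word)
  (0...word.length).flat_map do |start|
    (start...word.length).map { |finish| word[start..finish] }
  end
end

substrings("abc")          #=> ["a", "ab", "abc", "b", "bc", "c"]

n = "abcde".length
substrings("abcde").length #=> 15
n * (n + 1) / 2            #=> 15
```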
r = /^(#{Regexp.union(dict)})(#{Regexp.union(dict)})$/
word.match(r)
=> #<MatchData "johnsmith" 1:"john" 2:"smith">
The regex might take some time to build, but it's blazing fast.
I dare suggest a brute-force solution that is not very elegant but is still useful if:
you have a large number of items (building a regexp can be a pain)
the string to analyse is not limited to two components
you want to get all splittings of a string
you want only complete analyses of a string, that span from ^ to $.
Because of my poor English, I could not figure out a long personal name that can be split in more than one way, so let's analyse a phrase:
word = "godisnowhere"
The dictionary:
@dict = [ "god", "is", "now", "here", "nowhere", "no", "where" ]
@lengths = @dict.collect {|w| w.length }.uniq.sort
The array @lengths adds a slight optimization to the algorithm: we use it to prune subwords whose lengths don't exist in the dictionary without actually performing a dictionary lookup. The array is sorted, which is another optimization.
The main part of the solution is a recursive function that finds the initial subword in a given word and restarts for the tail subword.
def find_head_substring(word)
  # boundary condition:
  # remaining subword is shorter than the shortest word in @dict
  return [] if word.length < @lengths[0]
  splittings = []
  @lengths.each do |len|
    break if len > word.length
    head = word[0,len]
    if @dict.include?(head)
      tail = word[len..-1]
      if tail.length == 0
        splittings << head
      else
        tails = find_head_substring(tail)
        unless tails.empty?
          tails.collect!{|tail| "#{head} #{tail}" }
          splittings.concat tails
        end
      end
    end
  end
  return splittings
end
Now see how it works
find_head_substring(word)
=>["god is no where", "god is now here", "god is nowhere"]
I have not tested it extensively, so I apologize in advance :)
If you just want the hits of matches in your dictionary:
dict.select{ |r| word[/#{r}/] }
=> ["john", "smith"]
You run a risk of too many confusing subhits, so you might want to sort your dictionary so longer names are first:
dict.sort_by{ |w| -w.size }.select{ |r| word[/#{r}/] }
=> ["smith", "john"]
You will still encounter situations where a longer name has a shorter substring following it and get multiple hits so you'll need to figure out a way to weed those out. You could have an array of first names, and another of last names, and take the first returned result of scanning for each, but given the diversity of first and last names, that doesn't guarantee 100% accuracy, and will still gather some bad results.
This sort of problem has no real good solution without further hints to the code about the person's name. Perhaps scanning the body of the message for salutation or valediction sections will help.
I'm not sure what you're doing with i, but isn't it as simple as:
dict.each do |first|
  dict.each do |last|
    puts first,last if first+last == word
  end
end
This one bags all occurrences, not necessarily exactly two:
pattern = Regexp.union(dict)
matches = []
while match = word.match(pattern)
  matches << match.to_s # Or just leave off to_s to keep the match itself
  word = match.post_match
end
matches
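A roughly equivalent one-liner uses String#scan, which collects successive non-overlapping matches without reassigning word. In the sketch below the dictionary is sorted longest-first before building the pattern, on the assumption that a longer name should win when two entries start at the same position (Regexp.union tries alternatives in the order given):

```ruby
dict = ["sam", "joe", "john", "parker", "jane", "smith", "doe"]
word = "johnsmith"

# Longest names first, so "john" would be preferred over a shorter prefix entry.
pattern = Regexp.union(dict.sort_by { |name| -name.length })

matches = word.scan(pattern)
#=> ["john", "smith"]
```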

Resources