I have a string a5bc2cdf3. I want to expand it to aaaaabcbccdfcdfcdf.
The string contains a5, so the result should contain 5 consecutive "a"s; "bc2" results in "bc" appearing 2 times consecutively, and "cdf3" means "cdf" should repeat 3 times.
Given the input a5bc2cdf3 and the expected output aaaaabcbccdfcdfcdf, how can I do this in a Ruby method?
def get_character(compressed_string, index)
  expanded_string = calculate_expanded_string(compressed_string)
  expanded_string[index] # character at the given index, e.g. 3
end

def calculate_expanded_string(compressed_string)
  # should build and return the expanded string
end
You may use a regex like
.gsub(/([a-zA-Z]+)(\d+)/){$1*$2.to_i}
See the Ruby online demo
The /([a-zA-Z]+)(\d+)/ will match substrings with 1+ letters (([a-zA-Z]+)) and 1+ digits ((\d+)) and will capture them into 2 groups that are later used inside a block to return the string you need.
Note that instead of [a-zA-Z] you might consider using \p{L} that can match any letters.
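As a minimal sketch, the substitution above could be wrapped in the two methods named in the question. The method bodies and the choice of \p{L} are assumptions based on this answer, not a definitive implementation:

```ruby
# Expand every "letters + count" group, e.g. "bc2" becomes "bcbc".
def calculate_expanded_string(compressed_string)
  compressed_string.gsub(/(\p{L}+)(\d+)/) { $1 * $2.to_i }
end

# Return the character at `index` of the expanded string (nil if out of range).
def get_character(compressed_string, index)
  calculate_expanded_string(compressed_string)[index]
end

puts calculate_expanded_string("a5bc2cdf3") # prints aaaaabcbccdfcdfcdf
puts get_character("a5bc2cdf3", 5)          # prints b
```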
You want to break out of gsub once the specified index is reached in the original "compressed" string. It is still possible, see this Ruby demo:
s = 'a5bc2cdf3' # input string
index = 5 # break index
result = "" # expanded string
s.gsub!(/([a-zA-Z]+)(\d+)/) { # regex replacement
  result << $1*$2.to_i # add to the resulting string
  break if Regexp.last_match.end(0) >= index # break once the current match ends at or past index
}
puts result[index] # Show the result
# => b
For brevity, you may replace Regexp.last_match with $~.
I would propose to use scan to move over the compressed string, using a simple RegEx which detects groups of non-decimal characters followed by their count as decimal /([^\d]+)(\d+)/.
def get_character(compressed_string, index)
  result = nil
  compressed_string.scan(/([^\d]+)(\d+)/).inject(0) do |total_length, (chars, count)|
    decoded_string = chars * count.to_i
    total_length += decoded_string.length
    if index < total_length
      result = decoded_string[-(total_length - index)]
      break
    else
      total_length
    end
  end
  result
end
Knowing the current (total) length, one can break out of the loop if the current expanded string includes the requested index. The string is never decoded entirely.
This code gives the following results
get_character("a5bc2cdf3", 5) # => "b"
get_character("a5bc2cdf3", 10) # => "d"
get_character("a5bc2cdf3", 20) # => nil
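Since each group is just the same characters repeated, the requested character can also be found with modular arithmetic, without building even the current decoded chunk. This is a sketch of that variation; the method name get_character_by_offset is hypothetical and not part of the answer above:

```ruby
def get_character_by_offset(compressed_string, index)
  return nil if index.negative?
  offset = 0 # expanded length of all groups seen so far
  compressed_string.scan(/(\D+)(\d+)/) do |chars, count|
    group_length = chars.length * count.to_i
    if index < offset + group_length
      # position inside the repeated group, wrapped to the length of `chars`
      return chars[(index - offset) % chars.length]
    end
    offset += group_length
  end
  nil # index lies beyond the end of the expanded string
end

p get_character_by_offset("a5bc2cdf3", 5)  # prints "b"
p get_character_by_offset("a5bc2cdf3", 10) # prints "d"
p get_character_by_offset("a5bc2cdf3", 20) # prints nil
```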
Just another way. I prefer Wiktor's method by a long way.
def stringy str, index
  lets, nums = str.split(/\d+/), str.split(/[a-z]+/)[1..-1].map(&:to_i)
  ostr = lets.zip(nums).map { |l,n| l*n }.join
  ostr[index]
end
str = 'a5bc2cdf3'
p stringy str, 5 #=> "b"
I'd use:
str = "a5bc2cdf3"
str.split(/(\d+)/).each_slice(2).map { |s, c| s * c.to_i }.join # => "aaaaabcbccdfcdfcdf"
Here's how it breaks down:
str.split(/(\d+)/) # => ["a", "5", "bc", "2", "cdf", "3"]
This works because split will return the value being split on if it's in a regex group: /(\d+)/.
str.split(/(\d+)/).each_slice(2).to_a # => [["a", "5"], ["bc", "2"], ["cdf", "3"]]
The resulting array can be broken into the string to be repeated and its associated count using each_slice(2).
str.split(/(\d+)/).each_slice(2).map { |s, c| s * c.to_i } # => ["aaaaa", "bcbc", "cdfcdfcdf"]
That array of arrays can then be processed in a map that uses String's * to repeat the characters.
And finally join concatenates all the resulting expanded strings back into a single string.
I have about thirty thousand records with a string column that has been stored in the following format, with different keys:
"something: this, this and that, that, other stuff, another: name, another name, last: here"
In rails, I want to change it into a hash like
{
something: [ "this", "this and that", "that" ],
another: [ "name", "another name" ],
last: [ "here" ]
}
Is there a way to do this elegantly? I was thinking of splitting at the colon, then doing a reverse search of the first space.
There are about a hundred ways to solve this. A pretty straightforward one is this:
str = "something: this, this and that, that, other stuff, another: name, another name, last: here"
key = nil
str.scan(/\s*([^,:]+)(:)?\s*/).each_with_object({}) do |(val, colon), hsh|
  if colon
    key = val.to_sym
    hsh[key] = []
  else
    hsh[key] << val
  end
end
# => {
# something: ["this", "this and that", "that", "other stuff"],
# another: ["name", "another name"],
# last: ["here"]
# }
It works by scanning the string with the following regular expression:
/
\s* # any amount of optional whitespace
([^,:]+) # one or more characters that aren't , or : (capture 1)
(:)? # an optional trailing : (capture 2)
\s* # any amount of optional whitespace
/x
Then it iterates over the matches and puts them into a hash. When a match has a trailing colon (capture 2), a new hash key is created with an empty array for a value. Otherwise the value (capture 1) is added to the array for the most recent key.
Or…
A somewhat less straightforward but cleverer approach is to let the RegExp do more work:
MATCH_LIST_ENTRY = /([^:]+):\s*((?:[^,]+(?:,\s*|$))+?)(?=[^:,]+:|$)/
def parse_list2(str)
  str.scan(MATCH_LIST_ENTRY).map do |k, vs|
    [k.to_sym, vs.split(/,\s*/)]
  end.to_h
end
I won't pick apart the RegExp for this one, but it's simpler than it looks. Regexper does a pretty good job of explaining it.
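For readers who would rather see it spelled out, here is my reading of the same pattern written in extended (/x) mode with comments. The names MATCH_LIST_ENTRY_X and parse_list_commented are hypothetical, introduced only for this sketch:

```ruby
MATCH_LIST_ENTRY_X = /
  ([^:]+)            # capture 1: the key, everything up to the next colon
  :\s*               # the colon itself plus optional whitespace
  (                  # capture 2: the raw list of values
    (?:
      [^,]+          #   one value (anything except a comma)
      (?:,\s*|$)     #   followed by a comma separator or the end of the string
    )+?              #   repeated as few times as possible
  )
  (?=[^:,]+:|$)      # stop just before the next key or at the end of the string
/x

def parse_list_commented(str)
  str.scan(MATCH_LIST_ENTRY_X).map do |key, values|
    [key.to_sym, values.split(/,\s*/)]
  end.to_h
end

p parse_list_commented("last: here, there") # prints {:last=>["here", "there"]} on older Rubies
```

The lazy quantifier together with the lookahead is what keeps each list from swallowing the next key: values are added one at a time until the text that follows looks like a key and its colon.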
You can see both of these in action on repl.it here: https://repl.it/#jrunning/LongtermMidnightblueAssembler
If str is the string given in the example, the desired hash can be constructed as follows.
str.split(/, *(?=\p{L}+:)/).
  each_with_object({}) do |s,h|
    k, v = s.split(/: +/)
    h[k.to_sym] = v.split(/, */)
  end
#=> {:something=>["this", "this and that", "that", "other stuff"],
# :another=>["name", "another name"],
# :last=>["here"]}
Note:
str.split(/, *(?=\p{L}+:)/)
#=> ["something: this, this and that, that, other stuff",
# "another: name, another name",
# "last: here"]
This regular expression reads, "match a comma followed by zero or more spaces, the match to be immediately followed by one or more Unicode letters followed by a colon, (?=\p{L}+:) being a positive lookahead".
elegantly:
result_hash = {}
current_key = nil # declared outside the block so it persists between matches
string.scan(/(?<key>[\w]+(?=:))|(?<value>[\s\w]+(?=(,|\z)))/) do |key, value|
  if key.present?
    result_hash[key] = []
    current_key = key
  elsif value.present?
    result_hash[current_key] << value.strip
  end
end
then jsonize:
json = result_hash.to_json
I am trying to remove words that contain the same letter more than once. I have tried squeeze, but all it does is remove words that have duplicate letters next to each other.
Here is the code at the moment:
array = []
File.open('word.txt').each do |line|
  if line.squeeze == line
    array << line
  end
end
Input from word.txt
start
james
hello
joins
regex
Output that I am looking for
james
joins
Any suggestions on how I can get around this?
Perhaps something like this:
array = []
File.open('word.txt').each do |line|
  chars = line.chars
  array << line if chars.uniq == chars
end
or shorter:
array = File.open('word.txt').select { |word| word.chars.uniq == word.chars }
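Both versions keep the trailing newline on every word. A small sketch that strips it and works on any collection of lines is shown here; the method name words_without_repeated_letters is made up for illustration:

```ruby
# Keep only the words in which no character occurs more than once.
def words_without_repeated_letters(lines)
  lines.map(&:chomp).select { |word| word.chars.uniq.size == word.size }
end

lines = ["start\n", "james\n", "hello\n", "joins\n", "regex\n"]
p words_without_repeated_letters(lines) # prints ["james", "joins"]
```

With a file, the same method could be called as words_without_repeated_letters(File.foreach('word.txt')), which also avoids leaving the file handle open.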
You could use a regular expression, for example:
re = /
(.) # match and capture a single character
.*? # any number of characters in-between (non-greedy)
\1 # match the captured character again
/x
Example:
'start'[re] #=> "tart"
'james'[re] #=> nil
'hello'[re] #=> "ll"
'joins'[re] #=> nil
'regex'[re] #=> "ege"
It can be passed to grep to return all matched lines:
IO.foreach('word.txt').grep(re)
#=> ["start\n", "hello\n", "regex\n"]
or to grep_v to return the other lines:
IO.foreach('word.txt').grep_v(re)
#=> ["james\n", "joins\n"]
I have 2 strings:
a = "qwer"
b = "asd"
Result = "qawsedr"
The same applies when b is longer than a: the characters should still alternate.
What is the best way to do this? Should I use a loop?
You can get the chars from your a and b string to work with them as arrays and then "merge" them using zip, then join them.
In the case of strings with different length, the array values must be reversed, so:
def merge_alternately(a, b)
  a = a.chars
  b = b.chars
  if a.length >= b.length
    a.zip(b)
  else
    array = b.zip(a)
    array.map{|e| e != array[-1] ? e.reverse : e}
  end
end
p merge_alternately('abc', 'def').join
# => "adbecf"
p merge_alternately('ab', 'zsd').join
# => "azbsd"
p merge_alternately('qwer', 'asd').join
# => "qawsedr"
Sebastián's answer gets the job done, but it's needlessly complex. Here's an alternative:
def merge_alternately(a, b)
  len = [a.size, b.size].max
  Array.new(len) {|n| [ a[n], b[n] ] }.join
end
merge_alternately("ab", "zsd")
# => "azbsd"
The first line gets the size of the longer string. The second line uses the block form of the Array constructor; it yields the indexes from 0 to len-1 to the block, resulting in an array like [["a", "z"], ["b", "s"], [nil, "d"]]. join turns it into a string, conveniently calling to_s on each item, which turns nil into "".
Here's another version that does basically the same thing, but skips the intermediate arrays:
def merge_alternately(a, b)
  len = [a.size, b.size].max
  len.times.reduce("") {|s, i| s + a[i].to_s + b[i].to_s }
end
len.times yields an Enumerator that yields the indexes from 0 to len-1. reduce starts with an empty string s and in each iteration appends the next characters from a and b (or ""—nil.to_s—if a string runs out of characters).
You can see both on repl.it: https://repl.it/I6c8/1
Just for fun, here's a couple more solutions. This one works a lot like Sebastián's solution, but pads the first array of characters with nils if it's shorter than the second:
def merge_alternately(a, b)
  a, b = a.chars, b.chars
  a[b.size - 1] = nil if a.size < b.size
  a.zip(b).join
end
And it wouldn't be a Ruby answer without a little gsub:
def merge_alternately2(a, b)
  if a.size < b.size
    b.gsub(/./) { a[$`.size].to_s + $& }
  else
    a.gsub(/./) { $& + b[$`.size].to_s }
  end
end
See these two on repl.it: https://repl.it/I6c8/2
I've been using the following code for the problem. I'm making a program to convert an IUPAC name into a structure, so I want to analyse the string entered by the user. IUPAC names contain brackets as well, and I want to extract the parts of the compound name according to the brackets, in the way shown at the end.
I want to modify the code so that the output looks like this and is stored in an array:
As ["(4'-cyanobiphenyl-4-yl)","5-[(4'-cyanobiphenyl-4-yl)oxy]",
"({5-[(4'-cyanobiphenyl-4-yl)oxy]pentyl}" .... and so on ]
And the code for splitting which i wrote is:
Reg_bracket = /([^(){}\[\]]*)([(){}\[\]])/
attr_reader :obrk, :cbrk

def count_level_br
  #xbrk=0
  #cbrk=0
  if @temp1
    @obrk += 1 if @temp1[1] == "(" || @temp1[1] == "[" || @temp1[1] == "{"
    @obrk -= 1 if @temp1[1] == ")" || @temp1[1] == "]" || @temp1[1] == "}"
  end
  puts @obrk.to_s
end

def split_at_bracket(str = nil) # split at the next bracket according to the regex
  if str
    a = str
  else
    a = self
  end
  a =~ Reg_bracket
  @temp1 = [$1, $2, $'] if $&
  @temp1 ||= [a, "", ""]
end

def find_block
  @obrk = 0
  r = ""
  @temp1 ||= ["", ""]
  split_at_bracket
  r << @temp1[0] << @temp1[1]
  count_level_br
  while @obrk != 0
    split_at_bracket(@temp1[2])
    r << @temp1[0] << @temp1[1]
    count_level_br
    puts r.to_s
    if @obrk == 0
      puts "Level 0 has reached"
      #puts "Close brackets are #{@cbrk}"
      return r
    end
  end # end
end
end # class end
I've used the regex to match the brackets. When it finds a bracket, it gives the text before the bracket, the bracket itself and the text after it, and it keeps doing this until it reaches the end.
The output I'm getting right now is this:
1
2
1-[(
3
1-[({
4
1-[({5-[
5
1-[({5-[(
4
1-[({5-[(4'-cyanobiphenyl-4-yl)
3
1-[({5-[(4'-cyanobiphenyl-4-yl)oxy]
2
1-[({5-[(4'-cyanobiphenyl-4-yl)oxy]pentyl}
1
1-[({5-[(4'-cyanobiphenyl-4-yl)oxy]pentyl}oxy)
0
1-[({5-[(4'-cyanobiphenyl-4-yl)oxy]pentyl}oxy)carbonyl]
Level 0 has reached
testing ends'
I have written a simple program to match the string using three different regular expressions. The first one separates out the parentheses, the second the square brackets and the third the curly braces. Here is the code. I hope you will be able to use it in your program effectively.
reg1 = /(\([a-z0-9\'\-\[\]\{\}]+.+\))/ # for parenthesis
reg2 = /(\[[a-z0-9\'\-\(\)\{\}]+.+\])/ # for square brackets
reg3 = /(\{[a-z0-9\'\-\(\)\[\]]+.+\})/ # for curly braces
a = Array.new
s = gets.chomp
x = reg1.match(s)
a << x.to_s
str = x.to_s.chop.reverse.chop.reverse
while x != nil do
  x = reg1.match(str)
  a << x.to_s
  str = x.to_s.chop
end
x = reg2.match(s)
a << x.to_s
str = x.to_s.chop.reverse.chop.reverse
while x != nil do
  x = reg2.match(str)
  a << x.to_s
  str = x.to_s.chop
end
x = reg3.match(s)
a << x.to_s
str = x.to_s.chop.reverse.chop.reverse
while x != nil do
  x = reg3.match(str)
  a << x.to_s
  str = x.to_s.chop
end
puts a
The output is as follows:
ruby reg_yo.rb
4,4'{-1-[({5-[(4'-cyanobiphenyl-4-yl)oxy]pentyl}oxy)carbonyl]-2-[(4'-cyanobiphenyl-4-yl)oxy]ethylene}dihexanoic acid # input string
({5-[(4'-cyanobiphenyl-4-yl)oxy]pentyl}oxy)carbonyl]-2-[(4'-cyanobiphenyl-4-yl)
(4'-cyanobiphenyl-4-yl)oxy]pentyl}oxy)
(4'-cyanobiphenyl-4-yl)
[({5-[(4'-cyanobiphenyl-4-yl)oxy]pentyl}oxy)carbonyl]-2-[(4'-cyanobiphenyl-4-yl)oxy]
[(4'-cyanobiphenyl-4-yl)oxy]pentyl}oxy)carbonyl]
[(4'-cyanobiphenyl-4-yl)oxy]
{-1-[({5-[(4'-cyanobiphenyl-4-yl)oxy]pentyl}oxy)carbonyl]-2-[(4'-cyanobiphenyl-4-yl)oxy]ethylene}
{5-[(4'-cyanobiphenyl-4-yl)oxy]pentyl}
Update : I have modified the code so as to search for recursive patterns.
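A different technique from the regexes above, offered only as a sketch: nested brackets can be tracked with a stack of opening positions, recording a substring each time a bracket closes. The names BRACKET_PAIRS and bracket_groups are hypothetical, and the result lists bare bracketed groups (innermost first), which is close to but not exactly the array asked for in the question:

```ruby
# Maps each closing bracket to the opening bracket it must match.
BRACKET_PAIRS = { ")" => "(", "]" => "[", "}" => "{" }.freeze

def bracket_groups(str)
  stack = []  # indices of brackets that are currently open
  groups = []
  str.each_char.with_index do |ch, i|
    if BRACKET_PAIRS.value?(ch)
      stack.push(i)
    elsif BRACKET_PAIRS.key?(ch)
      open_index = stack.pop
      # ignore a closer that has no opener or the wrong kind of opener
      next if open_index.nil? || str[open_index] != BRACKET_PAIRS[ch]
      groups << str[open_index..i]
    end
  end
  groups
end

p bracket_groups("a[b(c)d]{e}") # prints ["(c)", "[b(c)d]", "{e}"]
```

Because a group is recorded when its closing bracket is reached, inner groups always appear before the groups that contain them.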
I have an unsorted list of area postcodes as follows:
["E1", "E1C", "E1D", "E10", "E11", "E12", "E2", "E3", "E4", "EC1", "EC1A", "EC1M", "EC1N",
"EC1R", "EC1V", "EC1Y", "EC2", "EC2A", "EC2M", "EC2N", "N1", "N10", "N11", "N12",
"N13", "N2", "NW1", "NW10", "NW2" etc]
I'd like to sort them as follows:
["E1", "E1C", "E1D", "E2", "E3", "E4", "E10", "E11", "E12", "EC1", "EC1A", "EC1M", "EC1N",
"EC1R", "EC1V", "EC1Y", "EC2", "EC2A", "EC2M", "EC2N", "N1", "N2", "N10", "N11", "N12",
"N13", "NW1", "NW2", "NW10" etc]
So to sum up the order of the formats for postcodes beginning with E would be:
E1
E1C
E11
EC1
EC1V
Same order for postcodes beginning with N, etc.
What would be the recommended way of sorting such strings? In this case the format of the string is always known, i.e. it will always be 2-4 alphanumeric characters, the first always being a letter.
Should I order the strings by length first and then order within each length group, or is there a more elegant method?
I'd use
array.sort_by do |str|
  # letters, number, optional trailing letters, e.g. "EC1A" => ["EC", 1, "A"]
  /\A(\D+)(\d+)(\D*)\Z/ === str
  [$1, $2.to_i, $3]
end
or, if you have arbitrary sequences of alternating letters and digits,
array.sort_by do |str|
  /\A(\D*)(\d*)(\D*)(\d*)\Z/.match(str)[1..-1].reject(&:blank?).collect do |item|
    /\d/ === item ? item.to_i : item
  end
end
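The same idea can be sketched in plain Ruby, without blank? (which comes from Rails), by scanning each postcode into runs of digits and non-digits. The method name sort_postcodes is hypothetical, and the sketch assumes every postcode starts with letters, so that strings are only ever compared with strings and integers with integers:

```ruby
def sort_postcodes(postcodes)
  postcodes.sort_by do |code|
    # "EC1A" -> ["EC", 1, "A"]; digit runs become Integers so 2 sorts before 10
    code.scan(/\d+|\D+/).map { |part| part.match?(/\A\d+\z/) ? part.to_i : part }
  end
end

p sort_postcodes(%w[E10 E1C EC1 E2 E1 NW10 N2 NW2 N10])
# prints ["E1", "E1C", "E2", "E10", "EC1", "N2", "N10", "NW2", "NW10"]
```

Arrays are compared element by element, and a shorter array that is a prefix of a longer one sorts first, which is why "E1" comes before "E1C".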
Kind of a weird way of doing it, but I think this should work:
array.sort do |a, b|
  a = a.dup
  b = b.dup
  regex = /(\d+)/
  a.match(regex)
  a_num = $1.to_i
  b.match(regex)
  b_num = $1.to_i
  if a_num > b_num
    a.gsub!(regex, "1")
    b.gsub!(regex, "0")
  elsif a_num < b_num
    a.gsub!(regex, "0")
    b.gsub!(regex, "1")
  end
  a <=> b
end