I recently solved this problem, but I feel there is a simpler way to do it. I looked into inject, step, and map, but couldn't figure out how to work them into this code. I'd like to use fewer lines of code than I do now. I'm new to Ruby, so if the answer is simple I'd love to add it to my toolbag. Thank you in advance.
Goal: accept a sentence string as an argument and return the sentence with its words alternating between uppercase and lowercase.
def alternating_case(str)
  newstr = []
  words = str.split
  words.each.with_index do |word, i|
    if i.even?
      newstr << word.upcase
    else
      newstr << word.downcase
    end
  end
  newstr.join(" ")
end
You could reduce the number of lines in the each.with_index block by using a ternary conditional (condition ? value_if_true : value_if_false). The parentheses are needed because << binds more tightly than the ternary operator:
words.each.with_index do |word, i|
  newstr << (i.even? ? word.upcase : word.downcase)
end
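Since the question mentions map, here is a minimal sketch of how the same ternary could sit inside map.with_index, which removes the need for the temporary newstr array. The sample sentence is made up for illustration.

```ruby
# Sketch: map each word to its upcased or downcased form based on its index.
def alternating_case(str)
  str.split.map.with_index { |word, i| i.even? ? word.upcase : word.downcase }.join(" ")
end

puts alternating_case("the quick brown fox")
```

Note that str.split with no arguments collapses runs of whitespace, so the result is always joined with single spaces.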
As for a different way altogether, you could iterate over the initial string letter by letter and switch the case each time you hit a space:
def alternating_case(str)
  @downcase = false # start in uppercase, like the code in the question
  str.each_char.map { |letter| set_case(letter) }.join
end
def set_case(letter)
  @downcase = !@downcase if letter == ' ' # toggle at each space
  @downcase ? letter.downcase : letter.upcase
end
We can achieve this by using Ruby's Array#cycle.
Called without a block, Array#cycle returns an Enumerator that yields the array's elements repeatedly: n times if n is given, or forever if n is omitted or nil.
cycle_enum = [:upcase, :downcase].cycle
#=> #<Enumerator: [:upcase, :downcase]:cycle>
5.times.map { cycle_enum.next }
#=> [:upcase, :downcase, :upcase, :downcase, :upcase]
Now, using the above, we can write it as follows:
word = "dummyword"
cycle_enum = [:upcase, :downcase].cycle
word.chars.map { |c| c.public_send(cycle_enum.next) }.join("")
#=> "DuMmYwOrD"
Note: If you are new to Ruby, you may not be familiar with public_send or the Enumerable module. You can use the following references.
Enumerable#cycle
#send & #public_send
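The cycle example above alternates per character, while the question asks for alternation per word. A minimal sketch of the same idea applied to words could look like this; the method name alternating_case_by_word is only an illustrative choice.

```ruby
# Sketch: cycle through :upcase and :downcase, one method per word.
def alternating_case_by_word(str)
  cases = [:upcase, :downcase].cycle
  str.split.map { |word| word.public_send(cases.next) }.join(" ")
end

puts alternating_case_by_word("hello there big world")
```

Creating the enumerator inside the method means every call starts again with :upcase.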
I am following the guide here
Currently this is the model:
import re
import unicodedata

SOS_token = 0
EOS_token = 1

class Lang:
    def __init__(self, name):
        self.name = name
        self.word2index = {}
        self.word2count = {}
        self.index2word = {0: "SOS", 1: "EOS"}
        self.n_words = 2  # Count SOS and EOS

    def addSentence(self, sentence):
        for word in sentence.split(' '):
            self.addWord(word)

    def addWord(self, word):
        if word not in self.word2index:
            self.word2index[word] = self.n_words
            self.word2count[word] = 1
            self.index2word[self.n_words] = word
            self.n_words += 1
        else:
            self.word2count[word] += 1

def unicodeToAscii(s):
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
    )

# Lowercase, trim, and remove non-letter characters
def normalizeString(s):
    s = unicodeToAscii(s.lower().strip())
    s = re.sub(r"([.!?])", r" \1", s)
    s = re.sub(r"[^a-zA-Z.!?]+", r" ", s)
    return s

def readLangs(lang1, lang2, reverse=False):
    print("Reading lines...")
    # Read the file and split into lines
    lines = open('Scribe/%s-%s.txt' % (lang1, lang2), encoding='utf-8').\
        read().strip().split('\n')
    # Split every line into pairs and normalize
    pairs = [[normalizeString(s) for s in l.split('\t')] for l in lines]
    # Reverse pairs, make Lang instances
    if reverse:
        pairs = [list(reversed(p)) for p in pairs]
        input_lang = Lang(lang2)
        output_lang = Lang(lang1)
    else:
        input_lang = Lang(lang1)
        output_lang = Lang(lang2)
    return input_lang, output_lang, pairs

MAX_LENGTH = 5000
eng_prefixes = (
    "i am ", "i m ",
    "he is", "he s ",
    "she is", "she s ",
    "you are", "you re ",
    "we are", "we re ",
    "they are", "they re "
)

def filterPair(p):
    return len(p[0].split(' ')) < MAX_LENGTH and \
        len(p[1].split(' ')) < MAX_LENGTH and \
        p[1].startswith(eng_prefixes)

def filterPairs(pairs):
    return [pair for pair in pairs if filterPair(pair)]

def prepareData(lang1, lang2, reverse=False):
    input_lang, output_lang, pairs = readLangs(lang1, lang2, reverse)
    print("Read %s sentence pairs" % len(pairs))
    pairs = filterPairs(pairs)
    print("Trimmed to %s sentence pairs" % len(pairs))
    print("Counting words...")
    for pair in pairs:
        input_lang.addSentence(pair[0])
        output_lang.addSentence(pair[1])
    print("Counted words:")
    print(input_lang.name, input_lang.n_words)
    print(output_lang.name, output_lang.n_words)
    return input_lang, output_lang, pairs
The difference between what I'm trying to do and the guide is that I'm trying to pass in my input languages as a list of strings instead of reading them from a file:
pairs=['string one goes like this', 'string two goes like this']
input_lang = Lang(pairs[0][0])
output_lang = Lang(pairs[1][1])
But it seems that when I check the word count with input_lang.n_words, I always get 2.
Is there something I'm missing when calling the class Lang?
Update:
I ran
language = Lang('english')
for sentence in pairs: language.addSentence(sentence)
print (language.n_words)
and that gave me the number of words in pairs
Though, that doesn't give me input_lang and output_lang like the guide did:
for pair in pairs:
input_lang.addSentence(pair[0])
output_lang.addSentence(pair[1])
First of all, you are initialising the Lang objects with pairs[0][0] and pairs[1][1]. Because pairs[0] and pairs[1] are strings, those expressions are single characters, so this is the same as calling Lang('s') and Lang('t').
The Lang object is meant to store information about a language, so I would expect you to initialise it once with Lang('english') and then add the sentences from your dataset to it with the Lang.addSentence function.
Right now you aren't loading your dataset into the Lang object at all, so language.n_words is just the initial value it gets when the object is created: self.n_words = 2 # Count SOS and EOS
None of what you are doing in your question makes any sense, but I think what you want is the following:
language = Lang('english')
for sentence in pairs: language.addSentence(sentence)
print (language.n_words)
I am trying to make my URL clean and friendly by collapsing repeated occurrences of specific characters into a single one.
local function fix_url(str)
return str:gsub("[+/=]", {["+"] = "+", ["/"] = "/", ["="] = "="}) --Needs some regex to remove multiple occurrences of characters
end
url = "///index.php????page====about&&&lol===you"
output = fix_url(url)
The output I would like to get is this:
"/index.php?page=about&lol=you"
But instead my output is this:
"///index.php????page====about&&&lol===you"
Is gsub the way I should be doing this?
I don't see how to do this with one call to gsub. The code below does this by calling gsub once for each character:
url = "///index.php????page====about&&&lol===you"
function fix_url(s,C)
for c in C:gmatch(".") do
s=s:gsub(c.."+",c)
end
return s
end
print(fix_url(url,"+/=&?"))
Here's one possible solution (replace %p with whatever character class you like):
local function fold(s)
local ans = ''
for s in s:gmatch '.' do
if s ~= ans:sub(-1) then ans = ans .. s end
end
return ans
end
local function fix_url(s)
return s:gsub('%p+',fold) --remove multiple same characters
end
url = '///index.php????page====about&&&lol===you'
output = fix_url(url)
print(output)
I've been using the following code for this problem. I'm writing a program that converts an IUPAC name into a structure, so I want to analyse the string entered by the user. IUPAC names contain brackets as well, and I want to extract the parts of the compound name according to the brackets, in the way shown at the end.
I want to modify the code so that the output looks like this and is stored in an array:
As ["(4'-cyanobiphenyl-4-yl)","5-[(4'-cyanobiphenyl-4-yl)oxy]",
"({5-[(4'-cyanobiphenyl-4-yl)oxy]pentyl}" .... and so on ]
The splitting code I wrote is:
Reg_bracket=/([^(){}\[\]]*)([(){}\[\]])/
attr_reader :obrk, :cbrk
def count_level_br
  #xbrk=0
  #cbrk=0
  if @temp1
    @obrk+=1 if @temp1[1]=="(" || @temp1[1]=="[" || @temp1[1]=="{"
    @obrk-=1 if @temp1[1]==")" || @temp1[1]=="]" || @temp1[1]=="}"
  end
  puts @obrk.to_s
end
def split_at_bracket(str=nil) #to split the brackets according to Regex
  if str then a=str
  else a=self
  end
  a=~Reg_bracket
  if $& then @temp1=[$1,$2,$']
  end
  @temp1||=[a,"",""]
end
def find_block
  @obrk=0
  r=""
  @temp1||=["",""]
  split_at_bracket
  r<<@temp1[0]<<@temp1[1]
  count_level_br
  while @obrk!=0
    split_at_bracket(@temp1[2])
    r<<@temp1[0]<<@temp1[1]
    count_level_br
    puts r.to_s
    if @obrk==0
      puts "Level 0 has reached"
      #puts "Close brackets are #{@cbrk}"
      return r
    end
  end #end
end
end #class end
I've used the regex to match the brackets. Whenever it finds a bracket, it gives me the text before the bracket, the bracket itself, and the text after it, and it keeps doing this until it reaches the end.
The output I'm getting right now is this:
1
2
1-[(
3
1-[({
4
1-[({5-[
5
1-[({5-[(
4
1-[({5-[(4'-cyanobiphenyl-4-yl)
3
1-[({5-[(4'-cyanobiphenyl-4-yl)oxy]
2
1-[({5-[(4'-cyanobiphenyl-4-yl)oxy]pentyl}
1
1-[({5-[(4'-cyanobiphenyl-4-yl)oxy]pentyl}oxy)
0
1-[({5-[(4'-cyanobiphenyl-4-yl)oxy]pentyl}oxy)carbonyl]
Level 0 has reached
testing ends'
I have written a simple program that matches the string using three different regular expressions. The first separates out the parentheses, the second the square brackets, and the third the curly braces. Here is the code; I hope you will be able to use it effectively in your program.
reg1 = /(\([a-z0-9\'\-\[\]\{\}]+.+\))/ # for parenthesis
reg2 = /(\[[a-z0-9\'\-\(\)\{\}]+.+\])/ # for square brackets
reg3 = /(\{[a-z0-9\'\-\(\)\[\]]+.+\})/ # for curly braces
a = Array.new
s = gets.chomp
x = reg1.match(s)
a << x.to_s
str = x.to_s.chop.reverse.chop.reverse
while x != nil do
x = reg1.match(str)
a << x.to_s
str = x.to_s.chop
end
x = reg2.match(s)
a << x.to_s
str = x.to_s.chop.reverse.chop.reverse
while x != nil do
x = reg2.match(str)
a << x.to_s
str = x.to_s.chop
end
x = reg3.match(s)
a << x.to_s
str = x.to_s.chop.reverse.chop.reverse
while x != nil do
x = reg3.match(str)
a << x.to_s
str = x.to_s.chop
end
puts a
The output is as follows:
ruby reg_yo.rb
4,4'{-1-[({5-[(4'-cyanobiphenyl-4-yl)oxy]pentyl}oxy)carbonyl]-2-[(4'-cyanobiphenyl-4-yl)oxy]ethylene}dihexanoic acid # input string
({5-[(4'-cyanobiphenyl-4-yl)oxy]pentyl}oxy)carbonyl]-2-[(4'-cyanobiphenyl-4-yl)
(4'-cyanobiphenyl-4-yl)oxy]pentyl}oxy)
(4'-cyanobiphenyl-4-yl)
[({5-[(4'-cyanobiphenyl-4-yl)oxy]pentyl}oxy)carbonyl]-2-[(4'-cyanobiphenyl-4-yl)oxy]
[(4'-cyanobiphenyl-4-yl)oxy]pentyl}oxy)carbonyl]
[(4'-cyanobiphenyl-4-yl)oxy]
{-1-[({5-[(4'-cyanobiphenyl-4-yl)oxy]pentyl}oxy)carbonyl]-2-[(4'-cyanobiphenyl-4-yl)oxy]ethylene}
{5-[(4'-cyanobiphenyl-4-yl)oxy]pentyl}
Update : I have modified the code so as to search for recursive patterns.
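As an alternative to regular expressions, nested brackets can also be collected with a stack. The following is only a sketch, not the approach used in the answer above: bracket_groups is a hypothetical helper name, and it assumes the brackets in the name are balanced.

```ruby
# Hypothetical helper: returns every bracketed group in str, innermost first.
def bracket_groups(str)
  closers = { ")" => "(", "]" => "[", "}" => "{" }
  stack = []   # [opening bracket, index] for each bracket that is still open
  groups = []
  str.each_char.with_index do |ch, i|
    if closers.value?(ch)
      stack.push([ch, i])
    elsif closers.key?(ch)
      open_ch, start = stack.pop
      # Ignore a closer that does not match the most recent opener.
      groups << str[start..i] if open_ch == closers[ch]
    end
  end
  groups
end

name = "1-[({5-[(4'-cyanobiphenyl-4-yl)oxy]pentyl}oxy)carbonyl]"
puts bracket_groups(name)
```

Each group is recorded at the moment its closing bracket is seen, so inner groups such as (4'-cyanobiphenyl-4-yl) come out before the groups that enclose them.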
string = "Mod1:10022932,10828075,5946410,13321905,5491120,5030731|Mod2:22704455,22991440,22991464,21984312,21777721,21777723,21889761,21939852,23091478,22339903,23091485,22099714,21998260,22364832,21939858,21944274,21944226,22800221,22704443,21777728,21777719,21678184,21998265,21834900,21984331,22704454,21998261,21944214,21862610,21836482|Mod3:10828075,13321905,5491120,5946410,5030731,15806212,4100566,4787137,2625339,2408317,2646868,19612047,2646862,11983534,8591489,19612048,10249319,14220471,15806209,13330887,15075124,17656842,3056657,5086273|Mod4:10828075,5946410,13321905,5030731,5491120,4787137,4100566,15806212,2625339,3542205,2408317,2646862,2646868|Mod5:10022932;0.2512,10828075;0.2093,5030731;0.1465,5946410;0.1465,4787137;0.1465,2625339;0.0143,5491120;0.0143,13321905;0.0143,3542205;0.0143,15806212;0.0119,4100566;0.0119,19612047;0.0100,2408317;0.0100"
How can I split it so that I get each title (Mod1, Mod2, ...) and the IDs that belong to each title?
This is what I've tried so far. It drops everything after the first pipe, which I don't want.
mod_name = string.split(":")[0]
mod_ids = string.split(":")[1] #This gets me the ID's but also include the |Mod*
ids = mod_ids.split("|").first.strip #Only returns Id's before the first "|"
Desired Output:
I need to save mod_name and mod_ids to their respective columns,
mod_name = #name ("Mod1...Mod2 etc) #string
mod_ids = #ids (All Ids after the ":" in Mod*:) #array
I think this does what you want:
ids = string.split("|").map {|part| [part.split(":")[0], part.split(":")[1].split(/,|;/)]}
There are a couple of ways to do this:
# This will split the string on "|" and ":" and will return:
# %w( Mod1 id1 Mod2 id2 Mod3 id3 ... )
ids = string.split(/[|:]/)
# This will first split on "|", and for each string, split it again on ":" and returns:
# [ %w(Mod1 id1), %w(Mod2 id2), %w(Mod3 id3), ... ]
ids = string.split("|").map { |str| str.split(":") }
If you want a Hash as a result for easy access via the titles, then you could do this:
str.split('|').inject({}){|h,x| k,v = x.split(':'); h[k] = v.split(','); h}
=> {
"Mod1"=>["10022932", "10828075", "5946410", "13321905", "5491120", "5030731"],
"Mod2"=>["22704455", "22991440", "22991464", "21984312", "21777721", "21777723", "21889761", "21939852", "23091478", "22339903", "23091485", "22099714", "21998260", "22364832", "21939858", "21944274", "21944226", "22800221", "22704443", "21777728", "21777719", "21678184", "21998265", "21834900", "21984331", "22704454", "21998261", "21944214", "21862610", "21836482"],
"Mod3"=>["10828075", "13321905", "5491120", "5946410", "5030731", "15806212", "4100566", "4787137", "2625339", "2408317", "2646868", "19612047", "2646862", "11983534", "8591489", "19612048", "10249319", "14220471", "15806209", "13330887", "15075124", "17656842", "3056657", "5086273"],
"Mod4"=>["10828075", "5946410", "13321905", "5030731", "5491120", "4787137", "4100566", "15806212", "2625339", "3542205", "2408317", "2646862", "2646868"],
"Mod5"=>["10022932;0.2512", "10828075;0.2093", "5030731;0.1465", "5946410;0.1465", "4787137;0.1465", "2625339;0.0143", "5491120;0.0143", "13321905;0.0143", "3542205;0.0143", "15806212;0.0119", "4100566;0.0119", "19612047;0.0100", "2408317;0.0100"]
}
Untested:
all_mods = {}
string.split("|").each do |fragment|
mod_fragments = fragment.split(":")
all_mods[mod_fragments[0]] = mod_fragments[1].split(",")
end
What I ended up using, thanks to @tillerjs's help:
data = string.split("|")
data.each do |mod|
module_name = mod.split(":")[0]
recommendations = mod.split(":")[1]
end
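For reference, the splitting steps discussed above can be combined into one pass that builds a hash from each title to its array of IDs. This is a sketch on a shortened, made-up sample in the same format. It also assumes that for entries like those under Mod5 only the part before the ";" is wanted, which the question does not confirm.

```ruby
# Shortened, made-up sample in the same "Name:id,id|Name:id;score" format.
string = "Mod1:100,200|Mod2:300|Mod5:100;0.25,200;0.75"

mods = string.split("|").each_with_object({}) do |part, result|
  name, ids = part.split(":", 2)
  # Assumption: keep only the ID and drop any ";score" suffix.
  result[name] = ids.split(",").map { |entry| entry.split(";").first }
end

mods.each { |mod_name, mod_ids| puts "#{mod_name}: #{mod_ids.inspect}" }
```

Inside the final each block, mod_name is a string and mod_ids is an array, which matches the two columns described in the question.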