# Define function to extract titles from passenger names
import re

def get_title(name):
    title_search = re.search(r' ([A-Za-z]+)\.', name)
    # If the title exists, extract and return it.
    if title_search:
        return title_search.group(1)
    return ""
title_search = re.search(' ([A-Za-z]+)\.', name) What does this mean?
The Titanic dataset has passenger names like: Graham, Miss. Margaret Edith and Behr, Mr. Karl Howell
The titles here are Mr. and Miss.
title_search = re.search(' ([A-Za-z]+)\.', name)
The above line of code searches each name for a title. Mr. and Miss. are not the only ones; there could also be, for example, Dr., Prof. and so on. Since we do not know beforehand what the titles are, but we do know the pattern, which is 'letters followed by a period', we search for that pattern alone.
' ([A-Za-z]+)\.' means: a space, then one or more letters A-Z or a-z (the parentheses capture them as group 1), then a literal period. The backslash matters, because an unescaped . matches any character.
I suggest you read about regular expressions.
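To see the pattern in action, here is a minimal sketch that runs get_title on the two names quoted above. The third name, which has no title, is made up for illustration.

```python
import re

def get_title(name):
    # A space, then one or more letters (captured), then a literal period.
    title_search = re.search(r' ([A-Za-z]+)\.', name)
    if title_search:
        return title_search.group(1)
    return ""

names = [
    "Graham, Miss. Margaret Edith",
    "Behr, Mr. Karl Howell",
    "Smith John",  # hypothetical name with no title
]
titles = [get_title(n) for n in names]
print(titles)  # ['Miss', 'Mr', '']
```

Note that group(1) returns only the captured letters, so the period is not part of the result.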
Trying to generate a random string, but it needs to be formatted a specific way.
N = number
L = Capital Letter
must be NL-NN
needs hyphen as well
examples: 5K-22, 9L-19, 0R-66
Every method I have tried has just generated a string without the hyphen. I know it is probably something simple, but my brain hurts thinking about it, so I thought I'd see if one of y'all could give me a hand.
Thanks
Try this:
function randomchar(a,b)
    return string.char(math.random(string.byte(a),string.byte(b)))
end
a=randomchar('0','9')
b=randomchar('A','Z')
c=randomchar('0','9')
d=randomchar('0','9')
print(a..b..'-'..c..d)
I am trying to split this string in Lua:
sendex,000D6F0011BA2D60,fb,btn,1,on,100,null
I need output like this:
Mac:000D6F0011BA2D60
Value:1
command:on
value:100
How do I split the string and get the values?
local input = "sendex,000D6F0011BA2D60,fb,btn,1,on,100,null"
local buffer = {}
for word in input:gmatch('[^,]+') do
    table.insert(buffer, word)
    --print(word) -- uncomment this to see the words as they are being matched ;)
end
print("Mac:"..buffer[2])
print("Value:"..buffer[5])
...
For a complete explanation of what string.gmatch does, see the Lua reference. To summarize, it iterates over a string and searches for a pattern, in this case [^,]+, meaning all groups of 1 or more characters that aren't a comma. Every time it finds said pattern, it does something with it and continues searching.
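For readers more at home in Python, the same pattern-based split can be sketched with re.findall. This is a comparison under the assumption that the input looks exactly like the sample string, not a replacement for the Lua code above; the variable names are made up for illustration.

```python
import re

line = "sendex,000D6F0011BA2D60,fb,btn,1,on,100,null"

# re.findall returns every non-overlapping run of one or more
# characters that are not a comma, in order of appearance.
fields = re.findall(r"[^,]+", line)

# Python lists are 0-based, so Lua's buffer[2] is fields[1] here.
print("Mac:" + fields[1])
print("Value:" + fields[4])
print("command:" + fields[5])
print("value:" + fields[6])
```

As with gmatch, [^,]+ skips empty fields, so two consecutive commas would shift the later indexes.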
If your input is exactly like you have described, the code below works:
s="sendex,000D6F0011BA2D60,fb,btn,1,on,100,null"
Mac,Value,command,value = s:match(".-,(.-),.-,.-,(.-),(.-),(.-),")
print(Mac,Value,command,value)
It uses the non-greedy pattern .- to split the input into fields. It also captures the relevant fields.
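The non-greedy idea carries over to other regex engines. As a rough Python sketch (a comparison only, assuming the same sample input), .*? plays the role of Lua's .- and the four captures come back from a single match:

```python
import re

s = "sendex,000D6F0011BA2D60,fb,btn,1,on,100,null"

# .*? is lazy: it matches as few characters as possible,
# so each one stops at the next comma.
m = re.match(r".*?,(.*?),.*?,.*?,(.*?),(.*?),(.*?),", s)
mac, value1, command, value2 = m.groups()
print(mac, value1, command, value2)
```

If the input has fewer fields than the pattern expects, re.match returns None, so real code would check m before unpacking.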
I am fairly new to Ruby and I am struggling with a regular expression to seed a database from this text file: http://www.gutenberg.org/cache/epub/673/pg673.txt.
I want the <h1> tags as the words for the dictionary database, and the <def> tags as the definitions.
I could be quite off base here (I've only ever seeded a db with copy and paste ;):
require 'open-uri'
Dictionary.delete_all
g_text = open('http://www.gutenberg.org/cache/epub/673/pg673.txt')
y = g_text.read(/<h1>(.*?)<\/h1>/)
a = g_text.read(/<def>(.*?)<\/def>/)
Dictionary.create!(:word => y, :definition => a)
As you can see, there is often more than one <def> for each <h1>, which is fine, as I can just add columns to my table for definition1, definition2, etc.
But what would this regular expression look like to be sure that each definition is in the same row as the immediately preceding <h1> tag?
Thanks for any help!
Edit:
Okay, so this is what I am trying now:
doc.scan(Regexp.union(/<h1>(.*?)<\/h1>/, /<def>(.*?)<\/def>/)).map do |m, n|
p [m,n]
end
How do I get rid of all of the nil entries?
It seems like regular expression is the only way of making it through the whole document without stopping part way through when an error is encountered...at least after a couple attempts at other parsers.
Here is what I came up with (using a local extract for sandbox use):
require 'pp' # For SO to pretty print the hash at end
h1regex="h1>(.+)<\/h1" # Define the h1 regex (avoid empty tags)
defregex="def>(.+)<\/def" # Define the def regex (avoid empty tags)
# Initialize vars
defhash={}
key=nil
last=nil
open("./gut.txt") do |f|
  f.each_line do |l|
    newkey=l[/#{h1regex}/i,1] # get the next key (or nothing)
    if (newkey != last && newkey != nil) then # if we changed key, update the hash (some redundant h1 entries with other defs)
      key = last = newkey # update current key
      defhash[key] = [] # init the new entry to empty array
    end
    if l[/#{defregex}/i] then
      defhash[key] << l[/#{defregex}/i,1] # we did match a def, add it to the current key array
    end
  end
end
pp defhash # print the result
Which gives this output:
{"A"=>
[" The first letter of the English and of many other alphabets. The capital A of the alphabets of Middle and Western Europe, as also the small letter (a), besides the forms in Italic, black letter, etc., are all descended from the old Latin A, which was borrowed from the Greek <spn>Alpha</spn>, of the same form; and this was made from the first letter (<i>Aleph</i>, and itself from the Egyptian origin. The <i>Aleph</i> was a consonant letter, with a guttural breath sound that was not an element of Greek articulation; and the Greeks took it to represent their vowel <i>Alpha</i> with the \\'84 sound, the Ph\\'d2nician alphabet having no vowel symbols.",
"The name of the sixth tone in the model major scale (that in C), or the first tone of the minor scale, which is named after it the scale in A minor. The second string of the violin is tuned to the A in the treble staff. -- A sharp (A#) is the name of a musical tone intermediate between A and B. -- A flat (A♭) is the name of a tone intermediate between A and G.",
"In each; to or for each; <as>as, \"twenty leagues <ex>a</ex> day\", \"a hundred pounds <ex>a</ex> year\", \"a dollar <ex>a</ex> yard\", etc.</as>",
"In; on; at; by.",
"In process of; in the act of; into; to; -- used with verbal substantives in <i>-ing</i> which begin with a consonant. This is a shortened form of the preposition <i>an</i> (which was used before the vowel sound); as in <i>a</i> hunting, <i>a</i> building, <i>a</i> begging. \"Jacob, when he was <i>a</i> dying\" <i>Heb. xi. 21</i>. \"We'll <i>a</i> birding together.\" \" It was <i>a</i> doing.\" <i>Shak.</i> \"He burst out <i>a</i> laughing.\" <i>Macaulay</i>. The hyphen may be used to connect <i>a</i> with the verbal substantive (as, <i>a</i>-hunting, <i>a</i>-building) or the words may be written separately. This form of expression is now for the most part obsolete, the <i>a</i> being omitted and the verbal substantive treated as a participle.",
"Of.",
" A barbarous corruption of <i>have</i>, of <i>he</i>, and sometimes of <i>it</i> and of <i>they</i>."],
"Abalone"=>
["A univalve mollusk of the genus <spn>Haliotis</spn>. The shell is lined with mother-of-pearl, and used for ornamental purposes; the sea-ear. Several large species are found on the coast of California, clinging closely to the rocks."],
"Aband"=>["To abandon.", "To banish; to expel."],
"Abandon"=>
["To cast or drive out; to banish; to expel; to reject.",
"To give up absolutely; to forsake entirely ; to renounce utterly; to relinquish all connection with or concern on; to desert, as a person to whom one owes allegiance or fidelity; to quit; to surrender.",
"Reflexively : To give (one's self) up without attempt at self-control ; to yield (one's self) unrestrainedly ; -- often in a bad sense.",
"To relinquish all claim to; -- used when an insured person gives up to underwriters all claim to the property covered by a policy, which may remain after loss or damage by a peril insured against."]}
Hope it can help.
Late edit: there's probably a better way; I'm not a Ruby expert. I was only giving general advice while reviewing, but since no one else has answered, this is how I would do it.
I'm simply trying to convert uppercased company names into proper names.
Company names can include:
Dashes
Apostrophes
Roman Numerals
Text like LLC, LP, INC which should stay uppercase.
I thought I might be able to use acronyms like this:
ACRONYMS = %W( LP III IV VI VII VIII IX GI)
ActiveSupport::Inflector.inflections(:en) do |inflect|
  ACRONYMS.each { |a| inflect.acronym(a) }
end
However, the conversion does not take into account word breaks, so having VI and VII does not work. For example, the conversion of "ADVISORS".titleize is "Ad VI Sors", as the VI becomes a whole word.
Dashes get removed.
It seems like there should be a generic gem for this generic problem, but I didn't find one. Is this problem really not that common? What's the best solution besides completely hacking the current inflection library?
Company names are a little odd, since a lot of times they're Marks (as in Service Mark) more than proper names. That means precise capitalization might actually matter, and trying to titleize might not be worth it.
In any case, here's a pattern that might work. Build your list of tokens to "keep", then manually split the string up and titleize the non-token parts.
# Make sure you put long strings before short (VII before VI)
word_tokens = %w{VII VI IX XI}
# Special characters need to be separate, since they never appear as "part" of another word
special_tokens = %w{-}
# Builds a regex like /(\bVII\b|\bVI\b|\bIX\b|\bXI\b|-)/ that wraps "word tokens" in a word boundary check
token_regex = /(#{word_tokens.map{|t| /\b#{t}\b/}.join("|")}|#{special_tokens.join("|")})/
title = "ADVISORS-XI"
title.split(token_regex).map{|s| s =~ token_regex ? s : s.titleize}.join
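The same split-and-keep approach can be sketched in Python. This is an illustration of the technique rather than a drop-in for Rails: titleize_company is a made-up helper name, and str.title() stands in for titleize, so its behaviour around apostrophes differs.

```python
import re

# Longer tokens first, so VII is tried before VI.
word_tokens = ["VII", "VI", "IX", "XI"]
# Special characters are matched without a word-boundary check.
special_tokens = ["-"]

pattern = "|".join(
    [rf"\b{re.escape(t)}\b" for t in word_tokens]
    + [re.escape(t) for t in special_tokens]
)
# The capturing group makes re.split keep the tokens in its result.
token_regex = re.compile(f"({pattern})")

def titleize_company(name):
    parts = token_regex.split(name)
    # Parts that are exactly a kept token stay as-is;
    # everything else is title-cased.
    return "".join(
        p if token_regex.fullmatch(p) else p.title()
        for p in parts
    )

print(titleize_company("ADVISORS-XI"))
```

Because of the word-boundary check, the VI inside ADVISORS is left alone while a standalone XI is preserved, and the dash survives because it is split out as its own token.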
I'm new to LaTeX and BibTeX, so excuse my ignorance. I have the following entry:
@Article{Hart,
author = {P.E. Hart, N.J. Nilsson, B. Raphael},
title = {Correction to \"A Formal Basis for the Heuristic Determination of Minimum Cost Paths\" },
journal = {SIGART Newsletter 37},
year = {1972},
pages = {28-29}
}
But this comes out as a capital letter A with diaeresis (Ä) and a ':', respectively. How do you get BibTeX to display quotes in a title?
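First, quotes should be written like ``this'' (with back-ticks and apostrophes). The \" in your entry is LaTeX's umlaut accent command, which is why \"A prints as Ä. Second, wrapping text in braces {like this} protects it from being altered by BibTeX. (You need to do this to keep capital letters in article titles, for instance.) For your entry, that gives: title = {Correction to ``{A Formal Basis for the Heuristic Determination of Minimum Cost Paths}''},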