Does anyone know of a Rails helper that can automatically prepend the appropriate article to a given string? For instance, if I pass "apple" to the function it would return "an apple", whereas if I pass "banana" it would return "a banana".
I already checked the Rails TextHelper module but could not find anything. Apologies if this is a duplicate, but it is admittedly a hard thing to search for...
None that I know of, but it seems simple enough to write a helper for this, right?
Off the top of my head
def indefinite_articlerize(params_word)
  %w(a e i o u).include?(params_word[0].downcase) ? "an #{params_word}" : "a #{params_word}"
end
hope that helps
edit 1: Also found this thread with a patch that might help you bulletproof this more https://rails.lighthouseapp.com/projects/8994/tickets/2566-add-aan-inflector-indefinitize
There is now a gem for this: indefinite_article.
Seems like checking that the first letter is a vowel would get you most of the way there, but there are edge cases:
Some people will say "an historic moment" but write "a historic moment".
But, it's "a history"!
Acronyms and abbreviations are problematic ("An NBC reporter" but "A NATO authority")
Words starting with a vowel but pronounced with an initial consonant ("a union")
Others?
(source)
I know the following answer goes too far for a simple, practical implementation, but it may help someone who wants accuracy and can afford a larger implementation.
The rule itself is pretty simple, but the problem is that it depends on pronunciation, not spelling:
If the initial sound is a vowel sound (not necessarily a vowel letter), then prepend 'an', otherwise prepend 'a'.
Referring to John's examples:
'an hour' because the 'h' here is silent, so the word begins with a vowel sound, whereas 'a historic' because the 'h' here is a consonant sound. 'an NBC' because the 'N' here is read as 'en', whereas 'a NATO' because the 'N' here is read as 'n'.
So the question is reduced to finding out: "when are certain letters pronounced as vowel sounds". In order to do that, you somehow need to access a dictionary that has phonological representations for each word, and check its initial phoneme.
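A full pronunciation dictionary is beyond a short helper, but a small hand-maintained exception list can stand in for one. The sketch below is only an illustration: the word lists and the method names indefinite_article and with_article are made up for this example, and a real list would need many more entries (acronyms are not handled at all).

```ruby
# Hypothetical exception lists standing in for a pronunciation dictionary.
# Words spelled with an initial consonant but pronounced with a vowel sound:
AN_EXCEPTIONS = %w(hour honest heir honor).freeze
# Words spelled with an initial vowel but pronounced with a consonant sound:
A_EXCEPTIONS = %w(union unicorn university user one european).freeze

# Returns "a" or "an" for the given phrase. The first word is checked against
# the exception lists, then the plain first-letter rule is used as a fallback.
def indefinite_article(phrase)
  first_word = phrase.to_s.downcase[/[a-z]+/] || ""
  return "an" if AN_EXCEPTIONS.include?(first_word)
  return "a"  if A_EXCEPTIONS.include?(first_word)
  %w(a e i o u).include?(first_word[0]) ? "an" : "a"
end

# Prepends the chosen article to the phrase.
def with_article(phrase)
  "#{indefinite_article(phrase)} #{phrase}"
end

puts with_article("apple")   # an apple
puts with_article("hour")    # an hour
puts with_article("union")   # a union
```

Matching on the whole first word rather than a prefix avoids treating "onerous" like "one", at the cost of needing every inflected form listed separately.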
Look into https://deveiate.org/code/linguistics/ - it provides handling of indefinite articles and much more. I've used it successfully on many projects.
I love the gem if you want a comprehensive solution. But if you just want tests to read more nicely, it is helpful to monkey patch String to follow the standard Rails inflector pattern:
class String
  def articleize
    %w(a e i o u).include?(self[0].downcase) ? "an #{self}" : "a #{self}"
  end
end
I'm trying to do this example :
sentence="{My name is {Adam} and I don't work here}"
Result should be 'Adam'
So what I'm trying to say is that however many nested braces exist, I want the result to be the value inside the innermost pair.
It's not clear from your question, but if there can only ever be one set of outer braces at any level (i.e. "{My name} {is {Adam}}" and "{My {name} is {Adam}}" are invalid input), you can take advantage of the fact that what you want is the last opening brace in the sentence.
def deepest(sentence):
    intermediate = sentence.rpartition("{")[-1]
    return intermediate[:intermediate.index("}")]
deepest("{My name is {Adam} and I don't work here}")
# 'Adam'
deepest("{Someone {set us {{up} the bomb}!}}")
# 'up'
The regex answer also makes this assumption, though regex is likely to be much slower. If multiple outer braces are possible, please make your question clearer.
You can't just index strings like that... The best way is to use a clever regex:
>>> import re
>>> re.search(r'{[^{}]*}', "{My name is {Adam} and I don't work here}").group()
'{Adam}'
This regex pattern matches the first {...} group that contains no { or } characters, i.e. an innermost group.
Using Rstudio.
Have a descriptive character feature that begins with values like "I love you", "I love him", "I love my dad", "I rather love...", "I hate..", "I don't care..", "I surely love....". Many "I * love" patterns, among others.
Now I'd like to create a new feature that is 1 if the raw feature begins with "I love*" and 0 otherwise.
In SAS, I can just write this:
if compress(old_feature) in: ("Ilove") then new_feature=1; else new_feature=0;
How to do that in Rstudio? I have searched here and the closest example is below
grep("^FA_.*Sc$",names(nc_df), value=TRUE). But this captures a lot I don't want. For example, "I definitely love".
Thanks.
Trying to work out how to parse out phone numbers that are left in a string.
e.g.
"Hi Han, this is Chewie, Could you give me a call on 02031234567"
"Hi Han, this is Chewie, Could you give me a call on +442031234567"
"Hi Han, this is Chewie, Could you give me a call on +44 (0) 203 123 4567"
"Hi Han, this is Chewie, Could you give me a call on 0207-123-4567"
"Hi Han, this is Chewie, Could you give me a call on 02031234567 OR +44207-1234567"
And be able to consistently replace any one of them with some other item (e.g. some text, or a link).
Am assuming it's a regex type approach (I'm already doing something similar with email which works well).
I've got to
text.scan(/([^A-Z|^"]{6,})/i)
Which leaves me a leading space I can't work out how to drop (would appreciate the help there).
Is there a standard way of doing this that people use?
It also drops things into arrays, which isn't particularly helpful
i.e. if there were multiple numbers.
[["02031234567"], ["+44207-1234567"]]
as opposed to
["02031234567","+44207-1234567"]
Adding in the third use-case with spaces is difficult. I think the only way to successfully meet that acceptance criteria would be to chain a #gsub call on to your #scan.
Thus:
text.gsub(/\s+/, "").scan(/([^A-Z|^"|^\s]{6,})/i)
The following code will extract all the numbers for you:
text.scan(/(?<=[ ])[\d \-+()]+$|(?<=[ ])[\d \-+()]+(?=[ ]\w)/)
For the examples you supplied this results in:
["02031234567"]
["+442031234567"]
["+44 (0) 203 123 4567"]
["0207-123-4567"]
["02031234567", "+44207-1234567"]
To understand this regex, what we are matching is:
[\d \-+()]+ which is a sequence of one or more digits, spaces, minus, plus, opening or closing brackets (in any order - NB regex is greedy by default, so it will match as many of these characters next to each other as possible)
that must be preceded by a space (?<=[ ]) - NB the space in the positive look-behind is not captured, and therefore this makes sure that there are no leading spaces in the results
and is either at the end of the string $, or | is followed by a space then a word character (?=[ ]\w) (NB this lookahead is not captured)
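Since the question also asks about replacing each number, the same pattern can be handed to gsub instead of scan. This is a minimal sketch: the pattern is copied from the answer above, while the replace_numbers method name and the "[number]" placeholder are made up for illustration.

```ruby
# Pattern copied from the answer above.
PHONE_PATTERN = /(?<=[ ])[\d \-+()]+$|(?<=[ ])[\d \-+()]+(?=[ ]\w)/

# Replaces every matched number with the given placeholder text.
# The default placeholder is an arbitrary choice for this example.
def replace_numbers(text, placeholder = "[number]")
  text.gsub(PHONE_PATTERN, placeholder)
end

text = "Hi Han, this is Chewie, Could you give me a call on 02031234567 OR +44207-1234567"
puts replace_numbers(text)
# Hi Han, this is Chewie, Could you give me a call on [number] OR [number]
```

gsub also accepts a block, so the replacement can be built from each matched number (for example to wrap it in a link) rather than being a fixed string.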
This pattern will get rid of the space but not match your third case with spaces:
/([^A-Z|^"|^\s]{6,})/i
This is what I came to in the end in case it helps somebody
numbers = text.scan(/([^A-Z|^"]{6,})/i).collect{|x| x[0].strip }
That gives me an array of
["+442031234567", "02031234567"]
I'm sure there is a more elegant way of doing this and possibly you'd want to check the numbers for likelihood of being phonelike - e.g. using the brilliant Phony gem.
numbers = text.scan(/([^A-Z|^"]{6,})/i).collect{|x| x[0].strip }
real_numbers = numbers.keep_if{|n| Phony.plausible? PhonyRails.normalize_number(n, default_country_code: "GB")}
Which should help exclude serial numbers or the like from being identified as numbers. You'll obviously want to change the country code to something relevant for you.
I am fairly new to Ruby and I am struggling with a regular expression to seed a database from this text file: http://www.gutenberg.org/cache/epub/673/pg673.txt.
I want the <h1> tags as the words for the dictionary database, and the <def> tags as the definitions.
I could be quite off base here (I've only ever seeded a db with copy and paste ;):
require 'open-uri'
Dictionary.delete_all
g_text = open('http://www.gutenberg.org/cache/epub/673/pg673.txt')
y = g_text.read(/<h1>(.*?)<\/h1>/)
a = g_text.read(/<def>(.*?)<\/def>/)
Dictionary.create!(:word => y, :definition => a)
As you can see, there is often more than one <def> for each <h1>, which is fine, as I can just add columns to my table for definition1, definition2, etc.
But what would this regular expression look like to be sure that each definition is in the same row as the immediately preceding <h1> tag?
Thanks for any help!
Edit:
Okay, so this is what I am trying now:
doc.scan(Regexp.union(/<h1>(.*?)<\/h1>/, /<def>(.*?)<\/def>/)).map do |m, n|
p [m,n]
end
How do I get rid of all of the nil entries?
It seems like regular expression is the only way of making it through the whole document without stopping part way through when an error is encountered...at least after a couple attempts at other parsers.
What I came to (with a local extract for sandbox use):
require 'pp' # For SO to pretty print the hash at end
h1regex="h1>(.+)<\/h1" # Define the h1 regex (avoid empty tags)
defregex="def>(.+)<\/def" # Define the def regex (avoid empty tags)
# Initialize vars
defhash={}
key=nil
last=nil
open("./gut.txt") do |f|
  f.each_line do |l|
    newkey=l[/#{h1regex}/i,1] # get the next key (or nil)
    if (newkey != last && newkey != nil) then # if we changed key, update the hash (some redundant h1 entries with other defs)
      key = last = newkey # update current key
      defhash[key] = [] # init the new entry to empty array
    end
    if key && l[/#{defregex}/i] then # skip any def that appears before the first h1
      defhash[key] << l[/#{defregex}/i,1] # we did match a def, add it to the current key array
    end
  end
end
pp defhash # print the result
Which gives this output:
{"A"=>
[" The first letter of the English and of many other alphabets. The capital A of the alphabets of Middle and Western Europe, as also the small letter (a), besides the forms in Italic, black letter, etc., are all descended from the old Latin A, which was borrowed from the Greek <spn>Alpha</spn>, of the same form; and this was made from the first letter (<i>Aleph</i>, and itself from the Egyptian origin. The <i>Aleph</i> was a consonant letter, with a guttural breath sound that was not an element of Greek articulation; and the Greeks took it to represent their vowel <i>Alpha</i> with the \\'84 sound, the Ph\\'d2nician alphabet having no vowel symbols.",
"The name of the sixth tone in the model major scale (that in C), or the first tone of the minor scale, which is named after it the scale in A minor. The second string of the violin is tuned to the A in the treble staff. -- A sharp (A#) is the name of a musical tone intermediate between A and B. -- A flat (A♭) is the name of a tone intermediate between A and G.",
"In each; to or for each; <as>as, \"twenty leagues <ex>a</ex> day\", \"a hundred pounds <ex>a</ex> year\", \"a dollar <ex>a</ex> yard\", etc.</as>",
"In; on; at; by.",
"In process of; in the act of; into; to; -- used with verbal substantives in <i>-ing</i> which begin with a consonant. This is a shortened form of the preposition <i>an</i> (which was used before the vowel sound); as in <i>a</i> hunting, <i>a</i> building, <i>a</i> begging. \"Jacob, when he was <i>a</i> dying\" <i>Heb. xi. 21</i>. \"We'll <i>a</i> birding together.\" \" It was <i>a</i> doing.\" <i>Shak.</i> \"He burst out <i>a</i> laughing.\" <i>Macaulay</i>. The hyphen may be used to connect <i>a</i> with the verbal substantive (as, <i>a</i>-hunting, <i>a</i>-building) or the words may be written separately. This form of expression is now for the most part obsolete, the <i>a</i> being omitted and the verbal substantive treated as a participle.",
"Of.",
" A barbarous corruption of <i>have</i>, of <i>he</i>, and sometimes of <i>it</i> and of <i>they</i>."],
"Abalone"=>
["A univalve mollusk of the genus <spn>Haliotis</spn>. The shell is lined with mother-of-pearl, and used for ornamental purposes; the sea-ear. Several large species are found on the coast of California, clinging closely to the rocks."],
"Aband"=>["To abandon.", "To banish; to expel."],
"Abandon"=>
["To cast or drive out; to banish; to expel; to reject.",
"To give up absolutely; to forsake entirely ; to renounce utterly; to relinquish all connection with or concern on; to desert, as a person to whom one owes allegiance or fidelity; to quit; to surrender.",
"Reflexively : To give (one's self) up without attempt at self-control ; to yield (one's self) unrestrainedly ; -- often in a bad sense.",
"To relinquish all claim to; -- used when an insured person gives up to underwriters all claim to the property covered by a policy, which may remain after loss or damage by a peril insured against."]}
Hope it can help.
Late edit: there's probably a better way; I'm not a Ruby expert. I was just giving the usual advice while reviewing, but since no one has answered, this is how I would do it.
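On the nil entries from the question's edit: with Regexp.union each match fills only one of the two capture groups, so every pair has one nil. Rather than discarding the nils, they can be used to tell a headword from a definition. A minimal sketch, using a small made-up inline sample instead of the real file:

```ruby
# Small inline sample standing in for the real file (made-up extract).
doc = <<~TEXT
  <h1>Aband</h1>
  <def>To abandon.</def>
  <def>To banish; to expel.</def>
  <h1>Abandon</h1>
  <def>To cast or drive out.</def>
TEXT

pattern = Regexp.union(/<h1>(.*?)<\/h1>/, /<def>(.*?)<\/def>/)

entries = {}
current = nil
# Each match yields [word, definition]; exactly one of the two is nil.
doc.scan(pattern) do |word, definition|
  if word
    current = word            # a new headword starts here
    entries[current] ||= []
  elsif current
    entries[current] << definition # attach to the preceding headword
  end
end

p entries
```

Each key of entries then maps to the definitions that followed its <h1>, which is the grouping needed before creating the Dictionary rows.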
I have an ActiveRecord model, Foo, which has a name field. I'd like users to be able to search by name, but I'd like the search to ignore case and any accents. Thus, I'm also storing a canonical_name field against which to search:
class Foo < ActiveRecord::Base
  validates_presence_of :name
  before_validation :set_canonical_name
  private
  def set_canonical_name
    self.canonical_name ||= canonicalize(self.name) if self.name
  end
  def canonicalize(x)
    x.downcase. # something here
  end
end
I need to fill in the "something here" to replace the accented characters. Is there anything better than
x.downcase.gsub(/[àáâãäå]/,'a').gsub(/æ/,'ae').gsub(/ç/, 'c').gsub(/[èéêë]/,'e')....
And, for that matter, since I'm not on Ruby 1.9, I can't put those Unicode literals in my code. The actual regular expressions will look much uglier.
ActiveSupport::Inflector.transliterate (requires Rails 2.2.1+ and Ruby 1.9 or 1.8.7)
example:
>> ActiveSupport::Inflector.transliterate("àáâãäå").to_s
=> "aaaaaa"
Rails already has a built-in for normalizing. Normalize your string to form KD and then remove the remaining non-ASCII characters (i.e. the accent marks) like this:
>> "àáâãäå".mb_chars.normalize(:kd).gsub(/[^\x00-\x7F]/n,'').downcase.to_s
=> "aaaaaa"
Better yet is to use I18n:
1.9.3-p392 :001 > require "i18n"
=> false
1.9.3-p392 :002 > I18n.transliterate("Olá Mundo!")
=> "Ola Mundo!"
I have tried a lot of these approaches, but each failed at least one of these requirements:
Respect spaces
Respect 'ñ' character
Respect case (I know it is not a requirement of the original question, but it is not difficult to downcase a string)
What worked for me was this:
# coding: utf-8
string.tr(
  "ÀÁÂÃÄÅàáâãäåĀāĂăĄąÇçĆćĈĉĊċČčÐðĎďĐđÈÉÊËèéêëĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħÌÍÎÏìíîïĨĩĪīĬĭĮįİıĴĵĶķĸĹĺĻļĽľĿŀŁłÑñŃńŅņŇňʼnŊŋÒÓÔÕÖØòóôõöøŌōŎŏŐőŔŕŖŗŘřŚśŜŝŞşŠšſŢţŤťŦŧÙÚÛÜùúûüŨũŪūŬŭŮůŰűŲųŴŵÝýÿŶŷŸŹźŻżŽž",
  "AAAAAAaaaaaaAaAaAaCcCcCcCcCcDdDdDdEEEEeeeeEeEeEeEeEeGgGgGgGgHhHhIIIIiiiiIiIiIiIiIiJjKkkLlLlLlLlLlNnNnNnNnnNnOOOOOOooooooOoOoOoRrRrRrSsSsSsSssTtTtTtUUUUuuuuUuUuUuUuUuUuWwYyyYyYZzZzZz"
)
– http://blog.slashpoundbang.com/post/12938588984/remove-all-accents-and-diacritics-from-string-in-ruby
You have to modify the character list a little to respect the 'ñ' character, but that is an easy job.
My answer: the String#parameterize method:
"Le cœur de la crémiére".parameterize
=> "le-coeur-de-la-cremiere"
For non-Rails programs:
Install activesupport: gem install activesupport then:
require 'active_support/inflector'
"a&]'s--3\014\xC2àáâã3D".parameterize
# => "a-s-3-3d"
Decompose the string and remove non-spacing marks from it.
irb -ractive_support/all
> "àáâãäå".mb_chars.normalize(:kd).gsub(/\p{Mn}/, '')
aaaaaa
You may also need this if used in a .rb file.
# coding: utf-8
The normalize(:kd) part here splits out diacritics where possible (e.g. the single character "n with tilde" is split into an n followed by a combining tilde character), and the gsub part then removes all the combining diacritical characters.
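The same decompose-then-strip idea can be sketched without ActiveSupport, assuming Ruby 2.2 or later, where String#unicode_normalize is part of core Ruby. The method name strip_diacritics is made up for this example.

```ruby
# Strips diacritics using only core Ruby (String#unicode_normalize, Ruby 2.2+).
# strip_diacritics is a hypothetical helper name, not a library method.
def strip_diacritics(text)
  # NFKD splits each accented letter into a base letter plus combining marks;
  # \p{Mn} matches those non-spacing marks so gsub can remove them.
  text.unicode_normalize(:nfkd).gsub(/\p{Mn}/, "")
end

puts strip_diacritics("àáâãäå")     # aaaaaa
puts strip_diacritics("Olá Mundo!") # Ola Mundo!
puts strip_diacritics("ñ")          # n
```

Letters that have no Unicode decomposition, such as æ or ø, pass through unchanged and would still need an explicit mapping.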
I think that maybe you don't really want to go down that path. If you are developing for a market that uses these kinds of letters, your users will probably think you are a sort of ...pip.
Because 'å' isn't even close to 'a' in any meaning to a user.
Take a different road and read up about searching in a non-ASCII way. This is just one of those cases for which someone invented Unicode and collation.
A very late PS:
http://www.w3.org/International/wiki/Case_folding
http://www.w3.org/TR/charmod-norm/#sec-WhyNormalization
Besides that, I have no idea why the link to collation goes to an MSDN page, but I'll leave it there. It should have been http://www.unicode.org/reports/tr10/
This assumes you use Rails.
"anything".parameterize.underscore.humanize.downcase
Given your requirements, this is probably what I'd do... I think it's neat, simple and will stay up to date in future versions of Rails and Ruby.
Update: dgilperez pointed out that parameterize takes a separator argument, so "anything".parameterize(" ") (deprecated) or "anything".parameterize(separator: " ") is shorter and cleaner.
Convert the text to normalization form D, remove all codepoints with unicode category non spacing mark (Mn), and convert it back to normalization form C. This will strip all diacritics, and your problem is reduced to a case insensitive search.
See http://www.siao2.com/2005/02/19/376617.aspx and http://www.siao2.com/2007/05/14/2629747.aspx for details.
The key is to use two columns in your database: canonical_text and original_text. Use original_text for display and canonical_text for searches. That way, if a user searches for "Visual Cafe," she sees the "Visual Café" result. If she really wants a different item called "Visual Cafe," it can be saved separately.
To get the canonical_text characters in a Ruby 1.8 source file, do something like this:
register_replacement([0x008A].pack('U'), 'S')
You probably want Unicode decomposition ("NFD"). After decomposing the string, just filter out anything not in [A-Za-z]. ã will decompose to "a~" (approximately - the diacritic becomes a separate combining character), so the filtering leaves a reasonable approximation. Note that æ has no Unicode decomposition, so the filter would drop it entirely unless you map it to "ae" yourself first.
iconv:
http://groups.google.com/group/ruby-talk-google/browse_frm/thread/8064dcac15d688ce?
=============
a perl module which i can't understand:
http://www.ahinea.com/en/tech/accented-translate.html
============
brute force (there's a lot of those critters!):
http://projects.jkraemer.net/acts_as_ferret/wiki#UTF-8support
http://snippets.dzone.com/posts/show/2384
I had problems getting the foo.mb_chars.normalize(:kd).gsub(/[^\x00-\x7F]/n,'').downcase.to_s solution to work. I'm not using Rails and there was some conflict with my activesupport/ruby versions that I couldn't get to the bottom of.
Using the ruby-unf gem seems to be a good substitute:
require 'unf'
foo.to_nfd.gsub(/[^\x00-\x7F]/n,'').downcase
As far as I can tell this does the same thing as .mb_chars.normalize(:kd). Is this correct? Thanks!
If you are using PostgreSQL >= 9.4 as your database, maybe you could enable its "unaccent" extension in a migration, which I think does what you want, like this:
def self.up
  enable_extension "unaccent" # Does not fail if it already exists
end
In order to test, in the console:
2.3.1 :045 > ActiveRecord::Base.connection.execute("SELECT unaccent('unaccent', 'àáâãäåÁÄ')").first
=> {"unaccent"=>"aaaaaaAA"}
Notice that the result is still case sensitive at this point.
Then, maybe use it in a scope, like:
scope :with_canonical_name, -> (name) {
  where("unaccent(foos.name) ILIKE unaccent(?)", name) # bind parameter instead of interpolation, to avoid SQL injection
}
The ILIKE operator makes the search case insensitive. There is another approach, using the citext data type. Here is a discussion about these two approaches. Notice also that use of PostgreSQL's lower() function is not recommended.
This will save you some DB space, since you will no longer need the canonical_name field, and perhaps make your model simpler, at the cost of some extra processing in each query, the amount depending on whether you are using ILIKE or citext, and on your dataset.
If you are using MySQL maybe you can use this simple solution, but I have not tested it.
lol.. I just tried this.. and it is working.. I am still not really sure why.. but when I use these 4 lines of code:
str = str.gsub(/[^a-zA-Z0-9 ]/,"")
str = str.gsub(/[ ]+/," ")
str = str.gsub(/ /,"-")
str = str.downcase
it automatically removes any accent from filenames, which is what I was trying to do (removing the accents from filenames and then renaming them). Hope it helps :)
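One possible explanation, offered as an assumption since the post does not say where the filenames came from: some filesystems (macOS HFS+, for example) store names in decomposed form, where an accented letter is a plain letter followed by a combining mark. In that case the first gsub deletes only the mark. With precomposed characters the whole letter is deleted instead. The sketch below wraps the four lines in a made-up slug method to show both cases.

```ruby
# The four lines from the post, wrapped in a hypothetical helper.
def slug(str)
  str = str.gsub(/[^a-zA-Z0-9 ]/, "") # drop everything except ASCII letters, digits, spaces
  str = str.gsub(/[ ]+/, " ")         # collapse runs of spaces
  str = str.gsub(/ /, "-")            # spaces become hyphens
  str.downcase
end

composed   = "Café Olé".unicode_normalize(:nfc) # é is a single code point
decomposed = "Café Olé".unicode_normalize(:nfd) # e followed by a combining accent

puts slug(composed)   # caf-ol   (the accented letters are lost)
puts slug(decomposed) # cafe-ole (only the combining marks are lost)
```

So the snippet only behaves like accent removal when the input is already decomposed; normalizing to NFD first makes the result consistent.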