Nokogiri XML to hash using attribute names - ruby-on-rails

I'm new to Rails and I'm looking to parse XML from PubMed's EUtils API into a hash with the attributes I want. Here is what I have so far:
def pubmed_search
  new
  if params[:search_terms].present?
    require 'nokogiri'
    require 'open-uri'
    @search_terms = params[:search_terms].split.join("+")
    uid_url = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=" + @search_terms
    uid_doc = Nokogiri::HTML(open(uid_url))
    @uid = uid_doc.xpath("//id").map { |uid| uid.text }.join(",")
    detail_url = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=" + @uid
    detail_doc = Nokogiri::HTML(open(detail_url))
    @details = detail_doc.xpath("//item[@name='Title']|//item[@name='FullJournalName']|//item[@name='Author']").map { |article| article.text }
    render :new
  else
    render :new
  end
end
This gives me the values I want (Authors, Title, Journal Name), but it comes out as one giant array without the attribute names, like so:
["Keshmiri-Neghab H", "Goliaei B", "Nikoofar A", "Gossypol enhances radiation induced autophagy in glioblastoma multiforme.", "General physiology and biophysics", "Alzahrani EO", "Asiri A", "El-Dessoky MM", "Kuang Y", "Quiescence as an explanation of Gompertzian tumor growth revisited.", "Mathematical biosciences", "Neofytou M", "Tanos V", "Constantinou I", "Kyriacou E", "Pattichis M", "Pattichis C", "Computer Aided Diagnosis in Hysteroscopic Imaging.", "IEEE journal of biomedical and health informatics", "Lou Q", "Ji L", "Zhong W", "Li S", "Yu S", "Li Z", "Meng X", "Synthesis and Cytotoxicity Evaluation of Naphthalimide Derived N-Mustards.", "Molecules (Basel, Switzerland)", "Sesang W", "Punyanitya S", "Pitchuanchom S", "Udomputtimekakul P", "Nuntasaen N", "Banjerdpongchai R", "Wudtiwai B", "Pompimon W", "Cytotoxic Aporphine Alkaloids from Leaves and Twigs of Pseuduvaria trimera (Craib).", "Molecules (Basel, Switzerland)", "Yang XK", "Xu MY", "Xu GS", "Zhang YL", "Xu ZX", "In Vitro and in Vivo Antitumor Activity of Scutebarbatine A on Human Lung Carcinoma A549 Cell Lines.", "Molecules (Basel, Switzerland)", "Yang CY", "Lu RH", "Lin CH", "Jen CH", "Tung CY", "Yang SH", "Lin JK", "Jiang JK", "Lin CH", "Single Nucleotide Polymorphisms Associated with Colorectal Cancer Susceptibility and Loss of Heterozygosity in a Taiwanese Population.", "PloS one", "Zhang H", "Gu L", "Liu T", "Chiang KY", "Zhou M", "Inhibition of MDM2 by Nilotinib Contributes to Cytotoxicity in Both Philadelphia-Positive and Negative Acute Lymphoblastic Leukemia.", "PloS one", "Oliveira A", "Pinho D", "Albino-Teixeira A", "Medeiros R", "Dinis-Oliveira RJ", "Carvalho F", "Morphine glucuronidation increases its analgesic effect in guinea-pigs.", "Life sciences", "Kabbout M", "Dakhlallah D", "Sharma S", "Bronisz A", "Srinivasan R", "Piper M", "Marsh CB", "Ostrowski MC", "MicroRNA 17-92 Cluster Mediates ETS1 and ETS2-Dependent RAS-Oncogenic Transformation.", "PloS one", "Kannen H", "Hazama H", "Kaneda Y", "Fujino T", "Awazu K", "Development of Laser Ionization Techniques for Evaluation of the Effect of Cancer Drugs Using Imaging Mass Spectrometry.", "International journal of molecular sciences", "Liang J", "Tong P", "Zhao W", "Li Y", "Zhang L", "Xia Y", "Yu Y", "The REST Gene Signature Predicts Drug Sensitivity in Neuroblastoma Cell Lines and Is Significantly Associated with Neuroblastoma Tumor Stage.", "International journal of molecular sciences", "Mathur A", "Ware C", "Davis L", "Gazdar A", "Pan BS", "Lutterbach B", "FGFR2 Is Amplified in the NCI-H716 Colorectal Cancer Cell Line and Is Required for Growth and Survival.", "PloS one", "van As JW", "van den Berg H", "van Dalen EC", "Different infusion durations for preventing platinum-induced hearing loss in children with cancer.", "The Cochrane database of systematic reviews", "Lynam-Lennon N", "Maher SG", "Maguire A", "Phelan J", "Muldoon C", "Reynolds JV", "O'Sullivan J", "Altered Mitochondrial Function and Energy Metabolism Is Associated with a Radioresistant Phenotype in Oesophageal Adenocarcinoma.", "PloS one", "Meriggi F", "Andreis F", "Premi V", "Liborio N", "Codignola C", "Mazzocchi M", "Rizzi A", "Prochilo T", "Rota L", "Di Biasi B", "Bertocchi P", "Abeni C", "Ogliosi C", "Aroldi F", "Zaniboni A", "Assessing cancer caregivers' needs for an early targeted psychosocial support project: The experience of the oncology department of the Poliambulanza Foundation.", "Palliative & supportive care", "Gwede CK", "Davis SN", "Wilson S", "Patel M", "Vadaparampil ST", "Meade CD", 
"Rivers BM", "Yu D", "Torres-Roca J", "Heysek R", "Spiess PE", "Pow-Sang J", "Jacobsen P", "Perceptions of Prostate Cancer Screening Controversy and Informed Decision Making: Implications for Development of a Targeted Decision Aid for Unaffected Male First-Degree Relatives.", "American journal of health promotion : AJHP", "Simerska P", "Suksamran T", "Ziora ZM", "Rivera FD", "Engwerda C", "Toth I", "Ovalbumin lipid core peptide vaccines and their CD4<sup>+</sup> and CD8<sup>+</sup> T cell responses.", "Vaccine", "Ogembo JG", "Manga S", "Nulah K", "Foglabenchi LH", "Perlman S", "Wamai RG", "Welty T", "Welty E", "Tih P", "Achieving high uptake of human papillomavirus vaccine in Cameroon: Lessons learned in overcoming challenges.", "Vaccine", "Chung CY", "Alden SL", "Funderburg NT", "Fu P", "Levine AD", "Progressive Proximal-to-Distal Reduction in Expression of the Tight Junction Complex in Colonic Epithelium of Virally-Suppressed HIV+ Individuals.", "PLoS pathogens"]
What I'm looking for instead would be:
@details = [{:Title => "title1", :Authors => ["author1", "author2", "author3"], :Journal => "journal1"}, {:Title => "title2", :Authors => ["author4", "author5", "author6"], :Journal => "journal2"}]
I've tried some .to_hash methods described in other answers, but they don't create a hash that deals with the XML attributes very well, as the names of the attributes I want are in the Name attribute of each Item element. Here is some sample XML from PubMed:
<eSummaryResult><DocSum><Id>11850928</Id><Item Name="PubDate" Type="Date">1965 Aug</Item><Item Name="EPubDate" Type="Date"/><Item Name="Source" Type="String">Arch Dermatol</Item><Item Name="AuthorList" Type="List"><Item Name="Author" Type="String">LoPresti PJ</Item><Item Name="Author" Type="String">Hambrick GW Jr</Item></Item><Item Name="LastAuthor" Type="String">Hambrick GW Jr</Item><Item Name="Title" Type="String">Zirconium granuloma following treatment of rhus dermatitis.</Item><Item Name="Volume" Type="String">92</Item><Item Name="Issue" Type="String">2</Item><Item Name="Pages" Type="String">188-91</Item><Item Name="LangList" Type="List"><Item Name="Lang" Type="String">English</Item></Item><Item Name="NlmUniqueID" Type="String">0372433</Item><Item Name="ISSN" Type="String">0003-987X</Item><Item Name="ESSN" Type="String">1538-3652</Item><Item Name="PubTypeList" Type="List"><Item Name="PubType" Type="String">Journal Article</Item></Item><Item Name="RecordStatus" Type="String">PubMed - indexed for MEDLINE</Item><Item Name="PubStatus" Type="String">ppublish</Item><Item Name="ArticleIds" Type="List"><Item Name="pubmed" Type="String">11850928</Item><Item Name="eid" Type="String">11850928</Item><Item Name="rid" Type="String">11850928</Item></Item><Item Name="History" Type="List"><Item Name="pubmed" Type="Date">1965/08/01 00:00</Item><Item Name="medline" Type="Date">2002/03/09 10:01</Item><Item Name="entrez" Type="Date">1965/08/01 00:00</Item></Item><Item Name="References" Type="List"/><Item Name="HasAbstract" Type="Integer">1</Item><Item Name="PmcRefCount" Type="Integer">0</Item><Item Name="FullJournalName" Type="String">Archives of dermatology</Item><Item Name="ELocationID" Type="String"/><Item Name="SO" Type="String">1965 Aug;92(2):188-91</Item></DocSum><DocSum><Id>11482001</Id><Item Name="PubDate" Type="Date">2001 Jun</Item><Item Name="EPubDate" Type="Date"/><Item Name="Source" Type="String">Adverse Drug React Toxicol Rev</Item><Item Name="AuthorList" Type="List"><Item Name="Author" Type="String">Mantle D</Item><Item Name="Author" Type="String">Gok MA</Item><Item Name="Author" Type="String">Lennard TW</Item></Item><Item Name="LastAuthor" Type="String">Lennard TW</Item><Item Name="Title" Type="String">Adverse and beneficial effects of plant extracts on skin and skin disorders.</Item><Item Name="Volume" Type="String">20</Item><Item Name="Issue" Type="String">2</Item><Item Name="Pages" Type="String">89-103</Item><Item Name="LangList" Type="List"><Item Name="Lang" Type="String">English</Item></Item><Item Name="NlmUniqueID" Type="String">9109474</Item><Item Name="ISSN" Type="String">0964-198X</Item><Item Name="ESSN" Type="String"/><Item Name="PubTypeList" Type="List"><Item Name="PubType" Type="String">Journal Article</Item><Item Name="PubType" Type="String">Review</Item></Item><Item Name="RecordStatus" Type="String">PubMed - indexed for MEDLINE</Item><Item Name="PubStatus" Type="String">ppublish</Item><Item Name="ArticleIds" Type="List"><Item Name="pubmed" Type="String">11482001</Item><Item Name="eid" Type="String">11482001</Item><Item Name="rid" Type="String">11482001</Item></Item><Item Name="History" Type="List"><Item Name="pubmed" Type="Date">2001/08/03 10:00</Item><Item Name="medline" Type="Date">2002/01/23 10:01</Item><Item Name="entrez" Type="Date">2001/08/03 10:00</Item></Item><Item Name="References" Type="List"/><Item Name="HasAbstract" Type="Integer">1</Item><Item Name="PmcRefCount" Type="Integer">3</Item><Item Name="FullJournalName" Type="String">Adverse drug reactions and 
toxicological reviews</Item><Item Name="ELocationID" Type="String"/><Item Name="SO" Type="String">2001 Jun;20(2):89-103</Item></DocSum></eSummaryResult>
Thanks for any help; I've been dying trying to find an answer.

There is no automatic way to do this: the structure of the XML does not match the structure of your required hash. You must pick out the desired nodes from the XML manually and construct the hash from their values. Using XPath is probably the easiest; the code might look something like this:
@details = []
detail_doc.xpath("/eSummaryResult/DocSum").each do |node|
  detail = {}
  detail[:title]   = node.xpath("Item[@Name='Title']").text
  detail[:journal] = node.xpath("Item[@Name='FullJournalName']").text
  detail[:authors] = node.xpath("Item[@Name='AuthorList']/Item[@Name='Author']").map { |n| n.text }
  @details.push(detail)
end
Note that this assumes the document is parsed with Nokogiri::XML. Nokogiri's HTML parser downcases element and attribute names, which is why the XPath in your question uses lowercase item and @name; with Nokogiri::HTML you would need lowercase names here too.
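With the sample XML above, @details would then look something like:
@details = [
  { :title   => "Zirconium granuloma following treatment of rhus dermatitis.",
    :journal => "Archives of dermatology",
    :authors => ["LoPresti PJ", "Hambrick GW Jr"] },
  { :title   => "Adverse and beneficial effects of plant extracts on skin and skin disorders.",
    :journal => "Adverse drug reactions and toxicological reviews",
    :authors => ["Mantle D", "Gok MA", "Lennard TW"] }
]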

Related

Extracting transaction data from variable length arrays

I have built a small program in Ruby that collects table data from my own PDF bank statements. It does this by scanning each PDF statement for tables and then filtering for transactional line-item patterns.
Everything is working great and I have managed to collect the line items as an array of string arrays. Getting an array of keyed objects would be better, but that's a bit tricky with the format of the statements.
The issue is that the line items have different lengths, so it's hard to always know the position of the correct values to map.
For example:
["Transaction 1", "1.00"]
["Transaction 2", "Hello World", "3.00"]
["Transaction 3", "Hello World", "feeffe", "5.00"]
["Transaction 4", "Hello World", "feeffe", "5.00", "12.00"]
["Transaction 5", "Hello World # 10.00", "feeffe", "10.00", "12.00"]
The line items normally range between 2 and 5 array items.
Is there an efficient/accurate way to map the above to:
{ description: "Transaction 1", amt: "1.00"}
{ description: "Transaction 2 - Hello World", amt: "3.00"}
{ description: "Transaction 3 - Hello World - feeffe", amt: "5.00"}
{ description: "Transaction 4 - Hello World - feeffe", amt: "5.00"}
{ description: "Transaction 5 - Hello World # 10.00 - feeffe", amt: "10.00"}
Or is the only way to write if conditions that look at the array length and make a "best guess" effort?
If you have:
row = ["Transaction 2", "Hello World", "3.00"]
you can do:
{ description: row[0..-2].join(' - '), amt: row[-1] }
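#=> { description: "Transaction 2 - Hello World", amt: "3.00" }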
You will have to adapt this to however these rows get iterated, so further logic will vary.
Update: per the condition specified later, a row can have length 5, in which case the actual amount is the second-to-last value.
data = (row.length == 5) ? [row[0..-3], row[-2]] : [row[0..-2], row[-1]]
{ description: data[0].join(' - '), amt: data[1] }
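Applied to every row (a sketch; rows stands for the full array of line items from the question):
rows.map do |row|
  # length-5 rows carry a trailing balance-like value, so the amount is second-to-last
  data = (row.length == 5) ? [row[0..-3], row[-2]] : [row[0..-2], row[-1]]
  { description: data[0].join(' - '), amt: data[1] }
end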
Assume your transaction is in a variable tr, i.e.
tr=["Transaction 5", "Hello World", "feeffe", "10.00", "12.00"]
I would first separate this into those strings which look like an amount, and those which don't:
amounts,texts= tr.partition {|el| /^\d+[.]\d{2}/ =~ el}
Here you can check that !amounts.empty?, to guard against transactions without an amount. Now your hash could be:
{
  transaction_name: texts.first,
  transaction_text: "#{texts[1]}#{amounts.size > 1 ? %( # #{amounts.first}) : ''}#{texts.size > 2 ? %( - #{texts.last}) : ''}",
  amt: amounts.last
}
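For the tr above, this evaluates to:
{ transaction_name: "Transaction 5", transaction_text: "Hello World # 10.00 - feeffe", amt: "12.00" }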
Try this regex:
"\K[^",\]]+
If the number of items always determines the index of the amount element, you can do something like:
input = [
  ["Transaction 1", "1.00"],
  ["Transaction 2", "Hello World", "3.00"],
  ["Transaction 3", "Hello World", "feeffe", "5.00"],
  ["Transaction 4", "Hello World", "feeffe", "5.00", "12.00"],
  ["Transaction 5", "Hello World # 10.00", "feeffe", "10.00", "12.00"]
]
ROW_LENGTH_TO_AMOUNT_INDEX = {
  2 => 1,
  3 => 2,
  4 => 3,
  5 => 3,
}
def map(transactions)
  transactions.map do |row|
    amount_index = ROW_LENGTH_TO_AMOUNT_INDEX[row.length]
    {
      description: row[0],
      amt: row[amount_index]
    }
  end
end
p map(input)
#=> [{:description=>"Transaction 1", :amt=>"1.00"}, {:description=>"Transaction 2", :amt=>"3.00"}, {:description=>"Transaction 3", :amt=>"5.00"}, {:description=>"Transaction 4", :amt=>"5.00"}, {:description=>"Transaction 5", :amt=>"10.00"}]
Or, perhaps something like this?
MAPPERS = {
  2 => lambda { |row| { description: row[0], amt: row[1] } },
  3 => lambda { |row| { description: row[0], amt: row[2] } },
  4 => lambda { |row| { description: row[0], amt: row[3] } },
  5 => lambda { |row| { description: row[0], amt: row[3] } }
}
def map(transactions)
  transactions.map do |row|
    MAPPERS[row.length].call(row)
  end
end
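Called the same way, p map(input) prints the same result as the first version.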
arr = [["Transaction 1", "1.00"],
       ["Transaction 2", "Hello World", "3.00"],
       ["Transaction 3", "Hello World", "feeffe", "5.00"]]
arr.map { |*first, last| { description: first.join(' - '), amt: last } }
#=> [{:description=>"Transaction 1", :amt=>"1.00"},
# {:description=>"Transaction 2 - Hello World", :amt=>"3.00"},
# {:description=>"Transaction 3 - Hello World - feeffe", :amt=>"5.00"}]
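Note this version assumes the amount is always last. A sketch extending the same idea to the length-5 rows above, where a trailing balance-like value follows the amount:
arr.map do |row|
  row = row[0..-2] if row.length == 5  # drop the trailing value so the amount is last
  *first, last = row
  { description: first.join(' - '), amt: last }
end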

How to parse values in multidimensional array and select one over the other based on condition?

I have an array whose values encode several fields separated by spaces, like:
{"roles"=>["1 editor 0", "1 editor 1", "2 editor 0", "2 editor 1", "14 editor 0", "15 editor 0"], "commit"=>"Give Access", "id"=>"3"}
Each array value represents [category_id, user.title, checked_boolean] and comes from this form:
<%= hidden_field_tag "roles[]", [c.id, "editor", 0] %>
<%= check_box_tag "roles[]", [c.id, "editor", 1 ], !!checked %>
which I process using splits:
params[:roles].each do |role|
  cat_id = role.split(" ")[0]
  title = role.split(" ")[1]
  checked_boolean = role.split(" ")[2]
end
Given the array at the top, you can see that Categories 1 and 2 are checked, while Categories 14 and 15 are not.
I would like to compare the values of the given array, and if both a 1 and a 0 exist for a given category_id, get rid of the value whose checked_boolean is 0. That way, if the boolean is 1, I can check whether the Role already exists and create it if not; and if it is 0, I can check whether the Role exists and delete it if it does.
How can I do this? I thought of something like params[:roles].uniq, but I don't know how to apply the uniqueness only to the first split value.
Or is there a better way of posting the "unchecks" in Rails? I've found solutions that process the uncheck action for simple checkboxes passing in true/false, but my case is different because it needs to pass in true/false in addition to the user title.
Let's say params[:roles] is:
["1 editor 0", "1 editor 1", "2 editor 0", "2 editor 1", "14 editor 0", "15 editor 0"]
The example of the conversion and filtering is below:
roles = params[:roles].map { |role| role.split(" ") }
filtered = roles.select do |role|
  next true if role[2].to_i == 1
  count = roles.reduce(0) { |count, r| r[0] == role[0] && count + 1 || count }
  count == 1
end
# => [["1", "editor", "1"], ["2", "editor", "1"], ["14", "editor", "0"], ["15", "editor", "0"]]
The select method returns a new, filtered array of roles, which you can see above. Of course, you can still use the source params[:roles], as well as the intermediate version of the array (after map has run).
Finally, you can convert the result array back to text form:
filtered.map {| role | role.join( ' ' ) }
=> ["1 editor 1", "2 editor 1", "14 editor 0", "15 editor 0"]
majioa's solution is certainly more terse and a better use of the language's features, but here is my take on it with a more language-agnostic approach. I have only just started learning Ruby, so I used this as an opportunity to learn, but it does solve your problem.
my_array = ["1 editor 0", "1 editor 0", "1 editor 1", "2 editor 0",
            "2 editor 1", "14 editor 0", "15 editor 0"]

puts "My array before:"
puts my_array.inspect

# As we're nesting a loop inside another each loop,
# we can't delete from the same array without confusing the
# iterator of the outer loop. Instead we'll delete at the end.
role_to_del = Array.new

my_array.each do |role|
  cat_id, checked_boolean = role.split(" ")[0], role.split(" ")[2]
  if checked_boolean == "1"
    # Search through the array and mark roles for deletion if
    # the category ids match and the found role's checked status
    # doesn't equal 1.
    my_array.each do |s_role|
      s_cat_id = s_role.split(" ")[0]
      if s_cat_id != cat_id
        next
      else
        s_checked_boolean = s_role.split(" ")[2]
        role_to_del.push s_role if s_checked_boolean != "1"
      end
    end
  end
end

# Delete all redundant roles
role_to_del.each { |role| my_array.delete role }

puts "My array after:"
puts my_array.inspect
Output:
My array before:
["1 editor 0", "1 editor 0", "1 editor 1", "2 editor 0", "2 editor 1", "14 editor 0",
"15 editor 0"]
My array after:
["1 editor 1", "2 editor 1", "14 editor 0", "15 editor 0"]

Word Frequency count in a very inefficient way

This is my code for calculating word frequency:
word_arr= ["I", "received", "this", "in", "email", "and", "found", "it", "a", "good", "read", "to", "share......", "Yes,", "Dr", "M.", "Bakri", "Musa", "seems", "to", "know", "what", "is", "happening", "in", "Malaysia.", "Some", "of", "you", "may", "know.", "He", "is", "a", "Malay", "extra horny", "horny nor", "nor their", "their babes", "babes are", "are extra", "extra SEXY..", "SEXY.. .", ". .", ". .It's", ".It's because", "because their", "their CONDOMS", "CONDOMS are", "are Made", "Made In", "In China........;)", "China........;) &&"]
arr_stop_kwd=["a","and"]
frequencies = Hash.new(0)
word_arr.each do |word|
  if !arr_stop_kwd.include?(word.downcase) && !word.match('&&')
    frequencies[word.downcase] += 1
  end
end
When I have 100k words it takes 9.03 seconds, which is too much time. Can I calculate this another way?
Thanks in advance.
Take a look at the Facets gem.
You can do something like this using its frequency method:
require 'facets'
frequencies = (word_arr-arr_stop_kwd).frequency
Note that the stop words can be subtracted from word_arr. Refer to the Array documentation.
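For reference, a plain-Ruby sketch of the same idea without the gem (note that, as in the Facets answer, the subtraction only removes exact matches, not case-insensitive ones):
frequencies = (word_arr - arr_stop_kwd).each_with_object(Hash.new(0)) do |word, counts|
  counts[word.downcase] += 1 unless word.match('&&')
end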

How can I order a list of letters ("a","b","c",...,"z","aa","ab")? String#succ and <=> don't seem to play well together in this case

One of my objects ('item') has an ID ('letter_id') in the format "a", "b", ..., "aa", "ab", etc. To generate it I am using Ruby's String#succ in an instance method like this:
def set_letter_id
  last = parent.items.all(:order => "letter_id ASC").last
  if last.nil?
    self.letter_id = 'a'
  else
    self.letter_id = last.letter_id.succ
  end
end
Now this works great until the 28th item: the 27th will properly generate "aa", but after that the value of last will always be the item with letter_id "z", because the database ordering doesn't follow the same rules as String#succ.
I found this out from a comment over here - but now I'm struggling to find a nice solution around this issue. The problem is basically this:
"aa".succ #=> "ab" - great, that's what I want.
"z"<=>"aa" #=> 1 - not so great, "z" should actually be less than "aa"
Obviously this isn't necessarily a bug, but it makes sorting and ordering a list of letter_ids in this format quite difficult. Has anyone encountered this and found a workaround, or any suggestions that I might try? Thanks!
There was a solution in the answers at the link you posted: you have to write your own <=> so that it orders the same way as sort_by { |i| [i.length, i] }
irb> %w{a b c z aa ab zz aaa}.shuffle.sort_by { |i| [i.length,i] }
=> ["a", "b", "c", "z", "aa", "ab", "zz", "aaa"]
You can override the <=> method for your Item model to compare first by ID length, then by alphanumeric.
Something like this:
class Item < ActiveRecord::Base
  # stuff

  def <=>(other)
    len_comp = self.letter_id.length <=> other.letter_id.length
    return len_comp if len_comp != 0
    self.letter_id <=> other.letter_id
  end
end
That way you compare first by ID length (so "z" comes before "aa"), then lexicographically.
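A quick check of the comparator logic (a sketch; assumes letter_id is an accessible attribute and the records need not be persisted):
items = %w[z aa b ab a].map { |id| Item.new(:letter_id => id) }
items.sort.map(&:letter_id)
#=> ["a", "b", "z", "aa", "ab"]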
This sort of issue is exactly why some people discourage the use of String#succ. It clashes with Range, Object#to_a, and others.
Anyway, you probably know this, but things like this might help...
>> t
=> ["x", "y", "z", "aa", "ab", "ac", "ad", "ae", "af", "ag"]
>> t.shuffle.sort_by { |e| "%3s" % [e] }
=> ["x", "y", "z", "aa", "ab", "ac", "ad", "ae", "af", "ag"]
You could even normalize the stored IDs this way and dispense with sort_by.

Rails plugin for US states and cities

We need a Rails plugin for US states and cities. Please see if we can get that.
Maybe this would help: http://github.com/bcardarella/decoder
Interestingly enough, the National Weather Service produces such a data source:
http://www.weather.gov/geodata/catalog/national/html/cities.htm
CityState gem: https://github.com/loureirorg/city-state
CS.states(:us)
# => {:AK=>"Alaska", :AL=>"Alabama", :AR=>"Arkansas", :AZ=>"Arizona", :CA=>"California", :CO=>"Colorado", :CT=>"Connecticut", :DC=>"District of Columbia", :DE=>"Delaware", :FL=>"Florida", :GA=>"Georgia", :HI=>"Hawaii", :IA=>"Iowa", :ID=>"Idaho", :IL=>"Illinois", :IN=>"Indiana", :KS=>"Kansas", :KY=>"Kentucky", :LA=>"Louisiana", :MA=>"Massachusetts", :MD=>"Maryland", :ME=>"Maine", :MI=>"Michigan", :MN=>"Minnesota", :MO=>"Missouri", :MS=>"Mississippi", :MT=>"Montana", :NC=>"North Carolina", :ND=>"North Dakota", :NE=>"Nebraska", :NH=>"New Hampshire", :NJ=>"New Jersey", :NM=>"New Mexico", :NV=>"Nevada", :NY=>"New York", :OH=>"Ohio", :OK=>"Oklahoma", :OR=>"Oregon", :PA=>"Pennsylvania", :RI=>"Rhode Island", :SC=>"South Carolina", :SD=>"South Dakota", :TN=>"Tennessee", :TX=>"Texas", :UT=>"Utah", :VA=>"Virginia", :VT=>"Vermont", :WA=>"Washington", :WI=>"Wisconsin", :WV=>"West Virginia", :WY=>"Wyoming"}
CS.cities(:ak, :us)
# => ["Adak", "Akhiok", "Akiachak", "Akiak", "Akutan", "Alakanuk", "Ambler", "Anchor Point", "Anchorage", "Angoon", "Atqasuk", "Barrow", "Bell Island Hot Springs", "Bethel", "Big Lake", "Buckland", "Chefornak", "Chevak", "Chicken", "Chugiak", "Coffman Cove", "Cooper Landing", "Copper Center", "Cordova", "Craig", "Deltana", "Dillingham", "Douglas", "Dutch Harbor", "Eagle River", "Eielson Air Force Base", "Fairbanks", "Fairbanks North Star Borough", "Fort Greely", "Fort Richardson", "Galena", "Girdwood", "Goodnews Bay", "Haines", "Homer", "Hooper Bay", "Juneau", "Kake", "Kaktovik", "Kalskag", "Kenai", "Ketchikan", "Kiana", "King Cove", "King Salmon", "Kipnuk", "Klawock", "Kodiak", "Kongiganak", "Kotlik", "Koyuk", "Kwethluk", "Levelock", "Manokotak", "May Creek", "Mekoryuk", "Metlakatla", "Mountain Village", "Nabesna", "Naknek", "Nazan Village", "Nenana", "New Stuyahok", "Nikiski", "Ninilchik", "Noatak", "Nome", "Nondalton", "Noorvik", "North Pole", "Northway", "Old Kotzebue", "Palmer", "Pedro Bay", "Petersburg", "Pilot Station", "Point Hope", "Point Lay", "Prudhoe Bay", "Russian Mission", "Sand Point", "Scammon Bay", "Selawik", "Seward", "Shungnak", "Sitka", "Skaguay", "Soldotna", "Stebbins", "Sterling", "Sutton", "Talkeetna", "Teller", "Thorne Bay", "Togiak", "Tok", "Toksook Bay", "Tuntutuliak", "Two Rivers", "Unalakleet", "Unalaska", "Valdez", "Wainwright", "Wasilla"]
It works with all countries in the world. It also uses the MaxMind database, so it's continuously updated (with the command CS.update).
I just took the data from the NWS and created a Rails plugin called geoinfo, hosted on GitHub. At this point it's still a quick hack, but it contains all the NWS data in the lib/db folder if you don't want to use it as a plugin. Hope this helps.
