Why isn't Nokogiri grabbing data as an array - ruby-on-rails

I am using Nokogiri to grab data from a webpage. I was under the impression that the following would grab the data and return it as an array, but instead I am getting one big string, which is causing a few issues.
home_team = doc.css(".team-home.teams")
If I were to use
home_team = doc.css(".team-home.teams").text
I could understand the data being returned as a string. Am I looking at this the wrong way?
I have even tried
home_team = doc.css(".team-home.teams").map(&:text)
but that seems to return a string as well. If an array were being returned, the console would show it in array format, yes?
If someone could try this in their console
require 'open-uri'
require 'nokogiri'
FIXTURE_URL = "http://www.bbc.co.uk/sport/football/premier-league/fixtures"
doc = Nokogiri::HTML(open(FIXTURE_URL))
home_team = doc.css(".team-home.teams").map(&:text)
#home_team = doc.css(".team-home.teams")
puts home_team
and confirm whether the output is a string in both cases, and what the difference between the two is? I'm slightly lost at the moment.
Thanks

You are getting an array. It's just that puts calls to_s on each element and prints them one per line, so the output doesn't look like an array. Check this out:
require 'open-uri'
require 'nokogiri'
FIXTURE_URL = "http://www.bbc.co.uk/sport/football/premier-league/fixtures"
doc = Nokogiri::HTML(open(FIXTURE_URL))
home_team = doc.css(".team-home.teams").map(&:text)
# home_team = doc.css(".team-home.teams")
puts home_team.class
puts home_team.map(&:strip).inspect
#=> Array
#=> ["Everton", "Aston Villa", "Southampton", "Stoke", "Swansea", "Man Utd", "Sunderland", "Tottenham", "Chelsea", "Wigan", "Sunderland", "Arsenal", "Man City", "Swansea", "West Ham", "Wigan", "Everton", "Aston Villa", "Southampton", "Fulham", "Reading", "Chelsea", "Newcastle", "Norwich", "Stoke", "West Brom", "Liverpool", "Tottenham", "QPR", "Man Utd", "Newcastle", "Arsenal", "Aston Villa", "Everton", "Reading", "Southampton", "Stoke", "Chelsea", "Arsenal", "Fulham", "Norwich", "QPR", "Sunderland", "Swansea", "West Brom", "West Ham", "Tottenham", "Liverpool", "Man Utd", "Man City", "Aston Villa", "Chelsea", "Everton", "Southampton", "Stoke", "Wigan", "Newcastle", "Reading", "Arsenal", "Fulham", "Liverpool", "Man Utd", "Norwich", "QPR", "Sunderland", "Swansea", "Tottenham", "West Brom", "West Ham", "Arsenal", "Aston Villa", "Everton", "Fulham", "Man Utd", "Norwich", "QPR", "Reading", "Stoke", "Sunderland", "Chelsea", "Liverpool", "Man City", "Newcastle", "Southampton", "Swansea", "Tottenham", "West Brom", "West Ham", "Wigan"]
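To see the same difference without hitting the network, here is a minimal sketch that uses a hand-made array in place of the scraped data. The sample values and their whitespace are assumptions, not the real page content. puts writes each element on its own line, while p shows the inspect form that actually looks like an array:

```ruby
# Hypothetical stand-in for doc.css(".team-home.teams").map(&:text):
# raw node text usually carries surrounding whitespace.
home_team = ["\n      Everton\n    ", "\n      Aston Villa\n    "]

cleaned = home_team.map(&:strip)

puts cleaned.class   # the class name
puts cleaned         # one element per line, no brackets or quotes
p cleaned            # inspect form, with brackets and quotes
```

In irb the return value is echoed in inspect form anyway, so the array shape is visible there without calling p.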

There's a lot of white space in the data. I get an array when I do this:
home_team = doc.css(".team-home.teams").map {|team| team.text.strip}

Related

Rails: Sort Array based on another array

I want to sort an array in Ruby on Rails based on another array, while still keeping the first array's values in the result:
all_countries = ["Afghanistan", "Aland Islands", "Albania", "Algeria", "American Samoa", "Andorra", "Angola", "Anguilla", "Antarctica", "Argentina", "Armenia", "Armenien", "Australia", ....,]
gcc = ["UAE", "Saudi Arabia", "Qatar", "Bahrain", "Kuwait", "Oman"]
Desired OUTPUT:
I want to sort all countries, but with the GCC countries appearing first (unsorted, in any order) and the remaining countries sorted A-Z.
["UAE", "Saudi Arabia", "Qatar", "Bahrain", "Kuwait", "Oman", "Afghanistan", "Aland Islands", "Albania", "Algeria", "American Samoa" .....]
I can do it the following way, but it doesn't seem like very good code, and it breaks alphabetical sorting for the non-GCC countries.
all_countries.sort_by { |x| gcc.index(x) || gcc.size }
Is there a better way to do it?
gcc + (all_countries - gcc).sort
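One thing to keep in mind with gcc + (all_countries - gcc).sort is that it always emits every GCC entry, even one that is missing from all_countries. Below is a minimal sketch of a variant that keeps only the GCC countries actually present. The shortened sample list is an assumption for illustration:

```ruby
all_countries = ["Oman", "Albania", "UAE", "Algeria", "Qatar", "Afghanistan"]
gcc = ["UAE", "Saudi Arabia", "Qatar", "Bahrain", "Kuwait", "Oman"]

# GCC countries that appear in all_countries, in gcc's order,
# followed by everything else sorted A-Z.
result = (gcc & all_countries) + (all_countries - gcc).sort

# Same ordering with a single sort_by: the country name acts as a
# tie-breaker for every non-GCC entry, which keeps them alphabetical.
same = all_countries.sort_by { |c| [gcc.index(c) || gcc.size, c] }

p result
p same
```

The sort_by form also shows why the version in the question loses alphabetical order: without the name as a second key, all non-GCC countries compare as equal and sort_by makes no promise about their relative order.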

Need ISO 639-1 (2 letter) Languages code and List and translation in all languages

I need a list of two-letter language codes and names, with the names translated into all languages.
The best I can give is a dictionary where the keys are the codes and the values are the associated language names in English:
{'ab': 'Abkhaz', 'aa': 'Afar', 'af': 'Afrikaans', 'ak': 'Akan', 'sq': 'Albanian', 'am': 'Amharic', 'ar': 'Arabic', 'an': 'Aragonese', 'hy': 'Armenian', 'as': 'Assamese', 'av': 'Avaric', 'ae': 'Avestan', 'ay': 'Aymara', 'az': 'Azerbaijani', 'bm': 'Bambara', 'ba': 'Bashkir', 'eu': 'Basque', 'be': 'Belarusian', 'bn': 'Bengali', 'bh': 'Bihari', 'bi': 'Bislama', 'bs': 'Bosnian', 'br': 'Breton', 'bg': 'Bulgarian', 'my': 'Burmese', 'ca': 'Catalan; Valencian', 'ch': 'Chamorro', 'ce': 'Chechen', 'ny': 'Chichewa; Chewa; Nyanja', 'zh': 'Chinese', 'cv': 'Chuvash', 'kw': 'Cornish', 'co': 'Corsican', 'cr': 'Cree', 'hr': 'Croatian', 'cs': 'Czech', 'da': 'Danish', 'dv': 'Divehi; Maldivian;', 'nl': 'Dutch', 'dz': 'Dzongkha', 'en': 'English', 'eo': 'Esperanto', 'et': 'Estonian', 'ee': 'Ewe', 'fo': 'Faroese', 'fj': 'Fijian', 'fi': 'Finnish', 'fr': 'French', 'ff': 'Fula', 'gl': 'Galician', 'ka': 'Georgian', 'de': 'German', 'el': 'Greek, Modern', 'gn': 'Guaraní', 'gu': 'Gujarati', 'ht': 'Haitian', 'ha': 'Hausa', 'he': 'Hebrew (modern)', 'hz': 'Herero', 'hi': 'Hindi', 'ho': 'Hiri Motu', 'hu': 'Hungarian', 'ia': 'Interlingua', 'id': 'Indonesian', 'ie': 'Interlingue', 'ga': 'Irish', 'ig': 'Igbo', 'ik': 'Inupiaq', 'io': 'Ido', 'is': 'Icelandic', 'it': 'Italian', 'iu': 'Inuktitut', 'ja': 'Japanese', 'jv': 'Javanese', 'kl': 'Kalaallisut', 'kn': 'Kannada', 'kr': 'Kanuri', 'ks': 'Kashmiri', 'kk': 'Kazakh', 'km': 'Khmer', 'ki': 'Kikuyu, Gikuyu', 'rw': 'Kinyarwanda', 'ky': 'Kirghiz, Kyrgyz', 'kv': 'Komi', 'kg': 'Kongo', 'ko': 'Korean', 'ku': 'Kurdish', 'kj': 'Kwanyama, Kuanyama', 'la': 'Latin', 'lb': 'Luxembourgish', 'lg': 'Luganda', 'li': 'Limburgish', 'ln': 'Lingala', 'lo': 'Lao', 'lt': 'Lithuanian', 'lu': 'Luba-Katanga', 'lv': 'Latvian', 'gv': 'Manx', 'mk': 'Macedonian', 'mg': 'Malagasy', 'ms': 'Malay', 'ml': 'Malayalam', 'mt': 'Maltese', 'mi': 'Māori', 'mr': 'Marathi (Marāṭhī)', 'mh': 'Marshallese', 'mn': 'Mongolian', 'na': 'Nauru', 'nv': 'Navajo, Navaho', 'nb': 'Norwegian Bokmål', 'nd': 
'North Ndebele', 'ne': 'Nepali', 'ng': 'Ndonga', 'nn': 'Norwegian Nynorsk', 'no': 'Norwegian', 'ii': 'Nuosu', 'nr': 'South Ndebele', 'oc': 'Occitan', 'oj': 'Ojibwe, Ojibwa', 'cu': 'Old Church Slavonic', 'om': 'Oromo', 'or': 'Oriya', 'os': 'Ossetian, Ossetic', 'pa': 'Panjabi, Punjabi', 'pi': 'Pāli', 'fa': 'Persian', 'pl': 'Polish', 'ps': 'Pashto, Pushto', 'pt': 'Portuguese', 'qu': 'Quechua', 'rm': 'Romansh', 'rn': 'Kirundi', 'ro': 'Romanian, Moldavan', 'ru': 'Russian', 'sa': 'Sanskrit (Saṁskṛta)', 'sc': 'Sardinian', 'sd': 'Sindhi', 'se': 'Northern Sami', 'sm': 'Samoan', 'sg': 'Sango', 'sr': 'Serbian', 'gd': 'Scottish Gaelic', 'sn': 'Shona', 'si': 'Sinhala, Sinhalese', 'sk': 'Slovak', 'sl': 'Slovene', 'so': 'Somali', 'st': 'Southern Sotho', 'es': 'Spanish; Castilian', 'su': 'Sundanese', 'sw': 'Swahili', 'ss': 'Swati', 'sv': 'Swedish', 'ta': 'Tamil', 'te': 'Telugu', 'tg': 'Tajik', 'th': 'Thai', 'ti': 'Tigrinya', 'bo': 'Tibetan', 'tk': 'Turkmen', 'tl': 'Tagalog', 'tn': 'Tswana', 'to': 'Tonga', 'tr': 'Turkish', 'ts': 'Tsonga', 'tt': 'Tatar', 'tw': 'Twi', 'ty': 'Tahitian', 'ug': 'Uighur, Uyghur', 'uk': 'Ukrainian', 'ur': 'Urdu', 'uz': 'Uzbek', 've': 'Venda', 'vi': 'Vietnamese', 'vo': 'Volapük', 'wa': 'Walloon', 'cy': 'Welsh', 'wo': 'Wolof', 'fy': 'Western Frisian', 'xh': 'Xhosa', 'yi': 'Yiddish', 'yo': 'Yoruba', 'za': 'Zhuang, Chuang', 'zu': 'Zulu'}

Get & store digits from string

I have a string of values like this:
=> "[\"3\", \"4\", \"60\", \"71\", \"49\", \"62\", \"9\", \"14\", \"17\", \"63\"]"
I want to put each value in an array so I can use each do. So something like this:
@numbers => ["72", "58", "49", "62", "9", "13", "17", "63"]
This is the code I want to use once the string is a usable array:
@numbers.each do |n|
  @answers << Answer.find(n)
end
I have tried using split() but the characters are not balanced on each side of the number. I also was trying to use a regex split(/\D/) but I think I am just getting worse ideas.
The controller:
@scores = []
@each_answer = []
@score.answer_ids.split('/').each do |a|
  @each_answer << Answer.find(a).id
end
Where @score.answer_ids is:
=> "[\"3\", \"4\", \"60\", \"71\", \"49\", \"62\", \"9\", \"14\", \"17\", \"63\"]"
Looks like a JSON array of strings. You could probably use Ruby's built-in JSON library to parse it, then map the elements of the array to integers:
require 'json'
input = "[\"3\", \"4\", \"60\", \"71\", \"49\", \"62\", \"9\", \"14\", \"17\", \"63\"]"
ids = JSON.parse(input).map(&:to_i)
@answers += Answer.find(ids)
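A self-contained sketch of the JSON approach, leaving out the database lookup. parse_ids is a hypothetical helper name, and returning an empty array on malformed input is an assumption about what the caller wants, not something the question specifies:

```ruby
require 'json'

# Hypothetical helper: turn a JSON array of numeric strings into Integers.
# Integer(..., 10) is stricter than to_i: it raises on non-numeric input
# instead of silently returning 0.
def parse_ids(raw)
  Array(JSON.parse(raw)).map { |id| Integer(id.to_s, 10) }
rescue JSON::ParserError, ArgumentError, TypeError
  []
end

input = "[\"3\", \"4\", \"60\", \"71\", \"49\", \"62\", \"9\", \"14\", \"17\", \"63\"]"
ids = parse_ids(input)
p ids
```

Unlike eval, JSON.parse only ever builds data structures, so this stays safe even if the string comes from outside your system.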
I'd use:
foo = "[\"3\", \"4\", \"60\", \"71\", \"49\", \"62\", \"9\", \"14\", \"17\", \"63\"]"
foo.scan(/\d+/) # => ["3", "4", "60", "71", "49", "62", "9", "14", "17", "63"]
If you want integers instead of strings:
foo.scan(/\d+/).map(&:to_i) # => [3, 4, 60, 71, 49, 62, 9, 14, 17, 63]
If the data originates inside your system, and isn't the result of user input from the wilds of the Internet, then you can do something simple like:
bar = eval(foo) # => ["3", "4", "60", "71", "49", "62", "9", "14", "17", "63"]
which will execute the contents of the string as if it was Ruby code. You do NOT want to do that if the input came from user input that you haven't scrubbed.
In your code n is a String, not an Integer. The #find method expects an Integer, so you need to convert the String to an Array of Integers before iterating over it. For example:
str = "[\"3\", \"4\", \"60\", \"71\", \"49\", \"62\", \"9\", \"14\", \"17\", \"63\"]"
str.scan(/\d+/).map(&:to_i).each do |n|
  @answers << Answer.find(n)
end

Word Frequency count in a very inefficient way

This is my code for calculating word frequency:
word_arr= ["I", "received", "this", "in", "email", "and", "found", "it", "a", "good", "read", "to", "share......", "Yes,", "Dr", "M.", "Bakri", "Musa", "seems", "to", "know", "what", "is", "happening", "in", "Malaysia.", "Some", "of", "you", "may", "know.", "He", "is", "a", "Malay", "extra horny", "horny nor", "nor their", "their babes", "babes are", "are extra", "extra SEXY..", "SEXY.. .", ". .", ". .It's", ".It's because", "because their", "their CONDOMS", "CONDOMS are", "are Made", "Made In", "In China........;)", "China........;) &&"]
arr_stop_kwd=["a","and"]
frequencies = Hash.new(0)
word_arr.each { |word|
if !arr_stop_kwd.include?(word.downcase) && !word.match('&&')
frequencies["#{word.downcase}"] += 1
end
}
With 100k words it takes 9.03 seconds, which is too much time. Is there another way to calculate this?
Thanks in advance.
Take a look at the Facets gem.
You can do something like this using its frequency method:
require 'facets'
frequencies = (word_arr-arr_stop_kwd).frequency
Note that the stop words can be subtracted from word_arr. Refer to the Array documentation.
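Array subtraction compares elements exactly, so unlike the loop in the question it neither downcases words nor skips entries containing "&&". A minimal standard-library sketch that keeps the original behaviour and stores the stop words in a Set follows. The shortened word list is an assumption, and whether this speeds anything up depends on where the time actually goes, which the question does not show:

```ruby
require 'set'

word_arr = ["I", "a", "A", "and", "good", "Good", "read", "China........;) &&"]
stop_words = Set.new(["a", "and"])   # hash-based membership checks

frequencies = Hash.new(0)
word_arr.each do |word|
  key = word.downcase                # downcase once per word
  next if stop_words.include?(key) || key.include?('&&')
  frequencies[key] += 1
end

p frequencies
```

On Ruby 2.7 or later, the counting step on an already filtered array can also be written with Enumerable#tally.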

Rails plugin for US states and cities

We need a Rails plugin for US states and cities. Please see if we can get that.
Maybe this would help: http://github.com/bcardarella/decoder
Interestingly enough, the National Weather Service produces such a data source:
http://www.weather.gov/geodata/catalog/national/html/cities.htm
CityState gem: https://github.com/loureirorg/city-state
CS.states(:us)
# => {:AK=>"Alaska", :AL=>"Alabama", :AR=>"Arkansas", :AZ=>"Arizona", :CA=>"California", :CO=>"Colorado", :CT=>"Connecticut", :DC=>"District of Columbia", :DE=>"Delaware", :FL=>"Florida", :GA=>"Georgia", :HI=>"Hawaii", :IA=>"Iowa", :ID=>"Idaho", :IL=>"Illinois", :IN=>"Indiana", :KS=>"Kansas", :KY=>"Kentucky", :LA=>"Louisiana", :MA=>"Massachusetts", :MD=>"Maryland", :ME=>"Maine", :MI=>"Michigan", :MN=>"Minnesota", :MO=>"Missouri", :MS=>"Mississippi", :MT=>"Montana", :NC=>"North Carolina", :ND=>"North Dakota", :NE=>"Nebraska", :NH=>"New Hampshire", :NJ=>"New Jersey", :NM=>"New Mexico", :NV=>"Nevada", :NY=>"New York", :OH=>"Ohio", :OK=>"Oklahoma", :OR=>"Oregon", :PA=>"Pennsylvania", :RI=>"Rhode Island", :SC=>"South Carolina", :SD=>"South Dakota", :TN=>"Tennessee", :TX=>"Texas", :UT=>"Utah", :VA=>"Virginia", :VT=>"Vermont", :WA=>"Washington", :WI=>"Wisconsin", :WV=>"West Virginia", :WY=>"Wyoming"}
CS.cities(:ak, :us)
# => ["Adak", "Akhiok", "Akiachak", "Akiak", "Akutan", "Alakanuk", "Ambler", "Anchor Point", "Anchorage", "Angoon", "Atqasuk", "Barrow", "Bell Island Hot Springs", "Bethel", "Big Lake", "Buckland", "Chefornak", "Chevak", "Chicken", "Chugiak", "Coffman Cove", "Cooper Landing", "Copper Center", "Cordova", "Craig", "Deltana", "Dillingham", "Douglas", "Dutch Harbor", "Eagle River", "Eielson Air Force Base", "Fairbanks", "Fairbanks North Star Borough", "Fort Greely", "Fort Richardson", "Galena", "Girdwood", "Goodnews Bay", "Haines", "Homer", "Hooper Bay", "Juneau", "Kake", "Kaktovik", "Kalskag", "Kenai", "Ketchikan", "Kiana", "King Cove", "King Salmon", "Kipnuk", "Klawock", "Kodiak", "Kongiganak", "Kotlik", "Koyuk", "Kwethluk", "Levelock", "Manokotak", "May Creek", "Mekoryuk", "Metlakatla", "Mountain Village", "Nabesna", "Naknek", "Nazan Village", "Nenana", "New Stuyahok", "Nikiski", "Ninilchik", "Noatak", "Nome", "Nondalton", "Noorvik", "North Pole", "Northway", "Old Kotzebue", "Palmer", "Pedro Bay", "Petersburg", "Pilot Station", "Point Hope", "Point Lay", "Prudhoe Bay", "Russian Mission", "Sand Point", "Scammon Bay", "Selawik", "Seward", "Shungnak", "Sitka", "Skaguay", "Soldotna", "Stebbins", "Sterling", "Sutton", "Talkeetna", "Teller", "Thorne Bay", "Togiak", "Tok", "Toksook Bay", "Tuntutuliak", "Two Rivers", "Unalakleet", "Unalaska", "Valdez", "Wainwright", "Wasilla"]
It works with countries all over the world. It also uses the MaxMind database, so it can be kept up to date (with the CS.update command).
I just took the data from the NWS and created a Rails plugin called geoinfo hosted on Github. At this point, it's still a quick hack, but contains all the NWS data in the lib/db folder if you don't want to use it as a plugin. Hope this helps.
