Word Frequency count in a very inefficient way - ruby-on-rails

This is my code for calculating word frequency:
word_arr= ["I", "received", "this", "in", "email", "and", "found", "it", "a", "good", "read", "to", "share......", "Yes,", "Dr", "M.", "Bakri", "Musa", "seems", "to", "know", "what", "is", "happening", "in", "Malaysia.", "Some", "of", "you", "may", "know.", "He", "is", "a", "Malay", "extra horny", "horny nor", "nor their", "their babes", "babes are", "are extra", "extra SEXY..", "SEXY.. .", ". .", ". .It's", ".It's because", "because their", "their CONDOMS", "CONDOMS are", "are Made", "Made In", "In China........;)", "China........;) &&"]
arr_stop_kwd=["a","and"]
frequencies = Hash.new(0)
word_arr.each do |word|
  if !arr_stop_kwd.include?(word.downcase) && !word.match('&&')
    frequencies[word.downcase] += 1
  end
end
When I have 100k words it takes 9.03 seconds, which is too much time. Is there another way to calculate this?
Thanks in advance.

Take a look at the Facets gem.
You can do something like this using its frequency method:
require 'facets'
frequencies = (word_arr-arr_stop_kwd).frequency
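Note that the stop words can be subtracted from word_arr with Array#-; refer to the Array documentation. The subtraction is case-sensitive, so downcase the words first if "And" should be dropped as well.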
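If you would rather not add a gem, here is a minimal plain-Ruby sketch of the same count. The names STOP_WORDS and word_frequencies are made up for illustration, and it assumes the rules from the question: skip stop words case-insensitively and skip any entry containing "&&". It downcases each word once and keeps the stop words in a Set, so the lookup cost stays flat as the stop list grows.

```ruby
require 'set'

# Hypothetical stop-word list, taken from the question's arr_stop_kwd.
STOP_WORDS = Set.new(%w[a and]).freeze

# Counts words case-insensitively, skipping stop words and entries with "&&".
def word_frequencies(words, stop_words = STOP_WORDS)
  frequencies = Hash.new(0)
  words.each do |word|
    next if word.include?('&&')      # plain substring check, no regex needed
    key = word.downcase              # downcase once and reuse the result
    next if stop_words.include?(key)
    frequencies[key] += 1
  end
  frequencies
end

sample = ["I", "a", "And", "to", "To", "China........;) &&"]
counts = word_frequencies(sample)
```

Here counts ends up with "i" counted once and "to" counted twice; "a", "And" and the "&&" entry are skipped.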

Related

How to change values in an array in an efficient way

I am getting the following arrays from an external API endpoint.
Input:-
1. [["date", "country_name", "month"], ["2019-02-21", "US", "Jan"]]
2. ["name", "homeAddress", "zipcode"]
Expected Output:-
1. [["Date", "Country Name", "Month"], ["2019-02-21", "US", "Jan"]]
2. ["Name", "Home Address", "Zipcode"]
How can I change each array in an efficient way in Ruby on Rails?
Update:
Some of the names differ in the expected output, as follows:
Input:
["column1", "column2", "date"]
Expected output:
["column3", "column4", "Date"]
How can I get the above output?
Answer:-
Inputs:-
a = ['1', '2', '3', '4']
b = {"1"=>"10", "2"=>"20", "3"=>"30"}
Execute:
# fetch returns the mapped value, or the element itself when b has no such key
c = a.map { |i| b.fetch(i, i) }
Output:-
["10", "20", "30", "4"]
If you want to replace '_' with a space, or insert a space wherever a capital letter occurs within the string,
try the following Rails methods:
"now_isTheTime".titleize.camelize
=> "Now Is The Time"
ar1 = [["date", "country_name", "month"], ["2019-02-21", "US", "Jan"]]
ar2 = ["name", "homeAddress", "zipcode"]
def formatter(string)
  # Leave short codes ("US") and strings containing digits (dates) untouched
  return string if string.length < 3 || string.count("0-9").positive?
  string.titleize.camelize
end
ar1.map { |sub_arr| sub_arr.map(&method(:formatter)) }
ar2.map(&method(:formatter))
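The two answers above can be combined into one pass. What follows is only a sketch in plain Ruby, so it runs without ActiveSupport: format_header approximates titleize for the inputs shown in the question rather than reproducing it, and OVERRIDES is a hypothetical lookup table for the names that do not follow the rule (column1, column2).

```ruby
# Hypothetical table for headers whose output does not follow the naming rule.
OVERRIDES = { "column1" => "column3", "column2" => "column4" }.freeze

def format_header(value)
  return OVERRIDES[value] if OVERRIDES.key?(value)
  # Leave short codes ("US") and anything containing digits ("2019-02-21") alone.
  return value if value.length < 3 || value.match?(/\d/)

  value
    .gsub(/([a-z])([A-Z])/, '\1 \2') # homeAddress  -> home Address
    .tr('_', ' ')                    # country_name -> country name
    .split
    .map(&:capitalize)
    .join(' ')
end

# Walks flat or nested arrays and formats every string it finds.
def format_headers(data)
  data.map { |item| item.is_a?(Array) ? format_headers(item) : format_header(item) }
end

nested = format_headers([["date", "country_name", "month"], ["2019-02-21", "US", "Jan"]])
flat = format_headers(["name", "homeAddress", "zipcode"])
renamed = format_headers(["column1", "column2", "date"])
```

Checking the override table first means the exceptions win over the general rule, which matches the update in the question.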

Nokogiri XML to hash using attribute names

I'm new to Rails and I'm looking to parse XML from the PubMed Eutils API into a hash with the attributes I want. Here is what I have so far:
def pubmed_search
  new
  if params[:search_terms].present?
    require 'nokogiri'
    require 'open-uri'
    @search_terms = params[:search_terms].split.join("+")
    uid_url = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=" + @search_terms
    uid_doc = Nokogiri::HTML(open(uid_url))
    @uid = uid_doc.xpath("//id").map { |uid| uid.text }.join(",")
    detail_url = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=" + @uid
    detail_doc = Nokogiri::HTML(open(detail_url))
    @details = detail_doc.xpath("//item[@name='Title']|//item[@name='FullJournalName']|//item[@name='Author']").map { |article| article.text }
    render :new
  else
    render :new
  end
end
This gives me the values I want (Authors, Title, Journal Name) but it comes out in one giant array without the attribute names like so:
["Keshmiri-Neghab H", "Goliaei B", "Nikoofar A", "Gossypol enhances radiation induced autophagy in glioblastoma multiforme.", "General physiology and biophysics", "Alzahrani EO", "Asiri A", "El-Dessoky MM", "Kuang Y", "Quiescence as an explanation of Gompertzian tumor growth revisited.", "Mathematical biosciences", "Neofytou M", "Tanos V", "Constantinou I", "Kyriacou E", "Pattichis M", "Pattichis C", "Computer Aided Diagnosis in Hysteroscopic Imaging.", "IEEE journal of biomedical and health informatics", "Lou Q", "Ji L", "Zhong W", "Li S", "Yu S", "Li Z", "Meng X", "Synthesis and Cytotoxicity Evaluation of Naphthalimide Derived N-Mustards.", "Molecules (Basel, Switzerland)", "Sesang W", "Punyanitya S", "Pitchuanchom S", "Udomputtimekakul P", "Nuntasaen N", "Banjerdpongchai R", "Wudtiwai B", "Pompimon W", "Cytotoxic Aporphine Alkaloids from Leaves and Twigs of Pseuduvaria trimera (Craib).", "Molecules (Basel, Switzerland)", "Yang XK", "Xu MY", "Xu GS", "Zhang YL", "Xu ZX", "In Vitro and in Vivo Antitumor Activity of Scutebarbatine A on Human Lung Carcinoma A549 Cell Lines.", "Molecules (Basel, Switzerland)", "Yang CY", "Lu RH", "Lin CH", "Jen CH", "Tung CY", "Yang SH", "Lin JK", "Jiang JK", "Lin CH", "Single Nucleotide Polymorphisms Associated with Colorectal Cancer Susceptibility and Loss of Heterozygosity in a Taiwanese Population.", "PloS one", "Zhang H", "Gu L", "Liu T", "Chiang KY", "Zhou M", "Inhibition of MDM2 by Nilotinib Contributes to Cytotoxicity in Both Philadelphia-Positive and Negative Acute Lymphoblastic Leukemia.", "PloS one", "Oliveira A", "Pinho D", "Albino-Teixeira A", "Medeiros R", "Dinis-Oliveira RJ", "Carvalho F", "Morphine glucuronidation increases its analgesic effect in guinea-pigs.", "Life sciences", "Kabbout M", "Dakhlallah D", "Sharma S", "Bronisz A", "Srinivasan R", "Piper M", "Marsh CB", "Ostrowski MC", "MicroRNA 17-92 Cluster Mediates ETS1 and ETS2-Dependent RAS-Oncogenic Transformation.", "PloS one", "Kannen H", "Hazama H", "Kaneda 
Y", "Fujino T", "Awazu K", "Development of Laser Ionization Techniques for Evaluation of the Effect of Cancer Drugs Using Imaging Mass Spectrometry.", "International journal of molecular sciences", "Liang J", "Tong P", "Zhao W", "Li Y", "Zhang L", "Xia Y", "Yu Y", "The REST Gene Signature Predicts Drug Sensitivity in Neuroblastoma Cell Lines and Is Significantly Associated with Neuroblastoma Tumor Stage.", "International journal of molecular sciences", "Mathur A", "Ware C", "Davis L", "Gazdar A", "Pan BS", "Lutterbach B", "FGFR2 Is Amplified in the NCI-H716 Colorectal Cancer Cell Line and Is Required for Growth and Survival.", "PloS one", "van As JW", "van den Berg H", "van Dalen EC", "Different infusion durations for preventing platinum-induced hearing loss in children with cancer.", "The Cochrane database of systematic reviews", "Lynam-Lennon N", "Maher SG", "Maguire A", "Phelan J", "Muldoon C", "Reynolds JV", "O'Sullivan J", "Altered Mitochondrial Function and Energy Metabolism Is Associated with a Radioresistant Phenotype in Oesophageal Adenocarcinoma.", "PloS one", "Meriggi F", "Andreis F", "Premi V", "Liborio N", "Codignola C", "Mazzocchi M", "Rizzi A", "Prochilo T", "Rota L", "Di Biasi B", "Bertocchi P", "Abeni C", "Ogliosi C", "Aroldi F", "Zaniboni A", "Assessing cancer caregivers' needs for an early targeted psychosocial support project: The experience of the oncology department of the Poliambulanza Foundation.", "Palliative & supportive care", "Gwede CK", "Davis SN", "Wilson S", "Patel M", "Vadaparampil ST", "Meade CD", "Rivers BM", "Yu D", "Torres-Roca J", "Heysek R", "Spiess PE", "Pow-Sang J", "Jacobsen P", "Perceptions of Prostate Cancer Screening Controversy and Informed Decision Making: Implications for Development of a Targeted Decision Aid for Unaffected Male First-Degree Relatives.", "American journal of health promotion : AJHP", "Simerska P", "Suksamran T", "Ziora ZM", "Rivera FD", "Engwerda C", "Toth I", "Ovalbumin lipid core peptide vaccines 
and their CD4<sup>+</sup> and CD8<sup>+</sup> T cell responses.", "Vaccine", "Ogembo JG", "Manga S", "Nulah K", "Foglabenchi LH", "Perlman S", "Wamai RG", "Welty T", "Welty E", "Tih P", "Achieving high uptake of human papillomavirus vaccine in Cameroon: Lessons learned in overcoming challenges.", "Vaccine", "Chung CY", "Alden SL", "Funderburg NT", "Fu P", "Levine AD", "Progressive Proximal-to-Distal Reduction in Expression of the Tight Junction Complex in Colonic Epithelium of Virally-Suppressed HIV+ Individuals.", "PLoS pathogens"]
What I'm looking for instead would be:
@details = [{:Title => "title1", :Authors => ["author1", "author2", "author3"], :Journal => "journal1"}, {:Title => "title2", :Authors => ["author4", "author5", "author6"], :Journal => "journal2"}]
I've tried some .to_hash methods described in other answers, but they don't create a hash that deals with the XML attributes very well, because the names I want are stored in the @name attribute of each "item". Here is some sample XML from PubMed:
<eSummaryResult><DocSum><Id>11850928</Id><Item Name="PubDate" Type="Date">1965 Aug</Item><Item Name="EPubDate" Type="Date"/><Item Name="Source" Type="String">Arch Dermatol</Item><Item Name="AuthorList" Type="List"><Item Name="Author" Type="String">LoPresti PJ</Item><Item Name="Author" Type="String">Hambrick GW Jr</Item></Item><Item Name="LastAuthor" Type="String">Hambrick GW Jr</Item><Item Name="Title" Type="String">Zirconium granuloma following treatment of rhus dermatitis.</Item><Item Name="Volume" Type="String">92</Item><Item Name="Issue" Type="String">2</Item><Item Name="Pages" Type="String">188-91</Item><Item Name="LangList" Type="List"><Item Name="Lang" Type="String">English</Item></Item><Item Name="NlmUniqueID" Type="String">0372433</Item><Item Name="ISSN" Type="String">0003-987X</Item><Item Name="ESSN" Type="String">1538-3652</Item><Item Name="PubTypeList" Type="List"><Item Name="PubType" Type="String">Journal Article</Item></Item><Item Name="RecordStatus" Type="String">PubMed - indexed for MEDLINE</Item><Item Name="PubStatus" Type="String">ppublish</Item><Item Name="ArticleIds" Type="List"><Item Name="pubmed" Type="String">11850928</Item><Item Name="eid" Type="String">11850928</Item><Item Name="rid" Type="String">11850928</Item></Item><Item Name="History" Type="List"><Item Name="pubmed" Type="Date">1965/08/01 00:00</Item><Item Name="medline" Type="Date">2002/03/09 10:01</Item><Item Name="entrez" Type="Date">1965/08/01 00:00</Item></Item><Item Name="References" Type="List"/><Item Name="HasAbstract" Type="Integer">1</Item><Item Name="PmcRefCount" Type="Integer">0</Item><Item Name="FullJournalName" Type="String">Archives of dermatology</Item><Item Name="ELocationID" Type="String"/><Item Name="SO" Type="String">1965 Aug;92(2):188-91</Item></DocSum><DocSum><Id>11482001</Id><Item Name="PubDate" Type="Date">2001 Jun</Item><Item Name="EPubDate" Type="Date"/><Item Name="Source" Type="String">Adverse Drug React Toxicol Rev</Item><Item Name="AuthorList" 
Type="List"><Item Name="Author" Type="String">Mantle D</Item><Item Name="Author" Type="String">Gok MA</Item><Item Name="Author" Type="String">Lennard TW</Item></Item><Item Name="LastAuthor" Type="String">Lennard TW</Item><Item Name="Title" Type="String">Adverse and beneficial effects of plant extracts on skin and skin disorders.</Item><Item Name="Volume" Type="String">20</Item><Item Name="Issue" Type="String">2</Item><Item Name="Pages" Type="String">89-103</Item><Item Name="LangList" Type="List"><Item Name="Lang" Type="String">English</Item></Item><Item Name="NlmUniqueID" Type="String">9109474</Item><Item Name="ISSN" Type="String">0964-198X</Item><Item Name="ESSN" Type="String"/><Item Name="PubTypeList" Type="List"><Item Name="PubType" Type="String">Journal Article</Item><Item Name="PubType" Type="String">Review</Item></Item><Item Name="RecordStatus" Type="String">PubMed - indexed for MEDLINE</Item><Item Name="PubStatus" Type="String">ppublish</Item><Item Name="ArticleIds" Type="List"><Item Name="pubmed" Type="String">11482001</Item><Item Name="eid" Type="String">11482001</Item><Item Name="rid" Type="String">11482001</Item></Item><Item Name="History" Type="List"><Item Name="pubmed" Type="Date">2001/08/03 10:00</Item><Item Name="medline" Type="Date">2002/01/23 10:01</Item><Item Name="entrez" Type="Date">2001/08/03 10:00</Item></Item><Item Name="References" Type="List"/><Item Name="HasAbstract" Type="Integer">1</Item><Item Name="PmcRefCount" Type="Integer">3</Item><Item Name="FullJournalName" Type="String">Adverse drug reactions and toxicological reviews</Item><Item Name="ELocationID" Type="String"/><Item Name="SO" Type="String">2001 Jun;20(2):89-103</Item></DocSum></eSummaryResult>
Thanks for any help, I've been dying trying to find an answer.
There is no automatic way to do this, because the structure of the XML does not match the structure of your required hash. You must pick out the desired nodes from the XML manually and construct the hash from their values. Using XPath is probably the easiest; the code might look something like this:
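# Assumes detail_doc was parsed with Nokogiri::XML; Nokogiri::HTML lowercases
# element and attribute names, so these case-sensitive paths would not match.
@details = []
detail_doc.xpath("/eSummaryResult/DocSum").each do |node|
  detail = {}
  detail[:title] = node.xpath("Item[@Name='Title']").text
  detail[:journal] = node.xpath("Item[@Name='FullJournalName']").text
  detail[:authors] = node.xpath("Item[@Name='AuthorList']/Item[@Name='Author']").map { |n| n.text }
  @details.push(detail)
end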
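To try the same per-DocSum extraction without a network call, here is a self-contained sketch. It swaps in REXML (shipped with Ruby) for Nokogiri so that it runs without installing a gem; the XPath strings are the same ones you would pass to Nokogiri's xpath. SAMPLE_XML is a trimmed copy of the XML in the question, and the item_text and extract_details helpers are hypothetical names.

```ruby
require 'rexml/document'

# Trimmed version of the eSummary response shown in the question.
SAMPLE_XML = <<~XML
  <eSummaryResult>
    <DocSum>
      <Id>11850928</Id>
      <Item Name="AuthorList" Type="List">
        <Item Name="Author" Type="String">LoPresti PJ</Item>
        <Item Name="Author" Type="String">Hambrick GW Jr</Item>
      </Item>
      <Item Name="Title" Type="String">Zirconium granuloma following treatment of rhus dermatitis.</Item>
      <Item Name="FullJournalName" Type="String">Archives of dermatology</Item>
    </DocSum>
  </eSummaryResult>
XML

# Text of the first direct child <Item> with the given Name, or nil if absent.
def item_text(node, name)
  item = REXML::XPath.first(node, "Item[@Name='#{name}']")
  item ? item.text : nil
end

# Builds one hash per <DocSum>, keeping each article's authors together.
def extract_details(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "/eSummaryResult/DocSum").map do |doc_sum|
    {
      title: item_text(doc_sum, "Title"),
      journal: item_text(doc_sum, "FullJournalName"),
      authors: REXML::XPath.match(doc_sum, "Item[@Name='AuthorList']/Item[@Name='Author']").map(&:text)
    }
  end
end

details = extract_details(SAMPLE_XML)
```

Because each hash is built from a single DocSum node, the authors stay attached to their own title and journal instead of being flattened into one array.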

Reject element from array in efficient way

My array is
arr = ["wow what", "what anice", "anice day.currently", "day.currently i", "i am", "am in", "in delhi", "delhi but", "but in", "in night", "night i", "i am", "am going", "going to", "to us"]
fresh_arr = []
arr.each do |el|
  if !el.match('in') && !el.match('is').blank?
    fresh_arr << el
  end
end
But I have an array of 110k elements and this takes 8 seconds, which is too much time. Is there another way to do this?
Thanks.
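Use delete_if:
# delete_if modifies arr in place; the block is the negation of the keep condition in the question
arr.delete_if do |e|
  e.match('in') || e.match('is').blank?
end
arr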
Try this
arr.reject { |i| i.match('in') || i.match('is').blank? }
You can select all elements you need by doing this
arr.select{|el| !el.match('in') && !el.match('is').blank?}
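Another option, as a sketch only: String#match turns the string argument into a Regexp and builds a MatchData object on every call, while String#include? does a plain substring check and needs no Rails. The condition below mirrors the one in the question (keep elements that contain "is" but not "in"); filter_phrases is a hypothetical helper name, and whether this removes the slowdown on 110k elements is something you would need to measure.

```ruby
# Keeps elements that contain "is" and do not contain "in".
# Both checks are substring checks, exactly like match('in') / match('is').
def filter_phrases(phrases)
  phrases.select { |el| !el.include?('in') && el.include?('is') }
end

sample = ["this day", "is in", "in delhi", "his list"]
kept = filter_phrases(sample)
```

Here kept holds "this day" and "his list". Note that these are substring matches, so "this" counts as containing "is"; use a regex with word boundaries if you meant whole words.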

Ruby delete_if multiple values - only first value being deleted

words.delete_if do |x|
  x == ("a"||"for"||"to"||"and")
end
words is an array with many words. My code is deleting "a" but not deleting "for", "to" or "and".
Maybe this will help you:
words.delete_if do |x|
  %w(a for to and).include?(x)
end
Just do
words - ["a", "for", "to", "and"]
Example
words = %w(this is a just test data for array - method and nothing)
=> ["this", "is", "a", "just", "test", "data", "for", "array", "-", "method", "and", "nothing"]
words = words - ["a", "for", "to", "and"]
=> ["this", "is", "just", "test", "data", "array", "-", "method", "nothing"]
If you run "a" || "b" in irb you will always get "a", because "a" is a non-nil value and || returns its first truthy operand.
In your case "a"||"for"||"to"||"and" always evaluates to "a", regardless of the other values, so each word is only ever compared with "a".
So this is my alternative solution to your question:
w = %W{a for to and}
words.reject! { |x| w.include?(x) }
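To see the difference side by side, here is a small runnable sketch. The word list and variable names are made up for illustration; the Set is optional and only matters once the stop list gets long.

```ruby
require 'set'

words = %w[this is a test for array and nothing to do]

# ("a" || "for" || "to" || "and") evaluates to "a", so only "a" is removed.
only_a_removed = words.reject { |x| x == ("a" || "for" || "to" || "and") }

# Membership test against the whole list removes every stop word.
stop_words = Set.new(%w[a for to and])
cleaned = words.reject { |x| stop_words.include?(x) }
```

After this runs, only_a_removed still contains "for", "and" and "to", while cleaned is left with this, is, test, array, nothing, do.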

Rails plugin for US states and cities

We need a Rails plugin for US states and cities. Please see if we can get that.
Maybe this would help: http://github.com/bcardarella/decoder
Interestingly enough, the National Weather Service produces such a data source:
http://www.weather.gov/geodata/catalog/national/html/cities.htm
CityState gem: https://github.com/loureirorg/city-state
CS.states(:us)
# => {:AK=>"Alaska", :AL=>"Alabama", :AR=>"Arkansas", :AZ=>"Arizona", :CA=>"California", :CO=>"Colorado", :CT=>"Connecticut", :DC=>"District of Columbia", :DE=>"Delaware", :FL=>"Florida", :GA=>"Georgia", :HI=>"Hawaii", :IA=>"Iowa", :ID=>"Idaho", :IL=>"Illinois", :IN=>"Indiana", :KS=>"Kansas", :KY=>"Kentucky", :LA=>"Louisiana", :MA=>"Massachusetts", :MD=>"Maryland", :ME=>"Maine", :MI=>"Michigan", :MN=>"Minnesota", :MO=>"Missouri", :MS=>"Mississippi", :MT=>"Montana", :NC=>"North Carolina", :ND=>"North Dakota", :NE=>"Nebraska", :NH=>"New Hampshire", :NJ=>"New Jersey", :NM=>"New Mexico", :NV=>"Nevada", :NY=>"New York", :OH=>"Ohio", :OK=>"Oklahoma", :OR=>"Oregon", :PA=>"Pennsylvania", :RI=>"Rhode Island", :SC=>"South Carolina", :SD=>"South Dakota", :TN=>"Tennessee", :TX=>"Texas", :UT=>"Utah", :VA=>"Virginia", :VT=>"Vermont", :WA=>"Washington", :WI=>"Wisconsin", :WV=>"West Virginia", :WY=>"Wyoming"}
CS.cities(:ak, :us)
# => ["Adak", "Akhiok", "Akiachak", "Akiak", "Akutan", "Alakanuk", "Ambler", "Anchor Point", "Anchorage", "Angoon", "Atqasuk", "Barrow", "Bell Island Hot Springs", "Bethel", "Big Lake", "Buckland", "Chefornak", "Chevak", "Chicken", "Chugiak", "Coffman Cove", "Cooper Landing", "Copper Center", "Cordova", "Craig", "Deltana", "Dillingham", "Douglas", "Dutch Harbor", "Eagle River", "Eielson Air Force Base", "Fairbanks", "Fairbanks North Star Borough", "Fort Greely", "Fort Richardson", "Galena", "Girdwood", "Goodnews Bay", "Haines", "Homer", "Hooper Bay", "Juneau", "Kake", "Kaktovik", "Kalskag", "Kenai", "Ketchikan", "Kiana", "King Cove", "King Salmon", "Kipnuk", "Klawock", "Kodiak", "Kongiganak", "Kotlik", "Koyuk", "Kwethluk", "Levelock", "Manokotak", "May Creek", "Mekoryuk", "Metlakatla", "Mountain Village", "Nabesna", "Naknek", "Nazan Village", "Nenana", "New Stuyahok", "Nikiski", "Ninilchik", "Noatak", "Nome", "Nondalton", "Noorvik", "North Pole", "Northway", "Old Kotzebue", "Palmer", "Pedro Bay", "Petersburg", "Pilot Station", "Point Hope", "Point Lay", "Prudhoe Bay", "Russian Mission", "Sand Point", "Scammon Bay", "Selawik", "Seward", "Shungnak", "Sitka", "Skaguay", "Soldotna", "Stebbins", "Sterling", "Sutton", "Talkeetna", "Teller", "Thorne Bay", "Togiak", "Tok", "Toksook Bay", "Tuntutuliak", "Two Rivers", "Unalakleet", "Unalaska", "Valdez", "Wainwright", "Wasilla"]
It works with countries all over the world. It also uses the MaxMind database, so it's continuously updated (with the command CS.update).
I just took the data from the NWS and created a Rails plugin called geoinfo hosted on Github. At this point, it's still a quick hack, but contains all the NWS data in the lib/db folder if you don't want to use it as a plugin. Hope this helps.
