I'm trying to create a time series that will predict scores grouped by team.
TeamScores$Year <- as.Date(TeamScores$Year)
sample <- TeamScores[1:20, ]
dput(sample)
structure(list(
Team = c("Abl Christian", "Air Force", "Akron", "Alab A&M", "Alabama", "Alabama St", "Albany", "Alcorn State", "American", "App State", "AR Lit Rock", "Arizona", "Arizona St", "Ark Pine Bl", "Arkansas", "Arkansas St", "Army", "Auburn", "Austin Peay", "Ball State"),
Score = c(71.7, 67.4, 68.4, 60.6, 71.8, 65.6, 66.8, 60.3, 72, 77.3, 73.6, 70.9, 77.8, 65.3, 75.5, 72.8, 70.2, 78.9, 80.1, 74.1),
Year = structure(
c(17532, 17532, 17532, 17532, 17532, 17532, 17532, 17532, 17532, 17532, 17532, 17532, 17532, 17532, 17532, 17532, 17532, 17532, 17532, 17532),
class = "Date")),
row.names = c(NA, -20L),
class = c("tbl_df", "tbl", "data.frame"))
I created a time series successfully (I think), but I cannot get my fit to work.
time_ser <- ts(matrix(TeamScores$Team, nrow = 3530), start = c(2009-01-01), frequency = 1)
class(time_ser)
#[1] "ts"
fit<- auto.arima(time_ser)
# Error in stats::arima(x = x, order = order, seasonal = seasonal, include.mean = include.mean, :
#   'x' must be numeric
# In addition: Warning message:
#   In is.constant(x) : NAs introduced by coercion
My x (Score) is numeric, so I'm lost. My assumption was that I needed to run auto.arima first and then call predict on the resulting fit.
With data structured the way yours is, you can run ARIMA like this. (Your error, by the way, comes from building the ts object on TeamScores$Team, a character vector, rather than on the numeric TeamScores$Score; the team names get coerced to NA, hence the warning.)
# making the structure
TeamScores <- structure(list(
Team = c("Abl Christian", "Air Force", "Akron", "Abl Christian", "Air Force", "Akron","Abl Christian", "Air Force", "Akron","Abl Christian", "Air Force", "Akron","Abl Christian", "Air Force", "Akron","Abl Christian", "Air Force", "Akron","Abl Christian", "Air Force"),
Score = c(71.7, 67.4, 68.4, 60.6, 71.8, 65.6, 66.8, 60.3, 72, 77.3, 73.6, 70.9, 77.8, 65.3, 75.5, 72.8, 70.2, 78.9, 80.1, 74.1),
Year = structure(
c(17532, 17533, 17534, 17535, 17536, 17537, 17538, 17539, 17540, 17541, 17542, 17543, 17544, 17545, 17546, 17547, 17548, 17549, 17550, 17551),
class = "Date")),
row.names = c(NA, -20L),
class = c("tbl_df", "tbl", "data.frame"))
# make a vector with team names:
teamnames <- c("Abl Christian", "Air Force", "Akron")
# run ARIMA for each team:
library(forecast)  # auto.arima() comes from the forecast package
for (team in teamnames) {
  subdf <- subset(TeamScores, Team == team)
  # xreg must be numeric, so convert the Date column
  fit <- auto.arima(subdf$Score, xreg = as.numeric(subdf$Year))
  print(fit)
}
Each fit can then be passed to forecast() (or predict()) to generate the predictions you were after; since xreg was used, you will also need to supply the future Year values there. P.S. I couldn't run ARIMA with your sample code/data, because in the sample all of the dates are the same (2018-01-01) and each team appears only once; you cannot build a time series from a single time point, nor from a single data point per group. That is why I made a similar structure for testing. Also, I skipped making the ts object and ran ARIMA directly on the data frame.
Related
I want to sort an array in Ruby on Rails based on another array, but I still want to keep the first array's values at the front of the result:
all_countries = ["Afghanistan", "Aland Islands", "Albania", "Algeria", "American Samoa", "Andorra", "Angola", "Anguilla", "Antarctica", "Argentina", "Armenia", "Armenien", "Australia", ....,]
gcc = ["UAE", "Saudi Arabia", "Qatar", "Bahrain", "Kuwait", "Oman"]
Desired OUTPUT:
I want to sort all countries, but I want the GCC countries to appear first (unsorted, in any order); the remaining countries should follow in alphabetical order (A-Z).
["UAE", "Saudi Arabia", "Qatar", "Bahrain", "Kuwait", "Oman", "Afghanistan", "Aland Islands", "Albania", "Algeria", "American Samoa" .....]
I can do it the following way, but it doesn't seem like very good code, and it breaks alphabetical sorting for the non-GCC countries.
all_countries.sort_by { |x| gcc.index(x) || gcc.size }
Any better way to do it?
gcc + (all_countries - gcc).sort
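A quick check with a trimmed-down sample list shows the GCC names kept in their given order, with everything else following in A-Z order:
all_countries = ["Afghanistan", "Kuwait", "Albania", "UAE", "Algeria",
                 "Oman", "Qatar", "Saudi Arabia", "Bahrain"]
gcc = ["UAE", "Saudi Arabia", "Qatar", "Bahrain", "Kuwait", "Oman"]

gcc + (all_countries - gcc).sort
# => ["UAE", "Saudi Arabia", "Qatar", "Bahrain", "Kuwait", "Oman",
#     "Afghanistan", "Albania", "Algeria"]
Note that this prepends every entry of gcc even when it is missing from all_countries; if you only want the ones actually present, use (gcc & all_countries) for the first term.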
I'm new to Rails and I'm looking to parse XML from PubMed's EUtils API into a hash with the attributes I want. Here is what I have so far:
def pubmed_search
  new
  if params[:search_terms].present?
    require 'nokogiri'
    require 'open-uri'
    @search_terms = params[:search_terms].split.join("+")
    uid_url = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=" + @search_terms
    uid_doc = Nokogiri::HTML(open(uid_url))
    @uid = uid_doc.xpath("//id").map { |uid| uid.text }.join(",")
    detail_url = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=" + @uid
    detail_doc = Nokogiri::HTML(open(detail_url))
    @details = detail_doc.xpath("//item[@name='Title']|//item[@name='FullJournalName']|//item[@name='Author']").map { |article| article.text }
    render :new
  else
    render :new
  end
end
This gives me the values I want (Authors, Title, Journal Name), but it comes out as one giant array without the attribute names, like so:
["Keshmiri-Neghab H", "Goliaei B", "Nikoofar A", "Gossypol enhances radiation induced autophagy in glioblastoma multiforme.", "General physiology and biophysics", "Alzahrani EO", "Asiri A", "El-Dessoky MM", "Kuang Y", "Quiescence as an explanation of Gompertzian tumor growth revisited.", "Mathematical biosciences", "Neofytou M", "Tanos V", "Constantinou I", "Kyriacou E", "Pattichis M", "Pattichis C", "Computer Aided Diagnosis in Hysteroscopic Imaging.", "IEEE journal of biomedical and health informatics", "Lou Q", "Ji L", "Zhong W", "Li S", "Yu S", "Li Z", "Meng X", "Synthesis and Cytotoxicity Evaluation of Naphthalimide Derived N-Mustards.", "Molecules (Basel, Switzerland)", "Sesang W", "Punyanitya S", "Pitchuanchom S", "Udomputtimekakul P", "Nuntasaen N", "Banjerdpongchai R", "Wudtiwai B", "Pompimon W", "Cytotoxic Aporphine Alkaloids from Leaves and Twigs of Pseuduvaria trimera (Craib).", "Molecules (Basel, Switzerland)", "Yang XK", "Xu MY", "Xu GS", "Zhang YL", "Xu ZX", "In Vitro and in Vivo Antitumor Activity of Scutebarbatine A on Human Lung Carcinoma A549 Cell Lines.", "Molecules (Basel, Switzerland)", "Yang CY", "Lu RH", "Lin CH", "Jen CH", "Tung CY", "Yang SH", "Lin JK", "Jiang JK", "Lin CH", "Single Nucleotide Polymorphisms Associated with Colorectal Cancer Susceptibility and Loss of Heterozygosity in a Taiwanese Population.", "PloS one", "Zhang H", "Gu L", "Liu T", "Chiang KY", "Zhou M", "Inhibition of MDM2 by Nilotinib Contributes to Cytotoxicity in Both Philadelphia-Positive and Negative Acute Lymphoblastic Leukemia.", "PloS one", "Oliveira A", "Pinho D", "Albino-Teixeira A", "Medeiros R", "Dinis-Oliveira RJ", "Carvalho F", "Morphine glucuronidation increases its analgesic effect in guinea-pigs.", "Life sciences", "Kabbout M", "Dakhlallah D", "Sharma S", "Bronisz A", "Srinivasan R", "Piper M", "Marsh CB", "Ostrowski MC", "MicroRNA 17-92 Cluster Mediates ETS1 and ETS2-Dependent RAS-Oncogenic Transformation.", "PloS one", "Kannen H", "Hazama H", "Kaneda Y", "Fujino T", "Awazu K", "Development of Laser Ionization Techniques for Evaluation of the Effect of Cancer Drugs Using Imaging Mass Spectrometry.", "International journal of molecular sciences", "Liang J", "Tong P", "Zhao W", "Li Y", "Zhang L", "Xia Y", "Yu Y", "The REST Gene Signature Predicts Drug Sensitivity in Neuroblastoma Cell Lines and Is Significantly Associated with Neuroblastoma Tumor Stage.", "International journal of molecular sciences", "Mathur A", "Ware C", "Davis L", "Gazdar A", "Pan BS", "Lutterbach B", "FGFR2 Is Amplified in the NCI-H716 Colorectal Cancer Cell Line and Is Required for Growth and Survival.", "PloS one", "van As JW", "van den Berg H", "van Dalen EC", "Different infusion durations for preventing platinum-induced hearing loss in children with cancer.", "The Cochrane database of systematic reviews", "Lynam-Lennon N", "Maher SG", "Maguire A", "Phelan J", "Muldoon C", "Reynolds JV", "O'Sullivan J", "Altered Mitochondrial Function and Energy Metabolism Is Associated with a Radioresistant Phenotype in Oesophageal Adenocarcinoma.", "PloS one", "Meriggi F", "Andreis F", "Premi V", "Liborio N", "Codignola C", "Mazzocchi M", "Rizzi A", "Prochilo T", "Rota L", "Di Biasi B", "Bertocchi P", "Abeni C", "Ogliosi C", "Aroldi F", "Zaniboni A", "Assessing cancer caregivers' needs for an early targeted psychosocial support project: The experience of the oncology department of the Poliambulanza Foundation.", "Palliative & supportive care", "Gwede CK", "Davis SN", "Wilson S", "Patel M", "Vadaparampil ST", "Meade CD", 
"Rivers BM", "Yu D", "Torres-Roca J", "Heysek R", "Spiess PE", "Pow-Sang J", "Jacobsen P", "Perceptions of Prostate Cancer Screening Controversy and Informed Decision Making: Implications for Development of a Targeted Decision Aid for Unaffected Male First-Degree Relatives.", "American journal of health promotion : AJHP", "Simerska P", "Suksamran T", "Ziora ZM", "Rivera FD", "Engwerda C", "Toth I", "Ovalbumin lipid core peptide vaccines and their CD4<sup>+</sup> and CD8<sup>+</sup> T cell responses.", "Vaccine", "Ogembo JG", "Manga S", "Nulah K", "Foglabenchi LH", "Perlman S", "Wamai RG", "Welty T", "Welty E", "Tih P", "Achieving high uptake of human papillomavirus vaccine in Cameroon: Lessons learned in overcoming challenges.", "Vaccine", "Chung CY", "Alden SL", "Funderburg NT", "Fu P", "Levine AD", "Progressive Proximal-to-Distal Reduction in Expression of the Tight Junction Complex in Colonic Epithelium of Virally-Suppressed HIV+ Individuals.", "PLoS pathogens"]
What I'm looking for instead would be something like:
@details = [{:Title => "title1", :Authors => ["author1", "author2", "author3"], :Journal => "journal1"}, {:Title => "title2", :Authors => ["author4", "author5", "author6"], :Journal => "journal2"}]
I've tried some .to_hash methods described in other answers, but they don't create a hash that deals with the XML attributes very well, as the names of the attributes I want are stored in the Name attribute of each Item. Here is some sample XML from PubMed:
<eSummaryResult><DocSum><Id>11850928</Id><Item Name="PubDate" Type="Date">1965 Aug</Item><Item Name="EPubDate" Type="Date"/><Item Name="Source" Type="String">Arch Dermatol</Item><Item Name="AuthorList" Type="List"><Item Name="Author" Type="String">LoPresti PJ</Item><Item Name="Author" Type="String">Hambrick GW Jr</Item></Item><Item Name="LastAuthor" Type="String">Hambrick GW Jr</Item><Item Name="Title" Type="String">Zirconium granuloma following treatment of rhus dermatitis.</Item><Item Name="Volume" Type="String">92</Item><Item Name="Issue" Type="String">2</Item><Item Name="Pages" Type="String">188-91</Item><Item Name="LangList" Type="List"><Item Name="Lang" Type="String">English</Item></Item><Item Name="NlmUniqueID" Type="String">0372433</Item><Item Name="ISSN" Type="String">0003-987X</Item><Item Name="ESSN" Type="String">1538-3652</Item><Item Name="PubTypeList" Type="List"><Item Name="PubType" Type="String">Journal Article</Item></Item><Item Name="RecordStatus" Type="String">PubMed - indexed for MEDLINE</Item><Item Name="PubStatus" Type="String">ppublish</Item><Item Name="ArticleIds" Type="List"><Item Name="pubmed" Type="String">11850928</Item><Item Name="eid" Type="String">11850928</Item><Item Name="rid" Type="String">11850928</Item></Item><Item Name="History" Type="List"><Item Name="pubmed" Type="Date">1965/08/01 00:00</Item><Item Name="medline" Type="Date">2002/03/09 10:01</Item><Item Name="entrez" Type="Date">1965/08/01 00:00</Item></Item><Item Name="References" Type="List"/><Item Name="HasAbstract" Type="Integer">1</Item><Item Name="PmcRefCount" Type="Integer">0</Item><Item Name="FullJournalName" Type="String">Archives of dermatology</Item><Item Name="ELocationID" Type="String"/><Item Name="SO" Type="String">1965 Aug;92(2):188-91</Item></DocSum><DocSum><Id>11482001</Id><Item Name="PubDate" Type="Date">2001 Jun</Item><Item Name="EPubDate" Type="Date"/><Item Name="Source" Type="String">Adverse Drug React Toxicol Rev</Item><Item Name="AuthorList" Type="List"><Item Name="Author" Type="String">Mantle D</Item><Item Name="Author" Type="String">Gok MA</Item><Item Name="Author" Type="String">Lennard TW</Item></Item><Item Name="LastAuthor" Type="String">Lennard TW</Item><Item Name="Title" Type="String">Adverse and beneficial effects of plant extracts on skin and skin disorders.</Item><Item Name="Volume" Type="String">20</Item><Item Name="Issue" Type="String">2</Item><Item Name="Pages" Type="String">89-103</Item><Item Name="LangList" Type="List"><Item Name="Lang" Type="String">English</Item></Item><Item Name="NlmUniqueID" Type="String">9109474</Item><Item Name="ISSN" Type="String">0964-198X</Item><Item Name="ESSN" Type="String"/><Item Name="PubTypeList" Type="List"><Item Name="PubType" Type="String">Journal Article</Item><Item Name="PubType" Type="String">Review</Item></Item><Item Name="RecordStatus" Type="String">PubMed - indexed for MEDLINE</Item><Item Name="PubStatus" Type="String">ppublish</Item><Item Name="ArticleIds" Type="List"><Item Name="pubmed" Type="String">11482001</Item><Item Name="eid" Type="String">11482001</Item><Item Name="rid" Type="String">11482001</Item></Item><Item Name="History" Type="List"><Item Name="pubmed" Type="Date">2001/08/03 10:00</Item><Item Name="medline" Type="Date">2002/01/23 10:01</Item><Item Name="entrez" Type="Date">2001/08/03 10:00</Item></Item><Item Name="References" Type="List"/><Item Name="HasAbstract" Type="Integer">1</Item><Item Name="PmcRefCount" Type="Integer">3</Item><Item Name="FullJournalName" Type="String">Adverse drug reactions and 
toxicological reviews</Item><Item Name="ELocationID" Type="String"/><Item Name="SO" Type="String">2001 Jun;20(2):89-103</Item></DocSum></eSummaryResult>
Thanks for any help, I've been dying trying to find an answer.
There is no automatic way to do this; the structure of the XML does not match the structure of your required hash. You must pick out the desired nodes from the XML manually and construct the hash from their values. Using XPath is probably easiest; the code might look something like this:
# assumes detail_doc was parsed with Nokogiri::XML, which preserves the case
# of element and attribute names (Nokogiri::HTML downcases them)
@details = []
detail_doc.xpath("/eSummaryResult/DocSum").each do |node|
  detail = {}
  detail[:title]   = node.xpath("Item[@Name='Title']").text
  detail[:journal] = node.xpath("Item[@Name='FullJournalName']").text
  detail[:authors] = node.xpath("Item[@Name='AuthorList']/Item[@Name='Author']").map { |n| n.text }
  @details.push(detail)
end
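For the two DocSum records in the sample XML above, @details would come out something like this:
@details
# => [{:title=>"Zirconium granuloma following treatment of rhus dermatitis.",
#      :journal=>"Archives of dermatology",
#      :authors=>["LoPresti PJ", "Hambrick GW Jr"]},
#     {:title=>"Adverse and beneficial effects of plant extracts on skin and skin disorders.",
#      :journal=>"Adverse drug reactions and toxicological reviews",
#      :authors=>["Mantle D", "Gok MA", "Lennard TW"]}]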
I would like to check two arrays with similar values against each other and return the items that differ between the two. However, some items have similar names, and I want those excluded as well.
Example:
pantry = ["apples", "chedder cheese mild", "flour", "salt"]
recipe = ["bacon", "chedder cheese sharp", "flour", "chocolate"]
#=> desired return ["apples","bacon", "chocolate", "salt"]
What I get using (pantry - recipe) | (recipe - pantry) is #=> ["apples", "chedder cheese mild", "salt", "bacon", "chedder cheese sharp", "chocolate"], where the "similar" cheese items are still included.
If we assume that "similar" means multi-word strings where all but the last word are the same...
pantry = ["apples", "chedder cheese mild", "flour", "salt"]
recipe = ["bacon", "chedder cheese sharp", "flour", "chocolate"]
# group by everything before the last word ("chedder cheese mild" -> "chedder cheese");
# single-word items group under the whole string
result = (pantry + recipe).group_by { |x| x.slice(0, x.index(/\s[^\s]+\z/) || x.size) }
# keep only the groups with a single entry, i.e. items with no "similar" partner
result = result.values.select { |x| x.size == 1 }.flatten.sort
# => ["apples", "bacon", "chocolate", "salt"]
I am using Nokogiri to grab data from a webpage. I was under the impression that the following would grab the data and return it as an array; instead I am getting one big string, which is causing a few issues.
home_team = doc.css(".team-home.teams")
If I was to use
home_team = doc.css(".team-home.teams").text
I could understand the data being returned as a string. Am I looking at this the wrong way?
I have even tried
home_team = doc.css(".team-home.teams").map(&:text)
but that seems to be returning a string as well. If I was getting an array back in the console, it would be shown in array format, yes?
If someone could try this in their console:
require 'open-uri'
require 'nokogiri'
FIXTURE_URL = "http://www.bbc.co.uk/sport/football/premier-league/fixtures"
doc = Nokogiri::HTML(open(FIXTURE_URL))
home_team = doc.css(".team-home.teams").map(&:text)
#home_team = doc.css(".team-home.teams")
puts home_team
and just confirm whether the output is a string in both cases, and what the difference between the two is. I'm slightly lost at the moment.
Thanks
You are getting an array; it's just that puts is calling to_s on it. Check this out:
require 'open-uri'
require 'nokogiri'
FIXTURE_URL = "http://www.bbc.co.uk/sport/football/premier-league/fixtures"
doc = Nokogiri::HTML(open(FIXTURE_URL))
home_team = doc.css(".team-home.teams").map(&:text)
# home_team = doc.css(".team-home.teams")
puts home_team.class
puts home_team.map(&:strip).inspect
#=> Array
#=> ["Everton", "Aston Villa", "Southampton", "Stoke", "Swansea", "Man Utd", "Sunderland", "Tottenham", "Chelsea", "Wigan", "Sunderland", "Arsenal", "Man City", "Swansea", "West Ham", "Wigan", "Everton", "Aston Villa", "Southampton", "Fulham", "Reading", "Chelsea", "Newcastle", "Norwich", "Stoke", "West Brom", "Liverpool", "Tottenham", "QPR", "Man Utd", "Newcastle", "Arsenal", "Aston Villa", "Everton", "Reading", "Southampton", "Stoke", "Chelsea", "Arsenal", "Fulham", "Norwich", "QPR", "Sunderland", "Swansea", "West Brom", "West Ham", "Tottenham", "Liverpool", "Man Utd", "Man City", "Aston Villa", "Chelsea", "Everton", "Southampton", "Stoke", "Wigan", "Newcastle", "Reading", "Arsenal", "Fulham", "Liverpool", "Man Utd", "Norwich", "QPR", "Sunderland", "Swansea", "Tottenham", "West Brom", "West Ham", "Arsenal", "Aston Villa", "Everton", "Fulham", "Man Utd", "Norwich", "QPR", "Reading", "Stoke", "Sunderland", "Chelsea", "Liverpool", "Man City", "Newcastle", "Southampton", "Swansea", "Tottenham", "West Brom", "West Ham", "Wigan"]
There's a lot of white space in the data. I get an array when I do this:
home_team = doc.css(".team-home.teams").map { |team| team.text.strip }
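The same behaviour is easy to reproduce without Nokogiri at all; puts prints each array element on its own line, which looks like one big string, while p (i.e. inspect) reveals the actual structure. A minimal standalone sketch:
teams = ["Everton", "Aston Villa"]

puts teams   # one element per line, so it looks like a plain string
# Everton
# Aston Villa

p teams      # inspect output shows the real Array
# ["Everton", "Aston Villa"]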
We need a Rails plugin for US states and cities. Please see if we can find one.
Maybe this would help: http://github.com/bcardarella/decoder
Interestingly enough, the National Weather Service produces such a data source:
http://www.weather.gov/geodata/catalog/national/html/cities.htm
CityState gem: https://github.com/loureirorg/city-state
CS.states(:us)
# => {:AK=>"Alaska", :AL=>"Alabama", :AR=>"Arkansas", :AZ=>"Arizona", :CA=>"California", :CO=>"Colorado", :CT=>"Connecticut", :DC=>"District of Columbia", :DE=>"Delaware", :FL=>"Florida", :GA=>"Georgia", :HI=>"Hawaii", :IA=>"Iowa", :ID=>"Idaho", :IL=>"Illinois", :IN=>"Indiana", :KS=>"Kansas", :KY=>"Kentucky", :LA=>"Louisiana", :MA=>"Massachusetts", :MD=>"Maryland", :ME=>"Maine", :MI=>"Michigan", :MN=>"Minnesota", :MO=>"Missouri", :MS=>"Mississippi", :MT=>"Montana", :NC=>"North Carolina", :ND=>"North Dakota", :NE=>"Nebraska", :NH=>"New Hampshire", :NJ=>"New Jersey", :NM=>"New Mexico", :NV=>"Nevada", :NY=>"New York", :OH=>"Ohio", :OK=>"Oklahoma", :OR=>"Oregon", :PA=>"Pennsylvania", :RI=>"Rhode Island", :SC=>"South Carolina", :SD=>"South Dakota", :TN=>"Tennessee", :TX=>"Texas", :UT=>"Utah", :VA=>"Virginia", :VT=>"Vermont", :WA=>"Washington", :WI=>"Wisconsin", :WV=>"West Virginia", :WY=>"Wyoming"}
CS.cities(:ak, :us)
# => ["Adak", "Akhiok", "Akiachak", "Akiak", "Akutan", "Alakanuk", "Ambler", "Anchor Point", "Anchorage", "Angoon", "Atqasuk", "Barrow", "Bell Island Hot Springs", "Bethel", "Big Lake", "Buckland", "Chefornak", "Chevak", "Chicken", "Chugiak", "Coffman Cove", "Cooper Landing", "Copper Center", "Cordova", "Craig", "Deltana", "Dillingham", "Douglas", "Dutch Harbor", "Eagle River", "Eielson Air Force Base", "Fairbanks", "Fairbanks North Star Borough", "Fort Greely", "Fort Richardson", "Galena", "Girdwood", "Goodnews Bay", "Haines", "Homer", "Hooper Bay", "Juneau", "Kake", "Kaktovik", "Kalskag", "Kenai", "Ketchikan", "Kiana", "King Cove", "King Salmon", "Kipnuk", "Klawock", "Kodiak", "Kongiganak", "Kotlik", "Koyuk", "Kwethluk", "Levelock", "Manokotak", "May Creek", "Mekoryuk", "Metlakatla", "Mountain Village", "Nabesna", "Naknek", "Nazan Village", "Nenana", "New Stuyahok", "Nikiski", "Ninilchik", "Noatak", "Nome", "Nondalton", "Noorvik", "North Pole", "Northway", "Old Kotzebue", "Palmer", "Pedro Bay", "Petersburg", "Pilot Station", "Point Hope", "Point Lay", "Prudhoe Bay", "Russian Mission", "Sand Point", "Scammon Bay", "Selawik", "Seward", "Shungnak", "Sitka", "Skaguay", "Soldotna", "Stebbins", "Sterling", "Sutton", "Talkeetna", "Teller", "Thorne Bay", "Togiak", "Tok", "Toksook Bay", "Tuntutuliak", "Two Rivers", "Unalakleet", "Unalaska", "Valdez", "Wainwright", "Wasilla"]
It works with countries all over the world. Also, it uses the MaxMind database, so it's continuously updated (via the CS.update command).
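For the common Rails use case, a state dropdown, the hash from CS.states drops straight into a select helper. A minimal sketch, assuming a form builder f and a state attribute on the model (both hypothetical):
<%= f.select :state, CS.states(:us).map { |abbr, name| [name, abbr.to_s] } %>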
I just took the data from the NWS and created a Rails plugin called geoinfo, hosted on GitHub. At this point it's still a quick hack, but it contains all the NWS data in the lib/db folder if you don't want to use it as a plugin. Hope this helps.