Google sheets importHTML removes zero and treats commas as decimal - google-sheets

I'm trying to import a table where the commas are the 1000 separator,
example: 32,100 is 32100 but it is treating it as 32.1 instead.
This is a similar table (first one / top left):
https://en.wikipedia.org/wiki/Demographics_of_the_world
imgur for screenshots:
https://imgur.com/a/hJR9tox
I want it to say:
Year million
1500 458
1600 580
1700 682
1750 791
1800 978
1850 1262
1900 1650
1950 2521
1999 5978
2008 6707
2011 7000
2015 7350
2018 7600
2020 7750
But it comes out as:
Year million
1500 458
1600 580
1700 682
1750 791
1800 978
1850 1,262
1900 1,65
1950 2,521
1999 5,978
2008 6,707
2011 7
2015 7,35
2018 7,6
2020 7,75
This is the function I'm using:
=IMPORTHTML("https://en.wikipedia.org/wiki/Demographics_of_the_world"; "table"; 1)
I have also tried using this function:
=IMPORTXML("https://en.wikipedia.org/wiki/Demographics_of_the_world"; "//*[#id='mw-content-text']/div/table[1]/tbody")
But that shows as this witch is extremely hard to understand since it looks like this and still removes the zeros:
World Population[1][2] Yearmillion 1500458 1600580 1700682 1750791 1800978 18501,262 19001,65 19502,521 19995,978 20086,707 20117 20157,35 20187,6 20207,75
Other things i have tried is:
forsing it to always print out three decimals, that wont work since it adds more numbers to the end of all numbers.

The main & easiest possible solution that you have is to change your Spreadsheet's locale setting to one that uses the , as mile separator.
As an alternative, if changing this setting is really not a possibility, you could create a script that uses URLFetchApp to retrieve the page's contents and parses the values, taking into considerations the usage of , as mile separator.

Related

Averaging a Data Series in a Google Sheet to a single entry per period regardless of the number of samples in the larger period?

I have a small data set of ~200 samples taken over twenty years with two columns of data that sometimes have multiple entries for the period (i.e. age or date). When I go to plot it, even though the data is over 20 years the graph heavily reflects the number of samples in the period and not the period itself. For example during age 23 there may be 2 or 3 samples, 1 for age 24, 20 for age 25, and 10 for age 35.. the number of samples entirely on needs for additional data at the time.. so simply there is no consistency to the sample rate.
How do I get an Max or an Average / Max for a period (age) and ensure there is only one entry per period in the sheet (about one entry per year) without having to create a separate sheet full of separate queries and charting off of that?
What I have tried in Google Sheets (where my data is) is on the x-series chart choosing "aggregate" (which is on the age period) which helps flatten the graph a bit, but doesn't reduce the series.
A read only link to the the spreadsheet is HERE for reference.
Data Looking something like this:
3/27/2013 36.4247 2.5 29.3
4/10/2013 36.4630 1.8 42.8
4/15/2013 36.4767 2.2 33.9
5/2/2013 36.5233 2.2 33.9
5/21/2013 36.5753 1.91 39.9
5/29/2013 36.5973 1.94 39.2
7/29/2013 36.7644 1.98 38.3
10/25/2013 37.0055 1.7 45.6
2/28/2014 37.3507 1.85 50 41.3
6/1/2014 37.6055 1.98 38 38.1
12/1/2014 38.1068 37
6/1/2015 38.6055 2.18 34 33.9
12/11/2015 39.1342 3.03 23 23.1
12/14/2015 39.1425 3.18 22 21.9
12/15/2015 39.1452 3.44 20 20.0
12/17/2015 39.1507 3.61 19 18.9
12/21/2015 39.1616 3.62 19 18.8
12/23/2015 39.1671 3.32 21 20.8
12/25/2015 39.1726 3.08 23 22.7
12/28/2015 39.1808 3.12 22 22.4
12/29/2015 39.1836 2.97 24 23.7
12/30/2015 39.1863 3.57 19 19.1
12/31/2015 39.1890 3.37 20 20.5
1/1/2016 39.1918 3.37 20 20.5
1/3/2016 39.1973 2.65 27 27.0
1/4/2016 39.2000 2.76 26 25.8
try:
=QUERY(SORTN(SORT({YEAR($A$6:$A), B6:B}, 1, 0, 2, 0), 9^9, 2, 1, 1),
"where Col1 <> 1899")
demo spreadsheet
and build a chart from there

Savitzky - Golay filter for 2D Matrices

i am doing some research about implementing a Savitzky-Golay filter for images. As far as i have read, the main application for this filter is signal processing, e.g. for smoothing audio-files.
The idea is fitting a polynomial through a defined neighbourhood around point P(i) and setting this point P to his new value P_new(i) = polynomial(i).
The problem in 2D-space is - in my opinion - that there is not only one direction to do the fitting. You can use different "directions" to find a polynomial. Like for
[51 52 11 33 34]
[41 42 12 24 01]
[01 02 PP 03 04]
[21 23 13 43 44]
[31 32 14 53 54]
It could be:
[01 02 PP 03 04], (horizontal)
[11 12 PP 23 24], (vertical)
[51 42 PP 43 54], (diagonal)
[41 42 PP 43 44], (semi-diagonal?)
but also
[41 02 PP 03 44], (semi-diagonal as well)
(see my illustration)
So my question is: Does the Savitzky-Golay filter even make sense for 2D-space, and if yes, is there and any defined generalized form for this filter for higher dimensions and larger filter masks?
Thank you !
A first option is to use SG filtering in a separable way, i.e. filtering once on the horizontal rows, then a second time on the vertical rows.
A second option is to rewrite the equations with a bivariate polynomial (bicubic f.i.) and solve for the coefficients by least-squares.

Estimating mortality with acmeR package

There is a relatively new package that has come out called acmeR for producing estimates of mortality (for birds and bats), and it takes into consideration things like bleedthrough (was the carcass still there but undetected, and then found in a later search?), diminishing searcher efficiency, etc. This is extremely useful, except I can't seem to get it to work, despite it seeming to be pretty straightforward. The data structure should be like:
Date, in US format mm/dd/yyyy or ISO 8601 format yyyy-mm-dd
Time, in am/pm US format HH:MM:SS AM or 24-hr ISO format HH:MM:SS
ID, arbitrary distinct alpha strings unique to each carcas
Species, arbitrary distinct alpha strings (e.g. AOU, ABMP, IBP)
Event, “Place”, “Check”, or “Search” (only 1st letter counts)
Found, TRUE or FALSE (only 1st letter counts)
and look something like this:
“Date”,“Time”,“ID”,“Species”,“Event”,“Found”
“1/7/2011”,“08:00:00 PM”,“T091”,“UNBA”,“Place”,TRUE
“1/8/2011”,“12:00:00 PM”,“T091”,“UNBA”,“Check”,TRUE
“1/8/2011”,“16:00:00 PM”,“T091”,“UNBA”,“Search”,FALSE
My data look like this:
Date: Date, format: "2017-11-09" "2017-11-09" "2017-11-09" ...
Time: Factor w/ 644 levels "1:00 PM","1:01 PM",..: 467 491 518 89 164 176 232 39 53 247 ...
Species: Factor w/ 52 levels "AMCR","AMKE",..: 31 27 39 27 39 31 39 45 27 24 ...
ID: Factor w/ 199 levels "GHBT001","GHBT002",..: 1 3 2 3 2 1 2 7 3 5 ...
Event: Factor w/ 3 levels "Check","Place",..: 2 2 2 3 3 3 1 2 1 2 ...
Found: logi TRUE TRUE TRUE FALSE TRUE TRUE ...
I have played with the date, time, event, etc formats too, trying multiple formats because I have had some issues there. So here are some of the errors I have encountered, depending on what subset of data I use:
Error in optim(p0, f, rd = rd, method = "BFGS", hessian = TRUE) :non-finite value supplied by optim In addition: Warning message: In log(c(a0, b0, t0)) : NaNs produced
Error in read.data(fname, spec = spec, blind = blind) : Expecting date format YYYY-MM-DD (ISO) or MM/DD/YYYY (USA) USA
Error in solve.default(rv$hessian): system is computationally singular: reciprocal condition number = 1.57221e-20
Warning message: # In sqrt(diag(Sig)[2]) : NaNs produced
Error in solve.default(rv$hessian) : Lapack routine dgesv: system is exactly singular: U[2,2] = 0
The last error is most common (and note, my data are non-numeric, sooooo... I am not sure what math is being done behind the scenes to come up with these equations, then fail in the solve), but the others are persistent too. Sometimes, despite the formatting being exactly the same in one dataset, a subset of that data will return an error when the parent dataset does not (does not have anything to do with species being there/not being there in one or the other dataset, as far as I can tell).
I cannot find any bug reports or issues with the acmeR package out there - so maybe it runs perfectly and my data are the problem, but after three ecologists have vetted the data and pronounced it good, I am at a dead end.
Here is a subset of the data, so you can see what they look like:
Date Time Species ID Event Found
8 2017-11-09 1:39 PM VATH GHBT007 P T
11 2017-11-09 2:26 PM CORA GHBT004 P T
12 2017-11-09 2:30 PM EUST GHBT006 P T
14 2017-11-09 6:43 AM CORA GHBT004 S T
18 2017-11-09 8:30 AM EUST GHBT006 S T
19 2017-11-09 9:40 AM CORA GHBT004 C T
20 2017-11-09 10:38 AM EUST GHBT006 C T
22 2017-11-09 11:27 AM VATH GHBT007 S F
32 2017-11-09 10:19 AM EUST GHBT006 C F

Lookup data from two ranges and if text matches, then assign numeric value

I am trying to calculate points in a Formula 1 racing league. I'm having trouble with a bonus 15 points if a constructor qualifies 1st and finishes the race 1st. The issue is there could be two different drivers who do this. For example. As you can see, HAM qualified 1st and ROS finished 1st in the race. Because they both drive for Mercedes, 15 points need to be awarded to Mercedes. The data can't be moved around as it's imported using an API (not in the example) but a copy of the layout can be found here
Qualifying Race Driver Team
14 1 ROS mercedes
1 15 HAM mercedes
3 3 VET ferrari
8 4 RIC red_bull
6 5 MAS williams
19 6 GRO haas
10 7 HUL force_india
16 8 BOT williams
7 9 SAI toro_rosso
5 10 VES toro_rosso
13 11 PAL renault
Put this in I2 and copy down. See if that is how you want it:
=IF(AND(VLOOKUP(1, $A$2:$H$12, 8, FALSE)=VLOOKUP(1, $B$2:$H$12, 7, FALSE), VLOOKUP(1, $B$2:$H$12, 7, FALSE)=H2, MATCH(H2, H:H, 0)=ROW(H2)), 15, 0)

Highcharts Plotline feature

I was wondering whether it'd be possible to draw multiple y axis plotlines on specific dates using highcharts.
I have 3 lines with different colors, one from 16 Apr to 30 Apr and the second one from 30 Apr to 28 May, the third from 28 May to 25 Jun
something similar to this :
Thanks for your help.
H

Resources