How to interpret xticks when plotting time series in Julia with Plots.jl and TimeSeries.jl

I have the following code:
using Plots
using TimeSeries
apple_ta = readtimearray("apple.csv")
p = plot(apple_ta[:Open])
Plots.xticks(p)
1-element Vector{Tuple{Vector{Float64}, Vector{String}}}:
([735599.0, 736330.0, 737060.0, 737791.0], ["2015-01-01", "2017-01-01", "2019-01-01", "2021-01-01"])
When I print the xticks, I get the output above, but I have no idea where values like 735599 come from. At first I thought they must be timestamps of the dates on the x-axis, but they aren't. Where do they come from?
My end goal is to be able to set the xticks to whichever dates I want with the xticks! function. Does anybody know how to do this?
PS. Here are the first few lines of apple.csv:
Date,Open,Close,Volume
2015-01-02,27.84749984741211,27.332500457763672,212818400
2015-01-05,27.072500228881836,26.5625,257142000
2015-01-06,26.635000228881836,26.565000534057617,263188400
2015-01-07,26.799999237060547,26.9375,160423600
2015-01-08,27.3075008392334,27.97249984741211,237458000

The numbers returned by xticks for each date are the result of calling the Dates.value function:
using Dates
dt = Date(2015, 01, 01)
value = Dates.value(dt)
julia> println(value)
735599
So, if you want to set custom xticks when plotting dates, you can do the following:
dates = [Date(2015, 01, 01), Date(2016, 01, 01), Date(2017, 01, 01)]
ticks = Dates.value.(dates)
labels = string.(dates)
xticks!(ticks, labels, rotation=20)

To answer your question directly, Dates.value(::Date) returns the number of days since December 31st, year 0:
julia> Dates.value(Date(0, 12, 31))
0
julia> Dates.value(Date(1, 1, 1))
1
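These tick values are "Rata Die" day numbers: days counted in the proleptic Gregorian calendar so that 0001-01-01 is day 1. Python's datetime.date.toordinal() happens to use the same convention, which makes for a quick cross-check of the number in the question:

```python
from datetime import date

# toordinal() counts days with 0001-01-01 == day 1, the same
# Rata Die convention Julia's Dates.value uses for Date.
print(date(1, 1, 1).toordinal())     # 1, matching Dates.value(Date(1, 1, 1))
print(date(2015, 1, 1).toordinal())  # 735599, the first tick in the question
```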
However, it seems to me that something is off with your TimeArray: there's a plot recipe in TimeSeries for plotting data with correctly formatted xticks. Here's an MWE using the data you provided:
julia> using DataFrames, TimeSeries, Plots, DelimitedFiles
julia> df = readdlm(IOBuffer("""Date,Open,Close,Volume
2015-01-02,27.84749984741211,27.332500457763672,212818400
2015-01-05,27.072500228881836,26.5625,257142000
2015-01-06,26.635000228881836,26.565000534057617,263188400
2015-01-07,26.799999237060547,26.9375,160423600
2015-01-08,27.3075008392334,27.97249984741211,237458000
"""), ',')
6×4 Matrix{Any}:
"Date" "Open" "Close" "Volume"
"2015-01-02" 27.8475 27.3325 212818400
"2015-01-05" 27.0725 26.5625 257142000
"2015-01-06" 26.635 26.565 263188400
"2015-01-07" 26.8 26.9375 160423600
"2015-01-08" 27.3075 27.9725 237458000
julia> ts = TimeArray(DataFrame(date = Date.(df[2:end, 1]), value = df[2:end, 3]); timestamp = :date)
5×1 TimeArray{Any, 1, Date, Vector{Any}} 2015-01-02 to 2015-01-08
│ │ value │
├────────────┼─────────┤
│ 2015-01-02 │ 27.3325 │
│ 2015-01-05 │ 26.5625 │
│ 2015-01-06 │ 26.565 │
│ 2015-01-07 │ 26.9375 │
│ 2015-01-08 │ 27.9725 │
julia> plot(ts)
This produces a plot with the xticks formatted as dates (screenshot omitted here). The cropping on the right-hand side is slightly less than ideal, but this can be fixed by either adding right_margin = 5Plots.mm or rotating the xticks via xrot = 30 (or some other value).
Are you maybe on some outdated version of TimeSeries or Plots? Here are the versions used to produce the above:
(jl_ypFE6F) pkg> st
Status `/tmp/jl_ypFE6F/Project.toml`
[a93c6f00] DataFrames v1.4.4
[91a5bcdd] Plots v1.38.0
[9e3dc215] TimeSeries v0.23.1

Related

Error when trying to calculate mean and SD of environmental dataset with loop from .nc data

I was trying to calculate the mean and SD per month of a variable from an environmental dataset (an .nc file of daily sea surface temperature over two years), and the loop I used gives me the following error:
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'mean': recursive indexing failed at level 2
I have no idea where my error could be, but if you are curious, I was using the following .nc dataset (just SST for 2018-2019) from Copernicus sstdata.
Here is the script I used so far and the packages I'm using:
# Load required libraries (install the required libraries using the Packages tab, if necessary)
library(raster)
library(ncdf4)
# Open the .nc file with the environmental data
ENV = nc_open("SST.nc")
ENV
# create an index of the month for every (daily) record from 2018 to 2019 (in this dataset)
m_index = c()
for (y in 2018:2019) {
  # leap year (does not apply to this data, but included in case a larger year range is used)
  if (y %% 4 == 0) { m_index = c(m_index, rep(1:12, times = c(31,29,31,30,31,30,31,31,30,31,30,31))) }
  # non-leap year
  else { m_index = c(m_index, rep(1:12, times = c(31,28,31,30,31,30,31,31,30,31,30,31))) }
}
length(m_index) # expected length (730)
table(m_index) # expected number of records assigned to each of the twelve months
# computing of monthly mean and standard deviation.
# We first create two empty raster stack...
SST_MM = stack() # this stack will contain the twelve average SST (one per month)
SST_MSD = stack() # this stack will contain the twelve SST st. dev. (one per month)
# We run the following loop (this can take a while)
for (m in 1:12) { # for every month
  print(m) # print the current month to track the progress of the loop
  sstMean = mean(ENV[[which(m_index==m)]], na.rm=T) # mean SST over all records of the current month
  sstSd = calc(ENV[[which(m_index==m)]], sd, na.rm=T) # st. dev. of SST over all records of the current month
  # add the monthly layers to the stacks
  SST_MM = stack(SST_MM, sstMean)
  SST_MSD = stack(SST_MSD, sstSd)
}
And, as mentioned, the loop fails on the first month:
[1] 1
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'mean': recursive indexing failed at level 2
It seems that you are making things too complicated (note that nc_open returns an ncdf4 list object, not a raster stack, so indexing it with ENV[[...]] is what triggers the "recursive indexing" error). I think the easiest way to do this is with terra::tapp, like this:
library(terra)
x <- rast("SST.nc")
xmn <- tapp(x, "yearmonths", mean)
xsd <- tapp(x, "yearmonths", sd)
or more manually:
library(terra)
x <- rast("SST.nc")
y <- format(time(x),"%Y")
m <- format(time(x),"%m")
ym <- paste0(y, "_", m)
r <- tapp(x, ym, mean)

How do I setup the timestep when using DifferentialEquations.jl in Julia for an irregular time series?

Playing with the harmonic oscillator, the differential equation is driven by a regular time series w_i sampled in the millisecond range.
ζ = 1/4pi # damped ratio
function oscillator!(du, u, p, t)
    du[1] = u[2]                           # y'(t) = z(t)
    du[2] = -2*ζ*p(t)*u[2] - p(t)^2*u[1]   # z'(t) = -2ζw(t)z(t) - w(t)^2 y(t)
end
y0 = 0.0 # initial position
z0 = 0.0002 # initial speed
u0 = [y0, z0] # initial state vector
tspan = (0.0,10) # time interval
dt = 0.001 # timestep
w = t -> freq[Int(floor(t/dt))+1] # time series
prob = ODEProblem(oscillator!,u0,tspan,w) # define ODEProblem
sol = solve(prob,DP5(),adaptive=false,dt=0.001)
How do I set up the timestep when the parameter w_i is an irregular time series in the millisecond range?
date │ w
────────────────────────┼───────
2022-09-26T00:00:00.023 │ 4.3354
2022-09-26T00:00:00.125 │ 2.34225
2022-09-26T00:00:00.383 │ -2.0312
2022-09-26T00:00:00.587 │ -0.280142
2022-09-26T00:00:00.590 │ 6.28319
2022-09-26T00:00:00.802 │ 9.82271
2022-09-26T00:00:00.906 │ -5.21289
....................... | ........
While it's possible to disable adaptivity, and even if it were possible to force arbitrary step sizes, this isn't in general what you want to do, as it greatly limits the accuracy of the solution.
Instead, interpolate the parameter to let it take any value of t.
Fortunately, it's really simple to do!
using Interpolations
...
ts = [0, 0.1, 0.4, 1.0]
ws = [1.0, 2.0, 3.0, 4.0]
w = linear_interpolation(ts, ws)
tspan = first(ts), last(ts)
prob = ODEProblem(oscillator!, u0, tspan, w)
sol = solve(prob, DP5(), dt=0.001)
Of course, it doesn't need to be a linear interpolation.
If you still need the solution saved at particular time points, have a look at saveat for solve. E.g. saving the solution using ts used in the interpolation:
sol = solve(prob, DP5(), dt=0.001, saveat=ts)
Edit: Follow up on comment:
Mathematically, you are always making some assumption about w(t) over the entire domain tspan. There is no such thing as being "driven by a time series".
For example, a standard Runge-Kutta method will require the ODE function to be evaluated at intermediate points such as t + h/2 within each step. For the better DP5() it is evaluated at several more sub-steps. This is of course unavoidable, regardless of whether adaptivity is used or not.
Try adding println(t) into your ODE function and you will see this.
In case someone comes from MATLAB's ode45: note that it simply still uses adaptivity, and just treats explicit time steps the same way saveat does. And, of course, it will evaluate the function at various t outside of the explicit steps as well.
So even in your first example, you are interpolating your w. You are making a strange type of constant interpolation (but with floor, which, combined with floats, will cause other issues, since floor(n*dt/dt) might evaluate to n or n-1).
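That floor pitfall is ordinary IEEE double-precision behavior and is identical in Julia; a quick illustration of it, shown in Python only for brevity:

```python
import math

dt = 0.1
t = 0.3  # a sample time that "should" be step 3
# 0.3 and 0.1 are not exactly representable, so the quotient
# lands just below 3.0 and floor truncates it to 2.
print(t / dt)              # slightly less than 3.0
print(math.floor(t / dt))  # 2, i.e. n-1 instead of n
```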
And even if you were to pick a method that only will try to evaluate at exactly the predetermined time steps, say e.g. ExplicitEuler(), you are still implicitly making the same assumption that w(t) is constant up until the next time step.
Only now, you are also getting a much worse solution from just the ODE integration.
If a constant-previous type of interpolation really is how w(t) is defined over the entire domain (which is what you did with floor(t/dt) here), then what we have is:
w = extrapolate(interpolate((ts,), ws, Gridded(Constant{Previous}())), Flat())
There is simply no mathematical way to ignore what happens across a time-step, and there is no reason to limit the time-stepping to the sample points of our "load" function. That restriction is not natural or correct in any mathematical sense.
u'(t) has to be defined on the entire domain we integrate over.

Find all circular paths that contain only unique relationships

I have the following code to initiate the database
create
(C1: Company{name:'Company A'}),
(C2: Company{name:'Company B'}),
(C3: Company{name:'Company C'}),
(C4: Company{name:'Company D'}),
(C1)-[:Sell{contract:"TA1801"}]->(C2),
(C2)-[:Sell{contract:"TA1802"}]->(C3),
(C3)-[:Sell{contract:"TA1803"}]->(C1),
(C3)-[:Sell{contract:"TA1804"}]->(C4),
(C1)-[:Sell{contract:"TA1805"}]->(C4),
(C4)-[:Sell{contract:"TA1806"}]->(C1)
Let's say I would like to find only the unique path for "Company A"
MATCH path = (start:Company{name:"Company A"})-[r:Sell*]->(end:Company{name:"Company A"})
RETURN path
It returns five paths:
╒══════════════════════════════════════════════════════════════════════╕
│"path" │
╞══════════════════════════════════════════════════════════════════════╡
│[{"name":"Company A"},{"contract":"TA1805"},{"name":"Company D"},{"nam│
│e":"Company D"},{"contract":"TA1806"},{"name":"Company A"}] │
├──────────────────────────────────────────────────────────────────────┤
│[{"name":"Company A"},{"contract":"TA1805"},{"name":"Company D"},{"nam│
│e":"Company D"},{"contract":"TA1806"},{"name":"Company A"},{"name":"Co│
│mpany A"},{"contract":"TA1801"},{"name":"Company B"},{"name":"Company │
│B"},{"contract":"TA1802"},{"name":"Company C"},{"name":"Company C"},{"│
│contract":"TA1803"},{"name":"Company A"}] │
├──────────────────────────────────────────────────────────────────────┤
│[{"name":"Company A"},{"contract":"TA1801"},{"name":"Company B"},{"nam│
│e":"Company B"},{"contract":"TA1802"},{"name":"Company C"},{"name":"Co│
│mpany C"},{"contract":"TA1803"},{"name":"Company A"}] │
├──────────────────────────────────────────────────────────────────────┤
│[{"name":"Company A"},{"contract":"TA1801"},{"name":"Company B"},{"nam│
│e":"Company B"},{"contract":"TA1802"},{"name":"Company C"},{"name":"Co│
│mpany C"},{"contract":"TA1803"},{"name":"Company A"},{"name":"Company │
│A"},{"contract":"TA1805"},{"name":"Company D"},{"name":"Company D"},{"│
│contract":"TA1806"},{"name":"Company A"}] │
├──────────────────────────────────────────────────────────────────────┤
│[{"name":"Company A"},{"contract":"TA1801"},{"name":"Company B"},{"nam│
│e":"Company B"},{"contract":"TA1802"},{"name":"Company C"},{"name":"Co│
│mpany C"},{"contract":"TA1804"},{"name":"Company D"},{"name":"Company │
│D"},{"contract":"TA1806"},{"name":"Company A"}] │
└──────────────────────────────────────────────────────────────────────┘
However, you can see that the Sell relationships with contracts TA1806, TA1801 and TA1802 are repeated more than once across the results.
A specific example: TA1806 appears in routes 1, 2, 4 and 5, and TA1801 appears in routes 2, 3, 4 and 5.
What I hope for is that the paths contain only unique relationships, keeping the shortest such path (initially I wanted the longest, but it seems that the complexity increases a lot):
╒══════════════════════════════════════════════════════════════════════╕
│"path" │
╞══════════════════════════════════════════════════════════════════════╡
│[{"name":"Company A"},{"contract":"TA1805"},{"name":"Company D"},{"nam│
│e":"Company D"},{"contract":"TA1806"},{"name":"Company A"}] │
├──────────────────────────────────────────────────────────────────────┤
│[{"name":"Company A"},{"contract":"TA1801"},{"name":"Company B"},{"nam│
│e":"Company B"},{"contract":"TA1802"},{"name":"Company C"},{"name":"Co│
│mpany C"},{"contract":"TA1803"},{"name":"Company A"}] │
└──────────────────────────────────────────────────────────────────────┘
For longest path you can just order the paths by length and take the longest, but APOC helps for the duplication check (excepting the start node, since you want a circuit):
MATCH path = (start:Company{name:"Company A"})-[r:Sell*]->(end:Company{name:"Company A"})
WHERE NOT apoc.coll.containsDuplicates(tail(nodes(path)))
WITH path
ORDER BY length(path) DESC
LIMIT 1
RETURN path
The pure-Cypher approach for not repeating nodes in the path is rather ugly:
MATCH path = (start:Company{name:"Company A"})-[r:Sell*]->(end:Company{name:"Company A"})
WHERE all(node in tail(nodes(path)) WHERE single(x in tail(nodes(path)) WHERE x = node))
WITH path
ORDER BY length(path) DESC
LIMIT 1
RETURN path
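The condition both queries express, circuits whose intermediate nodes are all distinct, is easy to state outside Cypher as well. A small illustrative sketch of the same search over the question's graph (Python, purely to show the logic, not a Neo4j API):

```python
# The question's Sell relationships: node -> [(neighbour, contract)]
graph = {
    "A": [("B", "TA1801"), ("D", "TA1805")],
    "B": [("C", "TA1802")],
    "C": [("A", "TA1803"), ("D", "TA1804")],
    "D": [("A", "TA1806")],
}

def simple_circuits(start):
    """All circuits from `start` back to `start` with no repeated node
    (except the start itself), mirroring the APOC duplicate check."""
    out = []
    def dfs(node, visited, path):
        for nxt, contract in graph[node]:
            if nxt == start:
                out.append(path + [contract])       # closed a circuit
            elif nxt not in visited:
                dfs(nxt, visited | {nxt}, path + [contract])
    dfs(start, {start}, [])
    return out

# Shortest first; the last entry is what ORDER BY length(path) DESC LIMIT 1 returns.
for p in sorted(simple_circuits("A"), key=len):
    print(p)
# ['TA1805', 'TA1806']
# ['TA1801', 'TA1802', 'TA1803']
# ['TA1801', 'TA1802', 'TA1804', 'TA1806']
```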

How to use foreach / forv to replace duplicates in an increasing order

I have a "raw" data set that I'm trying to clean. It consists of around 20,000 individuals observed with an age variable for each year between 2000 and 2010, and all of them share the same problem: the variable age does not increase over the years 2004-2006. For example, for one individual it looks like this:
2000: 16,
2001: 17,
2002: 18,
2003: 19,
2004: 19,
2005: 19,
2006: 19,
2007: 23,
2008: 24,
2009: 25,
2010: 26,
So far I have tried to generate variables for the max age and max year:
bysort id: egen last_year=max(year)
bysort id: egen last_age=max(age)
and then use foreach combined with lags to try to fill in the age variable in decreasing order, so that the new variable last_age (which is now 26 in all years) instead looks like this:
2010: 26
2009: 25 (26-1)
2008: 24 (26-2) , and so on.
However, I am having trouble finding the correct code for this.
Assuming that for each individual the first value of age is not missing and is correct, something like this might work
bysort id (year): replace age = age[1]+(year-year[1])
Alternatively, if the last value of age is assumed to always be accurate,
bysort id (year): replace age = age[_N]-(year[_N]-year)
Or, just fix the ages where there is no observation-to-observation change in age
bysort id (year): replace age = age[_n-1]+(year-year[_n-1]) if _n>1 & age==age[_n-1]
In the absence of sample data none of these have been tested.
William's code is very much to the point, but a few extra remarks won't fit easily into a comment.
Suppose we have age already and generate two other estimates going forward and backward as he suggests:
bysort id (year): gen age2 = age[1] + (year - year[1])
bysort id (year): gen age3 = age[_N] - (year[_N] - year)
Now if all three agree, we are good, and if two out of three agree, we will probably use the majority vote. Either way, that is the median; the median will be, for 3 values, the sum MINUS the minimum MINUS the maximum.
gen median = (age + age2 + age3) - max(age, age2, age3) - min(age, age2, age3)
If we get three different estimates, we should look more carefully.
edit age* if max(age, age2, age3) > median & median > min(age, age2, age3)
A final test is whether medians increase in the same way as years:
bysort id (year) : assert (median - median[_n-1]) == (year - year[_n-1]) if _n > 1
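The three-value median identity used above (sum minus max minus min) is language-independent; here is a quick sanity check of the logic in Python, using the 2005 row of the example individual (observed age 19, forward estimate 16+5, backward estimate 26-5):

```python
def median3(a, b, c):
    # median of three values = sum minus max minus min
    return (a + b + c) - max(a, b, c) - min(a, b, c)

# 2005: observed age 19, forward estimate 21, backward estimate 21;
# the two estimates outvote the bad observed value.
print(median3(19, 16 + 5, 26 - 5))  # 21
```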

How to generate a line chart with too much data in Ruby / Ruby on Rails?

I'm trying to generate a line chart with data I take from a database.
The data basically have a date field, an estimated progress field and a real progress field.
The progresses may be nil but the date is always there.
Since I don't know what the intervals between the dates are, and I need the dates distributed uniformly, I want to expand the data from the first date to the last date in steps of one day.
For example, let's say I have this in the database:
| date | estimated progress | real progress |
| 2012-08-01 | 0.0 | |
| 2012-08-02 | | 0.15 |
| 2012-08-05 | 0.3 | |
I would like to generate a line chart with this info:
x = [2012-08-01, 2012-08-02, 2012-08-03, 2012-08-04, 2012-08-05]
ep = [0.0 , 0.0, 0.0, 0.0, 0.3]
rp = [nil , 0.15, 0.15, 0.15, 0.15 ]
But since the start date and the finish date can be far apart, I'd like to show the x labels at a custom interval. It could be every 3, 5 or 7 days, depending on the distance between those dates.
I'm trying this with gchartrb, which uses the Google Chart API, but I realized I can't have nil values inside my data, so I would have to replace them with 0.0 even though the value isn't 0; it's unknown.
The other problem I found is that I don't know how to make the labels show only those intervals: it just shows me every label, and therefore it's not readable.
I'm looking for another gem, a solution for gchartrb or ideas to generate the data differently and make it understandable.
Maybe you should check this link:
http://railscasts.com/episodes/223-charts.
Another good library for charts is Flotr2, together with the flotr2-rails gem.
For Flotr2 it is worth checking the example with time/date labels on the axes.
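Whichever charting library is used, the daily expansion with carried-forward values that the question describes can be computed up front. A minimal sketch of that fill logic, shown here in Python purely to illustrate (the same steps translate directly to Ruby), using the question's three rows:

```python
from datetime import date, timedelta

# date -> (estimated progress, real progress); None means unknown
rows = {
    date(2012, 8, 1): (0.0, None),
    date(2012, 8, 2): (None, 0.15),
    date(2012, 8, 5): (0.3, None),
}

start, finish = min(rows), max(rows)
xs, ep, rp = [], [], []
last_ep, last_rp = None, None
d = start
while d <= finish:
    est, real = rows.get(d, (None, None))
    # carry the last known value forward across missing days
    last_ep = est if est is not None else last_ep
    last_rp = real if real is not None else last_rp
    xs.append(d)
    ep.append(last_ep)
    rp.append(last_rp)
    d += timedelta(days=1)

print(ep)  # [0.0, 0.0, 0.0, 0.0, 0.3]
print(rp)  # [None, 0.15, 0.15, 0.15, 0.15]
```

The x labels can then be thinned to every 3, 5 or 7 days by emitting a label only when the point's index is a multiple of the chosen step.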
