Filling missing data after outer join - f#

I have two time series which are at the same sampling rate. I would like to perform an outer join and then fill in any missing data (post outer join, there can be points in time where data exists in one series but not the other even though they are the same sampling rate) with the most recent previous value.
How can I perform this operation using Deedle?
Edit:
Based on this, I suppose you can re-sample before the join like so:
// Get the most recent value, sampled at 2 hour intervals
someSeries
|> Series.sampleTimeInto (TimeSpan(2, 0, 0)) Direction.Backward Series.lastValue
After doing this you can safely Join. Perhaps there is another way?

You should be able to perform the outer join on the original series (it is better to turn them into frames first, because then you get a nice multi-column frame) and then fill the missing values using Frame.fillMissing.
// Note that s1[2] is undefined and s2[3] is undefined
let s1 = series [ 1=>1.0; 3=>3.0; 5=>5.0 ]
let s2 = series [ 1=>1.1; 2=>2.2; 5=>5.5 ]
// Build frames to make joining easier
let f1, f2 = frame [ "S1" => s1 ], frame [ "S2" => s2 ]
// Perform outer join and then fill the missing data
let f = f1.Join(f2, JoinKind.Outer)
let res = f |> Frame.fillMissing Direction.Forward
The intermediate frame with missing values (f) and the final result (res) look like this:
val it : Frame<int,string> =
     S1        S2
1 -> 1         1.1
2 -> <missing> 2.2
3 -> 3         <missing>
5 -> 5         5.5

val it : Frame<int,string> =
     S1 S2
1 -> 1  1.1
2 -> 1  2.2
3 -> 3  2.2
5 -> 5  5.5
Note that the result can still contain missing values - if the first value is missing, the fillMissing function has no previous value to propagate and so the series may start with some missing values.
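To make the forward-fill behaviour concrete, here is a minimal plain-Python sketch of an outer join on keys followed by propagating the most recent value. It is not Deedle code: the helper name outer_join_ffill is made up for illustration, each series is assumed to be a dict from key to value, and None stands in for a missing value.

```python
def outer_join_ffill(s1, s2):
    """Outer-join two dict-based series on their keys, then forward-fill."""
    keys = sorted(set(s1) | set(s2))
    rows = {}
    last1, last2 = None, None
    for k in keys:
        # Use the value at this key if present, otherwise carry the previous one forward.
        last1 = s1.get(k, last1)
        last2 = s2.get(k, last2)
        rows[k] = (last1, last2)
    return rows

# Same data as the answer above.
s1 = {1: 1.0, 3: 3.0, 5: 5.0}
s2 = {1: 1.1, 2: 2.2, 5: 5.5}
filled = outer_join_ffill(s1, s2)

# A series that starts later has nothing to propagate yet,
# so its leading entries stay missing (None).
late = outer_join_ffill({2: 2.0}, {1: 1.1})
```

In the second call the first series has no value at key 1, so that entry remains None, which mirrors the caveat about series that start with missing values.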

Related

Constrain axis limits in chordDiagram (circlize) when making gifs

I hope somebody will be able to help me with this chordDiagram visualisation I am trying to create. I am well aware that maybe this visualization type was not suitable for this particular data, but somehow it was something I had in my head (or how I wanted to visualize this data) and what I wanted to create, and now I think it is too late to give it up :) too curious how one can fix it. It is my first real post here, though I am an active user of stackoverflow and I genuinely admire the audience here.
So I have this data on the change in the size of area in km2 over time (d0) and I am trying to create a GIF out of it using example here: https://guyabel.com/post/animated-directional-chord-diagrams/
The data "d0":
Time <- as.numeric(c(10,10,10,100,100,100,200,200,200,5,5,5,50,50,50,0,0,0))
Year <- as.character(c(2050,2100,2200,2050,2100,2200,2050,2100,2200,2050,2100,2200,2050,2100,2200,2050,2100,2200))
Area_km2 <- as.numeric(c(4.3075211,7.1672926,17.2780622,5.9099250,8.2909189,16.9748961,6.5400554,8.9036313,16.5627228,3.0765610,6.3929883,18.0708108,5.3520782,8.4503856,16.7938196,0.5565978,1.8415855,12.5089476))
(d0 <- as.data.frame(cbind(Time,Year,Area_km2)))
I also have the color codes stored in a separate dataframe (d1) following the above mentioned example.
The data "d1":
year <- as.numeric(c(2050,2100,2200))
order1 <- as.character(c(1,2,3))
col1 <- c("#40A4D8","#33BEB7","#0C5BCE")
(d1 <- as.data.frame(cbind(year,order1,col1)))
So the idea was to have self-linking flows within each sector increasing in size over time, which will look like just growing segments in a final animated GIF (or like growing pie segments), but I noticed that regardless how hard I try I can't seem to manage to constrain the axis of each segment to limits of that particular year in an every single frame. It seems that the axis is being added on and keeps on adding over time, which is not what I want.
Like for example in the first figure (figure0) or "starting frame" the size of the links matches well the dataframe:
figure0
So it is:
orig_year  Area_km2  .frame
2050       0.557     0
2100       1.84      0
2200       12.5      0
But when one plots next figure (figure1), the axis seems to have taken the values from the starting frame and added on the current values (4, 7.4 and 19 respectively) instead of (3.08, 6.39 and 18.1) or what should have been the values according the data frame:
figure1
orig_year  Area_km2  .frame
2050       3.08      1
2100       6.39      1
2200       18.1      1
And it keeps on doing so as one loops through the data and creates new plots for the next frames. I wonder whether it is possible to constrain the axis and create the visualization in such a way that the links just gradually increase over time and the axis follows that increase, growing gradually with the data?
Any help is highly appreciated!
Thanks.
My code:
# Sort decreasing
(d0 <- arrange(d0,Time))
# Copy columns
(d0$Dest_year <- d0$Year)
# Re-arrange data
library(tweenr)
(d2 <- d0 %>%
  mutate(corridor=paste(Year,Dest_year,sep="->")) %>%
  dplyr::select(Time,corridor,Area_km2) %>%
  mutate(ease="linear") %>%
  tweenr::tween_elements('Time','corridor','ease',nframes=30) %>%
  tibble::as_tibble())
(d2 <- d2 %>%
  separate(col=.group,into=c("orig_year","dest_year"),sep="->") %>%
  dplyr::select(orig_year,dest_year,Area_km2,everything()))
d2$Time <- NULL
# Create a directory to store the individual plots
dir.create("./plot-gif/")
# Fixing scales
scale_gap <- function(Area_km2_m,Area_km2_max,gap_at_max=1,gaps=NULL) {
  p <- Area_km2_m/Area_km2_max
  if(length(gap_at_max)==1 & !is.null(gaps)) {
    gap_at_max <- rep(gap_at_max,gaps)
  }
  gap_degree <- (360-sum(gap_at_max))*(1-p)
  gap_m <- (gap_degree + sum(gap_at_max))/gaps
  return(gap_m)
}
# Function to derive the size of gaps in each frame for an animated GIF
(d3 <- d2 %>% group_by(orig_year) %>% mutate(gaps=scale_gap(Area_km2_m=Area_km2,Area_km2_max=max(.$Area_km2),gap_at_max=4,gaps=9)))
library(magrittr)
# Get the values for axis limits
(axmax <- d2 %>% group_by(orig_year,.frame) %>% mutate(max=mean(Area_km2)))
# Creating unique chordDiagrams for each frame
library(circlize)
for(f in unique(d2$.frame)){
  png(file=paste0("./plot-gif/figure",f,".png"),height=7,width=7,units="in",res=500)
  circos.clear()
  par(mar=rep(0,4),cex=1)
  circos.par(start.degree=90,track.margin=c(-0.1,0.1),
             gap.degree=filter(d3,.frame==f)$gaps,
             points.overflow.warning=FALSE)
  chordDiagram(x=filter(d2,.frame==f),directional=2,order=d1$year,
               grid.col=d1$col1,annotationTrack=c("grid","name","axis"),
               transparency=0.25,annotationTrackHeight=c(0.05,0.1),
               direction.type=c("diffHeight"),
               diffHeight=-0.04,link.sort=TRUE,
               xmax=axmax$max)
  dev.off()
}
# Now make a GIF
library(magick)
img <- image_read(path="./plot-gif/figure0.png")
for(f in unique(d2$.frame)[-1]){
  img0 <- image_read(path=paste0("./plot-gif/figure",f,".png"))
  img <- c(img,img0)
  message(f)
}
img1 <- image_scale(image=img,geometry="720x720")
ani0 <- image_animate(image=img1,fps=10)
image_write(image=ani0,path="./plot-gif/figure.gif")

I will start with your d0 object. I first construct d0 without converting everything to character, keeping the columns in their original numeric format. I also reorder d0 by Time and Year:
Time = c(10,10,10,100,100,100,200,200,200,5,5,5,50,50,50,0,0,0)
Year = c(2050,2100,2200,2050,2100,2200,2050,2100,2200,2050,2100,2200,2050,2100,2200,2050,2100,2200)
Area_km2 = c(4.3075211,7.1672926,17.2780622,5.9099250,8.2909189,16.9748961,6.5400554,8.9036313,16.5627228,3.0765610,6.3929883,18.0708108,5.3520782,8.4503856,16.7938196,0.5565978,1.8415855,12.5089476)
d0 = data.frame(Time = Time,
                Year = Year,
                Area_km2 = Area_km2,
                Dest_year = Year)
d0 = d0[order(d0$Time, d0$Year), ]
The key thing is to calculate proper values for "gaps" between sectors so that the same unit from data corresponds to the same degree in different plots.
We first calculate the maximal total width of the circular plot:
width = tapply(d0$Area_km2, d0$Time, sum)
max_width = max(width)
We assume there are n sectors (where n = 3 in d0). We set the first n-1 gaps to 2 degrees and dynamically adjust the last gap according to the total amount of values in each plot. For the plot with the largest total value, the last gap is also 2 degrees.
n = 3
degree_per_unit = (360 - n*2)/max_width
Now degree_per_unit can be shared between multiple plots. For each plot, we calculate the value of last_gap:
library(circlize)
for(t in sort(unique(Time))) {
    l = d0$Time == t
    d0_current = d0[l, c("Year", "Dest_year", "Area_km2")]
    # The last gap absorbs whatever the sectors and the fixed gaps leave unused
    last_gap = 360 - (n-1)*2 - sum(d0_current$Area_km2)*degree_per_unit
    circos.par(gap.after = c(rep(2, n-1), last_gap))
    chordDiagram(d0_current, grid.col = c("2050" = "red", "2100" = "blue", "2200" = "green"))
    circos.clear()
    title(paste0("Time = ", t, ", Sum = ", sum(d0_current$Area_km2)))
    Sys.sleep(1)
}
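The gap arithmetic above can be checked without plotting anything. The following is a small Python sketch of the same calculation (not circlize code; the variable names simply mirror the R ones), using the Time and Area_km2 values from the question. It computes the shared degree_per_unit and the last gap for every Time value.

```python
from collections import defaultdict

time = [10, 10, 10, 100, 100, 100, 200, 200, 200, 5, 5, 5, 50, 50, 50, 0, 0, 0]
area = [4.3075211, 7.1672926, 17.2780622, 5.9099250, 8.2909189, 16.9748961,
        6.5400554, 8.9036313, 16.5627228, 3.0765610, 6.3929883, 18.0708108,
        5.3520782, 8.4503856, 16.7938196, 0.5565978, 1.8415855, 12.5089476]

n = 3          # number of sectors
small_gap = 2  # degrees used for the first n-1 gaps

# Total width of the circular plot for each Time value.
width = defaultdict(float)
for t, a in zip(time, area):
    width[t] += a

max_width = max(width.values())
degree_per_unit = (360 - n * small_gap) / max_width

# The last gap takes up whatever the sectors and fixed gaps leave unused.
last_gap = {t: 360 - (n - 1) * small_gap - w * degree_per_unit
            for t, w in sorted(width.items())}

for t, g in last_gap.items():
    print(f"Time = {t}: last_gap = {g:.2f} degrees")
```

Because degree_per_unit is fixed, one unit of Area_km2 spans the same angle in every frame; only the last gap shrinks as the total grows, reaching 2 degrees for the Time value with the largest total.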

How to avoid connecting all the points of a function graph with Plotly

In a program that revolves around maths, I find myself using Plotly.NET (F#) to display user-defined functions. This works quite well, but there are cases where a function has discontinuities or even chunks defined over certain regions. For example, for the function f(x) defined by 0 if x <= 0 and 10 elsewhere, the expected graph (I used Wolfram Alpha here) is:
With Plotly and the code below,
let fn x = if x <= 0.0 then 0.0 else 10.0
let xs = [ -10.0 .. 0.1 .. 10.0 ]
let ys = Seq.map fn xs
Chart.Line(xs, ys, UseDefaults = false)
|> Chart.withTitle @"$f(x)$"
|> Chart.savePNG("example")
I get this graph:
As you can see, Plotly connects two points that shouldn't be connected (and I don't blame it, that's how the lib works). I wonder then how to avoid this kind of behaviour, which often happens with piecewise defined functions.
If possible, I would like a solution that is general enough to be applied to all functions / graphs, as my program does not encode functions in advance, the user enters them. The research I've done doesn't lead me anywhere, unfortunately, and the documentation doesn't show an example for what I want.
PS: as you may have noticed, Plotly also doesn't display the LaTeX in the exported image. According to my research this is a known issue with Python, but if you know how to solve it with the .NET version of the lib, I'm also interested!

I don't think there's any way for Plotly to know that the function is discontinuous. Note that the vertical portion of your chart isn't truly vertical, because x jumps from 0.0 to 0.1.
However, you can still achieve the effect you're looking for by creating a separate chart for each piece of the function, and then combining them:
let color = Color.fromString "Blue"
let xsA = [ -10.0 .. 0.0 ]
let ysA = xsA |> Seq.map (fun _ -> 0.0)
let chartA = Chart.Line(xsA, ysA, LineColor = color)
let xsB = [ 0.0 .. 10.0 ]
let ysB = xsB |> Seq.map (fun _ -> 10.0)
let chartB = Chart.Line(xsB, ysB, LineColor = color)
[ chartA; chartB ]
|> Chart.combine
|> Chart.withLegend false
|> Chart.show
Note that there are actually two distinct points for x = 0 in the combined chart, so it's technically not a function. (Perhaps there's some way to show that the top piece is open, while the bottom piece is closed in Plotly, but I don't know how.) Result is:
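Since the functions are entered by the user, the pieces are not known in advance. One possible heuristic, sketched here in plain Python rather than Plotly.NET, is to split the sampled points into runs wherever two consecutive y values differ by more than a threshold, and then draw each run as its own line chart and combine them as above. The helper name split_at_jumps and the max_jump parameter are assumptions made for this illustration, not part of any Plotly API, and a steep but continuous function could be split by mistake if the threshold is too small.

```python
def split_at_jumps(xs, ys, max_jump):
    """Split sampled points into runs, starting a new run whenever
    consecutive y values differ by more than max_jump."""
    segments = []
    current = []
    prev_y = None
    for x, y in zip(xs, ys):
        if prev_y is not None and abs(y - prev_y) > max_jump:
            # Large jump between neighbouring samples: close the current run.
            segments.append(current)
            current = []
        current.append((x, y))
        prev_y = y
    if current:
        segments.append(current)
    return segments

def fn(x):
    return 0.0 if x <= 0.0 else 10.0

# Sample on [-10, 10] with step 0.1, built from integers to avoid float drift.
xs = [i / 10 for i in range(-100, 101)]
ys = [fn(x) for x in xs]
segments = split_at_jumps(xs, ys, max_jump=5.0)
```

For the step function in the question this yields two runs, one for x <= 0 and one for x > 0, so no line is drawn across the discontinuity.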

st_buffer multipoint with different distance

I have a sfc_multipoint object and want to use st_buffer but with different distances for every single point in the multipoint object.
Is that possible?
The multipoint object are coordinates.
table = data
Every coordinate point (columns "lon" and "lat" in the table) should have a buffer with a different size. The buffer size is contained in the column "dist" of the table.
The table is called data.
This is my code:
library(sf)
coords <- matrix(c(data$lon,data$lat), ncol = 2)
tt <- st_multipoint(coords)
sfc <- st_sfc(tt, crs = 4326)
dt <- st_sf(data.frame(geom = sfc))
web <- st_transform(dt, crs = 3857)
geom <- st_geometry(web)
buf <- st_buffer(geom, dist = data$dist)
But it uses just the first dist value (0.100).
This is the result. Just really small buffers.
small buffer
For visualization see this picture. It's just an example to show that the buffer should get bigger. example result

I think that the problem here is in how you are "creating" the points dataset.
Replicating your code with dummy data, doing this:
library(sf)
data <- data.frame(lat = c(0,1,2,3), lon = c(0,1,2,3), dist = c(0.1,0.2,0.3, 0.4))
coords <- matrix(c(data$lon,data$lat), ncol = 2)
tt <- st_multipoint(coords)
does not give you multiple points, but a single MULTIPOINT feature:
tt
#> MULTIPOINT (0 0, 1 1, 2 2, 3 3)
Therefore, only a single buffer distance can be "passed" to it and you get:
plot(sf::st_buffer(tt, data$dist))
To solve the problem, you probably need to build the point dataset differently. For example, using:
tt <- st_as_sf(data, coords = c("lon", "lat"))
gives you:
tt
#> Simple feature collection with 4 features and 1 field
#> geometry type: POINT
#> dimension: XY
#> bbox: xmin: 0 ymin: 0 xmax: 3 ymax: 3
#> epsg (SRID): NA
#> proj4string: NA
#> dist geometry
#> 1 0.1 POINT (0 0)
#> 2 0.2 POINT (1 1)
#> 3 0.3 POINT (2 2)
#> 4 0.4 POINT (3 3)
You see that tt is now a simple feature collection made of 4 points, on which buffering with multiple distances will indeed work:
plot(sf::st_buffer(tt, data$dist))
HTH!

Swift Range Operator with two unknown values

If I have two unknown values, let's say x and y, what is the best way to loop through all of the values between those values?
For example, given the values x = 0 and y = 5 I would like to do something with the values 0, 1, 2, 3, 4, and 5. The result could exclude 0 and 5 if this is simpler.
Using Swift's Range operator, I could do something like this:
for i in x...y {
    // Do something with i
}
Except I do not know if x or y is the greater value.
The Swift documentation for Range Operators states:
The closed range operator (a...b) defines a range that runs from a to b, and includes the values a and b. The value of a must not be greater than b.
There are a number of solutions here. A pretty straightforward one is:
let diff = y - x
for i in 0...abs(diff) {
    let value = min(x, y) + i
    // Do something with value
}
Is there a better, or more elegant way to achieve this?

I guess the most explicit way of writing it would be:
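for i in min(a, b)...max(a, b) {
    // Do something with i
}
To exclude the first and last value, you can increment your lower limit and use the Swift ..< syntax. A range's lower bound must not be greater than its upper bound, so guard against the case where a and b are equal:
let lowerLimit = min(a, b) + 1
let upperLimit = max(a, b)
if lowerLimit <= upperLimit {
    for i in lowerLimit..<upperLimit {
        // Do something with i
    }
}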

Cypher: analog of `sort -u` to merge 2 collections?

Suppose I have a node with a collection in a property, say
START x = node(17) SET x.c = [ 4, 6, 2, 3, 7, 9, 11 ];
and somewhere (i.e. from .csv file) I get another collection of values, say
c1 = [ 11, 4, 5, 8, 1, 9 ]
I'm treating my collections as just sets; the order of elements does not matter. What I need is to merge x.c with c1 with some magic operation so that the resulting x.c contains only the distinct elements from both. The following idea comes to mind (as yet untested):
LOAD CSV FROM "file:///tmp/additives.csv" as row
START x=node(TOINT(row[0]))
MATCH c1 = [ elem IN SPLIT(row[1], ':') | TOINT(elem) ]
SET
x.c = [ newxc IN x.c + c1 WHERE (newx IN x.c AND newx IN c1) ];
This won't work, it will give an intersection but not a collection of distinct items.
More RTFM gives another idea: use REDUCE() ? but how?
How can I extend Cypher with a new builtin function UNIQUE() which accepts a collection and returns a collection cleaned of duplicates?
UPD. Seems that FILTER() function is something close but intersection again :(
x.c = FILTER( newxc IN x.c + c1 WHERE (newx IN x.c AND newx IN c1) )
WBR,
Andrii

How about something like this...
with [1,2,3] as a1
, [3,4,5] as a2
with a1 + a2 as all
unwind all as a
return collect(distinct a) as unique
Add two collections and return the collection of distinct elements.
dec 15, 2014 - here is an update to my answer...
I started with a node in the neo4j database...
//create a node in the DB with a collection of values on it
create (n:Node {name:"Node 01",values:[4,6,2,3,7,9,11]})
return n
I created a csv sample file with two columns...
Name,Coll
"Node 01","11,4,5,8,1,9"
I created a LOAD CSV statement...
LOAD CSV
WITH HEADERS FROM "file:///c:/Users/db/projects/coll-merge/load_csv_file.csv" as row
// find the matching node
MATCH (x:Node)
WHERE x.name = row.Name
// merge the collections
WITH x.values + split(row.Coll,',') AS combo, x
// process the individual values
UNWIND combo AS value
// use toInt as the values from the csv come in as string
// may be a better way around this but i am a little short on time
WITH toInt(value) AS value, x
// might as well sort 'em so they are all purdy
ORDER BY value
WITH collect(distinct value) AS values, x
SET x.values = values

You could use reduce like this:
with [1,2,3] as a, [3,4,5] as b
return reduce(r = [], x in a + b | case when x in r then r else r + [x] end)
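For readers less familiar with Cypher's reduce, the expression above behaves like a left fold that appends an element to the accumulator only if it is not already there. A rough Python equivalent (an illustration only, not something that runs in Neo4j) looks like this:

```python
from functools import reduce

def merge_distinct(a, b):
    """Concatenate two lists and keep the first occurrence of each element."""
    return reduce(lambda r, x: r if x in r else r + [x], a + b, [])

# Same lists as the Cypher example.
unique = merge_distinct([1, 2, 3], [3, 4, 5])

# The collections from the question.
merged = merge_distinct([4, 6, 2, 3, 7, 9, 11], [11, 4, 5, 8, 1, 9])
```

The result keeps elements in order of first appearance. Each membership check scans the accumulator, so the cost grows quadratically with the list length, which is fine for small collections like these.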

Since Neo4j 3.0, with APOC Procedures you can easily solve this with apoc.coll.union(). In 3.1+ it's a function, and can be used like this:
...
WITH apoc.coll.union(list1, list2) as unionedList
...
