Constrain axis limits in chordDiagram (circlize) when making gifs - axes

I hope somebody can help me with a chordDiagram visualisation I am trying to create. I am well aware that this visualization type may not be the best fit for this particular data, but it is how I pictured the data in my head, and now I am too curious about how to fix it to give up :) This is my first real post here, though I am an active user of Stack Overflow and I genuinely admire the audience here.
So I have this data on the change in the size of an area in km2 over time (d0) and I am trying to create a GIF out of it using the example here: https://guyabel.com/post/animated-directional-chord-diagrams/
The data "d0":
Time <- as.numeric(c(10,10,10,100,100,100,200,200,200,5,5,5,50,50,50,0,0,0))
Year <- as.character(c(2050,2100,2200,2050,2100,2200,2050,2100,2200,2050,2100,2200,2050,2100,2200,2050,2100,2200))
Area_km2 <- as.numeric(c(4.3075211,7.1672926,17.2780622,5.9099250,8.2909189,16.9748961,6.5400554,8.9036313,16.5627228,3.0765610,6.3929883,18.0708108,5.3520782,8.4503856,16.7938196,0.5565978,1.8415855,12.5089476))
(d0 <- as.data.frame(cbind(Time,Year,Area_km2)))
I also have the color codes stored in a separate dataframe (d1) following the above mentioned example.
The data "d1":
year <- as.numeric(c(2050,2100,2200))
order1 <- as.character(c(1,2,3))
col1 <- c("#40A4D8","#33BEB7","#0C5BCE")
(d1 <- as.data.frame(cbind(year,order1,col1)))
So the idea was to have self-linking flows within each sector that increase in size over time, which would look like growing segments (or growing pie slices) in the final animated GIF. However, no matter how hard I try, I cannot constrain the axis of each segment to the limits of that particular year in every single frame. It seems that the axis keeps accumulating over time, which is not what I want.
Like for example in the first figure (figure0) or "starting frame" the size of the links matches well the dataframe:
figure0
So it is:
orig_year  Area_km2  .frame
2050       0.557     0
2100       1.84      0
2200       12.5      0
But when I plot the next figure (figure1), the axis seems to take the values from the starting frame and add the current values on top, showing roughly 4, 7.4 and 19 instead of the 3.08, 6.39 and 18.1 given in the data frame:
figure1
orig_year  Area_km2  .frame
2050       3.08      1
2100       6.39      1
2200       18.1      1
And it keeps doing so as the loop goes through the data and creates the plots for the next frames. Is it possible to constrain the axis so that the links simply grow over time and the axis follows that growth, increasing gradually with the data?
Any help is highly appreciated!
Thanks.
My code:
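Sort by Time (ascending)
library(dplyr) # arrange(), mutate(), filter(), group_by() and %>%
library(tidyr) # separate()
(d0 <- arrange(d0,Time))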
Copy columns
(d0$Dest_year <- d0$Year)
Re-arrange data
library(tweenr)
(d2 <- d0 %>%
mutate(corridor=paste(Year,Dest_year,sep="->")) %>%
dplyr::select(Time,corridor,Area_km2) %>%
mutate(ease="linear") %>%
tweenr::tween_elements('Time','corridor','ease',nframes=30) %>%
tibble::as_tibble())
(d2 <- d2 %>%
separate(col=.group,into=c("orig_year","dest_year"),sep="->") %>%
dplyr::select(orig_year,dest_year,Area_km2,everything()))
d2$Time <- NULL
Create a directory to store the individual plots
dir.create("./plot-gif/")
Fixing scales
scale_gap <- function(Area_km2_m,Area_km2_max,gap_at_max=1,gaps=NULL) {
p <- Area_km2_m/Area_km2_max
if(length(gap_at_max)==1 & !is.null(gaps)) {
gap_at_max <- rep(gap_at_max,gaps)
}
gap_degree <- (360-sum(gap_at_max))*(1-p)
gap_m <- (gap_degree + sum(gap_at_max))/gaps
return(gap_m)
}
Function to derive the size of gaps in each frame for an animated GIF
(d3 <- d2 %>% group_by(orig_year) %>% mutate(gaps=scale_gap(Area_km2_m=Area_km2,Area_km2_max=max(.$Area_km2),gap_at_max=4,gaps=9)))
library(magrittr)
Get the values for axis limits
(axmax <- d2 %>% group_by(orig_year,.frame) %>% mutate(max=mean(Area_km2)))
Creating unique chordDiagrams for each frame
library(circlize)
for(f in unique(d2$.frame)){
png(file=paste0("./plot-gif/figure",f,".png"),height=7,width=7,units="in",res=500)
circos.clear()
par(mar=rep(0,4),cex=1)
circos.par(start.degree=90,track.margin=c(-0.1,0.1),
gap.degree=filter(d3,.frame==f)$gaps,
points.overflow.warning=FALSE)
chordDiagram(x=filter(d2,.frame==f),directional=2,order=d1$year,
grid.col=d1$col1,annotationTrack=c("grid","name","axis"),
transparency=0.25,annotationTrackHeight=c(0.05,0.1),
direction.type=c("diffHeight"),
diffHeight=-0.04,link.sort=TRUE,
xmax=axmax$max)
dev.off()
}
Now make a GIF
library(magick)
img <- image_read(path="./plot-gif/figure0.png")
for(f in unique(d2$.frame)[-1]){
img0 <- image_read(path=paste0("./plot-gif/figure",f,".png"))
img <- c(img,img0)
message(f)
}
img1 <- image_scale(image=img,geometry="720x720")
ani0 <- image_animate(image=img1,fps=10)
image_write(image=ani0,path="./plot-gif/figure.gif")

I will start with your d0 object. I construct it without converting everything to character, keeping the columns in their original numeric format, and I order d0 by Time and Year:
Time = c(10,10,10,100,100,100,200,200,200,5,5,5,50,50,50,0,0,0)
Year = c(2050,2100,2200,2050,2100,2200,2050,2100,2200,2050,2100,2200,2050,2100,2200,2050,2100,2200)
Area_km2 = c(4.3075211,7.1672926,17.2780622,5.9099250,8.2909189,16.9748961,6.5400554,8.9036313,16.5627228,3.0765610,6.3929883,18.0708108,5.3520782,8.4503856,16.7938196,0.5565978,1.8415855,12.5089476)
d0 = data.frame(Time = Time,
Year = Year,
Area_km2 = Area_km2,
Dest_year = Year)
d0 = d0[order(d0$Time, d0$Year), ]
The key thing is to calculate proper values for "gaps" between sectors so that the same unit from data corresponds to the same degree in different plots.
We first calculate the maximal total width of the circular plot:
width = tapply(d0$Area_km2, d0$Time, sum)
max_width = max(width)
We assume there are n sectors (n = 3 in d0). We set the first n-1 gaps to 2 degrees and dynamically adjust the last gap according to the total of the values in each plot. For the plot with the largest total value, the last gap is also 2 degrees.
n = 3
degree_per_unit = (360 - n*2)/max_width
Now degree_per_unit can be shared between all plots. For each plot we only need to recalculate the value of last_gap:
library(circlize)
for(t in sort(unique(Time))) {
    # rows of d0 that belong to the current time step
    l = d0$Time == t
    d0_current = d0[l, c("Year", "Dest_year", "Area_km2")]
    # the last gap absorbs whatever the sectors and the fixed gaps leave unused
    last_gap = 360 - (n-1)*2 - sum(d0_current$Area_km2)*degree_per_unit
    circos.par(gap.after = c(rep(2, n-1), last_gap))
    chordDiagram(d0_current, grid.col = c("2050" = "red", "2100" = "blue", "2200" = "green"))
    circos.clear()
    title(paste0("Time = ", t, ", Sum = ", sum(d0_current$Area_km2)))
    Sys.sleep(1)
}
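The arithmetic behind last_gap can be checked independently of circlize. The following is a minimal sketch in Python, used here only as a calculator (the plotting itself stays in R). It recomputes the per-Time totals from the question's data and confirms that sectors plus gaps add up to 360 degrees in every frame. The names area, small_gap, totals and last_gap are introduced for illustration only.

```python
# Area_km2 values from the question, grouped by Time (years 2050, 2100, 2200).
area = {
    0:   [0.5565978, 1.8415855, 12.5089476],
    5:   [3.0765610, 6.3929883, 18.0708108],
    10:  [4.3075211, 7.1672926, 17.2780622],
    50:  [5.3520782, 8.4503856, 16.7938196],
    100: [5.9099250, 8.2909189, 16.9748961],
    200: [6.5400554, 8.9036313, 16.5627228],
}

n = 3          # number of sectors
small_gap = 2  # degrees used for the fixed gaps

# Total width of each frame in data units, and the largest of them.
totals = {t: sum(values) for t, values in area.items()}
max_width = max(totals.values())

# Degrees per data unit, shared by all frames.
degree_per_unit = (360 - n * small_gap) / max_width

# The last gap takes up whatever the sectors and the fixed gaps leave unused.
last_gap = {
    t: 360 - (n - 1) * small_gap - total * degree_per_unit
    for t, total in totals.items()
}

for t in sorted(last_gap):
    print(f"Time = {t:>3}: sum = {totals[t]:.3f}, last gap = {last_gap[t]:.2f} degrees")
```

For the frame with the largest total the last gap comes out at 2 degrees, and it grows as the total shrinks. That is what keeps one data unit mapped to the same number of degrees in every frame.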

Related

Create dummy variable for all neighborhood region depending on another dummy variable (sf object)

I have a spatial dataframe (sf) of all European NUTS2 regions. Within this sf object certain regions have a dummy = 1, and others a dummy = 0.
How can I create a new dummy (let's say "dummy_neighbor") where all regions neighboring (st_touches?) a region with dummy==1 get dummy_neighbor==1, and all regions not touching a dummy==1 region get dummy_neighbor==0?
For the time being I have this workaround, but I guess there must be an easier solution?
# load packages
library(sf)
library(here)
library(tidyverse)
library(spdep)
library(expp)
# import nuts 2 sf object
nuts2_sf <- st_read(here("Data", "nuts2_data", "final_nuts.shp"))
# take row numbers as extra column for later
nuts2_sf$rownumber = 1:nrow(nuts2_sf)
#neighbouring list
neighbour <- poly2nb(nuts2_sf, row.names="NUTS_ID", queen=TRUE)
# transform nb into data frame
nb_df <- plyr::ldply(neighbour, rbind)
nb_df$rownumber = 1:nrow(nb_df) # get row number as column
# merge neighbour-list-df with sf by rownumber
df <- merge(nuts2_sf, nb_df, by = "rownumber")
# extract all neighbours of tp100_d=1
# (tp100_d is the name of the original dummy variable)
df_dummy <- df %>% filter(tp100_d == 1)
df_dummy$geometry <- NULL
all_neighbours <- as.vector(as.matrix(df_dummy[,c(66:76)])) %>% unique(.)
# create new neighbourhood dummy for all neighbours (but not for a neighbour that already has a 1 in its original dummy)
nuts2_sf <- nuts2_sf %>% mutate(nb_dummy = ifelse(rownumber %in% all_neighbours & tp100_d == 0, 1, 0))

How to slice an image by table border

I have many png files like this:
I want to slice the image into 48 (=6x8) small image files for the 48 cells separated by the table borders. That is, I would like to have files img11.png, ..., img68.png, where img11.png contains the (1,1) "1.4x4x8" cell, img12.png the (1,2) "M/T" cell, img13.png the "550,000" cell, ..., img68.png the bottom right "641,500" cell.
I want to do it because I thought it would improve the performance of tesseract, which is not satisfactory because many of my image files have much poorer quality than shown above. Also, margins and sizes are diverse, and some images contain non-English characters and images.
Are there software packages that detect the table borders and slice the image into m x n images? I am new to this area. I have read "How to find table like structure in image", but it's way beyond my ability. I am willing to learn, though.
Thanks for your help.
I'm using R. Bilal's suggestion (thanks) led me to the following.
Step 1: Convert the image to grayscale.
library(magick)
x <- image_read('https://i.stack.imgur.com/plBvs.png')
y <- image_convert(x, colorspace='Gray')
a <- as.integer(y[[1]])[,,1]
Step 2: Convert "dark" to 1 and "light" to 0.
w <- ifelse(a>190, 0, 1) # adjust 190
Step 3: Detect the horizontal and vertical lines.
ypos <- which(rowMeans(w) > .95) # adjust .95
xpos <- which(colMeans(w) > .95) # adjust .95
Step 4: Crop the original image (x).
xpos <- c(0,xpos, ncol(a))
ypos <- c(0,ypos, nrow(a))
outdir <- "cropped"
dir.create(outdir)
m <- 0
for (i in 1:(length(ypos)-1)) {
    dy <- ypos[i+1]-ypos[i]
    n <- 0
    if (dy < 16) next # skip if too short
    m <- m+1
    for (j in 1:(length(xpos)-1)) {
        dx <- xpos[j+1]-xpos[j]
        if (dx < 16) next # skip if too narrow
        n <- n+1
        geom <- sprintf("%dx%d+%d+%d", dx, dy, xpos[j], ypos[i])
        # cat(sprintf('%2d %2d: %s\n', m, n, geom))
        cropped <- image_crop(x, geom)
        outfile <- file.path(outdir, sprintf('%02d_%02d.png', m, n))
        image_write(cropped, outfile, format="png")
    }
}
The cropped (1,1) image is written to cropped/01_01.png.

How to export all the information from 3d numpy array to a csv file

Kaggle Dataset and code link
I'm trying to solve the above Kaggle problem and I want to export the preprocessed data as a CSV so that I can build a model in Weka. When I try to save it as CSV I lose a dimension, but I want to retain all the information in that CSV.
Please help me with the relevant code or any resource.
Thanks
print (scaled_x)
|x |y |z |label
|1.485231 |-0.661030 |-1.194153 |0
|0.888257 |-1.370361 |-0.829636 |0
|0.691523 |-0.594794 |-0.936247 |0
Fs=20
frame_size = Fs*4 #80
hop_size = Fs*2 #40
import numpy as np
from scipy import stats

def get_frames(df, frame_size, hop_size):
    N_FEATURES = 3
    frames = []
    labels = []
    for i in range(0, len(df) - frame_size, hop_size):
        x = df['x'].values[i: i+frame_size]
        y = df['y'].values[i: i+frame_size]
        z = df['z'].values[i: i+frame_size]
        # most common label in the window; atleast_1d copes with mode() returning an array or a scalar
        label = np.atleast_1d(stats.mode(df['label'].values[i: i+frame_size])[0])[0]
        frames.append([x,y,z])
        labels.append(label)
    frames = np.asarray(frames).reshape(-1, frame_size, N_FEATURES)
    labels = np.asarray(labels)
    return frames, labels
x,y = get_frames(scaled_x, frame_size, hop_size)
x.shape, y.shape
((78728, 80, 3), (78728,))
According to the link you posted, the data is time series accelerometer/gyro data sampled at 20 Hz, with a label for each sample. They want to aggregate the time series into frames (with the corresponding label being the most common label during a given frame).
So frame_size is the number of samples in a frame, and hop_size is the amount the sliding window moves forward each iteration. In other words, the frames overlap by 50% since hop_size = frame_size / 2.
Thus at the end you get a 3D array of 78728 frames of length 80, with 3 values (x, y, z) each.
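The windowing can be tried out on a small synthetic array. This is a minimal sketch, not the notebook's code: make_frames is a hypothetical helper that slices rows of a (samples, 3) array directly, and the 400-sample input is made up. It uses the same loop bounds as get_frames in the question, so the frame count follows the same rule.

```python
import numpy as np

def make_frames(data, frame_size, hop_size):
    """Cut a (n_samples, n_features) array into overlapping windows."""
    # Same loop bounds as get_frames in the question.
    starts = range(0, len(data) - frame_size, hop_size)
    return np.stack([data[i:i + frame_size] for i in starts])

# Synthetic stand-in for the sensor data: 400 samples of (x, y, z).
data = np.arange(400 * 3, dtype=float).reshape(400, 3)
frames = make_frames(data, frame_size=80, hop_size=40)

# (number of frames, samples per frame, coordinates per sample)
print(frames.shape)

# With hop_size = frame_size / 2, the second half of one frame
# is the first half of the next one.
print(np.array_equal(frames[0][40:], frames[1][:40]))
```

Note that with these loop bounds a window that would end exactly on the last sample is not produced, which is why 400 samples give 8 frames rather than 9.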
EDIT: To answer your new question about how to export as CSV, you'll need to "flatten" the 3D frame array to a 2D array since that's what a CSV represents. There are multiple different ways to do this but I think the easiest may just be to concatenate the final two dimensions, so that each row is a frame, consisting of 240 values (80 samples of 3 co-ordinates each). Then concatenate the labels as the final column.
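import numpy as np
import pandas as pd
# flatten each frame into one row of 240 values
x_2d = np.reshape(x, (x.shape[0], -1))
# append the labels as the last column (y must be 2D to be concatenated)
full = np.concatenate([x_2d, y[:, None]], axis=1)
df = pd.DataFrame(full)
df.to_csv("frames.csv")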
If you also want proper column names:
columns = []
for i in range(1, x.shape[1] + 1):
    columns.extend([f"{i}_X", f"{i}_Y", f"{i}_Z"])
columns.append("label")
df = pd.DataFrame(full, columns=columns)
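To confirm that no information is lost by flattening, the CSV can be read back and reshaped to the original 3D form. This is a minimal sketch under stated assumptions: the input is a small synthetic array rather than the Kaggle frames, and it writes to an in-memory buffer instead of a file. The names buffer, loaded, x_back and y_back are introduced for illustration.

```python
import io

import numpy as np
import pandas as pd

# Synthetic stand-ins: 5 frames of 80 samples with 3 coordinates, plus labels.
x = np.arange(5 * 80 * 3, dtype=float).reshape(5, 80, 3)
y = np.array([0, 1, 0, 2, 1])

# Flatten each frame to one row of 240 values and append the label column.
x_2d = x.reshape(x.shape[0], -1)
full = np.concatenate([x_2d, y[:, None]], axis=1)

# Column names in the same order as the flattened values: 1_X, 1_Y, 1_Z, 2_X, ...
columns = [f"{i}_{axis}" for i in range(1, x.shape[1] + 1) for axis in "XYZ"]
columns.append("label")

# Write to an in-memory buffer instead of a file, then read it back.
buffer = io.StringIO()
pd.DataFrame(full, columns=columns).to_csv(buffer, index=False)
buffer.seek(0)
loaded = pd.read_csv(buffer)

# Undo the flattening: drop the label column and restore the 3D shape.
x_back = loaded.drop(columns="label").to_numpy().reshape(-1, 80, 3)
y_back = loaded["label"].to_numpy().astype(int)

print(loaded.shape, x_back.shape, y_back.shape)
```

Because the reshape only merges the last two axes, reshaping the loaded values with (-1, 80, 3) restores each frame exactly as it was before export.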

st_buffer multipoint with different distance

I have a sfc_multipoint object and want to use st_buffer but with different distances for every single point in the multipoint object.
Is that possible?
The multipoint object holds coordinates.
table = data
Every coordinate point (columns "lon" and "lat" in the table) should get a buffer of a different size. The buffer size is stored in the column "dist" of the table.
The table is called data.
This is my code:
library(sf)
coords <- matrix(c(data$lon,data$lat), ncol = 2)
tt <- st_multipoint(coords)
sfc <- st_sfc(tt, crs = 4326)
dt <- st_sf(data.frame(geom = sfc))
web <- st_transform(dt, crs = 3857)
geom <- st_geometry(web)
buf <- st_buffer(geom, dist = data$dist)
But it uses only the first dist value (0.100).
This is the result: only very small buffers.
small buffer
For visualization see this picture. It's just an example to show that the buffer should get bigger. example result
I think that the problem here is in how you are "creating" the points dataset.
Replicating your code with dummy data, doing this:
library(sf)
data <- data.frame(lat = c(0,1,2,3), lon = c(0,1,2,3), dist = c(0.1,0.2,0.3, 0.4))
coords <- matrix(c(data$lon,data$lat), ncol = 2)
tt <- st_multipoint(coords)
does not give you multiple points, but a single MULTIPOINT feature:
tt
#> MULTIPOINT (0 0, 1 1, 2 2, 3 3)
Therefore, only a single buffer distance can be "passed" to it and you get:
plot(sf::st_buffer(tt, data$dist))
To solve the problem, you probably need to build the point dataset differently. For example, using:
tt <- st_as_sf(data, coords = c("lon", "lat"))
gives you:
tt
#> Simple feature collection with 4 features and 1 field
#> geometry type: POINT
#> dimension: XY
#> bbox: xmin: 0 ymin: 0 xmax: 3 ymax: 3
#> epsg (SRID): NA
#> proj4string: NA
#> dist geometry
#> 1 0.1 POINT (0 0)
#> 2 0.2 POINT (1 1)
#> 3 0.3 POINT (2 2)
#> 4 0.4 POINT (3 3)
You see that tt is now a simple feature collection made of 4 points, on which buffering with multiple distances will indeed work:
plot(sf::st_buffer(tt, data$dist))
HTH!

combine time series plot by using R

I want to combine three graphs in one plot. The data set is "nottem", which ships with R. Can someone help me write code that puts the seasonal means model, the harmonic (cosine) model and the time series itself on the same plot, using different colors? I have already written the model code; I just don't know how to combine the plots for comparison.
Code:
library(TSA)
nottem
month.=season(nottem)
model=lm(nottem~month.-1)
summary(nottem)
har.=harmonic(nottem,1)
model1=lm(nottem~har.)
summary(model1)
plot(nottem,type="l",ylab="Average monthly temperature at Nottingham castle")
points(y=nottem,x=time(nottem), pch=as.vector(season(nottem)))
Just put your time series inside a matrix:
x = cbind(serie1 = ts(cumsum(rnorm(100)), freq = 12, start = c(2013, 2)),
serie2 = ts(cumsum(rnorm(100)), freq = 12, start = c(2013, 2)))
plot(x)
Or configure the plot region:
par(mfrow = c(2, 1)) # 2 rows, 1 column
serie1 = ts(cumsum(rnorm(100)), freq = 12, start = c(2013, 2))
serie2 = ts(cumsum(rnorm(100)), freq = 12, start = c(2013, 2))
require(zoo)
plot(serie1)
lines(rollapply(serie1, width = 10, FUN = mean), col = 'red')
plot(serie2)
lines(rollapply(serie2, width = 10, FUN = mean), col = 'blue')
hope it helps.
PS.: zoo package is not needed in this example, you could use the filter function.
You can extract the seasonal mean with:
s.mean = tapply(serie1, cycle(serie1), mean)
# January, assuming serie1 is monthly data
print(s.mean[1])
This graph is pretty hard to read, because your three sets of values are so similar. Still, if you simply want to graph all of these on the same plot, you can do it pretty easily by using the coefficients generated by your models.
Step 1: Plot the raw data. This comes from your original code.
plot(nottem,type="l",ylab="Average monthly temperature at Nottingham castle")
Step 2: Set up x-values for the mean and cosine plots.
x <- seq(1920, (1940 - 1/12), by=1/12)
Step 3: Plot the seasonal means by repeating the coefficients from the first model.
lines(x=x, y=rep(model$coefficients, 20), col="blue")
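Step 4: Calculate the y-values of the harmonic fit using the coefficients from the second model, and then plot. Note that harmonic(nottem, 1) supplies both a cosine and a sine term, so both coefficients are needed.
b <- model1$coefficients # intercept, cosine, sine
y <- b[1] + b[2] * cos(2 * pi * x) + b[3] * sin(2 * pi * x)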
lines(x=x, y=y, col="red")
ggplot variant: If you decide to switch to the popular 'ggplot2' package for your plot, you would do it like so:
library(ggplot2)
library(reshape2) # melt()
x <- seq(1920, (1940 - 1/12), by=1/12)
y.seas.mean <- rep(model$coefficients, 20)
b <- model1$coefficients # intercept, cosine, sine
y.har.cos <- b[1] + b[2] * cos(2 * pi * x) + b[3] * sin(2 * pi * x)
plot_Data <- melt(data.frame(x=x, temp=as.numeric(nottem), seas.mean=y.seas.mean, har.cos=y.har.cos), id="x")
ggplot(plot_Data, aes(x=x, y=value, col=variable)) + geom_line()
