Arrange nPlot() by y-axis values - rCharts

I am using a multiBarHorizontalChart with nPlot() to show variance from a mean rate. "Negative change" bars are highlighted in red and positive rate change bars in green, by grouping on a "PosNeg" variable. When I group by this variable, however, the years on the y axis are no longer ordered. Any idea how I could maintain the order of the years while still grouping by this variable? Personally, I think the color difference makes the graph a lot easier to interpret. Here's a reproducible example, using the data hosted on Socrata:
install.packages("RSocrata")
library(RSocrata)
library(dplyr) # provides %>%, mutate() and arrange()
library(rCharts) # provides nPlot()
url="https://opendata.socrata.com/dataset/Preliminary-Data-Data-Visulaization-Project-8-12-1/4xgc-ygke"
dfRatePer100= read.socrata(url)
dfRatePer100=subset(dfRatePer100, select=c(1,3), Year!="NA")
colnames(dfRatePer100)= c("Year", "Dollar.Rate")
dfRatePer100$Dollar.Rate= as.numeric(dfRatePer100$Dollar.Rate)
dfRatePer100$mean= mean(dfRatePer100$Dollar.Rate)
dfRatePer100=dfRatePer100%>%
mutate(rateVariance= Dollar.Rate - mean) %>%
arrange(desc(Year))
dfRatePer100$PosNeg=ifelse(dfRatePer100$rateVariance>0, "Positive rate change from mean", ifelse(dfRatePer100$rateVariance<0, "Negative rate change from mean", "No change from mean"))
ratePer100 <- nPlot(rateVariance~ Year, group="PosNeg",data = dfRatePer100,type = 'multiBarHorizontalChart')
ratePer100$chart(showLegend=T)
ratePer100$chart(showControls=F)
ratePer100$chart(color = c('green','red'))
ratePer100$yAxis(axisLabel='Variance from mean rate (in dollars)')
ratePer100$yAxis(tickFormat = "#! function(d) {return d3.format('.2f')(d)} !#")
ratePer100$set(width=600)
ratePer100
I appreciate any help! Thanks.

Not an answer but a suggestion: looking at the source code, nvd3's multiBarHorizontalChart groups by the group variable first and only then sorts by values, so I don't think this is possible. taucharts might be a good option if rCharts is not a requirement.
library(rCharts)
df <- data.frame(
year = as.character(2000:2012)
,value = runif(13,-1,1)
)
df$group <- ifelse(df$value>0,"positive","negative")
np <-nPlot(
value ~ year,
group = "group",
data = df,
type = 'multiBarHorizontalChart'
)
np$chart(color = c('green','red'))
np
library(taucharts)
tauchart( df ) %>%
tau_bar( "value", "year", "group", horizontal=TRUE) %>%
tau_legend()

Related

How to only include p-value for certain comparisons using stat_pvalue_manual function?

I am trying to visualise a friedman's test followed by pairwise comparisons using a boxplot with p-values.
Here is an example of what it should look like:
[example graph downloaded from the internet][1]
However, since there are way too many significant comparisons in my case, my graph currently looks like this:
[my graph][2]
[1]: https://i.stack.imgur.com/DO6Vz.png
[2]: https://i.stack.imgur.com/94OXK.png
Here is the code I used to generate the graph with p-value
pwc_IFX_plot <- pwc_IFX %>% add_xy_position(x = "Variant")
ggboxplot(IFX_variant, x = "Variant", y = "Concentration", add = "point") +
stat_pvalue_manual(pwc_IFX_plot, hide.ns = TRUE)+
labs(
subtitle = get_test_label(res.fried_IFX, detailed = TRUE),
caption = get_pwc_label(pwc_IFX)
)+scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
labels = trans_format("log10", math_format(10^.x)))
I hope to only show the comparisons of each group to my control group, rather than all the intergroup comparisons.
Thank you for your time.
Any suggestions would be highly appreciated!

Time Series with Gaps in R

I am trying to set up a time series in R. However, I have daily trading data. Trading takes place 5 days a week, so there are gaps in the time series. I want to set up the time series without filling the gaps. I tried the ts() function, but it only works when there are no gaps.
ncw2 = ts(ncw1, start = c("2020-01-23", 1), freq=365)
You may try using the "zoo" library. It allows you to set time series with gaps.
library(zoo)
df <- data.frame(
date = c("2003-01-02", "2003-01-05", "2003-01-19"),
values = c(3,8,1)
)
colnames(df) <- c("date", "values")
df.ts <- zoo(df[,-1], order.by = as.Date(df[,1], "%Y-%m-%d")) # %m is the month; %M would mean minutes

Stata timeseries rolling forecast

I'm new to Stata and have a question about its command language. I want to use my ARIMA model to forecast, i.e. use x[t], x[t-1]... to produce an estimate xhat[t+1], and then roll forward one time step to make the next forecast, rebuilding the model every N time steps.
I can duplicate code, something like the following for T, T+1, T+2, etc.:
arima x if t<=T, arima(2,0,2)
predict xhat
to produce a series of xhats to compare with in-sample x observations. There must be a more natural way to do this in the command language. Any suggestions or pointers would be very much appreciated.
Posting a working solution provided by Stata tech support:
webuse dfex
tsset month
generate int id = _n
capture program drop forecarima
program forecarima, rclass
syntax [if]
tempvar yhat
arima unemp `if', arima(1,1,0)
local T = e(tmax)
local T1 = `T' + 1
summarize id if month == `T1'
local h = r(max)
predict `yhat', y dynamic(`T')
return scalar y = unemp[`h']
return scalar yhat = `yhat'[`h']
end
rolling unemp = r(y) unemp_hat = r(yhat), window(400) recursive ///
saving(results,replace): forecarima
use results,clear
browse
This provides output with both the prediction and the observed value available. The dates are off by one step, but that is easier left to post-processing.

How to solve "not all divisions are known" error?

I'm trying to filter a Dask dataframe with groupby.
df = df.set_index('ngram');
sizes = df.groupby('ngram').size();
df = df[sizes > 15];
However, df.head(15) throws the error ValueError: Not all divisions are known, can't align partitions. Please use `set_index` to set the index.. The divisions on sizes are not known:
>>> df.known_divisions
True
>>> sizes.known_divisions
False
A workaround is to do sizes.compute() or .to_csv(...) and then read it back to Dask with dd.from_pandas or dd.read_csv. Then sizes.known_divisions would return True. That's a notable inconvenience.
How else can this be solved? Am I using Dask wrong?
Note: there's an unanswered duplicate here.
In the common case you describe, the indexing series is in fact much smaller than the source dataframe you want to apply it to. In this case, it makes sense to materialise it and use simple indexing like this:
import numpy as np
import pandas as pd
import dask.dataframe as dd
df = pd.DataFrame({'ngram': np.random.choice([1, 2, 3], size=1000),
'other': np.random.randn(1000)}) # fake data
d = dd.from_pandas(df, npartitions=3)
sizes = d.groupby('ngram').size().compute()
d = d.set_index('ngram') # also sorts the divisions
ngrams = sizes[sizes > 300].index.tolist() # a list of good ngrams
d.loc[ngrams].compute()
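The idea in the answer above does not depend on Dask itself: first reduce the data to a small table of group sizes, then filter the large collection by membership in the set of keys that pass the threshold. A minimal sketch in plain Python follows; the helper name filter_by_group_size and the sample rows are hypothetical and only illustrate the two passes.

```python
from collections import Counter


def filter_by_group_size(rows, key, min_size):
    """Keep the rows whose key value occurs more than min_size times."""
    # Pass 1: materialise the (small) table of group sizes.
    sizes = Counter(row[key] for row in rows)
    # Keys that pass the threshold, analogous to sizes[sizes > n].index.
    good_keys = {k for k, n in sizes.items() if n > min_size}
    # Pass 2: filter the (large) collection by key membership.
    return [row for row in rows if row[key] in good_keys]


# Made-up sample data: "a" occurs three times, "b" once.
rows = [
    {"ngram": "a", "other": 0.1},
    {"ngram": "b", "other": 0.2},
    {"ngram": "a", "other": 0.3},
    {"ngram": "a", "other": 0.4},
]
kept = filter_by_group_size(rows, key="ngram", min_size=1)
```

Because the second pass is a plain membership test, it needs no alignment between the size table and the data, which is why materialising the small series sidesteps the divisions problem.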

Predictors of different size for time series prediction using LSTM with Keras

I would like to predict time series values X using another time series Y and the past values of X. In detail, I would like to predict X at time t (Xt) using (Xt-p,...,Xt-1) and (Yt-p,...,Yt-1,Yt), with p the dimension of the "look back".
So, my problem is that I do not have the same length for my 2 predictors.
Let's use an example to be clearer.
If I use a timestep of 2, I would have for one observation :
[(Xt-p,Yt-p),...,(Xt-1,Yt-1),(??,Yt)] as input and Xt as output. I do not know what to use instead of the ??
I understand that mathematically speaking I need to have the same length for my predictors, so I am looking for a value to replace the missing value.
I really do not know if there is a good solution here or if I could do something, so any help would be greatly appreciated.
Cheers !
PS: you could see my problem as if I wanted to predict the number of ice creams sold one day in advance in a city using the weather forecast for the next day. X would be the number of ice creams and Y could be the temperature.
You could e.g. do the following:
from keras.layers import Input, LSTM, Dense, Concatenate
from keras.models import Model
input_x = Input(shape=input_shape_x)
input_y = Input(shape=input_shape_y)
lstm_for_x = LSTM(50, return_sequences=False)(input_x)
lstm_for_y = LSTM(50, return_sequences=False)(input_y)
# for keras < 2.0 use instead: merged = merge([lstm_for_x, lstm_for_y], mode="concat")
merged = Concatenate()([lstm_for_x, lstm_for_y])
output = Dense(1)(merged)
model = Model([input_x, input_y], output)
model.compile(..)
model.fit([X, Y], X_next)
Where X is an array of sequences, X_next is X p-steps ahead and Y is an array of sequences of Ys. Because each series has its own input and its own LSTM, the X and Y sequences do not need to have the same length.
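One way to prepare the inputs for a two-branch model like the one above is to cut the two series into windows of different lengths: p past values of X, and p+1 values of Y (the past p plus the current one). The following is a minimal sketch in plain Python; the helper name make_windows and the toy series are hypothetical, and in practice the resulting lists would be converted to arrays of shape (samples, timesteps, 1) before being passed to Keras.

```python
def make_windows(x, y, p):
    """Build (X window, Y window, target) triples for a look-back of p.

    For each time t >= p:
      X window = x[t-p], ..., x[t-1]        (p values)
      Y window = y[t-p], ..., y[t-1], y[t]  (p + 1 values)
      target   = x[t]
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    x_windows, y_windows, targets = [], [], []
    for t in range(p, len(x)):
        x_windows.append(x[t - p:t])
        y_windows.append(y[t - p:t + 1])
        targets.append(x[t])
    return x_windows, y_windows, targets


# Toy series: x could be ice creams sold, y the temperature.
x = [0, 1, 2, 3, 4]
y = [10, 11, 12, 13, 14]
x_windows, y_windows, targets = make_windows(x, y, p=2)
```

With this layout there is no missing value to fill in: the X branch simply sees one timestep fewer than the Y branch.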

Resources