What statistical method should I use for multivariate abundance data with random effects?

I am working with multivariate data with random effects.
My hypothesis is this: D has an effect on A1 and A2, where A1 and A2 are binary data, and D is a continuous variable.
I also have a random effect, R, that is a factor variable.
So my model would be something like this: A1andA2 ~ D, random = ~1|R
I tried to use the manyglm function in the mvabund package, but it cannot deal with random effects. Alternatively, I could use lme4, but it cannot deal with multivariate responses.
I could convert my multivariate response into a four-level factor variable, but I haven't found any method that accepts a factor (rather than binary) response. I could also convert the continuous D into a factor variable.
Do you have any advice about what to use in that situation?

First, I know this should be a comment and not a complete answer but I can't comment yet and thought you might still appreciate the pointer.
You should be able to analyze your data with the MCMCglmm R package (see here for an intro), as it can handle mixed models with multivariate response data.


Search for the optimal value of x for a given y

Please help me find an approach to the following problem. Let X be a matrix, X_mxn = (x1, …, xn), where each xi is a time series, and let Y_mx1 be a vector. To predict the values of Y_mx1, we train some model, say linear regression, so we get Y = f(X). Now we need to find X for some given value of Y. The most naive approach is brute force, but what are the proper ways to solve such problems? Perhaps scipy.optimize can be used here; please enlighten me.
I'd also appreciate an explanation or some material to read to understand this.
Most scipy.optimize algorithms use gradient-based methods. For this kind of optimization problem, we can apply them to reverse-engineer the data (e.g. find the best date to invest in the stock market).
If you want to improve the result, you should choose a good step size and a suitable optimization method.
However, we should not frame the problem as "predicting" xi, because what we are actually doing is finding a local/global minimum or maximum.
For example, with Newton-CG your data/equations should contain all the information needed (i.e. act as a simulation); the method itself does not make predictions.
If you want to make a prediction over "time", you could categorize the time data into year, month, etc., then use unsupervised learning to group the data. If a trend emerges, you can reverse-engineer the result to recover the time.
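To make the scipy.optimize route concrete, here is a minimal sketch (with made-up data, not taken from the question) that fits a linear model and then uses scipy.optimize.minimize to search for an x whose prediction matches a given y. Note that for a linear model such an x is generally not unique, so the optimizer just returns one feasible solution.

import numpy as np
from scipy.optimize import minimize

# Made-up data: rows of X are observations, y is the quantity to predict.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_coef = np.array([1.5, -2.0, 0.5])
y = X @ true_coef + rng.normal(scale=0.1, size=200)

# Step 1: fit the forward model y ~ f(x) = x @ coef (ordinary least squares).
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Step 2: "invert" the model -- search for an x whose prediction is as close
# as possible to a given target value of y.
y_target = 3.0

def objective(x):
    return (x @ coef - y_target) ** 2

result = minimize(objective, x0=np.zeros(3), method="BFGS")
print("x found:", result.x)
print("prediction at x found:", result.x @ coef)  # should be close to y_target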

2.3x ratio between PyTorch BCELoss and my own "log" calculations

I'm scripting a toy model to practice both PyTorch and GAN models, and I'm making sure I understand each step as thoroughly as possible.
That led me to check my understanding of the BCELoss function, and apparently I do understand it... up to a ratio of 2.3.
To check the results, I write the intermediate values out for Excel:
tmp1 = y_pred.tolist()  # predicted values as a list (to copy/paste into Excel)
tmploss = nn.BCELoss(reduction='none')  # redefine the loss so it returns the whole BCELoss tensor
tmp2 = tmploss(y_pred, y_real).tolist()  # BCELoss values as a list (to copy/paste into Excel)
Then I copy tmp1 into Excel and calculate -log(x) for each value, which is the BCELoss formula when y_target = y_real = 1.
Then I compare the resulting values with the values in tmp2: the tmp2 values are 2.3x higher than "mine".
(Sorry, I couldn't figure out how to format tables on this site...)
Can you please tell me what is happening? I feel a PEBCAK coming :-)
This is because Excel's LOG function calculates the logarithm to base 10.
The standard definition of binary cross-entropy uses the logarithm to base e (the natural log).
The ratio you're seeing is just ln(10) ≈ 2.302585.
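To see the base mix-up numerically, here is a small self-contained check (the tensors below are made up, not taken from the original script) comparing BCELoss with natural-log and base-10 calculations:

import math
import torch
import torch.nn as nn

y_pred = torch.tensor([0.9, 0.6, 0.2])
y_real = torch.ones(3)  # all targets are 1, so BCE reduces to -ln(y_pred)

loss = nn.BCELoss(reduction='none')(y_pred, y_real)
manual_ln = [-math.log(p) for p in y_pred.tolist()]       # natural log (what PyTorch uses)
manual_log10 = [-math.log10(p) for p in y_pred.tolist()]  # base-10 log (Excel's LOG)

print(loss.tolist())   # matches manual_ln
print(manual_log10)    # each value is smaller by a factor of ln(10) ~ 2.3026

In Excel, using LN() instead of LOG() reproduces PyTorch's numbers.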

How can I return to my raw data after using a Box-Cox transformation?

I am working on a forecasting project. After inspecting my time series, I decided to transform it with the BoxCox function from R's forecast package.
This function returned my transformed data. Afterwards I built an ARIMA model to forecast future values. However, these forecasts are also on the transformed scale. Since the BoxCox function works with a precise lambda value, I am unable to state the functional form of my series (for example, I can't say whether the transformation is logarithmic or not).
So, I wonder whether there is a function to transform a series processed with the BoxCox function back to its original scale? I need to report the forecasted values on the original scale.
Use the lambda argument in the modelling function and don't transform your data yourself. All the modelling functions in the forecast package will do the BoxCox transformation for you, and back-transform the forecasts when you need them (including bias-adjustment if required). Here is a simple example:
library(forecast)
# lambda = 0 applies a log transformation; biasadj = TRUE returns
# bias-adjusted forecasts back-transformed to the original scale.
AirPassengers %>%
  auto.arima(lambda = 0, biasadj = TRUE) %>%
  forecast() %>%
  autoplot()

How do I check or validate the RBM (Restricted Boltzmann Machine) Model?

I'm trying to implement an RBM, and I used the play-tennis data set to test it.
I've tried an autoencoder before, and the result was good. Actually, I'm confused about the purpose of the RBM itself; I think it is just like an autoencoder: it encodes the input (each instance) for feature extraction, and then we can test or validate the model (network) by encoding and decoding some instances.
But the problem I faced is that the results of some functions in the RBM seem weird.
For example, with Gibbs sampling the sampled data come out very close to the actual data. As a consequence, h(x) computed from the sampled data and from the actual data are also close.
So when I tried to decode all the units in the hidden layer back to the actual values, the result was bad: the value for each feature (unit) is almost the same, about 0.4 to 0.5.
Also, the loss function f(x) = (1/m) * Σ log p(x) stays at about 0.07142857142857142; it never changes (the change is only on the order of 1e-17 or 2e-17).
I use continuous values for each feature, with standard normalization so the input range is 0 to 1.
Does anyone have a suggestion?
*Sorry for my bad English :D

Missing data & single imputation

I have an ozone data set that is essentially complete but contains a few missing values. I would like to use SPSS to perform single imputation on it.
Before I start imputing, I would like to randomly simulate missing-data patterns with 5%, 10%, 15%, 25% and 40% of the data missing, in order to evaluate the accuracy of the imputation methods.
Can someone please show me how to create these random missing-data patterns in SPSS?
Also, can someone tell me how to obtain performance indicators such as the mean absolute error, coefficient of determination and root mean square error, so that I can identify the best method for estimating the missing values?
Unfortunately, my current SPSS installation does not support missing data analysis, so I can only give some general advice.
First, for your missing-data pattern: simply go to Data -> Select Cases -> Random Sample, delete the desired proportion of cases, and then run the imputation.
The values you mentioned should be provided by SPSS if you use its imputation module. There is a manual:
ftp://public.dhe.ibm.com/software/analytics/spss/documentation/statistics/20.0/de/client/Manuals/IBM_SPSS_Missing_Values.pdf
Regarding your first question: assume your study variable is y and you want to simulate missingness in y. Here is example syntax that computes an extra variable y_miss according to your missing-data pattern (here with 5% of values set to missing):
do if uniform(1) < .05.
comp y_miss = $SYSMIS.
else.
comp y_miss = y.
end if.
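If you prefer to compute the performance indicators outside SPSS, here is a minimal sketch (in Python, with made-up numbers) of how the three measures compare the imputed values against the values that were deleted:

import numpy as np

# Hypothetical example: 'true' holds the original values that were deleted,
# 'imputed' holds the values an imputation method filled in for those cases.
true = np.array([31.2, 28.4, 40.1, 35.0, 29.7])
imputed = np.array([30.5, 29.9, 38.2, 36.1, 28.8])

errors = imputed - true
mae = np.mean(np.abs(errors))         # mean absolute error
rmse = np.sqrt(np.mean(errors ** 2))  # root mean square error
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((true - true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot              # coefficient of determination

print(mae, rmse, r2)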
