Multiple, Binomial Dependent Variables for GLM (or lme4) in R

Hi everyone. I'm pretty new to R. I've been trying to educate myself about this issue, but I keep running into roadblocks.
I have a data set with two categorical independent variables: habitat (1, 2, 3) and site (1, 2, 3, 4, 5). My response variables are the presence or absence of AFLP loci. I have 96 loci, and I want to determine which, if any, of them are significantly associated with habitat (site is a random effect). Each locus can be assumed to be independent of the others.
As far as relevance to other researchers goes, this is a problem that people trying to analyze molecular data with GLMs or LMEs will run into more and more.
Here is my code:
## Independent variables
Site <- AFLP$Site        ## AFLP is my data file
Habitat <- AFLP$Habitat
## Dependent variable
Loci <- AFLP[, 4:99]
## Establishing matrix of variables
mydata <- cbind(Site, Habitat, Loci)
## glm
model1 <- glm(Loci ~ (1|Site) + Habitat, data = mydata, family = "binomial")
I get this error:
Error in model.frame.default(formula = Loci ~ (1 | Site) + Habitat, data = mydata, :
invalid type (list) for variable 'Loci'
I know this error is associated with the data type of Loci; however, I've tried a bunch of things and still can't figure out how to correctly address the issue.
My problem seems similar to the ones in the links below, but again, I haven't been able to figure out how to apply this information to my data set.
http://stackoverflow.com/questions/18067519/using-r-to-do-a-regression-with-multiple-dependent-and-multiple-independent-vari
https://stats.stackexchange.com/questions/26585/how-to-do-a-generalized-linear-model-with-multiple-dependent-variables-in-r
Thank you in advance. If this turns out to have a simple answer, I apologize for taking up space. I have been Googling and trying to educate myself, but I haven't made any headway.
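For what it's worth, here is a minimal sketch of one common way around this error: glm() accepts neither a multi-column data-frame response like Loci nor the (1|Site) random-effect syntax (that syntax belongs to lme4), so one approach is to fit a separate binomial mixed model per locus. The column range, the likelihood-ratio test, and the FDR correction below are illustrative assumptions, not a definitive recipe.

library(lme4)

## assumes Habitat and Site are factors in AFLP and the loci occupy columns 4:99
loci_names <- names(AFLP)[4:99]
p_values <- sapply(loci_names, function(locus) {
    f  <- reformulate(c("Habitat", "(1 | Site)"), response = locus)
    m  <- glmer(f, data = AFLP, family = binomial)
    m0 <- update(m, . ~ . - Habitat)      ## null model without the fixed effect
    anova(m0, m)[2, "Pr(>Chisq)"]         ## likelihood-ratio test for Habitat
})
p_adjusted <- p.adjust(p_values, method = "fdr")  ## 96 tests need a multiplicity correction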

Related

Ran a MANOVA where Pillai's/Wilks isn't significant, but one of the DVs is very significant in my output table of between-subjects effects

I'm a stats newbie and was told by my professor to run a MANOVA for something I was checking out. Basically, I wanted to see whether there was an interaction between ethnicity and a certain quadrant grouping for a set of outcome variables that are subscales of an overall measure (ders_tot).
An ANCOVA (one DV) already found an interaction between ethnicity and the quadrant grouping for ders_tot.
My MANOVA output shows no significance for Pillai's trace or Wilks' lambda (p = .098 for both), but SPSS also automatically generates a table of between-subjects effects, which indicates a strongly significant interaction for one particular outcome variable (p = .003). The other DVs are far from significance (some as high as p = .27 or p = .66).
Is my MANOVA significance (or lack thereof) being seriously skewed by the highly nonsignificant variables? Am I still "allowed" to analyze the one variable included in the MANOVA that suggests strong significance? I also have data-viz/chart output that makes a strong case for analyzing that particular variable.
(EDIT: BELOW PROBLEM HAS BEEN FIXED)
[Also, I've noticed that one of my covariates always runs in SPSS with 1 df when it should be 2. I've triple-checked the variable type, added labels, and so on, and can't get it to run appropriately. When I run the same analysis in R, df = 2. This isn't affecting my significance findings by much, but it's driving me crazy!]
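Since the question mentions cross-checking the analysis in R, here is a hypothetical sketch of the same kind of MANOVA there, purely for illustration; dat, ethnicity, quadrant, and the ders_* subscale names are invented placeholders, not the actual variables.

fit <- manova(cbind(ders_1, ders_2, ders_3) ~ ethnicity * quadrant, data = dat)
summary(fit, test = "Pillai")   ## multivariate test, as in the SPSS output
summary(fit, test = "Wilks")    ## Wilks' lambda
summary.aov(fit)                ## per-DV between-subjects effects, one ANOVA per subscale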

How can I get predictions from these pretrained models?

I've been trying to generate human pose estimations, and I came across many pretrained models (e.g., Pose2Seg, deep-high-resolution-net). However, these models only include scripts for training and testing, which seems to be the norm for code implementing models from research papers. For deep-high-resolution-net, I tried to write a script to load the pretrained model and feed it my images, but the output I got was a bunch of tensors, and I have no idea how to convert them to the .json annotations that I need.
Total newbie here; sorry in advance for my poor English. Any tips are appreciated.
I would include my script, but it's over 100 lines.
PS: Is it polite to contact the authors and ask them if they can help? It feels a little presumptuous to me.
I'm not doing skeleton-detection research, but your problem seems general.
(1) I don't think other people should have to teach you from the beginning how to load data and run their code.
(2) To run other people's code, just modify the test script they provide, e.g.
https://github.com/leoxiaobin/deep-high-resolution-net.pytorch/blob/master/tools/test.py
They have already loaded the model for you:
model = eval('models.' + cfg.MODEL.NAME + '.get_pose_net')(
    cfg, is_train=False
)

if cfg.TEST.MODEL_FILE:
    logger.info('=> loading model from {}'.format(cfg.TEST.MODEL_FILE))
    model.load_state_dict(torch.load(cfg.TEST.MODEL_FILE), strict=False)
else:
    model_state_file = os.path.join(
        final_output_dir, 'final_state.pth'
    )
    logger.info('=> loading model from {}'.format(model_state_file))
    model.load_state_dict(torch.load(model_state_file))

model = torch.nn.DataParallel(model, device_ids=cfg.GPUS).cuda()
Then just call:
# evaluate the model on input x (your test data)
y = model(x)
# copy the output tensor back to the CPU and convert it to a NumPy array
arr = y.data.cpu().numpy()
# write it out as CSV
np.savetxt('output.csv', arr)
You should be able to open it in Excel.
(3) "convert them to the .json annotations that I need".
That's the problem nobody can help. We don't know what format you want. For their format, it can be obtained either by their paper. Or looking at their training data by
X, y = torch.load('some_training_set_with_labels.pt')
By correlating the x and y. Then you should have a pretty good idea.

"Contrast Error" message with lsmeans Tukey Test on GLM

I have defined a generalised linear model as follows:
glm(formula = ParticleCount ~ ParticlePresent + AlgaePresent +
        ParticleTypeSize + ParticlePresent:ParticleTypeSize +
        AlgaePresent:ParticleTypeSize,
    family = poisson(link = "log"), data = PCB)
and I have the following significant interactions:
                                 Df Deviance    AIC    LRT Pr(>Chi)
<none>                              666.94 1013.8
ParticlePresent:ParticleTypeSize  6 680.59 1015.4 13.649 0.033818 *
AlgaePresent:ParticleTypeSize     6 687.26 1022.1 20.320 0.002428 **
I am trying to proceed with a post hoc test (Tukey) to compare the interaction of ParticleTypeSize using the lsmeans package. However, I get the following message as soon as I proceed:
library(lsmeans)
leastsquare = lsmeans(glm.particle3, ~ ParticleTypeSize, adjust = "tukey")
Error in `contrasts<-`(`*tmp*`, value = contrasts.arg[[nn]]) :
contrasts apply only to factors
I've checked whether ParticleTypeSize is a valid factor by applying:
l<-sapply(PCB,function(x)is.factor(x))
l
Sample AlgaePresent ParticlePresent ParticleTypeSize
TRUE FALSE FALSE TRUE
ParticleCount
FALSE
I'm stumped and unsure as to how I can rectify this error message. Any help would be much appreciated!
That error happens when the variable you specify is not a factor. You tested and found that it is, so that's a mystery; all I can guess is that the data have changed since you fit the model. So try re-fitting the model with the present dataset.
All that said, I question what you are trying to do. First, you have ParticleTypeSize interacting with two other predictors, which means it is probably not advisable to look at marginal means (lsmeans) for that factor. The fact that there are interactions means that the pattern of those means changes depending on the values of the other variables.
Second, are AlgaePresent and ParticlePresent really numeric variables? By their names, they seem like they ought to be factors. If they are really indicators (0 and 1), that's OK, but it is still cleaner to code them as factors if you are using functions like lsmeans where factors and covariates are treated in distinctly different ways.
BTW, the lsmeans package is being deprecated, and new developments are occurring in its successor, the emmeans package.
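Putting those suggestions together, here is a hedged sketch; the model call is reconstructed from the question, so treat the variable names and formula as assumptions.

## recode the 0/1 indicators as factors, as suggested above
PCB$AlgaePresent    <- factor(PCB$AlgaePresent)
PCB$ParticlePresent <- factor(PCB$ParticlePresent)

## refit the model from the question on the recoded data
glm.particle3 <- glm(ParticleCount ~ ParticlePresent + AlgaePresent +
                         ParticleTypeSize + ParticlePresent:ParticleTypeSize +
                         AlgaePresent:ParticleTypeSize,
                     family = poisson(link = "log"), data = PCB)

library(emmeans)
## compare ParticleTypeSize within each combination of the interacting
## factors, rather than averaging over them
emm <- emmeans(glm.particle3, ~ ParticleTypeSize | ParticlePresent * AlgaePresent)
pairs(emm, adjust = "tukey")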

Failure to get p-values for lmer using lmerTest

I have run the following model using both lmerTest and lme4:
model2 = lmer(log(RT) ~ Group*A*B*C + (1|item) + (1+A+B+C|subject), data = dt)
Using lmerTest, I get the following error when calling summary():
> summary(model2)
Error in `colnames<-`(`*tmp*`, value = c("Estimate", "Std. Error", "df", :
length of 'dimnames' [2] not equal to array extent
I saw that this has already been an issue for other users, and that one user was able to bypass it by running lsmeans().
When I tried lsmeans, I got the error:
Error in asMethod(object) : not a positive definite matrix.
I did not see any NAs when looking into the covariance matrix.
Note that I am able to run this model if I simply reverse the contrasts on the Group factor. I have difficulty understanding why that is.
When I run the same model using lme4 rather than lmerTest, I get all of the summary() output, just without p-values (as expected). pvals.fnc has been discontinued for lme4, and I have not found an alternative yet. Plus, it would be nice to have the p-values estimated the same way for model2 as for the other models for which I was able to use lmerTest successfully.
Does anyone know what I should do at this point? Any help would be much appreciated!
If A, B, or C are factors, then you might get errors; such models are not yet supported by the lmerTest package (we will put the warning message, together with the restrictions for such models, on the help page).
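Until then, one standard way to get a p-value from lme4 alone is a likelihood-ratio test on nested models. A minimal sketch, reusing the names from the question:

library(lme4)

## fit with maximum likelihood so the models can be compared by LRT
full    <- lmer(log(RT) ~ Group*A*B*C + (1|item) + (1+A+B+C|subject),
                data = dt, REML = FALSE)
reduced <- update(full, . ~ . - Group:A:B:C)  ## drop the four-way interaction

anova(reduced, full)  ## chi-squared LRT p-value for the dropped term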

How to fix Amos error: "observed variable is represented by an ellipse in the path diagram"?

I received the following question by email and have seen a lot of students with this problem:
I am trying to fit a structural equation model in Amos, but when I click "calculate estimates", I get the following error: "observed variable [variable name] is represented by an ellipse in the path diagram". Could you please advise me of what I am doing wrong?
IBM Help discusses this error but isn't that helpful.
In practice, I've seen this error come up a number of times. It can occur because you have incorrectly specified a variable as latent when you wanted it to be observed. More commonly, however, it is the result of giving an inappropriate name to a latent variable. Specifically, it is relatively easy to give a latent factor a name that is the same as an observed variable in your data file.
For example, I once had some personality variables in a dataset, and the extraversion items were called E1, E2, E3, and so on. These are also common names for residuals, so when I gave the residuals those names, they conflicted with the names in the data file.
Another even more common cause is when you name a latent factor an appropriate name (e.g., selfesteem, extraversion, jobsatisfaction, etc.) and you have already created a scale score in your data file with the same name. This also causes the conflict.
The basic solution is just to give the latent variable a unique name that doesn't conflict with one in the data file. So for example, name the variable selfesteem_factor rather than selfesteem if you already have a variable called selfesteem.
I recently experienced the same problem. I followed Jeromy's advice and it worked. That error message is caused by giving the same name to a latent variable and an observed variable. In my case, I had a latent variable, trust, but I had also created a summated scale for trust (making it an observed variable), so I got the same error message. When I changed the name of the latent variable, the model ran properly.
