Problem with VSN: Error in vsnML(sv) : L-BFGS-B needs finite values of 'fn' - normalization

I am analyzing a proteomics dataset with 3 conditions and a total of 10 samples (columns) and 9650 proteins (rows). I want to apply VSN to my data table for normalization, but I got this error message: Error in vsnML(sv) : L-BFGS-B needs finite values of 'fn'
I checked whether my data contains infinite values or whether all values in any one column are identical, but neither is the case.
Can someone help me understand why vsn cannot run on this specific dataset and what I can do about it? (This is roughly the 20th proteomics dataset I have normalized for my thesis, and I have never had this problem before.)
norm_dt <- justvsn(as.matrix(dt))
Error that I get:
vsn2: 9650 x 10 matrix (1 stratum).
Error in vsnML(sv) : L-BFGS-B needs finite values of 'fn'
Some details about my session (I am building a complete workflow, so I am not posting my whole sessionInfo() here):
R version 4.2.2
vsn version 3.66.0
If someone needs the data, I can send it. It is too big to post here.

Related

LDA: Coherence Values using u_mass v c_v

I am currently attempting to record and graph coherence scores for various topic numbers in order to determine the number of topics that would be best for my corpus. After several trials using u_mass, the results were inconclusive, since the scores don't plateau around a specific topic number. I'm aware that the coherence value ranges from -14 to 14 when using u_mass; however, my values range from -2 to -1, so selecting a suitable topic number is not possible. Because of these issues, I tried c_v instead of u_mass, but I receive the following error:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
This is my code for computing the coherence value:
cm = CoherenceModel(model=ldamodel, texts=texts, dictionary=dictionary, coherence='c_v')
print("THIS IS THE COHERENCE VALUE ")
coherence = cm.get_coherence()
print(coherence)
If anyone could provide assistance in resolving my issues for either c_v or u_mass, it would be greatly appreciated! Thank you!
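The truncated message above is what Python's multiprocessing module reports when worker processes are launched with the "spawn" start method (the default on Windows) from a script whose top-level code is not protected by a main guard. Below is a minimal sketch of that guard, assuming the c_v computation is what starts the worker processes. `token_count` and `count_all` are hypothetical stand-ins for that work and are not part of gensim; in the script above, the CoherenceModel(...) and get_coherence() calls would move inside the guarded block.

```python
import multiprocessing as mp


def token_count(doc):
    # Hypothetical stand-in for the per-document work done in a worker process.
    return len(doc)


def count_all(docs, processes=2):
    # Creating a Pool launches child processes. Under the "spawn" start method
    # each child re-imports this module, so this must not run at import time.
    with mp.Pool(processes) as pool:
        return pool.map(token_count, docs)


if __name__ == "__main__":
    # Only the parent process executes this block; children that re-import
    # the module skip it, which avoids the bootstrapping error.
    mp.freeze_support()  # only has an effect in frozen Windows executables
    print(count_all([["topic", "model"], ["coherence"], []]))
```

The design point is that nothing which starts processes runs as a side effect of importing the file; function and class definitions can stay at module level.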

GLM Poisson thinks I have negative values in my dataset, throws error

I am trying to fit a Poisson GLM, but I keep getting this error:
Poisson1 <- glm(Number.Flowers ~ Site, data = Flowering2, family="poisson")
Error in eval(family$initialize) : negative values not allowed for the 'Poisson' family
My data are counts, so all values are positive or zero. What could be going on?
Is it possible for my CSV file to contain hidden negative values?
It's possible your CSV is flawed in some way. Try a different method of importing it into R (fread, read.table, etc.). Check for NA or NaN values, and compare the number of rows you get after import with the number you expect.

XGBoost prediction always returning the same value - why?

I'm using SageMaker's built in XGBoost algorithm with the following training and validation sets:
https://files.fm/u/pm7n8zcm
The model that comes out of training on the above datasets always produces the exact same prediction.
Is there something obvious in the training or validation datasets that could explain this behavior?
Here is an example code snippet where I'm setting the Hyperparameters:
{
{"max_depth", "1000"},
{"eta", "0.001"},
{"min_child_weight", "10"},
{"subsample", "0.7"},
{"silent", "0"},
{"objective", "reg:linear"},
{"num_round", "50"}
}
And here is the source code: https://github.com/paulfryer/continuous-training/blob/master/ContinuousTraining/StateMachine/Retrain.cs#L326
It's not clear to me which hyperparameters might need to be adjusted.
This screenshot shows that I'm getting a result with 8 indexes:
But when I add the 11th one, it fails. This leads me to believe that I have to train the model with zero indexes instead of removing them. So I'll try that next.
Update: retraining with zero values included doesn't seem to help. I'm still getting the same value every time. I noticed I can't send more than 10 values to the prediction endpoint or it will return an error: "Unable to evaluate payload provided". So at this point using the libsvm format has only added more problems.
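You've got a few things wrong there.
Using {"num_round", "50"} with such a small eta ({"eta", "0.001"}) will give you next to nothing: after 50 tiny steps the model has barely moved away from its initial prediction.
{"max_depth", "1000"}: a depth of 1000 is insane! (The default value is 6.)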
Suggesting:
{"max_depth", "6"},
{"eta", "0.05"},
{"min_child_weight", "3"},
{"subsample", "0.8"},
{"silent", "0"},
{"objective", "reg:linear"},
{"num_round", "200"}
Try this and report your output
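To see why so few rounds at such a small eta achieve so little, here is a rough back-of-the-envelope sketch. It assumes squared-error loss and an idealized case in which every tree fits the current residuals perfectly, so each round shrinks the remaining residual by a factor of (1 - eta). Real trees fit less well, so treat the result as roughly an upper bound on progress. The function name `fraction_fitted` is made up for illustration and is not part of XGBoost.

```python
def fraction_fitted(eta, num_round):
    # Idealized bound: if each tree matched the residuals exactly, the residual
    # left after one round would be (1 - eta) times the previous residual.
    remaining = (1.0 - eta) ** num_round
    return 1.0 - remaining


if __name__ == "__main__":
    # Compare the original settings with the suggested ones.
    for eta, num_round in [(0.001, 50), (0.05, 200)]:
        share = fraction_fitted(eta, num_round)
        print(f"eta={eta}, num_round={num_round}: at most {share:.1%} of the gap closed")
```

Under these assumptions the original settings close less than 5% of the gap between the initial prediction and the targets, which is consistent with every prediction looking the same, while the suggested settings close essentially all of it.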
While I was grouping the time series, certain frequencies created gaps in the data.
I solved this issue by filling all NaNs.

Multiple, Binomial Dependent Variables for GLM (or LME4) in R

Hi everyone. I'm pretty new to R. I've been trying to educate myself about this issue, but I keep running into roadblocks.
I have a data set with two categorical independent variables: habitat (1,2,3) and site (1,2,3,4,5). My response variables are the presence or absence of AFLP loci. I have 96 loci, and I want to determine which, if any, of these loci are significantly associated with habitat (site is a random effect). Each of the loci can be assumed to be independent of the others.
As far as relevancy to other researchers, this should be a problem that people trying to analyze molecular data with GLM or LME will begin to run into more.
Here is my code:
##Independent variables
Site=AFLP$Site ##AFLP is my data file
Habitat=AFLP$Habitat
##Dependent variable
Loci=AFLP[,4:99]
##Establishing matrix of variables
mydata <- cbind(Site, Habitat, Loci)
##glm
model1 <- glm(Loci ~ (1|Site)+Habitat, data=mydata, family="binomial")
I get this error:
Error in model.frame.default(formula = Loci ~ (1 | Site) + Habitat, data = mydata, :
invalid type (list) for variable 'Loci'
I know this error is associated with the data type of Loci; however, I've tried a bunch of things and still can't figure out how to correctly address the issue.
My problem seems to be similar to the ones in the below links, but again, I haven't been able to figure out how to apply this information to my data set.
http://stackoverflow.com/questions/18067519/using-r-to-do-a-regression-with-multiple-dependent-and-multiple-independent-vari
https://stats.stackexchange.com/questions/26585/how-to-do-a-generalized-linear-model-with-multiple-dependent-variables-in-r
Thank you in advance. If this turns out to have a simple answer, I apologize for taking up space. I have been Googling and trying to educate myself, and I haven't made any headway.

failure to get p-values for lmer using lmerTest

I have run the following model using lmerTest and using lme4:
model2 = lmer(log(RT)~Group*A*B*C+(1|item)+(1+A+B+C|subject),data=dt)
Using lmerTest I get the following error when typing the summary() command:
> summary(model1)
Error in `colnames<-`(`*tmp*`, value = c("Estimate", "Std. Error", "df", :
length of 'dimnames' [2] not equal to array extent
I saw this has already been an issue for other users and that one user was able to bypass the issue running lsmeans().
When I tried lsmeans, I got the error:
Error in asMethod(object) : not a positive definite matrix.
I did not see any NAs when looking into the covariance matrix.
Note that I am able to run this model if I simply reverse the contrasts in the Group factor.
I have difficulty understanding why this is the case.
When I run the same model using lme4 and not lmerTest, I am able to get all the outputs of summary() but no p-values (as expected). pvals.fnc is discontinued in lme4 and I have not found an alternative yet. Plus it would be nice to have the p-values estimated in the same way for model2 as for the other models for which I was successfully able to use lmerTest.
Does anyone know what I should do at this point? Any help would be much appreciated!
If A, B, or C are factors, you might get errors: such models are not yet supported by the lmerTest package (we will add a warning message, together with the restrictions for such models, to the help page).
