I am having trouble using emmeans to evaluate the mean (or weighted mean) of all the predictions. For example, consider a mixed model:
library(emmeans)
library(lme4)
m1 <- lmer(mpg ~ 1 + wt + (1|cyl),data=mtcars)
Estimating the fixed effect "wt" succeeds:
emmeans(m1,specs="wt")
   wt emmean   SE   df lower.CL upper.CL
 3.22   20.2 1.71 1.83     12.1     28.3
However, to calculate the mean of predictions, the following previously worked (~ 12 months ago), but now fails:
emmeans(m1,specs="1")
NOTE: Results may be misleading due to involvement in interactions
Error in `[[<-.data.frame`(`*tmp*`, ".wgt.", value = 1) :
replacement has 1 row, data has 0
The same error occurs for simple linear models. Many thanks for any help.
I thought I was using the current version of emmeans (1.4.8) when I ran into the problem described in the question. However, I may actually have been using emmeans 1.4.6 (please see the comment by Russ Lenth below). I reverted to emmeans v1.4.3 and the code worked. I then updated to the current version of emmeans (1.4.8) and the code continued to work. Most likely the cause was my use of emmeans 1.4.6, which had a known bug. Please see this GitHub entry for more information.
I'm new to ML and Kaggle. I was going through the solution of a Kaggle Challenge.
Challenge: https://www.kaggle.com/c/trackml-particle-identification
Solution: https://www.kaggle.com/outrunner/trackml-2-solution-example
While going through the code, I noticed that the author used only the train_1 file (not train_2, 3, …).
I know there is some strategy behind using only the train_1 file. Can someone please explain why? Also, what are the blacklist_training.zip, train_sample.zip, and detectors.zip files used for?
I'm one of the organisers of the challenge. The train_1, 2, 3, ... files are all equivalent. Outrunner probably saw no improvement from using more data.
train_sample.zip is a small dataset equivalent to train_1, 2, 3, ..., provided for convenience.
blacklist_training.zip is a list of particles to be ignored due to a small bug in the simulator (not very important).
detectors.zip is the list of the geometrical surfaces where the x y z measurements are made.
David
I am pretty new to YOLO/Darknet and am going in circles with the solutions. I have looked at the GitHub and Stack Exchange pages dealing with similar issues, but none seems to directly address this output issue (i.e. where the region IOU line is missing). Here is my output (training/testing):
Here is my directory structure:
Other details:
I am using the AlexeyAB fork.
6 classes in total (following this convention of annotating occluded and truncated items, so two "items" with three classes each)
I'm using 200+ training images (definitely too few, but I don't know if this is the root cause of my troubles).
There is no predictions.png, just predictions.jpg. However, I don't think this should be an issue.
I followed this tutorial.
Any help is very much appreciated; thank you in advance!
If training finishes too soon, try adding -clear 1 at the end of your training command.
EDIT:
This is the correct answer (hence why I accepted it), but it lacks an explanation. According to this answer, the "-clear 1" flag clears past stats.
I have defined a generalised linear model as follows:
glm(formula = ParticleCount ~ ParticlePresent + AlgaePresent +
ParticleTypeSize + ParticlePresent:ParticleTypeSize + AlgaePresent:ParticleTypeSize,
family = poisson(link = "log"), data = PCB)
and I have the below significant interactions
                                 Df Deviance    AIC    LRT Pr(>Chi)
<none>                               666.94 1013.8
ParticlePresent:ParticleTypeSize  6   680.59 1015.4 13.649 0.033818 *
AlgaePresent:ParticleTypeSize     6   687.26 1022.1 20.320 0.002428 **
I am trying to proceed with a posthoc test (Tukey) to compare the interaction of ParticleTypeSize using the lsmeans package. However, I get the following message as soon as I proceed:
library(lsmeans)
leastsquare=lsmeans(glm.particle3,~ParticleTypeSize,adjust ="tukey")
Error in `contrasts<-`(`*tmp*`, value = contrasts.arg[[nn]]) :
contrasts apply only to factors
I've checked whether ParticleTypeSize is a valid factor by applying:
l<-sapply(PCB,function(x)is.factor(x))
l
          Sample     AlgaePresent  ParticlePresent ParticleTypeSize
            TRUE            FALSE            FALSE             TRUE
   ParticleCount
           FALSE
I'm stumped and unsure as to how I can rectify this error message. Any help would be much appreciated!
That error happens when the variable you specify is not a factor. You tested and found that it is, so that's a mystery and all I can guess is that the data changed since you fit the model. So try re-fitting the model with the present dataset.
All that said, I question what you are trying to do. First, you have ParticleTypeSize interacting with two other predictors, which means it is probably not advisable to look at marginal means (lsmeans) for that factor. The fact that there are interactions means that the pattern of those means changes depending on the values of the other variables.
Second, are AlgaePresent and ParticlePresent really numeric variables? By their names, they seem like they ought to be factors. If they are really indicators (0 and 1), that's OK, but it is still cleaner to code them as factors if you are using functions like lsmeans where factors and covariates are treated in distinctly different ways.
BTW, the lsmeans package is being deprecated, and new developments are occurring in its successor, the emmeans package.
I followed this rule, but my result did not match the result in the paper.
With n = 9, sigma = 0.7 and f(W1) = 11,
the paper gives beta1 = 6, whereas my result is 1.67.
What is wrong here?
Does anyone know this algorithm, and what does beta refer to here?
The paper you posted gets the algorithm slightly wrong, so you shouldn't rely upon it. Try following along with the example in this paper instead:
http://www.site.uottawa.ca/~mdislam/publications/LREC_06_242.pdf
I have been looking into a development issue that requires the use of pseudorandom number generation to allow the same set of random numbers to be generated for a given seed.
I have currently been looking at using long random(void) and void srandom(unsigned seed) for this (man page), and currently these are generating the same set of random numbers in a Mac app, an iOS app and an iOS app (64-bit) which is what I was hoping. The iOS tests were only in the simulator so I don't know whether this will affect the result.
My main concern is that this algorithm could change at some point, making the applications we're developing effectively useless with old data. What are the chances of these algorithms changing or being different on a future device?
I'd say it's extremely likely they will change as the sequence is not guaranteed by any standard.
Why not use your own random number sequence? Even a simple linear congruential generator satisfies most statistical properties of randomness. Here is the formula for such a generator:
next_number = (a * current_number + b) % c
with
a = 1103515245
b = 12345
c = 4294967296
These values of a, b, c give you good statistical properties and are quite well known for building quick and dirty generators.
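As a rough illustration of the formula above, here is a minimal C sketch. The names lcg_t, lcg_seed and lcg_next are made up for this example and are not part of any library; the constants are the a, b and c given above. The arithmetic is done in 64 bits so that the intermediate product cannot overflow before the modulus is applied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type for this sketch: holds current_number. */
typedef struct {
    uint32_t state;
} lcg_t;

/* Start the sequence from a caller-chosen seed. */
static void lcg_seed(lcg_t *g, uint32_t seed)
{
    assert(g != NULL);
    g->state = seed;
}

/* next_number = (a * current_number + b) % c, with the a, b, c from the text. */
static uint32_t lcg_next(lcg_t *g)
{
    const uint64_t a = UINT64_C(1103515245);
    const uint64_t b = UINT64_C(12345);
    const uint64_t c = UINT64_C(4294967296); /* 2^32 */

    assert(g != NULL);
    /* a * state + b stays below 2^63, so the 64-bit arithmetic is exact. */
    g->state = (uint32_t)((a * g->state + b) % c);
    return g->state;
}
```

Seeding with 0 and calling lcg_next once returns 12345, and the same seed yields the same sequence wherever this code is compiled, because nothing depends on the platform's random(). One caveat: with a power-of-two modulus the low-order bits repeat with short periods, so it is probably safer to use the high bits when reducing to a small range.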
I don't have the slightest idea about the answer to the question you ask.
If a related question is "How can I be absolutely sure to have the same pseudo-random sequences generated in 10 years' time?", the answer to that question is: don't rely on an external library; write the code explicitly.
Bathsheba proposed this generator. You can search for "pseudo random generator algorithm". Here is a list of algorithms on Wikipedia.
In fact, srandom's behavior did change as of Mac OS X 10.7, according to this blog post. However, this was due to the way srandom was implemented: it tried to access an uninitialized local variable, which is undefined behavior in C. According to the post, the new compiler used since Mac OS X 10.7 optimized out the uninitialized memory access, changing its behavior in subtle ways.