Is there a way to specify the dispersion parameter in glmer (binomial family)?

I am trying to get estimates and standard errors from a glmer model. Once I know the dispersion parameter, I want to call summary() and get the adjusted inference. Is there a way to do this?
EDIT: I have seen that you can do what I want for glm with summary(model, dispersion = 2.5) for model inference. However, this does not work for glmer models.
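For reference, a minimal sketch of doing the equivalent rescaling by hand for a glmer fit (assuming a fitted glmerMod object model and a known dispersion value phi; this mimics what the dispersion argument does for summary.glm(), it is not an official merMod interface):

# Rescale the Wald standard errors by sqrt(phi) and recompute z and p values
phi <- 2.5
cc <- coef(summary(model))   # columns: Estimate, Std. Error, z value, Pr(>|z|)
cc[, "Std. Error"] <- cc[, "Std. Error"] * sqrt(phi)
cc[, "z value"] <- cc[, "Estimate"] / cc[, "Std. Error"]
cc[, "Pr(>|z|)"] <- 2 * pnorm(abs(cc[, "z value"]), lower.tail = FALSE)
cc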

Related

The log-likelihood function of an ARMA model after the deprecation of statsmodels.tsa.arima_model.ARMA

I am following a tutorial that builds an AR model using the ARMA class from statsmodels.tsa.arima_model. I got an error saying it has been deprecated and replaced by ARIMA from statsmodels.tsa.arima.model. I am trying to write a function that calculates the test statistic of a log-likelihood ratio test, which requires the llf attribute of a fitted ARMA model. What is the counterpart of the llf attribute in the ARIMA model?
I tried statsmodels.tsa.arima.model.ARIMA.fit().loglike(), but I could not figure out what parameters to pass to it:
ar_2_model = ARIMA(data, order=(2, 0, 0)).fit()
ar_2_model.loglike()
I want to get the value of the log-likelihood function of ar_2_model, just as I would have obtained with:
ar_3_model = ARMA(data, order=(2, 0)).fit()
ll_f = ar_3_model.llf
You can still use llf:
ar_2_model = ARIMA(data, order=(2, 0, 0)).fit()
print(ar_2_model.llf)
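For the likelihood-ratio test mentioned in the question, a minimal sketch built on llf (assuming you want to compare a nested AR(2) against an AR(3) fit on the same series; the model choice here is an assumption for illustration):

from scipy import stats
from statsmodels.tsa.arima.model import ARIMA

# Fit the restricted AR(2) and the unrestricted AR(3) on the same data
ar_2_model = ARIMA(data, order=(2, 0, 0)).fit()
ar_3_model = ARIMA(data, order=(3, 0, 0)).fit()

# LR statistic: 2 * (llf_full - llf_restricted), asymptotically
# chi-squared with df = difference in number of parameters (here 1)
lr_stat = 2 * (ar_3_model.llf - ar_2_model.llf)
p_value = stats.chi2.sf(lr_stat, df=1)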

Robust Standard Errors in spatial error models

I am fitting a Spatial Error Model using the errorsarlm() function in the spdep library.
The Breusch-Pagan test for spatial models, computed with the bptest.sarlm() function, suggests the presence of heteroskedasticity.
A natural next step would be to get robust standard error estimates and update the p-values. The documentation of the bptest.sarlm() function says the following:
"It is also technically possible to make heteroskedasticity corrections to standard error estimates by using the “lm.target” component of sarlm objects - using functions in the lmtest and sandwich packages."
and the following code (as reference) is presented:
lm.target <- lm(error.col$tary ~ error.col$tarX - 1)
if (require(lmtest) && require(sandwich)) {
  print(coeftest(lm.target, vcov = vcovHC(lm.target, type = "HC0"), df = Inf))
}
where error.col is the spatial error model estimated.
Now, I can easily adapt the code to my problem and get the robust standard errors.
Nevertheless, I was wondering:
What exactly is the “lm.target” component of sarlm objects? I cannot find any mention of it in the spdep documentation.
What exactly are $tary and $tarX? Again, they do not seem to be mentioned in the documentation.
Why does the documentation say it is "technically possible to make heteroskedasticity corrections"? Does this mean the proposed approach is not really recommended for dealing with heteroskedasticity?
I reported this issue on GitHub and received a response from Roger Bivand:
No, the approach is not recommended at all. Either use sphet or a Bayesian approach giving the marginal posterior distribution. I'll drop the confusing documentation. tary is $y - \rho W y$ and similarly for tarX in the spatial error model case. Note that tary etc. only occur in spdep in documentation for localmoran.exact() and localmoran.sad(); were you using out of date package versions?
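To make the quoted definition concrete, a minimal sketch of computing those components by hand (assuming a fitted errorsarlm() model error.col, the spatial weights list lw it was fitted with, and the original response y and model matrix X; these variable names are assumptions for illustration):

# The spatially filtered "target" variables from the quoted definition,
# using the estimated error parameter (stored as lambda in sarlm objects)
lambda_hat <- error.col$lambda
tary <- y - lambda_hat * lag.listw(lw, y)
tarX <- X - lambda_hat * apply(X, 2, function(col) lag.listw(lw, col))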

How to fetch values using Permutation Feature Importance

I have a dataset of 5K records (with 60 features) for a binary classification problem.
Please note that this solution doesn't work here
I am trying to generate feature importances using Permutation Feature Importance. However, I get the error below. Can you please look at my code and let me know whether I am making any mistake?
import eli5
from eli5.sklearn import PermutationImportance
from sklearn.linear_model import LogisticRegression

logreg = LogisticRegression()
model = logreg.fit(X_train_std, y_train)
perm = PermutationImportance(model, random_state=1)
eli5.show_weights(perm, feature_names=X.columns.tolist())
This fails with the error:
AttributeError: 'PermutationImportance' object has no attribute 'feature_importances_'
Can you help me resolve this error?
If you look at the attributes of the PermutationImportance object via
dir(perm)
you can see all its attributes and methods, but feature_importances_ only exists after you fit the PermutationImportance object, meaning that you need to do:
perm = PermutationImportance(model, random_state=1).fit(X_train_std, y_train)
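Putting it together, a minimal end-to-end sketch with the fit added (reusing X_train_std, y_train, and the feature frame X from the question):

import eli5
from eli5.sklearn import PermutationImportance
from sklearn.linear_model import LogisticRegression

model = LogisticRegression().fit(X_train_std, y_train)

# Fitting the wrapper populates feature_importances_, which show_weights() needs
perm = PermutationImportance(model, random_state=1).fit(X_train_std, y_train)
eli5.show_weights(perm, feature_names=X.columns.tolist())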

failure to get p-values for lmer using lmerTest

I have run the following model using lmerTest and using lme4:
model2 = lmer(log(RT) ~ Group*A*B*C + (1|item) + (1+A+B+C|subject), data=dt)
Using lmerTest I get the following error when typing the summary() command:
> summary(model2)
Error in `colnames<-`(`*tmp*`, value = c("Estimate", "Std. Error", "df", :
length of 'dimnames' [2] not equal to array extent
I saw that this has already been an issue for other users and that one user was able to bypass it by running lsmeans().
When I tried lsmeans(), I got the error:
Error in asMethod(object) : not a positive definite matrix.
I did not see any NAs when looking into the covariance matrix.
Note that I am able to run this model if I simply reverse the contrasts of the Group factor.
I have difficulty understanding why this is the case.
When I run the same model using lme4 rather than lmerTest, I get all the output of summary(), but no p-values (as expected). pvals.fnc() is discontinued and I have not found an alternative yet. It would also be nice to have the p-values estimated in the same way for model2 as for the other models for which I was able to use lmerTest successfully.
Does anyone know what I should do at this point? Any help would be much appreciated!
If A, B, or C are factors, then you might get such errors: models of this kind are not yet supported by the lmerTest package (we will put the warning message, together with the restrictions for such models, in the help page).
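In the meantime, one common workaround (only a rough approximation that treats the lme4 t statistics as standard normal, which is defensible with many observations; this is not the lmerTest machinery) is:

# Wald p-values from the lme4 summary, using a normal approximation to t
coefs <- coef(summary(model2))   # columns: Estimate, Std. Error, t value
p_approx <- 2 * pnorm(abs(coefs[, "t value"]), lower.tail = FALSE)
cbind(coefs, p_approx)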

Using test data set in RapidMiner

I'm trying to create a model with a training dataset and want to label the records in a test data set.
All the tutorials and help I find online only cover cross-validation with a single data set, i.e., the training set; I couldn't find how to use a separate test set. I tried to apply the resulting model to the test set, but the test set seems to end up with a different number of attributes than the training set after pre-processing. This is a text classification problem.
At the end I get some output like this
18.03.2013 01:47:00 Results of ResultWriter 'Write as Text (2)' [1]:
18.03.2013 01:47:00 SimpleExampleSet:
5275 examples,
366 regular attributes,
special attributes = {
confidence_1 = #367: confidence(1) (real/single_value)
confidence_5 = #368: confidence(5) (real/single_value)
confidence_2 = #369: confidence(2) (real/single_value)
confidence_4 = #370: confidence(4) (real/single_value)
prediction = #366: prediction(label) (nominal/single_value)/values=[1, 5, 2, 4]
}
But what I wanted was for all my examples to be labelled.
It seems that my test data and training data have different numbers of attributes; I see many warnings like the following in the logs:
Mar 18, 2013 1:46:41 AM WARNING: Kernel Model: The given example set does not contain a regular attribute with name 'wireless'. This might cause problems for some models depending on this particular attribute.
But how do we solve this problem in text classification, where we cannot know the number and names of attributes beforehand?
Can someone please offer some pointers?
You probably use a Process Documents operator to preprocess both the training and the test set. It is important that both of these operators are set up identically. To "synchronize" the word list, i.e. to make both consider the same set of words, you have to connect the wordlist (wor) output of the Process Documents operator used for training to the corresponding input port of the Process Documents operator used to preprocess the test set.
