I am writing my master's thesis and am running a generalized linear mixed model in SPSS (version 28) on count data.
Research question: what effect does population mobility have on COVID-19 incidence at the federal state level in Germany during the period from February 2020 to November 2021?
To test the effect of population mobility (independent variable) on COVID-19 incidence (dependent variable), hierarchical models were used, with fixed factors:
mobility variables in 6 place categories (scale)
cumulative vaccination rate (second dose only) (scale)
season (summer as the reference category) (nominal)
and random effects:
one model with the days variable (time level) (scale)
a second model with the federal states variable (each state numbered from 1 to 16) (place level) (nominal)
a third model with both days and federal states (time and place level).
First I built an intercept-only model to check which type of regression is more suitable for the count data (Poisson or negative binomial) and also to choose the better of two candidate variables as an offset. Based on AIC/BIC, negative binomial regression fit this data best.
Secondly I checked the collinearity between the original 6 mobility variables and excluded mobility variables that were highly correlated, based on VIF (only one variable was excluded).
Thirdly I built 7 generalized linear models by gradually adding the fixed effects (fixed factors) to the intercept-only model: the 5 mobility variables, the cumulative vaccination rate (dose 2) and season (with summer as the reference category). From these 7 models, the one with the best model fit was selected as the final model.
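(For reference, outside SPSS a fixed-effects-only model of this kind could be sketched roughly as below in Python/statsmodels; this is only an illustrative sketch, not my actual SPSS setup, and all column names are placeholders.)

# Sketch of a fixed-effects-only negative binomial GLM with a log offset.
# Column names (cases, mobility_1..mobility_5, vacc_rate2, season, expected_cases)
# are placeholders, not the real field names.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("covid_panel.csv")  # hypothetical long-format panel (state x day)

nb_glm = smf.glm(
    "cases ~ mobility_1 + mobility_2 + mobility_3 + mobility_4 + mobility_5"
    " + vacc_rate2 + C(season, Treatment(reference='summer'))",
    data=df,
    family=sm.families.NegativeBinomial(),  # dispersion alpha left at its default here
    offset=np.log(df["expected_cases"]),
).fit()

print(nb_glm.summary())
print("AIC:", nb_glm.aic, "BIC:", nb_glm.bic)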
Finally I built generalized linear mixed models from this final model by adding a classic random effect: first the days variable only (random-intercept component for time; time level), then the federal states variable only (random-intercept component for place; place level), and finally both together.
I am not sure whether I ran this last step, the generalized linear mixed models, correctly.
These are my steps:
Analyze -> Mixed Models -> Generalized Linear Mixed Models -> Fields & Effects:
1. Target -> case
Target distribution and relationship (link) with the linear model -> Custom:
Distribution -> negative binomial
Link function -> log
2. Fixed effects -> include intercept & 5 mobility variables & cumulative vaccination rate & season
3. Random effects -> no intercept & days variable (time level)
Random effect covariance type: variance components
4. Weight and offset -> use offset field -> log expected cases adjusted wave variable
Build options (e.g. general and estimation) remain unchanged (SPSS defaults).
Model options (e.g. estimated means) remain unchanged (SPSS defaults).
I did the same steps for the other 2 models, except for the random effects:
3. Random effects -> no intercept & federal states variable (place level)
3. Random effects -> no intercept & days variable & federal states variable (time and place level)
Output:
1. The variance of the random effect of the days variable (time level) was very small, 5.565E-6, indicating only a marginal contribution to the model. (Model 1)
2. The covariance of the random effect of the federal states was zero and the variance was 0.079 (place level). (Model 2)
3. The variance of the random effect of the days variable was very small, 4.126E-6; the covariance of the random effect of the federal states was zero and its variance was 0.060 (time and place level). (Model 3)
Can someone please check my steps, tell me which of the models from this last step is best for presenting the results, and also explain the last point of the output shown in the picture?
Thanks in advance to all of you...
I am running Cox proportional hazards regression in SPSS to see the association of a 'predictor' with the risk of a disease over a 10-year follow-up. I have another variable, 'age_quartiles', with values 1, 2, 3, 4, and I want to use '1' as the reference to get HRs for 2, 3 and 4 relative to '1'. When I put this variable in Strata I still get a single 'HR', as follows ('S_URAT_07' is the predictor with continuous values):
Question: how do I get HRs for the predictor for the event based on 'age_quartiles' 2, 3 and 4 while keeping 1 as the reference group? 'age_quartiles' is not a predictor here. Am I supposed to choose a specific method?
As I answered yesterday to this same question on Cross Validated:
The model you're fitting involves only the one parameter for changes in hazard as S_URAT_07 varies (e.g., the B is the change in log hazard for a single unit increase in S_URAT_07), regardless of the level of age_quartiles. What differs by age_quartiles is the baseline hazard function when it's used as a strata or stratification variable, and the hazards are then no longer proportional.
If you specify age_quartiles as a factor (called a categorical covariate in COXREG) rather than a strata variable, you'll again get a single coefficient for S_URAT_07, but also a set of three coefficients that reflect proportionally differing baselines for each level of age_quartiles. You can specify simple contrasts on the factor with the first level as the reference category to reflect comparisons with that category.
If you specify age_quartiles as a factor and also include the interaction between it and S_URAT_07, then you get separate proportional baseline hazard functions, but also allow the impact of S_URAT_07 to differ depending on the age_quartiles level.
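If it helps to see these three specifications side by side, here is a minimal sketch using Python's lifelines package rather than SPSS COXREG; the data frame and the column names T (time) and E (event) are assumptions, only S_URAT_07 and age_quartiles come from your description.

# Sketch of the three Cox specifications discussed above, using lifelines.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.read_csv("cox_data.csv")  # hypothetical file with columns T, E, S_URAT_07, age_quartiles

# 1) age_quartiles as a stratification variable: separate baseline hazards per
#    quartile, one shared coefficient for S_URAT_07, no HRs for the quartiles.
cph_strata = CoxPHFitter()
cph_strata.fit(df[["T", "E", "S_URAT_07", "age_quartiles"]],
               duration_col="T", event_col="E", strata=["age_quartiles"])

# 2) age_quartiles as a factor: dummy-code quartiles 2-4 with quartile 1 as the
#    reference, giving proportional baselines and three HRs relative to quartile 1.
dummies = pd.get_dummies(df["age_quartiles"], prefix="q", drop_first=True).astype(float)
df_factor = pd.concat([df[["T", "E", "S_URAT_07"]], dummies], axis=1)
cph_factor = CoxPHFitter()
cph_factor.fit(df_factor, duration_col="T", event_col="E")

# 3) factor plus interaction: allow the effect of S_URAT_07 to differ by quartile.
df_inter = df_factor.copy()
for col in dummies.columns:
    df_inter["S_URAT_07_x_" + col] = df_inter["S_URAT_07"] * df_inter[col]
cph_inter = CoxPHFitter()
cph_inter.fit(df_inter, duration_col="T", event_col="E")

cph_factor.print_summary()  # hazard ratios appear as exp(coef) in the summary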
I have a continuous dependent variable polity_diff and a continuous primary independent variable nb_eq. I have hypothesized that the effect of nb_eq will vary with different levels of the continuous variable gini_round in a non-linear manner: The effect of nb_eq will be greatest for mid-range values of gini_round and close to 0 for both low and high levels of gini_round (functional shape as a second-order polynomial).
My question is: how is this modelled in Stata?
So far I've tried a categorized version of gini_round, which allows me to compare the different groups, but obviously this doesn't use the data to its fullest. I can't get my head around how to include a single interaction term that lets me test my hypothesis. My best bet so far is something along the lines of the following (simplified by excluding some if-conditions etc.):
xtreg polity_diff c.nb_eq##c.gini_round_squared, fe vce(cluster countryno),
but I have close to 0 confidence that this is even nearly right.
Here's how I might do it:
sysuse auto, clear
reg price c.weight#(c.mpg##c.mpg) i.foreign
margins, dydx(weight) at(mpg = (10(10)40))
marginsplot
margins, dydx(weight) at(mpg=(10(10)40)) contrast(atcontrast(ar(2(1)4)._at) wald)
We interact weight with a second degree polynomial of mpg. The first margins calculates the average marginal effect of weight at different values of mpg. The graph looks like what you describe. The second margins compares the slopes at adjacent values of mpg and does a joint test that they are all equal.
I would probably give weight its own effect as well (two octothorpes rather than one), but the graph does not come out like your example:
reg price c.weight##(c.mpg##c.mpg) i.foreign
I've been working a bit with neural networks and I'm interested in implementing a spiking neuron model.
I've read a fair number of tutorials, but most of them seem to be about generating pulses, and I haven't found any that apply such a model to a given input train.
Say for example I got input train:
Input[0] = [0,0,0,1,0,0,1,1]
When it enters the Izhikevich neuron, is the input multiplied by a weight, or does the model only make use of the parameters a, b, c and d?
The Izhikevich equations (Euler-discretised, with time step dt) are:
v[n+1] = v[n] + dt*(0.04*v[n]^2 + 5*v[n] + 140 - u[n] + I)
u[n+1] = u[n] + dt*a*(b*v[n] - u[n])
with the after-spike reset: if v >= 30 mV, then v <- c and u <- u + d,
where v[n] is the membrane potential and u[n] is a membrane recovery variable.
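For concreteness, here is a minimal Euler-stepped sketch of these equations driven by that input train. The way each '1' bit is mapped to an injected current (amplitude I_AMP, held for a fixed number of steps) is purely an assumption for illustration; the model itself says nothing about how inputs are encoded.

# Minimal Euler integration of a single Izhikevich neuron driven by the binary input train.
a, b, c, d = 0.02, 0.2, -65.0, 8.0   # regular-spiking parameter set
dt = 1.0                             # integration time step in ms
input_train = [0, 0, 0, 1, 0, 0, 1, 1]
steps_per_bit = 10                   # hold each input bit for 10 ms (assumption)
I_AMP = 10.0                         # assumed current injected per '1' bit

v, u = c, b * c                      # start at rest
n_spikes = 0
for bit in input_train:
    for _ in range(steps_per_bit):
        I = I_AMP * bit
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + I)
        u += dt * a * (b * v - u)
        if v >= 30.0:                # spike: count it and apply the reset
            n_spikes += 1
            v, u = c, u + d
print("spikes emitted:", n_spikes)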
Are there any texts on implementations of Izhikevich or similar spiking neuron models for a practical problem? I'm trying to understand how information is encoded in these models, but it looks different from what's done with standard second-generation neurons. The only tutorial I've found that deals with a spike train and a set of weights is [1], but I haven't seen the same done with Izhikevich.
[1] https://msdn.microsoft.com/en-us/magazine/mt422587.aspx
The plain Izhikevich model by itself does not include weights.
The two equations you mentioned model the membrane potential (v[]) of a point neuron over time. To use weights, you could connect two or more such cells with synapses.
Each synapse could include some sort of spike-detection mechanism on the source (pre-synaptic) cell and a synaptic current mechanism on the target (post-synaptic) cell's side. That synaptic current could then be multiplied by a weight term and become part of the I term (in the 1st equation above) for the target cell.
As a very simple example of a two-cell network: at every time step, you could check whether the pre-synaptic cell's v is above (say) 0 mV. If so, inject (say) 0.01 pA * weightPrePost into the post-synaptic cell. weightPrePost would range from 0 to 1 and could be modified in response to things like firing rate, or Hebbian-like spike synchrony as in STDP.
With multiple synaptic currents going into a cell, you could devise various schemes for how to sum them. The simplest would be a plain sum; more complicated ones could include things like distance and dendrite diameters (i.e. simulated neural morphology).
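A very rough sketch of that two-cell scheme in Python (crude spike detection on the pre-synaptic cell via the 0 mV check, a weighted current injected into the post-synaptic cell; the constants are just the illustrative values above, and with such a tiny synaptic current the post cell will not actually fire; the point is only where the weight enters):

# Two Izhikevich cells: pre drives post through a weighted "synapse".
def izhi_step(v, u, I, a=0.02, b=0.2, dt=1.0):
    """One Euler step of the Izhikevich equations (reset handled by the caller)."""
    v_new = v + dt * (0.04 * v * v + 5.0 * v + 140.0 - u + I)
    u_new = u + dt * a * (b * v - u)
    return v_new, u_new

c, d = -65.0, 8.0
weight_pre_post = 0.5      # synaptic weight in [0, 1]
I_drive = 10.0             # constant drive so the pre cell fires
v_pre, u_pre = -65.0, -13.0
v_post, u_post = -65.0, -13.0
post_spikes = 0

for t in range(1000):      # 1000 ms with dt = 1 ms
    v_pre, u_pre = izhi_step(v_pre, u_pre, I_drive)

    # crude spike detection, as described above: is the pre cell's v above 0 mV?
    syn_current = 0.01 * weight_pre_post if v_pre > 0.0 else 0.0
    if v_pre >= 30.0:      # after-spike reset for the pre cell
        v_pre, u_pre = c, u_pre + d

    v_post, u_post = izhi_step(v_post, u_post, syn_current)
    if v_post >= 30.0:     # after-spike reset for the post cell
        v_post, u_post = c, u_post + d
        post_spikes += 1

print("post-synaptic spikes:", post_spikes)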
This chapter is a nice introduction to other ways to model synapses: Modelling Synaptic Transmission.
I am a developer that has been tasked with working out how previous results using SPSS were gathered, so we can repeat the process with some new data. We can't ask the person who did the original analysis because he is sadly no longer with us, so it has fallen to me to unravel what he did.
I am not a statistician and do not need to understand the principles involved. I really just need to know what menu items to navigate to.
We had a survey done, which asked a lot of questions of 10,000 people. A subset of 15 of these questions is being used for the analysis.
I know that factor analysis was done to reduce the data to 4 factors. K-means clustering was then used to find the cluster centers. This is what I'm after now.
I have worked out how to do the factor analysis to get the component score coefficient matrix that matches the data I have in my database. This was done by going to Analyze > Dimension Reduction > Factor. I then chose a fixed number of factors (4) from the "Extract" section, "Varimax" rotation from the "Rotation" section and checked the "Display factor score coefficient matrix" in the "Scores" section.
This gave data like this:
Matrix Value 1 Value 2 Value 3 Value 4
Q1 -0.0756 0.2134 -0.0245 -0.1236
Q2 ... ... ... ...
Q3 ... ... ... ...
...
What I have no idea of is how to proceed with this to do the k-means clustering.
The results I have in the database look like this:
Cluster centers Value 1 Value 2 Value 3 Value 4 Value 5
FAC1_1 -0.8373 -0.5766 0.2100 1.3499 0.2940
FAC2_1 ... ... ... ... ...
FAC3_1 ... ... ... ... ...
FAC4_1 ... ... ... ... ...
Now, I know that k-means clustering can be done on the original data set by using Analyze > Classify > K-means Cluster, but I don't know how to reference the factor analysis I've done.
Could someone give me some insight into how to create these cluster centers using SPSS?
In the GUI for FACTOR analysis (Analyze > Dimension Reduction > Factor), there is a sub-dialog "Scores"; make sure "Save as variables" is checked.
This will save the factor scores in your data, i.e. the variables FAC1_1, FAC2_1, FAC3_1 and FAC4_1.
It is these variables that you then need to add as input variables in the K-means GUI.
It is better to set up your work in syntax, so that anyone who wants to replicate your work can do so (and ideally your predecessor should have left his breadcrumbs in a syntax document too, a file with the .sps extension; I would make every attempt to find this document if there is any remote possibility of it existing).
Here's how you'd set this up in syntax and what his/her workings may have looked like:
/* Replicate the factor analysis (four factors) and save the factor score variables */.
FACTOR
/VARIABLES < INPUT THE 15 VARIABLES HERE >
/MISSING LISTWISE
/ANALYSIS < INPUT THE 15 VARIABLES HERE >
/PRINT EXTRACTION ROTATION FSCORE
/FORMAT SORT BLANK(.10)
/PLOT ROTATION
/CRITERIA FACTORS(4) ITERATE(25)
/EXTRACTION PC
/CRITERIA ITERATE(25)
/ROTATION VARIMAX
/SAVE REG(ALL)
/METHOD=CORRELATION.
/* Replicate the clustering using factor scores as inputs, generating 5 segments */.
QUICK CLUSTER FAC1_1 FAC2_1 FAC3_1 FAC4_1
/MISSING=LISTWISE
/CRITERIA=CLUSTER(5) MXITER(10) CONVERGE(0)
/METHOD=KMEANS(NOUPDATE)
/SAVE CLUSTER (Seg5)
/PRINT INITIAL.
/* Check centroids match*/.
MEANS FAC1_1 FAC2_1 FAC3_1 FAC4_1 BY Seg5 /CELLS MEAN.
If you can replicate the FACTOR score variables exactly, that is a good start. If the factor scores match but the centroids do not, it is most likely because the segment assignments are now different: even with the same inputs and methodology, K-means (QUICK CLUSTER) can, and most likely will, yield different segment assignments when the case ordering differs from before, due to its random starting points.
I don't know of any way around this, but in principle these are the likely steps he/she had taken.
I have done the same kind of analysis for a project of mine. First carry out the factor analysis; once you have been able to extract a good amount of variance, save the factor scores (in SPSS).
To save the factor scores, go to Analyze -> Dimension Reduction -> Factor -> Scores -> Save as variables.
As you save the scores, new variables will be created in the Variable View, one per component.
Once you have saved the factor scores, go to Analyze -> Classify -> K-Means Cluster, select the new variables (the factor scores), enter the required number of initial clusters, and click OK.
If you have access to the system where the original work was done, look for the journal file (typically named statistics.jnl and kept in the location specified under Edit > Options > Files).
If journaling was in effect with the append option, it will have all the commands the user ran.
I'm doing the same set of analyses for a project. Just for your information, the two-step clustering procedure offered by SPSS is more robust than k-means (Punj & Stewart, 1983). With k-means, how are you going to choose K? You can also use the clValid package (in R) to find the optimal number of clusters if you insist on using k-means.
Punj, G., & Stewart, D. W. (1983). Cluster analysis in marketing research: Review and suggestions for application. Journal of Marketing Research, 134-148.
I want to classify documents (composed of words) into 3 classes (Positive, Negative, Unknown/Neutral). A subset of the document words is used as the features.
So far, I have programmed a Naive Bayes classifier, using information gain and the chi-square statistic for feature selection. Now I would like to see what happens if I use the odds ratio as the feature selector.
My problem is that I don't know how to implement the odds ratio. Should I:
1) Calculate the odds ratio for every word w and every class? E.g. for w:
Prob of word as positive: Pw,p = #positive docs containing w / #docs
Prob of word as negative: Pw,n = #negative docs containing w / #docs
Prob of word as unknown: Pw,u = #unknown docs containing w / #docs
OR(w,P) = log( [Pw,p*(1-Pw,p)] / [(Pw,n + Pw,u)*(1-(Pw,n + Pw,u))] )
OR(w,N) = ...
OR(w,U) = ...
2) How should I decide whether or not to choose the word as a feature?
Thanks in advance...
Since it took me a while to independently wrap my head around all this, let me explain my findings here for the benefit of humanity.
Using the (log) odds ratio is a standard technique for filtering features prior to text classification. It is a 'one-sided metric' [Zheng et al., 2004] in the sense that it only discovers features which are positively correlated with a particular class. As a log-odds-ratio for the probability of seeing a feature 't' given the class 'c', it is defined as:
LOR(t,c) = log { [Pr(t|c) / (1 - Pr(t|c))] / [Pr(t|!c) / (1 - Pr(t|!c))] }
         = log { [Pr(t|c) * (1 - Pr(t|!c))] / [Pr(t|!c) * (1 - Pr(t|c))] }
Here I use '!c' to mean a document where the class is not c.
But how do you actually calculate Pr(t|c) and Pr(t|!c)?
One subtlety to note is that feature selection probabilities, in general, are usually defined over a document event model [McCallum & Nigam 1998, Manning et al. 2008], i.e., Pr(t|c) is the probability of seeing term t one or more times in the document given the class of the document is c (in other words, the presence of t given the class c). The maximum likelihood estimate (MLE) of this probability would be the proportion of documents of class c that contain t at least once. [Technically, this is known as a Multivariate Bernoulli event model, and is distinct from a Multinomial event model over words, which would calculate Pr(t|c) using integer word counts - see the McCallum paper or the Manning IR textbook for more details, specifically on how this applies to a Naive Bayes text classifier.]
One key to using LOR effectively is to smooth these conditional probability estimates, since, as #yura noted, rare events are problematic here (e.g., the MLE of Pr(t|!c) could be zero, leading to an infinite LOR). But how do we smooth?
In the literature, Forman reports smoothing the LOR by "adding one to any zero count in the denominator" (Forman, 2003), while Zheng et al (2004) use "ELE [Expected Likelihood Estimation] smoothing" which usually amounts to adding 0.5 to each count.
To smooth in a way that is consistent with probability theory, I follow standard practice in text classification with a multivariate Bernoulli event model. Essentially, we assume that we have seen each presence count AND each absence count B extra times. So our estimate of Pr(t|c) can be written in terms of #(t,c), the number of class-c documents in which we have seen t, and #(!t,c), the number of class-c documents in which we have not seen t, as follows:
Pr(t|c) = [#(t,c) + B] / [#(t,c) + #(!t,c) + 2B]
        = [#(t,c) + B] / [#(c) + 2B]
If B = 0, we have the MLE. If B = 0.5, we have ELE. If B = 1, we have the Laplacian prior. Note that this looks different from smoothing for the multinomial event model, where the Laplacian prior leads you to add |V| to the denominator [McCallum & Nigam, 1998].
You can choose 0.5 or 1 as your smoothing value, depending on which prior work most inspires you, and plug this into the equation for LOR(t,c) above, and score all the features.
Typically, you then decide on how many features you want to use, say N, and then choose the N highest-ranked features based on the score.
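Putting the smoothed estimates and the LOR formula together, a small sketch of the scoring-and-selection step might look like this (plain Python; documents are represented as sets of terms, i.e. multivariate Bernoulli presence counts, and B is the smoothing constant discussed above):

# Smoothed log-odds-ratio feature scoring, then keep the top-N features per class.
import math
from collections import Counter

def lor_scores(docs, labels, target_class, B=1.0):
    """docs: list of sets of terms; labels: parallel list of class labels."""
    n_c = sum(1 for y in labels if y == target_class)        # #(c)
    n_not_c = len(labels) - n_c                               # #(!c)
    count_t_c = Counter()                                     # #(t, c)
    count_t_not_c = Counter()                                 # #(t, !c)
    vocab = set()
    for terms, y in zip(docs, labels):
        vocab.update(terms)
        for t in terms:
            if y == target_class:
                count_t_c[t] += 1
            else:
                count_t_not_c[t] += 1
    scores = {}
    for t in vocab:
        p_t_c = (count_t_c[t] + B) / (n_c + 2 * B)            # smoothed Pr(t|c)
        p_t_nc = (count_t_not_c[t] + B) / (n_not_c + 2 * B)   # smoothed Pr(t|!c)
        scores[t] = math.log((p_t_c * (1 - p_t_nc)) / (p_t_nc * (1 - p_t_c)))
    return scores

def select_top_n(scores, n):
    return [t for t, _ in sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]]

# toy usage
docs = [{"good", "great"}, {"bad", "awful"}, {"good", "bad"}, {"great"}]
labels = ["pos", "neg", "neg", "pos"]
print(select_top_n(lor_scores(docs, labels, "pos", B=0.5), 2))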
In a multi-class setting, people have often used one-vs-all classifiers, doing feature selection independently for each classifier and hence for each positive class with the one-sided metrics (Forman, 2003). However, if you want to find a unique reduced set of features that works in a multiclass setting, there are some advanced approaches in the literature (e.g. Chapelle & Keerthi, 2008).
References:
Zheng, Wu & Srihari, 2004
McCallum & Nigam, 1998
Manning, Raghavan & Schütze, 2008
Forman, 2003
Chapelle & Keerthi, 2008
The odds ratio is not a good measure for feature selection, because it only shows what happens when a feature is present and says nothing about when it is absent. So it does not work well for rare features, and since almost all features are rare, it does not work well for almost all features. For example, a feature that predicts the positive class with 100% confidence but is present in only 0.0001 of documents is useless for classification. Therefore, if you still want to use the odds ratio, add a threshold on feature frequency, e.g. require the feature to be present in 5% of cases. But I would recommend a better approach: use chi-square or information gain metrics, which automatically solve these problems.