Two time dependent covariates in SPSS, Possible? - spss

My data has three covariates: one static covariate from baseline and two that are time-dependent. The time-dependent ones are binomial and occur some time after the start of follow-up. Both are significant in univariate analysis. The SPSS manual hints that it is possible to include both in one analysis, but I cannot fathom how. Is this possible, and how do I go about it?

It is possible using SPSS syntax. I think you have to look at the command that defines the time-dependent covariate (TIME PROGRAM, which is run before COXREG) and create a similar COMPUTE statement for your other variable, in the same format. After that you can enter these newly computed time-dependent variables in the Cox model.
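To make the idea of two time-dependent binary covariates concrete, here is a minimal sketch in Python (not SPSS syntax). It splits one subject's follow-up into intervals on which both covariates are constant, which is the layout many survival tools expect. The helper name `split_follow_up` and the example times are hypothetical, and SPSS itself defines the covariates inside the procedure rather than restructuring the data like this.

```python
def split_follow_up(end_time, event, onset_a=None, onset_b=None):
    """Split one subject's follow-up into (start, stop, a, b, event) rows.

    onset_a / onset_b are the times at which each binary covariate
    switches from 0 to 1 (None if it never does). Hypothetical helper.
    """
    # Cut points: onsets that fall strictly inside the follow-up window.
    cuts = sorted({t for t in (onset_a, onset_b)
                   if t is not None and 0 < t < end_time})
    bounds = [0] + cuts + [end_time]
    rows = []
    for start, stop in zip(bounds, bounds[1:]):
        # A covariate is "on" for an interval if it switched at or before its start.
        a = int(onset_a is not None and onset_a <= start)
        b = int(onset_b is not None and onset_b <= start)
        # The event can only occur at the end of the last interval.
        is_last = stop == end_time
        rows.append((start, stop, a, b, int(event and is_last)))
    return rows


# Assumed example: covariate A starts at t=3, covariate B at t=6, event at t=10.
rows = split_follow_up(end_time=10, event=True, onset_a=3, onset_b=6)
for row in rows:
    print(row)
```

Each row carries the values of both covariates for its interval, so a model can use both at once; a subject whose covariates never switch on simply keeps a single row.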

Related

Best way to treat (too) many classes in one categorical variable

I'm working on an ML prediction model and I have a dataset with a categorical variable (let's say product id) with 2k distinct products.
If I convert this variable to dummy variables, e.g. with a one-hot encoder, the dataset may have a size of 2k times the number of examples (millions of examples), which is too much to process.
How is this usually handled?
Should I just use the variable as it is, without the conversion?
Thanks.
High cardinality of categorical features is a well-known problem, and "the best" way typically depends on the prediction task and requires a trial-and-error approach. Whether you can even find a strategy that is clearly better than the others depends on the case.
Addressing your first question, a good collection of different encoding strategies is provided by the category_encoders library:
A set of scikit-learn-style transformers for encoding categorical variables into numeric
They follow the scikit-learn API for transformers, and a simple example is provided as well. Again, which one will give the best results depends on your dataset and the prediction task. I suggest incorporating them in a pipeline and testing some or all of them.
In regard to your second question, you would then continue to use the encoded features for your predictions and analysis.
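As an illustration of one such strategy, here is a minimal plain-Python sketch of frequency encoding, which replaces each category with how often it appears in the training data and so yields one numeric column instead of 2k dummy columns. This is not the category_encoders API; the function names and the example product ids are hypothetical, and whether this encoding helps depends on the task.

```python
from collections import Counter


def fit_frequency_encoding(values):
    """Map each category to its relative frequency in the training data."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}


def transform(values, mapping, default=0.0):
    """Replace categories by their frequency; unseen categories get `default`."""
    return [mapping.get(v, default) for v in values]


# Assumed toy data: product ids seen during training.
train_ids = ["p1", "p2", "p1", "p3", "p1", "p2", "p4", "p1"]
mapping = fit_frequency_encoding(train_ids)

# "p9" never appeared in training, so it falls back to the default.
encoded = transform(["p1", "p2", "p9"], mapping)
print(encoded)  # [0.5, 0.25, 0.0]
```

The mapping is fitted on training data only and then reused on new data, which mirrors the fit/transform split of scikit-learn-style transformers.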

Applying multiple weights

I have a data set that has weighted scores based on gender and age profiles. I also have region data broken up into 7 states. I basically want to exclude one of those states and apply additional weights based on state to come up with a new "overall" score.
Manual Excel calculations are the only way I can think of doing this.
I need to take scores that already have a variable weight applied and add an additional weight dependent on region.
SPSS Statistics only allows a single weight variable to be applied at any one time, so Kevin Troy's comments are correct: you'll have to combine things into a single weight. If the data are properly combined into a single file you may find the Rake Weights extension that's installed with the Python Essentials useful, as you can specify multiple variables as inputs to the overall weighting scheme and have the weights calculated for you. If you're not familiar with the theory behind this, look up raking or rim weighting.
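Combining things into a single weight can be sketched as follows, assuming the simplest scheme: multiply the existing weight by a region weight, and give the excluded region a weight of zero. This is plain Python rather than SPSS, the field names and numbers are made up for illustration, and it is not raking, which would adjust the weights iteratively to match target margins.

```python
def combined_weighted_mean(records, region_weights):
    """Weighted mean of scores using base_weight * region weight.

    Regions missing from `region_weights` are excluded (weight 0).
    """
    numerator = 0.0
    denominator = 0.0
    for rec in records:
        w = rec["base_weight"] * region_weights.get(rec["region"], 0.0)
        numerator += w * rec["score"]
        denominator += w
    if denominator == 0:
        raise ValueError("all records were excluded or have zero weight")
    return numerator / denominator


# Assumed toy data: base_weight stands for the existing gender/age weight.
records = [
    {"score": 10.0, "base_weight": 1.0, "region": "A"},
    {"score": 20.0, "base_weight": 2.0, "region": "B"},
    {"score": 90.0, "base_weight": 1.0, "region": "C"},  # region to exclude
]
region_weights = {"A": 2.0, "B": 0.5}  # "C" is left out on purpose

overall = combined_weighted_mean(records, region_weights)
print(round(overall, 4))  # 13.3333
```

In SPSS the same product could be computed into one new variable and then used as the single weight variable.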

Statistical model beyond ANOVA using SPSS

I have two groups in which I compare two times (before and after treatment).
At first, I tried a repeated-measures ANOVA and observed a significant difference between groups. However, I have the following concern: the baseline means (time 1) are very different between groups, and we fear that the significant ANOVA result is due to this difference between baselines.
Therefore, we are considering another statistical model, specifically mixed models. However, SPSS does not run it and tells me that "no valid case was found".
I talked to a statistician who said the problem is that the number of observations (rows) is less than the number of columns (dependent variables).
Would anyone know if this information makes sense? And does anyone know of any statistical model that would help us control this difference between baselines using SPSS?

Why should I use summaries, and what can I get from them?

I'm studying deep learning and TensorBoard, and almost all example code uses summaries.
I wonder why I need to use variable summaries.
There are many types of data for a summary, like min, max, mean, variance, etc.
What should I use in a typical situation?
How do I analyze these summary graphs, and what can I get from them?
thank you :D
There is an awesome video tutorial (https://www.youtube.com/watch?v=eBbEDRsCmv4) on Tensorboard that describes almost everything about Tensorboard (Graph, Summaries etc.)
Variable summaries (scalar, histogram, image, text, etc.) help you track your model through the learning process. For example, tf.summary.scalar('v_loss', validation_loss) will add one point to the loss curve each time you call the summary op, thus giving you a rough idea of whether the model has converged and when to stop.
Which summary to use depends on your variable type. For values like loss, tf.summary.scalar shows the trend across epochs; for variables like the weights in a layer, it is better to use tf.summary.histogram, which shows how the entire distribution of weights changes; I typically use tf.summary.image and tf.summary.text to check the images / texts my model generates over different epochs.
The graph shows your model structure and the size of tensors flowing through each op. I found it hard at the beginning to organise ops nicely in the graph presentation, and I learnt a lot about variable scope from that. The other answer provides a link for a great tutorial for beginners.
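To show what the numbers behind these summaries are, here is a minimal sketch in plain Python, without TensorFlow, that computes the statistics mentioned in the question (mean, standard deviation, min, max) and the bin counts a histogram view is built from. The function `variable_summary`, the bin count, and the example weights are assumptions for illustration; TensorBoard's own binning may differ.

```python
import math


def variable_summary(values, n_bins=4):
    """Return the statistics usually logged for a tensor, plus bin counts."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant input
    counts = [0] * n_bins
    for v in values:
        # Clamp so that the maximum falls in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return {"mean": mean, "stddev": std, "min": lo, "max": hi, "histogram": counts}


# Assumed example: the weights of one layer at a single training step.
weights = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.0]
summary = variable_summary(weights)
print(summary["histogram"])  # [1, 1, 1, 3]
```

Logging the scalar values once per step gives a curve over time, while the bin counts logged per step show whether the whole distribution is shifting, spreading out, or collapsing.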

SPSS two way repeated measures ANOVA

I am fairly new to statistics.
I ran an experiment and used a two-way ANOVA with repeated measures. The calculation was done in SPSS. In most papers I have seen, the F-value and the degrees of freedom were reported as well. Is it normal to report those values? If so, which values do I take from the SPSS output?
How do I interpret these values? What do they mean?
When does the F-value support a significant result and when not?
What are good values for the F-value and the degrees of freedom?
In some articles I also read about critical F-values. How do I get this value?
Most articles describe how to calculate those values but do not explain their meaning for the experiment.
Some clarification on these issues would be greatly appreciated.
My English is not very good, but I will try to answer your question.
The main purpose of ANOVA is to obtain statistical evidence on whether the measured groups have the same mean or not. So we state a null hypothesis and an alternative hypothesis, then apply a test statistic to the data. You can use ANOVA if the groups have the same variance (squared standard deviation).
You need to test this. This is a hypothesis test too: the null hypothesis is that the groups have the same variance, the alternative hypothesis is that they don't.
You make the decision from the Sig. value: if the value is higher than 0.05, we usually do not reject the null hypothesis. If the variances are equal, we can use ANOVA. (I assume that the data follow the normal distribution.) For the ANOVA itself, the null hypothesis is that the groups have equal means, and the alternative hypothesis is that at least one group has a different mean. Again you decide from the Sig. value: if it is higher than 0.05, we do not reject the null hypothesis. The critical F-value is not important if you are calculating on a computer. You can build an acceptance interval from the lower and upper critical F-values, and if the F-value falls inside the interval you do not reject the null hypothesis, but I only used this method in statistics class. In my view the F-value and the df are not essential in the report, because they don't explain much on their own, although, as you noticed, most papers report them together with the Sig. value.
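To show where the F-value and the degrees of freedom come from, here is a minimal sketch in plain Python for the simplest case, a one-way ANOVA between independent groups. The data are made up and the function name is hypothetical; a two-way repeated-measures design, as in the question, partitions the variation differently, so this is not a reproduction of the SPSS output.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Assumed toy data: three groups with three observations each.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_value, df1, df2 = one_way_anova(groups)
print(f"F({df1}, {df2}) = {f_value:.2f}")  # F(2, 6) = 13.00
```

The F-value is the ratio of the variation between groups to the variation within groups, so a large F means the group means differ by more than the noise inside the groups would suggest. The two df values identify which F distribution the value is compared against, which is how the Sig. value and the critical F-value are obtained.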

Resources