Before I ask my question, I need to explain some basic details of my study.
Design: I have a 2x2x2 mixed factorial design with two within-subject factors (two levels each) and one between-subject factor (two levels). I ran two analyses:
Analysis 1: a mixed factorial ANOVA, which found significant main effects of all three factors and a significant three-way interaction between them.
Result of Analysis 1 (without the covariate):
Analysis 2: I ran a mixed factorial ANCOVA, which was the same analysis as described above with just ONE covariate, named BDI. However, I got the following output from SPSS, which I am unable to interpret.
Note: Alerting_effect and Flanker_type are within-subject factors. Group is a between-subject factor which contains two groups of participants.
Questions:
1. I am not able to tell whether there is a significant interaction of the covariate with Alerting_effect X Flanker_type X Group, which was the main aim of conducting the second analysis. Theoretically, the covariate may modulate the result.
2. In the second analysis, the Alerting_effect X Flanker_type X Group interaction is not significant (p = 0.161), while in the first analysis this interaction was significant (p = 0.001). How do I interpret this finding?
If anyone can help me out with this, I would be grateful.
Thanks
In my data set, consisting of employees nested in teams, I have to calculate a proportional diversity index for gender. This index is the percentage of minority-group members within each team. In my data set, males are coded as 0 and females as 1. Now I wonder if there is a simple way to come up with the number of minority members in each team.
Thanks for your guidance
If what you need is just the percentage of males and females in each team, you can calculate:
sort cases by teamVar.
split file by teamVar.
freq genderVar.
split file off.
This will get you the results in the output window.
If you want the results in another dataset you can use aggregate:
dataset declare byteam.
aggregate outfile=byteam /break=teamVar
/Pfemales=Pin(genderVar 1 1)
/Pmales=Pin(genderVar 0 0).
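If you'd rather do the same calculation outside SPSS, here is a minimal Python/pandas sketch (it reuses the variable names teamVar and genderVar from the syntax above; the example data are made up):

import pandas as pd

df = pd.DataFrame({
    "teamVar":   [1, 1, 1, 2, 2, 2, 2],
    "genderVar": [0, 1, 1, 0, 0, 0, 1],  # 0 = male, 1 = female
})

# The mean of a 0/1 variable is a proportion, so this is the share of
# females in each team.
p_female = df.groupby("teamVar")["genderVar"].mean()

# Proportional diversity index: the share of whichever gender is the
# minority within each team.
minority_share = p_female.where(p_female <= 0.5, 1 - p_female)
print(minority_share)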
I have a monitoring use-case that I'm not entirely sure is a good match for
Prometheus, and I wanted to ask for opinions before I delve deeper.
The numbers of what I'm going to store:
Only 1 metric.
That metric has 1 label with 1,000,000 to 2,000,000 distinct values.
The values are gauges (but does it make a difference if they are counters?)
Sample rate is once every 5 minutes. Retaining data for 180 days.
Estimated storage size if I have 1 million distinct label values:
(According to the formula in Prometheus' documentation:
retention_time_seconds * ingested_samples_per_second * bytes_per_sample.)
There are (24*60)/5 = 288 five-minute intervals in a day, so:

(180*288)           * 1,000,000          * 2            = 103,680,000,000 ~= 100GB
samples/label-value   label-value-count    bytes/sample
So I assume 100-200GB will be required.
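In code, the same back-of-the-envelope estimate looks like this (a Python sketch; the function and parameter names are mine, not Prometheus terminology):

def estimated_bytes(retention_days, scrape_interval_s, n_series,
                    bytes_per_sample=2):
    # Samples each series accumulates over the whole retention window.
    samples_per_series = retention_days * 24 * 3600 // scrape_interval_s
    return samples_per_series * n_series * bytes_per_sample

# 180 days retention, one sample every 5 minutes, 1M series, ~2 bytes/sample:
print(estimated_bytes(180, 300, 1_000_000))  # 103680000000, i.e. ~100GB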
Is this estimation correct?
I read in multiple places about avoiding high-cardinality labels, and I would
like to ask about this. Considering I will be looking at one time-series at a
time: is the problem high-cardinality labels as such, or having a high number
of time-series (since each label value produces another time-series)? I also
read in multiple places that Prometheus can handle millions of time-series at
once, so even if I have 1 label with one million distinct values, I should be
fine in terms of time-series count. Do I still have to worry about the label
having high cardinality in this case? I'm aware that it depends on the
strength of the server, but assuming average capacity, I would like to know
whether Prometheus' implementation has a problem handling this case
efficiently.
Also, if it's a matter of time-series count, am I correct in assuming that
there will be no significant difference between the following options?
1 metric with 1 label of 1,000,000 distinct label values.
10 metrics each with 1 label of 100,000 distinct label values.
X metrics each with 1 label of Y distinct label values.
where X * Y = 1,000,000
Thanks for the help!
That might work, but it's not what Prometheus is designed for and you'll likely run into issues. You probably want a database rather than a monitoring system, maybe Cassandra here.
How the cardinality is split out across metrics won't affect ingestion performance; however, it will be relatively slow to read 1M series in a query.
Note that Victoria Metrics is an easy-to-configure backend for Prometheus which will reduce storage requirements significantly.
I have studied association rules and know how to implement the algorithm on the classic basket of goods problem, such as:
Transaction ID   Potatoes   Eggs   Milk
A                1          0      1
B                0          1      1
In this problem each item has a binary indicator: 1 indicates the basket contains the good, 0 indicates it does not.
But what would be the best way to model a basket which can contain many units of the same good? E.g., take the below, very unrealistic example.
Transaction ID   Potatoes   Eggs   Milk
A                5          0      178
B                0          35     7
Using binary indicators in this case would obviously lose a lot of information, and I am seeking a model which takes into account not only the presence of items in the basket but also the frequency with which they occur.
What would be a suitable algorithm for this problem?
In my actual data there are over one hundred items and, based on the profile of a user's basket, I would like to calculate the probabilities of the customer consuming the other available items.
An alternative is to use binary indicators but to construct them in a cleverer way.
The idea is to set the indicator only when an amount is above the central value, i.e. when it is actually notable. If everyone buys three loaves of bread on average, does it make sense to flag someone as a "bread lover" for buying two or three?
The central value can be a plain arithmetic mean, a mean with outliers removed, or the median.
Instead of:

binarize(x)  = 0 if x = 0
               1 otherwise

you can use:

binarize*(x) = 0 if x <= central(X)
               1 otherwise
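A minimal sketch of this thresholded binarization in Python/pandas (the column names and data are made up, taken from the toy example above):

import pandas as pd

baskets = pd.DataFrame(
    {"Potatoes": [5, 0], "Eggs": [0, 35], "Milk": [178, 7]},
    index=["A", "B"],
)

# Plain binarization: 1 whenever the item is present at all.
plain = (baskets > 0).astype(int)

# Thresholded binarization: 1 only when the amount exceeds the item's
# central value (the median here; a trimmed mean would also work).
central = baskets.median()
thresholded = (baskets > central).astype(int)
print(thresholded)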
If you really want to have probabilities, I think the way to go is to encode your data probabilistically. Bayesian or Markov networks might be a feasible approach. Nevertheless, without a reasonable structure this will be computationally extremely expensive. For three item types, however, it seems feasible.
I would try a neural-network autoencoder if you have many more item types. If there is some dependency in the data, it will discover it.
For the above example you could use a network with three input, two hidden, and three output neurons.
A little fancier would be three fully connected layers with dropout in the middle layer.
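A minimal sketch of such an autoencoder in Python with PyTorch (layer sizes follow the 3-2-3 suggestion above; the toy data and training details are illustrative assumptions):

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(3, 2),  # encoder: 3 item counts -> 2-unit bottleneck
    nn.ReLU(),
    nn.Linear(2, 3),  # decoder: reconstruct the 3 item counts
)

# Toy basket counts from the example above; real data should be scaled.
x = torch.tensor([[5.0, 0.0, 178.0],
                  [0.0, 35.0, 7.0]])
x = x / x.max()  # crude normalization for the sketch

opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()
for _ in range(500):
    opt.zero_grad()
    loss = loss_fn(model(x), x)  # reconstruction error
    loss.backward()
    opt.step()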
I have been given this raw data to use in SPSS and I'm confused since I'm used to R instead.
An experiment monitored the amount of weight gained by anorexic girls after various treatments. Girls were assigned to one of three groups: Group 1 had no therapy, Group 2 had cognitive behaviour therapy, and Group 3 had family therapy. The researchers wanted to know if the two treatment groups produced weight gain relative to the control group.
This is the data
group1<- c(-9.3,-5.4,12.3,-2,-10.2,-12.2,11.6,-7.1,6.2,9.2,8.3,3.3,11.3,-10.6,-4.6,-6.7,2.8,3.7,15.9,-10.2)
group2<-c(-1.7,-3.5,14.9,3.5,17.1,-7.6,1.6,11.7,6.1,-4,20.9,-9.1,2.1,-1.4,1.4,-3.7,2.4,12.6,1.9,3.9,15.4)
group3<-c(11.4,11.0,5.5,9.4,13.6,-2.9,7.4,21.5,-5.3,-3.8,13.4,13.1,9,3.9,5.7,10.7)
I have been asked to come up with the mean and standard deviation of weight gain as a function of the independent variable, which I believe is the treatment group,
and then do an ANOVA for the data and pairwise comparisons.
I don't know where to start with this data besides putting it into SPSS.
With R I would use the summary and anova functions, but with SPSS I'm lost.
Please help
For comparison of means and one-way ANOVA (and all of the potential options) navigate the menus for Analyze -> Compare Means. Below is an example using Tukey post-hoc comparisons. In the future just search the command syntax reference. A search for ANOVA would have told you all you needed to know.
DATA LIST FREE (",") / val.
BEGIN DATA
-9.3,-5.4,12.3,-2,-10.2,-12.2,11.6,-7.1,6.2,9.2,8.3,3.3,11.3,-10.6,-4.6,-6.7,2.8,3.7,15.9,-10.2
-1.7,-3.5,14.9,3.5,17.1,-7.6,1.6,11.7,6.1,-4,20.9,-9.1,2.1,-1.4,1.4,-3.7,2.4,12.6,1.9,3.9,15.4
11.4,11.0,5.5,9.4,13.6,-2.9,7.4,21.5,-5.3,-3.8,13.4,13.1,9,3.9,5.7,10.7
END DATA.
DATASET NAME val.
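*Assign group IDs by case order (group sizes: 20, 21, and 16).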
DO IF $casenum <= 20.
COMPUTE grID = 1.
ELSE IF $casenum > 20 AND $casenum <= 41.
COMPUTE grID = 2.
ELSE.
COMPUTE grID = 3.
END IF.
*Means and Standard Deviations.
MEANS
TABLES=val BY grID
/CELLS MEAN COUNT STDDEV .
*Anova.
ONEWAY val BY grID
/MISSING ANALYSIS
/POSTHOC = TUKEY ALPHA(.05).
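If you want to sanity-check the SPSS output elsewhere, here is a sketch of the same analysis in Python with scipy and statsmodels, using the data above (assuming those packages are available; the R equivalents would be summary(), aov(), and TukeyHSD()):

import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

group1 = [-9.3, -5.4, 12.3, -2, -10.2, -12.2, 11.6, -7.1, 6.2, 9.2,
          8.3, 3.3, 11.3, -10.6, -4.6, -6.7, 2.8, 3.7, 15.9, -10.2]
group2 = [-1.7, -3.5, 14.9, 3.5, 17.1, -7.6, 1.6, 11.7, 6.1, -4,
          20.9, -9.1, 2.1, -1.4, 1.4, -3.7, 2.4, 12.6, 1.9, 3.9, 15.4]
group3 = [11.4, 11.0, 5.5, 9.4, 13.6, -2.9, 7.4, 21.5, -5.3, -3.8,
          13.4, 13.1, 9, 3.9, 5.7, 10.7]

# Mean and standard deviation of weight gain per group.
for name, g in (("group1", group1), ("group2", group2), ("group3", group3)):
    print(name, np.mean(g), np.std(g, ddof=1))

# One-way ANOVA.
print(f_oneway(group1, group2, group3))

# Tukey HSD pairwise comparisons.
vals = np.concatenate([group1, group2, group3])
grp = np.repeat([1, 2, 3], [len(group1), len(group2), len(group3)])
print(pairwise_tukeyhsd(vals, grp, alpha=0.05))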