How to interpret a reverse-scored dependent variable in a two-way ANOVA - SPSS

I have two independent variables and one dependent variable.
The dependent variable is the time taken by each individual to complete a task.
The independent variables are Type of Task and Gender.
The dependent variable is measured as a duration in minutes, so a lower value implies higher efficiency (a "reverse score").
How do I account for this when conducting a two-way ANOVA in SPSS?

In running an ANOVA on these data, you can run it either way, using the time (duration), or transforming that to speed. Since transforming time elapsed to speed involves taking a reciprocal, the transformation is not linear, and the results of the two analyses may not agree. Thus you need to decide which one is the right one for your purposes.

Related

Time Series Variable Analysis

The big picture of the problem is the following: predicting engine failure by predicting the temperature of the engine, because that is intuitively the main cause of failure. The first thing I want to do is check which other variables influence the temperature, such as the torque of the engine, the functioning mode of the engine, etc. Why? Because those are variables we can change in real life, and thus avoid high temperatures and thereby the failure.
So my question is how to find which variables the temperature depends on, and how much. Since the temperature depends on time, we are in a time-series setting, but not all the variables depend on their past values. Therefore, I am not sure that autoregressive models could work.
The first thing that came to mind was to check whether there is a linear relationship. But from reflecting on how a temperature evolves physically, I'm fairly sure the relationship is exponential, so perhaps taking the natural logarithm transforms it into a linear problem, to which we can then apply linear regression. The problem is that this won't capture the time dependency of the temperatures. I looked into autoregressive models, but I'm not sure they will work. All I want, for now, is to see which variables have an impact on temperature, not to predict the temperature.
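One way to combine the two ideas in the question (log-transform plus time dependence) is an ARX-style regression: regress log-temperature on its own lag plus the candidate drivers. A minimal NumPy sketch with synthetic data follows; the data-generating process and variable names (torque as the single driver) are stand-ins:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
torque = rng.uniform(10, 50, n)
temp = np.empty(n)
temp[0] = 60.0
for i in range(1, n):  # synthetic: temp relaxes toward a torque-driven level
    temp[i] = 0.9 * temp[i - 1] + 0.1 * (40 + 0.8 * torque[i]) + rng.normal(0, 0.2)

# ARX-style regression: log-temp on its own lag plus a candidate driver.
# The lag term absorbs the time dependency; the torque coefficient then
# measures the driver's contemporaneous impact.
y = np.log(temp[1:])
A = np.column_stack([np.ones(n - 1), np.log(temp[:-1]), torque[1:]])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
```

Because the lag term soaks up the autocorrelation, the coefficient on torque (and any other drivers added as columns) can be read as the variable's impact on temperature, which is what the question asks for.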

Machine learning algorithm to predict/find/converge to correct parameters in mathematical model

I am currently trying to find a machine learning algorithm that can estimate about 5-15 parameters used in a mathematical model (MM). The MM has 4 ordinary differential equations (ODEs), and a few more will be added, so more parameters will be needed. Most of the parameters can be measured, but others need to be guessed. We know all 15 parameters, but we want the computer to guess 5 or even 10 of them. To test whether guessed parameters are correct, we plug them into the MM and solve the ODEs with a numerical method. We then compute the error between the model output using the parameters we know (and want to guess) and the model output using the guessed parameters. The ODEs are evaluated many times: each step represents one minute of real time and we simulate 24 hours, so 1440 steps.
Currently we are using a particle filter to guess the parameters. This works okay, but we want to see if there are better methods out there for guessing parameters in a model. The particle filter draws a random value for each parameter from a range we know for that parameter, e.g. 0.001-0.01; this is done for each parameter that needs to be guessed.
If you can run a lot of full simulations (tens of thousands), you could try black-box optimization. I'm not sure black-box is the right approach for you (I'm not familiar with particle filters), but if it is, CMA-ES is a clear match here and easy to try.
You have to specify a loss function (e.g. the total sum of squared errors over a whole simulation) and an initial guess (mean and sigma) for your parameters. Among black-box algorithms, CMA-ES is a well-established baseline. It is hard to beat if you have only a few (at most a few hundred) continuous parameters and no gradient information. However, anything less black-box-ish that can, for example, exploit the ODE structure of your problem will do better.
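CMA-ES itself is available in the `cma` package. To illustrate just the black-box setup (a loss over a full 1440-step simulation, plus bounded parameters standing in for the known ranges), here is a sketch using SciPy's differential evolution, another derivative-free optimizer, on a simplified one-equation stand-in for the MM:

```python
import numpy as np
from scipy.integrate import odeint
from scipy.optimize import differential_evolution

# One-ODE stand-in for the 4-equation MM: dy/dt = -k*y + c
def rhs(y, t, k, c):
    return -k * y + c

t = np.arange(1440.0)            # one step per minute, 24 hours
true_params = (0.005, 0.02)      # "known" values we pretend to guess
reference = odeint(rhs, 1.0, t, args=true_params).ravel()

def loss(params):
    """Total sum of squared errors between guessed and reference runs."""
    sim = odeint(rhs, 1.0, t, args=tuple(params)).ravel()
    return float(np.sum((sim - reference) ** 2))

# Bounds play the role of the known parameter ranges (e.g. 0.001-0.01)
result = differential_evolution(
    loss, bounds=[(0.001, 0.01), (0.0, 0.1)], seed=0, maxiter=50
)
```

Swapping in CMA-ES means replacing the last call with the `cma` package's optimizer, supplying a starting mean and sigma instead of bounds; the loss function stays the same.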

When do you control for initial judgment vs. take the difference between first and second judgment?

I am analyzing data for my dissertation, and I have participants see initial information, make judgments, see additional information, and make the same judgments again. I don't know how or if I need to control for these initial judgments when doing analyses about the second judgments.
I understand that the first judgments cannot be covariates because they are affected by my IV/manipulations. Also, I only expect the second judgments to change for some conditions, so if I use the difference between first and second judgments, I only expect that to change for two of my four conditions.
A common way to handle comparisons between the first and second judgments is as paired data. If condition is a between-subjects factor, then you have a between × within design: use a repeated-measures ANOVA, or, for judgments whose scaling doesn't support the assumptions of linear models, a generalized-linear-model setup that handles repeated measurements. In SPSS, for linear models, you can set up the judgments as two different variables and condition as a third, then use Analyze > General Linear Models > Repeated Measures. For generalized linear models you can use generalized estimating equations (GEE) or mixed models, though these require a fair amount of data to be reliable; the menus are Analyze > Generalized Linear Models > Generalized Estimating Equations and Analyze > Mixed Models > Generalized Linear, respectively. Each of these requires the repeated-measures data to be in the "long" (narrow) format, with a subject ID variable, a time index, the judgment variable, and the condition variable. You'd then have two cases per subject, one for each time point.
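Outside SPSS, the wide-to-long restructuring described above looks like this in pandas; the column names and values here are hypothetical:

```python
import pandas as pd

# Hypothetical wide data: one row per subject, one column per judgment time
wide = pd.DataFrame({
    "subject": [1, 2, 3],
    "condition": ["control", "treat", "treat"],
    "judgment_t1": [4.0, 5.0, 3.0],
    "judgment_t2": [4.5, 6.5, 5.0],
})

# Long format: one row per subject per time point, as GEE/mixed models expect
long = wide.melt(
    id_vars=["subject", "condition"],
    value_vars=["judgment_t1", "judgment_t2"],
    var_name="time", value_name="judgment",
)
long["time"] = long["time"].str.replace("judgment_", "")
```

The result has two cases per subject (time "t1" and "t2"), matching the data layout SPSS's GEE and mixed-model procedures require.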

In a group of correlated variables, how can I deduce which subset of variables best describe the remaining variables?

I have a data set of 60 sensors making 1684 measurements. I wish to decrease the number of sensors used during experiment, and use the remaining sensor data to predict (using machine learning) the removed sensors.
I have had a look at the data (see image) and uncovered several strong correlations between the sensors, which should make it possible to remove X sensors and use the remaining sensors to predict their behaviour.
How can I “score” which set of sensors (X) best predict the remaining set (60-X)?
Are you familiar with Principal Component Analysis (PCA)? Like ANOVA, it is built on decomposing variance. "Dimensionality reduction" is the general term for this kind of process.
These are usually aimed at a set of inputs that predict a single output, rather than a set of peer measurements. To adapt your case to these methods, I would think that you'd want to begin by considering each of the 60 sensors, in turn, as the "ground truth", to see which ones can be most reliably driven by the remainder. Remove those and repeat the process until you reach your desired threshold of correlation.
I also suggest a genetic method to do this winnowing; perhaps random forests would be of help in this phase.
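The remove-the-most-predictable-sensor loop sketched above can be written in plain NumPy. The data here are a synthetic stand-in (10 latent factors driving 60 correlated sensors), and the stopping rule (drop 20 sensors) is arbitrary; in practice you would stop at your desired prediction-quality threshold:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in: 1684 measurements of 60 correlated sensors
latent = rng.normal(size=(1684, 10))
X = latent @ rng.normal(size=(10, 60)) + 0.1 * rng.normal(size=(1684, 60))

def r2_from_rest(M, j):
    """R^2 when column j of M is linearly predicted from all other columns."""
    y = M[:, j]
    A = np.column_stack([np.delete(M, j, axis=1), np.ones(len(M))])
    resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    return 1.0 - resid.var() / y.var()

keep = list(range(X.shape[1]))
removed = []
while len(removed) < 20:  # drop the 20 most predictable sensors
    scores = [(r2_from_rest(X[:, keep], i), s) for i, s in enumerate(keep)]
    best_r2, sensor = max(scores)
    removed.append(sensor)
    keep.remove(sensor)
```

Each iteration treats every remaining sensor in turn as "ground truth", scores how well the others predict it, and removes the most predictable one, which is exactly the greedy winnowing described above (here with linear regression; random forests could be substituted as the predictor).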

Regression or Classification? How to determine sample size?

I have a group of instances with n numerical features each.
I am resampling my features every X time steps, so each instance has a set of features at t1:tn.
The continuous response variable (e.g. with range 50:100) is only measured every X*z time steps (e.g. features sampled every minute, the response only every 30). The features might change over time, and so might the response.
Now, at any time point T, I want to map a new instance to the response range.
In case I did not lose you yet :-)
Do you see this rather as a regression problem or as a multi-class classification problem (with a discretized response range)?
In either case, is there a rule of thumb for how many instances I will need? And if the instances do not follow the same distribution (e.g. different responses for the same set of feature values), can I use clustering to filter or analyze this?
