I am currently running a factor analysis in SPSS using the maximum likelihood extraction method. I need to extract exactly two factors, and suppose also that we don't need any rotation.
However, the output gives me the following message: "In iteration 25, no local minimum was found and extraction was terminated." So I reduced the number of iterations from 25 to 7, and the extraction completes, but I get the warning "Attempted to extract 2 factors. More than 7 iterations required. Extraction was terminated." and I do not receive any fit test results.
What should I do to fix this problem and get the fit test?
We are amateurs and not very familiar with the SPSS environment; we are using it for the first time. Any help would be appreciated.
I'm not sure there's a really good solution to the problem. The FACTOR procedure will only provide the test if it deems the solution to have converged. You can paste your FACTOR command into a syntax window and edit it to add the ECONVERGE keyword to the CRITERIA subcommand, with a value larger than the default of .001, in hopes that a looser convergence criterion lets the extraction converge. That's the only possibility I can think of for getting the procedure to produce the test in these circumstances. It does carry the danger that the model hasn't actually converged reasonably before stopping, though. If you can get convergence with the criterion set to something not too far from the .001 default, it's probably okay; but if you have to increase the criterion dramatically, it's pretty risky.
If you want to try this, set up the factor analysis in the dialog boxes, then click Paste instead of OK. The pasted FACTOR command should look something like:
FACTOR
/VARIABLES varlist
/MISSING LISTWISE
/ANALYSIS varlist
/PRINT INITIAL EXTRACTION
/CRITERIA FACTORS(2) ITERATE(25)
/EXTRACTION ML
/ROTATION NOROTATE.
Add the ECONVERGE keyword and your chosen value to the CRITERIA subcommand:
FACTOR
/VARIABLES col1 col2 col3 col4 col5 col6 col7 col8
/MISSING LISTWISE
/ANALYSIS col1 col2 col3 col4 col5 col6 col7 col8
/PRINT INITIAL EXTRACTION
/CRITERIA FACTORS(2) ITERATE(25) ECONVERGE(.002)
/EXTRACTION ML
/ROTATION NOROTATE.
Then click Run > All to run the command. I've used .002 here, which is just larger than the default .001; you'll probably have to use something larger to get this to work, but see my comments above.
[The question's dataset was given as an image: a table of the number of carbon atoms per molecule against the energy released on combustion, in kJ/mol.]
Many substances that can burn (such as gasoline and alcohol) have a chemical structure based on carbon atoms; for this reason they are called hydrocarbons. A chemist wants to understand how the number of carbon atoms in a molecule affects how much energy is released when that molecule combusts (meaning that it is burned). The chemist obtains the dataset below; in the column on the right, kJ/mol is the unit measuring the amount of energy released.
You would like to use linear regression, h_a(x) = a0 + a1*x, to estimate the amount of energy released (y) as a function of the number of carbon atoms (x). Which of the following do you think will be the values you obtain for a0 and a1? You should be able to select the right answer without actually implementing linear regression.
A) a0 = −1780.0, a1 = −530.9
B) a0 = −569.6, a1 = −530.9
C) a0 = −1780.0, a1 = 530.9
D) a0 = −569.6, a1 = 530.9
Since all the a0 options are negative but two of the a1 options are positive, let's figure out a1 first.
As you can see, increasing the number of carbon atoms makes the energy more and more negative, so the relationship cannot have a positive slope, which rules out options C and D.
Then, for the intercept, the value that produces the least error is the correct one. At x = 1 and x = 10 (the easiest to calculate), option A predicts about −2300 and −7100, while option B predicts about −1100 and −5900; comparing these against the table, B fits better than A.
PS: You might think the exact values of a0 and a1 should be obvious from the data; they are not. The intention of the question is to give you a general understanding of what a best fit means. Also, this way of solving it, by comparing prediction errors, is itself a bit of machine learning.
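To make the "least error" comparison concrete, here is a minimal Python sketch. The dataset itself was only given as an image, so you would plug in the x and y columns from the question's table yourself; everything else (names, structure) is illustrative:

import numpy as np

# The four candidate (a0, a1) pairs from the question.
candidates = {
    "A": (-1780.0, -530.9),
    "B": (-569.6, -530.9),
    "C": (-1780.0, 530.9),
    "D": (-569.6, 530.9),
}

def sse(a0, a1, x, y):
    """Sum of squared errors of the line y_hat = a0 + a1*x against the data."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sum((y - (a0 + a1 * x)) ** 2))

def best_option(x, y):
    """The option whose line fits the data with the least squared error."""
    return min(candidates, key=lambda k: sse(*candidates[k], x, y))

# Usage: best_option(carbon_atoms, energy_released) with the table's two columns.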
I have a very large sample of 11236 cases for each of my two variables (ms and gar). I now want to calculate Spearman's rho correlation with bootstrapping in SPSS.
I figured out the standard syntax for bootstrapping in SPSS with bias corrected and accelerated confidence intervals:
DATASET ACTIVATE DataSet1.
BOOTSTRAP
/SAMPLING METHOD=SIMPLE
/VARIABLES INPUT=ms gar
/CRITERIA CILEVEL=95 CITYPE=BCA NSAMPLES=10000
/MISSING USERMISSING=EXCLUDE.
NONPAR CORR
/VARIABLES=ms gar
/PRINT=SPEARMAN TWOTAIL NOSIG
/MISSING=PAIRWISE.
But this syntax resamples all 11236 cases 10000 times.
How can I instead take a random sample of 106 cases (√11236 = 106), calculate Spearman's rho, and repeat 10000 times, drawing a new random sample of 106 cases at each bootstrap step?
Use the sample selection procedures (Data > Select Cases). You can specify an approximate or exact random sample, or select specific cases. Then run the BOOTSTRAP and NONPAR CORR commands. If you need a fresh random subsample on every replication rather than a single selection, see the sketch below.
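For what the question literally asks (a new random sample of 106 cases at each of the 10,000 replications, i.e. subsampling rather than the built-in bootstrap), one option is to script the loop outside the BOOTSTRAP procedure. A minimal Python sketch, assuming the two columns have been exported to arrays named ms and gar (names taken from the question; the seed is arbitrary):

import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(2024)  # arbitrary seed, for reproducibility

def subsampled_spearman(ms, gar, m=106, n_reps=10_000):
    """Draw m cases without replacement from the paired data, compute
    Spearman's rho, and repeat n_reps times. Use replace=True instead
    for an m-out-of-n bootstrap (sampling with replacement)."""
    ms, gar = np.asarray(ms), np.asarray(gar)
    rhos = np.empty(n_reps)
    for b in range(n_reps):
        idx = rng.choice(ms.size, size=m, replace=False)
        rhos[b], _ = spearmanr(ms[idx], gar[idx])
    return rhos

# Usage: rhos = subsampled_spearman(ms, gar)
#        print(np.percentile(rhos, [2.5, 97.5]))  # simple percentile interval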
I am trying to implement a subroutine in ABAQUS.
It is a very simple nonlinear elastic model in which the Young's modulus depends on the mean pressure: E = 3*(1 − 2*poisson)*p/kap, where poisson = 0.3 is the Poisson's ratio and kap = 0.005 is the swelling index. The initial stress is 1e5 Pa for sigma11, sigma22 and sigma33.
When I run the subroutine, it gives linear behavior with E = 3*(1 − 2*0.3)*(3*1e5/3)/0.005, i.e. the Young's modulus calculated from the initial stress. If the initial stress is 0 for all components, every result is 0, because E = 3*(1 − 2*0.3)*(3*0/3)/0.005 = 0.
I would like to ask if you could help me solve this problem (that is, have each increment compute E from the previous increment's stresses as the current state, rather than from the initial values).
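The arithmetic of the model can be sketched outside the subroutine to show what needs to happen at each increment: E must be recomputed from the stress carried over from the previous increment, and a zero initial stress pins E (and hence everything after it) at zero. A minimal one-dimensional Python sketch; all names are illustrative, and this is not the actual ABAQUS UMAT interface:

POISSON = 0.3   # Poisson's ratio, from the question
KAP = 0.005     # swelling index, from the question

def youngs_modulus(sigma11, sigma22, sigma33):
    """E = 3*(1 - 2*nu)*p/kap, with p the mean pressure of the CURRENT
    stress state. With all stresses zero, p = 0 and therefore E = 0."""
    p = (sigma11 + sigma22 + sigma33) / 3.0
    return 3.0 * (1.0 - 2.0 * POISSON) * p / KAP

# One-dimensional illustration: recompute E at every increment from the
# stress left by the previous increment, instead of freezing it at the
# value computed from the initial stress.
sigma = 1.0e5                       # initial hydrostatic stress (Pa)
for step in range(5):
    E = youngs_modulus(sigma, sigma, sigma)
    sigma += E * 1.0e-4             # illustrative small strain increment
    print(step, E, sigma)           # E grows with sigma: nonlinear response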
There are two widely used formulas (found in most information retrieval lecture slides on the internet, e.g. at Stanford) for computing the Discounted Cumulative Gain. One of them, for the DCG at rank p, is:
$$\mathrm{DCG}_p = rel_1 + \sum_{i=2}^{p} \frac{rel_i}{\log_2 i}$$
This is, in fact:
$$\mathrm{DCG}_p = rel_1 + rel_2 + \sum_{i=3}^{p} \frac{rel_i}{\log_2 i}$$
because log_2(2) = 1. This means that the so-called "discounted" CG is actually not discounted before the third rank!
The following rankings are therefore not distinguishable by the DCG using this formula: (10,5,1,2,...) and (5,10,1,2,...).
I am guessing that the formula is incorrect and should be:
$$\mathrm{DCG}_p = \sum_{i=1}^{p} \frac{rel_i}{\log_2(i+1)}$$
Note, by the way, that the other very common formula (see Wikipedia),
$$\mathrm{DCG}_p = \sum_{i=1}^{p} \frac{2^{rel_i}-1}{\log_2(i+1)},$$
has this denominator.
I would not be asking if I hadn't seen this formula in practically all the lectures I found on the internet, and even in my own lectures at UCL. Is it not wrong? It would be incredible for an error to have propagated from Wikipedia and not been picked up by the professors... Am I wrong, then?
I found this paper from Microsoft (see equation 6) which backs up my claim that starting to discount only at rank 3 is basically a typo. When you think about it, it makes no sense not to discount rank 2! The metric would be unable to distinguish the rankings (10, 5, 2) and (5, 10, 2), even though the first ranking is better. Note that all the other DCG formulas do discount rank 2 and thus would pick up the difference.
So a "+1" is indeed missing in the log, and it is a typo that has crept into a lot of papers and lectures...
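To see the difference numerically, here is a quick Python sketch contrasting the two denominators on the example rankings above:

import math

def dcg_no_rank2_discount(rels):
    """DCG as given in the slides: rel_1 + sum_{i>=2} rel_i / log2(i).
    Note rel_2 / log2(2) = rel_2, so rank 2 is not discounted."""
    return rels[0] + sum(r / math.log2(i) for i, r in enumerate(rels[1:], start=2))

def dcg_corrected(rels):
    """Corrected form: sum_{i>=1} rel_i / log2(i + 1)."""
    return sum(r / math.log2(i + 1) for i, r in enumerate(rels, start=1))

a, b = [10, 5, 2], [5, 10, 2]
print(dcg_no_rank2_discount(a), dcg_no_rank2_discount(b))  # both ~16.26: indistinguishable
print(dcg_corrected(a), dcg_corrected(b))                  # ~14.15 vs ~12.31: a ranks higher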
I was given a data set with 1,000 variables and have been asked to run Pearson's correlations between each explanatory variable and a binary dependent variable. I generate the correlations using the following code:
correlations /variables = x1 to x500 with y.
correlations /variables = x501 to x1000 with y.
The resulting output is a table which appears to be un-sortable in SPSS or other software (e.g. Excel):
x1 Pearson Correlation
p-value
N
-----------------------
x2 Pearson Correlation
p-value
N
-----------------------
.
.
.
-----------------------
xi Pearson Correlation
p-value
N
-----------------------
I want to be able to rank the variables according to Pearson's correlation and then by p-value. Does SPSS have the capability to save the variable name, Pearson correlation value, and p-value as a table, and then rank them?
I am too used to Stata and R and could not find anything in the manual. Would a workaround be to run a univariate regression with a single explanatory variable 1,000 times and try saving those coefficients?
Thanks!
You can easily pivot the statistics into the columns of the output table, which gives a sortable arrangement. Try it with a few variables first to see how this works: double-click the table to activate it, then use Pivot > Pivoting Trays to open the controls for pivoting.
To do this for your real data, you will want to capture the table using OMS, creating a new dataset which you can then sort or manipulate further. When you create your OMS command, tell it to pivot the table so that the dataset arrangement is convenient.
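Alternatively, since you mention being used to Stata and R: if exporting the data is an option, the same ranking can be reproduced outside SPSS. A minimal sketch with pandas/SciPy, assuming the dataset has been saved to a CSV with columns x1...x1000 and y (the file name is illustrative):

import pandas as pd
from scipy.stats import pearsonr

# "data.csv" is an illustrative name for the exported dataset.
df = pd.read_csv("data.csv").dropna()        # listwise deletion, for simplicity
xcols = [c for c in df.columns if c != "y"]

rows = []
for c in xcols:
    r, p = pearsonr(df[c], df["y"])
    rows.append({"variable": c, "r": r, "p": p})

table = pd.DataFrame(rows)
table["abs_r"] = table["r"].abs()
# Rank by correlation strength, breaking ties by p-value.
table = table.sort_values(["abs_r", "p"], ascending=[False, True])
print(table.head(20))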
Bear in mind that fishing for the highest correlations is likely to give you an overly optimistic view of the predictive power of the top variables.
The NAIVEBAYES procedure (Statistics Server) might be another approach to consider. Check the Command Syntax Reference for details.