Problems running a DCC GARCH model in R - time-series

I am writing my research paper on a DCC GARCH model for an index and the real effective exchange rates (REER) of the G3 currencies. I am investigating the spillover effect of the index on the REER of the USD, JPY, and EUR. I have encountered some problems:
I took first differences of the index and log returns of the REER series. Before fitting the DCC model I ran an ARCH test and found no ARCH effect in the USD REER and EUR REER series. Do the series need to exhibit an ARCH effect before fitting any multivariate GARCH model?
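For reference, a minimal R sketch of the kind of ARCH-effect check described above (the series name is a placeholder):
# Ljung-Box test on the squared returns as a quick check for ARCH effects
# ('reer_usd_ret' is a placeholder for one of the log-return series).
Box.test(reer_usd_ret^2, lag = 12, type = "Ljung-Box")
# Alternatively, the ARCH LM test from the FinTS package:
# FinTS::ArchTest(reer_usd_ret, lags = 12)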
I ignored the ARCH-effect issue and fitted a DCC model separately for each country's REER. The fits ran normally for (index, USD) and (index, JPY), but for (index, EUR) I got the following error:
Error in solve.default(A): system is computationally singular: reciprocal condition number = 4.2122e-17
Is there any way to fix this? Will I run into the same problem if I fit all three currencies together (index, USD, JPY, EUR) instead of fitting them separately?
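For reference, a minimal rugarch/rmgarch sketch of the kind of bivariate DCC fit described above (the return series names are placeholders):
# DCC(1,1) with GARCH(1,1) margins; 'index_ret' and 'reer_eur_ret' are placeholder series.
library(rugarch)
library(rmgarch)
uspec <- ugarchspec(variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
                    mean.model = list(armaOrder = c(0, 0)))
spec <- dccspec(uspec = multispec(replicate(2, uspec)),
                dccOrder = c(1, 1), distribution = "mvnorm")
dat <- na.omit(cbind(index_ret, reer_eur_ret))
fit <- dccfit(spec, data = dat)
rcor(fit)   # fitted dynamic conditional correlations
Singular-system errors of this kind are sometimes a scaling issue; rescaling the returns (e.g. multiplying by 100) is a common workaround, though not a guaranteed fix.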
Thank you!

Related

Ran a MANOVA where Pillai's/Wilks isn't significant, but one of the DVs is very significant in my output table of between-subjects effects

I'm a stats newb and was told by my professor to run a MANOVA for something I was checking out. Basically, I wanted to see if there was an interaction between ethnicity and a certain quadrant grouping for a set of outcome variables that are subscales of an overall measure (ders_tot).
An ANCOVA (one DV) already found an interaction between ethnicity and the quadrant grouping for ders_tot.
My MANOVA output shows that Pillai's trace and Wilks' lambda are not significant (p = .098 for both), but SPSS also automatically generates a table of between-subjects effects, and it indicates a strongly significant interaction for one particular outcome variable (p = .003). The other DVs are far from significant (some as high as p = .27 or p = .66).
Is my MANOVA significance (or lack thereof) being seriously skewed by the highly nonsignificant variables? Am I still "allowed" to run an analysis on the one variable in the MANOVA that shows strong significance? I also have data viz/chart output that makes a strong case for analyzing that particular variable.
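For what it's worth, a minimal R sketch of this kind of model (the subscale and factor names are placeholders), showing where the omnibus multivariate tests and the univariate follow-ups come from:
# Hypothetical two-factor MANOVA on DERS subscales; variable names are placeholders.
fit <- manova(cbind(ders_sub1, ders_sub2, ders_sub3) ~ ethnicity * quadrant, data = dat)
summary(fit, test = "Pillai")   # omnibus multivariate test (Pillai's trace)
summary(fit, test = "Wilks")    # Wilks' lambda
summary.aov(fit)                # per-DV ANOVAs, analogous to SPSS's between-subjects effects table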
(EDIT: BELOW PROBLEM HAS BEEN FIXED)
[Also, I've noticed that one of my covariates is always being run in SPSS with 1 df when it should be 2. I've triple checked the variable type and added labels and all that, and can't get it to run appropriately. When I run the same analysis in R, df = 2. This isn't affecting my sig. findings by much, but it's driving me crazy!]

LDA: Coherence Values using u_mass vs c_v

I am currently attempting to record and graph coherence scores for various numbers of topics in order to determine how many topics would be best for my corpus. After several trials using u_mass, the results proved inconclusive, since the scores don't plateau around a specific topic number. I'm aware that the coherence value ranges from -14 to 14 when using u_mass; however, my values range from -2 to -1, so selecting an accurate topic number is not possible. Because of these issues, I attempted to use c_v instead of u_mass, but I receive the following error:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
This is my code for computing the coherence value:
from gensim.models import CoherenceModel

cm = CoherenceModel(model=ldamodel, texts=texts, dictionary=dictionary, coherence='c_v')
print("THIS IS THE COHERENCE VALUE")
coherence = cm.get_coherence()
print(coherence)
If anyone could provide assistance in resolving my issues for either c_v or u_mass, it would be greatly appreciated! Thank you!

Time series: not periodic, despite having included frequency

This is actually part of my thesis research, where I have to run a time series analysis on pollution and economic growth of a single country.
I have data spanning 144 years for the two variables, with each value representing a single year. I imported the data, set the values as numeric, attached the dataset through the console, and ran:
ts_gdp <- ts(data = `GDP per capita`, start = 1871, end = 2014, frequency = 1, names = "gdp")
I can see all the values for the first variable, and then I follow up with stl(), but I get the error below. Any clues why this shows up, even though I have set frequency = 1, which is the number of observations per unit of time, in this case a year? Thank you in advance!
Error in stl(GDP, s.window = "periodic") :
series is not periodic or has less than two periods
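For context, a reproducible R sketch of the setup described above (the GDP values are simulated placeholders). stl() extracts a seasonal component, so it needs a series with frequency of at least 2 and at least two complete periods, which an annual series with frequency = 1 cannot provide:
# Simulated stand-in for the 144 annual observations (1871-2014).
gdp <- cumsum(rnorm(144, mean = 1))
ts_gdp <- ts(gdp, start = 1871, frequency = 1)
frequency(ts_gdp)                     # 1: one observation per year, no seasonal cycle
# stl(ts_gdp, s.window = "periodic")  # reproduces "series is not periodic or has less than two periods"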

General Linear Model - Repeated Measures with Covariates, Estimated Marginal Means are not adjusting? Bug?

I am running a Repeated Measures two-way ANCOVA. The model produces an Estimated Marginal Means table, but the values are exactly the same (to the hundredths decimal place) as the Means in the descriptive statistics, despite there being a note at the bottom of the EMM table indicating that "the covariates appearing in the model are evaluated at the following values:..."
Is this a bug, or could I be doing something wrong?
Update:
Responding to the question below: I used the drop-down menus to run the analysis; however, this is the syntax that is generated when I 'paste' the code.
DATASET ACTIVATE DataSet1.
GLM FT10 FT11 FT12 FT13 FT14 FT15 FT16 FT17 FT18 FT19 FT110 FT111 FT20 FT21 FT22 FT23 FT24 FT25 FT26 FT27 FT28 FT29 FT210 FT211 WITH SpatialScore FPSRTScore LDMean VGTotal
/WSFACTOR=Matching 2 Polynomial Trial 12 Polynomial
/METHOD=SSTYPE(3)
/PLOT=PROFILE(Trial*Matching) TYPE=LINE ERRORBAR=CI MEANREFERENCE=NO AXIS=AUTO
/EMMEANS=TABLES(OVERALL) WITH(SpatialScore=MEAN FPSRTScore=MEAN LDMean=MEAN VGTotal=MEAN)
/EMMEANS=TABLES(Matching) WITH(SpatialScore=MEAN FPSRTScore=MEAN LDMean=MEAN VGTotal=MEAN) COMPARE ADJ(BONFERRONI)
/EMMEANS=TABLES(Trial) WITH(SpatialScore=MEAN FPSRTScore=MEAN LDMean=MEAN VGTotal=MEAN) COMPARE ADJ(BONFERRONI)
/EMMEANS=TABLES(Matching*Trial) WITH(SpatialScore=MEAN FPSRTScore=MEAN LDMean=MEAN VGTotal=MEAN)
/PRINT=DESCRIPTIVE ETASQ
/CRITERIA=ALPHA(.05)
/WSDESIGN=Matching Trial Matching*Trial
/DESIGN=SpatialScore FPSRTScore LDMean VGTotal.
This is expected behavior. The reason that the EMMEANS don't differ from the observed means is that the covariate adjustment is done at the cell level in terms of between-subjects effects, and you have only one cell because you don't have any between-subjects factors.
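A toy R illustration of that point (simulated data): with only one cell, evaluating the covariate at its sample mean simply reproduces the observed mean of the outcome.
# With no between-subjects factors there is a single cell, so the covariate-adjusted
# mean evaluated at the covariate's mean equals the raw mean of the outcome.
set.seed(1)
x <- rnorm(30)                # covariate
y <- 2 + 0.5 * x + rnorm(30)  # outcome
fit <- lm(y ~ x)
predict(fit, newdata = data.frame(x = mean(x)))  # adjusted mean at x = mean(x)
mean(y)                                          # identical to the observed mean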

Is there a cleverer Ruby algorithm than brute-force for finding correlation in multidimensional data?

My platform here is Ruby - a webapp using Rails 3.2 in particular.
I'm trying to match objects (people) based on their ratings for certain items. People may rate all, some, or none of the same items as other people. Ratings are integers between 0 and 5. The number of items available to rate, and the number of users, can both be considered to be non-trivial.
The brute-force approach is to iterate through all people, calculating differences for each item. In Ruby-flavoured pseudo-code -
# Assuming ActiveRecord-style accessors: person.ratings, rating.item,
# rating.value and user.rating_for(item) (nil if the user hasn't rated it).
matches = Hash.new(0)
(people - [user]).each do |person|
  person.ratings.each do |rating|
    user_rating = user.rating_for(rating.item)
    matches[person.id] += (rating.value - user_rating.value).abs if user_rating
  end
end
# lowest values in matches are the best matches for user
The problem is that as the number of items, ratings, and people increases, this code will take a very long time to run, and (ignoring caching for now) it has to run a lot, since this matching is the primary function of my app.
I'm open to cleverer algorithms and cleverer databases to achieve this, but a purely algorithmic solution that lets me keep everything in MySQL or PostgreSQL would make my life a lot easier. The only firm requirement is that the data needs to persist.
If any more detail would help, please feel free to ask. Any assistance greatly appreciated!
Check out the KD-Tree. It's specifically designed to speed up neighbour-finding in N-Dimensional spaces, like your rating system (Person 1 is 3 units along the X axis, 4 units along the Y axis, and so on).
You'll likely have to do this in an actual programming language. There are spatial indexes for some DBs, but they're usually designed for geographic work, like PostGIS (which uses GiST indexing), and only support two or three dimensions.
That said, I did find this tantalizing blog post on PostGIS. I was then unable to find any other references to this, but maybe your luck will be better than mine...
Hope that helps!
Technically your task is matching long strings made out of characters of a 5 letter alphabet. This kind of stuff is researched extensively in the area of computational biology. (Typically with 4 letter alphabets). If you do not know the book http://www.amazon.com/Algorithms-Strings-Trees-Sequences-Computational/dp/0521585198 then you might want to get hold of a copy. IMHO this is THE standard book on fuzzy matching / scoring of sequences.
Is your data sparse? With ratings, most of the time not every user rates every object.
Naively comparing each object to every other one is O(n*n*d), where d is the number of dimensions (rated items). However, a key trick of all the Hadoop solutions is to transpose the matrix and work only on the non-zero values in the columns. Assuming that your sparsity is s = 0.01, this reduces the runtime to O(d*n*s*n*s), i.e. by a factor of s*s. So if your sparsity is 1 in 100, your computation will theoretically be 10000 times faster.
Note that the resulting data will still be an O(n*n) distance matrix, so strictly speaking the problem is still quadratic.
The way to beat the quadratic factor is to use index structures. The k-d-tree has already been mentioned, but I'm not aware of a version for categorical / discrete data and missing values. Indexing such data is not very well researched AFAICT.
