How to set an unstructured covariance matrix for random effects in linear mixed models in SPSS

I would like to ask how to set an unstructured covariance matrix for the random effects in linear mixed models in SPSS. When I do so in SPSS (version 26), the output (in the Model Dimension table) shows that the covariance structure for the random effects is "Identity", and the results are worse than those in Andy Field's book, where he uses this method. It may be that the SPSS versions differ, as SPSS tells me: "As of version 11.5, the syntax rules for the RANDOM subcommand have changed. Your command syntax may yield results that differ from those produced by prior versions. If you are using version 11 syntax, please consult the current syntax reference guide for more information." So I am looking for a way to set an unstructured covariance matrix for the random effects in linear mixed models so that I really get the "Unstructured" covariance structure rather than "Identity", and thus the expected output.
Thank you very much for any help.

If you're using syntax, the COVTYPE specification on the RANDOM subcommand is where you specify this, using the keyword UN. In the dialog boxes, the Random dialog lets you specify it with a drop-down menu.
If you're specifying unstructured and the program is changing it to identity, that indicates the random effect of interest has only one covariance parameter to fit (for example, a single random intercept), so there is no covariance structure to estimate.
The message about versions 11 and 11.5 is not relevant here.
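For example, a minimal MIXED command along the following lines requests an unstructured covariance matrix for a random intercept and slope (outcome, time and subject_id are placeholder names, so adapt them to your data):
* Hypothetical example; COVTYPE(UN) is the part that requests the unstructured matrix.
MIXED outcome WITH time
  /FIXED=time
  /RANDOM=INTERCEPT time | SUBJECT(subject_id) COVTYPE(UN)
  /METHOD=REML
  /PRINT=SOLUTION TESTCOV.
With both the intercept and the slope random, UN estimates two variances and their covariance; with only a single random effect there is just one parameter, which is why the output can fall back to a one-parameter structure such as Identity.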

Related

How to deal with negative numbers when creating log features?

I have some questions about a data set for ML:
There are only two features, input and output, and the data has 1700 examples. The input and output have a non-linear relationship and neither is normally distributed; a scatter plot of the data is shown below. What can we say about the relationship between them? How should I approach a problem like this, and how should I create features? Features like the log and sqrt of the input gave me a good correlation with the output, but these functions don't apply to negative numbers, so what should I do for the negative values?
I have tried the cube root (x^(1/3)), but it is not as well correlated with the output.
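A minimal sketch of the transforms mentioned above, plus a signed-log variant that stays defined for negative values (the array here is just hypothetical input data):
import numpy as np

x = np.array([-12.0, -0.5, 0.0, 3.0, 250.0])    # hypothetical input values

sqrt_feat = np.sqrt(np.clip(x, 0, None))        # sqrt is only defined for x >= 0; negatives clipped here to avoid NaN
cbrt_feat = np.cbrt(x)                          # cube root is defined for all real numbers
signed_log = np.sign(x) * np.log1p(np.abs(x))   # signed log1p: defined for zero and negative values

print(signed_log)
The signed log1p keeps the monotone, range-compressing behaviour of a log transform while remaining defined for zero and negative inputs.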

How to search for bigram similarity in a gensim word2vec model

Here I have a word2vec model; suppose I use the google-news-300 model:
import gensim.downloader as api
word2vec_model300 = api.load('word2vec-google-news-300')
I want to find similar words for "AI" or "artificial intelligence", so I write
word2vec_model300.most_similar("artificial intelligence")
and I get the error
KeyError: "word 'artificial intelligence' not in vocabulary"
So what is the right way to extract similar words for bigram words?
Thanks in advance!
At one level, when a word-token isn't in a fixed set of word-vectors, it means the creators of that set chose not to train/model that word. So anything you do will only be a crude workaround for its absence.
Note, though, that when Google prepared those vectors – based on a dataset of news articles from before 2012 – they also ran some statistical multigram detection on it, creating multigram tokens joined by _ characters. So first check whether a vector for 'artificial_intelligence' might be present.
If it isn't, you could try other rough workarounds like averaging together the vectors for 'artificial' and 'intelligence' – though of course that won't really be what people mean by the distinct combination of those words, just meanings suggested by the independent words.
Gensim's .most_similar() method can take either a raw vector you've created by operations such as averaging, or a list of multiple words that it will average for you, via its explicit positive keyword parameter. For example:
word2vec_model300.most_similar(positive=[average_vector])
...or...
word2vec_model300.most_similar(positive=['artificial', 'intelligence'])
Finally, though Google's vectors are handy, they're a bit old now, and from a particular domain (popular news articles) where senses may not match those used in other domains (or more recently). So you may want to seek alternative vectors, or train your own if you have sufficient data from your area of interest, to get appropriate meanings – including vectors for any particular multigrams you choose to tokenize in your data.
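A minimal sketch of both workarounds, assuming gensim 4.x (older versions expose .vocab rather than .key_to_index):
import gensim.downloader as api

word2vec_model300 = api.load('word2vec-google-news-300')  # as in the question

# Check whether Google's phrase detection produced a single token for the bigram.
if 'artificial_intelligence' in word2vec_model300.key_to_index:
    print(word2vec_model300.most_similar('artificial_intelligence', topn=5))
else:
    # Fall back to averaging the two individual word-vectors.
    average_vector = (word2vec_model300['artificial'] + word2vec_model300['intelligence']) / 2
    print(word2vec_model300.most_similar(positive=[average_vector], topn=5))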

What happens to the positional encodings in the output of the Transformer model?

I've been learning about the popular Transformer model, which can be used for sequence-to-sequence language applications. I am considering an application to time-series modeling, which is not necessarily language modeling, so the output layer may not be a probability but could instead be a prediction of the next value of the time series.
If I consider the original language model presented in the paper (see Figure 1), positional encodings are applied to the embedded input data, but there is no indication of a position in the output. The output simply gives probabilities for the value at the "next" time step. To me it seems like something is being lost here. The output assumes an iterative process, where the "next" output is next simply because it follows the previous one, yet in the input we feel the need to insert positional information through the positional encodings. I would think we should also be interested in positional encodings for the output. Is there a way to recover them?
This problem becomes more pronounced if we consider non-uniformly sampled time-series data, which is really what I am interested in. It would be interesting to use a non-uniformly sampled time series as input and predict the "next" value of the series, where we also get the time position of that prediction. This comes down to somehow recovering the positional information from the output value. Since the positional encoding is added to the input, it is not trivial to extract this positional information from the output; perhaps it should be called "positional decoding".
To summarize, my question is: what happens to the positional information in the output? Is it still there and I am just missing it? Also, does anyone see a straightforward way of recovering this data if it is not immediately available from the model?
Thanks
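For reference, a minimal NumPy sketch of the sinusoidal positional encoding described in the paper (assuming an even model dimension); it only shows what gets added to the input embeddings, with nothing analogous applied at the output:
import numpy as np

def sinusoidal_positional_encoding(num_positions, d_model):
    # PE(pos, 2i) = sin(pos / 10000^(2i/d_model)), PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
    positions = np.arange(num_positions)[:, None]   # shape (num_positions, 1)
    dims = np.arange(0, d_model, 2)[None, :]        # shape (1, d_model/2)
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((num_positions, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = sinusoidal_positional_encoding(num_positions=50, d_model=16)
print(pe.shape)  # (50, 16); this matrix is simply added to the input embeddings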

Natural Language Processing techniques for understanding contextual words

Take the following sentence:
I'm going to change the light bulb
Here change means replace, as in someone is going to replace the light bulb. This could easily be solved with a dictionary API or something similar. However, consider the following sentences:
I need to go the bank to change some currency
You need to change your screen brightness
In the first sentence change no longer means replace; it means exchange, and in the second sentence change means adjust.
If you were trying to understand the meaning of change in this situation, what techniques would you use to extract the correct definition based on the context of the sentence? What is what I'm trying to do called?
Keep in mind, the input would only be one sentence. So something like:
Screen brightness is typically too bright on most people's computers.
People need to change the brightness to have healthier eyes.
is not what I'm trying to solve, because you can use the previous sentence to set the context. Also, this would be for lots of different words, not just the word change.
Appreciate the suggestions.
Edit: I'm aware that various embedding models can give insight into this problem. If that is your answer, how do you interpret the word embedding that is returned? These arrays can be 500+ elements long, which isn't practical to interpret by hand.
What you're trying to do is called Word Sense Disambiguation. It's been a subject of research for many years, and while probably not the most popular problem it remains a topic of active research. Even now, just picking the most common sense of a word is a strong baseline.
Word embeddings may be useful but their use is orthogonal to what you're trying to do here.
Here's a bit of example code from pywsd, a Python library with implementations of some classical techniques:
>>> from pywsd.lesk import simple_lesk
>>> sent = 'I went to the bank to deposit my money'
>>> ambiguous = 'bank'
>>> answer = simple_lesk(sent, ambiguous, pos='n')
>>> print(answer)
Synset('depository_financial_institution.n.01')
>>> print(answer.definition())
a financial institution that accepts deposits and channels the money into lending activities
The methods are mostly fairly old and I can't vouch for their quality, but it's a good starting point at least.
Word senses are usually going to come from WordNet.
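For example, WordNet's inventory of senses for change can be listed with NLTK (this assumes the WordNet corpus has already been fetched via nltk.download('wordnet')):
from nltk.corpus import wordnet as wn

# List a few WordNet senses (synsets) of the verb "change".
for synset in wn.synsets('change', pos=wn.VERB)[:5]:
    print(synset.name(), '-', synset.definition())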
I don't know how useful this is, but from my point of view, word-vector embeddings are naturally separated, and a word's position in the sample space is closely related to its different uses. However, as you said, a word is often used in several contexts.
To address this, encoding techniques that use the surrounding context, such as continuous bag-of-words (CBOW) or continuous skip-gram models, are generally used to classify the usage of a word in a particular context, for example change as either exchange or adjust. The same idea is applied in LSTM-based architectures and RNNs, where the context is preserved over the input sequence.
Interpreting word-vectors directly isn't practical from a visualisation point of view; they are mainly meaningful in terms of their relative distance to other words in the sample space. Another approach is to maintain a matrix of the corpus in which the contextual uses of each word are represented.
In fact, there is a neural network that uses a bidirectional language model to first predict the upcoming word and then, at the end of the sentence, go back and try to predict the previous word. It's called ELMo. You should go through the ELMo paper and this blog.
Naturally, the model learns from representative examples. So the better the training set you give it, with diverse uses of the same word, the better the model can learn to use context to attach meaning to the word. This is often how people solve their specific cases: by using domain-centric training data.
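A minimal gensim sketch of the two encoding schemes mentioned above, assuming gensim 4.x and a small hypothetical tokenized corpus (with so little data the vectors are only illustrative):
from gensim.models import Word2Vec

# Hypothetical tokenized corpus; in practice use domain-centric training data.
sentences = [
    ['i', 'need', 'to', 'change', 'some', 'currency', 'at', 'the', 'bank'],
    ['you', 'need', 'to', 'change', 'your', 'screen', 'brightness'],
    ['i', 'am', 'going', 'to', 'change', 'the', 'light', 'bulb'],
]

# sg=0 trains a continuous bag-of-words model, sg=1 a continuous skip-gram model.
cbow_model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=0)
skipgram_model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=1)

print(skipgram_model.wv.most_similar('change', topn=3))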
I think this paper could be helpful: Efficient Estimation of Word Representations in Vector Space.
Pretrained language models like BERT could be useful for this, as mentioned in another answer. Those models generate a representation based on the context.
Recent pretrained language models use wordpieces, but spaCy has an implementation that aligns them to natural-language tokens. This makes it possible, for example, to check the similarity of different tokens based on their context. An example from https://explosion.ai/blog/spacy-transformers:
import spacy
import torch
import numpy
nlp = spacy.load("en_trf_bertbaseuncased_lg")
apple1 = nlp("Apple shares rose on the news.")
apple2 = nlp("Apple sold fewer iPhones this quarter.")
apple3 = nlp("Apple pie is delicious.")
print(apple1[0].similarity(apple2[0])) # 0.73428553
print(apple1[0].similarity(apple3[0])) # 0.43365782

How does IBM SPSS calculate the exact p value for the Pearson chi-square statistic?

Usually, I use the Fisher's exact test p value when the sample size is too small for the Pearson chi-square test. Then I realized that IBM SPSS reports exact p values for all the tests (i.e. Pearson chi-square, likelihood ratio, Fisher's exact test, linear-by-linear association).
I especially wonder if anyone knows how IBM SPSS calculates the exact p value for the Pearson chi-square statistic (here p = 0.042). I also noticed that Fisher's exact test is given a test statistic value (here 6.143), which surprises me, since (as far as I know) this test calculates the p value directly from the hypergeometric distribution.
I also read the IBM SPSS Exact Tests manual, but I'm still without a clue (http://www.sussex.ac.uk/its/pdfs/SPSS_Exact_Tests_21.pdf).
Any help would be appreciated.
If you request chi-square statistics in CROSSTABS for a 2x2 table, you also get the Fisher exact test, whether or not you have the Exact Tests module and request exact results in that dialog box (or the subcommand when using command syntax). That version provides only the exact one- and two-sided hypergeometric probabilities, computed by native SPSS Statistics code.
If you have the Exact Tests module, you have access to a much broader set of exact tests provided through code from Cytel, the makers of StatXact.
The value listed for the Fisher Exact Test (which is the Freeman-Halton extension of Fisher's test, sometimes referred to as the Fisher-Freeman-Halton test) is a transformation of the hypergeometric probability that is scaled to have an asymptotic chi-square distribution under the null hypothesis with (R-1)(C-1) degrees of freedom. The full formula can be found on page 139 of the following paper:
Mehta, C. R. (1994). The exact analysis of contingency tables in medical research. Statistical Methods in Medical Research, 3, 135-156.
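As an illustration of the exact hypergeometric computation for a 2x2 table (not SPSS's own Cytel code), here is a minimal SciPy sketch with made-up counts:
import numpy as np
from scipy.stats import fisher_exact, hypergeom

# Hypothetical 2x2 table of counts.
table = np.array([[8, 2],
                  [1, 5]])

# SciPy's built-in two-sided Fisher exact test.
odds_ratio, p_builtin = fisher_exact(table, alternative='two-sided')

# The same p value directly from the hypergeometric distribution: with all
# margins fixed, the count in cell (0, 0) is hypergeometric, and the two-sided
# p value sums the probabilities of every table no more probable than the
# observed one.
a = table[0, 0]
row1, col1, total = table[0].sum(), table[:, 0].sum(), table.sum()
rv = hypergeom(total, col1, row1)
support = np.arange(max(0, row1 + col1 - total), min(row1, col1) + 1)
probs = rv.pmf(support)
p_manual = probs[probs <= rv.pmf(a) * (1 + 1e-9)].sum()

print(p_builtin, p_manual)  # the two values should agree
The one-sided p values are just the corresponding hypergeometric tail sums (rv.cdf(a) or rv.sf(a - 1), depending on direction), while the chi-square-scaled statistic reported for the Freeman-Halton version comes from the transformation described in Mehta (1994).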
