Why does "repeated measures" not appear under "analyze - glm" menu in SPSS? - spss

This is my first time using SPSS for doing within-subjects ANOVA. Every tutorial I see tells me that I should go to analyze -> General Linear Model -> Repeated Measures.
The problem is that under "General Linear Model" the only command I see is "Univariate". Why can't I find "Repeated Measures"? I'm using SPSS 21.

As @ttnphns states, you need to obtain and install the Advanced Statistics add-on module.
It includes a range of additional modelling tools like GLMs, mixed models, etc.
SPSS divides its packages into Base and a range of add-on modules. The add-on modules are often bundled automatically into various packages. For example, a quick look at the GradPack line shows a range of versions, some of which come with the "Advanced Statistics" add-on and others that don't:
IBM SPSS Statistics Base GradPack offers beginners the most frequently used procedures for statistical analysis, and includes IBM SPSS Statistics Base, which provides the foundation for many types of analyses. [i.e., no advanced stats]
IBM SPSS Statistics Standard GradPack enables intermediate students to use more advanced analytical algorithms and techniques, and includes IBM SPSS Statistics Base, IBM SPSS Advanced Statistics, and IBM SPSS Regression.

Related

Export math formulas from MoveIt

I want to write an article about frameworks like ROS and industrial control operating systems. One of my chapters must contain some of the mathematics used in calculating trajectories. Is there a way to get the math formulas that MoveIt uses while calculating movements?
MoveIt is a big ROS package. By default, it uses the OMPL library to do the motion planning. Within MoveIt, many different planners from OMPL are accessible (RRT, RRT*, PRM, etc.). So if you want the mathematical formulas, you'll first need to find out which planner you've been using with MoveIt, and then look for detailed information about it on the web.
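As a rough illustration (a convention of MoveIt configuration packages, not an official API): the planners a MoveIt setup can use are usually listed in the ompl_planning.yaml file of the robot's *_moveit_config package, so a small script can show you which ones are configured. The path below is an assumption; adjust it for your own workspace.

```python
# Minimal sketch: list the OMPL planners configured for a MoveIt setup by
# reading the conventional ompl_planning.yaml file. The path is hypothetical;
# point it at your own <robot>_moveit_config/config/ompl_planning.yaml.
import yaml

CONFIG_PATH = "config/ompl_planning.yaml"  # assumed location

with open(CONFIG_PATH) as f:
    cfg = yaml.safe_load(f)

# Planner definitions usually live under "planner_configs"; each entry's
# "type" field names the OMPL planner (e.g. geometric::RRTConnect).
for name, params in cfg.get("planner_configs", {}).items():
    print(name, "->", params.get("type"))
```

Once you know the planner name, search for the corresponding paper or OMPL documentation to get the underlying formulas.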

Watson Deep Learning: Experiment Builder, Command Line Interface and Python Client - maturity and features

The Watson Machine Learning service provides three options for training deep learning models. The docs list the following:
There are several ways to train models. Use one of the following methods to train your model:
Experiment Builder
Command line interface (CLI)
Python client
I believe these approaches will differ with their (1) maturity and (2) the features they support.
What are the differences between these approaches? To ensure this question meets the quality requirements, can you please provide an objective list of the differences? Providing your answer as a community wiki answer will also allow it to be updated over time as the list changes.
If you feel this question is not a good fit for Stack Overflow, please leave a comment explaining why and I will do my best to improve it.
Which of these techniques to use depends on a user's skill set and on how they fit the training/monitoring/deployment steps into their workflow:
Command Line Interface (CLI)
The CLI is useful for quick, ad hoc access to details about your training runs. It's also useful if you're building a data science workflow using shell scripts.
Python Library
WML's Python library allows users to integrate model training and deployment into a programmatic workflow. It can be used both within notebooks and from IDEs. The library has become the most widely used way of executing batch training experiments.
Experiment Builder UI
This is the "easy button" for executing batch training experiments within Watson Studio. It's a quick way to learn the basics of the batch training capabilities in Watson Studio. At present, it's not expected that data scientists would use Experiment Builder as their primary way of starting batch training experiments. Perhaps as Model Builder matures, this could change but the Python library is more flexible for integrating into production workflows.

Find startup's industry from its description

I am using the AngelList DB to categorize startups by industry, since these startups are currently categorized based on community input, which is misleading most of the time.
My business objective is to extract keywords that indicate which industry a specific startup belongs to, and then map it to one of the industries specified in LinkedIn's industry-code sheet: https://developer.linkedin.com/docs/reference/industry-codes
I experimented with Azure Machine Learning, where I pushed 300 startup descriptions; the keyword extraction was pretty bad and not even close to what I am trying to achieve.
I would like to know how data scientists would approach this problem. Where should I look, and where should I not? Are keyword analysis tools (like the Google AdWords Keyword Planner) a viable option?
Using Text Classification...
To be able to treat this as a classification problem, you need a training set, which is a set of AngelList entries that are labeled with correct LinkedIn categories. This can be done manually, or you can hire some Mechanical Turks to do the job for you.
Since you have ~150 categories, I'd imagine you need at least 20-30* AngelList entries for each of them. So your training set will be {input: angellist_description, result: linkedin_id}
After that, you need to dig through text classification techniques to try and optimize the accuracy/precision of your results. The book "Taming Text" has a full chapter on text classification. And a good tool to implement a text-based classifier would be Apache Solr or Apache Lucene.
* 20-30 is a quick personal estimate, not based on a scientific method; you can look online for more rigorous ways of estimating the required number of examples.
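As a rough sketch of what such a text-based classifier could look like (using Python and scikit-learn purely for illustration rather than the Solr/Lucene route; the descriptions and industry ids below are made up):

```python
# Hypothetical sketch: map startup descriptions to LinkedIn industry codes with
# a TF-IDF + logistic regression pipeline. The training examples are placeholders;
# a real run would use the manually labeled {description -> linkedin_id} set.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

descriptions = [
    "We build drones for precision agriculture",
    "A mobile app for booking boutique hotels",
    "Machine learning platform for payment fraud detection",
]
linkedin_ids = [63, 31, 43]  # placeholder industry codes

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(descriptions, linkedin_ids)

print(clf.predict(["An online marketplace for vacation rentals"]))
```

With the real training set you would hold out part of the data to measure accuracy/precision before trusting the mapping.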
Using Text Clustering.
Step #1
Use text clustering to extract the main 'topics' from all the descriptions (Carrot2 can be helpful here).
Input: corpus of all descriptions
Process: text clustering using Carrot2
Output: each document labeled with a topic
Step #2
Manually map the extracted topics into LinkedIn's categories.
Step #3
Use the output of the first two steps to traverse from company -> extracted topic -> LinkedIn category.
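If you just want to prototype step #1 without setting up Carrot2 (which is Java), a rough stand-in in Python is to cluster TF-IDF vectors with k-means; the corpus and number of topics below are placeholders:

```python
# Rough stand-in for step #1: group descriptions into topics with k-means over
# TF-IDF vectors. Carrot2 would do the real clustering; this is only a prototype.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [  # placeholder descriptions
    "We build drones for precision agriculture",
    "Satellite imagery analytics for farmers",
    "A mobile app for booking boutique hotels",
    "Online marketplace for vacation rentals",
    "Machine learning platform for fraud detection",
    "Real-time payment fraud scoring API",
]
n_topics = 3  # assumption; tune to your data

X = TfidfVectorizer(stop_words="english").fit_transform(corpus)
labels = KMeans(n_clusters=n_topics, n_init=10, random_state=0).fit_predict(X)

# Step #1 output: each document labeled with a topic id, ready for the manual
# mapping to LinkedIn categories in steps #2 and #3.
for topic, doc in sorted(zip(labels, corpus)):
    print(topic, doc)
```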

System identification toolbox vs. Econometrics toolbox in time series analysis

I'm doing time series analysis and I want to build an ARIMAX model for my data. I was just curious whether someone could give me any recommendations on whether to use the System Identification Toolbox or the Econometrics Toolbox in MATLAB. Which one would you prefer for general time series analysis?
jjepsuomi,
You get what you pay for. What you are trying to do is really complicated. I would suggest not trying to reinvent the wheel, but buying software that can handle the complexities of denominator structure on your causals plus outliers like pulses, level shifts, changes in trend and seasonality. I would recommend looking at SAS, Autobox, SPSS. We developed Autobox.
As of MATLAB 2013b, you cannot compile ARMAX() of the System Identification Toolbox into a standalone application. As this was a no-go for us, we moved to arima() of the Econometrics Toolbox.
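Not MATLAB, but for comparison, here is what a minimal ARIMAX-style fit looks like in Python using statsmodels' SARIMAX (the simulated data and the (1, 0, 1) order are assumptions, only to show the shape of the workflow):

```python
# Minimal ARIMAX-style fit: an ARMA(1,1) model with one exogenous regressor.
# The data is simulated purely for illustration.
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(0)
n = 200
exog = rng.normal(size=n)                              # exogenous driver
y = 0.5 * exog + 0.05 * rng.normal(size=n).cumsum()   # toy endogenous series

model = SARIMAX(y, exog=exog, order=(1, 0, 1))
result = model.fit(disp=False)
print(result.summary())
```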

Disease named entity recognition

I have a bunch of text documents that describe diseases. Those documents are in most cases quite short and often only contain a single sentence. An example is given here:
Primary pulmonary hypertension is a progressive disease in which widespread occlusion of the smallest pulmonary arteries leads to increased pulmonary vascular resistance, and subsequently right ventricular failure.
What I need is a tool that finds all disease terms (e.g. "pulmonary hypertension" in this case) in the sentences and maps them to a controlled vocabulary like MeSH.
Thanks in advance for your answers!
Here are two pipelines that are specifically designed for medical document parsing:
Apache cTAKES
NLM's MetaMap
Both use UMLS, the Unified Medical Language System, and thus require that you have a (free) license. Both are Java-based and more or less easy to set up.
See http://www.ebi.ac.uk/webservices/whatizit/info.jsf
Whatizit is a text processing system that allows you to do text-mining tasks on text. The tasks come defined by the pipelines in the drop-down list of the above window, and the text can be pasted in the text area.
You could also ask biostars: http://www.biostars.org/show/questions/
There are many tools to do that. Some popular ones:
NLTK (python)
LingPipe (java)
Stanford NER (java)
OpenCalais (web service)
Illinois NER (java)
Most of them come with some predefined models, i.e. they've already been trained on general datasets (news articles, etc.). However, your texts are pretty specific, so you might want to first build a corpus and re-train one of those tools in order to adapt it to your data.
More simply, as a first test, you can try a dictionary-based approach: design a list of entity names and perform exact or approximate matching. For instance, this operation is described in LingPipe's tutorial.
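A minimal sketch of that dictionary-based baseline in Python (the lexicon entries and MeSH ids below are placeholders; a real lexicon would be built from MeSH or the Disease Ontology):

```python
# Minimal dictionary-based disease tagger: exact, case-insensitive matching of
# lexicon terms in a sentence. The lexicon entries and MeSH ids are placeholders;
# in practice they would be extracted from MeSH or the Disease Ontology.
import re

lexicon = {
    "pulmonary hypertension": "D006976",     # placeholder MeSH id
    "right ventricular failure": "D006333",  # placeholder MeSH id
}

def tag_diseases(text):
    hits = []
    for term, mesh_id in lexicon.items():
        for m in re.finditer(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE):
            hits.append((m.start(), m.end(), term, mesh_id))
    return sorted(hits)

sentence = ("Primary pulmonary hypertension is a progressive disease that leads to "
            "increased pulmonary vascular resistance and right ventricular failure.")
print(tag_diseases(sentence))
```

A fuzzy-matching step (or one of the trained NER tools above) would be the natural next refinement once this baseline misses too many variants.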
Open Targets has a module for this as part of LINK. It's not meant to be used directly, so it might require some hacking and tinkering, but it's the most complete medical NER (named entity recognition) tool I've found for Python. For more info, read their blog post.
There is also MER, a bash script that includes an example lexicon generated from the Disease Ontology:
https://github.com/lasigeBioTM/MER
