How to select random subset of cases in SPSS based on student number? - spss

I am setting some student assignments where most students will be using SPSS. In order to encourage students to do their own work, I want students to have a partially unique dataset. Thus, I'd like to get each to the open the master data file, and then get the student to run a couple of lines of syntax that produces a unique data file. In pseudo code, I'd like to do something like the following where 12345551234 is a student number:
set random number generator = 12345551234
select 90% random subset ofcases and drop the rest.
What is simple SPSS syntax dropping a subset of cases from the data file?

After playing around I came up with this syntax, but perhaps there are simpler or otherwise better suggestions.
* Replace number below with student number or first 10 numbers of student number.
SET SEED=1234567891.
FILTER OFF.
USE ALL.
SAMPLE .90.
EXECUTE.

Related

How to identify cases that have multiple variables with the same value in SPSS

I have a dataset in which there are multiple variables for various times.
Here is a sample part of the dataset:
I'm trying to identify the number/percentage of cases that have the same value in any of the multiple variables.
For example, if I have a database of teachers who left a school where they worked and there are variables for why the teacher left each school, how would I find out if a teacher left multiple schools for the same reason. So I have reasonleft1, reasonleft2, reasonleft3, up to 20. Each reasonleft has the same coded response options. For example, 1=better opportunity elsewhere, 2=retired, 3=otherwise left workforce, etc. I'm stumped on how to figure out if any case/teacher left multiple schools for the same reason. Which teachers left multiple schools out of the 20 for 1=better opportunity elsewhere, for example.
Thanks!
This can be done in the following two steps:
You need to restructure the dataset so that each "time" appears in a separate row.
Now you can aggregate to count the number of appearances of each reason per person.
The following syntax will do that:
varstocases
/make facilititype from facilititype1_pre facilititype2_pre facilititype3_pre
/make timeinplace from timeinplace1_pre timeinplace2_pre timeinplace3_pre
/make reasonleft from reasonleft1_pre reasonleft2_pre reasonleft3_pre
/index = timeN(reasonleft).
* you should continue the numbering for as much as needed.
dataset declare MyAgg.
aggregate outfile=MyAgg /break=ID reasonleft/Ntimes=n.
At this point you have a new dataset which has the count of each reason for each ID. If you wish you can go back to wide format, and create a column for each reason (the values in each column are the count of times this reason appeared for the ID).
This way:
dataset activate MyAgg.
casestovars /id=ID /index=reasonleft/separator="_".

How to Order by relevance in a SQL query in Rails?

I am using a SQL query for a specified and faster search in my Rails app, currently I have the following query that works wonderfully for gathering the data from the initial search:
SELECT * FROM selected_tables
WHERE field1 LIKE '%#{present_param}'
AND field2 LIKE '%#{present_param2}'
And so on like that, with each LIKE line only appearing if the relevant parameter is present from the form.
So I am now able to get back a large amount of results from this query, but they're not ordered in any helpful way. I need some way of ordering the results based on their relevance to the original user input from the form, but I can't seem to find anything on google about it. Is there a way in SQL (specifically postgresql) that I can order the results based on this?
To be clear, when I say relevance I mean that a given search keyword should be in the title or company name for the result, not just present somewhere in the content.
For example: if you search "Sony" you get Sony Electronics first, not another listing containing Sony somewhere in the middle of its name.
I ended up using a series of Case/When statements that were weighted with various integer scores to apply priority to my results. They work wonderfully and turned out something like this:
SELECT title, company, user, CASE
WHEN upper(company_name) LIKE '%#{word[0].upcase}%' THEN 3
WHEN upper(company_name) LIKE '%#{company_name.upcase}%' THEN 2
ELSE 0 END as score
FROM selected_tables
WHERE company_name LIKE '%#{company_name}%'
ORDER BY score DESC;

Can SPSS treat a collection of Nominal Variables as one variable?

I have a lot of movie data from IMDB and I'm in the middle of cleaning up the data and making it so that 1 row = 1 movie as the database often has multiple records for a single film.
I've restructured the data so that what was a single 'Country' variable with multiple cases for a single film, is now a set of 29 country columns. A single film may have up to 29 countries affiliated with it (most have just 1 or 2).
I plan to do some simple descriptive statistics and expected frequencies, perhaps look for correlations with other variables like genre etc.
Is it possible to have SPSS treat all 29 variables as a single variable? It doesn't matter which of the country variables a country is present in, just that it is present in one of them. For example I might want to find all Indian films, and ask SPSS to check for each row, whether 'India' is in any one of the country variables and return the row if it is present in any of them.
Is this possible, or do I just need to manually instruct SPSS with a list of OR commands whenever I run a query.
There are two types of multiple response sets: multiple dichotomy, which would be 29 yes/no variables as you describe, and multiple category, in which you have a list of arbitrary categories. See the MRSETS command for details.
Once defined, CTABLES can do all your statistical calculations on these, and these sets can also be used in graphics constructed in the Chart Builder or GGRAPH commands.
Don't confuse the sets created by MRSETS with the older MULTIPLE RESPONSE procedure, which is still available. MRSETS definitions persist with the data and are used with CTABLES and GGRAPH only.
With the ANY function, as Andy said above, you would use the individual variables, but you can use TO. So, for example, you could write
COMPUTE FILM7 = ANY(7, f1 to f29)
if you have MC variables. If using the MD structure, you would have to check, say, variable f7 in this example.

Delphi - What Structure allows for SAVING inverted index type of information?

Delphi XE6. Looking to implemented a limited style of search, specifically an edit field for the user to enter a business name which would get looked up. I need to allow the user to enter multiple words, or part of multiple words. For Example, on a business "First Bank of Kansas", user should be able to enter "Fir Kan", and it should return a match. This means an inverted index type of structure. I have some type of list of each unique word, then a (document ID, primary Key ID, etc, which is an integer). I am struggling with WHAT type of structure to make this... I have approximately 250,000 business names, which have 43,500 unique words. Word count will vary from 1 occurrence of a word to several thousand (company, corporation, etc) I have some requirements...
1). Assume the user enters BAN. I need to find ALL words that start with BAN. I need to return BANK, BANKER, etc... This means that whatever structure I use, I have to be able to find BAN and then move to the next alphabetic entry... and keep moving to the next until I find a value that does NOT start with BAN. This eliminates any type of HASH structure, correct?
2). I obviously want this to be fast. HASH is the fastest, but I can't use this, correct? See requirement 1.
3). Each entry in this structure needs to be able to hold a list of integers. If I end up going with a LinkedList, then each element has to hold a list of Integers.
4). I need to be able to save and load this structure. I don't want to have to build it each time I use it.
Whatever I end up with, it appears to have to be a NESTED structure, a higher level list (LinkedList?) with each node being an Integer List.
What am I looking for? What do commercial product use? Outlook, etc have search capabilities.
Every word is linked to a specific set of IDs, each representing a business name, right?.
I recommend using a binary tree data structure because effort for searching is normally log(n), which is quite fast. Especially, if business names are changing at runtime, an AVLTree should do well, although it's quite some work to implement it by yourself. But there should be many ready-to-use units on binary trees all over the internet.
For each successful search for a word in your tree data structure, you should take their list of IDs and aggregate those grouped by the entered word they succeeded for.
As the last step you take all those aggregated lists of IDs and do an intersection.
There should only be IDs left which are fitting to all entered words. Those IDs are referencing the searched business names.

How to Sum calulated fields

I‘d like to ask I question that here that I think would be easy to some people.
Ok I have query that return records of two related tables. (One to many)
In this query I have about 3 to 4 calculated fields that are based on the fields from the 2 tables.
Now I want to have a group by clause for names and sum clause to sum the calculated fields but it ends up in error message saying:
“You tried to execute a query that is not part of aggregate function”
So I decided to just run the query without the totals *(ie no group by , sum etc,,,)
:
And then I created another query that totals my previous query. ( i.e. using group by clause for names and sum for calculated fields… no calculation here) This is fine ( I use to do this) but I don’t like having two queries just to get summary total. Is their any other way of doing this in the design view and create only one query?.
I would very much appreciate.
Thankyou:
JM
Sounds like the query is thinking the calculated fields need to be part of the grouping or something. You might need to look into sub-querying.
Can you post the sql (before and after). It would help in getting an understanding of what the issue is.

Resources