spss custom tables crashing when row matches column - spss

I've defined a function for running batches of custom tables:
DEFINE !xtables (myvars=!CMDEND)
CTABLES
/VLABELS VARIABLES=!myvars retailer total DISPLAY=LABEL
/TABLE !myvars [C][COLPCT.COUNT PCT40.0, TOTALS[UCOUNT F40.0]] BY retailer [c] + total [c]
/SLABELS POSITION=ROW
/CRITERIA CILEVEL=95
/CATEGORIES VARIABLES=!myvars ORDER=D KEY=COLPCT.COUNT (!myvars) EMPTY=INCLUDE TOTAL=YES LABEL='Base' POSITION=AFTER
/COMPARETEST TYPE=PROP ALPHA=.05 ADJUST=BONFERRONI ORIGIN=COLUMN INCLUDEMRSETS=YES CATEGORIES=ALLVISIBLE MERGE=YES STYLE=SIMPLE SHOWSIG=NO
!ENDDEFINE.
I can then run a series for commands to run these in one batch.
!XTABLES MYVARS=q1.
!XTABLES MYVARS=q2.
!XTABLES MYVARS=q3.
However, if a table has the same row and column, Custom Tables freezes:
!XTABLES MYVARS=retailer.
The culprit appears to be SLABELS. I hadn't encountered this problem before v24.

I tried replicating a CTABLES spec as close as possible to yours and found that VLABELSdoes not like the same variable specified twice.
GET FILE="C:\Program Files\IBM\SPSS\Statistics\23\Samples\English\Employee data.sav".
CTABLES /VLABELS VARIABLES=Gender Gender DISPLAY=LABEL
/TABLE Gender[c][COLPCT.COUNT PCT40.0, TOTALS[UCOUNT F40.0]]
BY Gender[c] /SLABELS POSITION=ROW
/CATEGORIES VARIABLES=Gender ORDER=D KEY=COLPCT.COUNT(Gender) .
Which yields an error message:
VLABELS: Text GENDER. The same keyword, option, or subcommand is used more than once.

The macro has a parmeter named MYVARS, which suggests that more than one variable can be listed, however, if you do that, it will generate an invalid command. Something else to watch out for. I can see the infinite loop in V24. In V23, an error message is produced.

Related

Visualize sum of column percentage for multiple response set variables

I'm trying to understand how to visualise the sum of column percentages in some tabulations of multiple variables.
suppose that i have defined the variable $q12 as a multiple response set of categorical values of the variables sq12m1 sq12m2 sq12m3 sq12m4 sq12m5.
i could have cases with values only in sq12m1 or cases with values in all of those.
if i want to see how many times any brand appear in any of those sq12m1 to sq12m5 i am using this:
CTABLES
/VLABELS VARIABLES=$q12 DISPLAY=DEFAULT
/TABLE $q12 [C][COUNT F40.0, COLPCT.COUNT PCT40.1]
/CATEGORIES VARIABLES=$q12 ORDER=A KEY=VALUE EMPTY=INCLUDE TOTAL=YES POSITION=AFTER
MISSING=EXCLUDE.
and it will generate this:
how can i sum the column percentages? using this syntax the total is always 100%, i would like to visualise the sum (which in this case is 215.10%) which represents the average number of mentions...
do you know how to do it?
thanks!!!
Only one thing you need to change in your syntax, in the /TABLE sub-command:COLPCT.RESPONSES.COUNT instead of COLPCT.COUNT:
CTABLES
/VLABELS VARIABLES=$q12 DISPLAY=DEFAULT
/TABLE $q12 [C][COUNT F40.0, COLPCT.RESPONSES.COUNT PCT40.1]
/CATEGORIES VARIABLES=$q12 ORDER=A KEY=VALUE EMPTY=INCLUDE TOTAL=YES POSITION=AFTER
MISSING=EXCLUDE.

How to count instances of text

I have a list of email addresses in SPSS. I'm trying to write syntax to count how many times each email address appears.
For instance:
In my desired output, if johndoe#aol.com appears in the data 3 times, I want all instances of his email to show a 3 in my new column.
I know I can write syntax to have it count (ie johndoe#aol.com will be assigned 1 the first time, then 2 then 3)... but this is not what I want.
Thanks!
Steps to do this:
Sort cases by email.
Get the counts using the Aggregate command.
Use the Identify Duplicate Cases command to generate an indicator of whether a given email is the first of its kind in the file.
Select cases that aren't the first with that particular email.
All four of those commands are in the Data menu in the GUI. Syntax to do the whole thing:
SORT CASES BY Email.
*This will create a new variable N_EMAIL with the counts. It will appear for every case.
AGGREGATE
/OUTFILE=* MODE=ADDVARIABLES
/PRESORTED
/BREAK=Email
/N_EMAIL=N.
*Now we generate a "PrimaryFirst" indicator showing whether a given case is the first instance of its email.
MATCH FILES
/FILE=*
/BY Email
/FIRST=PrimaryFirst
/LAST=PrimaryLast.
DO IF (PrimaryFirst).
COMPUTE MatchSequence=1-PrimaryLast.
ELSE.
COMPUTE MatchSequence=MatchSequence+1.
END IF.
LEAVE MatchSequence.
FORMATS MatchSequence (f7).
COMPUTE InDupGrp=MatchSequence>0.
SORT CASES InDupGrp(D).
MATCH FILES
/FILE=*
/DROP=PrimaryLast InDupGrp MatchSequence.
EXECUTE.
*Filter out duplicate cases.
SELECT IF PrimaryFirst = 1.
EXECUTE.
*Final cleanup.
DELETE VARIABLES PrimaryFirst.
Just run this:
AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /BREAK=EmailAddress /num_instances=N.
A new column will appear in the dataset called num_instances (you can of course select another name) which will have the desired count appear in all instances of each Email address.

How to concatenate three columns into one and obtain count of unique entries among them using Cypher neo4j?

I can query using Cypher in Neo4j from the Panama database the countries of three types of identity holders (I define that term) namely Entities (companies), officers (shareholders) and Intermediaries (middle companies) as three attributes/columns. Each column has single or double entries separated by colon (eg: British Virgin Islands;Russia). We want to concatenate the countries in these columns into a unique set of countries and hence obtain the count of the number of countries as new attribute.
For this, I tried the following code from my understanding of Cypher:
MATCH (BEZ2:Officer)-[:SHAREHOLDER_OF]->(BEZ1:Entity),(BEZ3:Intermediary)-[:INTERMEDIARY_OF]->(BEZ1:Entity)
WHERE BEZ1.address CONTAINS "Belize" AND
NOT ((BEZ1.countries="Belize" AND BEZ2.countries="Belize" AND BEZ3.countries="Belize") OR
(BEZ1.status IN ["Inactivated", "Dissolved shelf company", "Dissolved", "Discontinued", "Struck / Defunct / Deregistered", "Dead"]))
SET BEZ4.countries= (BEZ1.countries+","+BEZ2.countries+","+BEZ3.countries)
RETURN BEZ3.countries AS IntermediaryCountries, BEZ3.name AS
Intermediaryname, BEZ2.countries AS OfficerCountries , BEZ2.name AS
Officername, BEZ1.countries as EntityCountries, BEZ1.name AS Companyname,
BEZ1.address AS CompanyAddress,DISTINCT count(BEZ4.countries) AS NoofConnections
The relevant part is the SET statement in the 7th line and the DISTINCT count in the last line. The code shows error which makes no sense to me: Invalid input 'u': expected 'n/N'. I guess it means to use COLLECT probably but we tried that as well and it shows the error vice-versa'd between 'u' and 'n'. Please help us obtain the output that we want, it makes our job hell lot easy. Thanks in advance!
EDIT: Considering I didn't define variable as suggested by #Cybersam, I tried the command CREATE as following but it shows the error "Invalid input 'R':" for the command RETURN. This is unfathomable for me. Help really needed, thank you.
CODE 2:
MATCH (BEZ2:Officer)-[:SHAREHOLDER_OF]->(BEZ1:Entity),(BEZ3:Intermediary)-
[:INTERMEDIARY_OF]->(BEZ1:Entity)
WHERE BEZ1.address CONTAINS "Belize" AND
NOT ((BEZ1.countries="Belize" AND BEZ2.countries="Belize" AND
BEZ3.countries="Belize") OR
(BEZ1.status IN ["Inactivated", "Dissolved shelf company", "Dissolved",
"Discontinued", "Struck / Defunct / Deregistered", "Dead"]))
CREATE (p:Connections{countries:
split((BEZ1.countries+";"+BEZ2.countries+";"+BEZ3.countries),";")
RETURN BEZ3.countries AS IntermediaryCountries, BEZ3.name AS
Intermediaryname, BEZ2.countries AS OfficerCountries , BEZ2.name AS
Officername, BEZ1.countries as EntityCountries, BEZ1.name AS Companyname,
BEZ1.address AS CompanyAddress, AS TOTAL, collect (DISTINCT
COUNT(p.countries)) AS NumberofConnections
Lines 8 and 9 are the ones new and to be in examination.
First Query
You never defined the identifier BEZ4, so you cannot set a property on it.
Second Query (which should have been posted in a separate question):
You have several typos and a syntax error.
This query should not get an error (but you will have to determine if it does what you want):
MATCH (BEZ2:Officer)-[:SHAREHOLDER_OF]->(BEZ1:Entity),(BEZ3:Intermediary)- [:INTERMEDIARY_OF]->(BEZ1:Entity)
WHERE BEZ1.address CONTAINS "Belize" AND NOT ((BEZ1.countries="Belize" AND BEZ2.countries="Belize" AND BEZ3.countries="Belize") OR (BEZ1.status IN ["Inactivated", "Dissolved shelf company", "Dissolved", "Discontinued", "Struck / Defunct / Deregistered", "Dead"]))
CREATE (p:Connections {countries: split((BEZ1.countries+";"+BEZ2.countries+";"+BEZ3.countries), ";")})
RETURN BEZ3.countries AS IntermediaryCountries,
BEZ3.name AS Intermediaryname,
BEZ2.countries AS OfficerCountries ,
BEZ2.name AS Officername,
BEZ1.countries as EntityCountries,
BEZ1.name AS Companyname,
BEZ1.address AS CompanyAddress,
SIZE(p.countries) AS NumberofConnections;
Problems with the original:
The CREATE clause was missing a closing } and also a closing ).
The RETURN clause had a dangling AS TOTAL term.
collect (DISTINCT COUNT(p.countries)) was attempting to perform nested aggregation, which is not supported. In any case, even if it had worked, it probably would not have returned what you wanted. I suspect that you actually wanted the size of the p.countries collection, so that is what I used in my query.

SPSS output modify 'select all except (tables)' still selects custom tables

When I run this syntax in SPSS:
output modify
/select all except (Tables)
/deleteobject delete=yes.
my custom tables still get deleted. Do you have any idea whether this is a bug or I am doing something wrong?
Many thanks in advance!
TABLES is a generic term for all objects of type table, which includes custom tables output. You can do what you want with OMS using syntax like this.
oms select all /exceptif subtypes='Custom Table'/destination viewer=no.
CTABLES
/VLABELS VARIABLES=educ DISPLAY=DEFAULT
/TABLE educ [C][COUNT F40.0]
/CATEGORIES VARIABLES=educ ORDER=A KEY=VALUE EMPTY=INCLUDE MISSING=EXCLUDE
/CRITERIA CILEVEL=95.
DESCRIPTIVES VARIABLES=bdate educ id jobcat jobtime
/STATISTICS=MEAN STDDEV MIN MAX.
omsend.

Listing two or more variables alongside each other

I want an alternative to running frequency for string variables because I also want to get a case number for each of the string value (I have a separate variable for case ID).
After reviewing the string values I will need to find them to recode which is the reason I need to know the case number.
I know that PRINT command should do what I want but I get an error - is there any alternative?
PRINT / id var2 .
EXECUTE.
>Error # 4743. Command name: PRINT
>The line width specified exceeds the output page width or the record length or
>the maximum record length of 2147483647. Reduce the number of variables or
>split the output line into several records.
>Execution of this command stops.
Try the LIST command.
I often use the TEMPORARY commond prior to the LIST command, as often there is only a small select of record of interest I may want to "list"/investigate.
For example, in the below, only to list the records where VAR2 is not a blank string.
TEMP.
SELECT IF (len(VAR2)>0).
LIST ID VAR2.
Alternatively, you could also (but dependent on having CUSTOM TABLES add-on module), do something like below which would get the results into a tabular format also (which may be preferable if then exporting to Excel, for example.
CTABLES /TABLE CTABLES /VLABELS VARIABLES=ALL DISPLAY=NONE
/TABLE A[C]>B[C]
/CATEGORIES VARIABLES=ALL EMPTY=EXCLUDE.

Resources