I'm trying to understand how to show the sum of column percentages in tabulations of multiple response variables.
Suppose that I have defined the variable $q12 as a multiple response set of the categorical variables sq12m1, sq12m2, sq12m3, sq12m4 and sq12m5.
Some cases have a value only in sq12m1, while others have values in all five.
To see how many times each brand appears in any of sq12m1 to sq12m5, I am using this:
CTABLES
/VLABELS VARIABLES=$q12 DISPLAY=DEFAULT
/TABLE $q12 [C][COUNT F40.0, COLPCT.COUNT PCT40.1]
/CATEGORIES VARIABLES=$q12 ORDER=A KEY=VALUE EMPTY=INCLUDE TOTAL=YES POSITION=AFTER
MISSING=EXCLUDE.
and it will generate this:
How can I sum the column percentages? With this syntax the total is always 100%. I would like to show the sum instead (215.10% in this case), which represents the average number of mentions (about 2.15 per case).
Do you know how to do it?
Thanks!
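You only need to change one thing in your syntax: in the /TABLE subcommand, use COLPCT.RESPONSES.COUNT instead of COLPCT.COUNT. That statistic uses responses as the numerator and the count of cases as the denominator, so the total can exceed 100%: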
CTABLES
/VLABELS VARIABLES=$q12 DISPLAY=DEFAULT
/TABLE $q12 [C][COUNT F40.0, COLPCT.RESPONSES.COUNT PCT40.1]
/CATEGORIES VARIABLES=$q12 ORDER=A KEY=VALUE EMPTY=INCLUDE TOTAL=YES POSITION=AFTER
MISSING=EXCLUDE.
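To see why the total comes out above 100%, here is a minimal Python sketch of the arithmetic behind a responses-over-cases percentage. The data are invented for illustration (four respondents, brands "A" to "C"); nothing here comes from the original dataset, and it is only a sketch of the calculation, not of how CTABLES works internally.

```python
from collections import Counter

# Hypothetical data: one list per respondent, holding the brands mentioned
# in sq12m1..sq12m5 (missing values already dropped).
cases = [
    ["A", "B", "C"],
    ["A"],
    ["B", "C"],
    ["A", "B"],
]

n_cases = len(cases)

# Number of responses (mentions) per brand across all respondents.
responses = Counter(brand for mentions in cases for brand in mentions)
n_responses = sum(responses.values())

# Responses as the numerator, number of cases as the denominator.
col_pct = {brand: 100.0 * n / n_cases for brand, n in sorted(responses.items())}

# The total row: all responses divided by all cases.
total_pct = 100.0 * n_responses / n_cases

# The same figure expressed as mentions per respondent.
mean_mentions = n_responses / n_cases

print(col_pct)
print(total_pct, mean_mentions)
```

With these made-up numbers the total is 200%, i.e. two mentions per respondent on average, which is the same relationship as 215.10% and roughly 2.15 mentions in the question.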
I have a list of email addresses in SPSS. I'm trying to write syntax to count how many times each email address appears.
For instance:
In my desired output, if johndoe@aol.com appears in the data 3 times, I want every instance of that email to show a 3 in my new column.
I know I can write syntax that numbers the occurrences (i.e. johndoe@aol.com is assigned 1 the first time, then 2, then 3), but this is not what I want.
Thanks!
Steps to do this:
Sort cases by email.
Get the counts using the Aggregate command.
Use the Identify Duplicate Cases command to generate an indicator of whether a given email is the first of its kind in the file.
Keep only the first case for each email (that is, filter out the repeats).
All four of those commands are in the Data menu in the GUI. Syntax to do the whole thing:
SORT CASES BY Email.
*This will create a new variable N_EMAIL with the counts. It will appear for every case.
AGGREGATE
/OUTFILE=* MODE=ADDVARIABLES
/PRESORTED
/BREAK=Email
/N_EMAIL=N.
*Now we generate a "PrimaryFirst" indicator showing whether a given case is the first instance of its email.
MATCH FILES
/FILE=*
/BY Email
/FIRST=PrimaryFirst
/LAST=PrimaryLast.
DO IF (PrimaryFirst).
COMPUTE MatchSequence=1-PrimaryLast.
ELSE.
COMPUTE MatchSequence=MatchSequence+1.
END IF.
LEAVE MatchSequence.
FORMATS MatchSequence (f7).
COMPUTE InDupGrp=MatchSequence>0.
SORT CASES InDupGrp(D).
MATCH FILES
/FILE=*
/DROP=PrimaryLast InDupGrp MatchSequence.
EXECUTE.
*Filter out duplicate cases.
SELECT IF PrimaryFirst = 1.
EXECUTE.
*Final cleanup.
DELETE VARIABLES PrimaryFirst.
Just run this:
AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /BREAK=EmailAddress /num_instances=N.
A new column will appear in the dataset called num_instances (you can of course select another name) which will have the desired count appear in all instances of each Email address.
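For readers who want to check the expected result outside SPSS, here is a rough Python equivalent of what AGGREGATE with MODE=ADDVARIABLES produces. The email list is made up for illustration, and the variable names are my own.

```python
from collections import Counter

# Hypothetical data standing in for the EmailAddress column.
emails = [
    "johndoe@aol.com",
    "jane@example.com",
    "johndoe@aol.com",
    "johndoe@aol.com",
]

# Count how often each address occurs in the whole column.
counts = Counter(emails)

# Attach the group count to every row, keeping all rows and their order,
# which mirrors adding num_instances to the active dataset.
rows = [(email, counts[email]) for email in emails]

for email, num_instances in rows:
    print(email, num_instances)
```

Every johndoe@aol.com row carries a 3, not a running 1, 2, 3, which is the behaviour asked for in the question.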
Using Cypher in Neo4j, I can query the Panama database for the countries of three types of identity holders (my own term): Entities (companies), Officers (shareholders) and Intermediaries (middle companies), returned as three attributes/columns. Each column holds one or two entries separated by a semicolon (e.g. British Virgin Islands;Russia). We want to combine the countries in these columns into a unique set of countries and then obtain the number of countries in that set as a new attribute.
For this, I tried the following code, based on my understanding of Cypher:
MATCH (BEZ2:Officer)-[:SHAREHOLDER_OF]->(BEZ1:Entity),(BEZ3:Intermediary)-[:INTERMEDIARY_OF]->(BEZ1:Entity)
WHERE BEZ1.address CONTAINS "Belize" AND
NOT ((BEZ1.countries="Belize" AND BEZ2.countries="Belize" AND BEZ3.countries="Belize") OR
(BEZ1.status IN ["Inactivated", "Dissolved shelf company", "Dissolved", "Discontinued", "Struck / Defunct / Deregistered", "Dead"]))
SET BEZ4.countries= (BEZ1.countries+","+BEZ2.countries+","+BEZ3.countries)
RETURN BEZ3.countries AS IntermediaryCountries, BEZ3.name AS
Intermediaryname, BEZ2.countries AS OfficerCountries , BEZ2.name AS
Officername, BEZ1.countries as EntityCountries, BEZ1.name AS Companyname,
BEZ1.address AS CompanyAddress,DISTINCT count(BEZ4.countries) AS NoofConnections
The relevant parts are the SET statement and the DISTINCT count in the last line. The query fails with an error that makes no sense to me: Invalid input 'u': expected 'n/N'. I guess it means I should use COLLECT, but we tried that as well and got the same error with 'u' and 'n' swapped. Please help us get the output we want; it would make our job a lot easier. Thanks in advance!
EDIT: Since I hadn't defined the variable, as @Cybersam pointed out, I tried the CREATE command as follows, but now I get the error "Invalid input 'R':" at the RETURN clause. I can't make sense of this. Help is really needed, thank you.
CODE 2:
MATCH (BEZ2:Officer)-[:SHAREHOLDER_OF]->(BEZ1:Entity),(BEZ3:Intermediary)-
[:INTERMEDIARY_OF]->(BEZ1:Entity)
WHERE BEZ1.address CONTAINS "Belize" AND
NOT ((BEZ1.countries="Belize" AND BEZ2.countries="Belize" AND
BEZ3.countries="Belize") OR
(BEZ1.status IN ["Inactivated", "Dissolved shelf company", "Dissolved",
"Discontinued", "Struck / Defunct / Deregistered", "Dead"]))
CREATE (p:Connections{countries:
split((BEZ1.countries+";"+BEZ2.countries+";"+BEZ3.countries),";")
RETURN BEZ3.countries AS IntermediaryCountries, BEZ3.name AS
Intermediaryname, BEZ2.countries AS OfficerCountries , BEZ2.name AS
Officername, BEZ1.countries as EntityCountries, BEZ1.name AS Companyname,
BEZ1.address AS CompanyAddress, AS TOTAL, collect (DISTINCT
COUNT(p.countries)) AS NumberofConnections
Lines 8 and 9 are the new ones and the ones in question.
First Query
You never defined the identifier BEZ4, so you cannot set a property on it.
Second Query (which should have been posted in a separate question):
You have several typos and a syntax error.
This query should not get an error (but you will have to determine if it does what you want):
MATCH (BEZ2:Officer)-[:SHAREHOLDER_OF]->(BEZ1:Entity),
(BEZ3:Intermediary)-[:INTERMEDIARY_OF]->(BEZ1:Entity)
WHERE BEZ1.address CONTAINS "Belize" AND
NOT ((BEZ1.countries="Belize" AND BEZ2.countries="Belize" AND BEZ3.countries="Belize") OR
(BEZ1.status IN ["Inactivated", "Dissolved shelf company", "Dissolved", "Discontinued", "Struck / Defunct / Deregistered", "Dead"]))
CREATE (p:Connections {countries: split((BEZ1.countries+";"+BEZ2.countries+";"+BEZ3.countries), ";")})
RETURN BEZ3.countries AS IntermediaryCountries,
BEZ3.name AS Intermediaryname,
BEZ2.countries AS OfficerCountries ,
BEZ2.name AS Officername,
BEZ1.countries as EntityCountries,
BEZ1.name AS Companyname,
BEZ1.address AS CompanyAddress,
SIZE(p.countries) AS NumberofConnections;
Problems with the original:
The CREATE clause was missing a closing } and also a closing ).
The RETURN clause had a dangling AS TOTAL term.
collect (DISTINCT COUNT(p.countries)) was attempting to perform nested aggregation, which is not supported. In any case, even if it had worked, it probably would not have returned what you wanted. I suspect that you actually wanted the size of the p.countries collection, so that is what I used in my query.
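One thing to check against your goal: splitting the concatenated string gives a list that still contains repeated countries, so its size counts every entry. If you want the number of distinct countries, the list has to be de-duplicated first. The Python sketch below only illustrates that difference; the three country strings are invented, and unique_countries is a hypothetical helper, not part of any Neo4j API.

```python
def unique_countries(*fields):
    """Split semicolon-separated country strings and drop repeats,
    keeping the order in which each country first appears."""
    seen = []
    for field in fields:
        for country in field.split(";"):
            country = country.strip()
            if country and country not in seen:
                seen.append(country)
    return seen


# Hypothetical values for the Entity, Officer and Intermediary columns.
entity = "British Virgin Islands;Russia"
officer = "Russia"
intermediary = "Belize;British Virgin Islands"

# Same idea as split(a + ";" + b + ";" + c, ";"): repeats are kept.
combined = ";".join([entity, officer, intermediary]).split(";")

# De-duplicated list, whose length is the number of distinct countries.
unique = unique_countries(entity, officer, intermediary)

print(len(combined), combined)
print(len(unique), unique)
```

Here the combined list has 5 entries but only 3 distinct countries, so whichever of the two numbers you need determines whether de-duplication belongs in the query.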
When I run this syntax in SPSS:
output modify
/select all except (Tables)
/deleteobject delete=yes.
my custom tables still get deleted. Do you have any idea whether this is a bug or I am doing something wrong?
Many thanks in advance!
TABLES is a generic term for all objects of type table, which includes custom tables output. You can do what you want with OMS using syntax like this.
oms select all /exceptif subtypes='Custom Table'/destination viewer=no.
CTABLES
/VLABELS VARIABLES=educ DISPLAY=DEFAULT
/TABLE educ [C][COUNT F40.0]
/CATEGORIES VARIABLES=educ ORDER=A KEY=VALUE EMPTY=INCLUDE MISSING=EXCLUDE
/CRITERIA CILEVEL=95.
DESCRIPTIVES VARIABLES=bdate educ id jobcat jobtime
/STATISTICS=MEAN STDDEV MIN MAX.
omsend.
I want an alternative to running frequencies on string variables, because I also want the case number for each string value (I have a separate variable for case ID).
After reviewing the string values I will need to find them again to recode them, which is why I need the case number.
I know that the PRINT command should do what I want, but I get an error. Is there an alternative?
PRINT / id var2 .
EXECUTE.
>Error # 4743. Command name: PRINT
>The line width specified exceeds the output page width or the record length or
>the maximum record length of 2147483647. Reduce the number of variables or
>split the output line into several records.
>Execution of this command stops.
Try the LIST command.
I often use the TEMPORARY command before LIST, since there is often only a small selection of records that I want to list and investigate.
For example, the syntax below lists only the records where VAR2 is not a blank string.
TEMPORARY.
SELECT IF (LENGTH(RTRIM(VAR2)) > 0).
LIST ID VAR2.
Alternatively, if you have the Custom Tables add-on module, you could do something like the syntax below, which puts the results into a table (which may be preferable if you then export to Excel, for example).
CTABLES
/VLABELS VARIABLES=ALL DISPLAY=NONE
/TABLE A[C]>B[C]
/CATEGORIES VARIABLES=ALL EMPTY=EXCLUDE.