Listing two or more variables alongside each other - spss

I want an alternative to running frequencies on string variables, because I also want to get the case number for each string value (I have a separate variable for case ID).
After reviewing the string values I will need to find the cases again to recode them, which is why I need to know the case number.
I know that the PRINT command should do what I want, but I get an error. Is there any alternative?
PRINT / id var2 .
EXECUTE.
>Error # 4743. Command name: PRINT
>The line width specified exceeds the output page width or the record length or
>the maximum record length of 2147483647. Reduce the number of variables or
>split the output line into several records.
>Execution of this command stops.

Try the LIST command.
I often use the TEMPORARY command before the LIST command, as there is often only a small selection of records I want to "list"/investigate.
For example, the syntax below lists only the records where VAR2 is not a blank string.
TEMP.
SELECT IF (len(VAR2)>0).
LIST ID VAR2.
Alternatively (this depends on having the CUSTOM TABLES add-on module), you could do something like the below, which gets the results into a tabular format (which may be preferable if you then export to Excel, for example).
CTABLES /VLABELS VARIABLES=ALL DISPLAY=NONE
/TABLE A[C]>B[C]
/CATEGORIES VARIABLES=ALL EMPTY=EXCLUDE.

Related

check for matching rows in csv file ruby

I am very new to Ruby and I want to check for rows with the same number in a CSV file.
What I am trying to do is go through the input CSV file and copy each row to the output file, adding another column called "duplicate" to the output. While copying, I want to check whether the same phone number is already in the output file; if it is, add "dupl" to that row's duplicate column.
This is what I have.
file=CSV.read('input_file.csv')
output_file=File.open("output2.csv","w")
for row in file
output_file.write(row)
output_file.write("\n")
end
output_file.close
Example:
Phone
(202) 221-1323
(201) 321-0243
(202) 221-1323
(310) 343-4923
Output file:
Phone            Duplicate
(202) 221-1323
(201) 321-0243
(202) 221-1323   dupl
(310) 343-4923
So basically you want to write the input to output and append a "dupl" on the second occurrence of a duplicate?
Your input to output seems fine. To get the "dupl" flag, simply count the occurrences of each number in the list. If there is more than one, it's a duplicate. But since you only want the flag on the second occurrence, just count how often the number appeared up until that point:
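require 'csv'

lines = CSV.read('input_file.csv')
output_file = File.open('output2.csv', 'w')
lines.each_with_index do |l, i|
  # l is an array of fields, so turn it back into CSV text before writing
  output_file.write(l.to_csv.chomp + ",")
  # flag the row if the same row already appeared earlier in the file
  if lines.take(i).count(l) >= 1
    output_file.write("dupl")
  end
  output_file.write("\n")
end
output_file.close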
l is the current line. take(i) is all lines before, but not including, the current line, and count(l) applied to this counts how often the number appeared before. If it appeared at least once, print a "dupl".
There probably is a more efficient answer to this, this is just a quick and easy to understand version.
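As a sketch of a more efficient variant: instead of rescanning all earlier rows for every row, you can remember the rows seen so far in a Set and check membership in one pass. The helper name flag_duplicates and the in-memory sample rows are assumptions for illustration, not part of the original answer.

```ruby
require 'csv'
require 'set'

# Hypothetical helper: returns each row with one extra field appended,
# "dupl" if an identical row was seen earlier, nil otherwise.
def flag_duplicates(rows)
  seen = Set.new
  rows.map do |row|
    # Set#add? returns nil when the element is already in the set
    flag = seen.add?(row) ? nil : 'dupl'
    row + [flag]
  end
end

# Sample rows taken from the example in the question
rows = [
  ['(202) 221-1323'],
  ['(201) 321-0243'],
  ['(202) 221-1323'],
  ['(310) 343-4923']
]

flagged = flag_duplicates(rows)
flagged.each { |r| puts r.to_csv }
```

With a real file you would pass CSV.read('input_file.csv') as rows and write each flagged row out with CSV.open('output2.csv', 'w'). Note that a header row is treated like any other row here.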

How to count instances of text

I have a list of email addresses in SPSS. I'm trying to write syntax to count how many times each email address appears.
For instance:
In my desired output, if johndoe@aol.com appears in the data 3 times, I want all instances of his email to show a 3 in my new column.
I know I can write syntax to have it count (i.e. johndoe@aol.com will be assigned 1 the first time, then 2, then 3)... but this is not what I want.
Thanks!
Steps to do this:
Sort cases by email.
Get the counts using the Aggregate command.
Use the Identify Duplicate Cases command to generate an indicator of whether a given email is the first of its kind in the file.
Select cases that aren't the first with that particular email.
All four of those commands are in the Data menu in the GUI. Syntax to do the whole thing:
SORT CASES BY Email.
*This will create a new variable N_EMAIL with the counts. It will appear for every case.
AGGREGATE
/OUTFILE=* MODE=ADDVARIABLES
/PRESORTED
/BREAK=Email
/N_EMAIL=N.
*Now we generate a "PrimaryFirst" indicator showing whether a given case is the first instance of its email.
MATCH FILES
/FILE=*
/BY Email
/FIRST=PrimaryFirst
/LAST=PrimaryLast.
DO IF (PrimaryFirst).
COMPUTE MatchSequence=1-PrimaryLast.
ELSE.
COMPUTE MatchSequence=MatchSequence+1.
END IF.
LEAVE MatchSequence.
FORMATS MatchSequence (f7).
COMPUTE InDupGrp=MatchSequence>0.
SORT CASES InDupGrp(D).
MATCH FILES
/FILE=*
/DROP=PrimaryLast InDupGrp MatchSequence.
EXECUTE.
*Filter out duplicate cases.
SELECT IF PrimaryFirst = 1.
EXECUTE.
*Final cleanup.
DELETE VARIABLES PrimaryFirst.
Just run this:
AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /BREAK=EmailAddress /num_instances=N.
A new column will appear in the dataset called num_instances (you can of course select another name) which will have the desired count appear in all instances of each Email address.

How to limit Jenkins API response to last n build IDs

http://xxx/api/xml?&tree=builds[number,description,result,id,actions[parameters[name,value]]]
The above API call returns all the build IDs. Is there a way to limit the results to the last 5 build IDs?
The tree query parameter allows you to explicitly specify and retrieve only the information you are looking for, by using an XPath-ish path expression. The value should be a list of property names to include, with sub-properties inside square braces. Try tree=jobs[name],views[name,jobs[name]] to see just a list of jobs (only giving the name) and views (giving the name and jobs they contain). Note: for array-type properties (such as jobs in this example), the name must be given in the original plural, not in the singular as the element would appear in XML (<job>). This will be more natural for e.g. json?tree=jobs[name] anyway: the JSON writer does not do plural-to-singular mangling because arrays are represented explicitly.
For array-type properties, a range specifier is supported. For example, tree=jobs[name]{0,10} would retrieve the name of the first 10 jobs. The range specifier has the following variants:
{M,N}: From the M-th element (inclusive) to the N-th element (exclusive).
{M,}: From the M-th element (inclusive) to the end.
{,N}: From the first element (inclusive) to the N-th element (exclusive). The same as {0,N}.
{N}: Just retrieve the N-th element. The same as {N,N+1}.
Another way to retrieve more data is to use the depth=N query parameter. This retrieves all the data up to the specified depth. Compare depth=0 and depth=1 and see what the difference is for yourself. Also note that data created by a smaller depth value is always a subset of the data created by a bigger depth value.
Because of the size of the data, the depth parameter should really be only used to explore what data Jenkins can return. Once you identify the data you want to retrieve, you can then come up with the tree parameter to exactly specify the data you need.
I'm on version 1.509.4, which doesn't support the range specifier.
Source: http://ci.citizensnpcs.co/api/
You can create an XML object with the build numbers via XPath and parse it yourself by whatever means you like.
http://xxx/api/xml?xpath=//build/number&wrapper=meep
Creates an xml that looks like:
<meep>
<number>n</number>
<number>n+1</number>
...
<number>m</number>
</meep>
It will be populated with the build numbers n through m that are currently in Jenkins for the job specified in the URL. You can substitute anything for the word "meep"; it becomes the wrapper element of the newly created XML object.
How are you collecting/manipulating the API XML output once you get it? I ask because there is a solution in "How do I select the last N elements with XPath?". I tried some of those XPath manipulations but couldn't get them to work when playing with the URL in my browser; they might work if you are doing something else.
When I get the xml object, I happen to manipulate it via shell scripts.
#!/bin/sh
# NOTE: To get the url to work with curl, you need a valid jenkins user and api token
# Put all build numbers in a variable called build_ids
build_ids="$(curl -sL --user "${_jenkins_api_user}:${_jenkins_api_token}" \
"${_jenkins_url}/job/${_job_name}/api/xml?xpath=//build/number&wrapper=meep" \
| sed -e 's/<[^>]*>/ /g' | sed -e 's/  */ /g')"
# Print the last 5 items with awk
echo "${build_ids}" | awk '{n = 5; for (--n; n >= 0; n--){ printf "%s\t",$(NF-n)} print ""}';
Once you have your xml object you can essentially parse it however you want.
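For instance, here is a minimal Ruby sketch of the same "take the last 5" step. It assumes the wrapped XML has the <number> layout shown earlier; the helper name latest_build_numbers and the sample string are made up for illustration. It picks the highest numbers rather than relying on the order of the elements, on the assumption that build numbers only increase.

```ruby
# Hypothetical helper: pulls every <number> value out of the wrapped XML
# and returns the `count` highest ones, highest first.
def latest_build_numbers(xml, count = 5)
  xml.scan(%r{<number>(\d+)</number>}).flatten.map(&:to_i).max(count)
end

# Made-up sample in the same shape as the "meep" wrapper output
sample = '<meep><number>12</number><number>11</number><number>10</number>' \
         '<number>9</number><number>8</number><number>7</number></meep>'

p latest_build_numbers(sample)
# prints [12, 11, 10, 9, 8]
```

A regular expression is enough here because the wrapper output is flat; for anything nested, a real XML parser would be the safer choice.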
NOTE: I am running Jenkins ver. 2.46.1
Looking at the doco at the raw .../api/ endpoint (on Jenkins 2.60.3) it says
For array-type properties, a range specifier is supported. For
example, tree=jobs[name]{0,10} would retrieve the name of the first 10
jobs. The range specifier has the following variants:
{M,N}: From the M-th element (inclusive) to the N-th element (exclusive).
{M,}: From the M-th element (inclusive) to the end.
{,N}: From the first element (inclusive) to the N-th element (exclusive). The same as {0,N}.
{N}: Just retrieve the N-th element. The same as {N,N+1}.
For the OP's case, you'd append {,5} to the end of the URL to get the first 5 results. Jenkins normally lists builds newest first, so these should be the 5 most recent builds:
http://xxx/api/xml?&tree=builds[number,description,result,id,actions[parameters[name,value]]]{,5}
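If you build this URL from a script, the square brackets and braces may need percent-encoding, depending on the HTTP client. A small Ruby sketch, assuming a made-up job URL and a hypothetical helper name recent_builds_url:

```ruby
require 'uri'

# Hypothetical helper: builds the job API URL with a tree filter that is
# limited to the first `count` builds via the {,N} range specifier.
def recent_builds_url(job_url, count)
  tree = 'builds[number,description,result,id,' \
         "actions[parameters[name,value]]]{,#{count}}"
  uri = URI.join(job_url, 'api/xml')
  # encode_www_form percent-encodes the brackets, braces and commas
  uri.query = URI.encode_www_form(tree: tree)
  uri.to_s
end

# jenkins.example.com and my-job are placeholders, not real endpoints
url = recent_builds_url('http://jenkins.example.com/job/my-job/', 5)
puts url
```

The job URL needs its trailing slash so that URI.join appends api/xml instead of replacing the last path segment.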

spss custom tables crashing when row matches column

I've defined a function for running batches of custom tables:
DEFINE !xtables (myvars=!CMDEND)
CTABLES
/VLABELS VARIABLES=!myvars retailer total DISPLAY=LABEL
/TABLE !myvars [C][COLPCT.COUNT PCT40.0, TOTALS[UCOUNT F40.0]] BY retailer [c] + total [c]
/SLABELS POSITION=ROW
/CRITERIA CILEVEL=95
/CATEGORIES VARIABLES=!myvars ORDER=D KEY=COLPCT.COUNT (!myvars) EMPTY=INCLUDE TOTAL=YES LABEL='Base' POSITION=AFTER
/COMPARETEST TYPE=PROP ALPHA=.05 ADJUST=BONFERRONI ORIGIN=COLUMN INCLUDEMRSETS=YES CATEGORIES=ALLVISIBLE MERGE=YES STYLE=SIMPLE SHOWSIG=NO
!ENDDEFINE.
I can then run a series of commands to run these in one batch.
!XTABLES MYVARS=q1.
!XTABLES MYVARS=q2.
!XTABLES MYVARS=q3.
However, if a table has the same row and column, Custom Tables freezes:
!XTABLES MYVARS=retailer.
The culprit appears to be SLABELS. I hadn't encountered this problem before v24.
I tried replicating a CTABLES spec as close as possible to yours and found that VLABELS does not like the same variable specified twice.
GET FILE="C:\Program Files\IBM\SPSS\Statistics\23\Samples\English\Employee data.sav".
CTABLES /VLABELS VARIABLES=Gender Gender DISPLAY=LABEL
/TABLE Gender[c][COLPCT.COUNT PCT40.0, TOTALS[UCOUNT F40.0]]
BY Gender[c] /SLABELS POSITION=ROW
/CATEGORIES VARIABLES=Gender ORDER=D KEY=COLPCT.COUNT(Gender) .
Which yields an error message:
VLABELS: Text GENDER. The same keyword, option, or subcommand is used more than once.
The macro has a parameter named MYVARS, which suggests that more than one variable can be listed; however, if you do that, it will generate an invalid command. Something else to watch out for. I can see the infinite loop in V24. In V23, an error message is produced.

Given the hexadecimal code of a character, how to convert it to the corresponding character in CL program?

I need to find a particular entry in a journal using a CL program. My approach is to use DSPJRNE to put the journal entries in an output file, then use OPNQRYF to filter out the desired one. The file is uniquely keyed, so my plan is to compare the journal entry data with the key. The problem is that one of the keys is a packed decimal, so in the journal entry it is treated as the hexadecimal codes of characters and displayed as strange symbols. To compare the strings, I need to convert the packed decimal key into the corresponding characters. How can I achieve this in CL? If it is not possible in CL, what about RPG?
To answer your immediate question, the CVTCH MI instruction will convert hex to char but I would not go that route; neither in CL nor RPG. Rather, I would take James' advice with a few additional steps.
DSPJRNE OUTFILE(QTEMP/DSPJRNE)
QRY input file DSPJRNE, output file QRYJRNE, select only JOESD
CRTDUPOBJ PRODUCTION_FILE QTEMP/JRNF DATA(*NO)
CPYF QRYJRNE JRNF FMTOPT(*NOCHK)
This will give you an externally described file with the exact same layout as your production file. You can query that, etc.
If you are pulling journal entries for a specific file you can dump them into an externally described file with a clever use of SQL:
CREATE TABLE QTEMP/QADSPJRN LIKE QSYS/QADSPJRN
ALTER TABLE QTEMP/QADSPJRN DROP COLUMN JOESD
CREATE TABLE QTEMP/DSPJRNE AS (SELECT * FROM QTEMP/QADSPJRN, FILE-LIB/FILE) WITH NO DATA
DSPJRNE ... OUTPUT(*OUTFILE) OUTFILFMT(*TYPE1) OUTFILE(QTEMP/DSPJRNE) ENTDTALEN(*CALC)

Resources