Input File 1 [VB 1504 Bytes]
HEADER REC
2000A.....
REC1....
REC2....
2300....
REC3....
REC4....
.
.
RECN......
2000A
REC1....
REC2....
2300....
REC3....
REC4....
.
.
RECN...
FILE2 [10 Bytes FB]
1234567891
9876544211
I want to copy record where 10 BYtes key in File 2 match with 10 Bytes key present in record starting with 2300. Key position [15:10]
If key matches copy record starting from 2000A till next 2000A record.
Any suggestions ....
you can try DFSORT to do it....see link below https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.3.0/com.ibm.zos.v2r3.icea100/ice2ca_Example_3_-_Create_files_with_matching_and_non-matching_records.htm
What I understood from your question is that you need records from the VB1504 file where the records are starting with 2300 and matching the key from FB10.
You are going to need DFSORT/ICETOOL join operation.
Assuming the given positions; FILE1 being FB10 and FILE2 being VB1504; the JCL along with the SYSIN card goes something like this -
//JOBNAME JOB 'DFSORT JOIN',CLASS=A,MSGCLASS=A,
// NOTIFY=&SYSUID
//*
//SORTJOIN EXEC PGM=SORT
//SYSOUT DD SYSOUT=*
//FILE1 DD DISP=SHR,DSN=FILE1
//FILE2 DD DISP=SHR,DSN=FILE2
//SORTOUT DD DSN=OUTPUT.FILE,
// DISP=(NEW,CATLG,DELETE),UNIT=SYSDA,
// SPACE=(CYL,(10,10),RLSE),DCB=*.FILE2
//SYSIN DD *
JOINKEYS FILE=FILE1,FIELDS=(01,10,CH,A)
JOINKEYS FILE=FILE2,FIELDS=(19,10,CH,A),
INCLUDE=(05,04,ZD,EQ,2300)
REFORMAT FIELDS=(F2:01,1504)
OPTION COPY
/*
This will give you non-duplicate records from the VB1504 file where the records are starting with 2300 and matching the key.
If you want duplicate records, then change OPTION COPY to OPTION EQUALS
Method 1:
Open files.
Process header for 'File 1'.
Load 'File 2' into a 'lookup table'.
For each '2000A group' in 'File 1' until 'end of file'.
Set a 'record counter' to zero.
Load all records into a 'buffer table', until another '2000A' record
or end of file is found, counting the number of records.
Locate the '2300' record in the 'buffer table'. (The location may be saved
while loading the 'buffer table'.)
Search the 'lookup table' for a value matching the value in
the '2300' record.
If a match is found.
Write the 'buffer table' to the output file.
End of for each.
Close files.
Or, as NicC suggests,
Method 2:
Open files.
Process header for 'File 1'.
Load 'File 2' into a 'lookup table'.
For each '2000A group' in 'File 1' until 'end of file'.
Set the 'buffer table' 'record counter' to zero.
Load records into a 'buffer table', counting the number of records,
until the '2300' record is found.
Search the 'lookup table' for a value matching the value in
the '2300' record.
If a match is found.
Write the 'buffer table' to the output file.
Write the '2300' record to the output file.
Copy records from the input file to the output file until another
'2000A' record is found or end of file.
Else.
Skip records from 'File 1' until another '2000A' record is found
or end of file.
End of if.
End of for each.
Close files.
Which method to choose may depend on the number of records to be saved in the 'buffer table'. Method 1 uses two routines: Load 'buffer table' and Write 'buffer table'. Method 2 uses four routines: Load 'buffer table', Write 'buffer table', Copy 'File 1', and Skip 'File 1' records (though Copy 'File 1' could have 'Skip flag' to prevent writing the records). This is not much of a difference.
Related
I am very new to ruby and I want to check for rows with the same number in a csv file.
What I am trying to do is go through the input csv file and copy element from the input file to the output file also adding another column called "duplicate" to the output file, then check if a similar phone is already in the output file while copying data from input to output then if the phone already exist, add "dupl" to the row in the duplicate column.
This is what I have.
file=CSV.read('input_file.csv')
output_file=File.open("output2.csv","w")
for row in file
output_file.write(row)
output_file.write("\n")
end
output_file.close
Example:
Phone
(202) 221-1323
(201) 321-0243
(202) 221-1323
(310) 343-4923
output file
Phone
Duplicate
(202) 221-1323
(201) 321-0243
(202) 221-1323
dupl
(310) 343-4923
So basically you want to write the input to output and append a "dupl" on the second occurrence of a duplicate?
Your input to output seems fine. To get the "dupl" flag, simply count the occurrence of each number in the list. If it's more than one, its a duplicate. But since you only want the flag to be shown on the second occurrence just count how often the number appeared up until that point:
lines = CSV.read('input_file.csv')
lines.each_with_index do |l,i|
output_file.write(l + ",")
if lines.take(i).count(l) >= 1
output_file.write("dupl")
end
output_file.write("\n")
end
l is the current line. take(i) is all lines before but not including the current line and count(l) applied to this counts how often the number appeared before if it's more than one, print a "dupl"
There probably is a more efficient answer to this, this is just a quick and easy to understand version.
I have the following syntax to merge two datasets. I expect that the resulting dataset (test1) contains 5 cases with 4 of them (2 to 5) a value in variable set2.
The result I am getting is dataset test1 with 5 cases but only 1 of them (case with id 5) has a value in variable set2.
Do I need to contact my ICT department, or am I misunderstanding something about merging data in SPSS. I am used to working with SAS, R and SQL, but need to help someone with a data merging within SPSS
INPUT PROGRAM.
LOOP id=1 to 5.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
COMPUTE set1 = RV.NORMAL(1,1).
EXECUTE.
DATASET NAME test1.
INPUT PROGRAM.
LOOP id=2 to 5.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
COMPUTE set2 = RV.NORMAL(1,1).
EXECUTE.
DATASET NAME test2.
DATASET ACTIVATE test1.
STAR JOIN
/SELECT t0.set1, t1.set2
/FROM * AS t0
/JOIN 'test2' AS t1
ON t0.id=t1.id
/OUTFILE FILE=*.
results in:
id set1 set2
1,00 1,74
2,00 1,58
3,00 1,01
4,00 ,12
5,00 2,52 ,79
SPSS version 21
When I run the syntax you provide I get the desired results (and not what you indicate):
If it continues to fail (after contacting SPSS support), try using MATCH FILES:
DATASET ACTIVATE test1.
SORT CASES BY ID.
DATASET ACTIVATE test2.
SORT CASES BY ID.
MATCH FILES FILE=test1 /FILE=test2 /BY ID.
DATASET NAME Restult.
I've defined a function for running batches of custom tables:
DEFINE !xtables (myvars=!CMDEND)
CTABLES
/VLABELS VARIABLES=!myvars retailer total DISPLAY=LABEL
/TABLE !myvars [C][COLPCT.COUNT PCT40.0, TOTALS[UCOUNT F40.0]] BY retailer [c] + total [c]
/SLABELS POSITION=ROW
/CRITERIA CILEVEL=95
/CATEGORIES VARIABLES=!myvars ORDER=D KEY=COLPCT.COUNT (!myvars) EMPTY=INCLUDE TOTAL=YES LABEL='Base' POSITION=AFTER
/COMPARETEST TYPE=PROP ALPHA=.05 ADJUST=BONFERRONI ORIGIN=COLUMN INCLUDEMRSETS=YES CATEGORIES=ALLVISIBLE MERGE=YES STYLE=SIMPLE SHOWSIG=NO
!ENDDEFINE.
I can then run a series for commands to run these in one batch.
!XTABLES MYVARS=q1.
!XTABLES MYVARS=q2.
!XTABLES MYVARS=q3.
However, if a table has the same row and column, Custom Tables freezes:
!XTABLES MYVARS=retailer.
The culprit appears to be SLABELS. I hadn't encountered this problem before v24.
I tried replicating a CTABLES spec as close as possible to yours and found that VLABELSdoes not like the same variable specified twice.
GET FILE="C:\Program Files\IBM\SPSS\Statistics\23\Samples\English\Employee data.sav".
CTABLES /VLABELS VARIABLES=Gender Gender DISPLAY=LABEL
/TABLE Gender[c][COLPCT.COUNT PCT40.0, TOTALS[UCOUNT F40.0]]
BY Gender[c] /SLABELS POSITION=ROW
/CATEGORIES VARIABLES=Gender ORDER=D KEY=COLPCT.COUNT(Gender) .
Which yields an error message:
VLABELS: Text GENDER. The same keyword, option, or subcommand is used more than once.
The macro has a parmeter named MYVARS, which suggests that more than one variable can be listed, however, if you do that, it will generate an invalid command. Something else to watch out for. I can see the infinite loop in V24. In V23, an error message is produced.
I want to export several spss custom tables to excel. I want to export just the tables and exclude the syntax. I tried to select all and exclude if, but I am still getting all of the output.
You can export the output with the OMS command. Within this command you can specify which output elements you want to export.
If you want to export just the custom tables, you can run the following command.
OMS /SELECT TABLES
/IF SUBTYPES = 'Custom Table'
/DESTINATION FORMAT = XLSX
OUTFILE = '/mydir/myfile.xlsx'.
... Some CTABLES Commands ...
OMSEND.
Every custom table (generated from CTABLES commands) between OMS and OMSEND will be exported to a single .xlsx file specified by the outfile option.
See the SPSS Command Syntax Reference for more information on the OMS command.
Here is an complete example of Output Management System (OMS) in xlsx with Ctable using SPSS Syntax. Here I have run custom table between Month and A1A variables. I have used VIEWER=NO is OMS Syntax which does not display CTables in SPSS output window but create xlsx output with desired tables.
OMS
/SELECT TABLES
/IF COMMANDS=['CTables'] SUBTYPES=['Custom Table']
/DESTINATION FORMAT=XLSX
OUTFILE ='...\Custom Tables.xlsx'
VIEWER=NO.
CTABLES
/VLABELS VARIABLES=A1A MONTH DISPLAY=LABEL
/TABLE A1A [C] BY MONTH [C][COLPCT.COUNT PCT40.1]
/CATEGORIES VARIABLES=A1A MONTH ORDER=A KEY=VALUE EMPTY=INCLUDE
/SLABELS VISIBLE=NO
/TITLES
TITLE='[UnAided Brand Awareness] A1A TOM.'
CAPTION= ')DATE)TIME'.
OMSEND.
Try something like this, for which you will need the SPSSINC MODIFY OUTPUT extension:
get file="C:\Program Files\IBM\SPSS\Statistics\23\Samples\English\Employee data.sav".
/* Swich printback on to demo how to exclude printback in export */.
set printback on.
ctables /table jobcat[c] /titles title="Table: Job cat".
ctables /table gender[c] /titles title="Table: Gender".
spssinc modify output logs charts headings notes page texts warnings trees model /if process=all /visibility visible=false.
/* Exclude the Custom Table titles */.
spssinc modify output titles /if itemtitle="Custom Tables" process=all /visibility visible=false.
output export
/contents export=visible layers=visible modelviews=printsetting
/xlsx documentfile="C:/Temp/Test.xlsx"
operation=createfile sheet='CTables'
location=lastcolumn notescaptions=yes.
These are good answers, but I wanted to get the simple solution on the record:
Unless there's some reason you need a script (e.g. for automated processes), you can copy and paste the tables straight into excel.
In the output window, right-click on the table, select "copy", and it will paste into Excel without issue.
Another solution is to use some .sps script written by a smart guy named Reynolds, located here:
http://www.spsstools.net/en/scripts/577/
Simply download this as .sps on right hand side of screen and save it out into your SPSS folder. At the end of your ctables syntax you will write this simple 1 line syntax that calls this file and will do all the work for you.
script 'N:\WEB\SPSS19\FILENAME.sps'.
It loops through the output window, deletes all syntax/titles and keeps the ctables right before your eyes. It works very well, saves me lots of time at work.
I have a CSV file like:
Header: 1,2,3,4
content: a,b,c,d,e
a,b,c,d
a,b
a,b,c,d,d
Is there any CSV method that I can use to easily validate the column consistency instead of
parsing the CSV line by line?
One way or another the whole file has to be read.
Here is a relative simple way. First the file is read and converted to an array which is then mapped to another array based on length (number of fields per row). This array is the checked if the length is always the same.
If you'd hate to read the file twice you could remember the length of the header and while you parse the file check each record if it has the same number of fields and otherwise trow an exeption.
require "csv"
def valid? file
a = CSV.read(file).map { |e|e.length }
a.min == a.max
end
p valid?("data.csv")
csv_validator gem would be helpful here.