Compare PS file variables(1-8) and PDS members(1-8) and create PDS members file for all matching PDS names

I have a requirement to match 2 files (the first is a PS file and the second is a PDS).
Bytes 1-8 of each PS file record are member names:
Example:
ABCD1234
DDFF2345
QWER3456
The PDS has 100 members, and the 3 members given above are among them.
Example (part of the PDS member list):
AAAA1234
ABCD1234
DDFF2345
QWER3456
SSSS2222
HHHH1212
My requirement is to create an output PDS containing those 3 matching members (copied as-is from the input PDS).
Resulting PDS members:
ABCD1234
DDFF2345
QWER3456
If anyone can guide me or give me an idea, it would be very helpful.
Regards
Harry

Write a step to obtain the list of members in the PDS. There are various ways to do this: in-house utilities, third-party utilities, or your own CLIST or REXX exec wrapping the TSO LISTDS command. Use whatever method is commonly in use in your shop. If you don't know what's commonly in use in your shop, ask your coworkers. Save the output in a file.
Write a step to execute your shop's SORT utility to match-merge the list of members from the first step with the list in the PS file. Both DFSORT and Syncsort can do this; see the JOINKEYS statement in the documentation. Save the list of matching members in a file.
Here's the tricky part: write a step that reads the list of members from the match-merge step and writes IEBCOPY control statements to copy those members. If you want to get fancy, you could combine this step with the previous one.
Write a step that runs IEBCOPY with the control statements written by the previous step.
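The match-merge in the second step keeps only the PS records whose member name (bytes 1-8) also appears in the member list from the first step. The following Python sketch models that logic outside z/OS, purely to make it concrete; it is not SORT control syntax. It assumes both lists are already available as Python strings, and the function name match_members is hypothetical.

```python
def match_members(ps_records, pds_members):
    """Return member names from bytes 1-8 of ps_records that also exist in pds_members.

    Models an inner join keyed on positions 1-8 of the PS records against the
    PDS member list; the order of the PS file is kept.
    """
    # Names in the PS file may be blank-padded to 8 bytes, so strip both sides.
    available = {name.strip() for name in pds_members}
    matched = []
    seen = set()
    for record in ps_records:
        key = record[:8].strip()
        # Skip blank keys and duplicates so each member is selected only once.
        if key and key in available and key not in seen:
            seen.add(key)
            matched.append(key)
    return matched


ps_file = ["ABCD1234", "DDFF2345", "QWER3456"]
pds = ["AAAA1234", "ABCD1234", "DDFF2345", "QWER3456", "SSSS2222", "HHHH1212"]
print(match_members(ps_file, pds))  # ['ABCD1234', 'DDFF2345', 'QWER3456']
```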

If you have access to USS, you can do what you want by following this example.
The example uses the following dataset names; replace them with the actual dataset names in your case:
PS: hlq.MEMBER.LIST <--- the list of members you want
PDS: hlq.PDS.DATASET <--- the PDS with the members you want to copy
PDS: hlq.NEWPDS.DATASET <--- the PDS to which you want to copy the members
Write the list of members of hlq.PDS.DATASET to a USS file, pds.members:
> tsocmd "listds 'hlq.PDS.DATASET' MEMBERS" >pds.members
Copy each member listed in hlq.MEMBER.LIST that is in hlq.PDS.DATASET into hlq.NEWPDS.DATASET (the grep pattern is anchored to the whole line, so a name such as ABCD does not also match ABCD1234):
> for fn in $(cat "//'hlq.MEMBER.LIST'"); do grep -q "^ *$fn *\$" pds.members && cp "//'hlq.PDS.DATASET($fn)'" "//'hlq.NEWPDS.DATASET'"; done
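The loop above only needs an exact yes/no answer for each name. If you would rather post-process pds.members in a script, the Python sketch below collects member names from LISTDS output, assuming (check this against your own output) that the names follow a --MEMBERS-- header line, one per line, possibly followed by alias information. The sample listing is made up for illustration, and the function names are hypothetical.

```python
def listds_members(listing):
    """Collect member names from text assumed to be LISTDS ... MEMBERS output."""
    names = set()
    in_members = False
    for line in listing.splitlines():
        text = line.strip()
        if text == "--MEMBERS--":
            in_members = True  # assumed header line; member names follow it
        elif in_members and text:
            names.add(text.split()[0])  # first word only; ignore any trailing alias info
    return names


def members_to_copy(wanted, listing):
    """Names from wanted that appear as whole member names (not substrings) in listing."""
    available = listds_members(listing)
    return [name for name in wanted if name in available]


# Made-up listing in the assumed format.
sample = """HLQ.PDS.DATASET
--MEMBERS--
  AAAA1234
  ABCD1234
  DDFF2345
"""
print(members_to_copy(["ABCD", "ABCD1234", "DDFF2345"], sample))  # ['ABCD1234', 'DDFF2345']
```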

Write an edit macro to turn that list of members into IEBCOPY control statements to copy the members:
 COPY INDD=INLIB,OUTDD=OUTLIB
 SELECT MEMBER=(ABCD1234)
 SELECT MEMBER=(DDFF2345)
 SELECT MEMBER=(QWER3456)
and then use this edited data as input to an IEBCOPY job to copy those members to a new PDS:
//COPY EXEC PGM=IEBCOPY
//*
//INLIB DD DISP=SHR,DSN=<old.pds>
//OUTLIB DD DISP=SHR,DSN=<new.pds>
//*
//SYSUT3 DD UNIT=SYSDA,SPACE=(CYL,(50,50))
//SYSUT4 DD UNIT=SYSDA,SPACE=(CYL,(50,50))
//SYSPRINT DD SYSOUT=*
//SYSIN DD DISP=SHR,DSN=<new.control.statements>
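If an edit macro is not convenient, the same control statements can be generated by a script. This Python sketch builds them from a list of member names. The INLIB/OUTLIB DD names match the JCL above. Two details are assumptions to check against your site's conventions: each record is padded to 80 bytes (for a fixed-length 80-byte SYSIN dataset), and each statement starts after column 1, which is conventionally left for an optional label. The function name is hypothetical.

```python
def iebcopy_sysin(members, indd="INLIB", outdd="OUTLIB", lrecl=80):
    """Return IEBCOPY control statements, blank-padded to lrecl bytes, that copy members."""
    # Statements begin in column 2; column 1 is left for an optional label.
    statements = [f" COPY INDD={indd},OUTDD={outdd}"]
    statements += [f" SELECT MEMBER=({name})" for name in members]
    return [s.ljust(lrecl) for s in statements]


for record in iebcopy_sysin(["ABCD1234", "DDFF2345", "QWER3456"]):
    print(record.rstrip())
```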

Related

how to print all matches using grep

I have a text file (location.txt) that contains only location information.
Another large text file (all.txt) has a lot more information, such as an id, and location.txt is a subset of all.txt (some records are common to both).
I want to use grep to search all.txt for the entries in location.txt
and print all common records (with the full information from all.txt).
I tried:
grep -f location.txt all.txt
The problem is that grep only gives me the last location, not all locations.
How can I print all locations?

I'm assuming you mean to use one of the files as a set of patterns for grep. If so, you seem to be looking for a way to print all lines in one file that are not found in the other, and this is what you want:
grep -vFf file_with_patterns other_file
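Explanation
-F means to interpret the pattern(s) literally, giving no special meaning to regex metacharacters (like * and +, for example).
-f means read the patterns from the file named as its argument (file_with_patterns in this case).
-v inverts the match, so only lines that match none of the patterns are printed; drop it (grep -Ff file_with_patterns other_file) to print the common lines instead.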
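To make the flag semantics concrete, here is a small Python sketch that mimics grep -Ff (and grep -vFf with invert=True): a line is kept if it contains any pattern as a literal substring. It is an illustration, not a replacement for grep on large files. Trailing newlines and carriage returns are stripped from the patterns, and empty patterns are skipped, which differs from grep, where an empty pattern matches every line. The function name and sample data are hypothetical.

```python
def grep_fixed(patterns, lines, invert=False):
    """Mimic grep -Ff (or grep -vFf when invert is True) on in-memory lines."""
    # Strip line endings, including a '\r' left by Windows-style files.
    pats = [p.rstrip("\r\n") for p in patterns]
    pats = [p for p in pats if p]  # unlike grep, ignore empty patterns
    result = []
    for line in lines:
        matched = any(p in line for p in pats)  # literal substring test, like -F
        if matched != invert:
            result.append(line)
    return result


locations = ["Paris\n", "Oslo\r\n"]
records = ["1,Paris", "2,Rome", "3,Oslo"]
print(grep_fixed(locations, records))               # ['1,Paris', '3,Oslo']
print(grep_fixed(locations, records, invert=True))  # ['2,Rome']
```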

Systematically renaming variables in SPSS (without Python)

I'm looking for a solution to rename variables in SPSS. I can't use Python because of software restrictions at my workplace.
The goal is to rename variables into "oldname_new".
I tried "do repeat" like this, but it can't be combined with the rename function.
do repeat x= var1 to var100.
rename var (x=concat("x","_new")).
end repeat print.
exe.
Also, it seems that even without do repeat, the rename command doesn't accept concat and similar functions. Is that correct?
So, is there any solution for this in SPSS?

As you found out, you can't use rename within a do repeat loop.
An SPSS macro can do this:
define DoNewnames ()
rename vars
!do !v=1 !to 100 !concat("var", !v, " = var", !v, "_new") !doend .
!enddefine.
* now the macro is defined, we can run it.
DoNewnames .
EDIT:
The code above works for a set of variables with systematic names. If the names are not systematic, you will need a different macro:
define DoNewnames (varlist=!cmdend)
rename vars
!do !v !in(!varlist) !concat(!v, " = ", !v, "_new") !doend .
!enddefine.
* Now in this case you need to feed the variable list into the macro.
DoNewnames varlist = age sex thisvar thatvar othervar.
If you want to see the syntax generated by the macro (like you did with end repeat print) you can run this before running the macro:
set mprint on.
EDIT 2:
As the OP points out, the last macro requires naming every variable to be renamed, which is a hassle if there are many. The next code gets them all automatically, without naming them individually. The process, as described in @petit_dejeuner's comment, creates a new dataset that contains each original variable as an observation, with the original variable name as a value (that is, metadata about the variables, like a codebook). This way, you can turn each variable name into renaming syntax.
dataset name orig.
DATASET DECLARE varnames.
OMS /SELECT TABLES /IF COMMANDS=['File Information'] SUBTYPES=['Variable Information']
/DESTINATION FORMAT=SAV OUTFILE='varnames' VIEWER=NO.
display dictionary.
omsend.
dataset activate varnames.
* a200 leaves room for long variable names; a50 can truncate the generated command.
string cmd (a200).
compute cmd=concat("rename vars ", rtrim(var1), " = ", rtrim(var1), "_new .").
* Before creating the rename syntax in the following line, this is your chance to remove variables you do not wish to rename from the list (using "select if" etc. on VAR1).
write out="my rename syntax.sps" /cmd.
dataset activate orig.
insert file="my rename syntax.sps" .
A couple of notes:
Before writing to (and inserting from) "my rename syntax.sps", you may need to add a writable path to the file name.
This code will rename ALL the variables in the dataset. If you want to skip some of them, filter them out of the variable list before writing to "my rename syntax.sps" (see where I point this out in the code).
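To show exactly what ends up in "my rename syntax.sps", the following Python sketch reproduces the concat expression from the compute line and the optional filtering step. It only illustrates the generated text (the question rules out running Python inside SPSS); the function and parameter names are hypothetical.

```python
def rename_syntax(var_names, suffix="_new", skip=()):
    """Build one 'rename vars' command per variable, as the compute/write-out steps do."""
    skip = set(skip)
    commands = []
    for name in var_names:
        name = name.rstrip()  # same role as rtrim(var1)
        if name in skip:      # same role as removing rows with select if
            continue
        commands.append(f"rename vars {name} = {name}{suffix} .")
    return commands


for cmd in rename_syntax(["age", "sex", "thisvar"], skip=["sex"]):
    print(cmd)
# rename vars age = age_new .
# rename vars thisvar = thisvar_new .
```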

Grepping twice using the result of the first grep in a large file

I am given a list of IDs, which I need to trace back to a name in a file.
The file ID contains:
1
2
3
4
5
6
The IDs are contained in a large 2 GB file called result.txt:
ABC=John,dhds,72828,73737,3939,92929
CDE=John,uubad,32424,ajdaio,343533
FG1=Peter,iasisaio,097282,iosoido
WER=Ann,97391279,89719379,7391739
result,**id=1**,iuhdihdio,ihwoihdoih,iuqhwiuh,ABC
result2,**id=2**,9729179,hdqihi,hidqi,82828,CDE
result3,**id=3**,biasi,8u9829,90u209w,jswjso,FG1
So I cat the ID file into a variable.
I then loop over this variable, using grep and cut -d on result.txt to extract the code that links each ID back to a name, and store the output in a variable,
so the variable contains ABC CDE FG1.
In the same loop, I pass the output of that grep to another grep on result.txt to get the name,
i.e. it greps the file for ABC, CDE and FG1.
I do get the answer, but it takes a long time. Is there a more efficient way?
Thanks

Making some assumptions about your requirement: IDs that are not found in the big file will not be shown in the output, and the desired output is in the format shown below.
Here are mock input files, f1 for the IDs and f2 for the large file:
[mathguy@localhost test]$ cat f1
1
2
3
4
5
6
[mathguy@localhost test]$ cat f2
ABC=John,dhds,72828,73737,3939,92929
CDE=John,uubad,32424,ajdaio,343533
FG1=Peter,iasisaio,097282,iosoido
WER=Ann,97391279,89719379,7391739
result,**id=1**,iuhdihdio,ihwoihdoih,iuqhwiuh,ABC
result2,**id=2**,9729179,hdqihi,hidqi,82828,CDE
result3,**id=3**,biasi,8u9829,90u209w,jswjso,FG1
Proposed solution and output:
[mathguy@localhost test]$ sed 's/.*/\*\*id=&\*\*/' f1 | grep -Ff - f2 | \
> sed -E 's/^.*\*\*id=([[:digit:]]*)\*\*.*,([^,]*)$/\1 \2/'
1 ABC
2 CDE
3 FG1
The hard work here is done by grep -F, which might be fast enough for your needs. There is some prep work and some clean-up work done by sed, but those are both on small datasets.
First, we take the IDs from the input file and output strings in the format **id=<number>**. That output is given to grep -F as the fixed-string patterns via the option -f (take the patterns from a file; here the file is stdin, given as -, that is, the output of sed).
After we find the needed lines in the big file, the final sed just extracts the ID and the code (the last field) from each line.
Note: this assumes that each ID is found only once in the big file. (Actually, the command works regardless; but if there are duplicate lines for an ID, your business users will have to tell you how to handle them. What if you get contradictory names for the same ID? Etc.)
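The pipeline above stops at the ID and its code (ABC, CDE, FG1), while the question also wants the name behind each code. As a sketch of doing both lookups in a single pass over the big file, the following Python assumes the layout seen in the sample: result lines contain **id=N** and end with the code, and name lines look like CODE=Name,.... The function name is hypothetical. The mapping from codes to names is kept in memory, which should be fine as long as the number of name lines is modest.

```python
import re

# Assumed layout, taken from the sample: result lines carry "**id=N**" and end
# with a code; name lines look like "CODE=Name,...".
ID_PATTERN = re.compile(r"\*\*id=(\d+)\*\*")


def trace_names(ids, lines):
    """Single pass: map each wanted id to its code, and each code to its name."""
    wanted = set(ids)
    id_to_code = {}
    code_to_name = {}
    for line in lines:
        line = line.rstrip("\n")
        match = ID_PATTERN.search(line)
        if match:
            if match.group(1) in wanted:
                id_to_code[match.group(1)] = line.rsplit(",", 1)[-1]
        elif "=" in line:
            code, rest = line.split("=", 1)
            code_to_name[code] = rest.split(",", 1)[0]
    # Keep the order of the id list; ids missing from the file are skipped.
    return [(i, id_to_code[i], code_to_name.get(id_to_code[i]))
            for i in ids if i in id_to_code]


# Usage with the real files (streams result.txt line by line):
# with open("ID") as f_ids, open("result.txt") as f_big:
#     for row in trace_names([s.strip() for s in f_ids], f_big):
#         print(*row)
```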

How to use a whitelist with grep

I have a list of email addresses. I also have a list of common first and last names. I want to filter the email list against the names list, so that only emails containing a common first and/or last name are printed to the output file.
So, I tried:
cat file | egrep -e -i < whitelist | tee emails_with_common_first_and_last_names.txt
At first, this seemed like it was working. Then, after examining the output, it did not seem to do anything.
wc -l input output
This revealed that nothing was filtered.
So, how else can I do this or what am I doing incorrectly?
Here is a sample of the file that I would like filtered:
aynz@falskdf.com
8zlkhsdf0@fmail.com
afjsg@domain.com
Here is a sample of the whitelist that I would like to use as a reference to filter the file:
ALEX
johnson
WINTERS
miles
christina
tonya
jackson
schmidt
jake
So, if an email contains any of these names, grep (or whatever tool) needs to print it to an output file.
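For reference, the usual grep form of this filter is grep -iFf whitelist file > output: -i ignores case, -F treats each whitelist line as a literal string, and -f reads those strings from the whitelist file. The same logic as a Python sketch (the function name and the sample addresses are hypothetical):

```python
def filter_emails(emails, names):
    """Keep emails containing any whitelist name, ignoring case (like grep -iFf)."""
    needles = [n.strip().lower() for n in names if n.strip()]
    return [e for e in emails if any(n in e.lower() for n in needles)]


emails = ["jake.r@example.com", "xyz@example.com", "milestone@example.com"]
print(filter_emails(emails, ["ALEX", "jake", "miles"]))
# ['jake.r@example.com', 'milestone@example.com']
```

This is substring matching, so miles also matches milestone; a stricter whole-word rule would need extra logic, for example splitting the part before @ on . and _.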

Addressing a specific occurrence of a character in sed

How do I remove or address a specific occurrence of a character in sed?
I'm editing a CSV file and I want to remove all text between the third and the fifth occurrence of the comma (that is, dropping fields four and five). Is there any way to achieve this using sed?
E.g:
% cat myfile
one,two,three,dropthis,dropthat,six,...
% sed -i 's/someregex//' myfile
% cat myfile
one,two,three,,six,...

If it's okay to use the cut command instead (note that it removes fields 4 and 5 entirely rather than leaving an empty field):
$ cut -d, -f1-3,6- file

awk, or any other tool that can split strings on delimiters, is better for the job than sed. These examples drop fields 4 and 5:
$ cat file
1,2,3,4,5,6,7,8,9,10
Ruby (1.9+):
$ ruby -ne 's=$_.split(","); s[3,2]=nil; puts s.compact.join(",")' file
1,2,3,6,7,8,9,10
Using awk (note that the gsub also collapses any other empty fields in the line):
$ awk 'BEGIN{FS=OFS=","}{$4=$5=""}{gsub(/,,*/,",")}1' file
1,2,3,6,7,8,9,10
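The question's sample output (one,two,three,,six,...) keeps one empty field where fields four and five were, whereas the commands above remove the fields entirely. A minimal Python sketch that reproduces the expected output, assuming simple lines with no quoted commas (the csv module answer below handles quoting); the function name is hypothetical:

```python
def blank_fields_4_and_5(line):
    """Replace fields 4 and 5 of a comma-separated line with one empty field."""
    fields = line.split(",")
    if len(fields) < 5:
        return line  # assumption: leave lines without both fields unchanged
    return ",".join(fields[:3] + [""] + fields[5:])


print(blank_fields_4_and_5("one,two,three,dropthis,dropthat,six,..."))
# one,two,three,,six,...
```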

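A real parser in action (csvdrop.py):
#!/usr/bin/env python3
import csv

# Python 3: open CSV files in text mode with newline='' (not 'rb'/'wb').
with open('my-data.csv', newline='') as fin, \
        open('stripped-data.csv', 'w', newline='') as fout:
    cr = csv.reader(fin)
    cw = csv.writer(fout)
    for row in cr:
        # Keep fields 1-3 and 6 onward, i.e. drop fields 4 and 5.
        cw.writerow(row[0:3] + row[5:])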
But do note the preface to the csv module:
The so-called CSV (Comma Separated Values) format is the most common import and export format for spreadsheets and databases. There is no “CSV standard”, so the format is operationally defined by the many applications which read and write it. The lack of a standard means that subtle differences often exist in the data produced and consumed by different applications. These differences can make it annoying to process CSV files from multiple sources. Still, while the delimiters and quoting characters vary, the overall format is similar enough that it is possible to write a single module which can efficiently manipulate such data, hiding the details of reading and writing the data from the programmer.
$ cat my-data.csv
1
1,2
1,2,3
1,2,3,4,
1,2,3,4,5
1,2,3,4,5,6
1,2,3,4,5,6,
1,2,,4,5,6
1,2,"3,3",4,5,6
1,"2,2",3,4,5,6
,,3,4,5
,,,4,5
,,,,5
$ python csvdrop.py
$ cat stripped-data.csv
1
1,2
1,2,3
1,2,3
1,2,3
1,2,3,6
1,2,3,6,
1,2,,6
1,2,"3,3",6
1,"2,2",3,6
,,3
,,
,,
