I have a text file (location.txt) that contains only location information.
Another, larger text file (all.txt) contains more information per record, such as an id, and location.txt is a subset of all.txt (some records are common to both).
I want to search all.txt for the entries in location.txt with grep
and print all common records (with the full information from all.txt).
I tried:
grep -f location.txt all.txt
The problem is that grep only gives me the last location, not all of them.
How can I print all locations?
I'm assuming you mean to use one of the files as a set of patterns for grep. If so, and you are looking for a way to print all lines in one file that are not found in the other, this is what you want:
grep -vFf file_with_patterns other_file
Explanation
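-F means to interpret the pattern(s) literally, giving no particular meaning to regex metacharacters (like * and +, for example).
-f means read the patterns, one per line, from the file named as argument (file_with_patterns in this case).
-v inverts the match, so only lines that match none of the patterns are printed. Omit -v to print the lines that the two files have in common instead.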
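To make the semantics of these options concrete, here is a minimal Python sketch of what literal (fixed-string) matching against a pattern list does, with and without inversion. The function name grep_fixed and the sample records are made up for illustration; this is not how grep is implemented, only a model of its output.

```python
def grep_fixed(patterns, lines, invert=False):
    """Model of grep -Ff (or grep -vFf when invert=True) on in-memory lists."""
    result = []
    for line in lines:
        # -F: each pattern is a literal substring, not a regular expression.
        matched = any(p in line for p in patterns)
        # -v: keep the line only when the match result is the opposite.
        if matched != invert:
            result.append(line)
    return result


# Hypothetical sample data; "a.c" is literal, so the dot is not a wildcard.
patterns = ["Paris", "a.c"]
lines = ["1,Paris,FR", "2,Oslo,NO", "3,abc,XX", "4,a.c,YY"]

common = grep_fixed(patterns, lines)                 # like grep -Ff
missing = grep_fixed(patterns, lines, invert=True)   # like grep -vFf
print(common)
print(missing)
```

Note that "3,abc,XX" is not matched by the pattern "a.c", which is the difference -F makes compared with regex matching.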
I'm looking for a solution to rename variables in SPSS. I can't use Python because of software restrictions at my workplace.
The goal is to rename variables into "oldname_new".
I tried "do repeat" like this, but it can't be combined with the rename function.
do repeat x= var1 to var100.
rename var (x=concat("x","_new")).
end repeat print.
exe.
Also, I figured that even without the do repeat, the rename command doesn't allow concat and similar commands? Is that correct?
So, is there any solution for this in SPSS?
As you found out, you can't use rename within a do repeat loop.
An SPSS macro can do this:
define DoNewnames ()
rename vars
!do !v=1 !to 100 !concat("var", !v, " = var", !v, "_new") !doend .
!enddefine.
* now the macro is defined, we can run it.
DoNewnames .
EDIT:
The code above is good for a set of variables with systematic names. In case the names are not systematic, you will need a different macro:
define DoNewnames (varlist=!cmdend)
rename vars
!do !v !in(!varlist) !concat(!v, " = ", !v, "_new") !doend .
!enddefine.
* Now in this case you need to feed the variable list into the macro.
DoNewnames varlist = age sex thisvar thatvar othervar.
If you want to see the syntax generated by the macro (like you did with end repeat print) you can run this before running the macro:
set mprint on.
EDIT 2:
As the OP says, the last macro requires naming all the variables to be renamed, which is a hassle if there are many. The following code collects them all automatically without naming them individually. The process, as described in @petit_dejeuner's comment, creates a new dataset that contains each original variable as an observation and the original variable name as a value (that is, meta-information about the variables, like a codebook). This way, you can turn each variable name into rename syntax.
dataset name orig.
DATASET DECLARE varnames.
OMS /SELECT TABLES /IF COMMANDS=['File Information'] SUBTYPES=['Variable Information']
/DESTINATION FORMAT=SAV OUTFILE='varnames' VIEWER=NO.
display dictionary.
omsend.
dataset activate varnames.
string cmd (a50).
compute cmd=concat("rename vars ", rtrim(var1), " = ", rtrim(var1), "_new .").
* Before creating the rename syntax in the following line, this is your chance to remove variables from the list which you do not wish to rename (using "select if" etc. on VAR1).
write out="my rename syntax.sps" /cmd.
dataset activate orig.
insert file="my rename syntax.sps" .
A couple of notes:
Before writing to (and inserting from) "my rename syntax.sps" you may need to add a writable path in the file name.
This code will rename ALL the variables in the dataset. If you want to exclude some of the variables, filter them out of the variable list before writing out to "my rename syntax.sps" (see where I point this out in the code).
I am given a list of IDs that I need to trace back to a name in a file.
The file ID contains:
1
2
3
4
5
6
The IDs are contained in a large 2 GB file called result.txt:
ABC=John,dhds,72828,73737,3939,92929
CDE=John,uubad,32424,ajdaio,343533
FG1=Peter,iasisaio,097282,iosoido
WER=Ann,97391279,89719379,7391739
result,**id=1**,iuhdihdio,ihwoihdoih,iuqhwiuh,ABC
result2,**id=2**,9729179,hdqihi,hidqi,82828,CDE
result3,**id=3**,biasi,8u9829,90u209w,jswjso,FG1
So I cat the ID file into a variable.
I then loop over this variable, using grep and cut -d on result.txt to pull out the key that links back to the name, and store the output in another variable,
so that variable contains ABC CDE FG1.
In the same loop I pass the output of that grep to a second grep on result.txt to get the name,
i.e. I grep the file again for ABC, CDE and FG1.
I do get the answer, but it takes a long time. Is there a more efficient way?
Thanks
Making some assumptions about your requirement: IDs that are not found in the big file will not be shown in the output, and the desired output is in the format shown below.
Here are mock input files: f1 for the IDs and f2 for the large file.
[mathguy@localhost test]$ cat f1
1
2
3
4
5
6
[mathguy@localhost test]$ cat f2
ABC=John,dhds,72828,73737,3939,92929
CDE=John,uubad,32424,ajdaio,343533
FG1=Peter,iasisaio,097282,iosoido
WER=Ann,97391279,89719379,7391739
result,**id=1**,iuhdihdio,ihwoihdoih,iuqhwiuh,ABC
result2,**id=2**,9729179,hdqihi,hidqi,82828,CDE
result3,**id=3**,biasi,8u9829,90u209w,jswjso,FG1
Proposed solution and output:
[mathguy@localhost test]$ sed 's/.*/\*\*id=&\*\*/' f1 | grep -Ff - f2 | \
> sed -E 's/^.*\*\*id=([[:digit:]]*)\*\*.*,([^,]*)$/\1 \2/'
1 ABC
2 CDE
3 FG1
The hard work here is done by grep -F which might be just fast enough for your needs. There is some prep work and some clean-up work done by sed, but those are both on small datasets.
First we take the id's from the input file and we output strings in the format **id=<number>**. The output is presented as the fixed-character patterns to grep -F via the option -f (take the patterns from file, in this case from stdin, invoked as -; that is, from the output of sed).
After we find the needed lines from the big file, the final sed just extracts the id and the last field (the key, such as ABC) from each line.
Note: this assumes that each id is only found once in the big file. (Actually the command will work regardless; but if there are duplicate lines for an id, your business users will have to tell you how to handle them. What if you get contradictory names for the same id? Etc.)
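The pipeline above stops at the key (ABC, CDE, FG1). If the actual name (John, Peter) is also needed, one option is to do both lookups in a single pass over the big file. The sketch below is only an illustration under some assumptions: the function name names_for_ids is made up, lines of the form KEY=Name,... carry the name between "=" and the first comma, result lines carry the key as their last field, and the key-to-name table fits in memory.

```python
import re

# Matches the literal marker **id=<digits>** used in the result lines.
ID_RE = re.compile(r"\*\*id=(\d+)\*\*")


def names_for_ids(wanted_ids, lines):
    """Map each wanted id found in `lines` to a (key, name) pair."""
    wanted = set(wanted_ids)
    key_to_name = {}   # e.g. "ABC" -> "John"
    id_to_key = {}     # e.g. "1" -> "ABC"
    for line in lines:
        line = line.rstrip("\n")
        m = ID_RE.search(line)
        if m:
            # Result line: the key is assumed to be the last comma field.
            if m.group(1) in wanted:
                id_to_key[m.group(1)] = line.rsplit(",", 1)[-1]
        elif "=" in line:
            # Name line: KEY=Name,other,fields
            key, rest = line.split("=", 1)
            key_to_name[key] = rest.split(",", 1)[0]
    # Join after the pass, so the order of line types does not matter.
    return {i: (k, key_to_name.get(k)) for i, k in id_to_key.items()}


sample = [
    "ABC=John,dhds,72828,73737,3939,92929",
    "CDE=John,uubad,32424,ajdaio,343533",
    "FG1=Peter,iasisaio,097282,iosoido",
    "WER=Ann,97391279,89719379,7391739",
    "result,**id=1**,iuhdihdio,ihwoihdoih,iuqhwiuh,ABC",
    "result2,**id=2**,9729179,hdqihi,hidqi,82828,CDE",
    "result3,**id=3**,biasi,8u9829,90u209w,jswjso,FG1",
]
found = names_for_ids(["1", "2", "3", "4", "5", "6"], sample)
for id_, (key, name) in found.items():
    print(id_, key, name)
```

For the real file, an open file object can be passed as `lines`, so the 2 GB file is streamed rather than loaded at once. IDs 4 to 6 are simply absent from the result, matching the assumption stated at the top of this answer.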
I have a list of email addresses. I also have a list of common first and last names. I want to filter the email list against the one with common first and last names, thus only printing emails with either a common first and / or last name in the output file.
So, I tried:
cat file | egrep -e -i < whitelist | tee emails_with_common_first_and_last_names.txt
At first, this seemed like it was working. Then, after examining the output, it did not seem to do anything.
wc -l input output
This revealed that nothing was filtered.
So, how else can I do this or what am I doing incorrectly?
Here is a sample of the file that I would like filtered:
aynz@falskdf.com
8zlkhsdf0@fmail.com
afjsg@domain.com
Here is a sample of the whitelist that I would like to use as a reference to filter the file:
ALEX
johnson
WINTERS
miles
christina
tonya
jackson
schmidt
jake
So, if an email contains any of these, grep (or whatever tool) needs to print it to an output file.
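The requirement as stated (keep an address if it contains any whitelisted name, ignoring case) can be sketched in Python as follows. The function name filter_by_names and the sample addresses are hypothetical, and plain substring matching is an assumption: a short name such as "miles" would also match an address like "smiles@...".

```python
def filter_by_names(emails, names):
    """Keep emails that contain any of the names, ignoring case."""
    # Normalise the whitelist once; skip blank lines.
    lowered = [n.strip().lower() for n in names if n.strip()]
    return [e for e in emails if any(n in e.lower() for n in lowered)]


# Hypothetical sample data, not taken from the real lists.
names = ["ALEX", "johnson", "miles"]
emails = [
    "alex.smith@example.com",
    "zq81@example.com",
    "T.Johnson@example.org",
]

kept = filter_by_names(emails, names)
print(kept)
```

With grep itself, the same idea would be the options -i (ignore case), -F (literal patterns) and -f (read patterns from a file), for example grep -iFf whitelist file.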
How do I remove or address a specific occurrence of a character in sed?
I'm editing a CSV file and I want to remove all text between the third and the fifth occurrence of the comma (that is, dropping fields four and five). Is there any way to achieve this using sed?
E.g:
% cat myfile
one,two,three,dropthis,dropthat,six,...
% sed -i 's/someregex//' myfile
% cat myfile
one,two,three,,six,...
If using the cut command is acceptable:
$ cut -d, -f1-3,6- file
awk, or any other tool that can split strings on delimiters, is better suited to this job than sed.
$ cat file
1,2,3,4,5,6,7,8,9,10
Ruby (1.9+)
$ ruby -ne 's=$_.split(","); s[3,2]=nil ;puts s.compact.join(",") ' file
1,2,3,6,7,8,9,10
Using awk
$ awk 'BEGIN{FS=OFS=","}{$4=$5="";}{gsub(/,,*/,",")}1' file
1,2,3,6,7,8,9,10
A real parser in action
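#!/usr/bin/env python3
import csv

# newline='' lets the csv module handle line endings itself (Python 3).
with open('my-data.csv', newline='') as src, \
     open('stripped-data.csv', 'w', newline='') as dst:
    cw = csv.writer(dst)
    for row in csv.reader(src):
        # Keep fields 1-3 and everything from field 6 onward.
        cw.writerow(row[0:3] + row[5:])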
But do note the preface to the csv module documentation:
The so-called CSV (Comma Separated Values) format is the most common import and export format for spreadsheets and databases. There is no “CSV standard”, so the format is operationally defined by the many applications which read and write it. The lack of a standard means that subtle differences often exist in the data produced and consumed by different applications. These differences can make it annoying to process CSV files from multiple sources. Still, while the delimiters and quoting characters vary, the overall format is similar enough that it is possible to write a single module which can efficiently manipulate such data, hiding the details of reading and writing the data from the programmer.
$ cat my-data.csv
1
1,2
1,2,3
1,2,3,4,
1,2,3,4,5
1,2,3,4,5,6
1,2,3,4,5,6,
1,2,,4,5,6
1,2,"3,3",4,5,6
1,"2,2",3,4,5,6
,,3,4,5
,,,4,5
,,,,5
$ python csvdrop.py
$ cat stripped-data.csv
1
1,2
1,2,3
1,2,3
1,2,3
1,2,3,6
1,2,3,6,
1,2,,6
1,2,"3,3",6
1,"2,2",3,6
,,3
,,
,,