I am parsing a text file
Lines File Name Gen LnkLN LINK Time
----- -------------------- ---- ----- ---- ------------------------
00090 TEST1_1519230912 0 00092 .X.X Wed Feb 21 16:35:14 2018
00091 TEST2_1619330534 0 00093 .X.X Wed Feb 21 16:35:14 2018
using this code:
awk '{if (($1 ~ /^[0-9A-Fa-f]+$/) && (length($1)==5)) {
if (! c[$4]) TLN=TLN $4 ","
c[$4]=$4;
if (! d[$3]) TGN=TGN $3 ","
d[$3]=$3
if (! b[$2]) TLNK=TLNK $2 ","
b[$2]=$2
}
} END {print "TLines="TLN,"TGEN="TGN,"TLink="TLNK}' /var/tmp/slink.jnk
I get this output:
TLines=00092,00093, TGEN=0,0, TLink=TEST1_1519230912,TEST2_1619330534,
I have two questions about this.
First, I don't understand why the value for TGN is printed twice in the output ("0,0,"). If the file has duplicate values for a field, I want only one value in the output.
Second, I redirect this output into another file and use the source filename.txt command to set these values as environment variables for use later in the script. Is there a better way to use them as variables inside the script, rather than creating another file and sourcing it?
Use the in operator to check whether a key has already been seen, to avoid the case where the stored value itself evaluates to false. That is what is happening with your 0 value and why it is repeated in your output.
$ awk '{if (($1 ~ /^[0-9A-Fa-f]+$/) && (length($1)==5)) {
if (!($4 in c)) TLN=TLN $4 ","
c[$4]
if (!($3 in d)) TGN=TGN $3 ","
d[$3]
if (!($2 in b)) TLNK=TLNK $2 ","
b[$2]
}
} END {print "TLines="TLN,"TGEN="TGN,"TLink="TLNK}' f
Output:
TLines=00092,00093, TGEN=0, TLink=TEST1_1519230912,TEST2_1619330534,
EDIT
Above I've kept things close to your original version, but as mentioned in the comments, a more idiomatic and nicer version would be:
$ awk '($1 ~ /^[0-9A-Fa-f]+$/) && (length($1)==5) {
if (!c[$4]++) TLN=TLN $4 ","
if (!d[$3]++) TGN=TGN $3 ","
if (!b[$2]++) TLNK=TLNK $2 ","
} END {print "TLines="TLN,"TGEN="TGN,"TLink="TLNK}' f
END EDIT
For setting the variables, this worked for me (where a.awk contains the awk code, above):
$ eval "$(awk -f a.awk f)"
$ echo $TLines
00092,00093,
$ echo $TGEN
0,
$ echo $TLink
TEST1_1519230912,TEST2_1619330534,
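If you'd rather avoid eval, a minimal alternative sketch (assuming bash and the same a.awk and f as above): read the three fields into variables and strip the NAME= prefixes with parameter expansion.
$ read -r a b c < <(awk -f a.awk f)
$ TLines=${a#TLines=} TGEN=${b#TGEN=} TLink=${c#TLink=}
$ echo $TLines
00092,00093,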
I am trying to extract bin names from Cargo.toml using Bash. I enabled Perl regular expressions like this:
First attempt
grep -Pzo '(?<=(^\[\[bin\]\]))\s*name\s*=\s*"(.*)"' ./Cargo.toml
The regular expression was tested at regex101, but I got nothing. (The usage of the -Pzo options can be found here.)
Second attempt
grep -P (?<=(^[[bin]]))\n*\sname\s=\s*"(.*)" ./Cargo.toml
Still nothing
Cargo.toml
[[bin]]
name = "acme1"
path = "bin/acme1.rs"
[[bin]]
name = "acme2"
path = "src/acme1.rs"
grep:
grep -A1 '^\[\[bin\]\]$' Cargo.toml |
grep -Po '(?<=^name = ")[^"]*(?=".*)'
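With the Cargo.toml shown above, this pipeline prints:
acme1
acme2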
or, if you can use awk, this is more robust:
awk '
$1 ~ /^\[\[?[[:alnum:]]*\]\]?$/{
if ($1=="[[bin]]" || $1=="[bin]") {bin=1}
else {bin=0}
}
bin==1 &&
sub(/^[[:space:]]*name[[:space:]]*=[[:space:]]*/, "") {
sub(/^"/, ""); sub(/".*$/, "")
print
}' cargo.toml
Example:
$ cat cargo.toml
[[bin]]
name = "acme1"
path = "bin/acme1.rs"
[bin]
name="acme2"
[[foo]]
name = "nobin"
[bin]
not_name = "hello"
name="acme3"
path = "src/acme3.rs"
[[bin]]
path = "bin/acme4.rs"
name = "acme4" # a comment
$ sh solution
acme1
acme2
acme3
acme4
Obviously, these are no substitute for a real toml parser.
With your shown samples and attempts, please try the following tac + awk combination. It is easier to maintain and does the job more simply than grep would.
tac Input_file |
awk '
/^name =/{
gsub(/"/,"",$NF)
value=$NF
next
}
/^path[[:space:]]+=[[:space:]]+"bin\//{
print value
value=""
}
' |
tac
Explanation: Adding detailed explanation for above code.
tac Input_file | ##Using tac command on Input_file to print it in bottom to top order.
awk ' ##passing tac output to awk as standard input.
/^name =/{ ##Checking if line starts from name = then do following.
gsub(/"/,"",$NF) ##Globally substituting " with NULL in last field.
value=$NF ##Setting value to last field value here.
next ##next will skip all further statements from here.
}
/^path[[:space:]]+=[[:space:]]+"bin\//{ ##Checking if line starts from path followed by space = followed by spaces followed by "bin/ here.
print value ##printing value here.
value="" ##Nullifying value here.
}
' | ##Passing awk program output as input to tac here.
tac ##Printing values in their actual order.
I have a file f1:
line1
line2
line3
line4
..
..
I want to delete all the lines which are in another file f2:
line2
line8
..
..
I tried something with cat and sed, which wasn't even close to what I intended. How can I do this?
grep -v -x -f f2 f1 should do the trick.
Explanation:
-v to select non-matching lines
-x to match whole lines only
-f f2 to get patterns from f2
One can instead use grep -F or fgrep to match fixed strings from f2 rather than patterns (in case you want to remove the lines in a "what you see is what you get" manner rather than treating the lines in f2 as regex patterns).
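For example, a minimal equivalent with fixed-string matching, all flags combined:
grep -Fvxf f2 f1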
Try comm instead (assuming f1 and f2 are "already sorted")
comm -2 -3 f1 f2
If the exclude file isn't too huge, you can use awk's associative arrays.
awk 'NR == FNR { list[tolower($0)]=1; next } { if (! list[tolower($0)]) print }' exclude-these.txt from-this.txt
The output will be in the same order as the "from-this.txt" file. The tolower() function makes it case-insensitive, if you need that.
The algorithmic complexity will probably be O(n + m), where n is the size of exclude-these.txt and m the size of from-this.txt.
Similar to Dennis Williamson's answer (mostly syntactic changes, e.g. setting the file number explicitly instead of the NR == FNR trick):
awk '{if (f==1) { r[$0] } else if (! ($0 in r)) { print $0 } } ' f=1 exclude-these.txt f=2 from-this.txt
Accessing r[$0] creates the entry for that line, no need to set a value.
Assuming awk uses a hash table with constant lookup and (on average) constant update time, the time complexity of this will be O(n + m), where n and m are the lengths of the files. In my case, n was ~25 million and m ~14000. The awk solution was much faster than sort, and I also preferred keeping the original order.
If you have Ruby (1.9+):
#!/usr/bin/env ruby
b=File.read("file2").split
open("file1").each do |x|
x.chomp!
puts x if !b.include?(x)
end
This has O(N^2) complexity. If you care about performance, here's another version:
b=File.read("file2").split
a=File.read("file1").split
(a-b).each {|x| puts x}
which uses a hash to perform the subtraction, so the complexity is O(n + m), where n and m are the sizes of a and b.
Here's a little benchmark of the above, courtesy of user576875, but with 100K lines:
$ for i in $(seq 1 100000); do echo "$i"; done|sort --random-sort > file1
$ for i in $(seq 1 2 100000); do echo "$i"; done|sort --random-sort > file2
$ time ruby test.rb > ruby.test
real 0m0.639s
user 0m0.554s
sys 0m0.021s
$ time sort file1 file2|uniq -u > sort.test
real 0m2.311s
user 0m1.959s
sys 0m0.040s
$ diff <(sort -n ruby.test) <(sort -n sort.test)
$
diff was used to show there are no differences between the 2 files generated.
Some timing comparisons between various other answers:
$ for n in {1..10000}; do echo $RANDOM; done > f1
$ for n in {1..10000}; do echo $RANDOM; done > f2
$ time comm -23 <(sort f1) <(sort f2) > /dev/null
real 0m0.019s
user 0m0.023s
sys 0m0.012s
$ time ruby -e 'puts File.readlines("f1") - File.readlines("f2")' > /dev/null
real 0m0.026s
user 0m0.018s
sys 0m0.007s
$ time grep -xvf f2 f1 > /dev/null
real 0m43.197s
user 0m43.155s
sys 0m0.040s
sort f1 f2 | uniq -u isn't even a symmetrical difference, because it removes lines that appear multiple times in either file.
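A quick illustration of that pitfall: the line a below is dropped from the result even though it appears only in f1, simply because it appears there twice.
$ printf 'a\na\nb\n' > f1
$ printf 'c\n' > f2
$ sort f1 f2 | uniq -u
b
c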
comm can also be used with stdin and here strings:
echo $'a\nb' | comm -23 <(sort) <(sort <<< $'c\nb') # a
Seems to be a job suitable for the SQLite shell:
create table file1(line text);
create index if1 on file1(line ASC);
create table file2(line text);
create index if2 on file2(line ASC);
-- comment: if you have | in your files then specify ".separator <any_improbable_string>"
.import 'file1.txt' file1
.import 'file2.txt' file2
.output result.txt
select * from file2 where line not in (select line from file1);
.q
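A sketch of how this might be run non-interactively, assuming the statements above are saved in a (hypothetical) file named diff.sql:
$ sqlite3 scratch.db < diff.sql
$ cat result.txt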
Did you try this with sed?
sed 's#^#sed -i '"'"'s%#g' f2 > f2.sh
sed -i 's#$#%%g'"'"' f1#g' f2.sh
sed -i '1i#!/bin/bash' f2.sh
sh f2.sh
This builds f2.sh, which contains one sed -i 's%<line>%%g' f1 command per line of f2, and then runs it. Note that this blanks out matching text wherever it occurs rather than deleting whole lines.
Not a 'programming' answer but here's a quick and dirty solution: just go to http://www.listdiff.com/compare-2-lists-difference-tool.
Obviously won't work for huge files but it did the trick for me. A few notes:
I'm not affiliated with the website in any way (if you still don't believe me, then you can just search for a different tool online; I used the search term "set difference list online")
The linked website seems to make network calls on every list comparison, so don't feed it any sensitive data
A Python way of filtering one list using another list.
Load files:
>>> f1 = open('f1').readlines()
>>> f2 = open('f2.txt').readlines()
Remove '\n' string at the end of each line:
>>> f1 = [i.replace('\n', '') for i in f1]
>>> f2 = [i.replace('\n', '') for i in f2]
Print only the lines of f1 that do not contain any line of f2:
>>> [a for a in f1 if all(b not in a for b in f2)]
$ cat values.txt
apple
banana
car
taxi
$ cat source.txt
fruits
mango
king
queen
number
23
43
sentence is long
so what
...
...
I made a small shell script to weed out of the source file the values that are present in the values.txt file.
$ cat weed_out.sh
from=$1
cp -p $from $from.final
for x in `cat values.txt`;
do
grep -v "$x" $from.final > $from.final.tmp
mv $from.final.tmp $from.final
done
Executing it:
$ ./weed_out source.txt
and you get a nicely cleaned up file....
I am not able to get the desired output when a data field has a pipe in it.
The input (sample file tst) is:
hdr1|"hdr2|tst"|"hdr3|tst|tst"|hdr4|"hdr5|tst|tst"
lbl1|"lbl2|tst"|"lbl3|tst|tst"|lbl4|"lbl5|tst|tst"
I tried this command but don't get the expected output: cut -f2,3 -d"|" tst
The expected output is:
"hdr2|tst"|"hdr3|tst|tst"
"lbl2|tst"|"lbl3|tst|tst"
Is there an easy way to get this output? I don't want to go with sed because the tool I am embedding this command in doesn't allow the backslash character.
Also, I am using an old version of gawk, so this command doesn't give the desired output:
gawk -v FPAT='[^|]*|("[^"]*")+' '{print $2, $3}' OFS="|"
Output of gawk --version
GNU Awk 3.1.7
Output of cat -vet tst
hdr1|"hdr2|tst"|"hdr3|tst|tst"|hdr4|"hdr5|tst|tst"$
lbl1|"lbl2|tst"|"lbl3|tst|tst"|lbl4|"lbl5|tst|tst"$
Upgrading your gawk version is by far the best approach, as you're missing a few bug fixes and a ton of extremely useful functionality introduced since gawk 3.1.7 came out 10+ years ago (we're currently on gawk version 5.1!). But if you can't do that for some reason, here's what you can do without FPAT, using any awk in any shell on every UNIX box:
$ cat tst.awk
BEGIN { OFS="|" }
{
orig = $0
$0 = i = ""
while ( (orig != "") && match(orig,/[^|]*|("[^"]*")+/) ) {
$(++i) = substr(orig,RSTART,RLENGTH)
orig = substr(orig,RSTART+RLENGTH+1)
}
print $2, $3
}
$ awk -f tst.awk file
"hdr2|tst"|"hdr3|tst|tst"
"lbl2|tst"|"lbl3|tst|tst"
Just to verify that it's identifying all of the fields correctly:
$ cat tst.awk
BEGIN { OFS="|" }
{
orig = $0
$0 = i = ""
while ( (orig != "") && match(orig,/[^|]*|("[^"]*")+/) ) {
$(++i) = substr(orig,RSTART,RLENGTH)
orig = substr(orig,RSTART+RLENGTH+1)
}
print NF " <" $0 ">"
for (i=1; i<=NF; i++) {
print "\t" i " <" $i ">"
}
}
$ awk -f tst.awk file
5 <hdr1|"hdr2|tst"|"hdr3|tst|tst"|hdr4|"hdr5|tst|tst">
1 <hdr1>
2 <"hdr2|tst">
3 <"hdr3|tst|tst">
4 <hdr4>
5 <"hdr5|tst|tst">
5 <lbl1|"lbl2|tst"|"lbl3|tst|tst"|lbl4|"lbl5|tst|tst">
1 <lbl1>
2 <"lbl2|tst">
3 <"lbl3|tst|tst">
4 <lbl4>
5 <"lbl5|tst|tst">
If you don't have embedded double quotes, you can substitute the pipes inside the quoted fields with another, unused character (I used ~) and switch back to the original values after extraction. Obviously this requires that the replacement character not appear in the text.
$ awk 'BEGIN{OFS=FS="\""} {for(i=2;i<NF;i+=2) gsub("\\|","~",$i)}1' file |
awk 'BEGIN{OFS=FS="|"} {print $2,$3}' |
sed 's/~/|/g'
"hdr2|tst"|"hdr3|tst|tst"
"lbl2|tst"|"lbl3|tst|tst"
Not sure it's simpler than the single awk script, though.
The main problem here is the document format design. It would require another patch if there are embedded double quotes, escaped pipes, etc.
I know there have been similar questions posted but I'm still having a bit of trouble getting the output I want using awk FNR==NR...
I have 2 files as such
File 1:
123|this|is|good
456|this|is|better
...
File 2:
aaa|123
bbb|456
...
So I want to join on values from file 2/column2 to file 1/column1 and output file 1 (col 2,3,4) and file 2 (col 1).
Thanks in advance.
With awk you could do something like
awk -F \| 'BEGIN { OFS = FS } NR == FNR { val[$2] = $1; next } $1 in val { $(NF + 1) = val[$1]; print }' file2 file1
NF is the number of fields in a record (line by default), so $NF is the last field, and $(NF + 1) is the field after that. By assigning the saved value from the pass over file2 to it, a new field is appended to the record before it is printed.
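With the sample files above, this prints:
123|this|is|good|aaa
456|this|is|better|bbb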
One thing to note: This behaves like an inner join, i.e., only records are printed whose key appears in both files. To make this a right join, you can use
awk -F \| 'BEGIN { OFS = FS } NR == FNR { val[$2] = $1; next } { $(NF + 1) = val[$1]; print }' file2 file1
That is, you can drop the $1 in val condition on the append-and-print action. If $1 is not in val, val[$1] is empty, and an empty field will be appended to the record before printing.
But it's probably better to use join:
join -1 1 -2 2 -t \| file1 file2
If you don't want the key field to be part of the output, pipe the output of either of those commands through cut -d \| -f 2- to get rid of it, i.e.
join -1 1 -2 2 -t \| file1 file2 | cut -d \| -f 2-
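With the sample files above (note that join expects both inputs to be sorted on the join fields, which they already are here), this prints:
this|is|good|aaa
this|is|better|bbb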
If the files have the same number of lines in the same order, then
paste -d '|' file1 file2 | cut -d '|' -f 2-5
this|is|good|aaa
this|is|better|bbb
I see in a comment to Wintermute's answer that the files aren't sorted. With bash, process substitutions are handy to sort on the fly:
paste -d '|' <(sort -t '|' -k 1,1 file1) <(sort -t '|' -k 2,2 file2) |
cut -d '|' -f 2-5
To reiterate: this solution requires a one-to-one correspondence between the files.
It is typical to have something like this in your cshrc file for setting the path:
set path = ( . $otherpath $path )
but the path gets duplicated when you source your cshrc file multiple times. How do you prevent the duplication?
EDIT: This is one unclean way of doing it:
set localpaths = ( . $otherpaths )
echo ${path} | egrep -i "$localpaths" >& /dev/null
if ($status != 0) then
set path = ( . $otherpaths $path )
endif
I'm surprised no one has used the tr ":" "\n" | grep -x technique to check whether a given folder already exists in $PATH. Any reason not to?
In one line:
if ! $(echo "$PATH" | tr ":" "\n" | grep -qx "$dir") ; then PATH=$PATH:$dir ; fi
Here is a function I've made to add several folders at once to $PATH (use "aaa:bbb:ccc" notation as the argument), checking each one for duplicates before adding:
append_path()
{
local SAVED_IFS="$IFS"
local dir
IFS=:
for dir in $1 ; do
if ! $( echo "$PATH" | tr ":" "\n" | grep -qx "$dir" ) ; then
PATH=$PATH:$dir
fi
done
IFS="$SAVED_IFS"
}
It can be called in a script like this:
append_path "/test:$HOME/bin:/example/my dir/space is not an issue"
It has the following advantages:
No bashisms or any shell-specific syntax. It runs perfectly with #!/bin/sh (I've tested with dash)
Multiple folders can be added at once
No sorting, preserves folder order
Deals perfectly with spaces in folder names
A single test works no matter if $folder is at the beginning, end, or middle of $PATH, or is the only folder in it (thus avoiding testing x:*, *:x, :x:, x, as many of the solutions here implicitly do)
Works (and preserves behavior) if $PATH begins or ends with ":", or has "::" in it (meaning the current folder)
No awk or sed needed.
EPA friendly ;) Original IFS value is preserved, and all other variables are local to the function scope.
Hope that helps!
OK, not in csh, but this is how I append $HOME/bin to my path in bash...
case $PATH in
*:$HOME/bin | *:$HOME/bin:* ) ;;
*) export PATH=$PATH:$HOME/bin
esac
season to taste...
You can use the following Perl script to prune duplicates from a path.
#!/usr/bin/perl
#
# ^^ ensure this is pointing to the correct location.
#
# Title: SLimPath
# Author: David "Shoe Lace" Pyke <eselle@users.sourceforge.net>
# : Tim Nelson
# Purpose: To create a slim version of my envirnoment path so as to eliminate
# duplicate entries and ensure that the "." path was last.
# Date Created: April 1st 1999
# Revision History:
# 01/04/99: initial tests.. didn't work very well at all
# : retrieved path through '$ENV' call
# 07/04/99: After an email from Tim Nelson <wayland@ne.com.au> got it to
# work.
# : used 'push' to add to array
# : used 'join' to create a delimited string from a list/array.
# 16/02/00: fixed cmd-line options to look/work better
# 25/02/00: made verbosity level-oriented
#
#
use Getopt::Std;
sub printlevel;
$initial_str = "";
$debug_mode = "";
$delim_chr = ":";
$opt_v = 1;
getopts("v:hd:l:e:s:");
OPTS: {
$opt_h && do {
print "\n$0 [-v level] [-d level] [-l delim] ( -e varname | -s strname | -h )";
print "\nWhere:";
print "\n -h This help";
print "\n -d Debug level";
print "\n -l Delimiter (between path vars)";
print "\n -e Specify environment variable (NB: don't include \$ sign)";
print "\n -s String (ie. $0 -s \$PATH:/looser/bin/)";
print "\n -v Verbosity (0 = quiet, 1 = normal, 2 = verbose)";
print "\n";
exit;
};
$opt_d && do {
printlevel 1, "You selected debug level $opt_d\n";
$debug_mode = $opt_d;
};
$opt_l && do {
printlevel 1, "You are going to delimit the string with \"$opt_l\"\n";
$delim_chr = $opt_l;
};
$opt_e && do {
if($opt_s) { die "Cannot specify BOTH env var and string\n"; }
printlevel 1, "Using Environment variable \"$opt_e\"\n";
$initial_str = $ENV{$opt_e};
};
$opt_s && do {
printlevel 1, "Using String \"$opt_s\"\n";
$initial_str = $opt_s;
};
}
if( ($#ARGV != 1) and !$opt_e and !$opt_s){
die "Nothing to work with -- try $0 -h\n";
}
$what = shift @ARGV;
# Split path using the delimiter
@dirs = split(/$delim_chr/, $initial_str);
$dest;
@newpath = ();
LOOP: foreach (@dirs){
# Ensure the directory exists and is a directory
if(! -e ) { printlevel 1, "$_ does not exist\n"; next; }
# If the directory is ., set $dot and go around again
if($_ eq '.') { $dot = 1; next; }
# if ($_ ne `realpath $_`){
# printlevel 2, "$_ becomes ".`realpath $_`."\n";
# }
undef $dest;
#$_=Stdlib::realpath($_,$dest);
# Check for duplicates and dot path
foreach $adir (@newpath) { if($_ eq $adir) {
printlevel 2, "Duplicate: $_\n";
next LOOP;
}}
push @newpath, $_;
}
# Join creates a string from a list/array delimited by the first expression
print join($delim_chr, @newpath) . ($dot ? $delim_chr.".\n" : "\n");
printlevel 1, "Thank you for using $0\n";
exit;
sub printlevel {
my($level, $string) = @_;
if($opt_v >= $level) {
print STDERR $string;
}
}
I hope that's useful.
I've been using the following (Bourne/Korn/POSIX/Bash) script for most of a decade:
: "#(#)$Id: clnpath.sh,v 1.6 1999/06/08 23:34:07 jleffler Exp $"
#
# Print minimal version of $PATH, possibly removing some items
case $# in
0) chop=""; path=${PATH:?};;
1) chop=""; path=$1;;
2) chop=$2; path=$1;;
*) echo "Usage: `basename $0 .sh` [$PATH [remove:list]]" >&2
exit 1;;
esac
# Beware of the quotes in the assignment to chop!
echo "$path" |
${AWK:-awk} -F: '#
BEGIN { # Sort out which path components to omit
chop="'"$chop"'";
if (chop != "") nr = split(chop, remove); else nr = 0;
for (i = 1; i <= nr; i++)
omit[remove[i]] = 1;
}
{
for (i = 1; i <= NF; i++)
{
x=$i;
if (x == "") x = ".";
if (omit[x] == 0 && path[x]++ == 0)
{
output = output pad x;
pad = ":";
}
}
print output;
}'
In Korn shell, I use:
export PATH=$(clnpath /new/bin:/other/bin:$PATH /old/bin:/extra/bin)
This leaves me with PATH containing the new and other bin directories at the front, plus one copy of each directory name from the main path value, except that the old and extra bin directories have been removed.
You would have to adapt this to C shell (sorry - but I'm a great believer in the truths enunciated at C Shell Programming Considered Harmful). Primarily, you won't have to fiddle with the colon separator, so life is actually easier.
Well, if you don't care what order your paths are in, you could do something like:
set path=(`echo $path | tr ' ' '\n' | sort | uniq | tr '\n' ' '`)
That will sort your paths and remove any duplicates. If you have . in your path, you may want to remove it with grep -v and re-add it at the end, as in the sketch below.
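A minimal sketch of that idea, assuming csh and that . appears as its own path element:
set path = ( `echo $path | tr ' ' '\n' | grep -v '^\.$' | sort | uniq | tr '\n' ' '` . )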
Here is a long one-liner without sorting:
set path = ( `echo $path | tr ' ' '\n' | perl -e 'while (<>) { print $_ unless $s{$_}++; }' | tr '\n' ' '` )
dr_peper,
I usually prefer to stick to the scripting capabilities of the shell I am living in; it makes things more portable. So I liked your solution using csh scripting. I just extended it to work on each dir in localdirs to make it work for myself.
foreach dir ( $localdirs )
echo ${path} | egrep -i "$dir" >& /dev/null
if ($status != 0) then
set path = ( $dir $path )
endif
end
Using sed(1) to remove duplicates.
$ PATH=$(echo $PATH | sed -e 's/$/:/;s/^/:/;s/:/::/g;:a;s#\(:[^:]\{1,\}:\)\(.*\)\1#\1\2#g;ta;s/::*/:/g;s/^://;s/:$//;')
This will remove the duplicates after the first instance, which may or may not be what you want, e.g.:
$ NEWPATH=/bin:/usr/bin:/bin:/usr/local/bin:/usr/local/bin:/bin
$ echo $NEWPATH | sed -e 's/$/:/; s/^/:/; s/:/::/g; :a; s#\(:[^:]\{1,\}:\)\(.*\)\1#\1\2#g; t a; s/::*/:/g; s/^://; s/:$//;'
/bin:/usr/bin:/usr/local/bin
$
Enjoy!
Here's what I use - perhaps someone else will find it useful:
#!/bin/csh
# ABSTRACT
# /bin/csh function-like aliases for manipulating environment
# variables containing paths.
#
# BUGS
# - These *MUST* be single line aliases to avoid parsing problems apparently related
# to if-then-else
# - Aliases currently perform tests in inefficient in order to avoid parsing problems
# - Extremely fragile - use bash instead!!
#
# AUTHOR
# J. P. Abelanet - 11/11/10
# Function-like alias to add a path to the front of an environment variable
# containing colon (':') delimited paths, without path duplication
#
# Usage: prepend_path ENVVARIABLE /path/to/prepend
alias prepend_path \
'set arg2="\!:2"; if ($?\!:1 == 0) setenv \!:1 "$arg2"; if ($?\!:1 && $\!:1 !~ {,*:}"$arg2"{:*,}) setenv \!:1 "$arg2":"$\!:1";'
# Function-like alias to add a path to the back of any environment variable
# containing colon (':') delimited paths, without path duplication
#
# Usage: append_path ENVVARIABLE /path/to/append
alias append_path \
'set arg2="\!:2"; if ($?\!:1 == 0) setenv \!:1 "$arg2"; if ($?\!:1 && $\!:1 !~ {,*:}"$arg2"{:*,}) setenv \!:1 "$\!:1":"$arg2";'
When setting path (lowercase, the csh variable) rather than PATH (the environment variable) in csh, you can use set -f and set -l, which will only keep one occurrence of each list element (preferring to keep either the first or last, respectively).
https://nature.berkeley.edu/~casterln/tcsh/Builtin_commands.html#set
So something like this
cat foo.csh # or .tcshrc or whatever:
set -f path = (/bin /usr/bin . ) # initial value
set -f path = ($path /mycode /hercode /usr/bin ) # add things, both new and duplicates
Will not keep extending PATH with duplicates every time you source it:
% source foo.csh
% echo $PATH
/bin:/usr/bin:.:/mycode:/hercode
% source foo.csh
% echo $PATH
/bin:/usr/bin:.:/mycode:/hercode
set -f there ensures that only the first occurrence of each PATH element is kept.
I always set my path from scratch in .cshrc.
That is I start off with a basic path, something like:
set path = (. ~/bin /bin /usr/bin /usr/ucb /usr/bin/X11)
(depending on the system).
And then do:
set path = ($otherPath $path)
to add more stuff
I have the same need as the original question.
Building on your previous answers, I have used in Korn/POSIX/Bash:
export PATH=$(perl -e 'print join ":", grep {!$h{$_}++} split ":", "'$otherpath:$PATH\")
I had difficulties translating it directly to csh (csh escape rules are insane). I used the following (as suggested by dr_pepper):
set path = ( `echo $otherpath $path | tr ' ' '\n' | perl -ne 'print $_ unless $h{$_}++' | tr '\n' ' '`)
Do you have ideas to simplify it further (reduce the number of pipes)? One possibility is sketched below.
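One possible simplification, sketched under the assumption that you are in csh/tcsh (where the path list and the PATH environment variable are kept in sync): let Perl read PATH directly from the environment, which drops both tr stages:
setenv PATH `perl -e 'print join ":", grep { !$seen{$_}++ } split ":", $ENV{PATH}'`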