When doing a docker push or when pulling an image, how does Docker determine if there is a registry server in the image name or if it is a path/username on the default registry (e.g. Docker Hub)?
I'm seeing the following from the 1.1 image specification:
Tag
A tag serves to map a descriptive, user-given name to any single image
ID. Tag values are limited to the set of characters [a-zA-Z_0-9].
Repository
A collection of tags grouped under a common prefix (the name component
before :). For example, in an image tagged with the name my-app:3.1.4,
my-app is the Repository component of the name. A repository name is
made up of slash-separated name components, optionally prefixed by a
DNS hostname. The hostname must follow comply with standard DNS rules,
but may not contain _ characters. If a hostname is present, it may
optionally be followed by a port number in the format :8080. Name
components may contain lowercase characters, digits, and separators. A
separator is defined as a period, one or two underscores, or one or
more dashes. A name component may not start or end with a separator.
For the DNS host name, does it need to be fully qualified with dots, or is "my-local-server" a valid registry hostname? For the name components, I'm seeing periods as valid, which implies "team.user/appserver" is a valid image name. If the registry server is running on port 80, and therefore no port number is needed on the hostname in the image name, it seems like there would be ambiguity between the hostname and the path on the registry server. I'm curious how Docker resolves that ambiguity.
TL;DR: The hostname must contain a . dns separator, a : port separator, or the value "localhost" before the first /. Otherwise the code assumes you want the default registry, Docker Hub.
After some digging through the code, I came across distribution/distribution/reference/reference.go with the following:
// Grammar
//
// reference := name [ ":" tag ] [ "@" digest ]
// name := [hostname '/'] component ['/' component]*
// hostname := hostcomponent ['.' hostcomponent]* [':' port-number]
// hostcomponent := /([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9-]*[a-zA-Z0-9])/
// port-number := /[0-9]+/
// component := alpha-numeric [separator alpha-numeric]*
// alpha-numeric := /[a-z0-9]+/
// separator := /[_.]|__|[-]*/
//
// tag := /[\w][\w.-]{0,127}/
//
// digest := digest-algorithm ":" digest-hex
// digest-algorithm := digest-algorithm-component [ digest-algorithm-separator digest-algorithm-component ]
// digest-algorithm-separator := /[+.-_]/
// digest-algorithm-component := /[A-Za-z][A-Za-z0-9]*/
// digest-hex := /[0-9a-fA-F]{32,}/ ; At least 128 bit digest value
The actual implementation of that is via a regex in distribution/distribution/reference/regexp.go.
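As a rough illustration of how a single rule from that grammar maps onto a regex, here is a minimal Python sketch of the tag rule only. The names TAG_RE and is_valid_tag are made up for this example, and re.ASCII is an assumption so that \w matches only [a-zA-Z0-9_]. The real regex in regexp.go covers the whole reference, not just the tag.

```python
import re

# Tag rule from the grammar: one word character followed by up to 127
# word characters, periods, or dashes (so at most 128 characters).
# re.ASCII is an assumption so that \w means [a-zA-Z0-9_] only.
TAG_RE = re.compile(r"[\w][\w.-]{0,127}", re.ASCII)


def is_valid_tag(tag):
    """Return True if the whole string matches the tag rule."""
    return TAG_RE.fullmatch(tag) is not None


for candidate in ["3.1.4", "1.34.1-glibc", ".hidden"]:
    print(candidate, is_valid_tag(candidate))
```

Using fullmatch rather than match is the important detail here: a tag that merely starts with a valid prefix should still be rejected.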
But with some digging and poking, I found that there's another check beyond that regex (e.g. you'll get errors with an uppercase hostname if you don't include a . or :). And I tracked down the actual split of the name to the following in distribution/distribution/reference/normalize.go:
// splitDockerDomain splits a repository name to domain and remotename string.
// If no valid domain is found, the default domain is used. Repository name
// needs to be already validated before.
func splitDockerDomain(name string) (domain, remainder string) {
	i := strings.IndexRune(name, '/')
	if i == -1 || (!strings.ContainsAny(name[:i], ".:") && name[:i] != "localhost") {
		domain, remainder = defaultDomain, name
	} else {
		domain, remainder = name[:i], name[i+1:]
	}
	if domain == legacyDefaultDomain {
		domain = defaultDomain
	}
	if domain == defaultDomain && !strings.ContainsRune(remainder, '/') {
		remainder = officialRepoName + "/" + remainder
	}
	return
}
The important part of that for me is the check for the ., :, or the hostname localhost before the first / in the first if statement. With it, the hostname is split out from before the first /, and without it, the entire name is passed to the default registry hostname.
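To make the examples from the question concrete, here is a small Python port of that function, as a sketch only. The three constants are not shown in the excerpt; the values below are what I believe normalize.go uses, so treat them as assumptions. The name split_docker_domain is just a Python-style rename.

```python
# Assumed values of the constants referenced by the Go function;
# they are not part of the excerpt quoted above.
DEFAULT_DOMAIN = "docker.io"
LEGACY_DEFAULT_DOMAIN = "index.docker.io"
OFFICIAL_REPO_NAME = "library"


def split_docker_domain(name):
    """Split an already-validated repository name into (domain, remainder)."""
    head, slash, tail = name.partition("/")
    # The text before the first / only counts as a registry host if it
    # contains a . or a :, or is exactly "localhost".
    if not slash or ("." not in head and ":" not in head and head != "localhost"):
        domain, remainder = DEFAULT_DOMAIN, name
    else:
        domain, remainder = head, tail
    if domain == LEGACY_DEFAULT_DOMAIN:
        domain = DEFAULT_DOMAIN
    # Single-component names on the default registry are official images.
    if domain == DEFAULT_DOMAIN and "/" not in remainder:
        remainder = OFFICIAL_REPO_NAME + "/" + remainder
    return domain, remainder


for name in ["my-local-server/appserver", "team.user/appserver", "localhost/appserver"]:
    print(name, "->", split_docker_domain(name))
```

Under these assumptions, "my-local-server/appserver" stays on the default registry, while "team.user/appserver" is treated as a registry host named team.user, which is how the ambiguity from the question gets resolved.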
The image-spec at https://github.com/moby/moby/blob/master/image/spec/v1.1.md has now been updated to say that tags are limited to 128 characters.
The PR thread is here https://github.com/docker/distribution/issues/2248
Some Ruby code is here https://github.com/cyber-dojo/runner/blob/e98bc280c5349cb2919acecb0dfbfefa1ac4e5c3/src/docker/image_name.rb
Some Ruby tests are https://github.com/cyber-dojo/runner/blob/e98bc280c5349cb2919acecb0dfbfefa1ac4e5c3/test_server/image_name_test.rb
Note: Many URL parsing libraries aren't able to parse docker image references / tags, unless they conform to standardized URL format.
Example Ansible Snippet:
- debug: #(FAILS)
    msg: "{{ 'docker.io/alpine' | urlsplit() }}"
  # ^-- This will fail, because the image reference isn't in standard URL format
  # If you can convert the docker image reference to standard URL format
  # Then most URL parsing libraries will work correctly
- debug: #(WORKS)
    msg: "{{ ('https://' + 'docker.io/alpine') | urlsplit() }}"
  # ^-- Example: This becomes standard URL syntax, so it parses correctly
- debug: #(FAILS)
    msg: "{{ ('http://' + 'busybox:1.34.1-glibc') | urlsplit('path') }}"
  # ^-- Unfortunately, this trick won't work to turn 100% of images into
  # Standard URL format for parsing. (This example fails as well)
Based on BMitch's answer, I realized that simple if-statement logic can convert arbitrary docker image references / tags into standardized URL format, which allows most libraries to parse them.
Algorithm in human speak:
1. Look for / in $TAG
2. If / not found
   Then return ("https://docker.io/" + $TAG)
3. If / found, split $TAG into 2 parts at the first /
   and test the text left of the /, looking for ".", ":", or "localhost"
4. If (".", ":", or "localhost" found in text left of 1st /)
   Then return ("https://" + $TAG)
5. If (".", ":", or "localhost" not found in text left of 1st /)
   Then return ("https://docker.io/" + $TAG)
(This logic converts docker tags into standardized URL format
so they can be processed by URL parsing libraries.)
Algorithm in Bash:
vi docker_tag_to_standardized_url_format.sh
(Copy paste the following)
#!/bin/bash
# This standardizes the naming of docker images
# Basically busybox --------------------> https://docker.io/busybox
#           myregistry.tld/myimage:tag -> https://myregistry.tld/myimage:tag
INPUT=$(cat -)
OUTPUT=""
if echo "$INPUT" | grep -q "/"; then
    # The text left of the first / is a registry host only if it contains
    # a . or a :, or is exactly "localhost" (a substring match is not enough).
    # Note: grep considers . as wildcard, \ is escape character to treat \. as .
    if echo "$INPUT" | cut -d "/" -f1 | grep -Eq '\.|:|^localhost$'; then
        OUTPUT="https://$INPUT"
    else
        OUTPUT="https://docker.io/$INPUT"
    fi
else
    OUTPUT="https://docker.io/$INPUT"
fi
echo "$OUTPUT"
Make it executable:
chmod +x ./docker_tag_to_standardized_url_format.sh
Usage Example:
# Test data, to verify against edge cases
A=docker.io/alpine
B=docker.io/rancher/system-upgrade-controller:v0.8.0
C=busybox:1.34.1-glibc
D=busybox
E=rancher/system-upgrade-controller:v0.8.0
F=localhost:5000/helloworld:latest
G=quay.io/go/go/gadget:arms
####################################
echo $A | ./docker_tag_to_standardized_url_format.sh
echo $B | ./docker_tag_to_standardized_url_format.sh
echo $C | ./docker_tag_to_standardized_url_format.sh
echo $D | ./docker_tag_to_standardized_url_format.sh
echo $E | ./docker_tag_to_standardized_url_format.sh
echo $F | ./docker_tag_to_standardized_url_format.sh
echo $G | ./docker_tag_to_standardized_url_format.sh
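For comparison, here is a minimal Python sketch of the same logic, followed by a check that a standard URL parser accepts the result. The function name to_standard_url is made up for this example; urlsplit comes from Python's standard library, and the sample references are the test data above.

```python
from urllib.parse import urlsplit


def to_standard_url(ref):
    """Prefix an image reference so that it parses as an https URL."""
    host, slash, _ = ref.partition("/")
    # A registry host must contain . or :, or be exactly "localhost".
    if slash and ("." in host or ":" in host or host == "localhost"):
        return "https://" + ref
    return "https://docker.io/" + ref


for ref in ["busybox:1.34.1-glibc", "localhost:5000/helloworld:latest"]:
    parts = urlsplit(to_standard_url(ref))
    print(ref, "->", parts.netloc, parts.path)
```

Note that, like the Bash script, this does not add the library/ prefix for official images; it only makes the reference parseable as a URL.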
After a lengthy pipe that ends with a grep, I correctly end up with a set of matching absolute file paths, each followed by its match string, with a comma delimiter between the two. I want to tag each file with its match string. A complication is that the path may contain spaces, while there are no spaces between the delimiter and the characters on either side of it.
I need to be able to deal with an absolute path rather than just the filename within the directory. The match strings are free of spaces, but the filename might not be.
So by way of example, the output of the pipe might look like:
pipe1 | pipe2 |
outputs
/Users/bloggs/Directory One/matched_file.doc,attributes_0001ABC
/Users/bloggs/Directory One/matched_file1.doc,attributeY_2
/Users/bloggs/Directory One/match_file_00x.doc,Attribute_00201
/Users/bloggs/Directory One/matching file 2.doc,attribute_0004
I want to tag each using something which will probably include:
tag --add "$attribute" "$file"
Where attribute refers to the match string eg "Attribute_00201"
Normally I'd just say eg:
tag --add Attribute_00201 /Users/bloggs/Directory\ One/match_file_00x.doc
At this point I am stuck on how to parse each line, ideally via another pipe, deal with the spaces correctly, and execute the tag command. Grateful for any help.
So I'm looking for a new pipe, pipe3 to execute or give me the correctly formatted tag command:
pipe1 | pipe2 | pipe3
delivers eg
tag --add Attribute_00201 /Users/bloggs/Directory\ One/match_file_00x.doc
etc
etc
This seems to work
| tee >(cut -f2 -d","| sed 's/^/tag --add /' > temp_out.txt) >(cut -d"," -f1 | sed -e 's/[[:space:]]/\\ /g' > temp_out1.txt) > /dev/null && paste -d' ' temp_out.txt temp_out1.txt > command.sh && chmod +x ./command.sh
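An alternative that avoids the temporary files is to split each line on its last comma and build the command from the two halves. Below is a minimal Python sketch of that idea. The function name tag_command and the sample lines are made up for illustration; it assumes, as the question states, that the match string never contains a comma or a space.

```python
import shlex


def tag_command(line):
    """Turn one 'path,attribute' line into an argument list for tag."""
    # Split on the LAST comma: the attribute is assumed to contain no
    # commas or spaces, whereas the path may contain either.
    path, _, attribute = line.rstrip("\n").rpartition(",")
    return ["tag", "--add", attribute, path]


sample_lines = [
    "/Users/bloggs/Directory One/match_file_00x.doc,Attribute_00201",
    "/Users/bloggs/Directory One/matching file 2.doc,attribute_0004",
]
for line in sample_lines:
    # shlex.join quotes each argument so the printed line is shell-safe.
    print(shlex.join(tag_command(line)))
```

The argument list could also be handed to subprocess.run directly, in which case no quoting or escaping of the spaces is needed at all.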
I am trying to generate md5 hash from Powershell. I installed Powershell Community Extension (Pscx) to get command : Get-Hash
However when I generate md5 hash using Get-Hash, it doesn't seem to match the hash generated using md5sum on an Ubuntu machine.
Powershell:
PS U:\> "hello world" | get-hash -Algorithm MD5
Path Algorithm HashString Hash
---- --------- ---------- ----
MD5 E42B054623B3799CB71F0883900F2764 {228, 43, 5, 70...}
Ubuntu:
root@LT-A03433:~# echo "hello world" | md5sum
6f5902ac237024bdd0c176cb93063dc4 -
I know that the one generated by Ubuntu is correct as a couple of online sites show the same result.
What am I doing wrong with PowerShell Get-Hash?
The difference is not obvious, but you are not hashing the same data. MD5 is a hashing algorithm, and it has no notion of text encoding – this is why you can create a hash of binary data just as easily as a hash of text. With that in mind, we can find out what bytes (or octets; strictly a stream of values of 8 bits each) MD5 is calculating the hash of. For this, we can use xxd, or any other hexeditor.
First, your Ubuntu example:
$ echo "hello world" | xxd
0000000: 6865 6c6c 6f20 776f 726c 640a hello world.
Note the 0a, a Unix-style newline at the end, displayed as . in the right-hand view. By default echo appends a newline to what it prints; you could use printf to avoid that, but this would lead to a different hash.
$ echo "hello world" | md5
6f5902ac237024bdd0c176cb93063dc4
Now let's consider what PowerShell is doing. It is passing a string of its own directly to the get-hash cmdlet. As it turns out, the natural representation of string data on Windows is not the same as on Unix: Windows uses wide strings, where each character is represented (in memory) as two bytes. To see this, we can open a text editor, paste in:
hello world
With no trailing newline, and save it as UTF-16, little-endian. If we examine the actual bytes this produces, we see the difference:
$ xxd < test.txt
0000000: 6800 6500 6c00 6c00 6f00 2000 7700 6f00 h.e.l.l.o. .w.o.
0000010: 7200 6c00 6400 r.l.d.
Each character now takes two bytes, with the second byte being 00 – this is normal (and is the reason why UTF-8 is used across the Internet instead of UTF-16, for example), since the Unicode codepoints for basic ASCII characters are the same as their ASCII representation. Now let's see the hash:
$ md5 < test.txt
e42b054623b3799cb71f0883900f2764
Which matches what PS is producing for you.
So, to answer your question – you're not doing anything wrong. You just need to encode your string the same way to get the same hash. Unfortunately I don't have access to PS, but this should be a step in the right direction: UTF8Encoding class.
This question is surely related to How to get an MD5 checksum in PowerShell, but it’s different and makes an important point.
Md5sums are computed from bytes. In fact, your Ubuntu result is, in a sense, wrong:
$ echo "hello world" | md5sum
6f5902ac237024bdd0c176cb93063dc4 -
$ echo -n "hello world" | md5sum
5eb63bbbe01eeed093cb22bb8f5acdc3 -
In the first case you sum the 11 bytes which make up the ASCII representation of your string, plus a final line feed (12 bytes in total). In the second case, you don’t include the line feed.
(As an aside, it is interesting to note that a here string includes a trailing line feed:)
$ md5sum <<<"hello world"
6f5902ac237024bdd0c176cb93063dc4
In Windows powershell, your string is represented in UTF-16LE, 2 bytes per character. To get the same result in Ubuntu and in Windows, you have to use a recoding program. A good choice for Ubuntu is iconv:
$ echo -n "hello world" | iconv -f UTF-8 -t UTF-16LE | md5sum
e42b054623b3799cb71f0883900f2764 -
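The same comparison can be reproduced with nothing but a hashing library, which makes it obvious that only the bytes matter. Here is a minimal Python sketch using the standard hashlib module; the helper name md5_hex is made up for this example.

```python
import hashlib


def md5_hex(data):
    """Return the hex MD5 digest of a bytes object."""
    return hashlib.md5(data).hexdigest()


text = "hello world"

# Same bytes as: echo -n "hello world" | md5sum
print(md5_hex(text.encode("utf-8")))
# Same bytes as: echo "hello world" | md5sum (trailing line feed)
print(md5_hex((text + "\n").encode("utf-8")))
# Same bytes as the UTF-16LE representation hashed by Get-Hash
print(md5_hex(text.encode("utf-16-le")))
```

The three digests are the three different hashes quoted above: one string, three byte sequences.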
The md5sum result is wrong-ish, in spite of other people agreeing with it. The echo feeding it adds platform-specific end-of-line characters to the input string: an LF on Unix, a CR-LF on Windows.
Verify this on a machine with powershell and bash and e.g. postgres installed for comparison:
'A string with no CR or LF at the end' | %{ psql -c "select md5('$_' || Chr(13) || Chr(10) )" }
echo 'A string with no CR or LF at the end' | md5sum.exe
'A string with no CR or LF at the end' | %{ psql -c "select md5('$_' || Chr(10) )" }
bash -c "echo 'A string with no CR or LF at the end' | md5sum.exe"
Output first two lines:
PS> 'A string with no CR or LF at the end' | %{ psql -c "select md5('$_' || Chr(13) || Chr(10) )" }
md5
----------------------------------
1b16276b75aba6ebb88512b957d2a198
PS> echo 'A string with no CR or LF at the end' | md5sum.exe
1b16276b75aba6ebb88512b957d2a198 *-
Output second two lines:
PS> 'A string with no CR or LF at the end' | %{ psql -c "select md5('$_' || Chr(10) )" }
md5
----------------------------------
68a1fcb16b4cc10bce98c5f48df427d4
PS> bash -c "echo 'A string with no CR or LF at the end' | md5sum.exe"
68a1fcb16b4cc10bce98c5f48df427d4 *-
Is there a way to filter/follow a TCP/SSL stream based on a particular process ID using Wireshark?
Just in case you are looking for an alternative and your environment is Windows, Microsoft's Network Monitor 3.3 is a good choice. It has a process name column. You can easily add it to a filter using the context menu and apply the filter. As usual, the GUI is very intuitive.
I don't see how. The PID doesn't make it onto the wire (generally speaking), plus Wireshark allows you to look at what's on the wire - potentially all machines which are communicating over the wire. Process IDs aren't unique across different machines, anyway.
You could match the port numbers from wireshark up to port numbers from, say, netstat which will tell you the PID of a process listening on that port.
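As a sketch of that matching step, the Python below pulls the local TCP ports owned by one PID out of netstat-style text and turns them into a Wireshark display filter. Everything here is illustrative: the function names are made up, the sample text only imitates the column layout of Windows `netstat -ano` (other platforms print different columns), and the addresses and PIDs are invented.

```python
# Invented sample in the layout: proto, local address, remote address, state, PID
SAMPLE = """\
  TCP    192.168.1.5:52344    93.184.216.34:443    ESTABLISHED    1234
  TCP    192.168.1.5:52350    93.184.216.34:80     ESTABLISHED    1234
  TCP    0.0.0.0:135          0.0.0.0:0            LISTENING      900
"""


def local_ports_for_pid(netstat_output, pid):
    """Collect the local TCP ports owned by pid, sorted and de-duplicated."""
    ports = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) != 5 or fields[0] != "TCP" or fields[4] != str(pid):
            continue
        # The port is whatever follows the last colon of the local address.
        ports.add(int(fields[1].rpartition(":")[2]))
    return sorted(ports)


def wireshark_port_filter(ports):
    """Build a display filter that matches any of the given TCP ports."""
    return " or ".join("tcp.port == {}".format(port) for port in ports)


print(wireshark_port_filter(local_ports_for_pid(SAMPLE, 1234)))
```

Keep in mind that this is only a snapshot: ports used by short-lived connections between two netstat runs will be missed.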
Use Microsoft Message Analyzer v1.4
Navigate to ProcessId from the field chooser.
Etw
-> EtwProviderMsg
--> EventRecord
---> Header
----> ProcessId
Right click and Add as Column
strace is more suitable for this situation.
strace -f -e trace=network -s 10000 -p <PID>;
The options: -f also traces all forked processes, -e trace=network restricts the trace to network system calls, and -s 10000 displays strings up to 10000 characters long.
You can also trace only certain calls, such as send, recv, and read operations.
strace -f -e trace=send,recv,read -s 10000 -p <PID>;
If you want to follow an application that still has to be started then it's certainly possible:
Install docker (see https://docs.docker.com/engine/installation/linux/docker-ce/ubuntu/)
Open a terminal and run a tiny container: docker run -t -i ubuntu /bin/bash (change "ubuntu" to your favorite distro, this doesn't have to be the same as in your real system)
Install your application in the container using the same way that you would install it in a real system.
Start wireshark in your real system, go to capture > options . In the window that will open you'll see all your interfaces. Instead of choosing any, wlan0, eth0, ... choose the new virtual interface docker0 instead.
Start capturing
Start your application in the container
You might have some doubts about running your software in a container, so here are the answers to the questions you probably want to ask:
Will my application work inside a container ? Almost certainly yes, but you might need to learn a bit about docker to get it working
Won't my application run slow ? Negligible. If your program is something that runs heavy calculations for a week then it might now take a week and 3 seconds
What if my software or something else breaks in the container ? That's the nice thing about containers. Whatever is running inside can only break the current container and can't hurt the rest of the system.
On Windows there is an experimental build that does this, as described on the mailing list, Filter by local process name
This is an important thing to be able to do for monitoring where certain processes try to connect to, and it seems there isn't any convenient way to do this on Linux. However, several workarounds are possible, and so I feel it is worth mentioning them.
There is a program called nonet which allows running a program with no Internet access (I have most program launchers on my system set up with it). It uses setgid to run a process in group nonet and sets an iptables rule to refuse all connections from this group.
Update: by now I use an even simpler system. You can easily have a readable iptables configuration with ferm, and just use the program sg to run a program with a specific group. iptables also allows you to reroute traffic, so you can even route it to a separate interface or to a local proxy on a port, which lets you filter in Wireshark, or LOG the packets directly from iptables if you don't want to disable all Internet access while you are checking out traffic.
It's not very complicated to adapt it to run a program in a group and cut all other traffic with iptables for the execution lifetime and then you could capture traffic from this process only.
If I ever come round to writing it, I'll post a link here.
On another note, you can always run a process in a virtual machine and sniff the correct interface to isolate the connections it makes, but that would be quite an inferior solution...
I have a PowerShell script that might help in cases like that, and I tidied it up a bit to post it here. My tests with PowerShell versions 5.2 and 7.2 on Windows 10 were both successful, but at the moment I can't test it on other operating systems.
What it does:
It builds a Wireshark filter from the IPs and ports that a process used according to the network statistics. You may want to look at the last two pictures first to understand it better.
The long story:
It gets network statistics for TCP (listeners and connections) and UDP (listeners) repeatedly until you choose to proceed. You will want to wait until you have finished testing your process. After you choose to continue, it shows the current processes with their process IDs, from which you must select one or more processes. The processes are the first filter you can apply; the case the OP would like to have should be only one process. Then you must select which connections/ports you want in your filter (usually select all here). After that you must select another type of filter, which also defines what the Wireshark filter will look like. The filter will be displayed and automatically copied to the clipboard.
Depending on your selections and your process, the filter might get long.
What it doesn't:
It can't monitor your processes and their network activities. It just gets the data multiple times. Between the get commands you might miss some connections.
It also can't see any UDP packets, so it does not get anything about the remote side for UDP. But it will get the local UDP listening ports.
Other limitations are: Local listening on 0.0.0.0 will be translated to your local IP address. Listening on 127.0.0.1 will be skipped, as I had no need for local connection monitoring for now.
So here is the Code:
"Attention: This script can NOT make a filter for a process, but it can build it regarding some local to remote connections (TCP) and vice versa, and ports (UDP)."
"It works good for some cases, but not for all."
"In general it is designed to filter as less as possible."
"You may still see packets from some other processes depending on your selection"
""
"Press return to continue"
Read-Host | Out-Null
# Load Functions
function Out-WireSharkSyntax($data) {
$data = $data -replace "\)|\(| eq | or | and |==|!|{|}| not | in ",';$0;' -split ";"
foreach ($Line in $data) {
$color = switch ($Line) {
"(" {"blue"}
")" {"blue"}
"!" {"cyan"}
" eq " {"yellow"}
" or " {"cyan"}
" and " {"cyan"}
" not " {"cyan"}
" in " {"cyan"}
"==" {"yellow"}
"||" {"yellow"}
"{" {"darkred"}
"}" {"darkred"}
Default {"green"}
}
Write-Host -ForegroundColor $color -NoNewline -BackgroundColor Black $line}
}
$count=0
$sleepTimer=500 #in milliseconds to restart the query for used TCP ports and listening UDP ports
$QuitKey=81 #Character code for 'q' key.
$CurrentDateTime = Get-Date
#$LocalIPv4address = @(Get-NetIPAddress -AddressFamily IPv4 -InterfaceIndex $(Get-NetConnectionProfile | Select-Object -ExpandProperty InterfaceIndex) | Select-Object -ExpandProperty IPAddress)
$LocalIPv4address = (Get-NetIPAddress -AddressFamily IPv4 -AddressState Preferred -PrefixOrigin manual,dhcp).IPAddress
if ($LocalIPv4address.count -ne 1) {
"Could not detect exactly one IP address. Enter the IP address to be used:`r`nYour locally detected addresses were: $($LocalIPv4address -join " OR ")"
$LocalIPv4address = Read-Host
}
"Retrieving network statistics every $sleepTimer milliseconds..."
"(very short connections may not be captured with this script because of this!)"
$TcpAndUdpProperties = @{Name="NetStatEntryAsString";Expression={$_.LocalAddress + "--" + $_.LocalPort + "--" + $_.RemoteAddress + "--" + $_.RemotePort + "--" + $_.cimclass.cimclassname}},`
"LocalAddress","LocalPort","OwningProcess","RemoteAddress","RemotePort","CreationTime"
# Properties for both equal to get equal list header in all cases
$TcpAndUdpNetworkStatistic = @()
Write-Host "Press 'q' to stop collecting network statistics and to continue with the script."
Write-Host "Wireshark should now capture and you start what ever you would like to monitor now."
while($true)
{
if($host.UI.RawUI.KeyAvailable) {
$key = $host.ui.RawUI.ReadKey("NoEcho,IncludeKeyUp")
if($key.VirtualKeyCode -eq $QuitKey) {
#For Key Combination: eg., press 'LeftCtrl + q' to quit.
#Use condition: (($key.VirtualKeyCode -eq $Qkey) -and ($key.ControlKeyState -match "LeftCtrlPressed"))
Write-Host ("`r`n'q' is pressed! going on with the script now.")
break
}
}
# Temporary conversion to JSON ensures that not too much irrelevant data is bound to the new variable
$TcpAndUdpNetworkStatistic += `
(Get-NetTCPConnection | select -Property $($TcpAndUdpProperties + @{Name="Protocol";Expression={"TCP"}}) | ConvertTo-Json | ConvertFrom-Json) + `
(Get-NetUDPEndpoint | select -Property $($TcpAndUdpProperties + @{Name="Protocol";Expression={"UDP"}}) | ConvertTo-Json | ConvertFrom-Json)
# exclude IPv6 as it is not handled in this script, remove 127.0.0.1 connections and remove duplicates
$TcpAndUdpNetworkStatistic = $TcpAndUdpNetworkStatistic | where {$_.LocalAddress -notmatch ":" -and $_.LocalAddress -notlike "127.*"} | ConvertTo-Csv -NoTypeInformation | Sort-Object -Unique -Descending |ConvertFrom-Csv | sort Protocol,LocalAddress,LocalPort
$TcpAndUdpNetworkStatistic | where {$_.localaddress -eq "0.0.0.0"} | foreach {$_.localaddress = $LocalIPv4Address}
$count++
Write-Host ("`rChecked network statistics {0} time{1}. Collected {2} netstat entries" -f $count,$(("s"," ")[($count -eq "1")]),$TcpAndUdpNetworkStatistic.Count) -NoNewline
Start-Sleep -m $sleepTimer
}
$TcpAndUdpNetworkStatistic | where {$_.localaddress -eq "0.0.0.0"} | foreach {$_.localaddress = $LocalIPv4Address}
$ProcessIDToNetworkstatistic = $TcpAndUdpNetworkStatistic | Group-Object OwningProcess -AsHashTable -AsString
"Getting processlist..."
$processselection = "Id", "Name", @{Name="MainModuleName";Expression={$_.MainModule.ModuleName}}, "Company",
"Path", "Product", "Description", "FileVersion", "ProductVersion", "SessionID", "CPU", "Threads", "StartTime"
$GetNetListedProcesses = Get-Process | Where {$ProcessIDToNetworkstatistic.GetEnumerator().name -contains $_.ID} | Select -Property $processselection
"Output processlist to gridview... Read the gridview title and make selection there..."
$ProcessIDs = ($GetNetListedProcesses |Select @{Name="Port/Session Count";Expression={$ProcessIDToNetworkstatistic["$($_.id)"].count}},* | `
Out-GridView -Title "Select process to view network statistics related to process id" -Passthru).ID
"Output related network statistics to gridview... Read the gridview title and make selection there..."
$TcpAndUdpNetworkStatisticFilteredByProcessID = $TcpAndUdpNetworkStatistic | Where {$ProcessIDs -contains $_.OwningProcess} | `
Out-Gridview -Title "Select lines that contain data you may like to have in your Wireshark filter" -Passthru
# for statistic and later processing
$UDPLocalPorts = ($TcpAndUdpNetworkStatisticFilteredByProcessID | where {$_.Protocol -eq "UDP"}).LocalPort | foreach {[int]$_} | Sort-Object -Unique
$TCPConnections = $TcpAndUdpNetworkStatisticFilteredByProcessID | where {$_.Protocol -eq "TCP"}
$TCPLocalPorts = @(foreach ($Connection in $TCPConnections) { [int]$Connection.LocalPort }) | Sort-Object -unique
$TCPRemotePorts = @(foreach ($Connection in $TCPConnections) { [int]$Connection.RemotePort })| Sort-Object -unique | where {$_ -ne 0}
$UDPLocalEndpoints = $TcpAndUdpNetworkStatisticFilteredByProcessID | where {$_.Protocol -eq "UDP"}
$UDPLocalPorts = @(foreach ($Endpoint in $UDPLocalEndpoints) { [int]$Endpoint.LocalPort }) | Sort-Object -unique
$FilterOptionsDialog = "
You can choose between the following filters
[all] for UDP + TCP filter - including remote address where possible ( filterable: $(($TcpAndUdpNetworkStatisticFilteredByProcessID).count) )
[tall] for TCP with listening ports and connections including remote ports and addresses ( filterable: $(($TcpAndUdpNetworkStatisticFilteredByProcessID | where {$_.Protocol -eq "TCP"}).count) )
[tcon] for TCP without listening ports - only connections including remote ports and addresses ( filterable: $(($TcpAndUdpNetworkStatisticFilteredByProcessID | where {$_.Protocol -eq "TCP" -and [int]$_.RemotePort -ne 0}).count) )
[u] for UDP portfilter - only local listening port - no `"connections`" ( filterable: $(($TcpAndUdpNetworkStatisticFilteredByProcessID | where {$_.Protocol -eq "UDP"}).count) )
[p] for portfilter only by ports ( filterable: $($TCPLocalPorts.count) local TCP / $($TCPRemotePorts.count) remote TCP / $($UDPLocalPorts.count) UDP )
[ptl] for portfilter only by local TCP ports (no UDP) ( filterable: $($TCPLocalPorts.count) local TCP / $($TCPRemotePorts.count) remote TCP )
[pt] for portfilter only by TCP ports (remote port ignored and no UDP) ( filterable: $($TCPLocalPorts.count) local TCP )
[pu] for portfilter only by UDP ports (only listening ports - no information about used ports) ( filterable: $($UDPLocalPorts.count) )
Type your selection and press return"
$WiresharkFilter = ""
do {
$tmp = read-host $FilterOptionsDialog
} while ("all","u","tcon","tall","p","pt","ptl","pu" -notcontains $tmp)
switch ($tmp)
{
"all" {
# TCP connections with local and remote IP filter - both ports included - udp only listening are included
$ConnectionFilterResolved = "("
$ConnectionFilterResolved += $(foreach ($connection in $TcpAndUdpNetworkStatisticFilteredByProcessID) {
if ([int]$connection.remoteport -eq 0) {
$ConnectionFilter = "(ip.addr eq {0} and {2}.port eq {1})"
$ConnectionFilter -f $connection.LocalAddress,$connection.LocalPort,$connection.Protocol.ToLower()
} else {
$ConnectionFilter = "(ip.addr eq {0} and ip.addr eq {1}) and (tcp.port eq {2} and tcp.port eq {3})"
$ConnectionFilter -f $connection.LocalAddress,$connection.RemoteAddress, $connection.LocalPort, $connection.RemotePort
}
}) -join ") or ("
$ConnectionFilterResolved += ")"
$WiresharkFilter += $ConnectionFilterResolved
}
"u" {
# udp.port only - without remote IP filter
#Building the filter variable
$FilteredPortlist = $TcpAndUdpNetworkStatisticFilteredByProcessID | where {$_.Protocol -eq "UDP"} | foreach { "udp.port eq $($_.LocalPort)"} | sort | get-unique
if ($FilteredPortlist) {
$WiresharkFilter += "(" +
($FilteredPortlist -join ") or (") +
")"
}
}
"tall" {#tall
# TCP connections with local and remote IP filter - both ports included - only listening are included without remote data)
$tcpStatsFilterResolved = "("
$tcpStatsFilterResolved += $(foreach ($connection in ($TcpAndUdpNetworkStatisticFilteredByProcessID | where {$_.Protocol -eq "TCP"} )) {
if ([int]$connection.remoteport -eq 0) {
$TcpFilter = "(ip.addr eq {0} and tcp.port eq {1})"
$TcpFilter -f $connection.LocalAddress,$connection.LocalPort
} else {
$TcpFilter = "(ip.addr eq {0} and ip.addr eq {1}) and (tcp.port eq {2} and tcp.port eq {3})"
$TcpFilter -f $connection.LocalAddress,$connection.RemoteAddress, $connection.LocalPort, $connection.RemotePort
}
}) -join ") or ("
$tcpStatsFilterResolved += ")"
$WiresharkFilter += $tcpStatsFilterResolved
}
"tcon" {
# TCP connections only - listening-only ports are not included (RemotePort 0 marks a listening entry)
$tcpStatsFilterResolved = "("
$tcpStatsFilterResolved += $(foreach ($connection in ($TcpAndUdpNetworkStatisticFilteredByProcessID | where {$_.Protocol -eq "TCP" -and [int]$_.RemotePort -ne 0} )) {
$TcpFilter = "(ip.addr eq {0} and ip.addr eq {1}) and (tcp.port eq {2} and tcp.port eq {3})"
$TcpFilter -f $connection.LocalAddress,$connection.RemoteAddress, $connection.LocalPort, $connection.RemotePort
}) -join ") or ("
$tcpStatsFilterResolved += ")"
$WiresharkFilter = $tcpStatsFilterResolved
}
"p" {
# Ports only - remote and local
$TCPWiresharkFilter = "tcp.port in {" + ( ($TCPLocalPorts + $TCPRemotePorts | Sort-Object -unique ) -join ", " ) + "}"
$UDPWiresharkFilter = "udp.port in {" + ( $UDPLocalPorts -join ", " ) + "}"
$Or = ( ""," or " )[$TCPConnections.count -gt 0 -and $UDPLocalEndpoints.count -gt 0]
$WiresharkFilter = "$TCPWiresharkFilter$Or$UDPWiresharkFilter"
}
"ptl" {
# Local tcp ports only - remote are excluded
$WiresharkFilter = "tcp.port in {" + ( $TCPLocalPorts -join ", " ) + "}"
}
"pt" {
# tcp ports only - remote and local ports
$WiresharkFilter = "tcp.port in {" + ( ($TCPLocalPorts + $TCPRemotePorts | Sort-Object -unique ) -join ", " ) + "}"
}
"pu" {
# udp ports only - no remote anyway
$WiresharkFilter = "udp.port in {" + ( $UDPLocalPorts -join ", " ) + "}"
}
}
if ($WiresharkFilter.toString().length -gt 5) {
# Output to clipboard
$WiresharkFilter | Set-Clipboard
"The following filter should be in your clipboard already"
""
""
Out-WireSharkSyntax $WiresharkFilter
""
""
"Attention: All filtering is done on network statistic data around that time `"$CurrentDateTime`" and the additional $($count - 1) checks that were done."
"`tThis filter is not perfect, but it works for some cases or is a good template to be customized afterwards."
} else {
"Everything was filtered out by your selections - I got no data to create a filter"
}
""
"Press return to end script"
Read-Host | Out-Null
Here is what it might look like
Another possible result
You may optimize the code for your needs, but for me it is more than enough.
If someone has already found a better/builtin solution for Wireshark, please share your information.
In some cases you cannot filter by process ID. For example, I needed to sniff traffic from a single process. I found the target machine's IP address in its config, added the filter ip.dst==someip, and voila. It won't work in every case, but for some it's useful.
Get the port number using netstat:
netstat -b
And then use the Wireshark filter:
tcp.port == portnumber
You can filter by port number in Wireshark with display filters like these:
tcp.port==80
tcp.port==14220
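If netstat gives you several ports, the per-port filters can be joined with `or`. Here is a minimal POSIX sh sketch of that; the function name `tcp_port_filter` is made up for illustration and is not part of Wireshark or netstat:

```shell
#!/bin/sh
# Hypothetical helper: build a Wireshark display filter from TCP port numbers.
# The body runs in a subshell ( ... ), so its variables do not leak to the caller.
tcp_port_filter() (
    filter=""
    for port in "$@"; do
        case $port in
            ''|*[!0-9]*) continue ;;   # skip anything that is not a plain number
        esac
        # Prefix " or " only when the filter already has a term
        filter="${filter:+$filter or }tcp.port == $port"
    done
    printf '%s\n' "$filter"
)

tcp_port_filter 80 14220
```

Paste the printed string into the Wireshark display filter bar.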
It is typical to have something like this in your cshrc file for setting the path:
set path = ( . $otherpath $path )
But the path gets duplicated when you source your cshrc file multiple times. How do you prevent the duplication?
EDIT: This is one unclean way of doing it:
set localpaths = ( . $otherpaths )
echo ${path} | egrep -i "$localpaths" >& /dev/null
if ($status != 0) then
set path = ( . $otherpaths $path )
endif
I'm surprised no one used the tr ":" "\n" | grep -x technique to check whether a given folder already exists in $PATH. Any reason not to?
In 1 line:
if ! echo "$PATH" | tr ":" "\n" | grep -qx "$dir" ; then PATH=$PATH:$dir ; fi
Here is a function I've written to add several folders at once to $PATH (use "aaa:bbb:ccc" notation as the argument), checking each one for duplicates before adding:
append_path()
{
local SAVED_IFS="$IFS"
local dir
IFS=:
for dir in $1 ; do
if ! echo "$PATH" | tr ":" "\n" | grep -qx "$dir" ; then
PATH=$PATH:$dir
fi
done
IFS="$SAVED_IFS"
}
It can be called in a script like this:
append_path "/test:$HOME/bin:/example/my dir/space is not an issue"
It has the following advantages:
No bashisms or any shell-specific syntax. It runs perfectly with #!/bin/sh (I've tested with dash)
Multiple folders can be added at once
No sorting, preserves folder order
Deals perfectly with spaces in folder names
A single test works no matter whether $folder is at the beginning, end, or middle, or is the only folder in $PATH (thus avoiding testing x:*, *:x, :x:, x, as many of the solutions here implicitly do)
Works (and preserves it) if $PATH begins or ends with ":", or has "::" in it (meaning current folder)
No awk or sed needed.
EPA friendly ;) Original IFS value is preserved, and all other variables are local to the function scope.
Hope that helps!
OK, not in csh, but this is how I append $HOME/bin to my path in bash...
case $PATH in
*:$HOME/bin | *:$HOME/bin:* ) ;;
*) export PATH=$PATH:$HOME/bin
esac
season to taste...
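A common variation wraps both the list and the candidate in colons so that one pattern covers the start, middle, end, and only-entry cases. This is a minimal POSIX sh sketch of that idea, not the poster's code; `path_append` is a hypothetical name, and it prints the new list instead of changing PATH directly:

```shell
#!/bin/sh
# Hypothetical helper: append a directory to a colon-separated list only if absent.
# The body runs in a subshell ( ... ), so "list" and "dir" stay local to the call.
path_append() (
    list=$1
    dir=$2
    case ":$list:" in
        *":$dir:"*) printf '%s\n' "$list" ;;                 # already present, unchanged
        *)          printf '%s\n' "${list:+$list:}$dir" ;;   # add, no leading colon if list is empty
    esac
)

# Usage: reassign PATH from the printed result
PATH=$(path_append "$PATH" "$HOME/bin")
export PATH
```

Because the match is on whole ":dir:" entries, /bin is not mistaken for a match inside /usr/bin.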
You can use the following Perl script to prune paths of duplicates.
#!/usr/bin/perl
#
# ^^ ensure this is pointing to the correct location.
#
# Title: SLimPath
# Author: David "Shoe Lace" Pyke <eselle@users.sourceforge.net>
# : Tim Nelson
# Purpose: To create a slim version of my environment path so as to eliminate
# duplicate entries and ensure that the "." path was last.
# Date Created: April 1st 1999
# Revision History:
# 01/04/99: initial tests.. didn't work very well at all
# : retrieved path through '$ENV' call
# 07/04/99: After an email from Tim Nelson <wayland@ne.com.au> got it to
# work.
# : used 'push' to add to array
# : used 'join' to create a delimited string from a list/array.
# 16/02/00: fixed cmd-line options to look/work better
# 25/02/00: made verbosity level-oriented
#
#
use Getopt::Std;
sub printlevel;
$initial_str = "";
$debug_mode = "";
$delim_chr = ":";
$opt_v = 1;
getopts("v:hd:l:e:s:");
OPTS: {
$opt_h && do {
print "\n$0 [-v level] [-d level] [-l delim] ( -e varname | -s strname | -h )";
print "\nWhere:";
print "\n -h This help";
print "\n -d Debug level";
print "\n -l Delimiter (between path vars)";
print "\n -e Specify environment variable (NB: don't include \$ sign)";
print "\n -s String (ie. $0 -s \$PATH:/looser/bin/)";
print "\n -v Verbosity (0 = quiet, 1 = normal, 2 = verbose)";
print "\n";
exit;
};
$opt_d && do {
printlevel 1, "You selected debug level $opt_d\n";
$debug_mode = $opt_d;
};
$opt_l && do {
printlevel 1, "You are going to delimit the string with \"$opt_l\"\n";
$delim_chr = $opt_l;
};
$opt_e && do {
if($opt_s) { die "Cannot specify BOTH env var and string\n"; }
printlevel 1, "Using Environment variable \"$opt_e\"\n";
$initial_str = $ENV{$opt_e};
};
$opt_s && do {
printlevel 1, "Using String \"$opt_s\"\n";
$initial_str = $opt_s;
};
}
if( ($#ARGV != 1) and !$opt_e and !$opt_s){
die "Nothing to work with -- try $0 -h\n";
}
$what = shift @ARGV;
# Split path using the delimiter
@dirs = split(/$delim_chr/, $initial_str);
$dest;
@newpath = ();
LOOP: foreach (@dirs){
# Ensure the directory exists and is a directory
if(! -e ) { printlevel 1, "$_ does not exist\n"; next; }
# If the directory is ., set $dot and go around again
if($_ eq '.') { $dot = 1; next; }
# if ($_ ne `realpath $_`){
# printlevel 2, "$_ becomes ".`realpath $_`."\n";
# }
undef $dest;
#$_=Stdlib::realpath($_,$dest);
# Check for duplicates and dot path
foreach $adir (@newpath) { if($_ eq $adir) {
printlevel 2, "Duplicate: $_\n";
next LOOP;
}}
push @newpath, $_;
}
# Join creates a string from a list/array delimited by the first expression
print join($delim_chr, @newpath) . ($dot ? $delim_chr.".\n" : "\n");
printlevel 1, "Thank you for using $0\n";
exit;
sub printlevel {
my($level, $string) = @_;
if($opt_v >= $level) {
print STDERR $string;
}
}
I hope that's useful.
I've been using the following (Bourne/Korn/POSIX/Bash) script for most of a decade:
: "@(#)$Id: clnpath.sh,v 1.6 1999/06/08 23:34:07 jleffler Exp $"
#
# Print minimal version of $PATH, possibly removing some items
case $# in
0) chop=""; path=${PATH:?};;
1) chop=""; path=$1;;
2) chop=$2; path=$1;;
*) echo "Usage: `basename $0 .sh` [$PATH [remove:list]]" >&2
exit 1;;
esac
# Beware of the quotes in the assignment to chop!
echo "$path" |
${AWK:-awk} -F: '#
BEGIN { # Sort out which path components to omit
chop="'"$chop"'";
if (chop != "") nr = split(chop, remove); else nr = 0;
for (i = 1; i <= nr; i++)
omit[remove[i]] = 1;
}
{
for (i = 1; i <= NF; i++)
{
x=$i;
if (x == "") x = ".";
if (omit[x] == 0 && path[x]++ == 0)
{
output = output pad x;
pad = ":";
}
}
print output;
}'
In Korn shell, I use:
export PATH=$(clnpath /new/bin:/other/bin:$PATH /old/bin:/extra/bin)
This leaves me with PATH containing the new and other bin directories at the front, plus one copy of each directory name in the main path value, except that the old and extra bin directories have been removed.
You would have to adapt this to C shell (sorry - but I'm a great believer in the truths enunciated at C Shell Programming Considered Harmful). Primarily, you won't have to fiddle with the colon separator, so life is actually easier.
Well, if you don't care what order your paths are in, you could do something like:
set path=(`echo $path | tr ' ' '\n' | sort | uniq | tr '\n' ' '`)
That will sort your paths and remove any extra paths that are the same. If you have . in your path, you may want to remove it with a grep -v and re-add it at the end.
Here is a long one-liner without sorting:
set path = ( `echo $path | tr ' ' '\n' | perl -e 'while (<>) { print $_ unless $s{$_}++; }' | tr '\n' ' '` )
dr_peper,
I usually prefer to stick to the scripting capabilities of the shell I am living in. It makes things more portable. So I liked your solution using csh scripting. I just extended it to check each directory in localdirs separately, which is what I needed.
foreach dir ( $localdirs )
echo ${path} | egrep -i "$dir" >& /dev/null
if ($status != 0) then
set path = ( $dir $path )
endif
end
Using sed(1) to remove duplicates.
$ PATH=$(echo $PATH | sed -e 's/$/:/;s/^/:/;s/:/::/g;:a;s#\(:[^:]\{1,\}:\)\(.*\)\1#\1\2#g;ta;s/::*/:/g;s/^://;s/:$//;')
This will remove the duplicates after the first instance, which may or may not be what you want, e.g.:
$ NEWPATH=/bin:/usr/bin:/bin:/usr/local/bin:/usr/local/bin:/bin
$ echo $NEWPATH | sed -e 's/$/:/; s/^/:/; s/:/::/g; :a; s#\(:[^:]\{1,\}:\)\(.*\)\1#\1\2#g; t a; s/::*/:/g; s/^://; s/:$//;'
/bin:/usr/bin:/usr/local/bin
$
Enjoy!
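If the sed loop is hard to follow, the same "keep the first occurrence" behaviour can be sketched with a single awk call. This is an illustrative alternative, not the sed answer's method; the function name `dedup_path` is made up:

```shell
#!/bin/sh
# Hypothetical helper: print a colon-separated list with later duplicates removed,
# keeping the first occurrence of each entry and the original order.
dedup_path() {
    printf '%s\n' "$1" | awk -F: '{
        out = ""; sep = ""
        for (i = 1; i <= NF; i++)
            if (!seen[$i]++) {      # true only the first time an entry is seen
                out = out sep $i
                sep = ":"
            }
        print out
    }'
}

dedup_path "/bin:/usr/bin:/bin:/usr/local/bin:/usr/local/bin:/bin"
```

To apply it, assign the result back with PATH=$(dedup_path "$PATH").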
Here's what I use - perhaps someone else will find it useful:
#!/bin/csh
# ABSTRACT
# /bin/csh function-like aliases for manipulating environment
# variables containing paths.
#
# BUGS
# - These *MUST* be single line aliases to avoid parsing problems apparently related
# to if-then-else
# - Aliases currently perform tests inefficiently in order to avoid parsing problems
# - Extremely fragile - use bash instead!!
#
# AUTHOR
# J. P. Abelanet - 11/11/10
# Function-like alias to add a path to the front of an environment variable
# containing colon (':') delimited paths, without path duplication
#
# Usage: prepend_path ENVVARIABLE /path/to/prepend
alias prepend_path \
'set arg2="\!:2"; if ($?\!:1 == 0) setenv \!:1 "$arg2"; if ($?\!:1 && $\!:1 !~ {,*:}"$arg2"{:*,}) setenv \!:1 "$arg2":"$\!:1";'
# Function-like alias to add a path to the back of any environment variable
# containing colon (':') delimited paths, without path duplication
#
# Usage: append_path ENVVARIABLE /path/to/append
alias append_path \
'set arg2="\!:2"; if ($?\!:1 == 0) setenv \!:1 "$arg2"; if ($?\!:1 && $\!:1 !~ {,*:}"$arg2"{:*,}) setenv \!:1 "$\!:1":"$arg2";'
When setting path (lowercase, the csh variable) rather than PATH (the environment variable) in csh, you can use set -f and set -l, which will only keep one occurrence of each list element (preferring to keep either the first or last, respectively).
https://nature.berkeley.edu/~casterln/tcsh/Builtin_commands.html#set
So something like this
cat foo.csh # or .tcshrc or whatever:
set -f path = (/bin /usr/bin . ) # initial value
set -f path = ($path /mycode /hercode /usr/bin ) # add things, both new and duplicates
Will not keep extending PATH with duplicates every time you source it:
% source foo.csh
% echo $PATH
% /bin:/usr/bin:.:/mycode:/hercode
% source foo.csh
% echo $PATH
% /bin:/usr/bin:.:/mycode:/hercode
set -f there ensures that only the first occurrence of each PATH element is kept.
I always set my path from scratch in .cshrc.
That is I start off with a basic path, something like:
set path = (. ~/bin /bin /usr/bin /usr/ucb /usr/bin/X11)
(depending on the system).
And then do:
set path = ($otherPath $path)
to add more stuff
I have the same need as the original question.
Building on your previous answers, I have used in Korn/POSIX/Bash:
export PATH=$(perl -e 'print join ":", grep {!$h{$_}++} split ":", "'"$otherpath:$PATH"'"')
I had difficulty translating it directly to csh (csh escape rules are insane). I have used (as suggested by dr_pepper):
set path = ( `echo $otherpath $path | tr ' ' '\n' | perl -ne 'print $_ unless $h{$_}++' | tr '\n' ' '`)
Do you have any ideas for simplifying it further (reducing the number of pipes)?