I have a space-separated file that looks like this:
$ cat in_file
GCF_000046845.1_ASM4684v1_protein.faa WP_004920342.1 Chal_sti_synt_C
GCF_000046845.1_ASM4684v1_protein.faa WP_004927566.1 Chal_sti_synt_C
GCF_000046845.1_ASM4684v1_protein.faa WP_004919950.1 FAD_binding_3
GCF_000046845.1_ASM4684v1_protein.faa WP_004920342.1 FAD_binding_3
I am using the following shell script utilizing grep to search for strings:
$ cat search_script.sh
grep "GCF_000046845.1_ASM4684v1_protein.faa WP_004920342.1" Pfam_anntn_temp.txt
grep "GCF_000046845.1_ASM4684v1_protein.faa WP_004920342.1" Pfam_anntn_temp.txt
The problem is that I want each grep command to return only the first instance of the string it finds exclusive of the previous identical grep command's output.
I need an output which would look like this:
$ cat out_file
GCF_000046845.1_ASM4684v1_protein.faa WP_004920342.1 Chal_sti_synt_C
GCF_000046845.1_ASM4684v1_protein.faa WP_004920342.1 FAD_binding_3
in which line 1 is exclusively the output of the first grep command and line 2 is exclusively the output of the second grep command. How do I do it?
P.S. I am running this on a big file (>125,000 lines). So, search_script.sh is mostly composed of unique grep commands. It is the identical commands' execution that is messing up my downstream analysis.
I'm assuming you are generating search_script.sh automatically from the contents of in_file. If you can count how many times the same grep command would be repeated, you can run grep once and limit the output with head. For example, if you know it would run 2 times:
grep "foo" bar.txt | head -2
Will output the first 2 occurrences of "foo" in bar.txt.
If you have to do the grep commands separately, for example if you have other code in between the grep commands, you can mix head and tail:
grep "foo" bar.txt | head -1 | tail -1
Some other commands...
grep "foo" bar.txt | head -2 | tail -1
head -n displays the first n lines of the input
tail -n displays the last n lines of the input
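If the repeated greps are generated automatically, the head/tail pair can be wrapped in a small function. This is only a sketch: nth_match is a made-up helper name, and it uses sed -n 'Np' in place of head -N | tail -1 so that it prints nothing (instead of repeating the last match) when there are fewer than N matches. grep -F is added on the assumption that the search strings are literal text, since the dots in the accession numbers would otherwise act as regex wildcards.

```shell
# nth_match is a hypothetical helper, not part of grep.
# Usage: nth_match STRING N FILE
nth_match() {
    # -F: treat STRING as fixed text; sed -n "Np": print only the N-th match
    grep -F -- "$1" "$3" | sed -n "${2}p"
}
```

The first copy of a repeated command would then become nth_match "..." 1 Pfam_anntn_temp.txt, the second nth_match "..." 2 Pfam_anntn_temp.txt, and so on.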
If you really MUST always use the same command, but ensure that the outputs always differ, the only way I can think of to achieve this is using temporary files and a complex sequence of commands:
cat foo.bar.txt.tmp 2>&1 | xargs -I xx echo "| grep -v \\'xx\\' " | tr '\n' ' ' | xargs -I xx sh -c "grep 'foo' bar.txt xx | head -1 | tee -a foo.bar.txt.tmp"
So to explain this command, given foo as a search string and bar.txt as the filename, then foo.bar.txt.tmp is a unique name for a temporary file. The temporary file will hold the strings that have already been output:
cat foo.bar.txt.tmp 2>&1 : outputs the contents of the temporary file. If none is present, will output an error message to stdout, (important because if the output was empty the rest of the command wouldn't work.)
xargs -I xx echo "| grep -v \\'xx\\' " adds | grep -v to the start of each line in the temporary file, grep -v something excludes lines that include something.
tr '\n' ' ' replaces newlines with spaces, to have on a single string a sequence of grep -vs.
xargs -I xx sh -c "grep 'foo' bar.txt xx | head -1 | tee -a foo.bar.txt.tmp" runs a new command, grep 'foo' bar.txt xx | head -1 | tee -a foo.bar.txt.tmp, replacing xx with the previous output. xx should be the sequence of grep -vs that exclude previous outputs.
head -1 makes sure only one line is output at a time
tee -a foo.bar.txt.tmp appends the new output to the temporary file.
Just be sure to clear the temporary files, rm *.tmp, at the end of your script.
If I understand the question correctly and you want to remove duplicates based on the last field of each line, then try the following (this should be an easy task for awk):
awk '!a[$NF]++' Input_file
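To see what that one-liner does: the array counts how often each last field ($NF) has been seen, and the leading ! makes the condition true only the first time. A small sketch on the sample data from the question (the wrapper name first_per_last_field and the array name seen are only illustrative):

```shell
# first_per_last_field is a hypothetical wrapper around the awk idiom.
first_per_last_field() {
    # Print a line only the first time its last field appears
    awk '!seen[$NF]++' "$@"
}

printf '%s\n' \
    'GCF_000046845.1_ASM4684v1_protein.faa WP_004920342.1 Chal_sti_synt_C' \
    'GCF_000046845.1_ASM4684v1_protein.faa WP_004927566.1 Chal_sti_synt_C' \
    'GCF_000046845.1_ASM4684v1_protein.faa WP_004919950.1 FAD_binding_3' \
    'GCF_000046845.1_ASM4684v1_protein.faa WP_004920342.1 FAD_binding_3' |
    first_per_last_field
```

On this sample it keeps the first and third lines, because the key is the last field alone. If a different notion of "duplicate" is wanted, the expression inside seen[...] is the part to change.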
I have this a file.txt with one line, whose content is
/app/jdk/java/bin/java -server -Xms3g -Xmx3g -XX:MaxPermSize=256m -Dweblogic.Name=O2pPod8_mapp_msrv1_1 -Djava.security.policy=/app/Oracle/Middleware/Oracle_Home/wlserver/server/lib/weblogic.policy -Djava.security.egd=file:/dev/./urandom -Dweblogic.ProductionModeEnabled=true -Dweblogic.system.BootIdentityFile=/app/Oracle/Middleware/Oracle_Home/user_projects/domains/O2pPod8_domain/servers/O2pPod8_mapp_msrv1_1/data/nodemanager/boot.properties -Dweblogic.nodemanager.ServiceEnabled=true -Dweblogic.nmservice.RotationEnabled=true -Dweblogic.security.SSL.ignoreHostnameVerification=false -Dweblogic.ReverseDNSAllowed=false -Xms8192m -Xmx8192m -XX:MaxPermSize=2048m -XX:NewSize=1300m -XX:MaxNewSize=1300m -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
and when I do
cat file.txt | grep -io "Xms.*" | awk '{FS" "; print $1} ' | cut -d "s" -f2
output:
3g
why is grep not reading the second occurrence, i.e. I expect 3g and 8192m.
In fact, how do I print only 8192m in this case?
Your regex just says "find Xms followed by anything repeated 0 to n times". That returns the rest of the row from Xms onward.
What you actually want is something like "find Xms followed by any run of non-space characters", so that the match stops at the next space:
grep -io "Xms[^ ]*" file.txt | awk '{FS" "; print $1} ' | cut -d "s" -f2
In [^ ] the ^ means "not"
I'm not really sure what you are trying to achieve here but if you want the endings of all space-separated strings starting with -Xms, using bare awk is:
$ awk -v RS=" " '/^-Xms/{print substr($0,5)}' file
3g
8192m
Explained:
$ awk -v RS=" " ' # space separated records
/^-Xms/ { # strings starting with -Xms
print substr($0,5) # print starting from 5th position
}' file
If you wanted something else (word repeated in the title puzzles me a bit), please update the question with more detailed requirements.
Edit: I just noticed how do I print only 8192m in this case (that's the repeated maybe). Let's add a counter c and not print the first instance:
$ awk -v RS=" " '/^-Xms/&&++c>1{print substr($0,5)}' file
8192m
You could use grep -io "Xms[0-9]*[a-zA-Z]" instead of grep -io "Xms.*" so that each match is a sequence of digits followed by a single letter, rather than the entire rest of the line:
cat file.txt | grep -io "Xms[0-9]*[a-zA-Z]" | awk '{FS" "; print $1} ' | cut -d "s" -f2
Hope this helps!
The .* in your regexp is matching the rest of the line, you need [^ ]* instead. Look:
$ grep -o 'Xms.*' file
Xms3g -Xmx3g -XX:MaxPermSize=256m -Dweblogic.Name=O2pPod8_mapp_msrv1_1 -Djava.security.policy=/app/Oracle/Middleware/Oracle_Home/wlserver/server/lib/weblogic.policy -Djava.security.egd=file:/dev/./urandom -Dweblogic.ProductionModeEnabled=true -Dweblogic.system.BootIdentityFile=/app/Oracle/Middleware/Oracle_Home/user_projects/domains/O2pPod8_domain/servers/O2pPod8_mapp_msrv1_1/data/nodemanager/boot.properties -Dweblogic.nodemanager.ServiceEnabled=true -Dweblogic.nmservice.RotationEnabled=true -Dweblogic.security.SSL.ignoreHostnameVerification=false -Dweblogic.ReverseDNSAllowed=false -Xms8192m -Xmx8192m -XX:MaxPermSize=2048m -XX:NewSize=1300m -XX:MaxNewSize=1300m -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
$ grep -o 'Xms[^ ]*' file
Xms3g
Xms8192m
$ grep -o 'Xms[^ ]*' file | cut -d's' -f2
3g
8192m
$ grep -o 'Xms[^ ]*' file | cut -d's' -f2 | tail -1
8192m
or more concisely:
$ sed 's/.*Xms\([^ ]*\).*/\1/' file
8192m
The positive lookbehind of PCRE (the form: (?<=RE1)RE2) can resolve the problem easily:
$ grep -oP '(?<=Xms)\S+' file.txt
3g
8192m
Explanation:
-o: show only the part of a line matching PATTERN.
-P: PATTERN is a Perl regular expression.
(?<=Xms)\S+: matches all continuous non-whitespace strings which are just following the string Xms.
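The -o and -P options are extensions that not every grep has, so where they are missing the same result can be had with tr and sed. This is a sketch under that assumption; xms_values is a made-up name. It splits the line into one option per line and keeps only the ones that start with -Xms, with the prefix removed.

```shell
# xms_values is a hypothetical helper.
# Usage: xms_values FILE
xms_values() {
    # One option per line, then print what follows a leading -Xms
    tr ' ' '\n' < "$1" | sed -n 's/^-Xms//p'
}
```

xms_values file.txt | tail -n 1 would then give only the last value.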
I am having trouble with command output formatting.
In terminal this works nicely:
df | grep sda1 | head -c33 | tail -c7 | tr -d " "
In genmon, I get only numbers such as "1145944":
SDAFREE=$(df | grep sda1 | head -c33 | tail -c7 | tr -d " ")
echo="$SDAFREE"
How do I print that command's output through genmon to xfce panel correctly (same as in terminal)?
Thank you.
I have the same issue with every command that contains a pipe. As a workaround, I put the command in an executable script and run the script in genmon.
BTW:
if you want just one value of a table, you can use awk instead of head, tail and tr:
df | awk '/sda1/ {print $4}'
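Picking the field by name of the device is also less fragile than counting bytes with head -c and tail -c, since the column widths of df can change. A minimal sketch, assuming the device is /dev/sda1 as in the question; avail_kb is a made-up helper, and df -Pk is used so the output stays on one line per filesystem with sizes in 1024-byte blocks:

```shell
# avail_kb is a hypothetical helper: reads `df -Pk` output on stdin and
# prints the Available column (field 4) for one exact device name.
avail_kb() {
    awk -v dev="$1" '$1 == dev {print $4}'
}

# Example call (device name assumed from the question):
# df -Pk | avail_kb /dev/sda1
```

Matching the whole first field avoids also picking up devices such as /dev/sda10.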
Docker caching is not yet available on travis: https://github.com/travis-ci/travis-ci/issues/5358
I'm trying to write a workaround by doing:
docker save -o file.tar $(docker history -q image_name | grep -v missing)
docker load -i file.tar
Which works great, gives me all the image layers back. My only problem now is the saving takes a long time, and most of the time I'm actually changing one layer, so I don't need to rewrite all the rest. Is there a way of telling the docker save command to skip layers already in file.tar?
In the manifest.json file inside the tar you have the information you need.
tar -xOf file.tar manifest.json
Check the value of the Config keys. The first 12 characters are the image id. You can use the command above, extract the image ids that you already have, and exclude them in your docker save command.
I'm not very good with bash scripting, but this works on my mac
tar -xOf file.tar manifest.json | tr , '\n' | grep -o '"Config":".*"' | awk -F ':' '{print $2}' | awk '{print substr($0,2,12)}'
Using this outputs everything
docker history -q IMAGE_HERE | grep -v missing && tar -xOf file.tar manifest.json | tr , '\n' | grep -o '"Config":".*"' | awk -F ':' '{print $2}' | awk '{print substr($0,2,12)}'
After this you only need to get the unique values. This could be done with sort and uniq -u, but for some reason, sort doesn't work as expected. This command assumes the presence of file.tar so take that into consideration too.
I couldn't find anything about append in the docker save command. The above strategy could work with multiple file tars that are all different with each other.
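One possible reason sort seemed not to work: in a command of the form A && B | sort, the pipe binds tighter than &&, so only the output of B is sorted. Writing the two id lists to files and comparing them avoids that. The following is only a sketch; config_ids and ids_to_save are made-up names, and it assumes the manifest stores each "Config" value as the image id followed by .json, as described above.

```shell
# config_ids: read manifest.json on stdin, print the first 12 hex
# characters of every "Config" value (hypothetical helper).
config_ids() {
    tr ',' '\n' | sed -n 's/.*"Config":"\([0-9a-f]\{12\}\).*/\1/p'
}

# ids_to_save WANTED_FILE HAVE_FILE: print ids listed in WANTED_FILE
# that do not appear as a whole line in HAVE_FILE (hypothetical helper).
ids_to_save() {
    grep -v -x -F -f "$2" "$1"
}
```

The output of docker history -q IMAGE_HERE | grep -v missing would go into the first file, and tar -xOf file.tar manifest.json | config_ids into the second.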
I try to locate one specific tag for a Docker image. How can I do it on the command line? I want to avoid downloading all the images and then removing the unneeded ones.
The official Ubuntu repository, https://registry.hub.docker.com/_/ubuntu/, has several tags (one per release), but when I search on the command line I only see:
user@ubuntu:~$ docker search ubuntu | grep ^ubuntu
ubuntu Official Ubuntu base image 354
ubuntu-upstart Upstart is an event-based replacement for ... 7
ubuntufan/ping 0
ubuntu-debootstrap 0
Also, the help for the search command, https://docs.docker.com/engine/reference/commandline/search/, gives no clue how this could work.
Is it possible in the docker search command?
If I use a raw command to search via the Docker registry API, then the information can be fetched:
$ curl https://registry.hub.docker.com//v1/repositories/ubuntu/tags | python -mjson.tool
[
{
"layer": "ef83896b",
"name": "latest"
},
.....
{
"layer": "463ff6be",
"name": "raring"
},
{
"layer": "195eb90b",
"name": "saucy"
},
{
"layer": "ef83896b",
"name": "trusty"
}
]
When using CoreOS, jq is available to parse JSON data.
So like you were doing before, looking at library/centos:
$ curl -s -S 'https://registry.hub.docker.com/v2/repositories/library/centos/tags/' | jq '."results"[]["name"]' |sort
"6"
"6.7"
"centos5"
"centos5.11"
"centos6"
"centos6.6"
"centos6.7"
"centos7.0.1406"
"centos7.1.1503"
"latest"
The cleaner v2 API is available now, and that's what I'm using in the example. I will build a simple script docker_remote_tags:
#!/usr/bin/bash
curl -s -S "https://registry.hub.docker.com/v2/repositories/$1/tags/" | jq '."results"[]["name"]' |sort
Enables:
$ ./docker_remote_tags library/centos
"6"
"6.7"
"centos5"
"centos5.11"
"centos6"
"centos6.6"
"centos6.7"
"centos7.0.1406"
"centos7.1.1503"
"latest"
Reference:
jq: https://stedolan.github.io/jq/ | apt-get install jq
I didn't like any of the solutions above because A) they required external libraries that I didn't have and didn't want to install. B) I didn't get all the pages.
The Docker API limits you to 100 items per request. This will loop over each "next" item and get them all (for Python it's seven pages; others may be more or less, it depends).
If you really want to spam yourself, remove | cut -d '-' -f 1 from the last line, and you will see absolutely everything.
url=https://registry.hub.docker.com/v2/repositories/library/redis/tags/?page_size=100 `# Initial url` ; \
( \
while [ ! -z $url ]; do `# Keep looping until the variable url is empty` \
>&2 echo -n "." `# Every iteration of the loop prints out a single dot to show progress as it got through all the pages (this is inline dot)` ; \
content=$(curl -s $url | python -c 'import sys, json; data = json.load(sys.stdin); print(data.get("next", "") or ""); print("\n".join([x["name"] for x in data["results"]]))') `# Curl the URL and pipe the output to Python. Python will parse the JSON and print the very first line as the next URL (it will leave it blank if there are no more pages) then continue to loop over the results extracting only the name; all will be stored in a variable called content` ; \
url=$(echo "$content" | head -n 1) `# Let's get the first line of content which contains the next URL for the loop to continue` ; \
echo "$content" | tail -n +2 `# Print the content without the first line (yes +2 is counter intuitive)` ; \
done; \
>&2 echo `# Finally break the line of dots` ; \
) | cut -d '-' -f 1 | sort --version-sort | uniq;
Sample output:
$ url=https://registry.hub.docker.com/v2/repositories/library/redis/tags/?page_size=100 `#initial url` ; \
> ( \
> while [ ! -z $url ]; do `#Keep looping until the variable url is empty` \
> >&2 echo -n "." `#Every iteration of the loop prints out a single dot to show progress as it got through all the pages (this is inline dot)` ; \
> content=$(curl -s $url | python -c 'import sys, json; data = json.load(sys.stdin); print(data.get("next", "") or ""); print("\n".join([x["name"] for x in data["results"]]))') `# Curl the URL and pipe the JSON to Python. Python will parse the JSON and print the very first line as the next URL (it will leave it blank if there are no more pages) then continue to loop over the results extracting only the name; all will be store in a variable called content` ; \
> url=$(echo "$content" | head -n 1) `#Let's get the first line of content which contains the next URL for the loop to continue` ; \
> echo "$content" | tail -n +2 `#Print the content with out the first line (yes +2 is counter intuitive)` ; \
> done; \
> >&2 echo `#Finally break the line of dots` ; \
> ) | cut -d '-' -f 1 | sort --version-sort | uniq;
...
2
2.6
2.6.17
2.8
2.8.6
2.8.7
2.8.8
2.8.9
2.8.10
2.8.11
2.8.12
2.8.13
2.8.14
2.8.15
2.8.16
2.8.17
2.8.18
2.8.19
2.8.20
2.8.21
2.8.22
2.8.23
3
3.0
3.0.0
3.0.1
3.0.2
3.0.3
3.0.4
3.0.5
3.0.6
3.0.7
3.0.504
3.2
3.2.0
3.2.1
3.2.2
3.2.3
3.2.4
3.2.5
3.2.6
3.2.7
3.2.8
3.2.9
3.2.10
3.2.11
3.2.100
4
4.0
4.0.0
4.0.1
4.0.2
4.0.4
4.0.5
4.0.6
4.0.7
4.0.8
32bit
alpine
latest
nanoserver
windowsservercore
If you want the bash_profile version:
function docker-tags () {
name=$1
# Initial URL
url=https://registry.hub.docker.com/v2/repositories/library/$name/tags/?page_size=100
(
# Keep looping until the variable URL is empty
while [ ! -z $url ]; do
# Every iteration of the loop prints out a single dot to show progress as it got through all the pages (this is inline dot)
>&2 echo -n "."
# Curl the URL and pipe the output to Python. Python will parse the JSON and print the very first line as the next URL (it will leave it blank if there are no more pages)
# then continue to loop over the results extracting only the name; all will be stored in a variable called content
content=$(curl -s $url | python -c 'import sys, json; data = json.load(sys.stdin); print(data.get("next", "") or ""); print("\n".join([x["name"] for x in data["results"]]))')
# Let's get the first line of content which contains the next URL for the loop to continue
url=$(echo "$content" | head -n 1)
# Print the content without the first line (yes +2 is counter intuitive)
echo "$content" | tail -n +2
done;
# Finally break the line of dots
>&2 echo
) | cut -d '-' -f 1 | sort --version-sort | uniq;
}
And simply call it: docker-tags redis
Sample output:
$ docker-tags redis
...
2
2.6
2.6.17
2.8
--trunc----
32bit
alpine
latest
nanoserver
windowsservercore
As far as I know, the CLI does not allow searching/listing tags in a repository.
But if you know which tag you want, you can pull it explicitly by adding a colon and the tag name after the image name: docker pull ubuntu:saucy
This script (docker-show-repo-tags.sh) should work for any Docker enabled host that has curl, sed, grep, and sort. This was updated to reflect the fact the repository tag URLs changed.
This version correctly parses the "name": field without a JSON parser.
#!/bin/sh
# 2022-07-20
# Simple script that will display Docker repository tags
# using basic tools: curl, awk, sed, grep, and sort.
# Usage:
# $ docker-show-repo-tags.sh ubuntu centos
# $ docker-show-repo-tags.sh centos | cat -n
for Repo in "$@" ; do
URL="https://registry.hub.docker.com/v2/repositories/library/$Repo/tags/"
curl -sS "$URL" | \
/usr/bin/sed -Ee 's/("name":)"([^"]*)"/\n\1\2\n/g' | \
grep '"name":' | \
awk -F: '{printf("'$Repo':%s\n",$2)}'
done
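The name-extraction step can be tried without the network by feeding it a saved response. This is a sketch only: tag_names is a made-up helper, the response shape (objects with a "name" key inside "results") is assumed from the jq filters used elsewhere in this thread, and it relies on grep -o, which the script above does not use.

```shell
# tag_names is a hypothetical helper: reads a tags JSON document on
# stdin and prints the value of every "name" key, one per line.
tag_names() {
    # Isolate each "name":"value" pair, then strip everything but the value
    grep -o '"name": *"[^"]*"' | sed 's/.*: *"\(.*\)"/\1/'
}
```

With a live connection it would be used as curl -sS "$URL" | tag_names.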
This older version no longer works. Many thanks to @d9k for pointing this out!
#!/bin/sh
# WARNING: This no longer works!
# Simple script that will display Docker repository tags
# using basic tools: curl, sed, grep, and sort.
#
# Usage:
# $ docker-show-repo-tags.sh ubuntu centos
for Repo in $* ; do
curl -sS "https://hub.docker.com/r/library/$Repo/tags/" | \
sed -e $'s/"tags":/\\\n"tags":/g' -e $'s/\]/\\\n\]/g' | \
grep '^"tags"' | \
grep '"library"' | \
sed -e $'s/,/,\\\n/g' -e 's/,//g' -e 's/"//g' | \
grep -v 'library:' | \
sort -fu | \
sed -e "s/^/${Repo}:/"
done
This older version no longer works. Many thanks to @viky for pointing this out!
#!/bin/sh
# WARNING: This no longer works!
# Simple script that will display Docker repository tags.
#
# Usage:
# $ docker-show-repo-tags.sh ubuntu centos
for Repo in $* ; do
curl -s -S "https://registry.hub.docker.com/v2/repositories/library/$Repo/tags/" | \
sed -e $'s/,/,\\\n/g' -e $'s/\[/\\\[\n/g' | \
grep '"name"' | \
awk -F\" '{print $4;}' | \
sort -fu | \
sed -e "s/^/${Repo}:/"
done
This is the output for a simple example:
$ docker-show-repo-tags.sh centos | cat -n
1 centos:5
2 centos:5.11
3 centos:6
4 centos:6.10
5 centos:6.6
6 centos:6.7
7 centos:6.8
8 centos:6.9
9 centos:7.0.1406
10 centos:7.1.1503
11 centos:7.2.1511
12 centos:7.3.1611
13 centos:7.4.1708
14 centos:7.5.1804
15 centos:centos5
16 centos:centos5.11
17 centos:centos6
18 centos:centos6.10
19 centos:centos6.6
20 centos:centos6.7
21 centos:centos6.8
22 centos:centos6.9
23 centos:centos7
24 centos:centos7.0.1406
25 centos:centos7.1.1503
26 centos:centos7.2.1511
27 centos:centos7.3.1611
28 centos:centos7.4.1708
29 centos:centos7.5.1804
30 centos:latest
I wrote a command line tool to simplify searching Docker Hub repository tags, available in my PyTools GitHub repository. It's simple to use with various command line switches, but most basically:
./dockerhub_show_tags.py repo1 repo2
It's even available as a Docker image and can take multiple repositories:
docker run harisekhon/pytools dockerhub_show_tags.py centos ubuntu
DockerHub
repo: centos
tags: 5.11
6.6
6.7
7.0.1406
7.1.1503
centos5.11
centos6.6
centos6.7
centos7.0.1406
centos7.1.1503
repo: ubuntu
tags: latest
14.04
15.10
16.04
trusty
trusty-20160503.1
wily
wily-20160503
xenial
xenial-20160503
If you want to embed it in scripts, use -q / --quiet to get just the tags, like normal Docker commands:
./dockerhub_show_tags.py centos -q
5.11
6.6
6.7
7.0.1406
7.1.1503
centos5.11
centos6.6
centos6.7
centos7.0.1406
centos7.1.1503
The v2 API seems to use some kind of pagination, so that it does not return all the available tags. This is clearly visible in projects such as python (or library/python). Even after quickly reading the documentation, I could not manage to work with the API correctly (maybe it is the wrong documentation).
Then I rewrote the script using the v1 API, and it is still using jq:
#!/bin/bash
repo="$1"
if [[ "${repo}" != */* ]]; then
repo="library/${repo}"
fi
url="https://registry.hub.docker.com/v1/repositories/${repo}/tags"
curl -s -S "${url}" | jq '.[]["name"]' | sed 's/^"\(.*\)"$/\1/' | sort
The full script is available at: https://github.com/denilsonsa/small_scripts/blob/master/docker_remote_tags.sh
I've also written an improved version (in Python) that aggregates tags that point to the same version: https://github.com/denilsonsa/small_scripts/blob/master/docker_remote_tags.py
Add this function to your .zshrc file or run the command manually:
#usage list-dh-tags <repo>
#example: list-dh-tags node
function list-dh-tags(){
wget -q https://registry.hub.docker.com/v1/repositories/$1/tags -O - | sed -e 's/[][]//g' -e 's/"//g' -e 's/ //g' | tr '}' '\n' | awk -F: '{print $3}'
}
Thanks to this -> How can I list all tags for a Docker image on a remote registry?
For anyone stumbling across this in modern times, you can use Skopeo to retrieve an image's tags from the Docker registry:
$ skopeo list-tags docker://jenkins/jenkins \
| jq -r '.Tags[] | select(. | contains("lts-alpine"))' \
| sort --version-sort --reverse
lts-alpine
2.277.3-lts-alpine
2.277.2-lts-alpine
2.277.1-lts-alpine
2.263.4-lts-alpine
2.263.3-lts-alpine
2.263.2-lts-alpine
2.263.1-lts-alpine
2.249.3-lts-alpine
2.249.2-lts-alpine
2.249.1-lts-alpine
2.235.5-lts-alpine
2.235.4-lts-alpine
2.235.3-lts-alpine
2.235.2-lts-alpine
2.235.1-lts-alpine
2.222.4-lts-alpine
Reimplementation of the previous post, using Python over sed/AWK:
for Repo in $* ; do
    tags=$(curl -s -S "https://registry.hub.docker.com/v2/repositories/library/$Repo/tags/")
    python - <<EOF
import json
tags = [t['name'] for t in json.loads('''$tags''')['results']]
tags.sort()
for tag in tags:
    # print() with parentheses runs under both Python 2 and Python 3
    print("{}:{}".format('$Repo', tag))
EOF
done
For a script that works with OAuth bearer tokens on Docker Hub, try this:
Listing the tags of a Docker image on a Docker hub through the HTTP API
You can use Visual Studio Code to provide autocomplete for available Docker images and tags. However, this requires that you type the first letter of a tag in order to see autocomplete suggestions.
For example, when writing FROM ubuntu it offers autocomplete suggestions like ubuntu, ubuntu-debootstrap and ubuntu-upstart. When writing FROM ubuntu:a it offers autocomplete suggestions, like ubuntu:artful and ubuntu:artful-20170511.1