How to escape special characters in Terraform local-exec command line - grep

I have the following Terraform configuration:
resource "null_resource" "cognito_user_pool_client_id" {
triggers = { always_run = "${timestamp()}" }
provisioner "local-exec" {
command = "aws cognito-idp list-user-pool-clients --user-pool-id ${aws_cognito_user_pool.main.id} --region ${var.region} | grep -o '"ClientId": "[^"]*' | grep -o '[^"]*$' > ${path.module}/cognito_app_client.txt"
}
depends_on = ["aws_cognito_user_pool_domain.main"]
}
where I am trying to get the app client ID from a user pool in Cognito. However, Terraform fails on this line:
Error: Invalid character
on kibana_cognito.tf line 163, in resource "null_resource" "cognito_user_pool_client_id":
163: command = "aws cognito-idp list-user-pool-clients --user-pool-id ${aws_cognito_user_pool.main.id} --region ${var.region} | grep -o '"ClientId": "[^"]*' | grep -o '[^"]*$' > ${path.module}/cognito_app_client.txt"
This character is not used within the language.
Error: Invalid character
on kibana_cognito.tf line 163, in resource "null_resource" "cognito_user_pool_client_id":
163: command = "aws cognito-idp list-user-pool-clients --user-pool-id ${aws_cognito_user_pool.main.id} --region ${var.region} | grep -o '"ClientId": "[^"]*' | grep -o '[^"]*$' > ${path.module}/cognito_app_client.txt"
Single quotes are not valid. Use double quotes (") to enclose strings.
Error: Invalid character
on kibana_cognito.tf line 163, in resource "null_resource" "cognito_user_pool_client_id":
163: command = "aws cognito-idp list-user-pool-clients --user-pool-id ${aws_cognito_user_pool.main.id} --region ${var.region} | grep -o '"ClientId": "[^"]*' | grep -o '[^"]*$' > ${path.module}/cognito_app_client.txt"
This character is not used within the language.
The same pipeline works when run directly with the AWS CLI:
(venv) ➜ virginia git:(master) ✗ aws cognito-idp list-user-pool-clients --user-pool-id 123456 --region us-east-1 | grep -o '"ClientId": "[^"]*' | grep -o '[^"]*$'
dsewe24dwr2e
How can I make this work in Terraform?

You have a few different "languages" here to deal with, each of which will possibly need some escaping to contain the one inside it, but since you asked about Terraform in particular I'll focus on that first layer of escaping, to make sure the shell sees the string you want it to see.
The relevant documentation for this part of Terraform language syntax is Strings and Templates, which talks about the two different kinds of strings in the Terraform language and what kinds of escaping are available in each of them.
You're currently using a quoted string, so you need to contend both with the quoted string escaping and the template escaping. Your desired string only interacts with the quoted string escaping, because it contains some literal quote characters " which you want to be handled by the shell rather than by Terraform, and so you can escape them as \" like the documentation suggests:
command = "aws cognito-idp list-user-pool-clients --user-pool-id ${aws_cognito_user_pool.main.id} --region ${var.region} | grep -o '\"ClientId\": \"[^\"]*' | grep -o '[^\"]*$' > ${path.module}/cognito_app_client.txt"
For completeness, I'll also note that Unix shells typically support a ${NAME} syntax as an alternative to $NAME in situations where the latter would be ambiguous. ${NAME} conflicts with Terraform's template syntax, so if you want to include a sequence like that in the shell command line you need to escape it as shown in the second table in the documentation, by writing $${NAME} instead.
If you read the following section on heredoc strings you'll see that one benefit they have is that they aren't delimited by quotes and so the contents are not interpreted for backslash escape sequences, and so you can write quotes and other special characters literally:
command = <<-EOT
aws cognito-idp list-user-pool-clients --user-pool-id ${aws_cognito_user_pool.main.id} --region ${var.region} | grep -o '"ClientId": "[^"]*' | grep -o '[^"]*$' > ${path.module}/cognito_app_client.txt
EOT
Terraform does still interpret template sequences ${ ... } and %{ ... } in here, so if you had literal ${NAME} sequences for the shell you'd still need to write them as $${NAME} in this context, but you don't need to escape the quotes anymore because this type of string is delimited by a longer, user-specified marker EOT, allowing quotes inside to be treated as literals.
One further advantage of using a heredoc string is that you can split the command over multiple lines, which might make it easier to read:
command = <<-EOT
aws cognito-idp list-user-pool-clients --user-pool-id ${aws_cognito_user_pool.main.id} --region ${var.region} \
| grep -o '"ClientId": "[^"]*' | grep -o '[^"]*$' > ${path.module}/cognito_app_client.txt
EOT
Note that Unix shells typically require escaping a newline with \ to avoid interpreting it as two separate commands, so I included that in the above example, but from Terraform's perspective this is just a two-line string to be passed on to the shell, after template expansion is complete.
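Before embedding the pipeline in Terraform, it can help to confirm what the two grep stages do on their own. The following is a minimal sketch that runs them against a hand-written sample; the sample_json value and the extract_client_id function name are illustrative assumptions, not real AWS CLI output.

```shell
# Hypothetical sample shaped like the pretty-printed JSON that
# `aws cognito-idp list-user-pool-clients` prints. All values are made up,
# except the client ID, which is the one shown in the question.
sample_json='{
    "UserPoolClients": [
        {
            "ClientId": "dsewe24dwr2e",
            "UserPoolId": "us-east-1_EXAMPLE",
            "ClientName": "example-client"
        }
    ]
}'

# The same two grep stages as in the question, wrapped in a function
# (the function name is illustrative).
extract_client_id() {
  grep -o '"ClientId": "[^"]*' | grep -o '[^"]*$'
}

printf '%s\n' "$sample_json" | extract_client_id
# prints: dsewe24dwr2e
```

If the pipeline prints the expected ID in a plain shell, any remaining error is most likely in how the string is quoted for Terraform rather than in the grep expressions themselves.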

Related

How to grep lines non-repeatedly for same command?

I have a space-separated file that looks like this:
$ cat in_file
GCF_000046845.1_ASM4684v1_protein.faa WP_004920342.1 Chal_sti_synt_C
GCF_000046845.1_ASM4684v1_protein.faa WP_004927566.1 Chal_sti_synt_C
GCF_000046845.1_ASM4684v1_protein.faa WP_004919950.1 FAD_binding_3
GCF_000046845.1_ASM4684v1_protein.faa WP_004920342.1 FAD_binding_3
I am using the following shell script utilizing grep to search for strings:
$ cat search_script.sh
grep "GCF_000046845.1_ASM4684v1_protein.faa WP_004920342.1" Pfam_anntn_temp.txt
grep "GCF_000046845.1_ASM4684v1_protein.faa WP_004920342.1" Pfam_anntn_temp.txt
The problem is that I want each grep command to return only the first instance of the string it finds exclusive of the previous identical grep command's output.
I need an output which would look like this:
$ cat out_file
GCF_000046845.1_ASM4684v1_protein.faa WP_004920342.1 Chal_sti_synt_C
GCF_000046845.1_ASM4684v1_protein.faa WP_004920342.1 FAD_binding_3
in which line 1 is exclusively the output of the first grep command and line 2 is exclusively the output of the second grep command. How do I do it?
P.S. I am running this on a big file (>125,000 lines). So, search_script.sh is mostly composed of unique grep commands. It is the identical commands' execution that is messing up my downstream analysis.
I'm assuming you are generating search_script.sh automatically from the contents of in_file. If you can count how many times you'll repeat the same grep command you can just use grep once and use head, for example if you know you'll be using it 2 times:
grep "foo" bar.txt | head -2
Will output the first 2 occurrences of "foo" in bar.txt.
If you have to do the grep commands separately, for example if you have other code in between the grep commands, you can mix head and tail:
grep "foo" bar.txt | head -1 | tail -1
Some other commands...
grep "foo" bar.txt | head -2 | tail -1
head -N displays the first N lines of its input.
tail -N displays the last N lines of its input.
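The head and tail combination can be wrapped in a small helper. This is only a sketch: the nth_match function name is hypothetical, the demo data is copied from the question's in_file, and -F is added here (it is not in the answer above) so that the dots in the identifiers are matched literally.

```shell
# nth_match PATTERN FILE N: print only the N-th line of FILE that contains PATTERN.
# -F treats PATTERN as a fixed string rather than a regular expression.
nth_match() {
  grep -F -- "$1" "$2" | head -n "$3" | tail -n 1
}

# Demo data copied from the question's in_file, written to a temporary file.
demo_file=$(mktemp)
cat > "$demo_file" <<'EOF'
GCF_000046845.1_ASM4684v1_protein.faa WP_004920342.1 Chal_sti_synt_C
GCF_000046845.1_ASM4684v1_protein.faa WP_004927566.1 Chal_sti_synt_C
GCF_000046845.1_ASM4684v1_protein.faa WP_004919950.1 FAD_binding_3
GCF_000046845.1_ASM4684v1_protein.faa WP_004920342.1 FAD_binding_3
EOF

key="GCF_000046845.1_ASM4684v1_protein.faa WP_004920342.1"
nth_match "$key" "$demo_file" 1   # prints the Chal_sti_synt_C line for WP_004920342.1
nth_match "$key" "$demo_file" 2   # prints the FAD_binding_3 line for WP_004920342.1
```

One caveat of this approach: if the file contains fewer than N matching lines, head passes through everything it received and tail prints the last match again, so the caller needs to know how many matches to expect.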
If you really MUST always use the same command, but ensure that the outputs always differ, the only way I can think of to achieve this is using temporary files and a complex sequence of commands:
cat foo.bar.txt.tmp 2>&1 | xargs -I xx echo "| grep -v \\'xx\\' " | tr '\n' ' ' | xargs -I xx sh -c "grep 'foo' bar.txt xx | head -1 | tee -a foo.bar.txt.tmp"
So to explain this command, given foo as a search string and bar.txt as the filename, then foo.bar.txt.tmp is a unique name for a temporary file. The temporary file will hold the strings that have already been output:
cat foo.bar.txt.tmp 2>&1 : outputs the contents of the temporary file. If none is present, will output an error message to stdout, (important because if the output was empty the rest of the command wouldn't work.)
xargs -I xx echo "| grep -v \\'xx\\' " adds | grep -v to the start of each line in the temporary file, grep -v something excludes lines that include something.
tr '\n' ' ' replaces newlines with spaces, so that the whole sequence of grep -v commands ends up on a single line.
xargs -I xx sh -c "grep 'foo' bar.txt xx | head -1 | tee -a foo.bar.txt.tmp" runs a new command, grep 'foo' bar.txt xx | head -1 | tee -a foo.bar.txt.tmp, replacing xx with the previous output. xx should be the sequence of grep -vs that exclude previous outputs.
head -1 makes sure only one line is output at a time
tee -a foo.bar.txt.tmp appends the new output to the temporary file.
Just be sure to clear the temporary files, rm *.tmp, at the end of your script.
If I understand the question correctly and you want to remove duplicates based on the last field of each line, try the following (this is an easy task for awk):
awk '!a[$NF]++' Input_file
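As a rough illustration of how the awk idiom behaves on the question's data, the sketch below first filters for one search key and then keeps the first line per distinct last field. The first_per_last_field function name and the pre-filtering with grep -F are assumptions added for the example; the answer above applies awk to the whole file.

```shell
# first_per_last_field KEY FILE: among the lines of FILE that contain KEY,
# keep only the first line seen for each distinct last field ($NF).
first_per_last_field() {
  grep -F -- "$1" "$2" | awk '!seen[$NF]++'
}

# Demo data copied from the question's in_file, written to a temporary file.
annot_file=$(mktemp)
cat > "$annot_file" <<'EOF'
GCF_000046845.1_ASM4684v1_protein.faa WP_004920342.1 Chal_sti_synt_C
GCF_000046845.1_ASM4684v1_protein.faa WP_004927566.1 Chal_sti_synt_C
GCF_000046845.1_ASM4684v1_protein.faa WP_004919950.1 FAD_binding_3
GCF_000046845.1_ASM4684v1_protein.faa WP_004920342.1 FAD_binding_3
EOF

first_per_last_field "GCF_000046845.1_ASM4684v1_protein.faa WP_004920342.1" "$annot_file"
# prints the two WP_004920342.1 lines (Chal_sti_synt_C, then FAD_binding_3)
```

Without the grep stage, awk alone would key only on the last field and would print the first and third lines of the file, which is not the out_file shown in the question.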

Why is xargs' exit code different based on the presence of "-I" option?

After reading the xargs man page, I am unable to understand the difference in exit codes from the following xargs invocations.
(The original purpose was to combine find and grep to check whether an expression exists in ALL the given files when I came across this behaviour.)
To reproduce:
(use >>! if using zsh to force creation of file)
# Create the input files.
echo "a" >> 1.txt
echo "ab" >> 2.txt
# The end goal is to check for a pattern (in this case simply 'b') inside
# ALL the files returned by a find search.
find . -name "1.txt" -o -name "2.txt" | xargs -I {} grep -q "b" {}
echo $?
123 # Works as expected since 'b' is not present in 1.txt
find . -name "1.txt" -o -name "2.txt" | xargs grep -q "b"
echo $?
0 # Am more puzzled by why the behaviour is inconsistent
The EXIT_STATUS section on the man page says:
xargs exits with the following status:
0 if it succeeds
123 if any invocation of the command exited with status 1-125
124 if the command exited with status 255
125 if the command is killed by a signal
126 if the command cannot be run
127 if the command is not found
1 if some other error occurred.
I would have thought that "123 if any invocation of the command exited with status 1-125" should apply regardless of whether -I is used.
Could you share any insights to explain this conundrum please?
Here is evidence of the effect of -I option with xargs with the help of a wrapper script which shows the number of invocations:
cat ./grep.sh
#!/bin/bash
echo "I am being invoked at $(date +%Y%m%d_%H-%M-%S)"
# Forward all arguments to grep unchanged.
grep "$@"
(the actual command being invoked, in this case grep doesn't really matter)
Now execute the same commands as in the question using the wrapper script instead:
❯ find . -name "1.txt" -o -name "2.txt" | xargs -I {} ./grep.sh -q "b" {}
I am being invoked at 20190410_09-46-29
I am being invoked at 20190410_09-46-30
❯ find . -name "1.txt" -o -name "2.txt" | xargs ./grep.sh -q "b"
I am being invoked at 20190410_09-46-53
I have just discovered a comment on the answer of a similar question that answers this question (complete credit to https://superuser.com/users/49184/daniel-andersson for his wisdom):
https://superuser.com/questions/557203/xargs-i-behaviour#comment678705_557230
Also, unquoted blanks do not terminate input items; instead the separator is the newline character. — this is central to understanding the behavior. Without -I, xargs only sees the input as a single field, since newline is not a field separator. With -I, suddenly newline is a field separator, and thus xargs sees three fields (that it iterates over). That is a real subtle point, but is explained in the man page quoted.
-I replace-str
Replace occurrences of replace-str in the initial-arguments
with names read from standard input. Also, unquoted blanks do
not terminate input items; instead the separator is the
newline character. Implies -x and -L 1.
Based on that,
find . -name "1.txt" -o -name "2.txt"
#returns
# ./1.txt
# ./2.txt
xargs -I {} grep -q "b" {}
# treats each input line as a separate item and, because -I implies -L 1,
# runs grep once per line. This results in TWO invocations of grep, and since one of them fails,
# the overall exit status is 123, as documented in the EXIT_STATUS section.
xargs grep -q "b"
# passes both file names as arguments to a SINGLE grep invocation (grep -q "b" ./1.txt ./2.txt),
# which returns a successful exit code of 0 because the pattern was found in at least one of the files.
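The difference can be reproduced in a throwaway directory. This is a minimal sketch, assuming GNU xargs (the implementation whose man page is quoted above); the variable names are illustrative, and other xargs implementations may report a different non-zero status in the first case.

```shell
# Create the two input files from the question in a temporary directory.
workdir=$(mktemp -d)
printf 'a\n'  > "$workdir/1.txt"
printf 'ab\n' > "$workdir/2.txt"

# With -I: one grep per input line, so the grep on 1.txt fails on its own.
status_per_file=0
find "$workdir" -name '*.txt' | xargs -I {} grep -q "b" {} || status_per_file=$?

# Without -I: a single grep receives both file names and succeeds
# as soon as any one of them contains the pattern.
status_batched=0
find "$workdir" -name '*.txt' | xargs grep -q "b" || status_batched=$?

echo "per-file: $status_per_file, batched: $status_batched"
rm -rf "$workdir"
```

For the original goal of checking that ALL files contain the pattern, the per-line form is the one that behaves as intended, because a single failing grep is enough to make xargs report a non-zero status.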

Linux: Search through sub-folders recursively for a file that contains a string and move it to another file

So far, I have this command on my terminal and it doesn't do anything.
Essentially it's to look for any file that contains the word bango and move it to another directory.
grep -r ".*bango.*" /Users/user/Desktop/drums | xargs mv /Users/user/Desktop/bango
grep has an option, -l, that prints only the names of files containing a match; use it to produce the list of files to move.
xargs can also place each input item at a chosen position in the command it builds, using -I.
Try:
grep -rlE ".*bango.*" /Users/user/Desktop/drums | xargs -I {} mv {} /Users/user/Desktop/bango
The -E option enables extended regular expressions.
However, a regular expression is not needed here; with -F, grep uses a faster algorithm for fixed strings:
grep -rlF "bango" /Users/user/Desktop/drums | xargs -I {} mv {} /Users/user/Desktop/bango
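File names that contain spaces or quotes can trip up a newline-separated list. The following sketch is a variant that separates names with NUL bytes instead; the move_matching function name is hypothetical, and it assumes a grep that supports --null and an xargs that supports -0 (both GNU and BSD versions do).

```shell
# move_matching NEEDLE SRC_DIR DEST_DIR:
# move every file under SRC_DIR whose contents include the fixed string NEEDLE
# into DEST_DIR. DEST_DIR should live outside SRC_DIR.
move_matching() {
  needle=$1
  src=$2
  dest=$3
  mkdir -p "$dest"
  # --null ends each file name with a NUL byte; xargs -0 splits on NUL,
  # so spaces and quotes in file names are passed through unchanged.
  grep -rlF --null -- "$needle" "$src" | xargs -0 -I {} mv -- {} "$dest"
}

# Example call with the paths from the question (commented out on purpose):
# move_matching "bango" /Users/user/Desktop/drums /Users/user/Desktop/bango
```

Note that files with the same base name in different sub-folders would overwrite each other in the destination, so this sketch is only safe when base names are unique.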

How to filter string with "[" and "]" inside using grep?

How to filter anything inside "<torrent:magnetURI><![CDATA[" and "]]></torrent:magnetURI>" so it would output the string "EXAMPLE" using grep?
<torrent:magnetURI><![CDATA[EXAMPLE]]></torrent:magnetURI>
I'm trying to collect all the magnet URLs from a web feed and add them to Transmission.
for url in $(wget -q -O- "http://sample.com/rss.xml" | grep -o '<torrent:magnetURI><![CDATA["[^"]*' | grep -o '[^>]*$'); do
transmission-remote localhost:9091 -a "$url";
done
You can use:
$ grep -Po '(?<=<torrent:magnetURI><!\[CDATA\[)\w*(?=\]\]>)' file
EXAMPLE
Note that it uses a lookbehind and a lookahead, (?<=before)\w*(?=after), and escapes each [ and ]:
(?<=<torrent:magnetURI><!\[CDATA\[)\w*(?=\]\]>)
    ------------------------------ ---   -----
    string to find before           |    string after
                             string matched
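Because \w* only matches letters, digits, and underscores, it would not match a value that contains characters such as : or ?. The sketch below is a variation, not the answer's exact expression: it swaps \w* for a lazy .*? and anchors the lookahead on the full closing tag. The extract_magnets function name and the sample feed line are made up for illustration, and it assumes a grep built with PCRE support (-P).

```shell
# extract_magnets: read XML on stdin and print the text inside each
# <torrent:magnetURI><![CDATA[ ... ]]></torrent:magnetURI> element.
# Requires grep with -P (Perl-compatible regular expressions).
extract_magnets() {
  grep -Po '(?<=<torrent:magnetURI><!\[CDATA\[).*?(?=\]\]></torrent:magnetURI>)'
}

# Made-up feed line for illustration only.
printf '%s\n' '<torrent:magnetURI><![CDATA[magnet:?xt=urn:btih:abc123&dn=example]]></torrent:magnetURI>' \
  | extract_magnets
# prints: magnet:?xt=urn:btih:abc123&dn=example
```

The lazy quantifier stops at the first closing sequence, so several elements on one line are printed as separate matches.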

Determining word count using grep (in cases where there are multiple words in a line)

Is it possible to determine the number of times a particular word appears using grep?
I tried the "-c" option, but it returns the number of matching lines rather than the number of times the word appears.
For example if I have a file with
some words and matchingWord and matchingWord
and then another matchingWord
running grep on this file for "matchingWord" with the "-c" option will only return 2 ...
note: this is the grep command line utility on a standard unix os
grep -o string file will return all matching occurrences of string. You can then do grep -o string file | wc -l to get the count you're looking for.
I think that using grep -i -o string file | wc -l should give you the correct output, what happens when you do grep -i -o string file on the file?
You can also simply count words (-w) with the wc program:
$ echo "foo foo" | grep -o "foo" | wc -w
2
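Putting the grep -o and wc -l combination together on the example file from the question gives a quick way to compare it with grep -c. This is a minimal sketch; the count_occurrences function name and the temporary file are illustrative.

```shell
# count_occurrences WORD FILE: print how many times WORD occurs in FILE,
# counting every occurrence even when several appear on the same line.
count_occurrences() {
  # grep -o prints each match on its own line; wc -l counts those lines.
  # tr strips the padding that some wc implementations add around the number.
  grep -o -- "$1" "$2" | wc -l | tr -d ' '
}

# The two-line example file from the question.
words_file=$(mktemp)
printf '%s\n' \
  'some words and matchingWord and matchingWord' \
  'and then another matchingWord' > "$words_file"

count_occurrences matchingWord "$words_file"   # prints: 3
grep -c -- matchingWord "$words_file"          # prints: 2 (matching lines, not occurrences)
```

Adding -w to the grep call would restrict the count to whole-word matches, so that a longer word containing the search string is not counted.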
