Searching tabs with grep


I have a file that might contain a line like this.
A<Tab>B //A and B separated by a tab
I want to print true to the terminal if the line is found, and false if it isn't.
When I run
grep 'A' 'file.tsv'
it returns the row (not true/false),
but
grep 'A \t B' "File.tsv"
or
grep 'A \\t B' "File.tsv"
or
grep 'A\tB'
or
grep 'A<TAB>B' //pressing the Tab key
doesn't return anything.
How do I search for tab-separated values with grep?
How do I get a boolean value from grep?

Use a literal Tab character, not the \t escape. (In an interactive shell you may need to press Ctrl+V first, because Tab on its own triggers completion.) Also, grep is not Perl 6 (or Perl 5 with the /x modifier); spaces are significant and are matched literally, so even if \t worked, A \t B with the extra spaces around the \t would not match unless those spaces were actually present in the original.
As for the return value, know that a program gives you three different kinds of response: standard output, standard error, and exit code. The exit code is 0 for success and non-zero for failure (for most programs that do matching, 1 means not found and 2 and up mean some kind of error, such as bad usage or an unreadable file). In traditional Unix you redirect the output from grep if you only want the exit code; GNU grep and POSIX-conforming greps also offer the -q option instead, but some older, pre-POSIX implementations lack it. Both traditional and GNU grep allow -s to suppress error messages about missing or unreadable files, but there are some differences in how the two handle it; the most portable form is grep PATTERN FILE >/dev/null 2>&1.
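A minimal sketch of the true/false idea, assuming a POSIX sh and the file name File.tsv from the question. The helper name has_tab_pair is made up for illustration, and the tab is built with printf so that no literal tab has to be typed.

```shell
#!/bin/sh
# Build a literal tab portably (printf understands \t; echo may not).
tab=$(printf '\t')

# Hypothetical helper: print "true" if file $1 contains A<tab>B, else "false".
has_tab_pair() {
  # Only grep's exit status matters: 0 = found, 1 = not found,
  # 2 or more = error (e.g. missing file). Both output streams are discarded.
  if grep "A${tab}B" "$1" >/dev/null 2>&1; then
    echo true
  else
    echo false
  fi
}

printf 'A\tB\n' > File.tsv   # sample data containing a real tab
has_tab_pair File.tsv        # prints: true
```

On a grep that supports -q, grep -q "A${tab}B" File.tsv yields the same exit status without the redirection. Note that a missing file also prints false here, because every non-zero status is treated as "not found".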

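Two methods:
1. Use the -P option (GNU grep; Perl-compatible regexes understand \t):
grep -P 'A\tB' "File.tsv"
2. Press Ctrl+V and then Tab to type a literal tab into the pattern:
grep 'A<Tab>B' "File.tsv"
(where <Tab> stands for the tab character you typed)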

Here's a handy way to create a variable with a literal tab as its value:
TAB=$(printf '\t')
(TAB=`echo -e "\t"` also works in bash, but echo's handling of -e and backslash escapes varies between shells, so printf is the portable choice.)
Then you can use it as follows:
grep "A${TAB}B" File.tsv
This way, no literal tab is required. Note that with this approach you need double quotes (not single quotes) around the pattern string; otherwise the variable reference won't be replaced.
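To see why the quoting matters, this sketch runs the same pattern once in double quotes and once in single quotes against a one-line sample file (File.tsv, as in the question). It builds the tab with printf.

```shell
#!/bin/sh
TAB=$(printf '\t')            # literal tab character
printf 'A\tB\n' > File.tsv    # one line: A<tab>B

# Double quotes: the shell substitutes ${TAB} before grep runs.
double=$(grep -c "A${TAB}B" File.tsv)
# Single quotes: grep receives the characters A${TAB}B unchanged,
# and that text never occurs in the file.
single=$(grep -c 'A${TAB}B' File.tsv)

echo "double-quoted matches: $double"   # 1
echo "single-quoted matches: $single"   # 0
```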

Related

select only a word that is part of colon

I have a text file that uses a markup language (similar to Wikipedia articles):
cat test.txt
This is a sample text having: colon in the text. and there is more [[in single or double: brackets]]. I need to select the first word only.
and second line with no [brackets] colon in it.
I need to select only the word "having:", because it is part of the regular text. I tried
grep -v '[*:*]' test.txt
This correctly skips the tags, but it does not select the expected word.

The square brackets specify a character class, so your regular expression looks for any occurrence of one of the characters * or : (or *, but we said that already, didn't we?)
grep has the option -o to print only the matching text, so something like
grep -ow '[^[:space:]]*:[^[:space:]]*' file.txt
would extract any text with a colon in it, surrounded by zero or more non-whitespace characters on each side. The -w option adds the condition that the match needs to be between word boundaries.
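Running that command on the sample text from the question shows its limitation: it knows nothing about brackets, so the colon word inside [[...]] is extracted too. A sketch, assuming GNU or BSD grep (-o and -w are not required by POSIX), with the sample written to test.txt:

```shell
#!/bin/sh
# Sample input copied from the question.
cat > test.txt <<'EOF'
This is a sample text having: colon in the text. and there is more [[in single or double: brackets]]. I need to select the first word only.
and second line with no [brackets] colon in it.
EOF

# Every whitespace-delimited token containing a colon, bracketed or not.
grep -ow '[^[:space:]]*:[^[:space:]]*' test.txt
# Output:
# having:
# double:
```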
However, if you want to restrict in which context you want to match the text, you will probably need to switch to a more capable tool than plain grep. For example, you could use sed to preprocess each line to remove any bracketed text, and then look for matches in the remaining text.
sed -e 's/\[.*]//g' -e 's/ [^: ]*$/ /' -e 's/[^: ]* //g' -e 's/ /\n/' file.txt
(This assumes that your sed recognizes \n in the replacement string as a literal newline. There are simple workarounds available if it doesn't, but let's not go there if it's not necessary.)
In brief, we first replace any text between square brackets. (This needs to be improved if your input could contain multiple sequences of square brackets on a line with normal text between them. Your example only shows nested square brackets, but my approach is probably too simple for either case.) Then, we remove any words which don't contain a colon, with a special provision for the last word on the line, and some subsequent cleanup. Finally, we replace any remaining spaces with newlines, and (implicitly) print whatever is left. (This still ends up printing one newline too many, but that is easy to fix up later.)
Alternatively, we could use sed to remove any bracketed expressions, then use grep on the remaining tokens.
sed -e :a -e 's/\[[^][]*\]//' -e ta file.txt |
grep -ow '[^[:space:]]*:[^[:space:]]*'
The :a creates a label a and ta says to jump back to that label and try again if the regex matched. This one also demonstrates how to handle nested and repeated brackets. (I suppose it could be refactored into the previous attempt, so we could avoid the pipe to grep. But outlining different solution models is also useful here, I suppose.)
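The same pipeline can be checked against the question's sample input: once the bracketed parts are gone, only having: is left. This sketch relies on the label and the branch being given as separate -e arguments, which POSIX sed allows, and on grep -o/-w as above.

```shell
#!/bin/sh
cat > test.txt <<'EOF'
This is a sample text having: colon in the text. and there is more [[in single or double: brackets]]. I need to select the first word only.
and second line with no [brackets] colon in it.
EOF

# Repeatedly delete an innermost [...] group (one with no brackets inside)
# until none is left, then extract the colon-containing words.
sed -e :a -e 's/\[[^][]*\]//' -e ta test.txt |
  grep -ow '[^[:space:]]*:[^[:space:]]*'
# Output:
# having:
```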
If you wanted to ensure that there is at least one non-colon character adjacent to the colon, you could do something like
... file.txt |
grep -owE '[^:[:space:]]+:[^[:space:]]*|[^[:space:]]*:[^: [:space:]]+'
where the -E option selects a slightly more modern regex dialect which allows us to use | between alternatives and + for one or more repetitions. (The original grep from the early 1970s did not have these features at all; much later, GNU grep grafted them onto basic regular expressions with a slightly wacky syntax which requires you to backslash them (\| and \+) to remove the literal meaning and select the metacharacter behavior, while POSIX basic regular expressions do not define them at all... but let's not go there.)
Notice also how [^:[:space:]] matches a single character which is not a colon or a whitespace character, where [:space:] is the (slightly arcane) special POSIX named character class which matches any whitespace character (regular space, horizontal tab, vertical tab, possibly Unicode whitespace characters, depending on locale).
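A quick check of the stricter pattern on a made-up line (x: : y :z), assuming GNU grep for -o and -w: the tokens with a neighbouring non-colon character are kept and the lone colon is dropped.

```shell
#!/bin/sh
# Made-up input: "x:" and ":z" have a non-colon neighbour, ":" does not.
printf 'x: : y :z\n' |
  grep -owE '[^:[:space:]]+:[^[:space:]]*|[^[:space:]]*:[^: [:space:]]+'
# Output:
# x:
# :z
```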
Awk easily lets you iterate over the tokens on a line. The requirement to ignore matches within square brackets complicates matters somewhat; you could keep a separate variable to keep track of whether you are inside brackets or not.
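awk '{ for(i=1; i<=NF; ++i) {
  if($i ~ /\]/) { brackets=0; continue }
  if($i ~ /\[/) brackets=1
  if(brackets) continue
  if($i ~ /:/) print $i } }' file.txt
(Inside the loop, continue moves on to the next token; next would abandon the rest of the line.)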
This again hard-codes some perhaps incorrect assumptions about how the brackets can be placed. It will behave unexpectedly if a single token contains a closing square bracket followed by an opening one, and has an oversimplified treatment of nested brackets (the first closing bracket after a series of opening brackets will effectively assume we are no longer inside brackets).

A combined solution using sed and awk:
sed 's/ /\n/g' test.txt | gawk 'i==0 && $0~/:$/{ print $0 }/\[/{ i++} /\]/ {i--}'
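sed replaces every space with a newline, so each token ends up on its own line (this relies on a sed, such as GNU sed, that understands \n in the replacement).
awk (or gawk) prints every line matching $0~/:$/ (a token ending in a colon), as long as i equals zero.
The last part of the awk program keeps count of the opening and closing brackets.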
Another solution using sed and grep:
sed -r -e 's/\[.*\]+//g' -e 's/ /\n/g' test.txt | grep ':$'
's/\[.*\]+//g' removes everything from the first [ to the last ] on the line (the .* is greedy)
's/ /\n/g' will replace a space with a newline
grep will only find lines ending with :
A third one, using only awk:
gawk '{ for (t=1;t<=NF;t++){
if(i==0 && $t~/:$/) print $t;
i=i+gsub(/\[/,"",$t)-gsub(/\]/,"",$t) }}' test.txt
gsub returns the number of replacements.
The variable i is used to count the level of brackets. On every [ it is incremented by 1, and on every ] it is decremented by 1. This works because gsub(/\[/,"",$t) returns the number of replaced characters: for a token like [[][ the count increases by (3-1=) 2. When a token contains brackets AND ends with a colon, my code will fail, because the check for a trailing : happens before the bracket count for that token is updated.
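The counting can be checked on its own. This sketch uses only POSIX awk features (no gawk needed); the function names depth_change and colon_words are illustrative. It prints the depth change for the token [[][ and then reproduces the failure case where a token both opens a bracket and ends with a colon.

```shell
#!/bin/sh
# gsub() returns the number of substitutions it made, so removed "["
# minus removed "]" is the change in bracket depth for the input.
depth_change() {
  printf '%s\n' "$1" | awk '{ print gsub(/\[/, "") - gsub(/\]/, "") }'
}

# The one-pass awk solution from the answer, wrapped in a function.
colon_words() {
  awk '{ for (t = 1; t <= NF; t++) {
           if (i == 0 && $t ~ /:$/) print $t
           i = i + gsub(/\[/, "", $t) - gsub(/\]/, "", $t) } }'
}

depth_change '[[]['                       # prints 2
printf 'x [foo: bar]\n' | colon_words     # prints [foo: (the known failure case)
```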

Get content inside brackets using grep

I have text that looks like this:
Name (OneData) [113C188D-5F70-44FE-A709-A07A5289B75D] (MoreData)
I want to use grep or some other way to get the ID inside [].
How to do it?

You can do something like this via bash (GNU grep required):
t="Name (OneData) [113C188D-5F70-44FE-A709-A07A5289B75D] (MoreData)"
echo "$t" | grep -Po "(?<=\[).*(?=\])"
The pattern will give you everything between the brackets, and uses a zero-width look-behind assertion (?<= ...) to eliminate the opening bracket and uses a zero-width look-ahead assertion (?= ...) to eliminate the closing bracket.
The -P flag activates Perl-compatible regular expressions, which support these lookaround assertions (and mean less escaping). The -o flag prints only the matched part of the line; since the lookarounds are zero-width, the brackets themselves are not part of that match.
If you don't have GNU grep available, you can solve the problem in two steps (there are probably also other solutions):
Get the ID with the brackets (\[.*\])
Remove the brackets (] and [, here via sed, for example)
echo "$t" | grep -o "\[.*\]" | sed 's/[][]//g'
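Without grep -P, both fallbacks can be tried on the example line. This sketch assumes a grep with -o (GNU and BSD grep have it, although POSIX does not require it).

```shell
#!/bin/sh
t="Name (OneData) [113C188D-5F70-44FE-A709-A07A5289B75D] (MoreData)"

# Two steps: keep "[...]" with grep -o, then delete both bracket characters.
two_step=$(printf '%s\n' "$t" | grep -o '\[.*\]' | sed 's/[][]//g')

# Fixed shape: exactly 36 characters from 0-9, A-F and "-".
by_shape=$(printf '%s\n' "$t" | grep -oE '[0-9A-F-]{36}')

echo "$two_step"   # 113C188D-5F70-44FE-A709-A07A5289B75D
echo "$by_shape"   # 113C188D-5F70-44FE-A709-A07A5289B75D
```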
As Cyrus commented, you can also use grep -oE '[0-9A-F-]{36}', provided that every ID is exactly 36 characters long and no other run of 36 or more characters made up only of 0-9, A-F and - can occur in the input. Then you can simply ignore the brackets.

How to use grep to search for an exact word match in TextWrangler

TextWrangler lets you search using grep patterns.
I want to find and replace the following word: bauvol, but not bauvolumen.
I tried typing ^bauvol$ into the search field but that didn't do the trick, it didn't find anything, although the word is clearly there.
I think it's because in grep, ^ and $ signify the start and end of a line, not of a word?!

You want to use \b for word boundaries, as @gromi08 said:
\bbauvol\b
If you want to reuse the matched word (so you can replace it, modify it, change its case, etc.), it is usually best to wrap it in ( and ) parentheses so you can reference it as \1 in the Replace box:
Find:
(\bbauvol\b)
Replace:
<some_tag>\1</some_tag>
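For comparison, a sketch of the same find/replace outside TextWrangler, assuming GNU sed, where \b works as a word boundary (a GNU extension, not POSIX). The function name tag_word is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: wrap every standalone "bauvol" in <some_tag>...</some_tag>.
# \b marks a word boundary (GNU sed); \1 refers back to the ( ) group.
tag_word() {
  sed -E 's|\b(bauvol)\b|<some_tag>\1</some_tag>|g'
}

printf 'bauvol bauvolumen bauvol.\n' | tag_word
# Output:
# <some_tag>bauvol</some_tag> bauvolumen <some_tag>bauvol</some_tag>.
```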
Did you have anything specific you were trying to do with the result once you found it (cut it, duplicate it, etc.)?

Use the -w option of grep (see the grep man page).
This option searches for the expression as a whole word.
Therefore the command will be:
grep -w bauvol file.txt
And yes, ^ and $ are for start and end of line.
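A small sketch of the difference -w makes, on a made-up file.txt (-w is supported by GNU and BSD grep but is not required by POSIX):

```shell
#!/bin/sh
printf '%s\n' 'bauvol' 'bauvolumen' 'neues bauvol.' > file.txt

plain=$(grep -c bauvol file.txt)    # substring match: all 3 lines
whole=$(grep -cw bauvol file.txt)   # whole word only: 2 lines ("." is not a word character)

echo "plain: $plain, whole word: $whole"
```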

Why use single quotes and \ in the patterns of a grep command?

In a book I have seen this grep command example:
$ grep '^no\(fork\|group\)' /etc/group
I need an explanation of why single quotes are used around the pattern, and why there is a \ before the characters (, | and ).

The advantage of using single quotes with grep is that you do not need to escape double quotes when you need to grep for them. For example, if you wanted to search for "findthis" (including the quotes) with grep, using single quotes, it would look like this:
grep '"findthis"' yourfile.txt
If you were using double quotes you would need to escape the quotes with a \, so it would look like this:
grep "\"findthis\"" yourfile.txt
The reason a backslash is needed before certain characters is that they have special meanings. The double quote is special to the shell, not to grep: the shell uses quotes to find the beginning and end of an argument (among other things) and removes them before grep ever runs. That means you cannot put a " inside a double-quoted string unless there is some way around this. The solution is to place a \ before the " like so: \". If you do that, the shell knows that you actually want to pass a " to grep rather than end the string.
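A small sketch of that point: the shell strips the outer quotes and the backslashes, so grep receives the same pattern either way. The sample file yourfile.txt is made up.

```shell
#!/bin/sh
printf '%s\n' '"findthis"' 'findthis' > yourfile.txt   # with and without quotes

single=$(grep -c '"findthis"' yourfile.txt)     # single-quoted pattern
double=$(grep -c "\"findthis\"" yourfile.txt)   # double-quoted, escaped

# Show the exact argument the shell hands to the command in each case.
printf 'arg: [%s]\n' '"findthis"' "\"findthis\""
echo "single: $single, double: $double"          # both counts are 1
```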

Quoting the arguments of a command is always recommended. Single quotes prevent variable expansion; in your example, it makes no difference whether you use single or double quotes.
Take an example:
kent$ cat f
foo
bar
ooo
Without quotes:
kent$ grep foo|bar f
zsh: correct 'bar' to 'bzr' [nyae]? n
zsh: command not found: bar
You see, my zsh thought you wanted to pipe the output to a command "bar".
Now, why escape |:
Assuming your grep is not an alias, grep uses BRE (basic regular expressions) by default. In BRE you need to escape some characters to give them their special meaning, and | is one of them (\| as alternation in a BRE is a GNU extension).
You can, however, make grep work in ERE or PCRE mode with the -E or -P option; then you don't need to escape those characters any longer:
kent$ grep -E 'foo|bar' f
foo
bar
In ERE or PCRE, you escape a special character to take its special meaning away.
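A sketch comparing the two dialects on the f file above, plus the book's pattern run on made-up /etc/group-style lines. It assumes GNU grep: \( \) grouping is POSIX BRE, but \| inside a basic regex is a GNU extension.

```shell
#!/bin/sh
printf 'foo\nbar\nooo\n' > f

bre=$(grep 'foo\|bar' f)         # GNU BRE: \| means "or"
ere=$(grep -E 'foo|bar' f)       # ERE: a plain | means "or"
literal=$(grep -c 'foo|bar' f)   # BRE: a plain | is an ordinary character

# The pattern from the question, on made-up group lines.
groups=$(printf 'nofork:x:1:\nnogroup:x:65534:\nnobody:x:2:\n' |
  grep '^no\(fork\|group\)')

printf '%s\n' "$bre" "$ere" "literal: $literal" "$groups"
```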

How to filter using grep on a selected word

grep (GNU grep) 2.14
Hello,
I have a log file that I want to filter on a selected word. However, it tends to match more than I want. For example:
tail -f gateway-* | grep "P_SIP:N_iptB1T1"
This will also find words like this:
"P_SIP:N_iptB1T10"
"P_SIP:N_iptB1T11"
"P_SIP:N_iptB1T12"
etc
However, I don't want to match anything that continues after the 1; grep is also picking up T10, T11, T12, etc.
Many thanks for any suggestions,

You can restrict the word to end at 1 with the end-of-word anchor \> (supported by GNU grep, not required by POSIX):
tail -f gateway-* | grep "P_SIP:N_iptB1T1\>"
This works as long as the lines you want contain "P_SIP:N_iptB1T1" as a complete word.
But if you want to extract it from something like P_SIP:N_iptB1T1x and display it only once, then you need to restrict the output to the first match.

grep -o "P_SIP:N_iptB1T1"
-o, --only-matching show only the part of a line matching PATTERN

At least two approaches can be tried:
grep -w pattern matches only whole words. It seems to work in this case too, even though the pattern contains punctuation.
grep -m 1 pattern restricts the output to the first match. (This is also doable with grep pattern | head -1.)
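A sketch of the options discussed in this thread on made-up log lines, assuming GNU grep (\>, -w and -m are not POSIX). The file name sample.log is illustrative.

```shell
#!/bin/sh
# Made-up log lines: only the T1 lines (first and last) are wanted.
printf '%s\n' 'call P_SIP:N_iptB1T1 up' 'call P_SIP:N_iptB1T10 up' \
  'call P_SIP:N_iptB1T11 down' 'call P_SIP:N_iptB1T1 down' > sample.log

grep -w 'P_SIP:N_iptB1T1' sample.log        # whole-word match: lines 1 and 4
grep 'P_SIP:N_iptB1T1\>' sample.log         # end-of-word anchor: lines 1 and 4
grep -w -m 1 'P_SIP:N_iptB1T1' sample.log   # stops after the first match: line 1
```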

If the lines contain the quotes as in your example, just use the -E option in grep and match the closing quote with \". For example:
grep -E "P_SIP:N_iptB1T1\"" file
If these quotes aren't in the text file, and there are blanks or the end of the line after the word, you can match those too:
# The word is followed by one or more blanks (\s is a GNU extension)
grep -E "P_SIP:N_iptB1T1\s+" file
# Match lines ending with the interesting word
grep -E "P_SIP:N_iptB1T1$" file
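The two cases can also be combined into one pattern. This sketch swaps GNU's \s for the POSIX class [[:space:]] and uses an alternation with $; the sample file contents are made up.

```shell
#!/bin/sh
# Made-up lines: the word before a blank, with a longer suffix, and at the end.
printf '%s\n' 'P_SIP:N_iptB1T1 up' 'P_SIP:N_iptB1T10 up' 'up P_SIP:N_iptB1T1' > file

# Accept the word when it is followed by whitespace or ends the line.
grep -E 'P_SIP:N_iptB1T1([[:space:]]|$)' file
# Output:
# P_SIP:N_iptB1T1 up
# up P_SIP:N_iptB1T1
```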
