Grepping list of phpass hashes against a file - grep

I'm trying to grep multiple strings which look like this (there's a few hundred) against a file which contains data:string
Example strings: (no sensitive data is provided, they have been modified).
$H$9a...DcuCqC/rMVmfiFNm2rqhK5vFW1
$H$9n...AHZAV.sTefg8ap8qI8U4A5fY91
$H$9o...Bi6Z3E04x6ev1ZCz0hItSh2JJ/
$H$9w...CFva1ddp8IRBkgwww3COVLf/K1
I've been researching how to grep a file of patterns against another file, and came across the following commands
grep -f strings.txt datastring.txt > output.txt
grep -Ff strings.txt datastring.txt > output.txt
But unfortunately, these commands do NOT work successfully, and only print out a handful of results to my output file. I think it may be something to do with the symbols contained in strings.txt, but I'm unsure. Any help/advice would be great.
To further mention, I'm using Cygwin on Windows (if this is relevant).
Here's an updated example:
strings.txt contains the following:
$H$9a...DcuCqC/rMVmfiFNm2rqhK5vFW1
$H$9n...AHZAV.sTefg8ap8qI8U4A5fY91
$H$9o...Bi6Z3E04x6ev1ZCz0hItSh2JJ/
$H$9w...CFva1ddp8IRBkgwww3COVLf/K1
datastring.txt contains the following:
$H$9a...DcuCqC/rMVmfiFNm2rqhK5vFW1:53491
$H$9n...AHZAV.sTefg8ap8qI8U4A5fY91:03221
$H$9o...Bi6Z3E04x6ev1ZCz0hItSh2JJ/:20521
$H$9w...CFva1ddp8IRBkgwww3COVLf/K1:30142
So technically, all lines should be included in the OUTPUT file, but only this line is outputted:
$H$9w...CFva1ddp8IRBkgwww3COVLf/K1:30142
I just don't understand.

You have shown the output of cat -A strings.txt elsewhere, and it includes ^M, which represents a CR (carriage return) character, at the end of each line.
This indicates your file has Windows line endings (CR LF) instead of the Unix line endings (only LF) that grep would expect, so the CR becomes part of every pattern and the pattern no longer matches the data.
You can convert files with dos2unix strings.txt and back with unix2dos strings.txt.
Alternatively, if you don't have dos2unix installed in your Cygwin environment, you can also do that with sed.
sed -i 's/\r$//' strings.txt # dos2unix
sed -i 's/$/\r/' strings.txt # unix2dos
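As a rough end-to-end sketch of the diagnosis above, the snippet below recreates a pattern file with CR LF endings, shows that grep -Ff finds nothing, strips the carriage returns with tr, and tries again. The temporary directory, the file name strings.unix.txt, and the shortened sample hashes are made up for illustration.

```shell
#!/bin/sh
# Work in a throwaway directory (hypothetical file contents, for illustration only).
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Pattern file with Windows (CR LF) line endings.
printf '$H$9aXX\r\n$H$9nYY\r\n' > strings.txt
# Data file with Unix (LF) line endings.
printf '$H$9aXX:53491\n$H$9nYY:03221\n' > datastring.txt

# With the CRs present, each pattern is really "hash<CR>", which never occurs in the data.
before=$(grep -cFf strings.txt datastring.txt)

# Strip the carriage returns into a cleaned copy and try again.
tr -d '\r' < strings.txt > strings.unix.txt
after=$(grep -cFf strings.unix.txt datastring.txt)

echo "matches before: $before, after: $after"
```

Writing the cleaned patterns to a separate file leaves the original strings.txt untouched, which may matter if it has to stay in Windows format for other tools.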

Related

How to grep a matching filename AND extension from pattern file to a text file?

Content of testfile.txt
/path1/abc.txt
/path2/abc.txt.1
/path3/abc.txt123
Content of pattern.txt
abc.txt$
Bash Command
grep -i -f pattern.txt testfile.txt
Output:
/path1/abc.txt
This is a working solution, but currently the $ in the pattern is manually added to each line and this edited pattern file is uploaded to users. I am trying to avoid the manual amendment.
An alternative would be to loop over the pattern file line by line, but that requires scripting skills or uploading scripts to the users' environment.
I want to keep the original pattern files in an audited environment, where users just log in and run simple cut-and-paste commands.
Is there a one-liner solution?
You can use sed to add $ to pattern.txt and then use grep, but you might run into issues due to regexp metacharacters like the . character. For example, abc.txt$ will also match abc1txt. And unless you take care of matching only the basename from the file path, abc.txt$ will also match /some/path/foobazabc.txt.
I'd suggest to use awk instead:
$ awk '!f{a[$0]; next} $NF in a' pattern.txt f=1 FS='/' testfile.txt
/path1/abc.txt
pattern.txt f=1 FS='/' testfile.txt here a flag f is set between the two files and field separator is also changed to / for the second file
!f{a[$0]; next} if flag f is not set (i.e. for the first file), build an array a with line contents as the key
$NF in a for the second file, if the last field matches a key in array a, print the line
Just noticed that you are also using -i option, so use this for case insensitive matching:
awk '!f{a[tolower($0)]; next} tolower($NF) in a'
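For reference, here is a self-contained sketch of the case-insensitive variant with the file arguments filled in. It assumes pattern.txt holds plain file names without a trailing $ (awk compares them literally, not as regular expressions); the sample contents and the extra /path4 line are invented for the demonstration.

```shell
#!/bin/sh
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Hypothetical inputs: one upper-case pattern, four candidate paths.
printf 'ABC.txt\n' > pattern.txt
printf '/path1/abc.txt\n/path2/abc.txt.1\n/path3/abc.txt123\n/path4/xabc.txt\n' > testfile.txt

# First file: store each lower-cased pattern line as an array key.
# Second file (after f=1 and FS='/' take effect): print lines whose last
# "/"-separated field, lower-cased, is exactly one of those keys.
result=$(awk '!f{a[tolower($0)]; next} tolower($NF) in a' pattern.txt f=1 FS='/' testfile.txt)
echo "$result"
```

Only /path1/abc.txt is printed: the comparison is against the whole last path component, so neither abc.txt.1 nor xabc.txt qualifies.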
Since pattern.txt contains only a single pattern, and you don't want to change it, since it is an audited file, you could do
grep -i -e "$(<pattern.txt)"'$' testfile.txt
instead. Note that this would break, if the maintainer of the file one day decided to actually write there a terminating $.
IMO, it would make more sense to explain to the maintainer of pattern.txt that he is supposed to place there a simple regular expression, which is going to match your testfile. In this case s/he can decide whether the pattern really should match only the right edge or some inner part of the lines.
If pattern.txt contains more than one line, and you want to add the $ to each line, you can likewise do a
grep -i -f <(sed 's/$/$/' <pattern.txt) testfile.txt
Since the '$' symbol marks the end of the pattern, the following script should work.
#!/bin/bash
file_pattern='pattern.txt' # path to pattern file
file_test='testfile.txt' # path to test file
while IFS=$ read -r line
do
echo "$line"
grep -wn "$line" "$file_test"
done < "$file_pattern"
You can remove the IFS descriptor if the pattern file comes with leading/trailing spaces.
Also, the grep option -w matches only whole words, and -n prints the line number of each match.

Using the grep command to get a specific word [LINUX]

I have a test.txt file with links for example:
google.com?test=
google.com?hello=
and this code
xargs -0 -n1 -a FUZZvul.txt -d '\n' -P 20 -I % curl -ks1L '%/?=DarkLotus' | grep -a 'DarkLotus'
When I put a specific word, such as DarkLotus, into the command, it requests each link in the file and prints the lines of the responses in which that word is reflected.
That part works. The problem is that I have many links, and when the results appear in the terminal I cannot tell which site reflected the word DarkLotus.
How can I do that?
Try -n option. It shows the line number of file with the matched line.
Best Regards,
Haridas.
I'm not sure what you are up to there, but can you invert it? grep by default prints matching lines. The problem here is you are piping the input from the stdout of the previous commands into grep, and that can lack context at grep. Since you have a file to work with:
$ grep 'DarkLotus' FUZZvul.txt
If your intention is to also follow the link then it might be easier to write a bash script:
#!/bin/bash
for line in $(grep 'DarkLotus' FUZZvul.txt)
do
    link="${line}"  # placeholder: extract the link from the line here
    echo "${link}"
    curl -ks1L "${link}"
done
Then you could make your script accept user input:
#!/bin/bash
word="${1}"
for line in $(grep "${word}" FUZZvul.txt)
...
and then
$ my_link_getter "DarkLotus"
https://google?somearg=DarkLotus
...
And then you could make the txt file a parameter.
etc.
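One possible way to see which URL reflected the word is to test each response separately and print the URL instead of the matching line. The sketch below is only an illustration: the function names fetch and report_reflections are made up, it assumes every line of the file is a base URL to which the word is appended directly, and it runs sequentially rather than with the 20 parallel jobs of the original xargs command.

```shell
#!/bin/sh
# fetch URL: print the response body, using the curl flags from the question.
fetch() {
    curl -ks1L "$1"
}

# report_reflections FILE WORD
# For every non-empty line (base URL) in FILE, request "<line><WORD>" and
# print that URL only when WORD appears in the response.
report_reflections() {
    while IFS= read -r url || [ -n "$url" ]; do
        [ -n "$url" ] || continue
        if fetch "${url}$2" < /dev/null | grep -aq -- "$2"; then
            printf '%s\n' "${url}$2"
        fi
    done < "$1"
}

# Usage (performs real network requests):
#   report_reflections FUZZvul.txt DarkLotus
```

Because grep -q only reports whether a match exists, the loop can print the URL itself, which is the missing context in the piped version.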

Output file much larger than input files after cat + grep

I have 18 csv files, all between 1mb and 14mb. The sum of all files is 64mb. I want to create a new csv file that contains a subset of those files-- only the lines featuring the pattern "Hello" (or "HELLO", or "hello" ...). Here's what I'm doing
cat *.csv | head -n 1 > new.csv # I want to create a header first
cat *.csv | grep -i "hello" >> new.csv
I'm running Debian on WSL. The output file is much, much larger than the original 64mb (I stopped the process after 1+ hour, and the file was 300+ GB).
How can a subset of a text file be larger than the original files? Does it have anything to do with WSL?
This is not an OS issue. When you redirect your output to new.csv, the shell creates that file first, before the glob expression *.csv is evaluated. That means the expansion of *.csv includes new.csv as well, so cat ends up reading the very file that grep is appending to. That is most likely the root cause of the runaway growth you are seeing.
You are reading all the files twice, which is not necessary. You can make your operation a lot simpler and efficient with a single awk command:
awk 'NR==1 {print} tolower($0) ~ /hello/ {print}' *.csv > csv.new
mv csv.new new.csv
since the output file is named csv.new it won't interfere with the glob *.csv
NR==1 picks up the first line (header) from the very first file
The awk command can be written more succinctly as:
awk 'NR==1 || tolower($0) ~ /hello/' *.csv > csv.new
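To make the effect of the file name choice concrete, here is a small self-contained sketch. The two CSV files and their contents are invented, and new.csv.tmp is just one example of a name that *.csv cannot match.

```shell
#!/bin/sh
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Two tiny, made-up input files sharing the same header.
printf 'id,text\n1,Hello world\n2,bye\n' > a.csv
printf 'id,text\n3,HELLO again\n4,nothing\n' > b.csv

# Write to a name that *.csv cannot match, then rename once awk has finished.
awk 'NR==1 || tolower($0) ~ /hello/' *.csv > new.csv.tmp
mv new.csv.tmp new.csv

result=$(cat new.csv)
echo "$result"
```

The result holds the header once, followed by the two lines containing "hello" in any case. Running the same commands again in that directory would pick up new.csv as an input, so the output file is better kept outside the directory (or the glob made more specific) if the job is repeated.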
You are using *.csv and redirecting the output to new.csv, which itself matches *.csv; that causes the recursion in the grep result. Perhaps you can try the following with GNU grep (-h stops grep from prefixing each line with its file name):
grep -h -i hello *.csv --exclude="new.csv" >> new.csv

(bash) grep -i not making search case insensitive for input files

I am trying to search inside a folder containing several files. The name of the files is written in upper case with a .sub extension in lower case:
AAA.sub
BBB.sub
CCC.sub
DDD.sub
I am searching for a pattern through those files using grep; however, I would like to use only lower-case letters for the input file names.
In the man page for grep it is written:
-i, --ignore-case
Ignore case distinctions in both the PATTERN and the input files. (-i is specified by POSIX.)
So, if I understood properly:
grep -i subckt /schematics/aaa
and
grep -i subckt /schematics/AAA
are both supposed to search for the pattern "subckt" in the file "aaa" regardless of its case (AAA or aaa), and if two files named aaa and AAA are present at the same time in the folder, I expect grep to search through both of them.
However, when I try my search with the first command (lower case), it does not work and gives me a "no such file or directory" message.
When I try the second command (upper case), it works properly.
I obviously misunderstood something about how the -i option of grep works. Can anyone explain?
Is it possible to be case insensitive with the input files when using grep?
EDIT:
My question was lacking details. Even though I have found the answer to my problem, I will add the details in case someone else stumbles upon this:
I have one file that contains a list of each file name i want to grep. My list looks like this:
aaa capacitor C_0
bbb capacitor C_0
ccc resistor R_in
...
The grep is done inside a Perl script. The Perl script parses the list file and gets each individual file name (aaa bbb ccc) inside a while loop.
However, the names inside the list file are written in lower case, whereas the names of the files I want to grep are written in upper case.
This is why I wanted the input file lookup to be case insensitive, so that I could directly do a grep -i subckt aaa and it would search inside the file 'AAA'.
However, since the grep is launched from a Perl script, and since it is apparently not possible to have grep behave like that, I used the uc() function of Perl to convert aaa to AAA and do my grep with it (see my answer below).
-i affects how the contents are searched, not the name of the files.
When the man page says "Ignore case distinctions in both the PATTERN and the input files." that really means that case is ignored in the pattern ( searching for AAA and aaa are equivalent) and the contents of the input files (a line would match if it includes "AAA" or "aaa" or even "AaA")
I think you want to either list all the filenames on the command line, or find a glob (i.e. wildcard) that matches all the filenames:
grep -i subckt *.sub
In Unix/Linux shells (bash, zsh, and so on) "*" is processed by the shell (bash) not the command (grep). The command receives the list of files and actually can't tell the difference between whether a user typed "grep foo *" and "grep foo file1 file2 file3" (if the directory includes those 3 files)
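Please try the following command:
find . -iname aaa.sub -exec grep -n subckt {} +
find with the -iname option lists files while ignoring the case of their names. In the above example, find . -iname aaa.sub lists both aaa.sub and AAA.sub, and -exec passes those file names to grep as arguments. (Piping the output of find straight into grep would search the list of names rather than the contents of the files.)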
I have found a way to circumvent my problem by using the uc (upper case) function of perl to convert the input files for the grep function into upper case.
The grep command was launched from a perl script in the first place:
grep -i subckt /schematics/aaa
So I just did this in my Perl script:
my $tmp = "aaa";
$tmp = uc($tmp);
system("grep", "-i", "subckt", "/schematics/$tmp");  # or however the script launches grep
Now, the "aaa" name is just an example. In the Perl script it is recovered from another parsed file that is written in lower case.
Thanks for the answers, though.
grep uses the filenames as they are listed on the command line. The -i option affects the contents of the files, not the names of the files.
You can use find to select filenames to be searched. The -iname option lets you match files ignoring case.
grep subckt $(find /schematics -iname aaa.sub -print)
If you have many filenames, or those filenames include spaces or other characters that would confuse the shell, the safe and secure way to do this is using the -print0 and -0 options:
find /schematics -iname aaa.sub -print0 | xargs -r -0 grep -i subckt
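The following sketch shows the same idea on a throwaway directory, using find's -exec ... {} + form as an alternative to -print0 with xargs. The directory layout, the file contents, and the variable name are assumptions made for the example.

```shell
#!/bin/sh
tmp=$(mktemp -d)
mkdir "$tmp/schematics"
# A made-up netlist stored under an upper-case file name.
printf '.SUBCKT amp in out\n.ends\n' > "$tmp/schematics/AAA.sub"

name=aaa   # lower-case name, as it would come from the list file

# -iname matches the file name without regard to case; -exec ... {} + hands
# the names found to grep directly, so spaces in names are not a problem.
# -i ignores case in the file contents, -H prints the file name with each match.
result=$(find "$tmp/schematics" -iname "$name.sub" -exec grep -iH subckt {} +)
echo "$result"
```

Here -iname takes care of the file name and -i takes care of the contents, which keeps the two kinds of case-insensitivity clearly separate.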

grepping a not matching pattern with a pattern file and data from a pipe

I have an ignore.txt file:
cat ignore.txt
clint
when I do:
pip freeze | grep -v -f ignore.txt
I get:
GitPython==0.3.2.RC1
Markdown==2.2.1
async==0.6.1
clint==0.3.1
gitdb==0.5.4
legit==0.1.1
push-to-wordpress==0.1
python-wordpress-xmlrpc==2.2
smmap==0.8.2
but when I do:
pip freeze | grep -v clint
I do get the correct output:
GitPython==0.3.2.RC1
Markdown==2.2.1
async==0.6.1
gitdb==0.5.4
legit==0.1.1
push-to-wordpress==0.1
python-wordpress-xmlrpc==2.2
smmap==0.8.2
How can I achieve that with grep and command line tools?
Clarification edit: I use Windows with Cygwin, so I believe this is GNU grep 2.6.3 (from grep --version).
Your syntax looks correct and works on my system.
There may be a problem with your ignore.txt file.
In particular, check that:
there are no leading or trailing spaces, tabs and the like around the word you are trying to filter (as suggested by Kent above)
the file has Unix line endings
the file is terminated by a single newline
About the latter, the Single Unix Specification says:
Patterns in pattern_file shall be terminated by a <newline>.
Which means that a file with no terminator, or with a different terminator (e.g. CR LF), might behave unexpectedly (though that might be system-dependent).
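The checks listed above can be run from the command line. The sketch below is one hedged way to do so: freeze.txt stands in for the output of pip freeze, the deliberately broken ignore.txt (a CR and no final newline) is constructed for the demonstration, and ignore.clean.txt is a made-up name for the normalised copy.

```shell
#!/bin/sh
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Stand-in for `pip freeze` output (a hypothetical subset of the question's list).
printf 'async==0.6.1\nclint==0.3.1\ngitdb==0.5.4\n' > freeze.txt
# An ignore file with two of the problems listed above: a CR and no final newline.
printf 'clint\r' > ignore.txt

# 1. Count lines ending in whitespace (a CR counts as whitespace here).
trailing_ws=$(grep -c '[[:space:]]$' ignore.txt)
# 2. Is the last byte a newline? wc -l prints 1 if so, 0 if not.
final_newline=$(tail -c 1 ignore.txt | wc -l | tr -d ' ')
echo "lines with trailing whitespace: $trailing_ws, final newline: $final_newline"

# Normalise into a copy: strip trailing blanks and CRs; awk's print always
# terminates each line with a newline.
awk '{ sub(/[ \t\r]+$/, ""); print }' ignore.txt > ignore.clean.txt

result=$(grep -v -f ignore.clean.txt freeze.txt)
echo "$result"
```

With the cleaned copy, the clint line is filtered out and the other two lines remain.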
