Using grep to parse non-reg directories

I have a non-reg directory in which every performed test has its own sub-directory named tc_TestName. Each of these subdirectories contains a log file called sim.log. I want to verify whether a test PASSED by searching for the string "TEST PASSED". So far I have run the command
grep -rni --include "sim.log" "TEST PASSED" >& test_passed.txt
I am perfectly happy with the result, but since the sim.log files can be very big and I have more than 1000 tests, it takes quite a while to run.
Is there any way to search for the string "TEST PASSED" only in the last, say, 100 lines of each sim.log file?
Many thanks in advance
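One rough sketch of a faster approach, assuming each tc_* directory holds a single sim.log and that you run it from the top of the non-reg directory, is to let tail limit how much of each file grep has to read:

for log in tc_*/sim.log; do
    # tail bounds the work per file; grep -q only reports whether a match exists
    if tail -n 100 "$log" | grep -qi "TEST PASSED"; then
        echo "$log: TEST PASSED"
    fi
done > test_passed.txt

Line numbers (the -n from the original command) are less useful here, since they would be relative to the last 100 lines rather than to the whole file.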


Can I get the arguments passed to bazel itself?

I want to create some "build traceability" functionality, and include the actual bazel command that was run to produce one of my build artifacts. So if the user did this:
bazel run //foo/bar:baz --config=blah
I want to actually get the string "bazel run //foo/bar:baz --config=blah" and write that to a file during the build. Is this possible?
Stamping is the "correct" way to get information like that into a Bazel build. Note the implications around caching though. You could also just write a wrapper script that puts the command line into a file or environment variable. Details of each approach below.
I can think of three ways to get the information you want via stamping, each with differing tradeoffs.
First way: Hijack --embed_label. This shows up in the BUILD_EMBED_LABEL stamping key. You'd add a line like build:blah --embed_label blah in your .bazelrc. This is easy, but that label is often used for things like release_50, which you might want to preserve.
Second way: hijack the hostname or username. These show up in the BUILD_HOST and BUILD_USER stamping keys. On Linux, you can write a shell script at tools/bazel which will automatically be used to wrap bazel invocations. In that shell script, you can use unshare --uts --map-root-user, which will work if the machine is set up to enable bazel's sandboxing. Inside that new namespace, you can easily change the hostname and then exec the real bazel binary, like the default /usr/bin/bazel shell script does. That shell script has full access to the command line, so it can encode any information you want.
Third way: put it into an environment variable and have a custom --workspace_status_command that extracts it into a stamping key. Add a line like build:blah --action_env=MY_BUILD_STYLE=blah to your .bazelrc, and then do echo STABLE_MY_BUILD_STYLE ${MY_BUILD_STYLE} in your workspace status script. If you want the full command line, you could have a tools/bazel wrapper script put that into an environment variable, and then use build --action_env=MY_BUILD_STYLE to preserve the value and pass it to all the actions.
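A minimal sketch of that third approach (tools/status.sh is an arbitrary file name for this example, and it assumes MY_BUILD_STYLE has already been exported, e.g. by a tools/bazel wrapper):

# .bazelrc
build --workspace_status_command=tools/status.sh
build --action_env=MY_BUILD_STYLE

# tools/status.sh (must be executable)
#!/bin/bash
echo "STABLE_MY_BUILD_STYLE ${MY_BUILD_STYLE}"

The STABLE_ prefix marks the key as stable, so it lands in stable-status.txt and only triggers rebuilds of stamped targets when its value actually changes.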
Once you pick a stamping key to use, src/test/shell/integration/stamping_test.sh in the Bazel source tree is a good example of writing stamp information to a file. Something like this:
genrule(
    name = "stamped",
    outs = ["stamped.txt"],
    cmd = "grep BUILD_EMBED_LABEL bazel-out/stable-status.txt | cut -d ' ' -f 2 > $@",
    stamp = True,
)
If you want to do it without stamping, just write the information to a file in the source tree from a tools/bazel wrapper. You'd want to put that file in your .gitignore, of course. echo "$@" > cli_args is all it takes to dump the arguments to a file, and then you can use that as a source file like normal in your build. This approach is the simplest, but it interacts the worst with Bazel's caching, because everything that depends on that file will be rebuilt every time, with no way to control it.
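A sketch of such a wrapper (the stock /usr/bin/bazel shell script execs tools/bazel if it exists in the workspace, with BAZEL_REAL pointing at the real binary; cli_args is just the example name from above):

#!/bin/bash
# tools/bazel: record the command line, then hand off to the real bazel
echo "bazel $*" > cli_args   # assumes bazel is invoked from the workspace root
exec "$BAZEL_REAL" "$@"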

Search for files that contain pattern

I have this search - I would like to print out the paths of files that contain the matching text:
grep -r "jasmine" .
and it yields results that look like this:
./app-root/runtime/repo/node_modules/jasmine-core/.github/CONTRIBUTING.md:- [Jasmine Google Group](http://groups.google.com/group/jasmine-js)
./app-root/runtime/repo/node_modules/jasmine-core/.github/CONTRIBUTING.md:- [Jasmine-dev Google Group](http://groups.google.com/group/jasmine-js-dev)
./app-root/runtime/repo/node_modules/jasmine-core/.github/CONTRIBUTING.md:git clone git@github.com:yourUserName/jasmine.git # Clone your fork
./app-root/runtime/repo/node_modules/jasmine-core/.github/CONTRIBUTING.md:cd jasmine # Change directory
./app-root/runtime/repo/node_modules/jasmine-core/.github/CONTRIBUTING.md:git remote add upstream https://github.com/jasmine/jasmine.git # Assign original repository to a remote named 'upstream'
./app-root/runtime/repo/node_modules/jasmine-core/.github/CONTRIBUTING.md:Note that Jasmine tests itself. The files in `lib` are loaded first, defining the reference `jasmine`. Then the files in `src` are loaded, defining the reference `j$`. So there are two copies of the code loaded under test.
./app-root/runtime/repo/node_modules/jasmine-core/.github/CONTRIBUTING.md:The tests should always use `j$` to refer to the objects and functions that are being tested. But the tests can use functions on `jasmine` as needed. _Be careful how you structure any new test code_. Copy the patterns you see in the existing code - this ensures that the code you're testing is not leaking into the `jasmine` reference and vice-versa.
But I just want the file names; I don't want to print the matching contents. How can I do that?
The problem is that the matching text wraps around in the terminal and makes the results basically unreadable.
Did you use the -l flag from grep?
-l, --files-with-matches
Suppress normal output; instead print the name of each input file from which output would normally have been printed. The scanning will stop on the first match.
A simple search in my home directory:
grep -rl 'bash' .
./.bashrc
./.bash_history
./.bash_logout
./.bash_profile
./.profile
./.viminfo
As a matter of fact, -l is a POSIX-defined option for grep, so it should be available in almost all distros.
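If you then want to feed those file names to another command, GNU grep can also print them NUL-separated so that paths containing spaces survive the pipe (an illustration; -Z/--null is an extension, not part of POSIX):

grep -rlZ 'jasmine' . | xargs -0 ls -l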

Grep-ing from multiple files

I have a lot of log files named in the format foo.log.[1-100].gz, and another set named detail-20161205-[00-23]. I need to find some string across multiple files.
I'm trying to do the following:
zfgrep String foo.log.[45-64].gz
but I always get the wrong output; the matches don't come from the files I meant.
So I want to understand how to grep through .gz files as well as through plain files (the second format). Can I use commands other than grep as well?
I believe you should be able to do something like zgrep String foo.log.{1..100}.gz for the compressed files and grep String detail-20161205-{00..23} for the plain ones; the key is brace expansion rather than a character class. If not all the files in a range exist, you can add grep's -s option so you don't see all the "No such file or directory" errors:
grep -s String detail-20161205-{00..23}
A pattern like detail-20161205-[00-23], by contrast, treats [00-23] as a character class matching a single character (0, the range 0-2, or 3), which is why the wrong files get searched.
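Note that the zgrep shipped with GNU gzip runs each file through gzip -cdfq, and the -f flag makes gzip pass data that isn't actually compressed straight through, so (assuming that implementation) a single invocation can cover both naming schemes:

zgrep String foo.log.{45..64}.gz detail-20161205-{00..23}

The brace expansion happens in the shell before zgrep runs, so the same file lists work with any other search tool that accepts multiple file names.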

Rails: How to pass data to Bash Script

I have a Rails webapp full of students with test scores. Each student has exactly one test score.
I want the following functionality:
1.) The user enters an arbitrary test score into the website and presses "enter"
2.) "Some weird magic where data is passed from Rails database to bash script"
3.) The following bash script is run:
./tool INPUT file.txt
where:
INPUT = the arbitrary test score
file.txt = a list of all student test scores in the database
4.) "More weird magic where output from the bash script is sent back up to a rails view and made displayable on the webpage"
And that's it.
I have no idea how to do the weird magic parts.
My attempt at a solution:
In the rails dbconsole, I can do this:
SELECT score FROM students;
which gives me a list of all the test scores (which satisfies the "file.txt" argument to the bash script).
But I still don't know how my bash script is supposed to gain access to that data.
Is my controller supposed to pass the data down to the bash script? Or is my model supposed to? And what's the syntax for doing so?
I know I can run a bash script from the controller like this:
system("./tool")
But, unfortunately, I still need to pass the arguments to my script, and I don't see how I can do that...
You can just use the built-in ruby tools for running shell commands:
https://ruby-doc.org/core-2.3.1/Kernel.html#method-i-60
For example, in one of my systems I need to get image orientation:
exif_orientation = `exiftool -Orientation -S "#{image_path}"`.to_s.chomp
Judging from my use of .to_s, running the command may sometimes return nil, and I don't want an error from trying to chomp nil. Normal output includes a trailing line ending, which chomp strips.
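Applied to the question, a rough sketch of a controller action (the action name, the Student model, params[:score], and ./tool are assumptions based on the question's description, not tested code):

require "tempfile"

def compare_score
  input = params[:score].to_i                  # coerce so nothing unsafe reaches the shell

  scores = Tempfile.new("scores")              # stands in for file.txt
  scores.write(Student.pluck(:score).join("\n"))
  scores.close

  # Backticks run the command and capture its stdout as a string.
  @tool_output = `./tool #{input} #{scores.path}`.chomp
ensure
  scores.unlink if scores
end

The view rendered by that action can then display @tool_output, which covers the second piece of "weird magic".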

ack misses results (vs. grep)

I'm sure I'm misunderstanding something about ack's file/directory ignore defaults, but perhaps somebody could shed some light on this for me:
mbuck$ grep logout -R app/views/
Binary file app/views/shared/._header.html.erb.bak.swp matches
Binary file app/views/shared/._header.html.erb.swp matches
app/views/shared/_header.html.erb.bak: <%= link_to logout_text, logout_path, { :title => logout_text, :class => 'login-menuitem' } %>
mbuck$ ack logout app/views/
mbuck$
Whereas...
mbuck$ ack -u logout app/views/
Binary file app/views/shared/._header.html.erb.bak.swp matches
Binary file app/views/shared/._header.html.erb.swp matches
app/views/shared/_header.html.erb.bak
98:<%= link_to logout_text, logout_path, { :title => logout_text, :class => 'login-menuitem' } %>
Simply calling ack without options doesn't find the match inside a .bak file, but calling it with the --unrestricted option does. As far as I can tell, though, ack does not ignore .bak files by default.
UPDATE
Thanks to the helpful comments below, here are the new contents of my ~/.ackrc:
--type-add=ruby=.haml,.rake
--type-add=css=.less
ack is peculiar in that it doesn't have a blacklist of file types to ignore, but rather a whitelist of file types that it will search in.
To quote from the man page:
With no file selections, ack-grep only searches files of types that it recognizes. If you have a file called foo.wango, and ack-grep doesn't know what a .wango file is, ack-grep won't search it.
(Note that I'm using Ubuntu where the binary is called ack-grep due to a naming conflict)
ack --help-types will show a list of types your ack installation supports.
If you are ever confused about what files ack will be searching, simply add the -f option. It will list all the files that it finds to be searchable.
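For example, to see which files ack would consider for a given type (the flag names are the same for Ubuntu's ack-grep binary):

ack -f --type=ruby app/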
ack --man states:

If you want ack to search every file, even ones that it always ignores like coredumps and backup files, use the "-u" switch.

and

Why does ack ignore unknown files by default? ack is designed by a programmer, for programmers, for searching large trees of code. Most codebases have a lot of files in them which aren't source files (like compiled object files, source control metadata, etc), and grep wastes a lot of time searching through all of those as well and returning matches from those files.

That's why ack's behavior of not searching things it doesn't recognize is one of its greatest strengths: the speed you get from only searching the things that you want to be looking at.
EDIT: Also, if you look at the source code, .bak files are ignored.
Instead of wrestling with ack, you could just use plain old grep, from 1973. Because it uses an explicit blacklist of excluded files rather than a whitelist of file types, it never omits correct results, ever. Given a couple of lines of config (which I created in my home directory 'dotfiles' repo back in the 1990s), grep matches or surpasses many of ack's claimed advantages, in particular speed: when searching the same set of files, grep is faster than ack.
The grep config that makes me happy looks like this, in my .bashrc:
# Custom 'grep' behaviour
# Search recursively
# Ignore binary files
# Output in pretty colors
# Exclude a bunch of files and directories by name
# (this both prevents false positives, and speeds it up)
function grp {
    grep -rI --color --exclude-dir=node_modules --exclude-dir=\.bzr --exclude-dir=\.git --exclude-dir=\.hg --exclude-dir=\.svn --exclude-dir=build --exclude-dir=dist --exclude-dir=.tox --exclude=tags "$@"
}
function grpy {
    grp --include='*.py' "$@"
}
The exact list of files and directories to ignore will probably differ for you: I'm mostly a Python dev and these settings work for me.
It's also easy to add sub-customisations, as I show with 'grpy', which I use to grep Python source.
Defining bash functions like this is preferable to setting GREP_OPTIONS, which will cause ALL executions of grep from your login shell to behave differently, including those invoked by programs you have run. Those programs will probably barf on the unexpectedly different behaviour of grep.
My new functions, 'grp' and 'grpy', deliberately don't shadow 'grep', so that I can still use the original behaviour any time I need that.
