On a single-user system where security isn't an issue, is there any
advantage to using "updatedb" and "locate" (or slocate or mlocate)
instead of just doing "ls -laR > somefile" nightly and then using
"grep phrase somefile" to find files?
In fact, it would seem that grep is more flexible than locate since it
allows for regular expressions.
What am I missing here?
For your specific scenario, the differences are marginal, but the locate database is optimized for fast searches.
On a multiuser system, modern locate replacements add various security features so that, for example, one user cannot see which files another user keeps in their private directories.
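For a concrete comparison, here is a minimal sketch of both approaches (GNU/Linux paths and options assumed). Note that modern locate implementations also accept regular expressions, so grep's flexibility advantage is smaller than it looks:
# The hand-rolled index: rebuild nightly, search with grep.
ls -laR / > ~/filelist 2>/dev/null
grep 'phrase' ~/filelist
# The locate equivalent: updatedb builds an optimized database,
# and locate supports regexes via --regex.
sudo updatedb
locate --regex 'phrase.*\.conf$'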
I'm currently migrating some third-party dependency projects (typically old-style configure/make based) to Bazel using its foreign_cc rules.
One goal is to have identical output compared to before the migration; besides attributes like permissions and RPATH, I'm still struggling with symlinks being dereferenced, seemingly unconditionally.
So instead of libfoo.so -> libfoo.so.3, libfoo.so.3 -> libfoo.so.3.14, I always get three separate regular files now.
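The dereferencing itself is easy to reproduce with plain cp, outside Bazel (a standalone sketch mirroring the layout above):
# Build the usual versioned-library symlink chain.
mkdir demo && cd demo
touch libfoo.so.3.14
ln -s libfoo.so.3.14 libfoo.so.3
ln -s libfoo.so.3 libfoo.so
# -L dereferences the links: the copy contains three regular files.
cp -L -r --no-target-directory . ../deref
# Without -L (the default for -r, or explicitly with -P),
# the symlinks would be preserved.
cp -P -r --no-target-directory . ../preserved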
Inspecting the generated bazel-bin/external/foo/foo_foreign_cc/build_script.sh the last commands contain two invocations of cp -L with no variables modifying the behavior:
[configure command]
[make commands]
set +x
cp -L -r --no-target-directory "$BUILD_TMPDIR/$INSTALL_PREFIX" "$INSTALLDIR" && find "$INSTALLDIR" -type f -exec touch -r "$BUILD_TMPDIR/$INSTALL_PREFIX" "{}" \;
[content of #postfix_script]
replace_in_files $INSTALLDIR $BUILD_TMPDIR \${EXT_BUILD_DEPS}
replace_in_files $INSTALLDIR $EXT_BUILD_DEPS \${EXT_BUILD_DEPS}
replace_in_files $INSTALLDIR $EXT_BUILD_ROOT \${EXT_BUILD_ROOT}
mkdir -p $EXT_BUILD_ROOT/bazel-out/k8-fastbuild/bin/external/foo/copy_foo/foo
cp -L -r --no-target-directory "$INSTALLDIR" "$EXT_BUILD_ROOT/bazel-out/k8-fastbuild/bin/external/foo/copy_foo/foo" && find "$EXT_BUILD_ROOT/bazel-out/k8-fastbuild/bin/external/foo/copy_foo/foo" -type f -exec touch -r "$INSTALLDIR" "{}" \;
cd $EXT_BUILD_ROOT
So it looks quite obvious to me that, for some reason, configure_make doesn't even consider keeping symlinks, turning this into something I have to do outside the Bazel rule (while also possibly polluting the remote cache).
Is there a reason for this? I.e., why shouldn't I create a fork of rules_foreign_cc just to remove this -L flag, which someone seems to have added intentionally?
I'm one of the rules_foreign_cc maintainers.
The reason rules_foreign_cc dereferences the symlinks there is that, in general, the outputs being copied into named outputs may be dangling symlinks (they may not point at other build outputs), and in Bazel 4, the minimum version we currently support, dangling symlinks are not allowed as build artifacts. (This behaviour may have changed in later Bazel versions, but I'm not 100% sure.)
What you likely want to actually consume is the output_group gendir. This can be accessed like so:
filegroup(
    name = "my_install_tree",
    srcs = [":cmake_target"],
    output_group = "gendir",
)
The gendir output group is the entire install directory as created by the build.
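A quick sanity check, assuming a label like the one above (the exact bazel-bin paths depend on your workspace layout):
bazel build //:my_install_tree
# Inspect the resulting tree artifact to see whether the
# symlink chain survives:
find bazel-bin -name 'libfoo.so*' -exec ls -l {} ';'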
Note that you wouldn't actually need to fork the rules to achieve what you were proposing. The shell script is generated by a toolchain (whose type currently lives in the private package, so the right to change it is reserved), and thus you could provide your own toolchain implementation to override the behaviour.
bazel run typically occupies the Bazel server, blocking other commands.
https://github.com/bazelbuild/bazel/blob/c484f19a2cf7427887d6e4c71c8534806e1ba83e/scripts/bazel-run.sh is a fantastic replacement
Question: what's a good way for end-users to get hold of that shell script and add to their path? Can we make that part of the bazel install?
I tried ls -R $(bazel info install_base) | grep bazel-run but no luck there.
The bazel-run.sh script is a good replacement when an end-user needs to run a Bazel command interactively or run multiple commands (#2337). There has been no need for us to consider shipping it as part of the installation.
Please file an issue on Github to discuss the possibility of installing it along with Bazel.
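In the meantime, one way to fetch the script yourself (the raw URL is derived from the GitHub link above; the install location is just an example):
mkdir -p ~/bin
curl -fsSL -o ~/bin/bazel-run \
  https://raw.githubusercontent.com/bazelbuild/bazel/c484f19a2cf7427887d6e4c71c8534806e1ba83e/scripts/bazel-run.sh
chmod +x ~/bin/bazel-run   # assumes ~/bin is on your PATH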
We run an rsync on a large folder with close to a million files in it, including html, jsp, gif/jpg, etc. Of course, we only need to update files incrementally: just a few JSP and HTML files change in this folder, and we need this server to rsync them to the same folder on a different server.
rsync has been running quite slowly lately, so one of our IT team members came up with this command:
find /usr/home/foldername \
  -type f -name '*.jsp' -exec \
  grep -l '<ssi:include src=[^$]*${' {} \;
This looks only for files with the .jsp extension that contain certain kinds of text, because those are the files we need to rsync. But this command consumes a lot of memory. I think it's a stupid way to rsync, but I'm being told this is how things will work.
Some googling suggests that this should work on this folder too:
rsync -a --update --progress --rsh=ssh --partial /usr/home/foldername /destination/server
I'm worried that this will be too slow on a daily basis, but I can't imagine why this will be slower than that silly find option that our IT folks are recommending. Any ideas about large directory rsyncs in the real world?
A find command will not be faster than the rsync scan, and the grep command must be slower than rsync because it requires reading all the text from all the .jsp files.
The only way a find-and-grep could be faster is if:
1. The timestamps on your files do not match, so rsync has to checksum the contents (on both sides!)
This seems unlikely, since you're using -a that will sync the timestamps properly (because -a implies -t). However, it can happen if the file-systems on the different machines allow different timestamp precision (e.g. Linux vs. Windows), in which case the --modify-window option is what you need.
2. There are many more files changed than the ones you care about, and rsync is transferring those also.
If this is the case then you can limit the transfer to .jsp files like this:
--include '*.jsp' --include '*/' --exclude '*'
(Include all .jsp files and all directories, but exclude everything else.)
3. rsync does the scan up front, then the compare (possibly using lots of RAM), then the transfer, whereas find/grep/copy does it as it goes.
This used to be a problem, but rsync ought to do an incremental recursive scan as long as both local and remote versions are 3.0.0 or greater, and you don't use any of the fancy delete or delay options that force an up-front scan (see --recursive in the documentation).
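Putting the filter rules above into a complete command, something along these lines should cover the .jsp-only case (host and paths are placeholders):
rsync -a --partial --progress \
  --include='*/' --include='*.jsp' --exclude='*' \
  /usr/home/foldername/ user@destination:/usr/home/foldername/
# add --prune-empty-dirs (-m) to avoid creating directories
# that end up containing no .jsp files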
BSD (Mac) grep allows for this command:
grep -n "FIXME" **/*.rb
But GNU grep forces me to specify at least a folder to start from:
grep -n "FIXME" {lib,spec}/**/*.rb
Is there a way to get this to behave like it does in BSD grep?
Switch to ack. It uses the recursive strategy by default, and it comes with loads of built-in file-type filters for different languages, available as flags.
For instance, writing:
ack FIXME --ruby
Will search the current directory recursively for anything that may be a Ruby file. This will work the same on Mac and Linux.
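If you'd rather stick with GNU grep, its recursive mode plus --include gets you most of the way there without installing anything:
# Search the current tree recursively, restricted to Ruby files.
grep -rn --include='*.rb' 'FIXME' .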
Is it possible to use wildcards in the Erlang compiler's -I option?
For example, I want to do something like this:
erlc -I deps/*/include -I deps src/foo.erl
I know that other solutions exist (like using rebar or make) but in this case, I am looking explicitly at erlc.
In Linux (and other unixoid systems), wildcards are never resolved by the invoked program.
The shell you use (e.g. bash) resolves all wildcards, so erlc won't see the asterisk at all.
(If you read the documentation of find(1) you may find that my previous explanation is somewhat oversimplified.)
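You can see the expansion for yourself by prefixing the command with echo (assuming a deps/ directory containing entries named bar and foo):
$ echo erlc -I deps/*/include
erlc -I deps/bar/include deps/foo/include
Only the first expanded path ends up paired with -I; the remaining paths reach erlc as stray arguments.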
If you don't want to use an extra tool (I'd recommend looking at rebar or make, though), you could try:
erlc $(find deps -name include -exec echo '-I {}' ';') -I deps src/foo.erl
(Weak substitute, I know.)
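To preview what that substitution produces before handing it to erlc (directory names are hypothetical; note that GNU find substitutes {} even in the middle of an argument):
$ find deps -name include -exec echo '-I {}' ';'
-I deps/bar/include
-I deps/foo/include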