Flume "Spooling Directory Source": recursive lookup of files within subdirectories - flume

I am looking for a way to make the Flume "Spooling Directory Source" look recursively for files within subdirectories.
There are some references here: https://issues.apache.org/jira/browse/FLUME-1899
However, multiple versions have come out since then. Is there any way to get recursive lookup of files within subdirectories in the Spooling Directory Source?

I think you can use the patch FLUME-1899-2.patch directly.
Set "recursiveDirectorySearch" to true in your config file.
NOTE: the regex in ignorePattern of the config file also applies to the names of the folders that are searched recursively, so you might need to modify the code in org/apache/flume/client/avro/ReliableSpoolingFileEventReader.java if you want to ignore by folder name only.
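The effect described in the note can be illustrated with a small Python model of a recursive scan. This is only a sketch of the described behaviour, not Flume's code: the scan helper, the sample tree and the tmp_ pattern are all hypothetical, and the model simply applies one regex to both directory names and file names.

```python
import os
import re
import tempfile

def scan(root, ignore_pattern):
    """Return relative paths of files a recursive scan would pick up.

    Model only: the same regex is tested against directory names and
    file names, so an ignored directory hides everything beneath it.
    """
    ignore = re.compile(ignore_pattern)
    found = []
    for current, dirs, files in os.walk(root):
        # Prune ignored directories in place so os.walk never descends into them.
        dirs[:] = [d for d in dirs if not ignore.fullmatch(d)]
        for name in files:
            if not ignore.fullmatch(name):
                rel = os.path.relpath(os.path.join(current, name), root)
                found.append(rel.replace(os.sep, "/"))
    return sorted(found)

def demo():
    with tempfile.TemporaryDirectory() as root:
        for rel in ("a.log", "sub/b.log", "tmp_work/c.log", "sub/tmp_d.log"):
            path = os.path.join(root, *rel.split("/"))
            os.makedirs(os.path.dirname(path), exist_ok=True)
            open(path, "w").close()
        # A pattern meant for temporary files also hides the tmp_work folder.
        return scan(root, r"^tmp_.*$")

picked = demo()
print(picked)
```

In this model c.log is never seen, even though its own name does not match the pattern, because its parent folder does.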

Related

Recursively ignore within a structured repo using .tfignore

I have a solution with multiple projects. Each project uses a given NuGet package that installs resources in some folders that are shared with custom files. Something like
Solution
ProjA
Resources
Text <from nuget>
Img
Text
ProjB
Resources
Text <from nuget>
Img
Text
I already tried adding .tfignore at the solution level with
Resources/Text
But it doesn't work. The only way I succeeded was to copy the .tfignore into each project folder. Is there any better way?
A .tfignore file applies the given pattern in all subdirectories, ignoring files or folders with the given name. For folders, it applies recursively.
As a result, a .tfignore with:
Text
will ignore any folder named Text in your filesystem hierarchy, and it will ignore them recursively.
For those Text folders that are not under Resources/Tables, you can create .tfignore files in sub-folders to override the effects of a .tfignore file in a parent folder.
Note:
A filespec is recursive unless prefixed by the \ character.
The .tfignore file does not affect files that are already in source control; you need to remove them from source control first. Also make sure your .tfignore files are checked in to source control.

Flume: How to track specified sub folders using spoolDir?

We have a system that uploads log files into folders named by date. It looks like:
/logs
/20181030
/20181031
/20181101
/20181102
/...
Suppose that I want to track the log files produced during November using spoolDir. How could I do this?
#this won't work
a1.sources.r1.spoolDir = /logs/201811??
#this seems only works with files. Is it possible to filter folders here?
a1.sources.r1.includePattern = ^.*\.txt$
According to the Flume source code, folders that match the ignorePattern are skipped while recursing the folder tree (to register folder trackers). So you can ignore the folders that don't match your criteria: ^(?!201811..).*$ excludes every folder that is not a November 2018 folder, and those folders will not be tracked.
But this pattern also applies to file names, so any file whose name does not start with 201811.. will be ignored as well. You can add the ^.*\.txt$ pattern (the one you are using for the include pattern) to the regex to make Flume accept your input files.
a1.sources.r1.ignorePattern = ^(?!(201811..)|(.*\\.txt)).*$
would do the trick for you.
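A quick way to sanity-check a pattern of this shape before putting it in the config is to run it against sample names. The sketch below does that in Python, using the November prefix 201811 from the question. It assumes Flume tests the whole name against the pattern, and the is_ignored helper and the sample names are made up for illustration.

```python
import re

# Same shape as the ignorePattern above, with the November prefix 201811.
# Written as a Python raw string, so the dot is escaped with one backslash.
IGNORE = re.compile(r"^(?!(201811..)|(.*\.txt)).*$")

def is_ignored(name):
    """True when the whole name matches the ignore pattern."""
    return IGNORE.fullmatch(name) is not None

names = ["20181030", "20181031", "20181101", "20181102", "app.txt", "app.log"]
ignored = [n for n in names if is_ignored(n)]
kept = [n for n in names if not is_ignored(n)]
print("ignored:", ignored)
print("kept:", kept)
```

The October folders and app.log end up ignored, while the November folders and app.txt are kept. A .txt file inside an October folder would still not be read, since its parent folder is skipped before the file name is ever tested.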

how to find and deploy the correct files with Bazel's pkg_tar() in Windows?

please take a look at the bin-win target in my repository here:
https://github.com/thinlizzy/bazelexample/blob/master/demo/BUILD#L28
it seems to be properly packing the executable inside a file named bin-win.tar.gz, but I still have some questions:
1- in my machine, the file is being generated at this directory:
C:\Users\John\AppData\Local\Temp\_bazel_John\aS4O8v3V\execroot\__main__\bazel-out\x64_windows-fastbuild\bin\demo
which makes finding the tar.gz file a cumbersome task.
The question is how can I make my bin-win target to move the file from there to a "better location"? (perhaps defined by an environment variable or a cmd line parameter/flag)
2- how can I include more files with my executable? My actual use case is I want to supply data files and some DLLs together with the executable. Should I use a filegroup() rule and refer its name in the "srcs" attribute as well?
2a- for the DLLs, is there a way to make a filegroup() rule to interpret environment variables? (e.g: the directories of the DLLs)
Thanks!
Look for the bazel-bin and bazel-genfiles directories in your workspace. These are actually junctions (directory symlinks) that Bazel updates after every build. If you bazel build //:demo, you can access its output as bazel-bin\demo.
(a) You can also set TMP and TEMP in your environment to point to e.g. c:\tmp. Bazel will pick those up instead of C:\Users\John\AppData\Local\Temp, so the full path for the output directory (that bazel-bin points to) will be c:\tmp\aS4O8v3V\execroot\__main__\bazel-out\x64_windows-fastbuild\bin.
(b) Or you can pass the --output_user_root startup flag, e.g. bazel --output_user_root=c:\tmp build //:demo. That will have the same effect as (a).
There's currently no way to get rid of the _bazel_John\aS4O8v3V\execroot part of the path.
Yes, I think you need to put those files in pkg_tar.srcs. Whether you use a filegroup() rule is irrelevant; filegroup just lets you group files together, so you can refer to the group by name, which is useful when you need to refer to the same files in multiple rules.
2.a. I don't think so.
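If the archive still needs to end up in a "better location", one option outside Bazel is a small post-build script that copies it out of bazel-bin. The following is a minimal Python sketch, not a Bazel feature: export_artifact is a hypothetical helper, and the path bazel-bin/demo/bin-win.tar.gz is assumed from the question. The demo builds a fake workspace in a temporary directory so it runs without Bazel.

```python
import os
import shutil
import tempfile

def export_artifact(workspace, dest_dir, rel_path="bazel-bin/demo/bin-win.tar.gz"):
    """Copy a built archive out of bazel-bin into dest_dir and return the new path."""
    src = os.path.join(workspace, *rel_path.split("/"))
    if not os.path.isfile(src):
        raise FileNotFoundError(f"build output not found: {src}")
    os.makedirs(dest_dir, exist_ok=True)
    return shutil.copy2(src, dest_dir)

def demo():
    with tempfile.TemporaryDirectory() as tmp:
        # Fake workspace standing in for a real one after a build.
        workspace = os.path.join(tmp, "ws")
        out_dir = os.path.join(workspace, "bazel-bin", "demo")
        os.makedirs(out_dir)
        with open(os.path.join(out_dir, "bin-win.tar.gz"), "wb") as fh:
            fh.write(b"placeholder")
        copied = export_artifact(workspace, os.path.join(tmp, "deploy"))
        return os.path.basename(copied), os.path.isfile(copied)

result = demo()
print(result)
```

In a real workspace the destination could come from an environment variable of your choosing, for example export_artifact(os.getcwd(), os.environ["DEPLOY_DIR"]), where DEPLOY_DIR is an invented name.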

tfignore wildcard directory segment

Is it possible using .tfignore to add a wildcard to directories? I assumed it would have been a case of just adding an asterisk wildcard to the directory segment. For example:
\path\*\local.properties
However, this does not work, and I am unsure how I would achieve such behaviour without explicitly declaring every reference that I need to exclude.
Documentation
# begins a comment line
The * and ? wildcards are supported.
A filespec is recursive unless prefixed by the \ character.
! negates a filespec (files that match the pattern are not ignored)
Extract from the documentation.
The documentation should more correctly read:
The * and ? wildcards are supported in the leaf name only.
That is, you can use something like these to select multiple files or multiple subdirectories, respectively, in a common parent:
/path/to/my/file/foo*.txt
/path/to/my/directories/temp*
What may work in your case--to ignore the same file in multiple directories--is just this:
foo*.txt
That is, specify a path-less name or glob pattern to ignore matching files throughout your tree. Unfortunately you have only those two options, local or global; you cannot use a relative path like this--it will not match any files!
my/file/foo*.txt
The global option is a practical one because .tfignore only affects unversioned files. Once you add a file to source control, changes to that file will be properly recognized. Furthermore, if you need to add an instance of an ignored name to source control, you can always go into TFS source control explorer and manually add it.
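The "global" option can be pictured as comparing the glob with the leaf name of every path, wherever it sits in the tree. The Python sketch below is only a model of that description, not how TFS implements .tfignore. The ignored_by_global_glob helper and the sample paths are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

def ignored_by_global_glob(path, glob):
    """Model of a path-less entry: compare the glob with the leaf name only."""
    leaf = PurePosixPath(path).name
    return fnmatchcase(leaf, glob)

paths = [
    "foo1.txt",
    "src/foo2.txt",
    "src/deep/er/foo_bar.txt",
    "src/bar.txt",
    "foo/readme.md",
]
hits = [p for p in paths if ignored_by_global_glob(p, "foo*.txt")]
print(hits)
```

Every foo*.txt file is matched regardless of depth, while foo/readme.md is not, because only the last path segment is compared.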
It seems this is now supported
I edited .tfignore in the root folder of the project so that any new branch ignores its .vs folder when being examined for source control changes:
\*\.vs
Directory/folder name wildcarding works for me in VS2019 Professional. For example if I put this in .tfignore:
*uncheckedToTFS
The above will ignore any folder whose name ends with "uncheckedToTFS", regardless of where the folder is (it doesn't have to be a top-level folder; it can be many levels deep).

jenkins archive artifact excluding all subdirectory

I have a couple of jobs in Jenkins that archive artifacts from the source tree for another job (some unit tests or alike). I have the current situation:
top_dir
\scripts_dir
\some_files
\dir1
\dir2
\dir3
\other_dir
I would like to archive everything in "top_dir", including the files in "scripts_dir", but not the subdirectories "dir1, dir2, ..." inside "scripts_dir", whose names I do not know. These subdirs are actually Windows directory junctions that point to other places on the disk, and I do not want them to be copied.
How do I achieve this with the include/exclude patterns of Jenkins?
I already tried include=top_dir/ with each of these as exclude=
**/scripts_dir/*/
**/scripts_dir/*/**
**/scripts_dir/**/*
but it always excludes the whole "scripts_dir" folder.
Finally, by brute force, I found that the following expression excludes all the files in the subdirectories of scripts_dir (symlinks or not) and then removes these subdirs, while keeping the files directly in scripts_dir:
**/scripts_dir/**/*/*/
Thanks for the help anyway.
Reading the Ant manual, there is a followsymlinks attribute that defaults to true. You said the things you want to exclude are symlinks (although I am not sure whether this will work with Windows junctions). Try adding followsymlinks=false.
Another solution: if all your files under scripts_dir have a set number of characters in the extension, you can put that into your include statement. This will only pick up files with extensions of 3 characters:
**/scripts_dir/*.???
More on this here
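To see why a pattern such as **/scripts_dir/*.??? only reaches files sitting directly in scripts_dir, it helps to model the matching rules. The Python sketch below is a simplified translation of a subset of Ant-style patterns into a regex, not Ant's or Jenkins' matcher. The ant_glob_to_regex helper and the sample file names are hypothetical.

```python
import re

def ant_glob_to_regex(pattern):
    """Translate a small subset of Ant path patterns into a compiled regex.

    Simplified model: '**/' spans zero or more directories, while '*'
    and '?' never cross a '/'. Other Ant rules (for example a trailing
    '/') are not handled.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:[^/]+/)*")
            i += 3
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))

include = ant_glob_to_regex("**/scripts_dir/*.???")
files = [
    "top_dir/scripts_dir/run.bat",
    "top_dir/scripts_dir/build.sh",
    "top_dir/scripts_dir/dir1/tool.exe",
    "top_dir/other_dir/notes.txt",
]
archived = [f for f in files if include.fullmatch(f)]
print(archived)
```

Only run.bat is selected in this model: build.sh has a two-character extension, tool.exe sits one level too deep for a single *, and notes.txt is outside scripts_dir.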
