I’am working on how can I exclude monitoring of snapshot repository from geneos
Is there a way to do that by scripting ? I mean something like
If value = myfile then exclude it
Related
I've a data-fetch stage where I get multiple DFs and serialize those. I'm currently treating OutputPath as directory - create it if it doesn't exist etc. and then serialize all the DFs in that path with different names for each DF.
In a subsequent pipeline stage (say, predict) I need to retrieve all those through InputPath.
Now, from the documentation it seems InputPath/OutputPath as file. Does kubeflow as any limitation if I use it as directory?
The ComponentSpec's {inputPath: input_name} and {outputPath: output_name} placeholders and their Python analogs (input_name: InputPath()/output_name: OutputPath()) are designed to support both files/blobs and directories.
They are expected to provide the path for the input/output data. No matter whether the data is a blob/file or a directory.
The only limitation is that UX might not be able to preview such artifacts.
But the pipeline itself would work.
I have experimented with a trivial pipeline - no issue is observed if InputPath/OutputPath is used as directory.
Use Case: During dataflow job start up we should provide initial file name to read data and later on it should watch for new files in that directory and it should consider all remaining old files as already read.
Issues:
Approach 1:
PCollection<String> readfile = pipeline.apply(TextIO.read().from("gs://folder-Name/*").
watchForNewFiles(Duration.standardSeconds(10),
Watch.Growth.afterTimeSinceNewOutput(Duration.standardSeconds(30))));
If we are using like this its considering old files as new files for this dataflow job and reading all those files in that folder
Approach 2:
PCollection<String> readfile = pipeline.apply(TextIO.read().from("gs://folder-Name/file-name").
watchForNewFiles(Duration.standardSeconds(10),
Watch.Growth.afterTimeSinceNewOutput(Duration.standardSeconds(30))));
Its reading only this particular file and not able to read upcoming new files
can anyone please suggest the approach to achieve my use case?
The watchForNewFiles() function will always read all files matching the filepattern, both existing and new. In your second approach, the file pattern is only one file, so you just get that.
However, you can use the lower-level building block transforms in FileIO to accomplish what you need. The following code will just read files written after the pipeline starts:
PCollection<String> lines = p
.apply(FileIO.match().filepattern("gs://folder-Name/*")
.continuously(Duration.standardSeconds(30), afterTimeSinceNewOutput(Duration.standardHours(1)))
.setCoder(MetadataCoderV2.of())
.apply(Filter.by(metadata -> metadata.lastModifiedMillis() > PIPELINE_START))
.apply(FileIO.readMatches())
.apply(apply(TextIO.readFiles()))
You can change the details of the Filter transform to whatever precise condition you need. To also include specific older files, you can read those with a standard TextIO.read().from(...) and then use Flatten to combine that PCollection with the continuous set. Like this:
PCollection allLines =
PCollectionList.of(lines).and(p.apply(TextIO.read().from("gs://folder-Name/file-name)
.apply(Flatten.pCollections())
Maybe you need to clarify your Use Case, do you provide a file name to read ? or a file pattern ? What is the number of files expected ? Should you really use a Dataflow streaming pipeline ? Doesn't a Cloud Function answer your need ? What is your issue ? Files get read again when you restart your pipeline ?
You can, as suggested by danielm use FileIO to fetch and filter on file metadata in order to know which file was added after the pipeline began.
If you provide a file pattern, then all file will be read once by the pipeline. There's no way to keep a State between pipelines if you not code it yourself, so when you restart the pipeline you will read again all the file matching the pattern.
If you want to avoid that, you can manually move old files to another path between stopping the old pipeline and starting a new one.
You could also consider is consuming GCS notification on file creation with PubsubIO and use this event to know which file to treat in your pipeline.
A good practice though is to have multiple folders that reflects the status of the files:
input
processing
failed
succeed
This way you know the state of each file. You can put files to treat in the input folder, and inside your pipeline move the file to its corresponding state folder.
Given I have a list of files, e.g foo/src/main.cpp, foo/src/bar.cpp, foo/README.md is it possible to determine which of those files are part of a bazel package?
In my example, the output would e.g. be foo/src/main.cpp, foo/src/bar.cpp since the README.md would not be part of the build.
One way to do this would be to call bazel query on each file and see if it results in an output, but that is quite inefficient and so I was wondering if there is an easier way.
Background: I am trying to determine if a changes in a set of files have an impact on a target, and I want to use bazel query somepath(//some/target, set($FILES)) for that, but this will fail if any of the files in $FILES is not part of a BUILD file.
How about flipping it around and querying for all the source files of the target with:
bazel query 'kind("source file", deps(//some:target))'
and then checking if the result has any of the files in the set
In the Job DSL, there is the method readFileFromWorkspace(), which makes it possible to read a files content from the workspace.
Now it would like to have something like readFilesFromDirectory() which gives me all files in some directory.
The goal is to make it possible to choose from different ansible playbooks:
choiceParam('PLAYBOOK_FILE', ['playbook1.yml', 'playbook2.yml'])
and to populate this list with existing files from a directory. Is something like this possible?
Well, shortly after asking this question, I found the solution.
So the Hudson API can be used:
hudson.FilePath workspace =
hudson.model.Executor.currentExecutor().getCurrentWorkspace()
def resultList = workspace.list().findAll { it.name ==~ /deploy.*\.yml/ }
I'd like to compare two files at particular changesets to see if they are identical or not.
Something like:
>> cm diff rev:Folder\MyFile.py#cs:5 rev:Folder\MyFile.py#cs:10
<< True
I'm getting an error (can't find revision of file I specify) and I think I might not be using diff as it's intended. I've worked around my confusion by using getfile on the particular file and changesets I'm comparing and using a python library file compare.
Thanks.
The Plastic SCM default diff tool will open a GUI showing you the file differences.
But you can manually configure a different one (eg. diff.exe) manually editing the "/home/user/.plastic/client.conf" or using the Plastic SCM GUI:
<DiffToolData>
<FileType>enTextFile</FileType>
<FileExtensions>*</FileExtensions>
<Tools>
<string>diff.exe #sourcefile #destinationfile</string>
</Tools>
</DiffToolData>
This way, you can run diffs through the command line and based on the output, determine if the files are identical or not.
You can use cm patch command
reference : https://blog.plasticscm.com/2018/11/unified-diff-of-branch.html