How can I convert a FilePath to a File? - Jenkins

I'm writing a Jenkins plugin and I'm using build.getWorkspace() to get the path to the current workspace. The issue is that this returns a FilePath object.
How can I convert this to a File object?

Although I haven't tried this, according to the javadoc you can obtain the URI from which you can then create a file: File myFile = new File(build.getWorkspace().toURI())
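For what it's worth, a minimal hedged sketch of that conversion (my own, not taken from the javadoc), including a null check and a guard for the case where the workspace lives on an agent, where a local java.io.File would be meaningless:

FilePath workspace = build.getWorkspace();        // may be null if no workspace has been allocated
if (workspace != null && !workspace.isRemote()) {
    // only valid when the workspace is on the same machine as this code;
    // toURI() may throw IOException/InterruptedException
    File myFile = new File(workspace.toURI());
    // ... use myFile locally ...
} else {
    // workspace is on an agent (or missing): use the FilePath/FileCallable approach below
}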

Please use the act method and call your own FileCallable implementation if your plugin should work on both the master (controller) and slaves (agents). For more information, check the documentation, chapter "Using FilePath smartly", or this Stack Overflow answer.
Code example (source):
void someMethod(FilePath file) {
    // make 'file' a fresh empty directory.
    file.act(new Freshen());
}

// if 'file' is on a different node, this FileCallable will
// be transferred to that node and executed there.
private static final class Freshen implements FileCallable<Void> {
    private static final long serialVersionUID = 1;

    @Override public Void invoke(File f, VirtualChannel channel) {
        // f and file represent the same thing
        // (note: deleteContents() here is illustrative; java.io.File has no such method)
        f.deleteContents();
        f.mkdirs();
        return null;
    }
}

This approach, File myFile = new File(build.getWorkspace().toURI()), is not the correct solution. I don't know why it has remained the accepted answer to date.
The approach mentioned by Sascha Vetter is correct, taking the reference from the official Jenkins Javadoc, which clearly says, and I quote:
Unlike File, which always implies a file path on the current computer, FilePath represents a file path on a specific agent or the controller.
So, being an active contributor to the Jenkins community, I would refer to the answer given by Sascha Vetter.
PS: The reputation point criteria make me unable to up-vote the correct answer.
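To add a hedged illustration of why this matters (my own sketch, not from the Jenkins docs): if you stay within the FilePath API, the same code works whether the workspace is on the controller or on an agent, without ever converting to java.io.File:

FilePath workspace = build.getWorkspace();
if (workspace != null) {
    // child() resolves the path on whichever node actually owns the workspace
    FilePath report = workspace.child("reports/summary.txt");
    report.getParent().mkdirs();
    report.write("hello from the plugin", "UTF-8");   // FilePath#write(String content, String encoding)
    String contents = report.readToString();          // reads over the remoting channel if necessary
}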

Related

Jena read hook not invoked upon duplicate import read

My problem will probably be explained better with code.
Consider the snippet below:
// First read
OntModel m1 = ModelFactory.createOntologyModel();
RDFDataMgr.read(m1,uri0);
m1.loadImports();
// Second read (from the same URI)
OntModel m2 = ModelFactory.createOntologyModel();
RDFDataMgr.read(m2,uri0);
m2.loadImports();
where uri0 points to a valid RDF file describing an ontology model with n imports,
and the following custom ReadHook (which has been set in advance):
@Override
public String beforeRead(Model model, String source, OntDocumentManager odm) {
    System.out.println("BEFORE READ CALLED: " + source);
    return source; // return the (possibly rewritten) source so the read proceeds normally
}
Global FileManager and OntDocumentManager are used with the following settings:
processImports = true;
caching = true;
If I run the snippet above, the model will be read from uri0 and beforeRead will be invoked exactly n times (once for each import).
However, in the second read, beforeRead won't be invoked even once.
How, and what should I reset in order for Jena to invoke beforeRead in the second read as well?
What I have tried so far:
At first I thought it was due to caching being on, but turning it off or clearing it between the first and second read didn't do anything.
I have also tried removing all ignoredImport records from m1. Nothing changed.
I finally got this solved. The problem was in ModelFactory.createOntologyModel(). Ultimately, this gets translated to ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM_RDFS_INF, null).
All ontology models created with the static OntModelSpec.OWL_MEM_RDFS_INF will share their ImportsModelMaker and some other objects, which results in a shared state. Apparently, this shared state prevented the read hook from being invoked again for the same imports.
This can be prevented by creating a custom, independent and non-static OntModelSpec instance and using it when creating an OntModel, for example:
new OntModelSpec( ModelFactory.createMemModelMaker(), new OntDocumentManager(), RDFSRuleReasonerFactory.theInstance(), ProfileRegistry.OWL_LANG );
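A hedged sketch of how that fix might look in context (the variable names and the MyReadHook class are mine; the Jena calls are the standard org.apache.jena ontology API):

// Build each OntModel with its own, non-shared OntModelSpec so that the document
// manager (and therefore the ReadHook) is not reused between reads.
OntDocumentManager docMgr = new OntDocumentManager();
docMgr.setProcessImports(true);
docMgr.setReadHook(new MyReadHook());   // MyReadHook is the custom ReadHook from the question

OntModelSpec spec = new OntModelSpec(
        ModelFactory.createMemModelMaker(),
        docMgr,
        RDFSRuleReasonerFactory.theInstance(),
        ProfileRegistry.OWL_LANG);

OntModel m1 = ModelFactory.createOntologyModel(spec);
RDFDataMgr.read(m1, uri0);
m1.loadImports();   // beforeRead should now fire for every import

For the second read, build a second OntModelSpec (or at least a second OntDocumentManager) the same way, so that no state is shared between the two models.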

How to read file from an imported library

I have two packages: webserver and utils, which provides assets to webserver.
The webserver needs access to static files inside utils. So I have this setup:
utils/
  lib/
    static.html
How can I access the static.html file in one of my dart scripts in webserver?
EDIT: What I have tried so far is to use mirrors to get the path of the library and read it from there. The problem with that approach is that, if utils is included with package:, the URI returned by currentMirrorSystem().findLibrary(#utils).uri is a package URI that can't be transformed to an actual file entity.
Use the Resource class, a new class in Dart SDK 1.12.
Usage example:
var resource = new Resource('package:myapp/myfile.txt');
var contents = await resource.loadAsString();
print(contents);
This works on the VM, as of 1.12.
However, this doesn't directly address your need to get to the actual File entity, from a package: URI. Given the Resource class today, you'd have to route the bytes from loadAsString() into the HTTP server's Response object.
I tend to use Platform.script or mirrors to find the main package's top folder (i.e. where pubspec.yaml is present) and then find the assets exported by imported packages. I agree this is not a perfect solution, but it works:
import 'dart:io';
import 'dart:mirrors';

import 'package:path/path.dart';

String getProjectTopPath(String resolverPath) {
  String dirPath = normalize(absolute(resolverPath));
  while (true) {
    // Find the project root path (the folder containing pubspec.yaml)
    if (new File(join(dirPath, "pubspec.yaml")).existsSync()) {
      return dirPath;
    }
    String newDirPath = dirname(dirPath);
    if (newDirPath == dirPath) {
      throw new Exception("No project found for path '$resolverPath'");
    }
    dirPath = newDirPath;
  }
}

String getPackagesPath(String resolverPath) {
  return join(getProjectTopPath(resolverPath), 'packages');
}

class _TestUtils {}

main(List<String> arguments) {
  // Use Platform.script - does not work in unit tests
  String currentScriptPath = Platform.script.toFilePath();
  String packagesPath = getPackagesPath(currentScriptPath);
  // Get your file using the package name and its relative path from the lib folder
  String filePath = join(packagesPath, "utils", "static.html");
  print(filePath);
  // Use mirrors to find this file's path
  String thisFilePath = (reflectClass(_TestUtils).owner as LibraryMirror).uri.toString();
  packagesPath = getPackagesPath(thisFilePath);
  filePath = join(packagesPath, "utils", "static.html");
  print(filePath);
}
Note that, as of recently, Platform.script is not reliable in unit tests when using the new test package, so you might use the mirror trick that I propose above, explained here: https://github.com/dart-lang/test/issues/110

How to Get Filename when using file pattern match in google-cloud-dataflow

Does anyone know how to get the filename when using a file pattern match in google-cloud-dataflow?
I'm a newbie to Dataflow. How can I get the filename when using a file pattern match like this?
p.apply(TextIO.Read.from("gs://dataflow-samples/shakespeare/*.txt"))
I'd like to know how I can detect the filenames, e.g. kinglear.txt, Hamlet.txt, etc.
If you would like to simply expand the filepattern and get a list of filenames matching it, you can use GcsIoChannelFactory.match("gs://dataflow-samples/shakespeare/*.txt") (see GcsIoChannelFactory).
If you would like to access the "current filename" from inside one of the DoFns downstream in your pipeline - that is currently not supported (though there are some workarounds - see below). It is a common feature request and we are still thinking how best to fit it into the framework in a natural, generic and high-performance way.
Some workarounds include:
Writing a pipeline like this (the tf-idf example uses this approach):
DoFn readFile = ...(takes a filename, reads the file and produces records)...
p.apply(Create.of(filenames))
.apply(ParDo.of(readFile))
.apply(the rest of your pipeline)
This has the downside that dynamic work rebalancing features won't work particularly well, because they currently apply only at the level of Read PTransforms, not at the level of ParDos with high fan-out (like the one here, which would read a file and produce all records). Parallelization will also only work at the level of files; files will not be split into sub-ranges. At the scale of reading Shakespeare this is not an issue, but if you are reading a set of files of wildly different sizes, some extremely large, then it may become an issue.
Implementing your own FileBasedSource (javadoc, general documentation) which would return records of type something like Pair<String, T>, where the String is the filename and the T is the record you're reading. In this case the framework would handle the filepattern matching for you, and dynamic work rebalancing would work just fine; however, it is up to you to write the reading logic in your FileBasedReader.
Both of these work-arounds are non-ideal, but depending on your requirements, one of them may do the trick for you.
Update based on latest SDK
Java (sdk 2.9.0):
Beam's TextIO readers do not give access to the filename itself, so for these use cases we need to make use of FileIO to match the files and gain access to the information stored in the file name. Unlike TextIO, the reading of the file needs to be taken care of by the user in transforms downstream of the FileIO read. The result of a FileIO read is a PCollection of ReadableFile; the ReadableFile class contains the file name as metadata, which can be used along with the contents of the file.
FileIO does have a convenience method, readFullyAsUTF8String(), which will read the entire file into a String object; this reads the whole file into memory first. If memory is a concern, you can work with the file directly using utility classes like FileSystems.
From: Document Link
PCollection<KV<String, String>> filesAndContents = p
    .apply(FileIO.match().filepattern("hdfs://path/to/*.gz"))
    // withCompression can be omitted - by default compression is detected from the filename.
    .apply(FileIO.readMatches().withCompression(GZIP))
    .apply(MapElements
        // uses static imports from TypeDescriptors
        .into(kvs(strings(), strings()))
        .via((ReadableFile f) -> KV.of(
            f.getMetadata().resourceId().toString(), f.readFullyAsUTF8String())));
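If memory is a concern, here is a hedged sketch of the streaming alternative hinted at above (my own code, not from the Beam docs; the bucket path is a placeholder): it reads each matched file through ReadableFile.open() instead of readFullyAsUTF8String(), so a file is never loaded into memory as a whole. It assumes the usual Beam 2.x imports plus java.io.BufferedReader, java.nio.channels.Channels and java.nio.charset.StandardCharsets.

PCollection<KV<String, String>> linesWithFilenames = p
    .apply(FileIO.match().filepattern("gs://my-bucket/path/*.txt"))
    .apply(FileIO.readMatches())
    .apply(ParDo.of(new DoFn<FileIO.ReadableFile, KV<String, String>>() {
        @ProcessElement
        public void processElement(ProcessContext c) throws IOException {
            FileIO.ReadableFile file = c.element();
            String name = file.getMetadata().resourceId().toString();
            // open() returns a ReadableByteChannel, so the file is streamed line by line
            try (BufferedReader reader = new BufferedReader(
                    Channels.newReader(file.open(), StandardCharsets.UTF_8.name()))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    c.output(KV.of(name, line));
                }
            }
        }
    }));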
Python (sdk 2.9.0):
For Python SDK 2.9.0, you will need to collect the list of URIs outside of the Dataflow pipeline and feed it in as a parameter to the pipeline, for example by making use of FileSystems to read in the list of files via a glob pattern and then passing that to a PCollection for processing.
Once fileio (see PR https://github.com/apache/beam/pull/7791/) is available, the following code would also be an option for Python:
import apache_beam as beam
from apache_beam.io import fileio

with beam.Pipeline() as p:
    readable_files = (p
                      | fileio.MatchFiles('hdfs://path/to/*.txt')
                      | fileio.ReadMatches()
                      | beam.Reshuffle())
    files_and_contents = (readable_files
                          | beam.Map(lambda x: (x.metadata.path,
                                                x.read_utf8())))
One approach is to build a List<PCollection> where each entry corresponds to an input file, then use Flatten. For example, if you want to parse each line of a collection of files into a Foo object, you might do something like this:
public static class FooParserFn extends DoFn<String, Foo> {
    private String fileName;

    public FooParserFn(String fileName) {
        this.fileName = fileName;
    }

    @Override
    public void processElement(ProcessContext processContext) throws Exception {
        String line = processContext.element();
        // here you have access to both the line of text and the name of the file
        // from which it came.
    }
}

public static void main(String[] args) {
    ...
    List<String> inputFiles = ...;
    List<PCollection<Foo>> foosByFile =
        Lists.transform(inputFiles,
            new Function<String, PCollection<Foo>>() {
                @Override
                public PCollection<Foo> apply(String fileName) {
                    return p.apply(TextIO.Read.from(fileName))
                            .apply(ParDo.of(new FooParserFn(fileName)));
                }
            });
    PCollection<Foo> foos = PCollectionList.<Foo>empty(p).and(foosByFile).apply(Flatten.<Foo>pCollections());
    ...
}
One downside of this approach is that, if you have 100 input files, you'll also have 100 nodes in the Cloud Dataflow monitoring console. This makes it hard to tell what's going on. I'd be interested in hearing from the Google Cloud Dataflow people whether this approach is efficient.
I also had the 100 input files = 100 nodes on the dataflow diagram when using code similar to @danvk. I switched to an approach like this, which resulted in all the reads being combined into a single block that you can expand to drill down into each file/directory that was read. The job also ran faster using this approach rather than the Lists.transform approach in our use case.
GcsOptions gcsOptions = options.as(GcsOptions.class);
List<GcsPath> paths = gcsOptions.getGcsUtil().expand(GcsPath.fromUri(options.getInputFile()));
List<String> filesToProcess = paths.stream().map(item -> item.toString()).collect(Collectors.toList());

PCollectionList<SomeClass> pcl = PCollectionList.empty(p);
for (String fileName : filesToProcess) {
    pcl = pcl.and(
        p.apply("ReadAvroFile" + fileName, AvroIO.Read.named("ReadFromAvro")
                .from(fileName)
                .withSchema(SomeClass.class))
         .apply(ParDo.of(new MyDoFn(fileName)))
    );
}

// flatten the PCollectionList, combining all the PCollections together
PCollection<SomeClass> flattenedPCollection = pcl.apply(Flatten.pCollections());
This might be a very late post for the above question, but I wanted to add an answer using the classes bundled with Beam.
This could also be seen as code extracted from the solution provided by @Reza Rokni.
PCollection<String> listOfFilenames =
    pipe.apply(FileIO.match().filepattern("gs://apache-beam-samples/shakespeare/*"))
        .apply(FileIO.readMatches())
        .apply(MapElements
            .into(TypeDescriptors.strings())
            .via((FileIO.ReadableFile file) -> {
                String f = file.getMetadata().resourceId().getFilename();
                System.out.println(f);
                return f;
            }));

pipe.run().waitUntilFinish();
The above PCollection<String> will contain the list of files available in the provided directory.
I was struggling with the same use case while using a wildcard to read files from GCS, but I also needed to modify the collection based on the file name. The key is to use ReadFromTextWithFilename instead of ReadFromText. In Java you already have a way out, and you can use:
String filename = context.element().getMetadata().resourceId().getCurrentDirectory().toString()
inside your processElement method.
But for Python, the technique below will work:
-> Use beam.io.ReadFromTextWithFilename for reading the wildcard path from GCS.
-> As per the documentation, ReadFromTextWithFilename returns both the file's name and the file's content.
Below is the code snippet:
class GetFileNameFromWildcard(beam.DoFn):
    def process(self, element, *args, **kwargs):
        file_path, content = element
        schema = ["id", "name", "mob", "email", "dept", "store"]
        store_name = file_path.split("/")[-2]
        content_list = content.split(",")
        content_list.append(store_name)
        out_dict = dict(zip(schema, content_list))
        print(out_dict)
        yield out_dict


def run():
    pipeline_options = PipelineOptions()
    with beam.Pipeline(options=pipeline_options) as p:
        # saving main session so that it can load global namespace on the Cloud Dataflow Worker
        init = (p
                | 'Begin Pipeline With Initiator' >> beam.Create(["pcollection initializer"])
                | 'Read From GCS' >> beam.io.ReadFromTextWithFilename(
                    "gs://<bkt-name>/20220826/*/dlp*", skip_header_lines=1)
                | beam.ParDo(GetFileNameFromWildcard())
                | beam.io.WriteToText('df_out.csv'))

Create a file relative path for secure server certificate

I'm working on a Dart HttpServer using SSL, which looks something like this:
class Server {
  // The path for the database is relative to the code's entry point (main.dart)
  static const String CERTIFICATE_DB_PATH = '../lib/server/';
  static const String CERTIFICATE_DB_PASS = '*******';
  static const String CERTIFICATE_NAME = 'CN=mycert';

  Future start() async {
    SecureSocket.initialize(database: CERTIFICATE_DB_PATH, password: CERTIFICATE_DB_PASS);
    httpServer = await HttpServer.bindSecure(ADDRESS, PORT, certificateName: CERTIFICATE_NAME);
    listenSubscription = httpServer.listen(onRequest, onError: onError);
  }

  // more server code here
}
This all works exactly as expected, so no problems with the actual certificate or server code. The part that I'm having problems with is mentioned in that first comment. The CERTIFICATE_DB_PATH seems to be relative not to the file the Server class is defined in, but rather to the file that contains the main() method. This means that when I try to write a unit test for this class, the path is no longer pointing to the correct directory. If this were an import, I'd use the package:packageName/path/to/cert syntax, but it doesn't seem that applies here. How can I specify the path of the certificate in a way that will work with multiple entry points (actually running the server vs unit tests)?
I don't think there is a way to define the path so it is relative to the source file.
What you can do is change the current working directory, either before you run main(), or by passing a working directory path as an argument to main() and letting main() make that directory the current working directory:
Directory.current = someDirectory;

Execute multiple mojos from one "meta mojo" and parameter sharing

I have created a maven-plugin with some mojos, each of which has a very specific purpose. But for the end user it would be nice to execute some of them at once (the order is crucial).
So how can I execute other mojos from a mojo's execute()? The mojos being executed have some @Parameter fields, so I can't simply call new MyMojo().execute().
My second question is: Is there a way to share some @Parameters between mojos, or do I have to declare @Parameter in each mojo that uses them?
My idea is to somehow deliver all shared parameters via a utility class that provides getters for them.
I think the answer to both questions somehow lies in understanding the DI mechanism behind Maven mojos?! I have some experience with Guice but none with Plexus. So could someone please give me some advice?
I don't know exactly what your second question means; maybe they aren't really related. But I'll try to answer both, starting with the first one.
Question 1: How to call a Goal from another Goal?
For this you can use the Apache Maven Invoker.
Add the Maven dependency to your plugin.
For example:
<dependency>
    <groupId>org.apache.maven.shared</groupId>
    <artifactId>maven-invoker</artifactId>
    <version>2.2</version>
</dependency>
And then you can call another goal this way:
// parameters:
final Properties properties = new Properties();
properties.setProperty("example.param.one", exampleValueOne);
// prepare the execution:
final InvocationRequest invocationRequest = new DefaultInvocationRequest();
invocationRequest.setPomFile(new File(pom)); // pom could be an injected field annotated with '@Parameter(defaultValue = "${basedir}/pom.xml")' if you want to use the same pom for the second goal
invocationRequest.setGoals(Collections.singletonList("second-plugin:example-goal"));
invocationRequest.setProperties(properties);
// configure logging:
final Invoker invoker = new DefaultInvoker();
invoker.setOutputHandler(new LogOutputHandler(getLog())); // using getLog() here redirects all log output directly to the current console
// execute:
final InvocationResult invocationResult = invoker.execute(invocationRequest);
Question 2: How can I share parameters between mojos?
Did you mean:
How to share parameters between multiple goals inside of one plugin? (See "Answer for question 2.1")
How to reuse parameters of your "meta mojo" for the executed "child mojos"? (See "Answer for question 2.2")
Answer for question 2.1:
You can create an abstract parent class which contains the parameter fields.
Example:
abstract class AbstractMyPluginMojo extends AbstractMojo {
    @Parameter(required = true)
    private String someParam;

    protected String getSomeParam() {
        return someParam;
    }
}

@Mojo(name = "first-mojo")
public class MyFirstMojo extends AbstractMyPluginMojo {
    public final void execute() {
        getLog().info("someParam: " + getSomeParam());
    }
}

@Mojo(name = "second-mojo")
public class MySecondMojo extends AbstractMyPluginMojo {
    public final void execute() {
        getLog().info("someParam: " + getSomeParam());
    }
}
You can find this technique in almost every larger Maven plugin. For example, look at the Apache Maven Plugin sources.
Answer for question 2.2:
You can find the solution for this already in my answer to question 1. If you want to execute multiple goals inside your "meta mojo", you can just reuse the properties variable.
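For example, here is a hedged sketch (the goal names are placeholders; pom, exampleValueOne and LogOutputHandler come from the invoker example in question 1) of a "meta mojo" running two goals in a fixed order while reusing the same Properties:

// shared parameters for all invoked goals
final Properties properties = new Properties();
properties.setProperty("example.param.one", exampleValueOne);

final Invoker invoker = new DefaultInvoker();
invoker.setOutputHandler(new LogOutputHandler(getLog()));

// placeholder goal names; the order of this list is the execution order
for (final String goal : Arrays.asList("my-plugin:first-mojo", "my-plugin:second-mojo")) {
    final InvocationRequest request = new DefaultInvocationRequest();
    request.setPomFile(new File(pom));
    request.setGoals(Collections.singletonList(goal));
    request.setProperties(properties);   // the same properties travel with every invocation

    final InvocationResult result = invoker.execute(request);   // may throw MavenInvocationException
    if (result.getExitCode() != 0) {
        throw new MojoExecutionException("Goal " + goal + " failed with exit code " + result.getExitCode());
    }
}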
