Read RDF:foaf file with Apache Jena

I have a problem reading an RDF file that uses FOAF tags. I would like to read it with Apache Jena. Below is a snippet of the RDF file.
<rdf:RDF xmlns="http://test.example.com/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person rdf:about="http://test.example.com/MainPerson.rdf">
    <foaf:firstName>John</foaf:firstName>
    <foaf:lastName>Doe</foaf:lastName>
    <foaf:nick>Doe</foaf:nick>
    <foaf:gender>Male</foaf:gender>
    <foaf:based_near>Honolulu</foaf:based_near>
    <foaf:birthday>08-14-1990</foaf:birthday>
    <foaf:mbox>john@example.com</foaf:mbox>
    <foaf:homepage rdf:resource="http://www.example.com"/>
    <foaf:img rdf:resource="http://weknowmemes.com/wp-content/uploads/2013/09/wat-meme.jpg"/>
    <foaf:made>
      Article: Developing applications in Java
    </foaf:made>
    <foaf:age>24</foaf:age>
    <foaf:interest>
      Java, Java EE (web tier), PrimeFaces, MySQL, PHP, OpenCart, Joomla, Prestashop, CSS3, HTML5
    </foaf:interest>
    <foaf:pastProject rdf:resource="http://www.supercombe.si"/>
    <foaf:status>Student</foaf:status>
    <foaf:geekcode>M+, L++</foaf:geekcode>
    <foaf:knows>
      <foaf:Person>
        <rdfs:seeAlso rdf:resource="http://test.example.com/Person.rdf"/>
      </foaf:Person>
    </foaf:knows>
    <foaf:knows>
      <foaf:Person>
        <rdfs:seeAlso rdf:resource="http://test.example.com/Person2.rdf"/>
      </foaf:Person>
    </foaf:knows>
    <foaf:knows>
      <foaf:Person>
        <rdfs:seeAlso rdf:resource="http://test.example.com/Person3.rdf"/>
      </foaf:Person>
    </foaf:knows>
  </foaf:Person>
</rdf:RDF>
I just don't understand how to read this data with Apache Jena into a regular POJO object. Any help will be appreciated (I couldn't find a tutorial on the web for this kind of parsing).

I don't know if I understood your problem. But if you need to read an RDF file into a POJO object, you have several options. For example, you can read your RDF file into a Jena model, and then create POJO objects using the methods the framework provides to get the values of your properties.
Here is a code example that extracts the foaf:firstName from your file:
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.rdf.model.Property;
import com.hp.hpl.jena.rdf.model.Resource;
import com.hp.hpl.jena.util.FileManager;

public class Test {

    // First, create a Jena model and use FileManager to read the file
    public static Model model = ModelFactory.createDefaultModel();

    public static void main(String[] args) {
        // Use FileManager to read the file and add it to the Jena model
        FileManager.get().readModel(model, "test.rdf");

        // Apply methods like getResource, getProperty, listStatements, listLiteralStatements ...
        // to your model to extract the information you want
        Resource person = model.getResource("http://test.example.com/MainPerson.rdf");
        Property firstName = model.createProperty("http://xmlns.com/foaf/0.1/firstName");
        String firstNameValue = person.getProperty(firstName).getString();
        System.out.println(firstNameValue);
    }
}
You can use those methods in the setters of your POJO class. You can find a very good introduction here
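To go one step further, here is a minimal sketch (my own illustration, not from the linked introduction) of filling a plain Java object from the model read above. The Person class, the toPerson helper and its field names are hypothetical, only cover a few of the FOAF properties in the file, and would live alongside the Test class above:

// Hypothetical POJO; the field names mirror the FOAF properties used in the file.
public class Person {
    public String firstName;
    public String lastName;
    public int age;
}

// Builds a Person from the model (assumes the model was read as in the Test class above).
static final String FOAF = "http://xmlns.com/foaf/0.1/";

static Person toPerson(Model model, String uri) {
    Resource r = model.getResource(uri);
    Person p = new Person();
    p.firstName = r.getProperty(model.createProperty(FOAF, "firstName")).getString();
    p.lastName = r.getProperty(model.createProperty(FOAF, "lastName")).getString();
    p.age = Integer.parseInt(r.getProperty(model.createProperty(FOAF, "age")).getString());
    return p;
}

Calling toPerson(model, "http://test.example.com/MainPerson.rdf") after the readModel call would then give you an ordinary object you can pass around; missing properties would need null checks in real code.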

Related

Grails Data Service Cannot Use Regular Service

Happy Another Covid Day. When I use generate-all, Grails creates the Data Service for me. I begin to understand what a data service is.
I also have my own service for my Author and Book classes to use. I name my service ImportService. I have methods in the ImportService to clean up my book data read from a CSV file before the Data Service saves my books to the database. I also follow the instructions to make the Data Service an abstract class, so I can put my own methods in the Data Service.
Since the Author has its own AuthorService, and the Book has its own BookService, I want the different Data Services to access the methods in my ImportService, so I don't have to copy and paste the import CSV code multiple times. So, I put the line ImportService importService in the AuthorService class and the BookService class. That does not go well. importService is always NULL inside the Data Service classes. I googled the problem. They say I cannot inject another service into the grails.gorm.services.Service.
There is a post that says to make a bean. I am new to Grails. I have no idea what they are talking about even with the codes posted. Part of my background is Assembly Language, C, and Pascal. My head is filled with lingo like Top Down, Subroutine, library, Address, and Pointer. I have no idea what a Bean is.
This is where I am. I am wondering whether it is a bug or by design that you cannot inject a service into a GORM data service.
Thanks for your "Pointer".
See the project at https://github.com/jeffbrown/tom6502servicedi. That project uses Grails 4.0.3 and GORM 7.0.7.
https://github.com/jeffbrown/tom6502servicedi/blob/main/grails-app/services/tom6502servicedi/ImportService.groovy
package tom6502servicedi

class ImportService {

    int getSomeNumber() {
        42
    }
}
https://github.com/jeffbrown/tom6502servicedi/blob/917c51ee173e7bb6844ca7d40ced5afbb8d9063f/grails-app/services/tom6502servicedi/AuthorService.groovy
package tom6502servicedi

import grails.gorm.services.Service
import org.springframework.beans.factory.annotation.Autowired

@Service(Author)
abstract class AuthorService {

    @Autowired
    ImportService importService

    // ...

    int getSomeNumberFromImportService() {
        importService.someNumber
    }
}
https://github.com/jeffbrown/tom6502servicedi/blob/917c51ee173e7bb6844ca7d40ced5afbb8d9063f/grails-app/controllers/tom6502servicedi/AuthorController.groovy
package tom6502servicedi

import grails.validation.ValidationException
import static org.springframework.http.HttpStatus.*

class AuthorController {

    AuthorService authorService

    // ...

    def someNumber() {
        render "The Number Is ${authorService.someNumberFromImportService}"
    }
}
Sending a request to that someNumber action will verify that the ImportService is injected into the AuthorService and the AuthorService is injected into the AuthorController.
$ curl http://localhost:8080/author/someNumber
The Number Is 42

Example to read and write parquet file using ParquetIO through Apache Beam

Has anybody tried reading/writing Parquet files using Apache Beam? Support was added recently in version 2.5.0, hence there is not much documentation.
I am trying to read a JSON input file and would like to write it out in Parquet format.
Thanks in advance.
Add the following dependency, as ParquetIO lives in a separate module.
<dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-sdks-java-io-parquet</artifactId>
    <version>2.6.0</version>
</dependency>
// Here is code to read and write...
PCollection<JsonObject> input = ...; // your data
PCollection<GenericRecord> pgr = input.apply("parse json", ParDo.of(new DoFn<JsonObject, GenericRecord>() {
    @ProcessElement
    public void processElement(ProcessContext context) {
        JsonObject json = context.element();
        GenericRecord record = ...; // convert json to GenericRecord with schema
        context.output(record);
    }
}));
pgr.apply(FileIO.<GenericRecord>write().via(ParquetIO.sink(schema)).to("path/to/save"));

PCollection<GenericRecord> data = pipeline.apply(
    ParquetIO.read(schema).from("path/to/read"));
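As a minimal sketch of the "convert json to GenericRecord with schema" step (my own illustration; the schema, record name and field names are hypothetical), one way is Avro's GenericRecordBuilder fed from a Gson JsonObject (imports: org.apache.avro.Schema, org.apache.avro.generic.GenericRecord, org.apache.avro.generic.GenericRecordBuilder):

// Hypothetical Avro schema; replace it with one that matches your JSON layout.
Schema schema = new Schema.Parser().parse(
    "{\"type\":\"record\",\"name\":\"Person\",\"fields\":["
    + "{\"name\":\"name\",\"type\":\"string\"},"
    + "{\"name\":\"age\",\"type\":\"int\"}]}");

// Inside processElement: copy the JSON fields into an Avro record.
GenericRecord record = new GenericRecordBuilder(schema)
    .set("name", json.get("name").getAsString())
    .set("age", json.get("age").getAsInt())
    .build();
context.output(record);

The same Schema (or one parsed from the same string) is what you pass to ParquetIO.sink(schema) when writing.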
You will need to use ParquetIO.Sink. It implements FileIO.Sink, so you plug it into FileIO.write().
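For instance, a minimal sketch of wiring it up, where records is a PCollection<GenericRecord> and the output path is a placeholder:

records.apply(
    FileIO.<GenericRecord>write()
        .via(ParquetIO.sink(schema))
        .to("path/to/output"));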

Redstone mapper with flutter

I want to use the redstone mapper to decode JSON to objects.
However, Flutter doesn't support mirrors, so I cannot initialize the mapper in the normal way with bootstrapMapper().
From what I've found, I have to use staticBootstrapMapper(...) instead.
/**
* initialize the mapper system.
*
* This function provides a mapper implementation that
* uses data generated by the redstone_mapper's transformer,
* instead of relying on the mirrors API.
*
*/
void staticBootstrapMapper(Map<Type, TypeInfo> types) {
_staticTypeInfo = types;
configure(_getOrCreateMapper, _createValidator);
}
Link to source code
I don't know what I should put into the Map<Type, TypeInfo> types map.
Let's say I want to use ObjectData to transform JSON data into this object.
But how do I have to use this initializing method? Unfortunately I didn't find an example of how to use this static bootstrap mapper.
class ObjectData {
  @Field()
  @NotEmpty()
  DataType dateType; // might be a User object

  @Field()
  @NotEmpty()
  String id;

  @Field()
  @NotEmpty()
  List<String> versions;
}
Mirrors aren't supported in Flutter, as noted above in the comments.
You might want to try alternative packages that don't rely on mirrors:
https://pub.dartlang.org/packages/json_serializable
https://pub.dartlang.org/packages/built_value
Of those two (and others), json_serializable looks like the easiest to get started with, but it might not have as many features.

How to Get Filename when using file pattern match in google-cloud-dataflow

Does someone know how to get the filename when using a file pattern match in google-cloud-dataflow?
I'm a newbie to Dataflow. How do I get the filename when using a file pattern match like this?
p.apply(TextIO.Read.from("gs://dataflow-samples/shakespeare/*.txt"))
I'd like to know how to detect filenames such as kinglear.txt, Hamlet.txt, etc.
If you would like to simply expand the filepattern and get a list of filenames matching it, you can use GcsIoChannelFactory.match("gs://dataflow-samples/shakespeare/*.txt") (see GcsIoChannelFactory).
If you would like to access the "current filename" from inside one of the DoFn's downstream in your pipeline - that is currently not supported (though there are some workarounds - see below). It is a common feature request and we are still thinking how best to fit it into the framework in a natural, generic and high-performant way.
Some workarounds include:
Writing a pipeline like this (the tf-idf example uses this approach):
DoFn readFile = ...(takes a filename, reads the file and produces records)...
p.apply(Create.of(filenames))
.apply(ParDo.of(readFile))
.apply(the rest of your pipeline)
This has the downside that dynamic work rebalancing features won't work particularly well, because they currently apply only at the level of Read PTransforms, not at the level of ParDos with high fan-out (like the one here, which reads a file and produces all of its records). Parallelization will only work at the level of files, and files will not be split into sub-ranges. At the scale of reading Shakespeare this is not an issue, but if you are reading a set of files of wildly different sizes, some extremely large, then it may become an issue.
Implementing your own FileBasedSource (javadoc, general documentation) which would return records of type something like Pair<String, T> where the String is the filename and the T is the record you're reading. In this case the framework would handle the filepattern matching for you, dynamic work rebalancing would work just fine, however it is up to you to write the reading logic in your FileBasedReader.
Both of these work-arounds are non-ideal, but depending on your requirements, one of them may do the trick for you.
Update based on latest SDK
Java (sdk 2.9.0):
Beam's TextIO readers do not give access to the filename itself. For these use cases we need to use FileIO to match the files and gain access to the information stored in the file name. Unlike TextIO, the reading of the file needs to be taken care of by the user in transforms downstream of the FileIO read. The result of a FileIO read is a PCollection of ReadableFile; the ReadableFile class contains the file name as metadata, which can be used along with the contents of the file.
FileIO does have a convenience method readFullyAsUTF8String(), which will read the entire file into a String object; note that this reads the whole file into memory first. If memory is a concern, you can work with the file directly through utility classes like FileSystems (a streaming sketch follows the Java example below).
From: Document Link
PCollection<KV<String, String>> filesAndContents = p
    .apply(FileIO.match().filepattern("hdfs://path/to/*.gz"))
    // withCompression can be omitted - by default compression is detected from the filename.
    .apply(FileIO.readMatches().withCompression(GZIP))
    .apply(MapElements
        // uses imports from TypeDescriptors
        .into(kvs(strings(), strings()))
        .via((ReadableFile f) -> KV.of(
            f.getMetadata().resourceId().toString(), f.readFullyAsUTF8String())));
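And here is a minimal streaming sketch for the memory-conscious case mentioned above (my own illustration, not from the original answer): each matched file is read line by line through ReadableFile.open() instead of readFullyAsUTF8String(), so the whole file never has to fit in memory. The filepattern and the per-line KV output shape are placeholders; it needs java.io.BufferedReader, java.nio.channels.Channels and java.nio.charset.StandardCharsets imports.

PCollection<KV<String, String>> linesWithFilenames = p
    .apply(FileIO.match().filepattern("hdfs://path/to/*.txt"))
    .apply(FileIO.readMatches())
    .apply(ParDo.of(new DoFn<FileIO.ReadableFile, KV<String, String>>() {
        @ProcessElement
        public void processElement(ProcessContext c) throws IOException {
            FileIO.ReadableFile file = c.element();
            String name = file.getMetadata().resourceId().toString();
            // Stream the file instead of loading it fully into memory.
            try (BufferedReader reader = new BufferedReader(
                    Channels.newReader(file.open(), StandardCharsets.UTF_8.name()))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    c.output(KV.of(name, line));
                }
            }
        }
    }));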
Python (sdk 2.9.0):
For 2.9.0 for Python you will need to collect the list of URIs from outside of the Dataflow pipeline and feed them in as a parameter to the pipeline, for example by using FileSystems to read in the list of files via a glob pattern and then passing that to a PCollection for processing.
Once fileio is available (see PR https://github.com/apache/beam/pull/7791/), the following code would also be an option for Python.
import apache_beam as beam
from apache_beam.io import fileio

with beam.Pipeline() as p:
    readable_files = (p
                      | fileio.MatchFiles('hdfs://path/to/*.txt')
                      | fileio.ReadMatches()
                      | beam.Reshuffle())
    files_and_contents = (readable_files
                          | beam.Map(lambda x: (x.metadata.path,
                                                x.read_utf8())))
One approach is to build a List<PCollection> where each entry corresponds to an input file, then use Flatten. For example, if you want to parse each line of a collection of files into a Foo object, you might do something like this:
public static class FooParserFn extends DoFn<String, Foo> {
    private String fileName;

    public FooParserFn(String fileName) {
        this.fileName = fileName;
    }

    @Override
    public void processElement(ProcessContext processContext) throws Exception {
        String line = processContext.element();
        // here you have access to both the line of text and the name of the file
        // from which it came.
    }
}

public static void main(String[] args) {
    ...
    List<String> inputFiles = ...;
    List<PCollection<Foo>> foosByFile =
        Lists.transform(inputFiles,
            new Function<String, PCollection<Foo>>() {
                @Override
                public PCollection<Foo> apply(String fileName) {
                    return p.apply(TextIO.Read.from(fileName))
                            .apply(ParDo.of(new FooParserFn(fileName)));
                }
            });
    PCollection<Foo> foos = PCollectionList.<Foo>empty(p).and(foosByFile).apply(Flatten.<Foo>pCollections());
    ...
}
One downside of this approach is that, if you have 100 input files, you'll also have 100 nodes in the Cloud Dataflow monitoring console. This makes it hard to tell what's going on. I'd be interested in hearing from the Google Cloud Dataflow people whether this approach is efficient.
I also had the 100 input files = 100 nodes on the Dataflow diagram when using code similar to @danvk's. I switched to an approach like the following, which resulted in all the reads being combined into a single block that you can expand to drill down into each file/directory that was read. The job also ran faster using this approach rather than the Lists.transform approach in our use case.
GcsOptions gcsOptions = options.as(GcsOptions.class);
List<GcsPath> paths = gcsOptions.getGcsUtil().expand(GcsPath.fromUri(options.getInputFile()));
List<String> filesToProcess = paths.stream().map(item -> item.toString()).collect(Collectors.toList());

PCollectionList<SomeClass> pcl = PCollectionList.empty(p);
for (String fileName : filesToProcess) {
    pcl = pcl.and(
        p.apply("ReadAvroFile" + fileName, AvroIO.Read.named("ReadFromAvro")
            .from(fileName)
            .withSchema(SomeClass.class)
        )
        .apply(ParDo.of(new MyDoFn(fileName)))
    );
}

// flatten the PCollectionList, combining all the PCollections together
PCollection<SomeClass> flattenedPCollection = pcl.apply(Flatten.pCollections());
This might be a very late post for the above question, but I wanted to add an answer using Beam's bundled classes.
This can also be seen as code extracted from the solution provided by @Reza Rokni.
PCollection<String> listOfFilenames =
    pipe.apply(FileIO.match().filepattern("gs://apache-beam-samples/shakespeare/*"))
        .apply(FileIO.readMatches())
        .apply(
            MapElements.into(TypeDescriptors.strings())
                .via(
                    (FileIO.ReadableFile file) -> {
                        String f = file.getMetadata().resourceId().getFilename();
                        System.out.println(f);
                        return f;
                    }));
pipe.run().waitUntilFinish();
The above PCollection<String> will contain the list of files available in the provided directory.
I was struggling with the same use case while using a wildcard to read files from GCS, but I also needed to modify the collection based on the file name. The key is to use ReadFromTextWithFilename instead of ReadFromText. In Java you already have a way out and can use:
String filename = context.element().getMetadata().resourceId().getCurrentDirectory().toString()
inside your processElement method.
But for Python below technique will work:
-> Use beam.io.ReadFromTextWithFilename for reading the wildcard path from GCS
-> As per the document, ReadFromTextWithFilename returns the file's name and the file's content.
Below is the code snippet:
class GetFileNameFromWildcard(beam.DoFn):
    def process(self, element, *args, **kwargs):
        file_path, content = element
        schema = ["id", "name", "mob", "email", "dept", "store"]
        store_name = file_path.split("/")[-2]
        content_list = content.split(",")
        content_list.append(store_name)
        out_dict = dict(zip(schema, content_list))
        print(out_dict)
        yield out_dict


def run():
    pipeline_options = PipelineOptions()
    with beam.Pipeline(options=pipeline_options) as p:
        # saving main session so that it can load global namespace on the Cloud Dataflow worker
        init = (p
                | 'Begin Pipeline With Initiator' >> beam.Create(["pcollection initializer"])
                | 'Read From GCS' >> beam.io.ReadFromTextWithFilename(
                    "gs://<bkt-name>/20220826/*/dlp*", skip_header_lines=1)
                | beam.ParDo(GetFileNameFromWildcard())
                | beam.io.WriteToText('df_out.csv'))

How to convert a pojo to xml using "as" keyword

I have a requirement to send an object as XML to a web service. I already have the POJO; now I need to convert it to XML using Groovy. In Grails I have used the as keyword; what is the equivalent code to do this in Groovy?
Example Grails code:
import grails.converters.*
render Airport.findByIata(params.iata) as XML
A naive example of doing this with StreamingMarkupBuilder would be:
class Airport {
    String name
    String code
    int id
}

Writable pogoToXml( object ) {
    new groovy.xml.StreamingMarkupBuilder().bind {
        "${object.getClass().name}" {
            object.getClass().declaredFields.grep { !it.synthetic }.name.each { n ->
                "$n"( object."$n" )
            }
        }
    }
}

println pogoToXml( new Airport( name:'Manchester', code:'MAN', id:1 ) )
Which should print:
<Airport><name>Manchester</name><code>MAN</code><id>1</id></Airport>
The as keyword is actually part of the Groovy language spec. The part you are missing is the XML class that does the conversion. This is really just a fancy class that walks the POJO and writes the XML (possibly using MarkupBuilder).
Groovy does not have a built-in class like grails.converters.XML that makes it so easy. Instead, you'll need to manually build the XML using MarkupBuilder or StreamingMarkupBuilder.
Neither of these will automatically convert a POJO or POGO to XML; you'll have to either process this yourself manually or use reflection to automate the process.
I'd suggest that you might be able to copy the Grails converter over, but it may have a lot of dependencies. Still, it's open source, so that might be a starting point if you need a more reusable component.
