I want to unmarshal an XML file to java object using JAXB. The XML file is very large and contains some nodes which I want to skip in some cases to improve performance as these elements are non editable by client java program.
A sample XML is as follows:
<Example id="10" date="1970-01-01" version="1.0">
<Properties>...</Properties>
<Summary>...</Summary>
<RawData>
<Document id="1">...</Document>
<Document id="2">...</Document>
<Document id="3">...</Document>
------
------
</RawData>
<Location></Location>
<Title></Title>
----- // more elements
</Example>
I have two use cases:
unmarshal into Example object which contains Properties, Summaries, RawData etc. without skipping any RawData. (already done this part)
unmarshal into Example object which exclude RawData. Elements nested in RawData is very large so do not want to read this in this use case.
Now I want to unmarshal the XML such that RawData can be skipped. I have tried the technique provided at this link.
Using technique provided in above link also skips all elements which come after RawData.
I have fixed the issue with XMLEventReader with following code:
public class PartialXmlEventReader implements XMLEventReader {
private final XMLEventReader reader;
private final QName qName;
private boolean skip = false;
public PartialXmlEventReader(final XMLEventReader reader, final QName element) {
this.reader = reader;
this.qName = element;
}
#Override
public String getElementText() throws XMLStreamException {
return reader.getElementText();
}
#Override
public Object getProperty(final String name) throws IllegalArgumentException {
return reader.getProperty(name);
}
#Override
public boolean hasNext() {
return reader.hasNext();
}
#Override
public XMLEvent nextEvent() throws XMLStreamException {
while (isEof(reader.peek())) {
reader.nextEvent();
}
return reader.nextEvent();
}
#Override
public XMLEvent nextTag() throws XMLStreamException {
return reader.nextTag();
}
#Override
public XMLEvent peek() throws XMLStreamException {
return reader.peek();
}
#Override
public Object next() {
return reader.next();
}
#Override
public void remove() {
reader.remove();
}
#Override
public void close() throws XMLStreamException {
reader.close();
}
private boolean isEof(final XMLEvent e) {
boolean returnValue = skip;
switch (e.getEventType()) {
case XMLStreamConstants.START_ELEMENT:
final StartElement se = (StartElement) e;
if (se.getName().equals(qName)) {
skip = true;
returnValue = true;
}
break;
case XMLStreamConstants.END_ELEMENT:
final EndElement ee = (EndElement) e;
if (ee.getName().equals(qName)) {
skip = false;
}
break;
}
return returnValue;
}
}
While Unmarshalling just pass this eventReader to the unmarshal method
final JAXBContext context = JAXBContext.newInstance(classes);
final Unmarshaller um = context.createUnmarshaller();
Reader reader = null;
try {
reader = new BufferedReader(new FileReader(xmlFile));
final QName qName = new QName("RawData");
final XMLInputFactory xif = XMLInputFactory.newInstance();
final XMLEventReader xmlEventReader = xif.createXMLEventReader(reader);
final Example example =
(Example) um.unmarshal(new PartialXmlEventReader(xmlEventReader, qName));
}
} finally {
IOUtils.closeQuietly(reader);
}
I hope this would help
try {
// First create a new XMLInputFactory
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
// Setup a new eventReader
InputStream in = new FileInputStream("myXml");
XMLEventReader eventReader = inputFactory.createXMLEventReader(in);
// Read the XML document
Example example = null;
while (eventReader.hasNext()) {
XMLEvent event = eventReader.nextEvent();
if (event.isStartElement()) {
StartElement startElement = event.asStartElement();
// If we have a example element we create a new example
if (startElement.getName().getLocalPart().equals("Example")) {
example = new Example();
// We read the attributes from this tag and add the date
// and id attribute to our object
Iterator<Attribute> attributes = startElement
.getAttributes();
while (attributes.hasNext()) {
Attribute attribute = attributes.next();
if (attribute.getName().toString().equals("date")) {
example.setDate(attribute.getValue());
} else if (attribute.getName().toString().equals("id")) {
example.setId(attribute.getValue());
}
}
}
//get the Properties tag and add to object example
if (event.isStartElement()) {
if (event.asStartElement().getName().getLocalPart()
.equals("Properties")) {
event = eventReader.nextEvent();
example.setProperites(event.asCharacters().getData());
continue;
}
}
//get the Summary tag and add to object example
if (event.asStartElement().getName().getLocalPart()
.equals("Summary")) {
event = eventReader.nextEvent();
example.setSummary(event.asCharacters().getData());
continue;
}
// when you encounter the Rawdata tag just continue
//without adding it to the object created
if (event.asStartElement().getName().getLocalPart()
.equals("Rawdata")) {
event = eventReader.nextEvent();
// don't do anything
continue;
}
//get the location tag and add to object example
if (event.asStartElement().getName().getLocalPart()
.equals("Location")) {
event = eventReader.nextEvent();
example.setLocation(event.asCharacters().getData());
continue;
}
// read and add other elements that can be added
}
// If we reach the end of an example element/tag i.e closing tag
if (event.isEndElement()) {
EndElement endElement = event.asEndElement();
if (endElement.getName().getLocalPart().equals("Example")) {
//do something
}
}
}
} catch (FileNotFoundException | XMLStreamException e) {
}
Related
I am trying to develop custom source for parallel GCS content scanning. The naive approach would be to loop through listObjects function calls:
while (...) {
Objects objects = gcsUtil.listObjects(bucket, prefix, pageToken);
pageToken = objects.getNextPageToken();
...
}
The problem is performance for the tens of millions objects.
Instead of the single thread code we can add delimiter / and submit parallel processed for each prefix found:
...
Objects objects = gcsUtil.listObjects(bucket, prefix, pageToken, "/");
for (String subPrefix : object.getPrefixes()) {
scanAsync(bucket, subPrefix);
}
...
Next idea was to try to wrap this logic in Splittable DoFn.
Choice of RestrictionTracker: I don't see how can be used any of exiting RestrictionTracker. So decided to write own. Restriction itself is basically list of prefixes to scan. tryClaim checks if there is more prefixes left and receive newly scanned to append them to current restriction. trySplit splits list of prefixes.
The problem that I faced that trySplit can be called before all subPrefixes are found. In this case current restriction may receive more work to do after it was splitted. And it seems that trySplit is being called until it returns a not null value for a given RestrictionTracker: after certain number of elements goes to the output or after 1 second via scheduler or when ProcessContext.resume() returned. This doesn't work in my case as new prefixes can be found at any time. And I can't checkpoint via return ProcessContext.resume() because if split was already done, possible work that left in current restriction will cause checkDone() function to fail.
Another problem that I suspect that I couldn't achieve parallel execution in DirectRunner. As trySplit was always called with fractionOfRemainder=0 and new RestrictionTracker was started only after current one completed its piece of work.
It would be also great to read more detailed explanation about Splittable DoFn components lifecycle. How parallel execution per element is achieved. And how and when state of RestrictionTracker can be modified.
UPD: Adding simplified code that should show intended implementation
#DoFn.BoundedPerElement
private static class ScannerDoFn extends DoFn<String, String> {
private transient GcsUtil gcsUtil;
#GetInitialRestriction
public ScannerRestriction getInitialRestriction(#Element String bucket) {
return ScannerRestriction.init(bucket);
}
#ProcessElement
public ProcessContinuation processElement(
ProcessContext c,
#Element String bucket,
RestrictionTracker<ScannerRestriction, ScannerPosition> tracker,
OutputReceiver<String> outputReceiver) {
if (gcsUtil == null) {
gcsUtil = c.getPipelineOptions().as(GcsOptions.class).getGcsUtil();
}
ScannerRestriction currentRestriction = tracker.currentRestriction();
ScannerPosition position = new ScannerPosition();
while (true) {
if (tracker.tryClaim(position)) {
if (position.completedCurrent) {
// position.clear();
// ideally I would to get checkpoint here before starting new work
return ProcessContinuation.resume();
}
try {
Objects objects = gcsUtil.listObjects(
currentRestriction.bucket,
position.currentPrefix,
position.currentPageToken,
"/");
if (objects.getItems() != null) {
for (StorageObject o : objects.getItems()) {
outputReceiver.output(o.getName());
}
}
if (objects.getPrefixes() != null) {
position.newPrefixes.addAll(objects.getPrefixes());
}
position.currentPageToken = objects.getNextPageToken();
if (position.currentPageToken == null) {
position.completedCurrent = true;
}
} catch (Throwable throwable) {
logger.error("Error during scan", throwable);
}
} else {
return ProcessContinuation.stop();
}
}
}
#NewTracker
public RestrictionTracker<ScannerRestriction, ScannerPosition> restrictionTracker(#Restriction ScannerRestriction restriction) {
return new ScannerRestrictionTracker(restriction);
}
#GetRestrictionCoder
public Coder<ScannerRestriction> getRestrictionCoder() {
return ScannerRestriction.getCoder();
}
}
public static class ScannerPosition {
private String currentPrefix;
private String currentPageToken;
private final List<String> newPrefixes;
private boolean completedCurrent;
public ScannerPosition() {
this.currentPrefix = null;
this.currentPageToken = null;
this.newPrefixes = Lists.newArrayList();
this.completedCurrent = false;
}
public void clear() {
this.currentPageToken = null;
this.currentPrefix = null;
this.completedCurrent = false;
}
}
private static class ScannerRestriction {
private final String bucket;
private final LinkedList<String> prefixes;
private ScannerRestriction(String bucket) {
this.bucket = bucket;
this.prefixes = Lists.newLinkedList();
}
public static ScannerRestriction init(String bucket) {
ScannerRestriction res = new ScannerRestriction(bucket);
res.prefixes.add("");
return res;
}
public ScannerRestriction empty() {
return new ScannerRestriction(bucket);
}
public boolean isEmpty() {
return prefixes.isEmpty();
}
public static Coder<ScannerRestriction> getCoder() {
return ScannerRestrictionCoder.INSTANCE;
}
private static class ScannerRestrictionCoder extends AtomicCoder<ScannerRestriction> {
private static final ScannerRestrictionCoder INSTANCE = new ScannerRestrictionCoder();
private final static Coder<List<String>> listCoder = ListCoder.of(StringUtf8Coder.of());
#Override
public void encode(ScannerRestriction value, OutputStream outStream) throws IOException {
NullableCoder.of(StringUtf8Coder.of()).encode(value.bucket, outStream);
listCoder.encode(value.prefixes, outStream);
}
#Override
public ScannerRestriction decode(InputStream inStream) throws IOException {
String bucket = NullableCoder.of(StringUtf8Coder.of()).decode(inStream);
List<String> prefixes = listCoder.decode(inStream);
ScannerRestriction res = new ScannerRestriction(bucket);
res.prefixes.addAll(prefixes);
return res;
}
}
}
private static class ScannerRestrictionTracker extends RestrictionTracker<ScannerRestriction, ScannerPosition> {
private ScannerRestriction restriction;
private ScannerPosition lastPosition = null;
ScannerRestrictionTracker(ScannerRestriction restriction) {
this.restriction = restriction;
}
#Override
public boolean tryClaim(ScannerPosition position) {
restriction.prefixes.addAll(position.newPrefixes);
position.newPrefixes.clear();
if (position.completedCurrent) {
// completed work for current prefix
assert lastPosition != null && lastPosition.currentPrefix.equals(position.currentPrefix);
lastPosition = null;
return true; // return true but we would need to claim again if we need to get next prefix
} else if (lastPosition != null && lastPosition.currentPrefix.equals(position.currentPrefix)) {
// proceed work for current prefix
lastPosition = position;
return true;
}
// looking for next prefix
assert position.currentPrefix == null;
assert lastPosition == null;
if (restriction.isEmpty()) {
// no work to do
return false;
}
position.currentPrefix = restriction.prefixes.poll();
lastPosition = position;
return true;
}
#Override
public ScannerRestriction currentRestriction() {
return restriction;
}
#Override
public SplitResult<ScannerRestriction> trySplit(double fractionOfRemainder) {
if (lastPosition == null && restriction.isEmpty()) {
// no work at all
return null;
}
if (lastPosition != null && restriction.isEmpty()) {
// work at the moment only at currently scanned prefix
return SplitResult.of(restriction, restriction.empty());
}
int size = restriction.prefixes.size();
int newSize = new Double(Math.round(fractionOfRemainder * size)).intValue();
if (newSize == 0) {
ScannerRestriction residual = restriction;
restriction = restriction.empty();
return SplitResult.of(restriction, residual);
}
ScannerRestriction residual = restriction.empty();
for (int i=newSize; i<=size; i++) {
residual.prefixes.add(restriction.prefixes.removeLast());
}
return SplitResult.of(restriction, residual);
}
#Override
public void checkDone() throws IllegalStateException {
if (lastPosition != null) {
throw new IllegalStateException("Called checkDone on not completed job");
}
}
#Override
public IsBounded isBounded() {
return IsBounded.UNBOUNDED;
}
}
I'm trying to compile an android vector image in Xamarin Android based on a Java Sample but The system shows the following error:
{System.ArgumentException: type Parameter name: Type is not derived from a java type. at Java.Lang.Class.FromType (System.Type type) [0x00012] in the following part of the code:
Java.Lang.Class[] parameterType = new Java.Lang.Class[] { Java.Lang.Class.FromType(typeof(sbyte[])) };
Basically
private Drawable GetVectorDrawable(Context context, byte[] binXml)
{
try
{
// Get the binary XML parser (XmlBlock.Parser) and use it to create the drawable
// This is the equivalent of what AssetManager#getXml() does
// get the Class instance using forName method
var xmlBlock = Java.Lang.Class.ForName("android.content.res.XmlBlock");
System.Diagnostics.Debug.WriteLine(typeof(sbyte[]));
Java.Lang.Class[] parameterType = new Java.Lang.Class[] { Java.Lang.Class.FromType(typeof(sbyte[])) };
var xmlBlockConstr = xmlBlock.GetConstructor(parameterType);
//var xmlBlockConstr = xmlBlock.GetConstructor(Class.FromType(typeof(sbyte[])));
var xmlParserNew = xmlBlock.GetDeclaredMethod("newParser");
xmlBlockConstr.Accessible = true;
xmlParserNew.Accessible = true;
var parser = xmlParserNew.Invoke(xmlBlockConstr.NewInstance(binXml)) as XmlPullParser;
//System.Xml.XmlReader reader = parser.ToString();
//if (Build.VERSION.SdkInt >= (BuildVersionCodes)24)
//{
//ByteArray
//Drawable.CreateFromXml(context.Resources, reader);
//}
}
catch (System.Exception e)
{
}
return null;
}
This is the constructor I try to get:
public XmlBlock(byte[] data) {
mAssets = null;
mNative = nativeCreate(data, 0, data.length);
mStrings = new StringBlock(nativeGetStringBlock(mNative), false);
}
And this is the Java code I tried to traduce:
/**
* Create a vector drawable from a binary XML byte array.
* #param context Any context.
* #param binXml Byte array containing the binary XML.
* #return The vector drawable or null it couldn't be created.
*/
public static Drawable getVectorDrawable(#NonNull Context context, #NonNull byte[] binXml) {
try {
// Get the binary XML parser (XmlBlock.Parser) and use it to create the drawable
// This is the equivalent of what AssetManager#getXml() does
#SuppressLint("PrivateApi")
Class<?> xmlBlock = Class.forName("android.content.res.XmlBlock");
Constructor xmlBlockConstr = xmlBlock.getConstructor(byte[].class);
Method xmlParserNew = xmlBlock.getDeclaredMethod("newParser");
xmlBlockConstr.setAccessible(true);
xmlParserNew.setAccessible(true);
XmlPullParser parser = (XmlPullParser) xmlParserNew.invoke(
xmlBlockConstr.newInstance((Object) binXml));
if (Build.VERSION.SDK_INT >= 24) {
return Drawable.createFromXml(context.getResources(), parser);
} else {
// Before API 24, vector drawables aren't rendered correctly without compat lib
final AttributeSet attrs = Xml.asAttributeSet(parser);
int type = parser.next();
while (type != XmlPullParser.START_TAG) {
type = parser.next();
}
return VectorDrawableCompat.createFromXmlInner(context.getResources(), parser, attrs, null);
}
} catch (Exception e) {
Log.e(TAG, "Vector creation failed", e);
}
return null;
}
from this url: Given byte-array of VectorDrawable, how can I create an instance of VectorDrawable from it?
Can you help me?
Our program displays a tree control showing the metadata structure of the XML file they are using as a datasource. So it displays all elements & attributes in use in the XML file, like this:
Employees
Employee
FirstName
LastName
Orders
Order
OrderId
For the case where the user does not pass us a XSD file, we need to walk the XML file and build up the metadata structure.
The full code for this is at SaxonQuestions.zip, TestBuildTree.java and is also listed below.
I am concerned that my code is not the most efficient, or maybe even wrong. It works, but it works on the 3 XML files I tested it on. My questions are:
What is the best way to get the root element from the DOM? Is it walking the children of the DOM root node?
What is the best way to determine if an element has data (as opposed to just child elements)? The best I could come up with throws an exception if not and I don't think code executing the happy path should throw an exception.
What is the best way to get the class of the data held in an element or attribute? Is it: ((XdmAtomicValue)((XdmNode)currentNode).getTypedValue()).getValue().getClass();
Is the best way to walk all the nodes to use XdmNode.axisIterator? And to do so as I have in this code?
TestBuildTree.java
import net.sf.saxon.s9api.*;
import java.io.File;
import java.io.FileInputStream;
import java.util.ArrayList;
import java.util.List;
public class TestBuildTree {
public static void main(String[] args) throws Exception {
XmlDatasource datasource = new XmlDatasource(
new FileInputStream(new File("files", "SouthWind.xml").getCanonicalPath()),
null);
// Question:
// Is this the best way to get the root element?
// get the root element
XdmNode rootNode = null;
for (XdmNode node : datasource.getXmlRootNode().children()) {
if (node.getNodeKind() == XdmNodeKind.ELEMENT) {
rootNode = node;
break;
}
}
TestBuildTree buildTree = new TestBuildTree(rootNode);
Element root = buildTree.addNode();
System.out.println("Schema:");
printElement("", root);
}
private static void printElement(String indent, Element element) {
System.out.println(indent + "<" + element.name + "> (" + (element.type == null ? "null" : element.type.getSimpleName()) + ")");
indent += " ";
for (Attribute attr : element.attributes)
System.out.println(indent + "=" + attr.name + " (" + (attr.type == null ? "null" : attr.type.getSimpleName()) + ")");
for (Element child : element.children)
printElement(indent, child);
}
protected XdmItem currentNode;
public TestBuildTree(XdmItem currentNode) {
this.currentNode = currentNode;
}
private Element addNode() throws SaxonApiException {
String name = ((XdmNode)currentNode).getNodeName().getLocalName();
// Question:
// Is this the best way to determine that this element has data (as opposed to child elements)?
Boolean elementHasData;
try {
((XdmNode) currentNode).getTypedValue();
elementHasData = true;
} catch (Exception ex) {
elementHasData = false;
}
// Questions:
// Is this the best way to get the type of the element value?
// If no schema is it always String?
Class valueClass = ! elementHasData ? null : ((XdmAtomicValue)((XdmNode)currentNode).getTypedValue()).getValue().getClass();
Element element = new Element(name, valueClass, null);
// add in attributes
XdmSequenceIterator currentSequence;
if ((currentSequence = moveTo(Axis.ATTRIBUTE)) != null) {
do {
name = ((XdmNode) currentNode).getNodeName().getLocalName();
// Questions:
// Is this the best way to get the type of the attribute value?
// If no schema is it always String?
valueClass = ((XdmAtomicValue)((XdmNode)currentNode).getTypedValue()).getValue().getClass();
Attribute attr = new Attribute(name, valueClass, null);
element.attributes.add(attr);
} while (moveToNextInCurrentSequence(currentSequence));
moveTo(Axis.PARENT);
}
// add in children elements
if ((currentSequence = moveTo(Axis.CHILD)) != null) {
do {
Element child = addNode();
// if we don't have this, add it
Element existing = element.getChildByName(child.name);
if (existing == null)
element.children.add(child);
else
// add in any children this does not have
existing.addNewItems (child);
} while (moveToNextInCurrentSequence(currentSequence));
moveTo(Axis.PARENT);
}
return element;
}
// moves to element or attribute
private XdmSequenceIterator moveTo(Axis axis) {
XdmSequenceIterator en = ((XdmNode) currentNode).axisIterator(axis);
boolean gotIt = false;
while (en.hasNext()) {
currentNode = en.next();
if (((XdmNode) currentNode).getNodeKind() == XdmNodeKind.ELEMENT || ((XdmNode) currentNode).getNodeKind() == XdmNodeKind.ATTRIBUTE) {
gotIt = true;
break;
}
}
if (gotIt) {
if (axis == Axis.ATTRIBUTE || axis == Axis.CHILD || axis == Axis.NAMESPACE)
return en;
return null;
}
return null;
}
// moves to next element or attribute
private Boolean moveToNextInCurrentSequence(XdmSequenceIterator currentSequence)
{
if (currentSequence == null)
return false;
while (currentSequence.hasNext())
{
currentNode = currentSequence.next();
if (((XdmNode)currentNode).getNodeKind() == XdmNodeKind.ELEMENT || ((XdmNode)currentNode).getNodeKind() == XdmNodeKind.ATTRIBUTE)
return true;
}
return false;
}
static class Node {
String name;
Class type;
String description;
public Node(String name, Class type, String description) {
this.name = name;
this.type = type;
this.description = description;
}
}
static class Element extends Node {
List<Element> children;
List<Attribute> attributes;
public Element(String name, Class type, String description) {
super(name, type, description);
children = new ArrayList<>();
attributes = new ArrayList<>();
}
public Element getChildByName(String name) {
for (Element child : children) {
if (child.name.equals(name))
return child;
}
return null;
}
public void addNewItems(Element child) {
for (Attribute attrAdd : child.attributes) {
boolean haveIt = false;
for (Attribute attrExist : attributes)
if (attrExist.name.equals(attrAdd.name)) {
haveIt = true;
break;
}
if (!haveIt)
attributes.add(attrAdd);
}
for (Element elemAdd : child.children) {
Element exist = null;
for (Element elemExist : children)
if (elemExist.name.equals(elemAdd.name)) {
exist = elemExist;
break;
}
if (exist == null)
children.add(elemAdd);
else
exist.addNewItems(elemAdd);
}
}
}
static class Attribute extends Node {
public Attribute(String name, Class type, String description) {
super(name, type, description);
}
}
}
XmlDatasource.java
import com.saxonica.config.EnterpriseConfiguration;
import com.saxonica.ee.s9api.SchemaValidatorImpl;
import net.sf.saxon.Configuration;
import net.sf.saxon.lib.FeatureKeys;
import net.sf.saxon.s9api.*;
import net.sf.saxon.type.SchemaException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;
import javax.xml.transform.Source;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamSource;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
public class XmlDatasource {
/** the DOM all searches are against */
private XdmNode xmlRootNode;
private XPathCompiler xPathCompiler;
/** key == the prefix; value == the uri mapped to that prefix */
private HashMap<String, String> prefixToUriMap = new HashMap<>();
/** key == the uri mapped to that prefix; value == the prefix */
private HashMap<String, String> uriToPrefixMap = new HashMap<>();
public XmlDatasource (InputStream xmlData, InputStream schemaFile) throws SAXException, SchemaException, SaxonApiException, IOException {
boolean haveSchema = schemaFile != null;
// call this before any instantiation of Saxon classes.
Configuration config = createEnterpriseConfiguration();
if (haveSchema) {
Source schemaSource = new StreamSource(schemaFile);
config.addSchemaSource(schemaSource);
}
Processor processor = new Processor(config);
DocumentBuilder doc_builder = processor.newDocumentBuilder();
XMLReader reader = createXMLReader();
InputSource xmlSource = new InputSource(xmlData);
SAXSource saxSource = new SAXSource(reader, xmlSource);
if (haveSchema) {
SchemaValidator validator = new SchemaValidatorImpl(processor);
doc_builder.setSchemaValidator(validator);
}
xmlRootNode = doc_builder.build(saxSource);
xPathCompiler = processor.newXPathCompiler();
if (haveSchema)
xPathCompiler.setSchemaAware(true);
declareNameSpaces();
}
public XdmNode getXmlRootNode() {
return xmlRootNode;
}
public XPathCompiler getxPathCompiler() {
return xPathCompiler;
}
/**
* Create a XMLReader set to disallow XXE aattacks.
* #return a safe XMLReader.
*/
public static XMLReader createXMLReader() throws SAXException {
XMLReader reader = XMLReaderFactory.createXMLReader();
// stop XXE https://www.owasp.org/index.php/XML_External_Entity_(XXE)_Prevention_Cheat_Sheet#JAXP_DocumentBuilderFactory.2C_SAXParserFactory_and_DOM4J
reader.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
reader.setFeature("http://xml.org/sax/features/external-general-entities", false);
reader.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
return reader;
}
private void declareNameSpaces() throws SaxonApiException {
// saxon has some of their functions set up with this.
prefixToUriMap.put("saxon", "http://saxon.sf.net");
uriToPrefixMap.put("http://saxon.sf.net", "saxon");
XdmValue list = xPathCompiler.evaluate("//namespace::*", xmlRootNode);
if (list == null || list.size() == 0)
return;
for (int index=0; index<list.size(); index++) {
XdmNode node = (XdmNode) list.itemAt(index);
String prefix = node.getNodeName() == null ? "" : node.getNodeName().getLocalName();
// xml, xsd, & xsi are XML structure ones, not ones used in the XML
if (prefix.equals("xml") || prefix.equals("xsd") || prefix.equals("xsi"))
continue;
// use default prefix if prefix is empty.
if (prefix == null || prefix.isEmpty())
prefix = "def";
// this returns repeats, so if a repeat, go on to next.
if (prefixToUriMap.containsKey(prefix))
continue;
String uri = node.getStringValue();
if (uri != null && !uri.isEmpty()) {
xPathCompiler.declareNamespace(prefix, uri);
prefixToUriMap.put(prefix, uri);
uriToPrefixMap.put(uri, prefix); }
}
}
public static EnterpriseConfiguration createEnterpriseConfiguration()
{
EnterpriseConfiguration configuration = new EnterpriseConfiguration();
configuration.supplyLicenseKey(new BufferedReader(new java.io.StringReader(deobfuscate(key))));
configuration.setConfigurationProperty(FeatureKeys.SUPPRESS_XPATH_WARNINGS, Boolean.TRUE);
return configuration;
}
}
I found this description of implementing a Sentiment Analysis task with OpenNLP. In my case I am using the newest OPenNLP-version, i.e., version 1.8.0. In the following example, they use a Maximum Entropy Model. I am using the same input.txt (tweets.txt)
http://technobium.com/sentiment-analysis-using-opennlp-document-categorizer/
public class StartSentiment {
public static DoccatModel model = null;
public static String[] analyzedTexts = {"I hate Mondays!"/*, "Electricity outage, this is a nightmare"/*, "I love it"*/};
public static void main(String[] args) throws IOException {
// begin of sentiment analysis
trainModel();
for(int i=0; i<analyzedTexts.length;i++){
classifyNewText(analyzedTexts[i]);
}
}
private static String readFile(String pathname) throws IOException {
File file = new File(pathname);
StringBuilder fileContents = new StringBuilder((int)file.length());
Scanner scanner = new Scanner(file);
String lineSeparator = System.getProperty("line.separator");
try {
while(scanner.hasNextLine()) {
fileContents.append(scanner.nextLine() + lineSeparator);
}
return fileContents.toString();
} finally {
scanner.close();
}
}
public static void trainModel() {
MarkableFileInputStreamFactory dataIn = null;
try {
dataIn = new MarkableFileInputStreamFactory(
new File("bin/text.txt"));
ObjectStream<String> lineStream = null;
lineStream = new PlainTextByLineStream(dataIn, StandardCharsets.UTF_8);
ObjectStream<DocumentSample> sampleStream = new DocumentSampleStream(lineStream);
TrainingParameters tp = new TrainingParameters();
tp.put(TrainingParameters.CUTOFF_PARAM, "2");
tp.put(TrainingParameters.ITERATIONS_PARAM, "30");
DoccatFactory df = new DoccatFactory();
model = DocumentCategorizerME.train("en", sampleStream, tp, df);
} catch (Exception e) {
e.printStackTrace();
} finally {
if (dataIn != null) {
try {
} catch (Exception e2) {
e2.printStackTrace();
}
}
}
}
public static void classifyNewText(String text){
DocumentCategorizerME myCategorizer = new DocumentCategorizerME(model);
double[] outcomes = myCategorizer.categorize(new String[]{text});
String category = myCategorizer.getBestCategory(outcomes);
if (category.equalsIgnoreCase("1")){
System.out.print("The text is positive");
} else {
System.out.print("The text is negative");
}
}
}
In my case no matter what input String I am using, I am only getting a positive estimation of the input string. Any idea what could be the reason?
Thanks
I need to allow my content pipeline extension to use a pattern similar to a factory. I start with a dictionary type:
public delegate T Mapper<T>(MapFactory<T> mf, XElement d);
public class MapFactory<T>
{
Dictionary<string, Mapper<T>> map = new Dictionary<string, Mapper<T>>();
public void Add(string s, Mapper<T> m)
{
map.Add(s, m);
}
public T Get(XElement xe)
{
if (xe == null) throw new ArgumentNullException(
"Invalid document");
var key = xe.Name.ToString();
if (!map.ContainsKey(key)) throw new ArgumentException(
key + " is not a valid key.");
return map[key](this, xe);
}
public IEnumerable<T> GetAll(XElement xe)
{
if (xe == null) throw new ArgumentNullException(
"Invalid document");
foreach (var e in xe.Elements())
{
var val = e.Name.ToString();
if (map.ContainsKey(val))
yield return map[val](this, e);
}
}
}
Here is one type of object I want to store:
public partial class TestContent
{
// Test type
public string title;
// Once test if true
public bool once;
// Parameters
public Dictionary<string, object> args;
public TestContent()
{
title = string.Empty;
args = new Dictionary<string, object>();
}
public TestContent(XElement xe)
{
title = xe.Name.ToString();
args = new Dictionary<string, object>();
xe.ParseAttribute("once", once);
}
}
XElement.ParseAttribute is an extension method that works as one might expect. It returns a boolean that is true if successful.
The issue is that I have many different types of tests, each of which populates the object in a way unique to the specific test. The element name is the key to MapFactory's dictionary. This type of test, while atypical, illustrates my problem.
public class LogicTest : TestBase
{
string opkey;
List<TestBase> items;
public override bool Test(BehaviorArgs args)
{
if (items == null) return false;
if (items.Count == 0) return false;
bool result = items[0].Test(args);
for (int i = 1; i < items.Count; i++)
{
bool other = items[i].Test(args);
switch (opkey)
{
case "And":
result &= other;
if (!result) return false;
break;
case "Or":
result |= other;
if (result) return true;
break;
case "Xor":
result ^= other;
break;
case "Nand":
result = !(result & other);
break;
case "Nor":
result = !(result | other);
break;
default:
result = false;
break;
}
}
return result;
}
public static TestContent Build(MapFactory<TestContent> mf, XElement xe)
{
var result = new TestContent(xe);
string key = "Or";
xe.GetAttribute("op", key);
result.args.Add("key", key);
var names = mf.GetAll(xe).ToList();
if (names.Count() < 2) throw new ArgumentException(
"LogicTest requires at least two entries.");
result.args.Add("items", names);
return result;
}
}
My actual code is more involved as the factory has two dictionaries, one that turns an XElement into a content type to write and another used by the reader to create the actual game objects.
I need to build these factories in code because they map strings to delegates. I have a service that contains several of these factories. The mission is to make these factory classes available to a content processor. Neither the processor itself nor the context it uses as a parameter have any known hooks to attach an IServiceProvider or equivalent.
Any ideas?
I needed to create a data structure essentially on demand without access to the underlying classes as they came from a third party, in this case XNA Game Studio. There is only one way to do this I know of... statically.
public class TestMap : Dictionary<string, string>
{
private static readonly TestMap map = new TestMap();
private TestMap()
{
Add("Logic", "LogicProcessor");
Add("Sequence", "SequenceProcessor");
Add("Key", "KeyProcessor");
Add("KeyVector", "KeyVectorProcessor");
Add("Mouse", "MouseProcessor");
Add("Pad", "PadProcessor");
Add("PadVector", "PadVectorProcessor");
}
public static TestMap Map
{
get { return map; }
}
public IEnumerable<TestContent> Collect(XElement xe, ContentProcessorContext cpc)
{
foreach(var e in xe.Elements().Where(e => ContainsKey(e.Name.ToString())))
{
yield return cpc.Convert<XElement, TestContent>(
e, this[e.Name.ToString()]);
}
}
}
I took this a step further and created content processors for each type of TestBase:
/// <summary>
/// Turns an imported XElement into a TestContent used for a LogicTest
/// </summary>
[ContentProcessor(DisplayName = "LogicProcessor")]
public class LogicProcessor : ContentProcessor<XElement, TestContent>
{
public override TestContent Process(XElement input, ContentProcessorContext context)
{
var result = new TestContent(input);
string key = "Or";
input.GetAttribute("op", key);
result.args.Add("key", key);
var items = TestMap.Map.Collect(input, context);
if (items.Count() < 2) throw new ArgumentNullException(
"LogicProcessor requires at least two items.");
result.args.Add("items", items);
return result;
}
}
Any attempt to reference or access the class such as calling TestMap.Collect will generate the underlying static class if needed. I basically moved the code from LogicTest.Build to the processor. I also carry out any needed validation in the processor.
When I get to reading these classes I will have the ContentService to help.