Is net.sf.saxon.s9api.XsltTransformer designed for one time use? - saxon

I don't think I understand the XsltTransformer class well enough to explain why method f1 performs better than f2. f1 finishes in about 40 seconds, consuming between 750 MB and 1 GB of memory. I expected f2 to be the better solution, but it never finishes for the same long list of input files. By the time I kill it, it has processed only about 1000 input files while consuming over 4 GB of memory.
import java.io.*;
import javax.xml.transform.stream.StreamSource;
import net.sf.saxon.s9api.*;
public class foreachfile {
private static long f1 (Processor p, XsltExecutable e, Serializer ser, String args[]) {
long maxTotalMemory = 0;
Runtime rt = Runtime.getRuntime();
for (int i=1; i<args.length; i++) {
String xmlfile = args[i];
try {
XsltTransformer t = e.load();
t.setDestination(ser);
t.setInitialContextNode(p.newDocumentBuilder().build(new StreamSource(new File(xmlfile))));
t.transform();
long tm = rt.totalMemory();
if (tm > maxTotalMemory)
maxTotalMemory = tm;
} catch (Throwable ex) {
System.err.println(ex);
}
}
return maxTotalMemory;
}
private static long f2 (Processor p, XsltExecutable e, Serializer ser, String args[]) {
long maxTotalMemory = 0;
Runtime rt = Runtime.getRuntime();
XsltTransformer t = e.load();
t.setDestination(ser);
for (int i=1; i<args.length; i++) {
String xmlfile = args[i];
try {
t.setInitialContextNode(p.newDocumentBuilder().build(new StreamSource(new File(xmlfile))));
t.transform();
long tm = rt.totalMemory();
if (tm > maxTotalMemory)
maxTotalMemory = tm;
} catch (Throwable ex) {
System.err.println(ex);
}
}
return maxTotalMemory;
}
public static void main (String args[]) throws SaxonApiException, Exception {
String usecase = System.getProperty("xslt.usecase");
int uc = Integer.parseInt(usecase);
String xslfile = args[0];
Processor p = new Processor(true);
XsltCompiler c = p.newXsltCompiler();
XsltExecutable e = c.compile(new StreamSource(new File(xslfile)));
Serializer ser = new Serializer();
ser.setOutputStream(System.out);
long maxTotalMemory = uc == 1 ? f1(p, e, ser, args) : f2(p, e, ser, args);
System.err.println(String.format("Max total memory was %d", maxTotalMemory));
}
}

I normally recommend using a new XsltTransformer for each transformation. However, the class is serially reusable (you can perform multiple transformations one after another, but not concurrently). The XsltTransformer keeps certain resources in memory, in case they are needed again: notably, all documents read using the doc() or document() functions. This can be useful, for example, if you want to transform one set of input documents to five different output formats as part of your publishing workflow. But if this reuse of resources doesn't give you any benefits, it merely imposes a cost in memory use, which you can avoid by creating a new transformer each time. The same applies if you use the JAXP interface.
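For the JAXP interface mentioned above, a minimal sketch of the "compile once, new transformer per input" pattern might look like the following. It uses only the standard javax.xml.transform API; the class name, the method name and the tiny stylesheet are made up for illustration. With the s9api interface the equivalent is calling XsltExecutable.load() inside the loop, as f1 does.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class FreshTransformerPerInput {
    // Compiles the stylesheet once, then creates a new Transformer for every
    // input so that nothing cached by one run is carried over to the next.
    static String[] transformAll(String xslt, String... inputs) {
        try {
            Templates templates = TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(xslt)));
            String[] results = new String[inputs.length];
            for (int i = 0; i < inputs.length; i++) {
                Transformer t = templates.newTransformer(); // fresh per input
                StringWriter out = new StringWriter();
                t.transform(new StreamSource(new StringReader(inputs[i])),
                        new StreamResult(out));
                results[i] = out.toString();
            }
            return results;
        } catch (TransformerException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        // Hypothetical stylesheet: prints the id attribute of the root element.
        String xslt =
                "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                + "<xsl:output method='text'/>"
                + "<xsl:template match='/'><xsl:value-of select='/doc/@id'/></xsl:template>"
                + "</xsl:stylesheet>";
        for (String r : transformAll(xslt, "<doc id='a'/>", "<doc id='b'/>")) {
            System.out.println(r);
        }
    }
}
```

The Templates object is the expensive part to build and is safe to share; the Transformer is cheap to create, so discarding it after each input costs little.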

Related

Fibonacci sequence, public static void xxx

I'm a complete beginner and need help with the Fibonacci sequence. The problem is that I need to ask the user for a number and then print the Fibonacci number at that position. Is a "public static void xxx" method with a loop what I need to use?
I hope someone understands my bad English and can help me with my problem.
I assume you need it in Java:
import java.io.*;
public class Fibonacci {
// your method: public static void xxx
public static void fib() throws IOException
{
// read n from the user
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
int n = Integer.parseInt(br.readLine().trim());
// compute the nth Fibonacci number (F(0) = 0, F(1) = 1): your loop
int f1 = 0, f2 = 1;
if (n == 0)
{
System.out.println(f1);
return; // otherwise f2 would be printed as well
}
for (int i = 2; i <= n; i++)
{
int fi = f1 + f2;
f1 = f2;
f2 = fi;
}
// print your answer
System.out.println(f2);
}
public static void main(String[] args) throws IOException
{
// call fib method
fib();
}
}
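If it helps, here is a small sketch that keeps the loop separate from the keyboard input, so the numbers can be checked without typing anything. It assumes the sequence starts at F(0) = 0 and F(1) = 1; the class and method names are only illustrative.

```java
public class FibonacciLoop {
    // Returns F(n) with F(0) = 0 and F(1) = 1, using a two-variable loop.
    static long fib(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        long prev = 0, curr = 1;
        if (n == 0) {
            return prev;
        }
        for (int i = 2; i <= n; i++) {
            long next = prev + curr;
            prev = curr;
            curr = next;
        }
        return curr;
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 10; n++) {
            System.out.println(n + " -> " + fib(n));
        }
    }
}
```

The method returns long rather than int because the values grow quickly and overflow an int at about n = 47.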

Custom Batch filter in weka

I am trying to build a custom batch filter that extends SimpleBatchFilter. However, I run into a problem when I run it a second time to get an inverted output. Here is the error I get after both runs have completed:
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 79, Size: 79
at java.util.ArrayList.rangeCheck(ArrayList.java:653)
at java.util.ArrayList.get(ArrayList.java:429)
at weka.core.Attribute.addStringValue(Attribute.java:994)
at weka.core.StringLocator.copyStringValues(StringLocator.java:155)
at weka.core.StringLocator.copyStringValues(StringLocator.java:91)
at weka.filters.Filter.copyValues(Filter.java:373)
at weka.filters.Filter.push(Filter.java:290)
at weka.filters.SimpleBatchFilter.batchFinished(SimpleBatchFilter.java:266)
at weka.filters.Filter.useFilter(Filter.java:667)
at likeability.Main.main(Main.java:30)
And here is the relevant code:
public class TestFilter extends SimpleBatchFilter {
private Attribute a;
private Attribute b;
private int sampleSizePercent = 15;
private boolean invert = false;
private int seed = 1;
@Override
protected Instances process(Instances inst) throws Exception {
ArrayList<Instances> partitionsA = partition(inst, a);
ArrayList<Instances> partitions = new ArrayList<Instances>();
for(Instances data: partitionsA) {
partitions.addAll(partition(data, b));
}
return getTestSet(partitions);
}
/*
* Partitions the data so that there's only one nominal value of the
* attribute a in one partition.
*/
private ArrayList<Instances> partition(Instances data, Attribute att) throws Exception {
ArrayList<Instances> instances = new ArrayList<Instances>();
for (int i = 0; i < att.numValues(); i++){
RemoveWithValues rm = new RemoveWithValues();
rm.setAttributeIndex(Integer.toString(att.index()+1));
rm.setInvertSelection(true);
rm.setNominalIndices(Integer.toString(i+1));
rm.setInputFormat(data);
instances.add(Filter.useFilter(data, rm));
}
return instances;
}
private Instances getTestSet(List<Instances> insts) throws Exception {
Instances output = new Instances(insts.get(0), 0);
for(Instances inst: insts) {
Resample filter = new Resample();
filter.setRandomSeed(seed);
filter.setNoReplacement(true);
filter.setInvertSelection(invert);
filter.setSampleSizePercent(sampleSizePercent);
filter.setInputFormat(inst);
Instances curr = Filter.useFilter(inst, filter);
System.out.println(inst.size() + " " + curr.size());
output.addAll(curr);
}
return output;
}
@Override
protected Instances determineOutputFormat(Instances arg) throws Exception {
return new Instances(arg, 0);
}
@Override
public String globalInfo() {
return "A filter which partitions the data so that each partition contains"
+ " only instances with one value of attribute a and b, then takes "
+ "a random subset of values from each partition and merges them to"
+ " produce the final set.";
}
public Capabilities getCapabilities() {
Capabilities result = super.getCapabilities();
result.enableAllAttributes();
result.enableAllClasses();
result.enable(Capability.NO_CLASS); // filter doesn't need class to be set
return result;
}
//Main and getters and setters
}
And this is how I call it:
TestFilter filter = new TestFilter();
filter.setA(data.attribute("gender"));
filter.setB(data.attribute("age"));
filter.setInputFormat(data);
Instances test = Filter.useFilter(data, filter);
filter.setInvert(true);
filter.setInputFormat(data);
Instances train = Filter.useFilter(data, filter);
It seems quite stupid to me that I need those two lines between the calls. I suspect I should use isBatchFinished(); does that mean I have to extend BatchFilter rather than SimpleBatchFilter? It would also be helpful to see some successful implementations, since the only ones I could find were the ones in the WEKA manual.
I solved it by extending Filter instead and moving the logic from process() into batchFinished(). I am posting this answer because I have not found a custom filter example anywhere else.
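@Override
public boolean batchFinished() throws Exception {
if (isFirstBatchDone()) {
invert = true;
}
if (getInputFormat() == null)
throw new NullPointerException("No input instance format defined");
Instances inst = getInputFormat();
ArrayList<Instances> partitionsA = partition(inst, a);
ArrayList<Instances> partitions = new ArrayList<Instances>();
for (Instances data : partitionsA) {
partitions.addAll(partition(data, b));
}
// The original post omits the rest of this method. The lines below are an
// assumed completion that follows the usual pattern in weka.filters.Filter.
getTestSet(partitions);
flushInput();
m_NewBatch = true;
m_FirstBatchDone = true;
return numPendingOutput() != 0;
}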
private void getTestSet(List<Instances> insts) throws Exception {
for(Instances inst: insts) {
Resample filter = new Resample();
filter.setRandomSeed(seed);
filter.setNoReplacement(true);
filter.setInvertSelection(invert);
filter.setSampleSizePercent(sampleSizePercent);
filter.setInputFormat(inst);
Instances curr = Filter.useFilter(inst, filter);
System.out.println(inst.size() + " " + curr.size());
curr.forEach((i) -> push(i));
}
}
@Override
public boolean setInputFormat(Instances arg) throws Exception {
super.setInputFormat(arg);
Instances outputFormat = new Instances(arg, 0);
setOutputFormat(outputFormat);
return true;
}

What is an example of proper usage of the libSVM library functions?

I need to see some example code in Java so that I can figure out how the various methods defined in the library work, and how to pass the necessary parameters.
Some of them are:
svm_predict
svm_node
svm_problem etc.
I have done a lot of googling and still haven't found anything substantial, and the Java documentation is another major disappointment. Please help me out!
Here is some code that I have written so far.
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import libsvm.*;
import libsvm.svm_node;
import java.io.IOException;
import java.io.InputStream;
public class trial {
public static void main(String[] args) throws IOException {
svm temp = new svm();
svm_model model;
model = svm.svm_load_model("C:\\Users\\sidharth\\Desktop\\libsvm-3.18\\windows\\svm- m.model");
svm_problem prob = new svm_problem();
prob.l = trial.countLines("C:\\Users\\sidharth\\Desktop\\libsvm-3.18\\windows\\svm-ml.test");
prob.y = new double[prob.l];
int i;
for(i=0;i<prob.l;i++)
{
prob.y[i]=0.0;
}
prob.x = new svm_node[prob.l][];
temp.svm_predict(model, /*what to put here*/);
}
public static int countLines(String filename) throws IOException {
InputStream is = new BufferedInputStream(new FileInputStream(filename));
try {
byte[] c = new byte[1024];
int count = 0;
int readChars = 0;
boolean empty = true;
while ((readChars = is.read(c)) != -1) {
empty = false;
for (int i = 0; i < readChars; ++i) {
if (c[i] == '\n') {
++count;
}
}
}
return (count == 0 && !empty) ? 1 : count;
} finally {
is.close();
}
}
}
I already have a model file created and I want to predict sample data using this model file. I have given prob.y[] a label of 0.
Any example code that you have written would be a great help.
P.S. I am supposed to make an SVM-based POS tagger, which is why I have tagged this nlp.
It's pretty straightforward.
For training you could write something of the form:
svm_train training = new svm_train();
String[] options = new String[7];
options [0] = "-c";
options [1] = "1";
options [2] = "-t";
options [3] = "0"; //linear kernel
options [4] = "-v";
options [5] = "10"; //10 fold cross-validation
options [6] = your_training_filename;
training.run(options);
If you choose to save the model, you can then retrieve it with:
libsvm.svm_model model = training.getModel();
If you wish to test the model on test data, you could write something of the form:
BufferedReader input = new BufferedReader(new FileReader(test_file));
DataOutputStream output = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(prediction_output_file)));
svm_predict.predict(input, output, model, 0);
Hope this helps !
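As for what to pass to svm.svm_predict(model, x): the second argument is an svm_node[] holding the features of one sample. A rough sketch of turning one line of a LIBSVM-format file ("label index:value index:value ...") into such an array is below. To keep it runnable without the libsvm jar, it uses a stand-in Feature class with the same two fields as svm_node (index and value); the class and method names are made up for illustration.

```java
import java.util.StringTokenizer;

public class LibsvmLine {
    /** Stand-in for libsvm.svm_node, which has the same two public fields. */
    static final class Feature {
        final int index;
        final double value;

        Feature(int index, double value) {
            this.index = index;
            this.value = value;
        }
    }

    /** Parses the "index:value" pairs of one line; the leading label is skipped. */
    static Feature[] parseFeatures(String line) {
        StringTokenizer st = new StringTokenizer(line, " \t\n\r\f:");
        st.nextToken(); // label, not needed for prediction
        int count = st.countTokens() / 2;
        Feature[] x = new Feature[count];
        for (int j = 0; j < count; j++) {
            int index = Integer.parseInt(st.nextToken());
            double value = Double.parseDouble(st.nextToken());
            x[j] = new Feature(index, value);
        }
        return x;
    }

    public static void main(String[] args) {
        Feature[] x = parseFeatures("0 1:0.5 3:-1.25");
        for (Feature f : x) {
            System.out.println(f.index + " -> " + f.value);
        }
        // With libsvm on the classpath, each Feature would be copied into an
        // svm_node (node.index, node.value), and the resulting array passed to
        // svm.svm_predict(model, nodes), which returns the predicted label.
    }
}
```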

Using scanner to read phrases

Hey StackOverflow Community,
So, I have this line of information from a txt file that I need to parse.
Here are some example lines:
-> date & time AC Power Insolation Temperature Wind Speed
-> mm/dd/yyyy hh:mm.ss kw W/m^2 deg F mph
Using a scanner.nextLine() gives me a String with a whole line in it, and then I pass this off into StringTokenizer, which then separates them into individual Strings using whitespace as a separator.
So the first line would be broken up into:
date
&
time
AC
Power
Insolation
etc...
I need things like "date & time" together, and "AC Power" together. Is there any way I can specify this using a method already defined in StringTokenizer or Scanner? Or would I have to develop my own algorithm to do this?
Would you suggest I use some other way of parsing lines instead of Scanner? Or is Scanner sufficient for my needs?
ejay
This one was tricky. You could build a trie from your multi-word tokens; I was bored and wrote a little class that solves your problem. Warning: it's a bit hacky, but it was fun to implement.
The Trie class:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
class Trie extends HashMap<String, Trie> {
private static final long serialVersionUID = 1L;
boolean end = false;
public void addToken(String strings) {
addToken(strings.split("\\s+"), 0);
}
private void addToken(String[] strings, int begin) {
if (begin == strings.length) {
end = true;
return;
}
String key = strings[begin];
Trie t = get(key);
if (t == null) {
t = new Trie();
put(key, t);
}
t.addToken(strings, begin + 1);
}
public List<String> tokenize(String data) {
String[] split = data.split("\\s+");
List<String> tokens = new ArrayList<String>();
int pos = 0;
while (pos < split.length) {
int tokenLength = getToken(split, pos, 0);
tokens.add(glue(split, pos, tokenLength));
pos += tokenLength;
}
return tokens;
}
public String glue(String[] parts, int pos, int length) {
StringBuilder sb = new StringBuilder();
sb.append(parts[pos]);
for (int i = pos + 1; i < pos + length; i++) {
sb.append(" ");
sb.append(parts[i]);
}
return sb.toString();
}
private int getToken(String[] tokens, int begin, int length) {
if (end) {
return length;
}
if (begin == tokens.length) {
return 1;
}
String key = tokens[begin];
Trie t = get(key);
if (t != null) {
return t.getToken(tokens, begin + 1, length + 1);
}
return 1;
}
}
And how to use it:
Trie t = new Trie();
t.addToken("AC Power");
t.addToken("date & time");
t.addToken("date & foo");
t.addToken("Speed & fun");
String data = "date & time AC Power Insolation Temperature Wind Speed";
List<String> tokens = t.tokenize(data);
for (String s : tokens) {
System.out.println(s);
}
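If the columns in the file happen to be separated by tabs or by runs of two or more spaces (an assumption; the sample lines in the question do not show the exact whitespace), Scanner on its own may be enough, because useDelimiter lets you split on that separator instead of on every single space. A minimal sketch, with a made-up class name and sample line:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ColumnSplit {
    // Splits one header line into column names, assuming columns are
    // separated by one or more tabs or by two or more spaces.
    static List<String> columns(String line) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(line).useDelimiter("\\t+| {2,}")) {
            while (sc.hasNext()) {
                String token = sc.next().trim();
                if (!token.isEmpty()) {
                    out.add(token);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical tab-separated version of the header line.
        String header = "date & time\tAC Power\tInsolation\tTemperature\tWind Speed";
        for (String column : columns(header)) {
            System.out.println(column);
        }
    }
}
```

If the file really uses single spaces everywhere, this will not work and something like the Trie above, which knows the multi-word names in advance, is needed.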

HLSL: Enforce Constant Register Limit at Compile Time

In HLSL, is there any way to limit the number of constant registers that the compiler uses?
Specifically, if I have something like:
float4 foobar[300];
In a vs_2_0 vertex shader, the compiler will merrily generate the effect with more than 256 constant registers. But a 2.0 vertex shader is only guaranteed to have access to 256 constant registers, so when I try to use the effect, it fails in an obscure and GPU-dependent way at runtime. I would much rather have it fail at compile time.
This problem is especially annoying as the compiler itself allocates constant registers behind the scenes, on top of the ones I am asking for. I have to check the assembly to see if I'm over the limit.
Ideally I'd like to do this in HLSL (I'm using the XNA content pipeline), but if there's a flag that can be passed to the compiler that would also be interesting.
Based on Stringer Bell's pointing out of the Disassemble method, I have whipped up a small post-build utility to parse and check the effect. Be warned that this is not very pretty. It is designed for XNA 3.1 and requires the ServiceContainer and GraphicsDeviceService classes from the XNA WinForms sample. Pass a content directory path on the command line with no trailing slash.
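using System;
using System.IO;
using System.Text.RegularExpressions;
using System.Windows.Forms;
using Microsoft.Xna.Framework.Content;
using Microsoft.Xna.Framework.Graphics;
class Program
{
const int maxRegisters = 256; // Suitable for VS 2.0, not much else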
static int retval = 0;
static string root;
static ContentManager content;
static void CheckFile(string path)
{
string name = path.Substring(root.Length+1, path.Length - (root.Length+1) - @".xnb".Length);
Effect effect;
try { effect = content.Load<Effect>(name); }
catch { return; } // probably not an Effect
string effectSource = effect.Disassemble(false);
int highest = -1; // highest register allocated
var matches = Regex.Matches(effectSource, @" c([0-9]+)"); // quick and dirty
foreach(Match match in matches)
{
int register = Int32.Parse(match.Groups[1].ToString());
if(register > highest)
highest = register;
}
var parameters = Regex.Matches(effectSource, @"^ *// *[a-zA-Z_0-9]+ +c([0-9]+) +([0-9]+)", RegexOptions.Multiline);
foreach(Match match in parameters)
{
int register = Int32.Parse(match.Groups[1].ToString()) + Int32.Parse(match.Groups[2].ToString()) - 1;
if(register > highest)
highest = register;
}
if(highest+1 > maxRegisters)
{
Console.WriteLine("Error: Shader \"" + name + "\" uses " + (highest+1).ToString() + " constant registers, which is TOO MANY!");
retval = 1;
}
else
{
Console.WriteLine("Shader \"" + name + "\" uses " + (highest+1).ToString() + " constant registers (OK)");
}
}
static void CheckDirectory(string path)
{
try
{
foreach(string file in Directory.GetFiles(path, @"*.xnb"))
CheckFile(file);
foreach(string dir in Directory.GetDirectories(path))
CheckDirectory(dir);
}
catch { return; } // Don't care
}
static int Main(string[] args)
{
root = args[0];
Form form = new Form(); // Dummy form for creating a graphics device
GraphicsDeviceService gds = GraphicsDeviceService.AddRef(form.Handle,
form.ClientSize.Width, form.ClientSize.Height);
ServiceContainer services = new ServiceContainer();
services.AddService<IGraphicsDeviceService>(gds);
content = new ContentManager(services, root);
CheckDirectory(root);
return retval;
}
}

Resources