Saxon. XPathSelector. XPathException: Cannot compare xs:untypedAtomic to xs:decimal

When I use saxon9-he (version 9.8.0.6) for XPath in an application on .NET Framework, I get the error "net.sf.saxon.trans.XPathException: Cannot compare xs:untypedAtomic to xs:decimal".
This occurs when XPathSelector evaluates an expression using the "Evaluate" method.
The expression itself looks like this: matches($var1/text(), '^[0-9]{1,2}.[0-9]{2}$') or ($var1 eq 100.0).
The variable "var1" is set in XPathSelector by the SetVariable method as an XdmNode.
Could you suggest a way to resolve this?

The "=" operator converts an xs:untypedAtomic operand to the type of the other operand. The "eq" operator does not.
The reason for this is to make "eq" transitive, so it works sensibly for indexing, grouping, etc.
So you should either do a manual conversion:
xs:decimal($var1) eq 100.0
or use the "=" operator:
$var1 = 100.0
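To make the difference concrete, here is a toy model in Python, not Saxon or XPath itself. The names Untyped, general_eq and value_eq are made up for this illustration, and the conversion rules are simplified to what the answer above describes. It shows why converting an untyped operand to "whatever the other side is" cannot be transitive, and why the explicit cast makes "eq" work.

```python
from decimal import Decimal


class Untyped(str):
    """Hypothetical stand-in for an xs:untypedAtomic value (text of an unvalidated node)."""


def general_eq(a, b):
    """Toy model of "=": an untyped operand is converted to the other operand's type."""
    if isinstance(a, Untyped) and not isinstance(b, Untyped):
        a = type(b)(a)
    elif isinstance(b, Untyped) and not isinstance(a, Untyped):
        b = type(a)(b)
    return a == b


def value_eq(a, b):
    """Toy model of "eq": an untyped operand is always treated as a string."""
    a = str(a) if isinstance(a, Untyped) else a
    b = str(b) if isinstance(b, Untyped) else b
    if type(a) is not type(b):
        raise TypeError(f"Cannot compare {type(a).__name__} to {type(b).__name__}")
    return a == b


var1 = Untyped("100")
hundred = Decimal("100.0")

# "=" adapts the untyped value to the decimal, so both of these hold ...
print(general_eq(var1, hundred))
print(general_eq(Untyped("100.0"), hundred))
# ... yet the two untyped values differ as text, so "=" is not transitive.
print(general_eq(var1, Untyped("100.0")))

# "eq" refuses to guess; the explicit cast (xs:decimal($var1) in XPath) fixes it.
print(value_eq(Decimal(var1), hundred))
```

Calling value_eq(var1, hundred) without the cast raises a TypeError in this model, which mirrors the XPathException in the question.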
By the way, you're welcome to ask Saxon questions either here or on the Saxon forums, but please don't ask the same question on both: it wastes everyone's time.

How to parse dot operator in language syntax?

Let's say I'm writing a parser that parses the following syntax:
foo.bar().baz = 5;
The grammar rules look something like this:
program: one or more statement
statement: expression followed by ";"
expression: one of:
- identifier (\w+)
- number (\d+)
- func call: expression "(" ")"
- dot operator: expression "." identifier
Two expressions have a problem: the func call and the dot operator. This is because these rules are left-recursive: each looks for another expression at its start, which causes a stack overflow. I will focus on the dot operator for this question.
We face a similar problem with the plus operator. However, rather than using an expression you would do something like this to solve it (look for a "term" instead):
add operation: term "+" term
term: one of:
- number (\d+)
- "(" expression ")"
The term then includes everything except the add operation itself. To ensure that multiple plus operators can be chained together without using parentheses, one would rather do:
add operation: term, one or more of ("+" followed by term)
I was thinking a similar solution could work for the dot operator or for function calls.
However, the dot operator works a little differently. We always evaluate from left to right and need to allow full expressions so that you can do function calls etc. in between. With parentheses, an example might be:
(foo.bar()).baz = 5;
Unfortunately, I do not want to require parentheses. This would end up being the case if I followed the method used for the plus operator.
How could I go about implementing this?
Currently my parser never peeks ahead, but even if I do look ahead, it still seems tricky to accomplish.
The easy solution would be to use a bottom-up parser which doesn't drop into a bottomless pit on left recursion, but I suppose you have already rejected that solution.
I don't understand your objection to using a looping construct, though. Postfix modifiers like field lookup and function call are not really different from binary operators like addition (except, of course, for the fact that they will not need to claim an eventual right operand). Plus and minus intermingle freely, which you can parse with a repetition like:
additive: term ( '+' term | '-' term )*
Similarly, postfix modifiers can be easily parsed with something like:
postfixed: atom ( '.' ID | '(' opt-expr-list ')' )*
I'm using a form of extended BNF: parentheses group; | separates alternatives and binds less tightly than concatenation; and * means "zero or more repetitions" of the atom on its left.
Another postfix operator which falls into the same category is array/map subscripting ('[' expr ']'), although you might also have other postfix operators.
Note that like the additive syntax above, selecting the appropriate alternative does not require looking beyond the next token. It's hard to parse without being able to peek one token into the future. Fortunately, that's very little overhead.
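The postfixed rule above can be sketched as a small recursive-descent parser in Python. This is only an illustration under assumptions: the Parser class, the tuple-shaped AST nodes and the tokenizer are invented for the example, and atoms are limited to identifiers, numbers and parenthesised expressions. The point is that the loop in postfixed consumes each postfix modifier in turn and wraps the node built so far, so there is no left recursion and only one token of lookahead.

```python
import re


def tokenize(text):
    """Split source text into numbers, identifiers and single-character punctuation."""
    return re.findall(r"\d+|[A-Za-z_]\w*|\S", text)


def is_identifier(token):
    return re.fullmatch(r"[A-Za-z_]\w*", token) is not None


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        """Return the next token without consuming it (one token of lookahead)."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        """Consume the next token, optionally checking that it is the expected one."""
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError(f"expected {expected!r}, got {token!r}")
        self.pos += 1
        return token

    def atom(self):
        token = self.take()
        if token == "(":
            inner = self.postfixed()
            self.take(")")
            return inner
        if token.isdigit():
            return ("num", int(token))
        if is_identifier(token):
            return ("id", token)
        raise SyntaxError(f"unexpected token {token!r}")

    def postfixed(self):
        """postfixed: atom ( '.' ID | '(' opt-expr-list ')' )*"""
        node = self.atom()
        while self.peek() in (".", "("):
            if self.take() == ".":
                name = self.take()
                if not is_identifier(name):
                    raise SyntaxError(f"expected identifier, got {name!r}")
                node = ("dot", node, name)       # wrap what we have so far
            else:
                args = []
                if self.peek() != ")":
                    args.append(self.postfixed())
                    while self.peek() == ",":
                        self.take(",")
                        args.append(self.postfixed())
                self.take(")")
                node = ("call", node, args)      # wrap what we have so far
        return node


print(Parser("foo.bar().baz").postfixed())
```

Because each iteration wraps the previous node, foo.bar().baz nests left to right exactly as (foo.bar()).baz would, without the parentheses being required.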
One way could be for the dot operator to parse a non-dot expression, that is, a rule that is the same as expression but without the dot operator. This prevents recursion.
Then, when the non-dot expression has been parsed, check whether a dot and an identifier follow. If not, we are done. If so, wrap the current node in a dot operation node. Then, keep track of the entire string text that has been parsed for this operation so far. Then revert everything back to before the operation was being parsed, and re-parse a "custom expression", where the first directly-nested expression tries to match the exact string that was parsed before rather than a real expression. Repeat until there are no more dot-identifier pairs (this should happen automatically through the new "custom expression").
This is messy, complicated and possibly slow, and I'm not entirely sure it will work, but I'll try it out. I'd appreciate alternative solutions.

Mathematica and Latex

I constantly use Mathematica and its TeXForm command to go back and forth between my calculations and the LaTeX document I'm typesetting. However, Mathematica won't allow me to define variables with an underscore, which I constantly need in my LaTeX document. Does anybody know how to create variables with "smarter" names in Mathematica?
In a broader sense, what is the best way to integrate the use of Mathematica and LaTeX?
Thanks.
First of all, Mathematica does allow you to define variables with a subscript, which TeXForm renders with an underscore:
Subscript[x, 1] = 3
The shortcut for this is [Ctrl]+[_]
If you convert a subscript variable with TeXForm, you'll get:
x_1
I prefer not to use the subscript notation for normal variables, because in this notation you cannot easily see whether a variable already has a value. So you might just write
x1
We now want to transform this kind of variable name to the subscript notation in TeXForm.
One way to do this is with string patterns.
1. Transform your expression to a String in TeXForm:
In[360]:= ToString[(-b+y1) ((b-y1)/(b-y2))^(-(w10/(x\[Gamma]1-\[Omega]2))), TeXForm]
Out[360]= (\text{y1}-b) \left(\frac{b-\text{y1}}{b-\text{y2}}\right)^{-\frac{\text{w10}}{\text{x$\gamma $1}-\text{$\omega $2}}}
2. Replace this specific string pattern with the subscript notation of LaTeX:
In[361]:= StringReplace[%, "\\text{"~~name_?LetterQ~~index_?DigitQ~~"}":> name<>"_"<>index]
Out[361]= (y_1-b) \left(\frac{b-y_1}{b-y_2}\right)^{-\frac{\text{w10}}{\text{x$\gamma $1}-\text{$\omega $2}}}
You might have noticed that this replacement only worked on variable names that consist of exactly one letter and one digit. Longer variable names are ignored. This is because the string pattern "_" stands for just one character; for a sequence of characters, use "__". We also have to make sure that we match the shortest possible sequence. To catch the longer variable names, we apply another string replacement:
In[362]:= StringReplace[%,
"\\text{"~~Shortest[name__]~~Shortest[index__?DigitQ]~~"}":> "\\text{"<>name<>"}_{"<>index<>"}"]
Out[362]= (y_1-b) \left(\frac{b-y_1}{b-y_2}\right)^{-\frac{\text{w}_{10}}{\text{x$\gamma $}_{1}-\text{$\omega $}_{2}}}
Now all variables appear to be in the correct LaTeX notation for subscripted variables. But some of the "\text{}"s and "{}"s are now redundant, because they contain only a single letter or digit.
To tidy up the LaTeX code, we can add further replacements:
In[371]:= StringReplace[%, "{" ~~ i_?DigitQ ~~ "}" :> i];
StringReplace[%, "\\text{" ~~ name_?LetterQ ~~ "}" :> name]
Out[372]= (y_1-b) \left(\frac{b-y_1}{b-y_2}\right)^{-\frac{w_{10}}{\text{x$\gamma $}_1-\text{$\omega $}_2}}
Now I think the TeX looks good enough, so we can define a function that does all the replacements in one step:
In[506]:=
ClearAll[myTeXForm]
SetAttributes[myTeXForm, HoldFirst]
myTeXForm[expr_] := Fold[StringReplace, ToString[HoldPattern[expr], TeXForm],
{"\\text{HoldPattern}\\left[" ~~ str__ ~~ "\\right]" ~~ EndOfString :> str,
"\\text{" ~~ Shortest[str__] ~~ Shortest[i__?DigitQ] ~~ "}" :>
"\\text{" <> str <> "}_{" <> i <> "}",
{"{" ~~ i_?DigitQ ~~ "}" :> i, "\\text{" ~~ s_?LetterQ ~~ "}" :> s}}]
Testing the function:
b=134;
myTeXForm[(-b+y1) ((b-y1)/(b-y2))^(-(w10/(x\[Gamma]13-\[Omega]2)))]
Out[510]= (y_1-b) \left(\frac{b-y_1}{b-y_2}\right)^{-\frac{w_{10}}{\text{x$\gamma $}_{13}-\text{$\omega $}_2}}
Note that I used a little trick to protect the function against its argument's values. In this example the variable b already has the value 134, but in the TeX output it should still appear as "b". To achieve this, I added the attribute HoldFirst to our function and used HoldPattern inside. There may be an easier way, but it works fine.
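If the TeX string is post-processed outside Mathematica, the same three replacement steps can be sketched with Python's re module. This is an illustration and not part of the Mathematica solution: the function name subscript_names is made up, and, as a deliberate difference from the replacements above, the brace-removal step only touches subscripts so that other single-digit groups such as \frac{1}{2} are left alone.

```python
import re


def subscript_names(tex: str) -> str:
    """Rewrite \\text{name123}-style variables into LaTeX subscript notation."""
    # Step 1: split trailing digits off into a subscript: \text{w10} -> \text{w}_{10}.
    # The lazy group plays the role of Shortest[...] in the Mathematica pattern.
    tex = re.sub(
        r"\\text\{([^{}]+?)(\d+)\}",
        lambda m: "\\text{" + m.group(1) + "}_{" + m.group(2) + "}",
        tex,
    )
    # Step 2: drop the braces around a single-digit subscript: _{1} -> _1.
    tex = re.sub(r"_\{(\d)\}", r"_\1", tex)
    # Step 3: drop \text{} around a single letter: \text{y} -> y.
    tex = re.sub(r"\\text\{([A-Za-z])\}", r"\1", tex)
    return tex


# The TeXForm string from Out[360] above.
source = (
    r"(\text{y1}-b) \left(\frac{b-\text{y1}}{b-\text{y2}}\right)"
    r"^{-\frac{\text{w10}}{\text{x$\gamma $1}-\text{$\omega $2}}}"
)
print(subscript_names(source))
```

Applied to the Out[360] string, this produces the same text as Out[372].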
Hope this might inspire you.
Best regards.

Why does Erlang not support expressions like mysum(32)(1)?

I am new to Erlang and have a small question about applying functions.
Assume a function is defined as:
mysum(X) -> fun(Y)-> X + Y end.
Then I try calling it like this:
mysum(32)(332)
and get the error
* 1: syntax error before: '('
So I had to use
apply(mysum(32),[333])
or
M = mysum(32), M(333)
But I would like to know a little more: why is this not supported, and what would the disadvantage be?
As you expected, mysum returns a function. You must enclose the evaluation in parentheses to satisfy the Erlang parser:
(mysum(32))(332)
This spelling is obviously not ambiguous.
Your expression does not seem ambiguous to you because you know that mysum(32) is a function. But types are resolved at run time in Erlang, so the parser has no idea what mysum(32) is. It expects some help to know what to do: the parentheses, apply, or an intermediate variable. Without that help, what follows could just as well be an operator or a separator.

string format check

Suppose I have string variables like following:
s1="10$"
s2="10$ I am a student"
s3="10$Good"
s4="10$ Nice weekend!"
As you can see above, s2 and s4 have white space after 10$.
In general, I would like a way to check whether a string starts with 10$ and has white space after the 10$. For example, the rule should find s2 and s4 in the case above. How can I define such a rule?
What I mean is something like s2.RULE? that returns true or false to tell whether the string matches.
---------- update -------------------
Please also give the solution for the case where 10# is used instead of 10$.
You can do this using Regular Expressions (Ruby has Perl-style regular expressions, to be exact).
# For ease of demonstration, I've moved your strings into an array
strings = [
"10$",
"10$ I am a student",
"10$Good",
"10$ Nice weekend!"
]
p strings.find_all { |s| s =~ /\A10\$[ \t]+/ }
The regular expression breaks down like this:
The / at the beginning and the end tell Ruby that everything in between is part of the regular expression
\A matches the beginning of a string
The 10 is matched verbatim
\$ means to match a $ verbatim. We need to escape it since $ has a special meaning in regular expressions.
[ \t]+ means "match at least one blank and/or tab"
So this regular expression says "Match every string that starts with 10$ followed by at least one blank or tab character". Using =~ you can test strings in Ruby against this expression. On a match, =~ returns a non-nil value, which evaluates to true if used in a conditional like if.
Edit: Updated white space matching as per Asmageddon's suggestion.
This works:
"10$ " =~ /^10\$ +/
It returns nil when there is no match and 0 (the position of the match) when there is one. Since only nil and false are falsy in Ruby, you can use the result directly in a condition.
Use a regular expression like this one:
/\A10\$\s+/
EDIT
If you use =~ for matching, note that
The =~ operator returns the character position in the string of the
start of the match
So it might return 0 to denote a match. Only a return of nil means no match.
See for example http://www.regular-expressions.info/ruby.html for a regular expression tutorial for Ruby.
If you want to proceed to cases with $ and # then try this regular expression:
/^10[\$#] +/
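For comparison, the same check can be sketched in Python (not Ruby), combining the ideas from the answers above: \A anchors at the start of the string, [$#] accepts either marker, and [ \t]+ requires at least one blank or tab. The names PREFIX_WITH_SPACE and starts_with_marker_and_space are made up for this example, and the "10#" test string is an added assumption to cover the update.

```python
import re

# \A: start of string; [$#]: a literal $ or #; [ \t]+: one or more blanks or tabs.
PREFIX_WITH_SPACE = re.compile(r"\A10[$#][ \t]+")


def starts_with_marker_and_space(s: str) -> bool:
    """Return True if s starts with 10$ or 10# followed by white space."""
    return PREFIX_WITH_SPACE.match(s) is not None


strings = [
    "10$",
    "10$ I am a student",
    "10$Good",
    "10$ Nice weekend!",
    "10# also fine",  # hypothetical string for the 10# case
]
matching = [s for s in strings if starts_with_marker_and_space(s)]
print(matching)
```

Returning an explicit boolean avoids the nil-versus-0 subtlety that =~ has in Ruby.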

REBOL path operator vs division ambiguity

I've started looking into REBOL, just for fun, and as a fan of programming languages, I really like seeing new ideas and even just alternative syntaxes. REBOL is definitely full of these. One thing I noticed is the use of '/' as the path operator which can be used similarly to the '.' operator in most object-oriented programming languages. I have not programmed in REBOL extensively, just looked at some examples and read some documentation, but it isn't clear to me why there's no ambiguity with the '/' operator.
x: 4
y: 2
result: x/y
In my example, this should be division, but it seems like it could just as easily be the path operator if x were an object or function refinement. How does REBOL handle the ambiguity? Is it just a matter of an overloaded operator and the type system so it doesn't know until runtime? Or is it something I'm missing in the grammar and there really is a difference?
UPDATE Found a good piece of example code:
sp: to-integer (100 * 2 * length? buf) / d/3 / 1024 / 1024
It appears that arithmetic division requires whitespace, while the path operator requires no whitespace. Is that it?
This question deserves an answer from the syntactic point of view. In Rebol, there is no "path operator", in fact. The x/y is a syntactic element called path. As opposed to that the standalone / (delimited by spaces) is not a path, it is a word (which is usually interpreted as the division operator). In Rebol you can examine syntactic elements like this:
length? code: [x/y x / y] ; == 4
type? first code ; == path!
type? second code ; == word!
and so on.
The code guide says:
White-space is used in general for delimiting (for separating symbols).
This is especially important because words may contain characters such as + and -.
http://www.rebol.com/r3/docs/guide/code-syntax.html
One acquired skill of being a REBOler is to get the hang of inserting whitespace in expressions where other languages usually do not require it :)
Spaces are generally needed in Rebol, but there are exceptions here and there for "special" characters, such as those delimiting series. For instance:
[a b c] is the same as [ a b c ]
(a b c) is the same as ( a b c )
[a b c]def is the same as [a b c] def
Some fairly powerful tools for doing introspection of syntactic elements are type?, quote, and probe. The quote operator prevents the interpreter from giving behavior to things. So if you tried something like:
>> data: [x [y 10]]
>> type? data/x/y
>> probe data/x/y
The "live" nature of the code would dig through the path and give you an integer! of value 10. But if you use quote:
>> data: [x [y 10]]
>> type? quote data/x/y
>> probe quote data/x/y
Then you wind up with a path! whose value is simply data/x/y, it never gets evaluated.
In the internal representation, a PATH! is quite similar to a BLOCK! or a PAREN!. It just has this special distinctive lexical type, which allows it to be treated differently. Although you've noticed that it can behave like a "dot" by picking members out of an object or series, that is only how it is used by the DO dialect. You could invent your own ideas, let's say you make the "russell" command:
russell [
x: 10
y: 20
z: 30
x/y/z
(
print x
print y
print z
)
]
Imagine that in my fanciful example, this outputs 30, 10, 20...because what the russell function does is evaluate its block in such a way that a path is treated as an instruction to shift values. So x/y/z means x=>y, y=>z, and z=>x. Then any code in parentheses is run in the DO dialect. Assignments are treated normally.
When you want to make up a fun new riff on how to express yourself, Rebol takes care of a lot of the grunt work. So for example the parentheses are guaranteed to have matched up to get a paren!. You don't have to go looking for all that yourself, you just build your dialect up from the building blocks of all those different types...and hook into existing behaviors (such as the DO dialect for basics like math and general computation, and the mind-bending PARSE dialect for some rather amazing pattern matching muscle).
But speaking of "all those different types", there's yet another weirdo situation for slash that can create another type:
>> type? quote /foo
This is called a refinement!, and happens when you start a lexical element with a slash. You'll see it used in the DO dialect to call out optional parameter sets to a function. But once again, it's just another symbolic LEGO in the parts box. You can ascribe meaning to it in your own dialects that is completely different...
While I didn't find any definitive written clarification, I did find that +, -, * and others are valid characters in a word, so an operator clearly requires surrounding spaces.
x*y
is a valid identifier, while
x * y
performs multiplication. It looks like the path operator is just another case of this.
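The whitespace rule described in these answers can be illustrated with a toy classifier in Python. This is not Rebol's actual lexer: the functions classify and lex are invented for the example, and they ignore blocks, strings, set-words and every other datatype. They only show that once tokens are delimited by white space, the position of the slash decides the type.

```python
def classify(token: str) -> str:
    """Classify one whitespace-delimited token (toy model, not Rebol's real lexer)."""
    if token.strip("/") == "":
        return "word!"          # a lone slash is a word, usually bound to division
    if token.startswith("/"):
        return "refinement!"    # /foo
    if "/" in token:
        return "path!"          # x/y, data/x/y
    if token.lstrip("-").isdigit():
        return "integer!"
    return "word!"              # includes names such as x*y


def lex(source: str):
    """Split on white space and pair each token with its toy type."""
    return [(token, classify(token)) for token in source.split()]


print(lex("x/y x / y"))
print(lex("x*y /foo d/3 1024"))
```

The first call yields four tokens, matching length? code in the answer above: one path! followed by three words.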
