Mathematica and LaTeX

I constantly use Mathematica and its TeXForm command to go back and forth between my calculations and the LaTeX document I'm typesetting. However, Mathematica won't let me define variables with an underscore in their names, which I constantly need in my LaTeX document. Does anybody know how to create variables with "smarter" names in Mathematica?
In a broader sense, what is the best way to integrate the use of Mathematica and LaTeX?
Thanks.

First of all, Mathematica does let you define subscripted variables:
Subscript[x, 1] = 3
The keyboard shortcut for this is [Ctrl]+[_].
If you convert a subscript variable with TeXForm, you'll get:
x_1
I prefer not to use the subscript notation for normal variables, because in this notation you cannot easily see whether a variable already has a value. So you might just write
x1
We now want to transform this kind of variable name into subscript notation in the TeXForm output.
One way to do this is with string patterns.
1. Convert your expression to a string in TeXForm:
In[360]:= ToString[(-b+y1) ((b-y1)/(b-y2))^(-(w10/(x\[Gamma]1-\[Omega]2))), TeXForm]
Out[360]= (\text{y1}-b) \left(\frac{b-\text{y1}}{b-\text{y2}}\right)^{-\frac{\text{w10}}{\text{x$\gamma $1}-\text{$\omega $2}}}
2. Replace this specific string pattern with LaTeX subscript notation:
In[361]:= StringReplace[%, "\\text{"~~name_?LetterQ~~index_?DigitQ~~"}":> name<>"_"<>index]
Out[361]= (y_1-b) \left(\frac{b-y_1}{b-y_2}\right)^{-\frac{\text{w10}}{\text{x$\gamma $1}-\text{$\omega $2}}}
You might have noticed that this replacement only worked on variable names that consist of exactly one letter and one digit; longer variable names were ignored. This is because the pattern "_" stands for just one character. For a sequence of characters, use "__", but then we have to make sure that we match the shortest possible sequence. To catch the longer variable names, we apply another string replacement:
In[362]:= StringReplace[%,
"\\text{"~~Shortest[name__]~~Shortest[index__?DigitQ]~~"}":> "\\text{"<>name<>"}_{"<>index<>"}"]
Out[362]= (y_1-b) \left(\frac{b-y_1}{b-y_2}\right)^{-\frac{\text{w}_{10}}{\text{x$\gamma $}_{1}-\text{$\omega $}_{2}}}
Now all variables appear in the correct LaTeX notation for subscripted variables. But some of the "\text{}"s and "{}"s are now redundant, because they contain only a single letter or digit.
To tidy up the LaTeX code, we can add further replacements:
In[371]:= StringReplace[%, "{" ~~ i_?DigitQ ~~ "}" :> i];
StringReplace[%, "\\text{" ~~ name_?LetterQ ~~ "}" :> name]
Out[372]= (y_1-b) \left(\frac{b-y_1}{b-y_2}\right)^{-\frac{w_{10}}{\text{x$\gamma $}_1-\text{$\omega $}_2}}
Now I think the TeX looks good enough, so we can define a function that does all the replacements in one step:
In[506]:=
ClearAll[myTeXForm]
SetAttributes[myTeXForm, HoldFirst]
myTeXForm[expr_] := Fold[StringReplace, ToString[HoldPattern[expr], TeXForm],
{"\\text{HoldPattern}\\left[" ~~ str__ ~~ "\\right]" ~~ EndOfString :> str,
"\\text{" ~~ Shortest[str__] ~~ Shortest[i__?DigitQ] ~~ "}" :>
"\\text{" <> str <> "}_{" <> i <> "}",
{"{" ~~ i_?DigitQ ~~ "}" :> i, "\\text{" ~~ s_?LetterQ ~~ "}" :> s}}]
Testing the function:
b=134;
myTeXForm[(-b+y1) ((b-y1)/(b-y2))^(-(w10/(x\[Gamma]13-\[Omega]2)))]
Out[510]= (y_1-b) \left(\frac{b-y_1}{b-y_2}\right)^{-\frac{w_{10}}{\text{x$\gamma $}_{13}-\text{$\omega $}_2}}
Note that I used a little trick to protect the function against the values of its argument. In this example the variable b already has the value 134, but in the TeX output it should still appear as "b". To achieve this, I added the attribute HoldFirst to the function and used HoldPattern inside. Maybe there is an easier way, but it works fine.
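If you would rather post-process the TeXForm string outside Mathematica, the same three replacements can be sketched with Python regular expressions. This is only an illustration of the idea, not part of the Mathematica solution: the function name subscript_tex is made up here, and the brace-stripping rule is deliberately narrower than the one above (it only unwraps a single digit that directly follows an underscore).

```python
import re

def subscript_tex(tex: str) -> str:
    # \text{<name><digits>} -> \text{<name>}_{<digits>}
    # The lazy name group leaves all trailing digits to the index group.
    tex = re.sub(r"\\text\{([^{}]+?)(\d+)\}", r"\\text{\1}_{\2}", tex)
    # _{7} -> _7 : a single-digit index needs no braces.
    tex = re.sub(r"_\{(\d)\}", r"_\1", tex)
    # \text{y} -> y : a single letter needs no \text wrapper.
    tex = re.sub(r"\\text\{([A-Za-z])\}", r"\1", tex)
    return tex

# The TeXForm string from Out[360] above.
raw = (r"(\text{y1}-b) \left(\frac{b-\text{y1}}{b-\text{y2}}\right)"
       r"^{-\frac{\text{w10}}{\text{x$\gamma $1}-\text{$\omega $2}}}")
print(subscript_tex(raw))
```

Applied to the string from Out[360], this should reproduce the string shown in Out[372].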
Hope this might inspire you.
Best regards.

Related

Match Symbol specific number of times

When defining a syntax, it is possible to match 1 or more times (+) or 0 or more times (*), similarly to how it is done in regex. However, I have not found in the Rascal documentation whether it is also possible to match a Symbol a specific number of times. In regex (and Rascal patterns) this is done with an integer between two curly brackets, but this doesn't seem to work for syntax definitions. Ideally, I'd want something like:
lexical Line = [0-9.]+;
syntax Sym = sym: {Line Newline}{5};
Which would only try to match the first 5 lines of the text below:
..0..
11.11
44.44
1.11.1
33333
55555
No, this meta-syntax does not exist in Rascal; we did not add it.
You could write an over-approximation like this and have a post-parse filter reject anything that does not have exactly 5 items:
syntax Sym = fiveLines: (Line NewLine)+ lines;
visit (myParseTree) {
case (Sym) `<(Line NewLine)+ lines>` :
throw ParseError(x.src) when length(lines) != 5;
}
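The "accept too much, then filter" idea is independent of Rascal. As a rough sketch in Python (the helper names parse_lines and parse_exactly are invented for this illustration, and a regular expression stands in for the generated parser):

```python
import re

# Mirrors the lexical rule: Line = [0-9.]+
LINE = re.compile(r"[0-9.]+")

def parse_lines(text: str) -> list[str]:
    """Over-approximation: accept one or more lines of digits and dots."""
    lines = text.splitlines()
    if not lines or not all(LINE.fullmatch(line) for line in lines):
        raise ValueError("input is not a sequence of Line items")
    return lines

def parse_exactly(text: str, count: int = 5) -> list[str]:
    """Post-parse filter: reject results that do not have `count` lines."""
    lines = parse_lines(text)
    if len(lines) != count:
        raise ValueError(f"expected {count} lines, got {len(lines)}")
    return lines

five = "..0..\n11.11\n44.44\n1.11.1\n33333\n"
print(parse_exactly(five))
```

Note that, like the visit above, this rejects the six-line text from the question as a whole rather than matching only its first five lines.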
Or unfold the loop like so:
syntax Sym
= Line NewLine
Line NewLine
Line NewLine
Line NewLine
Line NewLine
;
Repetition with an integer parameter sounds like a good feature request for us to consider, if you need it badly. We only have to consider what it means for Rascal's type system; for the parser generator it's a simple rule to add.

When to use /.../m in regular expressions?

If I have this string:
st = "The important thing is not to stop questioning.
Curiosity has its own reason for existing.
Never lose a holy curiosity."
and I want to match "Curiosity" using a regular expression, can I use
/Curiosity/m === st
When do you typically make use of /.../m ?
Thank you very much, I appreciate it!
No, you do not need it for your example. m is a modifier that allows the dot (which by default matches any character except a newline) to match newlines as well.
Note that this meaning of the m modifier is specific to Ruby and its regex engine; in other languages, which use other regex engines, the m modifier has a different meaning.
Examples:
/a.*b/ matches "a#123opi[b"
but it doesn't match "a#123
opi[b"
because by default the dot . doesn't match a newline.
/a.*b/m does because the m modifier changes the meaning of the dot and allows it to match newlines.
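To illustrate how the meaning differs elsewhere, here is a small Python comparison (Python is chosen only as an example of another engine). In Python's re module the "dot matches newline" behaviour is re.DOTALL, while re.MULTILINE only changes where ^ and $ can match:

```python
import re

text = "a#123\nopi[b"

# Without flags the dot stops at the newline, as in Ruby without /m.
single_line = re.search(r"a.*b", "a#123opi[b")
default = re.search(r"a.*b", text)

# re.DOTALL plays the role of Ruby's /m: the dot also matches "\n".
dotall = re.search(r"a.*b", text, re.DOTALL)

# re.MULTILINE does not affect the dot at all ...
multiline = re.search(r"a.*b", text, re.MULTILINE)
# ... it lets ^ (and $) match at the start (and end) of every line.
anchored = re.search(r"^opi", text, re.MULTILINE)

print(bool(single_line), bool(default), bool(dotall),
      bool(multiline), bool(anchored))
```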

Why are redundant parentheses not allowed in syntax definitions?

This syntax module is syntactically valid:
module mod1
syntax Empty =
;
And so is this one, which should be an equivalent grammar to the previous one:
module mod2
syntax Empty =
( )
;
(The resulting parser accepts only empty strings.)
Which means that you can make grammars such as this one:
module mod3
syntax EmptyOrKitchen =
( ) | "kitchen"
;
But the following is not allowed (nested parentheses):
module mod4
syntax Empty =
(( ))
;
I would have guessed that redundant parentheses are allowed, since they are allowed in things like expressions, e.g. ((2)) + 2.
This problem came up when working with the data types for the internal representation of Rascal syntax definitions. The following code will create the same module as in the last example, namely mod4 (modulo some whitespace):
import Grammar;
import lang::rascal::format::Grammar;
str sm1 = definition2rascal(\definition("unknown_main",("the-module":\module("unknown",{},{},grammar({sort("Empty")},(sort("Empty"):prod(sort("Empty"),[
alt({seq([])})
],{})))))));
The problematic part of the data is on its own line - alt({seq([])}). If this code is changed to seq([]), then you get the same syntax module as mod2. If you further delete this whole expression, i.e. so that you get this:
str sm3 =
definition2rascal(\definition("unknown_main",("the-module":\module("unknown",{},{},grammar({sort("Empty")},(sort("Empty"):prod(sort("Empty"),[
], {})))))));
Then you get mod1.
So should such redundant parentheses be printed by the definition2rascal(...) function? And should it matter with regard to making the resulting module valid or not?
They are not allowed basically because we wanted to see if we could do without them. There is currently no priority relation between the symbol kinds, so in general there is no need for a bracket syntax (as there is for + and * in expressions).
The brackets already have two different meanings: first, () is the epsilon symbol, and second, (Sym1 Sym2 ...) is a nested sequence. This nested sequence is defined (syntactically) to expect at least two symbols. Now we could, without ambiguity, introduce a third meaning for brackets around a single symbol, or relax the requirement for sequences. But we reckoned it would be confusing that in one case you would get an extra layer in the resulting parse tree (sequence), while in the other case you would not (ignored superfluous bracket).
In more detail, the problem of printing seq([]) is not so much a problem of the meta-syntax but rather that the backing abstract notation is more relaxed than the concrete notation (i.e. it is a bigger language, or an over-approximation). The parser generator will generate a working parser for seq([]). But there is no Rascal notation for an empty sequence, and I guess the pretty printer should throw an exception.

REBOL path operator vs division ambiguity

I've started looking into REBOL, just for fun, and as a fan of programming languages, I really like seeing new ideas and even just alternative syntaxes. REBOL is definitely full of these. One thing I noticed is the use of '/' as the path operator which can be used similarly to the '.' operator in most object-oriented programming languages. I have not programmed in REBOL extensively, just looked at some examples and read some documentation, but it isn't clear to me why there's no ambiguity with the '/' operator.
x: 4
y: 2
result: x/y
In my example, this should be division, but it seems like it could just as easily be the path operator if x were an object or function refinement. How does REBOL handle the ambiguity? Is it just a matter of an overloaded operator and the type system so it doesn't know until runtime? Or is it something I'm missing in the grammar and there really is a difference?
UPDATE Found a good piece of example code:
sp: to-integer (100 * 2 * length? buf) / d/3 / 1024 / 1024
It appears that arithmetic division requires whitespace, while the path operator requires no whitespace. Is that it?
This question deserves an answer from the syntactic point of view. In Rebol, there is no "path operator", in fact. The x/y is a syntactic element called path. As opposed to that the standalone / (delimited by spaces) is not a path, it is a word (which is usually interpreted as the division operator). In Rebol you can examine syntactic elements like this:
length? code: [x/y x / y] ; == 4
type? first code ; == path!
type? second code
, etc.
The code guide says:
White-space is used in general for delimiting (for separating symbols).
This is especially important because words may contain characters such as + and -.
http://www.rebol.com/r3/docs/guide/code-syntax.html
One acquired skill of being a REBOler is to get the hang of inserting whitespace in expressions where other languages usually do not require it :)
Spaces are generally needed in Rebol, but there are exceptions here and there for "special" characters, such as those delimiting series. For instance:
[a b c] is the same as [ a b c ]
(a b c) is the same as ( a b c )
[a b c]def is the same as [a b c] def
Some fairly powerful tools for doing introspection of syntactic elements are type?, quote, and probe. The quote operator prevents the interpreter from giving behavior to things. So if you tried something like:
>> data: [x [y 10]]
>> type? data/x/y
>> probe data/x/y
The "live" nature of the code would dig through the path and give you an integer! of value 10. But if you use quote:
>> data: [x [y 10]]
>> type? quote data/x/y
>> probe quote data/x/y
Then you wind up with a path! whose value is simply data/x/y, it never gets evaluated.
In the internal representation, a PATH! is quite similar to a BLOCK! or a PAREN!. It just has this special distinctive lexical type, which allows it to be treated differently. Although you've noticed that it can behave like a "dot" by picking members out of an object or series, that is only how it is used by the DO dialect. You could invent your own ideas, let's say you make the "russell" command:
russell [
x: 10
y: 20
z: 30
x/y/z
(
print x
print y
print z
)
]
Imagine that in my fanciful example, this outputs 30, 10, 20...because what the russell function does is evaluate its block in such a way that a path is treated as an instruction to shift values. So x/y/z means x=>y, y=>z, and z=>x. Then any code in parentheses is run in the DO dialect. Assignments are treated normally.
When you want to make up a fun new riff on how to express yourself, Rebol takes care of a lot of the grunt work. So for example the parentheses are guaranteed to have matched up to get a paren!. You don't have to go looking for all that yourself, you just build your dialect up from the building blocks of all those different types...and hook into existing behaviors (such as the DO dialect for basics like math and general computation, and the mind-bending PARSE dialect for some rather amazing pattern matching muscle).
But speaking of "all those different types", there's yet another weirdo situation for slash that can create another type:
>> type? quote /foo
This is called a refinement!, and happens when you start a lexical element with a slash. You'll see it used in the DO dialect to call out optional parameter sets to a function. But once again, it's just another symbolic LEGO in the parts box. You can ascribe meaning to it in your own dialects that is completely different...
While I didn't find any definitive written clarification, I did also find that +, -, * and others are valid characters in a word, so clearly the operators require surrounding spaces.
x*y
is a valid identifier, whereas
x * y
performs multiplication. It looks like the path operator is just another case of this.
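The lexical distinctions described in these answers can be mimicked with a toy classifier. The Python sketch below is not the real Rebol scanner: it simply splits on whitespace and looks at where the slash sits, and the function names classify and scan are invented for the illustration.

```python
def classify(token: str) -> str:
    """Toy approximation of how a whitespace-delimited token is typed."""
    if token.isdigit():
        return "integer!"
    if set(token) == {"/"}:
        return "word!"          # a standalone / is just a word
    if token.startswith("/"):
        return "refinement!"    # /foo
    if "/" in token:
        return "path!"          # x/y, data/x/y
    return "word!"              # x, x*y, length?

def scan(source: str) -> list[tuple[str, str]]:
    """Split on whitespace only, then classify each token."""
    return [(tok, classify(tok)) for tok in source.split()]

print(scan("x/y x / y"))
print(scan("x*y x * y /foo"))
```

For "x/y x / y" this yields four tokens, in line with length? code returning 4 above: one path followed by three words.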

Create a Print Function

I'm learning Bison, and so far the only thing I have done is the rpcalc example. Now I want to implement a print function (like C's printf) with a syntax like print ("Something here");, but I don't know how to build the print function or how to make that ; mark the end of a line. Thanks for your help.
You first need to ask yourself:
What are the [sub-]parts of my 'print ("something");' syntax ?
Once you identify these parts, "simply" describe them in the form of grammar syntax rules, along with applicable production rules. And then let Bison generate the parser for you; that's about it.
To put you on your way:
The semicolon is probably an element you will use to separate statements (such as one "call" to print from another).
'print' itself is probably a keyword, or preferably a native function name of your language.
The print statement appears to take a literal string as [one of] its arguments. A literal string starts and ends with a double quote (and probably allows for escaped quotes within itself).
etc.
The entities named above (statement separator, keyword or function name, literal string, arguments) are some of the 'symbols', in parser lingo, that you'll likely need to define in the syntax for your language. For that you'll use Bison grammar rules, such as
stmt : print_stmt ';' | input_stmt ';' | some_other_stmt ';' ;
print_stmt : print '(' args ')'
{ printf("%s", $3); } /* assumes the value of args is a C string */
;
args : arg | arg ',' args ;
...
Since the question asked about the semicolon, maybe some of the confusion came from its different uses; see for example above how the ';' (in quotes) belongs to your language's syntax, whereas the ; (no quotes) at the end of each grammar rule is part of Bison's own language.
Note: this is of course a simplistic implementation, aimed at showing the essentials. Also, the Bison syntax may be a tad off (been there, done that, but a long while back ;-) I then "met" ANTLR and never returned to Bison, although I do see how its lightweight and fully self-contained nature can make it appropriate in some cases).
