How to make a system of equation in OpenOffice?
f(%ksi) = left lbrace stack #(2 (x-a)) over ((b-a)(c-a))# #(2(b-x)) over
((b-a)(b-c)# right none
It doesn't not work.
Is this what you need?
f(%ksi) = left lbrace stack {
{ (2 (x-a)) over ((b-a)(c-a)) } #
{ (2 (b-x)) over ((b-a)(b-c)) }
} right none
resulting in
(tested with LibreOffice Math 5.0.1.2)
Related
Hi I'm trying to build a regular expression builder to detect 2 or more spaces or a tab, so (let twoOrMoreSpacesOrTab = /\s{2,}|\t/)
How to build this using a Regex Builder?
I tried this but its not 100% acurate:
ChoiceOf {
OneOrMore(" ")
One("\t")
}
The problem here is that is trying to match multiples of 2 white spaces, and I want to consume the whole thing.
This might work.
let twoOrMoreSpacesOrTab = Regex {
ChoiceOf {
Repeat(2...) {
One(.whitespace)
}
One("\t")
}
}
I'm giving Kotlin a go; coding contently, I have an ArrayList of chars which i want to classify depending on how brackets are matched:
(abcde) // ok characters other than brackets can go anywhere
)abcde( // ok matching the brackets 'invertedly' are ok
(({()})) // ok
)()()([] // ok
([)] // bad can't have different overlapping bracket pairs
((((( // bad all brackets need to have a match
My solution comes out(recursive):
//charList is a property
//Recursion starter'upper
private fun classifyListOfCharacters() : Boolean{
var j = 0
while (j < charList.size ) {
if (charList[j].isBracket()){
j = checkMatchingBrackets(j+1, charList[j])
}
j++
}
return j == commandList.size
}
private fun checkMatchingBrackets(i: Int, firstBracket :Char) : Int{
var j = i
while (j < charList.size ) {
if (charList[j].isBracket()){
if (charList[j].matchesBracket(firstBracket)){
return j //Matched bracket normal/inverted
}
j = checkMatchingBrackets(j+1, charList[j])
}
j++
}
return j
}
This works, but is this how you do it in Kotlin? It feels like I've coded java in Kotlin syntax
Found this Functional languages better at recursion, I've tried thinking in terms of manipulating functions and sending them down the recursion but to no avail. I'd be glad to be pointed in the right direction, code, or some pseudo-code of a possible refactoring.
(Omitted some extension methods regarding brackets, I think it's clear what they do)
Another, possibly a simpler approach to this problem is maintaining a stack of brackets while you iterate over the characters.
When you encounter another bracket:
If it matches the top of the stack, you pop the top of the stack;
If it does not match the top of the stack (or the stack is empty), you push it onto the stack.
If any brackets remain on the stack at the end, it means they are unmatched, and the answer is false. If the stack ends up empty, the answer is true.
This is correct, because a bracket at position i in a sequence can match another one at position j, only if there's no unmatched bracket of a different kind between them (at position k, i < k < j). The stack algorithm simulates exactly this logic of matching.
Basically, this algorithm could be implemented in a single for-loop:
val stack = Stack<Char>()
for (c in charList) {
if (!c.isBracket())
continue
if (stack.isNotEmpty() && c.matchesBracket(stack.peek())) {
stack.pop()
} else {
stack.push(c)
}
}
return stack.isEmpty()
I've reused your extensions c.isBracket(...) and c.matchesBracket(...). The Stack<T> is a JDK class.
This algorithm hides the recursion and the brackets nesting inside the abstraction of the brackets stack. Compare: your current approach implicitly uses the function call stack instead of the brackets stack, but the purpose is the same: it either finds a match for the top character or makes a deeper recursive call with another character on top.
Hotkey's answer (using a for loop) is great. However, you asked for an optimized recursion solution. Here is an optimized tail recursive function (Note the tailrec modifier before the function):
tailrec fun isBalanced(input: List<Char>, stack: Stack<Char>): Boolean = when {
input.isEmpty() -> stack.isEmpty()
else -> {
val c = input.first()
if (c.isBracket()) {
if (stack.isNotEmpty() && c.matchesBracket(stack.peek())) {
stack.pop()
} else {
stack.push(c)
}
}
isBalanced(input.subList(1, input.size), stack)
}
}
fun main(args: Array<String>) {
println("check: ${isBalanced("(abcde)".toList(), Stack())}")
}
This function calls itself until the input becomes empty and returns true if the stack is empty when the input becomes empty.
If we look at the decompiled Java equivalent of the generated bytecode, this recursion has been optimized to an efficient while loop by the compiler so we won't get StackOverflowException (removed Intrinsics null checks):
public static final boolean isBalanced(#NotNull String input, #NotNull Stack stack) {
while(true) {
CharSequence c = (CharSequence)input;
if(c.length() == 0) {
return stack.isEmpty();
}
char c1 = StringsKt.first((CharSequence)input);
if(isBracket(c1)) {
Collection var3 = (Collection)stack;
if(!var3.isEmpty() && matchesBracket(c1, ((Character)stack.peek()).charValue())) {
stack.pop();
} else {
stack.push(Character.valueOf(c1));
}
}
input = StringsKt.drop(input, 1);
}
}
I have written a flex lexer to handle the text in BYOND's .dmi file format. The contents inside are (key, value) pairs delimited by '='. Valid keys are all essentially keywords (such as "width"), and invalid keys are not errors: they are just ignored.
Interestingly, the current state of BYOND's .dmi parser uses everything prior to the '=' as its keyword, and simply ignores any excess junk. This means "\twidth123" is recognized as "width".
The crux of my problem is in allowing for this irregularity. In doing so my generated lexer expands from ~40-50KB to ~13-14MB. For reference, I present the following contrived example:
%option c++ noyywrap
fill [^=#\n]*
%%
{fill}version{fill} { return 0; }
{fill}width{fill} { return 0; }
{fill}height{fill} { return 0; }
{fill}state{fill} { return 0; }
{fill}dirs{fill} { return 0; }
{fill}frames{fill} { return 0; }
{fill}delay{fill} { return 0; }
{fill}loop{fill} { return 0; }
{fill}rewind{fill} { return 0; }
{fill}movement{fill} { return 0; }
{fill}hotspot{fill} { return 0; }
%%
fill is the rule that is used to merge the keywords with "anything before the =". Running flex on the above yields a ~13MB lex.yy.cc on my computer. Simply removing the kleene star (*) in the fill rule yields a 45KB lex.yy.cc file; however, obviously, this then makes the lexer incorrect.
Are there any tricks, flex options, or lexer hacks to avoid this insane expansion? The only things I can think of are:
Disallow "width123" to represent "width", which is undesirable as then technically-correct files could not be parsed.
Make one rule that is simply [^=\n]+ to return some identifier token, and pick out the keyword in the parser. This seems suboptimal to me as well, particularly because different keywords have different value types and it seems most natural to be able to handle "'width' '=' INT" and "'version' '=' FLOAT" in the parser instead of "ID '=' VALUE" followed by picking out the keyword in the identifier, making sure the value is of the right type, etc.
I could make the rule {fill}(width|height|version|...){fill}, which does indeed keep the generated file small. However, while regular expression parsers tend to produce "captures," flex just gives me yytext and re-parsing that for a keyword to produce the desired token seems to be very undesirable in terms of algorithmic complexity.
Make fill a separate rule of its own that does nothing, and remove it from all the other rules, and separate its definition from whitespace for clarity:
whitespace [ \t\f]
fill [^#=\n]
%%
{whitespace}+ ;
{fill}+ ;
I would probably also avoid building the keywords into the lexer and just use an identifier [a-zA-Z]+ rule that does a table lookup. And finally add a rule to catch the =:
. return yytext[0];
to let the parser handle all special characters.
This is not really a problem flex is "good at", but it can be solved if it is precisely defined. In particular, it is important to know which of the keywords should be returned if the random string of letters before the = contains more than one keyword. For example, suppose the input is:
garbage_widtheight_moregarbage = 42
Now, is that setting the width or the height?
Remember that flex scanners will choose the rule with longest match, and of rules with equally long matches, the first one in the lexical description.
So the model presented in the OP:
fill [^=#\n]*
%%
{fill}width{fill} { return 0; }
{fill}height{fill} { return 0; }
/* SNIP */
will always prefer width to height, because the matches will be the same length (both terminate at the last character before the =), and the width pattern comes first in the file. If the rules were written in the opposite order, height would be preferred.
On the other hand, if you removed the second {fill}:
{fill}width{fill} { return 0; }
{fill}height{fill} { return 0; }
then the last keyword in the input (in this case, height) will be preferred, because that one has the longer match.
The most likely requirement, however, is that the first keyword be recognized, so neither of the preceding will work. In order to match the first keyword, it is necessary to first match the shortest possible sequence of {fill}. And since flex does not implement non-greedy repetition, that can only be done with a character-by-character span.
Here's an example, using start conditions. Note that we hold onto the keyword token until we actually find the =, in case the = is not found.
/* INITIAL: beginning of a line
* FIND_EQUAL: keyword recognized, looking for the =
* VALUE: = recognized, lexing the right-hand side
* NEXT_LINE: find the next line and continue the scan
*/
%x FIND_EQUAL VALUE
%%
int keyword;
"[#=]".* /* Skip comments and lines with no recognizable keyword */
version { keyword = KW_VERSION; BEGIN(FIND_EQUAL); }
width { keyword = KW_WIDTH; BEGIN(FIND_EQUAL); }
height { keyword = KW_HEIGHT; BEGIN(FIND_EQUAL); }
/* etc. */
.|\n /* Skip any other single character, or newline */
<FIND_EQUAL>{
[^=#\n]*"=" { BEGIN(VALUE); return keyword; }
"#".* { BEGIN(INITIAL); }
\n { BEGIN(INITIAL); }
}
<VALUE>{
"#".* { BEGIN(INITIAL); }
\n { BEGIN(INITIAL); }
[[:blank:]]+ ; /* Ignore space and tab characters */
[[:digit:]]+ { yylval.ival = atoi(yytext);
BEGIN(NEXT_LINE); return INTEGER;
}
[[:digit:]]+"."[[:digit:]]*|"."[[:digit:]]+ {
yylval.fval = atod(yytext);
BEGIN(NEXT_LINE); return FLOAT;
}
\"([^"]|\\.)*\" { char* s = malloc(yyleng - 1);
yylval.sval = s;
/* Remove quotes and escape characters */
yytext[yyleng - 1] = '\0';
do {
if (*++yytext == '\\') ++yytext;
*s++ = *yytext;
} while (*yytext);
BEGIN(NEXT_LINE); return STRING;
}
/* Other possible value token types */
. BEGIN(NEXT_LINE); /* bad character in value */
}
<NEXT_LINE>.*\n? BEGIN(INITIAL);
In the escape-removal code, you might want to translate things like \n. And you might also want to avoid string values with physical newlines. And a bunch of etceteras. It's only intended as a model.
While finding follow sets, rules such as
A->aA can lead to infinite recursion. Is there any coding technique to avoid it?
Note that the above example is just an example, in practice such a recursion could happen indirectly as well.
Here is my sample C code for finding follow sets. The grammar is stored as an array of linked lists. Please tell me if the code is unclear at any point.
set findFollowSet(char nonTerminal[], Grammar G, hashTable2 h) //later assume that all first sets are already in the hashtable.
{
LINK temp1 = find2(h, nonTerminal);
set s= createEmptySet();
set temp = createEmptySet();
char lhs[80] = "\0";
int i;
//special case
if(temp1->numRightSideOf==0) //its not on right side of any grammar rule
return insert(s, "$");
for(i=0;i<temp1->numRightSideOf;i++)
{
link l = G.rules[temp1->rightSideOf[i]];
strcpy(lhs, l->symbol); //storing the lhs just in case the nonTerm appears on the rightmost end of the rule.
printf("!!!!! %s\n", lhs);
sleep(1);
//finding nonTerminal in G
while(l!=NULL)
{
if(strcmp(l->symbol, nonTerminal) == 0)
break;
l=l->next;
}
//found the nonTerminal in G
if(l->next!=NULL)
{
temp = findFirstSet(l->next, G, h);
temp = removeElement(temp, "EPSILON");
}
else //its on the rightmost end of the rule
temp = findFollowSet(lhs, G, h);
s = setUnion(s, temp); destroySet(temp);
}
return s;
}
FIRST and FOLLOW sets are defined recursively, so you need to find the recursive closure. What this mean in practice is that you don't find the FOLLOW set for a single non-terminal -- you find all the FOLLOW sets for all the terminals simultaneously, by starting with all sets empty and going over the grammar adding symbols to different sets, until no more symbols can be added to any set. So you end up with something like:
FOLLOW[*] = {}; // all follow sets start empty
done = false;
while (!done)
done = true;
for (R : each rule in the grammar)
A = RHS[R];
tmp = FOLLOW[A];
for (S : each symbol in LHS[R] from right to left)
if (S is terminal)
tmp = {S};
else
if (!(FOLLOW[S] contains tmp))
done = false
FOLLOW[S] |= tmp
if (epsilon in FIRST[S])
tmp |= FIRST[S] - epsilon
else
tmp = FIRST[S]
Ok I got the answer but its inefficient.
So if anyone wants to suggest some more efficient answer, please feel welcomed.
Just store the recursion stack explicitly and at each recursive call, check if the entry already exists in the stack.
Mind you, you need to check the entire stack not just the top of it.
How would you write a Parsing Expression Grammar in any of the following Parser Generators (PEG.js, Citrus, Treetop) which can handle Python/Haskell/CoffeScript style indentation:
Examples of a not-yet-existing programming language:
square x =
x * x
cube x =
x * square x
fib n =
if n <= 1
0
else
fib(n - 2) + fib(n - 1) # some cheating allowed here with brackets
Update:
Don't try to write an interpreter for the examples above. I'm only interested in the indentation problem. Another example might be parsing the following:
foo
bar = 1
baz = 2
tap
zap = 3
# should yield (ruby style hashmap):
# {:foo => { :bar => 1, :baz => 2}, :tap => { :zap => 3 } }
Pure PEG cannot parse indentation.
But peg.js can.
I did a quick-and-dirty experiment (being inspired by Ira Baxter's comment about cheating) and wrote a simple tokenizer.
For a more complete solution (a complete parser) please see this question: Parse indentation level with PEG.js
/* Initializations */
{
function start(first, tail) {
var done = [first[1]];
for (var i = 0; i < tail.length; i++) {
done = done.concat(tail[i][1][0])
done.push(tail[i][1][1]);
}
return done;
}
var depths = [0];
function indent(s) {
var depth = s.length;
if (depth == depths[0]) return [];
if (depth > depths[0]) {
depths.unshift(depth);
return ["INDENT"];
}
var dents = [];
while (depth < depths[0]) {
depths.shift();
dents.push("DEDENT");
}
if (depth != depths[0]) dents.push("BADDENT");
return dents;
}
}
/* The real grammar */
start = first:line tail:(newline line)* newline? { return start(first, tail) }
line = depth:indent s:text { return [depth, s] }
indent = s:" "* { return indent(s) }
text = c:[^\n]* { return c.join("") }
newline = "\n" {}
depths is a stack of indentations. indent() gives back an array of indentation tokens and start() unwraps the array to make the parser behave somewhat like a stream.
peg.js produces for the text:
alpha
beta
gamma
delta
epsilon
zeta
eta
theta
iota
these results:
[
"alpha",
"INDENT",
"beta",
"gamma",
"INDENT",
"delta",
"DEDENT",
"DEDENT",
"epsilon",
"INDENT",
"zeta",
"DEDENT",
"BADDENT",
"eta",
"theta",
"INDENT",
"iota",
"DEDENT",
"",
""
]
This tokenizer even catches bad indents.
I think an indentation-sensitive language like that is context-sensitive. I believe PEG can only do context-free langauges.
Note that, while nalply's answer is certainly correct that PEG.js can do it via external state (ie the dreaded global variables), it can be a dangerous path to walk down (worse than the usual problems with global variables). Some rules can initially match (and then run their actions) but parent rules can fail thus causing the action run to be invalid. If external state is changed in such an action, you can end up with invalid state. This is super awful, and could lead to tremors, vomiting, and death. Some issues and solutions to this are in the comments here: https://github.com/dmajda/pegjs/issues/45
So what we are really doing here with indentation is creating something like a C-style blocks which often have their own lexical scope. If I were writing a compiler for a language like that I think I would try and have the lexer keep track of the indentation. Every time the indentation increases it could insert a '{' token. Likewise every time it decreases it could inset an '}' token. Then writing an expression grammar with explicit curly braces to represent lexical scope becomes more straight forward.
You can do this in Treetop by using semantic predicates. In this case you need a semantic predicate that detects closing a white-space indented block due to the occurrence of another line that has the same or lesser indentation. The predicate must count the indentation from the opening line, and return true (block closed) if the current line's indentation has finished at the same or shorter length. Because the closing condition is context-dependent, it must not be memoized.
Here's the example code I'm about to add to Treetop's documentation. Note that I've overridden Treetop's SyntaxNode inspect method to make it easier to visualise the result.
grammar IndentedBlocks
rule top
# Initialise the indent stack with a sentinel:
&{|s| #indents = [-1] }
nested_blocks
{
def inspect
nested_blocks.inspect
end
}
end
rule nested_blocks
(
# Do not try to extract this semantic predicate into a new rule.
# It will be memo-ized incorrectly because #indents.last will change.
!{|s|
# Peek at the following indentation:
save = index; i = _nt_indentation; index = save
# We're closing if the indentation is less or the same as our enclosing block's:
closing = i.text_value.length <= #indents.last
}
block
)*
{
def inspect
elements.map{|e| e.block.inspect}*"\n"
end
}
end
rule block
indented_line # The block's opening line
&{|s| # Push the indent level to the stack
level = s[0].indentation.text_value.length
#indents << level
true
}
nested_blocks # Parse any nested blocks
&{|s| # Pop the indent stack
# Note that under no circumstances should "nested_blocks" fail, or the stack will be mis-aligned
#indents.pop
true
}
{
def inspect
indented_line.inspect +
(nested_blocks.elements.size > 0 ? (
"\n{\n" +
nested_blocks.elements.map { |content|
content.block.inspect+"\n"
}*'' +
"}"
)
: "")
end
}
end
rule indented_line
indentation text:((!"\n" .)*) "\n"
{
def inspect
text.text_value
end
}
end
rule indentation
' '*
end
end
Here's a little test driver program so you can try it easily:
require 'polyglot'
require 'treetop'
require 'indented_blocks'
parser = IndentedBlocksParser.new
input = <<END
def foo
here is some indented text
here it's further indented
and here the same
but here it's further again
and some more like that
before going back to here
down again
back twice
and start from the beginning again
with only a small block this time
END
parse_tree = parser.parse input
p parse_tree
I know this is an old thread, but I just wanted to add some PEGjs code to the answers. This code will parse a piece of text and "nest" it into a sort of "AST-ish" structure. It only goes one deep and it looks ugly, furthermore it does not really use the return values to create the right structure but keeps an in-memory tree of your syntax and it will return that at the end. This might well become unwieldy and cause some performance issues, but at least it does what it's supposed to.
Note: Make sure you have tabs instead of spaces!
{
var indentStack = [],
rootScope = {
value: "PROGRAM",
values: [],
scopes: []
};
function addToRootScope(text) {
// Here we wiggle with the form and append the new
// scope to the rootScope.
if (!text) return;
if (indentStack.length === 0) {
rootScope.scopes.unshift({
text: text,
statements: []
});
}
else {
rootScope.scopes[0].statements.push(text);
}
}
}
/* Add some grammar */
start
= lines: (line EOL+)*
{
return rootScope;
}
line
= line: (samedent t:text { addToRootScope(t); }) &EOL
/ line: (indent t:text { addToRootScope(t); }) &EOL
/ line: (dedent t:text { addToRootScope(t); }) &EOL
/ line: [ \t]* &EOL
/ EOF
samedent
= i:[\t]* &{ return i.length === indentStack.length; }
{
console.log("s:", i.length, " level:", indentStack.length);
}
indent
= i:[\t]+ &{ return i.length > indentStack.length; }
{
indentStack.push("");
console.log("i:", i.length, " level:", indentStack.length);
}
dedent
= i:[\t]* &{ return i.length < indentStack.length; }
{
for (var j = 0; j < i.length + 1; j++) {
indentStack.pop();
}
console.log("d:", i.length + 1, " level:", indentStack.length);
}
text
= numbers: number+ { return numbers.join(""); }
/ txt: character+ { return txt.join(""); }
number
= $[0-9]
character
= $[ a-zA-Z->+]
__
= [ ]+
_
= [ ]*
EOF
= !.
EOL
= "\r\n"
/ "\n"
/ "\r"