DLV Rule is not safe - declarative

I am starting to work with DLV (Disjunctive Datalog) and I have a rule that is reporting a "Rule is not safe" error, when running the code. The rule is the following:
foo(R, 1) :- not foo(R, _)
I have read the manual and seen that "cyclic dependencies are disallowed". I guess this is why I am being reported the error, but I am not sure how this statement is so problematic to DLV. The final goal is to have some kind of initialization in case that the predicate has not been defined.
More precisely, if there is no occurrence of 'foo' with the parameter R (and anything else), then define it with parameters R and 1. Once it is defined, the rule shouldn't be triggered again. So, this is not a real recursion in my opinion.
Any comments on how to solve this issue are welcomed!
I have realised that I probably need another predicate to match the parameter R in the body of the rule. Something like this:
foo(R, 1) :- not foo(R, _), bar(R)
Since, otherwise there would be no way to know whether there are no occurrences of foo(R, _). I don't know whether I made myself clear.
Anyway, this doesn't work either :(

To the particular "Rule is not safe" error: First of all this has nothing to do with cyclic or acyclic dependencies. The same error message shows up for the non-cyclic program:
foo2(R, 1) :- not foo(R,_), bar(R).
The problem is that the program is actually not safe (http://www.dlvsystem.com/html/DLV_User_Manual.html#SAFETY). As mentioned in the section on negative rules (anchor #AEN375, I am only allowed to use 2 links in my answer):
Variables, which occur in a negated literal, must also occur in a
positive literal in the body.
Observe that the _ is an anonymous variable. I.e., the program
foo(R,1) :- not foo(R,_), bar(R).
can be equivalently written as (and is equivalent to)
foo(R,1) :- not foo(R,X), bar(R).
Anonymous variables (DLV manual, anchor #AEN264 - at the end of the section) just allow us to avoid inventing names for variables that will only occur once within the rule (i.e. for variables that only express "there is some value, I absolutely do not care about it), but they are variables nevertheless. And since negation with not is "negation" and not "true negation" (or "strong negation" as it is also often called), none of the three safety conditions is satisfied by the rule.
A very rough and high-level intuition for safety is that it guarantees that every variable in the program can be assigned to some finite domain - as it is now the case with R by adding bar(R). However, the same also must be the case for the anonymous variable _ .
To the actual problem of defining default values:
As pointed out by lambda.xy.x, the problem here is the Answer Set (or stable model) semantics of DLV: Trying to do it in one rule does not give any solution:
In order to get a safe program, we could replace the above problems e.g. by
foo(1,2). bar(1). bar(2).
tmp(R) :- foo(R,_).
foo(R,1) :- not tmp(R), bar(R).
This has no stable model:
Assume the answer is, as intended,
{foo(1,2), bar(1), bar(2), foo(2,1)}
However, this is not a valid model, since tmp(R) :- foo(R,_) would require it to contain tmp(2). But then, "not tmp(2)" is no longer true, and therefore having foo(2,1) in the model violates the required minimality of the model. (This is not exactly what is going on, more a rough intuition. More technical details could be found in any article on answer set programming, a quick Google search gave me this paper as one of the first results: http://www.kr.tuwien.ac.at/staff/tkren/pub/2009/rw2009-asp.pdf)
In order to solve the problem, it is therefore somehow necessary to "break the cycle". One possibility would be:
foo(1,2). bar(1). bar(2). bar(3).
tmp(R) :- foo(R,X), X!=1.
foo(R,1) :- bar(R), not tmp(R).
I.e., by explicitly stating that we want to add R into the intermediate atom only if the value is different from 1, having foo(2,1) in the model does not contradict tmp(2) not being part of the model as well. Of course, this no longer allows to distinguish whether foo(R,1) is there as default value or by input, but if this is not required ...
Another possibility would be to not use foo for the computation, but some foo1 instead. I.e. having
foo1(R,X) :- foo(R,X).
tmp(R) :- foo(R,_).
foo1(R,1) :- bar(R), not tmp(R).
and then just use foo1 instead of foo.

Related

Basic error using solver in Z3-Python: it is returning [] as a model

Maybe I have slept bad today, but I am really struggling with this simple query to Z3-Python:
from z3 import *
a = Bool('a')
b = Bool('b')
sss = Solver()
sss.add(Exists([a,b], True))
print(sss.check())
print(sss.model())
The check prints out sat, but the model is []. However, it should be printing some (anyone) concrete assignment, such as a=True, b=True.
The same is happening if I change the formula to, say: sss.add(Exists([a,b], Not(And(a,b)))). Also tested sss.add(True). Thus, I am missing something really basic, so sorry for the basic nature of this question.
Other codes are working normally (even with optimizers instead of solvers), so it is not a problem of my environment (Collab)
Any help?
Note that there's a big difference between:
a = Bool('a')
b = Bool('b')
sss.add(Exists([a, b], True))
and
a = Bool('a')
b = Bool('b')
sss.add(True) # redundant, but to illustrate
In the first case, you're checking if the statement Exists a. Exists b. True is satisfiable; which trivially is; but there is no model to display: The variables a and b are inside quantification; and they don't play any role in model construction.
In the second case, a and b are part of the model, and hence will be displayed.
What's confusing is why do you need to declare the a and b in the first case. That is, why can't we just say:
sss.add(Exists([a, b], True))
without any a or b in the environment? After all, they are irrelevant as far as the problem is concerned. This is merely a peculiarity of the Python API; there's really no good reason other than this is how it is implemented.
You can see the generated SMTLib by adding a statement of the form:
print(sss.sexpr())
and if you do that for the above segments, you'll see the first one doesn't even declare the variables at all.
So, long story short, the formula exists a, b. True has no "model" variables and thus there's nothing to display. The only reason you declare them in z3py is because of an implementation trick that they use (so they can figure out the type of the variable), nothing more than that.
Here're some specific comments about your questions:
An SMT solver will only construct model-values for top-level declared variables. If you create "local" variables via exists/forall, they are not going to be displayed in models constructed. Note that you could have two totally separate assertions that talk about the "same" existentially quantified variable: It wouldn't even have a way of referring to that item. Rule-of-thumb: If you want to see the value in a model, it has to be declared at the top-level.
Yes, this is the reason for the trick. So they can figure out what type those variables are. (And other bookkeeping.) There's just no syntax afforded by z3py to let you say something like Exists([Int(a), Int(b)], True) or some such. Note that this doesn't mean something like this cannot be implemented. They just didn't. (And is probably not worth it.)
No you understood correctly. The reason you get [] is because there are absolutely no constraints on those a and b, and z3's model constructor will not assign any values because they are irrelevant. You can recover their values via model_completion parameter. But the best way to experiment with this is to add some extra constraints at the top level. (It might be easier to play around if you make a and b Ints. At the top level, assert that a = 5, b = 12. At the "existential" level assert something else. You'll see that your model will only satisfy the top-level constraints. Interestingly, if you assert something that's unsatisfiable in your existential query, the whole thing will become unsat; which is another sign of how they are treated.
I said trivially true because your formula is Exists a. Exists b. True. There're no constraints, so any assignment to a and b satisfy it. It's trivial in this sense. (And all SMTLib logics work over non-empty domains, so you can always assign values freely.)
Quantified variables can always be alpha-renamed without changing the semantics. So, whenever there's collision between quantified names and top-level names, imagine renaming them to be unique. I think you'll be able to answer your own question if you think of it this way.
Extended discussions over comments is really not productive. Feel free to ask new questions if anything isn't clear.

What's the difference between luaL_checknumber and lua_tonumber?

Clarification (sorry the question was not specific): They both try to convert the item on the stack to a lua_Number. lua_tonumber will also convert a string that represents a number. How does luaL_checknumber deal with something that's not a number?
There's also luaL_checklong and luaL_checkinteger. Are they the same as (int)luaL_checknumber and (long)luaL_checknumber respectively?
The reference manual does answer this question. I'm citing the Lua 5.2 Reference Manual, but similar text is found in the 5.1 manual as well. The manual is, however, quite terse. It is rare for any single fact to be restated in more than one sentence. Furthermore, you often need to correlate facts stated in widely separated sections to understand the deeper implications of an API function.
This is not a defect, it is by design. This is the reference manual to the language, and as such its primary goal is to completely (and correctly) describe the language.
For more information about "how" and "why" the general advice is to also read Programming in Lua. The online copy is getting rather long in the tooth as it describes Lua 5.0. The current paper edition describes Lua 5.1, and a new edition describing Lua 5.2 is in process. That said, even the first edition is a good resource, as long as you also pay attention to what has changed in the language since version 5.0.
The reference manual has a fair amount to say about the luaL_check* family of functions.
Each API entry's documentation block is accompanied by a token that describes its use of the stack, and under what conditions (if any) it will throw an error. Those tokens are described at section 4.8:
Each function has an indicator like this: [-o, +p, x]
The first field, o, is how many elements the function pops from the
stack. The second field, p, is how many elements the function pushes
onto the stack. (Any function always pushes its results after popping
its arguments.) A field in the form x|y means the function can push
(or pop) x or y elements, depending on the situation; an interrogation
mark '?' means that we cannot know how many elements the function
pops/pushes by looking only at its arguments (e.g., they may depend on
what is on the stack). The third field, x, tells whether the function
may throw errors: '-' means the function never throws any error; 'e'
means the function may throw errors; 'v' means the function may throw
an error on purpose.
At the head of Chapter 5 which documents the auxiliary library as a whole (all functions in the official API whose names begin with luaL_ rather than just lua_) we find this:
Several functions in the auxiliary library are used to check C
function arguments. Because the error message is formatted for
arguments (e.g., "bad argument #1"), you should not use these
functions for other stack values.
Functions called luaL_check* always throw an error if the check is not
satisfied.
The function luaL_checknumber is documented with the token [-0,+0,v] which means that it does not disturb the stack (it pops nothing and pushes nothing) and that it might deliberately throw an error.
The other functions that have more specific numeric types differ primarily in function signature. All are described similarly to luaL_checkint() "Checks whether the function argument arg is a number and returns this number cast to an int", varying the type named in the cast as appropriate.
The function lua_tonumber() is described with the token [-0,+0,-] meaning it has no effect on the stack and does not throw any errors. It is documented to return the numeric value from the specified stack index, or 0 if the stack index does not contain something sufficiently numeric. It is documented to use the more general function lua_tonumberx() which also provides a flag indicating whether it successfully converted a number or not.
It too has siblings named with more specific numeric types that do all the same conversions but cast their results.
Finally, one can also refer to the source code, with the understanding that the manual is describing the language as it is intended to be, while the source is a particular implementation of that language and might have bugs, or might reveal implementation details that are subject to change in future versions.
The source to luaL_checknumber() is in lauxlib.c. It can be seen to be implemented in terms of lua_tonumberx() and the internal function tagerror() which calls typerror() which is implemented with luaL_argerror() to actually throw the formatted error message.
They both try to convert the item on the stack to a lua_Number. lua_tonumber will also convert a string that represents a number. luaL_checknumber throws a (Lua) error when it fails a conversion - it long jumps and never returns from the POV of the C function. lua_tonumber merely returns 0 (which can be a valid return as well.) So you could write this code which should be faster than checking with lua_isnumber first.
double r = lua_tonumber(_L, idx);
if (r == 0 && !lua_isnumber(_L, idx))
{
// Error handling code
}
return r;

Encode FIRST and FOLLOW sets into a recursive descent parser

This is a follow up to a previous question I asked How to encode FIRST & FOLLOW sets inside a compiler, but this one is more about the design of my program.
I am implementing the Syntax Analysis phase of my compiler by writing a recursive descent parser. I need to be able to take advantage of the FIRST and FOLLOW sets so I can handle errors in the syntax of the source program more efficiently. I have already calculated the FIRST and FOLLOW for all of my non-terminals, but am have trouble deciding where to logically place them in my program and what the best data-structure would be to do so.
Note: all code will be pseudo code
Option 1) Use a map, and map all non-terminals by their name to two Sets that contains their FIRST and FOLLOW sets:
class ParseConstants
Map firstAndFollowMap = #create a map .....
firstAndFollowMap.put("<program>", FIRST_SET, FOLLOW_SET)
end
This seems like a viable option, but inside of my parser I would then need sorta ugly code like this to retrieve the FIRST and FOLLOW and pass to error function:
#processes the <program> non-terminal
def program
List list = firstAndFollowMap.get("<program>")
Set FIRST = list.get(0)
Set FOLLOW = list.get(1)
error(current_symbol, FOLLOW)
end
Option 2) Create a class for each non-terminal and have a FIRST and FOLLOW property:
class Program
FIRST = .....
FOLLOW = ....
end
this leads to code that looks a little nicer:
#processes the <program> non-terminal
def program
error(current_symbol, Program.FOLLOW)
end
These are the two options I thought up, I would love to hear any other suggestions for ways to encode these two sets, and also any critiques and additions to the two ways I posted would be helpful.
Thanks
I have also posted this question here: http://www.coderanch.com/t/570697/java/java/Encode-FIRST-FOLLOW-sets-recursive
You don't really need the FIRST and FOLLOW sets. You need to compute those to get the parse table. That is a table of {<non-terminal, token> -> <action, rule>} if LL(k) (which means seeing a non-terminal in stack and token in input, which action to take and if applies, which rule to apply), or a table of {<state, token> -> <action, state>} if (C|LA|)LR(k) (which means given state in stack and token in input, which action to take and go to which state.
After you get this table, you don't need the FIRST and FOLLOWS any more.
If you are writing a semantic analyzer, you must assume the parser is working correctly. Phrase level error handling (which means handling parse errors), is totally orthogonal to semantic analysis.
This means that in case of parse error, the phrase level error handler (PLEH) would try to fix the error. If it couldn't, parsing stops. If it could, the semantic analyzer shouldn't know if there was an error which was fixed, or there wasn't any error at all!
You can take a look at my compiler library for examples.
About phrase level error handling, you again don't need FIRST and FOLLOW. Let's talk about LL(k) for now (simply because about LR(k) I haven't thought about much yet). After you build the grammar table, you have many entries, like I said like this:
<non-terminal, token> -> <action, rule>
Now, when you parse, you take whatever is on the stack, if it was a terminal, then you must match it with the input. If it didn't match, the phrase level error handler kicks in:
Role one: handle missing terminals - simply generate a fake terminal of the type you need in your lexer and have the parser retry. You can do other stuff as well (for example check ahead in the input, if you have the token you want, drop one token from lexer)
If what you get is a non-terminal (T) from the stack instead, you must look at your lexer, get the lookahead and look at your table. If the entry <T, lookahead> existed, then you're good to go. Follow the action and push to/pop from the stack. If, however, no such entry existed, again, the phrase level error handler kicks in:
Role two: handle unexpected terminals - you can do many things to get passed this. What you do depends on what T and lookahead are and your expert knowledge of your grammar.
Examples of the things you can do are:
Fail! You can do nothing
Ignore this terminal. This means that you push lookahead to the stack (after pushing T back again) and have the parser continue. The parser would first match lookahead, throw it away and continues. Example: if you have an expression like this: *1+2/0.5, you can drop the unexpected * this way.
Change lookahead to something acceptable, push T back and retry. For example, an expression like this: 5id = 10; could be illegal because you don't accept ids that start with numbers. You can replace it with _5id for example to continue

Is it acceptable to store the previous state as a global variable?

One of the biggest problems with designing a lexical analyzer/parser combination is overzealousness in designing the analyzer. (f)lex isn't designed to have parser logic, which can sometimes interfere with the design of mini-parsers (by means of yy_push_state(), yy_pop_state(), and yy_top_state().
My goal is to parse a document of the form:
CODE1 this is the text that might appear for a 'CODE' entry
SUBCODE1 the CODE group will have several subcodes, which
may extend onto subsequent lines.
SUBCODE2 however, not all SUBCODEs span multiple lines
SUBCODE3 still, however, there are some SUBCODES that span
not only one or two lines, but any number of lines.
this makes it a challenge to use something like \r\n
as a record delimiter.
CODE2 Moreover, it's not guaranteed that a SUBCODE is the
only way to exit another SUBCODE's scope. There may
be CODE blocks that accomplish this.
In the end, I've decided that this section of the project is better left to the lexical analyzer, since I don't want to create a pattern that matches each line (and identifies continuation records). Part of the reason is that I want the lexical parser to have knowledge of the contents of each line, without incorporating its own tokenizing logic. That is to say, if I match ^SUBCODE[ ][ ].{71}\r\n (all records are blocked in 80-character records) I would not be able to harness the power of flex to tokenize the structured data residing in .{71}.
Given these constraints, I'm thinking about doing the following:
Entering a CODE1 state from the <INITIAL> start condition results
in calls to:
yy_push_state(CODE_STATE)
yy_push_state(CODE_CODE1_STATE)
(do something with the contents of the CODE1 state identifier, if such contents exist)
yy_push_state(SUBCODE_STATE) (to tell the analyzer to expect SUBCODE states belonging to the CODE_CODE1_STATE. This is where the analyzer begins to masquerade as a parser.
The <SUBCODE1_STATE> start condition is nested as follows: <CODE_STATE>{ <CODE_CODE1_STATE> { <SUBCODE_STATE>{ <SUBCODE1_STATE> { (perform actions based on the matching patterns) } } }. It also sets the global previous_state variable to yy_top_state(), to wit SUBCODE1_STATE.
Within <SUBCODE1_STATE>'s scope, \r\n will call yy_pop_state(). If a continuation record is present (which is a pattern at the highest scope against which all text is matched), yy_push_state(continuation_record_states[previous_state]) is called, bringing us back to the scope in 2. continuation_record_states[] maps each state with its continuation record state, which is used by the parser.
As you can see, this is quite complicated, which leads me to conclude that I'm massively over-complicating the task.
Questions
For states lacking an extremely clear token signifying the end of its scope, is my proposed solution acceptable?
Given that I want to tokenize the input using flex, is there any way to do so without start conditions?
The biggest problem I'm having is that each record (beginning with the (SUB)CODE prefix) is unique, but the information appearing after the (SUB)CODE prefix is not. Therefore, it almost appears mandatory to have multiple states like this, and the abstract CODE_STATE and SUBCODE_STATE states would act as groupings for each of the concrete SUBCODE[0-9]+_STATE and CODE[0-9]+_STATE states.
I would look at how the OMeta parser handles these things.

Is there an idiomatic way to order function arguments in Erlang?

Seems like it's inconsistent in the lists module. For example, split has the number as the first argument and the list as the second, but sublists has the list as the first argument and the len as the second argument.
OK, a little history as I remember it and some principles behind my style.
As Christian has said the libraries evolved and tended to get the argument order and feel from the impulses we were getting just then. So for example the reason why element/setelement have the argument order they do is because it matches the arg/3 predicate in Prolog; logical then but not now. Often we would have the thing being worked on first, but unfortunately not always. This is often a good choice as it allows "optional" arguments to be conveniently added to the end; for example string:substr/2/3. Functions with the thing as the last argument were often influenced by functional languages with currying, for example Haskell, where it is very easy to use currying and partial evaluation to build specific functions which can then be applied to the thing. This is very noticeable in the higher order functions in lists.
The only influence we didn't have was from the OO world. :-)
Usually we at least managed to be consistent within a module, but not always. See lists again. We did try to have some consistency, so the argument order in the higher order functions in dict/sets match those of the corresponding functions in lists.
The problem was also aggravated by the fact that we, especially me, had a rather cavalier attitude to libraries. I just did not see them as a selling point for the language, so I wasn't that worried about it. "If you want a library which does something then you just write it" was my motto. This meant that my libraries were structured, just not always with the same structure. :-) That was how many of the initial libraries came about.
This, of course, creates unnecessary confusion and breaks the law of least astonishment, but we have not been able to do anything about it. Any suggestions of revising the modules have always been met with a resounding "no".
My own personal style is a usually structured, though I don't know if it conforms to any written guidelines or standards.
I generally have the thing or things I am working on as the first arguments, or at least very close to the beginning; the order depends on what feels best. If there is a global state which is chained through the whole module, which there usually is, it is placed as the last argument and given a very descriptive name like St0, St1, ... (I belong to the church of short variable names). Arguments which are chained through functions (both input and output) I try to keep the same argument order as return order. This makes it much easier to see the structure of the code. Apart from that I try to group together arguments which belong together. Also, where possible, I try to preserve the same argument order throughout a whole module.
None of this is very revolutionary, but I find if you keep a consistent style then it is one less thing to worry about and it makes your code feel better and generally be more readable. Also I will actually rewrite code if the argument order feels wrong.
A small example which may help:
fubar({f,A0,B0}, Arg2, Ch0, Arg4, St0) ->
{A1,Ch1,St1} = foo(A0, Arg2, Ch0, St0),
{B1,Ch2,St2} = bar(B0, Arg4, Ch1, St1),
Res = baz(A1, B1),
{Res,Ch2,St2}.
Here Ch is a local chained through variable while St is a more global state. Check out the code on github for LFE, especially the compiler, if you want a longer example.
This became much longer than it should have been, sorry.
P.S. I used the word thing instead of object to avoid confusion about what I was talking.
No, there is no consistently-used idiom in the sense that you mean.
However, there are some useful relevant hints that apply especially when you're going to be making deeply recursive calls. For instance, keeping whichever arguments will remain unchanged during tail calls in the same order/position in the argument list allows the virtual machine to make some very nice optimizations.

Resources