I'm currently working on a recursive Prolog program that links routes together to create a basic GPS for the Birmingham area. At the moment I can get output like this:
Input
routeplan(selly_oak, aston, P).
Output
P = [selly_oak, edgbaston, ... , aston]
What I would like to do is have my program provide some sort of interface, so if I were to type in something along the lines of:
Route from selly_oak to aston
It would provide me with:
Go from selly_oak to edgbaston
Go from edgbaston to ...
Finally, Go from ... to aston.
Prolog is a powerful language, so I assume this is easily possible; however, many of the books I've taken out seem to skip over this part. As far as I am aware I have to use something like write/1 and read/1, although the details are unknown to me.
Could anyone here help a Prolog novice out with some basic examples or links to further information?
EDIT: A lot of these answers seem very complicated, whereas the solution should only be around 5-10 lines of code. Reading in a value isn't a problem, as I can do something along the lines of:
find:-
write('Where are you? '),
read(X),
nl, write('Where do you want to go? '),
read(Y),
loopForRoute(X,Y).
I'd prefer it if the output could be written using write/1, so that nl/0 can put each step on its own line, as in the output above.
If this were my input, how would I then arrange the routeplan/3 call at the top to work with these inputs? Also, if I were to add the lines for these stations as an extra parameter, how would this then be implemented? All links are defined at the beginning of the file like so:
rlinks(selly_oak, edgbaston, uob_line).
rlinks(edgbaston, bham_new_street, main_line).
With this information, it would be good to also print the line, like so:
Go from selly_oak to edgbaston using the uob_line
Go from edgbaston to ... using the ...
Finally, go from ... to aston using the astuni_line
A book which discusses such things in detail is Natural Language Processing for Prolog Programmers by Michael A. Covington.
In general, what you need to do is:
1. Tokenize the input.
2. Parse the tokens (e.g. with a DCG) to get the input for routeplan/3.
3. Call routeplan/3.
4. Generate some English on the basis of the output of routeplan/3.
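The four steps can be sketched outside Prolog as well. Below is a minimal Python sketch (Python is only a neutral illustration language here, not part of the original answer) that mirrors query_to_response/2: it tokenizes on spaces, accepts only the shape "Route from <place> to <place>", looks the route up in a hypothetical stub standing in for routeplan/3 (same mock data as the Prolog code), and generates the English tokens.

```python
# Illustration only: a Python mirror of query_to_response/2.
# route_plan is a hypothetical stub for routeplan/3 with the same mock data.
PLACENAMES = {"selly_oak", "aston", "edgbaston"}
MOCK_ROUTES = {("selly_oak", "aston"): ["selly_oak", "edgbaston", "aston"]}


def route_plan(start, dest):
    """Stub planner: the list of stops, or None if no route is known."""
    return MOCK_ROUTES.get((start, dest))


def parse_query(tokens):
    """Accept exactly: Route from <place> to <place>."""
    if (len(tokens) == 5 and tokens[0] == "Route" and tokens[1] == "from"
            and tokens[3] == "to"
            and tokens[2] in PLACENAMES and tokens[4] in PLACENAMES):
        return tokens[2], tokens[4]
    return None


def generate(plan):
    """[a, b, c] -> 'go from a to b then go from b to c then stop .'"""
    words = []
    for a, b in zip(plan, plan[1:]):
        words += ["go", "from", a, "to", b, "then"]
    return " ".join(words + ["stop", "."])


def query_to_response(query):
    tokens = query.split(" ")        # 1. tokenize on spaces
    parsed = parse_query(tokens)     # 2. parse
    if parsed is None:
        return None
    plan = route_plan(*parsed)       # 3. plan
    if plan is None:
        return None
    return generate(plan)            # 4. generate English
```

Unlike the DCG version, this parser does not backtrack; it simply returns None for any query it does not recognize.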
Something like this (works in SWI-Prolog):
% Usage example:
%
% ?- query_to_response('Route from selly_oak to aston', Response).
%
% Response = 'go from selly_oak to edgbaston then go from edgbaston
% to aston then stop .'
%
query_to_response(Query, Response) :-
% atomic_list_concat/3 replaces the deprecated concat_atom/3; with Query bound it splits on ' '
atomic_list_concat(QueryTokens, ' ', Query), % simple tokenizer
query(path(From, To), QueryTokens, []),
routeplan(From, To, Plan),
response(Plan, EnglishTokens, []),
atomic_list_concat(EnglishTokens, ' ', Response).
% Query parser
query(path(From, To)) --> ['Route'], from(From), to(To).
from(From) --> [from], [From], { placename(From) }.
to(To) --> [to], [To], { placename(To) }.
% Response generator ('.' is quoted so it reads as an atom, not an end token)
response([_]) --> [stop], ['.'].
response([From, To | Tail]) -->
goto(path(From, To)), [then], response([To | Tail]).
goto(path(From, To)) --> [go], from(From), to(To).
% Placenames
placename(selly_oak).
placename(aston).
placename(edgbaston).
% Mock routeplan/3
routeplan(selly_oak, aston, [selly_oak, edgbaston, aston]).
If I understand you correctly, you just want to format the list nicely for printing, right?
In SWI-Prolog this works:
% atomic_list_concat/2 replaces the deprecated concat_atom/2
output_string([A,B],StrIn,StrOut) :-
atomic_list_concat([StrIn, 'Finally, Go from ', A, ' to ', B, '.'],StrOut),
write(StrOut).
output_string([A,B|Rest],StrIn,StrOut) :-
atomic_list_concat([StrIn,'Go from ', A, ' to ', B, '.\n'],StrAB),
output_string([B|Rest],StrAB,StrOut).
Then call it with:
output_string(P,'',_).
It's probably not very efficient, but it does the job. :)
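The accumulator pattern in output_string/3 carries over directly to other languages. Here is a hedged Python sketch of the same recursion (an illustration, not the answerer's code), assuming the route is a list of at least two stop names:

```python
def output_string(stops, acc=""):
    """Mirror of output_string/3: every pair except the last adds a
    'Go from A to B.' line to the accumulator; the last pair adds the
    'Finally, Go from A to B.' line and the finished text is returned."""
    if len(stops) < 2:
        raise ValueError("a route needs at least two stops")
    a, b = stops[0], stops[1]
    if len(stops) == 2:
        return acc + f"Finally, Go from {a} to {b}."
    return output_string(stops[1:], acc + f"Go from {a} to {b}.\n")
```

Calling print(output_string(["selly_oak", "edgbaston", "aston"])) prints two lines, the second starting with "Finally,".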
For this sort of thing, I usually create shell predicates. So in your case...
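guided :-
write('Enter your start point'), nl,   % write/1 prints atoms unquoted; print/1 may quote them
read(Start),                          % the user types a term ending in a full stop, e.g. selly_oak.
write('Enter your destination'), nl,
read(Dest),
routeplan(Start, Dest, Route),
print_route(Route).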
And print_route/1 could be something recursive like this:
print_route([]).
print_route([[A,B,Method]|Tail]) :-
print_route(Tail),   % print the earlier steps first
write('Go from '), write(A),
write(' to '), write(B),
write(' by '), write(Method), nl.
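I've assumed that the third argument of routeplan/3 is a list of [From, To, Method] lists, and that each new step is added at the front, so the most recent step is at the head; that is why print_route/1 recurses before printing. If it's not built that way, it should be fairly easy to adapt. Ask in the comments.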
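To connect this with the follow-up about lines, here is a Python sketch of the same idea (illustration only). RLINKS encodes the two rlinks/3 facts from the question; the bham_new_street to aston entry is hypothetical, added only so the example reaches aston. The steps are stored newest-first, as print_route/1 assumes, and reversed before printing.

```python
# rlinks/3 facts from the question as a dict; the last entry is hypothetical.
RLINKS = {
    ("selly_oak", "edgbaston"): "uob_line",
    ("edgbaston", "bham_new_street"): "main_line",
    ("bham_new_street", "aston"): "astuni_line",  # hypothetical link
}


def steps_from_path(path):
    """Newest-first [from, to, line] triples for consecutive stops."""
    steps = [[a, b, RLINKS[(a, b)]] for a, b in zip(path, path[1:])]
    return steps[::-1]


def route_lines(steps_newest_first):
    """Reverse into travel order; the last step gets 'Finally, go ...'."""
    steps = steps_newest_first[::-1]
    lines = []
    for i, (a, b, line) in enumerate(steps):
        prefix = "Finally, go" if i == len(steps) - 1 else "Go"
        lines.append(f"{prefix} from {a} to {b} using the {line}")
    return lines
```

Reversing once at the end is the iterative counterpart of recursing before printing in print_route/1.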
Here are a few predicates to read lines from a file or stream into lists of character codes (traditional Prolog strings):
%%% get_line(S, CL): CL is the list of codes read up to the end of the line
%%% from S. When reading past the end of file, CL is 'end_of_file' the first
%%% time; a second attempt raises an exception.
%%% :- pred get_line(+stream, -list(int)).
get_line(S, CL) :-
peek_code(S, C),
( C = -1
-> get_code(S, _),
CL = end_of_file
; get_line(S, C, CL)).
get_line(_, -1, CL) :- !, CL = []. % leave end of file mark on stream
get_line(S, 0'\n, CL) :- !,
get_code(S, _),
CL = [].
get_line(S, C, [C|CL]) :-
get_code(S, _),
peek_code(S, NC),
get_line(S, NC, CL).
%% read_lines(L): reads lines from current input into L. L is a list of lists
%% of character codes; newline characters are not included.
%% :- pred read_lines(-list(list(int))).
read_lines(L) :-
current_input(In),
get_line(In, L0),
read_lines(In, L0, L).
%% read_lines(F, L): reads lines from file F into L. L is a list of lists of
%% character codes; newline characters are not included.
%% :- pred read_lines(+atom, -list(list(int))).
read_lines(F, L) :-
catch(open(F, read, S), _, fail),   % fail quietly if F cannot be opened
call_cleanup((get_line(S, L0),
read_lines(S, L0, L)),
close(S)).
read_lines(_, end_of_file, L) :- !, L = [].
read_lines(S, H, [H|T]) :-
get_line(S, NH),
read_lines(S, NH, T).
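The peek-and-stop logic of get_line/2 and read_lines/3 can be sketched in Python as follows. This is an illustration only: None stands in for end_of_file, and io.StringIO stands in for a real stream.

```python
import io


def get_line(stream):
    """Next line without its newline, or None (for end_of_file) at EOF."""
    c = stream.read(1)
    if c == "":
        return None                      # nothing left: end of file
    chars = []
    while c not in ("", "\n"):           # stop at newline or end of input
        chars.append(c)
        c = stream.read(1)
    return "".join(chars)


def read_lines(stream):
    """Collect lines until get_line reports end of file."""
    lines = []
    line = get_line(stream)
    while line is not None:
        lines.append(line)
        line = get_line(stream)
    return lines
```

As in the Prolog version, an empty line yields an empty string, and a trailing newline does not produce an extra empty line.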
Then, take a look at DCGs for information on how to parse a string.
I'm curious about Prolog as a parser, so I'm making a little Lisp front-end. I have already made a tokenizer, which you can see here:
base_tokenize([], Buffer, [Buffer]).
base_tokenize([Char | Chars], Buffer, Tokens) :-
(Char = '(' ; Char = ')') ->
base_tokenize(Chars, '', Tail_Tokens),
Tokens = [Buffer, Char | Tail_Tokens];
Char = ' ' ->
base_tokenize(Chars, '', Tail_Tokens),
Tokens = [Buffer | Tail_Tokens];
atom_concat(Buffer, Char, New_Buffer),
base_tokenize(Chars, New_Buffer, Tokens).
filter_empty_blank([], []).
filter_empty_blank([Head | Tail], Result) :-
filter_empty_blank(Tail, Tail_Result),
((Head = [] ; Head = '') ->
Result = Tail_Result;
Result = [Head | Tail_Result]).
tokenize(Expr, Tokens) :-
atom_chars(Expr, Chars),
base_tokenize(Chars, '', Dirty_Tokens),
filter_empty_blank(Dirty_Tokens, Tokens).
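For comparison, here is a hedged Python sketch of what base_tokenize/3 followed by filter_empty_blank/2 computes: a space ends a token, each parenthesis becomes its own token, and empty tokens are dropped. Plain strings stand in for Prolog atoms.

```python
def tokenize(expr):
    """Split on spaces, keep '(' and ')' as separate tokens, drop empties."""
    tokens, buf = [], ""
    for ch in expr:
        if ch in "()":
            tokens += [buf, ch]          # flush the buffer, then the paren
            buf = ""
        elif ch == " ":
            tokens.append(buf)           # a space just ends the current token
            buf = ""
        else:
            buf += ch
    tokens.append(buf)
    return [t for t in tokens if t != ""]
```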
I now have a new challenge: constructing a parse tree from this. First I tried making one without a grammar, but that turned out really messy, so I'm using DCGs. Wikipedia's page on them is not very clear, especially the section "Parsing with DCGs". Maybe someone can give me a clearer idea of how I would construct a tree? I was very happy to learn that Prolog's lists are untyped, so things are a bit easier since no sum types are needed. I'm just really confused about the arguments to grammar rules like sentence(s(NP,VP)) or verb(v(eats)) (on the wiki), why the arguments have such abstruse names, and how I can get started with my parser without too much hassle. So far I have:
expr --> [foo].
expr --> list.
seq --> expr, seq.
seq --> expr.
list --> ['('], seq, [')'].
Here is a beginning: parsing a Lisp list of atoms, which at first is just an unstructured list of tokens:
List = [ '(', '(', foo, bar, ')', baz, ')' ].
First, just accept it.
Write down the grammar directly:
so_list --> ['('], so_list_content, [')'].
so_list_content --> [].
so_list_content --> so_atom, so_list_content.
so_list_content --> so_list, so_list_content.
so_atom --> [X], { \+ member(X,['(',')']),atom(X) }.
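The accepting grammar corresponds to a small recursive-descent recognizer. A Python sketch for illustration (any token other than a parenthesis counts as an atom here), which can be checked against the same six cases as the plunit tests:

```python
def accept_list(tokens):
    """True iff tokens form exactly one list, as so_list//0 accepts."""
    def so_list(i):
        # so_list --> ['('], so_list_content, [')'].
        if i >= len(tokens) or tokens[i] != "(":
            return None
        i += 1
        while i < len(tokens) and tokens[i] != ")":
            if tokens[i] == "(":
                i = so_list(i)           # nested list
                if i is None:
                    return None
            else:
                i += 1                   # so_atom
        if i >= len(tokens):
            return None                  # missing ')'
        return i + 1                     # consume ')'
    return so_list(0) == len(tokens)
```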
Add some test cases (is there plunit in GNU Prolog?)
:- begin_tests(accept_list).
test(1,[fail]) :- phrase(so_list,[]).
test(2,[true,nondet]) :- phrase(so_list,['(',')']).
test(3,[true,nondet]) :- phrase(so_list,['(',foo,')']).
test(4,[true,nondet]) :- phrase(so_list,['(',foo,'(',bar,')',')']).
test(5,[true,nondet]) :- phrase(so_list,['(','(',bar,')',foo,')']).
test(6,[fail]) :- phrase(so_list,['(',foo,'(',bar,')']).
:- end_tests(accept_list).
And so:
?- run_tests.
% PL-Unit: accept_list ...... done
% All 6 tests passed
true.
Cool. Looks like we can accept lists-of-tokens.
Now build a parse tree. This is done by growing a Prolog term through extra arguments of the DCG nonterminals. The term (or terms) in the head collects the terms appearing in the body into a larger structure, quite naturally. Once the terminal tokens are reached, the structure starts to fill up with actual content:
so_list(list(Stuff)) --> ['('], so_list_content(Stuff), [')'].
so_list_content([]) --> [].
so_list_content([A|Stuff]) --> so_atom(A), so_list_content(Stuff).
so_list_content([L|Stuff]) --> so_list(L), so_list_content(Stuff).
so_atom(X) --> [X], { \+ member(X,['(',')']),atom(X) }.
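Building the tree is the same recognizer with a return value, which is exactly what the extra DCG arguments provide. A Python sketch for illustration, where the tuple ("list", [...]) plays the role of the list(Stuff) term:

```python
def parse_list(tokens):
    """Return ('list', [...]) for one well-formed list, else None."""
    def so_list(i):
        if i >= len(tokens) or tokens[i] != "(":
            return None
        i += 1
        stuff = []
        while i < len(tokens) and tokens[i] != ")":
            if tokens[i] == "(":
                result = so_list(i)      # nested list becomes a subtree
                if result is None:
                    return None
                sub, i = result
                stuff.append(sub)
            else:
                stuff.append(tokens[i])  # atom goes in as-is
                i += 1
        if i >= len(tokens):
            return None                  # missing ')'
        return ("list", stuff), i + 1
    result = so_list(0)
    if result is None or result[1] != len(tokens):
        return None
    return result[0]
```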
Yup, tests again (the expected Result is moved out of the test head because the visual noise is too much):
:- begin_tests(parse_list).
test(1,[fail]) :-
phrase(so_list(_),[]).
test(2,[true(L==Result),nondet]) :-
phrase(so_list(L),['(',')']),
Result = list([]).
test(3,[true(L==Result),nondet]) :-
phrase(so_list(L),['(',foo,')']),
Result = list([foo]).
test(4,[true(L==Result),nondet]) :-
phrase(so_list(L),['(',foo,'(',bar,')',')']),
Result = list([foo,list([bar])]).
test(5,[true(L==Result),nondet]) :-
phrase(so_list(L),['(','(',bar,')',foo,')']),
Result = list([list([bar]),foo]).
test(6,[fail]) :-
phrase(so_list(_),['(',foo,'(',bar,')']).
:- end_tests(parse_list).
And so:
?- run_tests.
% PL-Unit: parse_list ...... done
% All 6 tests passed
true.
I'm making a parser for a DSL in Haskell using Alex + Happy.
My DSL uses dice rolls as part of the possible expressions.
Sometimes I have an expression that I want to parse that looks like:
[some code...] 3D6 [... rest of the code]
Which should translate roughly to:
TokenInt {... value = 3}, TokenD, TokenInt {... value = 6}
My DSL also uses variables (basically, strings), so I have a special token that handles variable names.
So, with these tokens:
"D" { \pos str -> TokenD pos }
$alpha [$alpha $digit \_ \']* { \pos str -> TokenName pos str}
$digit+ { \pos str -> TokenInt pos (read str) }
The result I'm getting from my parser now is:
TokenInt {... value = 3}, TokenName { ... , name = "D6"}
Which means that my lexer "reads" an Integer and a Variable named "D6".
I have tried many things; for example, I changed the token D to:
$digit "D" $digit { \pos str -> TokenD pos }
But that just consumes the digits :(
Can I parse the dice roll with the numbers?
Or at least parse TokenInt-TokenD-TokenInt?
PS: I'm using PosN as a wrapper, not sure if relevant.
The way I'd go about it would be to extend the TokenD type to TokenD Int Int, so, using the basic wrapper for convenience, I would write:
$digit+ D $digit+ { dice }
...
dice :: String -> Token
dice s = TokenD (read ds) (read (drop 1 rest))
  where (ds, rest) = break (== 'D') s
Here break from the Prelude splits "3D6" into "3" and "D6", so no external split helper is needed.
This is an extra step that would usually be done during syntactic analysis, but it doesn't hurt much here.
Also I can't make Alex parse $alpha for TokenD instead of TokenName. If we had Di instead of D that'd be no problem. From Alex's docs:
When the input stream matches more than one rule, the rule which matches the longest prefix of the input stream wins. If there are still several rules which match an equal number of characters, then the rule which appears earliest in the file wins.
But then your code should work. I don't know if this is an issue with Alex.
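The longest-match rule quoted from the Alex documentation can be imitated with a few regular expressions. The following Python sketch is not Alex; the rule names and regexes are hypothetical stand-ins for the spec in the question. With a dice rule present, "3D6" is matched as one 3-character token; without it, "D6" beats "D" as a variable name because it is longer, which reproduces the behaviour reported in the question.

```python
import re

# Hypothetical rule table; on equal-length matches the earlier rule wins.
RULES = [
    ("DICE", re.compile(r"[0-9]+D[0-9]+")),
    ("D", re.compile(r"D")),
    ("NAME", re.compile(r"[A-Za-z][A-Za-z0-9_']*")),
    ("INT", re.compile(r"[0-9]+")),
    ("SPACE", re.compile(r"\s+")),
]


def lex(src, rules=RULES):
    """Maximal-munch lexer: take the longest match at each position;
    ties go to the rule listed first."""
    pos, out = 0, []
    while pos < len(src):
        best = None
        for kind, rx in rules:
            m = rx.match(src, pos)
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (kind, m.group())
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best[0] != "SPACE":
            out.append(best)             # whitespace is skipped
        pos += len(best[1])
    return out
```

For example, lex("3D6", RULES[1:]) drops the dice rule and yields an INT token followed by a NAME token "D6".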
I decided that I could live with variables starting with lowercase letters (like Haskell variables), so I changed my lexer to recognize variables only if they start with a lowercase letter.
That also solved some possible problems with some other reserved words.
I'm still curious to know if there were other solutions, but the problem in itself was solved.
Thank you all!
I managed to build the parse tree for a given sentence. Here it is for the sentence "The man went home.":
T = s(np(det(the), n(man)), vp(v(went), np(n(home))))
1) How do I use phrase/2 on this?
The question "How to translate a sentence in a logical language using prolog?" is similar to what I need, but its solution doesn't work for me.
2) I want to map this onto a grammar pattern and get each word's tag:
Det=the, N(Subject)=man, V=went, N(Object)=home
Is there a way to match this tree against a given set of tree structures and identify the grammar?
How can I use the parse tree to identify the subject, verb, object and grammar pattern, and then generate the target-language sentence?
Edited later:
I tried this code and it gives a reasonable answer. Any suggestions on it?
:- set_prolog_flag(double_quotes, codes).  % "..." as code lists (SWI-Prolog 7+ defaults to strings)
sent("(s(np(n(man))) (vp(v(went)) (np(n(home)))))").
whitespace --> [X], { char_type(X, white) ; char_type(X, space) }, whitespace.
whitespace --> [].
char(C) --> [C], { char_type(C, graph), \+ memberchk(C, "()") }.
chars([C|Rest]) --> char(C), chars(Rest).
chars([C]) --> char(C).
term(T) --> chars(C), { atom_codes(T, C) }.
term(L) --> list(L).
list(T) --> "(", terms(T), ")".
terms([]) --> [].
terms([T|Terms]) --> term(T), whitespace, !, terms(Terms).
simplify([s,[np, [n,[Subject]]], [vp,[v,[Verb]],[np,[n,[Object]]]]],Result) :- Result = [Subject,Verb,Object].
Thanks Mathee
The simplest way is to visit the tree, hardcoding the symbols you are interested in.
Here is a more generic utility, that uses (=..)/2 to capture a named part of the tree:
part_of(T, S, R) :- T =.. [F|As],
( F = S,
R = T
; member(N, As),
part_of(N, S, R)
).
?- part_of(s(np(det(the), n(man)), vp(v(went), np(n(home)))),np,P).
P = np(det(the), n(man)) ;
P = np(n(home)) ;
false.
It's a kind of member/2, just for trees. By the way, I don't understand the first part of your question: why do you want to use phrase/2 on a syntax tree? Usually a grammar (the first argument to phrase/2) is meant to build a syntax tree from a 'raw' character stream...
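part_of/3 is a depth-first search that yields each matching subtree on backtracking. A Python generator gives the same traversal order; this sketch assumes trees are encoded as tuples (functor, arg1, ...) with plain strings as leaves, an encoding chosen only for illustration.

```python
def part_of(tree, symbol):
    """Yield every subtree whose functor is symbol, depth-first,
    left to right, in the same order as part_of/3."""
    functor, *args = tree if isinstance(tree, tuple) else (tree,)
    if functor == symbol:
        yield tree                       # the node itself first
    for arg in args:
        yield from part_of(arg, symbol)  # then its arguments
```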
I'm having trouble working out how to use any of the functions in the Text.Parsec.Indent module provided by the indents package for Haskell, which is a sort of add-on for Parsec.
What do all these functions do? How are they to be used?
I can understand the brief Haddock description of withBlock, and I've found examples of how to use withBlock, runIndent and the IndentParser type here, here and here. I can also understand the documentation for the four parsers indentBrackets and friends. But many things are still confusing me.
In particular:
What is the difference between withBlock f a p and
do aa <- a
   pp <- block p
   return (f aa pp)
Likewise, what's the difference between withBlock' a p and do {a; block p}
In the family of functions indented and friends, what is ‘the level of the reference’? That is, what is ‘the reference’?
Again, with the functions indented and friends, how are they to be used? With the exception of withPos, it looks like they take no arguments and are all of type IParser () (IParser defined like this or this) so I'm guessing that all they can do is to produce an error or not and that they should appear in a do block, but I can't figure out the details.
I did at least find some examples on the usage of withPos in the source code, so I can probably figure that out if I stare at it for long enough.
<+/> comes with the helpful description “<+/> is to indentation sensitive parsers what ap is to monads” which is great if you want to spend several sessions trying to wrap your head around ap and then work out how that's analogous to a parser. The other three combinators are then defined with reference to <+/>, making the whole group unapproachable to a newcomer.
Do I need to use these? Can I just ignore them and use do instead?
The ordinary lexeme combinator and whiteSpace parser from Parsec will happily consume newlines in the middle of a multi-token construct without complaining. But in an indentation-style language, sometimes you want to stop parsing a lexical construct or throw an error if a line is broken and the next line is indented less than it should be. How do I go about doing this in Parsec?
In the language I am trying to parse, ideally the rules for when a lexical structure is allowed to continue on to the next line should depend on what tokens appear at the end of the first line or the beginning of the subsequent line. Is there an easy way to achieve this in Parsec? (If it is difficult then it is not something which I need to concern myself with at this time.)
So, the first hint is to take a look at IndentParser:
type IndentParser s u a = ParsecT s u (State SourcePos) a
I.e. it's a ParsecT keeping an extra close watch on SourcePos, an abstract container which can be used to access, among other things, the current column number. So, it's probably storing the current "level of indentation" in SourcePos. That'd be my initial guess as to what "level of reference" means.
In short, indents gives you a new kind of Parsec which is context sensitive, in particular sensitive to the current indentation. I'll answer your questions out of order.
(2) The "level of reference" is the parser state's current "belief" about where this indentation level starts. To make this clearer, let me give some test cases in (3).
(3) In order to start experimenting with these functions, we'll build a little test runner. It'll run the parser on a string that we give it and then unwrap the inner State part using an initialPos which we get to modify. In code:
import Text.Parsec
import Text.Parsec.Pos
import Text.Parsec.Indent
import Control.Monad.State
testParse :: (SourcePos -> SourcePos)
-> IndentParser String () a
-> String -> Either ParseError a
testParse f p src = fst $ flip runState (f $ initialPos "") $ runParserT p () "" src
(Note that this is almost runIndent, except I gave a backdoor to modify the initialPos.)
Now we can take a look at indented. By examining the source, I can tell it does two things. First, it'll fail if the current SourcePos column number is less-than-or-equal-to the "level of reference" stored in the SourcePos stored in the State. Second, it somewhat mysteriously updates the State SourcePos's line counter (not column counter) to be current.
Only the first behavior is important, to my understanding. We can see the difference here.
>>> testParse id indented ""
Left (line 1, column 1): not indented
>>> testParse id (spaces >> indented) " "
Right ()
>>> testParse id (many (char 'x') >> indented) "xxxx"
Right ()
So, in order to have indented succeed, we need to have consumed enough whitespace (or anything else!) to push our column position past the "reference" column position. Otherwise, it'll fail saying "not indented". Similar behavior exists for the next three functions: same fails unless the current position and reference position are on the same line, sameOrIndented fails if the current column is less than or equal to the reference column unless they are on the same line, and checkIndent fails unless the current and reference columns match.
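The four checks can be modelled as plain comparisons between the current position and the reference position. The Python sketch below ignores the parser state and the line-counter update described above; Pos and the snake_case names are illustrative only, not part of the indents API.

```python
from typing import NamedTuple


class Pos(NamedTuple):
    line: int
    col: int  # 1-based, as in Parsec's SourcePos


def indented(cur: Pos, ref: Pos) -> bool:
    """Succeeds only if the current column is strictly past the reference."""
    return cur.col > ref.col


def same(cur: Pos, ref: Pos) -> bool:
    """Succeeds only if we are still on the reference line."""
    return cur.line == ref.line


def same_or_indented(cur: Pos, ref: Pos) -> bool:
    return same(cur, ref) or indented(cur, ref)


def check_indent(cur: Pos, ref: Pos) -> bool:
    """Succeeds only if the current column equals the reference column."""
    return cur.col == ref.col
```

The first two test cases above correspond to indented(Pos(1, 1), Pos(1, 1)) being false and indented(Pos(1, 2), Pos(1, 1)) being true.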
withPos is slightly different. It's not just an IndentParser, it's an IndentParser combinator: it transforms the input IndentParser into one that thinks the "reference column" (the SourcePos in the State) is exactly where it was when we called withPos.
This gives us another hint, btw. It lets us know we have the power to change the reference column.
(1) So now let's take a look at how block and withBlock work using our new, lower level reference column operators. withBlock is implemented in terms of block, so we'll start with block.
-- simplified from the actual source
block p = withPos $ many1 (checkIndent >> p)
So, block resets the "reference column" to be whatever the current column is and then runs p at least once, as long as each parse starts exactly at this newly set "reference column". Now we can take a look at withBlock:
withBlock f a p = withPos $ do
r1 <- a
r2 <- option [] (indented >> block p)
return (f r1 r2)
So, it resets the "reference column" to the current column, parses a single a, tries to parse an indented block of ps, then combines the results using f. Your implementation is almost correct, except that you need to use withPos to choose the correct "reference column".
Then, once you have withBlock, withBlock' = withBlock (\_ bs -> bs).
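To see how withPos, block and withBlock fit together, here is a line-based Python sketch. It is a simplification under stated assumptions: each item is a whole line, columns are 1-based as in Parsec, and the function names only echo the Haskell ones.

```python
def measure(line):
    """1-based column of the first non-space character, and the text."""
    text = line.lstrip(" ")
    return len(line) - len(text) + 1, text


def block(lines, i):
    """Like withPos $ many1 (checkIndent >> p): the reference column is
    the column of lines[i] (which must exist, as with many1); take lines
    while they start exactly at that column."""
    ref_col, _ = measure(lines[i])
    items = []
    while i < len(lines):
        col, text = measure(lines[i])
        if col != ref_col:
            break
        items.append(text)
        i += 1
    return items, i


def with_block(f, lines, i):
    """Like withBlock: a header line, then an optional block indented
    strictly past the header's column, combined with f."""
    ref_col, header = measure(lines[i])
    i += 1
    body = []
    if i < len(lines) and measure(lines[i])[0] > ref_col:
        body, i = block(lines, i)
    return f(header, body), i
```

Passing lambda _header, body: body as f gives the withBlock' behaviour of keeping only the block.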
(5) So, indented and friends are exactly the tools for doing this: they'll cause a parse to fail immediately if it's indented incorrectly with respect to the "reference position" chosen by withPos.
(4) Yes, don't worry about these until you've learned how to use Applicative-style parsing in base Parsec. It's often a much cleaner, faster, simpler way of specifying parsers. Monadic parsing is sometimes more powerful, but if you understand monads, the two styles are almost always interchangeable.
(6) And this is the crux. The tools mentioned so far can only enforce indentation failures if you can describe your intended indentation using withPos. Offhand, I don't think it's possible to set withPos based on the success or failure of other parses, so you'll have to go a level deeper. Fortunately, the mechanism that makes IndentParsers work is simple: it's just an inner State monad containing SourcePos. You can use lift :: MonadTrans t => m a -> t m a to manipulate this inner state and set the "reference column" however you like.
Cheers!
I've been playing around with splitting atoms and have a problem with strings. The input data will always be an atom that consists of some letters followed by some numbers, for instance ms444, r64 or min1. Since the function lists:splitwith/2 takes a list, the atom is first converted into a list:
24> lists:splitwith(fun (C) -> is_atom(C) end, [m,s,4,4,4]).
{[m,s],[4,4,4]}
25> lists:splitwith(fun (C) -> is_atom(C) end, atom_to_list(ms444)).
{[],"ms444"}
26> atom_to_list(ms444).
"ms444"
I want to separate the letters from the numbers, and I've succeeded in doing that when using a list, but since I start out with an atom, I get a "string" as the input to my splitwith function...
Is it interpreting each item in the list as a string or what is going on?
You might want to have a look at the string module documentation:
http://www.erlang.org/doc/man/string.html
The following function might interest you:
tokens(String, SeparatorList) -> Tokens
Since strings in Erlang are just a list() of integer(), the fun tests whether each item is an atom() when it is in fact an integer(). If the test is changed to look for lowercase letters, it works:
29> lists:splitwith(fun (C) -> (C >= $a) andalso (C =< $z) end, atom_to_list(ms444)).
{"ms","444"}
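The upper bound of the range matters: $a is 97 and $Z is 90, so a test of (C >= $a) and (C =< $Z) can never succeed, and the range has to end at $z (122). A Python sketch of the same prefix split, comparing character codes just as the Erlang fun does (illustration only):

```python
def split_letters_digits(s):
    """Split s into its longest prefix of a..z characters and the rest,
    comparing character codes like the Erlang fun above."""
    i = 0
    while i < len(s) and ord("a") <= ord(s[i]) <= ord("z"):
        i += 1
    return s[:i], s[i:]
```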
An atom in Erlang is a named constant, not a variable (or at least not like a variable in an imperative language).
You should really not create atoms dynamically (that is, don't convert things to atoms at runtime), since atoms are not garbage-collected.
They are used more in pattern matching and send/receive code.
Pid ! {matchthis, X},
receive
{foobar,Y} -> doY(Y);
{matchthis,X} -> doX(X);
Other -> doother(Other)
end
A variable such as X can be bound to an atom, for example X = if 1 == 1 -> ok; true -> fail end. Perhaps I lack imagination, but I can't think of a reason why you would want to parse an atom. You should be in charge of which atoms you create and not use list_to_atom(CharIntegerList).
Can you perhaps give a broader overview of what you'd like to accomplish?
A "string" in Erlang is not a primitive type: it is just a list() of integer(). So if you want to separate the letters from the digits, you'll have to compare against the integer representation of the characters.