Prolog the cheapest path - path

I have to do a Prolog app which have to find the cheapest path, the fastest path, the cheapest with max time arrival, the fastest with max price. I write a little but I do not know how to find the cheapest ones, it won't work.
%train/plane(name,from,to,departure,arrival,price).
train(p1,gdansk, sopot, odjazd(10:15), przyjazd(10:30), cena(5)).
train(p2,sopot, gdynia, odjazd(11:00), przyjazd(11:30), cena(5)).
train(p3,sopot, gdynia, odjazd(11:15), przyjazd(11:45), cena(5)).
plane(s1,gdansk, warszawa, odlot(16:00), przylot(17:15), cena(300)).
plane(s2,gdansk, wroclaw, odlot(14:00), przylot(15:30), cena(300)).
plane(s3,gdansk, poznan, odlot(18:00), przylot(19:30), cena(200)).
path(From, To, Path) :-
path(From, To, [], Path).
path(From, From, _, [From]).
path(From, To, Visited, [From|Nodes]) :-
\+ member(From, Visited),
dif(From, To),
(train(_, From,Node,_,_,_);
plane(_, From,Node,_,_,_)
),
path(Node, To, [From|Visited], Nodes).
path_cost(P, C) :- aggregate_all(sum(Cost), member(train(_,_,_,_,_,Cost), P), C).
path_min_cost(From, To, Min, Path) :- aggregate(min(C,P),
(path(From,To,P), path_cost(P,C)),
min(Min,Path)).

Related

Erlang: variable is unbound

Why is the following saying variable unbound?
9> {<<A:Length/binary, Rest/binary>>, Length} = {<<1,2,3,4,5>>, 3}.
* 1: variable 'Length' is unbound
It's pretty clear that Length should be 3.
I am trying to have a function with similar pattern matching, ie.:
parse(<<Body:Length/binary, Rest/binary>>, Length) ->
But if fails with the same reason. How can I achieve the pattern matching I want?
What I am really trying to achieve is parse in incoming tcp stream packets as LTV(Length, Type, Value).
At some point after I parse the the Length and the Type, I want to ready only up to Length number of bytes as the value, as the rest will probably be for the next LTV.
So my parse_value function is like this:
parse_value(Value0, Left, Callback = {Module, Function},
{length, Length, type, Type, value, Value1}) when byte_size(Value0) >= Left ->
<<Value2:Left/binary, Rest/binary>> = Value0,
Module:Function({length, Length, type, Type, value, lists:reverse([Value2 | Value1])}),
if
Rest =:= <<>> ->
{?MODULE, parse, {}};
true ->
parse(Rest, Callback, {})
end;
parse_value(Value0, Left, _, {length, Length, type, Type, value, Value1}) ->
{?MODULE, parse_value, Left - byte_size(Value0), {length, Length, type, Type, value, [Value0 | Value1]}}.
If I could do the pattern matching, I could break it up to something more pleasant to the eye.
The rules for pattern matching are that if a variable X occurs in two subpatterns, as in {X, X}, or {X, [X]}, or similar, then they have to have the same value in both positions, but the matching of each subpattern is still done in the same input environment - bindings from one side do not carry over to the other. The equality check is conceptually done afterwards, as if you had matched on {X, X2} and added a guard X =:= X2. This means that your Length field in the tuple cannot be used as input to the binary pattern, not even if you make it the leftmost element.
However, within a binary pattern, variables bound in a field can be used in other fields following it, left-to-right. Therefore, the following works (using a leading 32-bit size field in the binary):
1> <<Length:32, A:Length/binary, Rest/binary>> = <<0,0,0,3,1,2,3,4,5>>.
<<0,0,0,3,1,2,3,4,5>>
2> A.
<<1,2,3>>
3> Rest.
<<4,5>>
I've run into this before. There is some weirdness between what is happening inside binary syntax and what happens during unification (matching). I suspect that it is just that binary syntax and matching occur at different times in the VM somewhere (we don't know which Length is failing to get assigned -- maybe binary matching is always first in evaluation, so Length is still meaningless). I was once going to dig in and find out, but then I realized that I never really needed to solve this problem -- which might be why it was never "solved".
Fortunately, this won't stop you with whatever you are doing.
Unfortunately, we can't really help further unless you explain the context in which you think this kind of a match is a good idea (you are having an X-Y problem).
In binary parsing you can always force the situation to be one of the following:
Have a fixed-sized header at the beginning of the binary message that tells you the next size element you need (and from there that can continue as a chain of associations endlessly)
Inspect the binary once on entry to determine the size you are looking for, pull that one value, and then begin the real parsing task
Have a set of fields, all of predetermined sizes that conform to some a binary schema standard
Convert the binary to a list and iterate through it with any arbitrary amount of look-ahead and backtracking you might need
Quick Solution
Without knowing anything else about your general problem, a typical solution would look like:
parse(Length, Bin) ->
<<Body:Length/binary, Rest/binary>> = Bin,
ok = do_something(Body),
do_other_stuff(Rest).
But I smell something funky here.
Having things like this in your code is almost always a sign that a more fundamental aspect of the code structure is not in agreement with the data that you are handling.
But deadlines.
Erlang is all about practical code that satisfies your goals in the real world. With that in mind, I suggest that you do something like the above for now, and then return to this problem domain and rethink it. Then refactor it. This will gain you three benefits:
Something will work right away.
You will later learn something fundamental about parsing in general.
Your code will almost certainly run faster if it fits your data better.
Example
Here is an example in the shell:
1> Parse =
1> fun
1> (Length, Bin) when Length =< byte_size(Bin) ->
1> <<Body:Length/binary, Rest/binary>> = Bin,
1> ok = io:format("Chopped off ~p bytes: ~p~n", [Length, Body]),
1> Rest;
1> (Length, Bin) ->
1> ok = io:format("Binary shorter than ~p~n", [Length]),
1> Bin
1> end.
#Fun<erl_eval.12.87737649>
2> Parse(3, <<1,2,3,4,5>>).
Chopped off 3 bytes: <<1,2,3>>
<<4,5>>
3> Parse(8, <<1,2,3,4,5>>).
Binary shorter than 8
<<1,2,3,4,5>>
Note that this version is a little safer, in that we avoid a crash in the case that Length is longer than the binary. This is yet another good reason why maybe we can't do that match in the function head.
Try with below code:
{<<A:Length/binary, Rest/binary>>, _} = {_, Length} = {<<1,2,3,4,5>>, 3}.
This question is mentioned a bit in EEP-52:
Any variables used in the expression must have been previously bound, or become bound in the same binary pattern as the expression. That is, the following example is illegal:
illegal_example2(N, <<X:N,T/binary>>) ->
{X,T}.
And explained a bit more in the following e-mail: http://erlang.org/pipermail/eeps/2020-January/000636.html
Illegal. With one exception, matching is not done in a left-to-right
order, but all variables in the pattern will be bound at the same
time. That means that the variables must be bound before the match
starts. For maps, that means that the variables referenced in key
expressions must be bound before the case (or receive) that matches
the map. In a function head, all map keys must be literals.
The exception to this general rule is that within a binary pattern,
the segments are matched from left to right, and a variable bound in a
previous segment can be used in the size expression for a segment
later in the binary pattern.
Also one of the members of OTP team mentioned that they made a prototype that can do that, but it was never finished http://erlang.org/pipermail/erlang-questions/2020-May/099538.html
We actually tried to make your example legal. The transformation of
the code that we did was not to rewrite to guards, but to match
arguments or parts of argument in the right order so that variables
that input variables would be bound before being used. (We would do a
topological sort to find the correct order.) For your example, the
transformation would look similar to this:
legal_example(Key, Map) ->
case Map of
#{Key := Value} -> Value;
_ -> error(function_clause, [Key, Map])
end.
In the prototype implementation, the compiler could compile the
following example:
convoluted(Ref,
#{ node(Ref) := NodeId, Loop := universal_answer},
[{NodeId, Size} | T],
<<Int:(Size*8+length(T)),Loop>>) when is_reference(Ref) ->
Int.
Things started to fall apart when variables are repeated. Repeated
variables in patterns already have a meaning in Erlang (they should be
the same), so it become tricky to understand to distinguish between
variables being bound or variables being used a binary size or map
key. Here is an example that the prototype couldn't handle:
foo(#{K := K}, K) -> ok.
A human can see that it should be transformed similar to this:
foo(Map, K) -> case Map of
{K := V} when K =:= V -> ok end.
Here are few other examples that should work but the prototype would
refuse to compile (often emitting an incomprehensible error message):
bin2(<<Sz:8,X:Sz>>, <<Y:Sz>>) -> {X,Y}.
repeated_vars(#{K := #{K := K}}, K) -> K.
match_map_bs(#{K1 := {bin,<<Int:Sz>>}, K2 := <<Sz:8>>}, {K1,K2}) ->
Int.
Another problem was when example was correctly rejected, the error
message would be confusing.
Because much more work would clearly be needed, we have shelved the
idea for now. Personally, I am not sure that the idea is sound in the
first place. But I am sure of one thing: the implementation would be
very complicated.
UPD: latest news from 2020-05-14

Pathfinding in Prolog

I'm trying to teach myself Prolog. Below, I've written some code that I think should return all paths between nodes in an undirected graph... but it doesn't. I'm trying to understand why this particular code doesn't work (which I think differentiates this question from similar Prolog pathfinding posts). I'm running this in SWI-Prolog. Any clues?
% Define a directed graph (nodes may or may not be "room"s; edges are encoded by "leads_to" predicates).
room(kitchen).
room(living_room).
room(den).
room(stairs).
room(hall).
room(bathroom).
room(bedroom1).
room(bedroom2).
room(bedroom3).
room(studio).
leads_to(kitchen, living_room).
leads_to(living_room, stairs).
leads_to(living_room, den).
leads_to(stairs, hall).
leads_to(hall, bedroom1).
leads_to(hall, bedroom2).
leads_to(hall, bedroom3).
leads_to(hall, studio).
leads_to(living_room, outside). % Note "outside" is the only node that is not a "room"
leads_to(kitchen, outside).
% Define the indirection of the graph. This is what we'll work with.
neighbor(A,B) :- leads_to(A, B).
neighbor(A,B) :- leads_to(B, A).
Iff A --> B --> C --> D is a loop-free path, then
path(A, D, [B, C])
should be true. I.e., the third argument contains the intermediate nodes.
% Base Rule (R0)
path(X,Y,[]) :- neighbor(X,Y).
% Inductive Rule (R1)
path(X,Y,[Z|P]) :- not(X == Y), neighbor(X,Z), not(member(Z, P)), path(Z,Y,P).
Yet,
?- path(bedroom1, stairs, P).
is false. Why? Shouldn't we get a match to R1 with
X = bedroom1
Y = stairs
Z = hall
P = []
since,
?- neighbor(bedroom1, hall).
true.
?- not(member(hall, [])).
true.
?- path(hall, stairs, []).
true .
?
In fact, if I evaluate
?- path(A, B, P).
I get only the length-1 solutions.
Welcome to Prolog! The problem, essentially, is that when you get to not(member(Z, P)) in R1, P is still a pure variable, because the evaluation hasn't gotten to path(Z, Y, P) to define it yet. One of the surprising yet inspiring things about Prolog is that member(Ground, Var) will generate lists that contain Ground and unify them with Var:
?- member(a, X).
X = [a|_G890] ;
X = [_G889, a|_G893] ;
X = [_G889, _G892, a|_G896] .
This has the confusing side-effect that checking for a value in an uninstantiated list will always succeed, which is why not(member(Z, P)) will always fail, causing R1 to always fail. The fact that you get all the R0 solutions and none of the R1 solutions is a clue that something in R1 is causing it to always fail. After all, we know R0 works.
If you swap these two goals, you'll get the first result you want:
path(X,Y,[Z|P]) :- not(X == Y), neighbor(X,Z), path(Z,Y,P), not(member(Z, P)).
?- path(bedroom1, stairs, P).
P = [hall]
If you ask for another solution, you'll get a stack overflow. This is because after the change we're happily generating solutions with cycles as quickly as possible with path(Z,Y,P), only to discard them post-facto with not(member(Z, P)). (Incidentally, for a slight efficiency gain we can switch to memberchk/2 instead of member/2. Of course doing the wrong thing faster isn't much help. :)
I'd be inclined to convert this to a breadth-first search, which in Prolog would imply adding an "open set" argument to contain solutions you haven't tried yet, and at each node first trying something in the open set and then adding that node's possibilities to the end of the open set. When the open set is extinguished, you've tried every node you could get to. For some path finding problems it's a better solution than depth first search anyway. Another thing you could try is separating the path into a visited and future component, and only checking the visited component. As long as you aren't generating a cycle in the current step, you can be assured you aren't generating one at all, there's no need to worry about future steps.
The way you worded the question leads me to believe you don't want a complete solution, just a hint, so I think this is all you need. Let me know if that's not right.

Pattern matching on abstract forms

Disclaimer: I kept this because some things may be useful to others, however, it does not solve what I had initially tried to do.
Right now, I'm trying to solve the following:
Given something like {a, B, {c, D}} I want to scan through Erlang forms given to parse_transform/2 and find each use of the send operator (!). Then I want to check the message being sent and determine whether it would fit the pattern {a, B, {c, D}}.
Therefore, consider finding the following form:
{op,17,'!',
{var,17,'Pid'},
{tuple,17,[{atom,17,a},{integer,17,5},{var,17,'SomeVar'}]}}]}]}
Since the message being sent is:
{tuple,17,[{atom,17,a},{integer,17,5},{var,17,'SomeVar'}]}
which is an encoding of {a, 5, SomeVar}, this would match the original pattern of {a, B, {c, D}}.
I'm not exactly sure how I'm going to go about this but do you know of any API functions which could help?
Turning the given {a, B, {c, D}} into a form is possible by first substituting the variables with something, e.g. strings (and taking a note of this), else they'll be unbound, and then using:
> erl_syntax:revert(erl_syntax:abstract({a, "B", {c, "D"}})).
{tuple,0,
[{atom,0,a},
{string,0,"B"},
{tuple,0,[{atom,0,c},{string,0,"D"}]}]}
I was thinking that after getting them in the same format like this, I could analyze them together:
> erl_syntax:type({tuple,0,[{atom,0,a},{string,0,"B"},{tuple,0,[{atom,0,c},string,0,"D"}]}]}).
tuple
%% check whether send argument is also a tuple.
%% then, since it's a tuple, use erl_syntax:tuple_elements/1 and keep comparing in this way, matching anything when you come across a string which was a variable...
I think I'll end up missing something out (and for example recognizing some things but not others ... even though they should have matched).
Are there any API functions which I could use to ease this task? And as for a pattern match test operator or something along those lines, that does not exist right? (i.e. only suggested here: http://erlang.org/pipermail/erlang-questions/2007-December/031449.html).
Edit: (Explaining things from the beginning this time)
Using erl_types as Daniel suggests below is probably doable if you play around with the erl_type() returned by t_from_term/1 i.e. t_from_term/1 takes a term with no free variables so you'd have to stay changing something like {a, B, {c, D}} into {a, '_', {c, '_'}} (i.e. fill the variables), use t_from_term/1 and then go through the returned data structure and change the '_' atoms to variables using the module's t_var/1 or something.
Before explaining how I ended up going about it, let me state the problem a bit better.
Problem
I'm working on a pet project (ErlAOP extension) which I'll be hosting on SourceForge when ready. Basically, another project already exists (ErlAOP) through which one can inject code before/after/around/etc... function calls (see doc if interested).
I wanted to extend this to support injection of code at the send/receive level (because of another project). I've already done this but before hosting the project, I'd like to make some improvements.
Currently, my implementation simply finds each use of the send operator or receive expression and injects a function before/after/around (receive expressions have a little gotcha because of tail recursion). Let's call this function dmfun (dynamic match function).
The user will be specifying that when a message of the form e.g. {a, B, {c, D}} is being sent, then the function do_something/1 should be evaluated before the sending takes place. Therefore, the current implementation injects dmfun before each use of the send op in the source code. Dmfun would then have something like:
case Arg of
{a, B, {c, D}} -> do_something(Arg);
_ -> continue
end
where Arg can simply be passed to dmfun/1 because you have access to the forms generated from the source code.
So the problem is that any send operator will have dmfun/1 injected before it (and the send op's message passed as a parameter). But when sending messages like 50, {a, b}, [6, 4, 3] etc... these messages will certainly not match {a, B, {c, D}}, so injecting dmfun/1 at sends with these messages is a waste.
I want to be able to pick out plausible send operations like e.g. Pid ! {a, 5, SomeVar}, or Pid ! {a, X, SomeVar}. In both of these cases, it makes sense to inject dmfun/1 because if at runtime, SomeVar = {c, 50}, then the user supplied do_something/1 should be evaluated (but if SomeVar = 50, then it should not, because we're interested in {a, B, {c, D}} and 50 does not match {c, D}).
I wrote the following prematurely. It doesn't solve the problem I had. I ended up not including this feature. I left the explanation anyway, but if it were up to me, I'd delete this post entirely... I was still experimenting and I don't think what there is here will be of any use to anyone.
Before the explanation, let:
msg_format = the user supplied message format which will determine which messages being sent/received are interesting (e.g. {a, B, {c, D}}).
msg = the actual message being sent in the source code (e.g. Pid ! {a, X, Y}).
I gave the explanation below in a previous edit, but later found out that it wouldn't match some things it should. E.g. when msg_format = {a, B, {c, D}}, msg = {a, 5, SomeVar} wouldn't match when it should (by "match" I mean that dmfun/1 should be injected.
Let's call the "algorithm" outlined below Alg. The approach I took was to execute Alg(msg_format, msg) and Alg(msg, msg_format). The explanation below only goes through one of these. By repeating the same thing only getting a different matching function (matching_fun(msg_format) instead of matching_fun(msg)), and injecting dmfun/1 only if at least one of Alg(msg_format, msg) or Alg(msg, msg_format) returns true, then the result should be the injection of dmfun/1 where the desired message can actually be generated at runtime.
Take the message form you find in the [Forms] given to parse_transform/2 e.g. lets say you find: {op,24,'!',{var,24,'Pid'},{tuple,24,[{atom,24,a},{var,24,'B'},{var,24,'C'}]}}
So you would take {tuple,24,[{atom,24,a},{var,24,'B'},{var,24,'C'}]} which is the message being sent. (bind to Msg).
Do fill_vars(Msg) where:
-define(VARIABLE_FILLER, "_").
-spec fill_vars(erl_parse:abstract_form()) -> erl_parse:abstract_form().
%% #doc This function takes an abstract_form() and replaces all {var, LineNum, Variable} forms with
%% {string, LineNum, ?VARIABLE_FILLER}.
fill_vars(Form) ->
erl_syntax:revert(
erl_syntax_lib:map(
fun(DeltaTree) ->
case erl_syntax:type(DeltaTree) of
variable ->
erl_syntax:string(?VARIABLE_FILLER);
_ ->
DeltaTree
end
end,
Form)).
Do form_to_term/1 on 2's output, where:
form_to_term(Form) -> element(2, erl_eval:exprs([Form], [])).
Do term_to_str/1 on 3's output, where:
-define(inject_str(FormatStr, TermList), lists:flatten(io_lib:format(FormatStr, TermList))).
term_to_str(Term) -> ?inject_str("~p", [Term]).
Do gsub(v(4), "\"_\"", "_"), where v(4) is 4's output and gsub is: (taken from here)
gsub(Str,Old,New) -> RegExp = "\\Q"++Old++"\\E", re:replace(Str,RegExp,New,[global, multiline, {return, list}]).
Bind a variable (e.g. M) to matching_fun(v(5)), where:
matching_fun(StrPattern) ->
form_to_term(
str_to_form(
?inject_str(
"fun(MsgFormat) ->
case MsgFormat of
~s ->
true;
_ ->
false
end
end.", [StrPattern])
)
).
str_to_form(MsgFStr) ->
{_, Tokens, _} = erl_scan:string(end_with_period(MsgFStr)),
{_, Exprs} = erl_parse:parse_exprs(Tokens),
hd(Exprs).
end_with_period(String) ->
case lists:last(String) of
$. -> String;
_ -> String ++ "."
end.
Finally, take the user supplied message format (which is given as a string), e.g. MsgFormat = "{a, B, {c, D}}", and do: MsgFormatTerm = form_to_term(fill_vars(str_to_form(MsgFormat))). Then you can M(MsgFormatTerm).
e.g. with user supplied message format = {a, B, {c, D}}, and Pid ! {a, B, C} found in code:
2> weaver_ext:fill_vars({tuple,24,[{atom,24,a},{var,24,'B'},{var,24,'C'}]}).
{tuple,24,[{atom,24,a},{string,0,"_"},{string,0,"_"}]}
3> weaver_ext:form_to_term(v(2)).
{a,"_","_"}
4> weaver_ext:term_to_str(v(3)).
"{a,\"_\",\"_\"}"
5> weaver_ext:gsub(v(4), "\"_\"", "_").
"{a,_,_}"
6> M = weaver_ext:matching_fun(v(5)).
#Fun<erl_eval.6.13229925>
7> MsgFormatTerm = weaver_ext:form_to_term(weaver_ext:fill_vars(weaver_ext:str_to_form("{a, B, {c, D}}"))).
{a,"_",{c,"_"}}
8> M(MsgFormatTerm).
true
9> M({a, 10, 20}).
true
10> M({b, "_", 20}).
false
There is functionality for this in erl_types (HiPE).
I'm not sure you have the data in the right form for using this module though. I seem to remember that it takes Erlang terms as input. If you figure out the form issue you should be able to do most what you need with erl_types:t_from_term/1 and erl_types:t_is_subtype/2.
It was a long time ago that I last used these and I only ever did my testing runtime, as opposed to compile time. If you want to take a peek at usage pattern from my old code (not working any more) you can find it available at github.
I don't think this is possible at compile time in the general case. Consider:
send_msg(Pid, Msg) ->
Pid ! Msg.
Msg will look like a a var, which is a completely opaque type. You can't tell if it is a tuple or a list or an atom, since anyone could call this function with anything supplied for Msg.
This would be much easier to do at run time instead. Every time you use the ! operator, you'll need to call a wrapper function instead, which tries to match the message you are trying to send, and executes additional processing if the pattern is matched.

prolog list path and avoid certain route

I have a list of data
city(portsmouth,london).
city(london,bristol).
city(portsmouth,plymouth).
city(plymouth,london).
city(london,plymouth).
city(london,birmingham).
city(birmingham,bristol).
I'm using a method which is
?-op(150,xfy,pathto).
?-op(150,xfy,avoid).
X pathto Y:- city(X,Y).
and not so sure can be use like
X pathto Y avoid Z:-
findall(Path,avoid_country(X,Y,Z,Path),Paths),write(Paths),nl.
avoid_path(Start, End, Avoid,[]) :-
country(Start, End).
avoid_path(Start,End,Avoid,[Path|Result]):-
city(Start,Path),
Path\== Avoid,
avoid_path(Path, End,Avoid, Result).
it actually works perfectly without the avoids thing as well as the Path\== Avoid,
the error result is
| ?- portsmouth to bristol avoid birmingham.
Error 1, Backtrack Stack Full, Trying city/2
it should be [[london],[plymouth,london]].
Ok, so : first you got a loop in your facts : (london, plymouth) and (plymouth, london). That means that any attempt of backtracking will never end.
Then I'm not sure that you can use 2 operators this way, but since I'm not sure, other people will be more insightful on the matter :)
I took this convention : portsmouth to bristol-[london, birmingham] means from portsmouth to bristol avoiding london and birmingham (I took this convention not to manage the operators question), here is a working code that keeps track of visited cities to avoid infinite possibilities :
city(portsmouth,london).
city(london,bristol).
city(portsmouth,plymouth).
city(plymouth,london).
city(london,plymouth).
city(london,birmingham).
city(birmingham,bristol).
:- op(150, xfy, to).
Start to End-Avoid :-
findall(Waypoint, get_waypoints(Start, End, [Start], Avoid, Waypoint), Waypoints),
!,
write(Waypoints).
Start to End :-
findall(Waypoint, get_waypoints(Start, End, [Start], [], Waypoint), Waypoints),
write(Waypoints).
get_waypoints(Start, End, _Visited, _Avoid, []) :-
city(Start, End).
get_waypoints(Start, End, Visited, Avoid, [Waypoint|Result]) :-
city(Start, Waypoint),
don't go through cities to avoid...
\+ member(Waypoint, Avoid),
this check allows us not to fall into loops. This way, backtracking ends.
\+ member(Waypoint, Visited),
get_waypoints(Waypoint, End, [Waypoint|Visited], Avoid, Result).
You facts contain those two:
city(plymouth,london).
city(london,plymouth).
and your code (which contains some inconsistencies) does not check whether a city has already been visited. This will make findall/3 collect all possible routes [london],[plymouth,london],[plymouth,london,plymouth,london],... , which will sooner or later lead to memory exhaustion.

Use LaTeX Listings to correctly detect and syntax highlight embedded code of a different language in a script

I have scripts that have one-liners or sort scripts from other languages within them. How can I have LaTeX listings detect this and change the syntax formating language within the script? This would be especially useful for awk within bash I believe.
Bash
#!/bin/bash
echo "hello world"
R --vanilla << EOF
# Data on motor octane ratings for various gasoline blends
x <- c(88.5,87.7,83.4,86.7,87.5,91.5,88.6,100.3,
95.6,93.3,94.7,91.1,91.0,94.2,87.5,89.9,
88.3,87.6,84.3,86.7,88.2,90.8,88.3,98.8,
94.2,92.7,93.2,91.0,90.3,93.4,88.5,90.1,
89.2,88.3,85.3,87.9,88.6,90.9,89.0,96.1,
93.3,91.8,92.3,90.4,90.1,93.0,88.7,89.9,
89.8,89.6,87.4,88.9,91.2,89.3,94.4,92.7,
91.8,91.6,90.4,91.1,92.6,89.8,90.6,91.1,
90.4,89.3,89.7,90.3,91.6,90.5,93.7,92.7,
92.2,92.2,91.2,91.0,92.2,90.0,90.7)
x
length(x)
mean(x);var(x)
stem(x)
EOF
perl -n -e '
#t = split(/\t/);
%t2 = map { $_ => 1 } split(/,/,$t[1]);
$t[1] = join(",",keys %t2);
print join("\t",#t); ' knownGeneFromUCSC.txt
awk -F'\t' '{
n = split($2, t, ","); _2 = x
split(x, _) # use delete _ if supported
for (i = 0; ++i <= n;)
_[t[i]]++ || _2 = _2 ? _2 "," t[i] : t[i]
$2 = _2
}-3' OFS='\t' infile
Python
#!/usr/local/bin/python
print "Hello World"
os.system("""
VAR=even;
sed -i "s/$VAR/odd/" testfile;
for i in `cat testfile` ;
do echo $i; done;
echo "now the tr command is removing the vowels";
cat testfile |tr 'aeiou' ' '
""")
UPDATE:
These are my current Listings settings in the preamble:
% This gives syntax highlighting in the python environment
\renewcommand{\lstlistlistingname}{Code Listings}
\renewcommand{\lstlistingname}{Code Listing}
\definecolor{gray}{gray}{0.5}
\definecolor{key}{rgb}{0,0.5,0}
\lstloadlanguages{Fortran,C++,C,[LaTeX]TeX,Python,bash,R, Perl}
\lstnewenvironment{python}[1][]{
\lstset{
language=python,
basicstyle=\ttfamily\small,
otherkeywords={1, 2, 3, 4, 5, 6, 7, 8 ,9 , 0, -, =, +, [, ], (, ), \{, \}, :, *, !},
keywordstyle=\color{blue},
stringstyle=\color{red},
showstringspaces=false,
emph={class, pass, in, for, while, if, is, elif, else, not, and, or,
def, print, exec, break, continue, return},
emphstyle=\color{black}\bfseries,
emph={[2]True, False, None, self},
emphstyle=[2]\color{key},
emph={[3]from, import, as},
emphstyle=[3]\color{blue},
upquote=true,
morecomment=[s]{"""}{"""},
commentstyle=\color{gray}\slshape,
rulesepcolor=\color{blue},#1
}
}{}
\lstnewenvironment{bash}{%
\lstset{%
language=bash,
otherkeywords={=, +, [, ], (, ), \{, \}, *},
% bash commands from:
%http://www.math.montana.edu/Rweb/Rhelp/00Index.html
emph={addgroup,adduser,alias,
ant,
apropos,apt-get,aptitude,aspell,awk,
basename,bash,bc,bg,break,builtin,bzip2,cal,case,cat,cd,cfdisk,chgrp,
chkconfig,chmod,chown,chroot,cksum,clear,cmp,comm,command,continue,
cp,cron,crontab,csplit,cut,date,dc,dd,ddrescue,declare,df,diff,diff3,
dig,dir,dircolors,dirname,dirs,dmesg,du,echo,egrep,eject,enable,env,
ethtool,eval,exec,exit,expand,expect,export,expr,false,fdformat,
fdisk,fg,fgrep,file,find,fmt,fold,for,format,free,fsck,ftp,function,
fuser,gawk,getopts,
git,
grep,groups,gzip,
gunzip,
,hash,head,help,history,hostname,
id,if,ifconfig,ifdown,ifup,import,install,
java, java6, java_cur
join,kill,killall,less,
let,ln,local,locate,logname,logout,look,lpc,lpr,lprint,lprintd,
lprintq,lprm,ls,lsof,make,man,mkdir,mkfifo,mkisofs,mknod,mmv,more,
mount,mtools,mtr,mv,
mysql,
netstat,nice,nl,nohup,notify-send,
noweb,noweave,
nslookup,op,
open,passwd,paste,pathchk,ping,pkill,popd,pr,printcap,printenv,
printf,ps,pushd,pwd,quota,quotacheck,quotactl,ram,rcp,read,
readarray,readonly,reboot,remsync,rename,renice,return,rev,rm,rmdir,
rsync,scp,screen,sdiff,sed,select,seq,set,sftp,shift,shopt,shutdown,
sleep,slocate,sort,source,split,ssh,strace,su,sudo,sum,
svn, svn2git,
symlink,sync,
tail,tar,tee,test,time,times,top,touch,tr,traceroute,trap,true,
tsort,tty,type,ulimit,umask,umount,unalias,uname,unexpand,uniq,
units,
unrar,
unset,unshar,until,useradd,usermod,users,uudecode,uuencode,
vdir,vi,vmstat,watch,wc,Wget,whereis,which,while,who,whoami,write,
zcat},
breaklines=true,
keywordstyle=\color{blue},
stringstyle=\color{red},
emphstyle=\color{black}\bfseries,
commentstyle=\color{gray}\slshape,
}
}{}
\lstnewenvironment{latexCode}[1]{\lstset{language=[latex]tex} \lstset{#1}}{}
\lstnewenvironment{Rcode}{
\lstset{%
language={R},
basicstyle=\small, % print whole listing small
keywordstyle=\color{black}, % style for keyword
% Function list from:
% http://www.math.montana.edu/Rweb/Rhelp/00Index.html
emph={abbreviate, abline,
abs, acos, acosh, all, all.names,
all.vars, anova, anova.glm, anova.lm, any,
aperm, append, apply, approx, approxfun,
apropos, Arg, args, Arithmetic, array,
arrows, as.array, as.call, as.character, as.complex,
as.data.frame, as.double, as.expression, as.factor, asin,
asinh, as.integer, as.list, as.logical, as.matrix,
as.na, as.name, as.null, as.numeric, as.ordered,
as.qr, as.real, assign, as.ts, as.vector,
atan, atan2, atanh, attach, attr,
attributes, autoload, .AutoloadEnv, axis, backsolve,
barplot, beta, binomial, box, boxplot,
boxplot.stats, break, browser, bw.bcv, bw.sj,
bw.ucv, bxp, c, .C, call,
cat, cbind, ceiling, character, charmatch,
chisq.test, chol, chol2inv, choose, class,
class<-, codes, coef, coefficients, coefficients.glm,
coefficients.lm, co.intervals, col, colnames, colors,
colours, Comparison, complete.cases, complex, Conj,
contour, contrasts, contr.helmert, contr.poly, contr.sum,
contr.treatment, convolve, cooks.distance, coplot, cor,
cos, cosh, count.fields, cov, covratio,
crossprod, cummax, cummin, cumprod, cumsum,
curve, cut, D, data, data.class,
data.entry, dataentry, data.frame, data.matrix, dbeta,
dbinom, dcauchy, dchisq, de, debug,
delay, demo, de.ncols, density, deparse,
de.restore, deriv, deriv.default, deriv.formula, de.setup,
detach, deviance, deviance.glm, deviance.lm, device,
Devices, dev.off, dexp, df, dfbetas,
dffits, df.residual, df.residual.glm, df.residual.lm, dgamma,
dgeom, dget, dhyper, diag, diff,
digamma, dim, dim<-, dimnames, dimnames<-,
dlnorm, dlogis, dnbinom, dnchisq, dnorm,
do.call, dotplot, double, dpois, dput,
drop, dt, dump, dunif, duplicated,
dweibull, dyn.load, edit, effects.glm, effects.lm,
eigen, else, emacs, end, environment,
environment<-, eval, exists, exp, expression,
Extract, factor, family, family.glm, fft,
finite, fitted, fitted.values, fitted.values.glm, fitted.values.lm,
fivenum, fix, floor, for, formals,
format, formatC, format.default, formula.default, formula.formula,
formula.terms, .Fortran, frame, frequency, function,
Gamma, gamma, gaussian, gc, gcinfo,
get, getenv, gl, glm, glm.control,
glm.fit, .GlobalEnv, graphics.off, gray, grep,
grid, gsub, hat, heat.colors, help,
hist, hsv, identify, if, ifelse,
Im, image, \%in\%, influence.measures, inherits,
integer, interactive, .Internal, inverse.gaussian, invisible,
invisible, IQR, is.array, is.atomic, is.call,
is.character, is.complex, is.data.frame, is.double, is.environment,
is.expression, is.factor, is.function, is.integer, is.language,
is.list, is.loaded, is.logical, is.matrix, is.na,
is.name, is.null, is.numeric, is.ordered, is.qr,
is.real, is.recursive, is.single, is.ts, is.unordered,
is.vector, lapply, lbeta, lchoose, legend,
length, LETTERS, letters, levels, levels<-,
lgamma, .lib.loc, .Library, library, library.dynam,
license, lines, lines.default, list, lm,
lm.fit, lm.influence, lm.wfit, load, locator,
log, log10, log2, Logic, logical,
lower.tri, lowess, ls, ls.diag, lsfit,
lsf.str, ls.print, ls.str, .Machine, Machine,
machine, macintosh, mad, match, match.arg,
match.call, matlines, mat.or.vec, matplot, matpoints,
matrix, max, mean, median, menu,
methods, min, missing, Mod, mode,
mode<-, model.frame, model.frame.default, model.matrix, model.matrix.default,
month.abb, month.name, mtext, mvfft, NA,
na.action, na.action.default, na.fail, names, na.omit,
nargs, nchar, NCOL, ncol, next,
NextMethod, nextn, nlevels, nlm, [.noquote,
noquote, NROW, nrow, NULL, numeric,
objects, on.exit, optimize, options, order,
ordered, outer, pairs, palette, par,
parse, paste, pbeta, pbinom, pcauchy,
pchisq, pentagamma, pexp, pf, pgamma,
pgeom, phyper, pi, pictex, piechart,
plnorm, plogis, plot, plot.default, plot.density,
plot.ts, plot.xy, pmatch, pmax, pmin,
pnbinom, pnchisq, pnorm, points, points.default,
poisson, polygon, polyroot, postscript, ppoints,
ppois, pretty, print, print.anova.glm, print.anova.lm,
print.data.frame, print.default, print.density, print.formula, print.glm,
print.lm, print.noquote, print.plot, print.summary.glm, print.summary.lm,
print.terms, print.ts, proc.time, prod, prompt,
prompt.default, prop.test, provide, .Provided, ps.options,
pt, punif, pweibull, q, qbeta,
qbinom, qcauchy, qchisq, qexp, qf,
qgamma, qgeom, qhyper, qlnorm, qlogis,
qnbinom, qnchisq, qnorm, qpois, qqline,
qqnorm, qqplot, qr, qr.coef, qr.fitted,
qr.Q, qr.qty, qr.qy, qr.R, qr.resid,
qr.solve, qr.X, qt, quantile, quasi,
quit, qunif, quote, qweibull, rainbow,
.Random.seed, range, rank, rbeta, rbind,
rbinom, rcauchy, rchisq, Re, readline,
read.table, real, rect, remove, rep,
repeat, replace, require, resid, residuals,
residuals.glm, residuals.lm, return, rev, rexp,
rf, rgamma, rgb, rgeom, rhyper,
RLIBS, rlnorm, rlogis, rm, rnbinom,
rnchisq, rnorm, round, row, row.names,
rownames, rpois, rstudent, rt, runif,
rweibull, sample, sapply, save, save.plot,
scale, scan, sd, segments, seq,
sequence, sign, signif, sin, sinh,
sink, solve, solve.qr, sort, source,
spline, splinefun, split, sqrt, start,
stem, stop, storage.mode, storage.mode<-, str,
str.data.frame, str.default, strheight, stripplot, strsplit,
structure, strwidth, sub, Subscript, substitute,
substr, substring, sum, summary, summary.glm,
summary.lm, svd, sweep, switch, symbol.C,
symbol.For, symnum, sys.call, sys.calls, sys.frame,
sys.frames, sys.function, sys.nframe, sys.on.exit, sys.parent,
sys.parents, system, system.date, system.time, t,
table, tabulate, tan, tanh, tapply,
tempfile, terms, terms.default, terms.formula, terms.terms,
terrain.colors, tetragamma, text, time, title,
topo.colors, trace, traceback, trigamma, trunc,
ts, tsp, t.test, typeof, unclass,
undebug, unique, uniroot, unlink, unlist,
untrace, update, update.formula, update.glm, update.lm,
upper.tri, UseMethod, var, vector, Version,
version, vi, warning, weighted.mean, weights.lm,
while, window, windows, write, x11,
xedit, xemacs, xinch, xor, xy.coords,
yinch}, % define a list of word to emphasis
stringstyle=\color{red},
emphstyle=\color{black}\bfseries, % define the way to emphase
showspaces=false, % show the space in code, or not
stringstyle=\ttfamily, % style of the string (like "hello word")
showstringspaces=false, % show the space in string, on not #1
commentstyle=\color{gray}\slshape,
tabsize=2, % sets default tabsize to 2 spaces
breaklines=true, % sets automatic line breaking
breakatwhitespace=false, % sets if automatic breaks should only happen at whitespace
}
}{}
\lstnewenvironment{Perl}{
\lstset{%
language={perl},
basicstyle=\small, % print whole listing small
keywordstyle=\color{black}, % style for keyword
emph={% From http://www.sdsc.edu/~moreland/courses/IntroPerl/docs/manual/pod/perlfunc.html
-X, run, abs, absolute, accept, accept, alarm, schedule, atan2,
arctangent, bind, binds, binmode, prepare, bless, create, caller,
get, chdir, change, chmod, changes, chomp, remove, chop, remove,
chown, change, chr, get, chroot, make, close, close, closedir, close,
connect, connect, continue, optional, cos, cosine, crypt, one-way,
dbmclose, breaks, dbmopen, create, defined, test, delete, deletes,
die, raise, do, turn, dump, create, each, retrieve, endgrent, be,
endhostent, be, endnetent, be, endprotoent, be, endpwent, be,
endservent, be, eof, test, eval, catch, exec, abandon, exists, test,
exit, terminate, exp, raise, fcntl, file, fileno, return, flock,
lock, fork, create, format, declare, formline, internal, getc, get,
getgrent, get, getgrgid, get, getgrnam, get, gethostbyaddr, get,
gethostbyname, get, gethostent, get, getlogin, return, getnetbyaddr,
get, getnetbyname, get, getnetent, get, getpeername, find, getpgrp,
get, getppid, get, getpriority, get, getprotobyname, get,
getprotobynumber, get, getprotoent, get, getpwent, get, getpwnam,
get, getpwuid, get, getservbyname, get, getservbyport, get,
getservent, get, getsockname, retrieve, getsockopt, get, glob,
expand, gmtime, convert, goto, create, grep, locate, hex, convert,
import, patch, int, get, ioctl, system-dependent, join, join, keys,
retrieve, kill, send, last, exit, lc, return, lcfirst, return,
length, return, link, create, listen, register, local, create,
localtime, convert, log, retrieve, lstat, stat, m//, match, map,
apply, mkdir, create, msgctl, SysV, msgget, get, msgrcv, receive,
msgsnd, send, my, declare, next, iterate, no, unimport, oct, convert,
open, open, opendir, open, ord, find, pack, convert, package,
declare, pipe, open, pop, remove, pos, find, print, output, printf,
output, prototype, get, push, append, q/STRING/, singly, qq/STRING/,
doubly, quotemeta, quote, qw/STRING/, quote, qx/STRING/, backquote,
rand, retrieve, read, fixed-length, readdir, get, readlink,
determine, recv, receive, redo, start, ref, find, rename, change,
require, load, reset, clear, return, get, reverse, flip, rewinddir,
reset, rindex, right-to-left, rmdir, remove, s///, replace, scalar,
force, seek, reposition, seekdir, reposition, select, reset, semctl,
SysV, semget, get, semop, SysV, send, send, setgrent, prepare,
sethostent, prepare, setnetent, prepare, setpgrp, set, setpriority,
set, setprotoent, prepare, setpwent, prepare, setservent, prepare,
setsockopt, set, shift, remove, shmctl, SysV, shmget, get, shmread,
read, shmwrite, write, shutdown, close, sin, return, sleep, block,
socket, create, socketpair, create, sort, sort, splice, add, split,
split, sprintf, formatted, sqrt, square, srand, seed, stat, get,
study, optimize, sub, declare, substr, get, symlink, create, syscall,
execute, sysread, fixed-length, system, run, syswrite, fixed-length,
tell, get, telldir, get, tie, bind, time, return, times, return,
tr///, transliterate, truncate, shorten, uc, return, ucfirst, return,
umask, set, undef, remove, unlink, remove, unpack, convert, unshift,
prepend, untie, break, use, load, utime, set, values, return, vec,
test, wait, wait, waitpid, wait, wantarray, get, warn, print, write,
print, y///, transliterate}, % define a list of word to emphasis
stringstyle=\color{red},
emphstyle=\color{black}\bfseries, % define the way to emphase
showspaces=false, % show the space in code, or not
stringstyle=\ttfamily, % style of the string (like "hello word")
showstringspaces=false, % show the space in string, on not #1
commentstyle=\color{gray}\slshape,
tabsize=2, % sets default tabsize to 2 spaces
breaklines=true, % sets automatic line breaking
breakatwhitespace=false, % sets if automatic breaks should only happen at whitespace
}
}{}
\lstnewenvironment{plaintext}{
\lstset{
tabsize=2, % sets default tabsize to 2 spaces
breaklines=true, % sets automatic line breaking
breakatwhitespace=false, % sets if automatic breaks should only happen at whitespace
basicstyle=\normalfont\ttfamily,
}
}{}
It is almost certainly easier to modify the Bash/Python highlighters than to write a context-sensitive highlighter. I'm guessing that just adding the keywords to the other highlighters should give acceptable results.
Modifying Pygments doesn't look too difficult, from Pygments' Write your own lexer documentation.

Resources