Code substitution for DSL using ANTLR - parsing

The DSL I'm working on allows users to define a 'complete text substitution' variable. When parsing the code, we then need to look up the value of the variable and start parsing again from that code.
The substitution can be very simple (single constants) or entire statements or code blocks.
This is a mock grammar which I hope illustrates my point.
grammar a;
entry
: (set_variable
| print_line)*
;
set_variable
: 'SET' ID '=' STRING_CONSTANT ';'
;
print_line
: 'PRINT' ID ';'
;
STRING_CONSTANT: '\'' ('\'\'' | ~('\''))* '\'' ;
ID: [a-z][a-zA-Z0-9_]* ;
VARIABLE: '&' ID;
BLANK: [ \t\n\r]+ -> channel(HIDDEN) ;
Then the following statements executed consecutively should be valid;
SET foo = 'Hello world!';
PRINT foo;
SET bar = 'foo;';
PRINT &bar // should be interpreted as 'PRINT foo;'
SET baz = 'PRINT foo; PRINT'; // one complete statement and one incomplete statement
&baz foo; // should be interpreted as 'PRINT foo; PRINT foo;'
Any time the & variable token is discovered, we immediately switch to interpreting the value of that variable instead. As above, this means the code can be set up in a way that is invalid on its own, full of half-statements that are only completed when the value is just right. The variables can be redefined at any point in the text.
Strictly speaking the current language definition doesn't disallow nesting &vars inside each other, but the current parsing doesn't handle this and I would not be upset if it wasn't allowed.
Currently I'm building an interpreter using a visitor, but this one I'm stuck on.
How can I build a lexer/parser/interpreter which will allow me to do this? Thanks for any help!

So I have found one solution to the issue. I think it could be better - as it potentially does a lot of array copying - but at least it works for now.
EDIT: I was wrong before, and my solution would consume ANY & that it found, including those in valid locations such as inside string constants. This seems like a better solution:
First, I extended the InputStream so that it is able to rewrite the input stream when a & is encountered. This unfortunately involves copying the array, which I can maybe resolve in the future:
MacroInputStream.java
package preprocessor;
import java.util.HashMap;
import org.antlr.v4.runtime.ANTLRInputStream;
public class MacroInputStream extends ANTLRInputStream {
private HashMap<String, String> map;
public MacroInputStream(String s, HashMap<String, String> map) {
super(s);
this.map = map;
}
public void rewrite(int startIndex, int stopIndex, String replaceText) {
int length = stopIndex-startIndex+1;
char[] replData = replaceText.toCharArray();
if (replData.length == length) {
for (int i = 0; i < length; i++) data[startIndex+i] = replData[i];
} else {
char[] newData = new char[data.length+replData.length-length];
System.arraycopy(data, 0, newData, 0, startIndex);
System.arraycopy(replData, 0, newData, startIndex, replData.length);
System.arraycopy(data, stopIndex+1, newData, startIndex+replData.length, data.length-(stopIndex+1));
data = newData;
n = data.length;
}
}
}
Secondly, I extended the Lexer so that when a VARIABLE token is encountered, the rewrite method above is called:
MacroGrammarLexer.java
package language;
import language.DSL_GrammarLexer;
import preprocessor.MacroInputStream;
import org.antlr.v4.runtime.Token;
import java.util.HashMap;
public class MacroGrammarLexer extends DSL_GrammarLexer {
private HashMap<String, String> map;
public MacroGrammarLexer(MacroInputStream input, HashMap<String, String> map) {
super(input);
this.map = map;
// TODO Auto-generated constructor stub
}
private MacroInputStream getInput() {
return (MacroInputStream) _input;
}
@Override
public Token nextToken() {
Token t = super.nextToken();
if (t.getType() == VARIABLE) {
System.out.println("Encountered token " + t.getText()+" ===> rewriting!!!");
getInput().rewrite(t.getStartIndex(), t.getStopIndex(),
map.get(t.getText().substring(1)));
getInput().seek(t.getStartIndex()); // reset input stream to previous
return super.nextToken();
}
return t;
}
}
Lastly, I modified the generated parser to set the variables at the time of parsing:
DSL_GrammarParser.java
...
...
HashMap<String, String> map; // same map as before, passed as a new argument.
...
...
public final SetContext set() throws RecognitionException {
SetContext _localctx = new SetContext(_ctx, getState());
enterRule(_localctx, 130, RULE_set);
try {
enterOuterAlt(_localctx, 1);
{
String vname = null; String vval = null; // set up variables
setState(1215); match(SET);
setState(1216); vname = variable_name().getText(); // set vname
setState(1217); match(EQUALS);
setState(1218); vval = string_constant().getText(); // set vval
System.out.println("Found SET " + vname +" = " + vval+";");
map.put(vname, vval);
}
}
catch (RecognitionException re) {
_localctx.exception = re;
_errHandler.reportError(this, re);
_errHandler.recover(this, re);
}
finally {
exitRule();
}
return _localctx;
}
...
...
Unfortunately this method is final so this will make maintenance a bit more difficult, but it works for now.
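Independently of ANTLR, the behaviour described in the EDIT (expand &name everywhere except inside string constants) can be sketched in plain Java. This is only an illustration under stated assumptions, not the lexer-based solution above: the class MacroExpander and its expand method are hypothetical names, the variable map is fixed up front, and the substituted text is not re-scanned for nested &vars.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of ANTLR or of the code above.
final class MacroExpander {
    private static boolean isIdStart(char c) {
        return c >= 'a' && c <= 'z';                 // ID: [a-z][a-zA-Z0-9_]*
    }

    private static boolean isIdPart(char c) {
        return Character.isLetterOrDigit(c) && c < 128 || c == '_';
    }

    /** Replaces &name with its value, copying '...' string constants verbatim. */
    static String expand(String src, Map<String, String> vars) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '\'') {
                // Find the closing quote; '' inside a constant is an escaped quote.
                int j = i + 1;
                while (j < src.length()) {
                    if (src.charAt(j) == '\'') {
                        if (j + 1 < src.length() && src.charAt(j + 1) == '\'') {
                            j += 2;
                            continue;
                        }
                        break;
                    }
                    j++;
                }
                int end = Math.min(j + 1, src.length());
                out.append(src, i, end);
                i = end;
            } else if (c == '&' && i + 1 < src.length() && isIdStart(src.charAt(i + 1))) {
                int j = i + 1;
                while (j < src.length() && isIdPart(src.charAt(j))) j++;
                String name = src.substring(i + 1, j);
                String value = vars.get(name);
                if (value == null) {
                    throw new IllegalArgumentException("undefined variable: " + name);
                }
                out.append(value);
                i = j;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> vars = new HashMap<>();
        vars.put("bar", "foo;");
        System.out.println(expand("PRINT &bar", vars));
        System.out.println(expand("SET x = 'a &bar b';", vars));
    }
}
```

The first call expands the reference; the second leaves the input untouched because the & sits inside a string constant.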

The standard pattern for handling your requirements is to implement a symbol table. The simplest form is a key:value store. In your visitor, add variable declarations as they are encountered, and read out the values as variable references are encountered.
As described, your DSL does not define a scoping requirement on the variables declared. If you do require scoped variables, then use a stack of key:value stores, pushing and popping on scope entry and exit.
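A minimal sketch of such a symbol table, with the optional stack of scopes, might look like the following. The class and method names (SymbolTable, enterScope, exitScope, define, resolve) are illustrative assumptions, not part of ANTLR; a visitor would call define when it visits a SET statement and resolve when it visits a variable reference.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical symbol table: a stack of key:value stores.
final class SymbolTable {
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    SymbolTable() {
        scopes.push(new HashMap<>());               // the global scope
    }

    void enterScope() {
        scopes.push(new HashMap<>());
    }

    void exitScope() {
        if (scopes.size() == 1) {
            throw new IllegalStateException("cannot pop the global scope");
        }
        scopes.pop();
    }

    /** Defines or redefines a variable in the innermost scope. */
    void define(String name, String value) {
        scopes.peek().put(name, value);
    }

    /** Looks the name up from the innermost scope outwards; null if undefined. */
    String resolve(String name) {
        for (Map<String, String> scope : scopes) {  // iterates most recently pushed first
            String value = scope.get(name);
            if (value != null) {
                return value;
            }
        }
        return null;
    }
}
```

If the DSL stays unscoped, the same class works with only the global scope and enterScope/exitScope never called.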
See this related StackOverflow answer.
Separately, since your strings may contain commands, you can simply parse the contents as part of your initial parse. That is, expand your grammar with a rule that includes the full set of valid contents:
set_variable
: 'SET' ID '=' stringLiteral ';'
;
stringLiteral:
Quote Quote? (
( set_variable
| print_line
| VARIABLE
| ID
)
| STRING_CONSTANT // redefine without the quotes
)
Quote
;


type safe create Lua tables in Haxe without runtime overhead and without boilerplate

I am trying to write some externs to some Lua libraries that require to pass dictionary tables and I want to make them type safe.
So far, I have been declaring abstract classes with public inline constructors, but this gets tedious really fast:
abstract JobOpts(Table<String, Dynamic>) {
public inline function new(command:String, args:Array<String>) {
this = Table.create(null, {
command: command,
arguments: Table.create(args)
});
}
}
Is there a better way that allows me to keep things properly typed but that does not require that much boilerplate?
Please note that typedefs and anonymous structures are not valid options, because they introduce nasty fields in the created table and also do a function execution to assign a metatable to them:
--typedef X = {cmd: String}
_hx_o({__fields__={cmd=true},cmd="Yo"})
My abstract code example compiles to a clean lua table, but it is a lot of boilerplate
Some targets support @:nativeGen to strip Haxe-specific metadata from objects, but this does not seem to be the case for typedefs on the Lua target. Fortunately, Haxe has a robust macro system, so you can make the code write itself. Say,
Test.hx:
import lua.Table;
class Test {
public static function main() {
var q = new JobOpts("cmd", ["a", "b"]);
Sys.println(q);
}
}
@:build(TableBuilder.build())
abstract JobOpts(Table<String, Dynamic>) {
extern public inline function new(command:String, args:Array<String>) this = throw "no macro!";
}
TableBuilder.hx:
import haxe.macro.Context;
import haxe.macro.Expr;
class TableBuilder {
public static macro function build():Array<Field> {
var fields = Context.getBuildFields();
for (field in fields) {
if (field.name != "_new") continue; // look for new()
var f = switch (field.kind) { // ... that's a function
case FFun(_f): _f;
default: continue;
}
// abstract "constructors" transform `this = val;`
// into `{ var this; this = val; return this; }`
var val = switch (f.expr.expr) {
case EBlock([_decl, macro this = $x, _ret]): x;
default: continue;
}
//
var objFields:Array<ObjectField> = [];
for (arg in f.args) {
var expr = macro $i{arg.name};
if (arg.type.match(TPath({ name: "Array", pack: [] } ))) {
// if the argument's an array, make an unwrapper for it
expr = macro lua.Table.create($expr, null);
}
objFields.push({ field: arg.name, expr: expr });
}
var objExpr:Expr = { expr: EObjectDecl(objFields), pos: Context.currentPos() };
val.expr = (macro lua.Table.create(null, $objExpr)).expr;
}
return fields;
}
}
And thus...
Test.main = function()
local this1 = ({command = "cmd", args = ({"a","b"})});
local q = this1;
_G.print(Std.string(q));
end
Do note, however, that Table.create is a bit of a risky function - you will only be able to pass in array literals, not variables containing arrays. This can be remedied by making a separate "constructor" function with the same logic but without array➜Table.create unwrapping.

In Xtext, how to tweak certain function calls

I am using Xtext 2.15 to generate a language that, among other things, processes asynchronous calls in a way they look synchronous.
For instance, the following code in my language:
int a = 1;
int b = 2;
boolean sleepSuccess = doSleep(2000); // sleep two seconds
int c = 3;
int d = 4;
would generate the following Java code:
int a = 1;
int b = 2;
doSleep(2000, new DoSleepCallback() {
public void onTrigger(boolean rc) {
boolean sleepSuccess = rc;
int c = 3;
int d = 4;
}
});
To achieve it, I defined the grammar this way:
grammar org.qedlang.qed.QED with jbase.Jbase // Jbase inherits Xbase
...
FunctionDeclaration return XExpression:
=>({FunctionDeclaration} type=JvmTypeReference name=ValidID '(')
(params+=FullJvmFormalParameter (',' params+=FullJvmFormalParameter)*)?
')' block=XBlockExpression
;
The FunctionDeclaration rule is used to define asynchronous calls. In my language library, I would have as system call:
boolean doSleep(int millis) {} // async FunctionDeclaration element stub
The underlying Java implementation would be:
public abstract class DoSleepCallback {
public abstract void onTrigger(boolean rc);
}
public void doSleep(int millis, DoSleepCallback callback) {
<perform sleep and call callback.onTrigger(<success>)>
}
So, using the inferrer, type computer and compiler, how to identify calls to FunctionDeclaration elements, add a callback parameter and process the rest of the body in an inner class?
I could, for instance, override appendFeatureCall in the language compiler, would it work? There is still a part I don't know how to do...
override appendFeatureCall(XAbstractFeatureCall call, ITreeAppendable b) {
...
val feature = call.feature
...
if (feature instanceof JvmExecutable) {
b.append('(')
val arguments = call.actualArguments
if (!arguments.isEmpty) {
...
arguments.appendArguments(b, shouldBreakFirstArgument)
// HERE IS THE PART I DON'T KNOW HOW TO DO
<IF feature IS A FunctionDeclaration>
<argument.appendArgument(NEW GENERATED CALLBACK PARAMETER)>
<INSERT REST OF XBlockExpression body INSIDE CALLBACK INSTANCE>
<ENDIF>
}
b.append(');')
}
}
So basically, how to tell if "feature" points to FunctionDeclaration? The rest, I may be able to do it...
Related to another StackOverflow entry, I had the idea of implementing FunctionDeclaration in the inferrer as a class instead of as a method:
def void inferExpressions(JvmDeclaredType it, FunctionDeclaration function) {
// now let's go over the features
for ( f : (function.block as XBlockExpression).expressions ) {
if (f instanceof FunctionDeclaration) {
members += f.toClass(f.fullyQualifiedName) [
inferVariables(f)
superTypes += typeRef(FunctionDeclarationObject)
// let's add a default constructor
members += f.toConstructor [
for (p : f.params)
parameters += p.toParameter(p.name, p.parameterType)
body = f.block
]
inferExpressions(f)
]
}
}
}
The generated class would extend FunctionDeclarationObject, so I thought there was a way to identify FunctionDeclaration elements as FunctionDeclarationObject subclasses. But then, I would need to extend the XFeatureCall default scoping to include classes in order to make it work...
I fully realize the question is not obvious, sorry...
Thanks,
Martin
EDIT: modified DoSleepCallback declaration from static to abstract (was erroneous)
I don't think you can generate what you need using the jvm model inferrer.
You should provide your own subclass of the XbaseCompiler (or JBaseCompiler, if any... and don't forget to register with guice in your runtime module), and override doInternalToJavaStatement(XExpression expr, ITreeAppendable it, boolean isReferenced) to manage how your FunctionDeclaration should be generated.
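Whatever hook ends up doing the generation, the core of the transformation is that everything following an asynchronous call in a block moves into the callback body. The sketch below shows only that nesting logic on plain strings; it uses no Xtext or Xbase API, and the class CallbackNesting, its Stmt type and the naming rule for the callback class (function name capitalised plus "Callback") are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, Xtext-independent model of the statement nesting.
final class CallbackNesting {
    static final class Stmt {
        final String text;                       // plain statement text, or null for an async call
        final String type, var, function, args;  // parts of an async call, or null

        private Stmt(String text, String type, String var, String function, String args) {
            this.text = text;
            this.type = type;
            this.var = var;
            this.function = function;
            this.args = args;
        }

        static Stmt plain(String text) {
            return new Stmt(text, null, null, null, null);
        }

        static Stmt async(String type, String var, String function, String args) {
            return new Stmt(null, type, var, function, args);
        }

        boolean isAsync() {
            return text == null;
        }
    }

    static String generate(List<Stmt> block) {
        StringBuilder out = new StringBuilder();
        emit(block, 0, "", out);
        return out.toString();
    }

    private static void emit(List<Stmt> block, int from, String indent, StringBuilder out) {
        for (int i = from; i < block.size(); i++) {
            Stmt s = block.get(i);
            if (!s.isAsync()) {
                out.append(indent).append(s.text).append('\n');
                continue;
            }
            String callback = Character.toUpperCase(s.function.charAt(0))
                    + s.function.substring(1) + "Callback";
            String leadingArgs = s.args.isEmpty() ? "" : s.args + ", ";
            out.append(indent).append(s.function).append('(').append(leadingArgs)
               .append("new ").append(callback).append("() {\n");
            out.append(indent).append("    public void onTrigger(").append(s.type).append(" rc) {\n");
            out.append(indent).append("        ").append(s.type).append(' ').append(s.var).append(" = rc;\n");
            // The rest of the block becomes the body of the callback.
            emit(block, i + 1, indent + "        ", out);
            out.append(indent).append("    }\n");
            out.append(indent).append("});\n");
            return;
        }
    }

    public static void main(String[] args) {
        System.out.print(generate(Arrays.asList(
                Stmt.plain("int a = 1;"),
                Stmt.plain("int b = 2;"),
                Stmt.async("boolean", "sleepSuccess", "doSleep", "2000"),
                Stmt.plain("int c = 3;"),
                Stmt.plain("int d = 4;"))));
    }
}
```

In a real compiler override the same recursion would run over the expressions of the XBlockExpression rather than over strings, which is the part this sketch deliberately leaves out.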

want to skip all line comments except two in antlr4 grammar

I want to extend the IDL.g4 grammar a bit so that I can distinguish the following two comments //#top-level false and //#top-level true, all other comments I just want to skip like before.
I have tried to add top_level, TOP_LEVEL_TRUE and TOP_LEVEL_FALSE like this, because I thought ANTLR4 gave precedence to lexer rules that come first.
top_level
: TOP_LEVEL_TRUE
| TOP_LEVEL_FALSE
;
TOP_LEVEL_TRUE
: '//#top-level true'
;
TOP_LEVEL_FALSE
: '//#top-level false'
;
LINE_COMMENT
: '//' ~('\n'|'\r')* '\r'? '\n' -> channel(HIDDEN)
;
But the listener method enterTop_level(...) is never called; all comments seem to be eaten by LINE_COMMENT. How should I organize the lexer and parser rules?
And one more question: I also want to be notified when the end of the input file is reached. How do I do that? I have tried a finalize() function in the listener class, but it never gets called.
Updated with a complete example:
I use this grammar file: IDL.g4, as I said above. Then I update it by putting the parser rule top_level just below the event_header rule. The lexer rules are placed just above the ID rule.
Here is my Listener.java file
class Listener extends IDLBaseListener {
@Override
public void enterTop_level(IDLParser.Top_levelContext ctx) {
System.out.println("Found top-level");
}
}
and here is a main program: IDLCheck.java
import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.ParseTreeWalker;
import java.io.FileInputStream;
import java.io.InputStream;
public class IDLCheck {
public void process(String[] args) throws Exception {
InputStream is = new FileInputStream("sample.idl");
ANTLRInputStream input = new ANTLRInputStream(is);
IDLLexer lexer = new IDLLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
IDLParser parser = new IDLParser(tokens);
parser.setBuildParseTree(true);
RuleContext tree = parser.specification();
Listener listener = new Listener();
ParseTreeWalker walker = new ParseTreeWalker();
walker.walk(listener, tree);
}
public static void main(String[] args) throws Exception {
new IDLCheck().process(args);
}
}
and an input file: sample.idl
module CommonTypes {
struct WChannel {
int w;
float d;
}; //#top-level false
struct EPlanID {
int kind;
short index;
}; //#top-level TRUE
};
I expect to see the output "Found top-level" twice, but I see nothing
Finally I found a solution. I just added newline characters to the TOP_LEVEL_FALSE and TOP_LEVEL_TRUE lexer rules, and I also added the top_level parser rule to the definition rule, because I only expect top_level to appear after a struct or union. This is an rti.com-specific extension to the IDL format; this modification seems to be good enough for me.
definition
: type_decl SEMICOLON top_level?
| const_decl SEMICOLON
...
TOP_LEVEL_TRUE
: '//#top-level true' '\r'? '\n'
;
TOP_LEVEL_FALSE
: '//#top-level false' '\r'? '\n'
;
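Why the added newline helps: an ANTLR4 lexer picks the rule that matches the most input, and only falls back to rule order when two rules match the same length. Without the newline, LINE_COMMENT always matched one character more than the TOP_LEVEL rules. The sketch below imitates that selection with java.util.regex; it is a simulation for illustration, not ANTLR code, and the class LongestMatchDemo and its methods are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simulates "longest match wins, earlier rule wins ties" for the three rules.
final class LongestMatchDemo {
    /** Returns the name of the rule that wins at the start of the input. */
    static String winner(Map<String, Pattern> rules, String input) {
        String best = null;
        int bestLength = -1;
        for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            // Strict '>' keeps the earlier rule when two matches have equal length.
            if (m.lookingAt() && m.end() > bestLength) {
                best = rule.getKey();
                bestLength = m.end();
            }
        }
        return best;
    }

    /** Rules in definition order; withNewline selects the fixed variant. */
    static Map<String, Pattern> rules(boolean withNewline) {
        String eol = withNewline ? "\\r?\\n" : "";
        Map<String, Pattern> rules = new LinkedHashMap<>();
        rules.put("TOP_LEVEL_TRUE", Pattern.compile("//#top-level true" + eol));
        rules.put("TOP_LEVEL_FALSE", Pattern.compile("//#top-level false" + eol));
        rules.put("LINE_COMMENT", Pattern.compile("//[^\\n\\r]*\\r?\\n"));
        return rules;
    }

    public static void main(String[] args) {
        String line = "//#top-level true\n";
        System.out.println(winner(rules(false), line)); // original rules
        System.out.println(winner(rules(true), line));  // rules with the newline added
    }
}
```

With the original rules the comment rule wins; with the newline included, the lengths tie and the earlier TOP_LEVEL_TRUE rule wins. One consequence worth checking in the real grammar is that a marker on the last line of a file without a trailing newline would not match the extended rules.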

Values in $1, $2 .. variables always NULL

I am trying to create a parser with Bison (GNU bison 2.4.1) and flex (2.5.35) on my Ubuntu OS. I have something like this:
sql.h:
typedef struct word
{
char *val;
int length;
} WORD;
struct yword
{
struct word v;
int o;
...
};
sql1.y
%{
..
#include "sql.h"
..
%}
%union yystype
{
struct tree *t;
struct yword b;
...
}
%token <b> NAME
%%
...
table:
NAME { add_table(root, $1.v); }
;
...
Trouble is that whatever string I give to it, when it comes to resolve this, v always has values (NULL, 0) even if the input string should have some table name. (I chose to skip unnecessary other details/snippets, but can provide more if it helps resolve this.)
I wrote the grammar which is complete and correct, but I can't get it to build the parse tree due to this problem.
Any inputs would be quite appreciated.
Your trouble seems related to missing or buggy code in the lexical analyzer.
Check your lexical analyzer first.
If it does not return the token properly, the parser cannot handle the values correctly.
Write a basic test that prints the token value.
Never mind the C style; what matters is the principle:
int main(void) {
int token;
while( (token = yylex()) ) {
switch( token) {
case NAME:
printf("name '%s'\n", yylval.b.v.val );
break;
...
}
}
}
Run it on some input and check whether the values are printed correctly.
If the lexical analyzer does not set yylval when it returns NAME, it is normal that val is empty.
If your flex file has a pattern such as:
[a-z]+ { return NAME; }
it is incorrect; you have to set the value like this:
[a-z]+ {
yylval.b.v.val = strdup(yytext);
yylval.b.v.length = yyleng;
return NAME; }

Method to create and store method chain at runtime

The problem I have is that I need to do about 40+ conversions to convert loosely typed info into strongly typed info stored in db, xml file, etc.
I plan to tag each type with a tuple, i.e. a transformational form like this:
host.name.string:host.dotquad.string
which will offer a conversion from the input form to an output form. For example, the name stored in the host field, of type string, is converted into dotquad notation of type string and stored back into the host field. More complex conversions may need several steps, with each step being accomplished by a method call, hence method chaining.
Examining the example above further: take the tuple 'host.name.string' with the host field holding the name www.domain.com. A DNS lookup is done to convert the domain name to an IP address. Another method is applied to change the type returned by the DNS lookup into the internal type, dotquad of type string. For this transformation, there are 4 separate methods called to convert from one tuple into another. Some other conversions may require more steps.
Ideally I would like a small example of how method chains are constructed at runtime. Development-time method chaining is relatively trivial, but would require pages and pages of code to cover all possibilities, with 40+ conversions.
One way I thought of doing it is parsing the tuples at startup, writing the chains out to an assembly, compiling it, then using reflection to load/access it. It would be really ugly and negate the performance increases I'm hoping to gain.
I'm using Mono, so no C# 4.0
Any help would be appreciated.
Bob.
Here is a quick and dirty solution using LINQ Expressions. You have indicated that you want C# 2.0; this is 3.5, but it does run on Mono 2.6. The method chaining is a bit hacky as I didn't know exactly how your version works, so you might need to tweak the expression code to suit.
The real magic really happens in the Chainer class, which takes a collection of strings, which represent the MethodChain subclass. Take a collection like this:
{
"string",
"string",
"int"
}
This will generate a chain like this:
new StringChain(new StringChain(new IntChain()));
Chainer.CreateChain will return a lambda that calls MethodChain.Execute(). Because Chainer.CreateChain uses a bit of reflection, it's slow, but it only needs to run once for each expression chain. The execution of the lambda is nearly as fast as calling actual code.
Hope you can fit this into your architecture.
public abstract class MethodChain {
private MethodChain[] m_methods;
private object m_Result;
public MethodChain(params MethodChain[] methods) {
m_methods = methods;
}
public MethodChain Execute(object expression) {
if(m_methods != null) {
foreach(var method in m_methods) {
expression = method.Execute(expression).GetResult<object>();
}
}
m_Result = ExecuteInternal(expression);
return this;
}
protected abstract object ExecuteInternal(object expression);
public T GetResult<T>() {
return (T)m_Result;
}
}
public class IntChain : MethodChain {
public IntChain(params MethodChain[] methods)
: base(methods) {
}
protected override object ExecuteInternal(object expression) {
return int.Parse(expression as string);
}
}
public class StringChain : MethodChain {
public StringChain(params MethodChain[] methods):base(methods) {
}
protected override object ExecuteInternal(object expression) {
return (expression as string).Trim();
}
}
public class Chainer {
/// <summary>
/// methods are executed from back to front, so methods[1] will call method[0].Execute before executing itself
/// </summary>
/// <param name="methods"></param>
/// <returns></returns>
public Func<object, MethodChain> CreateChain(IEnumerable<string> methods) {
Expression expr = null;
foreach(var methodName in methods.Reverse()) {
ConstructorInfo cInfo= null;
switch(methodName.ToLower()) {
case "string":
cInfo = typeof(StringChain).GetConstructor(new []{typeof(MethodChain[])});
break;
case "int":
cInfo = typeof(IntChain).GetConstructor(new[] { typeof(MethodChain[]) });
break;
}
if(cInfo == null)
continue;
if(expr != null)
expr = Expression.New(cInfo, Expression.NewArrayInit( typeof(MethodChain), Expression.Convert(expr, typeof(MethodChain))));
else
expr = Expression.New(cInfo, Expression.Constant(null, typeof(MethodChain[])));
}
var objParam = Expression.Parameter(typeof(object));
var methodExpr = Expression.Call(expr, typeof(MethodChain).GetMethod("Execute"), objParam);
Func<object, MethodChain> lambda = Expression.Lambda<Func<object, MethodChain>>(methodExpr, objParam).Compile();
return lambda;
}
[TestMethod]
public void ExprTest() {
Chainer chainer = new Chainer();
var lambda = chainer.CreateChain(new[] { "int", "string" });
var result = lambda(" 34 ").GetResult<int>();
Assert.AreEqual(34, result);
}
}
The command pattern would fit here. What you could do is queue up commands as you need different operations performed on the different data types. Those messages could then all be processed and call the appropriate methods when you're ready later on.
This pattern can be implemented in .NET 2.0.
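As a rough illustration of the idea (written in Java, like the longer example further down this thread, rather than in C#), conversions can be queued as command objects and executed later in order. The names ConversionCommand and CommandQueue and the two sample steps are assumptions made for this sketch; in an environment without lambdas each command would be a small named class implementing the interface.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical command interface: one conversion step.
interface ConversionCommand {
    Object execute(Object input);
}

// Collects commands now and runs them in order when asked.
final class CommandQueue {
    private final List<ConversionCommand> pending = new ArrayList<>();

    CommandQueue enqueue(ConversionCommand command) {
        pending.add(command);
        return this;
    }

    /** Feeds the input through every queued command, first queued first. */
    Object runAll(Object input) {
        Object value = input;
        for (ConversionCommand command : pending) {
            value = command.execute(value);
        }
        return value;
    }

    public static void main(String[] args) {
        CommandQueue queue = new CommandQueue()
                .enqueue(v -> ((String) v).trim())             // sample step: tidy the text
                .enqueue(v -> Integer.parseInt((String) v));   // sample step: string to int
        System.out.println(queue.runAll(" 34 "));
    }
}
```

The queue can be built at runtime from the parsed tuples, which keeps the set of conversions as individual commands instead of one hand-written chain per combination.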
Do you really need to do this at execution time? Can't you create the combination of operations using code generation?
Let me elaborate:
Assuming you have a class called Conversions which contains all the 40+ conversions you mentioned, like this:
//just pseudo code..
class conversions{
string host_name(string input){}
string host_dotquad(string input){}
int type_convert(string input){}
float type_convert(string input){}
float increment_float(float input){}
}
Write a simple console app or something similar which uses reflection to generate code for methods like this:
object execute_host_name(string input, Queue<string> conversionQueue)
{
string output = conversions.host_name(input);
if(conversionQueue.Count == 0)
return output;
switch(conversionQueue.Dequeue())
{
// generate case statements only for methods that take in
// a string as parameter because the host_name method returns a string.
case "host.dotquad": return execute_host_dotquad(output,conversionQueue);
case "type.convert": return execute_type_convert(output, conversionQueue);
default: // exception...
}
}
Wrap all this in a nice little execute method like this:
object execute(string input, string [] conversions)
{
Queue<string> conversionQueue = new Queue<string>(conversions);
switch(conversionQueue.Dequeue())
{
case "host.name": return execute_host_name(input, conversionQueue);
case "host.dotquad": return execute_host_dotquad(input, conversionQueue);
case "type.convert": return execute_type_convert(input, conversionQueue);
default: // exception...
}
}
This code generation application needs to be executed only when your method signatures change or when you decide to add new transformations.
Main advantages:
No runtime overhead
Easy to add/delete/change the conversions (code generator will take care of the code changes :) )
What do you think?
I apologize for the long code dump and the fact that it is in Java, rather than C#, but I found your problem quite interesting and I do not have much C# experience. Hopefully you will be able to adapt this solution without difficulty.
One approach to solving your problem is to create a cost for each conversion -- usually this is related to the accuracy of the conversion -- and then perform a search to find the best possible conversion sequence to get from one type to another.
The reason for needing a cost function is to choose among multiple conversion paths. For example, converting from an integer to a string is lossless, but there is no guarantee that every string can be represented by an integer. So, if you had two conversion chains
string -> integer -> float -> decimal
string -> float -> decimal
You would want to select the second one because it will reduce the chance of a conversion failure.
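The comparison can be made concrete by treating each step's fidelity as a number between 0 and 1 and multiplying along the chain; summing logarithms orders chains the same way, which is convenient for a search that adds costs. The sketch below uses assumed fidelity values (0.2 for a lossy step, 0.8 for a good one, 1.0 for a perfect one), and the class ChainFidelity is a hypothetical helper.

```java
// Hypothetical helper comparing two conversion chains by overall fidelity.
final class ChainFidelity {
    /** Overall fidelity of a chain: the product of the per-step fidelities. */
    static double product(double... steps) {
        double result = 1.0;
        for (double fidelity : steps) {
            result *= fidelity;
        }
        return result;
    }

    /** Sum of log-fidelities; it ranks chains exactly as the product does. */
    static double logSum(double... steps) {
        double result = 0.0;
        for (double fidelity : steps) {
            result += Math.log(fidelity);
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed values: string -> integer is lossy (0.2), string -> float is good (0.8).
        double viaInteger = product(0.2, 1.0, 1.0); // string -> integer -> float -> decimal
        double direct = product(0.8, 1.0);          // string -> float -> decimal
        System.out.println(direct > viaInteger
                ? "prefer string -> float -> decimal"
                : "prefer string -> integer -> float -> decimal");
    }
}
```

With those values the shorter chain through float wins. The same arithmetic gives log(0.2) + log(1.0) + log(0.8) + log(1.0), about -1.832581, for a bad, perfect, good, perfect sequence.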
The Java code below implements such a scheme and performs a best-first search to find an optimal conversion sequence. I hope you find it useful. Running the code produces the following output:
> No conversion possible from string to integer
> The optimal conversion sequence from string to host.dotquad.string is:
> string to host.name.string, cost = -1.609438
> host.name.string to host.dns, cost = -1.609438 *PERFECT*
> host.dns to host.dotquad, cost = -1.832581
> host.dotquad to host.dotquad.string, cost = -1.832581 *PERFECT*
Here is the Java code.
/**
* Use best-first search to find an optimal sequence of operations for
* performing a type conversion with maximum fidelity.
*/
import java.util.*;
public class TypeConversion {
/**
* Define a type-conversion interface. It converts between two
* user-defined types and provides a measure of fidelity (accuracy)
* of the conversion.
*/
interface ITypeConverter<T, F> {
public T convert(F from);
public double fidelity();
// Could use reflection instead of handling this explicitly
public String getSourceType();
public String getTargetType();
}
/**
* Create a set of user-defined types.
*/
class HostName {
public String hostName;
public HostName(String hostName) {
this.hostName = hostName;
}
}
class DnsLookup {
public String ipAddress;
public DnsLookup(HostName hostName) {
this.ipAddress = doDNSLookup(hostName);
}
private String doDNSLookup(HostName hostName) {
return "127.0.0.1";
}
}
class DottedQuad {
public int[] quad = new int[4];
public DottedQuad(DnsLookup lookup) {
String[] split = lookup.ipAddress.split("\\."); // split() takes a regex, so the dot must be escaped
for ( int i = 0; i < 4; i++ )
quad[i] = Integer.parseInt( split[i] );
}
}
/**
* Define a set of conversion operations between the types. We only
* implement a minimal number for brevity, but this could be expanded.
*
* We start by creating some broad classes to differentiate among
* perfect, good and bad conversions.
*/
abstract class PerfectTypeConversion<T, F> implements ITypeConverter<T, F> {
public abstract T convert(F from);
public double fidelity() { return 1.0; }
}
abstract class GoodTypeConversion<T, F> implements ITypeConverter<T, F> {
public abstract T convert(F from);
public double fidelity() { return 0.8; }
}
abstract class BadTypeConversion<T, F> implements ITypeConverter<T, F> {
public abstract T convert(F from);
public double fidelity() { return 0.2; }
}
/**
* Concrete classes that do the actual conversions.
*/
class StringToHostName extends BadTypeConversion<HostName, String> {
public HostName convert(String from) { return new HostName(from); }
public String getSourceType() { return "string"; }
public String getTargetType() { return "host.name.string"; }
}
class HostNameToDnsLookup extends PerfectTypeConversion<DnsLookup, HostName> {
public DnsLookup convert(HostName from) { return new DnsLookup(from); }
public String getSourceType() { return "host.name.string"; }
public String getTargetType() { return "host.dns"; }
}
class DnsLookupToDottedQuad extends GoodTypeConversion<DottedQuad, DnsLookup> {
public DottedQuad convert(DnsLookup from) { return new DottedQuad(from); }
public String getSourceType() { return "host.dns"; }
public String getTargetType() { return "host.dotquad"; }
}
class DottedQuadToString extends PerfectTypeConversion<String, DottedQuad> {
public String convert(DottedQuad f) {
return f.quad[0] + "." + f.quad[1] + "." + f.quad[2] + "." + f.quad[3];
}
public String getSourceType() { return "host.dotquad"; }
public String getTargetType() { return "host.dotquad.string"; }
}
/**
* To find the best conversion sequence, we need to instantiate
* a list of converters.
*/
ITypeConverter<?,?> converters[] =
{
new StringToHostName(),
new HostNameToDnsLookup(),
new DnsLookupToDottedQuad(),
new DottedQuadToString()
};
Map<String, List<ITypeConverter<?,?>>> fromMap =
new HashMap<String, List<ITypeConverter<?,?>>>();
public void buildConversionMap()
{
for ( ITypeConverter<?,?> converter : converters )
{
String type = converter.getSourceType();
if ( !fromMap.containsKey( type )) {
fromMap.put( type, new ArrayList<ITypeConverter<?,?>>());
}
fromMap.get(type).add(converter);
}
}
public class Tuple implements Comparable<Tuple>
{
public String type;
public double cost;
public Tuple parent;
public Tuple(String type, double cost, Tuple parent) {
this.type = type;
this.cost = cost;
this.parent = parent;
}
public int compareTo(Tuple o) {
return Double.compare( cost, o.cost );
}
}
public Tuple findOptimalConversionSequence(String from, String target)
{
PriorityQueue<Tuple> queue = new PriorityQueue<Tuple>();
// Add a dummy start node to the queue
queue.add( new Tuple( from, 0.0, null ));
// Perform the search
while ( !queue.isEmpty() )
{
// Pop the most promising candidate from the list
Tuple tuple = queue.remove();
// If the type matches the target type, return
if ( tuple.type.equals( target ) )
return tuple;
// If we have reached a dead-end, backtrack
if ( !fromMap.containsKey( tuple.type ))
continue;
// Otherwise get all of the possible conversions to
// perform next and add their costs
for ( ITypeConverter<?,?> converter : fromMap.get( tuple.type ))
{
String type = converter.getTargetType();
double cost = tuple.cost + Math.log( converter.fidelity() );
queue.add( new Tuple( type, cost, tuple ));
}
}
// No solution
return null;
}
public static void convert(String from, String target)
{
TypeConversion tc = new TypeConversion();
// Build a conversion lookup table
tc.buildConversionMap();
// Find the tail of the optimal conversion chain.
Tuple tail = tc.findOptimalConversionSequence( from, target );
if ( tail == null ) {
System.out.println( "No conversion possible from " + from + " to " + target );
return;
}
// Reconstruct the conversion path (skip dummy node)
List<Tuple> solution = new ArrayList<Tuple>();
for ( ; tail.parent != null ; tail = tail.parent )
solution.add( tail );
Collections.reverse( solution );
StringBuilder sb = new StringBuilder();
Formatter formatter = new Formatter(sb);
sb.append( "The optimal conversion sequence from " + from + " to " + target + " is:\n" );
for ( Tuple tuple : solution ) {
formatter.format( "%20s to %20s, cost = %f", tuple.parent.type, tuple.type, tuple.cost );
if ( tuple.cost == tuple.parent.cost )
sb.append( " *PERFECT*");
sb.append( "\n" );
}
System.out.println( sb.toString() );
}
public static void main(String[] args)
{
// Run two tests
convert( "string", "integer" );
convert( "string", "host.dotquad.string" );
}
}
