I am working on a custom programming language. When compiling it, the parser first converts the text into a simple stream of tokens. The tokens are then converted into a simple tree. The tree is then converted into an object graph (with holes in it, as the types haven't necessarily been fully figured out yet). The holey graph is then transformed into a compact object graph.
Then we can go further and compile it to, say, JavaScript. The compact object graph is then transformed into a JavaScript AST. The JS AST is then transformed into a "concrete" syntax tree (with whitespace and such), and then that is converted into the JS text.
So in going from text to compact object graph, there are 4 transformation steps between 5 representations (text -> token_list -> tree -> holey_graph -> graph). In other situations (other languages), you might have more or fewer.
The way I am doing this transformation now is very ad hoc and doesn't keep track of line numbers, so it's impossible to really tell where an error is coming from. I would like to fix that.
In my case, I am wondering how you could create a data model to keep track of the line of text where something was defined. This way, you could report any compilation errors nicely to the developer. The way I have modeled that so far is with a sort of "folding" model, as I'm calling it. The initial "fold" is on the text -> token_list transformation. For each token, it keeps track of three things: the line, the column, and the length of the token's text. At first you might model it like this:
{
  token: 'function',
  line: 10,
  column: 2,
  size: 8
}
But that is tying two concepts into one object: the token itself, and the "fold" as I am calling it. Really it would be better like this:
const tokenFold = {
  line: 10,
  column: 2,
  size: 8
}
const token = {
  value: 'function'
}
// Bind the two together.
tokenFold.data = token
token.fold = tokenFold
Then you transform the token into an AST node in the simple tree. That might look like:
const treeNode = {
  type: 'function'
}
const treeFold = {
  previous: tokenFold,
  data: treeNode
}
treeNode.fold = treeFold
And so on, connecting the dots like this. In the end you would have a chain of folds that could, theoretically, be traversed from the compact object graph back to the text, so that if there was a compile error (when typechecking, for example), you could report the exact line number and everything to the developer. The navigation would look something like this:
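// graph fold -> holey graph fold -> tree fold -> token fold
const fold = compactObjectGraph.fold
  .previous.previous.previous
// line/column/size live on the token-level fold, not on its data
fold.line
fold.column
fold.size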
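Rather than hard-coding the number of `.previous` hops, the linear chain can be walked until a fold carrying a source position is reached. A minimal sketch, assuming each fold has an optional `previous` link and only the token-level fold stores `line`, `column`, and `size`; the helpers `makeFold` and `findSourceFold` and the `holeyNode`/`graphNode` shapes are hypothetical:

```javascript
// Hypothetical helper: create a fold wrapping `data` and bind the two together.
function makeFold(data, previous = null, position = {}) {
  const fold = { previous, data, ...position };
  data.fold = fold;
  return fold;
}

// Walk back through `previous` links until a fold with a source position is found.
function findSourceFold(node) {
  let fold = node.fold;
  while (fold && fold.line === undefined) {
    fold = fold.previous;
  }
  return fold ?? null;
}

// Build the linear chain: token -> tree node -> holey graph node -> graph node.
const token = { value: 'function' };
const tokenFold = makeFold(token, null, { line: 10, column: 2, size: 8 });
const treeNode = { type: 'function' };
makeFold(treeNode, tokenFold);
const holeyNode = { kind: 'function', type: null }; // type not yet resolved
makeFold(holeyNode, treeNode.fold);
const graphNode = { kind: 'function' };
makeFold(graphNode, holeyNode.fold);

const source = findSourceFold(graphNode);
console.log(source.line, source.column, source.size); // 10 2 8
```

The walk still assumes every fold has exactly one `previous`, which is where this model strains once a node is built from several inputs.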
In theory. The problem is that a node in the "compact object graph" might have been created not from a simple linear chain of inputs, but from several inputs at once. I have only modeled this on paper so far, but I am starting to think there isn't actually a clear way to record, object by object, how each was transformed using this kind of "fold" system.
The question is, how can I define the data model to allow for getting back to the source text line/column number, given there is a complex sequence of transformations from one data structure to the next? That is, at a high level, what is a way to model this that will allow you to isolate the transformation data structures, yet be able to map from the last generated one to the first, to find how some compact object graph node was actually represented in the original source text?
I would create a data structure containing the filename, line and column. In C++ it may work well to store a reference to this structure, rather than copy it to many places.
There aren't really that many ways to solve this, but having a single structure that is reusable across your other data structures is almost certainly the right solution.
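A minimal sketch of that idea in JavaScript, where object references play the role of the C++ reference: one immutable location record per source span, shared by every artifact derived from it. The names `SourceLocation`, `tokenize`, and `parseFunction` are hypothetical, columns are assumed to be 1-based, and the tokenizer is deliberately naive (whitespace-separated words).

```javascript
// One immutable record per source span; derived artifacts share it by reference.
class SourceLocation {
  constructor(file, line, column, size) {
    this.file = file;
    this.line = line;
    this.column = column;
    this.size = size;
    Object.freeze(this);
  }
  toString() {
    return `${this.file}:${this.line}:${this.column}`;
  }
}

// Naive tokenizer: splits each line on whitespace and stamps every token with a location.
function tokenize(file, text) {
  const tokens = [];
  text.split('\n').forEach((lineText, i) => {
    for (const match of lineText.matchAll(/\S+/g)) {
      tokens.push({
        value: match[0],
        loc: new SourceLocation(file, i + 1, match.index + 1, match[0].length),
      });
    }
  });
  return tokens;
}

// A later pass reuses (shares) the location of the token the node was built from.
function parseFunction(tokens) {
  const keyword = tokens[0];
  const name = tokens[1];
  return { type: 'function', name: name.value, loc: keyword.loc };
}

const tokens = tokenize('main.lang', 'x = 1\n  function foo');
const node = parseFunction(tokens.slice(3)); // tokens from line 2
console.log(`${node.loc}`); // main.lang:2:3
console.log(node.loc === tokens[3].loc); // true
```

Because the record is shared rather than copied, every later pass only has to pass along a reference, which is the "stamping" the next answer describes.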
I answered your question on Quora in July, so maybe you missed it: https://qr.ae/pvkrwJ
Basically you have to stamp all the compiler artifacts with the source information from which they are derived. This is best represented as some kind of structure (see Mats' response). Yeah, that takes effort, because you have to do it everywhere in the compiler.
To do a perfect job, you'd need to stamp each artifact with the complete set of source items that caused its generation; you're essentially producing a dependency graph. (You could represent such sets as trees of subsets to maximize sharing.) Then any complaint the compiler issued could clearly identify the set of causes.
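A minimal sketch of the "perfect" variant, assuming each artifact carries an `origin` that is either a single location (a leaf) or a union of the origins it was built from. Unions hold references to their children instead of copying them, so shared sub-origins are stored once, and the full cause set is only flattened when a diagnostic needs it. All names here (`leaf`, `union`, `causes`) are hypothetical.

```javascript
// A leaf origin points at one source span.
function leaf(file, line, column) {
  return { kind: 'leaf', file, line, column };
}

// A union origin records every origin that contributed to an artifact.
// Children are shared by reference, so common subsets are not duplicated.
function union(...children) {
  return { kind: 'union', children };
}

// Flatten an origin tree into a de-duplicated list of "file:line:column" strings.
function causes(origin) {
  const seen = new Set();
  const visit = (o) => {
    if (o.kind === 'leaf') seen.add(`${o.file}:${o.line}:${o.column}`);
    else o.children.forEach(visit);
  };
  visit(origin);
  return [...seen].sort(); // plain string sort, adequate for display
}

// Example: a typed call depends on the call site, an argument, and a declaration.
const declOrigin = leaf('lib.lang', 3, 1);
const callOrigin = leaf('main.lang', 12, 5);
const argOrigin = union(callOrigin, leaf('main.lang', 12, 9));
const typedCall = { kind: 'call', origin: union(argOrigin, declOrigin, callOrigin) };

console.log(causes(typedCall.origin).join(', '));
// lib.lang:3:1, main.lang:12:5, main.lang:12:9
```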
To do a less perfect job, you can pick any one of the contributing items and use it as the source location. That means a compiler complaint will identify only one source location that might be the cause, and the reader will have to guess at the others if that isn't the principal source of the problem. A judicious choice of which cause's source information to keep can make the answer right much of the time, and that's probably good enough.
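The cheaper variant keeps a single location per artifact and needs a rule for choosing it when several inputs contribute. A minimal sketch, assuming the hypothetical policy "prefer an input explicitly marked `primary`, otherwise take the earliest location in the source"; `compareLoc` and `pickLocation` are illustrative names:

```javascript
// Order two locations by file name, then line, then column.
function compareLoc(a, b) {
  if (a.file !== b.file) return a.file < b.file ? -1 : 1;
  return a.line - b.line || a.column - b.column;
}

// Choose a single location for an artifact built from several inputs
// (assumed non-empty). Hypothetical policy: an input flagged `primary`
// wins; otherwise the earliest location in the source is used.
function pickLocation(inputs) {
  const primary = inputs.find((input) => input.primary);
  if (primary) return primary.loc;
  return inputs
    .map((input) => input.loc)
    .reduce((best, loc) => (compareLoc(loc, best) < 0 ? loc : best));
}

const inputs = [
  { loc: { file: 'main.lang', line: 12, column: 9 } },
  { loc: { file: 'main.lang', line: 12, column: 5 } },
];
const chosen = pickLocation(inputs);
console.log(chosen.line, chosen.column); // 12 5

const withPrimary = [...inputs, { loc: { file: 'main.lang', line: 40, column: 1 }, primary: true }];
console.log(pickLocation(withPrimary).line); // 40
```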
I am using System.Linq.Dynamic.Core to parse custom statistical templates, and was wondering if it is possible to somehow extend the library's functionality to parse more mathematical functions. Specifically, in this instance I needed to calculate the absolute value of a variable. I have managed to do this with the already supported "iif" function (i.e. "iif(a>-a, a, -a)"), but I was wondering if there is a way to extend the library to add an "abs()" function, and similarly other functions I may need in the future (such as square root, etc.).
Any pointers to the right direction?
The System.Linq.Dynamic.Core library is not really designed for this extensibility.
However, you can take a look at System.Linq.Dynamic.Core.Parser.ExpressionParser.cs for examples, such as the IIF you already mentioned.
I'm using a bidirectional map to link a list of names to a particular single name (for example, to correlate cities and countries). So, my definition of the type is something like:
using CitiesVsCountries = boost::bimap<boost::bimaps::unordered_set_of<std::string>, std::string>;
But one question intrigues me:
What's the advantage of using a boost::bimaps::unordered_set_of<std::string> vs. a simple std::unordered_set? The advantage of the bimap is clear (it avoids having to synchronize two maps by hand), but I can't really see what value the Boost version of the unordered set adds, nor can I find any document detailing the difference.
Thanks a lot for your help.
Using Xamarin.iOS (or just the iOS API), I need to get the outline path of text as rendered in some typeface. I need the exact outline because I'm going to tessellate it and apply 2D and 3D transformations to it.
In Java, this is straightforward: turn the rendered text into a Shape (via GlyphVector).
In GDI+ (.NET) this can be done with System.Drawing.Drawing2D.GraphicsPath: add the text (AddString) and get the path. This is not available in Xamarin.iOS.
Is there a straightforward way to create paths for rendered text in iOS or Xamarin.iOS?
The MonoTouch.CoreText.CTFont.GetPathForGlyph overloads that return instances of CGPath are likely what you're looking for. They map to the native CTFontCreatePathForGlyph API, so Apple's documentation and samples for that function apply as well.
You'll need to iterate over your string glyph by glyph and create a subpath for each one, but you should end up with your string as a vector path that you can transform further as needed.
I'm using the same zachrone iphonepdf code, but I don't get the text: my text view shows nothing. What's the problem?
Here is my code:
NSString *text = convertPDF(@"Course.pdf");
texview.text = text;
But nothing shows up in the text view.
The text extractor zachron / pdfiphone (I assume you meant that one) is extremely naive and makes very many assumptions.
It ignores the PDF file structure and therefore completely ignores whether the data it inspects is still used in the current revision.
It ignores encryption and therefore will fail completely for many documents with usage restrictions.
It completely ignores font encodings and implicitly assumes an ASCII-ish one. This is fairly often true for small PDFs containing only English text and no embedded fonts; otherwise the result can be anything.
... many many more assumptions ...
Unless one only has to deal with very simple documents and the extracted text is not really necessary for the functionality of one's code, I would propose using different code for text extraction.