I'm making a little toy stack language, and I'm building it around a central concept of bytecode, much like the JVM.
One of the opcodes I want to add is a print_var instruction that prints the value of a local variable. My question is: which print function do I use, println() or print()?
Well really, it's not a question of what to use, but of what the standard is in actual applications of this process. Are systems designed around a bare print(), with a println() function built on top of it, or are the two designed hand-in-hand to do separate things?
I'm taking a look at Java's PrintWriter source, which is used by System.out.println(), and it's defined as follows:
public void println(boolean x) {
    synchronized (lock) {
        print(x);
        println();
    }
}
Obviously Java made a print() function and based the println() off of it. Are many other major languages like that?
Let's see:
C: has printf -- no println
C++: has ostream -- no println
C#: has WriteLine (and Write)
Haskell: has putStr and putStrLn
...
Long story short: you can find both. So simply implement what you think works best for your requirements and for your approach in general. In other words: have a look at the other opcodes you have or want to implement and assess whether they lean towards "convenience" (then you would have both print and println), or whether they focus on providing "elementary building blocks" (then you might only provide print).
Beyond that: keep in mind that providing println() has one major advantage: the user doesn't need to worry about different "new line" characters for different operating systems.
I recently started using Lex. As a simple way to explain the problem I encountered: suppose I'm trying to build a lexical analyser with Flex that prints all the letters and also all the bigrams in a given text. That seems very easy and simple, but once I implemented it, I realised that it shows bigrams first and only shows letters when they stand alone. For example, for the following text
QQQZ ,JQR
The result is
Bigram QQ
Bigram QZ
Bigram JQ
Letter R
Done
This is my lex code
%{
#include <stdio.h>
%}
letter [A-Za-z]
Separ  [ \t\n]
%%
{letter}    { printf(" Letter %c\n", yytext[0]); }
{letter}{2} { printf(" Bigram %s\n", yytext); }
%%
int main(void)
{
    yylex();
    printf("Done");
    return 0;
}
My question is: how can I perform the two analyses separately, knowing that my actual problem isn't as simple as this example?
Lexical analysers divide the source text into separate tokens. If your problem looks like that, then (f)lex is an appropriate tool. If your problem does not look like that, then (f)lex is probably not the correct tool.
Doing two simultaneous analyses of text is not really a use case for (f)lex. One possibility would be to use two separate reentrant lexical analysers, arranging to feed them the same inputs. However, that will be a lot of work for a problem which could easily be solved in a few lines of C.
Since you say that your problem is different from the simple problem in your question, I did not bother to either write the simple C code or the rather more complicated code to generate and run two independent lexical analysers, since it is impossible to know whether either of those solutions is at all relevant.
If your problem really is matching two (or more) different lexemes from the same starting position, you could use one of two strategies, both quite ugly (IMHO):
I'm assuming the existence of handler functions:
void handle_letter(char ch);
void handle_bigram(char* s); /* Expects NUL-terminated string */
void handle_trigram(char* s); /* Expects NUL-terminated string */
For historical reasons, lex implements the REJECT action, which causes the current match to be discarded. The idea was to let you process a match, and then reject it in order to process a shorter (or alternate) match. With flex, the use of REJECT is highly discouraged because it is extremely inefficient and also prevents the lexer from resizing the input buffer, which arbitrarily limits the length of a recognisable token. However, in this particular use case it is quite simple:
[[:alpha:]][[:alpha:]][[:alpha:]]  { handle_trigram(yytext); REJECT; }
[[:alpha:]][[:alpha:]]             { handle_bigram(yytext); REJECT; }
[[:alpha:]]                        handle_letter(*yytext);
If you want to try this solution, I recommend using flex's debug facility (flex -d ...) in order to see what is going on.
See debugging options and REJECT documentation.
The solution I would actually recommend, although the code is a bit clunkier, is to use yyless() to reprocess part of the recognised token. This is quite a bit more efficient than REJECT; yyless() just changes a single pointer, so it has no impact on speed. Without REJECT, we have to know all the lexeme handlers which will be needed, but that's not very difficult. A complication is the interface for handle_bigram, which requires a NUL-terminated string. If your handler didn't impose this requirement, the code would be simpler.
[[:alpha:]][[:alpha:]][[:alpha:]] { handle_trigram(yytext);
char tmp = yytext[2];
yytext[2] = 0;
handle_bigram(yytext);
yytext[2] = tmp;
handle_letter(yytext[0]);
yyless(1);
}
[[:alpha:]][[:alpha:]] { handle_bigram(yytext);
handle_letter(yytext[0]);
yyless(1);
}
[[:alpha:]] handle_letter(*yytext);
See yyless() documentation
E.g. is there something like this:
O = widget:new(),
O:whirl()
I seem to recall seeing some code like this (maybe I was imagining it), but I haven't seen it in the tutorials that I've read. From what I've seen, the closest thing is this:
O = widget:new(),
widget:whirl(O)
That's not too bad, but not having to repeat widget: in the second expression would be nice.
This is syntax for parametrized modules which was removed from Erlang in R16 (2012).
No, Erlang does not have methods. Erlang has processes, not objects, and you communicate with them by messaging them, not calling methods on them. That's it. That's all there is to it.
The closest thing to new in Erlang that means what it means in Java or C++ is spawn. (The parameterized module discussion touches on something very different from what you would expect coming from a C++ type language where new reserves memory, calls a constructor, etc.)
There are actually two aspects to this: data objects (like a dict or list or something) and processes (things you create with spawn).
Within a function definition you might see something like
SomeDict = dict:new(),
or
OtherDict = dict:from_list(KV_List)
This does indeed create something, but it's not an "object" in the Java or C++ sense; it is an "object" in the (older) sense of the term: a named reference to something in memory. And indeed you interact with it the same way you demonstrated above:
D = dict:new(),
ok = some_operation(D),
where some_operation/1 might be anything, whether it is dict:foo() or something else. The part before the colon is a module identifier, telling the runtime what namespace the function you're calling exists in -- nothing more.
The other thing, spawn, is much more like new in C++, where you want to create a complete thing that is alive and has arms and legs -- a noun that can do verby things:
Pid = spawn(Mod, Fun, Args),
Pid ! {some, message},
Most of the time you only see the erlang:send/2 function (also written as the infix ! operator) in prototype or non-OTP code. Usually this is hidden from you by interface functions that abstract away the fact that you are sending async messages all the time to communicate with your processes.
For some more in-depth explanation, I recommend reading Learn You Some Erlang -- the author explains this and other basic concepts in some depth.
Whatever you do, do not fall into the simpleton's trap of thinking Erlang is Java. That is just a great way to trip over your own preconceptions and get frustrated. This is probably the #1 beginner mistake I see people make...
I was given a fragment of code (a function called bubbleSort(), written in Java, for example). How can I, or rather my program, tell if a given source code implements a particular sorting algorithm the correct way (using bubble method, for instance)?
I can force a user to provide a legitimate function by analyzing the function signature: making sure the argument and return value are arrays of integers. But I have no idea how to determine that the algorithm's logic is implemented the right way. The input code could sort values correctly, but not via the aforementioned bubble method. How can my program discern that? I do realize a lot of code parsing would be involved, but maybe there's something else that I should know.
I hope I was somewhat clear.
I'd appreciate if someone could point me in the right direction or give suggestions on how to tackle such a problem. Perhaps there are tested ways that ease the evaluation of program logic.
In general, you can't do this because of the Halting problem. You can't even decide if the function will halt ("return").
As a practical matter, there's a bit more hope. If you are looking for a bubble sort, you can decide that it has a number of parts:
a to-be-sorted datatype S with a partial order,
a container datatype C with a single instance variable A ("the array") that holds the to-be-sorted data,
a key type K ("array index"), itself with a partial order, used to access the container, such that container[K] has type S,
a comparison of two members of the container at keys A and B, where A < B according to the key partial order, that determines whether container[B] > container[A],
a swap operation on container[A], container[B] and some variable T of type S, conditionally dependent on the comparison,
a loop wrapped around the container that enumerates keys according to the partial order on K.
You can build bits of code that find each of these bits of evidence in your source code, and if you find them all, claim you have evidence of a bubble sort.
To do this concretely, you need standard program analysis machinery:
to parse the source code and build an abstract syntax tree
build symbol tables (ST) that know the type of each identifier where it is used
construct a control flow graph (CFG) so that you can check that the various recognized bits occur in an appropriate order
construct a data flow graph (DFG), so that you can determine that values recognized in one part of the algorithm flow properly to another part
[That's a lot of machinery just to get started]
From here, you can write ad hoc procedural code to climb over the AST, ST, CFG and DFG to "recognize" each of the individual parts. This is likely to be pretty messy, as each recognizer will be checking these structures for evidence of its own bit. But you can do it.
This is messy enough, and interesting enough, so there are tools which can do much of this.
Our DMS Software Reengineering Toolkit is one. DMS already contains all the machinery to do standard program analysis for several languages. DMS also has a dataflow pattern matching language, inspired by Rich and Waters's 1980s "Programmer's Apprentice" ideas.
With DMS, you can express this particular problem roughly like this (untested):
dataflow pattern domain C;
dataflow pattern swap(in out v1:S, in out v2:S, T:S):statements =
" \T = \v1;
\v1 = \v2;
\v2 = \T;";
dataflow pattern conditional_swap(in out v1:S, in out v2:S,T:S):statements=
" if (\v1 > \v2)
\swap(\v1,\v2,\T);"
dataflow pattern container_access(inout container C, in key: K):expression
= " \container.body[\K] ";
dataflow pattern size(in container:C, out: integer):expression
= " \container . size "
dataflow pattern bubble_sort(in out container:C, k1: K, k2: K):function
  " \k1 = \smallestK\(\);
    while (\k1 < \size\(\container\)) {
      \k2 = \next\(\k1\);
      while (\k2 <= \size\(\container\)) {
        \conditional_swap\(\container_access\(\container\,\k1\),
                           \container_access\(\container\,\k2\)\);
      }
    }
  ";
Within each pattern, you can write what amounts to the concrete syntax of the chosen programming language ("pattern domain"), referencing dataflows named in the pattern signature line. A subpattern can be mentioned inside another; one has to pass the dataflows to and from the subpattern by naming them. Unlike "plain old C", you have to pass the container explicitly rather than by implicit reference; that's because we are interested in the actual values that flow from one place in the pattern to another. (Just because two places in the code use the same variable, doesn't mean they see the same value).
Given these definitions, and asked to "match bubble_sort", DMS will visit the DFG (tied to the CFG/AST/ST) to try to match the pattern; where it matches, it will bind the pattern variables to the DFG entries. If it can't find a match for everything, the match fails.
To accomplish the match, each of the patterns above is converted essentially into its own DFG, and then each pattern is matched against the DFG for the code using what is called a subgraph isomorphism test. Constructing the DFG for the pattern takes a lot of machinery: parsing, name resolution, control and data flow analysis, applied to fragments of code in the original language, intermixed with various pattern meta-escapes. The subgraph isomorphism is "sort of easy" to code, but can be very expensive to run. What saves the DMS pattern matchers is that most patterns have many, many constraints [tech point: and they don't have knots], so each attempted match tends to fail pretty fast, or succeed completely.
Not shown, but by defining the various bits separately, one can provide alternative implementations, enabling the recognition of variations.
We have used this to implement quite complete factory control model extraction tools from real industrial plant controllers for Dow Chemical, on their peculiar Dowtran language (which meant building parsers, etc., as above, for Dowtran). We have a version of this prototyped for C; the data flow analysis is harder.
In C, I could generate an executable, do an extensive rename only refactor, then compare executables again to confirm that the executable did not change. This was very handy to ensure that the refactor did not break anything.
Has anyone done anything similar with Ruby, particularly a Rails app? Strategies and methods would be appreciated. Ideally, I could run a script that output a single file of some sort that was purely bytecode and was not changed by naming changes. I'm guessing JRuby or Rubinius would be helpful here.
I don't think this strategy will work for Ruby. Unlike C, where the compiler throws away the names, most of the things you name in Ruby carry that name with them. That includes classes, modules, constants, and instance variables.
Automated unit and integration tests are the way to go to support Ruby refactoring.
Interesting question -- I like the definitive "yes" answer you can get from this regression strategy, at least for the specific case of rename refactoring.
I'm not expert enough to tell whether you can compile ruby (or at least a subset, without things like eval) but there seem to be some hints at:
http://www.hokstad.com/the-problem-with-compiling-ruby.html
http://rubini.us/2011/03/17/running-ruby-with-no-ruby/
Supposing that a complete compilation isn't possible, what about an abstract interpretation approach? You could parse the ruby into an AST, emit some kind of C code from the AST, and then compile the C code. The C code would not need to fully capture the behavior of the ruby code. It would only need to be compilable and to be distinct whenever the ruby was distinct. (Actually running it could result in gibberish, or perhaps an immediate memory violation error.)
As a simple example, suppose that ruby supported multiplication and C didn't. Then you could include a static mult function in your C code and translate from:
a = b + c*d
to
a = b + mult(c,d)
and the resulting compiled code would be invariant under name refactoring but would show discrepancies under other sorts of change. The mult function need not actually implement multiplication, you could have one of these instead:
static int mult( int a, int b ) { return a + b; } // pretty close
static int mult( int a, int b ) { return 0; }     // not close at all, but still sufficient
and you'd still get the invariance you need as long as the C compiler isn't going to inline the definition. The same sort of translation, from an uncompilable ruby construct to a less functional but distinct C construct, should work for object manipulation and so forth, mapping class operations into C structure references. The key point is just that you want to keep the naming relationships intact while sacrificing actual behavior.
(I wonder whether you could do something with a single C struct that has members (all pointers to the same struct type) named after all the class and property names in the ruby code. Class and object operations would then correspond to nested dereference operations using this single structure. Just a notion.)
Even if you cannot formulate a precise mapping, an imprecise mapping that misses some minor distinctions might still be enough to increase confidence in the original name refactoring.
The quickest way to implement such a scheme might be to map from byte code to C (rather from the ruby AST to C). That would save a lot of parsing, but the mapping would be harder to understand and verify.
Coming from a Matlab and R background where the development process is very interactive (select, run selection, fix, select, run selection, fix, etc), I'm trying to figure out how F# handles this style of development, which seems pretty important in scientific applications. Here are few things that just immediately come to mind to somebody new to F#:
Selecting multiple lines gives different results than one line at a time.
let add x y = x + y
add 4.1 2.3
Selecting both lines results in float -> float -> float whereas selecting the first line results in int -> int -> int. More generally, matlab/R users are used to results printing out after each statement, not at the end.
Shadow copying can become burdensome.
let file = open2GBfile "file.txt"
process file
If you run this interactively over and over again, the 2GB file is shadow copied and you will quickly run out of memory. Making file mutable doesn't seem like the appropriate solution, since the final run of the program will never change it.
Given these issues, is it impossible for a fsi.exe based system to support matlab/R style interactive development?
[Edit: I am guessing about 2. Do objects get marked for deletion as soon as they are shadowed?]
I wouldn't expect F# to be a drop-in replacement for Matlab/R, because unlike them, F# is a general purpose programming language. Not everything you need for a specific type of work will be in the standard libraries. But that doesn't mean that the "interactive development" you describe is impossible, it may just require some effort up-front to build the library functions you depend on.
For #1, as was mentioned earlier, adding type annotations is unfortunately necessary in some cases, but also the inline keyword and "hat-types" can give you duck-typing.
For #2, I'm not clear on what your open and process functions do, versus what you want them to do. For example, the open function could:
Read the entire file at once, return the data as an array/list/etc, and then close the file
Return a FileStream object, which you call process on but forget to close.
Return a sequence expression so you can lazily iterate over the file contents
Memoize the result of one of the above, so that subsequent calls just return the cached result
One of the gazillion other ways to create an abstraction over file access.
Some of these are better suited for your task than others. Compared to Matlab & R, a general purpose language like F# gives you more ways to shoot yourself in the foot. But that's because it gives you more ways to do everything.
To #1
In FSI, you'll have to type ;; at the end of each statement to get the results directly:
> 1 + 2;;
val it : int = 3
Generally, an F# code file should be seen as a collection of individual functions that you call and evaluate interactively, not as a series of steps that each produce a value to be shown.
To #2:
This seems to be a problem with the code itself: make file a function, so the reading/copying is only done when and where it is really needed (otherwise the let binding is evaluated right at the start).