How to compare Rails ''executables" before and after refactor? - ruby-on-rails

In C, I could generate an executable, do an extensive rename only refactor, then compare executables again to confirm that the executable did not change. This was very handy to ensure that the refactor did not break anything.
Has anyone done anything similar with Ruby, particularly a Rails app? Strategies and methods would be appreciated. Ideally, I could run a script that output a single file of some sort that was purely bytecode and was not changed by naming changes. I'm guessing JRuby or Rubinus would be helpful here.

I don't think this strategy will work for Ruby. Unlike C, where the compiler throws away the names, most of the things you name in Ruby carry that name with them. That includes classes, modules, constants, and instance variables.
Automated unit and integration tests are the way to go to support Ruby refactoring.

Interesting question -- I like the definitive "yes" answer you can get from this regression strategy, at least for the specific case of rename refactoring.
I'm not expert enough to tell whether you can compile ruby (or at least a subset, without things like eval) but there seem to be some hints at:
http://www.hokstad.com/the-problem-with-compiling-ruby.html
http://rubini.us/2011/03/17/running-ruby-with-no-ruby/
Supposing that a complete compilation isn't possible, what about an abstract interpretation approach? You could parse the ruby into an AST, emit some kind of C code from the AST, and then compile the C code. The C code would not need to fully capture the behavior of the ruby code. It would only need to be compilable and to be distinct whenever the ruby was distinct. (Actually running it could result in gibberish, or perhaps an immediate memory violation error.)
As a simple example, suppose that ruby supported multiplication and C didn't. Then you could include a static mult function in your C code and translate from:
a = b + c*d
to
a = b + mult(c,d)
and the resulting compiled code would be invariant under name refactoring but would show discrepancies under other sorts of change. The mult function need not actually implement multiplication, you could have one of these instead:
static int mult( int a, int b ) { return a + b; } // pretty close
static int mult( int a, int b ) { return *0; } // not close at all, but still sufficient
and you'd still get the invariance you need as long as the C compiler isn't going to inline the definition. The same sort of translation, from an uncompilable ruby construct to a less functional but distinct C construct, should work for object manipulation and so forth, mapping class operations into C structure references. The key point is just that you want to keep the naming relationships intact while sacrificing actual behavior.
(I wonder whether you could do something with a single C struct that has members (all pointers to the same struct type) named after all the class and property names in the ruby code. Class and object operations would then correspond to nested dereference operations using this single structure. Just a notion.)
Even if you cannot formulate a precise mapping, an imprecise mapping that misses some minor distinctions might still be enough to increase confidence in the original name refactoring.
The quickest way to implement such a scheme might be to map from byte code to C (rather from the ruby AST to C). That would save a lot of parsing, but the mapping would be harder to understand and verify.

Related

Dart int and double being interned? Treated specially by identical()?

Dart has both:
an equality operator == and
a top-level function named identical().
By the choice of syntax, it feels natural to want to use Dart's == operator more frequently than identical(), and I like that. In fact, the Section on Equality of the Idiomatic Dart states that "in practice, you will rarely need to use" identical().
In a recent answer to one of my questions concerning custom filters, it seems that Angular Dart favors use of identical() rather than == when trying to determine whether changes to a model have reached a steady state. (Which can make sense, I suppose, for large models for reasons of efficiency.)
This got me to thinking about identity of int's and so I wrote some tests of identical() over ints. While I expected that small ints might be "interned/cached" (e.g. similar to what is done by Java's Integer.valueOf()), to my surprise, I can't seem to generate two ints that are equal but not identical. I get similar results for double.
Are int and double values being interned/cached? Or maybe identical() is treating them specially? Coming from a Java background, I used to equate equate Dart's:
== to Java's equal() method and
identical() to Java's equality test ==.
But that now seems wrong. Anyone know what is going on?
Numbers are treated specially. If their bit-pattern is the same they must be identical (although it is still debated if this includes the different versions of NaNs).
The main reasons are expectations, leaking of internal details and efficiency.
Expectations: users expect numbers to be identical. It goes against common sense that x == y (for two integers) but not identical(x, y).
Leaking of internal details: the VM uses SMIs (SMall Integers) to represent integers in a specific range (31 bits on 32-bit machines, 63 on 64-bit machines). These are canonicalized and are always identical. Exposing this internal implementation detail would lead to inconsistent results depending on which platform you run.
Efficiency: the VM wants to unbox numbers wherever it can. For example, inside a method doubles are frequently moved into registers. However, keeping track of the original box can be cumbersome and difficult.
foo(x, y) {
var result = x;
while(y-- > 0) {
result += x;
}
return result;
}
Suppose, that the VM optimizes this function and moves result into a register (unboxing x in the process). This allows for a tight loop where result is then efficiently modified. The difficult case happens, when y is 0. The loop wouldn't execute and foo would return x directly. In other words, the following would need to be true:
var x = 5.0;
identical(x, foo(x, 0)); // should be true.
If the VM unboxed the result variable in the method foo it would need to allocate a fresh box for the result and the identical call would therefore return false.
By modifying the definition of identical all these problems are avoided. It comes with a small cost to the identical check, though.
Seems like I posted too quickly. I just stumbled on Dart Issue 13084: Spec says identical(1.0, 1) is true, even if they have different types which led me to the Dart section on Object Identity of the language spec. (I had previously search for equality in the spec but not object identity.)
Here is an excerpt:
The predefined dart function identical() is defined such that identical(c1, c2) iff:
- c1 evaluates to either null or an instance of
bool and c1 == c2, OR
- c1 and c2 are instances of int and c1 == c2, OR
- c1 and c2 are constant strings and c1 == c2, OR
- c1 and c2 are instances of double and one of the following holds: ...
and there are more clauses dealing with lists, maps and constant objects. See the language spec for the full details. Hence, identical() is much more than just a simple test for reference equality.
I can't remember the source for this, but somewhere on dartlang.org or the issue tracker it was said that num, int and double are indeed getting special treatment. One of those special treatments is that you can't subclass those types for performance reasons, but there may be more. What exactly this special treatment entails can probably only be answered by the developers, or maybe someone who knows the specification by heart, but one thing can be inferred:
The numeric types are dart objects - they have methods you can call on their instances. But they also have qualities of primitive data types, as you can do int i = 3;, while a pure object should have a new keyword somewhere. This is different from Java, where there are real primitive types and real objects wrapping them and exposing instance methods.
While the technical details certainly are more complex, if you think about dart numerics as a blend of object and primitive, your comparison to Java still makes sense. In Java, new Integer(5).equals(new Integer(5)) evaluates to true, and so does 5==5.
I am aware this is not a very technically correct answer, but I hope it's still useful to make sense of the behaviour of dart numerics when coming from a Java background.

Why is it that Lua doesn't have augmented assignment? [duplicate]

This is a question I've been mildly irritated about for some time and just never got around to search the answer to.
However I thought I might at least ask the question and perhaps someone can explain.
Basically many languages I've worked in utilize syntactic sugar to write (using syntax from C++):
int main() {
int a = 2;
a += 3; // a=a+3
}
while in lua the += is not defined, so I would have to write a=a+3, which again is all about syntactical sugar. when using a more "meaningful" variable name such as: bleed_damage_over_time or something it starts getting tedious to write:
bleed_damage_over_time = bleed_damage_over_time + added_bleed_damage_over_time
instead of:
bleed_damage_over_time += added_bleed_damage_over_time
So I would like to know not how to solve this if you don't have a nice solution, in that case I would of course be interested in hearing it; but rather why lua doesn't implement this syntactical sugar.
This is just guesswork on my part, but:
1. It's hard to implement this in a single-pass compiler
Lua's bytecode compiler is implemented as a single-pass recursive descent parser that immediately generates code. It does not parse to a separate AST structure and then in a second pass convert that to bytecode.
This forces some limitations on the grammar and semantics. In particular, anything that requires arbitrary lookahead or forward references is really hard to support in this model. This means assignments are already hard to parse. Given something like:
foo.bar.baz = "value"
When you're parsing foo.bar.baz, you don't realize you're actually parsing an assignment until you hit the = after you've already parsed and generated code for that. Lua's compiler has a good bit of complexity just for handling assignments because of this.
Supporting self-assignment would make that even harder. Something like:
foo.bar.baz += "value"
Needs to get translated to:
foo.bar.baz = foo.bar.baz + "value"
But at the point that the compiler hits the =, it's already forgotten about foo.bar.baz. It's possible, but not easy.
2. It may not play nice with the grammar
Lua doesn't actually have any statement or line separators in the grammar. Whitespace is ignored and there are no mandatory semicolons. You can do:
io.write("one")
io.write("two")
Or:
io.write("one") io.write("two")
And Lua is equally happy with both. Keeping a grammar like that unambiguous is tricky. I'm not sure, but self-assignment operators may make that harder.
3. It doesn't play nice with multiple assignment
Lua supports multiple assignment, like:
a, b, c = someFnThatReturnsThreeValues()
It's not even clear to me what it would mean if you tried to do:
a, b, c += someFnThatReturnsThreeValues()
You could limit self-assignment operators to single assignment, but then you've just added a weird corner case people have to know about.
With all of this, it's not at all clear that self-assignment operators are useful enough to be worth dealing with the above issues.
I think you could just rewrite this question as
Why doesn't <languageX> have <featureY> from <languageZ>?
Typically it's a trade-off that the language designers make based on their vision of what the language is intended for, and their goals.
In Lua's case, the language is intended to be an embedded scripting language, so any changes that make the language more complex or potentially make the compiler/runtime even slightly larger or slower may go against this objective.
If you implement each and every tiny feature, you can end up with a 'kitchen sink' language: ADA, anyone?
And as you say, it's just syntactic sugar.
Another reason why Lua doesn't have self-assignment operators is that table access can be overloaded with metatables to have arbitrary side effects. For self assignment you would need to choose to desugar
foo.bar.baz += 2
into
foo.bar.baz = foo.bar.baz + 2
or into
local tmp = foo.bar
tmp.baz = tmp.baz + 2
The first version runs the __index metamethod for foo twice, while the second one does so only once. Not including self-assignment in the language and forcing you to be explicit helps avoid this ambiguity.

How does one define C/C++ style macros in ruby/rails

I have noticed that ruby on rails has some small macros that have a c feel.
__FILE__ and __LINE__
For instance. This makes debugging and generating unique cache keys much easier.
store = Rails.cache.fetch("#{__FILE__}|#{__LINE__}|#{self.store_id}")...
Most handy when I am likely to get the same store over and over for a given set of orders.
What I'm wondering though is; can I create my own MACROS/defines? Precompiled code while not a good idea sometimes, does have its place.
QUESTION: Is there a way to create c style macros in ruby/rails?
They look like C but they are not macros, they are just objects, like everything else in ruby, these in particular having the power to determine the name and line number of the file currently being interpreted.
C macros are just references to compile-time executable code. Ruby is ... like, totally, completely opposite from this :-) Compilation in ruby is really syntax evaluation and then interpretation. If some languages like C, C++ and Java are early-binding and others like PHP and Python are late-binding, ruby has missed the bus and will get there at the last possible moment.
So if you want to create your own stuff, create a module and/or class having something inside. What you create inside will be a variable, symbol, constant, or method, perhaps local, of the instance, or of the class, perhaps public, protected or private. It's generally not good form to use the special naming conventions __FOO__ or others (even if they are valid names).
(end of answer... if I am hearing what you're asking, perhaps this will help you make the mental leap that is needed)...
I went from C, to C++, to Java, then to Ruby. After a brief time in the clinic for reformed Java programmers, I was able to understand how amazingly powerful ruby is. But you really have to think about how to solve problems differently.
I highly recommend the book Eloquent Ruby, linked here to Amazon.
__FILE__ and __LINE__ aren't macros per se, they're keywords (that return String values).
Having said that, you are allowed to define constants and methods that start with an underscore or _ character. For example:
__MICKEY__ = :mouse
Now, a constant can refer to anything—a String, an expression, even a code block (a Proc or lambda) or a whole Class.
That means that instead of compile-time macro expansion (Ruby, being an interpreted language, doesn't really have a 'compile phase') you can take advantage of (arguably more powerful) run-time code execution.
For instance, the C++ macro:
#define SUM(a, b, c) a + b + c
Can be suitably approximated in Ruby as
def SUM(a, b, c); a + b + c; end
I say 'approximated' because a) in this case we're defining a method, not a constant or a macro, and b) C++ macros can still work when evaluated with a different number of arguments than they expect (because C++ macros are expanded, meaning, it's as if we did find and replace), whereas Ruby method call semantics come into play, such as checking arity of methods and arguments.
To get around that, in our particular example, we can do (something arguably even more flexible)
def SUM(*args); args.inject(:+); end
SUM(1, 2) # => 3
SUM(1, 2, 3, 5) # => 11
As you can see, you can achieve a similar (possibly even greater) level of expressiveness that C++ macros provide by leveraging Ruby's dynamic nature.

Is the "expression problem" solvable in F#?

I've been watching an interesting video in which type classes in Haskell are used to solve the so-called "expression problem". About 15 minutes in, it shows how type classes can be used to "open up" a datatype based on a discriminated union for extension -- additional discriminators can be added separately without modifying / rebuilding the original definition.
I know type classes aren't available in F#, but is there a way using other language features to achieve this kind of extensibility? If not, how close can we come to solving the expression problem in F#?
Clarification: I'm assuming the problem is defined as described in the previous video
in the series -- extensibility of the datatype and operations on the datatype with the features of code-level modularization and separate compilation (extensions can be deployed as separate modules without needing to modify or recompile the original code) as well as static type safety.
As Jörg pointed out in a comment, it depends on what you mean by solve. If you mean solve including some form of type-checking that the you're not missing an implementation of some function for some case, then F# doesn't give you any elegant way (and I'm not sure if the Haskell solution is elegant). You may be able to encode it using the SML solution mentioned by kvb or maybe using one of the OO based solutions.
In reality, if I was developing a real-world system that needs to solve the problem, I would choose a solution that doesn't give you full checking, but is much easier to use.
A sketch would be to use obj as the representation of a type and use reflection to locate functions that provide implementation for individual cases. I would probably mark all parts using some attribute to make checking easier. A module adding application to an expression might look like this:
[<Extends("Expr")>] // Specifies that this type should be treated as a case of 'Expr'
type App = App of obj * obj
module AppModule =
[<Implements("format")>] // Specifies that this extends function 'format'
let format (App(e1, e2)) =
// We don't make recursive calls directly, but instead use `invoke` function
// and some representation of the function named `formatFunc`. Alternatively
// you could support 'e1?format' using dynamic invoke.
sprintfn "(%s %s)" (invoke formatFunc e1) (invoke formatFunc e2)
This does not give you any type-checking, but it gives you a fairly elegant solution that is easy to use and not that difficult to implement (using reflection). Checking that you're not missing a case is not done at compile-time, but you can easily write unit tests for that.
See Vesa Karvonen's comment here for one SML solution (albeit cumbersome), which can easily be translated to F#.
I know type classes aren't available in F#, but is there a way using other language features to achieve this kind of extensibility?
I do not believe so, no.
If not, how close can we come to solving the expression problem in F#?
The expression problem is about allowing the user to augment your library code with both new functions and new types without having to recompile your library. In F#, union types make it easy to add new functions (but impossible to add new union cases to an existing union type) and class types make it easy to derive new class types (but impossible to add new methods to an existing class hierarchy). These are the two forms of extensibility required in practice. The ability to extend in both directions simultaneously without sacrificing static type safety is just an academic curiosity, IME.
Incidentally, the most elegant way to provide this kind of extensibility that I have seen is to sacrifice type safety and use so-called "rule-based programming". Mathematica does this. For example, a function to compute the symbolic derivative of an expression that is an integer literal, variable or addition may be written in Mathematica like this:
D[_Integer, _] := 0
D[x_Symbol, x_] := 1
D[_Symbol, _] := 0
D[f_ + g_, x_] := D[f, x] + D[g, x]
We can retrofit support for multiplication like this:
D[f_ g_, x_] := f D[g, x] + g D[f, x]
and we can add a new function to evaluate an expression like this:
E[n_Integer] := n
E[f_ + g_] = E[f] + E[g]
To me, this is far more elegant than any of the solutions written in languages like OCaml, Haskell and Scala but, of course, it is not type safe.

Alpha renaming in many languages

I have what I imagine will be a fairly involved technical challenge: I want to be able to reliably alpha-rename identifiers in multiple languages (as many as possible). This will require special consideration for each language, and I'm asking for advice for how to minimize the amount of work I need to do by sharing code. Something like a unified parsing or abstract syntax framework that already has support for many languages would be great.
For example, here is some python code:
def foo(x):
def bar(y):
return x+y
return bar
An alpha renaming of x to y changes the x to a y and preserves semantics. So it would become:
def foo(y):
def bar(y1):
return y+y1
return bar
See how we needed to rename y to y1 in order to keep from breaking the code? That is why this is a hard problem. It seems like the program would have to have a pretty good knowledge of what constitutes a scope, rather than just doing, say, a string search and replace.
I would also like to preserve as much of the formatting as possible: comments, spacing, indentation. But that is not 100% necessary, it would just be nice.
Any tips?
To do this safely, you need to be able to to determine
all the identifiers (and those things that are not, e.g., the middle of a comment) in your code
the scopes of validity for each identifer
the ability to substitute a new identifier for an old one in the text
the ability to determine if renaming an identifier causes another name to be shadowed
To determine identifiers accurately, you need a least a langauge-accurate lexer. Identifiers in PHP look different than the do in COBOL.
To determine scopes of validity, you have to be determine program structure in practice, since most "scopes" are defined by such structure. This means you need a langauge-accurate parser; scopes in PHP are different than scopes in COBOL.
To determine which names are valid in which scopes, you need to know the language scoping rules. Your language may insist that the identifier X will refer to different Xes depending on the context in which X is found (consider object constructors named X with different arguments). Now you need to be able to traverse the scope structures according to the naming rules. Single inheritance, multiple inheritance, overloading, default types all will pretty much require you to build a model of the scopes for the programs, insert the identifiers and corresponding types into each scope, and then climb from the point of encounter of an identifier in the program text through the various scopes according to the language semantics. You will need symbol tables, inheritance linkages, ASTs, and the ability to navigage all of these. These structures are different from PHP and COBOL, but they share lots of common ideas so you likely need a library with the common concept support.
To rename an identifier, you have to modify the text. In a million lines of code, you need to point carefully. Modifying an AST node is one way to point carefully. Actually, you need to modify all the identifiers that correspond to the one being renamed; you have to climb over the tree to find them all, or record in the AST where all the references exist so they can be found easily. After modifyingy the tree you have to regenerate the source text after modifying the AST. That's a lot of machinery; see my SO answer on how to prettyprint ASTs preseriving all of the stuff you reasonably suggest should be preserved.
(Your other choice is to keep track in the AST of where the text for the string is,
and the read/patch/write the file.)
Before you update the file, you need to check that you haven't shadowed something. Consider this code:
{ local x;
x=1;
{local y;
y=2;
{local z;
z=y
print(x);
}
}
}
We agree this code prints "1". Now we decide to rename y to x.
We've broken the scoping, and now the print statement which referred
conceptually to the outer x refers to an x captured by the renamed y. The code now prints "2", so our rename broke it. This means that one must check all the other identifiers in scopes in which the renamed variable might be found, to see if the new name "captures" some name we weren't expecting. (This would be legal if the print statement printed z).
This is a lot of machinery.
Yes, there is a framework that has almost all of this as well as a number of robust language front ends. See our DMS Software Reengineering Toolkit. It has parsers producing ASTs, prettyprinters to produce text back from ASTs, generic symbol table management machinery (including support for multiple inheritance), AST visiting/modification machinery. Ithas prettyprinting machinery to turn ASTs back into text. It has front ends for C, C++, COBOL and Java that implement name and type resolution (e.g. instanting symbol table scopes and identifier to symbol table entry mappings); it has front ends for many other langauges that don't have scoping implemented yet.
We've just finished an exercise in implementing "rename" for Java. (All the above issues of course appeared). We about about to start one for C++.
You could try to create Xtext based implementations for the involved languages. The Xtext framework provides reliable infrastructure for cross language rename refactoring. However, you'll have to provide a grammar a at least a "good enough" scope resolution for each language.
Languages mostly guarantee tokens will be unique, whatever the context. A naive first approach (and this will break many, many pieces of code) would be:
cp file file.orig
sed -i 's/\b(newTokenName)\b/TEMPTOKEN/g' file
sed -i 's/\b(oldTokenName)\b/newTokenName/g' file
With GNU sed, this will break on PHP. Rewriting \b to a general token match, like ([^a-zA-Z~$-_][^a-zA-Z0-9~$-_]) would work on most C, Java, PHP, and Python, but not Perl (need to add # and % to the token characters. Beyond that, it would require a plugin architecture that works for any language you wanted to add. At some point, there will be two languages whose variable and function naming rules will be incompatible, and at that point, you'll need to do more and more in the plugin.

Resources