Get the name of an opcode's operand from bytecode analysis

I'm analyzing the bytecode of a class.
I can detect each opcode and its operands.
How can I get the name an operand refers to (the name of the object)?
For instance, the opcode is new and its operand is an integer (think of the following code: "new String()").
Where should I search for the name? In the constant pool of the class, and if so, how?
I'm not an expert.

ASM is the best choice. It is fast and has a simple API and a detailed User Guide. The framework completely abstracts away the constant pool and other class-format structures, so you do not have to deal with them yourself.
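For readers curious about what a library like ASM hides, here is a minimal sketch of the lookup, written in Rust purely for illustration. In the class file format, the operand of new is a two-byte big-endian index into the constant pool; that entry is a CONSTANT_Class_info whose name_index points at a CONSTANT_Utf8_info holding the internal class name. The Constant enum, the class_name function, and the hand-built pool below are hypothetical simplifications, not part of ASM or any real parser.

```rust
// Simplified model of the constant pool (the real format has more entry kinds).
#[derive(Debug)]
enum Constant {
    Utf8(String),              // CONSTANT_Utf8_info: the actual text
    Class { name_index: u16 }, // CONSTANT_Class_info: points at a Utf8 entry
    Other,                     // any entry kind this sketch does not model
}

/// Resolve the operand of a `new` instruction to an internal class name.
/// `pool` is indexed from 1 as in the class file, so slot 0 is a placeholder.
fn class_name(pool: &[Constant], operand: u16) -> Option<&str> {
    match pool.get(operand as usize)? {
        Constant::Class { name_index } => match pool.get(*name_index as usize)? {
            Constant::Utf8(s) => Some(s.as_str()),
            _ => None,
        },
        _ => None,
    }
}

fn main() {
    // Hypothetical pool: #1 is a Class entry whose name lives in #2.
    let pool = vec![
        Constant::Other,
        Constant::Class { name_index: 2 },
        Constant::Utf8("java/lang/String".to_string()),
    ];
    // `new` is followed by two bytes that form a big-endian pool index.
    let operand = u16::from_be_bytes([0x00, 0x01]);
    println!("{:?}", class_name(&pool, operand));
}
```

A real reader would first have to parse the pool out of the class file bytes, which is exactly the work ASM does for you.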

Related

How does Rust store types at runtime?

A u32 takes 4 bytes of memory, a String takes 3 pointer-sized integers (for location, size, and reserved space) on the stack, plus some amount on the heap.
This to me implies that Rust doesn't know, when the code is executed, what type is stored at a particular location, because that knowledge would require more memory.
But at the same time, does it not need to know what type is stored at 0xfa3d2f10, in order to be able to interpret the bytes at that location? For example, to know that the next bytes form the spec of a String on the heap?
"How does Rust store types at runtime?"
It doesn't, generally.
"Rust doesn't know, when the code is executed, what type is stored at a particular location"
Correct.
"does it not need to know what type is stored"
No, the bytes in memory should be correct, and the rest of the code assumes as much. The offsets of fields in a struct are baked into the generated machine code.
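A small sketch can make the "no type tag in memory" point concrete. The Point struct and offset_of_y helper are made up for illustration, and #[repr(C)] is used only so the field order is predictable. The three-word size of String matches current standard library behaviour but is not a documented layout guarantee.

```rust
use std::mem::size_of;

// A plain struct: nothing in its bytes records "this is a Point".
#[repr(C)]
struct Point {
    x: u32,
    y: u32,
}

/// Byte offset of `y` inside `p`, computed from the two addresses.
fn offset_of_y(p: &Point) -> usize {
    (&p.y as *const u32 as usize) - (p as *const Point as usize)
}

fn main() {
    let p = Point { x: 1, y: 2 };
    // Sizes are exactly the payload; there is no room for a runtime type tag.
    println!("u32: {} bytes", size_of::<u32>());
    println!(
        "String: {} pointer-sized words",
        size_of::<String>() / size_of::<usize>()
    );
    // The compiler knows this offset statically and emits it as a constant.
    println!(
        "Point {{ x: {}, .. }}: {} bytes, y at offset {}",
        p.x,
        size_of::<Point>(),
        offset_of_y(&p)
    );
}
```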
When does Rust store something like type information?
When performing dynamic dispatch, a fat pointer is used. This is composed of a pointer to the data and a pointer to a vtable, a collection of functions that make up the interface in question. The vtable could be considered a representation of the type, but it doesn't have a lot of the information that you might think goes into "a type" (unless the trait requires it). Dynamic dispatch isn't super common in Rust as most people prefer static dispatch when it's possible, but both techniques have their benefits.
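The fat pointer described above can be observed directly. In this sketch the Shape trait and the Square and Circle types are invented for illustration; the point is only that a reference to a trait object is two words wide (data pointer plus vtable pointer) while a reference to a concrete type is one.

```rust
use std::mem::size_of;

trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);
struct Circle(f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Circle {
    fn area(&self) -> f64 {
        std::f64::consts::PI * self.0 * self.0
    }
}

/// Dynamic dispatch: each call goes through the vtable half of the fat pointer.
fn total_area(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    // A thin pointer is one word; a trait-object pointer is data + vtable.
    println!("&Square:    {} bytes", size_of::<&Square>());
    println!("&dyn Shape: {} bytes", size_of::<&dyn Shape>());
    let shapes: Vec<Box<dyn Shape>> = vec![Box::new(Square(2.0)), Box::new(Circle(1.0))];
    println!("total area: {}", total_area(&shapes));
}
```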
There are also concepts like TypeId, which can represent one specific type, but only for a subset of types (those that satisfy a 'static bound). It also doesn't provide much capability besides "are these the same type or not".
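As a rough illustration of how little TypeId offers, the following sketch uses std::any to ask the one question it can answer. The is_string helper is a made-up name for this example.

```rust
use std::any::{Any, TypeId};

/// True when `value` is exactly a `String`. Only `'static` types can be asked.
fn is_string(value: &dyn Any) -> bool {
    value.type_id() == TypeId::of::<String>()
}

fn main() {
    let n: u32 = 7;
    let s = String::from("hi");
    println!("u32 is String? {}", is_string(&n));
    println!("String is String? {}", is_string(&s));
    // Downcasting is built on the same "same type or not" comparison.
    if let Some(text) = (&s as &dyn Any).downcast_ref::<String>() {
        println!("got a String of length {}", text.len());
    }
}
```

Note that nothing here lists fields or methods; the only operation is an equality test between two opaque identifiers.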
Isn't this all terribly brittle?
Yes, it can be, which is one of the things that makes Rust so interesting.
In a language like C or C++, there's not much that safeguards the programmer from making dumb mistakes that go out and mess up those bytes floating around in memory. Making those mistakes is what leads to bugs due to memory safety. Instead of interpreting your password as a password, it's interpreted as your username and printed out to an attacker (oops!)
Rust provides safeguards against that in the form of a strong type system and tools like the borrow checker, but still all done at compile time. Unsafe Rust enables these dangerous tools with the tradeoff that the programmer is now expected to uphold all the guarantees themselves, much like if they were writing C or C++ again.
See also:
When does type binding happen in Rust?
How does Rust implement reflection?
How do I print the type of a variable in Rust?
How to introspect all available methods and members of a Rust type?

ejabberd - mnesia table record definition : " ::binary() "

I'm trying to understand the meaning and purpose of the ::binary() that appears in record definitions, but I don't really understand it. I'd appreciate it if anyone could help me understand this.
Example : mod_offline.hrl
This is a type declaration. This is described in the "Type Information in Record Declarations" section.
The meaning is that the value of that record field is supposed to be a binary. Since Erlang is a dynamically typed language, the compiler doesn't care about this, but there is a static type checker called Dialyzer that tries to find places in the code that put something other than a binary in that field, or expect the field to hold something other than a binary.
For a gentle introduction to type specs and Dialyzer, see the Type Specifications and Erlang chapter of Learn You Some Erlang.

Compiler Design : Is "variable not declared" a syntactic error or semantic error?

Is this kind of error produced during type checking or while the input is being parsed?
Under what type should the error be addressed?
The way I see it, it is a semantic error, because your program parses just fine even though you are using an identifier which you haven't previously bound, i.e. syntactic analysis only checks the program for well-formedness. Semantic analysis actually checks that your program has a valid meaning, e.g. bindings, scoping or typing. As @pst said, you can do scope checking during parsing, but this is an implementation detail. AFAIK old compilers used to do this to save some time and space, but I think today such an approach is questionable if you don't have some hard performance/memory constraints.
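To illustrate the separation described above, here is a minimal sketch of a scope check that runs as a separate pass over an already-parsed tree. The Stmt type and the undeclared function are hypothetical and far simpler than a real compiler's AST.

```rust
use std::collections::HashSet;

// A tiny hypothetical AST: declarations, uses, and nested blocks.
enum Stmt {
    Declare(String),
    Use(String),
    Block(Vec<Stmt>),
}

/// Semantic pass: collect identifiers used without a declaration in scope.
fn undeclared(stmts: &[Stmt]) -> Vec<String> {
    fn walk(stmts: &[Stmt], scopes: &mut Vec<HashSet<String>>, errors: &mut Vec<String>) {
        scopes.push(HashSet::new()); // entering a new scope
        for stmt in stmts {
            match stmt {
                Stmt::Declare(name) => {
                    scopes.last_mut().unwrap().insert(name.clone());
                }
                Stmt::Use(name) => {
                    // Look through the current scope and all enclosing ones.
                    if !scopes.iter().any(|s| s.contains(name)) {
                        errors.push(name.clone());
                    }
                }
                Stmt::Block(inner) => walk(inner, scopes, errors),
            }
        }
        scopes.pop(); // leaving the scope discards its declarations
    }
    let mut errors = Vec::new();
    walk(stmts, &mut Vec::new(), &mut errors);
    errors
}

fn main() {
    // The tree is well-formed; only the scope check objects to it.
    let program = vec![
        Stmt::Declare("x".into()),
        Stmt::Block(vec![Stmt::Declare("y".into()), Stmt::Use("x".into())]),
        Stmt::Use("y".into()), // y went out of scope with the block
        Stmt::Use("z".into()), // z was never declared
    ];
    for name in undeclared(&program) {
        println!("error: `{}` is not declared", name);
    }
}
```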
The program conforms to the language grammar, so it is syntactically correct. A language grammar doesn't contain any statements like 'the identifier must be declared', and indeed doesn't have any way of doing so. An attempt to build a two-level grammar along these lines failed spectacularly in the Algol-68 project, and it has not been attempted since to my knowledge.
The meaning, if any, of each is a semantic issue. Frank deRemer called issues like this 'static semantics'.
In my opinion, this is not strictly a syntax error - nor a semantic one. If I were to implement this for a statically typed, compiled language (like C or C++), then I would not put the check into the parser (because the parser is practically incapable of checking for this mistake), rather into the code generator (the part of the compiler that walks the abstract syntax tree and turns it into assembly code). So in my opinion, it lies between syntax and semantic errors: it's a syntax-related error that can only be checked by performing semantic analysis on the code.
If we consider a primitive scripting language however, where the AST is directly executed (without compilation to bytecode and without JIT), then it's the evaluator/executor function itself that walks the AST and finds the undeclared variable - in this case, it will be a runtime error. The difference lies between the "AST_walk()" routine being in different parts of the program lifecycle (compilation time and runtime), should the language be a scripting or a compiled one.
In the case of languages -- and there are many -- which require identifiers to be declared, a program with undeclared identifiers is ill-formed and thus a missing declaration is clearly a syntax error.
The usual way to deal with this is to incorporate information about symbols in a symbol table, so that the parse can use this information.
Here are a few examples of how identifier type affects parsing:
C / C++
A classic case:
(a)-b;
Depending on a, that's either a cast or a subtraction:
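#include <stdio.h>
/* Build with -DTYPEDEF=1 to make a a type; otherwise a is a variable. */
#if TYPEDEF
typedef double a;
#else
double a = 3.0;
#endif
int main() {
    int b = 3;
    /* A cast of -b to double if a is a type; a minus b if a is a variable. */
    printf("%g\n", (a)-b);
    return 0;
}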
Consequently, if a hadn't been declared at all, the compiler must reject the program as syntactically ill-formed (and that is precisely the word the standard uses.)
XML
This one is simple:
<block>Hello, world</blob>
That's ill-formed XML, but it cannot be detected with a CFG. (Nonetheless, all XML parsers will correctly reject it as ill-formed.) In the case of HTML/SGML, where end-tags may be omitted under some well-defined circumstances, parsing is trickier but nonetheless deterministic; again, the precise declaration of a tag will determine the parse of a valid input, and it's easy to come up with inputs which parse differently depending on declaration.
English
OK, not a programming language. I have lots of other programming language examples, but I thought this one might trigger some other intuitions.
Consider the two grammatically correct sentences:
The sheep is in the meadow.
The sheep are in the meadow.
Now, what about:
The cow is in the meadow.
(*) The cow are in the meadow.
The second sentence is intelligible, albeit ambiguous (is the noun or the verb wrong?) but it is certainly not grammatically correct. But in order to know that (and other similar examples), we have to know that sheep has an unmarked plural. Indeed, many animals have unmarked plurals, so I recognize all the following as grammatical:
The caribou are in the meadow.
The antelope are in the meadow.
The buffalo are in the meadow.
But definitely not:
(*) The mouse are in the meadow.
(*) The bird are in the meadow.
etc.
There seems to be a common misconception that, because the syntactic analyzer uses a context-free grammar parser, syntactic analysis is restricted to parsing a context-free grammar. This is simply not true.
In the case of C (and family), the syntax analyzer uses a symbol table to help it parse. In the case of XML, it uses the tag stack, and in the case of generalized SGML (including HTML) it also uses tag declarations. Consequently, the syntax analyzer considered as a whole is more powerful than the CFG, which is just a part of the analysis.
The fact that a given program passes the syntax analysis does not mean that it is semantically correct. For example, the syntax analyzer needs to know whether a is a type or not in order to correctly parse (a)-b, but it does not need to know whether the cast is in fact possible, in the case that a is a type, or whether a and b can meaningfully be subtracted, in the case that a is a variable. These verifications can happen during type analysis after the parse tree is built, but they are still compile-time errors.
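The role of the symbol table in the (a)-b example can be sketched as follows. This is not how any particular C compiler is written; Symbol, Parse, and parse_paren_minus are hypothetical names, and the sketch shows only the single decision the parser has to make.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Clone, Copy)]
enum Symbol {
    Type,
    Variable,
}

#[derive(Debug, PartialEq)]
enum Parse {
    Cast,        // (a)-b means: negate b, then convert to type a
    Subtraction, // (a)-b means: a minus b
    Undeclared,  // the parser cannot choose a tree at all
}

/// Decide how to parse `(name) - b` by asking the symbol table what `name` is.
fn parse_paren_minus(name: &str, symbols: &HashMap<String, Symbol>) -> Parse {
    match symbols.get(name) {
        Some(Symbol::Type) => Parse::Cast,
        Some(Symbol::Variable) => Parse::Subtraction,
        None => Parse::Undeclared,
    }
}

fn main() {
    let mut symbols = HashMap::new();
    println!("{:?}", parse_paren_minus("a", &symbols)); // nothing declared yet
    symbols.insert("a".to_string(), Symbol::Type); // as after: typedef double a;
    println!("{:?}", parse_paren_minus("a", &symbols));
    symbols.insert("a".to_string(), Symbol::Variable); // as after: double a = 3.0;
    println!("{:?}", parse_paren_minus("a", &symbols));
}
```

Whether the chosen cast or subtraction is actually valid is left to a later pass, which matches the split between parsing and type analysis described above.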

What is the effect of type synonyms on instances of type classes? What does the TypeSynonymInstances pragma in GHC do?

I'm reading Real World Haskell Pg 151, and I've stared at the following passage for over an hour:
Recall that String is a synonym for [Char], which in turn is the type [a] where Char is substituted for the type parameter a. According to Haskell 98's rules, we are not allowed to supply a type in place of a type parameter when we write an instance. In other words, it would be legal for us to write an instance for [a], but not for [Char].
It simply isn't sinking in. Staring at the (free, not pirated) copy of RWH chapter 6, I see a lot of other people are really struggling with this. I still don't understand it from the comments...
Firstly, everything about this confuses me, so please if you feel you can explain anything about this passage, or TypeSynonymInstances please do.
Here is my problem:
Int is a data constructor
String is a data constructor AND type synonym
Now I can't answer these questions:
Why would a type synonym preclude making the type a member of a type class? (I'm looking for some reason which probably relates to the compilation or implementation of a type synonym.)
Why did the designers of the language not want this syntax? (I'm asking for reasoning, not extensive theory or Unicode math symbols.)
I see this line "the type [a] where Char is substituted for the type parameter a", and I want to know why I can't substitute it for this "the type a where Int is substituted for the type parameter a".
Thanks!
I think part of the issue is that two, largely unrelated, restrictions are in play:
No type synonym instances means that instances can only be things declared with data or newtype, not type. This forbids String, but not [Char].
No flexible instances means that instances can only mention one type that isn't a variable, and only that type can be used as a type constructor. This forbids Maybe Int and f Int, but not Maybe a.
Here's what GHCi says about Int, Char, and String:
data Char = GHC.Types.C# GHC.Prim.Char#
data Int = GHC.Types.I# GHC.Prim.Int#
type String = [Char]
Int and Char are both simple types without type variable parameters; there's no type constructor involved, so you can make instances with them pretty much freely.
String, however, fails on both counts. It's a type synonym, which isn't allowed, and it's also a type constructor applied to a non-variable, namely the list type constructor applied to Char.
For comparison, note that [a], Maybe a, and Either a b are all valid in instances, but [Int], Maybe [a], and Either String a are forbidden; hopefully you can now see why.
As for your direct questions, I don't know what the original motivations were for designing the language that way, and I'm in no way qualified to make authoritative statements about "best practices", but for my own personal coding I don't really hesitate to use these pragmas:
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE EmptyDataDecls #-}
{-# LANGUAGE TypeSynonymInstances #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE FlexibleContexts #-}
You could always go look at packages that use pragmas. Flexible instances, it seems, do get a fair amount of use, and from "respectable" packages (there's a couple hits in the source for Parsec, for instance).
Actually, neither Int nor String is a data constructor. That is, you can't use them to create a value:
> (Int 42, String "bob")
<interactive>:1:1: Not in scope: data constructor `Int'
<interactive>:1:9: Not in scope: data constructor `String'
Int names a new, distinct, algebraic data type. String is a "type-synonym", or alias, for the already existing type: [Char]. The problem is that Haskell 98 says you can't use a type synonym in an instance declaration.
I can't say why the authors of the Haskell 98 report chose to restrict type synonyms in this case. There are quite a number of restrictions on them. For example, they cannot be partially applied (if they take type arguments). I think a clue comes at the end of §4.2.2:
Type synonyms are a convenient, but strictly syntactic, mechanism to make type signatures more readable. A synonym and its definition are completely interchangeable, except in the instance type of an instance declaration (Section 4.3.2).
Presumably, there was an approach to program compilation for which this syntactic interchangeability would have caused problems for instances. Perhaps it has to do with the notable aspect of instances that they leak out of packages...
As to your last question, I believe that explanation is conflating two things: 1) String is a type synonym for [Char], which is in turn a specialization of the more general type [a] and 2) that even without the synonym, [Char] cannot be used in the head of an instance.
This second problem has nothing to do with type synonyms; rather, instance heads must have all the type parameters to the type constructor be variables, not concrete types. That is, you can't define separate instances for [Int] and [Char] for some class; you can only define an instance for [a]. (Remember that, despite the convenient syntax, [] is a type constructor, and the thing inside is the type parameter.)
Again, I don't know why the report restricts these, but I suspect it also has to do with compilation strategy. Since GHC's compilation strategy for instances can handle this, you can relax this constraint in GHC via -XFlexibleInstances.
Finally, I've seen both extensions turned on in quite a lot of code, but perhaps someone with more Haskell experience can weigh in on whether they are "best practices" or not.
In Haskell 98:
an instance head must have the form C (T u1 ... uk), where T is a type constructor defined by a data or newtype declaration (see TypeSynonymInstances) and the ui are distinct type variables, and
each assertion in the context must have the form C' v, where v is one of the ui.
So it is valid to write instance ClassName TypeConstructor where, and TypeConstructor must be something such as Int, Double or [a]; only a single type constructor may be involved.
BTW, [] is a type constructor, so [TypeConstructor] cannot be used, but [NewType] and [TypeVariable] are allowed.
This is a restriction in Haskell 98, and we can avoid it by enabling FlexibleInstances.
Int and String are types, not data constructors. String happens to be an alias for [Char] which can also be written List Char. A data constructor is something like Just, so Just 3 is a value of type Maybe Int. Type synonym instances are explained here:
http://hackage.haskell.org/trac/haskell-prime/wiki/TypeSynonymInstances

Pointer to generic type

In the process of transforming a given efficient pointer-based hash map implementation into a generic hash map implementation, I stumbled across the following problem:
I have a class representing a hash node (the hash map implementation uses a binary tree)
THashNode <KEY_TYPE, VALUE_TYPE> = class
public
  Key : KEY_TYPE;
  Value : VALUE_TYPE;
  Left : THashNode <KEY_TYPE, VALUE_TYPE>;
  Right : THashNode <KEY_TYPE, VALUE_TYPE>;
end;
In addition to that there is a function that should return a pointer to a hash node. I wanted to write
PHashNode = ^THashNode <KEY_TYPE, VALUE_TYPE>
but that doesn't compile (';' expected but '<' found).
How can I have a pointer to a generic type?
And addressed to Barry Kelly, if you read this: yes, this is based on your hash map implementation. You haven't written such a generic version of your implementation yourself, have you? That would save me some time :)
Sorry, Smasher. Pointers to open generic types are not supported because generic pointer types are not supported, although it is possible (compiler bug) to create them in certain circumstances (particularly pointers to nested types inside a generic type); this "feature" can't be removed in an update in case we break someone's code. The limitation on generic pointer types ought to be removed in the future, but I can't make promises when.
If the type in question is the one in JclStrHashMap I wrote (or the ancient HashList unit), well, the easiest way to reproduce it would be to change the node type to be a class and pass around any double-pointers as Pointer with appropriate casting. However, if I were writing that unit again today, I would not implement buckets as binary trees. I got the opportunity to write the dictionary in the Generics.Collections unit, though with all the other Delphi compiler work time was too tight before shipping for solid QA, and generic feature support itself was in flux until fairly late.
I would prefer to implement the hash map buckets as one of double-hashing, per-bucket dynamic arrays or linked lists of cells from a contiguous array, whichever came out best from tests using representative data. The logic is that cache miss cost of following links in tree/list ought to dominate any difference in bucket search between tree and list with a good hash function. The current dictionary is implemented as straight linear probing primarily because it was relatively easy to implement and worked with the available set of primitive generic operations.
That said, the binary tree buckets should have been an effective hedge against poor hash functions; if they were balanced binary trees (=> even more modification cost), they would be O(1) on average and O(log n) worst case performance.
To actually answer your question, you can't make a pointer to a generic type, because "generic types" don't exist. You have to make a pointer to a specific type, with the type parameters filled in.
Unfortunately, the compiler doesn't like finding angle brackets after a ^. But it will accept the following:
TGeneric<T> = record
  value: T;
end;
TSpecific = TGeneric<string>;
PGeneric = ^TSpecific;
But "PGeneric = ^TGeneric<string>;" gives a compiler error. Sounds like a glitch to me. I'd report that over at QC if I was you.
Why are you trying to make a pointer to an object, anyway? Delphi objects are a reference type, so they're pointers already. You can just cast your object reference to Pointer and you're good.
If Delphi supported generic pointer types at all, it would have to look like this:
type
PHashNode<K, V> = ^THashNode<K, V>;
That is, mention the generic parameters on the left side where you declare the name of the type, and then use those parameters in constructing the type on the right.
However, Delphi does not support that. See QC 66584.
On the other hand, I'd also question the necessity of having a pointer to a class type at all. Generic or not, they are needed only very rarely.
There's a generic hash map called TDictionary in the Generics.Collections unit. Unfortunately, it's badly broken at the moment, but it's apparently going to be fixed in update #3, which is due out within a matter of days, according to Nick Hodges.

Resources