id values of different variables in python 3 - memory

I am able to understand immutability with python (surprisingly simple too). Let's say I assign a number to
x = 42
print(id(x))
print(id(42))
On both counts, the value I get is
505494448
My question is, does python interpreter allot ids to all the numbers, alphabets, True/False in the memory before the environment loads? If it doesn't, how are the ids kept track of? Or am I looking at this in the wrong way? Can someone explain it please?

What you're seeing is an implementation detail (an internal optimization) calling interning. This is a technique (used by implementations of a number of languages including Java and Lua) which aliases names or variables to be references to single object instances where that's possible or feasible.
You should not depend on this behavior. It's not part of the language's formal specification and there are no guarantees that separate literal references to a string or integer will be interned nor that a given set of operations (string or numeric) yielding a given object will be interned against otherwise identical objects.
I've heard that the C Python implementation does include a set of the first hundred or so integers as statically instantiated immutable objects. I suspect that other very high level language run-time libraries are likely to include similar optimizations: the first hundred integers are used very frequently by most non-trivial fragments of code.
In terms of how such things are implemented ... for strings and larger integers it would make sense for Python to maintain these as dictionaries. Thus any expression yielding an integer (and perhaps even floats) and strings (at least sufficiently short strings) would be hashed, looked up in the appropriate (internal) object dictionary, added if necessary and then returned as references to the resulting object.
You can do your own similar interning of any sorts of custom object you like by wrapping the instantiation in your own calls to your own class static dictionary.

Related

How does Rust store types at runtime?

A u32 takes 4 bytes of memory, a String takes 3 pointer-sized integers (for location, size, and reserved space) on the stack, plus some amount on the heap.
This to me implies that Rust doesn't know, when the code is executed, what type is stored at a particular location, because that knowledge would require more memory.
But at the same time, does it not need to know what type is stored at 0xfa3d2f10, in order to be able to interpret the bytes at that location? For example, to know that the next bytes form the spec of a String on the heap?
How does Rust store types at runtime?
It doesn't, generally.
Rust doesn't know, when the code is executed, what type is stored at a particular location
Correct.
does it not need to know what type is stored
No, the bytes in memory should be correct, and the rest of the code assumes as much. The offsets of fields in a struct are baked-in to the generated machine code.
When does Rust store something like type information?
When performing dynamic dispatch, a fat pointer is used. This is composed of a pointer to the data and a pointer to a vtable, a collection of functions that make up the interface in question. The vtable could be considered a representation of the type, but it doesn't have a lot of the information that you might think goes into "a type" (unless the trait requires it). Dynamic dispatch isn't super common in Rust as most people prefer static dispatch when it's possible, but both techniques have their benefits.
There's also concepts like TypeId, which can represent one specific type, but only of a subset of types. It also doesn't provide much capability besides "are these the same type or not".
Isn't this all terribly brittle?
Yes, it can be, which is one of the things that makes Rust so interesting.
In a language like C or C++, there's not much that safeguards the programmer from making dumb mistakes that go out and mess up those bytes floating around in memory. Making those mistakes is what leads to bugs due to memory safety. Instead of interpreting your password as a password, it's interpreted as your username and printed out to an attacker (oops!)
Rust provides safeguards against that in the form of a strong type system and tools like the borrow checker, but still all done at compile time. Unsafe Rust enables these dangerous tools with the tradeoff that the programmer is now expected to uphold all the guarantees themselves, much like if they were writing C or C++ again.
See also:
When does type binding happen in Rust?
How does Rust implement reflection?
How do I print the type of a variable in Rust?
How to introspect all available methods and members of a Rust type?

There seem to be a lot of ruby methods that are very similar, how do I pick which one to use?

I'm relatively new to Ruby, so this is a pretty general question. I have found through the Ruby Docs page a lot of methods that seem to do the exact same thing or very similar. For example chars vs split(' ') and each vs map vs collect. Sometimes there are small differences and other times I see no difference at all.
My question here is how do I know which is best practice, or is it just personal preference? I'm sure this varies from instance to instance, so if I can learn some of the more important ones to be cognizant of I would really appreciate that because I would like to develop good habits early.
I am a bit confused by your specific examples:
map and collect are aliases. They don't "do the exact same thing", they are the exact same thing. They are just two names for the same method. You can use whatever name you wish, or what reads best in context, or what your team has decided as a Coding Standard. The Community seems to have settled on map.
each and map/collect are completely different, there is no similarity there, apart from the general fact that they both operate on collections. map transform a collection by mapping every element to a new element using a transformation operation. It returns a new collection (an Array, actually) with the transformed elements. each performs a side-effect for every element of the collection. Since it is only used for its side-effect, the return value is irrelevant (it might just as well return nil like Kernel#puts does, in languages like C, C++, Java, C♯, it would return void), but it is specified to always return its receiver.
split splits a String into an Array of Strings based on a delimiter that can be either a Regexp (in which case you can also influence whether or not the delimiter itself gets captured in the output or ignored) or a String, or nil (in which case the global default separator gets used). chars returns an Array with the individual characters (represented as Strings of length 1, since Ruby doesn't have an specific Character type). chars belongs together in a family with bytes and codepoints which do the same thing for bytes and codepoints, respectively. split can only be used as a replacement for one of the methods in this family (chars) and split is much more general than that.
So, in the examples you gave, there really isn't much similarity at all, and I cannot imagine any situation where it would be unclear which one to choose.
In general, you have a problem and you look for the method (or combination of methods) that solve it. You don't look at a bunch of methods and look for the problem they solve.
There'll typically be only one method that fits a specific problem. Larger problems can be broken down into different subproblems in different ways, so it is indeed possible that you may end up with different combinations of methods to solve the same larger problem, but for each individual subproblem, there will generally be only one applicable method.
When documentation states that 2 methods do the same, it's just matter of preference. To learn the details, you should always start with Ruby API documentation

Why does F# Set need IComparable?

So I am trying to use the F# Set as a hash table. But my element type doesn't implement the IComparable interface (although it implements IEquatable). I got an error saying the construction is not allowed because of comparison constraint. And through some further read, I discovered that F# Set is implemented using binary tree, which makes insertion causes O(log(n)). This looks weird to me, why is the Set structure designed this way?
Edit: So I learned that Set in F# is actually a SortedSet. And I guess the question becomes, why is Sorted Set somehow more preferable than a general Hash Set as an immutable/functional data structure?
There are two important points that should help you understand how sets in F# (and in functional languages in general) work and how they are used:
Implementing immutable hashtables (like .NET HashSet) is hard - when you remove or add elements, you want to avoid copying everything in the data structure and (as far as I know) there is no general way of doing that (you would end up copying too much, so it would be inefficient).
For this reason, most functional sets are implemented as (some form of trees). Those require comparison to build a sorted tree. The nice property of balanced trees is that removing and adding elements does not have to copy everything in the tree, so even the worst case scenario is reasonably efficient (though mutable hashtable is still faster).
Now, F# is functional-first, which means that immutable structures are preferred, but it is perfectly fine to use mutable data structures (especially if you limit the usage to some well defined and restricted scope). For this reason, F# programmers often use Dictionary or HashSet, especially when this is only within the scope of a single function.

Why don't F# lists have a tail pointer

Or phrased another way, what kind of benefits do you get from having a basic, singly linked list with only a head pointer? The benefits of a tail pointer that I can see are:
O(1) list concatenation
O(1) Appending stuff to the right side of the list
Both of which are rather convenient things to have, as opposed to O(n) list concatenation (where n is the length of the left-side list?). What advantages does dropping the tail pointer have?
F#, like many other functional[-ish] languages, has a cons-list (the terminology originally comes from LISP, but the concept is the same). In F# the :: operator (or List.Cons) is used for cons'ing: note the signature is ‘a –> ‘a list –> ‘a list (see Mastering F# Lists).
Do not confuse a cons-list with an opaque Linked List implementation which contains a discrete first[/last] node - every cell in a cons-list is the start of a [different] list! That is, a "list" is simply the chain of cells that starts at a given cons-cell.
This offers some advantages when used in a functional-like manner: one is that all the "tail" cells are shared and since each cons-cell is immutable (the "data" might be mutable, but that's a different issue) there is no way to make a change to a "tail" cell and flub up all the other lists which contain said cell.
Because of this property, [new] lists can be efficiently built - that is, they do not require a copy - simply by cons'ing to the front. In addition, it is also very efficient to deconstruct a list to head :: tail - once again, no copy - which is often very useful in recursive functions.
This immutable property generally does not exist in a [standard mutable] Double Linked List implementation in that appending would add side-effects: the internal 'last' node (the type is now opaque) and one of the "tail" cells are changed. (There are immutable sequence types that allow an "effectively constant time" append/update such as immutable.Vector in Scala -- however, these are heavy-weight objects compared to a cons-list that is nothing more than a series of cells cons'ed together.)
As mentioned, there are also disadvantages a cons-list is not appropriate for all tasks - in particular, creating a new list except by cons'ing to the head is an O(n) operation, fsvo n, and for better (or worse) the list is immutable.
I would recommend creating your own version of concat to see how this operation is really done. (The article Why I love F#: Lists - The Basics covers this.)
Happy coding.
Also see related post: Why can you only prepend to lists in functional languages?
F# lists are immutable, there's no such thing as "append/concat", rather there's just creating new lists (that may reuse some suffixes of old lists). Immutability has many advantages, outside the scope of this question. (All pure languages, and most functional languages have this data structure, it is not an F#-ism.)
See also
http://diditwith.net/2008/03/03/WhyILoveFListsTheBasics.aspx
which has good picture diagrams to explain things (e.g. why consing on the front is cheaper than at the end for an immutable list).
In addition to what the others said: if you need efficient, but yet immutable data structures (which should be an idiomatic F# way), you have to consider reading Chris Okasaki, Purely Functional Data Structures. There is also a thesis available (on which the book is based).
In addition to what has been already said, the Introducing Functional Programming section on MSDN has an article about Working with Functional Lists that explains how lists work and also implements them in C#, so it may be a good way to understand how they work (and why adding reference to the last element would not allow efficient implementation of append).
If you need to append things to the end of the list, as well as to the front, then you need a different data structure. For example, Norman Ramsey posted source code for DList which has these properties here (The implementation is not idiomatic F#, but it should be easy to fix).
If you find you want a list with better performance for append operations, have a look at the QueueList in the F# PowerPack and the JoinList in the FSharpx extension libraries.
QueueList encapsulates two lists. When you prepend using the cons, it prepends an element to the first list, just as a cons-list. However, if you want to append a single element, it can be pushed to the top of the second list. When the first list runs out of elements, List.rev is run on the second list, and the two are swapped putting your list back in order and freeing the second list to append new elements.
JoinList uses a discriminated union to more efficiently append whole lists and is a bit more involved.
Both are obviously less performant for standard cons-list operations but offer better performance for other scenarios.
You can read more about these structures in the article Refactoring Pattern Matching.
As others have pointed out, an F# list could be represented by a data structure:
List<T> { T Value; List<T> Tail; }
From here, the convention is that a list goes from the List you have a reference to until Tail is null. Based on that definition, the benefits/features/limitations in the other answers come naturally.
However, your original question seems to be why the list is not defined more like:
List<T> { Node<T> Head; Node<T> Tail; }
Node<T> { T Value; Node<T> Next; }
Such a structure would allow both appending and prepending to the list without any visible effects to the a reference to the original list, since it still only sees a "window" of the now expanded list. Although this would (sometimes) allow O(1) concatenation, there are several issues such a feature would face:
The concatenation only works once. This can lead to unexpected performance behavior where one concatenation is O(1), but the next is O(n). Say for example:
listA = makeList1 ()
listB = makeList2 ()
listC = makeList3 ()
listD = listA + listB //modified Node at tail of A for O(1)
listE = listA + listC //must now make copy of A to concat with C
You could argue that the time savings for the cases where possible are worth it, but the surprise of not knowing when it will be O(1) and when O(n) are strong arguments against the feature.
All lists now take up twice as much space, even if you never plan to concatenate them.
You now have a separate List and Node type. In the current implementation, I believe F# only uses a single type like the beginning of my answer. There may be a way to do what you are suggesting with only one type, but it is not obvious to me.
The concatenation requires mutating the original "tail" node instance. While this shouldn't affect programs, it is a point of mutation, which most functional languages tend to avoid.
Or phrased another way, what kind of benefits do you get from having a basic, singly linked list with only a head pointer? The benefits of a tail pointer that I can see are:
O(1) list concatenation
O(1) Appending stuff to the right side of the list
Both of which are rather convenient things to have, as opposed to O(n) list concatenation (where n is the length of the left-side list?).
If by "tail pointer" you mean a pointer from every list to the last element in the list, that alone cannot be used to provide either of the benefits you cite. Although you could then get at the last element in the list quickly, you cannot do anything with it because it is immutable.
You could write a mutable doubly-linked list as you say but the mutability would make programs using it significantly harder to reason about because every function you call with one might change it.
As Brian said, there are purely functional catenable lists. However, they are many times slower at common operations than the simple singly-linked list that F# uses.
What advantages does dropping the tail pointer have?
30% less space usage and better performance on virtually all list operations.

Pointer to generic type

In the process of transforming a given efficient pointer-based hash map implementation into a generic hash map implementation, I stumbled across the following problem:
I have a class representing a hash node (the hash map implementation uses a binary tree)
THashNode <KEY_TYPE, VALUE_TYPE> = class
public
Key : KEY_TYPE;
Value : VALUE_TYPE;
Left : THashNode <KEY_TYPE, VALUE_TYPE>;
Right : THashNode <KEY_TYPE, VALUE_TYPE>;
end;
In addition to that there is a function that should return a pointer to a hash node. I wanted to write
PHashNode = ^THashNode <KEY_TYPE, VALUE_TYPE>
but that doesn't compile (';' expected but '<' found).
How can I have a pointer to a generic type?
And adressed to Barry Kelly: if you read this: yes, this is based on your hash map implementation. You haven't written such a generic version of your implementation yourself, have you? That would save me some time :)
Sorry, Smasher. Pointers to open generic types are not supported because generic pointer types are not supported, although it is possible (compiler bug) to create them in certain circumstances (particularly pointers to nested types inside a generic type); this "feature" can't be removed in an update in case we break someone's code. The limitation on generic pointer types ought to be removed in the future, but I can't make promises when.
If the type in question is the one in JclStrHashMap I wrote (or the ancient HashList unit), well, the easiest way to reproduce it would be to change the node type to be a class and pass around any double-pointers as Pointer with appropriate casting. However, if I were writing that unit again today, I would not implement buckets as binary trees. I got the opportunity to write the dictionary in the Generics.Collections unit, though with all the other Delphi compiler work time was too tight before shipping for solid QA, and generic feature support itself was in flux until fairly late.
I would prefer to implement the hash map buckets as one of double-hashing, per-bucket dynamic arrays or linked lists of cells from a contiguous array, whichever came out best from tests using representative data. The logic is that cache miss cost of following links in tree/list ought to dominate any difference in bucket search between tree and list with a good hash function. The current dictionary is implemented as straight linear probing primarily because it was relatively easy to implement and worked with the available set of primitive generic operations.
That said, the binary tree buckets should have been an effective hedge against poor hash functions; if they were balanced binary trees (=> even more modification cost), they would be O(1) on average and O(log n) worst case performance.
To actually answer your question, you can't make a pointer to a generic type, because "generic types" don't exist. You have to make a pointer to a specific type, with the type parameters filled in.
Unfortunately, the compiler doesn't like finding angle brackets after a ^. But it will accept the following:
TGeneric<T> = record
value: T;
end;
TSpecific = TGeneric<string>;
PGeneric = ^TSpecific;
But "PGeneric = ^TGeneric<string>;" gives a compiler error. Sounds like a glitch to me. I'd report that over at QC if I was you.
Why are you trying to make a pointer to an object, anyway? Delphi objects are a reference type, so they're pointers already. You can just cast your object reference to Pointer and you're good.
If Delphi supported generic pointer types at all, it would have to look like this:
type
PHashNode<K, V> = ^THashNode<K, V>;
That is, mention the generic parameters on the left side where you declare the name of the type, and then use those parameters in constructing the type on the right.
However, Delphi does not support that. See QC 66584.
On the other hand, I'd also question the necessity of having a pointer to a class type at all. Generic or not. they are needed only very rarely.
There's a generic hash map called TDictionary in the Generics.Collections unit. Unfortunately, it's badly broken at the moment, but it's apparently going to be fixed in update #3, which is due out within a matter of days, according to Nick Hodges.

Resources