I find myself constantly running into bugs where I access Hashes with symbols instead of strings and vice versa. I would like to do:
require 'active_support/hash_with_indifferent_access'
Hash = HashWithIndifferentAccess
# (irb):xx: warning: already initialized constant Hash
This warning is because Hash is already defined, but I don't really care.
Sure, the performance might be slightly worse.
Sure, maybe some gem creates a hash with strings AND symbols and will break -- but how common is that?
Is this bad? Anti-pattern? Will I regret doing this? What are the downsides? Any good/bad experiences doing this?
What do you think {}.class is going to be after your kludge? Right, it will be Hash.
What do you think Hash[:a, 'a'] will do? Right, SystemStackError for hopefully obvious reasons.
So yes, this is a bad idea, you will regret it, and an important downside is that it doesn't actually do what you think it will do.
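The point about literals can be checked without touching the real Hash constant. This is a minimal sketch, assuming a hypothetical Sandbox module whose own Hash constant shadows the core class inside that namespace only. It suggests that a {} literal never looks the constant up, so rebinding Hash would not change what literals produce.

```ruby
# Hypothetical namespace for illustration; nothing here comes from ActiveSupport.
module Sandbox
  # Inside Sandbox, the bare name Hash now refers to this subclass.
  class Hash < ::Hash; end

  # A literal is built directly by the interpreter, not via the constant.
  def self.literal_class
    {}.class
  end

  # An explicit constant reference does pick up the shadowing class.
  def self.constant_class
    Hash.new.class
  end
end

puts Sandbox.literal_class   # the core Hash
puts Sandbox.constant_class  # Sandbox::Hash
```

So even in the best case, only code that writes Hash.new explicitly would see the replacement class.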
The big downside is performance. Lookups are still hash-based, but every symbol key has to be converted to a string first, and hashing and comparing strings is more demanding than comparing symbols, so I'd say that would be a noticeable performance hit.
You shouldn't really be running into bugs when accessing Hashes with symbols, since they are very similar to strings. Maybe reading this article helps: Differences between symbols and strings
I am learning in Ruby on Rails 4.0 that Rails has the ability to reference a hash's values, via a key that can either be a symbol or string, using the class HashWithIndifferentAccess. For example, the params hash can be referenced through both a symbol or a string, because it uses the class HashWithIndifferentAccess.
i.e.: params["id"] and params[:id] --> both access the id in the params hash
Although both can be used and are accepted by Rails, is one clearly preferred over the other, either as a best practice or for performance reasons? My initial thought was that it would be better to use symbols, because once they are stored they keep the same piece of memory. Strings, by contrast, need a new piece of memory for every string.
Is this correct? Or does it truly not matter whether strings or symbols are used?
Ruby Strings are mutable, which can bring some unpredictability and reduced performance. For these reasons Ruby also offers Symbols. The big difference is that Symbols are immutable: a mutable object can be changed in place, while an immutable one can only be replaced by a new object. A given Symbol is stored only once, so Symbols are kinder to memory and faster to compare, but if you are not careful (for example, by creating many symbols dynamically) the memory footprint of your app will increase.
Choosing between Strings and Symbols comes down to understanding both and how each serves the overall health and performance of the app. That may be why there is no strict rule about where to use a String and where to use a Symbol.
As rough guidance, I would recommend:
Symbol - internal identifiers, variables not meant to change
String - variables that are changing or printed out
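The identity and mutability difference described above can be checked directly. This is a small sketch. The variable names are only illustrative, and String.new is used so the result does not depend on any frozen-string setting.

```ruby
sym_a = :status
sym_b = :status
str_a = String.new('status')
str_b = String.new('status')

same_symbol  = sym_a.equal?(sym_b)  # same object every time the symbol is written
same_string  = str_a.equal?(str_b)  # two separate objects with equal content
equal_string = (str_a == str_b)     # content comparison still succeeds

str_a << '!'                        # strings can be changed in place
symbol_frozen = sym_a.frozen?       # symbols cannot

puts same_symbol, same_string, equal_string, symbol_frozen
```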
HashWithIndifferentAccess ends up mapping internally all the symbols to strings.
h = ActiveSupport::HashWithIndifferentAccess.new(test: 'test')
If you try to retrieve the keys of the created hash (h) you will get the keys as strings
h.keys # => ["test"]
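To make the key mapping concrete without requiring ActiveSupport, here is a minimal sketch of the same idea. TinyIndifferentHash is a hypothetical stand-in written for this illustration, not the actual ActiveSupport implementation, and it only covers [] and []=.

```ruby
# Hypothetical, simplified stand-in for HashWithIndifferentAccess.
class TinyIndifferentHash < Hash
  def initialize(source = {})
    super()
    # Route every pair through []= so keys get normalized on the way in.
    source.each { |key, value| self[key] = value }
  end

  def []=(key, value)
    super(normalize(key), value)
  end

  def [](key)
    super(normalize(key))
  end

  private

  # Symbols are stored and looked up as strings; other keys are untouched.
  def normalize(key)
    key.is_a?(Symbol) ? key.to_s : key
  end
end

h = TinyIndifferentHash.new(test: 'test')
p h.keys     # => ["test"]
p h[:test]   # => "test"
p h['test']  # => "test"
```

The real class overrides many more methods (fetch, key?, merge, and so on), which is part of why it carries some overhead.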
A couple highlights:
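First, as of Ruby 2.3, frozen (immutable) string literals are an option, enabled per file with a magic comment; identical literals then point to the same place in memory. From what I understood at the time, this was going to become the default in Ruby 3.0, but that did not happen, so the magic comment is still required. Here's an example: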
# frozen_string_literal: true
a = 'foo'
b = 'foo'
puts a.object_id
puts b.object_id
puts a.equal?(b)
Run the file:
➜ ruby test.rb
70186103229600
70186103229600
true
Second, symbols are generally preferred as hash keys (see this popular style guide).
If you're interested in learning more about how HashWithIndifferentAccess works, here's a good blog post. The short version is that all keys are converted to strings behind the scenes.
I'm working in a Ruby app in which symbols are used in various places where one would usually use strings or enums in other languages (to specify configurations mostly).
So my question is, why should I not add a to_str method to Symbol?
It seems sensible, as it allows implicit conversion between symbol and string. So I can do stuff like this without having to worry about calling :symbol.to_s:
File.join(:something, "something_else") # => "something/something_else"
The negative is the same as the positive: it implicitly converts symbols to strings, which can be REALLY confusing if it causes an obscure bug. But given how symbols are generally used, I'm not sure if this is a valid concern.
Any thoughts?
When an object does respond_to? :to_str, you expect it to really act like a String. This means it should implement all of String's methods, so you could potentially break some code relying on this.
to_s means that you get a string representation of your object, that's why so many objects implement it - but the string you get is far from being 'semantically' equivalent to your object (a_hash.to_s is far from being a Hash). The absence of :symbol.to_str reflects this: a symbol is NOT and MUST NOT be confused with a string in Ruby, because they serve totally different purposes.
You wouldn't think about adding to_str to an Integer, right? Yet an Integer has a lot in common with a symbol: each one of them is unique. When you have a symbol, you expect it to be unique and immutable as well.
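The distinction between explicit (to_s) and implicit (to_str) conversion can be sketched without patching Symbol. Label below is a hypothetical class introduced only for this illustration. It opts in to implicit conversion, while a plain Symbol does not.

```ruby
# Hypothetical class for illustration; not part of the original question.
class Label
  def initialize(name)
    @name = name
  end

  # Explicit conversion: "give me some string representation".
  def to_s
    @name
  end

  # Implicit conversion: "treat me as a String wherever one is required".
  def to_str
    @name
  end
end

label = Label.new('something')

# Methods that require a String accept the object because of to_str.
joined       = File.join(label, 'something_else')
concatenated = 'dir/' + label

# A Symbol has to_s but no to_str, so the same operation is rejected.
symbol_error =
  begin
    'dir/' + :something
  rescue TypeError => e
    e.class
  end

# Interpolation only needs to_s, so symbols work there.
interpolated = "#{:something}/something_else"

puts joined, concatenated, symbol_error, interpolated
```

Defining to_str is therefore a promise that the object can stand in for a String everywhere, which is a much stronger claim than having a printable form.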
You don't have to implicitly convert it, right? Doing something like this will automatically coerce it to a string:
"#{:something}/something_else" # => "something/something_else"
The negative is what you say--at one point, anyway, some core Ruby had different behavior based on symbol/string. I don't know if that's still the case. The threat alone makes me a little twitchy, but I don't have a solid technical reason at this point. I guess the thought of making a symbol more string-like just makes me nervous.
I.e, if I have a record
-record(one, {frag, left}).
Is record_info(fields, one) going to always return [frag, left]?
Is tl(tuple_to_list(#one{frag = "Frag", left = "Left"})) always gonna be ["Frag", "Left"]?
Is this an implementation detail?
Thanks a lot!
The short answer is: yes, as of this writing it will work. The better answer is: it may not work that way in the future, and the nature of the question concerns me.
It's safe to use record_info/2, although relying on the order may be risky. Frankly, I can't think of a situation where doing so makes sense, which suggests that you are solving a problem the wrong way. Can you share more details about what exactly you are trying to accomplish so we can help you choose a better method? It could be that simple pattern matching is all you need.
As for the example with tuple_to_list/1, I'll quote from "Erlang Programming" by Cesarini and Thompson:
"... whatever you do, never, ever use the tuple representations of records in your programs. If you do, the authors of this book will disown you and deny any involvement in helping you learn Erlang."
There are several good reasons why, including:
Your code will become brittle - if you later change the number of fields or their order, your code will break.
There is no guarantee that the internal representation of records will continue to work this way in future versions of Erlang.
Yes, the order is always the same, because records are represented by tuples, for which order is an essential property. See also my other answer about records, with examples: Syntax Error while accessing a field in a record
Yes, in both cases Erlang will retain the 'original' order. And yes, it's an implementation detail, as it's not specifically addressed in the function spec or documentation, though it's a pretty safe bet it will stay like that.
I'm starting to get comfortable with Ruby/Rails but must admit I still look askance when I see an unfamiliar block. Take the following code:
(5..10).reduce(0) do |sum, value|
sum + value
end
I know what it does...but, how does one know the order of the parameters passed into a block in Ruby? Are they taken in order? How do you quickly know what they represent?
I'm assuming one must look at the source (or documentation) to uncover what's being yielded...but is there a shortcut? I guess I'm wondering how the old vets quickly discern what a block is doing?!? How should one approach looking at/interpreting blocks?
You just have to look it up in the documentation until you have it memorized. I still have trouble with reduce and a couple others. It's just like trying to remember the argument order for ordinary methods. Programmers have to deal with this problem in pretty much every language.
When you write code, there's no other way than checking the documentation - even though Ruby is quite consistent and coherent in this kind of thing, so often you can just expect things to work in a particular way.
On the other hand, when you read code, you can just hope that the coder has been smart and kind enough to use consistent variable names. In your example
(5..10).reduce(0) do |sum, value|
sum + value
end
There is a reason the variables are called sum and value! :-) Something like
(5..10).reduce(0) {|i,j|i+j}
is of course the same, but much less readable. So the lesson here is: write good code and you'll convey some piece of information more than just instructions to a computer!
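When the documentation is not at hand, one quick way to see what a method yields is to capture the block arguments with a splat and inspect them. This is only a sketch of that habit. The variable names are illustrative, and :memo is a placeholder object chosen for the example.

```ruby
# Record exactly what reduce passes to the block on each call.
reduce_args = []
total = (5..10).reduce(0) do |*args|
  reduce_args << args
  args.sum # same as sum + value
end

p reduce_args.first # => [0, 5]  (accumulator first, then element)
p total             # => 45

# each_with_object yields in the opposite order: element first, then memo.
ewo_args = []
(5..10).each_with_object(:memo) { |*args| ewo_args << args }

p ewo_args.first    # => [5, :memo]
```

The two methods differ in argument order, which is exactly why descriptive block parameter names help the reader.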
Are there any downsides to passing in an Erlang record as a function argument?
There is no downside, unless the caller function and the called function were compiled with different 'versions' of the record.
Some functions from Erlang's standard library do indeed use records in their interfaces (I can't recall which ones right now, but there are a few), but in my humble opinion the major turnoff is that the user has to include your header file just to use your function.
That seems un-Erlangy to me (you don't normally do that unless you're using said functions from the stdlib), creates weird inter-dependencies, and is harder to use from the shell (I wouldn't know off the top of my head how to load and use records from the shell -- I usually just "cheat" by constructing the tuple manually...)
Also, handling records is a bit different from the stuff you usually do, since their fields by default take the atom 'undefined' as value, in contrast to proplists, for instance, where a value that wasn't set just isn't there -- this might cause some confusion for people who do not normally work a lot with records.
So, all in all, I'd usually prefer a proplist or something similar, unless I have a very good reason to use a record. I do usually use records, though, for the internal state of, for example, a gen_server or a gen_fsm; it's somewhat easier to update that way.
I think the biggest downside is that it's not idiomatic. Have you ever seen an API that required you to construct a record and pass it in?
Why would you want to do something that's going to feel foreign to any Erlang programmer? There's a convention already in use for optional named arguments to functions. Inventing yet another way without good cause is pointless.