I'm working in a Ruby app in which symbols are used in various places where one would usually use strings or enums in other languages (mostly to specify configurations).
So my question is: why should I not add a to_str method to Symbol?
It seems sensible, as it allows implicit conversion between symbols and strings, so I can do things like this without having to worry about calling :symbol.to_s:
File.join(:something, "something_else") # => "something/something_else"
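For reference, the patch I have in mind is just a one-liner (a sketch; to_str simply delegates to to_s):
# Reopen Symbol so that anything asking for an implicit string conversion
# (File.join, String#+, etc.) will accept a symbol.
class Symbol
  def to_str
    to_s
  end
end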
The negative is the same as the positive: it implicitly converts symbols to strings, which can be REALLY confusing if it causes an obscure bug; but given how symbols are generally used, I'm not sure whether this is a valid concern.
Any thoughts?
When an object responds to to_str, you expect it to really act like a String. That means it should implement all of String's methods, so you could potentially break code that relies on this.
to_s means that you get a string representation of your object, which is why so many objects implement it, but the string you get is far from being 'semantically' equivalent to your object (an_hash.to_s is far from being a Hash). The absence of :symbol.to_str reflects this: a symbol is NOT and MUST NOT be confused with a string in Ruby, because they serve totally different purposes.
You wouldn't think about adding to_str to an Integer, right? Yet an Integer has a lot in common with a symbol: each one of them is unique. When you have a symbol, you expect it to be unique and immutable as well.
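To make the to_s/to_str distinction concrete, here is a small irb-style sketch with the stock Symbol (before any patching):
:config.to_s   # => "config" -- an explicit string representation
:config.to_str # NoMethodError -- a symbol does not claim to actually be a string
# Implicit conversion goes through to_str, so only the explicit form works:
"logs/" + :config.to_s # => "logs/config"
"logs/" + :config      # TypeError: no implicit conversion of Symbol into String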
You don't have to implicitly convert it, right? Doing something like this will automatically coerce it to a string:
"#{:something}/something_else" # "something/something_else"
The negative is what you say--at one point, anyway, some core Ruby had different behavior based on symbol/string. I don't know if that's still the case. The threat alone makes me a little twitchy, but I don't have a solid technical reason at this point. I guess the thought of making a symbol more string-like just makes me nervous.
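One everyday illustration of why keeping them distinct matters (plain Ruby, nothing exotic): symbol and string keys are simply different keys to a Hash.
config = { timeout: 30 }
config[:timeout]  # => 30
config["timeout"] # => nil -- a plain Hash treats :timeout and "timeout" as different keys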
Related
I need to check the existence of values based on some conditions.
For example, I have 3 variables: varA, varB and varC. varC should have a value only when varA>varB (the condition).
I normally use syntax like the following to check each variable, then run a frequency on the check variable to see if there are errors:
if missing(varC) and (varA>varB) ck_varC=1.
if not(missing(varC)) and not(varA>varB) ck_varC=2.
exe.
fre ck_varC.
exe.
I ran into some errors when the condition became more complex, or when the condition itself contains missing() or other functions, but I could have made a mistake.
Do you think there is an easier way of doing these checks?
Thanks in advance.
EDIT: Here is an example of what I mean. Think of a questionnaire with some routing: you ask everyone their age; if they are between 17 and 44 you ask whether they work; if they work, you ask how many hours.
I have an Excel tool where I put down all the variables with all their conditions, and it generates syntax like the example above, with the same structure for every variable, covering both situations: we have a value that shouldn't be there, or we don't have a value that should be there.
Is there an easier way of doing that? Is this structure always valid, no matter what the condition is?
In SPSS, missing values are not numbers, so you need to program those scenarios explicitly as well. You have varC covered (partially), but no scenario where varA or varB has missing data is covered.
(As good practice, you should probably initialize your check variable to sysmis or 0, using syntax):
numeric ck_varC (f1.0).
compute ck_varC=0.
if missing(varC) and (varA>varB) ck_varC=1.
if not(missing(varC)) and not(varA>varB) ck_varC=2.
***additional conditional scenarios go here:.
if missing(varA) or missing(varB) ck_varC=3.
...
fre ck_varC.
By the way - you do not need any of the exe. commands if you are going to run your syntax as a whole.
Later Edit, after the poster updated the question:
Your syntax would be something like this. Note the use of the range function, which is not mandatory, but might be useful for you in the future.
I am also assuming that work is a string variable, so its values need to be referenced using quotation marks.
if missing(age) ck_age=1.
if missing(work) and range(age,17,44) ck_work=1.
if missing(hours) and work="yes" ck_hours=1.
if not (missing (age)) and not(1>0) ck_age=2. /*this will never happen because of the not(1>0).
if not(missing(work)) and (not range(age,17,44)) ck_work=2. /*note that if age is missing, this ck_work won't be set here.
if not(missing(hours)) and (not(work="yes")) ck_hours=2.
EXECUTE.
String variables are case sensitive.
There is no missing-value equivalent for strings; an empty/blank string ("") is still a valid string, so not(work="yes") is true when work is blank ("").
I am learning in Ruby on Rails 4.0 that Rails can reference a hash's values via a key that is either a symbol or a string, using the class HashWithIndifferentAccess. For example, the params hash can be accessed with either a symbol or a string, because it uses the class HashWithIndifferentAccess.
i.e.: params["id"] and params[:id] --> both access the id in the params hash
Although both can be used and are accepted by Rails, is one clearly preferred over the other, for best-practice or performance reasons? My initial thought was that it would be better to use symbols, because once a symbol is stored it keeps reusing that same piece of memory, whereas a new piece of memory is needed for every string.
Is this correct? Or does it truly not matter whether strings or symbols are used?
Ruby strings are mutable, which can bring some unpredictability and reduced performance. For these reasons Ruby also offers symbols. The big difference is that symbols are immutable: while a mutable object can be changed in place, an immutable one can only be replaced. Symbols play more nicely with memory and thus gain on performance, but if you are not careful (symbols created dynamically, for example from outside input, historically were never garbage collected) the memory footprint of your app will increase.
Using strings or symbols comes down to understanding both and how each serves the overall health and performance of the app. That is probably why there is no strict rule about where to use a String and where a Symbol.
As rough guidance on what is recommended where:
Symbol - internal identifiers, variables not meant to change
String - variables that are changing or printed out
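To make the identity/mutability difference concrete, here is a small irb-style sketch (assuming frozen string literals are not enabled):
:user_id.object_id == :user_id.object_id   # => true  -- every use of a symbol is the very same object
"user_id".object_id == "user_id".object_id # => false -- each string literal normally allocates a new object
s = "user_id"
s << "_x"        # => "user_id_x" -- strings can be mutated in place
:user_id << "_x" # NoMethodError  -- symbols cannot be changed at all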
Internally, HashWithIndifferentAccess ends up mapping all the symbol keys to strings.
h = ActiveSupport::HashWithIndifferentAccess.new(test: 'test')
If you try to retrieve the keys of the created hash (h) you will get the keys as strings
h.keys # => ["test"]
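And lookups work with either key form (continuing with the same h):
h[:test]  # => "test"
h['test'] # => "test"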
A couple of highlights:
First, as of Ruby 2.3, frozen string literals are available as an opt-in, and identical frozen literals can then point to the same place in memory. At one point this was expected to become the default in Ruby 3.0, but it has remained opt-in. Here's an example:
# frozen_string_literal: true
a = 'foo'
b = 'foo'
puts a.object_id
puts b.object_id
puts a.equal?(b)
Run the file:
➜ ruby test.rb
70186103229600
70186103229600
true
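For contrast, without the magic comment each literal normally allocates its own mutable String object (a sketch; exact object IDs will vary):
a = 'foo'
b = 'foo'
a.equal?(b) # => false -- two distinct String objects
a.frozen?   # => false -- and they are mutable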
Second, symbols are generally preferred as hash keys (see this popular style guide).
If you're interested in learning more about how HashWithIndifferentAccess works, here's a good blog post. The short version is that all keys are converted to strings behind the scenes.
I find myself constantly running into bugs where I access Hashes with symbols instead of strings and vice versa. I would like to do:
require 'active_support/hash_with_indifferent_access'
Hash = HashWithIndifferentAccess
# (irb):xx: warning: already initialized constant Hash
This warning is because Hash is already defined, but I don't really care.
Sure, the performance might be slightly worse.
Sure, maybe some gem creates a hash with strings AND symbols and will break -- but how common is that?
Is this bad? Anti-pattern? Will I regret doing this? What are the downsides? Any good/bad experiences doing this?
What do you think {}.class is going to be after your kludge? Right, it will be Hash.
What do you think Hash[:a, 'a'] will do? Right, SystemStackError, for hopefully obvious reasons (HashWithIndifferentAccess.[] builds its result via Hash[...], which now points back at itself, so it recurses until the stack blows up).
So yes, this is a bad idea, you will regret it, and an important downside is that it doesn't actually do what you think it will do.
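A quick sketch of the first point (assuming ActiveSupport is available): the literal syntax never consults the Hash constant, so the reassignment doesn't give hash literals indifferent access.
require 'active_support/hash_with_indifferent_access'
Hash = ActiveSupport::HashWithIndifferentAccess # warning: already initialized constant Hash
h = { name: 'Ada' } # the literal still builds a plain core Hash
h.class             # => Hash -- the original core class, not HashWithIndifferentAccess
h['name']           # => nil  -- string lookup still misses the symbol key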
The big downside is performance: string keys have to be hashed and compared character by character, which is O(n) in the length of the key, while a symbol is effectively O(1) to compare, and HashWithIndifferentAccess additionally converts symbol keys to strings on access. Comparing strings is much more demanding than comparing symbols, so I'd say that would be a significant performance hit.
You shouldn't really be running into bugs when accessing hashes with symbols, since they are very similar to strings; maybe reading this article helps: Differences between symbols and strings
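If you want to gauge the lookup cost yourself, here is a rough sketch using Ruby's built-in Benchmark module (the keys and iteration count are arbitrary):
require 'benchmark'
sym_hash = { user_id: 1 }
str_hash = { 'user_id' => 1 }
n = 1_000_000
Benchmark.bm(12) do |x|
  x.report('symbol keys') { n.times { sym_hash[:user_id] } }
  x.report('string keys') { n.times { str_hash['user_id'] } }
end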
My immediate project is to develop a system of CheckSums for proving that two somewhat complex objects are (functionally) EQUAL, in the sense that they have the same values for the critical properties. (I have discovered that dates/times cannot be included, so I can't use JSON on the bigger object, at least for my purposes.)
To do this, calling the hashCode() method on selected strings seemed to be the way to go.
Upon implementing this, I notice that in practice I am getting very different values on multiple runs for highest-level objects that are functionally 'identical'.
There are a number of nums that I have not rounded; otherwise there are integers, bools, Strings and not much more.
I have 'always' thought that a hashCode on the same set of values would return the same number, am I missing something?
BTW, the only context in which I have found material on hashCode() has been with WebSockets.
Of course I could write my own mapping from a String to a unique value, but I want to understand whether this is a problem with Dart or something else.
I can attempt to answer the question posed in the title: "Can hashCode() method calls return different values on equal (==) Objects?"
Short answer: hash codes for two objects must be the same if those two objects are equal (==).
If you override hashCode you must also override ==. Two objects that are equal, as defined by ==, must also have the same hash code.
However, hash codes do not have to be unique. That is, a perfectly valid hash code is the value 1. A good hash code, however, should be uniformly distributed.
From the docs from Object:
Hash codes are guaranteed to be the same for objects that are equal
when compared using the equality operator ==. Other than that there
are no guarantees about the hash codes. They will not be consistent
between runs and there are no distribution guarantees.
If a subclass overrides hashCode it should override the equality
operator as well to maintain consistency.
I found the immediate problem. The object's stringify() method, at one level, was not getting called; instead some stringify property that must exist in all objects (?) was being used.
With this fixed everything is working as exactly as I would expect, and multiple runs of our Statistical Studies are returning exactly the same CheckSum at the highest levels (based on some 5 levels of hierarchy).
Meanwhile JSON.stringify has continued to fail, even on the most basic object. I have not been able to determine what is causing it to fail. Of course, the question here is not about how "stringify" is accomplished.
So, empirically at least, I believe it is true that "objects with equal properties" will return equal checkSums in Dart. It was decided to round the nums; I don't know if this was causing a problem, but perhaps it is good to be aware of. And, of course, remember to beware of things like dates, times, or anything that could legitimately vary.
_swarmii
The doc linked by Seth Ladd now includes this info:
They need not be consistent between executions of the same program and there are no distribution guarantees.
So technically the hashCode value can change for the same object in different executions. As for your question:
I have 'always' thought that a hashCode on the same set of values would return the same number, am I missing something?
I came across some code today that looked somewhat like this:
subroutine foo()
real blah
integer bar,k,i,j,ll
integer :: n_called=1
save integer
...
end
It seems like the intent here was probably save n_called, but is that even a valid statement to save all integers -- or is it implicitly declaring a variable named integer and saving it?
The second interpretation is correct. Fortran has many keywords, INTEGER being one of them, but it has no reserved words, which means that keywords can be used as identifiers, though this is usually a terrible idea (but nevertheless it carries over to C#, where one can prefix a keyword with @ and use it as an identifier, right?).
The SAVE statement, even if it was intended for n_called, is superfluous. Fortran automatically saves all variables that have initialisers, and that's why the code probably works as intended.
integer :: n_called=1
Here n_called is automatically SAVE. This usually comes as a really bad surprise to C/C++ programmers forced to maintain/extend/create new Fortran code :)
I agree with your 2nd interpretation, that is, the statement save integer implicitly declares a variable called integer and gives it the save attribute. Fortran, of course, has no rule against using keywords as program entity names, though most sensible software developers do have such a rule.
If I try to compile your code snippet as you have presented it, my compiler (Intel Fortran) makes no complaint. If I insert implicit none at the right place it reports the error
This name does not have a type, and must have an explicit type. [INTEGER]
The other interpretation, that it gives the save attribute to all integer variables, seems at odds with the language standards and it's not a variation that I've ever come across.