I have an exclusive range of integers, e.g.
range = 1000...10001
I would rather work with inclusive ranges all the way, but the reason I have this is because when I tell ActiveRecord to store an inclusive range in PostgreSQL, I get the exclusive version when I query it back.
I want to convert it into an inclusive range, e.g. 1000..10000.
I'm doing:
(range.begin)..(range.end-1)
However it feels clunky and un-ruby-ish.
My whole API relies on passing ranges, but I'm also considering storying 2 values or storing an array in Postgres instead of a range.
Is there a better way to do this?
Edit (some clarifications)
I want to do this as efficiently as possible, given the large difference between the ranges
My use case is to use the begin and end part of the ranges. I'm not actually iterating over them or checking inclusion. I just thought that a ruby range is a nice way to represent the actual range for e.g. displaying a price ($ 1.000 - $ 10.000)
I would use Range#min and Range#max:
exclusive_range = (1000...10001)
inclusive_range = (exclusive_range.min..exclusive_range.max)
#=> 1000..10000
Related
Something strange is happening with the sorting. I am sorting G (weight) from smallest to largest. The numbers are 1 to 41.
It doesn't sort starting from the number 1. Instead the order is 10,11,12,13,14,15,16,17,18,19,2,20,21,23,25,26,3,30,35,4,41,5,6,7,8,9.
I know technically 10 starts with the number one, but shouldn't this application be smart enough to know that you don't start counting at number 10? The smallest weight of 2 should be first, followed 3,4,5,67,8,9,10,11,12,etc.
Is this not possible with that sorting option?
try to multiply your G column with 1 to convert your string numbers into numeric numbers as of, you are sorting it now by alphabetical order 10,1,20,2 not numerical 1,2,10,20
It sounds like your Col-G weights are strings and not actual numbers. And assuming that is the case, Google sorts in order according to ASCII char numbers. If you have any non-numeric characters appended (e.g., "kg"), then you've created strings automatically. However, if you only have digits and they somehow got into your sheet as strings (which are not actively generated by formula), you can try selecting that column and choosing Format > Number > 0.
Looking at the book Mining of Massive Datasets, section 1.3.2 has an overview of Hash Functions. Without a computer science background, this is quite new to me; Ruby was my first language, where a hash seems to be equivalent to Dictionary<object, object>. And I had never considered how this kind of datastructure is put together.
The book mentions hash functions, as a means of implementing these dictionary data structures. This paragraph:
First, a hash function h takes a hash-key value as an argument and produces
a bucket number as a result. The bucket number is an integer, normally in the
range 0 to B − 1, where B is the number of buckets. Hash-keys can be of any
type. There is an intuitive property of hash functions that they “randomize”
hash-keys
What exactly are buckets in terms of a hash function? it sounds like buckets are array-like structures, and that the hash function is some kind of algorithm / array-like-structure search that produces the same bucket number every time? What is inside this metaphorical bucket?
I've always read that javascript objects/ruby hashes/ etc don't guarantee order. In practice I've found that keys' order doesn't change (actually, I think using an older version of Mozilla's Rhino interpreter that the JS object order DID change, but I can't be sure...).
Does that mean that hashes (Ruby) / objects (JS) ARE NOT resolved by these hash functions?
Does the word hashing take on different meanings depending on the level at which you are working with computers? i.e. it would seem that a Ruby hash is not the same as a C++ hash...
When you hash a value, any useful hash function generally has a smaller range than the domain. This means that out of a large list of input values (for example all possible combinations of letters) it will output any of a smaller list of values (a number capped at a certain length). This means that more than one input value can map to the same output value.
When this is the case, the output values are refered to as buckets.
Consider the function f(x) = x mod 2
This generates the following outputs;
1 => 1
2 => 0
3 => 1
4 => 0
In this case there are two buckets (1 and 0), with a bunch of input values that fall into each.
A good hash function will fill all of these 'buckets' equally, and so enable faster searching etc. If you take the mod of any number, you get the bucket to look into, and thus have to search through less results than if you just searched initially, since each bucket has less results in it than the whole set of inputs. In the ideal situation, the hash is fast to calculate and there is only one result in each bucket, this enables lookups to take only as long as applying the hash function takes.
This is a simplified example of course but hopefully you get the idea?
The concept of a hash function is always the same. It's a function that calculates some number to represent an object. The properties of this number should be:
it's relatively cheap to compute
it's as different as possible for all objects.
Let's give a really artificial example to show what I mean with this and why/how hashes are usually used.
Take all natural numbers. Now let's assume it's expensive to check if 2 numbers are equal.
Let's also define a relatively cheap hash function as follows:
hash = number % 10
The idea is simple, just take the last digit of the number as the hash. In the explanation you got, this means we put all numbers ending in 1 into an imaginary 1-bucket, all numbers ending in 2 in the 2-bucket etc...
Those buckets don't really exists as data structure. They just make it easy to reason about the hash function.
Now that we have this cheap hash function we can use it to reduce the cost of other things. For example, we want to create a new datastructure to enable cheap searching of numbers. Let's call this datastructure a hashmap.
Here we actually put all the numbers with hash=1 together in a list/set/..., we put the numbers with hash=5 into their own list/set ... etc.
And if we then want to lookup some number, we first calculate it's hash value. Then we check the list/set corresponding to this hash, and then compare only "similar" numbers to find our exact number we want. This means we only had to do a cheap hash calculation and then have to check 1/10th of the numbers with the expensive equality check.
Note here that we use the hash function to define a new datastructure. The hash itself isn't a datastructure.
Consider a phone book.
Imagine that you wanted to look for Donald Duck in a phone book.
It would be very inefficient to have to look every page, and every entry on that page. So rather than doing that, we do the following thing:
We create an index
We create a way to obtain an index key from a name
For a phone book, the index goes from A-Z, and the function used to get the index key, is just getting first letter from the Surname.
In this case, the hashing function takes Donald Duck and gives you D.
Then you take D and go to the index where all the people with Surnames starting with D are.
That would be a very oversimplified way to put it.
Let me explain in simple terms. Buckets come into picture while handling collisions using chaining technique ( Open hashing or Closed addressing)
Here, each array entry shall correspond to a bucket and each array entry (if nonempty) will be having a pointer to the head of the linked list. (The bucket is implemented as a linked list).
The hash function shall be used by hash table to calculate an index into an array of buckets, from which the desired value can be found.
That is, while checking whether an element is in the hash table, the key is first hashed to find the correct bucket to look into. Then, the corresponding linked list is traversed to locate the desired element.
Similarly while any element addition or deletion, hashing is used to find the appropriate bucket. Then, the bucket is checked for presence/absence of required element, and accordingly it is added/removed from the bucket by traversing corresponding linked list.
This question is a little obscure, I'm trying to find out if its possible to "solve" for a value inputted into a hash in ruby, it looks like this:
I have:
#hash = Digest::SHA512.hexdigest(value1 + value2 + value3)
Value2 & value3 are known, and the value of #hash is known. Value 1 is "unknown". In this situation is it possible to solve for value1 in ruby, or would this require a ton of computing power/time?
Only way to do this is: brute-force
Guess a possible value for value1
Compute the hash
Check if it matches the target hash. If not goto 1
This is only feasible if value1 is easy enough to guess. GPUs are faster at this than CPUs, so you'd probably use a bunch of ATI CPUs to attack this.
Not having a cheap way to compute an input matching a given output is an essential property of a secure hash function, which is called first pre-image resistance. For SHA-512 we know no way faster than brute-force to do this.
If v2 and v3 are integers. You could theoretically attempt to brute force it by just running through numbers and finding when the hashes match. Then subtract v2 and v3. If your set of possible numbers is all real numbers though, this would be extremely hard. And you'd be better off running it on multiple machines with greatly varying rotating subsections of real numbers. That's you're best bet. And that's assuming the values are integers.
Id like to compare two strings in Ruby and find their similarity
I've had a look at the Levenshtein gem but it seems this was last updated in 2008 and I can't find documentation how to use it. With some blogs suggesting its broken
I tried the text gem with Levenshtein but it gives an integer (smaller is better)
Obviously if the two strings are of variable length I run into problems with the Levenshtein Algorithm (Say comparing two names, where one has a middle name and one doesnt).
What would you suggest I do to get a percentage comparison?
Edit: Im looking for something similar to PHP's similar text
I think your question could do with some clarifications, but here's something quick and dirty (calculating as percentage of the longer string as per your clarification above):
def string_difference_percent(a, b)
longer = [a.size, b.size].max
same = a.each_char.zip(b.each_char).count { |a,b| a == b }
(longer - same) / a.size.to_f
end
I'm still not sure how much sense this percent difference you are looking for makes, but this should get you started at least.
It's a bit like Levensthein distance, in that it compares the strings character by character. So if two names differ only by the middle name, they'll actually be very different.
There is now a ruby gem for similar_text. https://rubygems.org/gems/similar_text
It provides a similar method that compares two strings and returns a number representing the percent similarity between the two strings.
I can recommend the fuzzy-string-match gem.
You can use it like this (taken from the docs):
require "fuzzystringmatch"
jarow = FuzzyStringMatch::JaroWinkler.create(:native)
p jarow.getDistance("jones", "johnson")
It will return a score ~0.832 which tells how good those strings match.
Right now I have two fields for cost. One for dollars and one for cents. This works, but it is a bit ugly. It also doesn't allow the user to enter the term "free" or "no cost" if they want. But if I only have one field, I might have to make my parser a bit smarter. What do you think?
On the server side, I combine dollars and cents to store them as decimals in my database. Mainly so that I can gather statistics (cost averages, etc.) quickly.
Do you think it is better to store the cost as a string? Then whenever I actually use the cost for stats or other purposes, I would convert it to a decimal at that point. Or am I on the right track?
There is a rule in database design that states that "atomic data" should not be split. By this rule a price, or cost is such an example of atomic data and therefore it should never be split among multiple columns just like you shouldn't split a phone number among multiple columns (unless you really have a very good reason for it - very rare)
Use a DECIMAL data type. Something like DECIMAL(8,3) should work and it's supported by all ANSI SQL compliant database products!
You can consult Joe Celko's "Thinking In Sets" book for a discussion of this topic. See section 1.6.2, pages 21-22.
EDIT -
It seems from your question that you are also concerned with how to accept user's input in a form that resembles the price (xxxx.xx) - hence the two input boxes, for the whole dollars, and the pennies.
I recommend using a single input box and then doing input validation using Regular Expressions to match your format (i.e. something like [0-9]+(.[0-9]{1,3})? would probably work but could be improved). You could then parse the validated string to a Decimal type in your language, or just pass it as a string into your database - SQL will know how to cast it to a DECIMAL type.
Keep the whole cost as decimal. If it's free, then keep the cost as 0. In presentation if cost is zero - write "free" instead of 0.
I generally store the cost as the lowest unit (pennies) and then convert it to whole dollars later.
So a cost of $4.50 gets stored as 450. Free items would be -1 pennies. You could store free things as 0 pennies as well, this gives you the flexibility to use 0 and -1 to mean two slightly different things (free vs no sale?).
It also makes it easier to support countries that don't use cents if you choose to go that route.
As for presenting the data entry field, I personally don't like it when I have to keep switching fields for tiny things (like when they break up phone numbers into 3 fields, or IP addresses into 4). I'd present one field, and let the users type the decimal point in themselves. That way, your users don't have to tab (or click, if they are unfamiliar with tab) to the next field.
Use cents, use 450 for $4.50 this will save you problems that are arising very often
from the fact that floating point operations are not safe. Just try the following expression in irb:
0.4 - 0.3 == 0.1 will return false. All because of floating point representation
innacuracies.
In my models I'm always using:
attr_accessor :price_with_cents
def price_with_cents
self.price/100.00
end
def price\_with\_cents==(num)
self.price = (num.to_f * 100.00).to_i
end
And the name of column is just price and integer type.
I don't have much experience with decimal columns and their representation in ruby (which can be float that is problematic as i've shown at the begining).
Don't allow garbage to make it to your database. If you're expecting a dollar amount on a field, than make sure it's valid before it gets in there. This will allow you to report better on the data and allow simpler formatting on output.
I suggest making this a single field with validation on update or insert.
if field != SpecialFreeTag then
try to convert to decimal
if fail then report to user
otherwise accept value
Use try parse or regular expressions to help with the validation.
I would store the cost as decimal with the scale being no less than 2 and maybe even 3-5. If something is bought in bulk the unit cost could easily include fractions of a cent. Free items have a cost of 0. If the cost is unknown then allow null values also.