Calculating ISIN checksum - checksum

Hi, I know there have been many questions about this here, but I wasn't able to find a detailed enough answer. Wikipedia has two examples of ISINs and of how their checksums are calculated.
The part of the calculation that I'm struggling with is
Multiply the group containing the rightmost character
The way I understand this statement is:
Iterate through each character from right to left
once you stumble upon a letter rather than a digit, record its position
if the position is an even number double all numeric values in even position
if the position is an odd number double all numeric values in odd position
My understanding has to be wrong because there are at least two problems:
Every ISIN starts with two character country code so position of rightmost character is always the first character
If you omit the first two characters then there is no explanation as to what to do with ISINs that are made up of all numbers (except for first two characters)
Note
isin.org contains even less information on verifying ISINs, they even use the same example as Wikipedia.

I agree with you; the definition on Wikipedia is not the clearest I have seen.
There's a piece of text just before the two examples that explains when one or the other algorithm should be used:
Since the NSIN element can be any alpha numeric sequence (9 characters), an odd number of letters will result in an even number of digits and an even number of letters will result in an odd number of digits. For an odd number of digits, the approach in the first example is used. For an even number of digits, the approach in the second example is used
The NSIN is identical to the ISIN, excluding the first two letters and the last digit; so if the ISIN is US0378331005 the NSIN is 037833100.
So, if you want to verify the checksum digit of US0378331005, you'll have to use the "first algorithm" because there are 9 digits in the NSIN. Conversely, if you want to check AU0000XVGZA3 you're going to use the "second algorithm" because the NSIN contains 4 digits.
As to the "first" and "second" algorithms, they're identical, with the only exception that in the former you'll multiply by 2 the group of odd digits, whereas in the latter you'll multiply by 2 the group of even digits.
Now, the good news is, you can get away without this overcomplicated algorithm.
You can, instead:
1. Take the ISIN except the last digit (which is the one you want to verify)
2. Convert all letters to numbers (A = 10, B = 11, ..., Z = 35) to obtain a list of digits
3. Reverse the list of digits
4. Double every digit in an odd position and, if the result is >= 10, sum its digits again
5. Take every digit in an even position as it is
6. Sum all the resulting digits, take the sum modulo 10, subtract it from 10, and take the result modulo 10 once more (so that 10 becomes 0)
The only tricky step is #4. Let's clarify it with a mini-example.
Suppose the digits in an odd position are 4, 0, 7.
You'll double them and get: 8, 0, 14.
8 is not >= 10, so we take it as it is. Ditto for 0. 14 is >= 10, so we sum its digits again: 1+4=5.
The result of step #4 in this mini-example is, therefore: 8, 0, 5.
A minimal, working implementation in Python could look like this:
import string
isin = 'US4581401001'
def digit_sum(n):
    # sum of the decimal digits of n (n is at most 18 here)
    return (n // 10) + (n % 10)
# map '0'-'9' to 0-9 and 'A'-'Z' to 10-35
alphabet = {letter: value for (value, letter) in
            enumerate(''.join(str(n) for n in range(10)) + string.ascii_uppercase)}
isin_to_digits = ''.join(str(d) for d in (alphabet[v] for v in isin[:-1]))
isin_sum = 0
for (i, c) in enumerate(reversed(isin_to_digits), 1):
    if i % 2 == 1:
        isin_sum += digit_sum(2*int(c))
    else:
        isin_sum += int(c)
# unary minus binds tighter than %, so this is (-isin_sum) % 10, which is already in 0..9
checksum_digit = abs(- isin_sum % 10)
assert int(isin[-1]) == checksum_digit
Or, more crammed, just for functional fun:
checksum_digit = abs( - sum(digit_sum(2*int(c)) if i % 2 == 1 else int(c)
                            for (i, c) in enumerate(
                                reversed(''.join(str(d) for d in (alphabet[v] for v in isin[:-1]))), 1)) % 10)
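The same steps can also be wrapped in a small function and checked against the two ISINs discussed above (US0378331005 and AU0000XVGZA3). This is only a sketch: the names isin_check_digit and is_valid_isin are made up for illustration, and it assumes the input contains only digits and the letters A-Z.

```python
def isin_check_digit(body):
    """Return the check digit for the first 11 characters of an ISIN."""
    # int(ch, 36) maps '0'-'9' to 0-9 and 'A'-'Z' to 10-35
    digits = "".join(str(int(ch, 36)) for ch in body.upper())
    total = 0
    for position, ch in enumerate(reversed(digits), 1):
        d = int(ch)
        if position % 2 == 1:
            d *= 2
            d = d // 10 + d % 10  # sum the digits of the doubled value
        total += d
    return (10 - total % 10) % 10


def is_valid_isin(isin):
    """Check the length and the trailing check digit of a 12-character ISIN."""
    if len(isin) != 12 or not isin[-1].isdigit():
        return False
    return isin_check_digit(isin[:-1]) == int(isin[-1])


for candidate in ("US0378331005", "AU0000XVGZA3", "US4581401001"):
    print(candidate, is_valid_isin(candidate))
```

No distinction between an odd and an even number of digits is needed here, because the positions are counted from the right after the letters have been expanded.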


Deterministic Finite Automata divisibility problem

Design a DFA that accepts the language L = { w : the number of a's in w is divisible by 3 and the number of b's in w is divisible by 2 } over the alphabet {a,b}.
Realize that we should have 3 * 2 = 6 states in the DFA. Why? Because there are 3 possible remainders for the number of a's modulo 3 (0, 1 or 2) and 2 possible remainders for the number of b's modulo 2 (0 or 1).
Let us name the states axby, meaning that the number of a's read so far is x modulo 3 and the number of b's read so far is y modulo 2. For example, if we are in a2b0 and we encounter an a, then we go to a0b0 (a third a brings the remainder back to 0). Similarly a1b1 ---b---> a1b0 and a1b1 ---a---> a2b1.
Needless to say, a0b0 is both the start state and the only accepting state.
Now, all you have to do is draw the states and keep joining them. I have drawn them on a paper here.
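For readers who prefer code to a drawing, the six states and their transitions can be generated and simulated as follows. This is a minimal sketch; build_dfa and accepts are illustrative names, and the state labels follow the axby naming used above.

```python
def build_dfa():
    """Return the transition table {(state, symbol): next_state} for the 6 states."""
    transitions = {}
    for x in range(3):        # remainder of the number of a's modulo 3
        for y in range(2):    # remainder of the number of b's modulo 2
            state = f"a{x}b{y}"
            transitions[(state, "a")] = f"a{(x + 1) % 3}b{y}"
            transitions[(state, "b")] = f"a{x}b{(y + 1) % 2}"
    return transitions


def accepts(word):
    """Run the DFA on word; a0b0 is the start state and the accepting state."""
    transitions = build_dfa()
    state = "a0b0"
    for symbol in word:
        state = transitions[(state, symbol)]  # KeyError for symbols outside {a, b}
    return state == "a0b0"


print(accepts("aaabb"))  # three a's and two b's
print(accepts("ab"))     # one a and one b
```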

Bitwise operation alternative in Neo4j cypher query

I need to do a bitwise "and" in a cypher query. It seems that cypher does not support bitwise operations. Any suggestions for alternatives?
This is what I want to detect ...
For example 268 is (2^8 + 2^3 + 2^2) and as you can see 2^3 = 8 is a part of my original number. So if I use bitwise AND it will be (100001100) & (1000) = 1000 so this way I can detect if 8 is a part of 268 or not.
How can I do this without bitwise support? any suggestions? I need to do this in cypher.
Another way to perform this type of test using cypher would be to convert your decimal values to collections of the decimals that represent the bits that are set.
// convert the binary number to a collection of decimal parts
// create an index the size of the number to convert
// create a collection of decimals that correspond to the bit locations
with '100001100' as number
, [1,2,4,8,16,32,64,128,256,512,1024,2048,4096] as decimals
with number
, range(length(number)-1,0,-1) as index
, decimals[0..length(number)] as decimals
// map the bits to decimal equivalents
unwind index as i
with number, i, (split(number,''))[i] as binary_placeholder, decimals[-i-1] as decimal_placeholder
// multiply the decimal value by the bits that are set
with collect(decimal_placeholder * toInt(binary_placeholder)) as decimal_placeholders
// filter out the zero values from the collection
with filter(d in decimal_placeholders where d > 0) as decimal_placeholders
return decimal_placeholders
For the input above, this returns [4, 8, 256].
Then, when you want to test whether a bit is set in the number, you can just test for the presence of that bit's decimal value in the collection.
with [4, 8, 256] as decimal_placeholders
, 8 as decimal_to_test
return
case
when decimal_to_test in decimal_placeholders then
toString(decimal_to_test) + ' value bit is set'
else
toString(decimal_to_test) + ' value bit is NOT set'
end as bit_set_test
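To make the logic of the two queries above easier to follow outside of Cypher, here is the same idea as a small Python sketch (set_bit_values is an illustrative name, not part of any library). It walks the binary string from the right and keeps the decimal value of every bit that is set.

```python
def set_bit_values(binary):
    """Return the decimal values (powers of two) of the bits set in a binary string."""
    values = []
    for position, bit in enumerate(reversed(binary)):
        if bit == "1":
            values.append(2 ** position)
    return values


placeholders = set_bit_values("100001100")  # 268 in binary
print(placeholders)
print(8 in placeholders)   # membership test replaces 268 & 8
print(16 in placeholders)
```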
Alternatively, if one had APOC available they could use apoc.bitwise.op which is a wrapper around the java bitwise operations.
RETURN apoc.bitwise.op(268, "&",8 ) AS `268_AND_8`
Which yields 8, since 8 is the only bit the two numbers have in common.
If you absolutely have to do the operation in cypher, a better solution would probably be to implement something like @evan's SO solution, Alternative to bitwise operation using cypher.
You could start by converting your data using cypher that looks something like this...
// convert binary to a product of prime numbers
// start with the number to convert and a collection of primes
with '100001100' as number
, [2,3,5,7,13,17,19,23,29,31,37] as primes
// create an index based on the size of the binary number to convert
// take a slice of the prime array that is the size of the number to convert
with number
, range(length(number)-1,0,-1) as index
, primes[0..length(number)] as primes
// iterate over the index and match the prime number to the bits in the number to convert
unwind index as i
with (split(number,''))[i] as binary_place_holder, primes[-i-1] as prime_place_holder
// collect the primes that are set by multiplying by the set bits
with collect(toInt(binary_place_holder) * prime_place_holder) as prime_placeholders
// filter out the zero bits
with filter(p in prime_placeholders where p > 0) as prime_placeholders
// return a product of primes of the set bits
return prime_placeholders, reduce(pp = 1, p in prime_placeholders | pp * p) as prime_product
Sample of the output of the above query. The query could be adapted to update attributes with the prime product.
Here is a screen cap of how the conversion breaks down
Then when you want to use it you could use the modulus of the prime number in the location of the bit you want to test.
// test if the fourth bit is set in the decimal 268
// 268 is the equivalent of a prime product of 1015
// a modulus 7 == 0 will indicate the bit is set
with 1015 as prime_product
, [2,3,5,7,13,17,19,23,29,31,37] as primes
, 4 as bit_to_test
with bit_to_test
, prime_product
, primes[bit_to_test-1] as prime
, prime_product % primes[bit_to_test-1] as mod_remains
with
case when mod_remains = 0 then
'bit ' + toString(bit_to_test) + ' set'
else
'bit ' + toString(bit_to_test) + ' NOT set'
end as bit_set
return bit_set
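The arithmetic behind the prime-product trick can be checked with a short Python sketch. The prime list is copied as-is from the queries above (any list of distinct primes works, as long as the same list is used for encoding and testing); prime_product and bit_is_set are illustrative names.

```python
PRIMES = [2, 3, 5, 7, 13, 17, 19, 23, 29, 31, 37]  # same list as in the queries above


def prime_product(binary):
    """Multiply together the primes assigned to the bits set in a binary string."""
    product = 1
    for position, bit in enumerate(reversed(binary)):
        if bit == "1":
            product *= PRIMES[position]
    return product


def bit_is_set(product, bit_to_test):
    """bit_to_test is 1-based, counted from the right, as in the query above."""
    return product % PRIMES[bit_to_test - 1] == 0


encoded = prime_product("100001100")  # 268 in binary
print(encoded)
print(bit_is_set(encoded, 4))
print(bit_is_set(encoded, 1))
```

Because every prime divides the product only if its bit was set, a single modulus operation replaces the bitwise AND.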
It almost certainly defeats the purpose of choosing a bitwise operation in the first place but if you absolutely needed to AND the two binary numbers in cypher you could do something like this with collections.
with split('100001100', '') as bin_term_1
, split('000001000', '') as bin_term_2
, toString(1) as one
with bin_term_1, bin_term_2, one, range(0,size(bin_term_1)-1,1) as index
unwind index as i
with i, bin_term_1, bin_term_2, one,
case
when (bin_term_1[i] = one) and (bin_term_2[i] = one) then
1
else
0
end as r
return collect(r) as AND
Thanks Dave. I tried your solutions and they all worked. They were a good hint for me to find another approach. This is how I solved it. I used String comparison.
with '100001100' as number , '100000000' as sub_number
with number,sub_number,range(length (number)-1,length (number)-length(sub_number),-1) as tail,length (number)-length(sub_number) as difference
unwind tail as i
with i,sub_number,number, i - length (number) + length (sub_number) as sub_number_position
with sub_number_position, (split(number,''))[i-1] as bit_mask , (split(sub_number,''))[sub_number_position] as sub_bit
with collect(toInt(bit_mask) * toInt(sub_bit)) as result
return result
Obviously the number and sub_number can have different values.

Pascal's triangle and Fibonacci sequence explanation

Okay, I need to redraw Pascal's triangle and explain the Fibonacci sequence embedded in it. I need to observe over 12 rows of the triangle (which ends on the number 144 in the Fibonacci sequence). I understand this part, as I am just explaining how the entries along each diagonal sum to a Fibonacci number.
But I need to use the fact that the rth number in the nth row of the triangle is
C(n, r) = n! / (r! (n-r)!)
This last part is what's confusing me. How can I use C(n, r) to explain the Fibonacci sequence in the triangle?
Please help. Thanks
Consider the following problem :
In how many ways can you go up a ladder of n steps if you can take either a single step at a time or 2 steps at a time?
Solution 1 : Let's construct a recurrence relation for this problem. The last move is either a single step or a double step, so the recurrence is a(n) = a(n-1) + a(n-2), where a(1)=1 and a(2)=2.
Thus, the answer for n would be the (n+1)th Fibonacci term (taking F(1) = F(2) = 1).
Solution 2 : Each unique way of climbing up the ladder corresponds to a unique sequence of 1's and 2's which adds up to n. The number of such sequences thus would be our answer. Let's start counting such sequences :
Number of sequences without a 2 = $ {n \choose 0 } $.
Number of sequences with one 2 = $ {n-1 \choose 1 } $.
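In general, the number of sequences with k 2's = $ {n-k \choose k } $, since such a sequence has n-k entries in total and k of them are 2's.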
In case of even n, the last term would be $ {n/2 \choose n/2 } $.
And for odd n, it would be $ {(n+1)/2 \choose (n-1)/2 } $.
As you can see, these are the diagonal terms in Pascal's triangle.
Since these two solutions count the same thing, they must be equal. Thus we get the relation between the Fibonacci numbers and the diagonals of Pascal's triangle.
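The equality of the two solutions can be checked numerically. The sketch below assumes Python 3.8+ for math.comb; the function names are illustrative. It compares the recurrence a(n) with the sum of the diagonal terms C(n-k, k), and n = 11 reproduces the 144 mentioned in the question.

```python
from math import comb


def ways_by_recurrence(n):
    """a(n) = a(n-1) + a(n-2) with a(1) = 1 and a(2) = 2."""
    prev, cur = 1, 1  # a(0) = 1 (the empty climb) and a(1) = 1
    for _ in range(n - 1):
        prev, cur = cur, prev + cur
    return cur


def ways_by_diagonal(n):
    """Sum of C(n-k, k) over the possible numbers k of double steps."""
    return sum(comb(n - k, k) for k in range(n // 2 + 1))


for n in range(1, 12):
    print(n, ways_by_recurrence(n), ways_by_diagonal(n))
```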
See http://ms.appliedprobability.org/data/files/Articles%2033/33-1-5.pdf if you have any more doubts.

if input is nth term in fibonacci series, finding n

In the Fibonacci series, let's assume the nth Fibonacci term is T, i.e. F(n) = T. I want to write a program that takes T as input and returns n, that is, the position of T in the series (assuming T is always a Fibonacci number). Is there an efficient way to find it?
The easy way would be to simply start generating Fibonacci numbers until F(i) == T. If implemented correctly (read: not recursively), this takes n additions, which is O(log T) because Fibonacci numbers grow exponentially. This method also allows you to make sure T is a valid Fibonacci number.
If T is guaranteed to be a valid Fibonacci number, you can use approximation rules:
Formula
It looks complicated, but it's not. The point is: from a certain point on, the ratio F(i+1)/F(i) is practically a constant value (it approaches the golden ratio). Since we're not generating Fibonacci numbers but are merely finding the "index", we can drop most of it and just realize the following:
breakpoint := b
For any i > b: f(i) ≈ f(i-1) * Ratio ≈ f(b) * Ratio^(i-b)
We can get the reverse by simply taking Log(T, R), T being the given Fibonacci number and R being the Ratio. By adjusting for the inaccuracy for early numbers, we don't even have to select a breakpoint (if you do: it's ~ correct for i > 17).
The Ratio is approximately 1.618034. Taking the logarithm of 6765 (= F(20)) to base 1.618034, we get a value of 18.3277. The accuracy remains the same for any higher Fibonacci number, so simply rounding down and adding 2 gives us the exact Fibonacci "rank" (provided that F(1) = F(2) = 1).
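As a rough sketch of the logarithm approach (fib_index is an illustrative name), the rule "round down and add 2" looks like this in Python. It assumes that t is a genuine Fibonacci number, that F(1) = F(2) = 1, and that t is small enough to be represented exactly as a float; for t = 1 it returns 2, since the positions 1 and 2 cannot be told apart.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # the Ratio from the text, about 1.618034


def fib_index(t):
    """Return n with F(n) == t, assuming t >= 1 is a Fibonacci number."""
    return math.floor(math.log(t, PHI)) + 2


print(fib_index(6765))  # F(20) = 6765, the example from the text
```

Unlike the generate-and-compare loop, this does not detect inputs that are not Fibonacci numbers; it simply returns the rank of the nearest one below.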
The first step is to implement fib numbers in a non-recursive way such as
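// returns the index i at which T is reached, or -1 if T is not found
// (the wrapper name fibIndex is only illustrative)
int fibIndex(long T, int startIndex, int stopIndex)
{
    long fib1 = 0, fib2 = 1;   // two consecutive Fibonacci numbers
    for (int i = startIndex; i < stopIndex; i++)
    {
        if (fib1 < fib2)
        {
            fib1 += fib2;
            if (fib1 == T) return i;   // comparison, not assignment
            if (fib1 > T) return -1;
        }
        else
        {
            fib2 += fib1;
            if (fib2 == T) return i;
            if (fib2 > T) return -1;
        }
    }
    return -1;   // stopIndex reached without finding T
}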
Here startIndex would be set to 3 and stopIndex to 10000 or so. To cut down on the iterations, you can also select as seeds 2 sequential Fibonacci numbers further down the sequence; startIndex is then set to the next index, and the computation is done with an appropriate adjustment to stopIndex. I would suggest breaking the sequence up into several sections, depending on machine performance and the maximum expected input, to minimize the run time.

Constrained Sequence to Index Mapping

I'm puzzling over how to map a set of sequences to consecutive integers.
All the sequences follow this rule:
A_0 = 1
A_n >= 1
A_n <= max(A_0 .. A_n-1) + 1
I'm looking for a solution that will be able to, given such a sequence, compute an integer for doing a lookup into a table and, given an index into the table, generate the sequence.
Example: for length 3, there are 5 valid sequences. A fast function for doing the following map (preferably in both directions) would be a good solution
1,1,1 0
1,1,2 1
1,2,1 2
1,2,2 3
1,2,3 4
The point of the exercise is to get a packed table with a 1-1 mapping between valid sequences and cells.
The size of the set is bounded only by the number of unique sequences possible.
I don't know yet what the length of the sequence will be, but it will be a small (<12) constant known in advance.
I'll get to this sooner or later, but thought I'd throw it out for the community to have "fun" with in the meantime.
these are different valid sequences
1,1,2,3,2,1,4
1,1,2,3,1,2,4
1,2,3,4,5,6,7
1,1,1,1,2,3,2
these are not
1,2,2,4
2,
1,1,2,3,5
Related to this
There is a natural sequence indexing, but it is not so easy to calculate.
Let us look only at A_n for n>0, since A_0 = 1 is fixed.
Indexing is done in 2 steps.
Part 1:
Group sequences by places where A_n = max(A_0 .. A_n-1) + 1. Call these places steps.
The steps hold consecutive numbers (2,3,4,5,...).
On a non-step place k we can put any number from 1 to the current maximum, i.e. 1 plus the number of steps with index less than k.
Each group can be represented as a binary string where 1 is a step and 0 a non-step. E.g. 001001010 means the group of the form 112aa3b4c, with a<=2, b<=3, c<=4. Because groups are indexed by a binary number, there is a natural indexing of groups, from 0 to 2^length - 1. Let us call the value of a group's binary representation the group order.
Part 2:
Index sequences inside a group. Since a group defines the step positions, only the numbers on non-step positions are variable, and they vary within defined ranges. With that it is easy to index a sequence inside its group, using the lexicographical order of the variable places.
It is easy to calculate the number of sequences in one group. It is a number of the form 1^i_1 * 2^i_2 * 3^i_3 * ..., where i_m is the number of non-step places at which the current maximum is m.
Combining:
This gives a two-part key, <Steps, Group>, which then needs to be mapped to the integers. To do that we have to find how many sequences are in groups whose order is less than some value. For that, let us first find how many sequences there are in all groups of a given length. That can be computed by passing through all groups and summing their sizes, or, similarly, with a recurrence. Let T(l, n) be the number of sequences of length l (A_0 is omitted) where the maximal value of the first element can be n+1. Then the following holds:
T(l,n) = n*T(l-1,n) + T(l-1,n+1)
T(1,n) = n + 1
Because l + n <= sequence length + 1 there are ~sequence_length^2/2 T(l,n) values, which can be easily calculated.
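The recurrence can be sanity-checked against a brute-force count. In the sketch below the base case is taken as T(0, n) = 1 (one empty continuation), which is an assumption on my part; it implies T(1, n) = n + 1 and is what reproduces the 5 valid sequences of length 3 from the question. The helper names are illustrative.

```python
from functools import lru_cache
from itertools import product


@lru_cache(maxsize=None)
def T(l, n):
    """Number of ways to append l more terms when the current maximum is n."""
    if l == 0:
        return 1
    # n non-step choices keep the maximum, one step choice raises it to n + 1
    return n * T(l - 1, n) + T(l - 1, n + 1)


def brute_force(length):
    """Count valid sequences of the given length (A_0 included) by enumeration."""
    count = 0
    for tail in product(range(1, length + 1), repeat=length - 1):
        seq = (1,) + tail
        if all(seq[i] <= max(seq[:i]) + 1 for i in range(1, length)):
            count += 1
    return count


for length in range(1, 7):
    print(length, T(length - 1, 1), brute_force(length))
```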
Next is to calculate the number of sequences in groups of order less than or equal to a given value. That can be done by summing T(l,n) values. E.g. the number of sequences in groups with order <= 1001010 binary is equal to
T(7,1) + # for 1000000
2^2 * T(4,2) + # for 001000
2^2 * 3 * T(2,3) # for 010
Optimizations:
This will give a mapping but the direct implementation for combining the key parts is >O(1) at best. On the other hand, the Steps portion of the key is small and by computing the range of Groups for each Steps value, a lookup table can reduce this to O(1).
I'm not 100% sure about the formula above, but it should be something like it.
With these remarks and the recurrence it is possible to write the functions sequence -> index and index -> sequence. But it is not so trivial :-)
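One possible way to write the two functions is sketched below. Note that it is a simplification rather than the two-part <Steps, Group> key described above: it ranks sequences directly in lexicographical order, using the same kind of count as T(l, n) with the assumed base case count(0, m) = 1. The names count, rank and unrank are illustrative.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count(remaining, m):
    """Ways to append `remaining` more terms when the current maximum is m."""
    if remaining == 0:
        return 1
    return m * count(remaining - 1, m) + count(remaining - 1, m + 1)


def rank(seq):
    """Map a valid sequence (starting with 1) to its lexicographic index."""
    index, m = 0, 1
    for i in range(1, len(seq)):
        remaining = len(seq) - 1 - i
        # every smaller value at this place is a non-step and heads a block of equal size
        index += (seq[i] - 1) * count(remaining, m)
        m = max(m, seq[i])
    return index


def unrank(index, length):
    """Inverse of rank: rebuild the sequence of the given length from its index."""
    seq, m = [1], 1
    for i in range(1, length):
        block = count(length - 1 - i, m)
        q = min(index // block, m)  # q == m means the step value m + 1
        seq.append(q + 1)
        index -= q * block
        m = max(m, q + 1)
    return seq


for i in range(count(2, 1)):
    print(i, unrank(i, 3))
```

Both directions cost a number of table lookups proportional to the sequence length, and the count table has only about length^2/2 entries, so it can be precomputed for the small lengths (<12) mentioned in the question.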
I think a hash without sorting should be the thing.
As A_0 is always 1, maybe we can think of the sequence as a number in base 12 and use its base-10 value as the key for the lookup. (Still not sure about this.)
This is a python function which can do the job for you assuming you got these values stored in a file and you pass the lines to the function
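def valid_lines(lines):
    for line in lines:
        # parse the comma-separated terms, ignoring empty fields such as a trailing comma
        seq = [int(x) for x in line.split(",") if x.strip()]
        # A_0 must be 1 and every later term must lie in 1 .. max(previous terms) + 1
        if seq and seq[0] == 1 and all(1 <= seq[i] <= max(seq[:i]) + 1
                                       for i in range(1, len(seq))):
            yield seq
with open('/tmp/numbers.txt') as f:
    for valid_line in valid_lines(f):
        print(valid_line)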
Given the sequence, I would sort it, then use the hash of the sorted sequence as the index of the table.
