Calculating CLR coded index size - clr

Im trying to read MemberRef coded index (MemberRefParent), Before I do that I need to know its size, according to ECMA-335 Section II.24.2.6, if I understood it correctly the coded index is calculated like so:
my pseudo code
m=max_rows(t0..tn-1); //returns the number of rows of the table that has the most rows.
if(m<2^(16-log(n)){
//size is 2
} else {
//size is 4
}
When I tested the code on CLI file, I got an error, I must have missed something, I hope someone can help me find where I was wrong.
From section II.24.2.6, ECMA-335
If e is a coded index that points into table ti out of n possible
tables t0, …tn-1, then it is stored as e << (log n) | tag{ t0, …tn-1}[
ti] using 2 bytes if the maximum number of rows of tables t0, …tn-1,
is less than 2^(16 – (log n)), and using 4 bytes otherwise.
-
MemberRefParent: 3 bits to encode tag Tag
TypeDef 0
TypeRef 1
ModuleRef 2
MethodDef 3
TypeSpec 4

Related

Need help explaining this formula provided to me

I recently posted on here to get help with a formula, here is the link...https://stackoverflow.com/questions/75068029/vlook-up-style-forumla-but-range-is-2-cells A user called rockinfreakshow was really awesome and provided a great solution for me. I'm not very experienced and don't understand what the formula at all but I'd love to be able to add more attributes to it. Is anyone able to help break it down for me ?
I havent tried anything here, it's totally out of my realm of understanding
=MAKEARRAY(COUNTA(B2:B),COUNTA(D1:O1),LAMBDA(r,c,IF(REGEXMATCH(LAMBDA(ax,bx,IFS(REGEXMATCH(ax,"Mixed")*REGEXMATCH(INDEX(C2:C,r),"Blend")*REGEXMATCH(INDEX(C2:C,r),"Filter"),"BLEND-"&bx&"|FILTER-"&bx,REGEXMATCH(ax,"Mixed")*NOT(REGEXMATCH(INDEX(C2:C,r),"Blend"))*REGEXMATCH(INDEX(C2:C,r),"Filter"),"ESP-"&bx&"|FILTER-"&bx,REGEXMATCH(ax,"Mixed")*NOT(REGEXMATCH(INDEX(C2:C,r),"Filter")),"BLEND-"&bx&"|ESP-"&bx,LEN(ax),SUBSTITUTE(ax&"-"&bx,"Espresso","ESP")))(regexextract(INDEX(B2:B,r),"([^\s]*?) Subscription"),IFNA(SWITCH(REGEXEXTRACT(INDEX(C2:C,r),"Small|Medium|Large"),"Small",250,"Medium",450,"Large",900),SWITCH(REGEXEXTRACT(INDEX(B2:B,r),"Medium|Large"),"Medium",225,"Large",450))),"(?i)"&INDEX(D1:O1,,c)),1,)))
see the WHY LAMBDA? part of this answer to understand the LAMBDA
the formula contains 2x LAMBDA and there are a total of 4 placeholders which translates to:
r - COUNTA(B2:B)
c - COUNTA(D1:O1)
ax - REGEXEXTRACT(INDEX(B2:B, r), "([^\s]*?) Subscription")
bx - IFNA(SWITCH(REGEXEXTRACT(INDEX(C2:C, r), "Small|Medium|Large"),
"Small", 250, "Medium", 450, "Large", 900),
SWITCH(REGEXEXTRACT(INDEX(B2:B, r), "Medium|Large"),
"Medium", 225, "Large", 450))
r counts how many items are in B column
c counts how many items are in row 1 of range D1:O1
ax extracts the word from B column that precedes the word Subscription
bx is a bit complex but essentially it extracts from C column word Small or Medium or Large and replaces it with 250, 450 or 900 respectively. then if C column does not contain one of those 3 words it checks for Medium or Large within B column and assigns 225 or 450 respectively
what we are left with is the core of the formula:
IFS( REGEXMATCH(ax, "Mixed")*
REGEXMATCH(INDEX(C2:C, r), "Blend")*
REGEXMATCH(INDEX(C2:C, r), "Filter"), "BLEND-"&bx&"|FILTER-"&bx,
___________________________________________________________________________
REGEXMATCH(ax, "Mixed")*
NOT(REGEXMATCH(INDEX(C2:C, r), "Blend"))*
REGEXMATCH(INDEX(C2:C, r), "Filter"), "ESP-"&bx&"|FILTER-"&bx,
___________________________________________________________________________
REGEXMATCH(ax, "Mixed")*
NOT(REGEXMATCH(INDEX(C2:C, r), "Filter")), "BLEND-"&bx&"|ESP-"&bx,
___________________________________________________________________________
LEN(ax), SUBSTITUTE(ax&"-"&bx, "Espresso", "ESP"))
for better visualization, the IFS formula contains only 4 elements. each of these 4 elements acts as a switch - if there is a match x we get output y. for example let's dissect the first element...
REGEXMATCH(ax, "Mixed")*
REGEXMATCH(INDEX(C2:C, r), "Blend")*
REGEXMATCH(INDEX(C2:C, r), "Filter"), "BLEND-"&bx&"|FILTER-"&bx
there are 3x REGEXMATCHes multiplied by each other. whenever there is such multiplication in array formulae it translates as AND logic gate (if there would be + it would mean OR logic gate) eg.:
1 * 1 = 1
1 * 0 = 0
0 * 1 = 0
0 * 0 = 0
REGEXMATCH outputs TRUE or FALSE so if we get 3x TRUE the whole argument is considered as TRUE (because 1 * 1 * 1 = 1) so we proceed to output our first switch
therefore if B column contains Mixed and C column contains Blend and C column contains Filter then we output Blend-000|Filter-000 where 000 stands for a specific number determined from bx placeholder/formula and also you can notice the | (which btw stands for OR logic within the regex) but in this case, it's just a unique symbol to join stuff for REGEXMATCH. which REGEXMATCH is this for you may ask? ...this one:
so the output of IFS formula is the input for most outer REGEXMATCH and we check if the IFS output matches something within D1:O1 range. IF yes then output 1 otherwise output nothing. shortened:
IF(REGEXMATCH(IFS(...), "(?i)"&INDEX(D1:O1,,c), 1, )
(?i) in regex means "case insensitive". it is there just for safety reasons because regex is by default case sensitive.
and we reached the MAKEARRAY formula that creates an array of numbers across the whole range with height r and width c where output is the result of IF eg. either 1 or empty cell

hypothesis function space in decision tree

I am reading the book "Artificial Intelligence" by Stuart Russell and Peter Norvig (Chapter 18). The following paragraph is from the decision trees context.
For a wide variety of problems, the decision tree format yields a
nice, concise result. But some functions cannot be represented
concisely. For example, the majority function, which returns true if
and only if more than half of the inputs are true, requires an
exponentially large decision tree.
In other words, decision trees are good for some kinds of functions
and bad for others. Is there any kind of representation that is
efficient for all kinds of functions? Unfortunately, the answer is no.
We can show this in a general way. Consider the set of all Boolean
functions on "n" attributes. How many different functions are in this
set? This is just the number of different truth tables that we can
write down, because the function is defined by its truth table.
A truth table over "n" attributes has 2^n rows, one for each
combination of values of the attributes.
We can consider the “answer” column of the table as a 2^n-bit number
that defines the function. That means there are (2^(2^n)) different
functions (and there will be more than that number of trees, since
more than one tree can compute the same function). This is a scary
number. For example, with just the ten Boolean attributes of our
restaurant problem there are 2^1024 or about 10^308 different
functions to choose from.
What does author mean by "answer" column of the table as a 2^n-bit number that defines the function?
How did author derive (2^(2^n)) different functions?
Please elaborate on above question, preferably with simple example, such as n = 3.
Consider a general truth table for a 3-input function, where the result for each triple is also a Boolean (1 or 0), represented by variables i through 'p':
A B C f(a,b,c)
0 0 0 i
0 0 1 j
0 1 0 k
0 1 1 l
1 0 0 m
1 0 1 n
1 1 0 o
1 1 1 p
We can now represent any function on three variables as an 8-bit number, ijklmnop. For instance, and is 00000001; or is 01111111; one_hot (exactly one input True) is 01101000.
For 3 variables, you have 2^3 bits in the "answer", the complete function definition. Since there are 8 bits in the "answer", there are 2^8 possible functions we can define.
Does that outline the field of comprehension for you?
More detail on an example function
You simply (once you see the pattern) make the eight bits correspond to the entires in the table. For instance, the table for one-hot looks like this:
A B C f(a,b,c)
0 0 0 0
0 0 1 1
0 1 0 1
0 1 1 0
1 0 0 1
1 0 1 0
1 1 0 0
1 1 1 0
Reading down the "answer" column, labeled f(a,b,c), you get the 8-bit sequence 01101000. That 8-bit number is sufficient to completely define the function: the rows listing all the combinations of a, b, c are in a fixed (numerical) sequence.
You can write any such function in a template format:
def and(a, b, c):
and_def = '00000001'
index = 4*a + 2*b + 1*c
return and_def[index]
Now, if we generalize this to any 3-input binary function:
def_bin_func(a, b, c, func_def)
return func_def[4*a + 2*b + 1*c]
If you wish, you can further generalize the template for a list of inputs: concatenate the bits and use that integer as the index into the func_def string.
Does that clear it up?

Bitwise operation alternative in Neo4j cypher query

I need to do a bitwise "and" in a cypher query. It seems that cypher does not support bitwise operations. Any suggestions for alternatives?
This is what I want to detect ...
For example 268 is (2^8 + 2^3 + 2^2) and as you can see 2^3 = 8 is a part of my original number. So if I use bitwise AND it will be (100001100) & (1000) = 1000 so this way I can detect if 8 is a part of 268 or not.
How can I do this without bitwise support? any suggestions? I need to do this in cypher.
Another way to perform this type of test using cypher would be to convert your decimal values to collections of the decimals that represent the bits that are set.
// convert the binary number to a collection of decimal parts
// create an index the size of the number to convert
// create a collection of decimals that correspond to the bit locations
with '100001100' as number
, [1,2,4,8,16,32,64,128,256,512,1024,2048,4096] as decimals
with number
, range(length(number)-1,0,-1) as index
, decimals[0..length(number)] as decimals
// map the bits to decimal equivalents
unwind index as i
with number, i, (split(number,''))[i] as binary_placeholder, decimals[-i-1] as decimal_placeholder
// multiply the decimal value by the bits that are set
with collect(decimal_placeholder * toInt(binary_placeholder)) as decimal_placeholders
// filter out the zero values from the collection
with filter(d in decimal_placeholders where d > 0) as decimal_placeholders
return decimal_placeholders
Here is a sample of what this returns.
Then when you want to test whether the number is in the decimal, you can just test the actual decimal for presence in the collection.
with [4, 8, 256] as decimal_placeholders
, 8 as decimal_to_test
return
case
when decimal_to_test in decimal_placeholders then
toString(decimal_to_test) + ' value bit is set'
else
toString(decimal_to_test) + ' value bit is NOT set'
end as bit_set_test
Alternatively, if one had APOC available they could use apoc.bitwise.op which is a wrapper around the java bitwise operations.
RETURN apoc.bitwise.op(268, "&",8 ) AS `268_AND_8`
Which yields the following result
If you absolutely have to do the operation in cypher probably a better solution would be to implement something like #evan's SO solution Alternative to bitwise operation using cypher.
You could start by converting your data using cypher that looks something like this...
// convert binary to a product of prime numbers
// start with the number to conver an a collection of primes
with '100001100' as number
, [2,3,5,7,13,17,19,23,29,31,37] as primes
// create an index based on the size of the binary number to convert
// take a slice of the prime array that is the size of the number to convert
with number
, range(length(number)-1,0,-1) as index
, primes[0..length(number)] as primes, decimals[0..length(number)] as decimals
// iterate over the index and match the prime number to the bits in the number to convert
unwind index as i
with (split(number,''))[i] as binary_place_holder, primes[-i-1] as prime_place_holder, decimals[-i-1] as decimal_place_holder
// collect the primes that are set by multiplying by the set bits
with collect(toInt(binary_place_holder) * prime_place_holder) as prime_placeholders
// filter out the zero bits
with filter(p in prime_placeholders where p > 0) as prime_placeholders
// return a product of primes of the set bits
return prime_placeholders, reduce(pp = 1, p in prime_placeholders | pp * p) as prime_product
Sample of the output of the above query. The query could be adapted to update attributes with the prime product.
Here is a screen cap of how the conversion breaks down
Then when you want to use it you could use the modulus of the prime number in the location of the bit you want to test.
// test if the fourth bit is set in the decimal 268
// 268 is the equivalent of a prime product of 1015
// a modulus 7 == 0 will indicate the bit is set
with 1015 as prime_product
, [2,3,5,7,13,17,19,23,29,31,37] as primes
, 4 as bit_to_test
with bit_to_test
, prime_product
, primes[bit_to_test-1] as prime
, prime_product % primes[bit_to_test-1] as mod_remains
with
case when mod_remains = 0 then
'bit ' + toString(bit_to_test) + ' set'
else
'bit ' + toString(bit_to_test) + ' NOT set'
end as bit_set
return bit_set
It almost certainly defeats the purpose of choosing a bitwise operation in the first place but if you absolutely needed to AND the two binary numbers in cypher you could do something like this with collections.
with split('100001100', '') as bin_term_1
, split('000001000', '') as bin_term_2
, toString(1) as one
with bin_term_1, bin_term_2, one, range(0,size(bin_term_1)-1,1) as index
unwind index as i
with i, bin_term_1, bin_term_2, one,
case
when (bin_term_1[i] = one) and (bin_term_2[i] = one) then
1
else
0
end as r
return collect(r) as AND
Thanks Dave. I tried your solutions and they all worked. They were a good hint for me to find another approach. This is how I solved it. I used String comparison.
with '100001100' as number , '100000000' as sub_number
with number,sub_number,range(length (number)-1,length (number)-length(sub_number),-1) as tail,length (number)-length(sub_number) as difference
unwind tail as i
with i,sub_number,number, i - length (number) + length (sub_number) as sub_number_position
with sub_number_position, (split(number,''))[i-1] as bit_mask , (split(sub_number,''))[sub_number_position] as sub_bit
with collect(toInt(bit_mask) * toInt(sub_bit)) as result
return result
Obviously the number and sub_number can have different values.

Odd Checksum Result(s) - Not Receiving Expected Results

I have been trying to produce a checksum based on a file header and am receiving conflicting results. In the slave devices manual, it states the following to produce the checksum:
"A simple eight-bit calculation is used for the header checksum. The steps required are as follows:
Calculate the sum of the header bytes in a single byte. Alternatively calculate
the sum and then AND the result with FFhex.
The checksum = FFhex - the sum from step 1."
Here, I have created the following code in Lua:
function header_checksum(string)
local sum = 0
for i = 1, #string do
sum = sum + string.byte(i)
end
local chksum = 255 - (sum & 255)
return chksum
end
If I send the following (4x byte) string down print(header_checksum("0181B81800")) I get the following result:
241 (string sent as you see it)
0 (each byte is changed to hex and then sent to function)
In the example given, it states that the byte should be AD, which is 173(dec) or \255.
Can someone please tell me what is wrong with what I am doing; either the code written, my approach, or both?
function header_checksum(header)
local sum = -1
for i = 1, #header do
sum = sum - header:byte(i)
end
return sum % 256
end
print(header_checksum(string.char(0x01,0x81,0xB8,0x18,0x00))) --> 173

Constrained Sequence to Index Mapping

I'm puzzling over how to map a set of sequences to consecutive integers.
All the sequences follow this rule:
A_0 = 1
A_n >= 1
A_n <= max(A_0 .. A_n-1) + 1
I'm looking for a solution that will be able to, given such a sequence, compute a integer for doing a lookup into a table and given an index into the table, generate the sequence.
Example: for length 3, there are 5 the valid sequences. A fast function for doing the following map (preferably in both direction) would be a good solution
1,1,1 0
1,1,2 1
1,2,1 2
1,2,2 3
1,2,3 4
The point of the exercise is to get a packed table with a 1-1 mapping between valid sequences and cells.
The size of the set in bounded only by the number of unique sequences possible.
I don't know now what the length of the sequence will be but it will be a small, <12, constant known in advance.
I'll get to this sooner or later, but though I'd throw it out for the community to have "fun" with in the meantime.
these are different valid sequences
1,1,2,3,2,1,4
1,1,2,3,1,2,4
1,2,3,4,5,6,7
1,1,1,1,2,3,2
these are not
1,2,2,4
2,
1,1,2,3,5
Related to this
There is a natural sequence indexing, but no so easy to calculate.
Let look for A_n for n>0, since A_0 = 1.
Indexing is done in 2 steps.
Part 1:
Group sequences by places where A_n = max(A_0 .. A_n-1) + 1. Call these places steps.
On steps are consecutive numbers (2,3,4,5,...).
On non-step places we can put numbers from 1 to number of steps with index less than k.
Each group can be represent as binary string where 1 is step and 0 non-step. E.g. 001001010 means group with 112aa3b4c, a<=2, b<=3, c<=4. Because, groups are indexed with binary number there is natural indexing of groups. From 0 to 2^length - 1. Lets call value of group binary representation group order.
Part 2:
Index sequences inside a group. Since groups define step positions, only numbers on non-step positions are variable, and they are variable in defined ranges. With that it is easy to index sequence of given group inside that group, with lexicographical order of variable places.
It is easy to calculate number of sequences in one group. It is number of form 1^i_1 * 2^i_2 * 3^i_3 * ....
Combining:
This gives a 2 part key: <Steps, Group> this then needs to be mapped to the integers. To do that we have to find how many sequences are in groups that have order less than some value. For that, lets first find how many sequences are in groups of given length. That can be computed passing through all groups and summing number of sequences or similar with recurrence. Let T(l, n) be number of sequences of length l (A_0 is omitted ) where maximal value of first element can be n+1. Than holds:
T(l,n) = n*T(l-1,n) + T(l-1,n+1)
T(1,n) = n
Because l + n <= sequence length + 1 there are ~sequence_length^2/2 T(l,n) values, which can be easily calculated.
Next is to calculate number of sequences in groups of order less or equal than given value. That can be done with summing of T(l,n) values. E.g. number of sequences in groups with order <= 1001010 binary, is equal to
T(7,1) + # for 1000000
2^2 * T(4,2) + # for 001000
2^2 * 3 * T(2,3) # for 010
Optimizations:
This will give a mapping but the direct implementation for combining the key parts is >O(1) at best. On the other hand, the Steps portion of the key is small and by computing the range of Groups for each Steps value, a lookup table can reduce this to O(1).
I'm not 100% sure about upper formula, but it should be something like it.
With these remarks and recurrence it is possible to make functions sequence -> index and index -> sequence. But not so trivial :-)
I think hash with out sorting should be the thing.
As A0 always start with 0, may be I think we can think of the sequence as an number with base 12 and use its base 10 as the key for look up. ( Still not sure about this).
This is a python function which can do the job for you assuming you got these values stored in a file and you pass the lines to the function
def valid_lines(lines):
for line in lines:
line = line.split(",")
if line[0] == 1 and line[-1] and line[-1] <= max(line)+1:
yield line
lines = (line for line in open('/tmp/numbers.txt'))
for valid_line in valid_lines(lines):
print valid_line
Given the sequence, I would sort it, then use the hash of the sorted sequence as the index of the table.

Resources