I have a small dataset, like this:
import numpy as np
import pandas as pd

_d = pd.DataFrame([
    [1, 2.0, 'a', 'mango', '2017-07-07', 1],
    [2, 2.55, 'b', 'apple', '2017-08-07', 0],
    [3, 5.7, np.nan, 'banana', np.nan, 1],
    [4, np.nan, 'd', 'grapes', '2017-09-07', 1],
    [5, 5.7, 'e', 'pineapple', '2017-10-07', 0],
    [6, 8.3, np.nan, 'orange', '2017-01-07', 0],
    [5, 5.7, 'e', np.nan, '2017-10-07', 1],
    [6, np.nan, 'f', np.nan, np.nan, 0],
    [7, 6.8, 'g', 'pomegranate', '2017-02-07', 1],
    [np.nan, 55.5, 'h', 'watermelon', '2017-03-07', 0],
    [9, 6.8, 'i', 'mango', np.nan, 1],
    [10, 3.5, np.nan, 'orange', '2017-06-07', 1],
    [11, 2.78, 'k', 'pomegranate', '2017-09-07', 0]
], columns=['ind', 'score', 'grade', 'group', 'da', 'target'])
To handle the NaN values and encode the categorical features, I used this code:
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer

y = _d['target']
x = _d.drop(['target'], axis=1)

int_columns = _d.select_dtypes(['float64', 'int64']).columns
obj_columns = _d.select_dtypes(['object', 'category']).columns

int_pipeline = Pipeline([
    ('impute_values', SimpleImputer(missing_values=np.nan, strategy='mean')),
    ('scaling', StandardScaler())
])
cat_pipeline = Pipeline([
    ('cat_impute', SimpleImputer(strategy='constant', fill_value='missing')),
    ('encoding', OneHotEncoder(drop='first'))
])
column_trans = ColumnTransformer(transformers=[
    ('int_p', int_pipeline, ['ind', 'score']),
    ('cat_p', cat_pipeline, ['grade', 'group'])
], remainder='passthrough')
mdl_pipeline = Pipeline([
    ('value_transform', column_trans)
])
transformed_data = mdl_pipeline.fit_transform(x, y)
When I run this code, I get the following error
ValueError: could not convert string to float: '2017-07-07'
The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
Input In [253], in <cell line: 27>()
     18 column_trans=ColumnTransformer(transformers=[
     19     ('int_p',int_pipeline,['ind', 'score']),
     20     ('cat_p',cat_pipeline,['grade', 'group'])
     21 ],remainder='passthrough')
     23 mdl_pipeline=Pipeline([
     24     ('value_transform',column_trans)
     25     # ,('mdl',LogisticRegression())
     26 ])
---> 27 transformed_data=mdl_pipeline.fit_transform(x,y)

File ~\Anaconda3\lib\site-packages\sklearn\pipeline.py:434, in Pipeline.fit_transform(self, X, y, **fit_params)
    432 fit_params_last_step = fit_params_steps[self.steps[-1][0]]
    433 if hasattr(last_step, "fit_transform"):
--> 434     return last_step.fit_transform(Xt, y, **fit_params_last_step)
    435 else:
    436     return last_step.fit(Xt, y, **fit_params_last_step).transform(Xt)

File ~\Anaconda3\lib\site-packages\sklearn\compose\_column_transformer.py:699, in ColumnTransformer.fit_transform(self, X, y)
    696 self._validate_output(Xs)
    697 self._record_output_indices(Xs)
--> 699 return self._hstack(list(Xs))

File ~\Anaconda3\lib\site-packages\sklearn\compose\_column_transformer.py:783, in ColumnTransformer._hstack(self, Xs)
    778     converted_Xs = [
    779         check_array(X, accept_sparse=True, force_all_finite=False)
    780         for X in Xs
    781     ]
    782 except ValueError as e:
--> 783     raise ValueError(
    784         "For a sparse output, all columns should "
    785         "be a numeric or convertible to a numeric."
    786     ) from e
    788 return sparse.hstack(converted_Xs).tocsr()
    789 else:

ValueError: For a sparse output, all columns should be a numeric or convertible to a numeric.
The value error
ValueError: could not convert string to float: '2017-07-07'
doesn't make any sense to me, since I have set remainder='passthrough' in the ColumnTransformer. Why is my code not working?
Set sparse_threshold=0 on your ColumnTransformer. Otherwise, according to the docs:
If the output of the different transformers contains sparse matrices, these will be stacked as a sparse matrix if the overall density is lower than this value. Use sparse_threshold=0 to always return dense. When the transformed output consists of all dense data, the stacked result will be dense, and this keyword will be ignored.
Because the OneHotEncoder produces a sparse matrix, the ColumnTransformer tries to stack all outputs, including the passed-through string columns, into one sparse matrix. It can't, since sparse matrices require numerical values (hence the attempt to convert the date strings to something numerical).
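For reference, here is a minimal sketch of the fix; it is the same ColumnTransformer as in the question, with only the sparse_threshold=0 argument added:
column_trans = ColumnTransformer(transformers=[
    ('int_p', int_pipeline, ['ind', 'score']),
    ('cat_p', cat_pipeline, ['grade', 'group'])
], remainder='passthrough', sparse_threshold=0)  # always return a dense array

mdl_pipeline = Pipeline([
    ('value_transform', column_trans)
])

# fit_transform now succeeds; the 'da' date column passes through untouched.
transformed_data = mdl_pipeline.fit_transform(x, y)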
I was wondering, how would I make a simple Lua string or an entire script look like C++ compiled code, but still run as regular vanilla Lua?
print("Test string") -- How would this look like C++ compiled code?
With Lua you cannot directly dump print to a binary format, as far as I know.
Dumping a function to a binary string, however, is easily done with your own defined functions...
> -- Lua 5.4
> myfunc = function() print("Teststring") return end
> string.dump(myfunc, true)
uaT�
�
xV(w#����
��DGG��print�Teststring������
> load(string.dump(myfunc, true))()
Teststring
As you can see, just like in a compiled C binary, the constants are not obfuscated.
You can achieve more obfuscation by converting the binary string to byte values...
> string.dump(myfunc, true):byte(1, -1)
27 76 117 97 84 0 25 147 13 10 26 10 4 8 8 120 86 0 0 0 0 0 0 0 0 0 0 0 40 119 64 1 128 129 129 0 0 2 133 11 0 0 0 131 128 0 0 68 0 2 1 71 0 1 0 71 0 1 0 130 4 134 112 114 105 110 116 4 139 84 101 115 116 115 116 114 105 110 103 129 0 0 0 128 128 128 128 128
...and for converting back later, let's put it into a table...
> byte_code_tab = {string.dump(myfunc, true):byte(1, -1)}
> table.concat(byte_code_tab,',')
27,76,117,97,84,0,25,147,13,10,26,10,4,8,8,120,86,0,0,0,0,0,0,0,0,0,0,0,40,119,64,1,128,129,129,0,0,2,133,11,0,0,0,131,128,0,0,68,0,2,1,71,0,1,0,71,0,1,0,130,4,134,112,114,105,110,116,4,139,84,101,115,116,115,116,114,105,110,103,129,0,0,0,128,128,128,128,128
...now a function is needed to get it back...
> bytes_dec = function(tab) local txt = '' for k, v in pairs(tab) do txt = txt .. tostring(v):char() end return txt end
> bytes_dec(byte_code_tab)
uaT�
�
xV(w#����
��DGG��print�Teststring������
> load(bytes_dec(byte_code_tab))()
Teststring
EDIT
To show how it works with a single Lua file that returns a table with a __call metamethod, check out this...
-- obfsc.lua
return setmetatable({27,76,117,97,84,0,25,147,13,10,26,10,4,8,8,120,86,0,0,0,0,0,0,0,0,0,0,0,40,119,64,1,128,129,129,0,0,2,133,11,0,0,0,131,128,0,0,68,0,2,1,71,0,1,0,71,0,1,0,130,4,134,112,114,105,110,116,4,139,84,101,115,116,115,116,114,105,110,103,129,0,0,0,128,128,128,128,128},
    {__call = function(self, ...)
        local txt = ''
        for k, v in pairs(self) do
            txt = txt .. tostring(v):char()
        end
        return load(txt)()
    end})
...the bytes_dec function is stored in the __call metamethod...
$ /usr/local/bin/lua
Lua 5.4.4 Copyright (C) 1994-2022 Lua.org, PUC-Rio
> require('obfsc')
table: 0x565d3650 ./obfsc.lua
> require('obfsc')()
Teststring
...and it also does the load().
But it is up to you where you store bytes_dec().
Another nice method is ROT.
It's very simple and also old, but good enough for de/obfuscating.
An impression...
$ /bin/lua
Lua 5.1.5 Copyright (C) 1994-2012 Lua.org, PUC-Rio
> rot=require('rot')
> -- Let's rotate the banner
> print(rot('Lua 5.1.5 Copyright (C) 1994-2012 Lua.org, PUC-Rio'))
5!`unqnu``/092)'(4`hi`qyytmrpqr`
5!n/2'l`m)/ 51
> -- Now read source of rot.lua into rot_src and print it
> rot_src = io.open('rot.lua'):read('*a')
> print(rot_src)
-- rot.lua
local rotator = function(...)
local args, rot, c = {...}, {}, ''
for i = 1, 63 do rot[c.char(i)] = c.char(i + 64) end
for i = 64, 127 do rot[c.char(i)] = c.char(i - 64) end
return args[1]:gsub('.', rot)
end
return rotator
> -- Obfuscate the source and print it
> rot_obfsc = rot(rot_src)
> print(rot_obfsc)
mm`2/4n,5!J,/#!,`2/4!4/2`}`&5.#4)/.hnnniJ,/#!,`!2'3l`2/4l`#`}`;nnn=l`;=l`ggJJ&/2`)`}`ql`vs`$/`2/4#(!2h)i`}`#n#(!2h)`k`vti`%.$J&/2`)`}`vtl`qrw`$/`2/4#(!2h)i`}`#n#(!2h)`m`vti`%.$JJ2%452.`!2'3z'35"hgngl`2/4iJ%.$JJ2%452.`2/4!4/2J
> -- Deobfuscate and print on the fly
> print(rot(rot_obfsc))
-- rot.lua
local rotator = function(...)
local args, rot, c = {...}, {}, ''
for i = 1, 63 do rot[c.char(i)] = c.char(i + 64) end
for i = 64, 127 do rot[c.char(i)] = c.char(i - 64) end
return args[1]:gsub('.', rot)
end
return rotator
I have built an RNN with BasicRNNCell and now I want to use the LSTMCell, but the transition does not seem trivial. What should I change?
First I define all the placeholders and variables:
X_placeholder = tf.placeholder(tf.float32, [batch_size, truncated_backprop_length, embedding_size])
Y_placeholder = tf.placeholder(tf.int32, [batch_size, truncated_backprop_length])
init_state = tf.placeholder(tf.float32, [batch_size, state_size])
W = tf.Variable(np.random.rand(state_size, num_classes),dtype=tf.float32)
b = tf.Variable(np.zeros((batch_size, num_classes)), dtype=tf.float32)
W2 = tf.Variable(np.random.rand(state_size, num_classes),dtype=tf.float32)
b2 = tf.Variable(np.zeros((batch_size, num_classes)), dtype=tf.float32)
Then I unstack the labels:
labels_series = tf.transpose(Y_placeholder)
labels_series = tf.unstack(Y_placeholder, axis=1)
inputs_series = X_placeholder
Then I define my RNN:
cell = tf.contrib.rnn.BasicLSTMCell(state_size, state_is_tuple = False)
states_series, current_state = tf.nn.dynamic_rnn(cell, inputs_series, initial_state = init_state)
The error that I get is:
InvalidArgumentError Traceback (most recent call last)
/home/deepnlp2017/.local/lib/python3.5/site-packages/tensorflow/python/framework/common_shapes.py in _call_cpp_shape_fn_impl(op, input_tensors_needed, input_tensors_as_shapes_needed, debug_python_shape_fn, require_shape_fn)
669 node_def_str, input_shapes, input_tensors, input_tensors_as_shapes,
--> 670 status)
671 except errors.InvalidArgumentError as err:
/home/deepnlp2017/anaconda3/lib/python3.5/contextlib.py in __exit__(self, type, value, traceback)
65 try:
---> 66 next(self.gen)
67 except StopIteration:
/home/deepnlp2017/.local/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py in raise_exception_on_not_ok_status()
468 compat.as_text(pywrap_tensorflow.TF_Message(status)),
--> 469 pywrap_tensorflow.TF_GetCode(status))
470 finally:
InvalidArgumentError: Dimensions must be equal, but are 50 and 100 for 'rnn/while/basic_lstm_cell/mul' (op: 'Mul') with input shapes: [32,50], [32,100].
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-19-2ac617f4dde4> in <module>()
4 #cell = tf.contrib.rnn.BasicRNNCell(state_size)
5 cell = tf.contrib.rnn.BasicLSTMCell(state_size, state_is_tuple = False)
----> 6 states_series, current_state = tf.nn.dynamic_rnn(cell, inputs_series, initial_state = init_state)
/home/deepnlp2017/.local/lib/python3.5/site-packages/tensorflow/python/ops/rnn.py in dynamic_rnn(cell, inputs, sequence_length, initial_state, dtype, parallel_iterations, swap_memory, time_major, scope)
543 swap_memory=swap_memory,
544 sequence_length=sequence_length,
--> 545 dtype=dtype)
546
547 # Outputs of _dynamic_rnn_loop are always shaped [time, batch, depth].
/home/deepnlp2017/.local/lib/python3.5/site-packages/tensorflow/python/ops/rnn.py in _dynamic_rnn_loop(cell, inputs, initial_state, parallel_iterations, swap_memory, sequence_length, dtype)
710 loop_vars=(time, output_ta, state),
711 parallel_iterations=parallel_iterations,
--> 712 swap_memory=swap_memory)
713
714 # Unpack final output if not using output tuples.
/home/deepnlp2017/.local/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py in while_loop(cond, body, loop_vars, shape_invariants, parallel_iterations, back_prop, swap_memory, name)
2624 context = WhileContext(parallel_iterations, back_prop, swap_memory, name)
2625 ops.add_to_collection(ops.GraphKeys.WHILE_CONTEXT, context)
-> 2626 result = context.BuildLoop(cond, body, loop_vars, shape_invariants)
2627 return result
2628
/home/deepnlp2017/.local/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py in BuildLoop(self, pred, body, loop_vars, shape_invariants)
2457 self.Enter()
2458 original_body_result, exit_vars = self._BuildLoop(
-> 2459 pred, body, original_loop_vars, loop_vars, shape_invariants)
2460 finally:
2461 self.Exit()
/home/deepnlp2017/.local/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py in _BuildLoop(self, pred, body, original_loop_vars, loop_vars, shape_invariants)
2407 structure=original_loop_vars,
2408 flat_sequence=vars_for_body_with_tensor_arrays)
-> 2409 body_result = body(*packed_vars_for_body)
2410 if not nest.is_sequence(body_result):
2411 body_result = [body_result]
/home/deepnlp2017/.local/lib/python3.5/site-packages/tensorflow/python/ops/rnn.py in _time_step(time, output_ta_t, state)
695 skip_conditionals=True)
696 else:
--> 697 (output, new_state) = call_cell()
698
699 # Pack state if using state tuples
/home/deepnlp2017/.local/lib/python3.5/site-packages/tensorflow/python/ops/rnn.py in <lambda>()
681
682 input_t = nest.pack_sequence_as(structure=inputs, flat_sequence=input_t)
--> 683 call_cell = lambda: cell(input_t, state)
684
685 if sequence_length is not None:
/home/deepnlp2017/.local/lib/python3.5/site-packages/tensorflow/contrib/rnn/python/ops/core_rnn_cell_impl.py in __call__(self, inputs, state, scope)
182 i, j, f, o = array_ops.split(value=concat, num_or_size_splits=4, axis=1)
183
--> 184 new_c = (c * sigmoid(f + self._forget_bias) + sigmoid(i) *
185 self._activation(j))
186 new_h = self._activation(new_c) * sigmoid(o)
/home/deepnlp2017/.local/lib/python3.5/site-packages/tensorflow/python/ops/math_ops.py in binary_op_wrapper(x, y)
882 if not isinstance(y, sparse_tensor.SparseTensor):
883 y = ops.convert_to_tensor(y, dtype=x.dtype.base_dtype, name="y")
--> 884 return func(x, y, name=name)
885
886 def binary_op_wrapper_sparse(sp_x, y):
/home/deepnlp2017/.local/lib/python3.5/site-packages/tensorflow/python/ops/math_ops.py in _mul_dispatch(x, y, name)
1103 is_tensor_y = isinstance(y, ops.Tensor)
1104 if is_tensor_y:
-> 1105 return gen_math_ops._mul(x, y, name=name)
1106 else:
1107 assert isinstance(y, sparse_tensor.SparseTensor) # Case: Dense * Sparse.
/home/deepnlp2017/.local/lib/python3.5/site-packages/tensorflow/python/ops/gen_math_ops.py in _mul(x, y, name)
1623 A `Tensor`. Has the same type as `x`.
1624 """
-> 1625 result = _op_def_lib.apply_op("Mul", x=x, y=y, name=name)
1626 return result
1627
/home/deepnlp2017/.local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py in apply_op(self, op_type_name, name, **keywords)
761 op = g.create_op(op_type_name, inputs, output_types, name=scope,
762 input_types=input_types, attrs=attr_protos,
--> 763 op_def=op_def)
764 if output_structure:
765 outputs = op.outputs
/home/deepnlp2017/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py in create_op(self, op_type, inputs, dtypes, input_types, name, attrs, op_def, compute_shapes, compute_device)
2395 original_op=self._default_original_op, op_def=op_def)
2396 if compute_shapes:
-> 2397 set_shapes_for_outputs(ret)
2398 self._add_op(ret)
2399 self._record_op_seen_by_control_dependencies(ret)
/home/deepnlp2017/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py in set_shapes_for_outputs(op)
1755 shape_func = _call_cpp_shape_fn_and_require_op
1756
-> 1757 shapes = shape_func(op)
1758 if shapes is None:
1759 raise RuntimeError(
/home/deepnlp2017/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py in call_with_requiring(op)
1705
1706 def call_with_requiring(op):
-> 1707 return call_cpp_shape_fn(op, require_shape_fn=True)
1708
1709 _call_cpp_shape_fn_and_require_op = call_with_requiring
/home/deepnlp2017/.local/lib/python3.5/site-packages/tensorflow/python/framework/common_shapes.py in call_cpp_shape_fn(op, input_tensors_needed, input_tensors_as_shapes_needed, debug_python_shape_fn, require_shape_fn)
608 res = _call_cpp_shape_fn_impl(op, input_tensors_needed,
609 input_tensors_as_shapes_needed,
--> 610 debug_python_shape_fn, require_shape_fn)
611 if not isinstance(res, dict):
612 # Handles the case where _call_cpp_shape_fn_impl calls unknown_shape(op).
/home/deepnlp2017/.local/lib/python3.5/site-packages/tensorflow/python/framework/common_shapes.py in _call_cpp_shape_fn_impl(op, input_tensors_needed, input_tensors_as_shapes_needed, debug_python_shape_fn, require_shape_fn)
673 missing_shape_fn = True
674 else:
--> 675 raise ValueError(err.message)
676
677 if missing_shape_fn:
ValueError: Dimensions must be equal, but are 50 and 100 for 'rnn/while/basic_lstm_cell/mul' (op: 'Mul') with input shapes: [32,50], [32,100].
You should consider including the error trace; otherwise it is hard (or impossible) to help.
I reproduced the situation and found that the issue was coming from state unpacking, i.e. the line c, h = state.
Try setting state_is_tuple to False, i.e.
cell = tf.contrib.rnn.BasicLSTMCell(state_size, state_is_tuple=False)
I'm not sure why this is happening. Are you loading a previous model? What is your tensorflow version?
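Note that even with state_is_tuple=False, the init_state placeholder from the question is still too narrow; as the answer below explains, the concatenated state is twice state_size. A minimal sketch of the corrected placeholder, using the shapes from the traceback:
# With state_is_tuple=False the state holds c and h concatenated, so the
# placeholder must be [batch_size, 2 * state_size] ([32, 100] rather than
# [32, 50] in the traceback's shapes):
init_state = tf.placeholder(tf.float32, [batch_size, 2 * state_size])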
More information on TensorFlow RNN Cells:
I would suggest you take a look at the WildML post, section "RNN CELLS, WRAPPERS AND MULTI-LAYER RNNS".
It states that:
BasicRNNCell – A vanilla RNN cell.
GRUCell – A Gated Recurrent Unit cell.
BasicLSTMCell – An LSTM cell based on Recurrent Neural Network Regularization. No peephole connection or cell clipping.
LSTMCell – A more complex LSTM cell that allows for optional peephole connections and cell clipping.
MultiRNNCell – A wrapper to combine multiple cells into a multi-layer cell.
DropoutWrapper – A wrapper to add dropout to input and/or output connections of a cell.
Given this, I would suggest you switch from BasicRNNCell to BasicLSTMCell. Basic here means "use it unless you know what you are doing". If you want to try LSTMs without going into details, that's the way to go. It should be straightforward: just replace it and voilà!
If not, share some of your code + error.
Hope it helps.
The problem seems to be with the init_state variable.
Basic RNN cells have only one state variable, while an LSTM cell has both a cell state and a hidden state. Specifying state_is_tuple=False concatenates the two state variables into one, therefore doubling the size of what you have specified in the init_state declaration.
To avoid this, one can use the built-in zero_state method of an LSTMCell to initialize the state correctly, without worrying about the size difference.
So it would simply be:
init_state = cell.zero_state(batch_size, dtype)
Of course, this will have to be placed after the line where the cell is built.
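A minimal sketch of that ordering, assuming the TensorFlow 1.x contrib API from the question; batch_size=32 and state_size=50 are taken from the error message, the other sizes are assumed values for illustration:
import tensorflow as tf

batch_size = 32
state_size = 50
truncated_backprop_length = 10   # assumed value for illustration
embedding_size = 8               # assumed value for illustration

X_placeholder = tf.placeholder(
    tf.float32, [batch_size, truncated_backprop_length, embedding_size])

# Build the cell first, keeping the default state_is_tuple=True...
cell = tf.contrib.rnn.BasicLSTMCell(state_size)

# ...then let the cell create a correctly shaped (c, h) initial state.
init_state = cell.zero_state(batch_size, tf.float32)

states_series, current_state = tf.nn.dynamic_rnn(
    cell, X_placeholder, initial_state=init_state)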
I want to make a script that takes any number, counts up to it, and prints the numbers in a grid format,
so like this:
for i = 1, 9 do
    print(i)
end
will return
1
2
3
4
5
6
7
8
9
however I want it to print like this
1 2 3
4 5 6
7 8 9
and I want it to work even with numbers greater than 9, so something like 20 would print like this:
1 2 3
4 5 6
7 8 9
10 11 12
13 14 15
16 17 18
19 20
I'm sure it can be done using the string library in Lua, but I am not sure how to use that library.
Any help?
function f(n, per_line)
    per_line = per_line or 3
    for i = 1, n do
        io.write(i, '\t')
        if i % per_line == 0 then io.write('\n') end
    end
end
f(9)
f(20)
The for loop takes an optional third argument, the step (note this prints full rows, so it assumes the limit is a multiple of 3):
for i = 1, 9, 3 do
    print(string.format("%d %d %d", i, i + 1, i + 2))
end
I can think of 2 ways to do this:
local NUMBER = 20
local str = {}
-- collect full groups of three (upper bound NUMBER-2 so the last
-- complete triple is kept even when NUMBER is a multiple of 3)
for i = 1, NUMBER - 2, 3 do
    table.insert(str, i .. " " .. i + 1 .. " " .. i + 2)
end
-- collect the leftover numbers (empty when NUMBER is a multiple of 3)
local left = {}
for i = NUMBER - NUMBER % 3 + 1, NUMBER do
    table.insert(left, i)
end
str = table.concat(str, "\n") .. "\n" .. table.concat(left, " ")
And another one using gsub:
local NUMBER = 20
local str = {}
for i=1,NUMBER do
str[i] = i
end
-- Makes "1 2 3 4 ..."
str = table.concat(str," ")
-- Divides it per 3 numbers
-- "%d+ %d+ %d+" matches 3 numbers divided by spaces
-- (You can replace the spaces (including in concat) with "\t")
-- The (...) capture allows us to get those numbers as %1
-- The "%s?" at the end is to remove any trailing whitespace
-- (Else each line would be "N N N " instead of "N N N")
-- (Using the '?' as the last triplet might not have a space)
-- ^ e.g. NUMBER = 6 would make it end with "4 5 6"
-- The "%1\n" just gets us our numbers back and adds a newline
str = str:gsub("(%d+ %d+ %d+)%s?","%1\n")
print(str)
I've benchmarked both code snippets. The upper one is a tiny bit faster, although the difference is almost nothing.
Benchmarked using 10000 iterations:

NUMBER   20      20      20      100      100
Upper    256 ms  276 ms  260 ms  1129 ms  1114 ms
Lower    284 ms  280 ms  282 ms  1266 ms  1228 ms
Use a temporary table to contain the values until you print them:
local temp = {}
local cols = 3

for i = 1, 9 do
    if #temp == cols then
        print(table.unpack(temp))
        temp = {}
    end
    temp[#temp + 1] = i
end

-- Last-minute check for leftovers
if #temp > 0 then
    print(table.unpack(temp))
end
temp = nil
Referring to the original problem: Optimizing hand-evaluation algorithm for Poker-Monte-Carlo-Simulation
I have a list of 5 to 7 cards and want to store their value in a hash table, which should be an array of 32-bit integers directly indexed by the hash function's value.
Given the huge number of possible combinations in a 52-card deck, I don't want to waste too much memory.
Numbers:
7-card combinations: 133,784,560
6-card combinations: 20,358,520
5-card combinations: 2,598,960
Total: 156,742,040 possible combinations
Storing 157 million 32-bit integer values costs about 627 MB (156,742,040 × 4 bytes). So I would like to avoid increasing this number by reserving array slots for values that aren't needed.
So the question is: what would a hash function look like that maps each possible, non-duplicated combination of cards to a consecutive value between 0 and 156,742,040, or at least comes close to it?
Paul Senzee has a great post on this for 7 cards (deleted link as it is broken and now points to a NSFW site).
His code is basically a bunch of pre-computed tables and then one function to look up the array index for a given 7-card hand (represented as a 64-bit number with the lowest 52 bits signifying cards):
inline unsigned index52c7(unsigned __int64 x)
{
const unsigned short *a = (const unsigned short *)&x;
unsigned A = a[3], B = a[2], C = a[1], D = a[0],
bcA = _bitcount[A], bcB = _bitcount[B], bcC = _bitcount[C], bcD = _bitcount[D],
mulA = _choose48x[7 - bcA], mulB = _choose32x[7 - (bcA + bcB)], mulC = _choose16x[bcD];
return _offsets52c[bcA] + _table4[A] * mulA +
_offsets48c[ (bcA << 4) + bcB] + _table [B] * mulB +
_offsets32c[((bcA + bcB) << 4) + bcC] + _table [C] * mulC +
_table [D];
}
In short, it's a bunch of lookups and bitwise operations powered by pre-computed lookup tables based on perfect hashing.
If you go back and look at this website, you can get the perfect hash code that Senzee used to create the 7-card hash and repeat the process for 5- and 6-card tables (essentially creating a new index52c7.h for each). You might be able to smash all 3 into one table, but I haven't tried that.
All told that should be ~628 MB (4 bytes * 157 M entries). Or, if you want to split it up, you can map it to 16-bit numbers (since I believe most poker hand evaluators only need 7,462 unique hand scores) and then have a separate map from those 7,462 hand scores to whatever hand categories you want. That would be 314 MB.
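To illustrate that two-level split (a sketch only; the array names and the category table here are assumptions, not Senzee's code): the big table stores one 16-bit score per hand index, and a second, tiny table maps each of the 7,462 scores to whatever you need:
import numpy as np

NUM_HANDS = 156_742_040   # 5-, 6- and 7-card combinations combined
NUM_SCORES = 7_462        # distinct hand strengths most evaluators use

# Level 1: one uint16 score per hand index -> ~314 MB instead of ~628 MB.
# (np.zeros really allocates this; it stands in for the precomputed table.)
scores = np.zeros(NUM_HANDS, dtype=np.uint16)

# Level 2: tiny lookup from score to hand category (hypothetical encoding).
categories = np.zeros(NUM_SCORES, dtype=np.uint8)

def evaluate(hand_index):
    """Two lookups: hand index -> 16-bit score -> category."""
    s = scores[hand_index]
    return s, categories[s]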
Here's a different answer, based on the colex function concept. It works with bitsets that are sorted in descending order. Below is a Python implementation (both recursive, so you can see the logic, and iterative). The main concept is that, given a bitset, you can always calculate how many bitsets there are with the same number of set bits but less than (in either the lexicographical or the mathematical sense) your given bitset. I got the idea from this paper on hand isomorphisms.
from math import factorial


def n_choose_k(n, k):
    return 0 if n < k else factorial(n) // (factorial(k) * factorial(n - k))


def indexset_recursive(bitset, lowest_bit=0):
    """Return number of bitsets with same number of set bits but less than
    given bitset.

    Args:
        bitset (sequence) - Sequence of set bits in descending order.
        lowest_bit (int) - Name of the lowest bit. Default = 0.

    >>> indexset_recursive([51, 50, 49, 48, 47, 46, 45])
    133784559
    >>> indexset_recursive([52, 51, 50, 49, 48, 47, 46], lowest_bit=1)
    133784559
    >>> indexset_recursive([6, 5, 4, 3, 2, 1, 0])
    0
    >>> indexset_recursive([7, 6, 5, 4, 3, 2, 1], lowest_bit=1)
    0
    """
    m = len(bitset)
    first = bitset[0] - lowest_bit
    if m == 1:
        return first
    else:
        t = n_choose_k(first, m)
        return t + indexset_recursive(bitset[1:], lowest_bit)


def indexset(bitset, lowest_bit=0):
    """Return number of bitsets with same number of set bits but less than
    given bitset.

    Args:
        bitset (sequence) - Sequence of set bits in descending order.
        lowest_bit (int) - Name of the lowest bit. Default = 0.

    >>> indexset([51, 50, 49, 48, 47, 46, 45])
    133784559
    >>> indexset([52, 51, 50, 49, 48, 47, 46], lowest_bit=1)
    133784559
    >>> indexset([6, 5, 4, 3, 2, 1, 0])
    0
    >>> indexset([7, 6, 5, 4, 3, 2, 1], lowest_bit=1)
    0
    """
    m = len(bitset)
    g = enumerate(bitset)
    return sum(n_choose_k(bit - lowest_bit, m - i) for i, bit in g)
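A quick self-check (my addition, not part of the original answer) that indexset really is a perfect, consecutive hash: enumerating all k-subsets of an n-bit set should produce exactly the indices 0 .. C(n,k)-1 with no gaps:
from itertools import combinations

# Small "deck": all 3-card subsets of 8 cards.
n, k = 8, 3
indices = sorted(
    indexset(sorted(hand, reverse=True))  # bitset must be in descending order
    for hand in combinations(range(n), k)
)
assert indices == list(range(n_choose_k(n, k)))  # 0 .. C(8,3)-1, no gaps
print("ok:", len(indices), "hands mapped to consecutive indices")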