Catboost hyperparams search - machine-learning

I want to include a parameter's default value among the candidates in a randomized search (here, the per_float_feature_quantization param). How can I do it?
from catboost import CatBoostClassifier

# cat_features, X and y are assumed to be defined earlier.
grid = {'learning_rate': [0.1, 0.16, 0.2],
        'depth': [4, 6, 10],
        'l2_leaf_reg': [1, 3, 5, 7, 9],
        'iterations': [800, 1000, 1500, 2000],
        'bagging_temperature': [1, 2, 3, 4, 5],
        'border_count': [128, 256, 512],
        'grow_policy': ['SymmetricTree', 'Depthwise'],
        'per_float_feature_quantization': [None, '3:border_count=1024']}
model = CatBoostClassifier(loss_function='MultiClass',
                           custom_metric='Accuracy',
                           eval_metric='TotalF1',
                           od_type='Iter',
                           od_wait=40,
                           task_type="GPU",
                           devices='0:1',
                           random_seed=42,
                           cat_features=cat_features)
randomized_search_result = model.randomized_search(grid,
                                                   X=X,
                                                   y=y)
And I've got
CatBoostError: library/cpp/json/writer/json_value.cpp:499: Not a map

There is an error in one or more of the parameters of your grid. Commenting them out one-by-one should help you identify it.
As a side note, Optuna recently released support for CatBoost, in case you want to try that instead of a grid search; see the CatBoost example in Optuna's documentation.
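The one-by-one elimination suggested above can be automated. The following is a minimal sketch, not part of the CatBoost API: find_failing_params and fake_search are hypothetical names. fake_search is a stand-in that rejects None purely for demonstration; it does not establish what CatBoost itself rejects. In practice, run_search would be a function that builds a fresh model and calls randomized_search on the sub-grid it receives.

```python
from typing import Any, Callable, Dict, List


def find_failing_params(
    grid: Dict[str, List[Any]],
    run_search: Callable[[Dict[str, List[Any]]], Any],
) -> Dict[str, str]:
    """Run the search once per grid entry and collect the entries that raise."""
    failures = {}
    for name, values in grid.items():
        try:
            run_search({name: values})
        except Exception as exc:  # deliberately broad: we only want to locate the culprit
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


def fake_search(sub_grid):
    # Hypothetical stand-in for a real call to model.randomized_search(sub_grid, X=X, y=y).
    # It rejects None values only to make the demo fail somewhere.
    if any(v is None for values in sub_grid.values() for v in values):
        raise ValueError("Not a map")


demo_grid = {
    'learning_rate': [0.1, 0.16, 0.2],
    'depth': [4, 6, 10],
    'per_float_feature_quantization': [None, '3:border_count=1024'],
}
failures = find_failing_params(demo_grid, fake_search)
print(failures)
```

Testing each entry alone only finds parameters that fail by themselves; a failure caused by a combination of parameters would need the sub-grids to be grown step by step instead.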

Related

How can we reduce the size of the graph generated by Maximal Clique and remove the nodes of specific cliques?

I am using networkx's find_cliques function to find the maximal cliques in a graph. I want to reduce the size of this graph based on its maximal cliques.
Here is the code:
import torch
import networkx as nx
from torch_geometric.data import Data
from torch_geometric.utils.convert import to_networkx

edge_list = torch.tensor([
    [0, 1, 1, 2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 7, 8],  # Source Nodes
    [1, 2, 3, 4, 5, 3, 9, 5, 6, 7, 8, 9, 8, 9, 9]   # Target Nodes
], dtype=torch.long)
node_features = torch.tensor([
    [-8, 1, 5, 8, 2, -3],  # Features of Node 0
    [-1, 0, 2, -3, 0, 1],  # Features of Node 1
    [1, -1, 0, -1, 2, 1],  # Features of Node 2
    [0, 1, 4, -2, 3, 4],   # Features of Node 3
], dtype=torch.long)
# edge_weight is not defined in this snippet, so edge_attr is left out.
data = Data(x=node_features, edge_index=edge_list)
G_directed = to_networkx(data)
G_undirected = G_directed.to_undirected()
# find_cliques returns a generator, so materialize it before printing.
cliques = list(nx.find_cliques(G_undirected))
print(cliques)
List of Maximal Cliques = {1,[1,2], 2,[2, 3, 4], 3,[4, 5, 6], 4,[6, 7], 5, [7, 8, 9, 10], 6, [10,3]}
In the next step, we coarsen the original graph: each clique becomes a single node, and two cliques are joined by an edge if they are not disjoint. I want to remove any clique whose nodes have all already appeared in other cliques. In the example above, the nodes of clique 6 are already assigned to cliques 2 and 5, so this clique should be removed from the clique list in the new graph.
For better understanding, I am posting the picture.
[Figure: hierarchy of a graph as the coarsened graph at each level]
I want to make this type of graph hierarchy based on maximal clique. Does anyone know about it? How can I do it?
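One possible reading of the rule described above can be sketched in plain Python. This is an assumption about the intended behaviour, not an established algorithm: cliques are processed in list order, a clique is dropped when every one of its nodes already belongs to an earlier kept clique, and two kept cliques are linked when they share a node. The name coarsen_cliques is hypothetical, and the input is the clique list from the question.

```python
from itertools import combinations


def coarsen_cliques(cliques):
    """Drop cliques fully covered by earlier kept cliques, then link overlapping ones."""
    kept = []
    seen = set()
    for clique in cliques:
        members = set(clique)
        if members <= seen:
            continue  # every node already belongs to an earlier kept clique
        kept.append(members)
        seen |= members
    # One coarse edge per pair of kept cliques that are not disjoint.
    edges = [
        (i, j)
        for i, j in combinations(range(len(kept)), 2)
        if kept[i] & kept[j]
    ]
    return kept, edges


cliques = [[1, 2], [2, 3, 4], [4, 5, 6], [6, 7], [7, 8, 9, 10], [10, 3]]
kept, edges = coarsen_cliques(cliques)
print(kept)   # five cliques remain; [10, 3] is dropped
print(edges)  # [(0, 1), (1, 2), (2, 3), (3, 4)]
```

The result depends on the order of the clique list: under this rule [6, 7] survives only because node 7 has not been seen yet when it is processed. With networkx, the input would be list(nx.find_cliques(G_undirected)), whose order should not be relied on, so sorting the cliques first (for example by size) may be worth considering. Repeating the step on the coarse edges would give the next level of the hierarchy.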

How to generate conditions within constraints in Z3py

Let us assume there are 5 time slots and, at each time slot, I have 4 options to choose from, each with a known reward, e.g. rewards = [5, 2, 1, -3]. At every time step, at least 1 of the four options must be selected, with the condition that if option 3 (with reward -3) is chosen at a time t, then none of the options should be selected for the remaining time steps. As an example, with the options indexed from 0, both [2, 1, 1, 0, 3] and [2, 1, 1, 3, 99] are valid solutions; the second solution has option 3 selected at time step 3 (zero-indexed), and 99 is an arbitrary value representing that no option was chosen.
The Z3py code I tried is here:
from z3 import *

T = 6  # Total time slots
s = Solver()
pick = [[Bool('t%d_ch%d' % (j, i)) for i in range(4)] for j in range(T)]
# Rewards of each option
Rewards = [5, 2, 1, -3]
# Select at most one of the 4 options as True
for i in range(T):
    s.add(Or(Not(Or(pick[i][0], pick[i][1], pick[i][2], pick[i][3])),
             And(Xor(pick[i][0], pick[i][1]), Not(Or(pick[i][2], pick[i][3]))),
             And(Xor(pick[i][2], pick[i][3]), Not(Or(pick[i][0], pick[i][1])))))
# If option 3 is picked, then none of the 4 options should be selected for the future time slots
# else, exactly one should be selected.
for i in range(len(pick) - 1):
    for j in range(4):
        s.add(If(And(j == 3, pick[i][j]),
                 Not(Or(pick[i+1][0], pick[i+1][1], pick[i+1][2], pick[i+1][3])),
                 Or(And(Xor(pick[i+1][0], pick[i+1][1]), Not(Or(pick[i+1][2], pick[i+1][3]))),
                    And(Xor(pick[i+1][2], pick[i+1][3]), Not(Or(pick[i+1][0], pick[i+1][1]))))))
# s.check() returns sat, unsat or unknown (never False), so compare against sat.
if s.check() == sat:
    m = s.model()
    print(m)
else:
    print("unsat")
With this implementation, I am not getting solutions such as [2, 1, 1, 3, 99]. All of them either do not have option 3 or have it in the last time slot.
I know there is an error inside the If part but I'm unable to figure it out. Is there a better way to achieve such solutions?
It's hard to decipher what you're trying to do. From a basic reading of your description, I think this might be an instance of the XY problem. See https://xyproblem.info/ for details on that, and try to cast your question in terms of what your original goal is, rather than in terms of the particular solution you're trying to implement. (It seems to me that the solution you came up with is unnecessarily complicated.)
Having said that, you can solve your problem as stated if you get rid of the 99 requirement and simply indicate -3 as the terminator. Once you pick -3, then all the following picks should be -3. This can be coded as follows:
from z3 import *

T = 6
s = Solver()
Rewards = [5, 2, 1, -3]
picks = [Int('pick_%d' % i) for i in range(T)]

def pickReward(p):
    return Or([p == r for r in Rewards])

for i in range(T):
    if i == 0:
        s.add(pickReward(picks[i]))
    else:
        s.add(If(picks[i-1] == -3, picks[i] == -3, pickReward(picks[i])))

while s.check() == sat:
    m = s.model()
    picked = []
    for i in picks:
        picked += [m[i]]
    print(picked)
    s.add(Or([p != v for p, v in zip(picks, picked)]))
When run, this prints:
[5, -3, -3, -3, -3, -3]
[1, 5, 5, 5, 5, 1]
[1, 2, 5, 5, 5, 1]
[2, 2, 5, 5, 5, 1]
[2, 5, 5, 5, 5, 1]
[2, 1, 5, 5, 5, 1]
[1, 1, 5, 5, 5, 1]
[2, 1, 5, 5, 5, 2]
[2, 5, 5, 5, 5, 2]
[2, 5, 5, 5, 5, 5]
[2, 5, 5, 5, 5, -3]
[2, 1, 5, 5, 5, 5]
...
I interrupted the above as it keeps enumerating all the possible picks. There are a total of 1093 of them in this particular case.
(You can get different answers depending on your version of z3.)
Hope this gets you started. Stating what your original goal is directly is usually much more helpful, should you have further questions.
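The count of 1093 quoted above can be cross-checked without z3. The following is a small brute-force sketch in plain Python under the same rule as the answer (once -3 is picked, every later pick is -3); the names is_valid and valid are illustrative only.

```python
from itertools import product

REWARDS = [5, 2, 1, -3]
TERMINATOR = -3
T = 6


def is_valid(seq):
    """Once the terminator appears, every later pick must also be the terminator."""
    return all(b == TERMINATOR for a, b in zip(seq, seq[1:]) if a == TERMINATOR)


# Enumerate all 4**T candidate sequences and keep the ones that obey the rule.
valid = [seq for seq in product(REWARDS, repeat=T) if is_valid(seq)]
print(len(valid))  # 1093
```

The same number follows from a direct count: a valid sequence is k free picks from the three non-terminating rewards followed by -3 in every remaining slot, for k = 0..T, which gives 3^0 + 3^1 + ... + 3^6 = (3^7 - 1) / 2 = 1093.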

Can I use cvxpy to split integer-2D-array to two arrays?

I have a problem that I wonder if I can solve using cvxpy:
The problem:
I have a two-dimensional integer array and I want to split it into two arrays in such a way that each row of the source array is in either the 1st or the 2nd array.
The requirement for these arrays is that, for each column, the sum of the integers in array #1 is as close as possible to twice the sum of the integers in array #2.
Example:
Consider the input array:
[
[1, 2, 3, 4],
[4, 6, 2, 5],
[3, 9, 1, 2],
[8, 1, 0, 9],
[8, 4, 0, 5],
[9, 8, 0, 4]
]
The sums of its columns are [33, 30, 6, 29], so ideally we are looking for 2 arrays whose column sums are:
Array #1: [22, 20, 4, 19]
Array #2: [11, 10, 2, 10]
Of course this is not always possible, but I am looking for the best solution to this problem.
A possible solution for this specific example might be:
Array #1:
[
[1, 2, 3, 4],
[4, 6, 2, 5],
[8, 4, 0, 5],
[9, 8, 0, 4]
]
With column sums: [22, 20, 5, 18]
Array #2:
[
[3, 9, 1, 2],
[8, 1, 0, 9],
]
With column sums: [11, 10, 1, 11]
Any suggestions?
You can use a boolean vector variable to select rows. The only thing left to decide is how much to penalize errors. In this case I just used the norm of the difference vector.
import cvxpy as cp
import numpy as np

data = np.array([
    [1, 2, 3, 4],
    [4, 6, 2, 5],
    [3, 9, 1, 2],
    [8, 1, 0, 9],
    [8, 4, 0, 5],
    [9, 8, 0, 4]
])
# x[i] == 1 puts row i in array #1, x[i] == 0 puts it in array #2.
# A boolean variable requires a mixed-integer-capable solver to be installed.
x = cp.Variable(data.shape[0], boolean=True)
prob = cp.Problem(cp.Minimize(cp.norm((x - 2 * (1 - x)) @ data)))
prob.solve()
A = np.round(x.value) @ data
B = np.round(1 - x.value) @ data
A and B are the column sums of the two arrays:
(array([21., 20., 4., 19.]), array([12., 10., 2., 10.]))
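For an input this small, the result can be cross-checked by exhaustive search. The following is a minimal sketch in plain Python that tries all 2^6 row assignments and scores each one with the same objective as above, the Euclidean norm of (column sums of array #1) minus twice (column sums of array #2). The names split_error and best_mask are illustrative only.

```python
from itertools import product
import math

data = [
    [1, 2, 3, 4],
    [4, 6, 2, 5],
    [3, 9, 1, 2],
    [8, 1, 0, 9],
    [8, 4, 0, 5],
    [9, 8, 0, 4],
]


def split_error(mask):
    """Norm of (column sums of array #1) - 2 * (column sums of array #2)."""
    diff = []
    for col in zip(*data):
        a = sum(v for v, m in zip(col, mask) if m)      # rows with mask 1 -> array #1
        b = sum(v for v, m in zip(col, mask) if not m)  # rows with mask 0 -> array #2
        diff.append(a - 2 * b)
    return math.sqrt(sum(d * d for d in diff))


# mask[i] == 1 puts row i in array #1, mask[i] == 0 puts it in array #2.
best_mask = min(product([0, 1], repeat=len(data)), key=split_error)
print(best_mask, split_error(best_mask))
```

This selects rows 0, 2, 3 and 5 for array #1, which reproduces the column sums printed above, with an error of sqrt(10), about 3.16. The split proposed in the question scores 5.0 under the same objective. Exhaustive search doubles in cost with every added row, so it is only a sanity check for small inputs.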

Keras Tokenizer num_words doesn't seem to work

>>> t = Tokenizer(num_words=3)
>>> l = ["Hello, World! This is so&#$ fantastic!", "There is no other world like this one"]
>>> t.fit_on_texts(l)
>>> t.word_index
{'fantastic': 6, 'like': 10, 'no': 8, 'this': 2, 'is': 3, 'there': 7, 'one': 11, 'other': 9, 'so': 5, 'world': 1, 'hello': 4}
I'd have expected t.word_index to have just the top 3 words. What am I doing wrong?
There is nothing wrong with what you are doing. word_index is computed the same way no matter how many of the most frequent words you will use later (as you may see here). So when you call any transformative method, Tokenizer will use only the most common words (those whose index is below num_words), and at the same time it will keep the counter of all words, even when it's obvious that it will not use them later.
Just an add-on to Marcin's answer ("it will keep the counter of all words - even when it's obvious that it will not use it later.").
The reason it keeps counter on all words is that you can call fit_on_texts multiple times. Each time it will update the internal counters, and when transformations are called, it will use the top words based on the updated counters.
Hope it helps.
Limiting num_words to a small number (e.g., 3) has no effect on fit_on_texts outputs such as word_index, word_counts and word_docs. It does have an effect on texts_to_matrix: the resulting matrix will have num_words (3) columns.
>>> t = Tokenizer(num_words=3)
>>> l = ["Hello, World! This is so&#$ fantastic!", "There is no other world like this one"]
>>> t.fit_on_texts(l)
>>> print(t.word_index)
{'world': 1, 'this': 2, 'is': 3, 'hello': 4, 'so': 5, 'fantastic': 6, 'there': 7, 'no': 8, 'other': 9, 'like': 10, 'one': 11}
>>> t.texts_to_matrix(l, mode='count')
array([[0., 1., 1.],
[0., 1., 1.]])
Just to add a little bit to farid khafizov's answer:
words whose index is num_words or above are removed from the results of texts_to_sequences (4 in the 1st, 5 in the 2nd, and 6 and 4 in the 3rd sentence disappeared, respectively).
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
print(tf.__version__) # 2.4.1, in my case
sentences = [
    'I love my dog',
    'I, love my cat',
    'You love my dog!'
]
tokenizer = Tokenizer(num_words=4)
tokenizer.fit_on_texts(sentences)
word_index = tokenizer.word_index
seq = tokenizer.texts_to_sequences(sentences)
print(word_index) # {'love': 1, 'my': 2, 'i': 3, 'dog': 4, 'cat': 5, 'you': 6}
print(seq) # [[3, 1, 2], [3, 1, 2], [1, 2]]
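The behaviour described in these answers can be mimicked in a few lines of plain Python. The following is a simplified stand-in, not the Keras implementation: TinyTokenizer is a hypothetical class and its tokenization (lowercase, split on non-alphanumeric characters) is an assumption that only approximates the real filters. It shows that counts and word_index cover every word, while num_words is applied only when texts are converted.

```python
import re
from collections import Counter


class TinyTokenizer:
    """Simplified stand-in for the behaviour described above; not the Keras code."""

    def __init__(self, num_words=None):
        self.num_words = num_words
        self.word_counts = Counter()
        self.word_index = {}

    @staticmethod
    def _split(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def fit_on_texts(self, texts):
        # Counters accumulate across calls, so the ranking can change after each fit.
        for text in texts:
            self.word_counts.update(self._split(text))
        ranked = sorted(self.word_counts.items(), key=lambda kv: kv[1], reverse=True)
        # Every word gets an index, regardless of num_words.
        self.word_index = {word: rank for rank, (word, _) in enumerate(ranked, start=1)}

    def texts_to_sequences(self, texts):
        sequences = []
        for text in texts:
            ids = [self.word_index[w] for w in self._split(text) if w in self.word_index]
            # num_words is applied only here: indices >= num_words are dropped.
            sequences.append([i for i in ids if self.num_words is None or i < self.num_words])
        return sequences


sentences = ['I love my dog', 'I, love my cat', 'You love my dog!']
tok = TinyTokenizer(num_words=4)
tok.fit_on_texts(sentences)
print(tok.word_index)                     # all six words are indexed
print(tok.texts_to_sequences(sentences))  # [[3, 1, 2], [3, 1, 2], [1, 2]]
```

Keeping the full counter is what makes a second fit_on_texts call meaningful: the ranking is rebuilt from the updated counts before the num_words cut-off is applied.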

Dividing elements of a ruby array into an exact number of (nearly) equal-sized sub-arrays [duplicate]

I need a way to split an array into an exact number of smaller arrays of roughly equal size. Does anyone have a method for doing this?
For instance
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
groups = a.method_i_need(3)
groups.inspect
=> [[1,2,3,4,5], [6,7,8,9], [10,11,12,13]]
Note that this is an entirely separate problem from dividing an array into chunks, because a.each_slice(3).to_a would produce 5 groups (not 3, like we desire) and the final group may be a completely different size than the others:
[[1,2,3], [4,5,6], [7,8,9], [10,11,12], [13]] # this is NOT desired here.
In this problem, the desired number of chunks is specified in advance, and the sizes of each chunk will differ by 1 at most.
You're looking for Enumerable#each_slice
a = [0, 1, 2, 3, 4, 5, 6, 7]
a.each_slice(3) # => #<Enumerator: [0, 1, 2, 3, 4, 5, 6, 7]:each_slice(3)>
a.each_slice(3).to_a # => [[0, 1, 2], [3, 4, 5], [6, 7]]
Perhaps I'm misreading the question since the other answer is already accepted, but it sounded like you wanted to split the array into 3 equal groups, regardless of the size of each group, rather than split it into N groups of 3 as the previous answers do. If that's what you're looking for, Rails (ActiveSupport) also has a method called in_groups:
a = [0,1,2,3,4,5,6]
a.in_groups(2) # => [[0,1,2,3],[4,5,6,nil]]
a.in_groups(3, false) # => [[0,1,2],[3,4], [5,6]]
I don't think there is a plain Ruby equivalent. However, you can get roughly the same results by adding this simple method:
class Array
  def in_groups(num_groups)
    return [] if num_groups == 0
    slice_size = (self.size / Float(num_groups)).ceil
    self.each_slice(slice_size).to_a
  end
end

a.in_groups(3) # => [[0,1,2], [3,4,5], [6]]
The only difference (as you can see) is that this won't spread the "empty space" across all the groups; every group but the last is equal in size, and the last group always holds the remainder plus all the "empty space".
Update:
As @rimsky astutely pointed out, the above method will not always result in the correct number of groups (sometimes it will create multiple "empty groups" at the end, and leave them out). Here's an updated version, pared down from ActiveSupport's definition, which spreads the extras out to fill the requested number of groups.
class Array
  def in_groups(number)
    group_size = size / number
    leftovers = size % number
    groups = []
    start = 0
    number.times do |index|
      # The first `leftovers` groups each take one extra element.
      length = group_size + (leftovers > 0 && leftovers > index ? 1 : 0)
      groups << slice(start, length)
      start += length
    end
    groups
  end
end
Try
a.in_groups_of(3,false)
It will do your job
As mltsy wrote, in_groups(n, false) should do the job.
I just wanted to add a small trick to get the right balance:
my_array.in_groups(my_array.size.quo(max_size).ceil, false)
Here is an example to illustrate that trick:
a = (0..8).to_a
a.in_groups(4, false) # => [[0, 1, 2], [3, 4], [5, 6], [7, 8]]
a.in_groups(a.size.quo(4).ceil, false) # => [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
This needs some better cleverness to smear out the extra pieces, but it's a reasonable start.
def i_need(bits, r)
  c = r.count
  (1..bits - 1).map { |i| r.shift((c + i) * 1.0 / bits ) } + [r]
end
> i_need(2, [1, 3, 5, 7, 2, 4, 6, 8])
=> [[1, 3, 5, 7], [2, 4, 6, 8]]
> i_need(3, [1, 3, 5, 7, 2, 4, 6, 8])
=> [[1, 3, 5], [7, 2, 4], [6, 8]]
> i_need(5, [1, 3, 5, 7, 2, 4, 6, 8])
=> [[1, 3], [5, 7], [2, 4], [6], [8]]
