I have been working with ML via scikit-learn for a few months.
However, an update changed the scikit-learn preprocessing class OneHotEncoder.
It used to have a parameter categorical_features, which has been replaced by categories, and now I do not understand how to write its value.
The code I am writing is:
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

le = LabelEncoder()
X[:, 0] = le.fit_transform(X[:, 0])
ohe = OneHotEncoder(categories=X[:, 0].all())
X = ohe.fit_transform(X).toarray()
and it shows this error:
runcell(0, 'C:/Mobile Videos/OPencv/opencv-master/samples/data/untitled2.py')
Traceback (most recent call last):
  File "C:\Mobile Videos\OPencv\opencv-master\samples\data\untitled2.py", line 25, in <module>
    X = ohe.fit_transform(X).toarray()
  File "C:\Users\Harshit\Anaconda3\lib\site-packages\sklearn\preprocessing\_encoders.py", line 372, in fit_transform
    return super().fit_transform(X, y)
  File "C:\Users\Harshit\Anaconda3\lib\site-packages\sklearn\base.py", line 571, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "C:\Users\Harshit\Anaconda3\lib\site-packages\sklearn\preprocessing\_encoders.py", line 347, in fit
    self._fit(X, handle_unknown=self.handle_unknown)
  File "C:\Users\Harshit\Anaconda3\lib\site-packages\sklearn\preprocessing\_encoders.py", line 77, in _fit
    if len(self.categories) != n_features:
TypeError: object of type 'int' has no len()
And if I set the parameter to 'auto', it encodes the whole data set the same way it encoded the label column, instead of just the first column.
Could you please help me out with this problem?
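For context, X[:, 0].all() reduces the column to a single scalar, so categories receives a plain int rather than a list of per-column category arrays; that is what the len() call is complaining about. Below is a minimal sketch of the replacement API, assuming the goal is to one-hot encode only the first column and pass the rest through (since scikit-learn 0.20, ColumnTransformer has taken over the role of categorical_features):

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Encode only column 0; categories='auto' infers the categories from the data.
ct = ColumnTransformer(
    [("onehot", OneHotEncoder(categories="auto"), [0])],
    remainder="passthrough",  # leave the remaining columns untouched
)
X = ct.fit_transform(X)

With this approach the LabelEncoder step becomes unnecessary, since OneHotEncoder handles string columns directly.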
My X data is a pandas DataFrame of time series. I extracted features from X using tsfresh and tried to apply the LightGBM algorithm to classify the data into 0 (Bad) and 1 (Good), but it throws an error. The columns of my X data are:
Index(['0__ratio_beyond_r_sigma__r_1',
       '0__change_quantiles__f_agg_"mean"__isabs_True__qh_0.8__ql_0.0',
       '0__cwt_coefficients__coeff_1__w_20__widths_(2, 5, 10, 20)',
       '0__cwt_coefficients__coeff_1__w_10__widths_(2, 5, 10, 20)',
       '0__change_quantiles__f_agg_"var"__isabs_False__qh_0.8__ql_0.0',
       '0__change_quantiles__f_agg_"mean"__isabs_True__qh_0.4__ql_0.0',
       '0__change_quantiles__f_agg_"mean"__isabs_True__qh_0.8__ql_0.6',
       '0__change_quantiles__f_agg_"mean"__isabs_False__qh_0.4__ql_0.0',
       '0__fft_coefficient__attr_"real"__coeff_3',
       '0__change_quantiles__f_agg_"mean"__isabs_True__qh_1.0__ql_0.0',
       ...
       '0__quantile__q_0.4', '0__fft_coefficient__attr_"imag"__coeff_39',
       '0__large_standard_deviation__r_0.2',
       '0__cwt_coefficients__coeff_13__w_10__widths_(2, 5, 10, 20)',
       '0__fourier_entropy__bins_10',
       '0__fft_coefficient__attr_"angle"__coeff_9',
       '0__fft_coefficient__attr_"imag"__coeff_17',
       '0__fft_coefficient__attr_"angle"__coeff_92', '0__maximum',
       '0__fft_coefficient__attr_"imag"__coeff_32'],
      dtype='object', length=225)
My code is:
import lightgbm as lgb
import seaborn as sns
from sklearn.metrics import confusion_matrix

d_train = lgb.Dataset(X_train, label=y_train)
lgbm_params = {'learning_rate': 0.05, 'boosting_type': 'dart',
               'objective': 'binary',
               'metric': ['auc', 'binary_logloss'],
               'num_leaves': 100,
               'max_depth': 10}
clf = lgb.train(lgbm_params, d_train, 50)
y_pred_lgbm = clf.predict(X_test)
# Threshold the predicted probabilities at 0.5
for i in range(0, X_test.shape[0]):
    if y_pred_lgbm[i] >= .5:
        y_pred_lgbm[i] = 1
    else:
        y_pred_lgbm[i] = 0
cm_lgbm = confusion_matrix(y_test, y_pred_lgbm)
sns.heatmap(cm_lgbm, annot=True)
I tried the code below to rename my columns, but it does not work:
import re
X = X.rename(columns=lambda u: re.sub('[^A-Za-z0-9_]+', '', u))
After applying that rename function, the columns look like this:
Index(['0__ratio_beyond_r_sigma__r_1',
       '0__change_quantiles__f_agg_mean__isabs_True__qh_08__ql_00',
       '0__cwt_coefficients__coeff_1__w_20__widths_251020',
       '0__cwt_coefficients__coeff_1__w_10__widths_251020',
       '0__change_quantiles__f_agg_var__isabs_False__qh_08__ql_00',
       '0__change_quantiles__f_agg_mean__isabs_True__qh_04__ql_00',
       '0__change_quantiles__f_agg_mean__isabs_True__qh_08__ql_06',
       '0__change_quantiles__f_agg_mean__isabs_False__qh_04__ql_00',
       '0__fft_coefficient__attr_real__coeff_3',
       '0__change_quantiles__f_agg_mean__isabs_True__qh_10__ql_00',
       ...
       '0__quantile__q_04', '0__fft_coefficient__attr_imag__coeff_39',
       '0__large_standard_deviation__r_02',
       '0__cwt_coefficients__coeff_13__w_10__widths_251020',
       '0__fourier_entropy__bins_10',
       '0__fft_coefficient__attr_angle__coeff_9',
       '0__fft_coefficient__attr_imag__coeff_17',
       '0__fft_coefficient__attr_angle__coeff_92', '0__maximum',
       '0__fft_coefficient__attr_imag__coeff_32'],
      dtype='object', length=225)
What should I do to get rid of this error?
You can't keep special characters (quotes, commas, brackets, and so on) in the column names, or LightGBM will report this kind of error; feature names have to be reduced to plain letters, digits, and underscores before you build the Dataset.
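One way to make the names safe is to sanitize every DataFrame before building the Dataset. A minimal sketch (assuming X_train and X_test were split from X; the counter guards against two cleaned names colliding):

import re

def sanitize(columns):
    # Keep only letters, digits, and underscores, then de-duplicate
    # any names that collapse to the same string.
    seen = {}
    clean = []
    for name in columns:
        base = re.sub(r'[^A-Za-z0-9_]+', '_', str(name))
        count = seen.get(base, 0)
        clean.append(base if count == 0 else '%s_%d' % (base, count))
        seen[base] = count + 1
    return clean

X_train.columns = sanitize(X_train.columns)
X_test.columns = sanitize(X_test.columns)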
I am trying to implement a custom aggregation using TFF by changing the code from this tutorial. I would like to rewrite next_fn so that all the client weights are placed at the server for further computations. Since federated_collect was removed from tff-nightly, I am trying to do that using federated_aggregate.
This is what I have so far:
def accumulate(x, y):
    x.append(y)
    return x

def merge(x, y):
    x.extend(y)
    return y
@tff.federated_computation(federated_server_type, federated_dataset_type)
def next_fn(server_state, federated_dataset):
    server_weights_at_client = tff.federated_broadcast(
        server_state.trainable_weights)
    client_deltas = tff.federated_map(
        client_update_fn, (federated_dataset, server_weights_at_client))

    z = []
    agg_result = tff.federated_aggregate(client_deltas, z,
                                         accumulate=tff.tf_computation(accumulate),
                                         merge=tff.tf_computation(merge),
                                         report=tff.tf_computation(lambda x: x))

    new_weights = do_smth_with_result(agg_result)
    server_state = tff.federated_map(
        server_update_fn, (server_state, new_weights))
    return server_state
However, this results in the following exception:
File "/home/yana/Documents/Uni/Thesis/grufedatt_try.py", line 351, in <module>
def next_fn(server_state, federated_dataset):
File "/home/yana/anaconda3/envs/fedenv/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/wrappers/computation_wrapper.py", line 494, in __call__
wrapped_func = self._strategy(
File "/home/yana/anaconda3/envs/fedenv/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/wrappers/computation_wrapper.py", line 222, in __call__
result = fn_to_wrap(*args, **kwargs)
File "/home/yana/Documents/Uni/Thesis/grufedatt_try.py", line 358, in next_fn
agg_result = tff.federated_aggregate(client_deltas, z,
File "/home/yana/anaconda3/envs/fedenv/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/federated_context/intrinsics.py", line 140, in federated_aggregate
raise TypeError(
TypeError: Expected parameter `accumulate` to be of type (<<<float32[9999,96],float32[96,1024],float32[256,1024],float32[1024],float32[256,96],float32[96]>>,<float32[9999,96],float32[96,1024],float32[256,1024],float32[1024],float32[256,96],float32[96]>> -> <<float32[9999,96],float32[96,1024],float32[256,1024],float32[1024],float32[256,96],float32[96]>>), but received (<<>,<float32[9999,96],float32[96,1024],float32[256,1024],float32[1024],float32[256,96],float32[96]>> -> <<float32[9999,96],float32[96,1024],float32[256,1024],float32[1024],float32[256,96],float32[96]>>) instead.
Try using tff.aggregators.federated_sample with max_num_samples equal to the number of clients you have.
That should be a simple drop-in replacement for how you would previously use tff.federated_collect.
In your accumulate, the issue is that you are changing the number of tensors the accumulator contains, so you get an error when accumulating more than a single accumuland. If you do want to go this way, though, then for a rank-1 accumuland with k elements you could probably do something like the following instead:
@tff.tf_computation(tff.types.TensorType(tf.float32, [None, k]),
                    tff.types.TensorType(tf.float32, [k]))
def accumulate(accumulator, accumuland):
    return tf.concat([accumulator, tf.expand_dims(accumuland, axis=0)], axis=0)
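For this to type-check, the zero passed as the second argument to tff.federated_aggregate also has to match the [None, k] accumulator type, e.g. an empty array. A sketch only; the value of k and the implicit conversion of the numpy constant are assumptions, not verified against a particular tff-nightly build:

import numpy as np

k = 96  # assumed accumuland length, for illustration only

# Initial accumulator: zero rows of k columns, matching the
# [None, k] accumulator type declared on accumulate above.
z = np.zeros((0, k), dtype=np.float32)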
1. The issue at large
I am producing an iterative process via tff.learning.build_federated_averaging_process() and receive the error:
Traceback (most recent call last):
File "B:\tools and software\Anaconda\envs\bookProjects\lib\site-packages\IPython\core\interactiveshell.py", line 3331, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-2-47998fd56829>", line 1, in <module>
runfile('B:/projects/openProjects/githubprojects/BotnetTrafficAnalysisFederaedLearning/anomaly-detection/train_v04.py', args=['--experiment_name=temp', '--client_batch_size=20', '--client_optimizer=sgd', '--client_learning_rate=0.2', '--server_optimizer=sgd', '--server_learning_rate=1.0', '--total_rounds=200', '--rounds_per_eval=1', '--rounds_per_checkpoint=50', '--rounds_per_profile=0', '--root_output_dir=B:/projects/openProjects/githubprojects/BotnetTrafficAnalysisFederaedLearning/anomaly-detection/logs/fed_out/'], wdir='B:/projects/openProjects/githubprojects/BotnetTrafficAnalysisFederaedLearning/anomaly-detection')
File "B:\tools and software\PyCharm 2020.1\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py", line 197, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
File "B:\tools and software\PyCharm 2020.1\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "B:/projects/openProjects/githubprojects/BotnetTrafficAnalysisFederaedLearning/anomaly-detection/train_v04.py", line 306, in <module>
app.run(main)
File "B:\tools and software\Anaconda\envs\bookProjects\lib\site-packages\absl\app.py", line 299, in run
_run_main(main, args)
File "B:\tools and software\Anaconda\envs\bookProjects\lib\site-packages\absl\app.py", line 250, in _run_main
sys.exit(main(argv))
File "B:/projects/openProjects/githubprojects/BotnetTrafficAnalysisFederaedLearning/anomaly-detection/train_v04.py", line 299, in main
train_main()
File "B:/projects/openProjects/githubprojects/BotnetTrafficAnalysisFederaedLearning/anomaly-detection/train_v04.py", line 262, in train_main
server_optimizer_fn=server_optimizer_fn,
File "B:\tools and software\Anaconda\envs\bookProjects\lib\site-packages\tensorflow_federated\python\learning\federated_averaging.py", line 211, in build_federated_averaging_process
stateful_delta_aggregate_fn, stateful_model_broadcast_fn)
File "B:\tools and software\Anaconda\envs\bookProjects\lib\site-packages\tensorflow_federated\python\learning\framework\optimizer_utils.py", line 498, in build_model_delta_optimizer_process
py_typecheck.check_callable(model_fn)
File "B:\tools and software\Anaconda\envs\bookProjects\lib\site-packages\tensorflow_federated\python\common_libs\py_typecheck.py", line 106, in check_callable
type_string(type(target))))
TypeError: Expected a callable, found non-callable tensorflow_federated.python.learning.model_utils.EnhancedModel.
highlighting:
in build_federated_averaging_process
stateful_delta_aggregate_fn, stateful_model_broadcast_fn)
and
TypeError: Expected a callable, found non-callable tensorflow_federated.python.learning.model_utils.EnhancedModel.
2. What I have tried
I looked at another similar issue here and tried to make model_fn a collections.abc Callable, model_fn=Callable[[], model_fn], but it only creates a new error.
3. Some code
The iterative process:
model_fn = model_builder(input_dim=sysarg,
                         input_spec=input_spec)

iterative_process = tff.learning.build_federated_averaging_process(
    model_fn=model_fn,
    client_optimizer_fn=client_optimizer_fn,
    server_optimizer_fn=server_optimizer_fn,
)
iterative_process = compression_process_adapter.CompressionProcessAdapter(iterative_process)
The model builder:
def model_builder(input_dim, input_spec):
    model = create_model(input_dim)
    return tff.learning.from_keras_model(keras_model=model,
                                         loss=tf.keras.losses.MeanSquaredError(),
                                         input_spec=input_spec,
                                         metrics=[tf.keras.metrics.Accuracy()],
                                         )
And create_model (for good measure):
def create_model(input_dim):
    autoencoder = Sequential([
        tf.keras.layers.Dense(int(0.75 * input_dim), activation="tanh", input_shape=(input_dim,)),
        tf.keras.layers.Dense(int(0.5 * input_dim), activation="tanh"),
        tf.keras.layers.Dense(int(0.33 * input_dim), activation="tanh"),
        tf.keras.layers.Dense(int(0.25 * input_dim), activation="tanh"),
        tf.keras.layers.Dense(int(0.33 * input_dim), activation="tanh"),
        tf.keras.layers.Dense(int(0.5 * input_dim), activation="tanh"),
        tf.keras.layers.Dense(int(0.75 * input_dim), activation="tanh"),
        tf.keras.layers.Dense(input_dim)
    ])
    return autoencoder  # assumed: the original paste appears cut off before the return
The model_fn argument of tff.learning.build_federated_averaging_process needs to be a callable (What is a callable?) that takes no arguments and returns a tff.learning.Model.
From the code (reproduced here for readability):
def model_builder(input_dim, input_spec):
    model = create_model(input_dim)
    return tff.learning.from_keras_model(
        keras_model=model,
        loss=tf.keras.losses.MeanSquaredError(),
        input_spec=input_spec,
        metrics=[tf.keras.metrics.Accuracy()])

model_fn = model_builder(input_dim=sysarg, input_spec=input_spec)

iterative_process = tff.learning.build_federated_averaging_process(
    model_fn=model_fn,
    client_optimizer_fn=client_optimizer_fn,
    server_optimizer_fn=server_optimizer_fn)
model_fn is actually an instance of tff.learning.Model, rather than a callable that constructs and returns a model.
model_builder itself is what should be passed to tff.learning.build_federated_averaging_process, but since it takes arguments this won't work as-is.
One alternative would be to use functools.partial:
import functools

iterative_process = tff.learning.build_federated_averaging_process(
    model_fn=functools.partial(model_builder, input_dim=sysarg, input_spec=input_spec),
    client_optimizer_fn=client_optimizer_fn,
    server_optimizer_fn=server_optimizer_fn)
Or even use a no-arg lambda:
iterative_process = tff.learning.build_federated_averaging_process(
    model_fn=lambda: model_builder(input_dim=sysarg, input_spec=input_spec),
    client_optimizer_fn=client_optimizer_fn,
    server_optimizer_fn=server_optimizer_fn)
The following question has a good discussion on the differences between lambdas and partials: Python: Why is functools.partial necessary?
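As a toy illustration of that difference (build is a hypothetical function, not part of the code above):

import functools

def build(x):
    return x * 2

x = 1
late = lambda: build(x)               # x is looked up when the lambda runs
early = functools.partial(build, x)   # x is captured right now

x = 10
print(late())   # 20: the lambda sees the rebound x
print(early())  # 2: the partial froze the original value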
I am following this link to train an RNN classifier on a small dataset, to check whether the code works.
Running the command rnn.predict(data_test, 'answer.csv') throws the exception:
AttributeError: 'tuple' object has no attribute 'ndim'
Here is the predict function:
def predict(self, data_test, answer_filename):
    word_matrix, char_matrix, additional_features_matrix = data_test
    print("Test example: ")
    print(word_matrix[0])
    print(char_matrix[0])
    print(additional_features_matrix[0])
    preds = self.model.predict([word_matrix, char_matrix, additional_features_matrix],
                               batch_size=self.batch_size, verbose=1)
    index_to_author = {0: "EAP", 1: "HPL", 2: "MWS"}
    submission = pd.DataFrame({"id": test["id"], index_to_author[0]: preds[:, 0],
                               index_to_author[1]: preds[:, 1], index_to_author[2]: preds[:, 2]})
    submission.to_csv(answer_filename, index=False)
The word_matrix, char_matrix, and additional_features_matrix are of variable length; in my case their dimensions are (80,), (80, 30), and (1153, 15) respectively. I googled the error and found that I should add padding to the input numpy arrays.
But the code in the link works fine, and I am not able to understand what I am doing wrong. Can somebody help me with this?
I found out my own mistake. If you follow this link, you will find the following line of code:
_, additional_features_matrix_test = collect_additional_features(x.iloc[idx_train], x_test)
The function collect_additional_features returns a tuple of two ndarrays. My mistake was that I missed the _, so the line of code became:
additional_features_matrix_test = collect_additional_features(x.iloc[idx_train], x_test)
Thus additional_features_matrix_test became a tuple instead of an ndarray, and passing it to the LSTM threw the error AttributeError: 'tuple' object has no attribute 'ndim'.
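A minimal reproduction of the mistake (collect_additional_features_stub is a hypothetical stand-in that, like the real function, returns two ndarrays):

import numpy as np

def collect_additional_features_stub(a, b):
    # Stand-in: the real function returns two ndarrays.
    return np.zeros((10, 15)), np.zeros((4, 15))

# Correct: unpack both return values; the result has .ndim.
_, features = collect_additional_features_stub(None, None)
print(features.ndim)  # 2

# Mistake: keep the whole tuple; tuples have no .ndim attribute.
features = collect_additional_features_stub(None, None)
print(hasattr(features, 'ndim'))  # False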
I'm trying to make a neural network. I have followed the video from
https://www.youtube.com/watch?v=S75EdAcXHKk
and loaded the adult.data training set.
I am now training, and the code fails on these lines:
while epocs < 5:
    i = 0
    for start, end in zip(range(0, len(trX), 128), range(128, len(trX), 128)):
        print(trX.shape)
        tr = trX[start:end]
        print(tr.shape[0])
        print(tr.shape[1])
        self.cost = train(tr.reshape(tr.shape[0], tr.shape[1]), trY[start:end])
    epocs += 1
I am struggling with this error message:
n.training()
  File "C:\Users\Bjornars\PycharmProjects\cogs-118a\Project\NN\Network.py", line 101, in training
    self.cost = train(tr.reshape(128,106), trY[start:end])
  File "C:\Anaconda3\lib\site-packages\theano\compile\function_module.py", line 513, in __call__
    allow_downcast=s.allow_downcast)
  File "C:\Anaconda3\lib\site-packages\theano\tensor\type.py", line 169, in filter
    data.shape))
TypeError: ('Bad input argument to theano function with name "C:\Users\Bjornars\PycharmProjects\cogs-118a\Project\NN\Network.py:84" at index 1(0-based)', 'Wrong number of dimensions: expected 2, got 1 with shape (128,).')
The shape of the array I'm sending in is (5000, 106).
--- Solved ---
I used the function below; train expected an array, not a number, in trY:
def preprocess(self, trDmatrix, labels):
    # Convert each scalar label into a one-hot vector of length 2.
    for i in range(len(trDmatrix)):
        numbers = [0.0] * 2
        numbers[int(labels[i])] = 1.0
        labels[i] = numbers
    return trDmatrix, labels
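For comparison, a vectorized sketch of the same conversion (assuming integer class labels; an illustration, not the code used above):

import numpy as np

def one_hot(labels, num_classes=2):
    # Map each integer label to a one-hot row vector.
    out = np.zeros((len(labels), num_classes))
    out[np.arange(len(labels)), np.asarray(labels, dtype=int)] = 1.0
    return out

print(one_hot([0, 1, 1, 0]))  # [[1. 0.] [0. 1.] [0. 1.] [1. 0.]]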