Converting Ragged Tensor to List - machine-learning

I am trying to convert a ragged tensor to a list. The ragged tensor is the result of tokenizing text with a subword tokenizer.
I am converting it with the following code:
token = tokenizer.pt.tokenize(text) # returns ragged tensor
token_as_list = token.to_list()
But this raises an error:
ValueError: to_list can only be used in eager mode.
I checked whether I was running in eager mode using tf.executing_eagerly(), which returns True.
Are there any other ways to convert a ragged tensor to a list?
I want to do this so that I can pad the last dimension to a larger size, e.g.:
tensor = (64,120) # (batch_size,seq_len)
# convert this to (64,128) by padding it
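One possibility, not confirmed by the question: the error can appear when the tokenizer call runs inside traced code (for example a function passed to tf.data.Dataset.map or wrapped in tf.function), even though tf.executing_eagerly() returns True at the top level. If the goal is only to pad the last dimension, RaggedTensor.to_tensor can do that without going through a Python list. A minimal sketch, using a hand-built ragged tensor as a stand-in for the tokenizer output and a target length of 8 instead of 128:

```python
import tensorflow as tf

# Stand-in for the tokenizer output (assumption: a 2-D ragged tensor of token ids).
tokens = tf.ragged.constant([[5, 7, 9], [3, 4], [8]], dtype=tf.int64)

# Pad (or truncate) every row to a fixed length of 8; the batch dimension is left as-is.
padded = tokens.to_tensor(default_value=0, shape=[None, 8])

print(padded.shape)
```

Because to_tensor is a regular TensorFlow op, it should also work where to_list is not available.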

Related

How to normalize data which contains positive and negative numbers into the range 0 to 1 manually (without the sklearn.preprocessing.MinMaxScaler package)?

I want to normalize data without using a package. I use min-max scaling based on the formula, but when I normalize the data I get the error below.
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Normalization code:
minInsole = min(SmartInsole)
maxInsole = max(SmartInsole)
norm_data = (SmartInsole - minInsole) / ( maxInsole - minInsole )
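A likely cause, assuming SmartInsole is a two-dimensional NumPy array: the built-in min and max iterate over rows and compare whole rows with each other, which produces the ambiguous truth-value error. A minimal sketch of the same formula using NumPy reductions instead (the sample values are made up for illustration):

```python
import numpy as np

# Hypothetical data with positive and negative values; rows are samples, columns are features.
SmartInsole = np.array([[-2.0, 10.0],
                        [0.0, 20.0],
                        [2.0, 40.0]])

# Column-wise minimum and maximum (use axis=None for a single global min/max).
minInsole = SmartInsole.min(axis=0)
maxInsole = SmartInsole.max(axis=0)

# Min-max scaling: each column is mapped onto [0, 1].
norm_data = (SmartInsole - minInsole) / (maxInsole - minInsole)
print(norm_data)
```

A column whose values are all equal would make the denominator zero, so such columns need separate handling.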

Sklearn: Found input variables with inconsistent numbers of samples:

I have built a model.
est1_pre = ColumnTransformer([('catONEHOT', OneHotEncoder(dtype='int', handle_unknown='ignore'), ['Var1'])], remainder='drop')
est2_pre = ColumnTransformer([('BOW', TfidfVectorizer(ngram_range=(1, 3), max_features=1000), ['Var2'])], remainder='drop')
m1 = Pipeline([('FeaturePreprocessing', est1_pre),
               ('clf', alternative)])
m2 = Pipeline([('FeaturePreprocessing', est2_pre),
               ('clf', alternative)])
model_combo = StackingClassifier(
    estimators=[('cate', m1), ('text', m2)],
    final_estimator=RandomForestClassifier(n_estimators=10,
                                           random_state=42)
)
I can successfully fit and predict using m1 and m2.
However, with the combined model, model_combo, any call to .fit/.predict results in ValueError: Found input variables with inconsistent numbers of samples:
model_fitted=model_combo.fit(x_train,y_train)
x_train contains Var1 and Var2.
How can I fit model_combo?
The problem is that sklearn's text preprocessors (TfidfVectorizer in this case) operate on one-dimensional data, not two-dimensional data like most other preprocessors. When the column is given as a list, the ColumnTransformer passes a two-dimensional selection, and the vectorizer iterates over it as an iterable of its columns, so it sees only one "document". This can be fixed by specifying the column in the ColumnTransformer as a plain string rather than a list:
est2_pre = ColumnTransformer([('BOW', TfidfVectorizer(ngram_range=(1, 3),max_features=1000),'Var2')],remainder='drop')
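The difference can be checked on a toy frame. This is a minimal sketch; the DataFrame contents and the names df, rows_from_2d and rows_from_1d are made up for illustration, and the vectorizer uses default settings rather than the ones from the question:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical three-row frame standing in for x_train.
df = pd.DataFrame({"Var2": ["red apple", "green apple", "blue sky"]})

# Iterating a DataFrame yields its column labels, so a vectorizer that is
# handed the 2-D selection df[["Var2"]] sees a single "document".
rows_from_2d = TfidfVectorizer().fit_transform(df[["Var2"]]).shape[0]

# With a plain string, ColumnTransformer passes the 1-D column instead,
# so the vectorizer sees one document per row.
ct = ColumnTransformer([("BOW", TfidfVectorizer(), "Var2")], remainder="drop")
rows_from_1d = ct.fit_transform(df).shape[0]

print(rows_from_2d, rows_from_1d)
```

A row count that differs from the number of samples in y_train is what would trigger the inconsistent-samples error during fitting.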

How can one add a batching mechanism to the input function in the TensorFlow tutorial, given tf.SparseTensor objects?

How can one add a batching mechanism to the input_fn in the TensorFlow Wide & Deep Learning Tutorial, given that some features are represented as tf.SparseTensor objects?
I have made many attempts at adding tf.train.slice_input_producer and tf.train.batch to the original code (see below), but have failed so far.
I would like to keep the overall behavior of that input_fn, as it is handy while training and evaluating the model.
Can someone help, please?
def input_fn(df):
    # Creates a dictionary mapping from each continuous feature column name (k) to
    # the values of that column stored in a constant Tensor.
    continuous_cols = {k: tf.constant(df[k].values)
                       for k in CONTINUOUS_COLUMNS}
    # Creates a dictionary mapping from each categorical feature column name (k)
    # to the values of that column stored in a tf.SparseTensor.
    categorical_cols = {k: tf.SparseTensor(indices=[[i, 0] for i in range(df[k].size)],
                                           values=df[k].values,
                                           shape=[df[k].size, 1]) for k in CATEGORICAL_COLUMNS}
    # Merges the two dictionaries into one.
    feature_cols = dict(continuous_cols.items() + categorical_cols.items())
    # Converts the label column into a constant Tensor.
    labels = tf.constant(df[LABEL_COLUMN].values)
    '''
    Changes from here:
    '''
    features_slices, labels_slices = tf.train.slice_input_producer([feature_cols, labels], ...)
    features_batches, labels_batches = tf.train.batch([features_slices, labels_slices], ...)
    # Returns the feature and label batches.
    return features_batches, labels_batches

How do I create a dataset with multiple images in the same format as CIFAR10?

I have 1750x1750 images and I would like to label them and put them into a file in the same format as CIFAR10. I have seen a similar answer before that suggested:
label = [3]
im = Image.open(img)
im = (np.array(im))
print(im)
r = im[:,:,0].flatten()
g = im[:,:,1].flatten()
b = im[:,:,2].flatten()
array = np.array(list(label) + list(r) + list(g) + list(b), np.uint8)
array.tofile("info.bin")
but it doesn't show how to put multiple images in a single file. I have looked at CIFAR10 and tried to append the arrays in the same way, but all I got was the following error:
E tensorflow/core/client/tensor_c_api.cc:485] Read less bytes than requested
Note that I am using TensorFlow to do my computations, and I have been able to isolate the problem to the data.
The CIFAR-10 binary format represents each example as a fixed-length record with the following format:
1-byte label.
1 byte per pixel for the red channel of the image.
1 byte per pixel for the green channel of the image.
1 byte per pixel for the blue channel of the image.
Assuming you have a list of image filenames called images, and a list of integers (less than 256) called labels corresponding to their labels, the following code would write a single file containing these images in CIFAR-10 format:
import numpy as np
from PIL import Image

with open(output_filename, "wb") as f:
    for label, img in zip(labels, images):
        label = np.array(label, dtype=np.uint8)
        f.write(label.tobytes())  # Write label.
        im = np.array(Image.open(img), dtype=np.uint8)
        f.write(im[:, :, 0].tobytes())  # Write red channel.
        f.write(im[:, :, 1].tobytes())  # Write green channel.
        f.write(im[:, :, 2].tobytes())  # Write blue channel.
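The record layout can be checked with a round trip. The sketch below is an illustration only: it uses small random arrays and an in-memory buffer in place of real image files, and HEIGHT, WIDTH and RECORD_BYTES are names introduced here. Whatever reads the file must be configured for the same record length; a reader still set up for 32x32 CIFAR-10 records would not match larger images, which may (this is an assumption) be related to the "Read less bytes than requested" error.

```python
import io
import numpy as np

HEIGHT, WIDTH = 4, 4                   # tiny size for illustration; CIFAR-10 uses 32x32
RECORD_BYTES = 1 + 3 * HEIGHT * WIDTH  # 1 label byte + three channel planes

# Hypothetical in-memory images (H, W, 3) standing in for files opened with PIL.
rng = np.random.default_rng(0)
images = [rng.integers(0, 256, size=(HEIGHT, WIDTH, 3), dtype=np.uint8) for _ in range(3)]
labels = [3, 0, 7]

buf = io.BytesIO()                     # stands in for open(output_filename, "wb")
for label, im in zip(labels, images):
    buf.write(np.uint8(label).tobytes())
    for channel in range(3):           # red, green, blue planes in order
        buf.write(im[:, :, channel].tobytes())

# Read back: every record has the same length, so the buffer reshapes cleanly.
records = np.frombuffer(buf.getvalue(), dtype=np.uint8).reshape(-1, RECORD_BYTES)
read_labels = records[:, 0]
read_images = records[:, 1:].reshape(-1, 3, HEIGHT, WIDTH).transpose(0, 2, 3, 1)
```

If the images differ in size, the records are no longer fixed-length, so resizing them to a common shape first would be needed for this format.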

Convert indices in LongTensor format to a binary selection mask in torch

I have a LongTensor which contains all the indices I want from another tensor. How can I convert this LongTensor into a ByteTensor that can be used as a selection mask?
Assume,
th> imageLabels:size()
17549
3
[torch.LongStorage of size 2]
[0.0001s]
th> indices
1
22
32
[torch.LongTensor of size 3]
I need a way to access imageLabels using [index] notation so that I can change some values in imageLabels in-place.
Is there any way to do this? As far as I understood from the docs, :index, :narrow operations return a completely new Tensor.
Correct, :index and :narrow each return a new tensor. For narrow, the new tensor uses the same storage as the original, as stated in the docs: "For methods narrow, select and sub the returned tensor shares the same Storage as the original"
I ended up using indexFill.
targetTensor:indexFill(1, indices, 0)
Here the first argument is the dimension, indices is the LongTensor containing all the indices we are interested in, and 0 is the value to fill (it can be any number).
Hope this helps. It's all in the docs; we just have to read them patiently.
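For the selection mask the question asks about, a rough analogue in Python with NumPy (a swapped-in library, not the Torch7 API) might look like the sketch below. The array size and the names image_labels and mask are made up for illustration:

```python
import numpy as np

# Hypothetical stand-in for imageLabels (the question's tensor is 17549 x 3).
image_labels = np.ones((40, 3), dtype=np.int64)

# Torch7 indices are 1-based; NumPy is 0-based, so shift by one.
indices = np.array([1, 22, 32]) - 1

# Binary selection mask over the first dimension.
mask = np.zeros(image_labels.shape[0], dtype=bool)
mask[indices] = True

# In-place assignment through the mask, comparable to indexFill(1, indices, 0).
image_labels[mask] = 0
```

Assignment through a boolean mask modifies the original array rather than a copy, which is the in-place behavior the question is after.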
