How to use custom dataset in tensorflow? - machine-learning

I have started learning TensorFlow recently. I am trying to use data generated by my own Python code as training data: I generate random exponential signals and want the network to learn from them. This is the code I use to generate a signal:
import matplotlib.pyplot as plt
import random
import numpy as np
lorange= 1
hirange= 10
amplitude= random.uniform(-10,10)
t= 10
random.seed()
tau=random.uniform(lorange,hirange)
x=np.arange(t)
plt.xlabel('t=time')
plt.ylabel('x(t)')
plt.plot(x, amplitude*np.exp(-x/tau))
plt.show()
How can I use this signal as an input vector in TensorFlow?

In TensorFlow 1.x you can use the tf.placeholder function (see the documentation):
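import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder)
# Your input data (t, amplitude and tau as in the question)
x = np.arange(t)
y = amplitude*np.exp(-x/tau)
# Create corresponding TensorFlow nodes
x_node = tf.placeholder(tf.float32, shape=(t,))
y_node = tf.placeholder(tf.float32, shape=(t,))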
You can then use x_node and y_node in your TensorFlow code (for instance, use x_node as the input of a neural network and train it to predict y_node).
When calling sess.run(), feed the actual data x and y through the feed_dict argument:
with tf.Session() as sess:
    sess.run([...], feed_dict={x_node: x, y_node: y})
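To train on many signals rather than a single curve, one option is to stack several randomly generated decays into a 2-D array, one signal per row, with each signal's tau as its target. The NumPy sketch below assumes you want the network to predict tau from the sampled curve; `make_signals` is a hypothetical helper, not part of TensorFlow, and the choice of target is an assumption. With a batch axis the placeholders would then take shapes such as `(None, t)` for the signals and `(None,)` for the targets.

```python
import numpy as np

def make_signals(n_signals, t=10, lorange=1.0, hirange=10.0, seed=None):
    """Hypothetical helper: build a dataset of random exponential decays.

    Returns (signals, taus); signals has shape (n_signals, t) and row i is
    amplitude_i * exp(-x / tau_i) sampled at x = 0, 1, ..., t-1.
    """
    rng = np.random.default_rng(seed)
    x = np.arange(t)                                       # shared time axis
    amplitudes = rng.uniform(-10, 10, size=n_signals)      # one amplitude per signal
    taus = rng.uniform(lorange, hirange, size=n_signals)   # one time constant per signal
    # Broadcasting: (n, 1) * exp(-(1, t) / (n, 1)) -> (n, t)
    signals = amplitudes[:, None] * np.exp(-x[None, :] / taus[:, None])
    return signals.astype(np.float32), taus.astype(np.float32)

signals, taus = make_signals(1000, seed=0)
print(signals.shape, taus.shape)  # (1000, 10) (1000,)
```

The whole batch can then be fed at once, e.g. `feed_dict={x_node: signals, y_node: taus}` with placeholders declared as `tf.placeholder(tf.float32, shape=(None, t))` and `shape=(None,)`.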

Related

How to draw ROC curve for a multi-class dataset?

I have a multi-class confusion matrix and would like to draw the associated ROC curve for one of its classes (e.g. class 1). I know the one-vs-all approach should be used in this case, but I want to know exactly how the threshold needs to be changed to obtain different pairs of TP and FP rates.
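scikit-learn has a handy function, roc_curve, which computes the TPR and FPR at every threshold, and another, auc, which computes the area under the resulting curve. You can apply them to your data by looping over the classes and treating each class in turn as positive, with all other data as negative. Note that this needs a per-sample score for each class, such as the predicted probabilities, rather than only the confusion matrix: the curve is traced by sweeping the threshold over those scores. The code below was inspired by the scikit-learn documentation page on this topic.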
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt
#generating synthetic data
N_classes = 3
N_per_class=100
labels = np.concatenate([[i]*N_per_class for i in range(N_classes)])
preds = np.stack([np.random.uniform(0,1,N_per_class*N_classes) for _ in range(N_classes)]).T
preds /= preds.sum(1,keepdims=True) #approximate softmax
tpr,fpr,roc_auc = ([[]]*N_classes for _ in range(3))
f,ax = plt.subplots()
#generate ROC data
for i in range(N_classes):
    # one-vs-rest: class i is positive, every other class is negative
    fpr[i], tpr[i], _ = roc_curve(labels == i, preds[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])
    ax.plot(fpr[i], tpr[i])
plt.legend(['Class {:d} (AUC = {:.2f})'.format(d, roc_auc[d]) for d in range(N_classes)])
plt.xlabel('FPR')
plt.ylabel('TPR')
plt.show()
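To see what happens to the threshold for a single class, here is a minimal NumPy sketch of the one-vs-rest sweep: treat class `c` as positive, use the predicted probability for `c` as the score, and lower the threshold through every distinct score, counting TP and FP at each step. `one_vs_rest_roc` and `trapezoid_auc` are hypothetical names. scikit-learn's `roc_curve` additionally drops collinear points by default (`drop_intermediate=True`), so its arrays can be shorter, but the traced curve and its area should be the same.

```python
import numpy as np

def one_vs_rest_roc(labels, scores, positive_class):
    """Sweep the decision threshold for one class against all others.

    labels: (n,) integer class labels; scores: (n,) predicted probability
    of `positive_class`. Returns (fpr, tpr, thresholds), starting at (0, 0).
    """
    y = np.asarray(labels) == positive_class   # one-vs-rest binary labels
    scores = np.asarray(scores, dtype=float)
    P, N = y.sum(), (~y).sum()                 # number of positives / negatives
    thresholds = np.unique(scores)[::-1]       # distinct scores, high to low
    fpr, tpr = [0.0], [0.0]                    # threshold above every score
    for thr in thresholds:
        predicted_pos = scores >= thr          # predict positive at or above thr
        tp = np.sum(predicted_pos & y)
        fp = np.sum(predicted_pos & ~y)
        tpr.append(tp / P)
        fpr.append(fp / N)
    return np.array(fpr), np.array(tpr), thresholds

def trapezoid_auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

# Toy example: class 1 scores perfectly separate positives from negatives
fpr, tpr, thr = one_vs_rest_roc([0, 1, 1, 2], [0.1, 0.8, 0.4, 0.35], positive_class=1)
print(trapezoid_auc(fpr, tpr))  # 1.0
```

With the synthetic data above, `one_vs_rest_roc(labels, preds[:, 1], positive_class=1)` would give the points for class 1.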

Flatten layer incompatible with input

I am trying to run this code:
import data_processing as dp
import numpy as np
test_set = dp.read_data("./data2019-12-01.csv")
import tensorflow as tf
import keras
def train_model():
    autoencoder = keras.Sequential([
        keras.layers.Flatten(input_shape=[400]),
        keras.layers.Dense(150, name='bottleneck'),
        keras.layers.Dense(400, activation='sigmoid')
    ])
    autoencoder.compile(optimizer='adam', loss='mse')
    return autoencoder
trained_model=train_model()
trained_model.load_weights('./weightsfile.h5')
trained_model.evaluate(test_set,test_set)
test_set (line 3) is a NumPy array of shape (3280977, 400). I am using Keras 2.1.4 and TensorFlow 1.5.
However, this raises the following error:
ValueError: Input 0 is incompatible with layer flatten_1: expected min_ndim=3, found ndim=2
How can I solve it? I tried changing the input_shape of the Flatten layer and also searched online for possible solutions, but none of them worked. Can anyone help me out here? Thanks
After much trial and error, I got the code to run. This is the working version:
import data_processing as dp
import numpy as np
test_set = np.array(dp.read_data("./datanew.csv"))
print(np.shape(test_set))
import tensorflow as tf
from tensorflow import keras
# import keras
def train_model():
    autoencoder = keras.Sequential([
        keras.layers.Flatten(input_shape=[400]),
        keras.layers.Dense(150, name='bottleneck'),
        keras.layers.Dense(400, activation='sigmoid')
    ])
    autoencoder.compile(optimizer='adam', loss='mse')
    return autoencoder
trained_model=train_model()
trained_model.load_weights('./weightsfile.h5')
trained_model.evaluate(test_set,test_set)
The only change I made was to replace
import keras
with
from tensorflow import keras
This may also work for others using older versions of TensorFlow and Keras. I used TensorFlow 1.5 and Keras 2.1.4.
Keras and TensorFlow expect batched input data for prediction, so you must 'simulate' the batch dimension.
For example, if a single sample has shape (M x N), at the prediction step you need to feed a tensor of shape (K x M x N), where K is the batch dimension.
Adding the batch axis is easy with NumPy:
np.expand_dims(x, axis=0) turns an input x of shape M x N into one of shape 1 x M x N. That missing leading axis (the '1' or 'K', i.e. the batch index) is why you get an error about the number of dimensions.
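A minimal NumPy sketch of adding the batch axis, using an arbitrary 28 x 28 array as the single sample (the shape is only an assumption for illustration). Note that this applies when you pass one sample at a time; the question's test_set already has shape (3280977, 400), so its first axis is already the batch axis.

```python
import numpy as np

x = np.zeros((28, 28), dtype=np.float32)   # a single sample of shape (M, N)

batch = np.expand_dims(x, axis=0)           # add a leading batch axis
print(batch.shape)                          # (1, 28, 28)

same = x[np.newaxis, ...]                   # equivalent indexing form
print(same.shape)                           # (1, 28, 28)

many = np.stack([x, x, x])                  # K = 3 samples stacked into one batch
print(many.shape)                           # (3, 28, 28)
```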

How can I get the weights (kernel) of a Dense layer before Model in Keras?

import numpy as np
from keras import backend as K
from keras.datasets import mnist
from keras.models import Model
from keras.layers import Dense, Input
import matplotlib.pyplot as plt
# download the mnist to the path
# x_train shape (60000, 28, 28), y_test shape (10000,)
(x_train, _), (x_test, y_test) = mnist.load_data()
# data pre-processing
x_train = x_train.astype('float32') / 255. - 0.5  # scale pixel values to [-0.5, 0.5]
x_test = x_test.astype('float32') / 255. - 0.5  # scale pixel values to [-0.5, 0.5]
x_train = x_train.reshape((x_train.shape[0], -1))
x_test = x_test.reshape((x_test.shape[0], -1))
# in order to plot in a 2D figure
encoding_dim = 2
# this is our input placeholder
input_img = Input(shape=(784,))
# encoder layers
encoder = Dense(2, activation='relu')(input_img)
# decoder layers
decoder = Dense(784, activation='relu')(encoder)
I want to know how I can get the weights (e.g. the kernel of dense_2) of a Dense layer before building the Model in Keras.
If I run autoencoder = Model(inputs=input_img, outputs=decoder) and then autoencoder.get_layer('dense_2').kernel, I can get the kernel. However, I want to use the kernel as one of the outputs, so I must get it before creating the Model.
I want the kernel because it will be part of the loss function, e.g. loss2=tf.square(kernel' * kernel, axis=-1), so I must get it before running Model.
How can I do that?
Thanks!
I think you mean you need one of your middle layers as one of the outputs.
In your case, you can change the model creation like this:
autoencoder = Model(inputs=input_img, outputs=[encoder, decoder])
You can even define a different loss for each of these two outputs!
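For reference, a Dense layer computes activation(x @ kernel + bias) with a kernel of shape (input_dim, units), so in the question's model the encoder kernel is (784, 2) and the decoder (dense_2) kernel is (2, 784). The NumPy sketch below spells that out and adds a kernel-dependent term to the reconstruction loss, which is roughly what the question's loss2 aims at. The random weights, the penalty sum((W.T @ W)**2) and the weight `lam` are illustrative assumptions. In Keras itself, a penalty that depends only on one layer's kernel is usually attached through that layer's kernel_regularizer argument rather than exposed as a model output.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def dense(x, kernel, bias, activation):
    """What a Keras Dense layer computes: activation(x @ kernel + bias)."""
    return activation(x @ kernel + bias)

# Shapes follow the question: 784 inputs -> 2-unit code -> 784 outputs.
x = rng.uniform(-0.5, 0.5, size=(32, 784))                    # toy batch of 32 flattened images
W1, b1 = rng.normal(0.0, 0.05, size=(784, 2)), np.zeros(2)    # encoder kernel and bias
W2, b2 = rng.normal(0.0, 0.05, size=(2, 784)), np.zeros(784)  # decoder ("dense_2") kernel and bias

code = dense(x, W1, b1, relu)       # (32, 2): the middle-layer output
recon = dense(code, W2, b2, relu)   # (32, 784): the reconstruction

loss1 = np.mean((recon - x) ** 2)   # reconstruction error (MSE)
gram = W2.T @ W2                    # kernel' * kernel from the question, shape (784, 784)
loss2 = np.sum(np.square(gram))     # hypothetical kernel penalty
lam = 1e-3                          # hypothetical weight for the penalty
total = loss1 + lam * loss2
print(code.shape, recon.shape, gram.shape)
```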

How to prevent simple keras autoencoder from over compressing data?

I am trying to use Keras with the TensorFlow backend to build a simple autoencoder as a multidimensional scaling technique, projecting multidimensional data into 2 dimensions for plotting. Often when I run it (I'm not sure how to set the random seed for Keras, by the way), one of the dimensions collapses, yielding a 1-dimensional embedding (the plots should help explain). Why is this happening? How can I make sure both dimensions are preserved and used by the autoencoder? I realize this is the simplest, most basic form of an autoencoder, but I would like to build on it to make better autoencoders for this task.
from sklearn.datasets import load_iris
from sklearn import model_selection
import tensorflow as tf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Load data
X = load_iris().data
Y = pd.get_dummies(load_iris().target).to_numpy()  # .as_matrix() was removed in newer pandas
X_tr, X_te, Y_tr, Y_te = model_selection.train_test_split(X,Y, test_size=0.3, stratify=Y.argmax(axis=1))
dims = X_tr.shape[1]
n_classes = Y_tr.shape[1]
# Autoencoder
encoding_dim = 2
# this is our input placeholder
input_data = tf.keras.Input(shape=(4,))
# "encoded" is the encoded representation of the input
encoded = tf.keras.layers.Dense(encoding_dim,
activation='relu',
)(input_data)
# "decoded" is the lossy reconstruction of the input
decoded = tf.keras.layers.Dense(4, activation='sigmoid')(encoded)
# this model maps an input to its reconstruction
autoencoder = tf.keras.models.Model(input_data, decoded)
# this model maps an input to its encoded representation
encoder = tf.keras.models.Model(input_data, encoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
network_training = autoencoder.fit(X_tr, X_tr,
epochs=100,
batch_size=5,
shuffle=True,
verbose=False,
validation_data=(X_te, X_te))
# Plot data
embeddings = encoder.predict(X_te)
plt.scatter(embeddings[:,0], embeddings[:,1], c=Y_te.argmax(axis=1), edgecolor="black", linewidth=1)
[Plot: embedding after running the algorithm once]
[Plot: embedding after running the algorithm again]
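Two things that may be worth checking, offered as assumptions rather than a confirmed diagnosis. First, binary_crossentropy with a sigmoid output expects targets in [0, 1], but the raw iris measurements are in centimetres and go well above 1, so scaling each feature to [0, 1] first (scikit-learn's MinMaxScaler does this) keeps that loss meaningful. Second, with only two ReLU units in the bottleneck, a unit whose pre-activation is negative for every input always outputs 0 (a "dead" unit), which would give exactly a one-dimensional embedding. The NumPy sketch below shows both checks; `min_max_scale` and `dead_units` are hypothetical helper names, and the toy arrays stand in for X_tr and encoder.predict(X_te).

```python
import numpy as np

def min_max_scale(train, other=None):
    """Scale each column to [0, 1] using the training set's min and max."""
    lo, hi = train.min(axis=0), train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # avoid dividing by zero for constant columns
    scaled_train = (train - lo) / span
    if other is None:
        return scaled_train
    return scaled_train, (other - lo) / span  # test data may fall slightly outside [0, 1]

def dead_units(embeddings, tol=1e-8):
    """Indices of bottleneck units that output (almost) zero for every sample."""
    return np.where(np.all(np.abs(embeddings) <= tol, axis=0))[0]

# Toy stand-ins: 4 features on a centimetre-like scale, and a 2-unit ReLU code
X_toy = np.array([[5.1, 3.5, 1.4, 0.2],
                  [7.0, 3.2, 4.7, 1.4],
                  [6.3, 3.3, 6.0, 2.5]])
print(min_max_scale(X_toy))

emb = np.maximum(np.array([[0.7, -1.2], [1.5, -0.3], [0.2, -2.0]]), 0.0)  # ReLU applied
print(dead_units(emb))  # [1]: the second unit is always 0, so the plot collapses to one axis
```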

Bagging using random forest classifier in sklearn

I built a random forest and want to find the out-of-bag score, but my score comes out as 1.0 when it should be less than 1. My sample size is 20000 elements. Here is the Python code; please tell me what to change. X is a NumPy array of the data and Z contains the true labels.
import csv
import numpy as np
from sklearn import preprocessing
from sklearn.ensemble import RandomForestClassifier
# raw string so the backslashes in the Windows path are not treated as escapes
with open(r'C:\Users\Harsh Bhandari\Desktop\letter.csv') as f:
    reader = csv.reader(f, delimiter='\t')
    # first column is the letter label, the remaining columns are integer features
    data = [(row[0],) + tuple(int(v) for v in row[1:]) for row in reader]
# features and labels for the first 20000 rows
X = np.asarray([row[1:] for row in data[:20000]])
Y = np.asarray([row[0] for row in data[:20000]])
le = preprocessing.LabelEncoder()
Z = le.fit_transform(Y)
clf = RandomForestClassifier(n_estimators=100, oob_score=True)
clf = clf.fit(X, Z)
a = clf.predict(X)
scores = clf.score(X, a)
print(scores)
http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
score expects the test data and its actual labels. Here you are passing the predicted labels themselves, which trivially match the predictions, hence the score of 1.0.
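A scikit-learn classifier's score(X, y) returns the mean accuracy of self.predict(X) against y. The minimal NumPy sketch below reproduces that computation with made-up labels, to show why passing the model's own predictions as y always gives 1.0; `accuracy` is a hypothetical helper name.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Mean accuracy, as returned by a scikit-learn classifier's score(X, y)."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

Z = np.array([0, 1, 2, 2, 1, 0])   # made-up true labels
a = np.array([0, 1, 2, 1, 1, 2])   # made-up predictions, standing in for clf.predict(X)

print(accuracy(a, a))  # 1.0: predictions always agree with themselves
print(accuracy(Z, a))  # 4 of 6 predictions match the true labels
```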
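I see a couple of things here.
You are doing clf.score(X, a),
but you should be doing clf.score(X, Z),
where Z holds the true labels for X.
The signature is clf.score(X, true_labels_for_X).
You instead passed your predictions as the true labels, which doesn't make sense. score already runs predict on X internally, so you don't need to pass a.
Also keep in mind that clf.score(X, Z) on the training data will still be optimistic for a random forest. The out-of-bag score is computed for each sample only from the trees that did not see it during training, so it is the better estimate. You can get it with
print(clf.oob_score_)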
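For context, with the default bootstrap=True each tree is trained on n rows drawn with replacement, so roughly 1 - 1/e (about 63%) of the distinct rows end up in a tree's sample and the rest are "out of bag" for that tree. The minimal NumPy sketch below mimics that split for one tree; `bootstrap_split` is a hypothetical helper that illustrates the idea and is not scikit-learn's internal code.

```python
import numpy as np

def bootstrap_split(n_samples, rng):
    """Draw one bootstrap sample and return (in_bag_indices, oob_mask)."""
    in_bag = rng.integers(0, n_samples, size=n_samples)  # draw n rows with replacement
    oob_mask = np.ones(n_samples, dtype=bool)
    oob_mask[in_bag] = False                             # rows never drawn are out of bag
    return in_bag, oob_mask

rng = np.random.default_rng(0)
n = 20000                          # same size as the question's dataset
in_bag, oob = bootstrap_split(n, rng)
print(oob.mean())                  # close to exp(-1), about 0.368
```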
