Flatten layer incompatible with input - machine-learning

I am trying to run the following code:
import data_processing as dp
import numpy as np
test_set = dp.read_data("./data2019-12-01.csv")
import tensorflow as tf
import keras

def train_model():
    autoencoder = keras.Sequential([
        keras.layers.Flatten(input_shape=[400]),
        keras.layers.Dense(150, name='bottleneck'),
        keras.layers.Dense(400, activation='sigmoid')
    ])
    autoencoder.compile(optimizer='adam', loss='mse')
    return autoencoder

trained_model = train_model()
trained_model.load_weights('./weightsfile.h5')
trained_model.evaluate(test_set, test_set)
The test_set in line 3 is a numpy array of shape (3280977, 400). I am using Keras 2.1.4 and TensorFlow 1.5.
However, this produces the following error:
ValueError: Input 0 is incompatible with layer flatten_1: expected min_ndim=3, found ndim=2
How can I solve it? I tried changing the input_shape in the Flatten layer and also searched the internet for possible solutions, but none of them worked. Can anyone help me out here? Thanks.

After much trial and error, I was able to run the code. This is the code that runs:
import data_processing as dp
import numpy as np
test_set = np.array(dp.read_data("./datanew.csv"))
print(np.shape(test_set))
import tensorflow as tf
from tensorflow import keras
# import keras

def train_model():
    autoencoder = keras.Sequential([
        keras.layers.Flatten(input_shape=[400]),
        keras.layers.Dense(150, name='bottleneck'),
        keras.layers.Dense(400, activation='sigmoid')
    ])
    autoencoder.compile(optimizer='adam', loss='mse')
    return autoencoder

trained_model = train_model()
trained_model.load_weights('./weightsfile.h5')
trained_model.evaluate(test_set, test_set)
The change I made was to replace
import keras
with
from tensorflow import keras
This may also work for others who are using old versions of TensorFlow and Keras. I used TensorFlow 1.5 and Keras 2.1.4 in my code.

Keras and TensorFlow only accept batched input data for prediction.
You must 'simulate' the batch index dimension.
For example, if your data has shape (M x N), at prediction time you need to feed a tensor of shape (K x M x N), where K is the batch dimension.
Simulating the batch axis is very easy; you can use numpy to achieve it.
Using np.expand_dims(axis=0) on an input tensor of shape M x N gives you the shape 1 x M x N. This is why you get that error: the missing '1' (or 'K') third dimension is the batch index.
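As a quick illustration, here is a minimal sketch (with a made-up array, not the asker's data) of adding that batch axis with numpy:
import numpy as np

# a single sample of shape (M, N) = (4, 400), made up for illustration
sample = np.random.rand(4, 400)
print(sample.shape)   # (4, 400)

# add a leading batch axis so the model sees a batch of size 1
batched = np.expand_dims(sample, axis=0)
print(batched.shape)  # (1, 4, 400)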

Related

Clustering on 97 features of categorical data

I am trying to apply unsupervised learning to a dataset with 97 features and around 6500 rows/samples. All features hold discrete data (mostly from 1-10), with some being binary (0/1). What are some of the best clustering algorithms to apply to this data? Thank you!
It's impossible to say which clustering algorithm will perform best on your given dataset. You just have to try several methodologies and inspect the final results that you get. Here are several clustering algorithms that you can try:
https://github.com/ASH-WICUS/Notebooks/blob/master/Clustering%20Algorithms%20Compared.ipynb
Here is a small sample.
import statsmodels.api as sm
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from matplotlib import pyplot

# load the mtcars dataset
mtcars = sm.datasets.get_rdataset("mtcars", "datasets", cache=True).data
df_cars = pd.DataFrame(mtcars)
df_cars.head()

# define dataset (copy so the cluster column can be added later)
X = df_cars[['mpg', 'hp']].copy()
# define the model
model = KMeans(n_clusters=8)
# fit the model
model.fit(X)
# assign a cluster to each example
yhat = model.predict(X)
X['kmeans'] = yhat

# plot X & Y coordinates and color by cluster number
pyplot.scatter(X['mpg'], X['hp'], c=X['kmeans'], cmap='rainbow', s=50, alpha=0.8)

# interactive version of the same plot
import plotly.express as px
fig = px.scatter(X, x="hp", y="mpg", color="kmeans", size='mpg', hover_data=['kmeans'])
fig.show()
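If you want to compare a second methodology on the same two features, here is a small sketch using AgglomerativeClustering; the choice of algorithm and n_clusters=3 are illustrative assumptions, not recommendations:
import statsmodels.api as sm
from sklearn.cluster import AgglomerativeClustering
from matplotlib import pyplot

df_cars = sm.datasets.get_rdataset("mtcars", "datasets", cache=True).data
# hierarchical (agglomerative) clustering on the same two features
agg = AgglomerativeClustering(n_clusters=3)
labels = agg.fit_predict(df_cars[['mpg', 'hp']])
pyplot.scatter(df_cars['mpg'], df_cars['hp'], c=labels, cmap='rainbow', s=50, alpha=0.8)
pyplot.show()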

How to know if my data has been scaled by StandardScaler?

"I have scaled my dataset by using Standard Scaler , Now how to know it has been scaled, I am sure it has been scaled but how to see it"
As @Coderji said, you can always check the mean and standard deviation, which should be equal to 0 and 1 respectively.
However, there is another method to visualize it.
from sklearn import datasets
import numpy as np
from sklearn.preprocessing import StandardScaler
I am using the iris dataset for this example.
iris = datasets.load_iris()
X = iris.data
sc = StandardScaler()
sc.fit(X)
x = sc.transform(X)
import matplotlib.pyplot as plt
import seaborn as sns
sns.distplot(x[:,1])
See the resulting distribution plot for sepal width (column index 1).
Similarly, you can check all the variables, or a simple pairplot will do the job.
This gives a visual indication that the data has been standardised.
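As a quick numeric check along the lines of the first suggestion, here is a minimal sketch that verifies the per-feature mean and standard deviation directly:
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
import numpy as np

X = datasets.load_iris().data
x = StandardScaler().fit_transform(X)

# after scaling, each feature should have mean ~0 and standard deviation ~1
print(np.round(x.mean(axis=0), 6))
print(np.round(x.std(axis=0), 6))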

Why are my inception-v3 (in Keras) predictions all wrong?

I am an ML beginner and am simply implementing Inception-v3 using the ImageNet weights. This is my first run at it. My implementation is in Keras. My predictions are all wrong and I need a little leg up to see what the problem is. It is actually pretty difficult to find an online example of Inception-v3 used from top to bottom with Keras; most are tutorials on transfer learning. Here is my code.
import keras as k
from keras.applications.inception_v3 import InceptionV3
from keras.applications.imagenet_utils import preprocess_input, decode_predictions
from keras.preprocessing import image
import cv2
import numpy as np

model = k.applications.inception_v3.InceptionV3(include_top=True, weights='imagenet', input_tensor=None, input_shape=None)

im = 'images/cat.jpg'
cv2.imread(im).shape
# (168, 299, 3)
im = cv2.resize(cv2.imread(im), (299, 299)).astype(np.float32)
im = np.expand_dims(im, axis=0)
im.shape
# (1, 299, 299, 3)
preds = model.predict(im)
print('Predicted:', decode_predictions(preds))
Predicted: [[('n03047690', 'clog', 1.0), ('n01924916', 'flatworm', 7.0789714e-11), ('n03950228', 'pitcher', 2.1705252e-11), ('n02841315', 'binoculars', 4.1622389e-13), ('n06359193', 'web_site', 3.8697981e-16)]]
Could someone suggest how this most basic of implementations is wrong? Perhaps my input shape is incorrect?
The inception-v3 model requires that you run your image through preprocess_input() before predicting.
Add:
im = preprocess_input(im)
Also, you should import preprocess_input from keras.applications.inception_v3, not from keras.applications.imagenet_utils.
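Putting that together, a corrected version of the relevant lines might look like the sketch below; it is based on the code from the question, with only the import and the preprocessing call changed:
from keras.applications.inception_v3 import InceptionV3, preprocess_input, decode_predictions
import cv2
import numpy as np

model = InceptionV3(include_top=True, weights='imagenet')

im = cv2.resize(cv2.imread('images/cat.jpg'), (299, 299)).astype(np.float32)
im = np.expand_dims(im, axis=0)
# scale pixel values the way Inception-v3 expects (roughly to [-1, 1])
im = preprocess_input(im)

preds = model.predict(im)
print('Predicted:', decode_predictions(preds))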

How to use custom dataset in tensorflow?

I have started learning TensorFlow recently. I am trying to input my custom Python code as training data. I have generated random exponential signals and want the network to learn from them. This is the code I am using to generate a signal:
import matplotlib.pyplot as plt
import random
import numpy as np

lorange = 1
hirange = 10
amplitude = random.uniform(-10, 10)
t = 10
random.seed()
tau = random.uniform(lorange, hirange)
x = np.arange(t)
plt.xlabel('t = time')
plt.ylabel('x(t)')
plt.plot(x, amplitude * np.exp(-x / tau))
plt.show()
How can I use this graph as an input vector in TensorFlow?
You have to use the tf.placeholder function (see the doc):
import tensorflow as tf

# Your input data
x = np.arange(t)
y = amplitude * np.exp(-x / tau)

# Create corresponding tensorflow placeholder nodes
x_node = tf.placeholder(tf.float32, shape=(t,))
y_node = tf.placeholder(tf.float32, shape=(t,))
You can then use x_node and y_node in your tensorflow code (for instance, use x_node as the input of a neural network and try to predict y_node).
Then, when using sess.run(), you have to feed the input data x and y via the feed_dict argument:
with tf.Session() as sess:
sess.run([...], feed_dict={x_node: x, y_node: y})
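For instance, a minimal sketch of fitting a tiny network to this signal with those placeholders could look like the following; the layer sizes, optimizer settings and number of steps are illustrative assumptions, not from the question:
import numpy as np
import tensorflow as tf

t = 10
x = np.arange(t).astype(np.float32)
y = (5.0 * np.exp(-x / 3.0)).astype(np.float32)  # example signal

x_node = tf.placeholder(tf.float32, shape=(t,))
y_node = tf.placeholder(tf.float32, shape=(t,))

# tiny fully connected network: reshape to (t, 1) so each time step is one sample
hidden = tf.layers.dense(tf.reshape(x_node, (t, 1)), 8, activation=tf.nn.relu)
y_pred = tf.layers.dense(hidden, 1)

loss = tf.losses.mean_squared_error(tf.reshape(y_node, (t, 1)), y_pred)
train_op = tf.train.AdamOptimizer(0.01).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(200):
        _, l = sess.run([train_op, loss], feed_dict={x_node: x, y_node: y})
    print('final loss:', l)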

Bagging using random forest classifier in sklearn

I built a random forest and I want to find the out-of-bag score. But my out-of-bag score is coming out to be 1.0, when it should be less than 1. My sample consists of 20000 elements. Here is the Python code. Please tell me the changes to be made. Here X is a numpy array of the dataset and Z contains the true labels.
import csv
import numpy as np
from sklearn import preprocessing
from sklearn import cross_validation
from sklearn.ensemble import RandomForestClassifier

with open(r'C:\Users\Harsh Bhandari\Desktop\letter.csv') as f:
    reader = csv.reader(f, delimiter='\t')
    data = [(col1, int(col2), int(col3), int(col4), int(col5), int(col6), int(col7),
             int(col8), int(col9), int(col10), int(col11), int(col12), int(col13),
             int(col14), int(col15), int(col16), int(col17))
            for col1, col2, col3, col4, col5, col6, col7, col8, col9, col10, col11,
                col12, col13, col14, col15, col16, col17 in reader]

X = []
Y = []
i = 0
while i < 20000:
    t = data[i][1:]
    X.append(t)
    t = data[i][0]
    Y.append(t)
    i = i + 1

X = np.asarray(X)
Y = np.asarray(Y)

le = preprocessing.LabelEncoder()
Z = le.fit_transform(Y)

clf = RandomForestClassifier(n_estimators=100, oob_score=True)
clf = clf.fit(X, Z)
a = clf.predict(X)
scores = clf.score(X, a)
print scores
http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
In score() you pass the test data and its actual labels; here you are passing the predicted labels themselves, which of course match the predictions, hence you are getting a score of 1.0.
I see a couple of things here.
You are doing clf.score(X, a),
but you should be doing clf.score(X, Z),
where Z is the true label for X.
The score method is defined as clf.score(X, true_labels_for_X).
You instead passed the values you predicted as y_true, which doesn't make sense: sklearn will already run predict on X, so you don't need to pass a.
Also, you can find the out-of-bag score by doing
print clf.oob_score_
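Here is a self-contained sketch of what the corrected scoring looks like; the make_classification data is just a stand-in for the question's X and Z, not the actual letter dataset:
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# toy data standing in for the question's X and Z
X, Z = make_classification(n_samples=20000, n_features=16, n_informative=10,
                           n_classes=3, random_state=0)

clf = RandomForestClassifier(n_estimators=100, oob_score=True)
clf.fit(X, Z)

# score against the true labels Z, not the model's own predictions
print(clf.score(X, Z))   # training accuracy (optimistic)
print(clf.oob_score_)    # out-of-bag estimate of generalisation accuracy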
