Nudity Detection Algorithm
Normalization
Zoning
Feature Extraction
Classification using SVM
The following methods are used:
1. Normalization: First the image is converted to .jpg format and resized to 256x256. Then it is converted to the YCbCr color space; for this I use OpenCV with Python. Here is the code.
2. Zoning: The normalized images are then divided into three zones, based on the assumption that nudity is found mostly in the central zone of an image.
3. Feature Extraction: In this module the image is in YCbCr. Skin pixels are filtered by thresholding in the range (0,133,77)-(255,173,127), and the mask is divided into three zones. Then for each zone four features are calculated: 2 color features (number of connected skin pixels and proportion of skin pixels to total pixels) and 2 texture features (homogeneity and correlation). The texture features are calculated using a GLCM (the skimage.feature module). Here is the code:
import os
import numpy as np
import cv2
import skimage.feature as sf

total_pixels = 256.0 * 256.0

class normalize:
    def __init__(self, src, dst):
        self.src = src
        self.dst = dst + "_1.jpg"

    def resize(self):
        # Resize the source image to 256x256, save it, and return the saved copy
        x, y = 256, 256
        src = cv2.imread(self.src, 1)
        src = cv2.resize(src, (x, y))
        cv2.imwrite(self.dst, src)
        dst = cv2.imread(self.dst, 1)
        return dst

"""Segmentation module is used to segment out skin pixels in YCrCb color space"""
def segmentation(src):
    # cv2.COLOR_BGR2YCrCb replaces the removed OpenCV 2.x constant cv.CV_BGR2YCrCb
    img = cv2.cvtColor(src, cv2.COLOR_BGR2YCrCb)
    dst = cv2.inRange(img, (0, 133, 77), (255, 173, 127))
    return dst

"""Image Zoning and feature extraction module"""
class features:
    def __init__(self, src):
        self.zone1 = src                   # whole image
        self.zone2 = src[30:226, 30:226]   # central zone
        self.zone3 = src[60:196, 60:196]   # innermost zone

    def createglcm(self, zone):
        # Grey-level co-occurrence matrix at distance 1 for six angles
        # (newer scikit-image versions spell this graycomatrix/graycoprops)
        return sf.greycomatrix(zone, [1], [0, np.pi/4, np.pi/2, -np.pi/2, -np.pi/4, np.pi*25/12], normed=True)

    def getCorrelation(self, glcm):
        return sf.greycoprops(glcm, 'correlation')

    def getHomogeneity(self, glcm):
        return sf.greycoprops(glcm, 'homogeneity')

    def getcolorfeatures(self, zone):
        # Sum the areas of all skin contours (note: OpenCV 3.x returns three
        # values from findContours; 2.x and 4.x return two)
        contours, hierarchy = cv2.findContours(zone, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
        skin_pixel_connected = 0
        for i in range(len(contours)):
            skin_pixel_connected = skin_pixel_connected + cv2.contourArea(contours[i])
        return [skin_pixel_connected, skin_pixel_connected / total_pixels]
Now I have retrieved a list of various features as given in the code. How do I build a feature vector for an SVM from Python lists? How do I use an SVM for training with nude and non-nude images (I have 5000 images) and then for detection? Can anybody suggest something?
You create an SVM object.
Train your SVM using the fit method with your training data.
Use the predict method to predict on your test data.
code:
from sklearn import svm
clf = svm.SVC()
clf.fit(X,y)
clf.predict(X_test)
For the feature vector X, simply merge your features into one NumPy array per training example.
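For example, here is a sketch of how the features class from the question could be turned into one row per image (images and labels are assumed lists of normalized images and 0/1 labels, not names from the question; taking .mean() over the GLCM distances/angles is just one simple way to collapse the greycoprops output to a scalar):
import numpy as np

def feature_vector(img):
    # 12 features per image: homogeneity, correlation, connected skin pixels
    # and skin proportion, for each of the three zones
    f = features(segmentation(img))
    vec = []
    for zone in (f.zone1, f.zone2, f.zone3):
        glcm = f.createglcm(zone)
        vec.append(f.getHomogeneity(glcm).mean())  # average over distances/angles
        vec.append(f.getCorrelation(glcm).mean())
        vec.extend(f.getcolorfeatures(zone))
    return np.array(vec)

X = np.vstack([feature_vector(img) for img in images])  # shape (n_images, 12)
y = np.array(labels)                                    # e.g. 1 = nude, 0 = non-nude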
After cross-validation, I settled on C=100.0 and gamma=0.07. This is what my code looks like:
import numpy as np
from sklearn.svm import SVC
classifier = SVC(kernel='rbf', C=100.0, gamma=0.07, cache_size=800)
classifier.fit(np.array(featurespace), np.array(classes))
classifier.predict(X_test)
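For detection on a new image, the same pipeline would then be applied, e.g. (a sketch reusing the hypothetical feature_vector helper from above):
img = normalize('test.jpg', 'test').resize()             # normalize to 256x256
classifier.predict(feature_vector(img).reshape(1, -1))   # e.g. array([1]) = nude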
Related
I am trying to understand whether I can visualize a 4-dimensional graph by breaking it down into smaller dimensions.
For example, when we have a 2D plane as the prediction for a 3D graph, we can just choose a 2D graph that shows our prediction as a line. Can I do the same for a 4D graph? If yes, then how?
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn import linear_model
data = pd.read_csv('housing.csv')
data = data[:50] # taking just the first 50 rows from the CSV file
model = linear_model.LinearRegression() #loading the model from the library
model.fit(data[['median_income','total_rooms','households']],data.median_house_value)
# Pls add code here for visualizations
Actually you can do one fun thing: since your object is a function from R^3 -> R, you could, in principle, take your input space as a 3D cube (I am guessing your data is somewhat bounded) and then use colour to encode your prediction. This way you get a 3D coloured point cloud. You will probably need transparency to see through it, plus some interactive investigation to rotate/move around, but 4D is the highest "visualisable" dimension (as long as one dimension is "special" and thus can be encoded as a colour).
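A minimal sketch of that coloured point-cloud idea with matplotlib, reusing the fitted model and the three feature columns from the question (the colormap and alpha values are arbitrary choices):
from mpl_toolkits.mplot3d import Axes3D  # enables the 3d projection on older matplotlib

feats = data[['median_income', 'total_rooms', 'households']]
pred = model.predict(feats)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
sc = ax.scatter(feats['median_income'], feats['total_rooms'], feats['households'],
                c=pred, cmap='viridis', alpha=0.5)  # colour encodes the 4th dimension
fig.colorbar(sc, label='predicted median_house_value')
ax.set_xlabel('median_income')
ax.set_ylabel('total_rooms')
ax.set_zlabel('households')
plt.show()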
In the study of machine learning and pattern recognition, we know that if a sample has a two-dimensional feature such as (length, weight), and both length and weight follow a Gaussian distribution, then we can use a multivariate Gaussian distribution to describe it.
It's just a 3D plot that looks like this, where the z axis is the probability density.
But what if the sample has three or more features, x1, x2, x3 ... xn? How do we correctly plot it using one plot?
You can use dimensionality reduction methods to visualize higher dimensional data.
https://scikit-learn.org/stable/auto_examples/manifold/plot_compare_methods.html#sphx-glr-auto-examples-manifold-plot-compare-methods-py
Convert the D-dimensional data into 2 or 3 dimensions.
Plot the transformed data points in 2D or 3D, depending on the dimension to which the data was reduced.
Let's consider an example. Take a 10-dimensional Gaussian:
import matplotlib.pyplot as plt
import numpy as np
DIMENSION = 10
mean = np.zeros((DIMENSION,))
cov = np.eye(DIMENSION)
X = np.random.multivariate_normal(mean, cov, 5000)
Then perform dimensionality reduction (I used PCA; you can choose any other method, depending on your prior knowledge of the algorithm's effectiveness for a particular type of data):
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn.decomposition import PCA
X_2d = PCA(n_components=2).fit_transform(X)
X_3d = PCA(n_components=3).fit_transform(X)
Then plot them:
fig = plt.figure(figsize=(12, 4))
ax = fig.add_subplot(121, projection='3d')
ax.scatter(X_3d[:, 0], X_3d[:, 1], X_3d[:, 2])
plt.title('3D')
fig.add_subplot(122)
plt.scatter(X_2d[:, 0], X_2d[:, 1])
plt.title('2D')
plt.show()
You can play with other algorithms as well; each offers a different kind of advantage.
I hope this answers your question.
Note: in higher dimensions, phenomena like the "curse of dimensionality" come into play, so an accurate projection into lower dimensions may not be possible. It's similar to why Greenland appears to be of similar size to Africa on a cartographic map.
Currently I am working on the flower classification dataset from Kaggle, which has only 210 images. With this set of images I am getting an accuracy of only 11% on the validation set.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import cv2
#from tqdm import tqdm
import os
import warnings
warnings.filterwarnings('ignore')

flower_img = r'C:\Users\asus\Downloads\flower_images\flower_images'
data = pd.read_csv(r'C:\Users\asus\Downloads\flower_images\flower_labels.csv')
img = os.listdir(flower_img)[1]
image_name = [img.split('.')[-2] for img in os.listdir(flower_img)]
label_array = np.array(data['label'])
label_unique = np.unique(label_array)
names = ['phlox','rose','calendula','iris','leucanthemum maximum','bellflower','viola','rudbeckia laciniata','peony','aquilegia']
Flower_names = {}
for i in range(10):
    Flower_names[i] = names[i]
print(Flower_names)
Flower_names.get(8)
x = data['label'][2]
Flower_names.get(x)

i = 0
for img in os.listdir(flower_img):
    #print(img)
    path = os.path.join(flower_img, img)
    #img = cv2.imread(path,cv2.IMREAD_GRAYSCALE)
    img = cv2.imread(path)
    #print(img.shape)
    img = cv2.resize(img, (128, 128))
    data['file'][i] = np.array(img)
    i += 1

data['file'][0].shape
plt.imshow(data['file'][0])
plt.show()
import keras
from keras.models import Sequential
from keras.layers import Dense, Conv2D, Activation, MaxPool2D, Dropout, Flatten

model = Sequential()
model.add(Conv2D(32, kernel_size=3, activation='relu', input_shape=(128, 128, 3)))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Conv2D(64, kernel_size=3, activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Conv2D(128, kernel_size=3, activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2)))
#model.add(Conv2D(512,kernel_size=3,activation='relu'))
#model.add(MaxPool2D(pool_size=(2,2)))
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.25))  # dropout moved before the output layer; after softmax it only distorts predictions
model.add(Dense(10, activation='softmax'))

from keras.optimizers import Adam
model.compile(loss='categorical_crossentropy', optimizer=Adam(lr=0.002), metrics=['accuracy'])
model.summary()

x = np.array([i for i in data['file']]).reshape(-1, 128, 128, 3)
y = np.array([i for i in data['label']])
from keras.utils import to_categorical
y = to_categorical(y)

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y)
model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=10)
model.evaluate(x_test, y_test)
model.evaluate(x_train, y_train)
How can I increase accuracy using only this dataset? Also, how can I predict classes for any input image?
Link to the flower color images dataset: https://www.kaggle.com/olgabelitskaya/flower-color-images
Your dataset size is very small. Convolutional neural networks perform best when trained on very large data sets. You really want to have thousands of images (or more!) in your data set.
You can try to enhance your current data set by using various image processing techniques to increase its size. These techniques take the original images and skew them, rotate them, and make other modifications to bolster the size of the training data. They can be helpful, but increasing the natural size of the data set is preferred.
If you cannot increase the size of the dataset, you should examine why you need to use a CNN. There are other algorithms that may give better results when trained with a smaller data set. Take a look at Support Vector Machines or k-NN.
If you must use a CNN, transfer learning is a good solution. You can use the features from a trained model and apply them to your problem. I have had great success with this approach.
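For example, a transfer-learning setup in Keras might look roughly like this (a sketch under assumptions: VGG16 is just one choice of pretrained base, the 128x128x3 input shape and the x_train/y_train arrays come from the question's code, and the head sizes are illustrative):
from keras.applications import VGG16
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten

base = VGG16(weights='imagenet', include_top=False, input_shape=(128, 128, 3))
for layer in base.layers:
    layer.trainable = False  # freeze the pretrained convolutional features

model = Sequential([
    base,
    Flatten(),
    Dense(256, activation='relu'),  # small trainable head on top of the frozen base
    Dropout(0.25),
    Dense(10, activation='softmax'),
])
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=10)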
The things you can do:
Progressive resizing link
Image augmentation link (a sketch follows below)
Transfer learning link
To be honest, many more techniques could be used to get more out of your data; try searching the topic. The ones above are just the major examples that come to mind, and the links are starting points for a deeper dig.
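As an illustration of the image augmentation item above, a minimal Keras sketch (an assumption: the x_train/y_train/x_test/y_test arrays and model come from the question's code; the parameter values are arbitrary starting points):
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rotation_range=20,
                             width_shift_range=0.1,
                             height_shift_range=0.1,
                             zoom_range=0.1,
                             horizontal_flip=True)

# Train on randomly augmented batches instead of the raw arrays
model.fit_generator(datagen.flow(x_train, y_train, batch_size=32),
                    steps_per_epoch=len(x_train) // 32,
                    epochs=30,
                    validation_data=(x_test, y_test))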
I'm a beginner making a linear regression model. When I make predictions on the test set, it works fine, but when I try to predict for a specific value, it gives an error. In the tutorial I'm watching, they don't get any errors.
import pandas as pd
import matplotlib.pyplot as plt

dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:2].values
y = dataset.iloc[:, 2].values
# Fitting Linear Regression to the dataset
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(X, y)
# Visualising the Linear Regression results
plt.scatter(X, y, color = 'red')
plt.plot(X, lin_reg.predict(X), color = 'blue')
plt.title('Truth or Bluff (Linear Regression)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
# Predicting a new result with Linear Regression
lin_reg.predict(6.5)
ValueError: Expected 2D array, got scalar array instead:
array=6.5.
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
According to the Scikit-learn documentation, the input array should have shape (n_samples, n_features). As such, if you want a single example with a single value, you should expect the shape of your input to be (1,1).
This can be done by doing:
import numpy as np
test_X = np.array(6.5).reshape(-1, 1)
lin_reg.predict(test_X)
You can check the shape by doing:
test_X.shape
The reason for this is that the input can have many samples (i.e. you may want to predict for multiple data points at once) and/or each sample can have many features.
Note: NumPy is a Python library supporting large arrays and matrices. When scikit-learn is installed, NumPy is installed as well.
I'm trying to solve a binary classification task. The training data set contains 9 features, and after my feature engineering I ended up with 14 features. I want to use a stacking classifier approach with
mlxtend.classifier.StackingClassifier, using 4 different classifiers, but when trying to predict on the test data set I get the error: ValueError: query data dimension must match training data dimension
%%time
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import SGDClassifier, LogisticRegression
from xgboost import XGBClassifier
from mlxtend.classifier import StackingCVClassifier

models = [KNeighborsClassifier(weights='distance'),
          GaussianNB(), SGDClassifier(loss='hinge'), XGBClassifier()]
calibrated_models = Calibrated_classifier(models, return_names=False)
meta = LogisticRegression()
stacker = StackingCVClassifier(classifiers=calibrated_models, meta_classifier=meta, use_probas=True).fit(X.values, y.values)
Remark: In my code I just wrote a function that returns a list of calibrated classifiers for StackingCVClassifier; I have checked that this is not causing the error.
Remark 2: I had already tried to build a stacker from scratch, with the same result, so I had thought something was wrong with my own stacker:
import pandas as pd
from sklearn.linear_model import LogisticRegression

def StackingClassifier(X, y, models, stacker=LogisticRegression(), return_data=True):
    names, ls = [], []
    predictions = pd.DataFrame()
    for model in models:
        names.append(str(model)[:str(model).find('(')])
    for i, model in enumerate(models):
        model.fit(X, y)
        ls = model.predict_proba(X)[:, 1]
        predictions[names[i]] = ls
    if return_data:
        return predictions
    else:
        return stacker.fit(predictions, y)
Could you please help me to understand the correct usage of a stacking classifiers?
EDIT:
This is my code for the calibrated classifiers. The function takes a list of n classifiers, applies sklearn's CalibratedClassifierCV to each one, and returns a list of n calibrated classifiers. You have the option to return a zipped list, since this function is mainly intended to be used along with sklearn's VotingClassifier.
from sklearn.calibration import CalibratedClassifierCV

def Calibrated_classifier(models, method='sigmoid', return_names=True):
    calibrated, names = [], []
    for model in models:
        names.append(str(model)[:str(model).find('(')])
    for model in models:
        clf = CalibratedClassifierCV(base_estimator=model, method=method)
        calibrated.append(clf)
    if return_names:
        return zip(names, calibrated)
    else:
        return calibrated
I have tried your code with the Iris dataset and it works fine, so I think the problem is with the dimension of your test data, not with the calibration.
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.neighbors import KNeighborsClassifier
from mlxtend.classifier import StackingCVClassifier
from sklearn import datasets

X, y = datasets.load_iris(return_X_y=True)
models = [KNeighborsClassifier(weights='distance'),
          SGDClassifier(loss='hinge')]
calibrated_models = Calibrated_classifier(models, return_names=False)
meta = LogisticRegression(multi_class='ovr')
stacker = StackingCVClassifier(classifiers=calibrated_models,
                               meta_classifier=meta, use_probas=True, cv=3).fit(X, y)
Prediction:
stacker.predict([X[0]])
#array([0])
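That "query data dimension" message typically comes from the nearest-neighbour search inside KNeighborsClassifier, which fires when the test matrix has a different number of columns than the training matrix. A quick check worth running (X_test here is a stand-in for your engineered test features, assumed to be a DataFrame like X):
print(X.shape, X_test.shape)  # the second dimensions must match, e.g. (n, 14) for both
assert X.shape[1] == X_test.shape[1], 'feature count mismatch between train and test'
stacker.predict(X_test.values)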