Which data visualization techniques should I use to analyse data when solving a classification problem? - machine-learning

I am solving a classification problem and I cannot find a good visualization method to analyse my data. Usually, when dealing with prediction problems, I use barplot, distplot, scatterplot, line graphs, etc. I would like to know some common data visualization techniques for classification problems.

Hi guys, I figured out that countplot is the equivalent of a histogram for a categorical variable: https://seaborn.pydata.org/generated/seaborn.countplot.html
Example of countplot
Example of catplot
Update: catplot, when called with kind="count", is actually the combination of FacetGrid and countplot.
So if you want something simple, countplot will do the job, but if you want a grid of plots, use catplot.
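To make the difference concrete, here is a minimal sketch of both calls. The DataFrame and its column names (`label`, `source`) are made up for illustration, and the Agg backend is only selected so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, an assumption so this runs anywhere
import pandas as pd
import seaborn as sns

# Hypothetical toy data: a class label plus one categorical feature.
df = pd.DataFrame({
    "label": ["spam", "ham", "ham", "spam", "ham", "ham", "spam", "ham"],
    "source": ["web", "web", "mail", "mail", "web", "mail", "web", "web"],
})

# Axes-level call: one bar per class, showing how many rows fall in each class.
ax = sns.countplot(data=df, x="label")

# Figure-level call: the same count plot, drawn once per value of `source`
# on a FacetGrid (one column of the grid per value).
grid = sns.catplot(data=df, x="label", col="source", kind="count")
```

countplot returns a matplotlib Axes and catplot returns a FacetGrid, so the first is easier to drop into an existing figure while the second manages its own figure layout.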

Related

Combining different feature vectors for SVM training for MRI classification

I am currently working on my FYP on brain tumor classification. I have extracted features using the wavelet transform, GLCM, polynomial transform, etc.
Is it right to append these feature vectors (column-wise) for training, i.e. to train on combinations of these feature vectors such as GLCM + wavelet?
Can you suggest any papers related to this?
Thank you for the help.
Yes, this method is known as early fusion.
In other words, early fusion is when you concatenate two or more feature sets prior to model training.
There are a number of other methods for feature fusion, including model-level fusion and late fusion.
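A minimal sketch of early fusion with scikit-learn follows. The feature matrices are random stand-ins for the GLCM and wavelet features mentioned in the question (their sizes are arbitrary assumptions), and the scaling step is my own addition, since concatenated feature sets often live on different scales.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_images = 60

# Hypothetical stand-ins: one row per image, one block of columns per extractor.
glcm_features = rng.normal(size=(n_images, 8))
wavelet_features = rng.normal(size=(n_images, 16))
labels = np.arange(n_images) % 2  # placeholder tumor / no-tumor labels

# Early fusion: append the feature vectors column-wise before training.
fused = np.hstack([glcm_features, wavelet_features])

# Standardize the fused columns, then train a single SVM on all of them.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
model.fit(fused, labels)
predictions = model.predict(fused)
```

The only requirement for the concatenation is that every feature matrix has the same number of rows, in the same image order.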
Take a look at these papers which might help you:
Specific to a health-based application
figure which might help you to grasp the concept

How can I re-train my logistic model using pymc3?

I have a binary classification problem with around 15 features, which I selected using another model. Now I want to fit a Bayesian logistic regression on these features. My target classes are highly imbalanced (the minority class is 0.001%) and I have around 6 million records. I want to build a model which can be retrained nightly or on weekends using Bayesian logistic regression.
Currently, I have divided the data into 15 parts. I train my model on the first part and test on the last part, then I update my priors using the Interpolated distribution of pymc3 and rerun the model on the second part of the data. I check the accuracy and other metrics (ROC, F1-score) after each run.
Problems:
My score is not improving.
Am I using the right approach?
This process is taking too much time.
If someone can guide me toward the right approach, with code snippets, it would be very helpful.
You can use variational inference (VI). It is usually faster than sampling and often produces similar results, although it is an approximation. pymc3 itself provides methods for VI, so you can explore that.
That is the only part of the question I can answer. If you can elaborate on your problem a bit further, maybe I can help more.

Which deep learning model to use for capturing minor features in an image?

I have a class which has slightly different features from the other class.
For example, this image has a buckle in it (consider it as one class): https://6c819239693cc4960b69-cc9b957bf963b53239339d3141093094.ssl.cf3.rackcdn.com/1000006329245-822018-Black-Black-1000006329245-822018_01-345.jpg
But this image is quite similar to it and has no buckle:
https://sc01.alicdn.com/kf/HTB1ASpYSVXXXXbdXpXXq6xXFXXXR/latest-modern-classic-chappal-slippers-for-men.jpg
I am a little confused about which model to use in these kinds of cases, one which actually learns pixel-to-pixel values.
Any thoughts would be appreciated.
Thanks!
I have already tried Inception, ResNet, etc.
With a small volume of training data (around 300-400 images per class), can we reach a good recall/precision/F1 score?
You might want to look into transfer learning because of the small dataset. One option is to use a pretrained ResNet model as a feature extractor and run a YOLO (You Only Look Once) detector on top of it, looking through each window (see the sliding-window implementation using ConvNets) to locate a belt buckle, and then classify the image based on whether one is found.
Based on my understanding of your dataset, this approach will require you to re-annotate your dataset as per the requirements of the YOLO algorithm.
To look at an example of the above approach, visit https://mc.ai/implementing-yolo-using-resnet-as-feature-extractor/
Edit: If you have an XML-annotated dataset and need to convert it to CSV to follow the above example, use https://github.com/datitran/raccoon_dataset
Happy modelling.

OpenCV: what is the difference between these 2 haar cascade data sets?

I have seen that there are 2 different Haar cascade datasets in OpenCV. For example, take a look at haarcascade_upperbody.xml and haarcascade_mcs_upperbody.xml. What is this new mcs thing? The only difference I can observe is that haarcascade_mcs_upperbody.xml gives much better results than the other one.
So, can someone please explain to me the difference between these 2 types? When training on my own datasets, how can I choose between them?
I think this web site has the answer: OpenCV
The difference depends on their training data, so if you want to select a suitable classifier, I suggest you try both and keep whichever gives the better result.

Unsupervised classification methods available

I'm doing research that involves "unsupervised classification".
Basically I have a trainSet and I want to cluster the data into X classes in an unsupervised way. The idea is similar to what k-means does.
Let's say
Step1)
featureSet is a [1057x10] matrix and I want to cluster its rows into 88 clusters.
Step2)
Use the previously calculated classes to compute how the testData is classified.
Question
- Is it possible to do it with SVM or N-N? Anything else?
- Any other recommendations?
There are many clustering algorithms out there, and the web is awash with information on them and sample implementations. A good starting point is the Wikipedia entry on cluster analysis (Cluster_analysis).
As you have a working k-means implementation, you could try one of the many variants to see if they yield better results (k-means++ perhaps, seeing as you mentioned SVM). If you want a completely different approach, have a look at Kohonen Maps, also called Self Organising Feature Maps. If that looks too tricky, a simple hierarchical clustering would be easy to implement (find the nearest two items, combine, rinse and repeat).
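As a rough sketch of the k-means++ variant, and of the second step in the question (assigning test data to the clusters found on the training data), something like the following should work with scikit-learn. The arrays are random stand-ins with the shapes given in the question; the test set size of 200 is an arbitrary assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
feature_set = rng.normal(size=(1057, 10))  # stand-in for featureSet
test_data = rng.normal(size=(200, 10))     # stand-in for testData

# Step 1: cluster the training rows into 88 clusters, using k-means++ seeding.
kmeans = KMeans(n_clusters=88, init="k-means++", n_init=10, random_state=0)
train_labels = kmeans.fit_predict(feature_set)

# Step 2: assign each test row to the nearest learned cluster centre.
test_labels = kmeans.predict(test_data)
```

Each entry of `test_labels` is the index of the closest of the 88 centres, so no supervised classifier is needed for the assignment step.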
This sounds like a classic clustering problem. Neither SVMs nor neural networks are going to be able to solve this problem directly. You can use either approach for dimensionality reduction, for example to embed your 10-dimensional data in two-dimensional space, but they will not put the data into clusters for you.
There are a huge number of clustering algorithms besides k-means. If you wanted a contrasting approach, you might want to try an agglomerative clustering algorithm. I don't know what kind of computing environment you are using, but I quite like R and this (very) short guide on clustering.
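For readers working in Python rather than R, here is a minimal sketch of agglomerative clustering with scikit-learn. The data are random stand-ins with the shapes from the question. Agglomerative clustering has no built-in way to label unseen points, so the nearest-centroid assignment of the test data is my own assumption about how to handle the second step, not part of the algorithm.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
feature_set = rng.normal(size=(1057, 10))  # stand-in for featureSet
test_data = rng.normal(size=(200, 10))     # stand-in for testData
n_clusters = 88

# Repeatedly merge the two closest groups (Ward linkage) until 88 remain.
agglo = AgglomerativeClustering(n_clusters=n_clusters, linkage="ward")
train_labels = agglo.fit_predict(feature_set)

# Summarize each cluster by the mean of its members.
centroids = np.vstack([
    feature_set[train_labels == k].mean(axis=0) for k in range(n_clusters)
])

# Assumed assignment rule: give each test row the label of its nearest centroid.
distances = np.linalg.norm(test_data[:, None, :] - centroids[None, :, :], axis=2)
test_labels = distances.argmin(axis=1)
```

Other linkage choices ("average", "complete", "single") change which groups count as closest and can give quite different clusters on the same data.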

Resources