I am working with a very large time series dataset. I have clustered the dataset into 12 clusters. Cluster 0 comprises 39,301 instances of the 56-attribute dataset. I have a summary report from WEKA which shows the mean and standard deviation of each of the 56 attributes for the cluster. So how can I generate or visualize this cluster, a 39,301 × 56 matrix, using WEKA or Python?
Thank you.
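In WEKA itself, the Explorer's Visualize tab only gives you pairwise 2-D scatter plots. One way in Python is to export Cluster 0 to CSV, project the 56 attributes down to 2 dimensions with PCA, and scatter-plot the 39,301 instances. This is a minimal sketch, assuming all 56 attributes are numeric; the file name cluster0.csv is a placeholder.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Load the cluster exported from WEKA; "cluster0.csv" is a placeholder.
df = pd.read_csv("cluster0.csv")            # 39,301 rows x 56 columns
X = StandardScaler().fit_transform(df)      # put attributes on one scale

# Project the 56 attributes onto the first 2 principal components.
pca = PCA(n_components=2)
X2 = pca.fit_transform(X)

plt.scatter(X2[:, 0], X2[:, 1], s=2, alpha=0.3)
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.title("Cluster 0, 56 attributes projected to 2-D")
plt.show()
```

With ~39k points, small markers and a low alpha keep the plot readable; a 2-D projection inevitably loses structure, but it is usually enough to spot sub-groups or outliers within the cluster.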
I am currently working on this weather dataset.
The dataset contains 5 columns and 1632 rows. I have tried out SVM models that output a single Y value (Temperature, Pressure, or Humidity) and have trained 3 different models, one for each weather parameter.
But is there any way I can obtain values for Temperature, Humidity, and Pressure as outputs (with Place and Month as inputs) using one single prediction model? Is it possible to obtain these values as a single output vector consisting of 3 values?
I've just started with Machine Learning so I'm not really familiar with the advanced concepts. I tried searching for possible solutions online but couldn't find much. Any help and suggestions are welcome!
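Yes: scikit-learn's MultiOutputRegressor wraps a single-output regressor such as SVR, fitting one copy per target, so you call fit and predict once and get a 3-value output vector. A minimal sketch, assuming a CSV with the column names from the question (the file name weather.csv is a placeholder):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.multioutput import MultiOutputRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.svm import SVR

df = pd.read_csv("weather.csv")                   # placeholder file name
X = df[["Place", "Month"]]
y = df[["Temperature", "Humidity", "Pressure"]]   # 3-value output vector

# One-hot encode the categorical Place column; pass Month through.
pre = ColumnTransformer(
    [("place", OneHotEncoder(handle_unknown="ignore"), ["Place"])],
    remainder="passthrough",
)

# MultiOutputRegressor fits one SVR per target but exposes them as a
# single estimator that predicts all 3 values at once.
model = Pipeline([("pre", pre),
                  ("svr", MultiOutputRegressor(SVR()))])
model.fit(X, y)
print(model.predict(X.head(1)))   # -> [[temperature, humidity, pressure]]
```

Under the hood this is still 3 SVRs, just wrapped so you train and predict through a single estimator; if you want a model that genuinely shares structure across the 3 targets, tree-based regressors like RandomForestRegressor support multi-output natively.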
I'm creating a character recognition software in Python using scikit-learn. I have a large dataset of images labelled [A-Za-z]. I'm using linear SVM. Training the model using all the samples with 52 different labels is very, very slow.
Suppose I divide my training dataset into 13 sections such that each section has images of only 4 characters, no image is part of more than 1 section, and I then train 13 different models.
How can I combine those models to create a more accurate model? Or, if I classify the test set on all 13 models and compare each sample's results on the basis of confidence score (selecting the one with the highest score), will it affect the accuracy of the overall model?
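On the confidence-score idea: you can run each test sample through all 13 models and keep the prediction with the largest decision margin, but note that margins from independently trained SVMs are not on a common scale, so calibrating them to probabilities first (e.g. with sklearn's CalibratedClassifierCV) usually compares more fairly. A hedged sketch of the raw-margin version, where models is assumed to be a list of 13 fitted 4-class LinearSVC estimators:

```python
import numpy as np

def predict_by_confidence(models, x):
    """Pick the label whose model reports the largest decision margin.

    models: hypothetical list of 13 fitted 4-class sklearn SVMs,
    one per section from the question; x: one sample, 1-D array.
    """
    best_label, best_score = None, -np.inf
    for m in models:
        scores = m.decision_function(x.reshape(1, -1))[0]  # one score per class
        i = int(np.argmax(scores))
        if scores[i] > best_score:
            best_score, best_label = scores[i], m.classes_[i]
    return best_label
```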
It seems what you need to do is some kind of order reduction (dimensionality reduction) of the data.
After the order reduction, classify the data into 13 large groups and then apply a final classifier within each group.
I would look into Linear Discriminant Analysis for the first step I mentioned.
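A minimal sketch of that first step with scikit-learn, using synthetic data as a stand-in for the 52-class character images (LDA can project to at most n_classes − 1 dimensions, i.e. 51 here):

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Synthetic stand-in for the 52-class character data (100 per class).
X, y = make_classification(n_samples=5200, n_features=64,
                           n_informative=20, n_classes=52,
                           n_clusters_per_class=1, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

# LDA as supervised dimensionality reduction, then a linear SVM.
clf = make_pipeline(LinearDiscriminantAnalysis(n_components=50),
                    LinearSVC())
clf.fit(Xtr, ytr)
print(clf.score(Xte, yte))
```

Unlike PCA, LDA's projection is supervised (it maximizes between-class separation), which is why it tends to preserve the class structure you need for the final classification step.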
I am working on seizure prediction research, and I am using Weka to train on my data vectors.
If I have 10 seizures, and each seizure is represented by 5 vectors, that makes a total of 50 vectors corresponding to 10 seizures. However, Weka treats these vectors as totally independent, even though each group of 5 vectors corresponds to a single seizure.
So how can I make Weka take this into account when performing the learning?
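Weka's standard cross-validation shuffles instances independently, so one hedged workaround is to do the evaluation in Python instead with grouped cross-validation: tag every vector with its seizure id and use scikit-learn's GroupKFold, which keeps all 5 vectors of one seizure in the same fold. A minimal sketch with synthetic stand-in data:

```python
import numpy as np
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the setup in the question:
# 10 seizures x 5 vectors = 50 vectors of (say) 20 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))
y = rng.integers(0, 2, size=50)          # placeholder binary labels
groups = np.repeat(np.arange(10), 5)     # seizure id for each vector

# GroupKFold never splits a seizure's 5 vectors between train and test,
# so a model is never evaluated on a seizure it has already seen.
scores = cross_val_score(SVC(), X, y, groups=groups,
                         cv=GroupKFold(n_splits=5))
print(scores)
```

The rough equivalent inside Weka is to build the train/test split by seizure yourself, rather than letting the built-in cross-validation split the 50 vectors at random.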
I am using Weka for traffic classification. I have an .arff dataset with multiple columns and rows, where each row is an instance and each column is a feature. Is there any software that can visualize my dataset across more than two features?
I have noticed that Weka can visualize two features at a time; however, I need to visualize up to 8 features.
Thanks in advance.
You can check out so-called parallel coordinates, which can visualize any number of features. There are many existing implementations, some of which are available from Prof. Inselberg's page.
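In Python, pandas ships a parallel-coordinates plot that works directly on an .arff file loaded through SciPy. A minimal sketch; the file name traffic.arff and the column names f1..f8 / class are placeholders for your dataset:

```python
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import parallel_coordinates
from scipy.io import arff

# Load the .arff file; the file and column names are placeholders.
data, meta = arff.loadarff("traffic.arff")
df = pd.DataFrame(data)
df["class"] = df["class"].str.decode("utf-8")  # nominal values load as bytes

# One vertical axis per feature, one polyline per instance,
# colored by class label.
parallel_coordinates(df, class_column="class",
                     cols=["f1", "f2", "f3", "f4",
                           "f5", "f6", "f7", "f8"],
                     alpha=0.3)
plt.show()
```

If the 8 features are on very different scales, normalize each column first, or the feature with the largest range will dominate the plot.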
My problem's initial features are x, y, and theta, normalized to the range [0, 255].
The number of features varies for each object.
Clustering is applied, so each cluster has a number of features and each object belongs to multiple clusters.
In the prediction stage, clusters are computed for each object from the initial features (the new features).
Each object belongs to a maximum of 10 clusters.
Total number of clusters is 4000.
If we consider the new features fixed for each object, we have 4000 dimensions, which is very large for classification. Only about 10 features may be useful, and my features are sparse.
My question: is there any way to classify these sparse features with good performance, and which classifier is suited to them?
Note: I use locality-sensitive hashing to classify the new 4000-dimensional features, which is very slow.
I used principal component analysis to reduce the feature dimensionality to 10, then used an SVM to classify the new features, which solved my problem.
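A minimal sketch of that pipeline in scikit-learn, with synthetic stand-ins for the sparse 4000-dimensional vectors and their labels:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Synthetic stand-in: 1000 objects, 4000 mostly-zero features, each
# object active in at most 10 clusters, plus placeholder labels.
rng = np.random.default_rng(0)
X = np.zeros((1000, 4000))
for row in X:
    row[rng.choice(4000, size=10, replace=False)] = rng.random(10)
y = rng.integers(0, 2, size=1000)

Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

# PCA down to 10 dimensions, then an SVM, as described above.
clf = make_pipeline(PCA(n_components=10), SVC())
clf.fit(Xtr, ytr)
print(clf.score(Xte, yte))
```

If the features are stored in a SciPy sparse matrix, TruncatedSVD is the usual substitute for PCA at this step, since PCA requires dense input for mean-centering.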