How can I get the error in a two-way ANOVA analysis in SigmaPlot? - sigmaplot

I'm trying to find the error in a two-way ANOVA analysis in SigmaPlot. However, my analysis only gives me the residual value. How can I get the error?
Thank you all!

Related

Machine learning: handling features which are supposed to have missing data

I am currently working on a project for my MSc and I have run into an issue with my dataset. I have no previous experience in machine learning; this is my first exposure.
While doing my EDA (Exploratory Data Analysis) I found a categorical feature with missing data, Province_State. This column has 52360 missing values, which is 5.40% of the column. I guess that is not too bad, and according to what I have learnt, I should either impute these missing values or delete the column if I have a good reason to.
My reasoning is that not every country has provinces, so it is normal for some values to be missing. I don't see the point of imputing these missing values with a random value: that is not logical, and it would also make the model less accurate, because we cannot come up with a value that does not exist in practice for that particular country.
I think I should do one of the following:
Impute all the missing values to a constant value such as -1 or "NotApplicable"
Remove the feature from the dataset
Please help me with a solution and thank you very much in advance.
(This dataset can be accessed from this link)
There are many ways to handle missing data. Deleting the whole column is not a good idea in most cases, as you will be discarding information. However, if you still want to delete the feature, perform a univariate analysis on it first, see whether it is useful, and decide accordingly.
Instead of removing the feature you can use any of the following ways:
Impute missing values with the mean/median (or the mode, for a categorical feature such as this one).
Predict missing values.
Impute all the missing values to -1.
Use algorithms that support missing values.
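The constant-value option can be sketched in a few lines. This is only a minimal illustration, not a recommendation for this particular dataset: the sample rows and the Country_Region column are made up, and the placeholder "NotApplicable" is the one suggested in the question.

```python
from collections import Counter


def fill_missing(rows, column, placeholder="NotApplicable"):
    """Return copies of `rows` where a missing `column` value becomes `placeholder`.

    A value counts as missing when it is None or an empty string.
    """
    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the original rows stay untouched
        value = new_row.get(column)
        if value is None or value == "":
            new_row[column] = placeholder
        filled.append(new_row)
    return filled


# Hypothetical sample data; only Province_State comes from the question.
rows = [
    {"Country_Region": "Canada", "Province_State": "Ontario"},
    {"Country_Region": "France", "Province_State": None},
    {"Country_Region": "Sri Lanka", "Province_State": ""},
]

filled = fill_missing(rows, "Province_State")
counts = Counter(row["Province_State"] for row in filled)
print(counts)
```

With pandas, the same idea is usually written as df["Province_State"].fillna("NotApplicable"). Keeping the placeholder as its own category lets a model treat "this country has no provinces" as information instead of noise.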

How can I identify the target variable in a dataset for prediction with machine learning

I'm working on a project that uses a decision tree to predict attacks from logs.
The problem is that, after normalizing the log files, I don't know how to identify the output class so that I can compare the results obtained from the decision tree with the real results.
To tell the truth, I don't know how to identify the real class.
Do I need to correlate the logs in order to identify the class?
Thanks for your help.
Your question is not clear. It would be great if you could share the logs or the result dataset that you are aiming for.
However, you can check whether your problem is classification or regression. The main difference between them is that the output variable in regression is numerical (or continuous), while in classification it is categorical (or discrete).
So look for the column that fits the description above (classification or regression).
Thanks @Running Rabbit. As I said before, I normalized a set of logs (Snort log, Apache access log, Apache error log) with the IDMEF protocol, like this:
[image: normalized IDMEF output]
And here is a line from the original Snort log file, for example:
bastion snort: [1:2001669:1] BLEEDING-EDGE Web Proxy GET Request [Classification: Potentially Bad Traffic] [Priority: 2]: {TCP} 220.170.88.36:3047 -> 11.11.79.82:80
The aim is to get the attack class, that is, to identify whether an entry is an attack or not {yes, no}. I don't know how to find the real target class before using the DTA for prediction.
Thanks
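One possible starting point, assuming the Snort alert's own priority is accepted as a proxy for the target class, is to parse each alert line and derive a {yes, no} label from it. The regular expression below is written against the single sample line above, and the rule "priority 2 or lower means attack" is a made-up threshold for illustration. A real ground truth would still need traffic that has been labeled independently of Snort.

```python
import re

# Pattern written against the sample alert line in the question; other Snort
# output formats may need a different expression.
SNORT_PATTERN = re.compile(
    r"\[(?P<gid>\d+):(?P<sid>\d+):(?P<rev>\d+)\]\s+(?P<message>.*?)\s+"
    r"\[Classification:\s*(?P<classification>[^\]]+)\]\s+"
    r"\[Priority:\s*(?P<priority>\d+)\]"
)


def label_alert(line, max_attack_priority=2):
    """Parse one Snort alert line and attach a hypothetical attack label.

    `max_attack_priority` is an assumed cut-off, not something Snort defines:
    alerts with a priority number at or below it are labeled "yes".
    Returns None when the line does not look like an alert.
    """
    match = SNORT_PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["priority"] = int(fields["priority"])
    fields["attack"] = "yes" if fields["priority"] <= max_attack_priority else "no"
    return fields


sample = (
    "bastion snort: [1:2001669:1] BLEEDING-EDGE Web Proxy GET Request "
    "[Classification: Potentially Bad Traffic] [Priority: 2]: "
    "{TCP} 220.170.88.36:3047 -> 11.11.79.82:80"
)
result = label_alert(sample)
print(result["classification"], result["priority"], result["attack"])
```

The resulting "attack" field can then serve as the class column when training the decision tree, with the Classification and Priority fields removed from the inputs so the tree does not simply relearn the labeling rule.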

How to predict a result with no labels but a specific loss function

I've run into a problem recently. The target consists of several columns (3 columns, each containing only 1 or 0).
The target result feeds a penalty function, which could be used as the loss function for my model.
I've looked into MLP/FFNN/SVM for this kind of problem, which seems to be unsupervised learning, but I'm still a little confused about how to apply these algorithms to it,
because most examples of these algorithms seem to have labels for training.
So how could I tackle this problem? Any suggestions, please?

Retrieval Based Q/A bot

I am trying to train a retrieval-based Q/A chatbot using an RNN (classification). I trained for about 1000 steps but got hardly any meaningful results (ACC < 10). Basically, I was trying to apply the TensorFlow DBPedia example to my dataset (so I turned my Q/A problem into a classification one). DBPedia is a clean and grammatically correct dataset. However, my dataset is full of short forms, grammatical errors, and spelling mistakes. I have tried to correct many of them using (right/wrong) word pairs and stemming.
I have read that sequence-to-sequence models work best for such problems. However, I had not expected the RNN to fail so miserably.
Any ideas why it did?
[EDIT]: Even a char-level CNN gives similar results.

What is Weka's InfoGainAttributeEval formula for evaluating Entropy with continuous values?

I'm using Weka's attribute selection function for Information Gain, and I'm trying to figure out the specific formula Weka uses when dealing with continuous data.
I understand that the usual formula for entropy applies when the values in the data are discrete. I understand that when dealing with continuous data one can either use differential entropy or discretize the values. I've tried looking at Weka's explanation of InfoGainAttributeEval and have looked through many other references, but can't find anything.
Maybe it's just me, but would anyone know how Weka implements this case?
Thanks!
I asked the author Mark Hall and he said:
It uses the supervised MDL-based discretization method of Fayyad and
Irani. See the javadocs:
http://weka.sourceforge.net/doc.stable-3-8/weka/attributeSelection/InfoGainAttributeEval.html
Also you can see this link for the discretization method:
http://weka.sourceforge.net/doc.stable-3-8/weka/filters/supervised/attribute/Discretize.html
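To make the idea concrete, here is a minimal sketch of one step of entropy-based discretization with an MDL acceptance test, written from my understanding of the Fayyad and Irani criterion. It is not Weka's code: the function names and toy data are made up, and the full method applies this step recursively to each resulting interval, which the sketch leaves out.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def best_cut_point(values, labels):
    """Return (gain, cut) for the binary split with the highest information gain.

    Candidate cuts are midpoints between consecutive distinct sorted values.
    Returns None when all values are identical.
    """
    pairs = sorted(zip(values, labels), key=lambda pair: pair[0])
    n = len(pairs)
    base = entropy([label for _, label in pairs])
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        gain = base - weighted
        if best is None or gain > best[0]:
            best = (gain, (pairs[i - 1][0] + pairs[i][0]) / 2)
    return best


def mdl_accepts(values, labels, cut):
    """MDL stopping test: accept the cut only if its gain exceeds the MDL cost."""
    left = [label for value, label in zip(values, labels) if value <= cut]
    right = [label for value, label in zip(values, labels) if value > cut]
    n = len(labels)
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    ent, ent1, ent2 = entropy(labels), entropy(left), entropy(right)
    gain = ent - (len(left) / n) * ent1 - (len(right) / n) * ent2
    delta = log2(3 ** k - 2) - (k * ent - k1 * ent1 - k2 * ent2)
    return gain > (log2(n - 1) + delta) / n


# Toy data: the class changes cleanly between 4.0 and 10.0.
values = [1.0, 2.0, 3.0, 4.0, 10.0, 11.0, 12.0, 13.0]
labels = ["no", "no", "no", "no", "yes", "yes", "yes", "yes"]

gain, cut = best_cut_point(values, labels)
print(gain, cut, mdl_accepts(values, labels, cut))
```

Once the continuous attribute has been turned into intervals this way, the information gain is the ordinary discrete one: the class entropy minus the weighted entropy within each interval.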
