Logistic regression Maximum Likelihood in RapidMiner [closed] - machine-learning

I wanted to get the maximum likelihood in logistic regression with this result (I'm really not sure if this is what it should look like):
I am currently applying logistic regression to the National Achievement Test, a performance exam for students (NAT-GRADE-REMARKS, the y-axis), against the students' scholastic grade (in the example below, ARTS-G12 (Grade 12)-Q1 (Quarter 1), the x-axis).
I wanted to know the maximum likelihood of students passing the National Achievement Test, or of getting VLM or LM. In my example in the image above, the National Achievement Test category VLM (Very Low Mastery) is set to 1 and LM (Low Mastery) is set to 0. VLM and LM are the only categories a student can get.
I wanted to find the maximum likelihood in this graph in order to fit an S-shaped sigmoid curve. I just don't know how to interpret the scatterplot below. Do I need to zoom out? I really can't interpret it.

Regarding the interpretation of the scatterplot:
The obvious (for clarity): green is VLM, blue is LM.
Each dot means there are X students with the given grade on the horizontal axis.
The coloring most probably encodes X. A guess: the darker the dot, the more students with the same grade?
From this graph, it seems a student's grade is not related to the VLM or LM category, because there are low and high marks in both categories.
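If it helps, here is a minimal sketch of how the sigmoid gets fitted by maximum likelihood in code. The grade and remark values below are made up, since the original data isn't shown; scikit-learn's LogisticRegression maximizes the log-likelihood of the Bernoulli model for you.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data standing in for the original spreadsheet:
# x = scholastic grade (ARTS-G12-Q1), y = NAT remark (1 = VLM, 0 = LM).
grades = np.array([75, 78, 80, 82, 85, 88, 90, 92]).reshape(-1, 1)
remarks = np.array([1, 1, 0, 1, 0, 0, 1, 0])

# Fitting maximizes the (penalized) log-likelihood of
# P(y=1|x) = 1 / (1 + e^-(b0 + b1*x)).
model = LogisticRegression().fit(grades, remarks)

# The fitted sigmoid: probability of VLM over a range of grades.
xs = np.linspace(70, 95, 50).reshape(-1, 1)
p_vlm = model.predict_proba(xs)[:, 1]
print(model.intercept_, model.coef_, p_vlm[:5])
```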

Related

Batch normalization [closed]

Why does batch normalization work on different samples of the same feature instead of different features of the same sample? Shouldn't it be normalization over the different features? In the diagram, why do we use the first row and not the first column?
Could someone help me?
Because different features of the same object mean different things, and it's not logical to calculate statistics over these values. They can have different ranges, means, standard deviations, etc. For example, one of your features could be a person's age and another the person's height; if you calculate the mean of these two values, you will not get any meaningful number.
In classic machine learning (especially with linear models and KNN) you should normalize your features, i.e. calculate the mean and standard deviation of a specific feature over the entire dataset and transform the feature to (X - mean(X)) / std(X). Batch normalization is the analogue of this applied to stochastic optimization methods like SGD (it's not meaningful to use global statistics on a mini-batch; furthermore, you want to use batch norm more often than just before the first layer). More fundamental ideas can be found in the original paper.
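A minimal NumPy sketch of the row-vs-column point (the feature values are made up for illustration):

```python
import numpy as np

# Hypothetical mini-batch: 4 samples (rows) x 3 features (columns),
# e.g. the columns could be age, height, income -- incomparable units.
batch = np.array([[25.0, 170.0, 30000.0],
                  [40.0, 160.0, 55000.0],
                  [31.0, 182.0, 42000.0],
                  [52.0, 175.0, 61000.0]])

# Batch norm: statistics per FEATURE, computed ACROSS the batch (axis=0).
mean = batch.mean(axis=0)           # one mean per column
std = batch.std(axis=0) + 1e-5      # epsilon for numerical stability
normalized = (batch - mean) / std   # each column: ~zero mean, unit variance

# Normalizing along axis=1 (across the features of one sample) would mix
# age, height and income into one statistic, which is meaningless.
print(normalized.mean(axis=0))  # ~[0, 0, 0]
```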

difference between classification and regression in k-nearest neighbor? [closed]

What is the difference between using K-nearest neighbors for classification and using it for regression?
And when KNN is used in a recommendation system, is that considered classification or regression?
In classification tasks, the user seeks to predict a category, which is usually represented as an integer label, but represents a category of "things". For instance, you could try to classify pictures between "cat" and "dog" and use label 0 for "cat" and 1 for "dog".
The KNN algorithm for classification will look at the k nearest neighbours of the input you are trying to make a prediction on. It will then output the most frequent label among those k examples.
In regression tasks, the user wants to output a numerical value (usually continuous), for instance to estimate the price of a house, or to evaluate how good a movie is.
In this case, the KNN algorithm would collect the values associated with the k closest examples to the one you want to make a prediction on and aggregate them to output a single value. Usually you would choose the average of the k neighbours' values, but you could choose the median or a weighted average (or actually anything that makes sense to you for the task at hand).
For your specific problem you could use both, but regression makes more sense to me, in order to predict some kind of "matching percentage" between the user and the thing you want to recommend to them.
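To make the contrast concrete, a small scikit-learn sketch (the toy data is made up):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# Hypothetical toy data: one feature, five training examples.
X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0]])
y_class = np.array([0, 0, 0, 1, 1])           # category labels (e.g. cat/dog)
y_reg = np.array([1.5, 1.8, 2.1, 9.5, 10.2])  # continuous targets (e.g. price)

# Classification: majority vote among the k nearest neighbours.
clf = KNeighborsClassifier(n_neighbors=3).fit(X, y_class)
print(clf.predict([[2.5]]))   # -> [0], the most frequent label nearby

# Regression: average of the k nearest neighbours' values.
reg = KNeighborsRegressor(n_neighbors=3).fit(X, y_reg)
print(reg.predict([[2.5]]))   # -> mean of 1.5, 1.8 and 2.1
```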

Davies-Bouldin Index higher or lower score better [closed]

I trained a KMEANS clustering model using Google BigQuery, and it gives me these metrics in the evaluation tab of my model. My question is: are we trying to maximize or minimize the Davies-Bouldin index and the mean squared distance?
The Davies-Bouldin index is a validation metric that is often used to evaluate the optimal number of clusters. It is defined as a ratio between within-cluster scatter and between-cluster separation, so a lower value means the clustering is better.
Regarding the second metric, the mean squared distance refers to the intra-cluster variance, which we also want to minimize: a lower WCSS (within-cluster sum of squares) means tighter clusters that are better separated from each other.
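For reference, scikit-learn exposes the same metric, which makes the "lower is better" behaviour easy to check (the blob data here is synthetic):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score

# Synthetic data with 3 well-separated blobs.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Lower Davies-Bouldin index = better clustering, so compare values
# of k and prefer the one with the smallest score.
for k in (2, 3, 4, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, davies_bouldin_score(X, labels))
```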

How to round a prediction when it should be a (non-categorical) integer? [closed]

Say I am trying to predict a variable y which is a score from 0 to 10 (integer numbers only), and I am using a linear regression model. The model actually produces real numbers in that interval.
I am using regression, and not classification, because I want to be able to say that missing the correct prediction by (say) 2 is worse than missing it by 1. Currently I am using the mean absolute error as the evaluation metric.
Given that the prediction from the model is a real number, what is the best way to constrain it to the allowed set of integers (from 0 to 10)? Should I just round the prediction to the nearest integer, or is there a better way?
You could also use a multinomial logistic regression model and measure performance with classification accuracy.
Have the output range from 0 to 11 and round to the nearest .5 value. This gives you evenly spaced, equally sized bins, one per integer score. If you can, weight the regression by how close the output was to the .5 mark, as the results should ideally not be close enough to a boundary to cause ambiguity.
Alternatively, have a range from -0.5 to 10.5 and use the integers as the targets. It makes no practical difference, but it is compatible with your existing model.
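If you go with plain rounding, a minimal sketch (the helper name is just illustrative):

```python
import numpy as np

def to_score(pred):
    """Map a real-valued regression output to an integer score in [0, 10].

    Rounds to the nearest integer, then clips so out-of-range
    predictions (e.g. -0.3 or 10.7) still land on a valid score.
    """
    return int(np.clip(np.rint(pred), 0, 10))

print(to_score(7.4))   # -> 7
print(to_score(10.9))  # -> 10
print(to_score(-0.2))  # -> 0
```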

Data Mining - K nearest neighbor [closed]

This is my homework. I'm not asking you to do my homework here, I need a hint to keep going.
I know what the K-nearest neighbor algorithm is, but I have always seen it on graphs, not like this. Can you tell me what I should do? I've been trying to figure out how to start, but I couldn't. I would appreciate a small hint.
This assignment helps you understand the steps in KNN.
KNN is based on distances. Find the K nearest neighbors and then maybe vote for a classification problem.
Your training data can be considered as (x1, x2, y): age and profit are the features (x1, x2), while BUY or NOT BUY is the label/output y.
To apply KNN you need to calculate distances, which are based on the features. Since the two features have different units (years, USD), you should convert them into unit-free features, which is called normalization (part 4.1 in your handout). After that, a feature vector will look like (-0.4, -0.8). The numbers should be between -1 and 0 if the suggested formula in part 4.1 is used.
Then use the normalized features to calculate the distances (Euclidean in the handout) between every training data point and the company you are interested in (normalized as well). This is required in 4.2.
The last step is to pick the K nearest neighbors and decide BUY or NOT BUY from the outputs of those neighbors (a simple vote, maybe?).
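Since you only asked for a hint, here is a generic sketch of those three steps rather than a solution to your exact handout. The data values are made up, and min-max scaling stands in for the handout's formula from part 4.1, which isn't shown here:

```python
import numpy as np

# Hypothetical training data: (age, profit in USD) -> BUY (1) / NOT BUY (0).
X = np.array([[3.0, 120000.0],
              [10.0, 500000.0],
              [2.0, 90000.0],
              [8.0, 400000.0]])
y = np.array([0, 1, 0, 1])
query = np.array([5.0, 250000.0])  # the company of interest

# Step 1: normalize each feature to a unit-free scale (min-max here;
# substitute the formula your handout gives in part 4.1).
lo, hi = X.min(axis=0), X.max(axis=0)
Xn = (X - lo) / (hi - lo)
qn = (query - lo) / (hi - lo)

# Step 2: Euclidean distance from the query to every training point.
dists = np.sqrt(((Xn - qn) ** 2).sum(axis=1))

# Step 3: take the K nearest neighbors and vote.
k = 3
nearest = np.argsort(dists)[:k]
prediction = int(y[nearest].sum() > k / 2)  # majority vote: 1 = BUY
print(nearest, prediction)
```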
