KNN Decision Boundary - machine-learning

I have two classes:
x={-3,-2,1} //represented by *
y={0,5,6,7} //represented by x
If k=3, how do you determine the decision boundary?
         *   *       x   *               x   x   x
 |   |   |   |   |   |   |   |   |   |   |   |   |
-5  -4  -3  -2  -1   0   1   2   3   4   5   6   7
Supposedly the correct answer is 1.5, between 1 and 2. How does that work?

The KNN algorithm classifies new observations by looking at the K nearest neighbors, looking at their labels, and assigning the majority (most popular) label to the new observation.
For KNN with K=3, anything < 1.5 will be classified as * and anything > 1.5 will be classified as x.
You can see this by trying out a few examples. Suppose you need to classify a value of 1. The three nearest neighbors are the * at 1, the x at 0, and the * at -2. Since there are two *'s and one x, 1 will be classified as *.
Now suppose you want to classify 2. Here, the three nearest neighbors are the x at 0, the * at 1, and the x at 5. So 2 would get classified as x.
The KNN process implicitly defines a decision boundary. The best way to determine it that I'm aware of is to try a bunch of examples and look for the transition boundary where observation classifications change from one class to another class. In your example this would look like this:
-5 -> *
-4 -> *
-3 -> *
-2 -> *
-1 -> *
0 -> *
1 -> *
2 -> x
3 -> x
4 -> x
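You can see this in your example: the classification flips somewhere between 1 and 2. The exact switch is at 1.5, because that is where the third-nearest neighbor changes. At 1.5 the * at -2 and the x at 5 are both 3.5 away, so just to the left the three neighbors are 0, 1 and -2 (two *'s), and just to the right they are 0, 1 and 5 (two x's). Hence the 1.5 answer.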
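The "try a bunch of examples" procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names knn_predict and find_boundary are made up here, distance ties are broken by list order, and the bisection assumes the two starting points fall in different classes with a single transition between them.

```python
from collections import Counter

# Training points from the question: '*' at -3, -2, 1 and 'x' at 0, 5, 6, 7.
train = [(-3, '*'), (-2, '*'), (1, '*'), (0, 'x'), (5, 'x'), (6, 'x'), (7, 'x')]

def knn_predict(query, k=3):
    """Return the majority label among the k training points closest to query."""
    # sorted() is stable, so equal distances keep their order in `train`.
    nearest = sorted(train, key=lambda p: abs(p[0] - query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def find_boundary(lo, hi, steps=40):
    """Bisect between lo and hi, which must be classified differently."""
    lo_label = knn_predict(lo)
    for _ in range(steps):
        mid = (lo + hi) / 2
        if knn_predict(mid) == lo_label:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Reproduce the table above for the integers -5..4.
for q in range(-5, 5):
    print(q, '->', knn_predict(q))

# Narrow down the transition between 1 ('*') and 2 ('x').
print(round(find_boundary(1, 2), 6))
```

Bisection is only practical because the data is one-dimensional; in higher dimensions people usually classify a grid of points and plot the result.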

Related

opencv mat 2d array, swap index and value

I have a 2d Mat which has x,y as indexes and data is a z value.
I would like to create a mat with x,z index
and give y values as the data.
Yes, I understand I would have to limit the range and datatype of z.
I can do it element by element. Is there something faster?
sample
array in:
  Y |
  3 | 3 1 3     (X,Y) = Z
  2 | 1 3 4
  1 | 2 2 1
    --------
      1 2 3   X
array out:
  Z |
  4 | 0 0 2     (X,Z) = Y
  3 | 3 2 3
  2 | 1 1 0
  1 | 2 3 1
    --------
      1 2 3   X
The resulting matrix will actually be very sparse.
The original will be roughly 100x20 (x100,y20).
The result will be probably 100x1000 with the 20 y values
in each x column. So that is pretty sparse!
I don't know if that matters in the selection of available tools.

How to manipulate multiple nested arrays in Dyalog APL?

I have been given matrices filled with alphanumerical values excluding lower case letters like so:
XX11X1X
XX88X8X
Y000YYY
ZZZZ789
ABABABC
and have been tasked with counting the repetitions in each row and then tallying up a score depending on the ranking of the character being repeated. I used {⍺ (≢⍵)}⌸¨ ↓ m to help me. For the example above I would get something like this:
X 4   X 4   Y 4   Z 4   A 3
1 3   8 3   0 3   7 1   B 3
                  8 1   C 1
                  9 1
This is great, but now I need to write a function that can multiply the numbers with each letter. I can access the first matrix with ⊃ but then I am completely lost on how to access the other ones. I can simply write ⊃w[2] and ⊃w[3] and so forth but I need a way to change every matrix at the same time in one function. For this example, the array of the ranking is as follows: ZYXWVUTSRQPONMLKJIHGFEDCBA9876543210 so for the first array XX11X1X
which corresponds to:
X 4
1 3
So the X is 3rd in the array so it corresponds to a 3 and 1 is 35th so it's a 35. The final scoring would be something like (3×10⁴)+(35×10³). My biggest problem is not necessarily the scoring part but being able to access each matrix individually in one function. So for this nested array:
X 4   X 4   Y 4   Z 4   A 3
1 3   8 3   0 3   7 1   B 3
                  8 1   C 1
                  9 1
if I do arr[1] it gives me the scalar
X 4
1 3
and ⍴ arr[1] gives me nothing, confirming it, so I can do ⊃arr[1] to get the matrix itself and have access to each column individually. This is where I'm stuck. I'm trying to write a function to do the math for each matrix and then save those results to an array. I can easily do the math for the first matrix but I can't do it for all of them. I might have made a mistake by using {⍺ (≢⍵)}⌸¨ ↓ m to get those matrices. Thanks.
Using your example arrangement:
⎕ ← arranged ← ⌽ ⎕D , ⎕A
ZYXWVUTSRQPONMLKJIHGFEDCBA9876543210
So now, we can get the index values:
1 ⌷ m
XX11X1X
∪ 1 ⌷ m
X1
arranged ⍳ ∪ 1 ⌷ m
3 35
While you could compute the intermediary step first, it is much simpler to include most of the final formula in Key's operand:
{ ( arranged ⍳ ⍺ ) × 10 * ≢⍵ }⌸¨ ↓m
┌───────────┬───────────┬───────────┬─────────────────┬───────────────┐
│30000 35000│30000 28000│20000 36000│10000 290 280 270│26000 25000 240│
└───────────┴───────────┴───────────┴─────────────────┴───────────────┘
Now we just need to sum each:
+/¨ { ( arranged ⍳ ⍺ ) × 10 * ≢⍵ }⌸¨ ↓m
65000 58000 56000 10840 51240
In fact, we can combine the summation with the application of Key to avoid a double loop:
{ +/ { ( arranged ⍳ ⍺ ) × 10 * ≢⍵ }⌸ ⍵}¨ ↓m
65000 58000 56000 10840 51240
For completeness, here is a way to use the intermediary result. Let's start by working on just the first matrix (you can get the second one with 2⊃ instead of ⊃ ― for details, see Problems when trying to use arrays in APL. What have I missed?):
⊃{⍺ (≢⍵)}⌸¨ ↓m
X 4
1 3
We can insert a function between the left column elements and the right column elements with reduction:
{⍺ 'foo' ⍵}/ ⊃{⍺ (≢⍵)}⌸¨ ↓m
┌─────────┬─────────┐
│┌─┬───┬─┐│┌─┬───┬─┐│
││X│foo│4│││1│foo│3││
│└─┴───┴─┘│└─┴───┴─┘│
└─────────┴─────────┘
So now we simply have to modify the placeholder function with one that looks up the left argument in the arranged items, and multiplies by ten to the power of the right argument:
{ ( arranged ⍳ ⍺ ) × 10 * ⍵ }/ ⊃{⍺ (≢⍵)}⌸¨ ↓m
30000 35000
Instead of applying this to only the first matrix, we apply it to each matrix:
{ ( arranged ⍳ ⍺ ) × 10 * ⍵ }/¨ {⍺ (≢⍵)}⌸¨ ↓m
┌───────────┬───────────┬───────────┬─────────────────┬───────────────┐
│30000 35000│30000 28000│20000 36000│10000 290 280 270│26000 25000 240│
└───────────┴───────────┴───────────┴─────────────────┴───────────────┘
Now we just need to sum each:
+/¨ { ( arranged ⍳ ⍺ ) × 10 * ⍵ }/¨ {⍺ (≢⍵)}⌸¨ ↓m
65000 58000 56000 10840 51240
However, this is a much more circuitous approach, and is only provided here for reference.
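As a cross-check for readers who do not use APL, the same scoring rule (position in the arranged list, times ten to the power of the repetition count, summed per row) can be sketched in Python. This is only an illustrative translation and not part of the APL solution; the names ARRANGED and score_row are made up for this sketch.

```python
from collections import Counter
import string

# Ranking from the question: Z is 1st, ..., A is 26th, 9 is 27th, ..., 0 is 36th.
ARRANGED = (string.digits + string.ascii_uppercase)[::-1]

def score_row(row):
    """Sum rank * 10**count over the distinct characters of one row."""
    counts = Counter(row)  # plays the role of {⍺ (≢⍵)}⌸ on a single row
    # +1 because APL's ⍳ is 1-based here, while str.index is 0-based.
    return sum((ARRANGED.index(ch) + 1) * 10 ** n for ch, n in counts.items())

rows = ["XX11X1X", "XX88X8X", "Y000YYY", "ZZZZ789", "ABABABC"]
scores = [score_row(r) for r in rows]
print(scores)  # [65000, 58000, 56000, 10840, 51240]
```

The list comprehension over rows corresponds to the Each (¨) over ↓m: every row is scored independently, so there is no need to pick the matrices out one at a time.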

Artificial Neural Network Topology

I am currently revising for my final-year exams and came across this question. I have looked everywhere in my lecture slides for help and cannot find any. Any insight into how to solve this question would be appreciated (I am not just asking for the answer; I need to understand the topic). Furthermore, do I assume that all inputs are equal to 1? Do I include 7 inputs in the input layer? I'm at a loss as to how to answer.
The question is as follows:
b) Determine, with justification, the simplest type and topology (i.e. number of neurons & layers) of artificial neural network that could learn the data set below.
Click here for picture of the dataset.
If I'm not mistaken, you have two inputs, X1 and X2, and one target output. For each input, consisting of two numbers X1 and X2, the appropriate output ("target") is given.
As a first step, you could sketch the seven data points - just draw the 3 ones and 4 zeroes at the right places on the square (X1, X2) ∈ [0, 1.05] × [0, 1]. Maybe you remember something similar from the lecture, possibly near a mention of "XOR".
The edit queue is full, so adding data from the linked image here
Pattern  X1     X2     Target
1        0.01   -0.1   1
2        0.90   0.09   0
3        0.89   -0.05  0
4        1.05   0.95   1
5        -0.01  0.12   0
6        1.05   0.97   1
7        0.98   0.10   0
It looks like 1 possible solution is X1 >= 1.0 OR X2 <= -0.1
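A quick way to confirm that this rule fits all seven patterns is to evaluate it against the table. The snippet below is just a check of the rule as stated, with the data copied from the table above; the function name rule is made up for the example.

```python
# (X1, X2, Target) for patterns 1..7, copied from the table above.
patterns = [
    (0.01, -0.1, 1),
    (0.90, 0.09, 0),
    (0.89, -0.05, 0),
    (1.05, 0.95, 1),
    (-0.01, 0.12, 0),
    (1.05, 0.97, 1),
    (0.98, 0.10, 0),
]

def rule(x1, x2):
    """Candidate rule: output 1 when X1 >= 1.0 OR X2 <= -0.1."""
    return int(x1 >= 1.0 or x2 <= -0.1)

# Collect the pattern numbers where the rule disagrees with the target.
mismatches = [i for i, (x1, x2, target) in enumerate(patterns, start=1)
              if rule(x1, x2) != target]
print(mismatches)  # [] means the rule reproduces every target
```

Note that pattern 1 sits exactly on the X2 threshold, so the rule fits this data but leaves no margin there.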
Alternatively, if you round each of X1 and X2, it becomes
Pattern  X1  X2  Target
1        0   0   1
2        1   0   0
3        1   0   0
4        1   1   1
5        0   0   0
6        1   1   1
7        1   0   0
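Then it is almost XNOR, the complement of XOR: the target is 1 when the rounded inputs are equal (patterns 1, 4 and 6) and 0 when they differ (patterns 2, 3 and 7). Pattern 5 is the exception, because it rounds to the same inputs as pattern 1 but has the opposite target. For an XOR-type problem like this you can use 1 hidden layer of 2 neurons with a nonlinear activation (like ReLU or sigmoid; a purely linear activation cannot represent XOR) and 1 output layer of 1 neuron.
See this Stack Overflow post for details on how to solve XOR with a neural net.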

Multi-level and multi-argument Index in Google Sheets

I am writing a sheet where I am trying to create a multi level Index that searches through 5 different columns with 3 pieces of data. So for example:
x = 40
y = 5000
z = 20000
Column1 | Column2 | Column3 | Column4 | Column5 | Column6
13 | 29 | 0 | 0 | 0 | Yes
30 | 870 | 0 | 0 | 0 | No
10 | 870 | 0 | 30000 | 1 | Blue
10 | 870 | 30001 | 100000 | 1 | Yes
10 | 870 | 100001 | 300000 | 1 | Unknown
Here's a sample set of my data, what I need is to compare
the variable x to columns 1 and then 2 (x must fall between these values)
variable y to columns 3 and 4 (y must fall between these values)
and then finally z to column 5 (z must be above these values)
In each of these cases I need to know whether the variable is lower or higher than the column value. Finally, I need the matching data from column 6 to be returned as a result in my sheet. At the moment I have a simply IMMENSE list of nested IF statements which consider all of these criteria separately, but it doesn't lend itself very well to editing when changes need to be made to the values.
I've looked at every single page on the internet (every... single... page...) and can't seem to find the solution to my issue. Most solutions I have found are either using a single data point, using multiple data points against a single range or simply don't seem to work. The latest iteration I have tried is:
=INDEX('LTV Data'!$N$3:$N$10, MATCH($D$5 & $G$8 & $G$12, ARRAYFORMULA($D$5 <= 'LTV Data'!$H$3:$H$10 & $D$5 >= 'LTV Data'!$I$3:$I$10 & $G$12 <= 'LTV Data'!$J$3:$J$10 & $G$12 >= 'LTV Data'!$K$3:$K$10 & $G$8 <= 'LTV Data'!$L$3:$L$10), 0), 7)
But this only produces an error as the separate values I want to test against are concatenated and the Match can't find that string. I'm also unsure about the greater than and less than symbols as to how valid that syntax is. Is anyone able to shed some light on how I can achieve the result I need in a more elegant way than the mass of IFS, ANDS + ORs I have right now? Like I said, it works but it sure ain't pretty ;)
Thanks a bunch in advance!
ETA: So with the above variables the result I would like would be 'Blue'. This is because x falls between columns 1 and 2, y falls between columns 3 and 4 and z is higher than column 5 on the third row. This is all contained in the MATCH statement in the example code above. Please see the MATCH statement to see the comparisons I am trying to make.
You need to put the different criteria together using multiplication if you want to get the effect of an AND in an array:
=INDEX(F2:F10,MATCH(1,(A2:A10<x)*(B2:B10>x)*(C2:C10<y)*(D2:D10>y)*(E2:E10<z),0))
or
=INDEX(F2:F10,MATCH(1,(A2:A10<=x)*(B2:B10>=x)*(C2:C10<=y)*(D2:D10>=y)*(E2:E10<=z),0))
to include the equality (I have used named ranges for x, y and z).
Works in Google Sheets and (if entered as an array formula) Excel.
In Google Sheets you also have the option of using a filter
=filter(F2:F10,A2:A10<=x,B2:B10>=x,C2:C10<=y,D2:D10>=y,E2:E10<=z)
but then you aren't guaranteed to get just one row.
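To see why multiplying the comparisons behaves like AND, here is a small Python sketch of the same logic on the sample data from the question. It is an illustration of the idea rather than of how Sheets evaluates the formula internally; the variable names are made up for the example.

```python
# Sample rows from the question: Column1..Column6.
rows = [
    (13, 29, 0, 0, 0, "Yes"),
    (30, 870, 0, 0, 0, "No"),
    (10, 870, 0, 30000, 1, "Blue"),
    (10, 870, 30001, 100000, 1, "Yes"),
    (10, 870, 100001, 300000, 1, "Unknown"),
]
x, y, z = 40, 5000, 20000

# Each comparison is 0 or 1, so the product is 1 only if every test passes.
flags = [
    int(c1 <= x) * int(c2 >= x) * int(c3 <= y) * int(c4 >= y) * int(c5 <= z)
    for c1, c2, c3, c4, c5, _ in rows
]

# MATCH(1, flags, 0) finds the first 1; INDEX then picks that row's column 6.
first = flags.index(1)
result = rows[first][5]

# FILTER keeps every row whose flag is 1, so it may return several values.
filtered = [row[5] for row, flag in zip(rows, flags) if flag]

print(flags, result, filtered)
```

With the sample values only the third row passes every test, which is why both approaches give 'Blue' here.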

Relationship between m and n in filter of image processing

I have a filter/kernel like
        | 1 1 1 |
H = 1/m | 1 n 1 |
        | 1 1 1 |
I want to know what the relationship between m and n is in this filter, and how this relationship
affects the image under convolution.
There doesn't have to be any relationship between n and m, but if you want the convolution to be normalized, you need the sum of the kernel to be 1. In that case
m = 8 + n
The wiki page on kernels also explains that
Normalization ensures that the pixel values in the output image are of
the same relative magnitude as those in the input image.
Otherwise if m < 8 + n they will be brighter, or if m > 8 + n they will be dimmer.
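The brightness effect is easiest to see on a flat patch, where every pixel has the same value: the filter output is then simply that value times the kernel sum, (8 + n)/m. The Python sketch below illustrates this; the function names are made up for the example, and it assumes 8 + n is positive.

```python
def make_kernel(m, n):
    """3x3 kernel with centre weight n/m and all other weights 1/m."""
    return [[1 / m, 1 / m, 1 / m],
            [1 / m, n / m, 1 / m],
            [1 / m, 1 / m, 1 / m]]

def kernel_sum(kernel):
    """Sum of all nine weights, which equals (8 + n) / m."""
    return sum(sum(row) for row in kernel)

def filter_flat_patch(value, m, n):
    """Filter response on a 3x3 patch where every pixel equals value."""
    return sum(w * value for row in make_kernel(m, n) for w in row)

print(kernel_sum(make_kernel(9, 1)))   # about 1: normalized, since m = 8 + n
print(filter_flat_patch(100, 8, 1))    # m < 8 + n, result above 100 (brighter)
print(filter_flat_patch(100, 10, 1))   # m > 8 + n, result below 100 (dimmer)
```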
NOTE
As pointed out by BЈовић, changing n changes the action of the filter significantly (see comments on this question).
