plotting in octave syntax - machine-learning

pos = find(y==1);
neg = find(y==0);
plot(X(pos, 1), X(pos, 2), "k+", "LineWidth", 2, 'MarkerSize', 7);
plot(X(neg, 1), X(neg, 2), "ko", "MarkerFaceColor", 'y', 'MarkerSize', 7);
I understand that the find function gives us the indices of the data where y==1 and y==0. But I am not sure what X(pos,1) and X(pos,2) do in the code above. Can someone explain how this plot function works?

pos and neg are vectors with the indices where the condition y==1 (respectively y==0) is fulfilled. y seems to be a vector of length n, and X seems to be an n x 2 matrix. X(pos,1) are all elements of the first column of X at rows where the condition y==1 is met.
y = [ 2 3 1 4 0 1 2 6 0 4]
X = [55 19;54 96;19 85;74 81;94 34;82 80;79 92;57 36;70 81;69 4]
X(find(y==1), 1)
which gives
ans =
19
82
Note that find isn't needed here,
X(y==1, 1)
would be sufficient
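For readers who think in NumPy rather than Octave, the same selection can be sketched in Python. This is only an illustration of the indexing idea, not part of the original exercise; note that NumPy is zero-based, so Octave's column 1 becomes column 0.

```python
import numpy as np

y = np.array([2, 3, 1, 4, 0, 1, 2, 6, 0, 4])
X = np.array([[55, 19], [54, 96], [19, 85], [74, 81], [94, 34],
              [82, 80], [79, 92], [57, 36], [70, 81], [69, 4]])

# Equivalent of find(y==1): integer positions where the condition holds.
pos = np.flatnonzero(y == 1)

# Equivalent of X(pos, 1): first column (index 0 in NumPy) at those rows.
via_indices = X[pos, 0]

# Equivalent of X(y==1, 1): the boolean mask selects the same rows directly.
via_mask = X[y == 1, 0]

print(pos)          # zero-based positions, so Octave's rows 3 and 6 become 2 and 5
print(via_indices)  # values 19 and 82, as in the Octave output
print(via_mask)     # same values again
```

Both forms return the same values, which is why find is optional in the Octave code as well.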

Here X is an n x 2 matrix and pos is a vector of length m holding the row indices of X where y==1.
X(pos,1) is therefore an m x 1 matrix with the values of the 1st column of X where y==1; the same holds for X(pos,2) and the 2nd column.
Plotting a graph with
plot(X(pos, 1), X(pos, 2), "k+", "LineWidth", 2, 'MarkerSize', 7);
will give you a graph with black '+' points having x coordinate X(pos,1) [values of the 1st column of X where y==1] and y coordinate X(pos,2) [values of the 2nd column of X where y==1].
Similarly,
plot(X(neg, 1), X(neg, 2), "ko", "MarkerFaceColor", 'y', 'MarkerSize', 7);
will give you a graph with yellow-filled circles having x coordinate X(neg,1) [values of the 1st column of X where y==0] and y coordinate X(neg,2) [values of the 2nd column of X where y==0].
You can also directly use y==1 instead of pos.

Code:
pos = find(y == 1); neg = find(y == 0);
% Plot Examples
plot(X(pos, 1), X(pos, 2), 'k+','LineWidth', 2,'MarkerSize', 7);
plot(X(neg, 1), X(neg, 2), 'ko', 'MarkerFaceColor', 'y', 'MarkerSize', 7);
Answer: In simple words, X(pos, 1) holds all the values of the first column of X where y == 1,
and X(pos, 2) holds all the values of the second column of X where y == 1.
Similarly, X(neg, 1) and X(neg, 2) hold the values of the first and second columns of X, respectively, where y == 0.
I'm including some output here for better understanding.
Here is my dataset:
34.62365962451697,78.0246928153624,0
61.10666453684766,96.51142588489624,1
30.28671076822607,43.89499752400101,0
35.84740876993872,72.90219802708364,0
60.18259938620976,86.30855209546826,1
79.0327360507101,75.3443764369103,1
See the values of X(pos, 1) (first column of X where y == 1), X(pos, 2) (second column of X where y == 1), X(neg, 1) (first column of X where y == 0) and X(neg, 2) (second column of X where y == 0).
You can see that X(pos, 2) is plotted against X(pos, 1), and similarly X(neg, 2) is plotted against X(neg, 1).
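The four vectors for the six dataset rows listed above can be reproduced with a short NumPy sketch. The variable names pos_x, pos_y, neg_x and neg_y are made up for illustration; in Octave they correspond to X(pos, 1), X(pos, 2), X(neg, 1) and X(neg, 2).

```python
import numpy as np

# The six rows from the dataset above: two features, then the 0/1 label.
data = np.array([
    [34.62365962451697, 78.0246928153624, 0],
    [61.10666453684766, 96.51142588489624, 1],
    [30.28671076822607, 43.89499752400101, 0],
    [35.84740876993872, 72.90219802708364, 0],
    [60.18259938620976, 86.30855209546826, 1],
    [79.0327360507101, 75.3443764369103, 1],
])
X = data[:, :2]
y = data[:, 2]

pos = y == 1   # boolean mask for the positive examples
neg = y == 0   # boolean mask for the negative examples

pos_x, pos_y = X[pos, 0], X[pos, 1]   # coordinates drawn with '+'
neg_x, neg_y = X[neg, 0], X[neg, 1]   # coordinates drawn with circles

print(pos_x, pos_y)
print(neg_x, neg_y)
```

Each plot call then simply pairs the first vector (x coordinates) with the second (y coordinates) for one class.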

Related

How to select a row from an array based on the highest correlation to a given row?

For each row of a data array X, I want to find the row (number or index) from a data array Y that shows the highest correlation.
X Row   Value 1   Value 2   Value 3   Row index in Y with highest Corr
X1      10        5         1         ?
X2      1         5         10        ?
Y Row   Value 1   Value 2   Value 3
Y1      1         4         10
Y2      3         4         3
Y3      10        4         1
...
From that, I would want to obtain the row index in Y with the highest correlation to each row in X
X Row   Value 1   Value 2   Value 3   Row index in Y with highest Corr
X1      10        5         1         Y3
X2      1         5         10        Y1
I tried to apply a combination of Index and SortN to Arrayformula(CORREL(X1,Y1:Y)), but that does not work because CORREL seems to concatenate the rows when one argument is an array instead of a vector.
Use byrow() and filter(), like this:
=byrow(
  B2:D3,
  lambda(
    rowX,
    lambda(
      labelY, correlY,
      single( filter(labelY, correlY = max(correlY)) )
    )(
      A11:A13,
      byrow(
        B11:D13,
        lambda(
          rowY,
          correl(rowX, rowY)
        )
      )
    )
  )
)
...where the range A2:D3 holds the X table and the range A11:D13 holds the Y table.
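As a cross-check of the expected result outside the spreadsheet, the same "best match by correlation" logic can be sketched with NumPy. This is an illustration using the sample tables from the question, not a replacement for the formula; the variable names are made up.

```python
import numpy as np

# Sample tables from the question: one row per X1, X2 and Y1, Y2, Y3.
X = np.array([[10, 5, 1], [1, 5, 10]], dtype=float)
Y = np.array([[1, 4, 10], [3, 4, 3], [10, 4, 1]], dtype=float)

# np.corrcoef stacks the rows of X and Y and returns every pairwise
# Pearson correlation; the top-right block holds X rows against Y rows.
n = X.shape[0]
corr = np.corrcoef(X, Y)[:n, n:]

# For each X row, pick the Y row with the largest correlation.
best = corr.argmax(axis=1)
labels = ["Y%d" % (i + 1) for i in best]
print(labels)  # ['Y3', 'Y1']
```

The result agrees with the expected table in the question: X1 matches Y3 and X2 matches Y1.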

how to set condition in objective function in cvxpy

I have a brute force optimization algorithm with the objective function of the form:
np.clip(x @ M, a_min=0, a_max=1) @ P
where x is a Boolean decision vector, M is a Boolean matrix/tensor and P is a probability vector. As you can guess, x @ M as an inner product can have values higher than 1, which is not allowed, as the objective value should be a probability scalar or vector (if M is a tensor) between 0 and 1. So I have used numpy.clip to limit x @ M to values between 0 and 1. How can I set up a mechanism like clip in cvxpy to achieve the same result? I have spent hours on the internet with no luck, so I appreciate any hint. I have been trying to use the code below to replicate clip, but it raises Exception: Cannot evaluate the truth value of a constraint or chain constraints, e.g., 1 >= x >= 0. As a side note, since cvxpy cannot handle tensors, I loop through tensor slices with M[s].
n = M.shape[0]
m = M.shape[1]
w = M.shape[2]
max_budget_of_decision_variable = 7
x = cp.Variable(n, boolean=True)
obj = 0
for s in range(m):
    for w in range(w):
        if (x @ M[s])[w] >= 1:
            (x @ M[s])[w] = 1
    obj += x @ M[s] @ P
objective = cp.Maximize(obj)
cst = []
cst += [cp.sum(y) <= max_budget_of_decision_variable ]
prob = cp.Problem(objective, constraints = cst)
As an example, consider M = np.array([ [1, 0, 0, 1, 1, 0], [0, 0, 1, 0, 1, 0], [1, 1, 1, 0, 1, 0]]) and P = np.array([0.05, 0.15, 0.1, 0.15, 0.5, 0.05]).
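To make the target concrete, here is a minimal brute-force sketch of the clipped objective on the example M and P, using plain NumPy. It assumes the 2-D M shown above (not a tensor) and reuses the budget of 7 from the question; the names objective, best_x and best_value are made up for illustration.

```python
from itertools import product

import numpy as np

M = np.array([[1, 0, 0, 1, 1, 0],
              [0, 0, 1, 0, 1, 0],
              [1, 1, 1, 0, 1, 0]])
P = np.array([0.05, 0.15, 0.1, 0.15, 0.5, 0.05])
max_budget = 7  # budget on the number of selected rows, as in the question


def objective(x):
    # Clip the counts in x @ M to [0, 1], then weight them by P.
    return float(np.clip(x @ M, 0, 1) @ P)


best_x, best_value = None, -1.0
for bits in product([0, 1], repeat=M.shape[0]):
    x = np.array(bits)
    if x.sum() > max_budget:
        continue  # skip decisions that exceed the budget
    value = objective(x)
    if value > best_value:  # strict comparison keeps the first maximizer
        best_x, best_value = bits, value

print(best_x, best_value)
```

Because x and M are nonnegative here, clipping to [0, 1] is the same as taking the elementwise minimum with 1. That suggests trying cvxpy's elementwise cp.minimum(x @ M[s], 1) in place of the if-assignment, though whether it solves depends on having a mixed-integer solver installed.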

cspline in Maxima giving me a result which indicates an error in Maxima

I have the following cubic polynomial f(x)=x³ - 3 x² + x -5 for which the cubic spline should provide the exact same polynomial assuming the following data:
(-1, -10), (0,-5), (1, -6) with second derivative at the extremes f''(-1)=-12, f''(1)=0 (note that f''(x)=6x-6.)
Here is the piece of code that I tried:
/* polynomial to interpolate and data */
f(x) := x^3 - 3* x^2 + x - 5$
x0:-1$
x1:0$
x2:1$
y0:f(x0)$
y1:f(x1)$
y2:f(x2)$
p:[[x0,y0],[x1,y1],[x2,y2]]$
fpp(x) := diff(f(x),x,2);
fpp0 : at( fpp(x), [x=x0]);
fpp2 : at( fpp(x), [x=x2]);
/* here I call cspline with d1=fpp0 and dn=fpp2 */
load(interpol)$
cspline(p, d1=fpp0, dn=fpp2);
I expected the original polynomial (f(x)=x³ -3 x² + x -5) but I got the result:
(%o40) (-16*x^3-15*x^2+6*x-5)*charfun2(x,-inf,0)+(8*x^3-15*x^2+6*x-5)*charfun2(x,0,inf)
which does not agree with the original polynomial.
Moreover, here is a test on the results provided by Maxima.
Code:
/* verification */
h11(x) := -16*x^3 - 15* x^2 + 6* x - 5;
h22(x) := 8* x^3 - 15*x^2 + 6* x - 5;
h11pp(x) := diff(h11(x), x, 2);
h11pp0: at( h11pp(x), [x=x0]);
h22pp(x) := diff(h22(x), x, 2);
h22pp2 : at(h22pp(x), [x=x2]);
which gives 66 and 18 as the second derivatives at the boundaries, where they should instead be -12 and 0.
Thanks.
It appears you've misinterpreted the arguments d1 and dn for cspline. As the description of cspline says, d1 and dn specify the first derivative for the spline at the endpoints, not the second derivative.
When I use the first derivative of f to specify the values for d1 and dn, I get the expected result:
(%i2) f(x) := x^3 - 3* x^2 + x - 5$
(%i3) [x0, x1, x2]: [-1, 0, 1] $
(%i4) [y0, y1, y2]: map (f, %);
(%o4) [- 10, - 5, - 6]
(%i5) p: [[x0, y0], [x1, y1], [x2, y2]];
(%o5) [[- 1, - 10], [0, - 5], [1, - 6]]
(%i6) load (interpol) $
(%i7) cspline (p, d1 = at(diff(f(x), x), x=x0), dn = at(diff(f(x), x), x=x2));
         3      2
(%o7)  (x  - 3 x  + x - 5) charfun2(x, minf, 0)
           3      2
       + (x  - 3 x  + x - 5) charfun2(x, 0, inf)
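The diagnosis can be cross-checked numerically. The sketch below uses numpy.polynomial (an assumption of convenience; Maxima would do equally well) to show that the two pieces from the question do satisfy first-derivative end conditions of -12 and 0, which is exactly what passing the second-derivative values as d1 and dn asked for.

```python
from numpy.polynomial import Polynomial

# Pieces returned by cspline in the question (coefficients in ascending order).
left = Polynomial([-5, 6, -15, -16])   # -16x^3 - 15x^2 + 6x - 5 on [-1, 0]
right = Polynomial([-5, 6, -15, 8])    #   8x^3 - 15x^2 + 6x - 5 on [0, 1]

# Original cubic f(x) = x^3 - 3x^2 + x - 5, for comparison.
f = Polynomial([-5, 1, -3, 1])

print(left.deriv()(-1), right.deriv()(1))    # first derivatives at the endpoints
print(left.deriv(2)(-1), right.deriv(2)(1))  # second derivatives at the endpoints
print(f.deriv()(-1), f.deriv()(1))           # slopes that d1 and dn should receive
```

The first line evaluates to -12 and 0, the second to 66 and 18 (the values reported in the question), and the third to 10 and -2, the first derivatives of f at x = -1 and x = 1.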

Pixels regions comparision

I'm trying to write a Python script for GIMP whose aim is to slice a picture into a tileset (identify each unique 16x16 tile in a picture).
So far, I'm able to read a tile (in fact a 16x16 pixel region) and write it somewhere.
But all my attempts at comparing tiles have failed.
Did I miss something?
My script is as follows:
#!/usr/bin/env python
from gimpfu import *
# compare 2 tiles,
# return 1 if identical, false otherwise
def tileCompare(tile1, tile2):
    if(tile1 == tile2):
        return 1
    return 0
# return tile at (x, y) coordinates
def readTile(layer, x, y):
    pr = layer.get_pixel_rgn(x,y,16,16)
    return pr[x:x+16, y:y+16]
# write tile at (x, y) coordinates on given layer
def writeTile(layer, x, y, tile):
    pr = layer.get_pixel_rgn(x,y,16,16)
    pr[x:x+16, y:y+16] = tile
def TilesSlicer(sourceLayer, targetLayer):
    # Actual plug-in code will go here
    # iterate tiles (result in tileSource)
    for x in range(0, sourceLayer.width, 16):
        for y in range(0, sourceLayer.height, 16):
            tileSource = readTile(sourceLayer, x, y)
            found = 0
            # iterate tiles again (result in tileIterator)
            for a in range(0, sourceLayer.width, 16):
                for b in range(0, sourceLayer.height, 16):
                    tileIterator = readTile(sourceLayer, x, y)
                    # compare tiles
                    # if identical and not yet found
                    # write it in the target layer
                    if (tileCompare(tileSource, tileIterator) == 1):
                        if(found == 0):
                            writeTile(tileIterator, a, b, tileSource)
                            found = 1
register(
    "TilesSlicer",
    "Tiles slicer",
    "Slice a picture into tiles",
    "Fabrice Lambert",
    "Fabrice Lambert",
    "April 2019",
    "Tiles slicer...",
    "RGB*",
    [
        (PF_DRAWABLE, "sourceLayer", "Source Layer: ", None),
        (PF_DRAWABLE, "targetLayer", "Target Layer: ", None),
    ],
    [],
    TilesSlicer,
    menu="<Image>/Filters/My Scripts")
main()
Thanks for your suggestions.
Nvm,
I found the problem:
tileIterator = readTile(sourceLayer, a, b)
instead of:
tileIterator = readTile(sourceLayer, x, y)
Alright,
After refining a bit, the script is as follows:
- Added tile width and height to handle any tile size.
- Removed the target layer parameter; the script now creates it.
- Added real-time display to give feedback to the user (sadly, the progress bar doesn't work).
- Improved speed.
#!/usr/bin/env python
from gimpfu import *
# compare 2 tiles,
# return 1 if identical, 0 otherwise
def tileCompare(tile1, tile2):
    if(tile1 == tile2):
        return 1
    return 0
# return tile at (x, y) coordinates
def readTile(layer, x, y, width, height):
    pr = layer.get_pixel_rgn(x, y, width, height)
    return pr[x:x+width, y:y+height]
# write tile at (x, y) coordinates on given layer
def writeTile(layer, x, y, width, height, tile):
    pr = layer.get_pixel_rgn(x, y, width, height)
    pr[x:x+width, y:y+height] = tile
    layer.update(x, y, width, height)
    gimp.displays_flush()
def TilesSlicer(sourceLayer, tileWidth, tileHeight):
    # Actual plug-in code will go here
    if((sourceLayer.width % tileWidth) != 0):
        gimp.message("The layer width is not multiple of " + str(tileWidth))
        gimp.quit()
    # the height must be checked against tileHeight, not tileWidth
    if((sourceLayer.height % tileHeight) != 0):
        gimp.message("The layer height is not multiple of " + str(tileHeight))
        gimp.quit()
    totalTiles = (sourceLayer.width / tileWidth) * (sourceLayer.height / tileHeight)
    tilesProcessed = 0
    gimp.progress_init("Processing...")
    gimp.progress_update(0.0)
    sourceImage = sourceLayer.image
    targetLayer = pdb.gimp_layer_new(sourceImage, sourceLayer.width, sourceLayer.height, sourceImage.base_type, "Target", 100.0, sourceLayer.mode)
    targetLayer.add_alpha()
    targetLayer.fill(TRANSPARENT_FILL)
    sourceImage.add_layer(targetLayer, 0)
    # iterate tiles (result in tileSource)
    for x in range(0, sourceLayer.width, tileWidth):
        for y in range(0, sourceLayer.height, tileHeight):
            tileSource = readTile(sourceLayer, x, y, tileWidth, tileHeight)
            found = 0
            # iterate tiles again (result in tileIterator)
            for a in range(0, sourceLayer.width, tileWidth):
                for b in range(0, sourceLayer.height, tileHeight):
                    tileIterator = readTile(sourceLayer, a, b, tileWidth, tileHeight)
                    # compare tiles
                    # if identical and not yet found
                    # write it in the target layer
                    # and abort iteration (for speed purpose)
                    if (tileCompare(tileSource, tileIterator) == 1):
                        if(found == 0):
                            writeTile(targetLayer, a, b, tileWidth, tileHeight, tileIterator)
                            found = 1
                            break
                if(found == 1):
                    break
            tilesProcessed = tilesProcessed + 1
            gimp.progress_update(tilesProcessed / totalTiles)
    gimp.displays_flush()
register(
    "TilesSlicer",
    "Tiles slicer",
    "Slice a picture into tiles",
    "Fabrice Lambert",
    "Fabrice Lambert",
    "April 2019",
    "Tiles slicer...",
    "RGB*",
    [
        (PF_DRAWABLE, "sourceLayer", "Source Layer: ", None),
        (PF_INT8, "tileWidth", "Tile width: ", 16),
        (PF_INT8, "tileHeight", "Tile height: ", 16),
    ],
    [],
    TilesSlicer,
    menu="<Image>/Filters/My Scripts")
main()
It can probably be refined further, and if someone has a fix for the progress bar, let me know.
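One likely cause of the stuck progress bar, offered as a guess rather than a confirmed diagnosis: GIMP 2.x Python-Fu runs Python 2, where dividing two integers floors the result, so tilesProcessed / totalTiles stays at 0 until the very last tile. The sketch below (runnable on Python 3, with // standing in for Python 2's integer /) contrasts the two behaviours; progress_fraction is a hypothetical helper name.

```python
def progress_fraction(tiles_processed, total_tiles):
    """Return progress in [0.0, 1.0] using true division on any Python version."""
    return float(tiles_processed) / float(total_tiles)


# 4 x 3 = 12 tiles for a hypothetical 64x48 layer cut into 16x16 tiles.
total_tiles = (64 // 16) * (48 // 16)

# Floor division mimics what Python 2 does for int / int.
floored = [done // total_tiles for done in range(1, total_tiles + 1)]
fractions = [progress_fraction(done, total_tiles) for done in range(1, total_tiles + 1)]

print(floored)       # 0 for every tile except the last one
print(fractions[2])  # 0.25 after 3 of 12 tiles
```

In the plug-in this would amount to calling gimp.progress_update(float(tilesProcessed) / totalTiles).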
I'm open to suggestions.
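One suggestion for speed, sketched under the assumption that the tile data returned by slicing a pixel region is a hashable byte string: remember each tile's pixels in a dictionary while scanning once, instead of comparing every tile against every other tile. The function name first_occurrences and the stand-in tile data are made up for illustration; no GIMP calls are used here.

```python
def first_occurrences(tiles):
    """Map each distinct tile to the position where it first appears.

    `tiles` is an iterable of ((x, y), pixel_bytes) pairs in scan order.
    """
    seen = {}
    for position, pixels in tiles:
        if pixels not in seen:  # dictionary lookup replaces the inner double loop
            seen[pixels] = position
    return seen


# Stand-in data: three 2x2 single-channel tiles, two of them identical.
tiles = [
    ((0, 0), b"\x00\x00\xff\xff"),
    ((0, 2), b"\x10\x20\x30\x40"),
    ((2, 0), b"\x00\x00\xff\xff"),
]
unique = first_occurrences(tiles)
print(sorted(unique.values()))  # [(0, 0), (0, 2)]
```

In the plug-in, the positions collected this way would be the ones passed to writeTile, turning the quadratic scan into a single pass over the tiles.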

how to plot on a graph using the hypothesis function by substituting the value of theta 0 and theta 1

This is the hypothesis function: h(x) = theta 0 + theta 1 * x
After setting theta 0 to 0 and theta 1 to 0.5, how do I plot it on a graph?
It is plotted the same way as any linear equation. Treat h(x) as y, the θ values as constants and x as x, so we basically have a linear expression of the form y = m + p * x (m and p are constants). To simplify further, take the function y = 2 + 4x. To plot it, pick values of x from a range, say 0 to 4; each value of x then has a corresponding value of y, so our (x, y) set looks like this: ([0, 1, 2, 3, 4], [2, 6, 10, 14, 18]). Now the graph can be plotted, as we know both the x and y coordinates.
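The point set for the simplified example y = 2 + 4x can be computed in a couple of lines. This is a small sketch; the default arguments 2 and 4 are just the example's constants, not values from the original question.

```python
def h(x, theta_0=2, theta_1=4):
    # Linear hypothesis: intercept theta_0 plus slope theta_1 times x.
    return theta_0 + theta_1 * x


xs = list(range(0, 5))      # x values 0, 1, 2, 3, 4
ys = [h(x) for x in xs]     # matching y values

print(list(zip(xs, ys)))    # [(0, 2), (1, 6), (2, 10), (3, 14), (4, 18)]
```

Calling h(x, 0, 0.5) instead gives the points for the line asked about in the question.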
You simply plot the line equation y = 0 + 0.5 * x
So you get something like this plot
Here's how I did it with Python
import matplotlib.pyplot as plt
import numpy as np
theta_0 = 0
theta_1 = 0.5
def h(x):
    return theta_0 + theta_1 * x
# use a NumPy array so h applies to every x at once
# (in Python 3, map() returns an iterator that plt.plot cannot draw)
x = np.arange(-100, 100)
y = h(x)
plt.plot(x, y)
plt.ylabel(r'$h_\theta(x)$')
plt.xlabel(r'$x$')
plt.title(r'Plot of $h_\theta(x) = \theta_0 + \theta_1 \cdot \ x$')
plt.text(60, .025, r'$\theta_0=0,\ \theta_1=0.5$')
plt.show()
