Complexity of FCBF vs. greedy forward selection

I am reading about feature selection methods and comparing the FCBF method with greedy forward selection.
The complexity of FCBF is O(M N log N), where M is the number of dataset instances and N is the number of dataset features, as per the book "Understanding and Using Rough Set Based Feature Selection: Concepts, Techniques and Applications", page 45.
The complexity of greedy forward selection is O(m^2 M N log N), where M is the total number of features, m is the number of features in the selected subset, and N is the number of data points, as per "OMEGA: On-line Memory-Based General Purpose System Classifier", page 122.
Although both complexities look the same, M and N have different meanings in each one: N is the number of features in the first, while M is the number of features in the second.
My question is: if we unify the symbols, will the second complexity become O(k^2 K M log M), with k the selected subset size, K the total number of features, and M the number of instances (following the first book's convention)? Or are they actually the same variables, i.e. the N in the first is the N in the second and the M in the first is the M in the second, and the problem is that I did not read the equations right?

How can we efficiently implement maximum set coverage with a fixed number of arcs?

I am working on solving the following problem and implementing the solution in C++.
Let us assume that we have a directed weighted graph G = (V, A, w) and a set of persons P.
We receive a number of queries, where each query gives a person p and two vertices s and d, and asks us to compute the minimum-weight path between s and d for person p. One person can have multiple paths.
After all queries have been processed, I am given a number k <= |A| and I should output k arcs such that the number of persons using at least one of the k arcs is maximal (this is a maximum coverage problem).
To solve the first part I implemented Dijkstra's algorithm using a priority_queue, and I compute the minimum weight between s and d. (Is this a good way to do it?)
To solve the second part I store, for every arc, the set of persons that use this arc, and I use a greedy algorithm to compute the set of arcs: at each stage, I choose the arc used by the largest number of uncovered persons (a sketch of this step is shown below). (Is this a good way to do it?)
Finally, if my algorithms are good, how can I implement them efficiently in C++?
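For what it's worth, here is a minimal sketch of the greedy coverage step in Python (chosen to illustrate the algorithmic idea rather than the C++ implementation); the name arc_users, a map from each arc to the set of persons whose path uses it, is hypothetical:

# greedy maximum coverage: repeatedly take the arc that covers the most
# persons not yet covered by earlier picks
def greedy_arc_cover(arc_users, k):
    covered = set()
    chosen = []
    for _ in range(k):
        best = max(arc_users, key=lambda a: len(arc_users[a] - covered))
        gain = arc_users[best] - covered
        if not gain:                      # no arc covers anyone new
            break
        chosen.append(best)
        covered |= gain
    return chosen, covered

# usage: arcs 'a'..'c', persons 1..4
arc_users = {'a': {1, 2}, 'b': {2, 3, 4}, 'c': {4}}
print(greedy_arc_cover(arc_users, k=2))   # (['b', 'a'], {1, 2, 3, 4})

The classical result is that this greedy rule is a (1 - 1/e)-approximation for maximum coverage, so it is a sound choice unless you need the exact optimum.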

Could you explain this question? I am new to ML; I came across this problem, but its solution is not clear to me.

The problem (Question 2, transcribed from the question's image):
Many substances that can burn (such as gasoline and alcohol) have a chemical structure based on carbon atoms; for this reason they are called hydrocarbons. A chemist wants to understand how the number of carbon atoms in a molecule affects how much energy is released when that molecule combusts (meaning that it is burned). The chemist obtains the dataset below. In the column on the right, kJ/mol is the unit measuring the amount of energy released.
You would like to use linear regression (h_a(x) = a0 + a1*x) to estimate the amount of energy released (y) as a function of the number of carbon atoms (x). Which of the following do you think will be the values you obtain for a0 and a1? You should be able to select the right answer without actually implementing linear regression.
A) a0 = −1780.0, a1 = −530.9
B) a0 = −569.6, a1 = −530.9
C) a0 = −1780.0, a1 = 530.9
D) a0 = −569.6, a1 = 530.9
Since all the a0 options are negative but two of the a1 options are positive, let's figure out a1 first.
As you can see, as the number of carbon atoms increases, the released energy becomes more and more negative, so the relationship cannot be positively correlated, which rules out options C and D.
Then, for the intercept, the value that produces the least error is the correct one. For x = 1 and x = 10 (easier to calculate), the outputs are about −2300 and −7000 for A, and −1100 and −5900 for B, so one would prefer B over A.
PS: You might be thinking there should be obvious values for a0 and a1 in the data; there aren't. The intention of the question is to give you a general understanding of the best fit. Also, this way of solving it is kind of machine learning as well.
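To make the back-of-the-envelope check above concrete, here is a quick evaluation of the two remaining candidate lines at x = 1 and x = 10 (the dataset itself is only in the question's image, so only the candidate parameters appear here):

# evaluate the hypothesis h(x) = a0 + a1*x for the two surviving options
def h(a0, a1, x):
    return a0 + a1 * x

for name, a0, a1 in [('A', -1780.0, -530.9), ('B', -569.6, -530.9)]:
    print(name, h(a0, a1, 1), h(a0, a1, 10))
# A -2310.9 -7089.0
# B -1100.5 -5878.6

Whichever line's outputs sit closer to the tabulated energies wins; by the reasoning above, that is option B.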

Fast Exact Solvers for Chromatic Number

Finding the chromatic number of a graph is NP-hard, so there isn't a fast solver 'in theory'. Is there any publicly available software that can compute the exact chromatic number of a graph quickly?
I'm writing a Python script that computes the chromatic number of many graphs, but it is taking too long even for small graphs. The graphs I am working with span a wide range, sparse or dense, but usually have fewer than 10,000 nodes. I formulated the problem as an integer program and passed it to Gurobi to solve. Do you have recommendations for other software, different IP formulations, or Gurobi settings to speed this up?
import networkx as nx
from gurobipy import *

# create test graph
n = 50
p = 0.5
G = nx.erdos_renyi_graph(n, p)

# compute chromatic number -- ILP solve
m = Model('chrom_num')

# upper bound on the number of colors needed (max degree + 1 always suffices)
k = max(dict(G.degree()).values()) + 1

# create k binary variables, y_0 ... y_{k-1}, to indicate whether color j is used;
# obj=1 puts each y_j into the objective with coefficient 1
y = []
for j in range(k):
    y.append(m.addVar(vtype=GRB.BINARY, name='y_%d' % j, obj=1))

# create n * k binary variables, x_{l,j}, where x_{l,j} is 1 if node l gets color j
x = []
for l in range(n):
    x.append([])
    for j in range(k):
        x[-1].append(m.addVar(vtype=GRB.BINARY, name='x_%d_%d' % (l, j), obj=0))

# objective is to minimize the number of colors used --> sum of y_0 ... y_{k-1}
m.modelSense = GRB.MINIMIZE
m.update()

# add constraint -- each node gets exactly one color
for u in range(n):
    m.addConstr(quicksum(x[u]) == 1, name='NC_%d' % u)

# add constraint -- y_j is forced to 1 whenever any node uses color j
for u in range(n):
    for j in range(k):
        m.addConstr(x[u][j] <= y[j], name='SH_%d_%d' % (u, j))

# add constraint -- adjacent nodes get different colors
for u in range(n):
    for v in G[u]:
        if v > u:
            for j in range(k):
                m.addConstr(x[u][j] + x[v][j] <= 1, name='ADJ_%d_%d_COL_%d' % (u, v, j))

# update model, solve, read off the chromatic number
m.update()
m.optimize()
chrom_num = int(round(m.objVal))
I am looking to compute exact chromatic numbers, although I would also be interested in algorithms that compute approximate chromatic numbers if they have reasonable theoretical guarantees, such as a constant-factor approximation.
You might want to try a SAT solver or a Max-SAT solver. I expect that they will work better than a reduction to an integer program, since I think colorability is closer to satisfiability.
SAT solvers receive a propositional Boolean formula in conjunctive normal form (CNF) and output whether the formula is satisfiable. The following problem COL_k is in NP:
Input: Graph G and natural number k.
Output: Whether G is k-colorable.
To solve COL_k you encode it as a propositional Boolean formula with one propositional variable for each pair (u, c) consisting of a vertex u and a color 1 <= c <= k. You need to write clauses which ensure that every vertex is colored by at least one color, and clauses which ensure that the endpoints of each edge receive different colors.
Then you just do a binary search to find the smallest k such that G is k-colorable but not (k-1)-colorable.
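As a concrete illustration, here is a minimal sketch of this encoding using the PySAT package (pip install python-sat); the function names are my own, and it scans k upward instead of doing a true binary search:

from pysat.solvers import Glucose3

def is_k_colorable(n, edges, k):
    var = lambda v, c: v * k + c + 1       # "vertex v has color c"
    solver = Glucose3()
    for v in range(n):                     # every vertex gets at least one color
        solver.add_clause([var(v, c) for c in range(k)])
    for u, v in edges:                     # adjacent vertices never share a color
        for c in range(k):
            solver.add_clause([-var(u, c), -var(v, c)])
    sat = solver.solve()
    solver.delete()
    return sat

def chromatic_number(n, edges):
    k = 1                                  # binary search between a clique lower
    while not is_k_colorable(n, edges, k): # bound and a greedy upper bound works too
        k += 1
    return k

Note that a satisfying assignment may give a vertex several colors; picking any one of them still yields a proper coloring, so "at most one color per vertex" clauses are unnecessary for the decision problem.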
There are various free SAT solvers. I have used Lingeling successfully, but you can find many others on the SAT Competition website. They all use the same input and output format; Google "MiniSAT User Guide: How to use the MiniSAT SAT Solver" for an explanation of this format.
You can also use a Max-SAT solver; again, consult the Max-SAT competition website. Such solvers handle the partial Max-SAT problem, in which clauses are partitioned into hard clauses and soft clauses. Here, the solver finds the maximum number of soft clauses which can be satisfied while also satisfying all of the hard clauses; see the input format on the Max-SAT competition website (under rules -> details).
You can formulate the chromatic number problem as a single Max-SAT instance (as opposed to several SAT instances as above). In this sense, Max-SAT is a better fit. On the other hand, I have the impression that SAT solvers generally perform better than Max-SAT solvers. I don't have any hands-on experience with Max-SAT solvers, so I cannot say anything more.
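For completeness, here is a sketch of that single Max-SAT formulation using the RC2 solver from the PySAT package (the encoding choices are mine, mirroring the ILP above; k_max is any upper bound on the chromatic number, e.g. max degree + 1):

from pysat.formula import WCNF
from pysat.examples.rc2 import RC2

def chromatic_number_maxsat(n, edges, k_max):
    x = lambda v, c: v * k_max + c + 1     # "vertex v has color c"
    y = lambda c: n * k_max + c + 1        # "color c is used"
    wcnf = WCNF()
    for v in range(n):                     # hard: every vertex gets a color
        wcnf.append([x(v, c) for c in range(k_max)])
    for u, v in edges:                     # hard: edges are properly colored
        for c in range(k_max):
            wcnf.append([-x(u, c), -x(v, c)])
    for v in range(n):                     # hard: using a color flags it as used
        for c in range(k_max):
            wcnf.append([-x(v, c), y(c)])
    for c in range(k_max):                 # soft: prefer each color to stay unused
        wcnf.append([-y(c)], weight=1)
    rc2 = RC2(wcnf)
    rc2.compute()
    cost = rc2.cost                        # violated soft clauses = colors used
    rc2.delete()
    return cost

The optimum cost equals the chromatic number, since any satisfying assignment yields a proper coloring using at most the flagged colors.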

Optimal Range for Universal Scalability Law (USL)

I'm writing a report and need to test the scalability of a mind-map database software design idea. I wanted to use the USL equation to get a quantifiable metric for scalability, but I have no idea what range is considered good for USL. Any help would be appreciated :)
The USL equation:
C(N) = N / (1 + α(N − 1) + βN(N − 1))
The three terms in the denominator are associated respectively with the three Cs: the level of concurrency, a contention penalty (with strength α), and a coherency penalty (with strength β). The parameter values are defined in the range 0 ≤ α, β < 1. The independent variable N can represent either the number of concurrent users/processes (software scalability) or the number of physical processors/nodes (hardware scalability).
Do you mean the number of measurements by "what range"? If yes, then you cannot know the required number of measured data points beforehand: keep adding data points until the predicted maximum concurrency no longer changes as more points are included.
The estimated parameters, and the predictions derived from them, are not reliable if you use the MS Excel spreadsheet method explained in the book "Guerrilla Capacity Planning". Check out the paper "Mythbuster for the Guerrillas" to understand why, and how to get reliable results. It might also be worth reading the paper "Better Prediction Using the Super-serial Scalability Law Explained by the Least Square Error Principle and the Machine Repairman Model".
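For what it's worth, a minimal sketch of such a least-squares fit with SciPy is below; the (N, C) measurements are placeholders, not real data, and N* = sqrt((1 − α)/β) is the standard USL expression for the load at which C(N) peaks:

import numpy as np
from scipy.optimize import curve_fit

def usl(N, alpha, beta):
    # relative capacity C(N) = N / (1 + alpha*(N - 1) + beta*N*(N - 1))
    return N / (1 + alpha * (N - 1) + beta * N * (N - 1))

# placeholder measurements: throughput at N clients, normalized to N = 1
N = np.array([1, 2, 4, 8, 16, 32], dtype=float)
C = np.array([1.0, 1.9, 3.6, 6.1, 9.0, 10.5])

(alpha, beta), _ = curve_fit(usl, N, C, p0=[0.01, 0.001], bounds=(0, 1))
N_star = np.sqrt((1 - alpha) / beta)      # N at which C(N) peaks
print(alpha, beta, N_star)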

Neural Networks: What does "linearly separable" mean?

I am currently reading the Machine Learning book by Tom Mitchell. When talking about neural networks, Mitchell states:
"Although the perceptron rule finds a successful weight vector when
the training examples are linearly separable, it can fail to converge
if the examples are not linearly separable. "
I am having trouble understanding what he means by "linearly separable". Wikipedia tells me that "two sets of points in a two-dimensional space are linearly separable if they can be completely separated by a single line."
But how does this apply to the training set for neural networks? How can inputs (or activation units) be linearly separable or not?
I'm not the best at geometry and maths - could anybody explain it to me as though I were 5? ;) Thanks!
Suppose you want to write an algorithm that decides, based on two parameters, size and price, whether a house will sell in the same year it was put on sale or not. So you have 2 inputs, size and price, and one output, will sell or will not sell. Now, when you receive your training sets, it could happen that the outputs are not arranged in a way that makes prediction easy (can you tell me, based on the first graph, whether X will be an N or an S? How about in the second graph?):
^
| N S N
s| S X N
i| N N S
z| S N S N
e| N S S N
+----------->
price
^
| S S N
s| X S N
i| S N N
z| S N N N
e| N N N
+----------->
price
Where:
S-sold,
N-not sold
As you can see in the first graph, you can't really separate the two possible outputs (sold/not sold) by a straight line: no matter how you try, there will always be both S and N on both sides of the line, which means that your algorithm would have many possible lines but no ultimate, correct line to split the two outputs (and of course to predict new ones, which is the goal from the very beginning). That's why linearly separable data sets (like the second graph) are much easier to predict.
This means that there is a hyperplane (which splits your input space into two half-spaces) such that all points of the first class are in one half-space and those of the second class are in the other half-space.
In two dimensions, that means that there is a line which separates points of one class from points of the other class.
EDIT: for example, if blue circles represent points from one class and red circles represent points from the other class, and you can draw a straight line with all blue circles on one side and all red circles on the other, then these points are linearly separable.
In three dimensions, it means that there is a plane which separates points of one class from points of the other class.
In higher dimensions, it's similar: there must exist a hyperplane which separates the two sets of points.
You mention that you're not good at math, so I'm not writing the formal definition, but let me know (in the comments) if that would help.
Look at the following two data sets:
^ ^
| X O | AA /
| | A /
| | / B
| O X | A / BB
| | / B
+-----------> +----------->
The left data set is not linearly separable (without using a kernel). The right one is separable into the two parts A and B by the indicated line.
I.e., you cannot draw a straight line in the left image such that all the X are on one side and all the O are on the other. That is why it is called "not linearly separable": there exists no linear manifold separating the two classes.
Now the famous kernel trick (which will certainly be discussed in the book next) actually allows many linear methods to be used for non-linear problems by virtually adding additional dimensions to make a non-linear problem linearly separable.
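To connect this back to Mitchell's remark about convergence, here is a toy sketch (my own, not from the book) showing the perceptron rule converging on a linearly separable problem (AND) and failing to converge on a non-separable one (XOR):

import numpy as np

def perceptron(X, y, epochs=100):
    w = np.zeros(X.shape[1] + 1)           # weights; bias is w[0]
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            pred = 1 if w[0] + xi @ w[1:] > 0 else -1
            if pred != yi:                 # misclassified: perceptron update
                w[0] += yi
                w[1:] += yi * xi
                errors += 1
        if errors == 0:                    # a full clean pass: converged
            return w, True
    return w, False

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_and = np.array([-1, -1, -1, 1])          # AND is linearly separable
y_xor = np.array([-1, 1, 1, -1])           # XOR is not
print(perceptron(X, y_and)[1])             # True
print(perceptron(X, y_xor)[1])             # False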
