Minimum of subexponential random variables - minimum

It is known that if independent random variables X and Y have common subexponential distribution F,
then distribution of their minimum min{X,Y} is also subexponential. The proof can be found
in Theorem 1 of Geluk "Some closure properties for subexponential distributions" Statistics &
Probability Letters 79 (2009). https://reader.elsevier.com/reader/sd/pii/S0167715209000029?token=FA3000965369626F157C77F71E95CFE090C2FC14EF7CBC7C254BACE773568EA19E0B88E882002E55F3269E431B0049A6&originRegion=eu-west-1&originCreation=20230216101708
I tried to prove that the inverse statement is wrong:
Assume that we know that the distribution of minimum of i.i.d. random variables X and Y is subexponential. I need a counter-example showing that it not necessarily follows that the distribution of X is subexponential.
Since P(min(X,Y)>x)=(\bar F(x))^2, one should to prove that 1-(\bar F(x))^2 \in S \nRightarrow F\in S. Here \bar F:=1-F.

Related

Confusion about probability density about p(x;w) and p(x|w), where w is the parameter

I can't understand the difference between p(x|w) and p(x;w),like this:
p(x|mu,sigma)=N(x|mu,sigma) and p(x;mu,sigma)=N(x;mu,sigma), where N is normal distribution.

optimal separating hyperplane objective function confusion

Chapter 4.5.2 of Elements of Statistical Learning
I don't understand what does it mean:
"Since for any β and β0 satisfying these inequalities, any positively scaled
multiple satisfies them too, we can arbitrarily set ||β|| = 1/M." 
Also, how does maximize M becomes minimize 1/2(||β||^2) ?
"Since for any β and β0 satisfying these inequalities, any positively scaled multiple satisfies them too, we can arbitrarily set ||β|| = 1/M." 
y_i(x_i' b + b0) >= M ||b||
thus for any c>0
y_i(x_i' [bc] + [b0c]) >= M ||bc||
thus you can always find such c that ||bc|| = 1/M, so we can focus only on b such that they have such norm (we simply limit the space of possible solutions because we know that scaling does not change much)
Also, how does maximize M becomes minimize 1/2(||β||^2) ?
We put ||b|| = 1/M, thus M=1/||b||
max_b M = max_b 1 / ||b||
now maximization of positive f(b) is equivalent of minimization of 1/f(b), so
min ||b||
and since ||b|| is positive, its minimization is equivalent to minimization of the square, as well as multiplied by 1/2 (this does not change the optimal b)
min 1/2 ||b||^2

Log likelihood function for GDA(Gaussian Discriminative analysis)

I am having trouble understanding the likelihood function for GDA given in Andrew Ng's CS229 notes.
l(φ,µ0,µ1,Σ) = log (product from i to m) {p(x(i)|y(i);µ0,µ1,Σ)p(y(i);φ)}
The link is http://cs229.stanford.edu/notes/cs229-notes2.pdf Page 5.
For Linear regression the function was product from i to m p(y(i)|x(i);theta)
which made sense to me.
Why is there a change here saying it is given by p(x(i)|y(i) and that is multiplied by p(y(i);phi)?
Thanks in advance
The starting formula on page 5 is
l(φ,µ0,µ1,Σ) = log <product from i to m> p(x_i, y_i;µ0,µ1,Σ,φ)
leaving out the parameters φ,µ0,µ1,Σ for now, that can be simplified to
l = log <product> p(x_i, y_i)
using the chain rule you can convert that to either
l = log <product> p(x_i|y_i)p(y_i)
or
l = log <product> p(y_i|x_i)p(x_i).
In the page 5 formula, the φ is moved to p(y_i), because only p(y) depends on it.
The likelihood starts with the joint probability distribution p(x,y) instead of the conditional probability distribution p(y|x), which is why GDA is called a generative model (models from x to y and from y to x), while logistic regression is considered a discriminatory model (models from x to y, one-way). Both have their advantages and disadvantages. There seems to be a chapter about that further below.

Normalize a feature in this table

This has become quite a frustrating question, but I've asked in the Coursera discussions and they won't help. Below is the question:
I've gotten it wrong 6 times now. How do I normalize the feature? Hints are all I'm asking for.
I'm assuming x_2^(2) is the value 5184, unless I am adding the x_0 column of 1's, which they don't mention but he certainly mentions in the lectures when talking about creating the design matrix X. In which case x_2^(2) would be the value 72. Assuming one or the other is right (I'm playing a guessing game), what should I use to normalize it? He talks about 3 different ways to normalize in the lectures: one using the maximum value, another with the range/difference between max and mins, and another the standard deviation -- they want an answer correct to the hundredths. Which one am I to use? This is so confusing.
...use both feature scaling (dividing by the
"max-min", or range, of a feature) and mean normalization.
So for any individual feature f:
f_norm = (f - f_mean) / (f_max - f_min)
e.g. for x2,(midterm exam)^2 = {7921, 5184, 8836, 4761}
> x2 <- c(7921, 5184, 8836, 4761)
> mean(x2)
6676
> max(x2) - min(x2)
4075
> (x2 - mean(x2)) / (max(x2) - min(x2))
0.306 -0.366 0.530 -0.470
Hence norm(5184) = 0.366
(using R language, which is great at vectorizing expressions like this)
I agree it's confusing they used the notation x2 (2) to mean x2 (norm) or x2'
EDIT: in practice everyone calls the builtin scale(...) function, which does the same thing.
It's asking to normalize the second feature under second column using both feature scaling and mean normalization. Therefore,
(5184 - 6675.5) / 4075 = -0.366
Usually we normalize all of them to have zero mean and go between [-1, 1].
You can do that easily by dividing by the maximum of the absolute value and then remove the mean of the samples.
"I'm assuming x_2^(2) is the value 5184" is this because it's the second item in the list and using the subscript _2? x_2 is just a variable identity in maths, it applies to all rows in the list. Note that the highest raw mid-term exam result (i.e. that which is not squared) goes down on the final test and the lowest raw mid-term result increases the most for the final exam result. Theta is a fixed value, a coefficient, so somewhere your normalisation of x_1 and x_2 values must become (EDIT: not negative, less than 1) in order to allow for this behaviour. That should hopefully give you a starting basis, by identifying where the pivot point is.
I had the same problem, in my case the thing was that I was using as average the maximum x2 value (8836) minus minimum x2 value (4761) divided by two, instead of the sum of each x2 value divided by the number of examples.
For the same training set, I got the question as
Q. What is the normalized feature x^(3)_1?
Thus, 3rd training ex and 1st feature makes out to 94 in above table.
Now, normalized form is
x = (x - mean(x's)) / range(x)
Values are :
x = 94
mean(89+72+94+69) / 4 = 81
range = 94 - 69 = 25
Normalized x = (94 - 81) / 25 = 0.52
I'm taking this course at the moment and a really trivial mistake I made first time I answered this question was using comma instead of dot in the answer, since I did by hand and in my country we use comma to denote decimals. Ex:(0,52 instead of 0.52)
So in the second time I tried I used dot and works fine.

Minimum and maximum values of integer variable

Let's assume a very simple constraint: solve(x > 0 && x < 5).
Can Z3 (or any other SMT solver, or any other automatic technique)
compute the minimum and maximum values of (integer) variable x that satisfies the given constraints?
In our case, the minimum is 1 and the maximum is 4.
Z3 has not support for optimizing (maximizing/minimizing) objective functions or variables.
We plan to add this kind of capability, but it will not happen this year.
In the current version, we can "optimize" an objective function by solving several problems where in each iteration we add additional constraints. We know we found the optimal when the problem becomes unsatisfiable. Here is a small Python script that illustrates the idea. The script maximizes the value of a variable X. For minimization, we just have to replace s.add(X > last_model[X]) with s.add(X < last_model[X]). This script is very naive, it performs a "linear search". It can be improved in many ways, but it demonstrates the basic idea.
You can also try the script online at: http://rise4fun.com/Z3Py/KI1
See the following related question: Determine upper/lower bound for variables in an arbitrary propositional formula
from z3 import *
# Given formula F, find the model the maximizes the value of X
# using at-most M iterations.
def max(F, X, M):
s = Solver()
s.add(F)
last_model = None
i = 0
while True:
r = s.check()
if r == unsat:
if last_model != None:
return last_model
else:
return unsat
if r == unknown:
raise Z3Exception("failed")
last_model = s.model()
s.add(X > last_model[X])
i = i + 1
if (i > M):
raise Z3Exception("maximum not found, maximum number of iterations was reached")
x, y = Ints('x y')
F = [x > 0, x < 10, x == 2*y]
print max(F, x, 10000)
As Leonardo pointed out, this was discussed in detail before: Determine upper/lower bound for variables in an arbitrary propositional formula. Also see: How to optimize a piece of code in Z3? (PI_NON_NESTED_ARITH_WEIGHT related).
To summarize, one can either use a quantified formula, or go iteratively. Unfortunately, these techniques are not equivalent:
Quantified approach needs no iteration, and can find global min/max in a single call to the solver; at least in theory. However, it does give rise to harder formulas. So, the backend solver can time-out, or simply return "unknown".
Iterative approach creates simple formulas for the backend solver to deal with, but it can loop forever if there's no optimal value; simplest example being trying to find the largest Int value. Quantified version can solve this problem nicely by quickly telling you that there is no such value, while the iterative version would go on indefinitely. This can be a problem if you don't know ahead of time that your constraints do have an optimal solution. (Needless to say, the "sufficient" iteration count is typically hard to guess, and might depend on random factors, like the seed used by the solver.)
Also keep in mind that if there is a custom optimization algorithm for the problem domain at hand, it's unlikely that a general purpose SMT solver can outperform it.
z3 now supports optimization.
from z3 import *
o = Optimize()
x = Int( 'x' )
o.add(And(x > 0, x < 5))
o.maximize(x)
print(o.check()) # prints sat
print(o.model()) # prints [x = 4]
This particular problem is an integer program.

Resources