Difference of quasiconcave functions has unknown curvature (CVXPY) - cvxpy

I am attempting to model an allocation problem via (quasi)convex optimization: given a matrix of unknowns X containing the amount of each product at each moment, and a matrix C of the corresponding values, I want to maximize the resulting income cp.sum(cp.multiply(X, C)).
Among close/equivalent solutions, I want the one with the fewest different product types in inventory, so I add - cp.sum(cp.maximum(0, cp.sign(X))) as a penalty, which counts the number of nonzero entries in X.
According to CVXPY, both functions are quasiconcave (the first is affine, the second quasilinear), but when I combine them linearly, cp.sum(cp.multiply(X, C)) - cp.sum(cp.maximum(0, cp.sign(X))), the resulting problem has UNKNOWN curvature, and CVXPY refuses to solve it because it is not 'DQCP'.
The problem is simplified here: there are additional parameters and weights, and the constraints bind X to be non-negative and to satisfy space availability, but this simple version reproduces the unexpected behavior.
Is the bug in cvxpy or in my math?

The difference of two quasiconcave functions is not necessarily quasiconcave, so this operation is not permitted in CVXPY. Note that the DQCP rules, like the DCP rules, are only sufficient conditions: even if a particular difference happened to be quasiconcave, CVXPY's composition analysis would still report UNKNOWN curvature, so this is expected behavior rather than a bug.
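For reference, a minimal sketch that reproduces the curvatures reported in the question (the 3x4 shape and the random C are illustrative assumptions, not the original data):

import cvxpy as cp
import numpy as np

np.random.seed(0)
C = np.random.rand(3, 4)                      # known product values (assumed data)
X = cp.Variable((3, 4), nonneg=True)          # amount of each product at each moment

income = cp.sum(cp.multiply(X, C))            # reported as AFFINE
penalty = cp.sum(cp.maximum(0, cp.sign(X)))   # reported as QUASILINEAR (counts nonzero entries)

print(income.curvature, penalty.curvature)    # per the question: AFFINE, QUASILINEAR
print((income - penalty).curvature)           # UNKNOWN: no DQCP rule covers this difference
print((income - penalty).is_dqcp())           # False, so the problem is rejected at solve time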

Related

Neo4j floating point sum different results

I am using neo4j to calculate some statistics on a data set. For that I am often using sum on a floating point value. I am getting different results depending on the circumstances. For example, a query that does this:
...
WITH foo
ORDER BY foo.fooId
RETURN SUM(foo.Weight)
returns a different result than the query that simply does the sum:
...
RETURN SUM(foo.Weight)
The differences are minuscule (293.07724195098984 vs 293.07724195099007), but they are enough to make simple equality checks fail. Another example: a different instance of the database, loaded with the same data using the same loading process, can produce the same issue (the DBs might not be 1:1, and the load order of some relations might differ). I took the raw values that Neo4j sums (by simply removing the SUM()) and verified that they are the same in all cases (different DBs, and ordered vs. not ordered).
What are my options here? I don't mind losing some precision (I already tried cutting the precision down from 15 to 12 decimal places, but that did not seem to help), but I need the results to match up.
Because of rounding errors, floating-point addition is not associative: (a+b)+c != a+(b+c).
The result of every operation is rounded to fit the float encoding constraints, so (a+b)+c is effectively computed as round(round(a+b) + c), while a+(b+c) is computed as round(a + round(b+c)).
As an illustration, consider the operation 2^-100 + 1 - 1. Evaluated as (2^-100 + 1) - 1, it returns 0, because 1 + 2^-100 would require more precision than IEEE 754 floats or doubles provide and can only be encoded as 1.0. Evaluated as 2^-100 + (1 - 1), it correctly returns 2^-100, which can be encoded by either floats or doubles.
This is a trivial example, but such rounding errors can occur after every operation, which explains why floating-point operations are not associative.
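You can check this example directly in Python (IEEE 754 doubles):

a = 2.0 ** -100
print((a + 1.0) - 1.0)    # 0.0 -- 1 + 2^-100 rounds to 1.0
print(a + (1.0 - 1.0))    # 7.888609052210118e-31 -- exactly 2^-100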
Databases generally do not return data in a guaranteed order, and depending on the actual order, the operations are performed differently; that explains the behaviour you are seeing.
For this reason, it is generally not a good idea to do equality comparisons on floats. The usual advice is to replace a == b with a check that abs(a-b) is "sufficiently" small.
"Sufficiently" depends on your algorithm. Floats are good for roughly 6-7 significant decimal digits and doubles for 15-16 (the latter is presumably what your DB uses). Depending on the number of computations, the last 1-3 digits may be affected.
The best is probably to use
abs(a-b) < relative_error * max(abs(a), abs(b))
where relative_error must be adjusted to your problem. Something around 10^-13 is probably correct, but you must experiment, as rounding errors depend on the number of computations, on the dispersion of the values, and on what you consider "equal" for your problem.
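As a small sketch of that comparison (the 1e-13 tolerance is only a starting point and must be tuned to your data):

def approx_equal(a, b, rel_err=1e-13):
    # equal if the difference is small relative to the magnitudes involved
    return abs(a - b) <= rel_err * max(abs(a), abs(b))

print(293.07724195098984 == 293.07724195099007)              # False
print(approx_equal(293.07724195098984, 293.07724195099007))  # True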
Look at this site for a discussion of comparison methods, and read What Every Computer Scientist Should Know About Floating-Point Arithmetic by David Goldberg, which discusses, among other things, these issues.

General Linear Model - Repeated Measures with Covariates, Estimated Marginal Means are not adjusting? Bug?

I am running a Repeated Measures two-way ANCOVA. The model produces an Estimated Marginal Means table, but the values are exactly the same (to the hundredths decimal place) as the Means in the descriptive statistics, despite there being a note at the bottom of the EMM table indicating that "the covariates appearing in the model are evaluated at the following values:..."
Is this a bug, or could I be doing something wrong?
Update:
Responding to the question below: I should note that I used the drop-down menus to run the analysis; however, this is the syntax that is produced when I 'paste' it.
DATASET ACTIVATE DataSet1.
GLM FT10 FT11 FT12 FT13 FT14 FT15 FT16 FT17 FT18 FT19 FT110 FT111 FT20 FT21 FT22 FT23 FT24 FT25 FT26 FT27 FT28 FT29 FT210 FT211 WITH SpatialScore FPSRTScore LDMean VGTotal
/WSFACTOR=Matching 2 Polynomial Trial 12 Polynomial
/METHOD=SSTYPE(3)
/PLOT=PROFILE(Trial*Matching) TYPE=LINE ERRORBAR=CI MEANREFERENCE=NO AXIS=AUTO
/EMMEANS=TABLES(OVERALL) WITH(SpatialScore=MEAN FPSRTScore=MEAN LDMean=MEAN VGTotal=MEAN)
/EMMEANS=TABLES(Matching) WITH(SpatialScore=MEAN FPSRTScore=MEAN LDMean=MEAN VGTotal=MEAN)COMPARE ADJ(BONFERRONI)
/EMMEANS=TABLES(Trial) WITH(SpatialScore=MEAN FPSRTScore=MEAN LDMean=MEAN VGTotal=MEAN)COMPARE ADJ(BONFERRONI)
/EMMEANS=TABLES(Matching*Trial) WITH(SpatialScore=MEAN FPSRTScore=MEAN LDMean=MEAN VGTotal=MEAN)
/PRINT=DESCRIPTIVE ETASQ
/CRITERIA=ALPHA(.05)
/WSDESIGN=Matching Trial Matching*Trial
/DESIGN=SpatialScore FPSRTScore LDMean VGTotal.
This is expected behavior. The reason that the EMMEANS don't differ from the observed means is that the covariate adjustment is done at the cell level in terms of between-subjects effects, and you have only one cell because you don't have any between-subjects factors.
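As a toy numeric sketch (made-up numbers, not your data) of why the adjustment vanishes when there is only one cell: the EMM evaluates the covariate at its own mean, so the adjustment term is b * (mean(cov) - mean(cov)) = 0 and the adjusted mean equals the observed mean.

import numpy as np

y = np.array([3.0, 4.0, 5.0, 6.0])      # one within-subjects measure (illustrative)
cov = np.array([1.2, 2.4, 3.1, 4.8])    # a covariate (illustrative)
b = np.polyfit(cov, y, 1)[0]            # slope of y on the covariate

emm = y.mean() + b * (cov.mean() - cov.mean())   # covariate evaluated at its mean
print(emm, y.mean())                             # identical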

(Sub)optimal way to get a legit range info when using a SMT constraint with Z3

This question is related to my previous question
Is it possible to get a legit range info when using a SMT constraint with Z3
So it seems that "efficiently" finding the maximal range info is not feasible, given typical 32-bit vectors and so on. On the other hand, I am wondering whether it is feasible to find a "sub-maximal" range, which would hopefully be more efficient. I would also like a "safety" guarantee: every element in the sub-maximal range must satisfy the constraint, although there may exist other solutions outside the range that satisfy it as well.
I am currently exploring whether model counting techniques could make sense in this setting. Any thoughts would be appreciated very much. Thanks.
General case
This is not just a question of efficiency. Consider a problem where you have two variables a and b, and a single constraint:
a != b
What's the range of b? (maximum or otherwise?)
You can say all values are legitimate. But that would be wrong, as obviously the choice of a impacts the choice of b. The more variables you have around, the more complicated the problem will become. I don't think the problem is even well defined in this case, so searching for a solution (efficient or otherwise) doesn't make much sense.
Single variable assumption
Having said that, I think you can come up with a solution if you assume there's precisely one variable in the system. (Or, alternatively, if you fix all the other variables to some predefined constants.) If you're willing to go down this path, then you can implement a binary search algorithm to find a reasonably sized range by simply proving the quantified formula
Exists([b], And(b >= minBound, b <= maxBound, Not(constraints)))
Once you get unsat for this, you have your range. So long as you get sat, you can adjust your minBound/maxBound to search within smaller ranges. In the worst case, this can turn into a linear walk, but you can "cut down" this search by making sure you go down by a significant amount in each step. That step size could be a parameter to the whole search, depending on how large you want your intervals to be. It will have to be a trade-off between trying to find a maximal range and how long you want to spend in this search. Of course, if you cut down too much, you can miss a big interval, but that's the cost of efficiency.
Example 1 (Good case) There's a single constraint that says b != 5. Then your search will be quick, and depending on which branch you go down, you'll either find [0, 4] or [6, 255], assuming 8-bit words.
Example 2 (Bad case) There's a single constraint that says b is even. Then your search will exhibit worst-case behavior, and if your "cut-down" size is 1, you may iterate 255 times before you settle on [0, 0], assuming z3 gives you the maximum odd number in each call.
I hope that illustrates the point. In general, though, I'd assume you'd be closer to the "good case" for practical applications and even if your cut-down size is minimal you can most likely converge in a few iterations. Of course, this entirely depends on your problem domain, but I'd expect it to hold for software analysis in general.
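For concreteness, here is a rough z3py sketch of this search for Example 1 (an 8-bit b with the single constraint b != 5; only the upper bound is shrunk here, and the cut-down step is just one possible choice):

from z3 import And, BitVec, BitVecVal, Exists, Not, Solver, UGE, ULE, sat

b = BitVec('b', 8)
constraints = b != 5                       # Example 1's single constraint

def range_is_safe(lo, hi):
    # unsat means no value in [lo, hi] violates the constraints
    s = Solver()
    s.add(Exists([b], And(UGE(b, BitVecVal(lo, 8)),
                          ULE(b, BitVecVal(hi, 8)),
                          Not(constraints))))
    return s.check() != sat

lo, hi = 0, 255
while hi >= lo and not range_is_safe(lo, hi):
    hi -= max(1, (hi - lo) // 2)           # the "cut-down" step
print([lo, hi])                            # [0, 4] for this constraint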

Converting a apriori object to a list taking more time even for small number of data

I am working on a data set of more than 22,000 records, and when I tried it with the apriori model, it takes way too much time even for a small number of records like 20. Is there a problem in my code, or is there a faster way to convert the associations into a list? The code I used is below.
for i in range(0, 20):
    transactions.append([str(dataset.values[i, j]) for j in range(0, 543)])
from apyori import apriori
associations = apriori(transactions, min_support=0.004, min_confidence=0.3, min_lift=3, min_length=2)
result = list(associations)
It's difficult to assess without your data, but the complexity of Apriori depends on a number of factors, including your support threshold, the number of transactions, the number of items, the average/max transaction length, etc.
In cases where even a small number of transactions takes a long time to run, it's often a matter of too low a minimum support. When support is very low (near 0) the algorithm is effectively still brute forcing, since it has to look at all possible combinations of items, of every length. This is the equivalent of a mathematical power set, which grows exponentially. For just 41 items you're actually trying 2^41 - 1 possible combinations, which is nearly 2.2 TRILLION possibilities.
I recommend starting with a "high" min_support at first (e.g. 0.20) and then working your way down slowly, as sketched below. It's easier to test things that take seconds at first than things that will take a long time.
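A rough sketch of that approach, reusing the transactions list from your code (the support values are illustrative, and min_length is dropped; see the note below):

import time
from apyori import apriori

for support in (0.20, 0.10, 0.05, 0.02, 0.01):
    start = time.time()
    rules = list(apriori(transactions, min_support=support,
                         min_confidence=0.3, min_lift=3))
    print("min_support=%.2f: %d rules in %.1fs" % (support, len(rules), time.time() - start))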
Other important note: there is no min_length parameter in Apyori. I'm not sure where everyone is getting that from (you're not alone in thinking there is one), unless it's from one random blog post I found. The parameters are as follows (straight from the code):
Keyword arguments:
min_support -- The minimum support of relations (float).
min_confidence -- The minimum confidence of relations (float).
min_lift -- The minimum lift of relations (float).
max_length -- The maximum length of the relation (integer).
For what it's worth, I wrote unofficial docs for Apyori that can be found here.

Non-Evaluation of Numerical Expression in Maxima

I start with a simple Maxima question, the answer to which may provide the answer to the actual problem I'm grappling with.
Related Simple Question:
How can I get maxima to calculate:
bfloat((1+%i)^0.3);
Might there be an option variable that can be set so that this evaluates to a complex number?
Actual Question:
I am evaluating approximations for numerical time integration for finite element methods. For this purpose I'm using spectral analysis, which requires the calculation of the eigenvalues of a 4 x 4 matrix. This matrix "cav" is also calculated within Maxima, using some of Maxima's algebra capabilities but substituting numerical values, so the matrix is entirely numerical, i.e. it contains no variables. I've calculated the eigenvalues with Mathematica and it returns 4 real eigenvalues. However, Maxima calculates horrendously complicated expressions for this case, which it apparently does not "know" how to simplify, even numerically as "bigfloat". Perhaps this problem arises because Maxima first approximates the matrix "cav" by rational numbers (i.e. fractions) and then tries to solve the problem fully exactly, instead of simply using numerical "bigfloat" computations throughout. Is there a way I can change this?
Note that if you only change the input value of gzv to, say, 0.5, it works fine and returns numerical values of complex eigenvalues.
I include the code below. Note that all of the code up until "cav:subst(vs,ca)$" is just for the definition of the matrix cav and seems to work fine. It is in the few statements thereafter that it fails to calculate numerical values for the eigenvalues.
v1:v0+ (1-gg)*a0+gg*a1$
d1:d0+v0+(1/2-gb)*a0+gb*a1$
obf:a1+(1+ga)*(w^2*d1 + 2*gz*w*(d1-d0)) -
ga *(w^2*d0 + 2*gz*w*(d0-g0))$
obf:expand(obf)$
cd:subst([a1=1,d0=0,v0=0,a0=0,g0=0],obf)$
fd:subst([a1=0,d0=1,v0=0,a0=0,g0=0],obf)$
fv:subst([a1=0,d0=0,v0=1,a0=0,g0=0],obf)$
fa:subst([a1=0,d0=0,v0=0,a0=1,g0=0],obf)$
fg:subst([a1=0,d0=0,v0=0,a0=0,g0=1],obf)$
f:[fd,fv,fa,fg]$
cad1:expand(cd*[1,1,1/2-gb,0] - gb*f)$
cad2:expand(cd*[0,1,1-gg,0] - gg*f)$
cad3:expand(-f)$
cad4:[cd,0,0,0]$
cad:matrix(cad1,cad2,cad3,cad4)$
gav:-0.05$
ggv:1/2-gav$
gbv:(ggv+1/2)^2/4$
gzv:1.1$
dt:0.01$
wv:bfloat(dt*2*%pi)$
vs:[ga=gav,gg=ggv,gb=gbv,gz=gzv,w=wv]$
cav:subst(vs,ca)$
cav:bfloat(cav)$
evam:eigenvalues(cav)$
evam:bfloat(evam)$
eva:evam[1]$
The main problem here is that Maxima tries pretty hard to make computations exact, and it's hard to tell it to ease up and allow inexact results.
Is there a mistake in the code you posted above? You have cav:subst(vs,ca) but ca is not defined. Is that supposed to be cav:subst(vs,cad) ?
For the short problem, usually rectform can simplify complex expressions to something more usable:
(%i58) rectform (bfloat((1+%i)^0.3));
`rat' replaced 1.0B0 by 1/1 = 1.0B0
(%o58) 2.59023849130283b-1 %i + 1.078911979230303b0
About the long problem, if fixed-precision (i.e. ordinary floats, not bigfloats) is acceptable to you, then you can use the LAPACK function dgeev to compute eigenvalues and/or eigenvectors.
(%i51) load (lapack);
<bunch of messages here>
(%o51) /usr/share/maxima/5.39.0/share/lapack/lapack.mac
(%i52) dgeev (cav);
(%o52) [[- 0.02759949957202372, 0.06804641655485913, 0.997993508502892, 0.928429191717788], false, false]
If you really need variable precision, I don't know what to try. In principle it's possible to rework the LAPACK code to work with variable-precision floats, but that's a substantial task and I'm not sure about the details.
