Example of factorization with the Montgomery curve - elliptic-curve

I have programmed the elliptic curve method for integer factorization using Montgomery curves(the same idea as Lenstra's elliptic curve method, just changed a bit so it works with Montgomey curves). However, I haven't really been able to find any examples of numbers being factorized using the method, and I would really like to be able to test it on numbers I know should give a result, in order to check if it works as it should. So my question is, does anyone have an example of the method used on numbers, so that I can see whether my code gives the same output using the same numbers?

You might like to factor the Mersenne number M(677) = 2^677-1 = 1943118631 * 531132717139346021081 * 978146583988637765536217 * P53 * P98. The P53 can be found by elliptic curve factorization with B1 = 9000000, B2 = 16000000, and lucky curve sigma = 8689346476060549. You might enjoy my blog, which gives a solution to that factorization and also has a bunch of other prime-number stuff if you want to poke around.

Related

How to perform a linear homography of an image in tensorflow

I would like to be able to replicate the behaviour of the opencv function warpPerspective which takes as input an image and an homography matrix, and projects the image according to the homography matrix (more details here : https://docs.opencv.org/2.4/modules/imgproc/doc/geometric_transformations.html).
It seems like tf.contrib.image.sparse_image_warp should do the job, but I am unable to replicate the behaviour of warpPerspective. The output I get is distorted in a non-linear fashion despite the use of the parameter interpolation_order=1.
With some further research, I suspect this is due to the fact that tf.contrib.image.interpolate_spline does not perform linear interpolation even when its order is 1 but rather uses some RBF kernels.
I can't see a way around this except encoding it with a dense_image_warp, but it seems a bit overkill and maybe costly. Does anyone has another solution ?
After some research, here is a solution. it uses the tf.contrib.image.dense_image_warp function and is not really pretty, but still, it works :
This first function computes the optical flow needed to perform the homography :
def homography_matrix_to_flow(tf_homography_matrix, im_shape1, im_shape2):
Y, X = np.meshgrid(range(im_shape1), range(im_shape2))
Z = np.ones_like(X)
XYZ = np.stack((X, Y, Z), axis=-1)
tf_XYZ = tf.constant(XYZ.astype("float64"))
tf_XYZ = tf_XYZ[tf.newaxis,:,:, :, tf.newaxis]
tf_homography_matrix = tf.tile(tf_homography_matrix[tf.newaxis, tf.newaxis], (1, im_shape2, im_shape1, 1, 1))
tf_unnormalized_transformed_XYZ = tf.matmul(tf_homography_matrix, tf_XYZ, transpose_b=False)
tf_transformed_XYZ = tf_unnormalized_transformed_XYZ / tf_unnormalized_transformed_XYZ[:,:,:, -1][:,:,:, tf.newaxis]
flow = -tf.squeeze(tf_transformed_XYZ-tf_XYZ)[..., :2]
return flow
Then, it used to warp the original image to the distorted image.
There is one trick : due to how the tf.contrib.image.dense_image_warp function works, you need to pass the inverse of the homography matrix to find the correct optical flow to use.
homography_matrix = np.array([[-4.86219067e-01, -2.20871298e+00, 4.08214879e+02],
[-1.02940133e-01, -5.60378659e+00, 3.87573763e+02],
[-1.35051362e-04, -6.59600583e-03, 2.91244998e-01]])
inv_homography_matrix = np.linalg.inv(homography_matrix)
tf_inv_homography_matrix = tf.constant(inv_homography_matrix)[tf.newaxis]
flow = homography_matrix_to_flow(tf_inv_homography_matrix, img.shape[1], img.shape[2])[tf.newaxis]
flow =tf.tile(flow, (self.bs, 1,1,1))
image_warped = tf.contrib.image.dense_image_warp(tf.transpose(img, (0,2,1,3)), flow)
image_warped = tf.transpose(image_warped, (0,2,1,3))
I still hope to find a better answer (one which does not have to compute a whole tensor of flow), therefore, I leave the question unanswered for now.

Why the hypothesis has to introduce two parameters, namely θ0 and θ1

I was learning Machine Learning from this course on Coursera taught by Andrew Ng. The instructor defines the hypothesis as a linear function of the "input" (x, in my case) like the following:
hθ(x) = θ0 + θ1(x)
In supervised learning, we have some training data and based on that we try to "deduce" a function which closely maps the inputs to the corresponding outputs. To deduce the function, we introduce the hypothesis as a linear function of input (x). My question is, why the function involving two θs is chosen? Why it can't be as simple as y(i) = a * x(i) where a is a co-efficient? Later we can go about finding a "good" value of a for a given example (i) using an algorithm? This question might look very stupid. I apologize but I'm not very good at machine learning I am just a beginner. Please help me understand this.
Thanks!
The a corresponds to θ1. Your proposed linear model is leaving out the intercept, which is θ0.
Consider an output function y equal to the constant 5, or perhaps equal to a constant plus some tiny fraction of x which never exceeds .01. Driving the error function to zero is going to be difficult if your model doesn't have a θ0 that can soak up the D.C. component.

Explanation for Values in Scharr-Filter used in OpenCV (and other places)

The Scharr-Filter is explained in Scharrs dissertation. However the values given on page 155 (167 in the pdf) are [47 162 47] / 256. Multiplying this with the derivation-filter would yield:
Yet all other references I found use
Which is roughly the same as the ones given by Scharr, scaled by a factor of 32.
Now my guess is that the range can be represented better, but I'm curious if there is an official explanation somewhere.
To get the ball rolling on this question in case no "expert" can be found...
I believe the values [3, 10, 3] ... instead of [47 162 47] / 256 ... are used simply for speed. Recall that this method is competing against the Sobel Operator whose coefficient values are are 0, and positive/negative 1's and 2's.
Even though the divisor in the division, 256 or 512, is a power of 2 and can can be performed by a shift, doing that and multiplying by 47 or 162 is going to take more time. A multiplication by 3 however can in fact be done on some RISC architectures like the IBM POWER series in a single shift-and-add operation. That is 3x = (x << 1) + x. (On these architectures, the shifter and adder are separate units and can be done independently).
I don't find it surprising that Phd paper used the more complicated and probably more precise formula; it needed to prove or demonstrate something, and the author probably wasn't totally certain or concerned that it be used and implemented alongside other methods. The purpose in the thesis was probably to have "perfect rotational symmetry". Afterwards when one decides to implement it, that person I suspect used the approximation formula and gave up a little on perfect rotational symmetry, to gain speed. That person's goal as I said was to have something that was competitive at the expense of little bit of speed for this rotational stuff.
Since I'm guessing you are willing to do work this as it is your thesis, my suggestion is to implement the original algorithm and benchmark it against both the OpenCV Scharr and Sobel code.
The other thing to try to get an "official" answer is: "Use the 'source', Luke!". The code is on github so check it out and see who added the Scharr filter there and contact that person. I won't put the person's name here, but I will say that the code was added 2010-05-11.

setting nls parameters for a curve

I am a relative newcomer to R and not a mathematician but a geneticist. I have many sets of multiple pairs of data points. When they are plotted they yield a flattened S curve with most of the data points ending up near the zero mark. A minority of the data points fly far off creating what is almost two J curves, one down and one up. I need to find the inflection points where the data sharply veers upward or downward. This may be an issue with my math but is seems to me that if I can smooth and fit a curve to the line and get an equation I could then take the second derivative of the curve and determine the inflection points from where the second derivative changes sign. I tried it in excel and used the curve to get approximate fit to get the starting formula but the data has a bit of "wiggling" in it so determining any one inflection point is not possible even if I wanted to do it all manually (which I don't). Each of the hundreds of data sets that I have to find these two inflection points in will yield about the same curve but will have slightly different inflection points and determining those inflections points precisely is absolutely critical to the problem. So if I can set it up properly once in an equation that should do it. For simplicity I would like to break them into the positive curve and the negative curve and do each one separately. (Maybe there is some easier formula for s curves that makes that a bad idea?)
I have tried reading the manual and it's kind of hard to understand likely because of my weak math skills. I have also been unable to find any similar examples I could study from.
This is the head of my data set:
x y
[1,] 1 0.00000000
[2,] 2 0.00062360
[3,] 3 0.00079720
[4,] 4 0.00085100
[5,] 5 0.00129020
(X is just numbering 1 to however many data points and the number of X will vary a bit by the individual set.)
This is as far as I have gotten to resolve the curve fitting part:
pos_curve1 <- nls(curve_fitting ~ (scal*x^scal),data = cbind.data.frame(curve_fitting),
+ start = list(x = 0, scal = -0.01))
Error in numericDeriv(form[[3L]], names(ind), env) :
Missing value or an infinity produced when evaluating the model
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
Am I just doing the math the hard way? What am I doing wrong with the nls? Any help would be much much appreciated.
Found it. The curve is exponential not J and the following worked.
fit <- nls(pos ~ a*tmin^b,
data = d,
start = list(a = .1, b = .1),
trace = TRUE)
Thanks due to Jorge I Velez at R Help Oct 26, 2009
Also I used "An Appendix to An R Companion to Applied Regression, second edition" by John Fox & Sanford Weisberg last revision 13: Dec 2010.
Final working settings for me were:
fit <- nls(y ~ a*log(10)^(x*b),pos_curve2,list(a = .01, b = .01), trace=TRUE)
I figured out what the formula should be by using open office spread sheet and testing the various curve fit options until I was able to show exponential was the best fit. I then got the structure of the equation from that. I used the Fox & Sanford article to understand the way to set the parameters.
Maybe I am not alone in this one but I really found it hard to figure out the parameters and there were few references or questions on it that helped me.

Recommended anomaly detection technique for simple, one-dimensional scenario?

I have a scenario where I have several thousand instances of data. The data itself is represented as a single integer value. I want to be able to detect when an instance is an extreme outlier.
For example, with the following example data:
a = 10
b = 14
c = 25
d = 467
e = 12
d is clearly an anomaly, and I would want to perform a specific action based on this.
I was tempted to just try an use my knowledge of the particular domain to detect anomalies. For instance, figure out a distance from the mean value that is useful, and check for that, based on heuristics. However, I think it's probably better if I investigate more general, robust anomaly detection techniques, which have some theory behind them.
Since my working knowledge of mathematics is limited, I'm hoping to find a technique which is simple, such as using standard deviation. Hopefully the single-dimensioned nature of the data will make this quite a common problem, but if more information for the scenario is required please leave a comment and I will give more info.
Edit: thought I'd add more information about the data and what I've tried in case it makes one answer more correct than another.
The values are all positive and non-zero. I expect that the values will form a normal distribution. This expectation is based on an intuition of the domain rather than through analysis, if this is not a bad thing to assume, please let me know. In terms of clustering, unless there's also standard algorithms to choose a k-value, I would find it hard to provide this value to a k-Means algorithm.
The action I want to take for an outlier/anomaly is to present it to the user, and recommend that the data point is basically removed from the data set (I won't get in to how they would do that, but it makes sense for my domain), thus it will not be used as input to another function.
So far I have tried three-sigma, and the IQR outlier test on my limited data set. IQR flags values which are not extreme enough, three-sigma points out instances which better fit with my intuition of the domain.
Information on algorithms, techniques or links to resources to learn about this specific scenario are valid and welcome answers.
What is a recommended anomaly detection technique for simple, one-dimensional data?
Check out the three-sigma rule:
mu = mean of the data
std = standard deviation of the data
IF abs(x-mu) > 3*std THEN x is outlier
An alternative method is the IQR outlier test:
Q25 = 25th_percentile
Q75 = 75th_percentile
IQR = Q75 - Q25 // inter-quartile range
IF (x < Q25 - 1.5*IQR) OR (Q75 + 1.5*IQR < x) THEN x is a mild outlier
IF (x < Q25 - 3.0*IQR) OR (Q75 + 3.0*IQR < x) THEN x is an extreme outlier
this test is usually employed by Box plots (indicated by the whiskers):
EDIT:
For your case (simple 1D univariate data), I think my first answer is well suited.
That however isn't applicable to multivariate data.
#smaclell suggested using K-means to find the outliers. Beside the fact that it is mainly a clustering algorithm (not really an outlier detection technique), the problem with k-means is that it requires knowing in advance a good value for the number of clusters K.
A better suited technique is the DBSCAN: a density-based clustering algorithm. Basically it grows regions with sufficiently high density into clusters which will be maximal set of density-connected points.
DBSCAN requires two parameters: epsilon and minPoints. It starts with an arbitrary point that has not been visited. It then finds all the neighbor points within distance epsilon of the starting point.
If the number of neighbors is greater than or equal to minPoints, a cluster is formed. The starting point and its neighbors are added to this cluster and the starting point is marked as visited. The algorithm then repeats the evaluation process for all the neighbors recursively.
If the number of neighbors is less than minPoints, the point is marked as noise.
If a cluster is fully expanded (all points within reach are visited) then the algorithm proceeds to iterate through the remaining unvisited points until they are depleted.
Finally the set of all points marked as noise are considered outliers.
There are a variety of clustering techniques you could use to try to identify central tendencies within your data. One such algorithm we used heavily in my pattern recognition course was K-Means. This would allow you to identify whether there are more than one related sets of data, such as a bimodal distribution. This does require you having some knowledge of how many clusters to expect but is fairly efficient and easy to implement.
After you have the means you could then try to find out if any point is far from any of the means. You can define 'far' however you want but I would recommend the suggestions by #Amro as a good starting point.
For a more in-depth discussion of clustering algorithms refer to the wikipedia entry on clustering.
This is an old topic but still it lacks some information.
Evidently, this can be seen as a case of univariate outlier detection. The approaches presented above have several pros and cons. Here are some weak spots:
Detection of outliers with the mean and sigma has the obvious disadvantage of dependence of mean and sigma on the outliers themselves.
The case of the small sample limit (see question for example) is not adequately covered by, 3 sigma, K-Means, IQR etc.
And I could go on... However the statistical literature offers a simple metric: the median absolute deviation. (Medians are insensitive to outliers)
Details can be found here: https://www.sciencedirect.com/book/9780128047330/introduction-to-robust-estimation-and-hypothesis-testing
I think this problem can be solved in a few lines of python code like this:
import numpy as np
import scipy.stats as sts
x = np.array([10, 14, 25, 467, 12]) # your values
np.abs(x - np.median(x))/(sts.median_abs_deviation(x)/0.6745) #MAD criterion
Subsequently you reject values above a certain threshold (97.5 percentile of the distribution of data), in case of an assumed normal distribution the threshold is 2.24. Here it translates to:
array([ 0.6745 , 0. , 1.854875, 76.387125, 0.33725 ])
or the 467 entry being rejected.
Of course, one could argue, that the MAD (as presented) also assumes a normal dist. Therefore, why is it that argument 2 above (small sample) does not apply here? The answer is that MAD has a very high breakdown point. It is easy to choose different threshold points from different distributions and come to the same conclusion: 467 is the outlier.
Both three-sigma rule and IQR test are often used, and there are a couple of simple algorithms to detect anomalies.
The three-sigma rule is correct
mu = mean of the data
std = standard deviation of the data
IF abs(x-mu) > 3*std THEN x is outlier
The IQR test should be:
Q25 = 25th_percentile
Q75 = 75th_percentile
IQR = Q75 - Q25 // inter-quartile range
If x > Q75 + 1.5 * IQR or x < Q25 - 1.5 * IQR THEN x is a mild outlier
If x > Q75 + 3.0 * IQR or x < Q25 – 3.0 * IQR THEN x is a extreme outlier

Resources