currency arbitrage and matrix - currency

I am very new to the field of quants, but I was wondering whether matrices can be used to identify arbitrage opportunities in multi-currency conversions. It seems like a kind of shortest-path or minimum-cost problem, as found in other problem sets.

This algorithms book explains (or hints, since it's an exercise), how to do it using logarithms then a classic shortest path. It was a fun problem.

For the question "are matrices useful to identify arbitrage opportunities available in multi-currency conversions?", the answer is yes. You would use a matrix to store each conversion rate from currency i to currency j in cell (i,j).
For the question "would an algorithm that finds such opportunities be similar to a shortest path finding problem?", the answer is also yes. Given the matrix for a problem, you would apply an algorithm that resembles, but is not identical to, the Floyd-Warshall algorithm.
For a full explanation have a look here.
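To make the two answers above concrete, here is a minimal sketch of the logarithm trick combined with a Floyd-Warshall-style pass. It assumes all rates are positive and that rates[i][i] is 1; the function name has_arbitrage, the tolerance, and the sample rates are made up for illustration. Taking -log(rate) turns a product of rates greater than 1 into a cycle with negative total weight, which the shortest-path pass then detects.

```python
import math

def has_arbitrage(rates, tol=1e-12):
    """Return True if some cycle of conversions multiplies to more than 1.

    rates[i][j] is the (assumed positive) rate from currency i to currency j.
    """
    n = len(rates)
    # Edge weight -log(rate): a product > 1 becomes a negative-weight cycle.
    dist = [[-math.log(rates[i][j]) for j in range(n)] for i in range(n)]
    # Floyd-Warshall relaxation over all intermediate currencies k.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    # A negative "distance" from a currency back to itself means arbitrage.
    return any(dist[i][i] < -tol for i in range(n))

# Hypothetical rates: 0 -> 1 -> 2 -> 0 multiplies to 0.9 * 0.9 * 1.3 = 1.053.
example_rates = [
    [1.0, 0.9, 0.7],
    [1.1, 1.0, 0.9],
    [1.3, 1.1, 1.0],
]
print(has_arbitrage(example_rates))
```

This only reports whether an opportunity exists. Recovering the actual cycle would need a predecessor matrix, which is left out here.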

Related

Comparison of two normal distribution

I have two normally distributed samples and want to know how close or similar they are. I tried a few methods to measure the similarity, such as the z-score and the Bhattacharyya distance.
The Bhattacharyya distance didn't work for me: it gives the same distance whenever the standard deviations of the two samples are the same, and it doesn't change when the mean changes.
Is there any method that takes the samples, or their means and standard deviations, and produces a similarity measure or similarity rank?
I am not from a mathematics background, so please excuse any terminology mistakes and let me know if any clarification is required.
I assume you're not looking for a relationship between the two samples, where a correlation coefficient would be appropriate?
I've been investigating a similar question for my current data and am looking at the Mahalanobis distance and the earth mover's distance.
I found this post from a different forum, which gave me a few ideas.
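As a rough illustration of two of the measures mentioned in this thread, the sketch below computes the closed-form Bhattacharyya distance between two univariate normal distributions and an empirical one-dimensional earth mover's distance between two equal-sized samples. The function names are made up for illustration, and the earth mover's version assumes both samples have the same number of points. Note that the closed form contains a term in the difference of the means, so with equal standard deviations the distance should still grow as the means move apart; if it did not, the implementation used may be worth rechecking.

```python
import math

def bhattacharyya_normal(mu1, sigma1, mu2, sigma2):
    """Bhattacharyya distance between N(mu1, sigma1^2) and N(mu2, sigma2^2)."""
    var1, var2 = sigma1 ** 2, sigma2 ** 2
    mean_term = 0.25 * (mu1 - mu2) ** 2 / (var1 + var2)
    spread_term = 0.5 * math.log((var1 + var2) / (2.0 * sigma1 * sigma2))
    return mean_term + spread_term

def emd_1d(sample_a, sample_b):
    """Earth mover's distance between two 1-D samples of equal size.

    For equal-sized samples this is the mean absolute difference
    between the sorted values.
    """
    if len(sample_a) != len(sample_b):
        raise ValueError("this sketch assumes equal-sized samples")
    pairs = zip(sorted(sample_a), sorted(sample_b))
    return sum(abs(a - b) for a, b in pairs) / len(sample_a)

# Same standard deviation, different means: the distance changes with the mean.
print(bhattacharyya_normal(0.0, 1.0, 1.0, 1.0))
print(bhattacharyya_normal(0.0, 1.0, 2.0, 1.0))
print(emd_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))
```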

How to measure similarity between code snippets written in a programming language

I am trying to solve the following problem.
Given a particular code snippet, I need to return the top review comments for it, that is, all the comments that were given to similar code snippets.
I am trying to frame this as a machine learning problem. I think we can use the KNN algorithm, but I am not sure how to measure the similarity between two code snippets. Is there any pre-existing similarity measure for this? I searched on Google but did not find any useful links.
Kindly help.
Edit distance between the two strings containing the considered comments could be a useful measure of similarity.
Also, n-gram cosine distance could be useful, that is, you extract the n-grams (e.g. 3-char-segments), build the vectors counting these n-grams and calculate cosine distance.
Another one would be Jaccard similarity between n-gram vectors (as above).
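The three measures suggested above could be sketched as follows, using only the standard library. The function names and the sample snippets are made up for illustration, and the n-gram size of 3 follows the "3-char-segments" example. Cosine distance would simply be 1 minus the cosine similarity returned here.

```python
import math
from collections import Counter

def edit_distance(s, t):
    """Levenshtein distance via the classic dynamic-programming table."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def char_ngrams(text, n=3):
    """Count the overlapping character n-grams in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine_similarity(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def jaccard_similarity(a, b):
    """Jaccard similarity between the sets of n-grams present."""
    union = set(a) | set(b)
    if not union:
        return 0.0
    return len(set(a) & set(b)) / len(union)

snippet_1 = "for i in range(n):"
snippet_2 = "for j in range(n):"
snippet_3 = "while True: pass"
print(edit_distance(snippet_1, snippet_2))
print(cosine_similarity(char_ngrams(snippet_1), char_ngrams(snippet_2)))
print(jaccard_similarity(char_ngrams(snippet_1), char_ngrams(snippet_3)))
```

Any of these could serve as the distance inside a KNN lookup. Which one works best for code is an empirical question that this sketch does not answer.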

Classifier or heuristics?

I need to classify questions that ask the user to specify a brand.
I have a set of samples featuring the word "brand".
Positives like:
"What is your favorite cosmetic brand?",
"Which fragrance brand (if any) do you think this advert is for?"...
and negatives like:
"Is there any particular reason why you chose this brand?"
Of course, it's possible to train a 2-class classifier based on concrete samples. However, precision and recall will be poor. Is there any way to construct something with good precision based on a variety of positive samples?
Precision and recall do not have to be poor. You should try to build a binary classifier (I would recommend an SVM or a decision tree for this purpose). I would recommend extracting features such as the number of occurrences of each word in a sample (or tf-idf), or the length of the words and sentences. I guess that the question word in the sentence will have a major impact on the classification.
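As a rough sketch of the tf-idf feature extraction mentioned above, the following builds one sparse feature vector per sample using only the standard library. The tokenizer, the function names, and the particular tf-idf weighting (term frequency times log(N/df) + 1) are assumptions chosen for illustration; libraries use several slightly different variants. The resulting vectors would then be fed to whichever classifier is chosen.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def tfidf_vectors(samples):
    """Return one {word: weight} dict per sample."""
    docs = [Counter(tokenize(s)) for s in samples]
    n_docs = len(docs)
    # Document frequency: in how many samples each word occurs.
    doc_freq = Counter(word for doc in docs for word in doc)
    idf = {w: math.log(n_docs / df) + 1.0 for w, df in doc_freq.items()}
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vectors.append({w: (c / total) * idf[w] for w, c in doc.items()})
    return vectors

samples = [
    "What is your favorite cosmetic brand?",
    "Is there any particular reason why you chose this brand?",
]
for vector in tfidf_vectors(samples):
    print(sorted(vector.items(), key=lambda item: -item[1])[:3])
```

Because "brand" occurs in every sample, it gets the lowest idf weight, so words such as the question word carry more of the signal.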
In addition, please note that a good precision value is very easy to get when you do not care about recall.
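The point about precision being easy to get without recall can be seen with a toy calculation. The labels below are made up for illustration: a very cautious classifier that flags only one sample it is sure about reaches perfect precision while missing most of the positives.

```python
def precision_recall(y_true, y_pred):
    """Compute (precision, recall) for binary labels given as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

y_true = [1, 1, 1, 1, 0, 0]      # hypothetical ground truth
cautious = [1, 0, 0, 0, 0, 0]    # flags only one sample as positive
print(precision_recall(y_true, cautious))
```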
Choosing a set of words as features using tf-idf and training a tree algorithm seems the easiest way to go, but I would also suggest trying k-means clustering in case one or more categories of answers considered "neutral" emerge. This will possibly help you decide which of these you consider positive or negative, so that you can re-factor your feature vector and subsequently your algorithm.
I am also a huge fan of HMM variants (I have used them to perform energy disaggregation) and I suggest you have a look at the following. It might give you some extra ideas:
http://www.merl.com/publications/docs/TR2004-085.pdf

Objective-C Quadratic/Polynomial regression (Linest function in excel)

The Objective-C math library seems pretty basic.
I'm looking for some statistical analysis functions, like the Excel function "linest", to retrieve the quadratic or polynomial regression of a data set for a given order.
Is there any function similar to "linest" for Objective-C? Or a known statistics library/framework?
I have a hard time believing I'm the first person to stumble upon this problem in iOS.
I spent several days getting through the math and getting it into code because I couldn't find a math library for iOS with the function I needed. I wouldn't recommend anyone do that again, it wasn't a walk in the park, so I published my solution on my GitHub. You can find it here:
https://github.com/KingIsulgard/iOS-Polynomial-Regression
It's easy to use, just give the x values and y values of the data and the order of polynomial you want to get and voila, you got it.
Hope this might help some people. Feel free to improve if you can. I'm just happy it finally worked.
The standard math library in general only gives you an interface to the elementary mathematical operations that are implemented in the FPU part of a CPU.
For linear regression you need either your own algorithm (it is not that complicated to implement in a handful of loops) or a dedicated library, most likely a statistics library.
Writing your own algorithm for higher order or general regression is simple if a QR decomposition algorithm is available, for instance via bindings for LAPACK or similar. Then to solve
minimize sum (b[0]*f[0](x[k])+...+b[n]*f[n](x[k])-y[k])^2
one has just to construct the matrix [X|Y] where X[k,j]=f[j](x[k]) is the matrix of the values of the ansatz functions and Y[k]=y[k] is the column vector of the values to approximate. Apply the QR algorithm to [X|Y], identify or extract the R factor from its result and solve for b in
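R1*b = r, that is, the first n+1 rows of R*[b|-1]'=0, where R1 is the leading (n+1)x(n+1) block of R and r holds the first n+1 entries of its last column,
via back-substitution. (The remaining diagonal entry of R is, up to sign, the norm of the residual.)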
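The QR recipe above can be sketched in a few dozen lines. This is Python rather than Objective-C and only meant to show the steps; it assumes monomials f[j](x) = x^j as ansatz functions and uses modified Gram-Schmidt for the QR step, where a real implementation would more likely call a LAPACK routine. The function names and sample data are made up for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def polyfit_qr(xs, ys, order):
    """Least-squares polynomial coefficients b[0..order], lowest power first."""
    # Columns of the augmented matrix [X|Y], with f_j(x) = x**j.
    columns = [[float(x) ** j for x in xs] for j in range(order + 1)]
    columns.append([float(y) for y in ys])
    m = len(columns)
    R = [[0.0] * m for _ in range(m)]
    Q = []
    # Modified Gram-Schmidt: orthogonalize each column against earlier ones.
    for j, col in enumerate(columns):
        v = col[:]
        for i, q in enumerate(Q):
            R[i][j] = dot(q, v)
            v = [vk - R[i][j] * qk for vk, qk in zip(v, q)]
        R[j][j] = math.sqrt(dot(v, v))  # for the Y column: residual norm
        if j < m - 1:
            if R[j][j] == 0.0:
                raise ValueError("columns of X are linearly dependent")
            Q.append([vk / R[j][j] for vk in v])
    # Back-substitution on the leading block: R1 * b = r.
    n = order + 1
    b = [0.0] * n
    for i in reversed(range(n)):
        s = R[i][n] - sum(R[i][k] * b[k] for k in range(i + 1, n))
        b[i] = s / R[i][i]
    return b

# Hypothetical data generated from y = 1 + 2x + 3x^2.
xs = [0, 1, 2, 3, 4]
ys = [1, 6, 17, 34, 57]
print(polyfit_qr(xs, ys, 2))
```

With order set to 1 the same function performs ordinary linear regression. Gram-Schmidt is fine for small, well-conditioned problems, but Householder-based QR from a numerical library is the safer choice for higher orders.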

What algorithm would you use for clustering based on people attributes?

I'm pretty new in the field of machine learning (even if I find it extremely interesting), and I wanted to start a small project where I'd be able to apply some stuff.
Let's say I have a dataset of persons, where each person has N different attributes (only discrete values, each attribute can be pretty much anything).
I want to find clusters of people who exhibit the same behavior, i.e. who have a similar pattern in their attributes ("look-alikes").
How would you go about this? Any thoughts to get me started?
I was thinking about using PCA, since we can have an arbitrary number of dimensions and it could be useful to reduce them. K-means? I'm not sure in this case. Any ideas on what would be best suited to this situation?
I do know how to code all those algorithms, but I'm truly missing some real world experience to know what to apply in which case.
K-means using the n-dimensional attribute vectors is a reasonable way to get started. You may want to play with your distance metric to see how it affects the results.
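A minimal sketch of that starting point is shown below. Since the attributes are discrete, it assumes they are one-hot encoded first, which is one common choice and not something the question specifies. The initialization (taking the first k points as centroids) is deliberately naive, and the function names and sample data are made up for illustration.

```python
def one_hot(rows):
    """Encode rows of discrete attributes as 0/1 vectors."""
    n_attrs = len(rows[0])
    vocab = [sorted({row[j] for row in rows}) for j in range(n_attrs)]
    return [
        [1.0 if row[j] == value else 0.0
         for j in range(n_attrs) for value in vocab[j]]
        for row in rows
    ]

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(point, centroids):
    return min(range(len(centroids)), key=lambda c: sq_dist(point, centroids[c]))

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm with a naive 'first k points' initialization."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    labels = [nearest(p, centroids) for p in points]
    return labels, centroids

people = [("red", "cat"), ("blue", "dog"), ("red", "cat"), ("blue", "dog")]
labels, _ = kmeans(one_hot(people), k=2)
print(labels)
```

In practice a random or k-means++ style initialization with several restarts would be more robust than taking the first k points.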
The first step in pretty much any clustering algorithm is to find a suitable distance function. Many algorithms, such as DBSCAN, can then be parameterized with this distance function (at least in a decent implementation; some, of course, only support Euclidean distance).
So start with considering how to measure object similarity!
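For purely discrete attributes, one simple candidate to start from is the normalized Hamming (simple matching) distance, which is the fraction of attributes on which two people differ. This is only one possible choice and the name hamming_distance is made up for illustration; a weighted variant could reflect that some attributes matter more than others.

```python
def hamming_distance(a, b):
    """Fraction of positions at which two attribute tuples differ."""
    if len(a) != len(b):
        raise ValueError("both records need the same number of attributes")
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)

alice = ("red", "cat", "yes")
bob = ("red", "dog", "no")
print(hamming_distance(alice, bob))
```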
In my opinion, you should also try the expectation-maximization (EM) algorithm. On the other hand, you must be careful when using PCA, because it may discard dimensions that are relevant to clustering.
