I am looking for theory, algorithms, and similar resources on how to compare music. More specifically, I am looking into how to dupe-check music tracks that have different bitrates or are slightly different variations (radio vs album version), but otherwise sound the same.
Use cases for this include services such as Grooveshark, YouTube, etc., which get a lot of duplicate tracks. I am also interested in text comparisons (Britney Spers vs Britney Spears, how far they deviate, etc.), although this is secondary and I already have some sources to go on in this area.
I am mostly interested in codec-agnostic comparison techniques and algorithms (assuming a "raw" stream), but codec-specific resources are appreciated.
I am aware of projects such as musicbrainz.org, but have not investigated them further, and would be interested to know whether such projects could be of help in this endeavor.
As far as comparing names is concerned, you might want to take a look at the Levenshtein distance algorithm. Given two strings, it calculates a distance measure that can be used as a basis for catching duplicates.
I personally have used it in a tool I developed for an application with a rather large database that had a large number of duplicates in it. Using it in conjunction with some other data comparisons relevant to my domain, I was able to point my tool at the application database and quickly find many of the duplicated records. Not going to lie, I thought it was pretty darn cool to see in action.
It's even quick to implement; here's a C# version:
public int CalculateDistance(string s, string t) {
    int n = s.Length; // length of s
    int m = t.Length; // length of t
    int[,] d = new int[n + 1, m + 1]; // distance matrix
    int cost; // substitution cost

    // Step 1: handle empty strings up front
    if (n == 0) return m;
    if (m == 0) return n;

    // Step 2: initialize the first column and row
    for (int i = 0; i <= n; i++) d[i, 0] = i;
    for (int j = 0; j <= m; j++) d[0, j] = j;

    // Step 3: iterate over the characters of s
    for (int i = 1; i <= n; i++) {
        // Step 4: iterate over the characters of t
        for (int j = 1; j <= m; j++) {
            // Step 5: the substitution cost is 0 if the characters match, 1 otherwise
            cost = (t[j - 1] == s[i - 1]) ? 0 : 1;
            // Step 6: take the cheapest of deletion, insertion, and substitution
            d[i, j] = System.Math.Min(
                System.Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
                d[i - 1, j - 1] + cost);
        }
    }
    // Step 7: the distance is the bottom-right cell
    return d[n, m];
}
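For instance, a hypothetical call against the method above, using the spelling variants from the question:

int d1 = CalculateDistance("Britney Spers", "Britney Spears");  // 1 -- one missing letter
int d2 = CalculateDistance("Britney Spears", "Britney Spears"); // 0 -- identical strings

A small threshold on this distance (relative to the string length) is usually enough to flag likely duplicates.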
I wrote a similar answer here: Music Recognition and Signal Processing.
In the research community, the problem of finding similarity between two signals (up to environmental distortions such as noise, mild variations in tempo, pitch, or bitrate) is known as audio (or music) fingerprinting. This topic has been studied heavily for at least a decade. This early (and oft cited) paper by Haitsma and Kalker clearly describes the problem and proposes a simple solution.
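To give a flavour of the features involved, here is a small C# sketch of the sub-fingerprint idea from the Haitsma and Kalker paper. It assumes the per-frame band energies (33 log-spaced bands in the paper) have already been computed by an FFT front end, which is omitted here; this is only an illustration of the scheme, not a ready-made fingerprinter.

// bandEnergies[frame][band]: spectral energy per overlapping frame, 33 bands (assumed precomputed)
static uint[] ComputeSubFingerprints(double[][] bandEnergies)
{
    var fingerprint = new uint[bandEnergies.Length - 1];
    for (int n = 1; n < bandEnergies.Length; n++)
    {
        uint bits = 0;
        for (int m = 0; m < 32; m++)
        {
            // sign of the energy difference across neighbouring bands and consecutive frames
            double diff = (bandEnergies[n][m] - bandEnergies[n][m + 1])
                        - (bandEnergies[n - 1][m] - bandEnergies[n - 1][m + 1]);
            if (diff > 0) bits |= 1u << m;
        }
        fingerprint[n - 1] = bits; // one 32-bit sub-fingerprint per frame
    }
    return fingerprint;
}

// Two clips are treated as the same recording when the bit error rate between
// aligned fingerprint blocks stays below a threshold (around 0.35 in the paper).
static double BitErrorRate(uint[] a, uint[] b)
{
    int errors = 0, len = System.Math.Min(a.Length, b.Length);
    for (int i = 0; i < len; i++)
    {
        uint x = a[i] ^ b[i];
        while (x != 0) { errors++; x &= x - 1; } // count differing bits
    }
    return (double)errors / (len * 32);
}

Because the bits depend only on relative energy changes, the same track re-encoded at a different bitrate tends to produce nearly the same bit pattern, which is exactly the property you need for dupe-checking.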
The problem of finding musical similarity between two versions of the same song is known as cover song identification. This problem is also studied heavily but is still considered open.
Perhaps the two most popular commercial solutions for content-based musical search are Midomi and Shazam.
I believe this addresses your question. Check Google Scholar for recent solutions to these problems. The ISMIR proceedings are available for free online.
By definition, a thread is a path of execution within a process.
But during the implementation of a kernel, a thread_id or global_index is generated to access an allocated memory location. For instance, in the matrix multiplication code below, ROW and COL are generated to access matrices A and B sequentially.
My doubt here is: the generated index isn't pointing to a thread (by definition); instead, it is used to access the location of the data in memory. So why do we refer to it as a thread index or global thread index, and not a memory index or something else?
__global__ void matrixMultiplicationKernel(float* A, float* B, float* C, int N) {
    int ROW = blockIdx.y*blockDim.y + threadIdx.y;
    int COL = blockIdx.x*blockDim.x + threadIdx.x;

    float tmpSum = 0;

    if (ROW < N && COL < N) {
        // each thread computes one element of the block sub-matrix
        for (int i = 0; i < N; i++) {
            tmpSum += A[ROW * N + i] * B[i * N + COL];
        }
        C[ROW * N + COL] = tmpSum; // store inside the bounds check to avoid an out-of-range write
    }
}
This question seems to be mostly about semantics, so let's start with Wikipedia:
.... a thread of execution is the smallest sequence of programmed
instructions that can be managed independently by a scheduler ....
That pretty much describes exactly what a thread in CUDA is -- the kernel is the sequence of instructions, and the scheduler is the warp/thread scheduler in each streaming multiprocessor on the GPU.
The code in your question is calculating the unique ID of the thread within the kernel launch, as it is abstracted in the CUDA programming/execution model. It has no intrinsic relationship to memory layouts, only to the unique ID within the kernel launch. The fact that it is being used to ensure that each parallel operation is performed on a different memory location is a programming technique and nothing more.
Thread ID seems like a logical moniker to me, but to paraphrase Miles Davis when he was asked what the jam his band had just played at the Isle of Wight festival in 1970 was called: "call it whatever you want".
I have a problem where I need to do a linear interpolation on some data as it is acquired from a sensor (it's technically position data, but the nature of the data doesn't really matter). I'm doing this in MATLAB for now, but since I will eventually migrate this code to other languages, I want to keep the code as simple as possible and not use any complicated MATLAB-specific/built-in functions.
My implementation initially seems OK, but when checking my work against MATLAB's built-in interp1 function, it seems my implementation isn't perfect, and I have no idea why. Below is the code I'm using on a dataset that has already been fully collected, but as I loop through the data, I act as if I only have the current sample and the previous sample, which mirrors the problem I will eventually face.
%make some dummy data
np = 109; %number of data points for x and y
x_data = linspace(3,98,np) + (normrnd(0.4,0.2,[1,np]));
y_data = normrnd(2.5, 1.5, [1,np]);

%define the query points the data will be interpolated over
qp = [1:100];

kk = 2;        %indexes through the data
cc = 1;        %indexes through the query points
qpi = qp(cc);  %qpi is the current query point in the loop
y_interp = qp*nan; %this will hold our solution

while kk<=length(x_data)
    kk = kk+1; %update the data counter

    %perform online interpolation
    if cc<length(qp)-1
        if qpi>=y_data(kk-1) %the query point, of course, has to be in-between the current value and the next value of x_data
            y_interp(cc) = myInterp(x_data(kk-1), x_data(kk), y_data(kk-1), y_data(kk), qpi);
        end
        if qpi>x_data(kk) %if the current query point is already larger than the current sample, update the sample
            kk = kk+1;
        else %otherwise, update the query point to ensure it's in between the samples for the next iteration
            cc = cc + 1;
            qpi = qp(cc);
            %It is possible that if the change in x_data is greater than the resolution of the query
            %points, an update like the above won't work. In this case, we must lag the data
            if qpi<x_data(kk)
                kk = kk-1;
            end
        end
    end
end

%get the correct interpolation
y_interp_correct = interp1(x_data, y_data, qp);

%plot both solutions to show the difference
figure;
plot(y_interp,'displayname','manual-solution'); hold on;
plot(y_interp_correct,'k--','displayname','matlab solution');
leg1 = legend('show');
set(leg1,'Location','Best');
ylabel('interpolated points');
xlabel('query points');
Note that the "myInterp" function is as follows:
function yi = myInterp(x1, x2, y1, y2, qp)
    %linearly interpolate the function value y(x) over the query point qp
    yi = y1 + (qp-x1) * ( (y2-y1)/(x2-x1) );
end
And here is the plot showing that my implementation isn't correct :-(
Can anyone help me find where the mistake is? And why? I suspect it has something to do with ensuring that the query point is in-between the previous and current x-samples, but I'm not sure.
The problem in your code is that you at times call myInterp with a value of qpi that is outside of the bounds x_data(kk-1) and x_data(kk). This leads to invalid extrapolation results.
Your logic of looping over kk rather than cc is very confusing to me. I would write a simple for loop over cc, the index of the points at which you want to interpolate. For each of these points, advance kk, if necessary, such that qp(cc) is in between x_data(kk) and x_data(kk+1) (you can use kk-1 and kk instead if you prefer; just initialize kk=2 to ensure that kk-1 exists; I simply find starting at kk=1 more intuitive).
To simplify the logic here, I'm limiting the values in qp to be inside the limits of x_data, so that we don't need to test that x_data(kk+1) exists, nor that x_data(1)<qp(cc). You can add those tests in if you wish.
Here's my code:
qp = [ceil(x_data(1)+0.1):floor(x_data(end)-0.1)];
y_interp = qp*nan; % this will hold our solution
kk = 1;            % indexes through the data
for cc = 1:numel(qp)
    % advance kk to where we can interpolate
    % (this loop is guaranteed to not index out of bounds because x_data(end)>qp(end),
    %  but needs to be adjusted if this is not ensured prior to the loop)
    while x_data(kk+1) < qp(cc)
        kk = kk + 1;
    end
    % perform online interpolation
    y_interp(cc) = myInterp(x_data(kk), x_data(kk+1), y_data(kk), y_data(kk+1), qp(cc));
end
As you can see, the logic is a lot simpler this way. The result is identical to y_interp_correct. The inner while x_data... loop serves the same purpose as your outer while loop, and would be the place where you read your data from wherever it's coming from.
So if I have, let's say, 4 integers:
int a = 50000, b = 5000000, c = 100, d = 500;
Now what I want to run is b - a and c - d.
My question is: would b - a run slightly slower than c - d, or would they be executed at exactly the same speed by the processor?
First of all, in the case you presented the operations will be exactly the same. You can read about how the underlying circuits do the work here.
Second of all, fixed-point (integer) arithmetic is very, very fast on all computers these days. If this is your bottleneck, I don't know what to tell you.
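If you want to convince yourself, a rough (and deliberately unscientific) C# timing sketch along these lines should show no meaningful difference between the two subtractions; the iteration count and the sink variable are only there to keep the loops from being optimized away:

using System;
using System.Diagnostics;

class SubtractionTiming
{
    static void Main()
    {
        int a = 50000, b = 5000000, c = 100, d = 500;
        const int iterations = 100000000;
        long sink = 0;

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) sink += b - a;   // "large" operands
        sw.Stop();
        Console.WriteLine("b - a: {0} ms", sw.ElapsedMilliseconds);

        sw.Restart();
        for (int i = 0; i < iterations; i++) sink += c - d;   // "small" operands
        sw.Stop();
        Console.WriteLine("c - d: {0} ms", sw.ElapsedMilliseconds);

        Console.WriteLine(sink); // prevent dead-code elimination of the loops
    }
}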
In one of my SMT programs, I use a real term. I need to bound the precision of the real number to increase efficiency, as an almost infinite number of solutions is possible for this number, although only 5 or 6 digits after the decimal point are necessary. For example, the possible valuations of the real number can be the following, though all of them are the same if we take the first seven digits after the decimal point.
1197325/13631488 = 0.087835238530......
19157213/218103808 = 0.087835298134......
153257613/1744830464 = 0.087835245980......
1226060865/13958643712 = 0.087835243186......
I want the SMT solver to consider all four of these numbers as a single number (so that the search space is reduced). Is there any way to control the precision of the real number?
I tried to solve the above problem programmatically (using the Z3 .NET API), as shown below. Here DelBP[j] is a real term.
{
    BoolExpr[] _Exprs = new BoolExpr[nBuses];
    for (j = 1; j <= nBuses; j++)
    {
        _Exprs[j - 1] = z3.MkEq(DelBP[j], z3.MkDiv(z3.MkInt2Real(DelBP_A[j]), z3.MkInt2Real(DelBP_B[j])));
    }
    BoolExpr Expr = z3.MkAnd(_Exprs);
    s.Assert(Expr);
    tw.WriteLine("(assert {0})", Expr.ToString());
}
{
    BoolExpr[] _Exprs = new BoolExpr[nBuses];
    for (j = 1; j <= nBuses; j++)
    {
        _Exprs[j - 1] = z3.MkAnd(z3.MkGe(DelBP_A[j], z3.MkInt(1)),
                                 z3.MkLe(DelBP_A[j], z3.MkInt(10000)));
    }
    BoolExpr Expr = z3.MkAnd(_Exprs);
    s.Assert(Expr);
    tw.WriteLine("(assert {0})", Expr.ToString());
}
{
    BoolExpr[] _Exprs = new BoolExpr[nBuses];
    for (j = 1; j <= nBuses; j++)
    {
        _Exprs[j - 1] = z3.MkAnd(z3.MkGe(DelBP_B[j], z3.MkInt(1)),
                                 z3.MkLe(DelBP_B[j], z3.MkInt(10000)));
    }
    BoolExpr Expr = z3.MkAnd(_Exprs);
    s.Assert(Expr);
    tw.WriteLine("(assert {0})", Expr.ToString());
}
However, it did not work. Can anyone help me to solve this problem? Thank you in advance.
If you feel the need to control the "precision" of real numbers, then that strongly suggests Real is not the correct domain for your problem. Some ideas, depending on what you're really trying to do:
If 6 digits past the decimal point are all you care about, then you might get away with using plain Integers, multiplying everything by 1e6 and restricting all variables to be less than 1e6, or some other similar transformation (a sketch of this idea follows the list below).
Keep in mind that Z3 has support for IEEE floating-point numbers these days, which are by definition of limited precision. So you can use those if your domain is truly the floating-point numbers as prescribed by IEEE-754.
If you're trying to generate "successive" results, i.e., by solving the problem, adding the constraint that the result should be different from the previous one, and calling Z3 again, then you can consider adding a constraint that says the new result should differ from the old by more than 1e-6 in absolute value.
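For the first idea, here is a minimal, hypothetical sketch against the Z3 .NET API, reusing the z3, s, nBuses, and DelBP names from your snippet; the scale factor of 1e6, the DelBP_scaled variable names, and the [0, 1e6] bounds are illustrative assumptions, not a drop-in solution:

{
    IntExpr scale = z3.MkInt(1000000); // 1e6: keep six digits after the decimal point
    BoolExpr[] _Exprs = new BoolExpr[nBuses];
    for (j = 1; j <= nBuses; j++)
    {
        // hypothetical integer variable holding DelBP[j] scaled by 1e6
        IntExpr scaled = z3.MkIntConst("DelBP_scaled_" + j);

        // bound the scaled value instead of bounding a numerator/denominator pair
        BoolExpr bounds = z3.MkAnd(z3.MkGe(scaled, z3.MkInt(0)),
                                   z3.MkLe(scaled, z3.MkInt(1000000)));

        // wherever a Real is genuinely needed, recover it as scaled / 1e6,
        // so every model value of DelBP[j] is a multiple of 1e-6
        BoolExpr link = z3.MkEq(DelBP[j],
                                z3.MkDiv(z3.MkInt2Real(scaled), z3.MkInt2Real(scale)));

        _Exprs[j - 1] = z3.MkAnd(bounds, link);
    }
    s.Assert(z3.MkAnd(_Exprs));
}

With this encoding DelBP[j] can only take values that are multiples of 1e-6, so the solver cannot enumerate near-identical fractions like the four examples above.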
Whether any of this applies depends on the precise problem you're trying to solve. If you can share some more of your problem, people might be able to come up with other ideas. But the first choice should be figuring out if Real is really the domain you want to work with.
In my kernel, if a condition is met, I update an item of the output buffer:

if (condition(input[i])) //?
    output[i] = 1;

Otherwise the output stays the same, keeping its value of 0.
The density of updates is quite unpredictable, depending on the input. Furthermore, which output locations will be updated is also not known. (I may be able to force them, though, in some cases.)
My question is: is it better to write all items, to achieve coalescing, or to do a selective write?
output[i] = condition(input[i]); //?
Would you mind explaining the reasoning behind your answers?
Coalescing is achieved even if some threads in the warp do not participate in the load or store, as long as all participating threads satisfy the requirements of coalescing. So conditional writes should have no effect on memory throughput.
However, doing a conditional write may require additional instructions because it introduces a branch (this would probably explain, for example, the difference in performance measured by Eugene in his answer).
On my setup, the kernel that does a conditional set (option 1) runs for 1.727 us and option 2 runs for 1.399 us. This is my code (setConditional is the faster one):
__global__ void conditionalSet(unsigned int* array) {
    if ((threadIdx.x & 3) == 0) {
        array[threadIdx.x] = 1;
    }
}

__global__ void setConditional(unsigned int* array) {
    array[threadIdx.x] = (threadIdx.x & 3) == 0 ? 1 : 0;
}