Re-implementing Muller and Mueller clock recovery with control_loop - signal-processing

I'm currently implementing symbol time recovery blocks. The idea is to be able to choose different TEDs (Gardner, Zero-crossing, Early-Late, Maximum-likelihood etc). In blocks like M&M recovery, the gain parameters of the loop are expressed explicitly (gain_omega and gain_mu) which can be difficult to get right. The contro_loop class is, however, more convenient (loop characteristics can be specified by "loop bandwidth" and "damping factor"(zeta)). So my first test started with the re-implementation of the MM Clock Recovery with a control loop. The work function of this block is shown below (Comments are mine)
clock_recovery_mm_ff_impl::general_work(int noutput_items,
gr_vector_int &ninput_items,
gr_vector_const_void_star &input_items,
gr_vector_void_star &output_items)
{
const float *in = (const float *)input_items[0];
float *out = (float *)output_items[0];
int ii = 0; // input index
int oo = 0; // output index
int ni = ninput_items[0] - d_interp->ntaps(); // don't use more input than this
float mm_val;
while(oo < noutput_items && ii < ni ) {
// produce output sample
out[oo] = d_interp->interpolate(&in[ii], d_mu); //Interpolation
mm_val = slice(d_last_sample) * out[oo] - slice(out[oo]) * d_last_sample; // Error calculation
d_last_sample = out[oo];
//Loop filtering
d_omega = d_omega + d_gain_omega * mm_val; //Frequency
d_omega = d_omega_mid + gr::branchless_clip(d_omega-d_omega_mid, d_omega_lim); //Bound the frequency
d_mu = d_mu + d_omega + d_gain_mu * mm_val; //Phase
ii += (int)floor(d_mu); // Basepoint index
d_mu = d_mu - floor(d_mu); // Fractional interval
oo++;
}
consume_each(ii);
return oo;
}
Here is my code. First, the control loop is initialized the constructor
loop(new gr::blocks::control_loop(0.02,(1 + d_omega_relative_limit)*omega,
(1 - d_omega_relative_limit)*omega))
First of all I would like to eliminate a couple of doubts that I have regarding pll (the control_loop above) in symbol timing recovery particularly phase and frequency ranges (that are in turn used for wrapping). Taking an analogy from Costas loop : carrier phase is wrapped between -2pi and +2pi and the frequency offset is tracked between -1 and +1. It is quite straightforward to see why. Unfortunately I can't get my head around phase and frequency tracking in symbol recovery. From the m&m block, frequency is tracked between (1+omega_relative_limit) and (1 - omega_relative_limit)*omega where omega is simply the number of samples per symbol. Phase is tracked between 0 and omega. I dont understand why this is so and why the m&m block doesn't wrap it. Any ideas here will be appreciated.
And here is my work function
debug_time_recovery_pam_test_1_impl::general_work (int noutput_items,
gr_vector_int &ninput_items,
gr_vector_const_void_star &input_items,
gr_vector_void_star &output_items)
{
// Tell runtime system how many output items we produced.
const float *in = (const float *)input_items[0];
float *out = (float *)output_items[0];
int ii = 0; // input index
int oo = 0; // output index
int ni = ninput_items[0] - d_interp->ntaps(); // don't use more input than this
float mm_val;
while(oo < noutput_items && ii < ni ) {
// produce output sample
out[oo] = d_interp->interpolate(&in[ii], d_mu);
//Calculating error
mm_val = slice(d_last_sample) * out[oo] - slice(out[oo]) * d_last_sample;
d_last_sample = out[oo];
//Loop filtering
loop->advance_loop(mm_val); // Filter the error
loop->frequency_limit(); //Stop frequency from wandering too far
//Loop phase and frequency
d_omega = loop->get_frequency();
d_mu = loop->get_phase();
//d_omega = d_omega + d_gain_omega * mm_val;
//d_omega = d_omega_mid + gr::branchless_clip(d_omega-d_omega_mid, d_omega_lim);
//d_mu = d_mu + d_omega + d_gain_mu * mm_val;
ii += (int)floor(d_mu); // Basepoint index
d_mu = d_mu - floor(d_mu);//Fractional interval
oo++;
}
consume_each(ii);
return oo;
}
I have tried to use the block in a GFSK demodulator and I got this error
python: /build/gnuradio-bJXzXK/gnuradio-3.7.9.1/gnuradio-runtime/include/gnuradio/buffer.h:177: unsigned int gr::buffer::index_add(unsigned int, unsigned int): Assertion `s < d_bufsize' failed.
The first google search regarding this error suggests that im somehow "abusing" the scheduler since this error comes somewhere below the API. I think my calculation of d_omega and d_mu from the control loop is a bit naive but unfortunately I don't know any other way of doing so. Another alternative will be to use a modulo-1 counter (incrementing or decrementing) but I want to explore this option first.

Related

Octave script falling into error when processing data with sampling frequency above 50kS/s

I'm working with an Octave script to process data files with high sample rates (up to 200kS/s collected over 3 minutes). The code runs into issues when processing any files with a sample rate above 50kS/s, regardless of size or number of samples but functions correctly otherwise.
The error I receive when attempting to run the code with files above 50kS/s is called from the hist function:
error: x(0): subscripts must be either integers 1 to (2^63)-1 or logicals
error:
I have narrowed the cause to the following section of code (note that FS is the detected sampling frequency):
FILTER_ORDER = 1;
FILTER_CUTOFF = 1 / (2 * pi * 300e-3);
[b_lp, a_lp] = butter(FILTER_ORDER, FILTER_CUTOFF / (FS / 2), 'low');
%
s = SCALING_FACTOR * filter(b_lp, a_lp, u_q) ;
P = s ;
%
tau = 20;
transient = tau * FS; % index after the transient
Pmax = max(P(transient:end));
%
s = s(transient:end);
%
NUMOF_CLASSES = 10000; % number of bins used for the histogram
[bin_cnt, cpf.magnitude] = hist(s, NUMOF_CLASSES); % sorts data into the number of bins specified
I can try to provide more information if required, I'm not very familiar with Octave.

Why does memkind and numactl improve program performance a lot?

According to the course https://www.coursera.org/learn/parallelism-ia/home/welcome, there is one example which tried to illustrate the improvement from memkind API by using hbw_posix_memalign((void**)&pixel, 64, sizeof(P)*width*height); I think this API only provides us aligned memory allocation. I do not know why this can help improve the GPflops so much as the following shows.
The part of coding is as the following. Here the memory which stores img_in is allocated by memkind API.
template<typename P>
void ApplyStencil(ImageClass<P> & img_in, ImageClass<P> & img_out) {
const int width = img_in.width;
const int height = img_in.height;
P * in = img_in.pixel;
P * out = img_out.pixel;
#pragma omp parallel for
for (int i = 1; i < height-1; i++)
#pragma omp simd
for (int j = 1; j < width-1; j++) {
P val = -in[(i-1)*width + j-1] - in[(i-1)*width + j] - in[(i-1)*width + j+1]
-in[(i )*width + j-1] + 8*in[(i )*width + j] - in[(i )*width + j+1]
-in[(i+1)*width + j-1] - in[(i+1)*width + j] - in[(i+1)*width + j+1];
val = (val < 0 ? 0 : val);
val = (val > 255 ? 255 : val);
out[i*width + j] = val;
}
}
My questions are as the follwing:
Is it only because we can use less memory operation to get our data and then we can improve the performance almost 5 times?
In terms of numaclt, based on the linux documentation, it allows us to bind the processes with specific nodes or cpus. When we use the command numactl -m 1, we can get the improvement 5 times. I not sure if the improvement comes from NUMA communication delay.

shape of input to calculate information gain

I want to calculate the information gain on 20_newsgroup data set.
I am using the code here(also I put a copy of the code down of the question).
As you see the input to the algorithm is X,y
My confusion is that, X is going to be a matrix with documents in rows and features as column. (according to 20_newsgroup it is 11314,1000
in case i only considered 1000 features).
but according to the concept of information gain, it should calculate information gain for each feature.
(So I was expecting to see the code in a way loop through each feature, so the input to the function be a matrix where rows are features and columns are class)
But X is not feature here but X stands for documents, and I can not see the part in the code that take care of this part! ( I mean considering each document, and then going through each feature of that document; like looping through rows but at the same time looping through columns as the features are stored in columns).
I have read this and this and many similar questions but they are not clear in terms of input matrix shape.
this is the code for reading 20_newsgroup:
newsgroup_train = fetch_20newsgroups(subset='train')
X,y = newsgroup_train.data,newsgroup_train.target
cv = CountVectorizer(max_df=0.99,min_df=0.001, max_features=1000,stop_words='english',lowercase=True,analyzer='word')
X_vec = cv.fit_transform(X)
(X_vec.shape) is (11314,1000) which is not features in the 20_newsgroup data set. I am thinking am I calculating Information gain in a incorrect way?
This is the code for Information gain:
def information_gain(X, y):
def _calIg():
entropy_x_set = 0
entropy_x_not_set = 0
for c in classCnt:
probs = classCnt[c] / float(featureTot)
entropy_x_set = entropy_x_set - probs * np.log(probs)
probs = (classTotCnt[c] - classCnt[c]) / float(tot - featureTot)
entropy_x_not_set = entropy_x_not_set - probs * np.log(probs)
for c in classTotCnt:
if c not in classCnt:
probs = classTotCnt[c] / float(tot - featureTot)
entropy_x_not_set = entropy_x_not_set - probs * np.log(probs)
return entropy_before - ((featureTot / float(tot)) * entropy_x_set
+ ((tot - featureTot) / float(tot)) * entropy_x_not_set)
tot = X.shape[0]
classTotCnt = {}
entropy_before = 0
for i in y:
if i not in classTotCnt:
classTotCnt[i] = 1
else:
classTotCnt[i] = classTotCnt[i] + 1
for c in classTotCnt:
probs = classTotCnt[c] / float(tot)
entropy_before = entropy_before - probs * np.log(probs)
nz = X.T.nonzero()
pre = 0
classCnt = {}
featureTot = 0
information_gain = []
for i in range(0, len(nz[0])):
if (i != 0 and nz[0][i] != pre):
for notappear in range(pre+1, nz[0][i]):
information_gain.append(0)
ig = _calIg()
information_gain.append(ig)
pre = nz[0][i]
classCnt = {}
featureTot = 0
featureTot = featureTot + 1
yclass = y[nz[1][i]]
if yclass not in classCnt:
classCnt[yclass] = 1
else:
classCnt[yclass] = classCnt[yclass] + 1
ig = _calIg()
information_gain.append(ig)
return np.asarray(information_gain)
Well, after going through the code in detail, I learned more about X.T.nonzero().
Actually it is correct that information gain needs to loop through features.
Also it is correct that the matrix scikit-learn give us here is based on doc-features.
But:
in code it uses X.T.nonzero() which technically transform all the nonzero values into array. and then in the next row loop through the length of that array range(0, len(X.T.nonzero()[0]).
Overall, this part X.T.nonzero()[0] is returning all the none zero features to us :)

Why doesn't this Fibonacci Number function work in O(log N)?

So the Fibonacci number for log (N) — without matrices.
Ni // i-th Fibonacci number
= Ni-1 + Ni-2 // by definition
= (Ni-2 + Ni-3) + Ni-2 // unwrap Ni-1
= 2*Ni-2 + Ni-3 // reduce the equation
= 2*(Ni-3 + Ni-4) + Ni-3 //unwrap Ni-2
// And so on
= 3*Ni-3 + 2*Ni-4
= 5*Ni-4 + 3*Ni-5
= 8*Ni-5 + 5*Ni-6
= Nk*Ni-k + Nk-1*Ni-k-1
Now we write a recursive function, where at each step we take k~=I/2.
static long N(long i)
{
if (i < 2) return 1;
long k=i/2;
return N(k) * N(i - k) + N(k - 1) * N(i - k - 1);
}
Where is the fault?
You get a recursion formula for the effort: T(n) = 4T(n/2) + O(1). (disregarding the fact that the numbers get bigger, so the O(1) does not even hold). It's clear from this that T(n) is not in O(log(n)). Instead one gets by the master theorem T(n) is in O(n^2).
Btw, this is even slower than the trivial algorithm to calculate all Fibonacci numbers up to n.
The four N calls inside the function each have an argument of around i/2. So the length of the stack of N calls in total is roughly equal to log2N, but because each call generates four more, the bottom 'layer' of calls has 4^log2N = O(n2) Thus, the fault is that N calls itself four times. With only two calls, as in the conventional iterative method, it would be O(n). I don't know of any way to do this with only one call, which could be O(log n).
An O(n) version based on this formula would be:
static long N(long i) {
if (i<2) {
return 1;
}
long k = i/2;
long val1;
long val2;
val1 = N(k-1);
val2 = N(k);
if (i%2==0) {
return val2*val2+val1*val1;
}
return val2*(val2+val1)+val1*val2;
}
which makes 2 N calls per function, making it O(n).
public class fibonacci {
public static int count=0;
public static void main(String[] args) {
Scanner scan = new Scanner(System.in);
int i = scan.nextInt();
System.out.println("value of i ="+ i);
int result = fun(i);
System.out.println("final result is " +result);
}
public static int fun(int i) {
count++;
System.out.println("fun is called and count is "+count);
if(i < 2) {
System.out.println("function returned");
return 1;
}
int k = i/2;
int part1 = fun(k);
int part2 = fun(i-k);
int part3 = fun(k-1);
int part4 = fun(i-k-1);
return ((part1*part2) + (part3*part4)); /*RESULT WILL BE SAME FOR BOTH METHODS*/
//return ((fun(k)*fun(i-k))+(fun(k-1)*fun(i-k-1)));
}
}
I tried to code to problem defined by you in java. What i observed is that complexity of above code is not completely O(N^2) but less than that.But as per conventions and standards the worst case complexity is O(N^2) including some other factors like computation(division,multiplication) and comparison time analysis.
The output of above code gives me information about how many times the function
fun(int i) computes and is being called.
OUTPUT
So including the time taken for comparison and division, multiplication operations, the worst case time complexity is O(N^2) not O(LogN).
Ok if we use Analysis of the recursive Fibonacci program technique.Then we end up getting a simple equation
T(N) = 4* T(N/2) + O(1)
where O(1) is some constant time.
So let's apply Master's method on this equation.
According to Master's method
T(n) = aT(n/b) + f(n) where a >= 1 and b > 1
There are following three cases:
If f(n) = Θ(nc) where c < Logba then T(n) = Θ(nLogba)
If f(n) = Θ(nc) where c = Logba then T(n) = Θ(ncLog n)
If f(n) = Θ(nc) where c > Logba then T(n) = Θ(f(n))
And in our equation a=4 , b=2 & c=0.
As case 1 c < logba => 0 < 2 (which is log base 2 and equals to 2) is satisfied
hence T(n) = O(n^2).
For more information about how master's algorithm works please visit: Analysis of Algorithms
Your idea is correct, and it will perform in O(log n) provided you don't compute the same formula
over and over again. The whole point of having N(k) * N(i-k) is to have (k = i - k) so you only have to compute one instead of two. But if you only call recursively, you are performing the computation twice.
What you need is called memoization. That is, store every value that you already have computed, and
if it comes up again, then you get it in O(1).
Here's an example
const int MAX = 10000;
// memoization array
int f[MAX] = {0};
// Return nth fibonacci number using memoization
int fib(int n) {
// Base case
if (n == 0)
return 0;
if (n == 1 || n == 2)
return (f[n] = 1);
// If fib(n) is already computed
if (f[n]) return f[n];
// (n & 1) is 1 iff n is odd
int k = n/2;
// Applying your formula
f[n] = fib(k) * fib(n - k) + fib(k - 1) * fib(n - k - 1);
return f[n];
}

libsvm-java throws NullPointerException after a few iteration of training

I am using libsvm java package for a sentence classification task. I have 3 classes. Every sentence is represented as a vector of size 435. The format of vector_file is as follows:
1 0 0.12 0 0.5 0.24 0.32 0 0 0 ... 0.43 0 First digit indicates class label and remaining is the vector.
The following is how I am making the svm_problem:
public void makeSvmProb(ArrayList<Float> inputVector,float label,int p){
// p is 0 to 77 (total training sentences)
int idx=0,count=0;
svm_prob.y[p]=label;
for(int i=0;i<inputVector.size();i++){
if(inputVector.get(i)!=0) {
count++; // To get the count of non-zero values
}
}
svm_node[] x = new svm_node[count];
for(int i=0;i<inputVector.size();i++){
if(inputVector.get(i)!=0){
x[idx] = new svm_node();
x[idx].index = i;
x[idx].value = inputVector.get(i);
idx++;
}
}
svm_prob.x[p]=x;
}
Parameter settings:
param.svm_type = svm_parameter.C_SVC;
param.kernel_type = svm_parameter.RBF;
param.degree = 3;
param.gamma = 0.5;
param.coef0 = 0;
param.nu = 0.5;
param.cache_size = 40;
param.C = 1;
param.eps = 1e-3;
param.p = 0.1;
param.shrinking = 1;
param.probability = 0;
param.nr_weight = 0;
param.weight_label = new int[0];
param.weight = new double[0];
While executing the program, After 2 iterations, I am getting a NullPointerException. I couldn't figure out what is going wrong.
This is the error coming:
optimization finished, #iter = 85
nu = 0.07502654779820772
obj = -15.305162227093849, rho = -0.03157808477381625
nSV = 47, nBSV = 1
*
optimization finished, #iter = 88
nu = 0.08576821199868506
obj = -17.83925196551639, rho = 0.1297986754900152
nSV = 51, nBSV = 3
Exception in thread "main" java.lang.NullPointerException
at libsvm.Kernel.dot(svm.java:207)
at libsvm.Kernel.<init>(svm.java:199)
at libsvm.SVC_Q.<init>(svm.java:1156)
at libsvm.svm.solve_c_svc(svm.java:1333)
at libsvm.svm.svm_train_one(svm.java:1510)
at libsvm.svm.svm_train(svm.java:2067)
at SvmOp.<init>(SvmOp.java:130)
at Main.main(Main.java:8)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
Any idea on what is going wrong?
The NullPointerException is thrown in Line 207 in svm.class. Investigating the source code shows:
static double dot(svm_node[] x, svm_node[] y)
{
double sum = 0;
int xlen = x.length;
...
}
Line 207 is int xlen = x.length;. So in this case, we see, that one of your svm_node (or vectors) is null.
For this reason, we cannot really help you here, as we would need more information / source code to debug it.
I would go for the following strategy:
Investigate the svm_node objects after you completed the building of the svm_problem in a debugger and look for null values.
Check the build process of your svm_problem. The problem might be there.
An other possibility would be to change your data-format and be compliant to the official LIBSVM format:
As stated in the documentation, the data format uses sparse-data format and should be like that:
<label> 0:i 1:K(xi,x1) ... L:K(xi,xL)
The ascending integer refers to the attribute or feature id, which is necessary for internal representation of the vector.
I previously replied to a similar question here and added an example for the data format.
This format can be read out of the box as the code to construct the svm_problem is included in the library.

Resources