Autodiff for Jacobian derivative with respect to individual joint angles - drake

I am trying to compute $\partial J/\partial q_i$ in Drake C++ for a manipulator, and as per my search the best approach seems to be using autodiff. I was not able to fully understand the autodiff approach from the resources I found, so I apologize if my approach is not clear enough. I have used my understanding from some already-asked questions on the forum regarding autodiff, as well as https://drake.mit.edu/doxygen_cxx/classdrake_1_1multibody_1_1_multibody_plant.html, as references.
As I want to calculate $\partial J/\partial q_i$, the return type will be a tensor, i.e. 3 × 7 × 7 (or 6 × 7 × 7 for the spatial Jacobian). I can think of using std::vector<Eigen::MatrixXd> to allocate the output, or alternatively doing one $q_i$ at a time and computing the respective Jacobian for the autodiff. In either case, I was struggling to pass it in when initializing the Jacobian function.
I did the following to initialize autodiff
std::unique_ptr<multibody::MultibodyPlant<AutoDiffXd>> mplant_autodiff =
    systems::System<double>::ToAutoDiffXd(mplant);
std::unique_ptr<systems::Context<AutoDiffXd>> mContext_autodiff =
    mplant_autodiff->CreateDefaultContext();
mContext_autodiff->SetTimeStateAndParametersFrom(*mContext);
const multibody::Frame<AutoDiffXd>* mFrame_EE_autodiff =
    &mplant_autodiff->GetBodyByName(mEE_link).body_frame();
const multibody::Frame<AutoDiffXd>* mWorld_Frame_autodiff =
    &(mplant_autodiff->world_frame());
// Initialize q as an autodiff vector (this seeds the identity matrix).
drake::AutoDiffVecXd q_autodiff = drake::math::InitializeAutoDiff(mq_robot);
// Linear Jacobian matrix; must be pre-sized to 3 x num_positions.
MatrixX<AutoDiffXd> mJacobian_autodiff(3, mplant_autodiff->num_positions());
mplant_autodiff->SetPositions(mContext_autodiff.get(), q_autodiff);
mplant_autodiff->CalcJacobianTranslationalVelocity(
    *mContext_autodiff,
    multibody::JacobianWrtVariable::kQDot,
    *mFrame_EE_autodiff,
    Vector3<AutoDiffXd>::Zero(),  // the point must also be AutoDiffXd-typed
    *mWorld_Frame_autodiff,
    *mWorld_Frame_autodiff,
    &mJacobian_autodiff);
However, as far as I understand, InitializeAutoDiff initializes the derivatives to the identity matrix, whereas I want $\partial J/\partial q_i$, so is there a better way to do it? In addition, I get error messages when I try to compute the Jacobian. Is there a way to address this problem, either computing $\partial J/\partial q_i$ for each $q_i$ by changing $q_i$ in a for loop, or directly getting the result in a tensor? My apologies if I am doing something totally tangent to the correct approach. I thank you in anticipation.

However, as far as I understand, InitializeAutoDiff initializes the derivatives to the identity matrix, whereas I want $\partial J/\partial q_i$, so is there a better way to do it?
That is correct. When you call InitializeAutoDiff and compute mJacobian_autodiff, you get a matrix of AutoDiffXd. Each AutoDiffXd has a value() function that stores the double value, and a derivatives() function that stores the gradient as an Eigen::VectorXd. We have
mJacobian_autodiff(i, j).value() = J(i, j)
mJacobian_autodiff(i, j).derivatives()(k) = ∂J(i, j)/∂q(k)
So if you want to create a std::vector<Eigen::MatrixXd> such that the k'th entry of this vector stores the matrix ∂J/∂q(k), here is some code:
std::vector<Eigen::MatrixXd> dJdq(q_autodiff.rows());
for (int i = 0; i < q_autodiff.rows(); ++i) {
  dJdq[i].resize(mJacobian_autodiff.rows(), mJacobian_autodiff.cols());
}
for (int i = 0; i < mJacobian_autodiff.cols(); ++i) {
  // dJidq stores the gradient ∂J.col(i)/∂q, namely dJidq(j, k) = ∂J(j, i)/∂q(k).
  auto dJidq = drake::math::ExtractGradient(mJacobian_autodiff.col(i));
  for (int j = 0; j < static_cast<int>(dJdq.size()); ++j) {
    dJdq[j].col(i) = dJidq.col(j);
  }
}
Compute ∂J/∂q(i) for a single i
If you do not want to compute ∂J/∂q(i) for all i, but only for one specific i, you can change the initialization of q_autodiff from InitializeAutoDiff to this
AutoDiffVecXd q_autodiff(q.rows());
for (int k = 0; k < q_autodiff.rows(); ++k) {
  q_autodiff(k).value() = q(k);
  q_autodiff(k).derivatives() = Vector1d::Zero();
  if (k == i) {
    q_autodiff(k).derivatives()(0) = 1;
  }
}
namely q_autodiff stores the gradient ∂q/∂q(i), which is 0 for all k != i and 1 when k == i. You can then compute mJacobian_autodiff using your current code. Now mJacobian_autodiff(m, n).derivatives() stores the gradient ∂J(m, n)/∂q(i) for that specific i. You can extract this gradient as
Eigen::MatrixXd dJdqi(mJacobian_autodiff.rows(), mJacobian_autodiff.cols());
for (int m = 0; m < dJdqi.rows(); ++m) {
  for (int n = 0; n < dJdqi.cols(); ++n) {
    dJdqi(m, n) = mJacobian_autodiff(m, n).derivatives()(0);
  }
}
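Putting the two pieces together, here is a minimal sketch of the for loop over each q(i) that the question asked about, assuming the plant, context, and frame variables from the question (mq_robot, mplant_autodiff, mContext_autodiff, mFrame_EE_autodiff, mWorld_Frame_autodiff) are in scope:
// Sketch only: re-seed a single derivative slot per joint angle and
// recompute the Jacobian each iteration; dJdq_all[i](m, n) = ∂J(m, n)/∂q(i).
std::vector<Eigen::MatrixXd> dJdq_all;
for (int i = 0; i < mq_robot.rows(); ++i) {
  AutoDiffVecXd q_autodiff(mq_robot.rows());
  for (int k = 0; k < q_autodiff.rows(); ++k) {
    q_autodiff(k).value() = mq_robot(k);
    q_autodiff(k).derivatives() = Vector1d::Zero();
    if (k == i) q_autodiff(k).derivatives()(0) = 1;
  }
  mplant_autodiff->SetPositions(mContext_autodiff.get(), q_autodiff);
  MatrixX<AutoDiffXd> J_autodiff(3, mplant_autodiff->num_positions());
  mplant_autodiff->CalcJacobianTranslationalVelocity(
      *mContext_autodiff, multibody::JacobianWrtVariable::kQDot,
      *mFrame_EE_autodiff, Vector3<AutoDiffXd>::Zero(),
      *mWorld_Frame_autodiff, *mWorld_Frame_autodiff, &J_autodiff);
  Eigen::MatrixXd dJdqi(J_autodiff.rows(), J_autodiff.cols());
  for (int m = 0; m < dJdqi.rows(); ++m) {
    for (int n = 0; n < dJdqi.cols(); ++n) {
      dJdqi(m, n) = J_autodiff(m, n).derivatives()(0);
    }
  }
  dJdq_all.push_back(dJdqi);
}
Note that seeding all derivatives at once with InitializeAutoDiff and extracting per-column gradients (the first approach above) computes the same tensor in a single Jacobian evaluation, so the loop mainly helps when you want to avoid carrying the full seed matrix.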

Related

Real FFT output

I have implemented an FFT on an AT32UC-series microcontroller using the KissFFT library and am currently struggling with the output of the FFT.
My intention is to analyse sound coming from a piezo speaker.
Currently, the frequency of the sounder is 420 Hz, which I successfully got from the FFT output (cross-checked with an oscilloscope). However, the output frequency is just half of what is expected if I put a function generator waveform into the system.
I suspect it's the frequency bin calculation formula which I got wrong; currently using fft_peak_magnitude_index * sampling_frequency / fft_size.
My input is real and I am doing a real FFT (output samples = N/2).
I am also doing IIR filtering and windowing before the FFT.
Any suggestion would be a great help!
// IIR filter calculation, n = 256 fft points
for (ctr=0; ctr<n; ctr++)
{
    // filter calculation
    y[ctr] = num_coef[0]*x[ctr];
    y[ctr] += (num_coef[1]*x[ctr-1]) - (den_coef[1]*y[ctr-1]);
    y[ctr] += (num_coef[2]*x[ctr-2]) - (den_coef[2]*y[ctr-2]);
    y1[ctr] = y[ctr] - 510; // eliminate dc offset
    // hamming window
    hamming[ctr] = (0.54-((0.46) * cos(2*M_PI*ctr/n)));
    window[ctr] = hamming[ctr]*y1[ctr];
    fft_input[ctr].r = window[ctr];
    fft_input[ctr].i = 0;
    fft_output[ctr].r = 0;
    fft_output[ctr].i = 0;
}
kiss_fftr_cfg fftConfig = kiss_fftr_alloc(n,0,NULL,NULL);
kiss_fftr(fftConfig, (kiss_fft_scalar * )fft_input, fft_output);
peak = 0;
freq_bin = 0;
for (ctr=0; ctr<n1; ctr++)
{
    fft_mag[ctr] = 10*(sqrt((fft_output[ctr].r * fft_output[ctr].r) + (fft_output[ctr].i * fft_output[ctr].i)))/(0.5*n);
    if(fft_mag[ctr] > peak)
    {
        peak = fft_mag[ctr];
        freq_bin = ctr;
    }
    frequency = (freq_bin*(10989/n)); // 10989 is the sampling freq
    //************************************
    // Usart write
    char filtResult[10];
    //sprintf(filtResult, "%04d %04d %04d\n", (int)peak, (int)freq_bin, (int)frequency);
    sprintf(filtResult, "%04d %04d %04d\n", (int)x[ctr], (int)fft_mag[ctr], (int)frequency);
    char c;
    char *ptr = &filtResult[0];
    do
    {
        c = *ptr;
        ptr++;
        usart_bw_write_char(&AVR32_USART2, (int)c);
        // sendByte(c);
    } while (c != '\n');
}
The main problem is likely to be how you declared fft_input.
Based on your previous question, you are allocating fft_input as an array of kiss_fft_cpx. The function kiss_fftr, on the other hand, expects an array of scalars. By casting the input array to kiss_fft_scalar with:
kiss_fftr(fftConfig, (kiss_fft_scalar * )fft_input, fft_output);
KissFFT essentially sees an array of real-valued data which contains zeros at every second sample (what you filled in as imaginary parts). This is effectively an upsampled version (although without interpolation) of your original signal, i.e. a signal with effectively twice the sampling rate (which is not accounted for in your freq_bin to frequency conversion). To fix this, I suggest you pack your data into a kiss_fft_scalar array:
kiss_fft_scalar fft_input[n];
...
for (ctr=0; ctr<n; ctr++)
{
    ...
    fft_input[ctr] = window[ctr];
    ...
}
kiss_fftr_cfg fftConfig = kiss_fftr_alloc(n,0,NULL,NULL);
kiss_fftr(fftConfig, fft_input, fft_output);
Note also that while looking for the peak magnitude, you probably are only interested in the final largest peak, instead of the running maximum. As such, you could limit the loop to only computing the peak (using freq_bin instead of ctr as an array index in the following sprintf statements if needed):
for (ctr=0; ctr<n1; ctr++)
{
    fft_mag[ctr] = 10*(sqrt((fft_output[ctr].r * fft_output[ctr].r) + (fft_output[ctr].i * fft_output[ctr].i)))/(0.5*n);
    if(fft_mag[ctr] > peak)
    {
        peak = fft_mag[ctr];
        freq_bin = ctr;
    }
} // close the loop here before computing "frequency"
Finally, when computing the frequency associated with the bin with the largest magnitude, you need to ensure the computation is done using floating point arithmetic. If, as I suspect, n is an integer, your formula would perform the 10989/n division using integer arithmetic, resulting in truncation. This can be simply remedied with:
frequency = (freq_bin*(10989.0/n)); // 10989 is the sampling freq
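For example, with n = 256 the integer expression 10989/256 truncates to 42, so a peak at bin 10 reports 420 Hz, whereas the floating-point version gives 10 × 42.93 ≈ 429 Hz.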

ID3D11DeviceContext::DrawIndexed() Failed

My program is a DirectX program that draws a container cube with smaller cubes inside it; these smaller cubes fall over time. I hope you understand what I mean.
The program isn't complete yet; it should draw the container only, but it draws nothing; only the background color is visible. I only included what I think is needed.
These are the routines that initialize the program:
bool Game::init(HINSTANCE hinst, HWND _hw){
    Directx11::init(hinst, _hw);
    return LoadContent();
}
Directx11::init()
bool Directx11::init(HINSTANCE hinst, HWND hw){
    _hinst=hinst;
    _hwnd=hw;
    RECT rc;
    GetClientRect(_hwnd,&rc);
    height= rc.bottom - rc.top;
    width = rc.right - rc.left;
    UINT flags=0;
#ifdef _DEBUG
    flags |= D3D11_CREATE_DEVICE_DEBUG;
#endif
    HR(D3D11CreateDevice(0,_driverType,0,flags,0,0,D3D11_SDK_VERSION,&d3dDevice,&_featureLevel,&d3dDeviceContext));
    if (d3dDevice == 0 || d3dDeviceContext == 0)
        return 0;
    DXGI_SWAP_CHAIN_DESC sdesc;
    ZeroMemory(&sdesc,sizeof(DXGI_SWAP_CHAIN_DESC));
    sdesc.Windowed=true;
    sdesc.BufferCount=1;
    sdesc.BufferDesc.Format=DXGI_FORMAT_R8G8B8A8_UNORM;
    sdesc.BufferDesc.Height=height;
    sdesc.BufferDesc.Width=width;
    sdesc.BufferDesc.Scaling=DXGI_MODE_SCALING_UNSPECIFIED;
    sdesc.BufferDesc.ScanlineOrdering=DXGI_MODE_SCANLINE_ORDER_UNSPECIFIED;
    sdesc.OutputWindow=_hwnd;
    sdesc.BufferDesc.RefreshRate.Denominator=1;
    sdesc.BufferDesc.RefreshRate.Numerator=60;
    sdesc.Flags=0;
    sdesc.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT;
    if (m4xMsaaEnable)
    {
        sdesc.SampleDesc.Count=4;
        sdesc.SampleDesc.Quality=m4xMsaaQuality-1;
    }
    else
    {
        sdesc.SampleDesc.Count=1;
        sdesc.SampleDesc.Quality=0;
    }
    IDXGIDevice *Device=0;
    HR(d3dDevice->QueryInterface(__uuidof(IDXGIDevice),reinterpret_cast<void**>(&Device)));
    IDXGIAdapter *Ad=0;
    HR(Device->GetParent(__uuidof(IDXGIAdapter),reinterpret_cast<void**>(&Ad)));
    IDXGIFactory *fac=0;
    HR(Ad->GetParent(__uuidof(IDXGIFactory),reinterpret_cast<void**>(&fac)));
    fac->CreateSwapChain(d3dDevice,&sdesc,&swapchain);
    ReleaseCOM(Device);
    ReleaseCOM(Ad);
    ReleaseCOM(fac);
    ID3D11Texture2D *back = 0;
    HR(swapchain->GetBuffer(0,__uuidof(ID3D11Texture2D),reinterpret_cast<void**>(&back)));
    HR(d3dDevice->CreateRenderTargetView(back,0,&RenderTarget));
    D3D11_TEXTURE2D_DESC Tdesc;
    ZeroMemory(&Tdesc,sizeof(D3D11_TEXTURE2D_DESC));
    Tdesc.BindFlags = D3D11_BIND_DEPTH_STENCIL;
    Tdesc.ArraySize = 1;
    Tdesc.Format= DXGI_FORMAT_D24_UNORM_S8_UINT;
    Tdesc.Height= height;
    Tdesc.Width = width;
    Tdesc.Usage = D3D11_USAGE_DEFAULT;
    Tdesc.MipLevels=1;
    if (m4xMsaaEnable)
    {
        Tdesc.SampleDesc.Count=4;
        Tdesc.SampleDesc.Quality=m4xMsaaQuality-1;
    }
    else
    {
        Tdesc.SampleDesc.Count=1;
        Tdesc.SampleDesc.Quality=0;
    }
    HR(d3dDevice->CreateTexture2D(&Tdesc,0,&depthview));
    HR(d3dDevice->CreateDepthStencilView(depthview,0,&depth));
    d3dDeviceContext->OMSetRenderTargets(1,&RenderTarget,depth);
    D3D11_VIEWPORT vp;
    vp.TopLeftX=0.0f;
    vp.TopLeftY=0.0f;
    vp.Width = static_cast<float>(width);
    vp.Height= static_cast<float>(height);
    vp.MinDepth = 0.0f;
    vp.MaxDepth = 1.0f;
    d3dDeviceContext->RSSetViewports(1,&vp);
    return true;
}
SetBuild() prepares the matrices inside the container for the smaller cubes. I didn't program it to draw the smaller cubes yet.
And this is the function that draws the scene:
void Game::Render(){
    d3dDeviceContext->ClearRenderTargetView(RenderTarget,reinterpret_cast<const float*>(&Colors::LightSteelBlue));
    d3dDeviceContext->ClearDepthStencilView(depth,D3D11_CLEAR_DEPTH | D3D11_CLEAR_STENCIL,1.0f,0);
    d3dDeviceContext->IASetInputLayout(_layout);
    d3dDeviceContext->IASetPrimitiveTopology(D3D10_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
    d3dDeviceContext->IASetIndexBuffer(indices,DXGI_FORMAT_R32_UINT,0);
    UINT strides=sizeof(Vertex),off=0;
    d3dDeviceContext->IASetVertexBuffers(0,1,&vertices,&strides,&off);
    D3DX11_TECHNIQUE_DESC des;
    Tech->GetDesc(&des);
    Floor *Lookup; /* used to look up entries in the matrices structure (Floor contains XMMATRIX Pieces[9]) */
    std::vector<XMFLOAT4X4> filled; // saves the matrices of the smaller cubes
    XMMATRIX V=XMLoadFloat4x4(&View),P=XMLoadFloat4x4(&Proj);
    XMMATRIX vp = V * P;
    XMMATRIX wvp;
    for (UINT i = 0; i < des.Passes; i++)
    {
        d3dDeviceContext->RSSetState(BuildRast);
        wvp = XMLoadFloat4x4(&(B.Memory[0].Pieces[0])) * vp; // loading the matrix at translation (0,0,0)
        HR(ShadeMat->SetMatrix(reinterpret_cast<float*>(&wvp)));
        HR(Tech->GetPassByIndex(i)->Apply(0,d3dDeviceContext));
        d3dDeviceContext->DrawIndexed(build_ind_count,build_ind_index,build_vers_index);
        d3dDeviceContext->RSSetState(PieseRast);
        UINT r1=B.GetSize(),r2=filled.size();
        for (UINT j = 0; j < r1; j++)
        {
            Lookup = &B.Memory[j];
            for (UINT r = 0; r < Lookup->filledindeces.size(); r++)
            {
                filled.push_back(Lookup->Pieces[Lookup->filledindeces[r]]);
            }
        }
        for (UINT j = 0; j < r2; j++)
        {
            ShadeMat->SetMatrix(reinterpret_cast<const float*>(&filled[i]));
            Tech->GetPassByIndex(i)->Apply(0,d3dDeviceContext);
            d3dDeviceContext->DrawIndexed(piese_ind_count,piese_ind_index,piese_vers_index);
        }
    }
    HR(swapchain->Present(0,0));
}
Thanks in advance.
One bug in your program appears to be that you're using i, the index of the current pass, as an index into the filled vector, when you should apparently be using j.
Another apparent bug is that in the loop where you are supposed to be iterating over the elements of filled, you're not iterating over all of them. The value r2 is set to the size of filled before you append anything to it during that pass. During the first pass this means that nothing will be drawn by this loop. If your technique only has one pass then this means that the second DrawIndexed call in your code will never be executed.
It also appears you should only be adding matrices to filled once, regardless of the number of passes the technique has. You should consider whether your code is actually meant to work with techniques that have multiple passes.
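To illustrate those three points together, here is a rough sketch (using the variable names from your code, not a drop-in replacement) of how the render loop could be restructured so that filled is populated once, its size is read after filling, and it is indexed with j:
// Sketch: populate `filled` once, before the pass loop.
std::vector<XMFLOAT4X4> filled;
for (UINT j = 0; j < B.GetSize(); j++)
{
    Floor *Lookup = &B.Memory[j];
    for (UINT r = 0; r < Lookup->filledindeces.size(); r++)
    {
        filled.push_back(Lookup->Pieces[Lookup->filledindeces[r]]);
    }
}
for (UINT i = 0; i < des.Passes; i++)
{
    // ... draw the container exactly as before ...
    d3dDeviceContext->RSSetState(PieseRast);
    // Read the size after filling, and index with j rather than i.
    for (UINT j = 0; j < filled.size(); j++)
    {
        ShadeMat->SetMatrix(reinterpret_cast<const float*>(&filled[j]));
        Tech->GetPassByIndex(i)->Apply(0, d3dDeviceContext);
        d3dDeviceContext->DrawIndexed(piese_ind_count, piese_ind_index, piese_vers_index);
    }
}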

Got different EM::predict() results after EM::read() saved model in OpenCV

I'm new to OpenCV and C++ and I'm trying to build a classifier using a Gaussian Mixture Model within OpenCV. I figured out how it works and got it working... maybe. I have something like this now:
If I classify the training samples just after the model was trained and saved, I get the result I want. But when I reclassify my training data after loading the model with read(), one of the clusters is missing, meaning I get a different cluster result from the same GMM model. I don't get it; the cluster I want is gone, and I can't reproduce the classification until I retrain the model using the same data. I checked the code at runtime, and the result value in the Vec2d returned by predict() was never assigned to 1 (I set 3 clusters).
Maybe there's a bug or I did something wrong?
P.S. I'm using OpenCV 2.4.8 in VS2013.
My programs like this:
train part
void GaussianMixtureModel::buildGMM(InputArray _src){
    // use source to train GMM and save the model
    Mat samples, input = _src.getMat();
    createSamples(input, samples);
    bool status = em_model.train(samples);
    saveModel();
}
save/load the model
FileStorage fs(filename, FileStorage::READ);
if (fs.isOpened()) // if we have a file with parameters, read them
{
    const FileNode& fn = fs["StatModel.EM"];
    em_model.read(fn);
    fs.release();
}
FileStorage fs_save(filename, FileStorage::WRITE);
if (fs_save.isOpened()) // save the trained parameters
{
    em_model.write(fs_save);
    fs_save.release();
}
predict part
vector<Mat> GaussianMixtureModel::classify(Mat input){
    // samples is a matrix of channels x N elements, each row is a set of features
    Mat samples;
    createSamples(input, samples);
    for (int k = 0; k < clusterN; k++){
        masks[k] = Mat::zeros(input.size(), CV_8UC1);
    }
    int idx = 0;
    for (int i = 0; i < input.rows; i++){
        for (int j = 0; j < input.cols; j++){
            // process the predicted probability
            Mat probs(1, clusterN, CV_64FC1);
            Vec2d response = em_model.predict(samples.row(idx++), probs);
            int result = cvRound(response[1]);
            for (int k = 0; k < clusterN; k++){
                if (result == k){
                    // change to the k-th class's picture
                    masks[k].at<uchar>(i, j) = 255;
                }
                ...
                // something else
            }
        }
    }
}
I suppose my answer will be too late, but as I have encountered the same problem, the solution I found may be useful for others.
By analysing the source code, I noticed that in the case of EM::COV_MAT_DIAGONAL the eigenvalues of the covariance matrices (covsEigenValues in the source code) are obtained via SVD after loading the saved data.
However, SVD computes the singular values (eigenvalues in our case) and stores them in ASCENDING order.
To prevent this, I simply extract the diagonal elements of each loaded covariance matrix directly into covsEigenValues to keep the correct order.
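For illustration, the workaround amounts to something like the following in EM's loading path (a sketch only; covs and covsEigenValues are the member names in OpenCV's EM source, and the exact surrounding code differs between versions):
// Sketch: instead of recovering covsEigenValues via SVD (which reorders
// the values), copy the diagonal of each loaded covariance matrix directly.
for (size_t k = 0; k < covs.size(); ++k)
{
    covsEigenValues[k] = covs[k].diag().clone();  // keeps the stored order
}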

Summation of Perceptron not working properly. Getting large summation

So I have a run method which sums the weights of the edges in the artificial neural network with the threshold values of the input nodes.
Sort of like this:
Now my test perceptron should produce a summation of -3, but I am getting a value of 1176! What is going on here?
Here is the code that I have written for my run() method, constructor, and my main method.
Constructor:
public class Perceptron {
    //Represents the number of weighted edges in the 2-dimensional array.
    protected int num_weighted_Edges;
    //A 2-D array of WeightedEdge. Since the weighted edges hold double
    //values, the array stores WeightedEdge objects.
    protected WeightedEdge[][] weightedEdges;
    protected int[] weights;
    //We set a double field named eta equal to 0.05.
    protected double eta = 0.05;

    //Constructor which only takes a parameter int n.
    public Perceptron(int n){
        //We want to create a new graph which will have n + 1 vertices,
        //where vertex 0 acts like the output node in a neural network.
        this.num_weighted_Edges = n;
        weights = new int[num_weighted_Edges];
        //First we need to verify that n is a positive number.
        if (num_weighted_Edges < 0){
            throw new RuntimeException("You cannot have a perceptron of negative value");
        }
        else {
            //Test code for checking that this code works.
            System.out.println("A perceptron of " + num_weighted_Edges + " input nodes, and 1 output node was created");
        }
        //Now we create a graph object with n vertices.
        weightedEdges = new WeightedEdge[num_weighted_Edges + 1][num_weighted_Edges + 1];
        //Iterate over the weightedEdges array. We create the weighted edges
        //from vertex 1 and not vertex 0, since vertex 0 is the output node.
        for (int i = 1; i < weightedEdges.length; i++){
            for (int j = 0; j < weightedEdges[i].length; j++){
                //Creates a weighted edge between [1][0]...[2][0]...[3][0].
                //The edge is intended to have a random value between -1 and 1,
                //but is assigned a constant weight of 1 here.
                weightedEdges[i][0] = new WeightedEdge(i, j, 1);
            }
        }
    }
This is my run() method:
//This method will take the input nodes, do a quick verification check on them and
//sum up the weights using the simple threshold function described in class to return
//either a 1 or -1; 1 meaning fire, and -1 not firing.
public int run(int[] weights){
    //This method acts as the summation function. It takes the int parameters
    //in the parameter field and multiplies them by the input nodes in the
    //weighted edge 2-D array.
    //Set up a summation counter.
    int sum = 0;
    if (weights.length != num_weighted_Edges){
        throw new RuntimeException("Array coming in has to equal the number of input nodes");
    }
    else {
        //We iterate the weights array and use the sum counter to sum up weights.
        for (int i = 0; i < weights.length; i++){
            //Nested for loop which iterates over the input nodes.
            for (int j = 1; j < weightedEdges.length; j++){
                for (int k = 0; k < weightedEdges[j].length; k++){
                    //This takes the weights and multiplies them by the value in the
                    //input nodes. The sum should end up greater than or less than 0.
                    sum += (int) ((weightedEdges[j][0].getWeight()) * i);
                    //The plus-equals sign takes the product of (weightedEdges[j][0] * i)
                    //and adds it to the previous value.
                }
            }
        }
    }
    System.out.println(sum);
    //If the sum is greater than 0, we fire the neuron by returning 1.
    if (sum > 0){
        //System.out.println(1); test code
        return 1;
    }
    //Else we don't fire and return -1.
    else {
        //System.out.println(-1); test code
        return -1;
    }
}
This is my main method:
//Main method which will stimulate the artificial neuron (a perceptron, the
//simplest type of neuron in an artificial network).
public static void main(String[] args){
    //Create a test perceptron with a user-defined number of nodes.
    Perceptron perceptron = new Perceptron(7);
    //Create a weight object that creates an edge between vertices 1 and 2
    //with a weight of 1.5.
    WeightedEdge weight = new WeightedEdge(1, 2, 1.5);
    //These methods work fine.
    weight.getStart();
    weight.getEnd();
    weight.setWeight(2.0);
    //Test to see if the run method works. (Previously was giving a null
    //pointer, but fixed now.)
    int[] test_weight_Array = {-1, -1, -1, -1, -1, 1, 1};
    //Tested and works to return output of 1 or -1. Also catches exceptions.
    perceptron.run(test_weight_Array);
    //Testing a 2-D array to see if the train method works.
    int[][] test_train_Array = {{1}, {-1}, {1}, {1}, {1}, {1}, {1}, {1}};
    //Works and catches exceptions.
    perceptron.train(test_train_Array);
}
}
I think you should change
sum += (int) ((weightedEdges[j][0].getWeight()) * i);
to
sum += (int) ((weightedEdges[j][k].getWeight()) * i);

OpenCV Hough strongest lines

Do the HoughLines or HoughLinesP functions in OpenCV return the list of lines in accumulator order, like the HoughCircles function does? I would like to know the ordering of the lines. It would also be very handy to get the accumulator value for the lines, so an intelligent and adaptive threshold could be used instead of a fixed one. Are either the ordering or the accumulator value available without rewriting OpenCV myself?
HoughTransform orders lines descending by number of votes. You can see the code here.
However, the vote count is lost as the function returns; the only way to have it is to modify OpenCV.
The good news is that it is not very complicated; I did it myself once. It's a matter of minutes to change the output from vector<Vec2f> to vector<Vec3f> and populate the last component with the vote count.
Also, you have to modify CvLinePolar to add the third parameter. Hough is implemented in C, and there is a wrapper over it in C++, so you have to modify both the implementation and the wrapper.
The main code to modify is here
for( i = 0; i < linesMax; i++ )
{
    CvLinePolar line;
    int idx = sort_buf[i];
    int n = cvFloor(idx*scale) - 1;
    int r = idx - (n+1)*(numrho+2) - 1;
    line.rho = (r - (numrho - 1)*0.5f) * rho;
    line.angle = n * theta;
    // add this line, and a field voteCount to CvLinePolar
    // DO NOT FORGET TO MODIFY THE C++ WRAPPER
    line.voteCount = accum[idx];
    cvSeqPush( lines, &line );
}
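For completeness, once both the C implementation and the C++ wrapper are modified, the calling side might look like this (a sketch only; it assumes a patched HoughLines that outputs vector<Vec3f> with the vote count in the third component, which is not the stock OpenCV API):
// Hypothetical usage of a patched HoughLines that also returns vote counts.
std::vector<cv::Vec3f> lines;  // each entry: rho, theta, voteCount
cv::HoughLines(edges, lines, 1, CV_PI / 180, threshold);
for (const cv::Vec3f& l : lines)  // sorted descending by vote count
{
    // Adaptive cut-off: stop once a line has under half the strongest line's votes.
    if (l[2] < 0.5f * lines[0][2]) break;
    float rho = l[0], theta = l[1];
    // ... use rho and theta ...
}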
