How to get stack trace for C/C++ program in CYGWIN environment? - printing

How can I get a stack trace for a C/C++ program in a Cygwin environment?
I was looking for a backtrace mechanism, so I compiled some of the solutions found here into a small program for quick reference.
My answer, with a code snippet:
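#if defined(__CYGWIN__)
#include <Windows.h>
#include <dbghelp.h>
#include <psdk_inc/_dbg_common.h>
#include <cxxabi.h>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <string>
using namespace std; // the snippet uses unqualified ostream, string, cout and endl
// Maximum number of frames to capture. 62 is an assumed value; older Windows
// versions require FramesToSkip + FramesToCapture to be less than 63.
#define MAX_STACKTRACE_SIZE 62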
class Error // Windows version
{
private:
void *stacktrace[MAX_STACKTRACE_SIZE];
size_t stacktrace_size;
public:
const char* message;
Error(const char* m)
: message(m)
, stacktrace_size(0)
{
// Capture the stack, when error is 'hit'
stacktrace_size = CaptureStackBackTrace(0, MAX_STACKTRACE_SIZE, stacktrace, nullptr);
}
void print_backtrace(ostream& out) const
{
SYMBOL_INFO * symbol;
HANDLE process;
size_t length;
process = GetCurrentProcess();
SymInitialize(process, nullptr, TRUE);
symbol = (SYMBOL_INFO *)calloc(sizeof(SYMBOL_INFO) + 256 * sizeof(char), 1);
symbol->MaxNameLen = 255;
symbol->SizeOfStruct = sizeof(SYMBOL_INFO);
length = strlen (symbol->Name);
std::string result;
char tempStr[255] = {0};
for (int i = 0; i < stacktrace_size; i++)
{
int status = 0;
// '_' is missing in symbol->Name , hence prefix it and concat with symbol->Name
char prefixed_symbol [256] = "_" ;
SymFromAddr(process, (DWORD64)(stacktrace[i]), 0, symbol);
auto backtrace_line = string(symbol->Name);
if (backtrace_line.size() == 0) continue;
// https://en.wikipedia.org/wiki/Name_mangling
// Prefix '_' with symbol name, so that __cxa_demangle does the job correctly
// $ c++filt -n _Z9test_ringI12SmallIntegerIhEEvRK4RingIT_E
strcat (prefixed_symbol, symbol->Name);
char * demangled_name = abi::__cxa_demangle(prefixed_symbol, nullptr, nullptr, &status);
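if(status < 0)
{
// Demangling failed: print the raw symbol name. %zu matches size_t, %llX the 64-bit address.
snprintf(tempStr, sizeof(tempStr), "%zu: %s - 0x%llX\n", stacktrace_size-i-1, symbol->Name, (unsigned long long)symbol->Address);
// out << symbol->Name << endl;
}
else
{
snprintf(tempStr, sizeof(tempStr), "%zu: %s - 0x%llX\n", stacktrace_size - i - 1, demangled_name, (unsigned long long)symbol->Address);
// out << demangled_name << endl;
}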
// Append the extracted info to the result
result += tempStr;
// Free the HEAP allocation made by __cxa_demangle
free((void*)demangled_name);
// Restore the prefix '_' string
prefixed_symbol [1] = '\0';
}
// Write to the stream passed in by the caller rather than always to std::cout
out << result << std::endl;
free(symbol);
}
};
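int main ()
{
try {
// do_something() is a placeholder for the application code; it is assumed
// to return false on failure.
bool status = do_something ();
if (false == status) throw Error("SystemError");
}
catch (const Error &error)
{
cout << "NotImplementedError(\"" << error.message << "\")" << endl;
error.print_backtrace(cout);
return 1;
}
return 0;
}
#endif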
Command Line Option:
// Use -limagehlp to link the library
g++ -std=c++20 main.cpp -limagehlp
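The underscore handling in the snippet above can be tried in isolation, without any Windows API. This is only a minimal sketch, assuming a GCC or Clang toolchain that provides <cxxabi.h>; the helper names demangle and demangle_stripped are made up for illustration.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <string>

// Demangle an Itanium-ABI symbol name. Returns the input unchanged when
// __cxa_demangle reports a failure (non-zero status).
std::string demangle(const std::string& mangled)
{
    int status = 0;
    char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    std::string result = (status == 0 && out != nullptr) ? out : mangled;
    std::free(out);  // the result is malloc'ed; free(nullptr) is a no-op
    return result;
}

// Re-attach the leading '_' that the symbol lookup dropped, then demangle.
// Assumes, as the snippet above does, that exactly one '_' is missing.
std::string demangle_stripped(const std::string& name_without_underscore)
{
    return demangle("_" + name_without_underscore);
}
```

Calling demangle_stripped("Z9test_ringI12SmallIntegerIhEEvRK4RingIT_E") should give the readable test_ring signature, the same one the c++filt command in the comment is meant to produce.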

Related

I am trying to get stable flight with PX4 and ROS 2 offboard control

Hello, I have offboard code that sends about 50 setpoints to the drone, tracing out a spiral. My problem is that I can't get smooth travel: at every setpoint the drone makes a sudden, large roll or pitch correction and then drifts to the next setpoint. Is there a way to keep a steady velocity while passing through the setpoints? Here is the code:
#include <px4_msgs/msg/offboard_control_mode.hpp>
#include <px4_msgs/msg/trajectory_setpoint.hpp>
#include <px4_msgs/msg/timesync.hpp>
#include <px4_msgs/msg/vehicle_command.hpp>
#include <px4_msgs/msg/vehicle_control_mode.hpp>
#include <px4_msgs/msg/vehicle_local_position.hpp>
#include <rclcpp/rclcpp.hpp>
#include <stdint.h>
#include <chrono>
#include <iostream>
#include "std_msgs/msg/string.hpp"
#include <math.h>
float X;
float Y;
using namespace std::chrono;
using namespace std::chrono_literals;
using namespace px4_msgs::msg;
class setpoint : public rclcpp::Node {
public:
setpoint() : Node("setpoint") {
offboard_control_mode_publisher_ =
this->create_publisher<OffboardControlMode>("fmu/offboard_control_mode/in", 10);
trajectory_setpoint_publisher_ =
this->create_publisher<TrajectorySetpoint>("fmu/trajectory_setpoint/in", 10);
vehicle_command_publisher_ =
this->create_publisher<VehicleCommand>("fmu/vehicle_command/in", 10);
// get common timestamp
timesync_sub_ =
this->create_subscription<px4_msgs::msg::Timesync>("fmu/timesync/out", 10,
[this](const px4_msgs::msg::Timesync::UniquePtr msg) {
timestamp_.store(msg->timestamp);
});
offboard_setpoint_counter_ = 0;
auto sendCommands = [this]() -> void {
if (offboard_setpoint_counter_ == 10) {
// Change to Offboard mode after 10 setpoints
this->publish_vehicle_command(VehicleCommand::VEHICLE_CMD_DO_SET_MODE, 1, 6);
// Arm the vehicle
this->arm();
}
//-------------
subscription_ = this->create_subscription<px4_msgs::msg::VehicleLocalPosition>(
"/fmu/vehicle_local_position/out",
#ifdef ROS_DEFAULT_API
10,
#endif
[this](const px4_msgs::msg::VehicleLocalPosition::UniquePtr msg) {
X = msg->x;
Y = msg->y;
std::cout << "\n\n\n\n\n\n\n\n\n\n";
std::cout << "RECEIVED VEHICLE GPS POSITION DATA" << std::endl;
std::cout << "==================================" << std::endl;
std::cout << "ts: " << msg->timestamp << std::endl;
//std::cout << "lat: " << msg->x << std::endl;
//std::cout << "lon: " << msg->y << std::endl;
std::cout << "lat: " << X << std::endl;
std::cout << "lon: " << Y << std::endl;
std::cout << "waypoint: " << waypoints[waypointIndex][0] << std::endl;
std::cout << "waypoint: " << waypoints[waypointIndex][1] << std::endl;
if((waypoints[waypointIndex][0] + 0.3 > X && waypoints[waypointIndex][0] - 0.3 < X)&&(waypoints[waypointIndex][1] + 0.3 > Y && waypoints[waypointIndex][1] - 0.3 < Y)){
waypointIndex++;
if (waypointIndex >= waypoints.size())
exit(0);
//waypointIndex = 0;
RCLCPP_INFO(this->get_logger(), "Next waypoint: %.2f %.2f %.2f", waypoints[waypointIndex][0], waypoints[waypointIndex][1], waypoints[waypointIndex][2]);
}
});
//--------------
// offboard_control_mode needs to be paired with trajectory_setpoint
publish_offboard_control_mode();
publish_trajectory_setpoint();
// stop the counter after reaching 11
if (offboard_setpoint_counter_ < 11) {
offboard_setpoint_counter_++;
}
};
/*
auto nextWaypoint = [this]() -> void {
waypointIndex++;
if (waypointIndex >= waypoints.size())
waypointIndex = 0;
RCLCPP_INFO(this->get_logger(), "Next waypoint: %.2f %.2f %.2f", waypoints[waypointIndex][0], waypoints[waypointIndex][1], waypoints[waypointIndex][2]);
};
*/
commandTimer = this->create_wall_timer(100ms, sendCommands);
//waypointTimer = this->create_wall_timer(2s, nextWaypoint); //EA
}
void arm() const;
void disarm() const;
void topic_callback() const;
private:
std::vector<std::vector<float>> waypoints = {{0,0,-5,},
{2,0,-5,},
{2.35216,0.476806,-5,},
{2.57897,1.09037,-5,},
{2.64107,1.80686,-5,},
{2.50814,2.58248,-5,},
{2.16121,3.36588,-5,},
{1.59437,4.10097,-5,},
{0.815842,4.73016,-5,},
{-0.151838,5.19778,-5,},
{-1.27233,5.45355,-5,},
{-2.49688,5.45578,-5,},
{-3.76641,5.17438,-5,},
{-5.01428,4.59315,-5,},
{-6.1696,3.71161,-5,},
{-7.16089,2.54591,-5,},
{-7.91994,1.12896,-5,},
{-8.38568,-0.490343,-5,},
{-8.50782,-2.24876,-5,},
{-8.25018,-4.07119,-5,},
{-7.59329,-5.87384,-5,},
{-6.53644,-7.56803,-5,},
{-5.09871,-9.06439,-5,},
{-3.31919,-10.2773,-5,},
{-1.25611,-11.1293,-5,},
{1.01499,-11.5555,-5,},
{3.40395,-11.5071,-5,},
{5.8096,-10.9548,-5,},
{8.12407,-9.89139,-5,},
{10.2375,-8.33272,-5,},
{12.0431,-6.31859,-5,},
{13.4424,-3.91182,-5,},
{14.3502,-1.19649,-5,},
{14.6991,1.72493,-5,},
{14.4435,4.73543,-5,},
{13.5626,7.70817,-5,},
{12.0624,10.5118,-5,},
{9.97696,13.0162,-5,},
{7.36759,15.0983,-5,},
{4.32167,16.6482,-5,},
{0.949612,17.5744,-5,},
{-2.619,17.8084,-5,},
{-6.24045,17.3094,-5,},
{-9.76262,16.0665,-5,},
{-13.0314,14.1004,-5,},
{-15.8974,11.4644,-5,},
{-18.2226,8.24237,-5,},
{-19.8868,4.54696,-5,},
{-20.7936,0.515337,-5,},
{-20.8754,-3.69574,-5,},
{-20.0972,-7.91595,-5,},
{-20.8754,-3.69574,-5,},
{-20.7936,0.515337,-5,},
{-19.8868,4.54696,-5,},
{-18.2226,8.24237,-5,},
{-15.8974,11.4644,-5,},
{-13.0314,14.1004,-5,},
{-9.76262,16.0665,-5,},
{-6.24045,17.3094,-5,},
{-2.619,17.8084,-5,},
{0.949612,17.5744,-5,},
{4.32167,16.6482,-5,},
{7.36759,15.0983,-5,},
{9.97696,13.0162,-5,},
{12.0624,10.5118,-5,},
{13.5626,7.70817,-5,},
{14.4435,4.73543,-5,},
{14.6991,1.72493,-5,},
{14.3502,-1.19649,-5,},
{13.4424,-3.91182,-5,},
{12.0431,-6.31859,-5,},
{10.2375,-8.33272,-5,},
{8.12407,-9.89139,-5,},
{5.8096,-10.9548,-5,},
{3.40395,-11.5071,-5,},
{1.01499,-11.5555,-5,},
{-1.25611,-11.1293,-5,},
{-3.31919,-10.2773,-5,},
{-5.09871,-9.06439,-5,},
{-6.53644,-7.56803,-5,},
{-7.59329,-5.87384,-5,},
{-8.25018,-4.07119,-5,},
{-8.50782,-2.24876,-5,},
{-8.38568,-0.490343,-5,},
{-7.91994,1.12896,-5,},
{-7.16089,2.54591,-5,},
{-6.1696,3.71161,-5,},
{-5.01428,4.59315,-5,},
{-3.76641,5.17438,-5,},
{-2.49688,5.45578,-5,},
{-1.27233,5.45355,-5,},
{-0.151838,5.19778,-5,},
{0.815842,4.73016,-5,},
{1.59437,4.10097,-5,},
{2.16121,3.36588,-5,},
{2.50814,2.58248,-5,},
{2.64107,1.80686,-5,},
{2.57897,1.09037,-5,},
{2.35216,0.476806,-5,},
{2,0,-5,},
{0,0,-5,},
{0,0,0,}
}; // Land
int waypointIndex = 0;
rclcpp::TimerBase::SharedPtr commandTimer;
rclcpp::TimerBase::SharedPtr waypointTimer;
rclcpp::Publisher<OffboardControlMode>::SharedPtr offboard_control_mode_publisher_;
rclcpp::Publisher<TrajectorySetpoint>::SharedPtr trajectory_setpoint_publisher_;
rclcpp::Publisher<VehicleCommand>::SharedPtr vehicle_command_publisher_;
rclcpp::Subscription<px4_msgs::msg::Timesync>::SharedPtr timesync_sub_;
//
rclcpp::Subscription<px4_msgs::msg::VehicleLocalPosition>::SharedPtr subscription_;
//
std::atomic<uint64_t> timestamp_; //!< common synced timestamped
uint64_t offboard_setpoint_counter_; //!< counter for the number of setpoints sent
void publish_offboard_control_mode() const;
void publish_trajectory_setpoint() const;
void publish_vehicle_command(uint16_t command, float param1 = 0.0,
float param2 = 0.0) const;
};
void setpoint::arm() const {
publish_vehicle_command(VehicleCommand::VEHICLE_CMD_COMPONENT_ARM_DISARM, 1.0);
RCLCPP_INFO(this->get_logger(), "Arm command send");
}
void setpoint::disarm() const {
publish_vehicle_command(VehicleCommand::VEHICLE_CMD_COMPONENT_ARM_DISARM, 0.0);
RCLCPP_INFO(this->get_logger(), "Disarm command send");
}
void setpoint::publish_offboard_control_mode() const {
OffboardControlMode msg{};
msg.timestamp = timestamp_.load();
msg.position = true;
msg.velocity = false;
msg.acceleration = false;
msg.attitude = false;
msg.body_rate = false;
offboard_control_mode_publisher_->publish(msg);
}
void setpoint::publish_trajectory_setpoint() const {
TrajectorySetpoint msg{};
msg.timestamp = timestamp_.load();
msg.position = {waypoints[waypointIndex][0],waypoints[waypointIndex][1],waypoints[waypointIndex][2]};
msg.yaw = std::nanf("0"); //-3.14; // [-PI:PI]
trajectory_setpoint_publisher_->publish(msg);
}
void setpoint::publish_vehicle_command(uint16_t command, float param1,
float param2) const {
VehicleCommand msg{};
msg.timestamp = timestamp_.load();
msg.param1 = param1;
msg.param2 = param2;
msg.command = command;
msg.target_system = 1;
msg.target_component = 1;
msg.source_system = 1;
msg.source_component = 1;
msg.from_external = true;
vehicle_command_publisher_->publish(msg);
}
int main(int argc, char* argv[]) {
std::cout << "Starting setpoint node..." << std::endl;
setvbuf(stdout, NULL, _IONBF, BUFSIZ);
rclcpp::init(argc, argv);
rclcpp::spin(std::make_shared<setpoint>());
rclcpp::shutdown();
return 0;
}
Setpoints are reference points for the controller: the aircraft tries to maneuver to each one using its control strategy (usually PID). With widely spaced setpoints, the position error jumps every time the target changes, which would explain the abrupt roll or pitch. For a smooth maneuver it is therefore usually suggested to send a series of closely spaced points between two waypoints, i.e., a trajectory parameterized by time. That should remove the abrupt motion of your UAV. I'm no expert, but I hope this helps.
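As a rough illustration of the idea, the waypoint list could be densified before it is published. This is only a sketch under my own assumptions: the function name densify and the max_step parameter are made up, it uses plain linear interpolation, and it does not touch any PX4 or ROS 2 API.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

using Point = std::array<float, 3>;

// Insert evenly spaced intermediate points so that consecutive setpoints are
// at most max_step metres apart. max_step is a tuning value chosen for
// illustration, not a PX4 parameter.
std::vector<Point> densify(const std::vector<Point>& waypoints, float max_step)
{
    std::vector<Point> path;
    if (waypoints.empty() || max_step <= 0.0f) return path;
    path.push_back(waypoints.front());
    for (std::size_t k = 1; k < waypoints.size(); ++k) {
        const Point& a = waypoints[k - 1];
        const Point& b = waypoints[k];
        const float dx = b[0] - a[0], dy = b[1] - a[1], dz = b[2] - a[2];
        const float dist = std::sqrt(dx * dx + dy * dy + dz * dz);
        // Number of segments needed to keep each step <= max_step (at least 1).
        const int n = std::max(1, static_cast<int>(std::ceil(dist / max_step)));
        for (int s = 1; s <= n; ++s) {
            const float t = static_cast<float>(s) / static_cast<float>(n);
            path.push_back(Point{a[0] + t * dx, a[1] + t * dy, a[2] + t * dz});
        }
    }
    return path;
}
```

If one element of the densified path were published per tick of the 100 ms timer from the question, a max_step of 0.1 would correspond to a commanded speed of roughly 1 m/s. The speed the vehicle actually reaches still depends on the controller and its limits.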

Possible bug in libc++ for macOS: string destructor is not called when string object goes out of scope

In libc++ I have found that the basic_string destructor does not get called. Once the string goes out of scope, the memory is freed by a direct call to operator delete, rather than by the destructor being called and the destructor then calling operator delete. Why is that?
Can someone explain this?
See the sample program:
void * operator new ( size_t len ) throw ( std::bad_alloc )
{
void * mem = malloc( len );
if ( (mem == 0) && (len != 0) )
throw std::bad_alloc();
return mem;
}
void operator delete ( void * ptr ) throw()
{
if ( ptr != 0 )
free( ptr );
}
int main(int argc, const char * argv[])
{
std::string mystr("testing very very very big string for string class");
std::string mystr2(mystr1.begin(),mystr.end());
}
Put a breakpoint on new and delete and then check the call stack.
operator new gets called from the basic_string class, while operator delete gets called from the end of main. Ideally the basic_string destructor should be called first, and operator delete should then be called via the allocator's deallocate call. The same holds for the second string.
I'm seeing the same thing in the debugger that you are; I don't know for sure, but I suspect that stuff is getting inlined. The destructor for basic_string is very small: a single test (for the small string optimization), and then a call to the allocator's deallocate function (through allocator_traits). std::allocator's deallocate function is also quite small, just a wrapper around operator delete.
You could test this by writing your own allocator. (Later: see below)
More stuff that I generated while investigating this question; read on if you're interested.
[Note: there's a bug in your code - in the second line you wrote: mystr1.begin(),mystr.end()) - where is mystr1 declared?]
Assuming that's a typo, I tried some slightly different code:
#include <string>
#include <new>
#include <iostream>
#include <cstdlib> // malloc, free
int news = 0;
int dels = 0;
void * operator new ( size_t len ) throw ( std::bad_alloc )
{
void * mem = malloc( len );
if ( (mem == 0) && (len != 0) )
throw std::bad_alloc();
++news;
return mem;
}
void operator delete ( void * ptr ) throw()
{
++dels;
if ( ptr != 0 )
free( ptr );
}
int main(int argc, const char * argv[])
{
{
std::string mystr("testing very very very big string for string class");
std::string mystr2(mystr.begin(),mystr.end());
std::cout << "News = " << news << "; Dels = " << dels << std::endl;
}
std::cout << "News = " << news << "; Dels = " << dels << std::endl;
}
If you run this code, it prints (at least for me):
News = 2; Dels = 0
News = 2; Dels = 2
which is exactly what it should.
If I toss the code into compiler explorer, then I see both the calls to basic_string::~basic_string(), exactly as I expect. (Well, I see three of them, but one of them is in an exception handling block, which ends with a call to _Unwind_resume).
Later - this code:
#include <string>
#include <new>
#include <iostream>
int news = 0;
int dels = 0;
template <class T>
class MyAllocator
{
public:
typedef T value_type;
MyAllocator() noexcept {}
template <class U>
MyAllocator(MyAllocator<U>) noexcept {}
T* allocate(std::size_t n)
{
++news;
return static_cast<T*>(::operator new(n*sizeof(T)));
}
void deallocate(T* p, std::size_t)
{
++dels;
return ::operator delete(static_cast<void*>(p));
}
friend bool operator==(MyAllocator, MyAllocator) {return true;}
friend bool operator!=(MyAllocator, MyAllocator) {return false;}
};
int main(int argc, const char * argv[])
{
{
typedef std::basic_string<char, std::char_traits<char>, MyAllocator<char>> S;
S mystr("testing very very very big string for string class");
S mystr2(mystr.begin(),mystr.end());
std::cout << "Allocator News = " << news << "; Allocator Dels = " << dels << std::endl;
}
std::cout << "Allocator News = " << news << "; Allocator Dels = " << dels << std::endl;
}
prints:
Allocator News = 2; Allocator Dels = 0
Allocator News = 2; Allocator Dels = 2
which confirms that the allocator is getting called.

Obtaining SCSI (including SAS and FC) hard disk model and serial number

I have recently been playing around with some hard drive stuff. What I want to do now is print out the model and serial number of a hard disk. SATA drives are very easy with ioctl. For SCSI, on the other hand, I have to send an INQUIRY command. I found a very helpful site which explains everything and even has an example program: http://tldp.org/HOWTO/archived/SCSI-Programming-HOWTO/SCSI-Programming-HOWTO-24.html
But I get either nothing or gibberish when I print the result. I even had to fix the program, as stdlib.h wasn't included and the function Inquiry returned a pointer to a local variable. But I have no idea how to fix the output...
#define DEVICE "/dev/sdb"
/* Example program to demonstrate the generic SCSI interface */
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <fcntl.h>
#include <errno.h>
#include <scsi/sg.h>
#define SCSI_OFF sizeof(struct sg_header)
static unsigned char cmd[SCSI_OFF + 18]; /* SCSI command buffer */
int fd; /* SCSI device/file descriptor */
/* process a complete scsi cmd. Use the generic scsi interface. */
static int handle_scsi_cmd(unsigned cmd_len, /* command length */
unsigned in_size, /* input data size */
unsigned char *i_buff, /* input buffer */
unsigned out_size, /* output data size */
unsigned char *o_buff /* output buffer */
)
{
int status = 0;
struct sg_header *sg_hd;
/* safety checks */
if (!cmd_len) return -1; /* need a cmd_len != 0 */
if (!i_buff) return -1; /* need an input buffer != NULL */
#ifdef SG_BIG_BUFF
if (SCSI_OFF + cmd_len + in_size > SG_BIG_BUFF) return -1;
if (SCSI_OFF + out_size > SG_BIG_BUFF) return -1;
#else
if (SCSI_OFF + cmd_len + in_size > 4096) return -1;
if (SCSI_OFF + out_size > 4096) return -1;
#endif
if (!o_buff) out_size = 0;
/* generic scsi device header construction */
sg_hd = (struct sg_header *) i_buff;
sg_hd->reply_len = SCSI_OFF + out_size;
sg_hd->twelve_byte = cmd_len == 12;
sg_hd->result = 0;
#if 0
sg_hd->pack_len = SCSI_OFF + cmd_len + in_size; /* not necessary */
sg_hd->pack_id; /* not used */
sg_hd->other_flags; /* not used */
#endif
/* send command */
status = write( fd, i_buff, SCSI_OFF + cmd_len + in_size );
if ( status < 0 || status != SCSI_OFF + cmd_len + in_size ||
sg_hd->result ) {
/* some error happened */
fprintf( stderr, "write(generic) result = 0x%x cmd = 0x%x\n",
sg_hd->result, i_buff[SCSI_OFF] );
perror("");
return status;
}
if (!o_buff) o_buff = i_buff; /* buffer pointer check */
/* retrieve result */
status = read( fd, o_buff, SCSI_OFF + out_size);
if ( status < 0 || status != SCSI_OFF + out_size || sg_hd->result ) {
/* some error happened */
fprintf( stderr, "read(generic) result = 0x%x cmd = 0x%x\n",
sg_hd->result, o_buff[SCSI_OFF] );
fprintf( stderr, "read(generic) sense "
"%x %x %x %x %x %x %x %x %x %x %x %x %x %x %x %x\n",
sg_hd->sense_buffer[0], sg_hd->sense_buffer[1],
sg_hd->sense_buffer[2], sg_hd->sense_buffer[3],
sg_hd->sense_buffer[4], sg_hd->sense_buffer[5],
sg_hd->sense_buffer[6], sg_hd->sense_buffer[7],
sg_hd->sense_buffer[8], sg_hd->sense_buffer[9],
sg_hd->sense_buffer[10], sg_hd->sense_buffer[11],
sg_hd->sense_buffer[12], sg_hd->sense_buffer[13],
sg_hd->sense_buffer[14], sg_hd->sense_buffer[15]);
if (status < 0)
perror("");
}
/* Look if we got what we expected to get */
if (status == SCSI_OFF + out_size) status = 0; /* got them all */
return status; /* 0 means no error */
}
#define INQUIRY_CMD 0x12
#define INQUIRY_CMDLEN 6
#define INQUIRY_REPLY_LEN 96
#define INQUIRY_VENDOR 8 /* Offset in reply data to vendor name */
/* request vendor brand and model */
static unsigned char *Inquiry ( void )
{
/* static: the function returns a pointer into this buffer, so it must outlive the call */
static unsigned char Inqbuffer[ SCSI_OFF + INQUIRY_REPLY_LEN ];
unsigned char cmdblk [ INQUIRY_CMDLEN ] =
{ INQUIRY_CMD, /* command */
0, /* lun/reserved */
0, /* page code */
0, /* reserved */
INQUIRY_REPLY_LEN, /* allocation length */
0 };/* reserved/flag/link */
memcpy( cmd + SCSI_OFF, cmdblk, sizeof(cmdblk) );
/*
* +------------------+
* | struct sg_header | <- cmd
* +------------------+
* | copy of cmdblk | <- cmd + SCSI_OFF
* +------------------+
*/
if (handle_scsi_cmd(sizeof(cmdblk), 0, cmd,
sizeof(Inqbuffer) - SCSI_OFF, Inqbuffer )) {
fprintf( stderr, "Inquiry failed\n" );
exit(2);
}
return (Inqbuffer + SCSI_OFF);
}
int main( void )
{
fd = open(DEVICE, O_RDWR);
if (fd < 0) {
fprintf( stderr, "Need read/write permissions for "DEVICE".\n" );
exit(1);
}
/* print some fields of the Inquiry result */
printf( "||%s||", Inquiry() + INQUIRY_VENDOR );
}
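One thing worth checking is how the reply is printed. In standard INQUIRY data the vendor is in bytes 8-15, the product in bytes 16-31 and the revision in bytes 32-35. These are fixed-width, space-padded ASCII fields without a terminating NUL, so printing with %s from the vendor offset runs through all of them and beyond. The serial number is not part of the standard reply at all; it comes from the Unit Serial Number VPD page (EVPD bit set in byte 1 of the command, page code 0x80 in byte 2). Below is a minimal sketch of the parsing side only. The helper names are made up, and it assumes the reply buffer has already been filled by a successful command.

```cpp
#include <cstddef>
#include <string>

// Copy a fixed-width ASCII field out of a reply buffer and strip the
// space/NUL padding on both sides. The fields are not NUL-terminated.
std::string fixed_field(const unsigned char* data, std::size_t offset, std::size_t width)
{
    std::string s(reinterpret_cast<const char*>(data + offset), width);
    const std::string padding(" \0", 2);  // space and NUL
    const std::size_t first = s.find_first_not_of(padding);
    if (first == std::string::npos) return std::string();
    const std::size_t last = s.find_last_not_of(padding);
    return s.substr(first, last - first + 1);
}

struct InquiryInfo {
    std::string vendor;
    std::string product;
    std::string revision;
};

// Standard INQUIRY data: vendor in bytes 8-15, product in 16-31, revision in 32-35.
InquiryInfo parse_inquiry(const unsigned char* reply, std::size_t len)
{
    InquiryInfo info;
    if (len < 36) return info;  // too short to hold the three fields
    info.vendor = fixed_field(reply, 8, 8);
    info.product = fixed_field(reply, 16, 16);
    info.revision = fixed_field(reply, 32, 4);
    return info;
}

// Unit Serial Number VPD page (0x80): byte 1 is the page code, byte 3 the
// page length, and the serial number starts at byte 4.
std::string parse_serial_page(const unsigned char* reply, std::size_t len)
{
    if (len < 4 || reply[1] != 0x80) return std::string();
    const std::size_t n = reply[3];
    if (len < 4 + n) return std::string();
    return fixed_field(reply, 4, n);
}
```

In the program above, the pointer returned by Inquiry() already skips the sg_header, so it could be passed to parse_inquiry together with INQUIRY_REPLY_LEN.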

HIDAPI in two threads

According to https://github.com/signal11/hidapi/issues/72 HIDAPI ought to be thread safe on Linux machines. However, I can't get it working at all. This is what I do:
#ifdef WIN32
#include <windows.h>
#endif
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>
#include <stdlib.h>
#include <assert.h>
#include "hidapi.h"
hid_device *handle;
static void *TaskCode(void *argument)
{
int res;
//hid_device *handle;
unsigned char buf[64];
// res = hid_init();
// if( res == -1 )
// {
// return (void*)1;
// }
//
// handle = hid_open(0x0911, 0x251c, NULL);
// if( handle == NULL )
// {
// return (void*)2;
// }
printf( "while 2\n");
while( 1 )
{
memset( buf, 64, 0 );
res = hid_read(handle, buf, 0);
if( res == -1 )
{
return (void*)3;
}
printf( "received %d bytes\n", res);
for (int i = 0; i < res; i++)
printf("Byte %d: %02x ", i+1, buf[i]);
//printf( "%02x ", buf[0]);
fflush(stdout);
}
return (void*)0;
}
int main(int argc, char* argv[])
{
int res;
//hid_device *handle;
unsigned char buf[65];
res = hid_init();
if( res == -1 )
{
return 1;
}
handle = hid_open(0x0911, 0x251c, NULL);
if( handle == NULL )
{
return 2;
}
hid_set_nonblocking( handle, 0 );
pthread_t thread;
int rc = pthread_create(&thread, NULL, TaskCode, NULL);
printf( "while 1\n");
while(1)
{
int a = getchar();
if( a == 'a')
{
// Get Device Type (cmd 0x82). The first byte is the report number (0x0).
buf[0] = 0x0;
buf[1] = 0x82;
res = hid_write(handle, buf, 65);
if( res != -1 )
printf( "write ok, transferred %d bytes\n", res );
else
{
printf( "write error\n" );
char* str = hid_error(handle);
printf( "error: %s\n", str );
return 1;
}
}
else if( a== 'b')
break;
}
void* trc;
rc = pthread_join(thread, &trc);
printf( "rc code: %d\n", (int)trc );
// Finalize the hidapi library
res = hid_exit();
return 0;
}
If I don't use the global handle, I get 'write error' every time. If I do, as in the example, everything formally works, but hid_read always returns 0 bytes... Of course, if I do a simple hid_write() followed by hid_read(), I get the correct reply to the command 0x82 as intended. I'm really lost here; am I overlooking something?
EDIT: to clarify, zero bytes are returned for everything, including mouse buttons etc. So it seems to work, but the data buffer is always zero bytes.
Shame on me, a dumb mistake. The code should be:
memset( buf, 0, 64 );
res = hid_read(handle, buf, 64);
and then it works. Should sleep more and write less!
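For anyone hitting the same thing: memset takes (dest, value, count), so memset(buf, 64, 0) writes zero bytes, and hid_read(handle, buf, 0) asks for at most zero bytes. The memset half can be checked without any HID device. A small sketch, with a helper name I made up:

```cpp
#include <cstddef>
#include <cstring>

// Pre-fill a 64-byte buffer with a non-zero pattern, apply
// memset(buf, value, count), and report how many bytes are still non-zero.
std::size_t nonzero_after_memset(int value, std::size_t count)
{
    unsigned char buf[64];
    std::memset(buf, 0xAA, sizeof buf);  // pre-fill so the effect is visible
    if (count > sizeof buf) count = sizeof buf;  // stay inside the buffer
    std::memset(buf, value, count);
    std::size_t n = 0;
    for (unsigned char b : buf) n += (b != 0) ? 1 : 0;
    return n;
}
```

nonzero_after_memset(64, 0) corresponds to the call in the question and leaves all 64 bytes untouched; nonzero_after_memset(0, 64) corresponds to the corrected call and clears the whole buffer.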

OpenCL: Strange buffer or image behaviour with NVIDIA but not AMD

I have a big problem (on Linux):
I create a buffer with defined data, then an OpenCL kernel takes this data and puts it into an image2d_t. On an AMD C50 (Fusion CPU/GPU) the program works as desired, but on my GeForce 9500 GT the kernel computes the correct result only rarely. Sometimes the outcome depends on very strange changes, like removing unused variable declarations or adding a newline. I noticed that disabling optimization increases the probability of failure. I have the latest display driver on both systems.
Here is my reduced code:
#include <CL/cl.h>
#include <string>
#include <iostream>
#include <sstream>
#include <cmath>
void checkOpenCLErr(cl_int err, std::string name){
const char* errorString[] = {
"CL_SUCCESS",
"CL_DEVICE_NOT_FOUND",
"CL_DEVICE_NOT_AVAILABLE",
"CL_COMPILER_NOT_AVAILABLE",
"CL_MEM_OBJECT_ALLOCATION_FAILURE",
"CL_OUT_OF_RESOURCES",
"CL_OUT_OF_HOST_MEMORY",
"CL_PROFILING_INFO_NOT_AVAILABLE",
"CL_MEM_COPY_OVERLAP",
"CL_IMAGE_FORMAT_MISMATCH",
"CL_IMAGE_FORMAT_NOT_SUPPORTED",
"CL_BUILD_PROGRAM_FAILURE",
"CL_MAP_FAILURE",
"",
"",
"",
"",
"",
"",
"",
"",
"",
"",
"",
"",
"",
"",
"",
"",
"",
"CL_INVALID_VALUE",
"CL_INVALID_DEVICE_TYPE",
"CL_INVALID_PLATFORM",
"CL_INVALID_DEVICE",
"CL_INVALID_CONTEXT",
"CL_INVALID_QUEUE_PROPERTIES",
"CL_INVALID_COMMAND_QUEUE",
"CL_INVALID_HOST_PTR",
"CL_INVALID_MEM_OBJECT",
"CL_INVALID_IMAGE_FORMAT_DESCRIPTOR",
"CL_INVALID_IMAGE_SIZE",
"CL_INVALID_SAMPLER",
"CL_INVALID_BINARY",
"CL_INVALID_BUILD_OPTIONS",
"CL_INVALID_PROGRAM",
"CL_INVALID_PROGRAM_EXECUTABLE",
"CL_INVALID_KERNEL_NAME",
"CL_INVALID_KERNEL_DEFINITION",
"CL_INVALID_KERNEL",
"CL_INVALID_ARG_INDEX",
"CL_INVALID_ARG_VALUE",
"CL_INVALID_ARG_SIZE",
"CL_INVALID_KERNEL_ARGS",
"CL_INVALID_WORK_DIMENSION",
"CL_INVALID_WORK_GROUP_SIZE",
"CL_INVALID_WORK_ITEM_SIZE",
"CL_INVALID_GLOBAL_OFFSET",
"CL_INVALID_EVENT_WAIT_LIST",
"CL_INVALID_EVENT",
"CL_INVALID_OPERATION",
"CL_INVALID_GL_OBJECT",
"CL_INVALID_BUFFER_SIZE",
"CL_INVALID_MIP_LEVEL",
"CL_INVALID_GLOBAL_WORK_SIZE",
};
if (err != CL_SUCCESS) {
std::stringstream str;
str << errorString[-err] << " (" << err << ")";
throw std::string(name)+(str.str());
}
}
int main(){
try{
cl_context m_context;
cl_platform_id* m_platforms;
unsigned int m_numPlatforms;
cl_command_queue m_queue;
cl_device_id m_device;
cl_int error = 0; // Used to handle error codes
clGetPlatformIDs(0,NULL,&m_numPlatforms);
m_platforms = new cl_platform_id[m_numPlatforms];
error = clGetPlatformIDs(m_numPlatforms,m_platforms,&m_numPlatforms);
checkOpenCLErr(error, "getPlatformIDs");
// Device
error = clGetDeviceIDs(m_platforms[0], CL_DEVICE_TYPE_GPU, 1, &m_device, NULL);
checkOpenCLErr(error, "getDeviceIDs");
// Context
cl_context_properties properties[] =
{ CL_CONTEXT_PLATFORM, (cl_context_properties)(m_platforms[0]), 0};
m_context = clCreateContextFromType(properties, CL_DEVICE_TYPE_GPU, NULL, NULL, NULL);
// m_private->m_context = clCreateContext(properties, 1, &m_private->m_device, NULL, NULL, &error);
checkOpenCLErr(error, "Create context");
// Command-queue
m_queue = clCreateCommandQueue(m_context, m_device, 0, &error);
checkOpenCLErr(error, "Create command queue");
//Build program and kernel
const char* source = "#pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable\n"
"\n"
"__kernel void bufToImage(__global unsigned char* in, __write_only image2d_t out, const unsigned int offset_x, const unsigned int image_width , const unsigned int maxval ){\n"
"\tint i = get_global_id(0);\n"
"\tint j = get_global_id(1);\n"
"\tint width = get_global_size(0);\n"
"\tint height = get_global_size(1);\n"
"\n"
"\tint pos = j*image_width*3+(offset_x+i)*3;\n"
"\tif( maxval < 256 ){\n"
"\t\tfloat4 c = (float4)(in[pos],in[pos+1],in[pos+2],1.0f);\n"
"\t\tc.x /= maxval;\n"
"\t\tc.y /= maxval;\n"
"\t\tc.z /= maxval;\n"
"\t\twrite_imagef(out, (int2)(i,j), c);\n"
"\t}else{\n"
"\t\tfloat4 c = (float4)(255.0f*in[2*pos]+in[2*pos+1],255.0f*in[2*pos+2]+in[2*pos+3],255.0f*in[2*pos+4]+in[2*pos+5],1.0f);\n"
"\t\tc.x /= maxval;\n"
"\t\tc.y /= maxval;\n"
"\t\tc.z /= maxval;\n"
"\t\twrite_imagef(out, (int2)(i,j), c);\n"
"\t}\n"
"}\n"
"\n"
"__constant sampler_t imageSampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST;\n"
"\n"
"__kernel void imageToBuf(__read_only image2d_t in, __global unsigned char* out, const unsigned int offset_x, const unsigned int image_width ){\n"
"\tint i = get_global_id(0);\n"
"\tint j = get_global_id(1);\n"
"\tint pos = j*image_width*3+(offset_x+i)*3;\n"
"\tfloat4 c = read_imagef(in, imageSampler, (int2)(i,j));\n"
"\tif( c.x <= 1.0f && c.y <= 1.0f && c.z <= 1.0f ){\n"
"\t\tout[pos] = c.x*255.0f;\n"
"\t\tout[pos+1] = c.y*255.0f;\n"
"\t\tout[pos+2] = c.z*255.0f;\n"
"\t}else{\n"
"\t\tout[pos] = 200.0f;\n"
"\t\tout[pos+1] = 0.0f;\n"
"\t\tout[pos+2] = 255.0f;\n"
"\t}\n"
"}\n";
cl_int err;
cl_program prog = clCreateProgramWithSource(m_context,1,&source,NULL,&err);
if( -err != CL_SUCCESS ) throw std::string("clCreateProgramWithSources");
err = clBuildProgram(prog,0,NULL,"-cl-opt-disable",NULL,NULL);
if( -err != CL_SUCCESS ) throw std::string("clBuildProgram(fromSources)");
cl_kernel kernel = clCreateKernel(prog,"bufToImage",&err);
checkOpenCLErr(err,"CreateKernel");
cl_uint imageWidth = 80;
cl_uint imageHeight = 90;
//Initialize datas
cl_uint maxVal = 255;
cl_uint offsetX = 0;
int size = imageWidth*imageHeight*3;
int resSize = imageWidth*imageHeight*4;
cl_uchar* data = new cl_uchar[size];
cl_float* expectedData = new cl_float[resSize];
for( int i = 0,j=0; i < size; i++,j++ ){
data[i] = (cl_uchar)i;
expectedData[j] = (cl_float)((unsigned char)i)/255.0f;
if ( i%3 == 2 ){
j++;
expectedData[j] = 1.0f;
}
}
cl_mem inBuffer = clCreateBuffer(m_context,CL_MEM_READ_ONLY|CL_MEM_COPY_HOST_PTR,size*sizeof(cl_uchar),data,&err);
checkOpenCLErr(err, "clCreateBuffer()");
clFinish(m_queue);
cl_image_format imgFormat;
imgFormat.image_channel_order = CL_RGBA;
imgFormat.image_channel_data_type = CL_FLOAT;
cl_mem outImg = clCreateImage2D( m_context, CL_MEM_READ_WRITE, &imgFormat, imageWidth, imageHeight, 0, NULL, &err );
checkOpenCLErr(err,"get2DImage()");
clFinish(m_queue);
size_t kernelRegion[]={imageWidth,imageHeight};
size_t kernelWorkgroup[]={1,1};
//Fill kernel with data
clSetKernelArg(kernel,0,sizeof(cl_mem),&inBuffer);
clSetKernelArg(kernel,1,sizeof(cl_mem),&outImg);
clSetKernelArg(kernel,2,sizeof(cl_uint),&offsetX);
clSetKernelArg(kernel,3,sizeof(cl_uint),&imageWidth);
clSetKernelArg(kernel,4,sizeof(cl_uint),&maxVal);
//Run kernel
err = clEnqueueNDRangeKernel(m_queue,kernel,2,NULL,kernelRegion,kernelWorkgroup,0,NULL,NULL);
checkOpenCLErr(err,"RunKernel");
clFinish(m_queue);
//Check resulting data for validty
cl_float* computedData = new cl_float[resSize];;
size_t region[]={imageWidth,imageHeight,1};
const size_t offset[] = {0,0,0};
err = clEnqueueReadImage(m_queue,outImg,CL_TRUE,offset,region,0,0,computedData,0,NULL,NULL);
checkOpenCLErr(err, "readDataFromImage()");
clFinish(m_queue);
for( int i = 0; i < resSize; i++ ){
if( fabs(expectedData[i]-computedData[i])>0.1 ){
std::cout << "Expected: \n";
for( int j = 0; j < resSize; j++ ){
std::cout << expectedData[j] << " ";
}
std::cout << "\nComputed: \n";
std::cout << "\n";
for( int j = 0; j < resSize; j++ ){
std::cout << computedData[j] << " ";
}
std::cout << "\n";
throw std::string("Error, computed and expected data are not the same!\n");
}
}
}catch(std::string& e){
std::cout << "\nCaught an exception: " << e << "\n";
return 1;
}
std::cout << "Works fine\n";
return 0;
}
I also uploaded the source code to make it easier for you to test:
http://www.file-upload.net/download-3524302/strangeOpenCLError.cpp.html
Can you please tell me if I've done anything wrong?
Is there a mistake in the code, or is this a bug in my driver?
Best regards,
Alex
Edit: I changed the program (both here and the linked one) a little to make a mismatch more likely.
I found the bug, and it is an annoying one:
When working under Linux and merely linking the OpenCL program with the latest "OpenCV" library (yes, the computer vision library), the binary parts of the kernels, which get compiled and cached in ~/.nv, are damaged.
Can you please install the current OpenCV library and execute the following commands:
Generating a bad kernel, which sometimes leads to bad behaviour:
rm -R ~/.nv && g++ strangeOpenCLError.cpp -lOpenCL -lopencv_gpu -o strangeOpenCLError && ./strangeOpenCLError && ls -la ~/.nv/ComputeCache/*/*
Generating good kernel which performs as desired:
rm -R ~/.nv && g++ strangeOpenCLError.cpp -lOpenCL -o strangeOpenCLError && ./strangeOpenCLError && ls -la ~/.nv/ComputeCache/*/*
On my system, when using -lopencv_gpu or -lopencv_core I get a kernel object in ~/.nv with a slightly different size, due to slightly different binary parts. These smaller kernels computed bad results on my systems.
The problem is that the bug does not always appear: sometimes it shows up only when working on buffers that are big enough. So the more reliable indicator is the differing kernel-cache size. I edited the program in my question; it is now more likely to produce the bad result.
Best regards,
Alex
PS: I also created a bug report at NVidia and it is in progress. They could reproduce the bug on their system.
To turn off the NVIDIA compiler cache, set the environment variable CUDA_CACHE_DISABLE=1. That may help to avoid the problem in the future.
In the line
m_context = clCreateContextFromType(properties, CL_DEVICE_TYPE_GPU, NULL, NULL, NULL);
you should pass &error as the last parameter to get a meaningful error. Without it I got some silly error messages. (I needed to change the platform to get my GPU board.)
I cannot reproduce the error with my NVIDIA GeForce 8600 GTS. I get 'Works fine'. I tried it more than 20 times without any issue.
I also cannot see any error, apart from the fact that your code is a little confusing. You should remove all commented-out code and add some blank lines to group the code a little.
Do you have the latest drivers? The behavior you describe sounds very much like an uninitialized buffer or variable, but I do not see anything like that.
