Beaglebone: prussdrv_pru_send_event - beagleboneblack

With a Beaglebone black, using the PRU, I have to send an event to the binary code, which is just looping.
The binary (assembly) must understand the event and stop the execution sending an event back to PRU_example.c.
PRU_example.c
...
/* Load and execute binary on PRU */
prussdrv_exec_program (PRU_NUM, "./PRU_example.bin");
sleep(1);
?? prussdrv_pru_send_event ( ?? ); //kill PRU_example.bin
PRU_example.p
...
LOOP:
jmp LOOP // Jump to the lable LOOP
...
HALT
I suppose to use the function prussdrv_pru_send_event, but how is the code in the assembly?

c++
#include "prussdrv.h"
#include <pruss_intc_mapping.h>
prussdrv_exec_program (PRU_NUM, "./PRU_example.bin");
sleep(1);
prussdrv_pru_send_event(ARM_PRU0_INTERRUPT);
//wait till it will be completed
prussdrv_pru_wait_event (PRU_EVTOUT_0);
prussdrv_pru_clear_event (PRU_EVTOUT_0, PRU0_ARM_INTERRUPT );
/* Disable PRU and close memory mapping*/
prussdrv_pru_disable(PRU_NUM);
prussdrv_exit ();
.p
#define PRU0_R31_VEC_VALID 32 // allows notification of program completion
#define PRU_EVTOUT_0 3 // the event number is sent back
.macro MOV32
.mparam dst, src
MOV dst.w0, src & 0xFFFF
MOV dst.w2, src >> 16
.endm
#define temp32reg r10 // temporary register 4bytes
// clear interrupt
MOV32 temp32reg, (0x00000000 | 21)
SBCO temp32reg, CONST_PRUSSINTC, SICR_OFFSET, 4
LOOP:
...
QBBS END, r31,30 // Exit when receive an interrupt
JMP LOOP
END:
MOV R31.b0, PRU0_R31_VEC_VALID | PRU_EVTOUT_0 // Send and output interrupt
HALT

Related

CORTEX_M4_0: error occurs in GPIO code in debug mode

I am having this error whenever I try to debug my program using Code Composer Studio V 9.1.0 :
CORTEX_M4_0: Trouble Reading Memory Block at 0x400043fc on Page 0 of Length 0x4: Debug Port error occurred
I am using a Texas Instruments TM4C123GXL launchpad, and it connects to my laptop via a USB cable. I can successfully build my program, but the errors show up whenever I try to debug my program. My program is supposed to use SysTick interrupts to continuously vary the voltage on an Elegoo membrane switch module to allow the program to see which button I've pressed. I'm not 100% sure I've correctly initialized the GPIO input and output ports, but the errors occur before the program even starts and reaches my main loop.
Here is a screenshot of some code and my errors:
Here is some code:
void SysTickInit()
{
NVIC_ST_CTRL_R = 0;
NVIC_ST_RELOAD_R = 0x0C3500; // delays for 10ms (800,000 in hex) '0x0C3500' is
// original correct
NVIC_ST_CURRENT_R = 0;
NVIC_ST_CTRL_R = 0x07; // activates the systick clock, interrupts and enables it again.
}
void Delay1ms(uint32_t n)
{
uint32_t volatile time;
while (n)
{
time = 72724 * 2 / 91; // 1msec, tuned at 80 MHz
while (time)
{
time--;
}
n--;
}
}
void SysTick_Handler(void) // this function is suppose to change which port
// outputs voltage and runs every time systick goes
// to zero
{
if (Counter % 4 == 0)
{
Counter++;
GPIO_PORTA_DATA_R &= 0x00; // clears all of port A ( 2-5)
GPIO_PORTA_DATA_R |= 0x04; // activates the voltage for PORT A pin 2
Delay1ms(990);
}
else if (Counter % 4 == 1)
{
Counter++;
GPIO_PORTA_DATA_R &= 0x00; // clears all of port A (2-5)
GPIO_PORTA_DATA_R |= 0x08; // activates voltage for PORT A pin 3
Delay1ms(990);
}
else if (Counter % 4 == 2)
{
Counter++;
GPIO_PORTA_DATA_R &= 0x00; // clears all of port A (2-5)
GPIO_PORTA_DATA_R |= 0x10; // activates voltage for PORT A pin 4
Delay1ms(990);
}
else if (Counter % 4 == 3)
{
Counter++;
GPIO_PORTA_DATA_R &= 0x00; // clears all of port A (2-5)
GPIO_PORTA_DATA_R |= 0x20; // activates voltage for PORT A pin 5
Delay1ms(990);
}
}
void KeyPadInit()
{
SYSCTL_RCGCGPIO_R |= 0x03; // turns on the clock for Port A and Port B
while ((SYSCTL_RCGCGPIO_R) != 0x03) { }; // waits for clock to stabilize
GPIO_PORTA_DIR_R |= 0x3C; // Port A pins 2-5 are outputs (i think)
GPIO_PORTA_DEN_R |= 0x3C; // digitally enables Port A pins 2-5
GPIO_PORTA_DIR_R &= ~0xC0; // makes Port A pin 6 and 7 inputs
GPIO_PORTA_DEN_R |= 0XC0; // makes Port A pin 6 and 7 digitally enabled
GPIO_PORTB_DIR_R &= ~0X03; // makes Port B pin 0 and 1 inputs
GPIO_PORTB_DEN_R |= 0x03; // makes PortB pin 0 and 1 digitally enabled
}
I found out why my error was occurring. I went to my professor and he had a custom-built device to check if my pins on my launchpad were working. It turns out some of the pins I was using on Port A and B drew too much current, and according to the custom-built device, became busted. In other words, this error came about because the IDE detected that some of my pins weren't operational anymore.
In my case I had to disable the AHB (advanced high-performance bus) in the GPIOHBCTL register to read the GPIO_PORTx register. If you want to read the register when the AHB is activated you must read the GPIO_PORTx_AHB register.
Access to GPIO_PORTA_AHB register when AHB is activated - it works
Access to GPIO_PORTA register when AHB is activated - it fails
The Solution is to enable The clock getting into the peripheral using the RCC Section

Whats the difference between pthread_join and pthread_mutex_lock?

The following code is taken from this site and it shows how to use mutexes. It implements both pthread_join and pthread_mutex_lock:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
void *functionC();
pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;
int counter = 0;
main()
{
int rc1, rc2;
pthread_t thread1, thread2;
/* Create independent threads each of which will execute functionC */
if( (rc1=pthread_create( &thread1, NULL, &functionC, NULL)) )
{
printf("Thread creation failed: %d\n", rc1);
}
if( (rc2=pthread_create( &thread2, NULL, &functionC, NULL)) )
{
printf("Thread creation failed: %d\n", rc2);
}
/* Wait till threads are complete before main continues. Unless we */
/* wait we run the risk of executing an exit which will terminate */
/* the process and all threads before the threads have completed. */
pthread_join( thread1, NULL);
pthread_join( thread2, NULL);
exit(EXIT_SUCCESS);
}
void *functionC()
{
pthread_mutex_lock( &mutex1 );
counter++;
printf("Counter value: %d\n",counter);
pthread_mutex_unlock( &mutex1 );
}
I ran the code as given above as it is and it produced following result:
Counter value: 1
Counter value: 2
But in the second run i removed "pthread_mutex_lock( &mutex1 );" and "pthread_mutex_unlock( &mutex1 );" . I compiled and ran the code, it again produced the same result.
Now the thing that confuses me is why mutex lock is used in above code when same thing can be done without it (using pthread_join)? If pthread_join prevents another thread from running untill the first one has finished then i think it would already prevent the other thread from accessing the counter value. Whats the purpose of pthread_mutex_lock?
The join prevents the starting thread from running (and thus terminating the process) until thread1 and thread2 finish. It doesn't provide any synchronization between thread1 and thread2. The mutex prevents thread1 from reading the counter while thread2 is modifying it, or vice versa.
Without the mutex, the most obvious thing that could go wrong is that thread1 and thread2 run in perfect synch. They each read zero from the counter, each add one to it, and each output "Counter value: 1".

IP camera capture

I am trying to capture the stream of two IP cameras directly connected to a mini PCIe dual gigabit expansion card in a nVidia Jetson TK1.
I achieved to capture the stream of both cameras using gstreamer with the next command:
gst-launch-0.10 rtspsrc location=rtsp://admin:123456#192.168.0.123:554/mpeg4cif latency=0 ! decodebin ! ffmpegcolorspace ! autovideosink rtspsrc location=rtsp://admin:123456#192.168.2.254:554/mpeg4cif latency=0 ! decodebin ! ffmpegcolorspace ! autovideosink
It displays one window per camera, but gives this output just when the capture starts:
WARNING: from element /GstPipeline:pipeline0/GstAutoVideoSink:autovideosink1/GstXvImageSink:autovideosink1-actual-sink-xvimage: A lot of buffers are being dropped.
Additional debug info:
gstbasesink.c(2875): gst_base_sink_is_too_late (): /GstPipeline:pipeline0/GstAutoVideoSink:autovideosink1/GstXvImageSink:autovideosink1-actual-sink-xvimage:
There may be a timestamping problem, or this computer is too slow.
---> TVMR: Video-conferencing detected !!!!!!!!!
The stream is played good, with "good" synchronization also between cameras, but after a while, suddenly one of the cameras stops, and usually few seconds later the other one stops too. Using an interface snifer like Wireshark I can check that the rtsp packets are still sending from the cameras.
My purpose is to use this cameras to use them as a stereo camera using openCV. I am able to capture the stream with OpenCV with the following function:
camera[0].open("rtsp://admin:123456#192.168.2.254:554/mpeg4cif");//right
camera[1].open("rtsp://admin:123456#192.168.0.123:554/mpeg4cif");//left
It randomnly starts the capture good or bad, synchronized or not, with delay or not, but after a while is impossible to use the captured images as you can observe in the image:
And the output while running the openCV program usually is this: (I have copied the most complete one)
[h264 # 0x1b9580] slice type too large (2) at 0 23
[h264 # 0x1b9580] decode_slice_header error
[h264 # 0x1b1160] left block unavailable for requested intra mode at 0 6
[h264 # 0x1b1160] error while decoding MB 0 6, bytestream (-1)
[h264 # 0x1b1160] mmco: unref short failure
[h264 # 0x1b9580] too many reference frames
[h264 # 0x1b1160] pps_id (-1) out of range
The used cameras are two SIP-1080J modules.
Anyone knows how to achieve a good capture using openCV? First of all get rid of those h264 messages and have stable images while the program executes.
If not, how can I improve the pipelines and buffers using gstreamer to have a good capture without the sudden stop of the stream?. Although I never captured through openCV using gstreamer, perhaps some day I will know how to do it and solve this problem.
Thanks a lot.
After some days of deep search and some attempts, I turned on directly to use the gstreamer-0.10 API. First I learned how to use it with the tutorials from http://docs.gstreamer.com/pages/viewpage.action?pageId=327735
For most of the tutorials, you just need to install libgstreamer0.10-dev and some other packages. I installed all by:
sudo apt-get install libgstreamer0*
Then copy the code of the example you want to try into a .c file and type from the terminal in the folder where the .c file is located (In some examples you have to add more libs to pkg-config):
gcc basic-tutorial-1.c $(pkg-config --cflags --libs gstreamer-0.10) -o basic-tutorial-1.c
After that I did not feel lost I started to try to mix some c and c++ code. You can compile it using a proper g++ command, or with a CMakeLists.txt or the way you want to... As I am developing with a nVidia Jetson TK1, I use Nsight Eclipse Edition and I need to configure the project properties properly to be able to use the gstreamer-0.10 libs and the openCV libs.
Mixing some code, finally I am able to capture the streams of my two IP cameras in real time without appreciable delay, without bad decoding in any frame and both streams synchronized. The only thing left that I have not solved yet is the obtaining of frames in color and not in gray scale when (I have tried with other CV_ values with "segmentation fault" result):
v = Mat(Size(640, 360),CV_8U, (char*)GST_BUFFER_DATA(gstImageBuffer));
The complete code is next where I capture using gstreamer, transform the capture to a openCV Mat object and then show it. The code is for just a capture of one IP camera. You can replicate the objects and methods for capture multiple cameras at the same time.
#include <opencv2/core/core.hpp>
#include <opencv2/contrib/contrib.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/video/video.hpp>
#include <gst/gst.h>
#include <gst/app/gstappsink.h>
#include <gst/app/gstappbuffer.h>
#include <glib.h>
#define DEFAULT_LATENCY_MS 1
using namespace cv;
typedef struct _vc_cfg_data {
char server_ip_addr[100];
} vc_cfg_data;
typedef struct _vc_gst_data {
GMainLoop *loop;
GMainContext *context;
GstElement *pipeline;
GstElement *rtspsrc,*depayloader, *decoder, *converter, *sink;
GstPad *recv_rtp_src_pad;
} vc_gst_data;
typedef struct _vc_data {
vc_gst_data gst_data;
vc_cfg_data cfg;
} vc_data;
/* Global data */
vc_data app_data;
static void vc_pad_added_handler (GstElement *src, GstPad *new_pad, vc_data *data);
#define VC_CHECK_ELEMENT_ERROR(e, name) \
if (!e) { \
g_printerr ("Element %s could not be created. Exiting.\n", name); \
return -1; \
}
/*******************************************************************************
Gstreamer pipeline creation and init
*******************************************************************************/
int vc_gst_pipeline_init(vc_data *data)
{
GstStateChangeReturn ret;
// Template
GstPadTemplate* rtspsrc_pad_template;
// Create a new GMainLoop
data->gst_data.loop = g_main_loop_new (NULL, FALSE);
data->gst_data.context = g_main_loop_get_context(data->gst_data.loop);
// Create gstreamer elements
data->gst_data.pipeline = gst_pipeline_new ("videoclient");
VC_CHECK_ELEMENT_ERROR(data->gst_data.pipeline, "pipeline");
//RTP UDP Source - for received RTP messages
data->gst_data.rtspsrc = gst_element_factory_make ("rtspsrc", "rtspsrc");
VC_CHECK_ELEMENT_ERROR(data->gst_data.rtspsrc,"rtspsrc");
printf("URL: %s\n",data->cfg.server_ip_addr);
g_print ("Setting RTSP source properties: \n");
g_object_set (G_OBJECT (data->gst_data.rtspsrc), "location", data->cfg.server_ip_addr, "latency", DEFAULT_LATENCY_MS, NULL);
//RTP H.264 Depayloader
data->gst_data.depayloader = gst_element_factory_make ("rtph264depay","depayloader");
VC_CHECK_ELEMENT_ERROR(data->gst_data.depayloader,"rtph264depay");
//ffmpeg decoder
data->gst_data.decoder = gst_element_factory_make ("ffdec_h264", "decoder");
VC_CHECK_ELEMENT_ERROR(data->gst_data.decoder,"ffdec_h264");
data->gst_data.converter = gst_element_factory_make ("ffmpegcolorspace", "converter");
VC_CHECK_ELEMENT_ERROR(data->gst_data.converter,"ffmpegcolorspace");
// i.MX Video sink
data->gst_data.sink = gst_element_factory_make ("appsink", "sink");
VC_CHECK_ELEMENT_ERROR(data->gst_data.sink,"appsink");
gst_app_sink_set_max_buffers((GstAppSink*)data->gst_data.sink, 1);
gst_app_sink_set_drop ((GstAppSink*)data->gst_data.sink, TRUE);
g_object_set (G_OBJECT (data->gst_data.sink),"sync", FALSE, NULL);
//Request pads from rtpbin, starting with the RTP receive sink pad,
//This pad receives RTP data from the network (rtp-udpsrc).
rtspsrc_pad_template = gst_element_class_get_pad_template (GST_ELEMENT_GET_CLASS (data->gst_data.rtspsrc),"recv_rtp_src_0");
// Use the template to request the pad
data->gst_data.recv_rtp_src_pad = gst_element_request_pad (data->gst_data.rtspsrc, rtspsrc_pad_template,
"recv_rtp_src_0", NULL);
// Print the name for confirmation
g_print ("A new pad %s was created\n",
gst_pad_get_name (data->gst_data.recv_rtp_src_pad));
// Add elements into the pipeline
g_print(" Adding elements to pipeline...\n");
gst_bin_add_many (GST_BIN (data->gst_data.pipeline),
data->gst_data.rtspsrc,
data->gst_data.depayloader,
data->gst_data.decoder,
data->gst_data.converter,
data->gst_data.sink,
NULL);
// Link some of the elements together
g_print(" Linking some elements ...\n");
if(!gst_element_link_many (data->gst_data.depayloader, data->gst_data.decoder, data->gst_data.converter, data->gst_data.sink, NULL))
g_print("Error: could not link all elements\n");
// Connect to the pad-added signal for the rtpbin. This allows us to link
//the dynamic RTP source pad to the depayloader when it is created.
if(!g_signal_connect (data->gst_data.rtspsrc, "pad-added",
G_CALLBACK (vc_pad_added_handler), data))
g_print("Error: could not add signal handler\n");
// Set the pipeline to "playing" state
g_print ("Now playing A\n");
ret = gst_element_set_state (data->gst_data.pipeline, GST_STATE_PLAYING);
if (ret == GST_STATE_CHANGE_FAILURE) {
g_printerr ("Unable to set the pipeline A to the playing state.\n");
gst_object_unref (data->gst_data.pipeline);
return -1;
}
return 0;
}
static void vc_pad_added_handler (GstElement *src, GstPad *new_pad, vc_data *data) {
GstPad *sink_pad = gst_element_get_static_pad (data->gst_data.depayloader, "sink");
GstPadLinkReturn ret;
GstCaps *new_pad_caps = NULL;
GstStructure *new_pad_struct = NULL;
const gchar *new_pad_type = NULL;
g_print ("Received new pad '%s' from '%s':\n", GST_PAD_NAME (new_pad), GST_ELEMENT_NAME (src));
/* Check the new pad's name */
if (!g_str_has_prefix (GST_PAD_NAME (new_pad), "recv_rtp_src_")) {
g_print (" It is not the right pad. Need recv_rtp_src_. Ignoring.\n");
goto exit;
}
/* If our converter is already linked, we have nothing to do here */
if (gst_pad_is_linked (sink_pad)) {
g_print (" Sink pad from %s already linked. Ignoring.\n", GST_ELEMENT_NAME (src));
goto exit;
}
/* Check the new pad's type */
new_pad_caps = gst_pad_get_caps (new_pad);
new_pad_struct = gst_caps_get_structure (new_pad_caps, 0);
new_pad_type = gst_structure_get_name (new_pad_struct);
/* Attempt the link */
ret = gst_pad_link (new_pad, sink_pad);
if (GST_PAD_LINK_FAILED (ret)) {
g_print (" Type is '%s' but link failed.\n", new_pad_type);
} else {
g_print (" Link succeeded (type '%s').\n", new_pad_type);
}
exit:
/* Unreference the new pad's caps, if we got them */
if (new_pad_caps != NULL)
gst_caps_unref (new_pad_caps);
/* Unreference the sink pad */
gst_object_unref (sink_pad);
}
int vc_gst_pipeline_clean(vc_data *data) {
GstStateChangeReturn ret;
GstStateChangeReturn ret2;
/* Cleanup Gstreamer */
if(!data->gst_data.pipeline)
return 0;
/* Send the main loop a quit signal */
g_main_loop_quit(data->gst_data.loop);
g_main_loop_unref(data->gst_data.loop);
ret = gst_element_set_state (data->gst_data.pipeline, GST_STATE_NULL);
if (ret == GST_STATE_CHANGE_FAILURE) {
g_printerr ("Unable to set the pipeline A to the NULL state.\n");
gst_object_unref (data->gst_data.pipeline);
return -1;
}
g_print ("Deleting pipeline\n");
gst_object_unref (GST_OBJECT (data->gst_data.pipeline));
/* Zero out the structure */
memset(&data->gst_data, 0, sizeof(vc_gst_data));
return 0;
}
void handleKey(char key)
{
switch (key)
{
case 27:
break;
}
}
int vc_mainloop(vc_data* data)
{
GstBuffer *gstImageBuffer;
Mat v;
namedWindow("view",WINDOW_NORMAL);
while (1) {
gstImageBuffer = gst_app_sink_pull_buffer((GstAppSink*)data->gst_data.sink);
if (gstImageBuffer != NULL )
{
v = Mat(Size(640, 360),CV_8U, (char*)GST_BUFFER_DATA(gstImageBuffer));
imshow("view", v);
handleKey((char)waitKey(3));
gst_buffer_unref(gstImageBuffer);
}else{
g_print("gsink buffer didn't return buffer.");
}
}
return 0;
}
int main (int argc, char *argv[])
{
setenv("DISPLAY", ":0", 0);
strcpy(app_data.cfg.server_ip_addr, "rtsp://admin:123456#192.168.0.123:554/mpeg4cif");
gst_init (&argc, &argv);
if(vc_gst_pipeline_init(&app_data) == -1) {
printf("Gstreamer pipeline creation and init failed\n");
goto cleanup;
}
vc_mainloop(&app_data);
printf ("Returned, stopping playback\n");
cleanup:
return vc_gst_pipeline_clean(&app_data);
return 0;
}
I hope this helps!! ;)
uri = 'rtsp://admin:123456#192.168.0.123:554/mpeg4cif'
gst_str = ("rtspsrc location={} latency={} ! rtph264depay ! h264parse ! omxh264dec ! nvvidconv ! video/x-raw, width=(int){}, height=(int){}, format=(string)BGRx ! videoconvert ! appsink sync=false").format(uri, 200, 3072, 2048)
cap= cv2.VideoCapture(gst_str,cv2.CAP_GSTREAMER)
while(True):
_,frame = cap.read()
if frame is None:
break
cv2.imshow("",frame)
cv2.waitKey(0)
cap.release()
cv2.destroyAllWindows()

tty events in epoll queue after closing fd 0

consider the following case:
an EPOLLIN event is registered for fd 0 (stdin)
an EPOLLIN event is generated for fd 0 and implicitly queued for read within epoll
fd 0 is closed (and EPOLL_CTL_DELeted) before calling epoll_wait()
epoll_wait() is called to read the queued events
Now:
if stdin is a terminal, when epoll_wait() is called, the EPOLLIN event from step 2 will be reported
if stdin is not a terminal but a pipe, the EPOLLIN event from step 2 will not be reported
Why is the tty case different?
The test program:
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/epoll.h>
struct epoll_event event1 = {
.events = EPOLLIN,
.data = { .fd = 0}
};
int main(int argc, char **argv)
{
int epfd = epoll_create(1);
int rc;
epoll_ctl(epfd, EPOLL_CTL_ADD, 0, &event1);
sleep(2); //allow meself time to type false\n
printf("closing stdin ...\n");
close(0);
//even if i remove it explicitly the event will be found in the que after
epoll_ctl(epfd, EPOLL_CTL_DEL, 0, &event1);
printf("gathering events ...\n");
event1.events = 0;
event1.data.fd = -1;
rc = epoll_wait(epfd, &event1, 1, 0);
switch(rc) {
case 1:
printf("event received: event=%d on fd %d\n", event1.events, event1.data.fd);
break;
case 0:
printf("no events received");
break;
case -1:
printf("epoll_wait error\n");
break;
default:
printf("weird event count %d\n", rc);
}
return 0;
}
running the program with stdin from tty:
[root#tinkerbell src]# ./epolltest
false
closing stdin ...
gathering events ...
event received: event=1 on fd 0
[root#tinkerbell src]# false
[root#tinkerbell src]#
running the program with stdin from a pipe:
[root#tinkerbell src]# cat t.sh
#!/bin/bash
echo "bah";
sleep 10;
[root#tinkerbell src]# ./t.sh | ./epolltest
closing stdin ...
gathering events ...
no events received[root#tinkerbell src]#
different question but the answer can be applied here as well
Events caught by epoll comes from a file* since that is the
abstraction the kernel handles. Events really happen on the file* and
there's no way if you dup()ing 1000 times a single fd, to say that
events are for fd = 122.

Calling MPI functions from multiple threads

I want to implement the following thing using MPI and Pthreads but facing some error:
Each processor will have 2 threads. Each processor's one thread will be sending data to other processors and the other thread will be receiving data from other processors. When I am implementing it, it is giving segmentation fault error with some messages like "current bytes -40, total bytes 0, remote id 5".
Just for testing purpose, when I am using only one thread per processor and that is either sending or receiving data, then the errors do NOT occur.
I found the info "In general, there may be problems if multiple threads make MPI calls. The program may fail or behave unexpectedly. If MPI calls must be made from within a thread, they should be made only by one thread." in the following link: https://computing.llnl.gov/tutorials/pthreads/
I want to use two threads per processor where one thread will use MPI_Send function to send some data and the other thread will receive MPI_Recv function to receive data without using any locking mechanism. Does anyone has any idea how to implement this or how to use multiple threads to call MPI functions without using mutex or locking mechanism?
Here is the code:
int rank, size, msg_num;
// thread function for sending messages
void *Send_Func_For_Thread(void *arg)
{
int send, procnum, x;
send = rank;
for(x=0; x < msg_num; x++)
{
procnum = rand()%size;
if(procnum != rank)
MPI_Send(&send, 1, MPI_INT, procnum, 0, MPI_COMM_WORLD);
}
// sending special message to other processors with tag = 128 to signal the finishing of sending message
for (x = 0; x < size; x++)
{
if(x != rank)
MPI_Send(&send, 1, MPI_INT, x, 128, MPI_COMM_WORLD);
}
pthread_exit((void *)NULL);
}
// thread function for receiving messages
void *Recv_Func_For_Thread(void *arg)
{
MPI_Status status;
int recv, counter = 0;
while(counter != size - 1)
{
MPI_Recv(&recv, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
if(status.MPI_TAG == 128)
counter++;
}
pthread_exit((void *)NULL);
}
int main(int argc, char **argv)
{
void *stat;
pthread_attr_t attr;
pthread_t thread[2];
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank); // rank -> rank of this processor
MPI_Comm_size(MPI_COMM_WORLD, &size); // size -> total number of processors
srand((unsigned)time(NULL));
msg_num = atoi(argv[1]);
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
// thread 0 will be sending messages
pthread_create(&thread[0], &attr, Send_Func_For_Thread, (void *)0);
// thread 1 will be receiving messages
pthread_create(&thread[1], &attr, Recv_Func_For_Thread, (void *)1);
pthread_attr_destroy(&attr);
pthread_join(thread[0], &stat);
pthread_join(thread[1], &stat);
cout << "Finished : Proc " << rank << "\n";
MPI_Finalize();
pthread_exit((void *)NULL);
return 0;
}
Compile:
========
module load mvapich2/gcc; mpicxx -lpthread -o demo demo.cpp
Run:
====
mpiexec -comm mpich2-pmi demo 10000000
I ran this program with 3 processors and got segmentation fault.
(Since you haven't provided an example, the following is just speculation.)
You must initialize MPI using MPI_Init_thread() instead of MPI_Init(). If I understand your explanation correctly, the "required" argument must have the value MPI_THREAD_MULTIPLE. If MPI_Init_thread() then returns a lower level thread support in the "provided" argument, it means that your MPI implementation doesn't support MPI_THREAD_MULTIPLE; in that case you must do something else. See http://www.mpi-forum.org/docs/mpi-20-html/node165.htm .
It worked with only one line change with MPICH2.
Instead of using MPI_Init, use the following line:
int provided;
MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
Thanks all of you for your help and prompt replies!

Resources