I have a problem with the RTOS firmware on the esp8266 (I have a esp12e), after flashing the firmware, reading from uart, it keeps stuck with those lines:
ets Jan 8 2013,rst cause:2, boot mode:(3,0)
load 0x40100000, len 31584, room 16
tail 0
chksum 0x24
load 0x3ffe8000, len 944, room 8
tail 8
chksum 0x9e
load 0x3ffe83b0, len 1080, room 0
tail 8
chksum 0x60
csum 0x60
Now I will explain my HW setup:
GPIO15 -> Gnd
EN -> Vcc
GPIO0 -> Gnd (when flashing)
GPIO0 -> Vcc (normal mode)
For the toolchain I've followed this tutorial and it works well:
http://microcontrollerkits.blogspot.it/2015/12/esp8266-eclipse-development.html
Then I started doing my RTOS blink example, I post my user_main.c code here:
#include "esp_common.h"
#include "gpio.h"
void task2(void *pvParameters)
{
printf("Hello, welcome to client!\r\n");
while(1)
{
// Delay and turn on
vTaskDelay (300/portTICK_RATE_MS);
GPIO_OUTPUT_SET (5, 1);
// Delay and LED off
vTaskDelay (300/portTICK_RATE_MS);
GPIO_OUTPUT_SET (5, 0);
}
}
/******************************************************************************
* FunctionName : user_rf_cal_sector_set
* Description : SDK just reversed 4 sectors, used for rf init data and paramters.
* We add this function to force users to set rf cal sector, since
* we don't know which sector is free in user's application.
* sector map for last several sectors : ABCCC
* A : rf cal
* B : rf init data
* C : sdk parameters
* Parameters : none
* Returns : rf cal sector
*******************************************************************************/
uint32 user_rf_cal_sector_set(void)
{
flash_size_map size_map = system_get_flash_size_map();
uint32 rf_cal_sec = 0;
switch (size_map) {
case FLASH_SIZE_4M_MAP_256_256:
rf_cal_sec = 128 - 5;
break;
case FLASH_SIZE_8M_MAP_512_512:
rf_cal_sec = 256 - 5;
break;
case FLASH_SIZE_16M_MAP_512_512:
case FLASH_SIZE_16M_MAP_1024_1024:
rf_cal_sec = 512 - 5;
break;
case FLASH_SIZE_32M_MAP_512_512:
case FLASH_SIZE_32M_MAP_1024_1024:
rf_cal_sec = 1024 - 5;
break;
default:
rf_cal_sec = 0;
break;
}
return rf_cal_sec;
}
/******************************************************************************
* FunctionName : user_init
* Description : entry of user application, init user function here
* Parameters : none
* Returns : none
*******************************************************************************/
void user_init(void)
{
uart_init_new();
printf("SDK version:%s\n", system_get_sdk_version());
// Config pin as GPIO5
PIN_FUNC_SELECT (PERIPHS_IO_MUX_GPIO5_U, FUNC_GPIO5);
xTaskCreate(task2, "tsk2", 256, NULL, 2, NULL);
}
I also post the flash command, the first executed one time, the second every time I modify the code:
c:/Espressif/utils/ESP8266/esptool.exe -p COM3 write_flash -ff 40m -fm qio -fs 32m 0x3FC000 c:/Espressif/ESP8266_RTOS_SDK/bin/esp_init_data_default.bin 0x3FE000 c:/Espressif/ESP8266_RTOS_SDK/bin/blank.bin 0x7E000 c:/Espressif/ESP8266_RTOS_SDK/bin/blank.bin
c:/Espressif/utils/ESP8266/esptool.exe -p COM3 -b 256000 write_flash -ff 40m -fm qio -fs 32m 0x00000 firmware/eagle.flash.bin 0x40000 firmware/eagle.irom0text.bin
There is something wrong? I really don't understand why it doesn't work.
When I try the NON-OS example they works very well.
I had the same problem as you. This issue is caused by the incorrect address of the eagle.irom0text.bin .
So I changed the address of the eagle.irom0text.bin from 0x40000 (0x10000) to 0x20000 and it worked well for me.
[RTOS SDK version: 1.4.2(f57d61a)]
The correct flash codes in the common_rtos.mk (ESP-12E)
for flashinit
flashinit:
$(vecho) "Flash init data default and blank data."
$(ESPTOOL) -p $(ESPPORT) write_flash $(flashimageoptions) 0x3fc000 $(SDK_BASE)/bin/esp_init_data_default.bin
$(ESPTOOL) -p $(ESPPORT) write_flash $(flashimageoptions) 0x3fe000 $(SDK_BASE)/bin/blank.bin
for flash:
flash: all
#ifeq ($(app), 0)
$(ESPTOOL) -p $(ESPPORT) -b $(ESPBAUD) write_flash $(flashimageoptions) 0x00000 $(FW_BASE)/eagle.flash.bin 0x20000 $(FW_BASE)/eagle.irom0text.bin
else
ifeq ($(boot), none)
$(ESPTOOL) -p $(ESPPORT) -b $(ESPBAUD) write_flash $(flashimageoptions) 0x00000 $(FW_BASE)/eagle.flash.bin 0x20000 $(FW_BASE)/eagle.irom0text.bin
else
$(ESPTOOL) -p $(ESPPORT) -b $(ESPBAUD) write_flash $(flashimageoptions) $(addr) $(FW_BASE)/upgrade/$(BIN_NAME).bin
endif
endif
Related
I'm trying to read a TSL237 light sensor using my ESP8266 Wemos D1 Mini board. I have got code for an Arduino Uno from here (see copied below) and loaded it on my board. I first tried pin D0 on my board (GPIO 16) for the sensor data input, then pin D1 (GPIO5). In both cases I get the same useless output; I've copied one loop of this output below. Any ideas what I'm doing wrong?
CODE:
#define TSL237 2
volatile unsigned long pulse_cnt = 0;
void setup() {
attachInterrupt(0, add_pulse, RISING);
pinMode(TSL237, INPUT);
Serial.begin(9600);
}
void add_pulse(){
pulse_cnt++;
return;
}
unsigned long Frequency() {
pulse_cnt = 0;
delay(10000);// this delay controlls pulse_cnt read out. longer delay == higher number
// DO NOT change this delay; it will void calibration.
unsigned long frequency = pulse_cnt;
return (frequency);
pulse_cnt = 0;
}
void loop() {
unsigned long frequency = Frequency();
Serial.println(frequency);
delay(5000);
}
OUTPUT:
14:26:59.605 -> ets Jan 8 2013,rst cause:2, boot mode:(3,6)
14:26:59.605 ->
14:26:59.605 -> load 0x4010f000, len 3460, room 16
14:26:59.605 -> tail 4
14:26:59.605 -> chksum 0xcc
14:26:59.605 -> load 0x3fff20b8, len 40, room 4
14:26:59.605 -> tail 4
14:26:59.605 -> chksum 0xc9
14:26:59.605 -> csum 0xc9
14:26:59.605 -> v00041fe0
14:26:59.605 -> ~ld
For a project I try do use the ESP8266 RTOS SDK.
First step I install the tools and the toolchain. The hello_world example and the other gpio example works fine. I try the softAP example and get a Guru Meditation Error: Core 0 panic'ed (StoreProhibited). Exception was unhandled Error. I figured out that the line 62 : .automode = WIFI_AUTH_WPA_WPA2_PSK not works. I tried WIFI_AUTH_WEP,WIFI_AUTH_WPA_PSK,WIFI_AUTH_WPA2_PSK but only with WIFI_AUTH_OPEN the softAP works. Anyone same behavior or some tips?
Console Trace:
ets Jan 8 2013,rst cause:1, boot mode:(3,6)
load 0x40100000, len 7040, room 16
tail 0
chksum 0xe5
load 0x3ffe8408, len 24, room 8
tail 0
chksum 0x6c
load 0x3ffe8420, len 3312, room 8
tail 8
chksum 0x75
csum 0x75
I (123) boot: ESP-IDF v3.4-rc 2nd stage bootloader
I (123) boot: compile time 19:41:32
I (207) qio_mode: Enabling default flash chip QIO
I (207) boot: SPI Speed : 40MHz
I (208) boot: SPI Mode : QOUT
I (212) boot: SPI Flash Size : 2MB
I (219) boot: Partition Table:
I (224) boot: ## Label Usage Type ST Offset Length
I (236) boot: 0 nvs WiFi data 01 02 00009000 00006000
I (247) boot: 1 phy_init RF data 01 01 0000f000 00001000
I (259) boot: 2 factory factory app 00 00 00010000 000f0000
I (271) boot: End of partition table
I (277) esp_image: segment 0: paddr=0x00010010 vaddr=0x40210010 size=0x52c80 (339072) map
I (406) esp_image: segment 1: paddr=0x00062c98 vaddr=0x40262c90 size=0x0f594 ( 62868) map
I (428) esp_image: segment 2: paddr=0x00072234 vaddr=0x3ffe8000 size=0x005fc ( 1532) load
I (429) esp_image: segment 3: paddr=0x00072838 vaddr=0x40100000 size=0x00080 ( 128) load
I (439) esp_image: segment 4: paddr=0x000728c0 vaddr=0x40100080 size=0x05560 ( 21856) load
I (460) boot: Loaded app from partition at offset 0x10000
I (481) wifi softAP: ESP_WIFI_MODE_AP
I (484) system_api: Base MAC address is not set, read default base MAC address from EFUSE
I (486) system_api: Base MAC address is not set, read default base MAC address from EFUSE
phy_version: 1163.0, 665d56c, Jun 24 2020, 10:00:08, RTOS new
I (557) phy_init: phy ver: 1163_0
I (567) wifi softAP: ----------------###------------
ESP_ERROR_CHECK failed: esp_err_t 0x2 (ERROR) at 0x4021f7cc
file: "softap_example_main.c" line 73
func: wifi_init_softap
expression: esp_wifi_set_config(ESP_IF_WIFI_AP, &wifi_config)
abort() was called at PC 0x4021f7cf on core 0
Guru Meditation Error: Core 0 panic'ed (StoreProhibited). Exception was unhandled.
Core 0 register dump:
PC : 0x40221c72 PS : 0x00000030 A0 : 0x40221c70 A1 : 0x3ffeb550
A2 : 0x00000000 A3 : 0xffffffdb A4 : 0x00000001 A5 : 0x00000001
A6 : 0x00000000 A7 : 0x4026663c A8 : 0x00000020 A9 : 0x00000000
A10 : 0x00000008 A11 : 0x00000000 A12 : 0x00000000 A13 : 0x00000000
A14 : 0x00000000 A15 : 0x00000000 SAR : 0x0000001e EXCCAUSE: 0x0000001d
Backtrace: 0x40221c72:0x3ffeb550 0x4021f7d2:0x3ffeb560 0x4022182e:0x3ffeb570 0x40221894:0x3ffeb630 0x402118ef:0x3ffeb640
Example Code from GitHub: (examples/wifi/getting_started/softAP/main/softap_example_main.c)
/* WiFi softAP Example
This example code is in the Public Domain (or CC0 licensed, at your option.)
Unless required by applicable law or agreed to in writing, this
software is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
CONDITIONS OF ANY KIND, either express or implied.
*/
#include <string.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_system.h"
#include "esp_wifi.h"
#include "esp_event.h"
#include "esp_log.h"
#include "nvs_flash.h"
#include "lwip/err.h"
#include "lwip/sys.h"
/* The examples use WiFi configuration that you can set via project configuration menu.
If you'd rather not, just change the below entries to strings with
the config you want - ie #define EXAMPLE_WIFI_SSID "mywifissid"
*/
#define EXAMPLE_ESP_WIFI_SSID CONFIG_ESP_WIFI_SSID
#define EXAMPLE_ESP_WIFI_PASS CONFIG_ESP_WIFI_PASSWORD
#define EXAMPLE_MAX_STA_CONN CONFIG_ESP_MAX_STA_CONN
static const char *TAG = "wifi softAP";
static void wifi_event_handler(void* arg, esp_event_base_t event_base,
int32_t event_id, void* event_data)
{
if (event_id == WIFI_EVENT_AP_STACONNECTED) {
wifi_event_ap_staconnected_t* event = (wifi_event_ap_staconnected_t*) event_data;
ESP_LOGI(TAG, "station "MACSTR" join, AID=%d",
MAC2STR(event->mac), event->aid);
} else if (event_id == WIFI_EVENT_AP_STADISCONNECTED) {
wifi_event_ap_stadisconnected_t* event = (wifi_event_ap_stadisconnected_t*) event_data;
ESP_LOGI(TAG, "station "MACSTR" leave, AID=%d",
MAC2STR(event->mac), event->aid);
}
}
void wifi_init_softap()
{
tcpip_adapter_init();
ESP_ERROR_CHECK(esp_event_loop_create_default());
wifi_init_config_t cfg = WIFI_INIT_CONFIG_DEFAULT();
ESP_ERROR_CHECK(esp_wifi_init(&cfg));
ESP_ERROR_CHECK(esp_event_handler_register(WIFI_EVENT, ESP_EVENT_ANY_ID, &wifi_event_handler, NULL));
wifi_config_t wifi_config = {
.ap = {
.ssid = EXAMPLE_ESP_WIFI_SSID,
.ssid_len = strlen(EXAMPLE_ESP_WIFI_SSID),
.password = EXAMPLE_ESP_WIFI_PASS,
.max_connection = EXAMPLE_MAX_STA_CONN,
.authmode = WIFI_AUTH_WPA_WPA2_PSK
},
};
if (strlen(EXAMPLE_ESP_WIFI_PASS) == 0) {
wifi_config.ap.authmode = WIFI_AUTH_OPEN;
}
ESP_ERROR_CHECK(esp_wifi_set_mode(WIFI_MODE_AP));
ESP_ERROR_CHECK(esp_wifi_set_config(ESP_IF_WIFI_AP, &wifi_config));
ESP_ERROR_CHECK(esp_wifi_start());
ESP_LOGI(TAG, "wifi_init_softap finished. SSID:%s password:%s",
EXAMPLE_ESP_WIFI_SSID, EXAMPLE_ESP_WIFI_PASS);
}
void app_main()
{
ESP_ERROR_CHECK(nvs_flash_init());
ESP_LOGI(TAG, "ESP_WIFI_MODE_AP");
wifi_init_softap();
}
You have to run idf.py menuconfig and set the SSID and password values.
Password´s under 8 characters gets a Guru-Meditation ERR if the
wifi_config.ap.authmode = WIFI_AUTH_WEP or,
wifi_config.ap.authmode = WIFI_AUTH_WPA_PSK or,
wifi_config.ap.authmode = WIFI_AUTH_WPA2_PSK . Other authmodes not yet testet.
To provid this Error, make sure your pw has more then 7 characters or/and set
the If-Condition after the wifi_config to:
if (strlen(EXAMPLE_ESP_WIFI_PASS) < 8) {
wifi_config.ap.authmode = WIFI_AUTH_OPEN;
ESP_LOGI(TAG," pw-lenght under 8 charcters. Set WIFI_AUTH_OPEN");
}
Your WiFi is not protectet but your µC not crashes.
I am trying to change my pinmuxing modes by device tree overlays, but it seems that it has no effect to the specific registers. I tried the following tutorial:
http://www.valvers.com/embedded-linux/beaglebone-black/step04-gpio/
Default situation:
I am using Debian Linux, image date: 2016-01-24
kernel version: 4.1.15-ti-rt-r43
# output filtered for desired pins:
cat /sys/kernel/debug/pinctrl/44e10800.pinmux/Pins
pin 84 (44e10950.0) 00000037 pinctrl-single
pin 85 (44e10954.0) 00000037 pinctrl-single
pin 86 (44e10958.0) 00000037 pinctrl-single
pin 87 (44e1095c.0) 00000037 pinctrl-single
As you can see the all have mode 7, I need mode 0 (SPI0).
And no one else is using the pin (pretty much the same on every pin => GPIO UNCLAIMED):
pin 84 (44e10950.0): ocp:P9_22_pinmux (GPIO UNCLAIMED) function pinmux_P9_22_default_pin group pinmux_P9_22_default_pin
cat /sys/devices/platform/bone_capemgr/slots
0: PF---- -1
1: PF---- -1
2: PF---- -1
3: PF---- -1
4: P-O-L- 0 Override Board Name,00A0,Override Manuf,cape-universaln
Here is my device tree:
/dts-v1/;
/plugin/;
/ {
compatible = "ti,beaglebone", "ti,beaglebone-black";
part-number = "SPI_SLAVE_PINMUX";
version = "00A0";
fragment#0 {
target = <&am33xx_pinmux>;
__overlay__ {
spi_slave: pinmux_spi_slave {
pinctrl-single,pins = <
0x150 0x30 // SPI0_CS0 Mode 0, SPI
0x154 0x30 // SPI0_D1 Mode 0, SPI
0x158 0x10 // SPI0_D0 Mode 0, SPI
0x15c 0x10 // SPI0_SCLK Mode 0, SPI
>;
};
};
};
fragment#1 {
target = <&spi0>;
__overlay__ {
#address-cells = <1>;
#size.cells = <0>;
compatible = "bone-pinmux-helper";
pinctrl-names = "default";
pinctrl-0 = <&spi_slave>;
status = "okay";
};
};
};
Then I compile my dts file and copy the compiled file to /lib/Firmware.
dtc -O dtb -o SPI_SLAVE_PINMUX-00A0.dtbo -b O -# SPI_SLAVE_PINMUX-00A0.dts
After that I install the driver:
echo SPI_SLAVE_PINMUX > /sys/devices/platform/bone_capemgr/slots
The overlay is loaded:
root#beaglebone:~# cat /sys/devices/platform/bone_capemgr/slots
0: PF---- -1
1: PF---- -1
2: PF---- -1
3: PF---- -1
4: P-O-L- 0 Override Board Name,00A0,Override Manuf,cape-universaln
7: P-O-L- 1 Override Board Name,00A0,Override Manuf,SPI_SLAVE_PINMUX
Now the problem: if I execute the following command:
cat /sys/kernel/debug/pinctrl/44e10800.pinmux/Pins
I can see that nothing has changed.
pin 84 (44e10950.0) 00000037 pinctrl-single
pin 85 (44e10954.0) 00000037 pinctrl-single
pin 86 (44e10958.0) 00000037 pinctrl-single
pin 87 (44e1095c.0) 00000037 pinctrl-single
And I am wondering how this device tree fits to my device driver (char device driver, with access to MCSPI Memory to manipulate registers), but that's a different story... ;)
I would be very grateful, if someone could help me through ...
Many thanks in advance
I have a function (ANSI C) to retrieve time for our ntpd server.
This code work properly when I compile 32bit but doesn't work if I compile in armv64.
It works properly on iPhone 4,4S,5 (32bit), it doesn't work properly on Iphone 5s,6,6S (64bit).
I think that the problem is:
tmit=ntohl((time_t)buf[10]); //# get transmit time
time_t is now 8byte when compiled in armv64.....
Underneath you can find the source code...
Output Correct with Iphone 5 Simulator (32bit) ---------------------------
xxx.xxx.xxx.xxx PORT 123
sendto-->48
prima recv
recv-->48
tmit=-661900093
tmit=1424078403
1424078403-->Time: Mon Feb 16 10:20:03 2015
10:20:03 --> 37203
---------------------------------------------------------
Output Wrong with Iphone 6 Simulator (64bit) ---------------------------
xxx.xxx.xxx.xxx PORT 123
sendto-->48
prima recv
recv-->48
tmit=19612797
tmit=2105591293
2105591293-->Time: Tue Nov 19 00:47:09 38239
00:47:09 --> 2829
//---------------------------------------------------------------------------
long ntpdate(char *hostname) {
//ntp1.inrim.it (193.204.114.232)
//ntp2.inrim.it (193.204.114.233)
int portno=NTP_PORT; //NTP is port 123
int maxlen=1024; //check our buffers
int i=0; // misc var i
unsigned char msg[48]={010,0,0,0,0,0,0,0,0}; // the packet we send
unsigned long buf[maxlen]; // the buffer we get back
//struct in_addr ipaddr; //
struct protoent *proto; //
struct sockaddr_in server_addr;
int s; // socket
int tmit; // the time -- This is a time_t sort of
char ora[20]="";
//
//#we use the system call to open a UDP socket
//socket(SOCKET, PF_INET, SOCK_DGRAM, getprotobyname("udp")) or die "socket: $!";
proto=getprotobyname("udp");
s=socket(PF_INET, SOCK_DGRAM, proto->p_proto);
if(s==-1) {
//printf("ERROR socket=%d\n",s);
return -1;
}
//Setto il timeout per la ricezione --------------------
struct timeval tv;
tv.tv_sec = TIMEOUT_NTP; //sec
tv.tv_usec = 0;
if (setsockopt(s, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(struct timeval)) != 0)
{
//printf("Error assigning socket option");
return -1;
}
memset( &server_addr, 0, sizeof( server_addr ));
server_addr.sin_family=AF_INET;
//trasformo il nome in ip
struct hostent *hp = gethostbyname(hostname);
if (hp == NULL) {
return -1;
} else {
sprintf(hostname_ip, "%s", inet_ntoa( *( struct in_addr*)( hp -> h_addr_list[0])));
}
#ifdef LOG_NTP
printf("%s-->%s PORT %d\n",hostname,hostname_ip,portno);
#endif
server_addr.sin_addr.s_addr = inet_addr(hostname_ip);
server_addr.sin_port=htons(portno);
//printf("ipaddr (in hex): %x\n",server_addr.sin_addr);
/*
* build a message. Our message is all zeros except for a one in the
* protocol version field
* msg[] in binary is 00 001 000 00000000
* it should be a total of 48 bytes long
*/
// send the data
i=sendto(s,msg,sizeof(msg),0,(struct sockaddr *)&server_addr,sizeof(server_addr));
#ifdef LOG_NTP
printf("sendto-->%d\n",i);
#endif
if (i==-1)
return -1;
#ifdef LOG_NTP
printf("prima recv\n");
#endif
// get the data back
i=recv(s,buf,sizeof(buf),0);
#ifdef LOG_NTP
printf("recv-->%d\n",i);
#endif
if (i==-1)
{
#ifdef LOG_NTP
printf("Error: %s (%d)\n", strerror(errno), errno);
#endif
return -1;
}
//printf("recvfr: %d\n",i);
//We get 12 long words back in Network order
//for(i=0;i<12;i++)
//printf("%d\t%-8x\n",i,ntohl(buf[i]));
/*
* The high word of transmit time is the 10th word we get back
* tmit is the time in seconds not accounting for network delays which
* should be way less than a second if this is a local NTP server
*/
tmit=ntohl((time_t)buf[10]); //# get transmit time
#ifdef LOG_NTP
printf("tmit=%d\n",tmit);
#endif
/*
* Convert time to unix standard time NTP is number of seconds since 0000
* UT on 1 January 1900 unix time is seconds since 0000 UT on 1 January
* 1970 There has been a trend to add a 2 leap seconds every 3 years.
* Leap seconds are only an issue the last second of the month in June and
* December if you don't try to set the clock then it can be ignored but
* this is importaint to people who coordinate times with GPS clock sources.
*/
tmit-= 2208988800U;
#ifdef LOG_NTP
printf("tmit=%d\n",tmit);
#endif
/* use unix library function to show me the local time (it takes care
* of timezone issues for both north and south of the equator and places
* that do Summer time/ Daylight savings time.
*/
//#compare to system time
#ifdef LOG_NTP
//printf("%d-->Time: %s\n",tmit,ctime((const time_t)&tmit));
printf("%d-->Time: %s\n",tmit,ctime((const time_t)&tmit));
#endif
//i=time(0);
//printf("%d-%d=%d\n",i,tmit,i-tmit);
//printf("System time is %d seconds off\n",i-tmit);
//Prendo l'ora e la converto in HH:MM:SS --> Sec
strftime(ora, 20, "%T", localtime((const time_t)&tmit));
#ifdef LOG_NTP
printf("%s --> %ld\n",ora, C2TIME(ora));
#endif
return C2TIME(ora);
}
I Solved the Problem!!!!!!!!!
uint32_t buf[maxlen];
uint32_t tmit;
instead of:
unsigned long buf[maxlen];
int tmit;
Defining a variable of type time_t
time_t tmit_temp=tmit;
printf("%d-->Time: %s\n",tmit,ctime((const time_t)&tmit_temp));
strftime(ora, 20, "%T", localtime((const time_t)&tmit_temp));
This works properly!!! ;-)
Running the code below to write 1 GB in global memory in the NVIDIA Visual Profiler, I get:
- 100% storage efficiency
- 69.4% (128.6 GB/s) DRAM utilization
- 18.3% total replay overhead
- 18.3% global memory replay overhead.
The memory writes are supposed to be coalesced and there is no divergence in the kernel, so the question is where is the global memory replay overhead coming from? I am running this on Ubuntu 13.04, with nvidia-cuda-toolkit version 5.0.35-4ubuntu1.
#include <cuda.h>
#include <unistd.h>
#include <getopt.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <stdint.h>
#include <ctype.h>
#include <sched.h>
#include <assert.h>
static void
HandleError( cudaError_t err, const char *file, int line )
{
if (err != cudaSuccess) {
printf( "%s in %s at line %d\n", cudaGetErrorString(err), file, line);
exit( EXIT_FAILURE );
}
}
#define HANDLE_ERROR(err) (HandleError(err, __FILE__, __LINE__))
// Global memory writes
__global__ void
kernel_write(uint32_t *start, uint32_t entries)
{
uint32_t tid = threadIdx.x + blockIdx.x*blockDim.x;
while (tid < entries) {
start[tid] = tid;
tid += blockDim.x*gridDim.x;
}
}
int main(int argc, char *argv[])
{
uint32_t *gpu_mem; // Memory pointer
uint32_t n_blocks = 256; // Blocks per grid
uint32_t n_threads = 192; // Threads per block
uint32_t n_bytes = 1073741824; // Transfer size (1 GB)
float elapsedTime; // Elapsed write time
// Allocate 1 GB of memory on the device
HANDLE_ERROR( cudaMalloc((void **)&gpu_mem, n_bytes) );
// Create events
cudaEvent_t start, stop;
HANDLE_ERROR( cudaEventCreate(&start) );
HANDLE_ERROR( cudaEventCreate(&stop) );
// Write to global memory
HANDLE_ERROR( cudaEventRecord(start, 0) );
kernel_write<<<n_blocks, n_threads>>>(gpu_mem, n_bytes/4);
HANDLE_ERROR( cudaGetLastError() );
HANDLE_ERROR( cudaEventRecord(stop, 0) );
HANDLE_ERROR( cudaEventSynchronize(stop) );
HANDLE_ERROR( cudaEventElapsedTime(&elapsedTime, start, stop) );
// Report exchange time
printf("#Delay(ms) BW(GB/s)\n");
printf("%10.6f %10.6f\n", elapsedTime, 1e-6*n_bytes/elapsedTime);
// Destroy events
HANDLE_ERROR( cudaEventDestroy(start) );
HANDLE_ERROR( cudaEventDestroy(stop) );
// Free memory
HANDLE_ERROR( cudaFree(gpu_mem) );
return 0;
}
The nvprof profiler and the API profiler are giving different results:
$ nvprof --events gst_request ./app
======== NVPROF is profiling app...
======== Command: app
#Delay(ms) BW(GB/s)
13.345920 80.454690
======== Profiling result:
Invocations Avg Min Max Event Name
Device 0
Kernel: kernel_write(unsigned int*, unsigned int)
1 8388608 8388608 8388608 gst_request
$ nvprof --events global_store_transaction ./app
======== NVPROF is profiling app...
======== Command: app
#Delay(ms) BW(GB/s)
9.469216 113.392892
======== Profiling result:
Invocations Avg Min Max Event Name
Device 0
Kernel: kernel_write(unsigned int*, unsigned int)
1 8257560 8257560 8257560 global_store_transaction
I had the impression that global_store_transation could not be lower than gst_request. What is going on here? I can't ask for both events in the same command, so I had to run the two separate commands. Could this be the problem?
Strangely, the API profiler shows different results with perfect coalescing. Here is the output, I had to run twice to get the proper counters:
$ cat config.txt
inst_issued
inst_executed
gst_request
$ COMPUTE_PROFILE=1 COMPUTE_PROFILE_CSV=1 COMPUTE_PROFILE_LOG=log.csv COMPUTE_PROFILE_CONFIG=config.txt ./app
$ cat log.csv
# CUDA_PROFILE_LOG_VERSION 2.0
# CUDA_DEVICE 0 GeForce GTX 580
# CUDA_CONTEXT 1
# CUDA_PROFILE_CSV 1
# TIMESTAMPFACTOR fffff67eaca946b8
method,gputime,cputime,occupancy,inst_issued,inst_executed,gst_request,gld_request
_Z12kernel_writePjj,7771.776,7806.000,1.000,4737053,3900426,557058,0
$ cat config2.txt
global_store_transaction
$ COMPUTE_PROFILE=1 COMPUTE_PROFILE_CSV=1 COMPUTE_PROFILE_LOG=log2.csv COMPUTE_PROFILE_CONFIG=config2.txt ./app
$ cat log2.csv
# CUDA_PROFILE_LOG_VERSION 2.0
# CUDA_DEVICE 0 GeForce GTX 580
# CUDA_CONTEXT 1
# CUDA_PROFILE_CSV 1
# TIMESTAMPFACTOR fffff67eea92d0e8
method,gputime,cputime,occupancy,global_store_transaction
_Z12kernel_writePjj,7807.584,7831.000,1.000,557058
Here gst_request and global_store_transactions are exactly the same, showing perfect coalescing. Which one is correct (nvprof or the API profiler)? Why does NVIDIA Visual Profiler says that I have non-coalesced writes? There are still significant instruction replays, and I have no idea where they are coming from :(
Any ideas? I don't think this is hardware malfunctioning, since I have two boards on the same machine and both show the same behavior.