Where is the global memory replay overhead coming from?

Where is the global memory replay overhead coming from? - memory

Running the code below to write 1 GB in global memory in the NVIDIA Visual Profiler, I get:
- 100% storage efficiency
- 69.4% (128.6 GB/s) DRAM utilization
- 18.3% total replay overhead
- 18.3% global memory replay overhead.
The memory writes are supposed to be coalesced and there is no divergence in the kernel, so the question is where is the global memory replay overhead coming from? I am running this on Ubuntu 13.04, with nvidia-cuda-toolkit version 5.0.35-4ubuntu1.
#include <cuda.h>
#include <unistd.h>
#include <getopt.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <stdint.h>
#include <ctype.h>
#include <sched.h>
#include <assert.h>
static void
HandleError( cudaError_t err, const char *file, int line )
{
if (err != cudaSuccess) {
printf( "%s in %s at line %d\n", cudaGetErrorString(err), file, line);
exit( EXIT_FAILURE );
}
}
#define HANDLE_ERROR(err) (HandleError(err, __FILE__, __LINE__))
// Global memory writes
__global__ void
kernel_write(uint32_t *start, uint32_t entries)
{
uint32_t tid = threadIdx.x + blockIdx.x*blockDim.x;
while (tid < entries) {
start[tid] = tid;
tid += blockDim.x*gridDim.x;
}
}
int main(int argc, char *argv[])
{
uint32_t *gpu_mem; // Memory pointer
uint32_t n_blocks = 256; // Blocks per grid
uint32_t n_threads = 192; // Threads per block
uint32_t n_bytes = 1073741824; // Transfer size (1 GB)
float elapsedTime; // Elapsed write time
// Allocate 1 GB of memory on the device
HANDLE_ERROR( cudaMalloc((void **)&gpu_mem, n_bytes) );
// Create events
cudaEvent_t start, stop;
HANDLE_ERROR( cudaEventCreate(&start) );
HANDLE_ERROR( cudaEventCreate(&stop) );
// Write to global memory
HANDLE_ERROR( cudaEventRecord(start, 0) );
kernel_write<<<n_blocks, n_threads>>>(gpu_mem, n_bytes/4);
HANDLE_ERROR( cudaGetLastError() );
HANDLE_ERROR( cudaEventRecord(stop, 0) );
HANDLE_ERROR( cudaEventSynchronize(stop) );
HANDLE_ERROR( cudaEventElapsedTime(&elapsedTime, start, stop) );
// Report exchange time
printf("#Delay(ms) BW(GB/s)\n");
printf("%10.6f %10.6f\n", elapsedTime, 1e-6*n_bytes/elapsedTime);
// Destroy events
HANDLE_ERROR( cudaEventDestroy(start) );
HANDLE_ERROR( cudaEventDestroy(stop) );
// Free memory
HANDLE_ERROR( cudaFree(gpu_mem) );
return 0;
}

The nvprof profiler and the API profiler are giving different results:
$ nvprof --events gst_request ./app
======== NVPROF is profiling app...
======== Command: app
#Delay(ms) BW(GB/s)
13.345920 80.454690
======== Profiling result:
Invocations Avg Min Max Event Name
Device 0
Kernel: kernel_write(unsigned int*, unsigned int)
1 8388608 8388608 8388608 gst_request
$ nvprof --events global_store_transaction ./app
======== NVPROF is profiling app...
======== Command: app
#Delay(ms) BW(GB/s)
9.469216 113.392892
======== Profiling result:
Invocations Avg Min Max Event Name
Device 0
Kernel: kernel_write(unsigned int*, unsigned int)
1 8257560 8257560 8257560 global_store_transaction
I had the impression that global_store_transation could not be lower than gst_request. What is going on here? I can't ask for both events in the same command, so I had to run the two separate commands. Could this be the problem?
Strangely, the API profiler shows different results with perfect coalescing. Here is the output, I had to run twice to get the proper counters:
$ cat config.txt
inst_issued
inst_executed
gst_request
$ COMPUTE_PROFILE=1 COMPUTE_PROFILE_CSV=1 COMPUTE_PROFILE_LOG=log.csv COMPUTE_PROFILE_CONFIG=config.txt ./app
$ cat log.csv
# CUDA_PROFILE_LOG_VERSION 2.0
# CUDA_DEVICE 0 GeForce GTX 580
# CUDA_CONTEXT 1
# CUDA_PROFILE_CSV 1
# TIMESTAMPFACTOR fffff67eaca946b8
method,gputime,cputime,occupancy,inst_issued,inst_executed,gst_request,gld_request
_Z12kernel_writePjj,7771.776,7806.000,1.000,4737053,3900426,557058,0
$ cat config2.txt
global_store_transaction
$ COMPUTE_PROFILE=1 COMPUTE_PROFILE_CSV=1 COMPUTE_PROFILE_LOG=log2.csv COMPUTE_PROFILE_CONFIG=config2.txt ./app
$ cat log2.csv
# CUDA_PROFILE_LOG_VERSION 2.0
# CUDA_DEVICE 0 GeForce GTX 580
# CUDA_CONTEXT 1
# CUDA_PROFILE_CSV 1
# TIMESTAMPFACTOR fffff67eea92d0e8
method,gputime,cputime,occupancy,global_store_transaction
_Z12kernel_writePjj,7807.584,7831.000,1.000,557058
Here gst_request and global_store_transactions are exactly the same, showing perfect coalescing. Which one is correct (nvprof or the API profiler)? Why does NVIDIA Visual Profiler says that I have non-coalesced writes? There are still significant instruction replays, and I have no idea where they are coming from :(
Any ideas? I don't think this is hardware malfunctioning, since I have two boards on the same machine and both show the same behavior.

Related

What lives above the last accessible address in the stack?

I've asked people before about why the stack doesn't start at 0x7fff...c before, and was told that typically 0x800... onwards is for the kernel, and the cli args and environment variables live at the top of the user's stack which is why it starts below 0x7fff...c. But I recently tried to examine all the strings with the following program
#include <stdio.h>
#include <string.h>
int main(int argc, const char **argv) {
const char *ptr = argv[0];
while (1) {
printf("%p: %s\n", ptr, ptr);
size_t len = strlen(ptr);
ptr = (void *)ptr + len + 1;
}
}
However, after displaying all my environment variables, I see the following (I compiled the program to an executable called ./t):
0x7ffc19f84fa0: <final env variable string>
0x7ffc19f84fee: _=./t
0x7ffc19f84ff4: ./t
0x7ffc19f84ff8:
0x7ffc19f84ff9:
0x7ffc19f84ffa:
0x7ffc19f84ffb:
0x7ffc19f84ffc:
0x7ffc19f84ffd:
0x7ffc19f84ffe:
0x7ffc19f84fff:
So it appears there's one extra empty byte after the null terminator for the ./t string at bytes 0x7ffc19f84ff4..0x7ffc19f84ff7, and after that I segfault so I guess that's the base of the stack. What actually lives in the remaining "empty" space before kernel memory starts?
Edit: I also tried the following:
global _start
extern print_hex, fgets, puts, print, exit
section .text
_start:
pop rdi
mov rcx, 0
_start_loop:
mov rdi, rsp
call print_hex
pop rdi
call puts
jmp _start_loop
mov rdi, 0
call exit
where print_hex is a routine I wrote elsewhere. It seems this is all I can get
0x00007ffcd272de28
./bin/main
0x00007ffcd272de30
abc
0x00007ffcd272de38
def
0x00007ffcd272de40
ghi
0x00007ffcd272de48
make: *** [Makefile:47: run] Segmentation fault
so it seems that even in _start we don't begin at 0x7fff...

ESP8266-12F WiFi soft AP config.authmode failed

For a project I try do use the ESP8266 RTOS SDK.
First step I install the tools and the toolchain. The hello_world example and the other gpio example works fine. I try the softAP example and get a Guru Meditation Error: Core 0 panic'ed (StoreProhibited). Exception was unhandled Error. I figured out that the line 62 : .automode = WIFI_AUTH_WPA_WPA2_PSK not works. I tried WIFI_AUTH_WEP,WIFI_AUTH_WPA_PSK,WIFI_AUTH_WPA2_PSK but only with WIFI_AUTH_OPEN the softAP works. Anyone same behavior or some tips?
Console Trace:
ets Jan 8 2013,rst cause:1, boot mode:(3,6)
load 0x40100000, len 7040, room 16
tail 0
chksum 0xe5
load 0x3ffe8408, len 24, room 8
tail 0
chksum 0x6c
load 0x3ffe8420, len 3312, room 8
tail 8
chksum 0x75
csum 0x75
I (123) boot: ESP-IDF v3.4-rc 2nd stage bootloader
I (123) boot: compile time 19:41:32
I (207) qio_mode: Enabling default flash chip QIO
I (207) boot: SPI Speed : 40MHz
I (208) boot: SPI Mode : QOUT
I (212) boot: SPI Flash Size : 2MB
I (219) boot: Partition Table:
I (224) boot: ## Label Usage Type ST Offset Length
I (236) boot: 0 nvs WiFi data 01 02 00009000 00006000
I (247) boot: 1 phy_init RF data 01 01 0000f000 00001000
I (259) boot: 2 factory factory app 00 00 00010000 000f0000
I (271) boot: End of partition table
I (277) esp_image: segment 0: paddr=0x00010010 vaddr=0x40210010 size=0x52c80 (339072) map
I (406) esp_image: segment 1: paddr=0x00062c98 vaddr=0x40262c90 size=0x0f594 ( 62868) map
I (428) esp_image: segment 2: paddr=0x00072234 vaddr=0x3ffe8000 size=0x005fc ( 1532) load
I (429) esp_image: segment 3: paddr=0x00072838 vaddr=0x40100000 size=0x00080 ( 128) load
I (439) esp_image: segment 4: paddr=0x000728c0 vaddr=0x40100080 size=0x05560 ( 21856) load
I (460) boot: Loaded app from partition at offset 0x10000
I (481) wifi softAP: ESP_WIFI_MODE_AP
I (484) system_api: Base MAC address is not set, read default base MAC address from EFUSE
I (486) system_api: Base MAC address is not set, read default base MAC address from EFUSE
phy_version: 1163.0, 665d56c, Jun 24 2020, 10:00:08, RTOS new
I (557) phy_init: phy ver: 1163_0
I (567) wifi softAP: ----------------###------------
ESP_ERROR_CHECK failed: esp_err_t 0x2 (ERROR) at 0x4021f7cc
file: "softap_example_main.c" line 73
func: wifi_init_softap
expression: esp_wifi_set_config(ESP_IF_WIFI_AP, &wifi_config)
abort() was called at PC 0x4021f7cf on core 0
Guru Meditation Error: Core 0 panic'ed (StoreProhibited). Exception was unhandled.
Core 0 register dump:
PC : 0x40221c72 PS : 0x00000030 A0 : 0x40221c70 A1 : 0x3ffeb550
A2 : 0x00000000 A3 : 0xffffffdb A4 : 0x00000001 A5 : 0x00000001
A6 : 0x00000000 A7 : 0x4026663c A8 : 0x00000020 A9 : 0x00000000
A10 : 0x00000008 A11 : 0x00000000 A12 : 0x00000000 A13 : 0x00000000
A14 : 0x00000000 A15 : 0x00000000 SAR : 0x0000001e EXCCAUSE: 0x0000001d
Backtrace: 0x40221c72:0x3ffeb550 0x4021f7d2:0x3ffeb560 0x4022182e:0x3ffeb570 0x40221894:0x3ffeb630 0x402118ef:0x3ffeb640
Example Code from GitHub: (examples/wifi/getting_started/softAP/main/softap_example_main.c)
/* WiFi softAP Example
This example code is in the Public Domain (or CC0 licensed, at your option.)
Unless required by applicable law or agreed to in writing, this
software is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
CONDITIONS OF ANY KIND, either express or implied.
*/
#include <string.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_system.h"
#include "esp_wifi.h"
#include "esp_event.h"
#include "esp_log.h"
#include "nvs_flash.h"
#include "lwip/err.h"
#include "lwip/sys.h"
/* The examples use WiFi configuration that you can set via project configuration menu.
If you'd rather not, just change the below entries to strings with
the config you want - ie #define EXAMPLE_WIFI_SSID "mywifissid"
*/
#define EXAMPLE_ESP_WIFI_SSID CONFIG_ESP_WIFI_SSID
#define EXAMPLE_ESP_WIFI_PASS CONFIG_ESP_WIFI_PASSWORD
#define EXAMPLE_MAX_STA_CONN CONFIG_ESP_MAX_STA_CONN
static const char *TAG = "wifi softAP";
static void wifi_event_handler(void* arg, esp_event_base_t event_base,
int32_t event_id, void* event_data)
{
if (event_id == WIFI_EVENT_AP_STACONNECTED) {
wifi_event_ap_staconnected_t* event = (wifi_event_ap_staconnected_t*) event_data;
ESP_LOGI(TAG, "station "MACSTR" join, AID=%d",
MAC2STR(event->mac), event->aid);
} else if (event_id == WIFI_EVENT_AP_STADISCONNECTED) {
wifi_event_ap_stadisconnected_t* event = (wifi_event_ap_stadisconnected_t*) event_data;
ESP_LOGI(TAG, "station "MACSTR" leave, AID=%d",
MAC2STR(event->mac), event->aid);
}
}
void wifi_init_softap()
{
tcpip_adapter_init();
ESP_ERROR_CHECK(esp_event_loop_create_default());
wifi_init_config_t cfg = WIFI_INIT_CONFIG_DEFAULT();
ESP_ERROR_CHECK(esp_wifi_init(&cfg));
ESP_ERROR_CHECK(esp_event_handler_register(WIFI_EVENT, ESP_EVENT_ANY_ID, &wifi_event_handler, NULL));
wifi_config_t wifi_config = {
.ap = {
.ssid = EXAMPLE_ESP_WIFI_SSID,
.ssid_len = strlen(EXAMPLE_ESP_WIFI_SSID),
.password = EXAMPLE_ESP_WIFI_PASS,
.max_connection = EXAMPLE_MAX_STA_CONN,
.authmode = WIFI_AUTH_WPA_WPA2_PSK
},
};
if (strlen(EXAMPLE_ESP_WIFI_PASS) == 0) {
wifi_config.ap.authmode = WIFI_AUTH_OPEN;
}
ESP_ERROR_CHECK(esp_wifi_set_mode(WIFI_MODE_AP));
ESP_ERROR_CHECK(esp_wifi_set_config(ESP_IF_WIFI_AP, &wifi_config));
ESP_ERROR_CHECK(esp_wifi_start());
ESP_LOGI(TAG, "wifi_init_softap finished. SSID:%s password:%s",
EXAMPLE_ESP_WIFI_SSID, EXAMPLE_ESP_WIFI_PASS);
}
void app_main()
{
ESP_ERROR_CHECK(nvs_flash_init());
ESP_LOGI(TAG, "ESP_WIFI_MODE_AP");
wifi_init_softap();
}

You have to run idf.py menuconfig and set the SSID and password values.

Password´s under 8 characters gets a Guru-Meditation ERR if the
wifi_config.ap.authmode = WIFI_AUTH_WEP or,
wifi_config.ap.authmode = WIFI_AUTH_WPA_PSK or,
wifi_config.ap.authmode = WIFI_AUTH_WPA2_PSK . Other authmodes not yet testet.
To provid this Error, make sure your pw has more then 7 characters or/and set
the If-Condition after the wifi_config to:
if (strlen(EXAMPLE_ESP_WIFI_PASS) < 8) {
wifi_config.ap.authmode = WIFI_AUTH_OPEN;
ESP_LOGI(TAG," pw-lenght under 8 charcters. Set WIFI_AUTH_OPEN");
}
Your WiFi is not protectet but your µC not crashes.

cJSON Memory leak when freeing cJSON object

I am facing an issue while using the cJSON Library. I am assuming that there is a memory leak that is breaking the code after a certain time (40 mins to 1 hr).
I have copied my code below :
void my_work_handler_5(struct k_work *work)
{
char *ptr1[6];
int y=0;
static int counterdo = 0;
char *desc6 = "RSRP";
char *id6 = "dBm";
char *type6 = "RSRP";
char rsrp_str[100];
snprintf(rsrp_str, sizeof(rsrp_str), "%d", rsrp_current);
sensor5 = cJSON_CreateObject();
cJSON_AddItemToObject(sensor5, "description", cJSON_CreateString(desc6));
cJSON_AddItemToObject(sensor5, "Time", cJSON_CreateString(time_string));
cJSON_AddItemToObject(sensor5, "value", cJSON_CreateNumber(rsrp_current));
cJSON_AddItemToObject(sensor5, "unit", cJSON_CreateString(id6));
cJSON_AddItemToObject(sensor5, "type", cJSON_CreateString(type6));
/* print everything */
ptr1[counterdo] = cJSON_Print(sensor5);
printk("Counterdo value is : %d\n", counterdo);
cJSON_Delete(sensor5);
counterdo = counterdo + 1;
if (counterdo==6){
for(y=0;y<=counterdo;y++){
free(ptr1[y]);
}
counterdo = 0;
}
return;
}
I read some other threads regarding freeing up the memory and tried to do the same. Can anyone let me know if this is the right approach to free up the space allocated to the cJSON Object.
Regards,
Adeel.

Since cJSON is a portable library with no dependencies, this is better to look for a potential issue in your code on a PC: they are specialized tools available in this environment for facilitating the investigation. I am assuming here you have a Linux system, a Windows system with WSL or WSL2 installed, or a Linux virtual machine, available, and gcc, valgrind installed.
A minimal, self-contained, portable version of your code could be:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <cJSON.h>
static int rsrp_current = 1;
static char *time_string = NULL;
void
my_work_handler_5 ()
{
char *ptr1[6];
int y = 0;
static int counterdo = 0;
char *desc6 = "RSRP";
char *id6 = "dBm";
char *type6 = "RSRP";
char rsrp_str[100];
snprintf (rsrp_str, sizeof (rsrp_str), "%d", rsrp_current);
cJSON *sensor5 = cJSON_CreateObject ();
cJSON_AddItemToObject (sensor5, "description", cJSON_CreateString (desc6));
cJSON_AddItemToObject (sensor5, "Time", cJSON_CreateString (time_string));
cJSON_AddItemToObject (sensor5, "value", cJSON_CreateNumber (rsrp_current));
cJSON_AddItemToObject (sensor5, "unit", cJSON_CreateString (id6));
cJSON_AddItemToObject (sensor5, "type", cJSON_CreateString (type6));
/* print everything */
ptr1[counterdo] = cJSON_Print (sensor5);
printf ("Counterdo value is : %d\n", counterdo);
cJSON_Delete (sensor5);
counterdo = counterdo + 1;
if (counterdo == 6)
{
for (y = 0; y <= counterdo; y++)
{
free (ptr1[y]);
}
counterdo = 0;
}
return;
}
int
main (int argc, char **argv)
{
time_t curtime;
time (&curtime);
for (int n = 0; n < 3 * 6; n++)
{
my_work_handler_5 ();
}
}
Build procedure:
wget https://github.com/DaveGamble/cJSON/archive/v1.7.14.tar.gz
tar zxf v1.7.14.tar.gz
gcc -g -O0 -IcJSON-1.7.14 -o cjson cjson.c cJSON-1.7.14/cJSON.c
Running valgrind on the program:
valgrind --leak-check=full --show-leak-kinds=all --track-origins=yes --verbose ./cjson
..indicates some memory is being freed that was not previously allocated: Invalid free() / delete / delete[] / realloc():
==6747==
==6747== HEAP SUMMARY:
==6747== in use at exit: 0 bytes in 0 blocks
==6747== total heap usage: 271 allocs, 274 frees, 14,614 bytes allocated
==6747==
==6747== All heap blocks were freed -- no leaks are possible
==6747==
==6747== ERROR SUMMARY: 21 errors from 2 contexts (suppressed: 0 from 0)
==6747==
==6747== 3 errors in context 1 of 2:
==6747== Invalid free() / delete / delete[] / realloc()
==6747== at 0x483CA3F: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==6747== by 0x1094DA: my_work_handler_5 (cjson.c:42)
==6747== by 0x10955A: main (cjson.c:59)
==6747== Address 0x31 is not stack'd, malloc'd or (recently) free'd
==6747==
==6747==
==6747== 18 errors in context 2 of 2:
==6747== Conditional jump or move depends on uninitialised value(s)
==6747== at 0x483C9F5: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==6747== by 0x1094DA: my_work_handler_5 (cjson.c:42)
==6747== by 0x10955A: main (cjson.c:59)
==6747== Uninitialised value was created by a stack allocation
==6747== at 0x109312: my_work_handler_5 (cjson.c:11)
==6747==
==6747== ERROR SUMMARY: 21 errors from 2 contexts (suppressed: 0 from 0)
Replacing:
for (y = 0; y <= counterdo; y++)
{
free (ptr1[y]);
}
by:
for (y = 0; y < counterdo; y++)
{
free (ptr1[y]);
}
and executing valgrind again:
==6834==
==6834== HEAP SUMMARY:
==6834== in use at exit: 1,095 bytes in 15 blocks
==6834== total heap usage: 271 allocs, 256 frees, 14,614 bytes allocated
==6834==
==6834== Searching for pointers to 15 not-freed blocks
==6834== Checked 75,000 bytes
==6834==
==6834== 1,095 bytes in 15 blocks are definitely lost in loss record 1 of 1
==6834== at 0x483DFAF: realloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==6834== by 0x10B161: print (cJSON.c:1209)
==6834== by 0x10B25F: cJSON_Print (cJSON.c:1248)
==6834== by 0x1094AB: my_work_handler_5 (cjson.c:30)
==6834== by 0x10959C: main (cjson.c:59)
==6834==
==6834== LEAK SUMMARY:
==6834== definitely lost: 1,095 bytes in 15 blocks
==6834== indirectly lost: 0 bytes in 0 blocks
==6834== possibly lost: 0 bytes in 0 blocks
==6834== still reachable: 0 bytes in 0 blocks
==6834== suppressed: 0 bytes in 0 blocks
==6834==
==6834== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Some memory is definitively being leaked.
The reason is that char *ptr1[6] is not static, and is therefore created on the stack every time my_work_handler_5() is being called. The pointers that were returned are by cJSON_Print() are therefore lost between two calls, and free() is being called on arbitrary pointer values, since ptr1[] is not initialized as it could be:
char *ptr1[6] = { NULL, NULL, NULL, NULL, NULL, NULL };
Since you are freeing memory every 6 calls, this is causing the memory leak you were suspecting.
Replacing:
char *ptr1[6];
by:
static char *ptr1[6];
compiling, running valgrind again:
==6927==
==6927== HEAP SUMMARY:
==6927== in use at exit: 0 bytes in 0 blocks
==6927== total heap usage: 271 allocs, 271 frees, 14,614 bytes allocated
==6927==
==6927== All heap blocks were freed -- no leaks are possible
==6927==
==6927== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
The modified version of the program should now work on your bare-metal system.

How to write to shared library code segment loaded into RAM memory?

I would like you to ask why I cannot write to the loaded shared library code segment in RAM memory in Linux 2.6.28.9 on MIPS CPU platform (LG TV). I am able to read bytes but not able to write anything. In the example source code below (cross-compiled in gcc) I get ERROR 22: Invalid argument (EINVAL) when write() function is called.
// this app tries to replace 4 bytes in code segment memory of loaded shared library
#include <stdio.h> // printf
#include <stdlib.h> // off_t
#include <dlfcn.h> // dlopen, dlclose
#include <fcntl.h> // open, O_RDWR
#include <unistd.h> // lseek, close, read
#include <errno.h> // errno
#include <string.h> // strerror
#include <sys/mman.h> // mprotect, PROT_READ, PROT_WRITE, PROT_EXEC
#define BYTES_TO_REPLACE 4
int main (int argc, char *argv[])
{
int fd, pid;
unsigned *handle;
unsigned long pagesize;
off_t fun_addr, pa_fun_addr;
unsigned char buf[BYTES_TO_REPLACE];
char s[100];
// initialize
pagesize = sysconf(_SC_PAGESIZE); // memory page size from system
pid = getpid(); // PID of current process
// open shared library file, OK
handle = dlopen("/path_to_library_files/shared_library.so", RTLD_LAZY | RTLD_GLOBAL);
// get function address, OK
fun_addr = (off_t)dlsym(handle, "function_name_in_lib");
// open memory device (pseudo-file), OK
sprintf(s, "/proc/%d/mem", pid); // memory space of our process (/proc/self/mem)
//strcpy(s, "/dev/mem"); // in that case when reading from memory ==> ERROR 14: Bad address
fd = open(s, O_RDWR); // open for reading and writing
// go to starting address of the library function loaded earlier, OK
lseek(fd, fun_addr, SEEK_SET);
// read from memory, OK
read(fd, buf, BYTES_TO_REPLACE);
printf("old replaced bytes = [%02X %02X %02X %02X]\n", buf[0], buf[1], buf[2], buf[3]);
// move back, OK
lseek(fd, fun_addr, SEEK_SET);
// unprotect memory page - no error, but does not help
pa_fun_addr = (fun_addr / pagesize) * pagesize; // page-aligned address
mprotect((void *)pa_fun_addr, pagesize, PROT_READ | PROT_WRITE | PROT_EXEC);
// write new data to memory: ERROR 22: Invalid argument
buf[0] = 0x08; buf[1] = 0x00; buf[2] = 0xE0; buf[3] = 0x03; // replacing 4-byte command: jr $ra (MIPS CPU)
if (write(fd, buf, BYTES_TO_REPLACE) != BYTES_TO_REPLACE) printf("ERROR %d: %s!\n", errno, strerror(errno));
// close memory device and shared library
close(fd);
dlclose(handle);
return 0;
}

This is because process code in memory doesn't have write permission by default. To see the permissions of a process's memory, use pmap:
For example, shared libraries below have only rx permission at most:
sudo pmap 5869
5869: vim supervisor_meeting-2017-05-22.txt
000055b391f62000 2604K r-x-- vim
000055b3923ed000 56K r---- vim
000055b3923fb000 100K rw--- vim
000055b392414000 60K rw--- [ anon ]
000055b393377000 2868K rw--- [ anon ]
00007fc59ef5a000 40K r-x-- libnss_files-2.24.so
00007fc59ef64000 2048K ----- libnss_files-2.24.so
00007fc59f164000 4K r---- libnss_files-2.24.so
00007fc59f165000 4K rw--- libnss_files-2.24.so
00007fc59f166000 24K rw--- [ anon ]
<..snip..>
I understand that you're trying to change this with mprotect - but you're also not checking the return value from the mprotect() call - this is probably failing for some reason.
Also, as an aside - write() is not guaranteed to write all bytes given to it, and is quite within it's design to return with no or a partial number of bytes written - I'd suggest you also change the code to reflect this.

esp8266 RTOS blink example doesn't work

I have a problem with the RTOS firmware on the esp8266 (I have a esp12e), after flashing the firmware, reading from uart, it keeps stuck with those lines:
ets Jan 8 2013,rst cause:2, boot mode:(3,0)
load 0x40100000, len 31584, room 16
tail 0
chksum 0x24
load 0x3ffe8000, len 944, room 8
tail 8
chksum 0x9e
load 0x3ffe83b0, len 1080, room 0
tail 8
chksum 0x60
csum 0x60
Now I will explain my HW setup:
GPIO15 -> Gnd
EN -> Vcc
GPIO0 -> Gnd (when flashing)
GPIO0 -> Vcc (normal mode)
For the toolchain I've followed this tutorial and it works well:
http://microcontrollerkits.blogspot.it/2015/12/esp8266-eclipse-development.html
Then I started doing my RTOS blink example, I post my user_main.c code here:
#include "esp_common.h"
#include "gpio.h"
void task2(void *pvParameters)
{
printf("Hello, welcome to client!\r\n");
while(1)
{
// Delay and turn on
vTaskDelay (300/portTICK_RATE_MS);
GPIO_OUTPUT_SET (5, 1);
// Delay and LED off
vTaskDelay (300/portTICK_RATE_MS);
GPIO_OUTPUT_SET (5, 0);
}
}
/******************************************************************************
* FunctionName : user_rf_cal_sector_set
* Description : SDK just reversed 4 sectors, used for rf init data and paramters.
* We add this function to force users to set rf cal sector, since
* we don't know which sector is free in user's application.
* sector map for last several sectors : ABCCC
* A : rf cal
* B : rf init data
* C : sdk parameters
* Parameters : none
* Returns : rf cal sector
*******************************************************************************/
uint32 user_rf_cal_sector_set(void)
{
flash_size_map size_map = system_get_flash_size_map();
uint32 rf_cal_sec = 0;
switch (size_map) {
case FLASH_SIZE_4M_MAP_256_256:
rf_cal_sec = 128 - 5;
break;
case FLASH_SIZE_8M_MAP_512_512:
rf_cal_sec = 256 - 5;
break;
case FLASH_SIZE_16M_MAP_512_512:
case FLASH_SIZE_16M_MAP_1024_1024:
rf_cal_sec = 512 - 5;
break;
case FLASH_SIZE_32M_MAP_512_512:
case FLASH_SIZE_32M_MAP_1024_1024:
rf_cal_sec = 1024 - 5;
break;
default:
rf_cal_sec = 0;
break;
}
return rf_cal_sec;
}
/******************************************************************************
* FunctionName : user_init
* Description : entry of user application, init user function here
* Parameters : none
* Returns : none
*******************************************************************************/
void user_init(void)
{
uart_init_new();
printf("SDK version:%s\n", system_get_sdk_version());
// Config pin as GPIO5
PIN_FUNC_SELECT (PERIPHS_IO_MUX_GPIO5_U, FUNC_GPIO5);
xTaskCreate(task2, "tsk2", 256, NULL, 2, NULL);
}
I also post the flash command, the first executed one time, the second every time I modify the code:
c:/Espressif/utils/ESP8266/esptool.exe -p COM3 write_flash -ff 40m -fm qio -fs 32m 0x3FC000 c:/Espressif/ESP8266_RTOS_SDK/bin/esp_init_data_default.bin 0x3FE000 c:/Espressif/ESP8266_RTOS_SDK/bin/blank.bin 0x7E000 c:/Espressif/ESP8266_RTOS_SDK/bin/blank.bin
c:/Espressif/utils/ESP8266/esptool.exe -p COM3 -b 256000 write_flash -ff 40m -fm qio -fs 32m 0x00000 firmware/eagle.flash.bin 0x40000 firmware/eagle.irom0text.bin
There is something wrong? I really don't understand why it doesn't work.
When I try the NON-OS example they works very well.

I had the same problem as you. This issue is caused by the incorrect address of the eagle.irom0text.bin .
So I changed the address of the eagle.irom0text.bin from 0x40000 (0x10000) to 0x20000 and it worked well for me.
[RTOS SDK version: 1.4.2(f57d61a)]
The correct flash codes in the common_rtos.mk (ESP-12E)
for flashinit
flashinit:
$(vecho) "Flash init data default and blank data."
$(ESPTOOL) -p $(ESPPORT) write_flash $(flashimageoptions) 0x3fc000 $(SDK_BASE)/bin/esp_init_data_default.bin
$(ESPTOOL) -p $(ESPPORT) write_flash $(flashimageoptions) 0x3fe000 $(SDK_BASE)/bin/blank.bin
for flash:
flash: all
#ifeq ($(app), 0)
$(ESPTOOL) -p $(ESPPORT) -b $(ESPBAUD) write_flash $(flashimageoptions) 0x00000 $(FW_BASE)/eagle.flash.bin 0x20000 $(FW_BASE)/eagle.irom0text.bin
else
ifeq ($(boot), none)
$(ESPTOOL) -p $(ESPPORT) -b $(ESPBAUD) write_flash $(flashimageoptions) 0x00000 $(FW_BASE)/eagle.flash.bin 0x20000 $(FW_BASE)/eagle.irom0text.bin
else
$(ESPTOOL) -p $(ESPPORT) -b $(ESPBAUD) write_flash $(flashimageoptions) $(addr) $(FW_BASE)/upgrade/$(BIN_NAME).bin
endif
endif

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

Where is the global memory replay overhead coming from? - memory

Related

What lives above the last accessible address in the stack?

ESP8266-12F WiFi soft AP config.authmode failed

cJSON Memory leak when freeing cJSON object

How to write to shared library code segment loaded into RAM memory?

esp8266 RTOS blink example doesn't work

Categories

Resources