When I compile a simple Blink sketch on Arduino for ESP8266, it looks like 38% of the memory is used by something:
Global variables use 31,576 bytes (38%) of dynamic memory, leaving 50,344 bytes for local variables. Maximum is 81,920 bytes.
Where does this memory go? I have an application that requires a lot of memory and wanted to see if I can disable / reduce usage by some Arduino built-in libraries.
Code below:
void setup() {
pinMode(LED_BUILTIN, OUTPUT);
// Initialize the LED_BUILTIN pin as an output
}
void loop() {
digitalWrite(LED_BUILTIN, LOW);
// Turn the LED on (Note that LOW is the voltage level
// but actually the LED is on; this is because
// it is active low on the ESP-01)
delay(1000);
// Wait for a second
digitalWrite(LED_BUILTIN, HIGH);
// Turn the LED off by making the voltage HIGH
delay(2000);
// Wait for two seconds (to demonstrate the active low LED)
}
It is used by the global variables that you define and by the firmware libraries. The basic ESP library already occupies some memory for its configuration and firmware settings before your sketch allocates anything, which is why even Blink reports a sizable baseline. Using fewer variables and simpler logic reduces what your own code adds on top of that. The baseline itself should not grow much with your program, since a larger sketch includes the same core libraries.
But if the program is really large, concentrate on your logic and offload the complex calculations to a more powerful computer. That reduces the load on the ESP and also lowers its power consumption and heat dissipation.
I'm using an ESP8266 NodeMCU 12-E development board to capture audio from a pre-amplified electret microphone, then I upload it to the web where it will be converted to a wav file. My first thought was to cast the integer values of analogRead(A0) on the ESP8266 as String type, then concatenate them into a longer string payload which I can publish to an MQTT broker.
My MQTT client subscribers didn't seem to be getting proper sound files, because all I heard were series of rhythmic pops.
I decided to investigate if my code on the ESP8266 board was even capturing things properly. I stripped the code down to these few lines which seem to cause problems:
#include <ESP8266WiFi.h>
const char *ssid = "____"; // Change it
const char *pass = "____"; // Change it
void setup()
{
Serial.begin(115200);
Serial.println(0); //start
WiFi.mode(WIFI_STA);
WiFi.begin(ssid, pass);
}
void loop()
{
int analog = analogRead(A0);
if (analog > 255) {
analog = 255;
}
else if (analog < 0){
analog = 0;
}
Serial.print(String(analog));
Serial.print(" ");
}
Here's how I use the code above to produce a wav file to check if the sound is what I expect:
- I start up the ESP8266 development board
- I turn on the Serial Monitor and clear all previous output
- I power up my electret microphone and speak into it
- I power down my electret microphone
- I copy the contents of the Serial Monitor (which is a series of integers) into a text file called `audio.raw`
- I copy `audio.raw` to a linux machine that has ffmpeg installed
- I issue the command `ffmpeg -f u8 -ar 11111 -ac 1 -i audio.raw -y audio.wav` on the linux machine
When I listen to the audio.raw file, I hear my voice, but the speed is maybe 5-10 times faster than normal. (I also get a lot of noise and distortion, but that might be a separate issue with the input signal quality.)
I then tried changing this one line of code Serial.print(String(analog)) to Serial.print(analog). Then I repeated the steps above. But this time, my voice sounds like it is about 2 times faster than normal.
Why does changing this one line from Serial.print(String(analog)) to Serial.print(analog) make such a big difference?
Is it because the String() function is a very expensive operation that takes up a lot of time? And when the script needs more time to process each line of code, the script then has less time to capture enough analogRead(A0) data points? And if I run the same ffmpeg command using all the same flags, then ffmpeg will try to meet the -ar 11111 requirement by speeding up the audio play? Which would imply that my sampling rate is dependent on execution speed of my script? Which means I have to consider variable execution speeds across other boards of the same model due to variability in manufacturing precision, environmental temperature, etc...?
Your sampling rate is coupled to your loop implementation (as you have discovered). This will also cause jitter in your sampling rate as different code paths will take different amounts of time and interrupt service routines will also steal CPU cycles.
This jitter will be one of the causes of distortion in your output.
When I listen to the audio.raw file, I hear my voice, but the speed is maybe 5-10 times faster than normal.
The ESP8266 has a hardware UART so the code can potentially load the UART's FIFO buffer faster than it can output. This would be a source of the perceived faster sampling rate but also cause jitter or data loss when the buffer fills up. Depending on the implementation, when the buffer fills it will drop data or alternatively block (causing jitter).
Why does changing this one line from Serial.print(String(analog)) to Serial.print(analog) make such a big difference?
Is it because the String() function is a very expensive operation that takes up a lot of time? And when the script needs more time to process each line of code, the script then has less time to capture enough analogRead(A0) data points?
Yes, yes and yes.
One of the reasons for the performance difference is that String() involves allocating and managing memory on the heap to store the characters.
Serial.print(analog) uses a fixed size buffer on the stack as the code knows the maximum number of characters required to display an int.
And if I run the same ffmpeg command using all the same flags, then ffmpeg will try to meet the -ar 11111 requirement by speeding up the audio play?
Yes. ffmpeg assumes that the samples have a fixed sampling rate but this does not match the samples that are being printed out.
Which would imply that my sampling rate is dependent on execution speed of my script?
Yes!
Which means I have to consider variable execution speeds across other boards of the same model due to variability in manufacturing precision, environmental temperature, etc...?
Yes. There will be a multitude of variables that affect execution speeds.
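To put numbers on the speed-up, here is a minimal sketch of the arithmetic. The function names are made up for illustration, and the figures in the usage note are examples, not measurements from the question.

```c
/* Effective sample rate of a capture: samples collected / seconds elapsed. */
double effective_sample_rate(unsigned long samples, double seconds)
{
    return (double)samples / seconds;
}

/* Factor by which playback is sped up when audio that was captured at
 * actual_hz is played back as if it had been captured at assumed_hz. */
double playback_speedup(double assumed_hz, double actual_hz)
{
    return assumed_hz / actual_hz;
}
```

For example, if the loop only managed 20000 samples in 10 seconds, the effective rate is 2000 Hz, and telling ffmpeg the data is 10000 Hz plays it back 5 times too fast.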
What can you do?
Decouple the sampling of data from the code execution.
This can be done by implementing an Interrupt Service Routine. Tie the ISR to a hardware timer so that it executes at a fixed sampling rate and avoids jitter.
The ISR can write to a buffer which the code in loop() transmits over the serial connection. The ISR and the serial transmission code need to manage the buffer so that neither overruns the other. One way of doing this is to use two buffers that the ISR and the transmission code alternate between.
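A rough sketch of the alternating-buffer idea in plain C follows. All names and the buffer size are hypothetical, and sample_isr() is just an ordinary function here. On real hardware it would be called from the timer interrupt, and the shared variables would need the platform's usual protection against concurrent access.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BUF_LEN 64  /* samples per buffer; illustrative size */

static volatile uint8_t buffers[2][BUF_LEN];
static volatile size_t fill_pos = 0;    /* next free slot in the buffer being filled */
static volatile int fill_index = 0;     /* buffer the sampling side writes to */
static volatile bool ready[2] = { false, false };
static volatile unsigned long dropped = 0;

/* Sampling side: called once per sample at the fixed sampling rate. */
void sample_isr(uint8_t sample)
{
    if (ready[fill_index]) {    /* consumer has not drained this buffer yet */
        dropped++;              /* drop the sample rather than overwrite data */
        return;
    }
    buffers[fill_index][fill_pos++] = sample;
    if (fill_pos == BUF_LEN) {  /* buffer full: hand it over, switch buffers */
        ready[fill_index] = true;
        fill_index ^= 1;
        fill_pos = 0;
    }
}

/* Transmitting side: copies one full buffer into out if one is ready.
 * Returns the number of samples copied (0 or BUF_LEN). */
size_t take_full_buffer(uint8_t *out)
{
    int idx = fill_index;       /* if this one is full, it is the older one */
    if (!ready[idx])
        idx ^= 1;
    if (!ready[idx])
        return 0;
    for (size_t i = 0; i < BUF_LEN; i++)
        out[i] = buffers[idx][i];
    ready[idx] = false;         /* give the buffer back to the sampling side */
    return BUF_LEN;
}
```

In loop() you would call take_full_buffer() and, when it returns BUF_LEN, write those bytes to the serial port in one go. A non-zero dropped counter means the transmitting side is not keeping up with the sampling rate.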
Since you use Serial.begin(115200), the ESP8266 transfers 115200 bits per second through the serial port. With the default 8N1 framing each byte costs 10 bits on the wire (one start bit, eight data bits, one stop bit), so that is 115200 / 10 = 11520 bytes per second. Since you use the u8 (unsigned 8-bit) format for audio, each sample is a single byte. If you send one raw byte per sample and the loop keeps the UART busy, just change the ffmpeg -ar parameter to 11520.
I don't have any microphones that I can connect to an MCU for testing, but it should work properly this way. The other parameter, -ac 1, is correct since this is mono audio.
Edit: Also, don't use the String() constructor while printing to Serial.
With the String() constructor the sound speeds up further because String converts your 1-byte value into up to 3 characters, for example byte 255 -> String "2", "5", "5". As long as the UART is the bottleneck you don't have to consider the execution speed of the microcontroller: it outputs 115200 bits per second as you defined, so you only need to consider its output.
Finally, delete the line
Serial.print(" ");
and send each sample with Serial.write(analog) rather than Serial.print(analog), because Serial.print() writes the number as decimal text, not as one raw byte.
Also change
int analog = analogRead(A0);
to
byte analog = (byte)analogRead(A0);
since one sample should be exactly one byte.
After changing int to byte, neither comparison in this block can ever be true, so you can get rid of it:
if (analog > 255) {
analog = 255;
}
else if (analog < 0){
analog = 0;
}
Keep in mind that the cast keeps only the low 8 bits, so readings above 255 wrap around instead of clamping. If your signal can exceed 255, scale it with analogRead(A0) >> 2 instead.
If you connect the ESP8266 over USB to a Linux device that has ffmpeg on it, you can use
ttylog -b 115200 -d /dev/ttyUSB0 | ffmpeg -f u8 -ar 11520 -ac 1 -i - -y audio.wav
to capture audio data in real time from the ESP8266.
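As a sanity check on the byte rate, here is a small sketch that works it out from the baud rate and the frame layout. The function name is made up. The 8N1 framing in the usage note is, as far as I know, the Arduino default for Serial.begin(baud), but check your core if in doubt.

```c
/* Bytes per second a UART can carry. Each byte travels as one frame:
 * 1 start bit + data bits + optional parity bit + stop bit(s). */
unsigned long uart_bytes_per_second(unsigned long baud,
                                    unsigned data_bits,
                                    unsigned parity_bits,
                                    unsigned stop_bits)
{
    unsigned frame_bits = 1u + data_bits + parity_bits + stop_bits;
    return baud / frame_bits;
}
```

With 8N1 at 115200 baud, uart_bytes_per_second(115200, 8, 0, 1) gives 11520. That is the ceiling for one-byte samples, and the real sample rate only reaches it if the loop produces samples at least that fast.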
I am using a MAX10 FPGA and have interfaced DDR3 memory to it. I have noticed that the DDR3 memory is slow compared to on-chip memory: I wrote an LED-blinking program, and with the same delay function it runs faster from on-chip memory than from DDR3. What might be wrong, and what can be done to increase the speed? My system clock is running at 50 MHz.
P.S. There are no Instruction or Data Caches in my system.
First, going by your description, your function is not pipelined: you access memory and then blink the LED, so everything runs in sequence.
In this case, you should estimate the response time (latency) and throughput of your memory. For example, suppose you read a value from memory, perform an add, and do this 10 times. If each read is issued only after the previous add has finished, the total time is about 10 * response time + 10 * add time.
The difference is the memory response time. Internal RAM can respond in 1 cycle at 50 MHz, but DDR3 memory takes about 80 ns. That's the difference.
But you can change your module to a pipelined pattern: read/write data in parallel with your other work, and issue the DDR reads and writes ahead of time. That works like a cache in a PC and can save some time.
By the way, DDR throughput depends heavily on your access pattern. If you read or write data at sequential addresses, you will get higher throughput.
After all, an external memory's throughput and response time can never be better than those of internal memory.
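The estimate above can be written as a tiny model. This is only a sketch with made-up function names. It assumes a single read in flight at a time, and the one-cycle (20 ns) cost for the add is my assumption, not a figure from the question.

```c
/* Each iteration waits for its read, then computes: n * (latency + compute). */
double sequential_time_ns(unsigned n, double mem_latency_ns, double compute_ns)
{
    return n * (mem_latency_ns + compute_ns);
}

/* The read for iteration i+1 is issued while iteration i computes, so each
 * step costs max(latency, compute). The first read and the last compute
 * cannot be overlapped with anything. */
double pipelined_time_ns(unsigned n, double mem_latency_ns, double compute_ns)
{
    if (n == 0)
        return 0.0;
    double step = mem_latency_ns > compute_ns ? mem_latency_ns : compute_ns;
    return mem_latency_ns + (n - 1) * step + compute_ns;
}
```

With 10 iterations, 80 ns DDR3 latency and a 20 ns add, the sequential version takes 1000 ns and the overlapped one 820 ns. With a 20 ns on-chip read the sequential version takes 400 ns. Overlapping helps, but under this model latency still dominates unless several reads can be outstanding or the accesses are sequential bursts.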
Forgive my English.
It's my first time using OpenCL on ARM (CPU: Qualcomm Snapdragon MSM8930, GPU: Adreno(TM) 305).
I find OpenCL really effective, but exchanging data between the CPU and GPU takes far more time than I could have imagined.
Here is an example:
cv::Mat mat(640,480,CV_8UC3,cv::Scalar(0,0,0));
cv::ocl::oclMat mat_ocl;
//cpu->gpu
mat_ocl.upload(mat);
//gpu->cpu
mat = (cv::Mat)mat_ocl;
For a small image like this, the upload operation takes 10 ms and the download operation takes 20 ms! That is too long.
Can anyone tell me whether this is normal, or is something going wrong here?
Thank you in advance!
added:
My measuring method is:
clock_t start,end;
start=clock();
mat_ocl.upload(mat);
end = clock();
__android_log_print(ANDROID_LOG_INFO,"tag","upload time = %f s",(double)(end-start)/CLOCKS_PER_SEC);
Actually, I'm not using OpenCL directly but the ocl module in OpenCV (although it is said to be equivalent). The OpenCV documentation just tells us to convert a cv::Mat to a cv::ocl::oclMat (which uploads the data from the CPU to the GPU) in order to do GPU calculation, and I haven't found a memory-mapping method in the ocl module documentation.
Well, I found a useful note in the OpenCV docs:
In a heterogeneous device environment, there may be cost associated with data transfer. This would be the case, for example, when data needs to be moved from host memory (accessible to the CPU), to device memory (accessible to a discrete GPU). in the case of integrated graphics chips, there may be performance issues, relating to memory coherency between access from the GPU “part” of the integrated device, or the CPU “part.” For best performance, in either case, it is recommended that you do not introduce data transfers between CPU and the discrete GPU, except in the beginning and the end of the algorithmic pipeline.
So this seems to explain why data transfer between the CPU and GPU is so slow. But I still don't know how to fix the issue.
Provide exact measuring methods and results.
From experience with OpenCL development on ARM platforms (not Qualcomm, though), I can say that you shouldn't expect much from read/write operations. The memory bus is usually only about 64 bits wide, and DDR3 isn't that fast.
Use shared memory to your advantage: go for mapping/unmapping instead of read/write.
P.S. The actual operation time can be measured using cl_event profiling:
cl_ulong getTimeNanoSeconds(cl_event event)
{
cl_ulong start = 0, end = 0;
cl_int ret = clWaitForEvents(1, &event);
if (ret != CL_SUCCESS)
throw(ret);
ret = clGetEventProfilingInfo(
event,
CL_PROFILING_COMMAND_START,
sizeof(cl_ulong),
&start,
NULL);
if (ret != CL_SUCCESS)
throw(ret);
ret = clGetEventProfilingInfo(
event,
CL_PROFILING_COMMAND_END,
sizeof(cl_ulong),
&end,
NULL);
if (ret != CL_SUCCESS)
throw(ret);
return (end - start);
}
I'm working on a Linux kernel driver that makes a chunk of physical memory available to user space. I have a working version of the driver, but it's currently very slow. So, I've gone back a few steps and tried making a small, simple driver to recreate the problem.
I reserve the memory at boot time using the kernel parameter memmap=2G$1G. Then, in the driver's __init function, I ioremap some of this memory, and initialize it to a known value. I put in some code to measure the timing as well:
#define RESERVED_REGION_SIZE (1 * 1024 * 1024 * 1024) // 1GB
#define RESERVED_REGION_OFFSET (1 * 1024 * 1024 * 1024) // 1GB
static int __init memdrv_init(void)
{
struct timeval t1, t2;
printk(KERN_INFO "[memdriver] init\n");
// Remap reserved physical memory (that we grabbed at boot time)
do_gettimeofday( &t1 );
reservedBlock = ioremap( RESERVED_REGION_OFFSET, RESERVED_REGION_SIZE );
do_gettimeofday( &t2 );
printk( KERN_ERR "[memdriver] ioremap() took %d usec\n", usec_diff( &t2, &t1 ) );
// Set the memory to a known value
do_gettimeofday( &t1 );
memset( reservedBlock, 0xAB, RESERVED_REGION_SIZE );
do_gettimeofday( &t2 );
printk( KERN_ERR "[memdriver] memset() took %d usec\n", usec_diff( &t2, &t1 ) );
// Register the character device
...
return 0;
}
I load the driver, and check dmesg. It reports:
[memdriver] init
[memdriver] ioremap() took 76268 usec
[memdriver] memset() took 12622779 usec
That's 12.6 seconds for the memset. That means the memset is running at 81 MB/sec. Why on earth is it so slow?
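For reference, the throughput figure comes from a calculation like the one below. This is a sketch with a hypothetical helper name that reports MiB per second.

```c
#include <stdint.h>

/* Throughput in MiB/s for a given number of bytes moved in usec microseconds. */
double throughput_mib_per_s(uint64_t bytes, uint64_t usec)
{
    double seconds = (double)usec / 1e6;
    return ((double)bytes / (1024.0 * 1024.0)) / seconds;
}
```

Calling throughput_mib_per_s(1073741824ULL, 12622779ULL) with the 1 GB region size and the measured memset() time gives roughly 81 MiB/s.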
This is kernel 2.6.34 on Fedora 13, and it's an x86_64 system.
EDIT:
The goal behind this scheme is to take a chunk of physical memory and make it available to both a PCI device (via the memory's bus/physical address) and a user space application (via a call to mmap, supported by the driver). The PCI device will then continually fill this memory with data, and the user-space app will read it out. If ioremap is a bad way to do this (as Ben suggested below), I'm open to other suggestions that'll allow me to get any large chunk of memory that can be directly accessed by both hardware and software. I can probably make do with a smaller buffer also.
See my eventual solution below.
ioremap maps the pages as uncacheable, as you'd want for access to a memory-mapped I/O device. That would explain your poor performance.
You probably want kmalloc or vmalloc. The usual reference materials will explain the capabilities of each.
I don't think ioremap() is what you want there. You should only access the result (what you call reservedBlock) with readb, readl, writeb, memcpy_toio etc. It is not even guaranteed that the return is virtually mapped (although it apparently is on your platform). I'd guess that the region is being mapped uncached (suitable for IO registers) leading to the terrible performance.
It's been a while, but I'm updating since I did eventually find a workaround for this ioremap problem.
Since we had custom hardware writing directly to the memory, it was probably more correct to mark it uncacheable, but it was unbearably slow and wasn't working for our application. Our solution was to only read from that memory (a ring buffer) once there was enough new data to fill a whole cache line on our architecture (I think that was 256 bytes). This guaranteed we never got stale data, and it was plenty fast.
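The "only read whole cache lines" rule boils down to rounding the amount of unread data down to a multiple of the line size. Here is a minimal sketch with a hypothetical helper name. The 256-byte line size in the usage note is the value recalled above, not something verified here.

```c
#include <stddef.h>

/* Largest multiple of line_size that does not exceed unread_bytes,
 * i.e. how much of the ring buffer may be read right now.
 * line_size must be non-zero. */
size_t readable_whole_lines(size_t unread_bytes, size_t line_size)
{
    return (unread_bytes / line_size) * line_size;
}
```

With 700 unread bytes and a 256-byte line, the reader would consume 512 bytes now and leave the remaining 188 until the hardware has finished filling that line.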
I have tried reserving a huge chunk of memory with memmap.
ioremapping this chunk gave me a mapped address space that lies beyond a few terabytes.
When you ask to reserve 128 GB of memory starting at 64 GB, you see the following in /proc/vmallocinfo:
0xffffc9001f3a8000-0xffffc9201f3a9000 137438957568 0xffffffffa00831c9 phys=1000000000 ioremap
Thus the address space starts at 0xffffc9001f3a8000 (which is way too large).
Secondly, your observation is correct: even memset_io results in extremely long delays (tens of minutes) to touch all this memory.
So the time taken has mainly to do with address space conversion and non-cacheable page loading.
I see the word "BUFFER" everywhere, but I am unable to grasp what it exactly is.
Would anybody please explain what a buffer is in layman's language?
When is it used?
How is it used?
Imagine that you're eating candy out of a bowl. You take one piece regularly. To prevent the bowl from running out, someone might refill the bowl before it gets empty, so that when you want to take another piece, there's candy in the bowl.
The bowl acts as a buffer between you and the candy bag.
If you're watching a movie online, the web service will continually download the next 5 minutes or so into a buffer, that way your computer doesn't have to download the movie as you're watching it (which would cause hanging).
The term "buffer" is a very generic term, and is not specific to IT or CS. It's a place to store something temporarily, in order to mitigate differences between input speed and output speed. While the producer is being faster than the consumer, the producer can continue to store output in the buffer. When the consumer gets around to it, it can read from the buffer. The buffer is there in the middle to bridge the gap.
If you average out the definitions at http://en.wiktionary.org/wiki/buffer, I think you'll get the idea.
For proof that we really did "have to walk 10 miles through the snow every day to go to school", see TOPS-10 Monitor Calls Manual Volume 1, section 11.9, "Using Buffered I/O", at bookmark 11-24. Don't read if you're subject to nightmares.
A buffer is simply a chunk of memory used to hold data. In the most general sense, it's usually a single blob of memory that's loaded in one operation and then emptied in one or more, as in Perchik's "candy bowl" example. In a C program, for example, you might have:
#include <unistd.h>
#define BUFSIZE 1024
char buffer[BUFSIZE];
ssize_t len = 0; // signed, because read(2) returns -1 on error
// ... later
while ((len = read(STDIN_FILENO, buffer, BUFSIZE)) > 0)
    write(STDOUT_FILENO, buffer, len);
... which is a minimal version of cp(1). Here, the buffer array is used to store the data read by read(2) until it's written; then the buffer is re-used.
There are more complicated buffer schemes used, for example a circular buffer, where some finite number of buffers are used, one after the next; once the buffers are all full, the index "wraps around" so that the first one is re-used.
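A minimal circular buffer might look like the sketch below. The type and function names are made up for illustration, and the slots here are single bytes rather than whole buffers, but the wrap-around of the index is the same idea.

```c
#include <stdbool.h>
#include <stddef.h>

#define RING_CAPACITY 8  /* illustrative size */

typedef struct {
    unsigned char data[RING_CAPACITY];
    size_t head;   /* index of the next slot to write */
    size_t tail;   /* index of the next slot to read */
    size_t count;  /* number of slots currently in use */
} ring_t;

/* Store one byte; returns false (and stores nothing) if the ring is full. */
bool ring_put(ring_t *r, unsigned char c)
{
    if (r->count == RING_CAPACITY)
        return false;
    r->data[r->head] = c;
    r->head = (r->head + 1) % RING_CAPACITY;  /* wrap around at the end */
    r->count++;
    return true;
}

/* Fetch the oldest byte; returns false if the ring is empty. */
bool ring_get(ring_t *r, unsigned char *out)
{
    if (r->count == 0)
        return false;
    *out = r->data[r->tail];
    r->tail = (r->tail + 1) % RING_CAPACITY;
    r->count--;
    return true;
}
```

A zero-initialized ring_t (for example `ring_t r = {0};`) starts out empty. The producer calls ring_put() and the consumer calls ring_get(), and the oldest slot is reused once it has been read.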
Buffer means 'temporary storage'. Buffers are important in computing because interconnected devices and systems are seldom 'in sync' with one another, so when information is sent from one system to another, it has somewhere to wait until the recipient system is ready.
Really it would depend on the context in each case, as there is no one definition, but speaking very generally a buffer is a place to temporarily hold something. The best real-world analogy I can think of would be a waiting area. One simple example in computing is when buffer refers to a part of RAM used for temporary storage of data.
That a buffer is "a place to store something temporarily, in order to mitigate differences between input speed and output speed" is accurate. Consider this an even more "layman's" way of understanding it.
"To buffer", the verb, has made its way into everyday vocabulary. For example, when an Internet connection is slow and a Netflix video is interrupted, we even hear our parents say stuff like, "Give it time to buffer."
What they are saying is, "Hit pause; allow time for more of the video to download into memory; and then we can watch it without it stopping or skipping."
Given the producer / consumer analogy, Netflix is producing the video. The viewer is consuming it (watching it). A space on your computer where extra downloaded video data is temporarily stored is the buffer.
A video progress bar is probably the best visual example of this:
That video is 5:05. Its total play time is represented by the white portion of the bar (which would be solid white if you had not started watching it yet.)
As represented by the purple, I've actually consumed (watched) 10 seconds of the video.
The grey portion of the bar is the buffer. This is the video data that is currently downloaded into memory, the buffer, and is available to you locally. In other words, even if your Internet connection were to be interrupted, you could still watch the area you have buffered.
A buffer is a temporary placeholder (a variable in many programming languages) in memory (RAM or disk) into which data can be dumped and then processed.
The term "buffer" is a very generic term, and is not specific to IT or CS. It's a place to store something temporarily, in order to mitigate differences between input speed and output speed. While the producer is being faster than the consumer, the producer can continue to store output in the buffer. When the consumer speeds up, it can read from the buffer. The buffer is there in the middle to bridge the gap.
A buffer is a data area shared by hardware devices or program processes that operate at different speeds or with different sets of priorities. The buffer allows each device or process to operate without being held up by the other. In order for a buffer to be effective, the size of the buffer and the algorithms for moving data into and out of the buffer need to be chosen with care.
A buffer is a "midpoint holding place", but it exists not so much to accelerate the speed of an activity as to support the coordination of separate activities.
This term is used both in programming and in hardware. In programming, buffering sometimes implies the need to screen data from its final intended place so that it can be edited or otherwise processed before being moved to a regular file or database.
Buffering has many advantages: it allows things to happen in parallel, improves I/O performance, and so on.
It also has downsides if not used correctly, such as buffer overflow and buffer underflow.
C example of a character buffer:
char *buffer1 = calloc(5, sizeof(char));
char *buffer2 = calloc(15, sizeof(char));
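Since buffer overflow was mentioned, here is one way to copy text into a small buffer without writing past its end. This is only a sketch and copy_bounded is a made-up helper, not a standard library function.

```c
#include <stddef.h>
#include <string.h>

/* Copy src into dst, which has room for dst_size chars. The text is
 * truncated if it does not fit and the result is always NUL-terminated.
 * Returns the number of characters copied, not counting the NUL. */
size_t copy_bounded(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return 0;
    size_t n = strlen(src);
    if (n >= dst_size)
        n = dst_size - 1;   /* leave room for the terminator */
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

With a 5-byte buffer like buffer1 above, copy_bounded(buffer1, 5, "hello world") stores "hell" plus the terminator instead of overflowing.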