I'm using an ESP8266 NodeMCU 12-E development board to capture audio from a pre-amplified electret microphone, then I upload it to the web where it will be converted to a wav file. My first thought was to cast the integer values of analogRead(A0) on the ESP8266 as String type, then concatenate them into a longer string payload which I can publish to an MQTT broker.
My MQTT client subscribers didn't seem to be getting proper sound files, because all I heard were series of rhythmic pops.
I decided to investigate if my code on the ESP8266 board was even capturing things properly. I stripped the code down to these few lines which seem to cause problems:
#include <ESP8266WiFi.h>
const char *ssid = "____"; // Change it
const char *pass = "____"; // Change it
void setup()
{
Serial.begin(115200);
Serial.println(0); //start
WiFi.mode(WIFI_STA);
WiFi.begin(ssid, pass);
}
void loop()
{
int analog = analogRead(A0);
if (analog > 255) {
analog = 255;
}
else if (analog < 0){
analog = 0;
}
Serial.print(String(analog));
Serial.print(" ");
}
Here's how I use the code above to produce a wav file to check if the sound is what I expect:
- I start up the ESP8266 development board
- I turn on the Serial Monitor and clear all previous output
- I power up my electret microphone and speak into it
- I power down my electret microphone
- I copy the contents of the Serial Monitor (which is a series of integers) into a text file called `audio.raw`
- I copy `audio.raw` to a linux machine that has ffmpeg installed
- I issue the command `ffmpeg -f u8 -ar 11111 -ac 1 -i audio.raw -y audio.wav` on the linux machine
When I listen to the audio.raw file, I hear my voice, but the speed is maybe 5-10 times faster than normal. (I also get a lot of noise and distortion, but that might be a separate issue with the input signal quality.)
I then tried changing this one line of code Serial.print(String(analog)) to Serial.print(analog). Then I repeated the steps above. But this time, my voice sounds like it is about 2 times faster than normal.
Why does changing this one line from Serial.print(String(analog)) to Serial.print(analog) make such a big difference?
Is it because the String() function is a very expensive operation that takes up a lot of time? And when the script needs more time to process each line of code, the script then has less time to capture enough analogRead(A0) data points? And if I run the same ffmpeg command using all the same flags, then ffmpeg will try to meet the -ar 11111 requirement by speeding up the audio play? Which would imply that my sampling rate is dependent on execution speed of my script? Which means I have to consider variable execution speeds across other boards of the same model due to variability in manufacturing precision, environmental temperature, etc...?
Your sampling rate is coupled to your loop implementation (as you have discovered). This will also cause jitter in your sampling rate as different code paths will take different amounts of time and interrupt service routines will also steal CPU cycles.
This jitter will be one of the causes of distortion in your output.
When I listen to the audio.raw file, I hear my voice, but the speed is maybe 5-10 times faster than normal.
The ESP8266 has a hardware UART so the code can potentially load the UART's FIFO buffer faster than it can output. This would be a source of the perceived faster sampling rate but also cause jitter or data loss when the buffer fills up. Depending on the implementation, when the buffer fills it will drop data or alternatively block (causing jitter).
Why does changing this one line from Serial.print(String(analog)) to Serial.print(analog) make such a big difference?
Is it because the String() function is a very expensive operation that takes up a lot of time? And when the script needs more time to process each line of code, the script then has less time to capture enough analogRead(A0) data points?
Yes, yes and yes.
One of the reasons for the performance difference is that String() involves allocating and managing memory on the heap to store the characters.
Serial.print(analog) uses a fixed size buffer on the stack as the code knows the maximum number of characters required to display an int.
And if I run the same ffmpeg command using all the same flags, then ffmpeg will try to meet the -ar 11111 requirement by speeding up the audio play?
Yes. ffmpeg assumes that the samples have a fixed sampling rate but this does not match the samples that are being printed out.
Which would imply that my sampling rate is dependent on execution speed of my script?
Yes!
Which means I have to consider variable execution speeds across other boards of the same model due to variability in manufacturing precision, environmental temperature, etc...?
Yes. There will be a multitude of variables that affect execution speeds.
What can you do?
Decouple the sampling of data from the code execution.
This can be done by implementing an Interrupt Service Routine. Tie the ISR to a hardware timer so it executes at a fixed sampling rate and avoiding jitter.
The ISR can write to a buffer which the code in loop() transmits over the serial connection. The ISR and serial transmission code need to manage the buffer to ensure that neither overrun the other. One means of doing this is to use alternate buffers that the ISR and transmission code use.
Since you use Serial.begin(115200) ESP8266 Microcontroller will transfer 115200 bits per second through serial port. Which is 115200 / 8 = 14400 bytes per second and that means since you use u8 (unsigned 8 bit) format for audio, each sample consists of a single byte. Just change the ffmpeg -ar parameter to 14400.
I don't any have microphones which i can connect to MCU for testing but it should work properly this way. The other -ac parameter is correct since it is mono channel audio.
Edit : Also don't use String() constructor while printing out to Serial.
While using Serial() constructor sound speeds up about 5 times because String converts your 1 byte value to 3 bytes, example ; byte : 255 -> String : "2", "5", "5" , you don't have to consider execution speed of Microcontroller, it will output 115200 bits per second as if you defined. You just need to consider it's output.
Finally delete the line
Serial.print(" ");
Also change
int analog = analogRead(A0);
to
byte analog = (byte)analogRead(A0);
since int consists of 4 bytes, you would not want to send extra 3 bytes to serial.
And after changing int to byte you can get rid this code block
if (analog > 255) {
analog = 255;
}
else if (analog < 0){
analog = 0;
}
If you connect ESP8266 to linux device through usb which has ffmpeg on it you can use
ttylog -b 115200 -d /dev/ttyUSB0 | ffmpeg -f u8 -ar 14400 -ac 1 -i - -y audio.wav
to capture audio data in realtime from ESP8266.
Related
I am trying to stream sampled values from 8 bit ADC through UART on the STM32 nucleo board.
I use ADC with DMA. Sample rate is around 6kHz to fill a buffer with 100 converted values takes me around 17 ms.
After that I want to send those values through UART with baudrate 115200. Since the ADC converted value is HALF_WORD for 100 converted values I have to send 1600 bits. That means I can send them for 14 ms without overwritting data.
This is my attempt in code:
/* Private variables*/
#define ADC_BUF_LEN 100
uint16_t adc_buf[ADC_BUF_LEN];
uint8_t flag = 0;
/* USER CODE BEGIN 2 */
HAL_ADC_Start_DMA(&hadc, (uint32_t*)adc_buf, ADC_BUF_LEN);
HAL_TIM_Base_Start(&htim2);
while (1)
{
if (flag==1)
{
HAL_UART_Transmit(&huart4,(uint8_t*)adc_buf,100,1);
flag = 0;
HAL_GPIO_TogglePin(GPIOA,GPIO_PIN_9);
}
else
{}
}
void HAL_ADC_ConvCpltCallback(ADC_HandleTypeDef* hadc)
{
HAL_GPIO_TogglePin(GPIOA,LED_GREEN_Pin);
flag = 1;
}
I have attached picture with the transmitted data to the terminal.
For input the ADC meet 1 kHz sine wave 2 V p-pk.
I can see with naked eye that my system is not working.
If i plot that data it wont be sine wave.
The project is for EMG signal processing: I need to sample the signal and then process it in Python.
Setting the Timeout parameter of HAL_UART_Transmit to 1 is not correct. You have already calculated it is going to take 14 ms! This means the function will give up and return after only a small proportion of the data has transmitted.
To do this more than once without gaps in the data you are going to need to use DMA on both the ADC and UART at the same time.
Enable the half-transfer interrupt for the ADC DMA, or poll for the half-transfer flag. When you receive it start the UART in DMA mode on the first half of the buffer. It should complete in 7ms, which is 1.5ms before the ADC DMA starts overwriting the data it contains. When you get the ADC DMA complete interrupt or flag, start the UART DMA on the second half of the buffer.
Alternatively, the DMA on most STM32 also support "double-buffer" mode which works more-or-less the same but you only use the complete interrupt and you have two separate data pointers rather than calculating the offset of half a buffer.
When I compile a simple Blink sketch on Arduino for ESP8266, it looks like 38% of the memory is used by something:
Global variables use 31,576 bytes (38%) of dynamic memory, leaving 50,344 bytes for local variables. Maximum is 81,920 bytes.
Where does this memory go? I have an application that requires a lot of memory and wanted to see if I can disable / reduce usage by some Arduino built-in libraries.
Code below:
void setup() {
pinMode(LED_BUILTIN, OUTPUT);
// Initialize the LED_BUILTIN pin as an output
}
void loop() {
digitalWrite(LED_BUILTIN, LOW);
// Turn the LED on (Note that LOW is the voltage level
// but actually the LED is on; this is because
// it is acive low on the ESP-01)
delay(1000);
// Wait for a second
digitalWrite(LED_BUILTIN, HIGH);
// Turn the LED off by making the voltage HIGH
delay(2000);
// Wait for two seconds (to demonstrate the active low LED)
}
It is used by the variables that you initialize and firmware libs. If you want to write a code that is longer, you're going need more memory. By using the basic library for ESP it already occupies some memory for the configuration and firmware settings. If you use less variables and a simple logic in your program, that's going to drastically reduce your program size. Actually, it'll take less memory even for bigger programs since all the libraries are included for a larger program is also same.
But if it's really large concentrate on your logics and reduce the stress for the ESP and give it to a mainframe computer to do the complex calculations and logics(also helps in less consumption of power and less heat dissipation).
I'm trying to demodulate a signal using GNU Radio Companion. The signal is FSK (Frequency-shift keying), with mark and space frequencies at 1200 and 2200 Hz, respectively.
The data in the signal text data generated by a device called GeoStamp Audio. The device generates audio of GPS data fed into it in real time, and it can also decode that audio. I have the decoded text version of the audio for reference.
I have set up a flow graph in GNU Radio (see below), and it runs without error, but with all the variations I've tried, I still can't get the data.
The output of the flow graph should be binary (1s and 0s) that I can later convert to normal text, right?
Is it correct to feed in a wav audio file the way I am?
How can I recover the data from the demodulated signal -- am I missing something in my flow graph?
This is a FFT plot of the wav audio file before demodulation:
This is the result of the scope sink after demodulation (maybe looks promising?):
UPDATE (August 2, 2016): I'm still working on this problem (occasionally), and unfortunately still cannot retrieve the data. The result is a promising-looking string of 1's and 0's, but nothing intelligible.
If anyone has suggestions for figuring out the settings on the Polyphase Clock Sync or Clock Recovery MM blocks, or the gain on the Quad Demod block, I would greatly appreciate it.
Here is one version of an updated flow graph based on Marcus's answer (also trying other versions with polyphase clock recovery):
However, I'm still unable to recover data that makes any sense. The result is a long string of 1's and 0's, but not the right ones. I've tried tweaking nearly all the settings in all the blocks. I thought maybe the clock recovery was off, but I've tried a wide range of values with no improvement.
So, at first sight, my approach here would look something like:
What happens here is that we take the input, shift it in frequency domain so that mark and space are at +-500 Hz, and then use quadrature demod.
"Logically", we can then just make a "sign decision". I'll share the configuration of the Xlating FIR here:
Notice that the signal is first shifted so that the center frequency (middle between 2200 and 1200 Hz) ends up at 0Hz, and then filtered by a low pass (gain = 1.0, Stopband starts at 1 kHz, Passband ends at 1 kHz - 400 Hz = 600 Hz). At this point, the actual bandwidth that's still present in the signal is much lower than the sample rate, so you might also just downsample without losses (set decimation to something higher, e.g. 16), but for the sake of analysis, we won't do that.
The time sink should now show better values. Have a look at the edges; they are probably not extremely steep. For clock sync I'd hence recommend to just go and try the polyphase clock recovery instead of Müller & Mueller; chosing about any "somewhat round" pulse shape could work.
For fun and giggles, I clicked together a quick demo demod (GRC here):
which shows:
Basically the audio cape is working. Except for one strange phenomena that mistifies me. I will try to explain.
When I play a .wav file for example speaker-test -t vaw -> if lucky I hear Front Left - Front right as one expects. But 9 out of 10, I hear white noise with the audio front left front right very faint in the background or at another time the sound is simply distorted. The same happens when I play a file with aplay or mplayer.
So when I am lucky, or timing with respect to system clock is in sync I hear the audio clearly, if out of sync it might me white noise or distorted playback.
I have google extensively and have not found any solution. So I hope one of you guys knows whats happening here. It has to be something low level.
I'm quite a newby in this matter but according to this: Troubleshooting Linux Sound all seams to work ok.
These are my system parameters and settings: root#beaglebone:~# lsb_release -a Distributor ID: Angstrom Description: Angstrom GNU/Linux v2012.12 (Core edition) Release: v2012.12 Codename: Core edition
root#beaglebone:~# cat /sys/devices/bone_capemgr*/slots 0: 54:PF---
1: 55:PF---
2: 56:P---L CBB-Relay,00A0,Logic_Supply,CBB-Relay
3: 57:PF---
4: ff:P-O-L Bone-LT-eMMC-2G,00A0,Texas Instrument,BB-BONE-EMMC-2G
5: ff:P-O-- Bone-Black-HDMI,00A0,Texas Instrument,BB-BONELT-HDMI
6: ff:P-O-L Bone-Black-HDMIN,00A0,Texas Instrument,BB-BONELT-HDMIN
7: ff:P-O-L Override Board Name,00A0,Override Manuf,BB-BONE-AUDI-02
root#beaglebone:~# speaker-test -t wav
speaker-test 1.0.25
Playback device is default Stream parameters are 48000Hz, S16_LE, 1 channels WAV file(s) Rate set to 48000Hz (requested 48000Hz) Buffer size range from 128 to 32768 Period size range from 8 to 2048 Using max buffer size 32768 Periods = 4 was set period_size = 2048 was set buffer_size = 32768
0 - Front Left
Time per period = 0.641097
0 - Front Left
root#beaglebone:~# mplayer AxelF.wav MPlayer2 2.0-379-ge3f5043 (C) 2000-2011 MPlayer Team 162 audio & 361 video codecs
Playing AxelF.wav. Detected file format: WAV format (libavformat) [wav # 0xb6082780]max_analyze_duration reached [lavf] stream 0: audio (pcm_s16le), -aid 0 Load subtitles in .
==============================================================[edit]
Forced audio codec: mad Opening audio decoder: [pcm] Uncompressed PCM audio decoder AUDIO: 44100 Hz, 2 ch, s16le, 1411.2 kbit/100.00% (ratio: 176400->176400) Selected audio codec: [pcm] afm: pcm (Uncompressed PCM)
==============================================================[edit]
AO: [alsa] 44100Hz 2ch s16le (2 bytes per sample) Video: no video Starting playback... A: 1.6 (01.6) of 15.9 (15.8) 0.3%
MPlayer interrupted by signal 2 in module: unknown
Exiting... (Quit)
I can shed some light on what is causing the artifacts that you experience. I am sorry I do not yet have a countermeasure - I am struggling with the same problem. You describe the perceptible consequences pretty accurately.
Sound data travels from the ARM System on Chip to the Audio Codec on the audio cape using the I2S bus. I2S is a serial protocol, it sends one bit at a time, starting each sample with the most significant bit, then sending all bits down to the least significant bit. After the least significant bit of one sample is sent, the most significant bit of the sample on the next audio channel is sent. To be able to interpret the bit stream, the receiving audio codec needs to know when a new sound sample starts with its most significant bit, and also, to which channel each sound sample belongs. For this purpose, the "Word Select" (WS) signal is part of I2S and changes its value to indicate the start of the sound sample and also identifies the channel, see this I2S timing diagram for a better understanding of the concept.
What you and I perceive on our not-quite-working audio capes can be fully explained by the bit stream being interpreted out-of-step by the audio codec:
When you hear loud noise and the target signal soft in the background, then one or more of the least significant bits of the preceding sample are interpreted as the most significant bits of the current sample. The more bits are shifted, the softer the target signal, until you might only perceive noise when (this is a guess!) about 4 bits are shifted.
When the shift is in the other direction, i.e. most significant bit of the current sample was interpreted as the least significant bit of the preceding sample, then what you hear will sound correct for soft parts of the signal, i.e. when the most significant bit is not actually used (this is a simplification, see below). For louder parts of the signal, e.g. drum beats, you will perceive the missing most significant bit as distortion. Of course, the distortion gets worse and starts at softer levels as more bits are shifted in this direction.
In the above paragraph, the most significant will change with the sign of the data, so the statement that it is not actually used is valid only insofar as the most significant bit will have the same value as the next most significant bit for soft sounds. See Two's Complement for an introduction how negative integers are represented in computers.
I am not sure, where the corruption occurs. It could be that the WS signal is not correctly interpreted by the Audio Codec on the cape, or the WS signal is not correctly sent by the ARM System-on-Chip, or the bit shift might happen already inside the ARM CPU, e.g. in the Alsa driver.
I want to send signal with data rate (3.84 M) using USRP1, but when I transmit the signal it tells me some thing like this in the terminal :
WARNING
Target data rate: 3840000 bps
Actual data rate: 4000000 bps
but I'm trying to implement TX working with the UMTS air interface and I don't want this error in the data rate,
anyone can help?????
Your sample rate is dependent on the master clock rate you are using with your USRP. Your USRP1 has a master clock rate of 64 MHz, and you can only sample at integer decimations of that value, by default, which is why you cannot sample at 3.84 MSps.
UHD is auto-correcting your requested sample rate to a rate that is supported by your USRP, for you. This is actually desirable behavior.
You have two options:
Replace the clock on the USRP1 that will divide down to the rate you want.
Use a rational re-sampler. GNURadio provides this block for you, if you want to use it.
I would suggest using a rational resampler before attempting a hardware mod, which may permanently destroy your USRP if you do it incorrectly.