Basically the audio cape is working. Except for one strange phenomena that mistifies me. I will try to explain.
When I play a .wav file for example speaker-test -t vaw -> if lucky I hear Front Left - Front right as one expects. But 9 out of 10, I hear white noise with the audio front left front right very faint in the background or at another time the sound is simply distorted. The same happens when I play a file with aplay or mplayer.
So when I am lucky, or timing with respect to system clock is in sync I hear the audio clearly, if out of sync it might me white noise or distorted playback.
I have google extensively and have not found any solution. So I hope one of you guys knows whats happening here. It has to be something low level.
I'm quite a newby in this matter but according to this: Troubleshooting Linux Sound all seams to work ok.
These are my system parameters and settings: root#beaglebone:~# lsb_release -a Distributor ID: Angstrom Description: Angstrom GNU/Linux v2012.12 (Core edition) Release: v2012.12 Codename: Core edition
root#beaglebone:~# cat /sys/devices/bone_capemgr*/slots 0: 54:PF---
1: 55:PF---
2: 56:P---L CBB-Relay,00A0,Logic_Supply,CBB-Relay
3: 57:PF---
4: ff:P-O-L Bone-LT-eMMC-2G,00A0,Texas Instrument,BB-BONE-EMMC-2G
5: ff:P-O-- Bone-Black-HDMI,00A0,Texas Instrument,BB-BONELT-HDMI
6: ff:P-O-L Bone-Black-HDMIN,00A0,Texas Instrument,BB-BONELT-HDMIN
7: ff:P-O-L Override Board Name,00A0,Override Manuf,BB-BONE-AUDI-02
root#beaglebone:~# speaker-test -t wav
speaker-test 1.0.25
Playback device is default Stream parameters are 48000Hz, S16_LE, 1 channels WAV file(s) Rate set to 48000Hz (requested 48000Hz) Buffer size range from 128 to 32768 Period size range from 8 to 2048 Using max buffer size 32768 Periods = 4 was set period_size = 2048 was set buffer_size = 32768
0 - Front Left
Time per period = 0.641097
0 - Front Left
root#beaglebone:~# mplayer AxelF.wav MPlayer2 2.0-379-ge3f5043 (C) 2000-2011 MPlayer Team 162 audio & 361 video codecs
Playing AxelF.wav. Detected file format: WAV format (libavformat) [wav # 0xb6082780]max_analyze_duration reached [lavf] stream 0: audio (pcm_s16le), -aid 0 Load subtitles in .
==============================================================[edit]
Forced audio codec: mad Opening audio decoder: [pcm] Uncompressed PCM audio decoder AUDIO: 44100 Hz, 2 ch, s16le, 1411.2 kbit/100.00% (ratio: 176400->176400) Selected audio codec: [pcm] afm: pcm (Uncompressed PCM)
==============================================================[edit]
AO: [alsa] 44100Hz 2ch s16le (2 bytes per sample) Video: no video Starting playback... A: 1.6 (01.6) of 15.9 (15.8) 0.3%
MPlayer interrupted by signal 2 in module: unknown
Exiting... (Quit)
I can shed some light on what is causing the artifacts that you experience. I am sorry I do not yet have a countermeasure - I am struggling with the same problem. You describe the perceptible consequences pretty accurately.
Sound data travels from the ARM System on Chip to the Audio Codec on the audio cape using the I2S bus. I2S is a serial protocol, it sends one bit at a time, starting each sample with the most significant bit, then sending all bits down to the least significant bit. After the least significant bit of one sample is sent, the most significant bit of the sample on the next audio channel is sent. To be able to interpret the bit stream, the receiving audio codec needs to know when a new sound sample starts with its most significant bit, and also, to which channel each sound sample belongs. For this purpose, the "Word Select" (WS) signal is part of I2S and changes its value to indicate the start of the sound sample and also identifies the channel, see this I2S timing diagram for a better understanding of the concept.
What you and I perceive on our not-quite-working audio capes can be fully explained by the bit stream being interpreted out-of-step by the audio codec:
When you hear loud noise and the target signal soft in the background, then one or more of the least significant bits of the preceding sample are interpreted as the most significant bits of the current sample. The more bits are shifted, the softer the target signal, until you might only perceive noise when (this is a guess!) about 4 bits are shifted.
When the shift is in the other direction, i.e. most significant bit of the current sample was interpreted as the least significant bit of the preceding sample, then what you hear will sound correct for soft parts of the signal, i.e. when the most significant bit is not actually used (this is a simplification, see below). For louder parts of the signal, e.g. drum beats, you will perceive the missing most significant bit as distortion. Of course, the distortion gets worse and starts at softer levels as more bits are shifted in this direction.
In the above paragraph, the most significant will change with the sign of the data, so the statement that it is not actually used is valid only insofar as the most significant bit will have the same value as the next most significant bit for soft sounds. See Two's Complement for an introduction how negative integers are represented in computers.
I am not sure, where the corruption occurs. It could be that the WS signal is not correctly interpreted by the Audio Codec on the cape, or the WS signal is not correctly sent by the ARM System-on-Chip, or the bit shift might happen already inside the ARM CPU, e.g. in the Alsa driver.
Related
I'm using an ESP8266 NodeMCU 12-E development board to capture audio from a pre-amplified electret microphone, then I upload it to the web where it will be converted to a wav file. My first thought was to cast the integer values of analogRead(A0) on the ESP8266 as String type, then concatenate them into a longer string payload which I can publish to an MQTT broker.
My MQTT client subscribers didn't seem to be getting proper sound files, because all I heard were series of rhythmic pops.
I decided to investigate if my code on the ESP8266 board was even capturing things properly. I stripped the code down to these few lines which seem to cause problems:
#include <ESP8266WiFi.h>
const char *ssid = "____"; // Change it
const char *pass = "____"; // Change it
void setup()
{
Serial.begin(115200);
Serial.println(0); //start
WiFi.mode(WIFI_STA);
WiFi.begin(ssid, pass);
}
void loop()
{
int analog = analogRead(A0);
if (analog > 255) {
analog = 255;
}
else if (analog < 0){
analog = 0;
}
Serial.print(String(analog));
Serial.print(" ");
}
Here's how I use the code above to produce a wav file to check if the sound is what I expect:
- I start up the ESP8266 development board
- I turn on the Serial Monitor and clear all previous output
- I power up my electret microphone and speak into it
- I power down my electret microphone
- I copy the contents of the Serial Monitor (which is a series of integers) into a text file called `audio.raw`
- I copy `audio.raw` to a linux machine that has ffmpeg installed
- I issue the command `ffmpeg -f u8 -ar 11111 -ac 1 -i audio.raw -y audio.wav` on the linux machine
When I listen to the audio.raw file, I hear my voice, but the speed is maybe 5-10 times faster than normal. (I also get a lot of noise and distortion, but that might be a separate issue with the input signal quality.)
I then tried changing this one line of code Serial.print(String(analog)) to Serial.print(analog). Then I repeated the steps above. But this time, my voice sounds like it is about 2 times faster than normal.
Why does changing this one line from Serial.print(String(analog)) to Serial.print(analog) make such a big difference?
Is it because the String() function is a very expensive operation that takes up a lot of time? And when the script needs more time to process each line of code, the script then has less time to capture enough analogRead(A0) data points? And if I run the same ffmpeg command using all the same flags, then ffmpeg will try to meet the -ar 11111 requirement by speeding up the audio play? Which would imply that my sampling rate is dependent on execution speed of my script? Which means I have to consider variable execution speeds across other boards of the same model due to variability in manufacturing precision, environmental temperature, etc...?
Your sampling rate is coupled to your loop implementation (as you have discovered). This will also cause jitter in your sampling rate as different code paths will take different amounts of time and interrupt service routines will also steal CPU cycles.
This jitter will be one of the causes of distortion in your output.
When I listen to the audio.raw file, I hear my voice, but the speed is maybe 5-10 times faster than normal.
The ESP8266 has a hardware UART so the code can potentially load the UART's FIFO buffer faster than it can output. This would be a source of the perceived faster sampling rate but also cause jitter or data loss when the buffer fills up. Depending on the implementation, when the buffer fills it will drop data or alternatively block (causing jitter).
Why does changing this one line from Serial.print(String(analog)) to Serial.print(analog) make such a big difference?
Is it because the String() function is a very expensive operation that takes up a lot of time? And when the script needs more time to process each line of code, the script then has less time to capture enough analogRead(A0) data points?
Yes, yes and yes.
One of the reasons for the performance difference is that String() involves allocating and managing memory on the heap to store the characters.
Serial.print(analog) uses a fixed size buffer on the stack as the code knows the maximum number of characters required to display an int.
And if I run the same ffmpeg command using all the same flags, then ffmpeg will try to meet the -ar 11111 requirement by speeding up the audio play?
Yes. ffmpeg assumes that the samples have a fixed sampling rate but this does not match the samples that are being printed out.
Which would imply that my sampling rate is dependent on execution speed of my script?
Yes!
Which means I have to consider variable execution speeds across other boards of the same model due to variability in manufacturing precision, environmental temperature, etc...?
Yes. There will be a multitude of variables that affect execution speeds.
What can you do?
Decouple the sampling of data from the code execution.
This can be done by implementing an Interrupt Service Routine. Tie the ISR to a hardware timer so it executes at a fixed sampling rate and avoiding jitter.
The ISR can write to a buffer which the code in loop() transmits over the serial connection. The ISR and serial transmission code need to manage the buffer to ensure that neither overrun the other. One means of doing this is to use alternate buffers that the ISR and transmission code use.
Since you use Serial.begin(115200) ESP8266 Microcontroller will transfer 115200 bits per second through serial port. Which is 115200 / 8 = 14400 bytes per second and that means since you use u8 (unsigned 8 bit) format for audio, each sample consists of a single byte. Just change the ffmpeg -ar parameter to 14400.
I don't any have microphones which i can connect to MCU for testing but it should work properly this way. The other -ac parameter is correct since it is mono channel audio.
Edit : Also don't use String() constructor while printing out to Serial.
While using Serial() constructor sound speeds up about 5 times because String converts your 1 byte value to 3 bytes, example ; byte : 255 -> String : "2", "5", "5" , you don't have to consider execution speed of Microcontroller, it will output 115200 bits per second as if you defined. You just need to consider it's output.
Finally delete the line
Serial.print(" ");
Also change
int analog = analogRead(A0);
to
byte analog = (byte)analogRead(A0);
since int consists of 4 bytes, you would not want to send extra 3 bytes to serial.
And after changing int to byte you can get rid this code block
if (analog > 255) {
analog = 255;
}
else if (analog < 0){
analog = 0;
}
If you connect ESP8266 to linux device through usb which has ffmpeg on it you can use
ttylog -b 115200 -d /dev/ttyUSB0 | ffmpeg -f u8 -ar 14400 -ac 1 -i - -y audio.wav
to capture audio data in realtime from ESP8266.
I'm trying to demodulate a signal using GNU Radio Companion. The signal is FSK (Frequency-shift keying), with mark and space frequencies at 1200 and 2200 Hz, respectively.
The data in the signal text data generated by a device called GeoStamp Audio. The device generates audio of GPS data fed into it in real time, and it can also decode that audio. I have the decoded text version of the audio for reference.
I have set up a flow graph in GNU Radio (see below), and it runs without error, but with all the variations I've tried, I still can't get the data.
The output of the flow graph should be binary (1s and 0s) that I can later convert to normal text, right?
Is it correct to feed in a wav audio file the way I am?
How can I recover the data from the demodulated signal -- am I missing something in my flow graph?
This is a FFT plot of the wav audio file before demodulation:
This is the result of the scope sink after demodulation (maybe looks promising?):
UPDATE (August 2, 2016): I'm still working on this problem (occasionally), and unfortunately still cannot retrieve the data. The result is a promising-looking string of 1's and 0's, but nothing intelligible.
If anyone has suggestions for figuring out the settings on the Polyphase Clock Sync or Clock Recovery MM blocks, or the gain on the Quad Demod block, I would greatly appreciate it.
Here is one version of an updated flow graph based on Marcus's answer (also trying other versions with polyphase clock recovery):
However, I'm still unable to recover data that makes any sense. The result is a long string of 1's and 0's, but not the right ones. I've tried tweaking nearly all the settings in all the blocks. I thought maybe the clock recovery was off, but I've tried a wide range of values with no improvement.
So, at first sight, my approach here would look something like:
What happens here is that we take the input, shift it in frequency domain so that mark and space are at +-500 Hz, and then use quadrature demod.
"Logically", we can then just make a "sign decision". I'll share the configuration of the Xlating FIR here:
Notice that the signal is first shifted so that the center frequency (middle between 2200 and 1200 Hz) ends up at 0Hz, and then filtered by a low pass (gain = 1.0, Stopband starts at 1 kHz, Passband ends at 1 kHz - 400 Hz = 600 Hz). At this point, the actual bandwidth that's still present in the signal is much lower than the sample rate, so you might also just downsample without losses (set decimation to something higher, e.g. 16), but for the sake of analysis, we won't do that.
The time sink should now show better values. Have a look at the edges; they are probably not extremely steep. For clock sync I'd hence recommend to just go and try the polyphase clock recovery instead of Müller & Mueller; chosing about any "somewhat round" pulse shape could work.
For fun and giggles, I clicked together a quick demo demod (GRC here):
which shows:
I am recording from a cable stream using the hdhomerun command line tool, hdhomerun_config, to a .ts file. The way it works is that you run the command, it produces periods every second or so to let you know that the stream is being successfully recorded. So when I record, it produces only periods, which is desired. And the way to end it is by doing a Ctrl-C. However, whenever I try to convert this to an avi or a mov using FFMpeg, it gives a bunch of errors, some of which being
[mpeg2video # 0x7fbb4401a000] Invalid frame dimensions 0x0
[mpegts # 0x7fbb44819600] PES packet size mismatch
[ac3 # 0x7fbb44015c00] incomplete frame
It still creates the file, but it is bad quality and it doesn't work with OpenCV and other services. Has anyone else encountered this problem? Does anyone have any knowledge that may help with this situation? I tried to trim the ts file but most things require conversion before editing. Thank you!
Warnings/errors like that are normal at the very start of the stream as the recording started mid stream (ie mid PES packet) and ffmpeg expects PES headers (ie the start of the PES packet). Once ffmpeg finds the next PES header it will be happy (0-500ms later in play time).
Short version is that it is harmless. You could eliminate the warnings/errors but removing all TS-frames for each ES until you hit a payload unit start flag, but that is what ffmpeg is already doing itself.
If you see additional warnings/errors after the initial/start then there might be a reception of packet loss issue that needs investigation.
On iOS 7, how do I get the current microphone input volume in a range between 0 and 1?
I've seen several approaches like this one, but the results I get baffle me.
The return values of peakPowerForChannel: are documented to be in the range of -160 to 0 with 0 being the loudest and -160 near absolute silence.
Problem: Given a quite room and a short but loud noise, the power goes all the way up in an instant but takes very long time to drop back to quite level (way longer than the actual noise...)
What I want: Essentially I want an exact copy of the Audio Input patch of Quartz Composer with its Volume Peak output. Any tips?
To get a similar volume peak measurement, you might have to input raw audio via the iOS Audio Queue API (or the RemoteIO Audio Unit), and analyze the raw PCM waveform samples in each audio callback, looking for a magnitude maxima over your desired frame width or analysis time.
I am creating a little bootloader+kernel and till now I managed to read disk, load second sector, load GDT, open A20 and enable pmode.
I jumped to the 32-bits function that show me a character on the screen, using the video memory for textual content (0x000B0000 - 0x000B7777)
pusha
mov edi, 0xB8000
mov bl, '.'
mov dl, bl
mov dh, 63
mov word [edi], dx
popa
Now, I would like to go a little further and draw a single pixel on the screen. As I read on some website, if I want to use the graphics mode of the VGA, I have to write my pixel at location 0x000A0000. Is that right?
Now, what is the format of a single pixel? For a single character you need ASCII code and attribute, but what do you need to define a pixel (if it works the same way as the textual mode)?
Unfortunately, it's a little more than a little further.
The rules for writing to video memory depend on the graphics mode. Among traditional video modes, VGA mode 320x200 (8bpp) is the only one where video memory behaves like a normal kind of memory: you write a byte corresponding to a pixel you want to the video buffer starting from 0xA000:0000 (or 0xA0000 linear), and that's all.
For other VGA (pre-SVGA) modes, the rules are more complicated: when you write a byte to video memory, you address a group of pixels, and some VGA registers which I have long since forgotten specify which planes of those pixels are updated and how the old value of them is used. It's not just memory any more.
There are SVGA modes (starting with 800x600x8bpp); you can switch to them in a hardware-independent way using VESA Video Bios Extensions. In those modes, video memory behaves like memory again, with 1,2,3 or 4 bytes per pixel and no VGA-like 8-pixel groups which you touch with one byte access. The problem is that the real-mode video buffer is not large enough any more to address the whole screen.
VESA VBE 1.2 addressed this problem by providing functions to modify the memory window base: in any particular moment, the segment at linear 0xA0000 is addressing 64Kb region of video memory, but you can control which 64Kb of the whole framebuffer are available at this address (minimal unit of base address adjustment, a.k.a window granularity, depends on the hardware, but you can rely on the ability to map N*64Kb offset at 0xA0000). The downside is that it requires VBE BIOS call each time when you start working with different 64Kb chunk.
VESA VBE 2.0 added flat framebuffer, available at some high address in protected mode (also in unreal mode). Thus VBE BIOS call is required for entering video mode, but not for drawing pixels.
VESA VBE 3.0, which might not be portable enough yet, provides a way to call VBE functions in protected mode. (I didn't have a chance to try it, it was not there during my "OS in assembly" age).
Anyway, you have to switch to graphics mode first. There are several variants of doing that:
The easiest thing to do is to use a BIOS call before you enter protected mode. With VBE 2.0, you won't need video memory window adjustment calls.
Another way is creating a V8086-mode environment which is good enough for BIOS. The hardest part is forwarding interrupts to real-mode interrupt handlers. It's not easy, but when it's done, you'll be able to switch video modes in PM and use some other BIOS functions (for disk I/O, for example).
Yet another way is to use VESA VBE 3.0 protected mode interface. No idea on how easy or complicated it might be.
And a real Jedi way is digging out the information on your specific video card, switching modes by setting its registers. Been there, done that for some Cirrus card in the past -- getting big plain framebuffer in PM was not too complicated. It's unportable, but maybe it's just what you need if the aim is understanding the internals of your machine.
It depends on the graphics mode in use, and there are a lot a differences. BIOS VGA video mode 13h (320x200 at 8 bits/pixel) is probably the easiest to get started with (and it's the only BIOS VGA video mode with 256 colors, however you can create your own modes by writing directly to the ports of the video card): in BIOS video mode 13h the video memory mapped to screen begins at 0x0A0000 and it runs continuosly 1 byte for each pixel, and only 1 bit plane, so each coordinate's memory address is 0x0A000 + 320*y + x:
To change to BIOS video mode 13h (320 x 200 at 8 bits/pixel) while in real mode:
mov ax,0x13
int 0x10
To draw a pixel in the upper left corner (in video mode 13h) while in protected mode:
mov edi,0x0A0000
mov al,0x0F ; the color of the pixel
mov [edi],al
org 100h
bits 16
cpu 386
section.text:
START:
mov ax,12h
int 10h
mov al,02h
mov ah,0ch
pixel.asm
c:\>nasm pixel.asm -f bin -o pixel.com
int 10h