Our hardware saves 1024 frames of 16-bit stereo. I wrote an ALSA driver for it. PCM data is captured using arecord and saved to a WAV file:
arecord -Dhw:0,0 -t wav -f S16_LE -r 44100 -c 2 -d 2 -v rec.wav
For debugging, the hardware generates a ramp of 2048 values in bits 11:0, with the ISR counter in bits 15:12. The WAV file is viewed in Audacity.
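In other words, each 16-bit debug sample packs the two fields roughly as below (an illustrative helper; make_debug_sample, isr_count and ramp are made-up names, not part of the driver):

#include <stdint.h>

/* Illustration only: how one 16-bit debug sample packs the two fields.
 * The ISR counter occupies bits 15:12 and the ramp value bits 11:0. */
static uint16_t make_debug_sample(uint16_t isr_count, uint16_t ramp)
{
    return (uint16_t)(((isr_count & 0x000F) << 12) | (ramp & 0x0FFF));
}

For example, make_debug_sample(1, 0x800) gives 0x1800, the expected start of the 2nd period.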
Expected PCM for 8 periods:
Ramp start, end
0h, 7ffh
1800h, 1fffh
2000h, 27ffh
3800h, 3fffh
4000h, 47ffh
5800h, 5fffh
6000h, 67ffh
7800h, 7fffh
As seen in Audacity:
0h, 7ffh
0h, 7ffh
2000h, 27ffh
2000h, 27ffh
4000h, 47ffh
4000h, 47ffh
6000h, 67ffh
6000h, 67ffh
PCM ramp in Audacity (screenshot)
The 1st period of PCM appears twice. The big jump in the ramp indicates that the 2nd period is missing. The 3rd period also appears twice and the 4th is missing; likewise the 5th appears twice and the 6th is missing.
In struct snd_pcm_hardware, I have:
.channels_min = 2,
.channels_max = 2,
.buffer_bytes_max = (8*4096),
.period_bytes_min = 4096,
.period_bytes_max = (2*4096),
.periods_min = 1,
.periods_max = 2,
Since there's no DMA, I implement the struct snd_pcm_ops .copy_user callback and copy the captured data into the user buffer passed to it.
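For reference, the callback is shaped roughly like this (a simplified sketch rather than my actual code; it assumes the ISR has already staged the samples in the intermediate buffer at runtime->dma_area, and that pos and bytes are byte offsets/counts, as in kernels that provide .copy_user):

#include <sound/pcm.h>
#include <linux/uaccess.h>

/* Simplified sketch of the capture-direction .copy_user callback. */
static int my_pcm_copy_user(struct snd_pcm_substream *substream, int channel,
                            unsigned long pos, void __user *buf,
                            unsigned long bytes)
{
    struct snd_pcm_runtime *runtime = substream->runtime;

    /* Hand the already-captured samples to user space (arecord). */
    if (copy_to_user(buf, runtime->dma_area + pos, bytes))
        return -EFAULT;
    return 0;
}

static const struct snd_pcm_ops my_capture_ops = {
    /* .open, .hw_params, .prepare, .trigger, .pointer, ... */
    .copy_user = my_pcm_copy_user,
};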
I tried the settings below, but arecord reported an error:
.periods_min = 1,
.periods_max = 1,
arecord: set_params:1411: Can't use period equal to buffer size (1024 == 1024)
I tried increasing periods_max to 4, and it resulted in 4 duplicates:
.period_bytes_max = (4*4096),
.periods_min = 1,
.periods_max = 4,
As seen in Audacity:
0h, 7ffh
0h, 7ffh
0h, 7ffh
0h, 7ffh
4000h, 47ffh
4000h, 47ffh
4000h, 47ffh
4000h, 47ffh
In the ISR handler, I verified that the ramp values are as expected. In .copy_user, I verified that the pos argument differs from the one in the previous call and that the ramp values are as expected. Why is the data being duplicated, and what can I do to fix it? Thanks in advance.
Added: Info printed by arecord
Recording WAVE 'rec.wav' : Signed 16 bit Little Endian, Rate 44100 Hz, Stereo
Bufsz 8192, srate 44100, nch 2
Opening PCM
PCM Prepare
Format 2
Rate 44100
Channels 2
Buffer size 2048
Period size 1024
stream : CAPTURE
access : RW_INTERLEAVED
format : S16_LE
PCMptr 0
subformat : STD
channels : 2
rate : 44100
exact rate : 44100 (44100/1)
msbits : 16
buffer_size : 2048
period_size : 1024
period_time : 23219
tstamp_mode : NONE
tstamp_type : MONOTONIC
period_step : 1
avail_min : 1024
period_event : 0
start_threshold : 1
stop_threshold : 2048
silence_threshold: 0
silence_size : 0
boundary : 4611686018427387904
appl_ptr : 0
hw_ptr : 0
I am using the following code to read each frame from an HLS stream (I invoke init_camera below with an HLS URL).
I simply call read_camera_frame in a while loop, as fast as I can, since I believe the VideoCapture read should block and return frames at a rate that corresponds to the FPS of the video stream.
def init_camera(camera_id):
    return cv2.VideoCapture(camera_id)

self.camera_cap = init_camera(self.image_info.get_camera_id())

def read_camera_frame(self):
    syst = time.time_ns()
    time_since_last_pub = (syst - self.last_pub_time)/1000000000
    time_since_last_stat = (syst - self.last_stat_time)/1000000000
    if time_since_last_stat > self.stat_report_interval:
        fps = self.frames_collected_since_last_report/self.stat_report_interval
        self.logger.info(f"Total Frames: {self.frame_cnt}"
                         f"Total Discards: {self.frame_discard_cnt}"
                         f" Frames Since Last Report: {self.frames_collected_since_last_report} "
                         f" FPS: {fps} "
                         )
        self.frames_collected_since_last_report = 0
        self.last_stat_time = syst
    self.logger.info(f"CameraReader: read a frame {self.frame_cnt}")
    ret, img = self.camera_cap.read()
    if ret:
        self.frame_cnt += 1
        self.frames_collected_since_last_report += 1
        ts = self.min_frame_pub_interval - time_since_last_pub
        if ts > 0:
            self.frame_discard_cnt += 1
            return []
        self.last_pub_time = syst
        return [(img, [copy.deepcopy(self.image_info)])]
    raise CameraReaderException("Failed To Read Frame")
The FPS for the video I am playing is just about 30.
fps = self.camera_cap.get(cv2.CAP_PROP_FPS)
self.logger.info(f"Source FPS {fps}")
Yet I see frames read at around 130 per second.
Why is VideoCapture.read() returning frames roughly 4 times faster than I expect?
I thought VideoCapture.read() would return frames at the FPS of the video.
iOS 10+
iPhone: 5s & 6
Xcode: 9+
I'm recording audio using the A-law codec at an 8 kHz sample rate with an 8-bit sample size. I create an AudioQueue like this:
// create the queue
XThrowIfError(AudioQueueNewInput(
                  &mRecordFormat,
                  MyInputBufferHandler,
                  this /* userData */,
                  NULL /* run loop */,
                  kCFRunLoopCommonModes /* run loop mode */,
                  0 /* flags */,
                  &mQueue), "AudioQueueNewInput failed");
MyInputBufferHandler is the callback that is called every time a buffer (160 bytes, i.e. 20 ms of audio) is filled, so I expect it to be called every 20 ms. When testing, however, MyInputBufferHandler is called in a burst of 6 calls roughly every 128 ms.
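For context, 160 bytes comes from 0.020 s x 8000 samples/s x 1 byte per sample, and the buffers are allocated and enqueued roughly like this (a sketch rather than my exact code; kNumberBuffers is an assumed constant):

const UInt32 kBufferByteSize = 160;   // 0.020 s * 8000 Hz * 1 byte per sample
for (int i = 0; i < kNumberBuffers; ++i) {
    AudioQueueBufferRef buffer;
    XThrowIfError(AudioQueueAllocateBuffer(mQueue, kBufferByteSize, &buffer),
                  "AudioQueueAllocateBuffer failed");
    XThrowIfError(AudioQueueEnqueueBuffer(mQueue, buffer, 0, NULL),
                  "AudioQueueEnqueueBuffer failed");
}
XThrowIfError(AudioQueueStart(mQueue, NULL), "AudioQueueStart failed");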
My recording configuration is:
mRecordFormat.mSampleRate = 8000.0; // 8 KHz
mRecordFormat.mChannelsPerFrame = 1;
mRecordFormat.mBytesPerFrame = 1;
mRecordFormat.mBitsPerChannel = 8;
mRecordFormat.mBytesPerPacket = 1;
mRecordFormat.mFramesPerPacket = 1;
Can someone please help me out? Why is MyInputBufferHandler called every 128 ms instead of every 20 ms? A sample rate of 8 kHz with 160-byte buffers should mean a callback every 20 ms, not every 128 ms!
It seems that AudioQueue is built on top of AudioUnit and cannot control the internal buffer size, no matter what buffer size you set at the AudioQueue level. By default, the internal buffer is at least 1024 bytes, so if you want a callback after only 160 bytes of recorded data, you won't get one.
So for those who run into the same problem, you need to use AudioUnit.
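For example, a capture path built on the RemoteIO unit looks roughly like the sketch below (error checking omitted; ioUnit and MyInputCallback are placeholder names, and RemoteIO generally wants linear PCM on its input bus, so you may have to capture 16-bit PCM and convert to A-law yourself):

// Rough sketch of capturing through a RemoteIO AudioUnit instead of an
// AudioQueue. MyInputCallback is your own function.
AudioComponentDescription desc = {0};
desc.componentType         = kAudioUnitType_Output;
desc.componentSubType      = kAudioUnitSubType_RemoteIO;
desc.componentManufacturer = kAudioUnitManufacturer_Apple;

AudioComponent comp = AudioComponentFindNext(NULL, &desc);
AudioUnit ioUnit;
AudioComponentInstanceNew(comp, &ioUnit);

// Enable input on bus 1 (the microphone element of RemoteIO).
UInt32 enable = 1;
AudioUnitSetProperty(ioUnit, kAudioOutputUnitProperty_EnableIO,
                     kAudioUnitScope_Input, 1, &enable, sizeof(enable));

// Set the format you want to read on the output scope of the input bus.
AudioUnitSetProperty(ioUnit, kAudioUnitProperty_StreamFormat,
                     kAudioUnitScope_Output, 1,
                     &mRecordFormat, sizeof(mRecordFormat));

// Install the input callback; inside it, call AudioUnitRender() to pull
// however many frames the hardware delivered this cycle.
AURenderCallbackStruct cb = { MyInputCallback, this };
AudioUnitSetProperty(ioUnit, kAudioOutputUnitProperty_SetInputCallback,
                     kAudioUnitScope_Global, 1, &cb, sizeof(cb));

AudioUnitInitialize(ioUnit);
AudioOutputUnitStart(ioUnit);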
Links to similar situations:
https://stackoverflow.com/a/4597409/1012775
https://stackoverflow.com/a/6687050/1012775
I am using FANN for function approximation. My code is here:
/*
* File: main.cpp
* Author: johannsebastian
*
* Created on November 26, 2013, 8:50 PM
*/
#include "../FANN-2.2.0-Source/src/include/doublefann.h"
#include "../FANN-2.2.0-Source/src/include/fann_cpp.h"
//#include <doublefann>
//#include <fann/fann_cpp>
#include <cstdlib>
#include <iostream>
using namespace std;
using namespace FANN;
//Remember: fann_type is double!
int main(int argc, char** argv) {
    //create a test network: [1,2,1] MLP
    neural_net * net = new neural_net;
    const unsigned int layers[3] = {1, 2, 1};
    net->create_standard_array(3, layers);
    //net->create_standard(num_layers, num_input, num_hidden, num_output);
    //net->set_learning_rate(0.7f);
    //net->set_activation_steepness_hidden(0.7);
    //net->set_activation_steepness_output(0.7);
    net->set_activation_function_hidden(SIGMOID);
    net->set_activation_function_output(SIGMOID);
    net->set_training_algorithm(TRAIN_RPROP);
    //cout<<net->get_train_error_function()
    //exit(0);

    //test the number 2
    fann_type * testinput = new fann_type;
    *testinput = 2;
    fann_type * testoutput = new fann_type;
    *testoutput = *(net->run(testinput));
    double outputasdouble = (double) *testoutput;
    cout << "Test output: " << outputasdouble << endl;

    //make a training set of x->x^2
    training_data * squaredata = new training_data;
    squaredata->read_train_from_file("trainingdata.txt");
    //cout<<testinput[0]<<endl;
    //cout<<testoutput[0]<<endl;
    cout<<*(squaredata->get_input())[9]<<endl;
    cout<<*(squaredata->get_output())[9]<<endl;
    cout<<squaredata->length_train_data();

    //scale data
    fann_type * scaledinput = new fann_type[squaredata->length_train_data()];
    fann_type * scaledoutput = new fann_type[squaredata->length_train_data()];
    for (unsigned int i = 0; i < squaredata->length_train_data(); i++) {
        scaledinput[i] = *squaredata->get_input()[i]/200;///100;
        scaledoutput[i] = *squaredata->get_output()[i]/200;///100;
        cout<<"In:\t"<<scaledinput[i]<<"\t Out:\t"<<scaledoutput[i]<<endl;
    }

    net->train_on_data(*squaredata, 1000000, 100000, 0.001);
    *testoutput = *(net->run(testinput));
    outputasdouble = (double) *testoutput;
    cout << "Test output: " << outputasdouble << endl;
    cout << endl << "Easy!";
    return 0;
}
Here's trainingdata.txt:
10 1 1
1 1
2 4
3 9
4 16
5 25
6 36
7 49
8 64
9 81
10 100
When I run I get this:
Test output: 0.491454
10
100
10In: 0.005 Out: 0.005
In: 0.01 Out: 0.02
In: 0.015 Out: 0.045
In: 0.02 Out: 0.08
In: 0.025 Out: 0.125
In: 0.03 Out: 0.18
In: 0.035 Out: 0.245
In: 0.04 Out: 0.32
In: 0.045 Out: 0.405
In: 0.05 Out: 0.5
Max epochs 1000000. Desired error: 0.0010000000.
Epochs 1. Current error: 2493.7961425781. Bit fail 10.
Epochs 100000. Current error: 2457.3000488281. Bit fail 9.
Epochs 200000. Current error: 2457.3000488281. Bit fail 9.
Epochs 300000. Current error: 2457.3000488281. Bit fail 9.
Epochs 400000. Current error: 2457.3000488281. Bit fail 9.
Epochs 500000. Current error: 2457.3000488281. Bit fail 9.
Epochs 600000. Current error: 2457.3000488281. Bit fail 9.
Epochs 700000. Current error: 2457.3000488281. Bit fail 9.
Epochs 800000. Current error: 2457.3000488281. Bit fail 9.
Epochs 900000. Current error: 2457.3000488281. Bit fail 9.
Epochs 1000000. Current error: 2457.3000488281. Bit fail 9.
Test output: 1
Easy!
RUN FINISHED; exit value 0; real time: 9s; user: 10ms; system: 4s
Why is the training not working? After I asked a similar question, I was told to scale the NN's input and output. I have done so. Am I getting some parameter(s) wrong, or do I simply have to train longer?
The number of nodes in your hidden layer is too small to fit a quadratic function. I would try 10.
Besides, I would like to recommend a fun applet in which you can simulate the training process by adjusting the parameters. I tried it with 10 hidden-layer nodes and the unipolar sigmoid as the activation function for both the hidden and output layers, and the fit is not bad (though randomizing the weights may cause convergence to fail, so more hidden nodes are highly recommended; you can play with the applet yourself and observe some interesting points).
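As an illustrative sketch (using FANN's plain C API rather than the C++ wrapper from the question; the file name and training parameters are copied from the question), the only real change is the layer sizes passed at creation:

#include "doublefann.h"

int main(void)
{
    /* Same network as in the question, but with 10 hidden nodes: [1,10,1]. */
    struct fann *ann = fann_create_standard(3, 1, 10, 1);

    fann_set_activation_function_hidden(ann, FANN_SIGMOID);
    fann_set_activation_function_output(ann, FANN_SIGMOID);
    fann_set_training_algorithm(ann, FANN_TRAIN_RPROP);

    /* 1000000 max epochs, report every 100000 epochs, stop at MSE 0.001 */
    fann_train_on_file(ann, "trainingdata.txt", 1000000, 100000, 0.001);

    fann_destroy(ann);
    return 0;
}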
Maybe a bit late, but perhaps a new FANN beginner will see this answer; I hope it helps!
I think your problem comes from the data format in your trainingdata.txt.
See: FANN data format
You have to put a newline after each input and each output.
In your case, you have 10 examples with 1 input and 1 output each, so you have to format your file like this:
10 1 1
1
1
2
4
3
9
4
16
5
25
6
36
...
Note: I noticed that when the data format is wrong, the error computed by the training method is very (very) high. A huge error value can be a hint to check your file format.
I want to write a player to play the music. I see the code like below:
AudioFileGetPropertyInfo(audioFile,
                         kAudioFilePropertyMagicCookieData, &size, nil);
if (size > 0) {
    cookie = malloc(sizeof(char) * size);
    AudioFileGetProperty(audioFile,
                         kAudioFilePropertyMagicCookieData, &size, cookie);
    AudioQueueSetProperty(audioQueue,
                          kAudioQueueProperty_MagicCookie, cookie, size);
    free(cookie);
}
I don't understand why the AudioQueue property needs to be set, or what kAudioQueueProperty_MagicCookie means. I can't find any help in the documentation.
Can someone point me in the right direction?
Actually, the magic cookie is more than just a signature; it holds some information about the encoder. The most useful items are "Maximum Bit Rate" and "Average Bit Rate", especially for a compressed format like AudioFileMPEG4Type. For this specific type, the magic cookie is the same as the "esds" box in an MPEG-4 file. You can find the exact bit layout at:
http://xhelmboyx.tripod.com/formats/mp4-layout.txt
8+ bytes vers. 2 ES Descriptor box
= long unsigned offset + long ASCII text string 'esds'
- if encoded to ISO/IEC 14496-10 AVC standards then optionally use:
= long unsigned offset + long ASCII text string 'm4ds'
-> 4 bytes version/flags = 8-bit hex version + 24-bit hex flags
(current = 0)
-> 1 byte ES descriptor type tag = 8-bit hex value 0x03
-> 3 bytes extended descriptor type tag string = 3 * 8-bit hex value
- types are Start = 0x80 ; End = 0xFE
- NOTE: the extended start tags may be left out
-> 1 byte descriptor type length = 8-bit unsigned length
-> 2 bytes ES ID = 16-bit unsigned value
-> 1 byte stream priority = 8-bit unsigned value
- Defaults to 16 and ranges from 0 through to 31
-> 1 byte decoder config descriptor type tag = 8-bit hex value 0x04
-> 3 bytes extended descriptor type tag string = 3 * 8-bit hex value
- types are Start = 0x80 ; End = 0xFE
- NOTE: the extended start tags may be left out
-> 1 byte descriptor type length = 8-bit unsigned length
-> 1 byte object type ID = 8-bit unsigned value
- type IDs are system v1 = 1 ; system v2 = 2
- type IDs are MPEG-4 video = 32 ; MPEG-4 AVC SPS = 33
- type IDs are MPEG-4 AVC PPS = 34 ; MPEG-4 audio = 64
- type IDs are MPEG-2 simple video = 96
- type IDs are MPEG-2 main video = 97
- type IDs are MPEG-2 SNR video = 98
- type IDs are MPEG-2 spatial video = 99
- type IDs are MPEG-2 high video = 100
- type IDs are MPEG-2 4:2:2 video = 101
- type IDs are MPEG-4 ADTS main = 102
- type IDs are MPEG-4 ADTS Low Complexity = 103
- type IDs are MPEG-4 ADTS Scalable Sampling Rate = 104
- type IDs are MPEG-2 ADTS = 105 ; MPEG-1 video = 106
- type IDs are MPEG-1 ADTS = 107 ; JPEG video = 108
- type IDs are private audio = 192 ; private video = 208
- type IDs are 16-bit PCM LE audio = 224 ; vorbis audio = 225
- type IDs are dolby v3 (AC3) audio = 226 ; alaw audio = 227
- type IDs are mulaw audio = 228 ; G723 ADPCM audio = 229
- type IDs are 16-bit PCM Big Endian audio = 230
- type IDs are Y'CbCr 4:2:0 (YV12) video = 240 ; H264 video = 241
- type IDs are H263 video = 242 ; H261 video = 243
-> 6 bits stream type = 3/4 byte hex value
- type IDs are object descript. = 1 ; clock ref. = 2
- type IDs are scene descript. = 4 ; visual = 4
- type IDs are audio = 5 ; MPEG-7 = 6 ; IPMP = 7
- type IDs are OCI = 8 ; MPEG Java = 9
- type IDs are user private = 32
-> 1 bit upstream flag = 1/8 byte hex value
-> 1 bit reserved flag = 1/8 byte hex value set to 1
-> 3 bytes buffer size = 24-bit unsigned value
-> 4 bytes maximum bit rate = 32-bit unsigned value
-> 4 bytes average bit rate = 32-bit unsigned value
-> 1 byte decoder specific descriptor type tag
= 8-bit hex value 0x05
-> 3 bytes extended descriptor type tag string
= 3 * 8-bit hex value
- types are Start = 0x80 ; End = 0xFE
- NOTE: the extended start tags may be left out
-> 1 byte descriptor type length
= 8-bit unsigned length
-> ES header start codes = hex dump
-> 1 byte SL config descriptor type tag = 8-bit hex value 0x06
-> 3 bytes extended descriptor type tag string = 3 * 8-bit hex value
- types are Start = 0x80 ; End = 0xFE
- NOTE: the extended start tags may be left out
-> 1 byte descriptor type length = 8-bit unsigned length
-> 1 byte SL value = 8-bit hex value set to 0x02
"
The magic cookie that comes from kAudioFilePropertyMagicCookieData starts at the ES descriptor (just ignore the first 4 bytes described in the map and the rest will be an exact match to the magic cookie).
A sample magic cookie would be like this:
03 80 80 80 22 00 00 00 04 80 80 80 14 40 15 00 18 00 00 00 FA 00 00 00 FA 00 05 80 80 80 02 12 08 06 80 80 80 01 02
Maximum bit rate is at offset 18 -> 0xFA00 (or 64,000)
Average bit rate is at offset 22 -> 0xFA00 (or 64,000)
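As a quick illustration (not production parsing code), with a cookie laid out like the sample above (that is, with the extended 0x80 0x80 0x80 tag bytes present, so these fixed offsets hold), the two rates can be read like this:

#include <stdint.h>
#include <stdio.h>

/* Sketch: pull the two rates out of an esds-style magic cookie. Offsets 18
 * and 22 match the sample above; a robust parser would walk the descriptors. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static void print_bit_rates(const uint8_t *cookie, size_t size)
{
    if (size >= 26) {
        printf("max bit rate: %u\n", (unsigned)read_be32(cookie + 18)); /* 64000 */
        printf("avg bit rate: %u\n", (unsigned)read_be32(cookie + 22)); /* 64000 */
    }
}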
Although according to the Apple documentation the magic cookie is read/write, I had no luck changing the bit rate before creating or converting files.
Hope that helps someone.
The "magic cookie" is a file type signature consisting of a unique sequence of bytes at the beginning of the file, indicating the file format. The audio queue framework uses this information to determine how to decode or extract audio information from a file stream (instead of using or trusting the file name extension). The code you posted reads this set of bytes from the file, and passes it to the audio queue as a cookie. (It would be a mistake to let them be interpreted as PCM samples instead, for instance).
I've been trying for two days to send a MIDI signal. I'm using the following code:
int pitchValue = 8191; // or -8192
int msb = ?;
int lsb = ?;
UInt8 midiData[] = { 0xe0, msb, lsb};
[midi sendBytes:midiData size:sizeof(midiData)];
I don't understand how to calculate msb and lsb. I tried pitchValue << 8, but it works incorrectly: when I look at the events using a MIDI tool, I see a minimum of -8192 and a maximum of +8064. I want to get -8192 and +8191.
Sorry if question is simple.
Pitch bend data is offset to avoid any sign bit concerns. The maximum negative deviation is sent as a value of zero, not -8192, so you have to compensate for that, something like this Python code:
def EncodePitchBend(value):
    ''' return a 2-tuple containing (msb, lsb) '''
    if (value < -8192) or (value > 8191):
        raise ValueError
    value += 8192
    return (((value >> 7) & 0x7F), (value & 0x7f))
Since MIDI data bytes are limited to 7 bits, you need to split pitchValue into two 7-bit values:
int msb = (pitchValue + 8192) >> 7 & 0x7F;
int lsb = (pitchValue + 8192) & 0x7F;
Edit: as @bgporter pointed out, pitch wheel values are offset by 8192 so that "zero" (i.e. the center position) is at 8192 (0x2000), so I edited my answer to offset pitchValue by 8192.
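To put it together, here is a small illustrative check (not from either answer) that builds the full 3-byte message for the extremes and the center; note that per the MIDI spec the LSB data byte is sent before the MSB:

#include <stdio.h>

/* Build the full 3-byte pitch-bend message (status 0xE0 = channel 1). */
static void encode_pitch_bend(int pitchValue, unsigned char out[3])
{
    int v = pitchValue + 8192;     /* map -8192..8191 to 0..16383 */
    out[0] = 0xE0;
    out[1] = v & 0x7F;             /* LSB first */
    out[2] = (v >> 7) & 0x7F;      /* then MSB */
}

int main(void)
{
    int tests[] = { -8192, 0, 8191 };
    for (int i = 0; i < 3; ++i) {
        unsigned char msg[3];
        encode_pitch_bend(tests[i], msg);
        printf("%6d -> %02X %02X %02X\n", tests[i], msg[0], msg[1], msg[2]);
    }
    return 0;   /* prints E0 00 00, E0 00 40, E0 7F 7F */
}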