I am currently working on developing ROS nodes. One node parses raw lidar data read from a .bag file and publishes it to a topic; the other node subscribes to the parsed data from the first node and publishes an accumulated point cloud built by gathering the data over a specific time duration.
However, the point cloud data does not seem to reach the subscribing node in its full amount.
When I play the .bag file at a rate of 0.1, there does not seem to be much data loss, but when it plays at normal speed, a lot of data gets lost.
For example,
rosbag play data.bag
: (result) point cloud # 60000~80000
(normal speed)
rosbag play data.bag --rate=0.1
: (result) point cloud # 1000~1100
(10 times slower speed)
According to the results above, when the data is played at normal speed it loses almost 50% of its data.
Do you have any idea how to get the full data at normal playback speed?
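For reference, here is a minimal sketch of the kind of aggregating node described above, assuming rospy; the topic names /parsed_points and /accumulated_points are placeholders, and the queue_size arguments are the settings most directly tied to messages being dropped at full playback speed.

    #!/usr/bin/env python
    # Minimal sketch (hypothetical topic names) of the aggregating node described above.
    import rospy
    from sensor_msgs.msg import PointCloud2
    import sensor_msgs.point_cloud2 as pc2

    class Accumulator(object):
        def __init__(self, duration_sec=1.0):
            self.points = []
            self.duration = rospy.Duration(duration_sec)
            self.window_start = rospy.Time.now()
            # A generous queue_size gives the subscriber slack before messages get dropped.
            self.sub = rospy.Subscriber('/parsed_points', PointCloud2,
                                        self.callback, queue_size=100)
            self.pub = rospy.Publisher('/accumulated_points', PointCloud2, queue_size=10)

        def callback(self, msg):
            # Collect every point that arrives during the current time window.
            self.points.extend(pc2.read_points(msg, field_names=('x', 'y', 'z')))
            if rospy.Time.now() - self.window_start >= self.duration:
                self.pub.publish(pc2.create_cloud_xyz32(msg.header, self.points))
                rospy.loginfo('accumulated %d points', len(self.points))
                self.points = []
                self.window_start = rospy.Time.now()

    if __name__ == '__main__':
        rospy.init_node('point_cloud_accumulator')
        Accumulator()
        rospy.spin()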
I have a scenario in which the user captures a concert scene with the real-time audio of the performer, and at the same time the device is downloading the live stream from the audio broadcaster device. Later I replace the real-time noisy audio (captured while recording) with the one I streamed and saved on my phone (good quality audio). Right now I am setting the audio offset manually on a trial-and-error basis while merging, so I can sync the audio and video activity at the exact position.
Now what I want to do is automate the process of audio synchronisation. Instead of merging the video with the clear audio at a given offset, I want to merge the video with the clear audio automatically with proper sync.
For that I need to find the offset at which I should replace the noisy audio with the clear audio. E.g. when the user starts and stops the recording, I will take that sample of real-time audio, compare it with the live-streamed audio, take the exact matching part of that audio, and sync it at the perfect time.
Does anyone have any idea how to find the offset by comparing two audio files and syncing it with the video?
Here's a concise, clear answer.
• It's not easy - it will involve signal processing and math.
• A quick Google gives me this solution, code included.
• There is more info on the above technique here.
• I'd suggest gaining at least a basic understanding before you try and port this to iOS.
• I would suggest you use the Accelerate framework on iOS for fast Fourier transforms etc. (the rough idea is sketched below)
• I don't agree with the other answer about doing it on a server - devices are plenty powerful these days. A user wouldn't mind a few seconds of processing for something seemingly magic to happen.
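To make the FFT suggestion above a little more concrete, here is a rough sketch of the underlying cross-correlation trick in plain NumPy (an iOS port would perform the same steps with Accelerate's vDSP routines). It assumes clean and noisy have already been decoded to mono float arrays at the same sample rate; those names are just placeholders.

    import numpy as np

    def find_offset_seconds(clean, noisy, sample_rate):
        """Estimate the delay of 'noisy' relative to 'clean' (positive: noisy starts later)."""
        n = len(clean) + len(noisy) - 1
        n_fft = 1 << (n - 1).bit_length()          # next power of two keeps the FFT fast
        # Cross-correlation computed in the frequency domain: IFFT(FFT(noisy) * conj(FFT(clean)))
        spec = np.fft.rfft(noisy, n_fft) * np.conj(np.fft.rfft(clean, n_fft))
        corr = np.fft.irfft(spec, n_fft)
        lag = int(np.argmax(corr))
        if lag > n_fft // 2:                       # indices past the midpoint are negative lags
            lag -= n_fft
        return lag / float(sample_rate)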
Edit
As an aside, I think it's worth taking a step back for a second. While math and fancy signal processing like this can give great results, and do some pretty magical stuff, there can be outlying cases where the algorithm falls apart (hopefully not often).
What if, instead of getting complicated with signal processing, there's another way? After some thought, there might be. If you meet all the following conditions:
• You are in control of the server component (audio broadcaster device)
• The broadcaster is aware of the 'real audio' recording latency
• The broadcaster and receiver are communicating in a way that allows accurate time synchronisation
...then the task of calculating audio offset becomes reasonably trivial. You could use NTP or some other more accurate time synchronisation method so that there is a global point of reference for time. Then, it is as simple as calculating the difference between audio stream time codes, where the time codes are based on the global reference time.
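As a purely illustrative example (the timestamps below are made up), with NTP-synchronised clocks the offset is just a subtraction of the two capture start times:

    from datetime import datetime, timezone

    # Hypothetical capture start times, both stamped from NTP-synchronised clocks.
    phone_recording_start = datetime(2024, 6, 1, 20, 15, 3, 250000, tzinfo=timezone.utc)
    broadcast_stream_start = datetime(2024, 6, 1, 20, 14, 58, 100000, tzinfo=timezone.utc)

    offset = (phone_recording_start - broadcast_stream_start).total_seconds()
    print(offset)   # 5.15 -> skip the first 5.15 s of the clean stream before muxing it
                    # with the phone video in place of the noisy recording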
This could prove to be a difficult problem, as even though the signals are of the same event, the presence of noise makes a comparison harder. You could consider running some post-processing to reduce the noise, but noise reduction is in itself an extensive, non-trivial topic.
Another problem could be that the signals captured by the two devices could actually differ a lot; for example, the good quality audio (I guess the output from the live mixing console?) will be fairly different from the live version (which I guess comes out of the on-stage monitors/FOH system and is captured by a phone mic?).
Perhaps the simplest possible approach to start would be to use cross correlation to do the time delay analysis.
A peak in the cross correlation function would suggest the relative time delay (in samples) between the two signals, so you can apply the shift accordingly.
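A minimal sketch of that idea, assuming both recordings have already been decoded to mono NumPy arrays at the same sample rate (resample one of them first if they differ); the function name is arbitrary:

    import numpy as np
    from scipy.signal import correlate, correlation_lags

    def align(clean, noisy, sample_rate):
        """Return 'clean' shifted to line up with 'noisy', plus the estimated delay in seconds."""
        corr = correlate(clean, noisy, mode='full')
        lags = correlation_lags(len(clean), len(noisy), mode='full')
        lag = int(lags[np.argmax(corr)])   # >0: the shared content appears later in 'clean'
        if lag > 0:
            aligned = clean[lag:]          # clean starts late: drop its leading samples
        else:
            aligned = np.concatenate([np.zeros(-lag, dtype=clean.dtype), clean])  # pad the front
        return aligned, lag / float(sample_rate)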
I don't know a lot about the subject, but I think you are looking for "audio fingerprinting". Similar question here.
An alternative (and more error-prone) way is running both sounds through a speech-to-text library (or an API) and matching the relevant parts. This would of course not be very reliable: sentences frequently repeat in songs, and the concert may be instrumental.
Also, doing audio processing on a mobile device may not play well (because of low performance, high battery drain, or both). I suggest you use a server if you go that way.
Good luck.
Every tutorial about Apache Flume gives "logs getting continuously generated" as the example.
I am curious whether Flume works only on text data, or whether it can also work with streaming data like audio, video, or electronic sensor inputs.
Because irrespective of the data type, it is all just byte arrays.
It is designed for text data streaming. It is possible to provide a schema definition for text data, so that the consumer of the data can process it after receiving it. The consumer of the data can scale horizontally with the increasing size of the data, while still making use of commodity hardware (moderate cores/RAM). However, for binary data, the reconstruction and parsing would be a heavily resource-intensive operation.
I am trying to create an RTSP client which live broadcasts audio and video. I modified the iOS code at the link http://www.gdcl.co.uk/downloads.htm and am able to broadcast the video to the server properly. But now I am facing issues in broadcasting the audio part. In the linked example, the code is written in such a way that it writes the video data to a file, then reads the data from the file and uploads the NALU video packets to the RTSP server.
For the audio part I am not sure how to proceed. Right now what I have tried is to get the audio buffer from the mic and then broadcast it to the server directly by adding RTP headers and ALU, but this approach is not working properly, as the audio starts lagging behind and the lag increases with time. Can someone let me know if there is a better approach to achieve this, with lip-synced audio/video?
Are you losing any packets on the client? If so, you need to leave "space." If you receive packets 1, 2, 3, 4, 6, 7, you need to leave space for the missing packet (5).
The other possibility is what is known as a clock drift problem. The clocks (crystals) on your client and server are not perfectly in sync with each other.
This can be caused by environment, temperature changes, etc.
Let's say in a perfect world your server is producing 20 ms audio samples at 48000 Hz and your client is playing them back using a sample rate of 48000 Hz. Realistically, your client and server are not running at exactly 48000 Hz. Your server might be at 48000.001 and your client might be at 47999.9998, so your server might be delivering faster than your client or vice versa. You would either consume packets too fast and underrun the buffer, or lag too far behind and overflow the client buffer. In your case, it sounds like the client is playing back too slowly and slowly lagging behind the server. You might only lag a couple of milliseconds per minute, but the issue will keep accumulating and it will look like a 1970s lip-synced kung fu movie.
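To put rough numbers on that example (and note that real crystals are usually off by far more than a thousandth of a hertz, typically tens of parts per million):

    # Back-of-the-envelope drift for the example rates above.
    server_rate = 48000.001        # Hz
    client_rate = 47999.9998       # Hz

    relative_error = (server_rate - client_rate) / 48000.0    # ~2.5e-8, i.e. 0.025 ppm
    print(relative_error * 3600)   # ~0.00009 s of drift per hour for this optimistic example
    print(50e-6 * 3600)            # ~0.18 s per hour for a more typical 50 ppm mismatch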
In other devices, there is often a common clock line to keep things in sync: for example, video camera clocks, MIDI clocks, multitrack recorder clocks.
When you deliver data over IP, there is no common clock shared between a client and a server. So your issue concerns syncing clocks between disparate devices with no common clock. I have successfully solved this problem using this general approach:
A) Let the client count the rate of packets that come in over a period of time.
B) Let the client count the rate that the packets are consumed (played back).
C) Adjust the sample rate of the client based on A and B.
So your client requires that you adjust the sample rate of the playback, i.e. you play it faster or slower. Note that the playback rate change will be very, very subtle. You might set the sample rate to be 48000.0001 Hz instead of 48000 Hz. The difference in pitch would be undetectable by humans, as it would only cause a fraction of a cent of difference in pitch. I gave an explanation of a very simplified approach; there are many other nuances and edge cases that must be considered when developing such a control system. You don't just set it and forget it. You need a control system to manage the playback.
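A very stripped-down sketch of that control loop, just to show the shape of it (the class name, the proportional correction, and the 0.05% cap are all illustrative; a real implementation needs smoothing, limits, and packet-loss handling):

    class PlaybackRateController:
        """Nudges the playback sample rate so consumption tracks the arrival rate."""

        def __init__(self, nominal_rate=48000.0, max_correction=0.0005):
            self.nominal_rate = nominal_rate
            self.max_correction = max_correction      # never move more than 0.05 %

        def update(self, samples_received, samples_played, window_seconds):
            # A) rate at which samples arrived from the network over the window
            arrival_rate = samples_received / window_seconds
            # B) rate at which samples were actually consumed by the output device
            playback_rate = samples_played / window_seconds
            # C) proportional correction: play slightly faster if we are falling behind
            error = (arrival_rate - playback_rate) / self.nominal_rate
            error = max(-self.max_correction, min(self.max_correction, error))
            return self.nominal_rate * (1.0 + error)  # e.g. 48000.0001 instead of 48000

The client would call update() every few seconds with fresh counters and feed the returned value to its resampler or output device.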
An interesting test to demonstrate this is to take two devices with the exact same file. A long recording (say 3 hours) is best. Start them at the same time. After 3 hours of playback, you will notice that one is ahead of the other.
This post explains that it is NOT a trivial task to stream audio and video.
I have a question regarding the synchronization of 2 Directsound streams.
To record and play sound I currently use Portaudio to open 2 Directsound streams.
There are 2 callback functions which are called every time the input buffer is filled and the output buffer needs data.
Now here's my problem...
The input stream is running at 48kHz samplerate (#1024 samples). The output stream is running at 192kHz samplerate (#4096 samples). Every time the input buffer is filled and the callback is called I do some DSP and after that I convert the result to 192kHz. The output stream takes the result and outputs the data. Now the 2 streams are running completely out of sync.
I have looked through the entire Portaudio API but I can't find a sync option to lock the 2 streams together.
Is there any way to lock 2 Directsound streams? I really need 48kHz input and 192kHz output.
Br,
Vincent Bruinink.
The thing is that you can't really open two streams "at the same time", nor can you open two devices (or even one device at two different sample rates) and expect them to stay truly in sync, even if they were, at one time, in sync. To understand why, you may want to read something about how audio works on a computer. You may also want to read this document, which is specific to PortAudio.
As an alternative, you may want to consider opening a single device in a single stream and using software sample-rate conversion.
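For what it's worth, here is a rough sketch of that suggestion in Python, using the sounddevice binding to PortAudio and scipy for the rate conversion (the block size and the pass-through DSP stub are placeholders): one full-duplex stream at 192 kHz so input and output share the same device clock, with the input downsampled to 48 kHz in software for the existing DSP and the result upsampled back for output.

    import sounddevice as sd
    from scipy.signal import resample_poly

    FS_DEVICE = 192000       # one stream, one device, one clock
    BLOCK = 4096             # frames per callback at 192 kHz

    def process_48k(block_48k):
        return block_48k     # placeholder for the existing 48 kHz DSP

    def callback(indata, outdata, frames, time, status):
        mono = indata[:, 0]
        down = resample_poly(mono, 1, 4)              # 192 kHz -> 48 kHz
        up = resample_poly(process_48k(down), 4, 1)   # 48 kHz -> 192 kHz
        outdata[:, 0] = up[:frames]

    with sd.Stream(samplerate=FS_DEVICE, blocksize=BLOCK,
                   channels=1, dtype='float32', callback=callback):
        sd.sleep(10000)      # run for ten seconds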
I have a DirectShow push source filter and a DirectShow simple audio mixer filter both written in Delphi 6 with the help of the DSPACK component library. In my app, I build a filter graph manually and for the pin connections I use IFilterGraph.ConnectDirect() to avoid any interference from DirectShow's "intelligent connection" technology. I am using both of those filters as private/unregistered filters internal to my program.
The graph I build has a capture filter and my push source audio filter sharing the head position of the graph. Their output pins are connected to my simple audio mixer, the latter supporting multiple input connections. The mixer forces all connections to its input and output pins to be the exact same media format type that is preset in its constructor. In this case the format setting I'm using is WAV format with a sample rate of 8000, 16 bits per sample, and one channel. Note, I am using DecideBufferSize() to set all filters to a buffer size of 50 milliseconds. This results in buffers being delivered that are 400 bytes (200 samples) large.
The capture filter is an external COM object that I find using the DirectShow API. Currently I am assigning my VOIP phone as the device (Moniker). For some strange reason my push source filter is pumping out buffers at a rate of exactly 7 times that of the capture filter. In other words, my mixer filter is getting 7 buffers from my push source filter for each buffer it receives from the capture filter. I know this because I debug print a line every time the mixer filter gets a buffer and I identify the filter that is the source of the buffer.
I don't know how the capture filter is forming its timestamps since it is external code, but I would expect it's the usual scheme. My push source filter starts at zero and, with each FillBuffer() call, increments the timestamp in DirectShow reference time format by the amount of time the buffer represents.
Here are my questions:
1) Should the timestamps even matter if I am building the graph manually? Does DirectShow get in between the filters and somehow affect the timing of pin writes (Receive calls), even if you build the graph completely manually?
2) What common mistake could cause a filter to push out buffers too fast, despite a homogeneous media format all around the graph?
In DirectShow, source/push filters are normally either live or non-live. Both inject data into the pipeline, and the important difference is that a live filter streams data as soon as possible: as soon as it is generated, received from outside the pipeline (such as from the network), etc.
A non-live filter pushes as much data as it can. A filter that plays a 5-minute-long MP3 file? It is prepared to inject all five minutes at once. It is the task of a renderer filter to block streaming when no more buffers are available and to honor presentation time. So when the source filter loads 100% of the buffers, it just cannot push anything more until buffers are released by playback.
The important part of this behavior is to timestamp media samples correctly. If one fails to timestamp, the renderer will not be able to present data on time and could be showing/playing media too slowly or too fast.
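The questioner's scheme of starting at zero and advancing by the buffer duration is the usual one; the key point is that the increments are derived from the sample count, not from wall-clock time. In DirectShow reference time (100-nanosecond units) the arithmetic looks like the sketch below, written in Python purely for illustration (the real filter would do this in Delphi/C++ inside FillBuffer()):

    UNITS = 10_000_000      # REFERENCE_TIME ticks per second (one tick = 100 ns)

    def buffer_timestamps(samples_delivered_so_far, samples_in_buffer, sample_rate):
        """Start/stop times for the next buffer, derived purely from the sample count."""
        t_start = samples_delivered_so_far * UNITS // sample_rate
        t_stop = (samples_delivered_so_far + samples_in_buffer) * UNITS // sample_rate
        return t_start, t_stop

    # 50 ms buffers at 8000 Hz are 400 samples each:
    print(buffer_timestamps(0, 400, 8000))     # (0, 500000)        -> 0 .. 50 ms
    print(buffer_timestamps(400, 400, 8000))   # (500000, 1000000)  -> 50 .. 100 ms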