How can I convert the RTP payload containing SILK-encoded audio into a file?

How can I convert the RTP payload containing SILK-encoded audio into a file? - wireshark

I have the pcap of a VoIP call involving SILK.
I'm able to see in Wireshark the RTP payload.
From the RTP headers I can understand the sample rate (e.g. 24 KHz) and the frame size (e.g. 20 ms).
What I'd like to do is extract the RTP payload and generate a file containing the SILK-encoded audio.
From the RTP payload format description I can see that in the case of storage in a file, each block of audio needs a block header, to specify the sample rate and block size (because the block size is variable and can be different on each frame).
How can I generate a file with the correct file header ("magic number") and add a block header for each block of audio?
I can use a few different programming languages so I'm mainly interested in the required algorithm, but would appreciate references to code implementations (or perhaps some existing tool?).

Use pcaputil of pjproject: Converts captured RTP packets in PCAP file to WAV file or play it to audio device. Can filter the PCAP file on source or destination IP or port, is able to deal with SRTP and supports all codecs in PJMEDIA, including SILK (have not tried this myself).
Examples:
pcaputil file.pcap output.wav
pcaputil -c AES_CM_128_HMAC_SHA1_80 -k VLDONbsbGl2Puqy+0PV7w/uGfpSPKFevDpxGsxN3 file.pcap output.wav

Related

Finding GOP structure and length

I set up a video stream and captured its packets(H264 over RTP). Looking at the Wireshark capture(decoded with type 96), I need to figure out the format of the GOP and the length of it. Problem is I can't tell which frame is I/P/B. Can I do this by looking at the Wireshark capture or do I need some sort of extension?

While you can get the NAL unit type of each frame quite easily by looking at the H.264 RTP payload format, I would recommend rather using a tool such as ffprobe to do the work for you:
ffprobe -show_frames -rtsp_transport tcp "<rtsp URI>" | grep -E 'pict_type'
which will output something like
pict_type=I
pict_type=P
pict_type=P
pict_type=P
In my example I use an RTSP stream but you should be able to adapt this to the RTP stream.

Advanced filtering in wireshark

I have a pcap file where I have a proprietary header from 13th byte to 110th byte. Is there a way I can strip of this portion from every packet in pcap file and then use wireshark to display the remaining packet ?

If you know for certain that every packet has the same proprietary header in the same location and is the same size, then you can use editcap to remove the unwanted bytes. For example:
editcap -C 12:98 file_with_prop_hdr.pcap file_without_prop_hdr.pcap

How to preserve timestamps in tshark output file?

I'm using tshark to extract specific TCP streams and write that to an output pcap file using the -w option.
But, the frames in the output pcap do not have any timestamps or delta times (they're all zero while in the original pcap there are timestamps and delta times for the frames).
Is there any way to ensure that the original timestamps (from the original pcap file) are preserved in the output pcap?
I'm using TShark 1.10.5 (SVN Rev 54262 from /trunk-1.10) on MacOS.
Thanks!

the frames in the output pcap do not have any timestamps or delta times (they're all zero while in the original pcap there are timestamps and delta times for the frames).
That is what is technically known as a "bug". Please file it as a bug on the Wireshark Bugzilla; if you can attach your original pcap file for testing purposes, that would be good. (If not, please run the file command on it and show the results, just so we know what file type the input file is - it might, for example, be a pcap-ng file rather than a pcap file).

Is it possible to split the recorded wav file into multiple wav files on iOS, given the duration of the splits?

I want to extract a few clips from the recorded wav file. I am not finding much help online regarding this issue. I understand we can't split from compressed formats like mp3, but how do we do it with caf/wav files?

One approach you may consider would be to calculate and read the bytes from an audio file and write them to a new file. Because you are dealing with LPCM formats the calculations are relatively simple.
If for example you have a file of 16bit mono LPCM audio sampled at 44.1kHz that is one minute in duration, then you have a total of (60 secs x 44100Hz) 2,646,000 samples. Times 2 bytes per sample gives a total of 5,292,000 bytes. And if you want audio from 10sec to 30sec then you need to read the bytes from 882,000 to 2,646,000 and write them to a separate file.
There is a bit of code involved but it can be done using Audio File Services Class from the AudioToolbox framework.
Functions you'll need to use are AudioFileOpenURL, AudioFileCreateWithURL, AudioFileReadBytes, AudioFileWriteBytes, and AudioFileClose.
An algorithm would be something like this-
You first set up an AudioFileID which is an opaque type that gets passed in to the AudioFileCreateWithURL function. Then open the file you wish to splice up using AudioFileOpenURL.
Calculate the start and end bytes of what you want to copy.
Next, in a loop preferably, read in the bytes and write them to file. AudioFileReadBytes and AudioFileWriteBytes allow you to do this. Whats good is that you can read and write whatever size bytes you decide on each iteration of the loop.
When finished close the new file and original using AudioFileClose.
Then repeat for each file (audio extraction) to be written.
On an additional note you would split a compressed format by converting the compressed format to LPCM first.

RTP H.264 Packet Depacketizer

Usually for videos the marker bit of RTP Packet indicates the last packet of the RTP.
So, with this it is guaranteed that I will receive 1 frame per packet or can receive more than one?
In the case beyond the depacketization I would have to make a parser to separate the H.264 frames?
If I can get more than one frame per RTP packetit is possible to get a piece of the next frame? Or all frames within the RTP packet even if more than one are completes?
Best regards,

RFC 6184 "RTP Payload Format for H.264 Video" has answers for the raised questions. It can be both ways: 2+ NAL units per packet, and 1 NAL unit fragmented over 2+ packets.
See quotes below:
5.7.1. Single-Time Aggregation Packet (STAP)
A single-time aggregation packet (STAP) SHOULD be used whenever NAL
units are aggregated that all share the same NALU-time.
and
5.8. Fragmentation Units (FUs)
This payload type allows fragmenting a NAL unit into several RTP
packets. Doing so on the application layer instead of relying on
lower-layer fragmentation (e.g., by IP) has the following advantages:

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart