I'm using ffmpeg to read an h264 RTSP stream from a Cisco 3050 IP camera and reencode it to disk as h264 (there are reasons why I'm not just using -codec:copy).
The ffmpeg version is as follows:
ffmpeg version 3.2.6 Copyright (c) 2000-2017 the FFmpeg developers
built with gcc 6.3.0 (Alpine 6.3.0)
I've also tried with ffmpeg 2.8.14-0ubuntu0.16.04.1 and the latest ffmpeg built from source (I used this commit) and see the same behaviour as below.
The command I'm running is:
ffmpeg -rtsp_transport udp -i 'rtsp://<user>:<pw>@<ip>:554/StreamingSetting?version=1.0&action=getRTSPStream&ChannelID=1&ChannelName=Channel1' -r 10 -c:v h264 -crf 23 -x264-params keyint=60:min-keyint=60 -an -f ssegment -segment_time 60 -strftime 1 /output/%Y%m%d_%H%M%S.ts -abort_on empty_output
I get a variety of errors at a fairly steady rate of at least one per second. Here's a sample:
[rtsp @ 0x7f268c5e9220] max delay reached. need to consume packet
[rtsp @ 0x7f268c5e9220] RTP: missed 40 packets
[h264 @ 0x55b1e115d400] left block unavailable for requested intra mode
[h264 @ 0x55b1e115d400] error while decoding MB 0 12, bytestream 114567
[h264 @ 0x55b1e115d400] concealing 3889 DC, 3889 AC, 3889 MV errors in I frame
The most common one is 'error while decoding MB x x, bytestream x'. This corresponds to severe corruption in the video file when played back.
I see many references to that error message on stackoverflow and elsewhere, but I've yet to find a satisfying explanation or workaround. It comes from this line which appears to correspond to missing data at the end of the stream. 'left block unavailable' comes from here and also looks like missing data.
Others have suggested using -rtsp_transport tcp instead (1, 2, 3) which in my case just gives a slightly different mix of errors, and still video corruption:
[h264 @ 0x557923191b00] left block unavailable for requested intra4x4 mode -1
[h264 @ 0x557923191b00] error while decoding MB 0 28, bytestream 31068
[h264 @ 0x557923191b00] concealing 2609 DC, 2609 AC, 2609 MV errors in I frame
[rtsp @ 0x7f88e817b220] CSeq 5 expected, 0 received.
Using Wireshark I confirmed that in both UDP and TCP mode, all of the packets are making it from the camera to the PC (sequential RTP sequence numbers without any missing) which makes me think the data is being lost after it arrives at ffmpeg.
I also see similar behaviour when running the same command against a Panasonic WV-SFV110 camera, but with less frequent errors overall. Switching from UDP to TCP on the Panasonic camera reduces but does not completely eliminate the errors/corruption.
I also tried a similar command with VLC and got similar errors (cvlc rtsp://<user>:<pw>@<ip>/MediaInput/h264 :sout='#transcode{vcodec=h264}:std{access=file, mux=ts, dst="output.ts"}') -- presumably the code hasn't diverged much since libav forked from ffmpeg.
The camera is plugged directly into a PoE port on the PC, so network congestion can't be a problem. Given that the PC has enough CPU to keep up with encoding the live stream, it seems to me to be a problem with ffmpeg that it still drops data from the TCP stream.
Qualitatively, there are several factors which seem to make the problem worse:
Higher video resolution
Higher system load on the machine running ffmpeg (e.g. transcoding to a low res .avi file produces fewer errors than transcoding to h264 VBR; using -codec:copy eliminates all errors except a couple while ffmpeg is starting up)
Greater motion within the camera view
What does the error mean? And what can I do about it?
Looking at the initial error message:
[rtsp @ 0x7f268c5e9220] max delay reached. need to consume packet
[rtsp @ 0x7f268c5e9220] RTP: missed 40 packets
I guess that you are losing UDP packets. The rest of the H.264 error messages are caused by receiving an incomplete bitstream.
Now the key is to isolate the issue. Is your network dropping packets? Or is your server too slow or overloaded to receive the UDP (RTP) stream?
First I'd check the UDP buffer size of your OS. https://access.redhat.com/documentation/en-US/JBoss_Enterprise_Web_Platform/5/html/Administration_And_Configuration_Guide/jgroups-perf-udpbuffer.html
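For example, on Linux you could raise the kernel's socket receive buffer limits along these lines (the values are illustrative; make them persistent in /etc/sysctl.conf if they help):
sysctl -w net.core.rmem_max=26214400
sysctl -w net.core.rmem_default=1048576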
If increasing the UDP buffer size doesn't help, use ffmpeg with -codec:copy to lower the CPU load. Do you still get errors?
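For example, a copy-only sketch based on the command in the question (URL shortened):
ffmpeg -rtsp_transport udp -i 'rtsp://<user>:<pw>@<ip>:554/...' -c copy -an -f ssegment -segment_time 60 -strftime 1 /output/%Y%m%d_%H%M%S.ts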
Since you want to re-encode, consider using Intel Quick Sync (-vcodec h264_qsv) or some other hardware encoder to lower your CPU load.
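Something along these lines, assuming your CPU and driver stack actually support Quick Sync (the bitrate is illustrative):
ffmpeg -rtsp_transport udp -i 'rtsp://...' -c:v h264_qsv -b:v 4M -an -f ssegment -segment_time 60 -strftime 1 /output/%Y%m%d_%H%M%S.ts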
The question is not so much whether the PC has enough CPU, but about identifying the bottleneck in the processing pipeline. Your H.264 encoder (x264) may oversubscribe your CPU so that you get momentary peak loads that result in packet drops. Try limiting the number of threads for x264 and/or lowering the preset to 'fast' or 'faster'.
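For instance (thread count, preset and CRF are illustrative):
ffmpeg -rtsp_transport udp -i 'rtsp://...' -c:v libx264 -preset faster -threads 2 -crf 23 -an -f ssegment -segment_time 60 -strftime 1 /output/%Y%m%d_%H%M%S.ts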
It does sound like packet loss is an issue. Higher video resolution and greater motion both increase the bitrate of the encoded video stream which will increase your packet loss. Depending on which packet is lost, you will see varying errors in the decoding process as you indicated in your post.
The fact that higher system load on the machine running ffmpeg makes things worse also suggests that packets are being dropped on the receiving side, e.g. the socket buffer overflowing when ffmpeg takes too long to read packets while it is busy transcoding the video.
First question is what is your network topology? Streaming over the public Internet is a lot harder than streaming over your LAN. What kind of switches/routers are in the network?
Next question: what bitrate is your camera streaming at? Try reducing this and check the results. Be systematic in your approach, i.e.:
Don't transcode at first; just receive the video and write it to file.
Check for packet loss/video artifacts.
Start at a lower bitrate, e.g. 100 kbps, and increase it if no loss is evident.
The next thing I would try is to increase the size of the receiver buffers. While I am not that familiar with ffmpeg, it looks like you can set it via recv_buffer_size as indicated here. You then need to work out a reasonably large size based on your camera configuration, enough to store e.g. a couple (5?) of seconds of video data. Check whether you get fewer artifacts, or longer periods without artifacts, as you increase the receiver buffer size.
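As a sketch, and only an assumption on my part: recent ffmpeg builds also seem to expose a buffer_size option on the RTSP input for the underlying UDP sockets, so something like the following may work (5000000 bytes is roughly 5 seconds of an ~8 Mbit/s stream):
ffmpeg -rtsp_transport udp -buffer_size 5000000 -i 'rtsp://<user>:<pw>@<ip>:554/...' <rest of your options>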
Of course, if your processor is too slow to transcode the video in real time, you will run out of buffer space sooner or later, in which case you might have to transcode to a lower resolution/bitrate, use less intensive encoder settings, or run the transcoding on a faster machine.
Also, note that adjusting the receiver buffer size will not compensate for packet loss occurring on the public Internet, so the above will help assuming you're streaming on a local network that supports the bitrate of the camera. If you exceed the bandwidth of the network you can expect packet loss. In that case streaming over TCP could help somewhat (at least until the receiver buffer eventually overruns).
More things you can try if the above does not help or solve the problem completely:
Sniff the incoming traffic with wireshark or tcpdump.
Have a look at the traces. Filter the trace using "RTSP".
You should be able to see the RTP traffic, where consecutive RTP packets have increasing sequence numbers, e.g. 20, 21, 22, 23, etc. If you see missing sequence numbers, then you've got packet loss; try streaming over TCP and repeat the trace. Remember to increase the receiver buffer size when streaming over TCP as well.
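For example, a tshark sketch along these lines (the capture file name and RTP port are placeholders; tshark has to be told to decode the camera's UDP port as RTP):
tshark -r capture.pcap -d udp.port==<rtp_port>,rtp -T fields -e rtp.seq
Any gap in the printed sequence numbers is a lost packet.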
In summary you have a pipeline architecture and you need to determine where in the pipeline the loss is occurring:
camera -> network -> receiver buffer (OS) -> application (ffmpeg)
Related
Context:
I am trying to replay a BLF file using python-can over a Vector interface, with an implementation of a MessageSync iterator object and can.send operating on the yielded messages. It functions as expected for 20-30 seconds, but after that it keeps raising an ERR_QUEUE_FULL exception while sending CAN messages. I have tried to handle that using can_bus.flush_tx_buffer() and can_bus.reset(), but to no effect. I understand that the transmit buffer gets full when messages are written too fast in a given segment, causing a buffer overflow.
Usage:
replayReaderObj = LogReader(replay_file_path)
msgSyncObj = MessageSync(messages=replayReaderObj, timestamps=True)
I am iterating over msgSyncObj using a for loop and calling can.send() on each message (provided the message is not an error frame). With the default arguments gap (0.0001) and skip (60), the replay timestamps are considerably delayed compared to the replay file. Hence gap is set to 0 in the next attempt, to ensure only the offset difference is considered. This aligns the replay timestamps but causes the buffer overflow within a few seconds.
The same replay file, when run through Vector CANoe replay blocks, runs just fine without any buffer issues within the given replay duration (+10%).
Question:
Can anyone shed light on whether python-can and Vector CANoe (both running on a Win10 PC) configure the transmit queue buffer differently? Any suggestion on how I can increase the transmit queue buffer used by python-can would be highly appreciated, along with how to handle such buffer overflows (since flush_tx_buffer isn't having any impact).
Note: In the Vector Hardware Configuration, the transmit queue size is configured as 256 messages. I am not sure whether python-can uses the same configuration; I would like to know before I change it.
Additional context
OS and version: Win 10
Python version: Python 3
python-can version: 3.3.4
python-can interface/s (if applicable): Vector VN1630
There is another real ECU for acknowledgement of Tx messages. The replay runs fine if I keep a decent wait time (10 ms, the minimum that time.sleep() in Python on Windows can provide) between consecutive messages. The drawback is that with the wait time injected, it takes 6x-7x the actual replay time.
Let me know if any further information is needed on top of this. Sorry, I will not be able to share the trace file as it is proprietary, but I can get back to you with any details regarding its nature.
Probably one of the most cliché questions, but here is the problem: I have an Ubuntu server running as an isolated machine only to handle FFmpeg jobs. It has 4 vCPUs, 4GB RAM and 80GB storage. I am currently using this script to convert a video into an HLS playlist: https://gist.github.com/maitrungduc1410/9c640c61a7871390843af00ae1d8758e This works fine for all videos, including 4K recorded from an iPhone. However, I am trying to add a watermark, so I changed line 106 of this script
from:
cmd+=" ${static_params} -vf scale=w=${widthParam}:h=${heightParam}"
to:
cmd+=" ${static_params} -filter_complex [1]colorchannelmixer=aa=0.5,scale=iw*0.1:-1[wm];[0][wm]overlay=W-w-5:H-h-5[out];[out]scale=w=${widthParam}:h=${heightParam}[final] -map [final]"
Now this works flawlessly with videos from YouTube or other sources, but as soon as I try to use 4K videos from an iPhone, the RAM usage grows from 250MB to 3.8GB in less than a minute and crashes the entire process. So I looked for some similar questions:
FFmpeg Concat Filter High Memory Usage
https://github.com/jitsi/jibri/issues/269
https://superuser.com/questions/1509906/reduce-ffmpeg-memory-usage-when-joining-videos
ffmpeg amerge Error while filtering: Cannot allocate memory
I understand that FFmpeg requires a lot of memory, but I am unsure of the right way to process video without holding the stream in memory, releasing memory allocations in real time instead. Even if we decide to work without the watermark, it still hovers around 1.8GB RAM to process a 5-second 4K video, and this creates the risk that if a user uploads a longer video it will eventually crash the server. I have thought about ulimit, but that seems like restricting FFmpeg rather than writing an improved command. Let me know how I can tackle this problem. Thanks.
Okay, I found a solution. The problem is that the 4K video has a much higher bitrate, and processing the filter_complex loads it into RAM, which eventually kills the process. To tackle this, the first thing I did was transcode the input video to H.264 (you can set a custom bitrate if you want, but I left that out).
So I added this new command after line 58 of this script https://gist.github.com/maitrungduc1410/9c640c61a7871390843af00ae1d8758e
ffmpeg -i SOURCE.MOV -c:a aac -ar 48000 -c:v libx264 -profile:v main -crf 19 -preset ultrafast /home/myusername/myfolder/out.mp4
Now that we have a newly processed out.mp4, go down to line 121 of the script and remove it; the reason is to stop FFmpeg from handling everything in one overloaded command. Then remove lines 107 to 109 and do this:
filters="[1]colorchannelmixer=aa=0.5,scale=iw*0.1:-1[wm];[0][wm]overlay=W-w-5:H-h-5[out];[out]scale=w=${widthParam}:h=${heightParam}[final]"
cmd=""
cmd+=" ${static_params} -filter_complex ${filters} -map [final]"
cmd+=" -b:v ${bitrate} -maxrate ${maxrate%.*}k -bufsize ${bufsize%.*}k -b:a ${audiorate}"
cmd+=" -hls_segment_filename ${target}/${name}_%03d.ts ${target}/${name}.m3u8"
ffmpeg ${misc_params} -i /home/myusername/myfolder/out.mp4 -i mylogo.png ${cmd}
So now we are running FFmpeg inside a loop, producing the output one resolution at a time. This eliminates loading all the filters into memory at once. You might even want to remove line 53, depending on your use case.
Test
4K HEVC iPhone video, 1.2 minutes long (453MB)
transcoding to H264 - Memory Usage stayed at 750MB
HLS + watermark - Memory Usage stayed between 430MB to 1.1GB
4K HEVC LG HDR video, 1.13 minutes long (448MB)
transcoding to H264 - Memory Usage stayed at 800MB
HLS + watermark - Memory Usage stayed between 380MB to 850MB
My final thoughts
FFmpeg is a resource hog. The number of cores and amount of memory required depend mostly on how much video you want to process. In our case we only wanted to support videos up to 500MB, so our 4K processing test fits the need, but if you have to handle larger videos you will need to test with more RAM and CPU cores at hand.
It's never a good idea to run FFmpeg jobs in parallel. Processing videos batch-wise ensures optimum use of the available resources and less chance of breaking your system in the middle of the night.
Always run FFmpeg on an isolated machine, away from your web server, database, mail server, etc.
Increasing resources is not always the answer. We tend to conclude first that more resources === more stability, but that is not always right. I've read enough threads about even 64GB RAM with 32 cores failing to keep up with FFmpeg, so your best bet is to first improve your command, or split it into smaller commands, to use the resources as effectively as possible.
I'm not an expert in FFmpeg, but I think this information will help someone who has a similar question.
Any pointers on how to detect, through a script on Linux, that an MP3 radio stream is breaking up? I am having issues with my radio station when the internet connection slows down, which causes the stream on the client side to stop, buffer, and then play.
There are a few ways to do this.
Method 1: Assume constant bitrate
If you know that you will have a constant bitrate, you can measure that bitrate over time on the server and determine when it drops below a threshold. Note that this isn't the most accurate method and won't always work, since not all streams use a constant bitrate. But this method is as easy as counting the bytes received over the wire.
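A rough sketch, assuming the stream URL is reachable from the monitoring host and that pv is installed (both assumptions on my part):
curl -s http://radio.example.com/stream.mp3 | pv -f -i 5 -r > /dev/null
If the rate pv reports stays well below the stream's nominal bitrate for a sustained period, the feed is falling behind.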
Method 2: Playback on server
You can run a headless player on the server (via cvlc or similar) and track when it has buffer underruns. This will work at any bitrate and will give you a decent idea of what's happening on the clients. This sort of player setup also enables utility functions like silence detection. The downside is that it takes a little bit of CPU to decode, and a bit more effort to automate.
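A lighter-weight variant of the same idea (a sketch using ffmpeg rather than VLC) is to decode the stream to a null output and watch the speed ffmpeg reports; for a live stream, a speed that stays below 1x means data is not arriving in real time:
ffmpeg -i http://radio.example.com/stream.mp3 -f null -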
Method 3 (preferred): Log output buffer on source
Your source encoder will have a buffer on its output, data waiting to be sent to the server. When this buffer grows over a particular threshold, log it. This means that output over the network stalled for whatever reason. This method gets the appropriate data right from the source, and ensures you don't have to worry about clock synchronization issues that can occur over time in your monitoring of audio streams. (44.1 kHz to your encoder might be 44.101 kHz to a player.) This method might require modifying your source client.
I'm using a Raspberry Pi 2 to route Wi-Fi to Ethernet connections. So on the Ethernet side I have a computer that connects to the internet through the Pi's Wi-Fi connection. On the Raspberry Pi I started htop to monitor the CPU load, then on the computer I started Chrome and played a 20-minute 1080p video. The CPU load didn't seem to go beyond 5%. After that I closed the YouTube tab and started a download of a 5GB binary file from the first row here (https://testdebit.info/). Well, I noticed that the CPU load was much higher, around 10%!
Any explanation of such a difference?
It has to do with compression and how video is encoded. A normal file can be compressed, but nothing like a video stream can.
A video stream can achieve very high compression ratios due to the predictable characteristics of video, e.g. the picture doesn't change much from one frame to the next. As such, the encoder sends a whole frame (an I-frame) and then updates it with just the changes (P-frames). It's even possible to do bidirectional prediction (B-frames). Here's a Wikipedia reference.
Yes, I hear your next unspoken question: doesn't more compression mean more CPU time to decompress? That's true for a lot of types of compression, such as that used by zip files. But since raw video is not very information-dense over time, you have compression techniques that in essence reduce the amount of data you send with very little CPU usage.
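If you want to see this frame structure for yourself, ffprobe (part of FFmpeg) can print the picture type of every frame; the file name here is just an example:
ffprobe -v error -select_streams v:0 -show_entries frame=pict_type -of csv sample.mp4
Each output line will read frame,I, frame,P or frame,B.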
I hope this helps.
I want to store a large number of packets in a pcap file (say around 200,000) and then send them using tcpreplay. The problem is that the loop option in tcpreplay sends at a very low speed.
Right now I am capturing packets using Wireshark, but Wireshark stops responding after a lot of packets have been sent. How can I increase the length of the pcap file by multiplying the number of packets already stored in it? How can I achieve good throughput with tcpreplay?
If you'd like to multiply a single pcap, consider the mergecap command, shipped with Wireshark.
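For example, to concatenate a capture with itself a few times (file names are placeholders; -a appends the files instead of merging them by timestamp):
mergecap -a -w bigger.pcap trace.pcap trace.pcap trace.pcap trace.pcap
Run it again on the output to keep multiplying the packet count.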
Regarding the packet pumping speed of tcpreplay, take a look at its FAQ, and in particular consider the -T option to pick a timer mechanism that works well; I've found rdtsc to work very well. Also consider using a short trace that fits into memory, and iterating playback of that, to avoid disk I/O. For this, consider the -K option.
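For example (the interface, loop count and timer name are illustrative, and the available timer names vary by tcpreplay version):
tcpreplay --intf1=eth0 -K --loop=1000 -T rdtsc trace.pcap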