open media encoding standards - video-encoding

I have to choose methods for digitizing and encoding media for a project which collects media from a variety of sources.
What are the open media encoding standards I should consider?
What could be the factors which affect my decision on the selection of one?

There are a lot of things to consider, for example: platform, speed of encoding, what requirements your clients will meet (in terms of codecs), quality/size trade-off. Take a look at xvid. It's a free MPEG-4 video codec for PC.

Related

bitrate quality level for specific device

I'm looking mediaconvert service from aws to transcode videos. The value I'm trying to set just now is quality level (QL) for QVBR, according with this it could depends on the platform, for example for 720p/1080p resolution it proposes QL=8/9 (for TV), QL=7 (for tablet), QL=6 (for smartphone).
In fact, the app have a version for the 3 type of devices then I'm asking: I need to keep 3 versions for the same video? I want to save some money in streaming and my app has similar number of users using it in each platform, I want to save in bandwidth but providing good quality videos
Higher QVBR quality levels (QL) correspond to higher bitrates in the output.
For a large display such as a TV, a higher QVBR QL is recommended to help improve the viewer experience. But when viewing the same content on a smaller display such as a phone you may not need all of those extra bits to still have a good experience.
In general, it's recommended to create an output targeted for each of the various devices or resolutions content will be viewed on. This will help save bandwidth for the smaller devices while still delivering high quality for the larger ones.
This concept is referred to as Adaptive Bitrate (ABR) Streaming, and is a common feature of streaming formats such as HLS and DASH (among others). The MediaConvert documentation has a section on how to create ABR outputs as well: https://docs.aws.amazon.com/mediaconvert/latest/ug/video-abr-streaming-outputs.html

Analyze voice patterns IOS

I am looking for a way / library to analyze voice patterns. Say, there are 6 people in the room. I want to identify each one by voice.
Any hints are much appreciated.
Dmitry
The task of taking a long contiguous audio recording and splitting it up in chunks in which only one speaker is speaking - without any prior knowledge about the voice characteristics of each speaker - is called "Speaker diarization". You can find links to research code on the wikipedia page.
If you have prior recordings of each voice, and would rather do classification, this is a slightly different problem (Speaker recognition or Speaker identification). Software tools for that are available here (note that general purposes speech recognition packages like Sphinx or HTK are flexible enough to be coaxed into doing that).
Answered here https://dsp.stackexchange.com/questions/3119/library-to-differentiate-people-by-their-voice-timbre

Load Captures in OpenCV

Which video formats can we use in OpenCV? Can anything in addition to AVI and load from camera be used?
If these are the only supported formats, is a video converter required to use other video formats.
I'm not sure how up-to-date it is, but this OpenCV wiki page gives a good overview of what codecs are supported. If looks like AVI is the only format with decent cross-platform support. Your options are either to do the conversion using an external converter (like you suggest) or write code that uses a video library to load the image and create the appropriate cv::Mat or IplImage * header for the data.
Unless you're processing huge quantities of video I suggest taking the path of least resistance and just converting the videos to AVI (see the above link for the details of what OpenCV supports). Just be careful to avoid lossy compression: it will wreck havoc with a lot of image processing algoritms.
OpenCV "farms out" video encoding and decoding to other libraries (e.g., ffmpeg and VFW). Also, have a look at the highgui source directory to see all of the VideoCapture wrappers available (specifically pay attention to the cap_* implementations). AVI is merely a container, and really isn't that critical to what video codecs that OpenCV can read. AVI can contain several different combinations of video, audio, and even subtitle streams. See my other answer about this. Here is also a quick article explaining the differences between containers and codecs.
So, if you're on Linux make sure ffmpeg supports decoding the video codec you are interested in processing. You can check what codecs your version of ffmpeg supports with the following command:
ffmpeg -formats
On Windows, you'll want to make sure you have plenty of codecs available to decode various types of video like the K-Lite Codec Pack.

Can I use ffmpeg to create multi-bitrate (MBR) MPEG-4 videos?

I am currently in a webcam streaming server project that requires the function of dynamically adjusting the stream's bitrate according to the client's settings (screen sizes, processing power...) or the network bandwidth. The encoder is ffmpeg, since it's free and open sourced, and the codec is MPEG-4 part 2. We use live555 for the server part.
How can I encode MBR MPEG-4 videos using ffmpeg to achieve this?
The multi-bitrate video you are describing is called "Scalable Video Codec". See this wiki link for basic understanding.
Basically, in a scalable video codec, a base layer stream itself has completely decodable; however, additional information is represented in the form of (one or many) enhancement streams. There are couple of techniques to be able to do this including lower/higher resolution, framerate and change in Quantization. The following papers explains in details
of Scalable Video coding for MEPG4 and H.264 respectively. Here is another good paper that explains what you intend to do.
Unfortunately, this is broadly a research topic and till date no open source (ffmpeg and xvid) doesn't support such multi layer encoding. I guess even commercial encoders don't support this as well. This is significantly complex. Probably you can check out if Reference encoder for H.264 supports it.
The alternative (but CPU expensive) way could be transcode in real-time while transmitting the packets. In this case, you should start off with reasonably good quality to start with. If you are using FFMPEG as API, it should not be a problem. Generally multiple resolution could still be a messy but you can keep changing target encoding rate.

Writing an image processing application for analysis of satellite imagery

I have to start work on application for analysis of satellite imagery to identify some man made structure. I would like to use C or Java for this.
For satellite I am planning to use Google Maps data.
I have three questions here:
What is best source for GIS data besides Google Maps/earth.
Best language to write such an application considering i will have to use third-party APIs
Is there a open image processing engine available which identifies man made structures?
Thats a lot of questions but I hope the smarter guys here can help me here.
Overly processed imagery such as Google or Bing maps is a horrible source of imagery for performing feature extraction or feature recognition. Usually, you want the most unprocessed, raw form possible with camera models... of course, if you don't have access to this sort of data, then you have to work with what you have.
A more important consideration of Google Maps/Earth imagery is that you may run afoul of their License Agreement. I suggest you check it before you decide on their data as your imagery source. In particular, if you bypass their API's, you've violated their license agreement.
As far as libraries and langauges, there are dozens of machine vision libraries available. I can't recommend one over the other as I've only been a down-stream consumer of their results. My understanding of the problem is that the biggest concern is how you build the "models" to compare against... i.e. how do you give the system an "example" of what you're looking for.
Once you've found a library, then you can make a decision on the language. Generally, a high-level language like Python or Matlab is used for this kind of prototyping. Once a method has been found, then conversion to a "higher performance" language is done--if necessary.
Personally, I'd probably use Python because (1) it's freely available, (2) has a significant community in the scientific and research worlds, and (3) can interop with a wide variety of languages and platforms.
Specifically, check out Glovis: http://glovis.usgs.gov/
You can browse the earth, and download maps from several different satellites and sensors. Even though you have to go through a bogus "ordering" process, the imagery is free.
You may find the USGS (United States Geological Survey) website helpful. They provide both GIS information and a wide range of data sets.
I agree with James Schek. Google gives you RGB images - not the most helpful fot your task. Most imagery will have a couple of additional channels that may be better suited for you. Different channels show different features, water, urban areas, types of foliage etc. For example an infra-red channel could be used to pick out buildings in a cool climate. If you contact several data provider they may be able to recommend the best channels to use in their data.
Ariel imagery can be huge, numerous terrabytes for a detailed world database. Carefully consider how much information you need to process. If you are only doing a few square miles performance is not an issue. If you are processing thousands of square miles, performance becomes an issue. Processing millions, performance is mission critical and must be considered from day one.
Knowing the number of channels you need to process, your performance requirements and the file format of your data, look around for libraries that fulfil all your requirements. Many of them are written in C/C++ so using a language that interops with them both could be helpful
Take a look at this demo:
Finding Vegetation in a Multispectral Image
, part of the Image Processing Toolbox in MATLAB. It is related to your problem of analysing satellite images to find specific patterns.
I believe it's an excellent example of the sort of things you can achieve easily with MATLAB using very little code.

Resources