How YouTube Content ID performs video fingerprinting to identify copied videos

I am trying to build a system just like YouTube Content ID: it should generate fingerprints of videos and search for those fingerprints in a database. I want to know what fingerprinting algorithm or method YouTube Content ID uses to generate and compare fingerprints, and how it performs the fingerprint search in its database.

I don't think the exact algorithm is publicly known. You could use scene detection and chunking to create bounded-size chunks of video and audio, then use locality-sensitive hashing (LSH) to index those chunks so that similar chunks receive identical hashes. However, this is not straightforward and is a subject of active research.
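To make the idea concrete, here is a minimal sketch of chunk-level fingerprinting with random-hyperplane (SimHash-style) LSH. Everything in it is illustrative: the feature dimension, hash width, and chunk features are placeholders, and a production system would use robust audio/visual descriptors and several independent hash tables.

```python
# Illustrative chunk fingerprinting with random-hyperplane LSH.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
DIM = 64      # length of a chunk's feature vector (placeholder)
N_BITS = 16   # hash width: more bits -> fewer collisions, lower recall

# Random hyperplanes; the indexer and the querier must share these.
planes = rng.normal(size=(N_BITS, DIM))

def lsh_hash(vec: np.ndarray) -> int:
    """One bit per hyperplane: the sign of the projection.
    Vectors at a small angle tend to receive identical hashes."""
    bits = (planes @ vec) > 0
    return sum(1 << i for i, b in enumerate(bits) if b)

# Inverted index: hash bucket -> list of (video_id, chunk_index).
index = defaultdict(list)

def add_chunk(video_id: str, chunk_index: int, feature: np.ndarray):
    index[lsh_hash(feature)].append((video_id, chunk_index))

def query_chunk(feature: np.ndarray):
    return index.get(lsh_hash(feature), [])

# Demo: a slightly distorted copy of an indexed chunk usually lands
# in the same bucket and is retrieved in O(1).
original = rng.normal(size=DIM)
add_chunk("video_A", 0, original)
distorted_copy = original + rng.normal(scale=0.01, size=DIM)
print(query_chunk(distorted_copy))   # typically [('video_A', 0)]
```

With a single 16-bit table, a lightly distorted copy usually collides with the original; real systems trade precision for recall by querying many such tables and verifying candidates with a fine-grained match.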

Related

I have prepared a dataset consisting of video links from YouTube, and I have assigned features to these videos

So I have prepared this dataset whose first column contains links to YouTube videos. The other columns contain features of these videos, which I have one-hot encoded; the column names are things like angry, sports, entertainment, etc.
I want to create an AI tool that combines these videos to generate a new video based on keywords supplied by the user, where the input keywords should textually match the features/columns of the dataset.
I want to achieve this without downloading the videos locally.
Can somebody provide an overview of how to achieve this, and suggest which models to use or any changes to make to the dataset?
Should I apply word embeddings to the features?
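Not a full answer, but a sketch of one possible direction for the keyword-matching part: embed the feature/column names and the user's keywords, then rank videos by how relevant their active one-hot columns are to the keywords. Assumptions: the DataFrame has a column named link (a hypothetical name) holding the URLs, every other column is a one-hot feature, and sentence-transformers provides the embeddings; nothing is downloaded.

```python
# Sketch: match user keywords to one-hot feature columns via embeddings.
# The "link" column name is a placeholder for however the dataset
# stores the YouTube URLs.
import numpy as np
import pandas as pd
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def select_videos(df: pd.DataFrame, keywords: list[str], top_k: int = 5):
    feature_cols = [c for c in df.columns if c != "link"]
    col_vecs = model.encode(feature_cols, normalize_embeddings=True)
    kw_vecs = model.encode(keywords, normalize_embeddings=True)
    # Relevance of each feature column = similarity to its closest keyword.
    col_scores = (col_vecs @ kw_vecs.T).max(axis=1)
    # Score each video by the summed relevance of its active (1) features.
    video_scores = df[feature_cols].to_numpy(dtype=float) @ col_scores
    return df.iloc[np.argsort(video_scores)[::-1][:top_k]]["link"]

# e.g. select_videos(df, ["sports", "exciting"]) -> top 5 matching links
```

The selected links could then be fed to whatever composition step stitches the clips together.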

Legality of data usage

So I'm working on a project for my university where our current plan is to use the YouTube API and do some data analysis. We have some ideas, and we're looking at the Terms of Service and the Developer Policies, but we're not entirely sure about a few things.
Our project does not focus on monetary gain, predicting a video's estimated income, or anything of that nature, nor on trying to determine user data such as passwords or usernames. It's much more about the content and statistics of the videos than anything else.
These are our current ideas that we want to be sure would be OK to build and use:
Determine the category of a video given its title
Determine the category of a video given its tags
Determine the category of a video given its description
Determine the category of a video given its thumbnail
Some combination of above to create an ensemble model
Clustering videos by category/view counts
Sentiment analysis on comments
Trending topics over time
This is just a rough list for now, but I would love to reach out and figure out exactly what we're allowed to use the data for.

NVIDIA DALI video decode from an external_source buffer (instead of a file)

This article explains how to do image decoding and preprocessing on the server side with DALI while using triton-inference-server.
I am trying to find something similar for video decoding from an H.264-encoded byte array on the server side, before the "NTHWC" input array is passed to a video recognition model (such as those in mmaction2 or swin-transformer) via an ensemble model.
All I can find is how to load videos from files, but nothing on loading videos from external_source.
As a workaround, I guess I could use the Python backend to write the encoded video bytes to a file and then preprocess the video, but that would not inherently support batching; I would have to process each batch sequentially or spin up multiprocessing pools per batch, which seems highly suboptimal.
Any help is highly appreciated.
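For what it's worth, a sketch of what the external_source route might look like. This assumes a recent DALI release that ships the experimental in-memory video decoder (fn.experimental.decoders.video); since the operator is experimental, its availability and its codec/container support should be verified against your DALI version, and the file read below is only a placeholder for the encoded bytes the server receives.

```python
# Sketch: decode in-memory encoded video inside a DALI pipeline.
# fn.experimental.decoders.video is assumed to exist in your DALI build.
import numpy as np
from nvidia.dali import pipeline_def, fn, types

@pipeline_def(batch_size=4, num_threads=2, device_id=0)
def video_pipe():
    # Each sample is a 1-D uint8 array holding one full encoded stream.
    encoded = fn.external_source(name="encoded", dtype=types.UINT8, ndim=1)
    # GPU-accelerated decode; returns a per-sample frame sequence (FHWC).
    frames = fn.experimental.decoders.video(encoded, device="mixed")
    # Resize/normalize to the model's expected layout here as needed.
    return frames

pipe = video_pipe()
pipe.build()

with open("clip.mp4", "rb") as f:                 # placeholder source
    buf = np.frombuffer(f.read(), dtype=np.uint8)
pipe.feed_input("encoded", [buf] * 4)             # one buffer per sample
(frames,) = pipe.run()
```

In a Triton ensemble, the same pipeline would sit in the DALI backend, with the encoded bytes arriving as the model input instead of via feed_input.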

Machine Learning: In what format should I pass input videos for my machine learning algorithm to analyze?

There are certain machine learning algorithms in use that take video files as input. If I have to pull all the videos from YouTube associated with a certain tag and provide them as input to such an algorithm, what should my input format be?
There is no format in which you can pass a raw video file to a machine learning algorithm, since it won't understand the contents of the file directly.
You need to preprocess the video first; how depends on what you intend to do with it. In general, you can convert each frame of the video to a numeric array (the same as preprocessing an image, e.g., to CSV), which you can pass to your machine learning algorithm. If you want to process the frames sequentially, consider a recurrent neural network. If the video has audio, extract its audio time series and combine each segment of the series with the corresponding video frame.
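A minimal sketch of that preprocessing step, using OpenCV to turn a video file into a (frames, height, width, channels) float array; the filename, target size, and frame cap are placeholders.

```python
# Sketch: decode a video into a numeric array a model can consume
# (per-frame CNN, or an RNN over the frame axis).
import cv2
import numpy as np

def video_to_array(path: str, size=(224, 224), max_frames=64) -> np.ndarray:
    cap = cv2.VideoCapture(path)
    frames = []
    while len(frames) < max_frames:
        ok, frame = cap.read()
        if not ok:                         # end of stream or read error
            break
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        frame = cv2.resize(frame, size)
        frames.append(frame.astype(np.float32) / 255.0)  # scale to [0, 1]
    cap.release()
    return np.stack(frames)               # shape: (n_frames, H, W, 3)

clip = video_to_array("example.mp4")      # placeholder filename
print(clip.shape)
```

For the audio track, a library such as librosa can load the time series, which you can then align with frames by timestamp.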

Can I use ffmpeg to create multi-bitrate (MBR) MPEG-4 videos?

I am currently working on a webcam streaming server project that requires dynamically adjusting the stream's bitrate according to the client's settings (screen size, processing power, ...) or the network bandwidth. The encoder is ffmpeg, since it's free and open source, and the codec is MPEG-4 Part 2. We use live555 for the server part.
How can I encode MBR MPEG-4 videos using ffmpeg to achieve this?
The multi-bitrate video you are describing is called "Scalable Video Coding" (SVC). See this wiki link for a basic understanding.
Basically, in a scalable video codec, the base-layer stream is completely decodable on its own, while additional information is represented in the form of one or more enhancement streams. There are a couple of techniques for achieving this, including scaling resolution, frame rate, and quantization. The following papers explain scalable video coding for MPEG-4 and H.264, respectively, in detail. Here is another good paper that explains what you intend to do.
Unfortunately, this is broadly a research topic, and to date no open-source encoder (ffmpeg, Xvid) supports such multi-layer encoding. I suspect even commercial encoders don't support it; it is significantly complex. You could check whether the reference encoder for H.264 supports it.
An alternative (but CPU-expensive) approach is to transcode in real time while transmitting the packets. In this case, you should start from a reasonably high-quality source. If you are using FFmpeg as an API, this should not be a problem. Handling multiple resolutions can still be messy, but you can keep adjusting the target encoding rate.
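Since true SVC is out of reach with open-source encoders, the usual workaround is to pre-encode several independent renditions and let the server switch among them. Below is a sketch driving the ffmpeg CLI from Python; the bitrate/resolution ladder is an example to adjust for your bandwidth targets.

```python
# Sketch: encode one source into several independent MPEG-4 renditions.
import subprocess

RENDITIONS = [            # (width, height, video bitrate) - example ladder
    (1280, 720, "2500k"),
    (854, 480, "1000k"),
    (640, 360, "500k"),
]

def encode_ladder(src: str):
    for w, h, rate in RENDITIONS:
        subprocess.run([
            "ffmpeg", "-y", "-i", src,
            "-c:v", "mpeg4",              # MPEG-4 Part 2, as in the question
            "-b:v", rate,                 # target video bitrate
            "-vf", f"scale={w}:{h}",      # rendition resolution
            "-c:a", "aac", "-b:a", "96k", # simple audio track
            f"out_{h}p.mp4",
        ], check=True)

encode_ladder("input.mp4")                # placeholder source file
```

The live555 server can then pick the rendition that best fits the client's reported bandwidth.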
