How to create a multimodal dataset? - machine-learning

I want to build a music recommendation system based on a multimodal dataset. I've been looking for a publicly available dataset but couldn't find one anywhere. Where can I get a multimodal dataset for music or books?
I've watched many YouTube videos but didn't find any that teach how to create a multimodal dataset.
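For what it's worth, here is a minimal sketch of how one might assemble such a dataset by hand, pairing an audio modality (MFCC features via librosa) with a text modality (title, genre, lyrics) in one table. The file paths and metadata below are placeholders, not a real dataset:

```python
# Sketch: pair audio features with text metadata into one table.
# Paths and metadata below are placeholders, not a real dataset.
import librosa
import pandas as pd

tracks = [
    {"audio_path": "audio/track01.mp3", "title": "Song One", "genre": "jazz", "lyrics": "..."},
    {"audio_path": "audio/track02.mp3", "title": "Song Two", "genre": "rock", "lyrics": "..."},
]

rows = []
for t in tracks:
    y, sr = librosa.load(t["audio_path"], duration=30)                # first 30 s of audio
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)   # audio modality
    row = {f"mfcc_{i}": v for i, v in enumerate(mfcc)}
    row.update({"title": t["title"], "genre": t["genre"], "lyrics": t["lyrics"]})  # text modality
    rows.append(row)

pd.DataFrame(rows).to_csv("multimodal_music_dataset.csv", index=False)
```

The same pattern extends to other modalities (e.g. adding a column of album-cover image paths), which is usually how "multimodal" datasets are organised when no ready-made one fits.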

Related

I have prepared a dataset that consists of video links from YouTube, and I have assigned features to these videos

I have prepared a dataset whose first column contains links to YouTube videos. The other columns contain features of these videos, which I have one-hot encoded; the column names are things like angry, sports, entertainment, etc.
I want to create an AI tool that combines these videos to generate a new video based on keywords entered by the user, where the input keywords should textually match the features/columns of the dataset.
I want to achieve this without downloading the videos locally.
Can somebody provide an overview of how to achieve this, which models to use, and whether the dataset needs any changes?
Should I apply word embeddings to the features? A rough sketch of that keyword-to-feature matching step is below.
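One hedged sketch of just the keyword-to-column matching part, assuming the sentence-transformers library and an illustrative list of feature columns: embed both the column names and the user's keywords, then pick the closest columns by cosine similarity.

```python
# Sketch: match user keywords to one-hot feature columns via embeddings.
# Column names below are illustrative; swap in your actual dataset columns.
from sentence_transformers import SentenceTransformer, util

feature_columns = ["angry", "sports", "entertainment", "music", "news"]
user_keywords = "football highlights"

model = SentenceTransformer("all-MiniLM-L6-v2")
col_emb = model.encode(feature_columns, convert_to_tensor=True)
query_emb = model.encode(user_keywords, convert_to_tensor=True)

scores = util.cos_sim(query_emb, col_emb)[0]          # similarity to each column
best = sorted(zip(feature_columns, scores.tolist()),
              key=lambda x: x[1], reverse=True)
print(best[:3])  # top matching columns; then filter rows where these columns == 1
```

The point of embeddings here is that they match synonyms ("football" vs. "sports") that exact string matching on one-hot column names would miss. Combining the selected videos into a new video without downloading them is a separate problem and is not covered by this sketch.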

Need dataset references for multimodal personality trait analysis

I've started a project titled "Multi-Modal (Image, Audio, Text) Analysis of Personality Traits". I need references to datasets that have image, audio, and speech annotations of videos and are also labelled with respect to the Big Five personality traits. Thank you in advance!
I tried searching many research papers, but most of them did not have open code or their data did not meet my requirements. I also tried to access the ChaLearn First Impressions dataset but couldn't get access to it.

Legality of data usage

I'm working on a project for my university where our current plan is to use the YouTube API and do some data analysis. We have some ideas, but having looked at the Terms of Service and the Developer Policies, we're not entirely sure about a few things.
Our project does not focus on anything like monetary gain or predicting a video's estimated income, nor on trying to determine user data such as passwords or usernames. It's much more about the content and statistics of the videos than anything else.
Our current ideas that we want to be sure would be ok to do and use:
Determine the category of a video given its title
Determine the category of a video given its tags
Determine the category of a video given its description
Determine the category of a video given its thumbnail
Some combination of the above to create an ensemble model
Clustering videos by category/view count
Sentiment analysis on comments
Trending topics over time
This is just a vague list for now, but I would love to reach out and figure out what we're allowed to use the data for.
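For context on the kind of metadata these ideas rely on (not legal guidance), fetching a video's title, tags, description, thumbnail URL, category, and view count with the YouTube Data API v3 via google-api-python-client looks roughly like this; the API key and video ID are placeholders, and quota and ToS limits still apply:

```python
# Sketch: pull the metadata fields mentioned above for one video.
# API_KEY and VIDEO_ID are placeholders.
from googleapiclient.discovery import build

API_KEY = "YOUR_API_KEY"
VIDEO_ID = "SOME_VIDEO_ID"

youtube = build("youtube", "v3", developerKey=API_KEY)
response = youtube.videos().list(part="snippet,statistics", id=VIDEO_ID).execute()

item = response["items"][0]
snippet, stats = item["snippet"], item["statistics"]
print(snippet["title"])
print(snippet.get("tags", []))
print(snippet["description"][:200])
print(snippet["thumbnails"]["high"]["url"])
print(snippet["categoryId"])
print(stats.get("viewCount"))
```

Whether each downstream use of those fields is permitted is exactly the ToS question being asked; the sketch only shows what the API exposes.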

OpenCV: best way to handle a game screenshot

I want to make an application for counting game statistics automatically. For that purpose, I need some sort of computer vision for handling screenshots of the game.
There are a bunch of regions with different skills, always in the same place, that the app needs to recognize. I assume it should have a database of pictures or maybe some trained samples.
I've started to learn the OpenCV library, but I'm not sure what will work best for this purpose.
Would you please give me some hints or algorithms that I could use?
Here is an example game screenshot.
You can convert it to grayscale and then use a Haar cascade classifier to read the words in the image, then save the results to a file format such as CSV. This way you can use your game screenshots to gather data for training your models.
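Since the skill regions are always in the same place, a different (hedged) option from the cascade approach above is plain template matching in OpenCV. This sketch assumes you have cropped a small template image for each skill icon from a reference screenshot; the file names and threshold are placeholders:

```python
# Sketch: find known skill icons in a fixed-layout screenshot via template matching.
# File names are placeholders; templates should be cropped from reference screenshots.
import cv2

screenshot = cv2.imread("screenshot.png", cv2.IMREAD_GRAYSCALE)
templates = {"fireball": "templates/fireball.png",
             "heal": "templates/heal.png"}

detected = {}
for name, path in templates.items():
    tmpl = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    result = cv2.matchTemplate(screenshot, tmpl, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    detected[name] = (max_val > 0.8, max_loc)   # 0.8 threshold is a tunable guess

print(detected)  # e.g. {"fireball": (True, (x, y)), ...}
```

Template matching works well precisely because the layout is fixed; for reading numeric stats out of the regions you would still need some OCR step on top.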

Transcript dataset for natural language processing

I've been searching the web and found that media outlets such as CNN and NPR provide links to their transcripts, but obtaining them requires writing something like a crawler, which is not very convenient. I'm trying to use transcripts of TV shows, interviews, radio, and movies as training data for my natural language processing projects. So I'm wondering whether there is any collection or database freely available on the web that I can download all at once without writing a crawler myself?
I would recommend the British National Corpus. I would also mention the American National Corpus, but the transcripts there are only of phone calls or face-to-face conversations - no news, TV shows, etc.
You also mentioned CNN and NPR. There are transcripts from 1996 available as an LDC corpus here.
