I'm working on a data mining project and need to download YouTube datasets to work with. I want at least 300 entries.
I've tried googling, but I couldn't find any datasets that include keyword fields.
Does anyone know where to download such datasets?
I have this timeline from a newspaper produced by my Native American tribe. I was trying to use AWS Textract to produce some kind of table from this. AWS Textract does not recognize any tables in this. So I don't think that will work (perhaps more can happen there if I pay, but it doesn't say so).
Ultimately, I am trying to sift through all the archived newspapers and download the timelines for all of our election cycles (both "general" and "special advisory") to find the number of days between each item in a timeline.
Since this is all in the public domain, I see no reason I can't paste a picture of the table here. I will include the download URL for the document as well.
Download URL: Download
I started off by using Foxit Reader on individual documents to find the timelines on Windows.
Then I used a tool, 'ocrmypdf', on Ubuntu to ensure all these documents are searchable (ocrmypdf --skip-text Notice_of_Special_Election_2023.pdf.pdf ./output/Notice_of_Special_Election_2023.pdf).
Then I happened to see an ad for AWS Textract this morning in my Google news feed and saw how powerful it is. But when I tried it, it didn't actually find these human-readable timelines.
I'm wondering whether any ML tools, or any other solutions, exist for this type of problem.
I am mainly trying to keep my tech skills up to par. I was sick for the last two years, and this is a fun problem to tackle that I think is pretty fringe.
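Once the OCR'd text is available, the date arithmetic itself is the easy part. Here is a minimal sketch, assuming each timeline line contains one date written like "March 11, 2023". The sample lines, the regex, and the function names are made up for illustration and are not taken from the newspaper:

```python
import re
from datetime import datetime

# Assumption: dates look like "March 11, 2023" (full English month name).
DATE_RE = re.compile(r"([A-Z][a-z]+ \d{1,2}, \d{4})")

# Hypothetical timeline text, standing in for text extracted from a PDF.
SAMPLE = """Notice of election posted: January 9, 2023
Candidate filing deadline: February 8, 2023
Election day: March 11, 2023"""


def parse_timeline(text):
    """Return a date-sorted list of (date, label) for lines that contain a date."""
    items = []
    for line in text.splitlines():
        match = DATE_RE.search(line)
        if not match:
            continue
        try:
            when = datetime.strptime(match.group(1), "%B %d, %Y").date()
        except ValueError:
            continue  # matched the pattern but is not a real date
        # The label is whatever is left of the line once the date is removed.
        label = (line[:match.start()] + line[match.end():]).strip(" -:\t")
        items.append((when, label))
    return sorted(items)


def gaps_in_days(items):
    """Return (label, days since the previous item) for each item after the first."""
    return [(b[1], (b[0] - a[0]).days) for a, b in zip(items, items[1:])]


for label, days in gaps_in_days(parse_timeline(SAMPLE)):
    print(f"{days:>4} days before: {label}")
```

The hard part remains getting clean text out of the scanned page. This only shows what to do with it afterwards, and the regex would need adjusting to whatever date format the timelines actually use.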
I'm still getting into Google Spreadsheets. I recently understood how to format a .txt file so that =ImportData works properly, thanks to Tanaike's assistance, and I'm now tackling a slightly more challenging task.
Goal:
Automatically extracting specific data from .pdf files hosted inside a Google Drive folder and arranging the information into specific cells
Challenges:
Being able to decode the blobs of information, as the raw data obtained with =ImportData is useless on its own
Truly learning how to use google-apps-script for something useful (that's on my own)
Running the extraction once, rather than keeping a constantly refreshing import as with =ImportData
[Second Priority] Stop depending on an add-on (Drive Direct Links) to get the URL of the files
To my understanding, I'll need to do some parsing. I know .pdf is not always straightforward, but all the files will come from the same place and have exactly the same format, so understanding how to do it once should be enough.
I already know how to get the real/permanent link to the files automatically and how to arrange information segregated into cells using =Index, =Extract and others.
Hope I'm being clear enough. Thanks a lot in advance.
Best regards,
Lucas.-
I am curious to know what would be the most efficient way to walk the YouTube website. My goal is to eventually index all videos on YouTube (hypothetically), and the only way I can think of is to go channel by channel, indexing all of the videos. I am not very familiar with the v3 API, so if there is a better way to accomplish this, please let me know. This gives rise to a few problems I can think of:
Where to begin? Channels and videos are accessed using random string IDs, so if I simply start with IDs beginning with 'A', I am going to run into a lot of null values. I'm not sure how IDs are assigned, but if assignment depends on the ID alphanumerics, this may also confine the indexing to a certain segment of video types.
I am hoping to move methodically through the YouTube directory, trying to avoid accidentally indexing the same channel or video twice.
Should I somehow separate the videos into groups and request them based on other parameters? A grouped scheme may be easier to work with, update, etc.
I won't know if the video has anything I am interested in indexing before accessing it.
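For the "avoid indexing the same video twice" problem, one common pattern is to page through a channel's uploads playlist with the v3 playlistItems endpoint and keep a set of IDs already seen. Below is a rough sketch, not a tested crawler: the helper names are made up, the API key and playlist ID are placeholders you would supply, and the fetch function is passed in so the loop can be tried without network access.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://www.googleapis.com/youtube/v3/playlistItems"


def fetch_page(playlist_id, api_key, page_token=None):
    """Fetch one page (up to 50 items) of a playlist from the v3 API."""
    params = {
        "part": "contentDetails",
        "playlistId": playlist_id,
        "maxResults": 50,
        "key": api_key,
    }
    if page_token:
        params["pageToken"] = page_token
    url = API_URL + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def walk_playlist(playlist_id, api_key, seen, fetch=fetch_page):
    """Follow nextPageToken to the end; return IDs not already in `seen`.

    `seen` is a set shared across playlists and is updated in place, so a
    video reached through two routes is only reported once.
    """
    new_ids = []
    token = None
    while True:
        page = fetch(playlist_id, api_key, token)
        for item in page.get("items", []):
            video_id = item["contentDetails"]["videoId"]
            if video_id not in seen:
                seen.add(video_id)
                new_ids.append(video_id)
        token = page.get("nextPageToken")
        if not token:
            return new_ids
```

In a real run the `seen` set would have to live in a database rather than in memory, and API quota limits how many pages you can request per day.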
First you need to understand that there are way too many videos for you to do this without having access to the stack directly, which you do not have and will not get.
To automate the selection of videos, you can try enumerating the video IDs.
They are 11 characters long and consist only of a-z, A-Z, 0-9, "_" and "-". That narrows what you have to scan, although it is still 64 to the power of 11 candidates. For each candidate, check whether a video exists; if it does, save that ID (with related info) and move on.
Not a perfect option, but best I can see with your options and requirements.
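To put a number on how large that ID space is, here is a small back-of-the-envelope sketch. It assumes the ID alphabet is the 64 URL-safe characters (letters, digits, "-" and "_"), and the checking rate is an arbitrary figure picked for illustration:

```python
import string

# Assumption: 11-character IDs drawn from A-Z, a-z, 0-9, "-" and "_".
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "-_"
ID_LENGTH = 11


def search_space(alphabet_size=len(ALPHABET), length=ID_LENGTH):
    """Number of distinct strings of the given length over the alphabet."""
    return alphabet_size ** length


def years_to_scan(checks_per_second):
    """Years needed to test every candidate ID at a constant rate."""
    seconds = search_space() / checks_per_second
    return seconds / (365 * 24 * 3600)


print(f"candidate IDs: {search_space():.3e}")
print(f"years at 1,000,000 checks/s: {years_to_scan(1_000_000):,.0f}")
```

Even at a million checks per second, the scan would take on the order of millions of years, which is why brute-force enumeration is not practical.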
After struggling for a couple of days to get Ytd to work, I'm about to dive into YouTube Direct Lite, which looks much friendlier to set up.
My first question is about the playlist size limit. Once a playlist is full (200 videos?) what would happen with further video submissions? Would the oldest be dropped or is it just impossible to add any more, effectively breaking the widget for that playlist?
I expect I would need to use multiple playlists, manually making new playlists and widgets if there are a lot of videos, but is there a best-practice way to do this for a large number of videos?
Also, would it be possible to automate the submission approvals programmatically if there are a lot of videos, or is this beyond the scope of ytd-lite?
I thought it better to ask these questions now, before starting the process of setting this up for my site. Ytd-lite looks like a great project.
Thanks.
From the docs:
https://developers.google.com/youtube/2.0/developers_guide_protocol_playlists#Adding_a_playlist
Note: Playlists contain a maximum of 200 videos. As such, you will not be able to add a video to a playlist that already contains that many videos.
I haven't tried to force this situation, but I would expect an error.
I believe that to automate the submission approvals programmatically you can modify the source code of YouTube Direct Lite, and with a little logic on the server side of your app you can do what you want.
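If you do end up spreading submissions over several playlists, the bookkeeping can be as simple as slicing the list of video IDs. This is only an illustration of that idea, taking the 200-video limit quoted above at face value. The function name is made up, and creating the playlists themselves through the API is not shown:

```python
MAX_PER_PLAYLIST = 200  # limit quoted from the docs above


def split_into_playlists(video_ids, limit=MAX_PER_PLAYLIST):
    """Split a list of video IDs into consecutive groups of at most `limit`."""
    return [video_ids[i:i + limit] for i in range(0, len(video_ids), limit)]


# Hypothetical submissions: 450 placeholder IDs.
submissions = [f"video{n}" for n in range(450)]
groups = split_into_playlists(submissions)
print([len(g) for g in groups])
```

Each group would then map to one playlist and one widget.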
When uploading a file, I know I can access its properties, but are they always the same, or do they vary? I am writing an app for myself where I can upload songs or videos to my server to watch later, and I'd like to populate the info about those files automatically as much as possible. Is it possible to get things like length, quality, name, artists, and artwork, or to pick a first image like YouTube does for its videos?
I'm fairly new to Ruby (using Rails), so I am unsure where to find this or whether it's even possible.
You can do that using FFmpeg (read the license first).
FFmpeg gives you everything you were asking about and some more.
It's very powerful.
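To give an idea of what that looks like in practice, here is a rough sketch built around ffprobe, the inspection tool that ships with FFmpeg. It is written in Python only for illustration; from Rails you would shell out to the same commands. The helper names and the sample data are made up, and which tags are present depends entirely on the uploaded file:

```python
import json
import subprocess


def probe(path):
    """Run ffprobe on a file and return its JSON report as a dict."""
    cmd = ["ffprobe", "-v", "quiet", "-print_format", "json",
           "-show_format", "-show_streams", path]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return json.loads(out)


def summarize(info):
    """Pick length, title, artist and resolution out of an ffprobe report."""
    fmt = info.get("format", {})
    # Tag names vary in case between containers, so normalise them.
    tags = {k.lower(): v for k, v in fmt.get("tags", {}).items()}
    video = next((s for s in info.get("streams", [])
                  if s.get("codec_type") == "video"), None)
    return {
        "duration_seconds": float(fmt["duration"]) if "duration" in fmt else None,
        "title": tags.get("title"),
        "artist": tags.get("artist"),
        "resolution": (video["width"], video["height"])
                      if video and "width" in video else None,
    }


def thumbnail_command(src, dest, at_seconds=1):
    """Build an ffmpeg command that saves one frame of `src` as an image."""
    return ["ffmpeg", "-ss", str(at_seconds), "-i", src, "-frames:v", "1", dest]


# Hypothetical report, shaped like ffprobe output, so this runs without FFmpeg.
SAMPLE = {
    "format": {"duration": "213.5", "tags": {"TITLE": "Demo", "artist": "Someone"}},
    "streams": [{"codec_type": "audio"},
                {"codec_type": "video", "width": 1280, "height": 720}],
}
print(summarize(SAMPLE))
```

Any of these fields can be missing, so the app should treat them all as optional and let you fill in the gaps by hand.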
For MP3, check out mp3-info; I haven't used it before, but it looks promising.