Index Youtube videos for Google Search Appliance - youtube-api

We're successfully using the YouTube API to create the metadata-and-url XML feed that the GSA requires, and we push it to our Google Search Appliance according to the documentation.
Our question: we know you need to put a start URL in the Content Sources > Web Crawl > Start and Block URLs page in the Admin Console. If we put in https://www.youtube.com as a start URL and a follow pattern of https://www.youtube.com/watch?v=* (which looks like the pattern all YouTube videos follow), will the GSA only index what's coming from the feed, or will it go out to youtube.com and index a bunch of content that isn't part of our channel? I don't see anywhere you can specify a channel for a video.
FYI, we are aware of the FishBowlSolutions connector for YouTube, but we are trying to avoid spinning up another server with Tomcat just to index our YouTube videos.

You should not add the YouTube URL to your Start URLs, only to your Follow Patterns. That way, the crawler will not crawl YouTube from top to bottom, but the URLs you provide in the feed will still be crawled. However, if the GSA finds URLs on the crawled pages that match the Follow Patterns, it will crawl those too.
One option is to tighten the Follow Patterns. And of course you can develop a YouTube connector on Google's Adaptor Framework, which is not that hard for Java developers!

Google CSE Search
YouTube User Panel
I haven't used GSA (I'm getting ramped up on it though, which is how I found your post), but the way I've accomplished this using Google's CSE is to index the channel, user or playlist specifically, rather than YouTube in general, i.e.:
youtube dot com/user/alltrapmusic
or: youtube dot com/channel/UC_ahy2GUec7EmbWF3LGxLhQ
or: youtube dot com/playlist?list=PLsHnWFR4n5jBFYdsclaKtdWQtf2Iu8bKZ
So, in CSE, I can configure the engine to search only that user, channel and playlist and return only results found on those three (Google CSE Search link).
I can only assume GSA works the same way (as I mentioned, I have no experience with GSA); if not, my apologies.
~chipleh
P.S. To find your YouTube channel, go to the user link (YouTube User Panel link); there you'll find home, videos, playlists, channels, etc. Hope that helps.

For anyone else looking to use the YouTube API and push their videos to the GSA, we found that a few changes need to be made to the feed.
The feedtype needs to be full in the XML. This tells the GSA that everything it needs to know about the content is in the XML and that it doesn't need to go out and index a URL.
You need to have a <content> node in the XML. We used the description coming from the YouTube API as the value. This is what is displayed to the user in the search results.
The url attribute on the record needs to be a value that can be added to the Start and Block URLs and Follow Patterns in the GSA settings, and it needs to be unique. These URLs don't actually need to exist, but the GSA will use this value to determine whether the record should be included in the index. We used a fake URL with the YouTube video ID appended to make it unique.
The displayurl attribute is the URL that will be displayed in the results, so it should hold the actual YouTube URL.
Start and Block URLs should contain the general url attribute value. For us, it was the fake directory http://www.yourdomain.com/video/youtube/
Follow Patterns should contain a pattern that also matches the Start URL. Since we only have videos in that directory, we were able to use the same value as the Start URL. If you are pointing to a real directory that has other content you don't want to index, you may need to add whatever pattern is common to your videos.
A sample record is below. Once we updated our feed and added the Start and Block URLs, our videos appeared in our search results.
<gsafeed>
<header>
<datasource>youtube</datasource>
<feedtype>full</feedtype>
</header>
<group action="add">
<record url="http://www.yourdomain.com/video/youtube/?VIDEOID" displayurl="https://www.youtube.com/watch?v=VIDEOID" mimetype="text/html">
<content><![CDATA[DESCRIPTION]]></content>
<metadata>
<meta name="Title" content="TITLE OF VIDEO"></meta>
<meta name="Published" content="2016-08-15T22:00:38.000Z"></meta>
<meta name="PhotoURL" content="https://i.ytimg.com/.."></meta>
</metadata>
</record>
</group>
</gsafeed>
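For what it's worth, here is a rough Python sketch of how a feed shaped like the sample above could be generated. The dictionary keys (id, title, description, published, thumbnail) and the build_feed helper are made up for illustration; they are not part of the YouTube API or the GSA tooling. Note that ElementTree escapes the description text instead of wrapping it in CDATA, which should be equivalent XML.

```python
import xml.etree.ElementTree as ET

# Fake directory from the answer above; it only has to match the
# Start URLs and Follow Patterns configured on the GSA.
FAKE_BASE = "http://www.yourdomain.com/video/youtube/"


def build_feed(videos, datasource="youtube"):
    """Return a <gsafeed> XML string for a list of video dicts.

    Each dict is assumed (hypothetically) to carry the keys
    id, title, description, published and thumbnail.
    """
    feed = ET.Element("gsafeed")
    header = ET.SubElement(feed, "header")
    ET.SubElement(header, "datasource").text = datasource
    ET.SubElement(header, "feedtype").text = "full"

    group = ET.SubElement(feed, "group", action="add")
    for video in videos:
        record = ET.SubElement(group, "record", {
            # Unique fake URL: base directory plus the video ID.
            "url": FAKE_BASE + "?" + video["id"],
            # Real watch URL shown to the user in the results.
            "displayurl": "https://www.youtube.com/watch?v=" + video["id"],
            "mimetype": "text/html",
        })
        # Text is XML-escaped by ElementTree rather than wrapped in CDATA.
        ET.SubElement(record, "content").text = video["description"]
        metadata = ET.SubElement(record, "metadata")
        for name, key in (("Title", "title"),
                          ("Published", "published"),
                          ("PhotoURL", "thumbnail")):
            ET.SubElement(metadata, "meta", name=name, content=video[key])
    return ET.tostring(feed, encoding="unicode")
```

The XML declaration and DOCTYPE are left out here, just as in the sample record; add whatever your GSA version expects before pushing the feed.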

Related

Adding multiple youtube playlists to site just like youtube has em

This is the desired output for my site.
This is how YouTube displays multiple playlists. How can I add this, just as they have it, to my site and to my LinkedIn profile?
Embedding the share HTML link just brings in the playlist, but it does not display properly. I need it to appear exactly as it does in the attached picture.
I have not tried linking to a LinkedIn profile, so perhaps someone else can tell you about that.
Having a section like the one you showed on your own site is doable.
First, you will need to collect the playlist IDs that you are going to use.
Then you will need to get the video IDs of the videos in each playlist.
This can be done with the API.
As for displaying it the way YouTube does, you will need to do that yourself with some combination of tables, divs and CSS. Then populate it with the information you grabbed from the API.
It is best to do this in a loop, so that all playlists are handled in one pass. Then put the result on the page where you need it.
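The loop described above could look roughly like this in Python. This is only a sketch: fetch_page is a hypothetical callable that you would implement yourself (for example, a wrapper around a playlistItems.list request with part=snippet), and the code assumes each page is a dict with an items list and an optional nextPageToken, as that endpoint returns.

```python
def collect_video_ids(playlist_ids, fetch_page):
    """Map each playlist ID to the list of video IDs it contains.

    fetch_page(playlist_id, page_token) is a hypothetical callable that
    returns one page of a playlistItems.list style response as a dict.
    """
    result = {}
    for playlist_id in playlist_ids:
        video_ids = []
        token = None
        while True:
            page = fetch_page(playlist_id, token)
            for item in page.get("items", []):
                video_ids.append(item["snippet"]["resourceId"]["videoId"])
            # Keep paging until the response carries no further token.
            token = page.get("nextPageToken")
            if not token:
                break
        result[playlist_id] = video_ids
    return result
```

The resulting dictionary can then be fed to whatever template or loop builds the HTML for each playlist row.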

YouTube API Search by Tags

I'm trying to add videos to an existing ASP.NET MVC site, and I'd like to show videos from our YouTube channel.
I have added a tag to each video to indicate what page it should appear on. I had thought that I could search our channel by tag on each page to render the relevant video on that page.
I'm trying to exclusively use the API v3, but it seems I can't do this.
I can't use developer tags, because videos are uploaded by multiple users using the standard YouTube front end. This seems like basic functionality, so I'm assuming it's my inexperience with this API.
As an example, our YouTube channel is ChillinWithCharlie. During development, one video is tagged 20141213Cheneys.
I can get all videos in our channel, but is there a way to query the v3 API to retrieve just this video?
I've seen one suggestion here that I retrieve all videos, and filter in code. This feels inefficient, so I'd rather not do this, but I can't even see where the tag is returned with all channel videos, that I could interrogate in code.
It's not just you. There seems to be no specific query parameter to search by tag with API v3.
I would recommend doing a search with your tag in the 'q' (search) parameter, then checking the results to see if the tag exists in the returned snippet->tags property to verify the exact video.
Note that YouTube tags are only visible to the video's uploader.
https://developers.google.com/youtube/v3/docs/videos#snippet.tags[]
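The "check the results" step could be sketched like this in Python. It assumes items is a list of video resources that include the snippet part (for example from a videos.list call; the snippets in plain search results may not carry tags), and videos_with_tag is just an illustrative helper name.

```python
def videos_with_tag(items, tag):
    """Return the video resources whose snippet.tags contains tag exactly."""
    matches = []
    for item in items:
        # snippet.tags may be missing entirely, so default to an empty list.
        tags = item.get("snippet", {}).get("tags", [])
        if tag in tags:
            matches.append(item)
    return matches


# Usage with the tag from the question:
# page_videos = videos_with_tag(items, "20141213Cheneys")
```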

How to get links in comments from the YouTube api following the move to Google+ comments

In the new YouTube Google+ comments system, how can I retrieve comments that contain links?
For example if someone posts a link to another youtube video as follows:
http://www.youtube.com/watch?v=AZNHuFjnmUo
This gets converted to a link by the Google+ system. The title of the video is shown as the text rather than the URL, i.e. the HTML shown within the comments is this:
Francis HATES Google+
However, the API for that comment only returns the title of the video, which is pretty useless seeing as I want to get the link too. I am guessing that the system converts the URL into an <a> tag which is stored in the database, but then the API strips out the HTML when it's requested, so it only returns the video's title.
I have posted a defect here:
https://code.google.com/p/gdata-issues/issues/detail?id=5500
But that bug list seems to have very little activity going on in terms of responses to issues.
So is there another way to get the data I need?
What you can do while this bug remains is to extract the comment id and use it in the Google+ API with an activities.get request. This will return the full post with all links.
A bit cumbersome since it needs one request for each comment you want to check, but it seems to be the only way while the bug remains.
To take an example from the video you linked in the issue:
This YouTube comment returned by the API includes a YouTube link:
http://gdata.youtube.com/feeds/api/videos/rgkDKeSc-1o/comments/z12hvvcgxznkufyo304ci1iqlnandzxjpes
You can use the z... ID in a request to the Google+ API:
https://developers.google.com/apis-explorer/#p/plus/v1/plus.activities.get?activityId=z12hvvcgxznkufyo304ci1iqlnandzxjpes
The response includes the full post, including links.
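If you want to automate the ID extraction, something like the following Python sketch might do. The helper names are my own, and the only URL format used is the API Explorer link shown above; the actual activities.get request (with your own API key) is left out.

```python
from urllib.parse import urlsplit

EXPLORER_URL = ("https://developers.google.com/apis-explorer/"
                "#p/plus/v1/plus.activities.get?activityId=")


def comment_id_from_url(comment_url):
    """Pull the trailing z... ID out of a gdata comment URL."""
    segments = [s for s in urlsplit(comment_url).path.split("/") if s]
    # Expect a path ending in .../comments/<id>.
    if len(segments) < 2 or segments[-2] != "comments":
        raise ValueError("not a comment URL: " + comment_url)
    return segments[-1]


def explorer_link(comment_url):
    """Build the API Explorer link for the matching Google+ activity."""
    return EXPLORER_URL + comment_id_from_url(comment_url)
```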

Patterns between YouTube m. and normal site urls

My site is not able to show uploaded youtube videos when the url is a mobile (m.) site, but it works for the normal youtube site. It seems to me that the mobile and normal urls differ in a pattern, as shown below:
http://www.youtube.com/watch?v=5ILbPFSc4_4
http://m.youtube.com/#/watch?v=5ILbPFSc4_4&desktop_uri=%2Fwatch%3Fv%3D5ILbPFSc4_4
Obviously, the m. is added, as are the /# and the &desktop_uri... part.
and again:
http://www.youtube.com/watch?v=8To-6VIJZRE
http://m.youtube.com/#/watch?v=8To-6VIJZRE&desktop_uri=%2Fwatch%3Fv%3D8To-6VIJZRE
What we hope to do is check whether the URL is a mobile-site URL and, if it is, parse it so it shows as the normal site.
Does anyone know if all YouTube URLs work this way, i.e. whether this pattern holds for every video on the mobile and normal sites?
In general, any time you attempt to parse URLs for sites (as opposed to web APIs) by hand, you're leaving yourself open to breakage. There's no "contract" in place that states that a common format will always be used for watch page URLs on the mobile site, or on the desktop site.
The oEmbed service is what you should use whenever you want to take a YouTube watch page URL as input and get information about the underlying video resource as output in a programmatic fashion. That being said, the oEmbed response doesn't include a canonical link to the desktop YouTube watch page, so it's not going to give you exactly what you want in this case. For many use cases, such as when you want to get the embed code for a video given its watch page URL, it's the right choice.
If you do code something by hand, please ensure that your code is deployed somewhere where it would be easy to update if the format of the watch pages ever does change.
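If you go the hand-coded route anyway, a minimal Python sketch might look like this. It assumes the mobile URLs keep the shape shown in the question (watch path and v parameter inside the fragment), which, as noted above, nothing guarantees; to_desktop_watch_url is an illustrative name.

```python
from urllib.parse import urlsplit, parse_qs


def to_desktop_watch_url(url):
    """Rewrite an m.youtube.com watch URL to the www form.

    URLs that do not match the observed mobile pattern are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.hostname != "m.youtube.com":
        return url
    # Observed mobile form: http://m.youtube.com/#/watch?v=ID&desktop_uri=...
    # so the real path and query live inside the fragment.
    inner = urlsplit(parts.fragment) if parts.fragment.startswith("/") else parts
    video_ids = parse_qs(inner.query).get("v")
    if inner.path != "/watch" or not video_ids:
        return url
    return "http://www.youtube.com/watch?v=" + video_ids[0]
```

Keeping the rewrite in one small function like this makes it easier to update in one place if the format changes.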

Retrieving a YouTube Disco playlist

YouTube has this cool thing that creates a "smart playlist" from some starting keywords. I would like to programmatically access the playlist. I've found the YouTube data API but it doesn't discuss the "disco" feature.
One of the answers below suggests using
http://www.youtube.com/disco?action_search=1&query=XXXXXXXXX
that will return some JSON with the first video to be played, and a list property. Unfortunately, the list is a 32-character hex string, whereas normal playlist IDs are 16-character hex strings. This means that the standard data API to retrieve the full playlist doesn't work.
Any suggestions?
First, I must say I have never used the YouTube data API, so I don't know how useful the information below will be.
Let me use an example:
I wanted to create a Smashing Pumpkins playlist. I typed the artist name and clicked the "Disco!" button. Using Fiddler2, I figured out that the requested URL was:
www.youtube.com/disco?action_search=1&query=smashing%20pumpkins
Notice that spaces are replaced with %20. In response, I got a simple JSON document:
{"url": "\/watch?v=bhMz7x1ZaGM\u0026feature=disco\u0026playnext=1\u0026list=MLGxdCwVVULXe5-F4X_zm6wnblRsnXoPJS"}
It was a link to the first song of the freshly generated Smashing Pumpkins playlist, whose ID was list=MLGxdCwVVULXe5-F4X_zm6wnblRsnXoPJS. All you have to do now is replace \u0026 with & and you get a valid link.
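In Python, for instance, the replacement step comes for free, because a JSON parser decodes both the \/ and \u0026 escapes. A small sketch, assuming the response keeps the shape shown above (the disco_playlist name is my own):

```python
import json
from urllib.parse import urlsplit, parse_qs


def disco_playlist(response_text):
    """Return (full watch URL, list ID) from the disco JSON response."""
    # json.loads turns \/ into / and \u0026 into &.
    relative_url = json.loads(response_text)["url"]
    list_ids = parse_qs(urlsplit(relative_url).query).get("list", [None])
    return "https://www.youtube.com" + relative_url, list_ids[0]
```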
I assume the rest of the magic you want to do is available via the official YouTube data API.
I hope my research will help you.
EDIT
Well, it looks like the playlist generated by youtube/disco is not the same type of playlist that users can create and that is available via the API. The list ID is longer than usual, and when you click "More info about playlist" you are redirected to the artist's profile. Based on these two facts, I guess it's impossible to retrieve the generated lists via the API. Sorry.
@Randomblue, how exactly do you want to retrieve this playlist?
You can use the 32 char hex in this url to get a page detailing the playlist
https://www.youtube.com/playlist?list={HEX}
or in an embed playlist iframe, like this:
<iframe width="853" height="480" src="//www.youtube.com/embed/videoseries?list={HEX}" frameborder="0" allowfullscreen></iframe>
