Getting documentID from various Google Docs URLs - google-docs-api

We've been noticing that there are a bunch of potential URLs for Google Docs, like /edit, /view, /mobilebasic, /pub, etc., and not all of them follow the convention of https://docs.google.com/document/d/some_random_id/edit. One alternative we've seen is https://docs.google.com/document/d/e/some_random_id/pub. Is there any documentation of the different possible URL formats? Is there a standard or suggested way to get the file ID of the doc from one of these URLs?
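There doesn't seem to be official documentation listing every URL form, but a common workaround is to pull out the path segment that follows /d/. Below is a minimal sketch in Python; the helper name and the regex are illustrations, not part of any Google API, and note that the ID in published /d/e/.../pub URLs is not necessarily the same as the Drive file ID, so it may need separate handling.

import re

# Illustrative helper, not an official Google API: grab the path segment after
# /document/d/ (optionally /document/d/e/ for published docs). The ID in a
# published /d/e/.../pub URL is not necessarily the Drive file ID.
DOC_ID_PATTERN = re.compile(r"/document/d/(?:e/)?([A-Za-z0-9_-]+)")

def extract_document_id(url):
    match = DOC_ID_PATTERN.search(url)
    return match.group(1) if match else None

print(extract_document_id("https://docs.google.com/document/d/some_random_id/edit"))
print(extract_document_id("https://docs.google.com/document/d/e/some_random_id/pub"))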

Related

Is there a way to simply query the YouTube Data API for information about a URL?

It seems like a pretty common problem/question on here to ask how to parse a YouTube video/user/channel/playlist URL, with a lot of the answers being partial or outdated regex solutions (meaning they don't support legacy URLs or newer features like handles). Which made me wonder:
Can I simply ask the YouTube Data API v3 whether a URL is valid, and have it return any relevant information? (Similar to doing a search with the API, where it tells you the type, ID, etc.)
As far as I can tell from the reference, it's not entirely possible.
The closest I could find was doing a search with the URL as the query, but this unsurprisingly led to a list of somewhat unpredictable results.
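For reference, the usual workaround looks something like the sketch below: extract the ID yourself and then confirm it with videos.list, which returns an empty items array for IDs that don't exist. The regex here is only an illustration (it covers watch, youtu.be and /shorts/ forms, not legacy URLs or handles), and YOUR_API_KEY is a placeholder.

import re
import requests

API_KEY = "YOUR_API_KEY"  # placeholder

# Illustrative pattern only: watch?v=, youtu.be/ and /shorts/ URLs.
VIDEO_ID_RE = re.compile(r"(?:v=|youtu\.be/|/shorts/)([A-Za-z0-9_-]{11})")

def lookup_video(url):
    match = VIDEO_ID_RE.search(url)
    if not match:
        return None
    # videos.list with a nonexistent ID simply returns no items, which is the
    # closest the Data API gets to "validating" a URL.
    resp = requests.get(
        "https://www.googleapis.com/youtube/v3/videos",
        params={"part": "snippet", "id": match.group(1), "key": API_KEY},
        timeout=10,
    )
    items = resp.json().get("items", [])
    return items[0] if items else None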

YouTube API v3 category topics

Hey guys, I'm not having an issue with the API directly, but with the documentation. I can make an API request and get the video topic details just fine, but I am looking for a master list of all topics.
According to the documentation, the old style of topics that you can see here was deprecated in 2017 and replaced with Wikipedia articles. This is fine, and a little better for my use case, but I would like to get a list of all options. The documentation says:
A list of Wikipedia URLs that provide a high-level description of the video's content.
which is not especially helpful, haha. I think I found the Wikipedia source for all music genres, and it looks like they are just using the "main" genres, but I would like to confirm that. I also found this list of topics that looks similar, but it is from the Natural Language API documentation instead of the YouTube API documentation.
I could try to brute-force it, but that would require a considerable amount of effort with no real way to confirm my results. I also found this API, but it just returns the top-level categories.
I am also really only interested in the music categories.
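For reference, this is roughly the request being described, as a minimal Python sketch (the video ID and API key are placeholders). topicCategories comes back as a list of Wikipedia URLs per video, so building a master list would mean aggregating these across many videos; the API itself does not appear to publish one.

import requests

API_KEY = "YOUR_API_KEY"  # placeholder

# Fetch topicDetails for a single video and print its Wikipedia topic URLs.
resp = requests.get(
    "https://www.googleapis.com/youtube/v3/videos",
    params={"part": "topicDetails", "id": "VIDEO_ID", "key": API_KEY},
    timeout=10,
)
for item in resp.json().get("items", []):
    for topic_url in item.get("topicDetails", {}).get("topicCategories", []):
        print(topic_url)  # e.g. https://en.wikipedia.org/wiki/Music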

All Google Slides URL parameters available?

Where is the official documentation for the URL parameters/arguments?
There are things like /preview, /copy, /export, /htmlpresent, #heading anchors, and ?something query parameters that I stumbled on in tips-and-tricks articles around the web.
Where is the official URL API documentation that explains ALL of them?
I'm not sure if this is what you are looking for, but these references contain all the information and guidelines related to the Google Slides API:
https://developers.google.com/slides
https://developers.google.com/slides/how-tos/overview
https://developers.google.com/slides/reference/rest
You could also add more details about what you are trying to find, so I can assist you better.
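There doesn't appear to be a single official list; as an illustration only, here are a few suffixes commonly reported in those tips-and-tricks articles, wrapped in a small helper. Treat them as assumptions rather than documented guarantees.

# Illustration only: suffixes commonly reported in blog posts, not an
# official or exhaustive list of Slides URL parameters.
COMMON_SUFFIXES = {
    "edit": "edit",              # normal editor view
    "preview": "preview",        # read-only preview
    "present": "present",        # start the slideshow
    "copy": "copy",              # prompt the viewer to make a copy
    "export_pdf": "export/pdf",  # download as PDF
}

def slides_url(presentation_id, variant="edit"):
    return ("https://docs.google.com/presentation/d/"
            f"{presentation_id}/{COMMON_SUFFIXES[variant]}")

print(slides_url("PRESENTATION_ID", "export_pdf"))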

How would I find all the short urls that link to a particular long url?

Basically I want to know how many people have tweeted a link to a URL, but since there are dozens of link shorteners out there, I don't see any way to do this without having access to all of their URL maps. I found a previous question here, but it was over a year old and didn't have any new answers.
So #1, does anyone know of a service/API for doing this?
And #2, can anyone think of a way to accomplish this task other than submitting the long url in question to all the popular link shortening sites?
ps- I'm also open to comments about why this is impossible or impractical.
You could perform a Google search (or the equivalent via the API) for any pages that link to your page. This is done with the link: keyword. So if you're trying to figure out how many people link to www.example.com (regardless of whether it's through a link-shortener URL), you would just do a Google search for link:www.example.com.
e.g.: http://www.google.com/search?q=link:www.example.com
Note that this will only find pages that have been indexed, so pages that haven't been crawled, or pages that get crawled infrequently, will not show up in the results until a later date (if at all).
Since all sites have different algorithms for shortening the URLs, and these are different sites that most likely do not share their data with each other, how can you hope to find all of them in a single or small number of queries?
All you can do is brute-force it, and even then this might not be any good if a site happily creates a new short value for the same long-form URL each time (especially if you send a different long-form URL that maps to the same place, like http://www.stackoverflow.com/ rather than http://stackoverflow.com/).
For this to really work, there would have to be a central service that every URL-shortening site voluntarily reports to, and which therefore already collects all of this information. And even if you wrote such a site, that doesn't account for the URL-shortening sites already out there that already have data!
In short, I do not see how this is remotely possible, unless I'm wrong about there being such a database somewhere out there.
So, months after asking this question, I came across a solution to a similar problem: how to tell how many times a link has been shared on Facebook. The solution is a simple API call:
http://graph.facebook.com/http://stackoverflow.com
returns the following json data:
{
"id": "http://stackoverflow.com",
"shares": 1627
}
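A minimal sketch of making that call programmatically, assuming the endpoint still responds the way it did when this answer was written (newer Graph API versions require an access token and report the count under different fields):

import requests

# Same call as above, passing the long URL as the id parameter.
resp = requests.get(
    "http://graph.facebook.com/",
    params={"id": "http://stackoverflow.com"},
    timeout=10,
)
print(resp.json().get("shares"))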

Google sees something that it shouldn't see. Why?

For some mysterious reason, Google has indexed both of these addresses, which lead to the same page:
/something/some-text-1055.html
and
/index.php?pg=something&id=1055
(A side note: the site has had friendly URLs since its launch, and I have no idea how Google found the "index.php?" URL; the "unfriendly" URLs exist only in the content management system, which is password-restricted.)
What can I do to solve the situation? (I have around 1000 pages that are double-indexed.) Somebody told me to use "disallow: index.php?" in the robots.txt file.
Right or wrong? Any other suggestions?
You'd be surprised at how pervasive and quick the Google bots are at indexing site content. That, combined with the fact that lots of CMS systems create unintended pages/links, makes it likely that those links were exposed at some point, which is the most likely culprit. It's also possible your administration area isn't as secure as you think, and the Google bot got through that way.
The well-behaved, and Google-recommended, things to do here are:
If possible, create 301 redirects from your query-string-style URLs to your canonical-style URLs. That's you saying, "Hey there, web bot/browser, the content that used to be at this URL is now at this other URL."
Block the query-string content in your robots.txt (a minimal example follows this answer). That's like asking the spiders and other automated programs, "Hey, please don't look at this stuff. These aren't the URLs you're looking for."
Google apparently allows you to specify a canonical URL now via a <link /> tag at the top of your page. Consider adding these in.
As to whether doing the well-behaved things is the "right" thing to do re: Google rankings ... who knows. Only "Google" knows how their algorithms work now and will work in the future, and by Google, I mean a bunch of engineers and executives with conflicting goals on how search should work.
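For the robots.txt suggestion above, a minimal example (assuming all the query-string URLs start with /index.php) would be:

User-agent: *
Disallow: /index.php

As noted further down, this only discourages future crawling; it does not remove pages that are already in the index.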
Google now offers a way to specify a page's canonical URL. You can use the following code in your HTML to tell Google your canonical URL:
<link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish" />
You can read more about canonical URLs in Google's blog post on the subject, here: http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html
According to the blog post, Ask.com, Microsoft Live Search and Yahoo! all support the canonical tag.
If you use sitemap generators to submit to search engines, you'll want to exclude the query-string URLs there as well. Those sitemaps are likely where Google got your links, since generators often build them by crawling your folders and checking your logs.
Better to check which URI has been requested ($_SERVER['REQUEST_URI']) and redirect if it was /index.php.
Changing robots.txt will not help, since the pages are already indexed.
The best option is a permanent (301) redirect.
If you want to remove a page once it has been indexed by Google, the only way, more or less, is to make it return a 404 Not Found status.
Is it possible you're posting a form to a similar URL and Google is simply picking it up from the page source?
