S3 Static Website Nice URLs

We are working on a website with a static frontend, API Gateway + Lambda as the backend, and DynamoDB as the database. I saw there were a couple of similar questions to this one, but I'm looking to understand the matter thoroughly to build a complete and robust solution, since I hope to build several websites using this stack.
This one is a fairly basic website: we have an index.html page, a blog.html page and a portfolio.html page. We also have an HTML page for single portfolio entries (let's call it portfolio-entry.html) and a page for single blog articles (let's call it blog-post.html).
So I see there's a way to specify an index document and an error document, so you can have a nice clean URL for your index. There are also routing rules, but those behave more like redirects than rewrites.
I guess my best bet to deliver different blog posts would be to pass a query string to blog-post.html ("mywebsite.com/blog-post.html?post=post-alias") and have the JS on the page ask the API for different content depending on the query string.
Is there a way, using S3, to route mywebsite.com/blog/post-alias/ to mywebsite.com/blog-post.html?post=post-alias and serve the response to the client without redirecting? I'm interested both in "client-side URL rewriting" via JS, to get nice URLs for humans, AND in server-side routing, to catch crawler requests and get SEO/indexing of pages for specific posts, for example.
How should I go about this? Is there a way to achieve all this using what S3 and JS provide, or do I have to put a proxy/router (like nginx) in front of S3 to handle route requests?
We are really committed to the whole S3-API Gateway-Lambda-DynamoDB architecture and we would really love to do without a server.
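
To make the client-side rewriting idea concrete, here is a minimal sketch in TypeScript - the api.mywebsite.com endpoint and the response shape are assumptions for illustration, not something S3 provides:

// Minimal client-side router sketch. Assumes the hosting layer already
// serves blog-post.html for any /blog/* path, and that the backend exposes
// GET https://api.mywebsite.com/posts/{alias} - both are assumptions.
async function renderFromPath(): Promise<void> {
  // e.g. /blog/post-alias/ -> ["blog", "post-alias"]
  const parts = window.location.pathname.split("/").filter(Boolean);
  if (parts[0] !== "blog" || !parts[1]) return;
  const res = await fetch(`https://api.mywebsite.com/posts/${parts[1]}`);
  const post = await res.json(); // assumed shape: { title, body }
  document.title = post.title;
  document.querySelector("main")!.innerHTML = post.body;
}

renderFromPath();

Note that this only covers the human-facing half; a crawler that doesn't execute JS never sees the fetched content, which is exactly the second half of the question.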

Related

Can prerender.io help in situations like dynamic SEO for a front-end route with dynamic parameters?

The detailed problem can be found at this link - https://stackoverflow.com/questions/36931309/dynamic-seo-for-routes-in-angular2any-frontend-routing-framework?noredirect=1#comment61422672_36931309
My situation is this - I have a front-end route /category/:categoryId. This categoryId can vary, and accordingly I fetch different data from the server. This data contains the title that I should set for the page.
Doing SEO for the different categoryId values seems impossible from the frontend, as the Google bot won't wait for my server response while crawling.
Can Prerender solve this particular situation, and how? I have never used prerender.io. My backend is written in Ruby on Rails.
Yep! Prerender loads your pages in a browser just like a user would and then saves the resulting HTML to serve back to the crawlers. That way you can dynamically load content, or even dynamically tell us to return a 301/404 to the crawlers based on the content.
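
For Rails there is Prerender middleware (the prerender_rails gem); purely to illustrate the mechanism, here is a rough sketch in TypeScript/Express - the bot list is a tiny sample and the service URL/token scheme is simplified:

import express from "express";

const app = express();

// Tiny sample of crawler user agents - real middleware uses a long list
// and also honours the old _escaped_fragment_ query parameter.
const BOTS = /googlebot|bingbot|yandex|twitterbot|facebookexternalhit/i;

app.use(async (req, res, next) => {
  if (!BOTS.test(req.headers["user-agent"] ?? "")) return next();
  // Crawler detected: return the pre-rendered HTML snapshot instead of
  // the empty JS shell. URL format and header are illustrative.
  const snapshot = await fetch(
    `https://service.prerender.io/https://example.com${req.originalUrl}`,
    { headers: { "X-Prerender-Token": "YOUR_TOKEN" } } // placeholder token
  );
  res.status(snapshot.status).send(await snapshot.text());
});

app.listen(3000);

Humans keep getting the normal JavaScript app; only recognized crawlers are served the snapshot, which by then contains the title fetched from the server.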

How to delete old Google URLs with parameters

A month ago I relaunched a website in the TYPO3 CMS. Before that, the site was hosted with the Joomla CMS.
In the Joomla config, SEO links were disabled, so Google indexed the page URLs like this:
www.domain.de/index.php?com_component&itemid=123....
for example.
Now, a month later (after the TYPO3 relaunch), these links are still visible in Google because the URLs don't return a 404 error. That's because index.php also exists in TYPO3, and TYPO3 doesn't care about the additional query string/variables - it returns a 200 status code and shows the front page.
In Google Webmaster Tools it's possible to delete single URLs from the Google index, but that way I would have to delete about 10,000 URLs manually...
My question is: is there a way to remove these old URLs from the Google index?
Greetings
With this number of URLs there is only one sensible solution: implement proper 404 handling in your TYPO3, or, even better, redirects to the same content now placed in TYPO3.
You can use TYPO3's page-not-found handler (search for it in Install Tool > All configuration); it's called pageNotFound_handling. You can use options like REDIRECT for redirecting to some page, or even USER_FUNCTION, which allows you to run your own PHP script; check the description in the Install Tool.
You can also write a simple condition in TypoScript and check whether the typical Joomla params exist in the URL - that way you can easily return a custom 404 page. If you need a more sophisticated condition (for example, you want to redirect links which previously pointed to some gallery in Joomla to the new gallery in TYPO3), you can make use of a userFunc condition, and that would probably be the best option for SEO.
If these URLs contain an acceptable number of common indicators, you could redirect these links with a rule in your virtual host or .htaccess so that Google runs into the correct error message.
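For example, with Apache's mod_rewrite, something along these lines (untested; the pattern comes from the old URLs shown above) would answer all old Joomla-style URLs with "410 Gone":

RewriteEngine On
# Old Joomla URLs looked like /index.php?com_component&itemid=123
RewriteCond %{QUERY_STRING} com_component [NC]
RewriteRule ^index\.php$ - [G,L]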
I wrote a Google Chrome extension to remove URLs in bulk in Google Webmaster Tools. Check it out here: https://github.com/noitcudni/google-webmaster-tools-bulk-url-removal.
Basically, it's a glorified for loop. You put all the URLs in a text file. For example,
http://your-domain/link-1
http://your-domain/link-2
Having installed the extension as described in the README, you'll find a new "choose a file" button.
Select the file you just created. The extension reads it in, loops through all the URLs and submits them for removal.

How to keep robots away from www.domain.com/this_is_a_hash when the link is posted to Twitter or Facebook

I'm building a service where people get notified (by email) when someone follows a link with the format www.domain.com/this_is_a_hash. The people that use this service can share the link in different places like Twitter, Tumblr, Facebook and more...
The main problem I'm having is that as soon as the link is shared on any of these platforms, a lot of requests to www.domain.com/this_is_a_hash start hitting my server. The problem with this is that each time one of these requests hits my server, a notification is sent to the owner of the this_is_a_hash, and of course this is not what I want. I only want notifications when real people visit this resource.
I found a very interesting article here that talks about the huge amount of requests a server receives when a link is posted to Twitter...
So what I need is to keep search engines from hitting the "resource" URL... the www.mydomain.com/this_is_a_hash
Any idea? I'm using rails 3.
Thanks!
If you don’t want these pages to be indexed by search engines, you could use a robots.txt to block these URLs.
User-agent: *
Disallow: /
(That would block all URLs for all user agents. You may want to put these pages under a folder and block only the URLs inside it. Or you could add the forbidden URLs dynamically as they get created; however, some bots might cache the robots.txt for a while, so they might not pick up that a new URL should be blocked, too.)
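For example, if the hash pages were served under a common prefix (say /l/ - an assumption, since the URLs in the question sit directly at the domain root), the folder variant would be:
User-agent: *
Disallow: /l/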
It would, of course, only hold back those bots that are polite enough to follow the rules of your robots.txt.
If your users would copy&paste HTML, you could make use of the nofollow link relationship type:
<a href="http://www.domain.com/this_is_a_hash" rel="nofollow">cute cat</a>
However, this would not be very effective, as even some of those search engines that support this link type still visit the pages.
Alternatively, you could require JavaScript to be able to click the link, but that’s not very elegant, of course.
But I assume they only copy&paste the plain URL, so this wouldn’t work anyway.
So the only chance you have is to decide if it’s a bot or a human after the link got clicked.
You could check for user-agents. You could analyze the behaviour on the page (e.g. how long it takes for the first click). Or, if it’s really important to you, you could force the users to enter a CAPTCHA to be able to see the page content at all. Of course you can never catch all bots with such methods.
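As a rough sketch of the user-agent idea (the question is Rails 3; this is TypeScript purely for illustration, and the pattern is a tiny, incomplete sample):

// Decide whether a visit to /this_is_a_hash should trigger the mail.
// Bot lists are long and never exhaustive - this is only a sample.
const BOT_UA = /bot|crawler|spider|facebookexternalhit|twitterbot/i;

function shouldNotify(userAgent: string | undefined): boolean {
  if (!userAgent) return false; // many bots send no user agent at all
  return !BOT_UA.test(userAgent);
}

// In the handler for GET /:hash, only mail the owner for likely humans:
// if (shouldNotify(request.headers["user-agent"])) notifyOwner(hash);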
You could use analytics on the pages, like Piwik. They try to differentiate users from bots, so that only users show up in the statistics. I’m sure most analytics tools provide an API that would allow sending out mails for each registered visit.

Hide website filenames in URL

I would like to hide the webpage name in the URL and only display either the domain name or parts of it.
For example:
I have a website called "MyWebSite". The URL is: localhost:8080/mywebsite/welcome.xhtml. I would like to display only "localhost:8080/mywebsite/".
However, if the page is at, for example, localhost:8080/mywebsite/restricted/restricted.xhtml, then I would like to display localhost:8080/mywebsite/restricted/.
I believe this can be done in the web.xml file.
I believe that you want URL rewriting. Check out this link: http://en.wikipedia.org/wiki/Rewrite_engine - there are many approaches to URL rewriting, you need to decide what is appropriate for you. Some of the approaches do make use of the web.config file.
You can do this in several ways. The one I see most is to have a "front door" called a rewrite engine that parses the URL dynamically to internally redirect the request, without exposing details about how that might happen as you would see if you used simple query strings, etc. This allows the URL you specify to be digested into a request for a master page with specific content, instead of just looking up a physical page at that location to serve.
The StackExchange sites do this so that you can link to a question in a semi-permanent fashion (and thus can use search engines with crawlers that log these URLs) without them having to have a real page in the file system for every question that's ever been asked (we're up to 9,387,788 questions as of this one).
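
To make the "front door" idea concrete, here is a toy front controller - written in TypeScript/Express although the question concerns a Java web.xml setup, with the routes and page store invented for illustration:

import express from "express";

const app = express();

// One entry point; content is looked up from the path, so no physical
// welcome.xhtml-style filename ever appears in the URL.
const pages = new Map<string, string>([
  ["", "<h1>Welcome</h1>"],
  ["restricted", "<h1>Restricted area</h1>"],
]);

app.get("/:section?", (req, res) => {
  const body = pages.get(req.params.section ?? "");
  if (!body) return res.status(404).send("Not found");
  res.send(body); // clean URL like /restricted, no filename exposed
});

app.listen(8080);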

Ruby on Rails 3 search external website source based on top google result

I'm having a hard time figuring out where to start with this one. I pull information from an external website and put some of the content on my page. I think I need two things: 1. A Google search that returns the URL of the top result, given the name of my current object. 2. A way to examine the source of that result and output the contents of a tag with a specific class.
To better explain this, I'll create a hypothetical situation: say I have a website that lists mattresses and gives reviews. Say I want to add other websites' reviews, and on such a website there's a tag like 3.5/5. Then I want to display this review along with a link to the external page. Is there a way to search like "site:http://mattressreviewsite/ #mattress.name", pull the top URL, and then search the source for the string class='rating' and display this in my view?
Thanks for any help or guidance. I'm using Rails 3.
You need an HTTP client (httparty, or the default net/http) for that, and then some parsing to get the required results.
Go study the URL patterns of Google (as far as I remember it was google.com?q=search_string) and use the HTTP client for requests (GET/POST). Parse the result (there are many HTML parser gems available too) to get what you need and for any subsequent HTTP requests. And don't forget the "I'm Feeling Lucky" feature of Google, which returns only one result.
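The gems named are Ruby-specific; as a language-neutral sketch of the two steps (TypeScript here), where the search URL, the result markup, and the class="rating" selector are all assumptions - and note that scraping Google's result pages violates their terms, so a real build should use a proper search API:

// Step 1: get the top result URL for a site-restricted query.
// Extremely naive - real result pages need an API or robust parsing.
async function topResultUrl(query: string): Promise<string | null> {
  const res = await fetch(
    "https://www.google.com/search?q=" + encodeURIComponent(query)
  );
  const m = (await res.text()).match(/href="(https?:\/\/[^"]+)"/);
  return m ? m[1] : null;
}

// Step 2: pull the text of the first class="rating" element on that page.
async function ratingFrom(url: string): Promise<string | null> {
  const html = await (await fetch(url)).text();
  const m = html.match(/class="rating"[^>]*>([^<]*)</);
  return m ? m[1].trim() : null;
}

// Usage, for the hypothetical mattress example:
// const url = await topResultUrl("site:mattressreviewsite mattress-name");
// if (url) console.log(await ratingFrom(url), url);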
All the best!
