Correct way to precache the root url ("/") - service-worker

I'm a bit confused about the correct way to precache the root url, namely "/".
If I use webpack-plugin-workbox to generate the precacheManifest, it doesn't include an entry for "/". "/index.html" is included of course. Now if the user loads the app, precaching kicks in, and the user tries to load the root url without network connectivity, the site won't load since precaching did nothing for the root url. If the user tries to load "/index.html" everything works nicely. But users don't load that url, they load the root url. So how to cache that?
Should I use the navigateFallback: index.html option which, in my understanding, redirects the user to the provided url in case of connectivity loss and cache miss?
Or should I use templatedUrls: { "/": [ "index.html" ] } option which, in my understanding, generates a hash based on the index.html and then caches "/" based on the changing of that hash value?
Or should I use some completely different strategy?
Thanks a million!

By default, when there's an initial precache miss for a URL that ends in /, Workbox will check its list of precached files again to see if there's a match for the same URL ending in /index.html.
You can read more about this, along with how to customize the default behavior, in the module guide for workbox-precaching.
So... things should work as you describe without your needing to do anything. (You need to make sure that you're testing after the service worker has activated and takes control over the current window client.)
If you're not seeing that behavior, it sounds like there might be a bug in Workbox, and it might be best to follow up in the issue tracker with more details.

Related

How to set up app to disallow Cloudfront from fetching anything?

I use rails 5.2 and cloudfront for assets
How to set up app to disallow Cloudfront from fetching anything except for assets?
CloudFront doesn't have an explicit way to "allow only" certain path prefixes, since they will ultimately match the default * cache behavior if they don't match any others, but there are several ways of working around this, depending on the level of sophistication and complexity that suits your taste... but all of them would start with this step:
create a new cache behavior using the desired path pattern, such as /assets/* and select your existing origin to handle these requests.
At this point, CloudFront still works as before, it's just internally considering the asset requests to match one behavior and everything else to match the other.
So, what we need next is something different for the "other."
The simplest solution is to create a second Origin, using the Origin Domain Name invalid.invalid. This is a syntactically valid hostname that points to a nonexistent target (the .invalid TLD is reserved for such purposes).
After creating this origin, edit your default cache behavior to use this new origin.
With this change in place and propagated, CloudFront will process /assets/* requests as before, but will throw an error on any other path. (The error is 502 Bad Gateway, if I remember correctly).
This accomplishes the simple purpose of blocking all other requests.
If you want to be a bit more proactive, and actually redirect requests back to the main site, you can accomplish this by creating an empty bucket in S3, and select the "Redirect requests" option. In the "target bucket or domain" box, put your main web site hostname. Then take the "Endpoint" shown in the Static website hosting box and use that as your origin hostname in CloudFront, for the default cache behavior. Any requests that arrive at CloudFront (for other than /assets/* will receive a redirect back to the main site.
This option may be the better option if your CDN has been inadvertently picked up by search engines, because the links will redirect back to the main site.

Are friendly URLs based on directories?

I've been reading many articles about SEO and investigating how to improve my site. I found an article that said that having friendly URLs help online indexers to find and positionate your site better than using URLs with lots of GET parameters so I decided to adapt my site to this kind of URL. I've also read that there's a way (editing .htaccess) but it's not the best way and it doesn't look really good.
For example, that's how Google's About URL looks like:
https://www.google.com/search/about/es/
When surfing into FTP do they see the directories search/about/es/index.html? If so, you must create many files and directories for each language instead of using &l=es, is it that worth?
You can never know (for sure) how resources are mapped to URLs.
For example, the URL https://www.google.com/search/about/es/ could
point to the HTML file /search/about/es/index.html
point to the HTML file /foo/bar/1.html
point to the PHP script /index.php
point to the PHP script /search.php?title=about&lang=es
point to the document available from the URL https://internal.google.com/1238
…
It’s always the server that, given the URL from the request, decides which resource to deliver. Unless you have access to the server, you can’t know how. (Even if a URL ends with .php, it’s not necessarily the case that PHP is involved at all.)
The server could look for a file that physically exists (if URL rewriting is involved: even in "other" places than what the URL path suggests), the server could run a script that generates a document on the fly (e.g., taking the content from your database), the server could output the file available from another URL, etc.
Related Wikipedia articles:
Rewrite engine
Web framework: URL mapping
Front controller

How does HTML5 AppCache handle redirects?

If I include in my application cache manifest:
/example.html
and this redirects to
https://s3.amazonaws.com/longURL/example.html?dynamicauthenticationparameters
will this work?
The current draft HTML5 specification seems to be silent on redirects for content files (as opposed to the manifest itself) apart from referring to a manual redirect flag, which apparently is set but (as far as I can tell) never actually used.
(The intention is to avoid proxying some S3 content, but to still make it available offline using the cache mechanism. JavaScript and LocalStorage would presumably be a workaround if the above can't be done).
Any pointers to the relevant part of a spec and/or current browser implementation behavior would be helpful.
The current specification now states that if the resource is redirected to a different origin, then this is treated as a failure and the local cached copy (or fallback) is used instead.
In section 5.6.4 of http://www.w3.org/TR/2011/WD-html5-20110525/offline.html it states that:
Redirects are fatal because they are either indicative of a network
problem (e.g. a captive portal); or would allow resources to be added
to the cache under URLs that differ from any URL that the networking
model will allow access to, leaving orphan entries; or would allow
resources to be stored under URLs different than their true URLs. All
of these situations are bad.
So sadly you can't serve some pages from Amazon S3 or Cloudfront.

IIS 7 URL Rewrite config issue or MVC Route?

I have an account with Gearhost.com and when it comes to setting up sub-domains you are currently required to go in and configure an URL Rewrite entry using IIS Remote Admin.
The directory folder structure follows the pattern:
\mastersite
\mastersite\subdomain1
The Gearhost KB Article on how to do it can be found here:
https://support.gearhost.com/KB/a851/setting-a-subdomains-content-location-using-url-rewrite.aspx?KBSearchID=0
This works just fine, but I ran into a scenario that revealed the ability to access the sub-domain by using the master.com/sub-domain path.
subdomain1.site.com (works)
www.site.com/subdomain1 (displays site also --which I don't want)
I don't know if the KB article is the correct way to configure sub-domains in IIS or if I need to manage the routing in my Microsoft MVC 3 Application.
Let's say it is the correct way to setup/configure a sub-domain. Is there a way to restrict the path for the 2nd option, so it returns as page not found or access forbidden or something to this effect?
I'm developing a Microsoft MVC Application and if I use a "Request.Url" call, it actually returns the full path of the 2nd option even when I'm sitting on what looks like a perfect path to the sub-domain home page.
So I don't know if this needs to be handled a different way, if the URL Rewrite entry needs to be changed, or what the solution may be.
Looking for feedback from any engineers who may have more knowledge on the topics.
Thanks.
I ran across an article which solved my original request for help.
It involved creating Outgoing rules in IIS, to rename the path. The rule looks for the path in question and then rewrites it.
Per the article I used Outgoing rule # 2.
Pre-Condition: None
Matching Scope: Server Variable
Variable Name: RESPONSE_LOCATION
Variable Value: Matches the Pattern
Using: Regular Expressions
Pattern: ^(?:MyMasterSiteSubFolder/MySubDomain|(.*//[_a-zA-Z0-9-\.]*)?/MyMasterSiteSubFolder/MySubDomain)(.*)
[x] Ignore case
Action: Rewrite
Action Properties Value: {R:1}{R:2}
[x] Replace existing server variable value
[ ] Stop processing of subsequent rules

How do you see the client-side URL in ColdFusion?

Let's say, on a ColdFusion site, that the user has navigated to
http://www.example.com/sub1/
The server-side code typically used to tell you what URL the user is at, looks like:
http://#cgi.server_name##cgi.script_name#?#cgi.query_string#
however, "cgi.script_name" automatically includes the default cfm file for that folder- eg, that code, when parsed and expanded, is going to show us "http://www.example.com/sub1/index.cfm"
So, whether the user is visiting sub1/index.cfm or sub1/, the "cgi.script_name" var is going to include that "index.cfm".
The question is, how does one figure out which URL the user actually visited? This question is mostly for SEO-purposes- It's often preferable to 301 redirect "/index.cfm" to "/" to make sure there's only one URL for any piece of content- Since this is mostly for the benefit of spiders, javascript isn't an appropriate solution in this case. Also, assume one does not have access to isapi_rewrite or mod_rewrite- The question is how to achieve this within ColdFusion, specifically.
I suppose this won't be possible.
If the client requests "GET /", it will be translated by the web server to "GET /{whatever-default-file-exists-fist}" before ColdFusion even gets invoked. (This is necessary for the web server to know that ColdFusion has to be invoked in the first place!)
From ColdFusion's (or any application server's) perspective, the client requested "GET /index.cfm", and that's what you see in #CGI#.
As you've pointed out yourself, it would be possible to make a distinction by using a URL-rewriting tool. Since you specifically excluded that path, I can only say that you're out of luck here.
Not sure that it is possible using CF only, but you can make the trick using webserver's URL rewriting -- if you're using them, of course.
For Apache it can look this way. Say, we're using following mod_rewrite rule:
RewriteRule ^page/([0-9]+)/?$
index.cfm?page=$1&noindex=yes [L]
Now when we're trying to access URL http://website.com/page/10/ CGI shows:
QUERY_STRING page=10&noindex=yes
See the idea? Think same thing is possible when using IIS.
Hope this helps.
I do not think this is possible in CF. From my understanding, the webserver (Apache, IIS, etc) determines what default page to show, and requests it from CF. Therefore, CF does not know what the actual called page is.
Sergii is right that you could use URL rewrting to do this. If that is not available to you, you could use the fact that a specific page is given precedence in the list of default pages.
Let's assume that default.htm is the first page in the list of default pages. Write a generic default.htm that automatically forwards to index.cfm (or whatever). If you can adjust the list of defaults, you can have CF do a 301 redirect. If not, you can do a meta-refresh, or JS redirect, or somesuch in an HTML file.
I think this is possible.
Using GetHttpRequestData you will have access to all the HTTP headers.
Then the GET header in that should tell you what file the browser is requesting.
Try
<cfdump var="#GetHttpRequestData()#">
to see exactly what you have available to use.
Note - I don't have Coldfusion to hand to verify this.
Edit: Having done some more research it appears that GetHttpRequestData doesn't include the GET header. So this method probably won't work.
I am sure there is a way however - try dumping the CGI scope and see what you have.
If you are able to install ISAPI_rewrite (Assuming you're on IIS) - http://www.helicontech.com/isapi_rewrite/
It will insert a variable x-rewrite-url into the GetHttpRequestData() result structure which will either have / or /index.cfm depending on which URL was visited.
Martin

Resources