Downloading whole websites with k6 - load-testing

I'm currently evaluating whether k6 fits our load testing needs. We have a fairly traditional website architecture that uses Apache webservers with PHP und a MySQL database. Sending simple HTTP requests with k6 looks simple enough and I think we will be able to test all major functionality with it, as we don't rely on JavaScript that much and most pages are static.
However, I'm unsure how to deal with resources (stylesheets, images, etc.) that are referenced in the HTML that is returned in the requests. We need to load them as well, as this sometimes leads to database requests, which must be part of the load test.
Is there some out-of-the-box functionality in k6 that allows you to load all the resources like a browser would? I'm aware that k6 does NOT render the page and I don't need it to. I only need to request all the resources inside the HTML.

You basically have two options, both with their caveats:
Record your session - you can either export har directly from the browser as shown there or use an extension made for your browser here is firefox and chromes. Both should be usable without a k6 cloud account you just need to set them to download the har and it will automatically (and somewhat silently) download them when you hit stop. And then either use the in k6 har converter (which is deprecated, but still works) or the new har-to-k6 one which.
This method is particularly good if you have a lot of pages and/or resources and even works if you have a single page style of application as it just gets what the browser requested as a HAR and then transforms it into a script. And if there were no dynamic things that need to be inputed (username/password) the final script can be used as is most of the time.
The biggest problem with this approach is that if you add a css file you need to redo this whole exercise. This is even more problematic if you css/js file name change on each change or something like that. Which is what the next method is good for:
Use parseHTML and then find the elements you care about and make a request for them.
import http from "k6/http";
import {parseHTML} from "k6/html";
export default function() {
const res = http.get("https://stackoverflow.com");
const doc = parseHTML(res.body);
doc.find("link").toArray().forEach(function (item) {
console.log(item.attr("href"));
// make http gets for it
// or added them to an array and make one batch request
});
}
will produce
NFO[0001] https://cdn.sstatic.net/Sites/stackoverflow/img/favicon.ico?v=4f32ecc8f43d
INFO[0001] https://cdn.sstatic.net/Sites/stackoverflow/img/apple-touch-icon.png?v=c78bd457575a
INFO[0001] https://cdn.sstatic.net/Sites/stackoverflow/img/apple-touch-icon.png?v=c78bd457575a
INFO[0001] /opensearch.xml
INFO[0001] https://cdn.sstatic.net/Shared/stacks.css?v=53507c7c6e93
INFO[0001] https://cdn.sstatic.net/Sites/stackoverflow/primary.css?v=d3fa9a72fd53
INFO[0001] https://cdn.sstatic.net/Shared/Product/product.css?v=c9b2e1772562
INFO[0001] /feeds
INFO[0001] https://cdn.sstatic.net/Shared/Channels/channels.css?v=f9809e9ffa90
As you can see some of the urls are relative and not absolute so you will need to handle this. And in this example only some are css, so probably more filtering is needed.
The problem here is that you need to write the code and if you add a relative link or something else you need to handle it. Luckily k6 is scriptable so you can reuse the code :D.

I've followed Михаил Стойков suggestion and written my own function to load resources. You can set the way resources are loaded (batch or sequential gets with options.concurrentResourceLoading).
/**
* #param {http.RefinedResponse<http.ResponseType>} response
*/
export function getResources(response) {
const resources = [];
response
.html()
.find('*[href]:not(a)')
.each((index, element) => {
resources.push(element.attributes().href.value);
});
response
.html()
.find('*[src]:not(a)')
.each((index, element) => {
resources.push(element.attributes().src.value);
});
if (options.concurrentResourceLoading) {
const responses = http.batch(
resources.map((r) => {
return ['GET', resolveUrl(r, response.url), null, {
headers: createHeader()
}];
})
);
responses.forEach(() => {
check(response, {
'resource returns status 200': (r) => r.status === 200,
});
});
} else {
resources.forEach((r) => {
const res = http.get(resolveUrl(r, response.url), {
headers: createHeader(),
});
!check(res, {
'resource returns status 200': (r) => r.status === 200,
});
});
}
}

Related

NetworkFirst cache strategy not saving the html content of an url

I am first time using serviceworker using workbox, have no prior experience. I want when a user visits a url, the corresponding html must be saved to cache, so when the app is offline, the content of the url serves from the cache. But I don't see the content in the cache from browser devtool. What am I missing here?
import { registerRoute } from "workbox-routing";
import { precacheAndRoute } from "workbox-precaching";
import {skipWaiting} from 'workbox-core';
import {NetworkFirst, CacheFirst} from 'workbox-strategies';
// ...
function jsonResponse(value) {
return new Response(JSON.stringify(value), {
headers: { "Content-Type": "application/json" },
status: 202,
});
}
console.log("Workbox is loaded");
skipWaiting();
/* injection point for manifest files. */
precacheAndRoute(self.__WB_MANIFEST);
const pagesCache = new NetworkFirst({
cacheName: 'pages-cache',
});
registerRoute(
new RegExp("/projects/[-a-z0-9]+/edit$"),
args => {
console.log('Got in here..');
return pagesCache.handle(args);
}
);
so when I go to offline I see.
Your code is mixing both precaching and runtime caching parts from the Workbox library. Some parts of it might work, some not.
By saying "I want when a user visits a url, the corresponding html must be saved to cache, so when the app is offline, the content of the url serves from the cache." it seems that you want what is called a network first caching strategy. The simplest configuration for that using Workbox would be:
import {registerRoute} from 'workbox-routing';
import {NetworkFirst} from 'workbox-strategies';
registerRoute(
new RegExp('/'),
new NetworkFirst()
);
More info about the strategy here: https://developers.google.com/web/tools/workbox/modules/workbox-strategies#network_first_network_falling_back_to_cache
I also suggest you to read a primer on Service Workers themselves. You may find one eg. here https://developers.google.com/web/fundamentals/primers/service-workers/

Workbox redirect the clients page when resource is not cached and offline

Usually whenever I read a blog post about PWA's, the tutorial seems to just precache every single asset. But this seems to go against the app shell pattern a bit, which as I understand is: Cache the bare necessities (only the app shell), and runtime cache as you go. (Please correct me if I understood this incorrectly)
Imagine I have this single page application, it's a simple index.html with a web component: <my-app>. That <my-app> component sets up some routes which looks a little bit like this, I'm using Vaadin router and web components, but I imagine the problem would be the same using React with React Router or something similar.
router.setRoutes([
{
path: '/',
component: 'app-main', // statically loaded
},
{
path: '/posts',
component: 'app-posts',
action: () => { import('./app-posts.js');} // dynamically loaded
},
/* many, many, many more routes */
{
path: '/offline', // redirect here when a resource is not cached and failed to get from network
component: 'app-offline', // also statically loaded
}
]);
My app may have many many routes, and may get very large. I don't want to precache all those resources straight away, but only cache the stuff I absolutely need, so in this case: my index.html, my-app.js, app-main.js, and app-offline.js. I want to cache app-posts.js at runtime, when it's requested.
Setting up runtime caching is simple enough, but my problem arises when my user visits one of the potentially many many routes that is not cached yet (because maybe the user hasn't visited that route before, so the js file may not have loaded/cached yet), and the user has no internet connection.
What I want to happen, in that case (when a route is not cached yet and there is no network), is for the user to be redirected to the /offline route, which is handled by my client side router. I could easily do something like: import('./app-posts.js').catch(() => /* redirect user to /offline */), but I'm wondering if there is a way to achieve this from workbox itself.
So in a nutshell:
When a js file hasn't been cached yet, and the user has no network, and so the request for the file fails: let workbox redirect the page to the /offline route.
Option 1 (not always useful):
As far as I can see and according to this answer, you cannot open a new window or change the URL of the browser from within the service worker. However you can open a new window only if the clients.openWindow() function is called from within the notificationclick event.
Option 2 (hardest):
You could use the WindowClient.navigate method within the activate event of the service worker however is a bit trickier as you still need to check if the file requested exists in the cache or not.
Option 3 (easiest & hackiest):
Otherwise, you could respond with a new Request object to the offline page:
const cacheOnly = new workbox.strategies.CacheOnly();
const networkFirst = new workbox.strategies.NetworkFirst();
workbox.routing.registerRoute(
/\/posts.|\/articles/,
async args => {
const offlineRequest = new Request('/offline.html');
try {
const response = await networkFirst.handle(args);
return response || await cacheOnly.handle({request: offlineRequest});
} catch (error) {
return await cacheOnly.handle({request: offlineRequest})
}
}
);
and then rewrite the URL of the browser in your offline.html file:
<head>
<script>
window.history.replaceState({}, 'You are offline', '/offline');
</script>
</head>
The above logic in Option 3 will respond to the requested URL by using the network first. If the network is not available will fallback to the cache and even if the request is not found in the cache, will fetch the offline.html file instead. Once the offline.html file is parsed, the browser URL will be replaced to /offline.

Service worker to save form data when browser is offline

I am new to Service Workers, and have had a look through the various bits of documentation (Google, Mozilla, serviceworke.rs, Github, StackOverflow questions). The most helpful is the ServiceWorkers cookbook.
Most of the documentation seems to point to caching entire pages so that the app works completely offline, or redirecting the user to an offline page until the browser can redirect to the internet.
What I want to do, however, is store my form data locally so my web app can upload it to the server when the user's connection is restored. Which "recipe" should I use? I think it is Request Deferrer. Do I need anything else to ensure that Request Deferrer will work (apart from the service worker detector script in my web page)? Any hints and tips much appreciated.
Console errors
The Request Deferrer recipe and code doesn't seem to work on its own as it doesn't include file caching. I have added some caching for the service worker library files, but I am still getting this error when I submit the form while offline:
Console: {"lineNumber":0,"message":
"The FetchEvent for [the form URL] resulted in a network error response:
the promise was rejected.","message_level":2,"sourceIdentifier":1,"sourceURL":""}
My Service Worker
/* eslint-env es6 */
/* eslint no-unused-vars: 0 */
/* global importScripts, ServiceWorkerWare, localforage */
importScripts('/js/lib/ServiceWorkerWare.js');
importScripts('/js/lib/localforage.js');
//Determine the root for the routes. I.e, if the Service Worker URL is http://example.com/path/to/sw.js, then the root is http://example.com/path/to/
var root = (function() {
var tokens = (self.location + '').split('/');
tokens[tokens.length - 1] = '';
return tokens.join('/');
})();
//By using Mozilla’s ServiceWorkerWare we can quickly setup some routes for a virtual server. It is convenient you review the virtual server recipe before seeing this.
var worker = new ServiceWorkerWare();
//So here is the idea. We will check if we are online or not. In case we are not online, enqueue the request and provide a fake response.
//Else, flush the queue and let the new request to reach the network.
//This function factory does exactly that.
function tryOrFallback(fakeResponse) {
//Return a handler that…
return function(req, res) {
//If offline, enqueue and answer with the fake response.
if (!navigator.onLine) {
console.log('No network availability, enqueuing');
return enqueue(req).then(function() {
//As the fake response will be reused but Response objects are one use only, we need to clone it each time we use it.
return fakeResponse.clone();
});
}
//If online, flush the queue and answer from network.
console.log('Network available! Flushing queue.');
return flushQueue().then(function() {
return fetch(req);
});
};
}
//A fake response with a joke for when there is no connection. A real implementation could have cached the last collection of updates and keep a local model. For simplicity, not implemented here.
worker.get(root + 'api/updates?*', tryOrFallback(new Response(
JSON.stringify([{
text: 'You are offline.',
author: 'Oxford Brookes University',
id: 1,
isSticky: true
}]),
{ headers: { 'Content-Type': 'application/json' } }
)));
//For deletion, let’s simulate that all went OK. Notice we are omitting the body of the response. Trying to add a body with a 204, deleted, as status throws an error.
worker.delete(root + 'api/updates/:id?*', tryOrFallback(new Response({
status: 204
})));
//Creation is another story. We can not reach the server so we can not get the id for the new updates.
//No problem, just say we accept the creation and we will process it later, as soon as we recover connectivity.
worker.post(root + 'api/updates?*', tryOrFallback(new Response(null, {
status: 202
})));
//Start the service worker.
worker.init();
//By using Mozilla’s localforage db wrapper, we can count on a fast setup for a versatile key-value database. We use it to store queue of deferred requests.
//Enqueue consists of adding a request to the list. Due to the limitations of IndexedDB, Request and Response objects can not be saved so we need an alternative representations.
//This is why we call to serialize().`
function enqueue(request) {
return serialize(request).then(function(serialized) {
localforage.getItem('queue').then(function(queue) {
/* eslint no-param-reassign: 0 */
queue = queue || [];
queue.push(serialized);
return localforage.setItem('queue', queue).then(function() {
console.log(serialized.method, serialized.url, 'enqueued!');
});
});
});
}
//Flush is a little more complicated. It consists of getting the elements of the queue in order and sending each one, keeping track of not yet sent request.
//Before sending a request we need to recreate it from the alternative representation stored in IndexedDB.
function flushQueue() {
//Get the queue
return localforage.getItem('queue').then(function(queue) {
/* eslint no-param-reassign: 0 */
queue = queue || [];
//If empty, nothing to do!
if (!queue.length) {
return Promise.resolve();
}
//Else, send the requests in order…
console.log('Sending ', queue.length, ' requests...');
return sendInOrder(queue).then(function() {
//Requires error handling. Actually, this is assuming all the requests in queue are a success when reaching the Network.
// So it should empty the queue step by step, only popping from the queue if the request completes with success.
return localforage.setItem('queue', []);
});
});
}
//Send the requests inside the queue in order. Waiting for the current before sending the next one.
function sendInOrder(requests) {
//The reduce() chains one promise per serialized request, not allowing to progress to the next one until completing the current.
var sending = requests.reduce(function(prevPromise, serialized) {
console.log('Sending', serialized.method, serialized.url);
return prevPromise.then(function() {
return deserialize(serialized).then(function(request) {
return fetch(request);
});
});
}, Promise.resolve());
return sending;
}
//Serialize is a little bit convolved due to headers is not a simple object.
function serialize(request) {
var headers = {};
//for(... of ...) is ES6 notation but current browsers supporting SW, support this notation as well and this is the only way of retrieving all the headers.
for (var entry of request.headers.entries()) {
headers[entry[0]] = entry[1];
}
var serialized = {
url: request.url,
headers: headers,
method: request.method,
mode: request.mode,
credentials: request.credentials,
cache: request.cache,
redirect: request.redirect,
referrer: request.referrer
};
//Only if method is not GET or HEAD is the request allowed to have body.
if (request.method !== 'GET' && request.method !== 'HEAD') {
return request.clone().text().then(function(body) {
serialized.body = body;
return Promise.resolve(serialized);
});
}
return Promise.resolve(serialized);
}
//Compared, deserialize is pretty simple.
function deserialize(data) {
return Promise.resolve(new Request(data.url, data));
}
var CACHE = 'cache-only';
// On install, cache some resources.
self.addEventListener('install', function(evt) {
console.log('The service worker is being installed.');
// Ask the service worker to keep installing until the returning promise
// resolves.
evt.waitUntil(precache());
});
// On fetch, use cache only strategy.
self.addEventListener('fetch', function(evt) {
console.log('The service worker is serving the asset.');
evt.respondWith(fromCache(evt.request));
});
// Open a cache and use `addAll()` with an array of assets to add all of them
// to the cache. Return a promise resolving when all the assets are added.
function precache() {
return caches.open(CACHE).then(function (cache) {
return cache.addAll([
'/js/lib/ServiceWorkerWare.js',
'/js/lib/localforage.js',
'/js/settings.js'
]);
});
}
// Open the cache where the assets were stored and search for the requested
// resource. Notice that in case of no matching, the promise still resolves
// but it does with `undefined` as value.
function fromCache(request) {
return caches.open(CACHE).then(function (cache) {
return cache.match(request).then(function (matching) {
return matching || Promise.reject('no-match');
});
});
}
Here is the error message I am getting in Chrome when I go offline:
(A similar error occurred in Firefox - it falls over at line 409 of ServiceWorkerWare.js)
ServiceWorkerWare.prototype.executeMiddleware = function (middleware,
request) {
var response = this.runMiddleware(middleware, 0, request, null);
response.catch(function (error) { console.error(error); });
return response;
};
this is a little more advanced that a beginner level. But you will need to detect when you are offline or in a Li-Fi state. Instead of POSTing data to an API or end point you need to queue that data to be synched when you are back on line.
This is what the Background Sync API should help with. However, it is not supported across the board just yet. Plus Safari.........
So maybe a good strategy is to persist your data in IndexedDB and when you can connect (background sync fires an event for this) you would then POST the data. It gets a little more complex for browsers that don't support service workers (Safari) or don't yet have Background Sync (that will level out very soon).
As always design your code to be a progressive enhancement, which can be tricky, but worth it in the end.
Service Workers tend to cache the static HTML, CSS, JavaScript, and image files.
I need to use PouchDB and sync it with CouchDB
Why CouchDB?
CouchDB is a NoSQL database consisting of a number of Documents
created with JSON.
It has versioning (each document has a _rev
property with the last modified date)
It can be synchronised with
PouchDB, a local JavaScript application that stores data in local
storage via the browser using IndexedDB. This allows us to create
offline applications.
The two databases are both “master” copies of
the data.
PouchDB is a local JavaScript implementation of CouchDB.
I still need a better answer than my partial notes towards a solution!
Yes, this type of service worker is the correct one to use for saving form data offline.
I have now edited it and understood it better. It caches the form data, and loads it on the page for the user to see what they have entered.
It is worth noting that the paths to the library files will need editing to reflect your local directory structure, e.g. in my setup:
importScripts('/js/lib/ServiceWorkerWare.js');
importScripts('/js/lib/localforage.js');
The script is still failing when offline, however, as it isn't caching the library files. (Update to follow when I figure out caching)
Just discovered an extra debugging tool for service workers (apart from the console): chrome://serviceworker-internals/. In this, you can start or stop service workers, view console messages, and the resources used by the service worker.

How do I use ServiceWorker without a separate JS file?

We create service workers by
navigator.serviceWorker.register('sw.js', { scope: '/' });
We can create new Workers without an external file like this,
var worker = function() { console.log('worker called'); };
var blob = new Blob( [ '(' , worker.toString() , ')()' ], {
type: 'application/javascript'
});
var bloburl = URL.createObjectURL( blob );
var w = new Worker(bloburl);
With the approach of using blob to create ServiceWorkers, we will get a Security Error as the bloburl would be blob:chrome-extension..., and the origin won't be supported by Service Workers.
Is it possible to create a service worker without external file and use the scope as / ?
I would strongly recommend not trying to find a way around the requirement that the service worker implementation code live in a standalone file. There's a very important of the service worker lifecycle, updates, that relies on your browser being able to fetch your registered service worker JavaScript resource periodically and do a byte-for-byte comparison to see if anything has changed.
If something has changed in your service worker code, then the new code will be considered the installing service worker, and the old service worker code will eventually be considered the redundant service worker as soon as all pages that have the old code registered and unloaded/closed.
While a bit difficult to wrap your head around at first, understanding and making use of the different service worker lifecycle states/events are important if you're concerned about cache management. If it weren't for this update logic, once you registered a service worker for a given scope once, it would never give up control, and you'd be stuck if you had a bug in your code/needed to add new functionality.
One hacky way is to use the the same javascript file understand the context and act as a ServiceWorker as well as the one calling it.
HTML
<script src="main.js"></script>
main.js
if(!this.document) {
self.addEventListener('install', function(e) {
console.log('service worker installation');
});
} else {
navigator.serviceWorker.register('main.js')
}
To prevent maintaining this as a big file main.js, we could use,
if(!this.document) {
//service worker js
importScripts('sw.js');
else {
//loadscript document.js by injecting a script tag
}
But it might come back to using a separate sw.js file for service worker to be a better solution. This would be helpful if one'd want a single entry point to the scripts.

Modify URL before loading page in firefox

I want to prefix URLs which match my patterns. When I open a new tab in Firefox and enter a matching URL the page should not be loaded normally, the URL should first be modified and then loading the page should start.
Is it possible to modify an URL through a Mozilla Firefox Addon before the page starts loading?
Browsing the HTTPS Everywhere add-on suggests the following steps:
Register an observer for the "http-on-modify-request" observer topic with nsIObserverService
Proceed if the subject of your observer notification is an instance of nsIHttpChannel and subject.URI.spec (the URL) matches your criteria
Create a new nsIStandardURL
Create a new nsIHttpChannel
Replace the old channel with the new. The code for doing this in HTTPS Everywhere is quite dense and probably much more than you need. I'd suggest starting with chrome/content/IOUtils.js.
Note that you should register a single "http-on-modify-request" observer for your entire application, which means you should put it in an XPCOM component (see HTTPS Everywhere for an example).
The following articles do not solve your problem directly, but they do contain a lot of sample code that you might find helpful:
https://developer.mozilla.org/en/Setting_HTTP_request_headers
https://developer.mozilla.org/en/XUL_School/Intercepting_Page_Loads
Thanks to Iwburk, I have been able to do this.
We can do this my overriding the nsiHttpChannel with a new one, doing this is slightly complicated but luckily the add-on https-everywhere implements this to force a https connection.
https-everywhere's source code is available here
Most of the code needed for this is in the files
IO Util.js
ChannelReplacement.js
We can work with the above files alone provided we have the basic variables like Cc,Ci set up and the function xpcom_generateQI defined.
var httpRequestObserver =
{
observe: function(subject, topic, data) {
if (topic == "http-on-modify-request") {
var httpChannel = subject.QueryInterface(Components.interfaces.nsIHttpChannel);
var requestURL = subject.URI.spec;
if(isToBeReplaced(requestURL)) {
var newURL = getURL(requestURL);
ChannelReplacement.runWhenPending(subject, function() {
var cr = new ChannelReplacement(subject, ch);
cr.replace(true,null);
cr.open();
});
}
}
},
get observerService() {
return Components.classes["#mozilla.org/observer-service;1"]
.getService(Components.interfaces.nsIObserverService);
},
register: function() {
this.observerService.addObserver(this, "http-on-modify-request", false);
},
unregister: function() {
this.observerService.removeObserver(this, "http-on-modify-request");
}
};
httpRequestObserver.register();
The code will replace the request not redirect.
While I have tested the above code well enough, I am not sure about its implementation. As far I can make out, it copies all the attributes of the requested channel and sets them to the channel to be overridden. After which somehow the output requested by original request is supplied using the new channel.
P.S. I had seen a SO post in which this approach was suggested.
You could listen for the page load event or maybe the DOMContentLoaded event instead. Or you can make an nsIURIContentListener but that's probably more complicated.
Is it possible to modify an URL through a Mozilla Firefox Addon before the page starts loading?
YES it is possible.
Use page-mod of the Addon-SDK by setting contentScriptWhen: "start"
Then after completely preventing the document from getting parsed you can either
fetch a different document from the same domain and inject it in the page.
after some document.URL processing do a location.replace() call
Here is an example of doing 1. https://stackoverflow.com/a/36097573/6085033

Resources