Extract images from playwright page without requesting them again? - playwright

Let's say I've requested a page and it's fully loaded. Is it possible to save the images from the rendered/loaded page without sending another request for each image? This would be to avoid collecting the individual image URLs and hammering the server again for every image.
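One way to approach this (a minimal sketch, not an official recipe, using the Node version of Playwright) is to subscribe to the page's response events and write the image bodies to disk as they arrive; reading a response body reuses the bytes Playwright already received, so no second request goes out. The target URL and the downloaded-images directory below are placeholders:

```typescript
import { chromium, type Response } from 'playwright';
import { promises as fs } from 'fs';
import * as path from 'path';

async function saveImagesWhileLoading(url: string) {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  const saves: Promise<void>[] = [];

  page.on('response', (response: Response) => {
    if (response.request().resourceType() !== 'image') return;
    saves.push(
      response
        .body() // bytes already captured for this response; no new request is made
        .then((buf) => {
          const name = path.basename(new URL(response.url()).pathname) || 'image';
          return fs.writeFile(path.join('downloaded-images', name), buf);
        })
        .catch(() => {
          // some responses (e.g. redirects) have no retrievable body; skip them
        })
    );
  });

  await fs.mkdir('downloaded-images', { recursive: true });
  await page.goto(url, { waitUntil: 'networkidle' });
  await Promise.all(saves);
  await browser.close();
}

saveImagesWhileLoading('https://example.com'); // placeholder URL
```

Duplicate file names will overwrite each other in this sketch; a real script would want to derive unique names from the full URL.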

Related

html2canvas: use cached images from Service Worker

TL;DR: How can images processed by html2canvas be cached using a ServiceWorker? Why isn't the existing ServiceWorker cache being used?
I'm writing a PWA that also can be used offline. It's an application that is used for creating grids of custom images. Images are coming from an external API and I cache these requests to the API using Workbox/ServiceWorker.
Offline capabilities are working great, but when using html2canvas to create thumbnails of the image grids, it only works online. html2canvas seems to create an iframe copy of the page in order to take the screenshot, and for all images in that iframe new requests are made; the existing ServiceWorker cache isn't used.
When I open my app with a grid of 2 images from the API, the network traffic shows:
request (1) is the images loaded by the app, served from the ServiceWorker
requests (2-4) are three attempts by html2canvas to load the images; the last one succeeds via the ServiceWorker, but the images are still not visible in the screenshot.
Any ideas for making html2canvas usable offline using either the existing ServiceWorker cache or another one are welcome.
I'm using html2canvas 1.4.1.
I have never used html2canvas, so I might be wrong, but if it is creating an <iframe>, then keep in mind that an iframe establishes a new browsing context, and that the communication between browsing contexts is severely constrained for security reasons.
The iframe created by html2canvas should be on the same origin as your PWA, so maybe you could try using the BroadcastChannel API to let these browsing contexts (i.e. the iframe and the service worker) communicate with each other.
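For what it's worth, here is a minimal sketch of that idea, assuming both contexts are same-origin; the channel name, message shape, and cache name are made up for illustration:

```typescript
// In the page (or in the iframe html2canvas creates): ask the service worker to
// make sure an image is cached before the screenshot is taken.
const channel = new BroadcastChannel('image-cache');        // arbitrary channel name
channel.onmessage = (event: MessageEvent) => {
  console.log('service worker replied:', event.data);
};
channel.postMessage({ type: 'warm-cache', url: '/api/images/42' }); // hypothetical message

// In the service worker (same origin): listen on the same channel and answer.
const swChannel = new BroadcastChannel('image-cache');
swChannel.onmessage = async (event: MessageEvent) => {
  if (event.data?.type === 'warm-cache') {
    const cache = await caches.open('api-images');           // example cache name
    await cache.add(event.data.url);                         // fetch and store the image
    swChannel.postMessage({ type: 'cached', url: event.data.url });
  }
};
```

The two halves live in different files in practice; they only share the channel name.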
See also:
Cache iframe request with ServiceWorker

Resize S3 Images on the Fly with AWS Lambda, Amazon API Gateway - too many redirects for HTML IMG tag

I've followed this AWS image resizing blog post to resize images in my S3 bucket on the fly and return them when requested using the bucket’s static website hosting endpoint. It works fine for me when I type the URL into a web browser address bar.
I can see 2 redirects happening when I do this (if the resized image hasn't been generated yet):
First, the S3 endpoint returns a 404 redirect to the API Gateway URL.
Then the Lambda function is called to generate the resized image, and API Gateway redirects the browser back to the S3 endpoint URL. Now that the resized image exists, it gets displayed.
What I'm aiming to do is display the dynamically resized images in web pages, so I've simply tried to put the endpoint URL for a resized image in img tags, e.g. <img src="http://YOUR_BUCKET_WEBSITE_HOSTNAME_HERE/300x300/blue_marble.jpg">
When I do this, nothing shows up in my web pages. In the browser console I can see an error message: the browser is trying to GET the API Gateway URL and failing with ERR_TOO_MANY_REDIRECTS.
Is there a way for me to display the resized images in web pages?
If not, what would I need to do to modify this approach to resize images to predefined sizes upon upload to S3 instead?
Thanks!
Make sure that when you configure the URL in your environment variables you do not have a trailing / at the end, which will cause looping in your flow.
e.g. http://BUCKET_NAME.s3-website.us-east-2.amazonaws.com (this is good)
e.g. http://BUCKET_NAME.s3-website.us-east-2.amazonaws.com/ (this is bad; remove the trailing slash)
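If you'd rather not depend on remembering that, here is a small sketch of defensively normalizing the value inside the Lambda before building the redirect; the URL environment variable and the example key follow the blog post and the question above, so treat the names as assumptions:

```typescript
// Strip any trailing slashes from the configured bucket website URL so a stray
// slash in the env var can't leak into the redirect URL (the looping described above).
const base = (process.env.URL ?? '').replace(/\/+$/, '');
const key = '300x300/blue_marble.jpg'; // example key from the question

const location = `${base}/${key}`;
console.log(location);
// http://BUCKET_NAME.s3-website.us-east-2.amazonaws.com/300x300/blue_marble.jpg
```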

What Is A Browser Cache? What does it store from a webpage's data?

Whenever I have an issue with a website, one of the first suggestions I will hear is “try to clear your browser cache” along with “and delete your cookies“. So what is this browser cache? What does it store and what is it good for?
I have googled but didn't find a proper answer. I'd appreciate it if anyone could help with this.
A browser cache "caches" (as in, keeps local copies of) data downloaded from the internet. The next time your browser needs the same data it can get it from the cache (fast) instead of downloading it over the internet (slow).
The problem is that cached data can be old. For example, imagine the browser cached www.nytimes.com today and 24 hours later you visited www.nytimes.com again. If the browser loaded the cached data it would be old news.
So there are headers (metadata) that servers send to browsers telling them how long they should cache something (if at all).
What the browser caches are generally "requests". In other words, if your browser asks for "http://foo.com/bar.html" for the first time, the browser "requests" that "foo.com" send it "bar.html". If the headers from "foo.com" are set a certain way, the browser will then save a local copy of "bar.html". If you request the same thing again, the browser may load "bar.html" from its cache. I say "may" because it depends on the headers sent by the server. The server can say how long to cache (say 10 minutes, 10 hours, 10 days, etc.) or it can say "don't cache this at all, always download the newest version".
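As a concrete illustration of those server headers, here is a minimal sketch of an HTTP server in Node/TypeScript; the paths and durations are arbitrary:

```typescript
import { createServer } from 'http';

createServer((req, res) => {
  if (req.url === '/bar.html') {
    // "cache this for 10 minutes": the browser may reuse its local copy
    // without contacting the server again until max-age expires.
    res.writeHead(200, {
      'Content-Type': 'text/html',
      'Cache-Control': 'max-age=600',
    });
    res.end('<h1>cacheable page</h1>');
  } else {
    // "don't cache this at all, always download the newest version"
    res.writeHead(200, {
      'Content-Type': 'text/plain',
      'Cache-Control': 'no-store',
    });
    res.end('always fetched fresh');
  }
}).listen(8080);
```

Reload /bar.html twice and the second load should come from the cache; the other path should hit the server every time.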
If you open your browser's dev tools (Chrome, for example) and look at the Network tab (it may be named differently in other browsers), then load the page again, you can see all the requests. You'll also notice which ones were loaded from the cache.
If you click on a request you can see the metadata from both the browser (request headers) and the server (response headers).
The reason clearing the cache often fixes things is that for some reason the server said it was OK to cache (possibly a bug), or the browser used its cached version, even though the data on the server has actually been updated. The browser, doing what the server told it to do, is using its copy from the cache rather than the newer version that is actually needed. There might also, from time to time, be bugs in the browser itself related to caching.
When everything is working correctly it's great, but if one thing or another is misconfigured or sending the wrong headers, the browser can end up loading old data from the cache instead of downloading the newest data. Clearing your cache effectively forces the browser to download the data again.
You can find the details of what the various caching headers do in the HTTP caching documentation.
Browser caches are not mere rubbish bins but a mechanism to speed up the way we browse the web. Each website we visit has certain common elements like logos, navigation buttons, GIF animations, and script files. It doesn't make sense for the browser to download each element (also commonly called Temporary Internet Files) every time we hop from one page to another and back.
The page elements are downloaded when we first visit a website, and the browser checks its cache folder for copies as we browse the site. If a copy exists, the browser doesn't download the same file again, which significantly speeds up browsing.
For more info:
http://www.guidingtech.com/8925/what-are-browser-cache-cookies-does-clearing-them-help/
https://en.wikipedia.org/wiki/Cache_(computing)
First result in Google, this is the proper answer, but I will summarize =]
1) What is a browser cache?
A cache is a component that stores data so future requests for that data can be served faster; the data stored in a cache might be the result of an earlier computation, or a duplicate of data stored elsewhere.
2) What does it store?
Web browsers and web proxy servers employ web caches to store previous responses from web servers, such as web pages and images.
3) What is it good for?
Web caches reduce the amount of information that needs to be transmitted across the network, as information previously stored in the cache can often be re-used. This reduces bandwidth and processing requirements of the web server, and helps to improve responsiveness for users of the web.

Heroku and Cloudfront cache issues

My Heroku application resizes images into thumbnails; these thumbnails are supposed to be stored by CloudFront.
Requesting a thumb from Heroku causes an image to be generated, which takes time and should only be done once for each image.
Our application always accesses these images through CloudFront, so each image should be generated once, then stored by CloudFront, which would serve it to us for as long as the cache is considered valid.
We receive emails every time a thumb is generated. The problem is that when we try to access those thumbs, our Heroku server is asked to generate the thumbnail again; only then is the thumb properly cached, and we can freely access it without any traffic being sent to our server.
Does anyone know why such a thing would happen?
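No answer is recorded here, but since CloudFront decides how long to keep an object partly from the Cache-Control (or Expires) header the origin returns, one thing worth checking is what the Heroku thumbnail endpoint actually sends. A hedged Express-style sketch with a made-up route and helper:

```typescript
import express from 'express';

const app = express();

// Hypothetical thumbnail route: return the generated image with an explicit
// Cache-Control header so CloudFront is clearly allowed to keep serving it.
app.get('/thumbs/:id', async (req, res) => {
  const png = await generateThumbnail(req.params.id); // assumed helper
  res.set('Cache-Control', 'public, max-age=31536000'); // cache for up to a year
  res.type('image/png').send(png);
});

// Placeholder so the sketch is self-contained; the real app does the resizing here.
async function generateThumbnail(id: string): Promise<Buffer> {
  return Buffer.from(`fake image ${id}`);
}

app.listen(3000);
```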

Can I Tweet an image with just a URL?

I'm setting up the back end for an Android/iOS app that, among other things, allows users to share an image via Twitter. It's hosted on Heroku, which has no local image hosting, so the images are hosted elsewhere.
It looks like if you want to tweet an image you're supposed to POST to /statuses/update_with_media and send the image as multi-part data. But I don't have the images stored locally, so I would have to copy the image over to temp storage on Heroku, POST it to Twitter, and then delete it, which seems... inefficient.
Is there any way I can use Twitter's API to tweet an image and only supply the URL for the image?
It does not look like it's possible to send Twitter a link via their API, presumably because they would then have to download the image themselves. You could upload the image to a third party and link to that, but you have the same problem in that case.
You shouldn't need to copy the file over as such, though: you could read the file into memory and serialize it into multipart form data to send to Twitter.
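A minimal sketch of that idea, assuming Node 18+ (for the global fetch, FormData, and Blob); the endpoint and the media[] field name follow the old statuses/update_with_media API mentioned above, and the OAuth header is assumed to be built elsewhere, so double-check those details against Twitter's docs:

```typescript
// Pull the remote image into memory and forward it to Twitter as multipart
// form data, without ever writing a temporary file on Heroku.
async function tweetImageFromUrl(imageUrl: string, status: string, oauthHeader: string) {
  const imageRes = await fetch(imageUrl);
  const bytes = Buffer.from(await imageRes.arrayBuffer());

  const form = new FormData();
  form.append('status', status);
  form.append('media[]', new Blob([bytes]), 'image.jpg'); // field name per the legacy API

  const res = await fetch('https://api.twitter.com/1.1/statuses/update_with_media.json', {
    method: 'POST',
    headers: { Authorization: oauthHeader }, // assumed: OAuth 1.0a header built elsewhere
    body: form,
  });
  return res.json();
}
```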
Do you have any code to show?
