I browse an infinite scroll page using Puppeteer, but this page is very long. The problem is that the memory used by Puppeteer grows far too much and, after a while, it crashes. I was wondering if there is a good way to free up memory during the scroll.
For example, would it be possible to pause every minute to remove the HTML that has been loaded so far and copy it to the hard disk? That way, after I'm done scrolling, I have all the HTML in a file and can easily work with it. Is it possible to do that? If yes, how? If no, what would be a viable solution?
I would wager that the approach you outline would work. The trick will be to remove nodes only from the list that is being added to. The implementation might look something like this:
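await page.addScriptTag({ url: "https://code.jquery.com/jquery-3.2.1.min.js" });
const scrapedData = [];
while (true) {
  const newData = await page.evaluate(async () => {
    const items = $(".some-list").children();
    // Get data: the outer HTML of every item not scraped in an earlier pass
    const tempData = items.not("[data-scraped]").toArray().map(elm => {
      const html = elm.outerHTML;
      elm.setAttribute("data-scraped", "true"); // mark so it is not scraped twice
      return html;
    });
    // Free memory: keep the first 20 items and remove the rest
    items.slice(20).remove();
    // Scroll to the bottom and give the page time to load new content
    // (1000 ms is a guess; tune it for the site)
    window.scrollTo(0, document.body.scrollHeight);
    await new Promise(resolve => setTimeout(resolve, 1000));
    return tempData;
  });
  scrapedData.push(...newData);
  // Stop once scrolling no longer produces any new items
  if (newData.length === 0) {
    break;
  }
}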
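The loop above still accumulates everything in scrapedData inside the Node process. To actually move the HTML to disk while scrolling, as the question proposes, each batch returned by page.evaluate could be appended to a file instead of pushed into an array. A minimal sketch using Node's built-in fs module; the helper names flushBatch and readBatches and the JSON Lines layout (one JSON string per line) are assumptions, not part of the original answer:

```javascript
const fs = require("fs");

// Append one batch of scraped HTML fragments to a file, one JSON-encoded
// string per line, so the Node process never holds every item in memory.
// JSON encoding escapes newlines inside the HTML, keeping one item per line.
function flushBatch(filePath, batch) {
  if (batch.length === 0) return 0;
  const lines = batch.map(html => JSON.stringify(html)).join("\n") + "\n";
  fs.appendFileSync(filePath, lines, "utf8");
  return batch.length;
}

// Read every fragment back once scrolling is finished.
function readBatches(filePath) {
  return fs
    .readFileSync(filePath, "utf8")
    .split("\n")
    .filter(line => line.length > 0)
    .map(line => JSON.parse(line));
}
```

In the loop, scrapedData.push(...newData) would become something like flushBatch("scraped.jsonl", newData), with the file name chosen freely.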
The scenario I'm trying to test is:
Get an array of elements from the page and read the url property of that element
Test 1 - that the array is not empty
For Each of these elements
Test 2 (one test per element): navigate to the URL read above and test that there are no errors on the page
So far I've got the code working, but only by including the for loop in the second test, which registers everything above as only two tests and does not run the second set (i.e. the for loop) in parallel.
When I try to make the second set of tests parallel (as I've done in another test that's based on a static array of pages), I end up with the code below, but the second set of tests is not even detected: if I comment out Test 1, "no tests found" is reported.
Here's the code structure I'm trying to use. As I said above, it is based on another test that uses a static array of elements and does seem to work as expected. The main difference from the other test is that everything is within the loop and there is no beforeAll. I'm assuming that's the crux of the issue.
import { test, expect, chromium } from '@playwright/test';
const pages = [];
test.describe('Pages Load', () => {
test.beforeAll(async () => {
const browser = await chromium.launch();
const page = await browser.newPage();
await page.goto(url);
const allPages = await page
.locator('container id')
.evaluateAll(($elements) => {
return $elements.map((element) => {
// Return the text, not the element: DOM nodes can't be serialized back to the test
const pageTitle = element.querySelector('element id')?.textContent;
const pageURL = element.getAttribute(
'attribute id'
);
return {
title: pageTitle,
url: pageURL,
};
});
});
await browser.close();
allPages.forEach(($page) => {
pages.push({
title: $page.title,
url: $page.url,
});
});
});
test('Pages exist', () => {
expect(pages.length).toBeGreaterThan(0);
});
test.describe('No errors on pages', () => {
pages.forEach((page) => {
test.describe(`${page.title}`, () => {
test('No errors', async () => {
// Test goes here
});
});
});
});
});
Update:
I've had a bit more of a play with this and while I know why this code isn't working, I still have no idea how to resolve the issue.
Couple of things I've noticed (and these are just theories/assumptions based on what I've observed):
Firstly, the above isn't working because the array isn't populated by the time execution gets to the loop: the beforeAll only appears to run with each test, not with the test.describe. While this is just a theory (I can't see anything in the documentation proving or disproving it), it does explain why the second test is ignored: array.length is 0.
Secondly, I tried extracting the functionality into a separate function outside of the test, as suggested here, but this had no luck either: when it gets to setting up the browser with const browser = await chromium.launch(); in an external function, Playwright seems to exit without throwing any errors. My assumption is that the browser setup needs to be within a test context, though that is just a theory.
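The first theory matches how Playwright processes a test file: test.describe callbacks run while the file is loaded to collect tests, whereas beforeAll hooks run later, when the collected tests execute. So pages.forEach sees an empty array and registers nothing. The sketch below is a toy model written only to show that ordering; it is NOT Playwright's implementation, and createToyRunner is a hypothetical name:

```javascript
// Toy two-phase runner (not Playwright's code): describe() bodies run during
// collection; beforeAll() hooks are only stored and run later, in run().
function createToyRunner() {
  const tests = [];
  const beforeAllHooks = [];
  return {
    describe(name, body) {
      body(); // executed immediately, during collection
    },
    beforeAll(hook) {
      beforeAllHooks.push(hook); // only remembered during collection
    },
    test(name, fn) {
      tests.push({ name, fn });
    },
    async run() {
      for (const hook of beforeAllHooks) await hook();
      const executed = [];
      for (const t of tests) {
        await t.fn();
        executed.push(t.name);
      }
      return executed;
    },
  };
}

// The question's structure, expressed against the toy runner.
const runner = createToyRunner();
const pages = [];
runner.describe("Pages Load", () => {
  runner.beforeAll(async () => {
    pages.push({ title: "Home" }, { title: "About" }); // too late for collection
  });
  runner.test("Pages exist", () => {});
  runner.describe("No errors on pages", () => {
    pages.forEach((page) => runner.test(page.title, () => {})); // pages is still []
  });
});

runner.run().then((executed) => console.log(executed)); // logs [ 'Pages exist' ]
```

It follows that the list has to exist before the file declares its tests. One option is to fetch it in a globalSetup script configured in playwright.config, write it to a JSON file, and read that file synchronously at the top level of the spec.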
Any advice on how I can get the desired results would be greatly appreciated. Many thanks in advance.
I am trying to show the loading animation during a function call that takes some time. The function call is searching a large array that is already loaded. After the search, matching items are inserted into a table. The table is cleared prior to starting the search.
The problem is the animation only displays during the brief moment when the page updates.
Here is my code:
var interval = setInterval(function ()
{
$.mobile.loading('show');
clearInterval(interval);
}, 1);
DoSearch(term, function ()
{
var interval = setInterval(function ()
{
$.mobile.loading('hide');
clearInterval(interval);
}, 1000);
});
//The search function looks like this (detail omitted for brevity):
function DoSearch(term, callback)
{
$("table#tableICD tbody").html('');
// also tried:
/*$("table#tableICD tbody")
.html('')
.table()
.closest("table#tableICD")
.table("refresh")
.trigger("create");*/
var tr = '';
$.each(codes, function (key, value)
{
// determine which items match and add them as table rows to 'tr'
});
$("table#tableICD tbody")
.append(tr)
.closest("table#tableICD")
.table("refresh")
.trigger("create");
callback();
}
The search works properly and adds the rows to the table. I have two unexpected behaviors:
The table does not clear until the search is complete. I have tried adding .table("refresh").trigger("create") to the line where I set the tbody html to an empty string. This does not help. (see commented line in the code)
As I mentioned, the animation displays briefly while the screen is refreshing. (Notice I set the interval to 1000 in the second setInterval function just so I could even see it.)
The suggestions I have read so far are to use setInterval instead of calling $.mobile.loading directly, which I have done, and to place the search in a function and use a callback, which I have also done.
Any ideas?
Let me give you a few suggestions; they probably won't solve all your issues, but they may help you find a solution.
jQuery Mobile is buggy, and for some features we will never know whether they were intended to work like that or are just plain bugs.
You can call $.mobile.loading('show') on its own only in the jQuery Mobile pageshow event. In any other case, you need to do it in an interval or a timeout.
It is better to do it in a timeout, mostly because it takes less code. Here is an example I made several years ago: http://jsfiddle.net/Gajotres/Zr7Gf/
$(document).on('pagebeforecreate', '[data-role="page"]', function(){
setTimeout(function(){
$.mobile.loading('show');
},1);
});
$(document).on('pageshow', '[data-role="page"]', function(){
// You do not need a timeout for pageshow. I'm using it so you can see the loader is actually working
setTimeout(function(){
$.mobile.loading('hide');
},300);
});
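Why the timeout matters: the browser only repaints after the currently running JavaScript returns control to it, so a loader shown immediately before a long synchronous search is never painted. A sketch applying the same timeout idea to the search itself; runWithLoader, showLoader, work and hideLoader are hypothetical names standing in for $.mobile.loading('show'), the search and $.mobile.loading('hide'), and the 50 ms default delay is an arbitrary assumption (a single timeout usually, but not always, gives the browser a chance to paint):

```javascript
// Show a loader, yield to the event loop so the browser can repaint,
// then run slow synchronous work and hide the loader afterwards.
function runWithLoader(showLoader, work, hideLoader, delayMs = 50) {
  showLoader();
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      try {
        resolve(work()); // the blocking work starts only after the yield
      } catch (err) {
        reject(err);
      } finally {
        hideLoader(); // runs whether work() succeeded or threw
      }
    }, delayMs);
  });
}
```

With the question's code this might be called as runWithLoader(function () { $.mobile.loading('show'); }, function () { DoSearch(term, function () {}); }, function () { $.mobile.loading('hide'); }). This only gets the loader on screen; the page still stays unresponsive while the search runs.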
It's difficult to enhance any jQuery Mobile markup in real time after a page has loaded. So my advice is to first generate the new table content, then clear the table, and then update the markup using .table("refresh").
Refresh the table only once; never do it several times in a row. It is a very resource-heavy method, and it will take a very long time if you run it for every row.
If you are searching on keypress in the input box and then showing the results in a table, that is the least efficient approach in jQuery Mobile. jQM is slow enough that it is much better to use the listview component, which is the least resource-intensive.
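The "build first, touch the DOM once" advice can be sketched as a function that turns the matches into a single HTML string. The matching rule is omitted in the question, so the case-insensitive substring test below is an assumption, as is codes being an object that maps a code to its description; buildRows and escapeHtml are hypothetical helper names:

```javascript
// Escape text before putting it inside HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build every matching row as one string so the table is written once.
// Assumed match rule: case-insensitive substring on the code or description.
function buildRows(codes, term) {
  const needle = term.toLowerCase();
  let rows = "";
  for (const [code, description] of Object.entries(codes)) {
    if (code.toLowerCase().includes(needle) ||
        description.toLowerCase().includes(needle)) {
      rows += "<tr><td>" + escapeHtml(code) + "</td><td>" +
              escapeHtml(description) + "</td></tr>";
    }
  }
  return rows;
}
```

Because .html() replaces the old content, $("table#tableICD tbody").html(buildRows(codes, term)) clears and fills the table in one step, after which $("table#tableICD").table("refresh") needs to run only once.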
I'm trying to dynamically populate a select tag at load time (latest jQM version) using a custom template filling function.
If the fn is called in the "pagebeforechange" event, the select tag is properly initialized. Since this event is called on every page transition, I thought of moving the fn to the 'pageinit' event. This does not work, presumably because the DOM is not yet fully available. How can I coerce jQM to inject content in a page only once? Currently, I am using a kludge. There surely must be a smarter way. Thanks for any suggestions.
$(document).bind('pageinit', function () {
InitSelTagTest("#selActTag", "tplTag"); // Does not work.
});
$(document).bind("pagebeforechange", function (e, data) {
if ($("#selActTag").children().length === 0) {
InitSelTagTest("#selActTag", "tplTag"); // Kludge, but it works
}
});
function InitSelTagTest(el,tpl) { // Append all tags to element el
var lstAllTags = JSON.parse($("#hidTag").val()); // Create tag array
// Retrieve html content from template.
var cbeg = "//<![" + "CDATA[", cend = "//]" + "]>";
var rslt = tmpl(tpl, { ddd: lstAllTags }).replace(cbeg, "").replace(cend, "");
$(el).html(rslt).trigger("create"); // Add to DOM.
}
EDIT
In response to Shenaniganz' comment, it seems that the "pagebeforecreate" event could do the trick, i.e.:
$("#pgAct").live("pagebeforecreate", function () {
// Populate tag select. Works. Traversed only once.
InitSelTagTest("#selActTag", "tplTag");
});
I'm not sure I fully understand your question but I'll throw a few things out there and you let me know if I can extend further.
To make something trigger only once on page load, you can use a regular jQuery $(document).ready(function(){}), a.k.a. $(function(){}), for the exact reason jQuery Mobile users are told not to use it: it triggers only once, on DOM load. Further pages don't trigger it because they're loaded via Ajax.
Other than that, for regular dynamic content loading, take a look at the following example I put together for someone else earlier:
http://jsbin.com/ozejif/1/edit
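The kludge in the question (re-checking children() on every pagebeforechange) can also be packaged as a guard that runs the initializer the first time the target element is ready and ignores every later event. A plain-JavaScript sketch; runOnceWhenReady is a hypothetical helper, and the readiness check is passed in so the sketch does not depend on jQuery:

```javascript
// Returns an event handler that calls init() the first time isReady() is
// true and does nothing on every later call.
// It deliberately returns undefined: returning false from a jQuery event
// handler would call preventDefault() and stopPropagation().
function runOnceWhenReady(isReady, init) {
  let done = false;
  return function (...args) {
    if (done || !isReady()) return;
    done = true;
    init.apply(this, args);
  };
}
```

It might be bound as $(document).on("pagebeforechange", runOnceWhenReady(function () { return $("#selActTag").length > 0; }, function () { InitSelTagTest("#selActTag", "tplTag"); })). The handler is still called on every page change, but the initializer runs only once.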
I'm helping a company develop a website that uses jQuery, but I have noticed that the site slows to a complete halt with a jQuery "Too Much Recursion" error. The company really needs to get this resolved while retaining the slideshow capabilities as they are right now. Here is the code in question:
<script type="text/javascript">
var $testimonialCont;
var $slideshowContainer;
$(document).ready(function(){
$slideshowContainer = $('.slideshowContainer');
var inititalSlideshowDelay = setTimeout(cycle_slideshow_image, 4000);
$testimonialCont = $('.testimonialContainer');
$('.testimonialBubble').hide();
$('.testimonialBubble').removeClass('hide');
cycle_top_bubble()
var initialTestimonialDelay = setTimeout(cycle_top_bubble, 3000);
});
function cycle_slideshow_image(){
//This code cycles the slideshow caption headings and body text
$('h1.slideshowCaptionHeading:last').fadeOut(1500, function(){
$(this).prependTo('.captionHeaderArea');
$(this).show(1);
var delay = setTimeout(cycle_slideshow_image, 4000);
});
$('p.slideshowCaptionBody:last').fadeOut(1500, function(){
$(this).prependTo('.captionBodyArea');
$(this).show(1);
var delay = setTimeout(cycle_slideshow_image, 4000);
});
$('img.slideshowSlide:last').fadeOut(1500, function(){
$(this).prependTo($slideshowContainer);
$(this).show(1);
var delay = setTimeout(cycle_slideshow_image, 4000);
});
}
function cycle_top_bubble(){
$('.testimonialBubble:last').prependTo($testimonialCont).fadeIn(1500, function(){
var $this = $(this);
var thisTimer = setTimeout(function(){
$this.fadeOut(1500, function(){
var thisDelay = setTimeout(cycle_top_bubble, 3000);
})
}, 5000);
});
}
</script>
Here is the site's address: http://dbunderdevelopment.com/CRR/
If anyone has any suggestions, I would greatly appreciate it.
P.S. I did post this question before as an unregistered user and I sincerely apologize in advance for that. I can't seem to find the post in order to delete but, rest assured, it will not happen again. I know how bad repostings are on forums.
To me it looks like cycle_slideshow_image calls itself three times each time it is called... change it to this:
function cycle_slideshow_image(){
//This code cycles the slideshow caption headings and body text
$('h1.slideshowCaptionHeading:last').fadeOut(1500, function(){
$(this).prependTo('.captionHeaderArea');
$(this).show(1);
});
$('p.slideshowCaptionBody:last').fadeOut(1500, function(){
$(this).prependTo('.captionBodyArea');
$(this).show(1);
});
$('img.slideshowSlide:last').fadeOut(1500, function(){
$(this).prependTo($slideshowContainer);
$(this).show(1);
var delay = setTimeout(cycle_slideshow_image, 4000);
});
}
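Why the original version explodes: every call schedules three new calls, so the number of independent loops triples with each cycle (1, 3, 9, 27, ...). The fix above keeps only the timeout in the image callback. If the next cycle should instead start only after all three fades have finished, a small counter can gate it. A sketch with a hypothetical helper named afterAll:

```javascript
// Returns a callback that must be called `count` times; on the last call it
// invokes done() exactly once. Extra calls after that are ignored.
function afterAll(count, done) {
  let remaining = count;
  return function () {
    remaining -= 1;
    if (remaining === 0) done();
  };
}
```

Inside cycle_slideshow_image this could be used as var next = afterAll(3, function () { setTimeout(cycle_slideshow_image, 4000); }); with next() called at the end of each of the three fadeOut callbacks. That assumes each :last selector matches exactly one element, because jQuery calls the fadeOut callback once per matched element; if a selector matches nothing, the loop stops.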
Also, cycle_top_bubble is being called twice initially, so it runs in two loops. Remove this line:
var initialTestimonialDelay = setTimeout(cycle_top_bubble, 3000);
Another thing to consider is that when your page becomes an inactive tab in the browser, timeouts are clamped to at least 1000 ms (ref), so animations may build up if your timeouts are too short. Yours aren't, but it's something to keep in mind.
So you need to think about how recursion works. When you recurse through those setTimeout callbacks, each call creates a new scope for the recursed function, and nothing ever ends the earlier chains, so they keep piling up.
Think of it as a block of memory: because the recursion never unwinds (returns back up), you keep flooding memory with more and more objects until it is full. The fix is fairly easy.
First, recursion is the wrong approach for something that never completes, for the reason explained above, so the recursion needs to change. The solution I would use is to keep a callback on the setTimeout but move your setTimeout calls outside the scope of the calling function. This should help with the memory problem.
Another suggestion is to use a real slideshow plugin that someone else wrote. I know this may be frowned upon, but why reinvent the wheel when it has been done 1000 times? I recommend jQuery Cycle; it is extremely fast and customizable.
Good luck!
I am using the sortable widget to re-order a list of items. After an item is dragged to a new location, I kick off an AJAX form post to the server to save the new order. How can I undo the sort (e.g. return the drag item to its original position in the list) if I receive an error message from the server?
Basically, I only want the re-order to "stick" if the server confirms that the changes were saved.
Try the following:
$(this).sortable('cancel');
I just encountered this same issue, and for the sake of a complete answer, I wanted to share my solution to this problem:
$('.list').sortable({
items:'.list:not(.loading)',
start: function(event,ui) {
var element = $(ui.item[0]);
element.data('lastParent', element.parent());
},
update: function(event,ui) {
var element = $(ui.item[0]);
if (element.hasClass('loading')) return;
element.addClass('loading');
$.ajax({
url:'/ajax',
context:element,
complete:function(xhr,status) {
$(this).removeClass('loading');
if (xhr.status != 200) {
$($(this).data('lastParent')).append(this);
}
},
});
}
});
You'll need to modify it to suit your codebase, but this solution, which uses the loading class to stop an item from being processed again while its request is still pending, works very well for me.
I'm pretty sure that sortable doesn't have any undo-last-drop function -- but it's a great idea!
In the meantime, though, I think your best bet is to write a start handler that stores the ordering, and then call a revert function on failure, i.e. something like this:
$(".list-container").sortable({
start: function () {
/* stash current order of sorted elements in an array */
},
update: function () {
/* ajax call; on failure, re-order based on the stashed order */
}
});
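The stash-then-revert idea is the general "optimistic update with rollback" pattern: apply the new order immediately, keep it only if the server confirms, otherwise restore the stashed order. A framework-free sketch; reorderWithRollback and the state object are hypothetical, and saveOrder stands for any function returning a Promise (for example a wrapper around $.ajax):

```javascript
// Apply newOrder right away, then keep it only if saveOrder() resolves.
// On rejection, the order stashed before the change is restored.
async function reorderWithRollback(state, newOrder, saveOrder) {
  const previous = state.order.slice(); // stash the current order
  state.order = newOrder.slice();       // show the new order immediately
  try {
    await saveOrder(state.order);
    return true;                        // server confirmed: the change sticks
  } catch (err) {
    state.order = previous;             // server failed: revert
    return false;
  }
}
```

With a sortable list, the DOM itself is the state: the start handler would stash the current order (for instance the item ids from sortable("toArray"), which requires each item to have an id), and the failure branch would re-append the items in that stashed order.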
Would love to know if others have a better answer, though.