Extracting page content with Playwright returns blank for certain pages

import {chromium} from 'playwright'; // Web scraper Library
import * as fs from 'fs';
(async function () {
const chromeBrowser = await chromium.launch({ headless: true }); // Chromium launch and options
const context = await chromeBrowser.newContext({ ignoreHTTPSErrors: true ,
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36',
});
const page = await context.newPage();
await page.goto("https://jp.abcmouse.com/mkt/privacy/", { waitUntil: 'load', timeout: 60000 });
let content = await page.content();
fs.writeFileSync('test.html', content);
console.log("done")
})();
How do we access the body content of this URL? I am able to extract many web pages, but some of them won't work. Is there anything specific that needs to be done for such sites?

The page you shared as an example has most of its content inside a shadow root. Because the content function relies on document.documentElement.outerHTML, it does not pierce the shadow root, which is why the saved HTML looks incomplete.
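One possible workaround, sketched under assumptions: walk the DOM inside the page and collect the innerHTML of every open shadow root. The helper name collectShadowHtml is hypothetical (it is not part of Playwright), and closed shadow roots, where element.shadowRoot is null, stay out of reach with this approach.

```javascript
// Hypothetical helper: gathers the HTML of every open shadow root under `root`.
// It only relies on `shadowRoot`, `children`, and `innerHTML`, so it works on
// real DOM nodes and on plain objects with the same shape.
function collectShadowHtml(root) {
  const parts = [];
  const walk = (node) => {
    if (node.shadowRoot) {
      parts.push(node.shadowRoot.innerHTML);
      walk(node.shadowRoot); // shadow roots can be nested
    }
    for (const child of node.children || []) {
      walk(child);
    }
  };
  walk(root);
  return parts;
}

// Assumed usage with the script from the question, after page.goto():
//   const shadowHtml = await page.evaluate(`(${collectShadowHtml})(document)`);
//   fs.writeFileSync('test.html', content + '\n' + shadowHtml.join('\n'));
```

The function is passed to page.evaluate as a string expression so that it runs inside the browser, where document is available.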

Related

Xamarin Android WebView UserAgentString not setting

I am trying to set the user agent string of the WebView in my Xamarin Android app, but the suffix was added only once and not after that. I am calling this code in the OnCreate event:
webView = FindViewById<WebView>(Resource.Id.webView);
webView.ClearCache(true);
webView.Settings.SetGeolocationEnabled(true);
webView.Settings.JavaScriptCanOpenWindowsAutomatically = true;
webView.Settings.SetAppCacheEnabled(false);
//webView.Settings.DomStorageEnabled = true;
//webView.Settings.DatabaseEnabled = true;
//string dbPath = Application.Context.GetDir("database", FileCreationMode.Private).Path;
//webView.Settings.DatabasePath = dbPath;
//webView.Settings.AllowUniversalAccessFromFileURLs = true;
//webView.Settings.DatabaseEnabled = true;
webView.Settings.JavaScriptEnabled = true;
webView.Settings.UserAgentString += " AndroidAPP";
webView.SetWebViewClient(new MyWebViewClient());
webView.SetWebChromeClient(new MyWebChromeClient(this));
webView.LoadUrl(myurl);
I am getting the following agent string in the website logs:
User agent = Mozilla/5.0 (Linux; Android 7.1.1; Custom Phone - 7.1.0 - API 25 - 768x1280 Build/NMF26Q; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/52.0.2743.100 Mobile Safari/537.36
Only once did I get this string:
User agent = Mozilla/5.0 (Linux; Android 7.1.1; Custom Phone - 7.1.0 - API 25 - 768x1280 Build/NMF26Q; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/52.0.2743.100 Mobile Safari/537.36 AndroidAPP
What do I need to do?

Apps-Script post request

I've been trying to pull the CA CPUC data to return information based on the CPUC#, and it has been kicking my butt. The session and checksum values in the snippet are currently active, but they would need to be updated to reproduce. Before I write the code to fetch the appropriate session data, I would love to have one POST request actually work as expected. It works in Postman with no problem, but in Apps Script it fails every time.
I've looked at all the other Apps Script POST-related questions, and I haven't seen any that include session info.
Pretty please, help.
function cpuc(input = "") {
//input=0017;
//if (input == "") {
//input = ''
//} else
//{var num = input.toString(); }
var formData = {
'p_arg_checksums':'33316309586380504_F1C39724CC9F8514705A3E90B641B338',
'p_arg_names':'15205107486571135',
'p_arg_names':'33315885014380503',
'p_arg_names':'33316093548380503',
'p_arg_names':'33316309586380504',
'p_arg_names':'14878578395513793',
'p_flow_id':'203',
'p_flow_step_id':'35',
'p_instance':'5080254898961',
'p_md5_checksum':'',
'p_page_checksum':'C863921514D0032E5859DB0CAB79534A',
'p_page_submission_id':'4390229111775',
'p_request':'Submit',
'p_t01':'PSG',
'p_t02':'17',
'p_t03':'',
'p_t04':'',
'p_t05':'-1',
};
var headers = {
'Origin':'https://apps.cpuc.ca.gov',
'Upgrade-Insecure-Requests':'1',
'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36',
'Content-Type':'application/x-www-form-urlencoded',
'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'Referer':'https://apps.cpuc.ca.gov/apex/f?p=203:35:0::NO:RP::',
'Accept-Encoding':'gzip, deflate, br',
'Accept-Language':'en-US,en;q=0.8,zh-CN;q=0.6,zh;q=0.4',
'Cookie':'gsScrollPos=; ORA_WWV_APP_203=ORA_WWV-zjF8IrxaICg6oRrzu9Dpw6dU'
};
var url = 'https://apps.cpuc.ca.gov/apex/wwv_flow.accept';
var options = {
"method" : "post",
"headers" : headers,
"payload" : formData
};
var response = UrlFetchApp.fetch(url, options);
var text = response.getContentText();
//extract appropriate information.
Logger.log(text);
return text;
}
I also haven't had much luck parsing the HTML responses in Apps Script. (Extra credit.)
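One thing that may be worth checking in the snippet above: the formData object literal repeats the p_arg_names key five times, and in JavaScript a later duplicate key overwrites the earlier ones, so only the last value would be sent. This may not be the only reason the request fails (the session and checksum values also expire), but a minimal sketch of keeping the repeated fields is to build the form body as a string from name/value pairs. The helper name encodeForm is hypothetical.

```javascript
// Build an application/x-www-form-urlencoded body from [name, value] pairs,
// so that repeated names such as p_arg_names are all preserved in order.
function encodeForm(pairs) {
  return pairs
    .map(([name, value]) => encodeURIComponent(name) + '=' + encodeURIComponent(value))
    .join('&');
}

// Shortened example using values from the question.
const body = encodeForm([
  ['p_flow_id', '203'],
  ['p_arg_names', '15205107486571135'],
  ['p_arg_names', '33315885014380503'],
]);

// In Apps Script the string would then be passed as the payload, for example:
//   UrlFetchApp.fetch(url, { method: 'post', headers: headers, payload: body });
```

UrlFetchApp accepts a string payload, so the encoded body can replace the object without other changes to the options.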

Dart HttpRequest & "The built-in library 'dart:io' is not available on Dartium"

I want to get JSON data from a server (Tomcat).
I imported 'package:http/http.dart' as http.
But the result is "The built-in library 'dart:io' is not available on Dartium" in the Dartium console.
After building with "dart build" and running it, the error "Uncaught Unsupported operation: Platform._version" appears in the Chrome console.
Requests made with HttpRequest from dart:html and dart:io fail as well.
How can I get response data from the server (Tomcat or any other)?
Thanks for your answers!
import 'dart:async';
import "dart:html";
import "dart:convert";
import 'package:http/http.dart' as http;
final ButtonElement loginButton = querySelector("#login");
void main() {
loginButton.onClick.listen((e) {
requestTest2IO();
});
}
void requestTest2IO(){
var url = 'server url';
http.get(url, headers : {'Cookie': 'JSESSIONID : xxxxxxxxxxxxxxxxxxxxxx',
'User-Agent': 'xxxxxxx',
'x-app-stat-json': '(Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.146 Safari/537.36,appversion=8}'
})
.then((response) {
List<String> repos = JSON.decode(response.body);
print(repos);
});
}

Resource interpreted as Document but transferred with MIME type application/zip

I'm unable to successfully download a file from the server using a Web API get call. The download seems to start but then Chrome throws:
"Resource interpreted as Document but transferred with MIME type application/zip"
Firefox doesn't say that but the download still fails.
What am I doing wrong in the following setup?
[HttpGet, Route("api/extractor/downloadresults")]
public HttpResponseMessage DownloadResultFiles()
{
int contentLength = 0;
this.ResultFiles.ForEach(f => contentLength = contentLength + f.FileSize);
var streamContent = new PushStreamContent((outputStream, httpContext, transportContent) =>
{
...zip files...
});
streamContent.Headers.ContentType = new MediaTypeHeaderValue("application/zip");
streamContent.Headers.ContentLength = contentLength;
streamContent.Headers.ContentDisposition = new ContentDispositionHeaderValue("attachment")
{
FileName = "result.zip"
};
var response = Request.CreateResponse();
response.StatusCode = HttpStatusCode.OK;
response.Content = streamContent;
// The method is declared to return HttpResponseMessage, so return the response.
return response;
}
I trigger the download via:
window.location.href = "api/extractor/downloadresults";
With the resulting headers:
Request Headers
Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding:gzip,deflate,sdch
Accept-Language:en-US,en;q=0.8
Connection:keep-alive
Cookie:ASP.NET_SessionId=ibwezezeutmu2gpajfnpf41p
Host:localhost:47384
Referer:http://localhost:47384/
User-Agent:Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36
Response Headers
Cache-Control:no-cache
Content-Disposition:attachment; filename=result.zip
Content-Length:436102
Content-Type:application/zip
Date:Mon, 16 Dec 2013 22:36:31 GMT
Expires:-1
Persistent-Auth:true
Pragma:no-cache
Server:Microsoft-IIS/8.0
X-AspNet-Version:4.0.30319
X-Powered-By:ASP.NET
X-SourceFiles:=?UTF-8?B?QzpcbmV3VG9vbGJveFxUb29sYm94XFRvb2xib3guV2ViXGFwaVx0ZXJtZXh0cmFjdG9yXGRvd25sb2FkcmVzdWx0ZmlsZXM=?=
Have you tried changing the request headers, for example the Accept header?
Also, there is a similar question; some of the solutions suggested there may help you.

dropzonejs multifile upload not working as expected

I am trying to use the multi upload, but in my MVC action I am not getting a list of uploaded files; instead I am getting files[] for each uploaded item.
I am uploading 2 files, but when I access them in my controller via:
foreach (string filename in Request.Files)
{
var file = Request.Files[filename];
//file.name always reads from files[] and picks the first file in all requests
}
My full request is:
Request URL: http://localhost:54434/1328/uploads/new
Request Method:POST
Status Code:201 Created
Request Headers
Accept:application/json
Accept-Encoding:gzip,deflate,sdch
Accept-Language:en-US,en;q=0.8
Cache-Control:no-cache
Connection:keep-alive
Content-Length:8957136
Content-Type:multipart/form-data; boundary=----WebKitFormBoundarysX8tBB9TH4BzWZsG
Cookie:glimpsePolicy=On; _gauges_unique_month=1; _gauges_unique_year=1; _gauges_unique=1; glimpseId=Chrome 28.0; __RequestVerificationToken=itQ6HqqB_D7H_Y924w-HFfF8tq
ASP.NET_SessionId=cnj4lzpunuxnbyunl1m5gtpn
Glimpse-Parent-RequestID:04a1b6d2-6c6a-4da0-936d-3ff39e5b8c6c
Host:localhost:54434
Origin:http://localhost:54434
Referer:http://localhost:54434/1328/uploads/new
User-Agent:Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36
X-Requested-With:XMLHttpRequest
Request Payload
------WebKitFormBoundarysX8tBB9TH4BzWZsG
Content-Disposition: form-data; name="__RequestVerificationToken"
dmPL-YRqsiwjKy43rlkYIBJE4kPlthsyL0IgnyHbtrD7Doczpbu9Z1SYeoL_93vuR15-6HfpNCCEzkzLYHBIxFJOQd3ynRGGYqILGpMWdLE1
------WebKitFormBoundarysX8tBB9TH4BzWZsG
Content-Disposition: form-data; name="private_upload"
true
------WebKitFormBoundarysX8tBB9TH4BzWZsG
Content-Disposition: form-data; name="files[]"; filename="Maid with the Flaxen Hair.mp3"
Content-Type: audio/mp3
------WebKitFormBoundarysX8tBB9TH4BzWZsG
Content-Disposition: form-data; name="files[]"; filename="Sleep Away.mp3"
Content-Type: audio/mp3
------WebKitFormBoundarysX8tBB9TH4BzWZsG--
Response Headers
Cache-Control:private, s-maxage=0
Content-Encoding:gzip
Content-Length:57
Content-Type:application/json; charset=utf-8
Date:Fri, 16 Aug 2013 10:55:23 GMT
Server:Microsoft-IIS/8.0
X-AspNet-Version:4.0.30319
X-AspNetMvc-Version:4.0
X-Powered-By:ASP.NET
X-SourceFiles:=?UTF-8?B?QzpcUHJvamVjdHNcU3VydmVudHJpeFxhcHBcU3VydmVudHJpeFxGYXN0U3VydmV5b3JzXDEzMjhcdXBsb2Fkc1xuZXc=?=
Dropzone config is:
<script type="text/javascript">
Dropzone.autoDiscover = false;
var myDropzone = new Dropzone("form#my-awesome-dropzone", {
paramName: "files", // The name that will be used to transfer the file
autoProcessQueue: false,
forceFallback: false,
uploadMultiple: true,
maxFilesize: 10,
previewsContainer: ".dropzone-previews",
clickable: ".dropzone" //make only this region clickable
});
myDropzone.on("addedfile", function (file) {
/* Maybe display some more file information on your page */
console.debug("added a file: " + file.name);
});
myDropzone.on("success", function (file) {
$("#drop-success").show();
});
$("#btnDropzone").click(function () {
myDropzone.processQueue();
});
</script>
How can I get each uploaded file in my controller so I can process it?
If the input name files[] is actually the problem, you can set the Dropzone option uploadMultiple to false. This creates multiple requests and sends every file on its own instead of sending all parallel uploaded files in one request.
The following solution is valid for dropzone 3.7.3.
To fix this, change this line in dropzone.js:
formData.append("" + this.options.paramName + (this.options.uploadMultiple ? "[]" : ""), file, file.name);
to
formData.append("" + this.options.paramName + (this.options.uploadMultiple ? "[" + _l + "]" : ""), file, file.name);
Don't forget to include this non-minified version.
I'll try to suggest this patch to Enyo on GitHub.
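The effect of the patch on the multipart field names can be illustrated with a small standalone function. This is only a sketch: fieldName is a hypothetical helper that mirrors the patched expression and is not part of Dropzone.

```javascript
// Hypothetical helper mirroring the patched field-name expression:
// with uploadMultiple enabled, each file gets its own indexed name.
function fieldName(paramName, uploadMultiple, index) {
  return '' + paramName + (uploadMultiple ? '[' + index + ']' : '');
}

// Two queued files, using the paramName from the question's config.
const names = ['Maid with the Flaxen Hair.mp3', 'Sleep Away.mp3']
  .map((_, i) => fieldName('files', true, i));

console.log(names);
```

With distinct names such as files[0] and files[1], a lookup by name on the server no longer resolves every entry to the first file.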
Add the attribute enctype="multipart/form-data" to your form.
--edit
You also need to add this option:
parallelUploads: 10
Only the number of files specified in the parallelUploads option will be uploaded. The default value of parallelUploads seems to be 2.
A fixed value is not a good solution, as you don't know how many files will be uploaded, so I edited the click handler:
$("#btnDropzone").click(function () {
var fileCount = myDropzone.files.length;
alert(fileCount);
alert(fileCount % myDropzone.options.parallelUploads);
var loopsCount = fileCount / myDropzone.options.parallelUploads;
if (fileCount % myDropzone.options.parallelUploads != 0) {
loopsCount = loopsCount + 1;
}
alert(loopsCount);
for (var i = 0; i < loopsCount ; i++) {
alert(i);
myDropzone.processQueue();
}
});
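The batch arithmetic in the handler above can also be written as a ceiling division. As written, the handler divides without rounding and then adds 1, so 5 files with parallelUploads set to 2 give 3.5 and the loop runs 4 times instead of 3. A minimal sketch, where batchCount is a hypothetical helper name:

```javascript
// Number of processQueue() calls needed when each call handles at most
// `parallelUploads` files.
function batchCount(fileCount, parallelUploads) {
  if (!Number.isInteger(parallelUploads) || parallelUploads < 1) {
    throw new RangeError('parallelUploads must be a positive integer');
  }
  return Math.ceil(fileCount / parallelUploads);
}

// Assumed usage inside the click handler from the answer above:
//   var loops = batchCount(myDropzone.files.length, myDropzone.options.parallelUploads);
//   for (var i = 0; i < loops; i++) { myDropzone.processQueue(); }
```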
I had the same problem. I replaced the loop in the MVC action with:
for (int arquivo = 0; arquivo < Request.Files.Count; arquivo++)
{
HttpPostedFileBase file = Request.Files[arquivo];
//...
}
And now it works fine.
