Parse HTML to retrieve a specific tag's value with Google Apps Script

I'm trying to parse HTML in my Google Apps Script code to retrieve the value of the <b> tag. The <b> tag contains line breaks in its attributes and appears more than once, but I only want the first value. (In this case, only 'foo' is required.)
<b class="
"
>
foo
</b><b class="
"
>
var
</b>
In Google Apps Script, functions such as 'getElementsByTagName' are not available. So I first thought of using a regexp, but it's not a wise option here.
Does anyone have an idea of how I can move forward? Any comment or guess would be highly appreciated!

How about using XmlService as a workaround? With XmlService, the value can be retrieved even if the tags contain several line breaks. There are several possible workarounds for your situation, so please think of this as one of them.
The flow of the sample script is as follows.
Flow:
Add an XML header and a root element tag around the HTML.
Parse the created XML value using XmlService.
Retrieve the value of the first <b> tag using XmlService.
Sample script :
var html = '<b class="\n"\n>\nfoo\n</b><b class="\n"\n>\nvar\n</b>\n'; // Your sample value
var xml = '<?xml version="1.0"?><sampleContents>' + html + '</sampleContents>';
var res = XmlService.parse(xml).getRootElement().getChildren()[0].getText().trim();
Logger.log(res) // foo
Note :
This sample script uses your sample HTML. If your actual HTML is more complicated, can you provide it? I would like to modify the script accordingly.
Reference :
XML Service
If this is not what you want, please tell me and I will modify it.
Edit 1 :
Unfortunately, the above script cannot be used for the HTML retrieved from the URL. So I used "Parser", a GAS library, for your situation. The sample script is as follows.
Sample script :
var url = "https://www.booking.com/searchresults.ja.html?ss=kyoto&checkin_year=2018&checkin_month=10&checkin_monthday=1&checkout_year=2018&checkout_month=10&checkout_monthday=2&no_rooms=1&group_adults=1&group_children=0";
var html = UrlFetchApp.fetch(url).getContentText();
var res = Parser.data(html).from("<b class=\"\n\"\n>").to("</b>").build().trim();
Logger.log(res) // US$11
Note :
Before you run this script, please install the "Parser" library. For how to install a library, see "Managing libraries" in the references below.
The project key of the library is M1lugvAXKKtUxn_vdAG9JZleS6DrsjUUV
References :
Parser
Managing libraries
google app script Exceeded memory limit
google script scrape parser with 2 classes with the same name
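If installing a library is not an option, the same "text between two markers" idea can be written in plain JavaScript. The sketch below uses a hypothetical helper, extractBetween, which is not part of Parser; it only assumes that the wanted value sits between the first occurrence of the from marker and the next occurrence of the to marker, which is how the scripts in this answer use Parser. It relies only on String indexOf and slice, so it should also run in the Apps Script V8 runtime.

```javascript
// Hypothetical stand-in for a from(...).to(...) extraction:
// returns the text between the FIRST `from` marker and the next `to` marker,
// or null when either marker is missing.
function extractBetween(text, from, to) {
  const start = text.indexOf(from);
  if (start === -1) return null;                 // `from` marker not present
  const contentStart = start + from.length;
  const end = text.indexOf(to, contentStart);    // search only after `from`
  if (end === -1) return null;                   // no closing marker
  return text.slice(contentStart, end);
}

// The sample HTML from the question: two <b> tags with line breaks in the attributes.
const html = '<b class="\n"\n>\nfoo\n</b><b class="\n"\n>\nvar\n</b>\n';
console.log(extractBetween(html, '<b class="\n"\n>', '</b>').trim()); // foo
```

Because the search stops at the first match, the second <b> tag ('var') is never reached. A null result is a useful signal that the marker does not exist in the fetched page, which is the situation described for the 2nd URL below.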
Edit 2 :
The 2nd URL in your comment seems to be different from your 1st one, and the HTML from the new URL has no <b class=\"\n\"\n> tag, so the value you want cannot be retrieved that way. But from the 1st URL in your comment, I guessed which value you want. Could you please check the following script?
var url = "https://www.booking.com/searchresults.ja.html?ss=kyotogranvia&checkin_year=2018&checkin_month=10&checkin_monthday=1&checkout_year=2018&checkout_month=10&checkout_monthday=2&no_rooms=1&group_adults=1&group_children=0";
var html = UrlFetchApp.fetch(url).getContentText();
var res = Parser.data(html).from("<span class=\"lp-postcard-avg-price-value\">").to("</span>").build().trim();
Logger.log(res) // US$289

Related

Importxml() returned "empty cells" or "formula parse error"

I tried Importhtml("https://nepsealpha.com/investment-calandar/dividend","table",) and then Importxml("https://nepsealpha.com/investment-calandar/dividend",xpath). I found the XPath using the "SelectorGadget" extension for Google Chrome, but still couldn't import the data. It shows either "empty content" or "formula parse error".
You can retrieve almost all of the information this way:
=importxml(url,"//div/@data-page")
and then parse the JSON.
Or by script: =getData("https://nepsealpha.com/investment-calandar/dividend")
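function getData(url) {
  // The page stores its data as HTML-escaped JSON inside a data-page="..." attribute
  var from = 'data-page="';
  var to = '"></div></body>';
  var jsonString = UrlFetchApp.fetch(url).getContentText()
    .split(from)[1]
    .split(to)[0]
    .replace(/&quot;/g, '"'); // decode the escaped quotes so JSON.parse can read it
  var json = JSON.parse(jsonString).props.today_prices_summary.top_volume;
  // First row: the object keys as headers; then one row per object
  var headers = Object.keys(json[0]);
  return [headers, ...json.map(obj => headers.map(header => obj[header]))];
}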
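The key step in getData is decoding the attribute value before JSON.parse: inside data-page="...", the JSON's double quotes are stored as &quot;. The sketch below isolates that step in plain JavaScript so it can be tried without UrlFetchApp. The names extractDataPageJson and toTable are hypothetical, and it assumes the attribute is double-quoted and uses standard HTML entity escaping.

```javascript
// Hypothetical helper: returns the parsed JSON stored in the first data-page="..." attribute.
function extractDataPageJson(html) {
  const marker = 'data-page="';
  const start = html.indexOf(marker);
  if (start === -1) throw new Error('data-page attribute not found');
  const valueStart = start + marker.length;
  // Inner quotes are escaped, so the first literal " closes the attribute.
  const end = html.indexOf('"', valueStart);
  if (end === -1) throw new Error('unterminated data-page attribute');
  const decoded = html.slice(valueStart, end)
    .replace(/&quot;/g, '"')
    .replace(/&#039;/g, "'")
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&amp;/g, '&'); // &amp; last, so "&amp;quot;" is not decoded twice
  return JSON.parse(decoded);
}

// Turns an array of flat objects into a 2-D array with a header row,
// the shape a Sheets custom function can return.
function toTable(rows) {
  if (rows.length === 0) return [];
  const headers = Object.keys(rows[0]);
  return [headers, ...rows.map(obj => headers.map(h => obj[h]))];
}
```

Searching for the closing quote, rather than for the literal '"></div></body>', makes the sketch independent of what follows the attribute in the page; the trade-off is that it assumes no unescaped quote appears inside the value.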
Edit:
To update the data periodically, add this script:
function update(){
var chk = SpreadsheetApp.getActiveSpreadsheet().getSheets()[0].getRange('A1')
chk.setValue(!chk.getValue())
}
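Then put a trigger on the update function with whatever schedule you want, and change the formula as follows (getData ignores the extra argument, but each time A1 changes, Sheets recalculates the formula):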
=getData("https://nepsealpha.com/investment-calandar/dividend",$A$1)
I know that's not the answer you want to see.
It's impossible to get any content from this website using IMPORTXML or other tools included in Google Sheets.
The content is generated using JavaScript; once JavaScript is disabled, no content is displayed.
It's done on purpose. Financial companies pay for live stock data and they don't want to share it with us for free.
So the site is protected against tools like importxml.

pdf.js to display output of file created with tcpdf

I really hope you will be able to help me out on this one.
I am new to pdf.js so for the moment, I am playing around with the pre-built version to see if I can integrate this into my web app.
My problem:
I am using tcpdf to generate a pdf file which I would like to visualize using pdf.js without having to save it to a file on the server.
I have a php file (generate_document.php) that I use to generate the pdf. The file ends with the following:
$pdf->Output('test.pdf', 'I');
According to the TCPDF documentation, the second parameter can be used to generate the following formats:
I: send the file inline to the browser (default). The plug-in is used if available. The name given by name is used when one selects the "Save as" option on the link generating the PDF.
D: send to the browser and force a file download with the name given by name.
F: save to a local server file with the name given by name.
S: return the document as a string (name is ignored).
FI: equivalent to F + I option
FD: equivalent to F + D option
E: return the document as base64 mime multi-part email attachment (RFC 2045)
Then, I would like to view the pdf using pdf.js without creating a file on the server (= not using 'F' as a second parameter and passing the file name to pdf.js).
So, I thought I could simply create an iframe and call the pdf.js viewer pointing to the php file:
<iframe width="100%" height="100%" src="/pdf.js_folder/web/viewer.html?file=get_document.php"></iframe>
However, this is not working at all. Do you have any idea what I am overlooking? Or is this option not available in pdf.js?
I have done some research and I have seen some posts here on converting a base64 stream to a typed array but I do not see how this would be a solution to this problem.
Many thanks for your help!!!
EDIT
@async, thanks for your answer.
I got it figured out in the meantime, so I thought I'd share my solution with you guys.
1) In my get_document.php, I changed the output statement to convert it directly to base64 using
$pdf_output = base64_encode($pdf->Output('test_file.pdf', 'S'));
echo $pdf_output; // send the base64 string back as the XHR response
2) In viewer.js, I use an XHR to call the get_document.php and put the return in a variable (pdf_from_XHR)
3) Next, I convert what came in from the XHR request using the solution that was already mentioned in a few other posts (e.g. Pdf.js and viewer.js. Pass a stream or blob to the viewer)
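var BASE64_MARKER = ';base64,';
pdf_converted = convertDataURIToBinary(pdf_from_XHR);
function convertDataURIToBinary(dataURI) {
  // Skip a "data:...;base64," prefix if present; otherwise treat the whole string as base64
  var markerIndex = dataURI.indexOf(BASE64_MARKER);
  var base64 = markerIndex === -1 ? dataURI : dataURI.substring(markerIndex + BASE64_MARKER.length);
  var raw = window.atob(base64); // decode base64 into a binary string
  var rawLength = raw.length;
  var array = new Uint8Array(new ArrayBuffer(rawLength));
  for (var i = 0; i < rawLength; i++) {
    array[i] = raw.charCodeAt(i); // one byte per character
  }
  return array;
}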
et voilà ;-)
Now I can pass what comes out of that function into the getDocument call:
PDFJS.getDocument(pdf_converted).then(function (pdf) {
  pdfDocument = pdf;
  PDFView.load(pdfDocument, 1.5);
});
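When the viewer still shows nothing, a quick sanity check is whether the decoded array really is a PDF: PDF files normally begin with the bytes "%PDF-". The helper below, startsWithPdfHeader, is a hypothetical debugging aid, not part of pdf.js; it only inspects the first five bytes of a Uint8Array such as the one returned by convertDataURIToBinary. The example uses the global atob, which exists in browsers and in Node.js 16 and later.

```javascript
// Hypothetical debugging helper: checks for the "%PDF-" signature at the start of the data.
function startsWithPdfHeader(bytes) {
  const signature = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%", "P", "D", "F", "-"
  if (!bytes || bytes.length < signature.length) return false;
  return signature.every((byte, i) => bytes[i] === byte);
}

// "JVBERi0=" is the base64 encoding of "%PDF-".
// Decode it the same way convertDataURIToBinary does: one byte per character.
const decoded = atob('JVBERi0=');
const pdfBytes = Uint8Array.from(decoded, ch => ch.charCodeAt(0));
console.log(startsWithPdfHeader(pdfBytes)); // true
```

If this returns false for the real response, one possible cause is that something other than the base64 string (extra output, whitespace, or an HTML error page) ended up in the XHR response before decoding.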

Google Apps Script: Stumped on command to extract 'title' from forum HTML page & paste into a spreadsheet (my code inside)

I'm extremely new to this, and I've been trying to get the title of each unique forum page (or topic). Here is the code I have so far:
function GraalGet() {
//parses forums for ALL posts one by one, extract <title> from HTML webpage
var sheet = SpreadsheetApp.getActiveSheet();
var i = 31
var url = "http://www.graalians.com/forums/showthread.php?p="+i;
//var params = {method : "post"}; can this be used at all?
//The aim: loop this once you can get 1 result.
var geturl = UrlFetchApp.fetch(url).getContentText(); //maybe .getContentText should be elsewhere?
var parseurl = Xml.parse(geturl, true); //confirmed - this is true because it wont parse HTML if false
var titleinfo = parseurl.getElement().getElement("html"); //.getElement('body');//.getElements("title");
sheet.appendRow([titleinfo, i]);
}
In addition, the script should write the topic number in the adjacent cell.
There are a lot of answered questions about extracting XML data, and this example is about parsing HTML, but I couldn't get any results. I'm honestly stumped, and any help with finding and extracting the <title> tag will be appreciated. (If you have the time, please feel free to explain as well, but I'll be thankful for any help really.)
For reference I have used these:
Google's Kevin Bacon Script
The authors comments on bugs with the script & some explanation
I'm sorry if I'm being pedantic, this is my first post & I don't want to anger anyone, please do tell me if I've broken any rules, I'll do my best to fix them. I've left the comments I made for myself for your perusal too.
You can use Logger.log to print out debugging information. I did this with your function and found that the <title> tag is embedded within the <head> tag, so you should use something like the following. Also, getElement returns an XmlElement object, which you should convert to a String using getText().
var titleinfo = parseurl.getElement().getElement('head').getElement('title');
sheet.appendRow([titleinfo.getText(), i]);
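The legacy Xml service used in this question has since been retired from Apps Script, and its replacement, XmlService, expects well-formed XML and usually rejects real-world HTML. For a single, well-known tag such as <title>, a regular expression is one pragmatic alternative (a swapped-in technique, not what the answer above does). The sketch below is plain JavaScript; extractTitle and collectTitles are hypothetical names, and the fetchHtml callback is an assumption standing in for something like id => UrlFetchApp.fetch(baseUrl + id).getContentText().

```javascript
// Returns the trimmed text of the first <title> element, or null if there is none.
// Fine for one simple tag like <title>; not a general-purpose HTML parser.
function extractTitle(html) {
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  return match ? match[1].trim() : null;
}

// Builds [title, topicNumber] rows for every topic number from firstId to lastId.
// fetchHtml(id) must return the page HTML for that topic number.
function collectTitles(firstId, lastId, fetchHtml) {
  const rows = [];
  for (let id = firstId; id <= lastId; id++) {
    rows.push([extractTitle(fetchHtml(id)), id]);
  }
  return rows;
}
```

In Apps Script, the collected rows could then be written in one call with sheet.getRange(1, 1, rows.length, 2).setValues(rows) instead of one appendRow per topic, which also keeps the loop the question is aiming for out of the spreadsheet calls.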

Google AdWords and google_conversion_value

I have an online shop application, and I integrated it with Google AdWords by adding the required script to the web application.
The problem I have is that the Value on Google's analysis control panel page is 0, even though I do have Conversions (many-per-click) with a value of 12.
The code I integrated looks like this:
var google_conversion_id = <number is here>;
var google_conversion_language = "en";
var google_conversion_format = "3";
var google_conversion_color = "ffffff";
var google_conversion_label = "<label is here>";
var google_conversion_value = <?php echo $charge; ?>;
I added those lines (along with several more JS lines required for Google AdWords) to the last page, which is shown once payment has been made in my web shop.
The PHP variable $charge holds the value of the sold order.
Despite all of that, my Value is still 0. Can you help me figure out what I'm doing wrong, and how I can get the proper value?
Try wrapping the PHP output with double quotes like so:
var google_conversion_value = "<?php echo $charge; ?>";
so that the rendered output looks like:
var google_conversion_value = "150";
The value can be either in quotes or not - it doesn't matter.
I am not a PHP expert, but from what little I know, it looks OK here. The easiest way to check is to get the Google Tag Assistant extension for Google Chrome, which lets you check the value that is being sent back to Google: https://chrome.google.com/webstore/detail/tag-assistant-by-google/kejbdjndbnbjgmefkgdddjlbokphdefk?hl=en
Using the extension when your conversion tag fires lets you see the values that are actually sent back and so you can confirm if the value is correctly being set.
If it does look like the value is being sent back ok, it would be best to wait at least 72 hours to verify that the value is appearing inside AdWords.

Call javascript in sharepoint custom field

I am creating a custom field in SharePoint 2007. I have seen other solutions where the current site URL was the default value of a text field.
How can I get this current site URL?
I got one answer that says I should use JavaScript, but where do I put the script?
I hope you can help.
BR
In reply to answer 1:
I am new to SharePoint and am not quite sure where to put the JavaScript. Normally I just give the initial value to the field in the FieldEditor.cs file, but how can I do this with JavaScript?
I have tried to put it into FieldEditor.cs, but this results in the value of myScript being written at the top of the web page.
Here is my current code:
string myScript = "var currentUrl = document.URL; LabelLookupFieldTargetURLText.Text = currentUrl;";
Page.ClientScript.RegisterClientScriptBlock(LabelLookupFieldTargetURLText.GetType(), "LabelLookupFieldTargetURLTextJavaScript", myScript);
I found the answer myself. I don't need to use JavaScript; I can just use SPContext.Current.Site.Url.
Use JavaScript:
var nowUrl = document.URL;
yourTextField.value = nowUrl;
You can read more here: http://www.w3schools.com/jsref/dom_obj_document.asp
