Website Link
https://redacted
xml options I have tried so far
<span aria-labelledby="amount">722</span>
//*[#id="amount"]/h3/span[2]
/html/body/div[3]/main/div/span/div/div/div[2]/div/div/div[2]/div/div[2]/div[3]/div/div/div/div[2]/div[1]/h3/span[2]
None working
Trying to =importxml from here # a value of "722" this is value on 5/5/22 anyway.
Unfortunately, it seems that your expected value cannot be directly retrieved using the XPath. Because the value is put to the HTML using Javascript and IMPORTXML cannot analyze the result of Javascript. But, fortunately, it seems that your expected value is included in the HTML as the JSON data. So, in this answer, I would like to retrieve the value from the JSON data.
Pattern 1:
In this pattern, IMPORTXML and REGEXEXTRACT are used.
=ARRAYFORMULA(REGEXEXTRACT(IMPORTXML(A1,"//script[#data-component-name='GetOfferWrapper']"),"defaultEstimatedValue"":(.+?)}"))
The URL https://www.gazelle.com/iphone/iphone-13-pro-max/other/iphone-13-pro-max-1tb-other/498082-gpid is put in the cell "A1".
When this formula is used, the following result is obtained.
Pattern 2:
In this pattern, a custom function created by Google Apps Script is used. When the value is retrieved from JSON data, Google Apps Script is useful. When you use this script, please copy and paste the following script to the script editor of Spreadsheet and save the script. And, please put a custom function of =SAMPLE("https://www.gazelle.com/iphone/iphone-13-pro-max/other/iphone-13-pro-max-1tb-other/498082-gpid") to a cell.
function SAMPLE(url) {
const res = UrlFetchApp.fetch(url).getContentText();
const data = res.match(/<script.+data-component-name="GetOfferWrapper".+?>([\w\s\S]+?)<\/script>/);
if (!data || data.length == 0) return "No data";
const obj = JSON.parse(data[1]);
return obj.initState.defaultEstimatedValue;
}
The URL https://www.gazelle.com/iphone/iphone-13-pro-max/other/iphone-13-pro-max-1tb-other/498082-gpid is put in the cell "A1".
When this formula is used, the value of 722 is retrieved.
Note:
The formula and custom function can be used for the current HTML. So, when the specification of HTML is changed, those might not be able to be used. Please be careful about this.
References:
IMPORTXML
REGEXEXTRACT
Custom Functions in Google Sheets
fetch(url)
JSON.parse()
you will need to find another site with intel you attempting to scrape. the #N/A error is the result of google sheets not supporting the import of JavaScript elements. you can always check for compatibility by disabling JS in site settings and only what's left can be usually scrapped. in this case its nothing:
Related
I am trying to scrape data from crypto-exchanges. When i use the Import XML function in google sheets is gives the "Imported Xml content can not be parsed." error. The function I am using is
=IMPORTXML("https://www.binance.us/en/trade/pro/ADA_BTC%22,%22//*%5B#id=%22%22__APP%22%22%5D/div/div/div%5B6%5D/div/div/div/div%5B2%5D/div%5B1%5D/div/div%5B1%5D)")
that is the price data i am trying to get into the sheet.
I have tried to look into different x paths but they all return Imported content is empty error. Not sure if i am using the right Xpath.
the issue is with a site that uses JavaScript. google sheets IMPORT formulae do not support scraping of JS elements:
To retrieve informations you need, you can parse the json provided by Binance at https://api1.binance.com/api/v3/ticker/24hr.
function binance(url = 'https://api1.binance.com/api/v3/ticker/24hr') {
var data = JSON.parse(UrlFetchApp.fetch(url).getContentText())
data.forEach(function(d) {if (d.symbol == 'ADABTC') console.log(d.lastPrice)})
}
The other way is to use websockets at wss://stream.binance.com:9443/ws with this stream
{"method":"SUBSCRIBE","params":["adabtc#kline_15m","adabtc#kline_5m","adabtc#kline_1m"],"id":1}
you will recieve a json as follows
{"e":"kline","E":1649207018439,"s":"ADABTC","k":{"t":1649206800000,"T":1649207699999,"s":"ADABTC","i":"15m","f":106692274,"L":106692475,"o":"0.00002555","c":"0.00002551","h":"0.00002556","l":"0.00002550","v":"89755.10000000","n":202,"x":false,"q":"2.29071565","V":"27199.10000000","Q":"0.69455387","B":"0"}}
I tried Importhtml ("https://nepsealpha.com/investment-calandar/dividend","table",) and then Importxml("https://nepsealpha.com/investment-calandar/dividend",xpath). I found out xpath from "selectorgadget" extension of googlechrome, but still couldn't import it. It shows either "empty content" or formula parse error".
You can retrieve quite all the informations this way
=importxml(url,"//div/#data-page")
and then parse the json.
By script : =getData("https://nepsealpha.com/investment-calandar/dividend")
function getData(url) {
var from='data-page="'
var to='"></div></body>'
var jsonString = UrlFetchApp.fetch(url).getContentText().split(from)[1].split(to)[0].replace(/"/g,'"')
var json = JSON.parse(jsonString).props.today_prices_summary.top_volume
var headers = Object.keys(json[0]);
return ([headers, ...json.map(obj => headers.map(header => obj[header]))]);
}
edit
to update periodically, add this script
function update(){
var chk = SpreadsheetApp.getActiveSpreadsheet().getSheets()[0].getRange('A1')
chk.setValue(!chk.getValue())
}
put a trigger as you wish on the update function and change as follows
=getData("https://nepsealpha.com/investment-calandar/dividend",$A$1)
I know that's not the answer you want to see.
It's impossible to get any content from this website using IMPORTXML or other tools included in Google Sheets.
It's generated using Javascript. Once Javascript is disabled no content is displayed:
It's done on purpose. Financial companies pay for live stock data and they don't want to share it with us for free.
So the site is protected against tools like importxml.
I can't find any reference to an API that enables Rest API clients to export an existing Google Sheet to a csv file.
https://developers.google.com/sheets/
I believe there should be a way to export them.
The following URL gives you the CSV of a Google spreadsheet per sheet. The sheet must be accessible by the public, by anyone with the link (unlisted).
The parameters you need to provide are:
sheet ID (that is simply the ID in the URL of a Google Spreadsheet https://docs.google.com/spreadsheets/d/{{ID}}/edit)
sheet name (that is simply the name of the sheet as given by the user)
https://docs.google.com/spreadsheets/d/{{ID}}/gviz/tq?tqx=out:csv&sheet={{sheet_name}}
With that URL you can run a GET-request to fetch the CSV.
Or paste it in your browser address bar.
You can use the Drive API to do this today -- see https://developers.google.com/drive/v3/web/manage-downloads#downloading_google_documents, however that will limit you to the first sheet of the document. The Sheets API doesn't expose exporting as CSV today, but may offer it in the future.
Nobody's mentioned gspread yet, so here's how I did it:
#open sheet
sheet = gc.open_by_key(sheet_id)
#select worksheet
worksheet = sheet.get_worksheet(0)
#download values into a dataframe
df = pd.DataFrame(worksheet.get_all_records())
#save dataframe as a csv, using the spreadsheet name
filename = sheet.title + '.csv'
df.to_csv(filename, index=False)
Firstly you should make document accessible for anyone. Then you get url. From this url you should extract long id composed from big and small letters and numbers. Then use this script.
#!/bin/bash
long_id="id_assigned_to_your_document"
g_id="number_assigned_to_card_in_google_sheet"
wget --output-document=temp.csv "https://docs.google.com/spreadsheets/d/$long_id/export?gid=$g_id&format=csv&id=$long_id"
If you use only one card in document, their number is: g_id="0"
The problem you will probably have is connected with strange spaces in obtained file. I use this second script to process it
#!/bin/bash
#Delete all lines beginning with a # from a file
#http://stackoverflow.com/questions/8206280/delete-all-lines-beginning-with-a-from-a-file
sed '/^#/ d' temp.csv |
# reomve spaces
# http://stackoverflow.com/questions/9953448/how-to-remove-all-white-spaces-from-a-given-text-file
tr -d "[:blank:]" |
# regexp "1,2" into 1.2
# http://www.funtoo.org/Sed_by_Example,_Part_2
sed 's/\"\([−]\?[0-9]*\),\([0-9]*\)\"/\1.\2/g' > out.csv
Update
As Sam mentioned, api is better solution. There is now great documentation on address:
https://developers.google.com/sheets/quickstart/php
With example that generate output having CSV structure.
If you don't have easy access to or familiarity with PHP, here's a very barebones Google Apps Script Web App that once deployed and the caller permission accepted, should allow clients with an appropriately scoped access token or api key to export an existing Google Sheet to a csv file. It takes a Google Sheets spreadsheet id and sheet name (and optional download filename) as query parameters, and returns the corresponding theoretically RFC 4180 compliant CSV file.
Further instructions on deploying an Apps Script project as a web app are here: https://developers.google.com/apps-script/guides/web#deploying_a_script_as_a_web_app.
You can deploy it and test it out easily in the browser just by visiting the "Current web app URL" (as provided when you publish as web app from the script editor), and accepting the consent screen, or even just visit the one that I deployed (configured to execute as the accessing user, and unverified/scary consent) at the example URL.
The tricky part (as usual) is getting the OAuth token or API key set up, but if you're already calling the Google Sheets V4 API, you've probably already got that dialed in. I used CURL to make sure that it behaved as a REST api, but the technique I used to get an OAuth token there is both a distraction and frankly a little scary to include here since it's really easy to mess up. If you don't already have a way to get one, that's probably a good topic for a separate SO question in any case.
One related (and big!) caveat: I'm not 100% sure how the consent and verification interact with a pure Rest client (i.e. how that works if you DON'T visit this in the browser first...), and/or whether this script would need to be in the same GCP project as the other code that uses the Sheets API. If there's interest, and/or it doesn't work right out of the box, please let me know and I'll happily dig deeper and follow up.
// Example URL, assuming:
// "Current web app URL": https://script.google.com/a/tillerhq.com/macros/s/AKfycbyZlWAW6bpCpnFoPjbdjznDomFRbTNluG4siCBMgOy2qU2AGoA/exec
// spreadsheetId: 1xNDWJXOekpBBV2hPseQwCRR8Qs4LcLOcSLDadVqDA0E
// sheet name: Sheet1
// (optional) filename: mycsv.csv
//
// https://script.google.com/a/tillerhq.com/macros/s/AKfycbyZlWAW6bpCpnFoPjbdjznDomFRbTNluG4siCBMgOy2qU2AGoA/exec?spreadsheetid=1xNDWJXOekpBBV2hPseQwCRR8Qs4LcLOcSLDadVqDA0E&sheetname=Sheet1&filename=mycsv.csv?spreadsheetid=1xNDWJXOekpBBV2hPseQwCRR8Qs4LcLOcSLDadVqDA0E&sheetname=Sheet1&filename=mycsv.csv
//
var REQUIRED_PARAMS = [
'spreadsheetid', // example: "1xNDWJXOekpBBV2hPseQwCRR8Qs4LcLOcSLDadVqDA0E"
'sheetname' // Case-sensitive; example: "Sheet1"
];
// Returns an RFC 4180 compliant CSV for the specified sheet in the specified spreadsheet
function doGet(e) {
REQUIRED_PARAMS.forEach(function(requiredParam) {
if (!e.parameters[requiredParam]) throw new Error('Missing required parameter ' + requiredParam);
});
var spreadsheet = SpreadsheetApp.openById(e.parameters.spreadsheetid);
var sheet = spreadsheet.getSheetByName(e.parameters.sheetname);
if (!sheet) throw new Error("Could not find sheet " + e.parameters.sheetname + " in spreadsheet " + e.parameters.spreadsheetid);
var filename = e.parameters.filename || (spreadsheet.getName() + "_" + e.parameters.sheetname + ".csv");
var numRows = sheet.getLastRow();
var numColumns = sheet.getLastColumn();
var values = sheet.getSheetValues(1, 1, numRows, numColumns);
function quote(s) {
s = s.toString();
if ((s.indexOf("\r") == -1)
&& (s.indexOf("\n") == -1)
&& (s.indexOf(",") == -1)
&& (s.indexOf("\"") == -1)) return s;
// Fields containing line breaks (CRLF)*, double quotes, and commas should be enclosed in double-quotes;
// anything other than that we already returned, so if we get here -- escape it and quote it.
// *That's what the text of the RFC says, but the ABNF (...and Excel) treat EITHER CR or LF as requiring quotes.
// Replace any double quote with a double double quote, and wrap the whole thing in quotes
return "\"" + s.replace(/"/g, '""') + "\"";
};
var csv = values.map(function(row) {
return row.map(quote).join();
}).join("\r\n") + "\r\n";
return ContentService
.createTextOutput(csv)
.setMimeType(ContentService.MimeType.CSV)
.downloadAsFile(filename);
}
I've searched for days looking into this issue but have yet to come up with something. We are migrating our analytics code over to DTM. We are using our own Library hosted at DTM. Everything works great except for some missing data collection parameters in the query string only when using the Adobe Analytics tool to assign variables.
Let me explain. When I use custom code in DTM in a rule to call analytics I get exactly the same query string parameters in the request that we were getting before.
var str = 'string';
s.linkTrackVars = 'prop61,eVar61';
s.linkTrackEvents = 'none';
s.prop61 = str;
s.eVar61 = str;
s.tl(this, 'o', str);
This works fine.
If I try to set eVar61 and prop61 with the Adobe Analytics tool inside a rule, five parameters are no longer in the query string. Specifically 'pev1', 'pid', 'pidt', 'oid' and 'ot'. Is there a way to get DTM to set those parameters or am I just to use custom code for all our rules?
Thanks
Those are clickmap query string parameters. Click on the gear icon to edit the global Analytics tool, and under Link Tracking, make sure 'Enable Clickmap' is checked. Alternatively, you can set s.trackInlineStats=true in your code, which effectively achieves the same effect.
If you ever see missing query string parameters in the future, you can determine what variables to define using the Data Collection Query Parameters in the Marketing Cloud documentation.
I am testing the new Google Spreadsheets as there is a new feature I really need: the 200 sheets limit has been lifted (more info here: https://support.google.com/drive/answer/3541068).
However, I can't publish a spreadsheet to CSV like you can in the old version. I go to 'File>Publish to the web' and there is no more options to publish 'all sheets' or certain sheets and you can't specify cell ranges to publish to CSV etc.
This limitation is not mentioned in the published 'Unsupported Features' documentation found at: https://support.google.com/drive/answer/3543688
Is there some other way this gets enabled or has it in fact been left out of the new version?
My use case is: we retrieve Bigquery results into the spreadsheets, we publish the sheets as a CSV automatically using the "publish automatically on update" feature which then produces the CSV URL which gets placed into charting tools that read the CSV URL to generate the visuals.
Does anyone know how to do this?
The new Google spreadsheets use a different URL (just copy your <KEY>):
New sheet : https://docs.google.com/spreadsheets/d/<KEY>/pubhtml
CSV file : https://docs.google.com/spreadsheets/d/<KEY>/export?gid=<GUID>&format=csv
The GUID of your spreadsheet relates to the tab number.
/!\ You have to share your document using the Anyone with the link setting.
Here is the solution, just write it like this:
https://docs.google.com/spreadsheets/d/<KEY>/export?format=csv&id=<KEY>
I know it's weird to write the KEY twice, but it works perfectly. A teammate from work discovered this by opening the excel file in Google Docs, then File -> Download as -> Comma separated values. Then, in the downloads section of the browser appears a link to the CSV file, like this:
https://docs.google.com/spreadsheets/d/<KEY>/export?format=csv&id=<KEY>&gid=<SOME NUMBER>
But it doesn't work in this format, what my friend did was remove "&gid=<SOME NUMBER>" and it worked! Hope it helps everyone.
If you enable "Anyone with the link sharing" for spreadsheet, here is a simple method to get range of cells or columns (or whatever your feel like) export in format of HTML, CSV, XML, JSON via the query:
https://docs.google.com/spreadsheet/tq?key=YOUR-KEY&gid=1&tq=select%20A,%20B&tqx=reqId:1;out:html;%20responseHandler:webQuery
For tq variable read query language reference.
For tqx variable read request format reference.
Downside to this is that your doc is still availble in full via the public link, but if you want to export/import data to say Excel this is a perfect way.
It's not going to help everyone, but I've made a PHP script to read the HTML into an array.
I've added converting back to a CSV at the end. Hopefully this will help some people who have access to PHP.
$html_link = "https://docs.google.com/spreadsheets/d/XXXXXXXXXX/pubhtml";
$local_html = "sheets.html";
$file_contents = file_get_contents($html_link);
file_put_contents($local_html,$file_contents);
$dom = new DOMDocument();
$html = #$dom->loadHTMLFile($local_html); //Added a # to hide warnings - you might remove this when testing
$dom->preserveWhiteSpace = false;
$tables = $dom->getElementsByTagName('table');
$rows = $tables->item(0)->getElementsByTagName('tr');
$cols = $rows->item(0)->getElementsByTagName('td'); //You'll need to edit the (0) to reflect the row that your headers are in.
$row_headers = array();
foreach ($cols as $i => $node) {
if($i > 0 ) $row_headers[] = $node->textContent;
}
foreach ($rows as $i => $row){
if($i == 0 ) continue;
$cols = $row->getElementsByTagName('td');
$row = array();
foreach ($cols as $j => $node) {
$row[$row_headers[$j]] = $node->textContent;
}
$table[] = $row;
}
//Convert to csv
$csv = "";
foreach($table as $row_index => $row_details){
$comma = false;
foreach($row_details as $value){
$value_quotes = str_replace('"', '""', $value);
$csv .= ($comma ? "," : "") . ( strpos($value,",")===false ? $value_quotes : '"'.$value_quotes.'"' );
$comma = true;
}
$csv .= "\r\n";
}
//Save to a file and/or output
file_put_contents("result.csv",$csv);
print $csv;
Here is another temporary, non-PHP workaround:
Go to an existing NEW google sheet
Go to "File -> New -> Spreadsheet"
Under "File -> Publish to the web..." now has the option to publish a csv version
I believe this is actually creating an old Google sheet but for my purposes (importing google sheet data from clients or myself into R for statistical analysis) it works until they hopefully update this feature.
I posted this in a Google Groups forum also, please find it here:
https://productforums.google.com/forum/#!topic/docs/An-nZtjaupU
The correct URL for downloading a Google spreadsheet as CSV is:
https://docs.google.com/spreadsheets/export?id=<ID>&exportFormat=csv
The current answers do not work anylonger. The following has worked for me:
Do File -> "Publish to the web" and select 'start publishing' and the format. I choose text (which is TSV)
Now just copy the URL there which will be similar to https://docs.google.com/spreadsheet/pub?key=YOUR_KEY&single=true&gid=0&output=txt
That new feature appears to have disappeared. I don't see any option to publish a csv/tsv version. I can download tsv/csv with the export, but that's not available to other people with merely the link (it redirects them to a google docs sign-in form).
I found a fix! So I discovered that old spreadsheets before this change were still allowing only publishing certain sheets. So I made a copy of an old spreadsheet, cleared the data out, copy and pasted my current info into it and now I'm happily publishing just a single sheet of my large spreadsheet. Yay
I was able to implement a query to the result, see this table
https://docs.google.com/spreadsheets/d/1LhGp12rwqosRHl-_N_N8eTjTwfFsHHIBHUFMMyhLaaY/gviz/tq?tq=select+A,B,I,J,K+where+B%3E=4.5&pli=1
the spreadsheet fetches data from earthquake, but I just want to select MAG 4.5+ earthquakes so it makes the query and the columns, just a problem:
I cannot parse the result, I tried to decode as json but was not able to parse it.
I would like to be able to show this as HTML or CSV or how to parse this ? for example to be able to plot it on a Google Map.