How to export a csv from Google Sheet API? - google-sheets

I can't find any reference to an API that enables REST API clients to export an existing Google Sheet to a CSV file.
https://developers.google.com/sheets/
I believe there should be a way to export them.

The following URL gives you the CSV of a Google spreadsheet per sheet. The sheet must be accessible by the public, by anyone with the link (unlisted).
The parameters you need to provide are:
sheet ID (that is simply the ID in the URL of a Google Spreadsheet https://docs.google.com/spreadsheets/d/{{ID}}/edit)
sheet name (that is simply the name of the sheet as given by the user)
https://docs.google.com/spreadsheets/d/{{ID}}/gviz/tq?tqx=out:csv&sheet={{sheet_name}}
With that URL you can run a GET-request to fetch the CSV.
Or paste it in your browser address bar.
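A minimal Python sketch of building that URL, assuming the sheet is shared with anyone who has the link. The function name `gviz_csv_url` and the placeholder ID are illustrative, not part of any Google API.

```python
from urllib.parse import quote


def gviz_csv_url(spreadsheet_id: str, sheet_name: str) -> str:
    """Build the per-sheet CSV URL described above."""
    # The sheet name is user-chosen text, so percent-encode it for the query string.
    encoded_name = quote(sheet_name, safe="")
    return (
        "https://docs.google.com/spreadsheets/d/"
        f"{spreadsheet_id}/gviz/tq?tqx=out:csv&sheet={encoded_name}"
    )
```

The resulting string can be passed to any HTTP client for the GET request, for example `urllib.request.urlopen(gviz_csv_url("YOUR_ID", "Sheet1")).read()`.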

You can use the Drive API to do this today -- see https://developers.google.com/drive/v3/web/manage-downloads#downloading_google_documents, however that will limit you to the first sheet of the document. The Sheets API doesn't expose exporting as CSV today, but may offer it in the future.
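As a rough sketch of what the Drive export call looks like over plain HTTP: the Drive v3 `files.export` endpoint takes the file ID in the path and the target MIME type as a query parameter. The helper name `drive_csv_export_request` is made up for illustration, and `access_token` is assumed to be an OAuth 2.0 token you already obtained with a scope that permits reading the file.

```python
from urllib.parse import urlencode
from urllib.request import Request


def drive_csv_export_request(file_id: str, access_token: str) -> Request:
    """Prepare (but do not send) a Drive v3 export request asking for CSV."""
    query = urlencode({"mimeType": "text/csv"})
    url = f"https://www.googleapis.com/drive/v3/files/{file_id}/export?{query}"
    # access_token is an assumption: an OAuth 2.0 bearer token with Drive read access.
    return Request(url, headers={"Authorization": f"Bearer {access_token}"})
```

Sending it is then `urllib.request.urlopen(request).read()`; nothing is sent until you do that.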

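Nobody's mentioned gspread yet, so here's how I did it:
import gspread
import pandas as pd
#authenticate (assumes a service-account key file in gspread's default location)
gc = gspread.service_account()
#placeholder: the ID from your spreadsheet URL
sheet_id = "your_spreadsheet_id"
#open sheet
sheet = gc.open_by_key(sheet_id)
#select worksheet
worksheet = sheet.get_worksheet(0)
#download values into a dataframe
df = pd.DataFrame(worksheet.get_all_records())
#save dataframe as a csv, using the spreadsheet name
filename = sheet.title + '.csv'
df.to_csv(filename, index=False)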
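If pandas is not available, the list of row dictionaries that `get_all_records()` returns can also be written with the standard library. This is only a sketch; `records_to_csv` is a hypothetical helper name, and it assumes every record has the same keys as the first one.

```python
import csv
import io


def records_to_csv(records: list) -> str:
    """Serialize a list of row dicts (header -> value) as CSV text."""
    if not records:
        return ""
    buffer = io.StringIO(newline="")
    # Column order follows the keys of the first record.
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```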

First, make the document accessible to anyone with the link. Then copy its URL and extract the long ID made up of upper- and lower-case letters and digits. Then use this script.
#!/bin/bash
long_id="id_assigned_to_your_document"
g_id="number_assigned_to_card_in_google_sheet"
wget --output-document=temp.csv "https://docs.google.com/spreadsheets/d/$long_id/export?gid=$g_id&format=csv&id=$long_id"
If the document has only one sheet (tab), its number is g_id="0".
The problem you will probably run into is stray whitespace in the downloaded file. I use this second script to clean it up:
#!/bin/bash
#Delete all lines beginning with a # from a file
#http://stackoverflow.com/questions/8206280/delete-all-lines-beginning-with-a-from-a-file
sed '/^#/ d' temp.csv |
# remove spaces
# http://stackoverflow.com/questions/9953448/how-to-remove-all-white-spaces-from-a-given-text-file
tr -d "[:blank:]" |
# regexp "1,2" into 1.2
# http://www.funtoo.org/Sed_by_Example,_Part_2
sed 's/\"\([-]\?[0-9]*\),\([0-9]*\)\"/\1.\2/g' > out.csv
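For comparison, here is a Python sketch of the same three clean-up steps (drop `#` lines, delete spaces and tabs, turn quoted decimal commas into dots). The name `clean_csv` is illustrative. One deliberate difference from the sed pattern: it requires at least one digit on each side of the comma, so two adjacent quoted text fields such as `"a","b"` are left untouched.

```python
import re

# Quoted numbers written with a decimal comma, e.g. "1,2" or "-3,50".
_DECIMAL_COMMA = re.compile(r'"(-?[0-9]+),([0-9]+)"')
# Translation table that deletes spaces and tabs (the [:blank:] class).
_DROP_BLANKS = str.maketrans("", "", " \t")


def clean_csv(text: str) -> str:
    """Apply the three clean-up steps line by line."""
    cleaned = []
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # same effect as sed '/^#/ d'
        line = line.translate(_DROP_BLANKS)
        cleaned.append(_DECIMAL_COMMA.sub(r"\1.\2", line))
    return "\n".join(cleaned) + "\n" if cleaned else ""
```

Note that deleting every blank also removes spaces inside text fields, exactly as the `tr` step does, so this only suits purely numeric exports.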
Update
As Sam mentioned, the API is the better solution. There is now good documentation at:
https://developers.google.com/sheets/quickstart/php
It includes an example that generates output with a CSV structure.

If you don't have easy access to or familiarity with PHP, here's a very barebones Google Apps Script Web App that once deployed and the caller permission accepted, should allow clients with an appropriately scoped access token or api key to export an existing Google Sheet to a csv file. It takes a Google Sheets spreadsheet id and sheet name (and optional download filename) as query parameters, and returns the corresponding theoretically RFC 4180 compliant CSV file.
Further instructions on deploying an Apps Script project as a web app are here: https://developers.google.com/apps-script/guides/web#deploying_a_script_as_a_web_app.
You can deploy it and test it out easily in the browser just by visiting the "Current web app URL" (as provided when you publish as web app from the script editor), and accepting the consent screen, or even just visit the one that I deployed (configured to execute as the accessing user, and unverified/scary consent) at the example URL.
The tricky part (as usual) is getting the OAuth token or API key set up, but if you're already calling the Google Sheets V4 API, you've probably already got that dialed in. I used CURL to make sure that it behaved as a REST api, but the technique I used to get an OAuth token there is both a distraction and frankly a little scary to include here since it's really easy to mess up. If you don't already have a way to get one, that's probably a good topic for a separate SO question in any case.
One related (and big!) caveat: I'm not 100% sure how the consent and verification interact with a pure REST client (i.e. how that works if you DON'T visit this in the browser first...), and/or whether this script would need to be in the same GCP project as the other code that uses the Sheets API. If there's interest, and/or it doesn't work right out of the box, please let me know and I'll happily dig deeper and follow up.
// Example URL, assuming:
// "Current web app URL": https://script.google.com/a/tillerhq.com/macros/s/AKfycbyZlWAW6bpCpnFoPjbdjznDomFRbTNluG4siCBMgOy2qU2AGoA/exec
// spreadsheetId: 1xNDWJXOekpBBV2hPseQwCRR8Qs4LcLOcSLDadVqDA0E
// sheet name: Sheet1
// (optional) filename: mycsv.csv
//
// https://script.google.com/a/tillerhq.com/macros/s/AKfycbyZlWAW6bpCpnFoPjbdjznDomFRbTNluG4siCBMgOy2qU2AGoA/exec?spreadsheetid=1xNDWJXOekpBBV2hPseQwCRR8Qs4LcLOcSLDadVqDA0E&sheetname=Sheet1&filename=mycsv.csv
//
var REQUIRED_PARAMS = [
  'spreadsheetid', // example: "1xNDWJXOekpBBV2hPseQwCRR8Qs4LcLOcSLDadVqDA0E"
  'sheetname' // Case-sensitive; example: "Sheet1"
];
// Returns an RFC 4180 compliant CSV for the specified sheet in the specified spreadsheet
function doGet(e) {
  REQUIRED_PARAMS.forEach(function(requiredParam) {
    if (!e.parameters[requiredParam]) throw new Error('Missing required parameter ' + requiredParam);
  });
  var spreadsheet = SpreadsheetApp.openById(e.parameters.spreadsheetid);
  var sheet = spreadsheet.getSheetByName(e.parameters.sheetname);
  if (!sheet) throw new Error("Could not find sheet " + e.parameters.sheetname + " in spreadsheet " + e.parameters.spreadsheetid);
  var filename = e.parameters.filename || (spreadsheet.getName() + "_" + e.parameters.sheetname + ".csv");
  var numRows = sheet.getLastRow();
  var numColumns = sheet.getLastColumn();
  var values = sheet.getSheetValues(1, 1, numRows, numColumns);
  function quote(s) {
    s = s.toString();
    if ((s.indexOf("\r") == -1)
        && (s.indexOf("\n") == -1)
        && (s.indexOf(",") == -1)
        && (s.indexOf("\"") == -1)) return s;
    // Fields containing line breaks (CRLF)*, double quotes, and commas should be enclosed in double-quotes;
    // anything other than that we already returned, so if we get here -- escape it and quote it.
    // *That's what the text of the RFC says, but the ABNF (...and Excel) treat EITHER CR or LF as requiring quotes.
    // Replace any double quote with a double double quote, and wrap the whole thing in quotes
    return "\"" + s.replace(/"/g, '""') + "\"";
  };
  var csv = values.map(function(row) {
    return row.map(quote).join();
  }).join("\r\n") + "\r\n";
  return ContentService
    .createTextOutput(csv)
    .setMimeType(ContentService.MimeType.CSV)
    .downloadAsFile(filename);
}
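For anyone checking the quoting rules from another language, Python's standard `csv` module applies the same minimal-quoting idea (quote only fields containing commas, double quotes, or line breaks, and double any embedded quotes). A small sketch; `rows_to_csv` is just an illustrative name.

```python
import csv
import io


def rows_to_csv(rows) -> str:
    """Serialize rows with minimal quoting and CRLF row endings."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\r\n")
    writer.writerows(rows)
    return buffer.getvalue()
```

Comparing its output with the web app's response for a few awkward cells is a quick way to sanity-check the `quote` function.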

Related

=importxml, Website to Google Sheets - getting #N/A every time

Website Link
https://redacted
xml options I have tried so far
<span aria-labelledby="amount">722</span>
//*[@id="amount"]/h3/span[2]
/html/body/div[3]/main/div/span/div/div/div[2]/div/div/div[2]/div/div[2]/div[3]/div/div/div/div[2]/div[1]/h3/span[2]
None working
I'm trying to use =importxml to pull a value of "722" from this page; that was the value on 5/5/22, anyway.
Unfortunately, it seems that your expected value cannot be directly retrieved using the XPath. Because the value is put to the HTML using Javascript and IMPORTXML cannot analyze the result of Javascript. But, fortunately, it seems that your expected value is included in the HTML as the JSON data. So, in this answer, I would like to retrieve the value from the JSON data.
Pattern 1:
In this pattern, IMPORTXML and REGEXEXTRACT are used.
=ARRAYFORMULA(REGEXEXTRACT(IMPORTXML(A1,"//script[@data-component-name='GetOfferWrapper']"),"defaultEstimatedValue"":(.+?)}"))
The URL https://www.gazelle.com/iphone/iphone-13-pro-max/other/iphone-13-pro-max-1tb-other/498082-gpid is put in the cell "A1".
When this formula is used, the following result is obtained.
Pattern 2:
In this pattern, a custom function created by Google Apps Script is used. When the value is retrieved from JSON data, Google Apps Script is useful. When you use this script, please copy and paste the following script to the script editor of Spreadsheet and save the script. And, please put a custom function of =SAMPLE("https://www.gazelle.com/iphone/iphone-13-pro-max/other/iphone-13-pro-max-1tb-other/498082-gpid") to a cell.
function SAMPLE(url) {
const res = UrlFetchApp.fetch(url).getContentText();
const data = res.match(/<script.+data-component-name="GetOfferWrapper".+?>([\w\s\S]+?)<\/script>/);
if (!data || data.length == 0) return "No data";
const obj = JSON.parse(data[1]);
return obj.initState.defaultEstimatedValue;
}
The URL https://www.gazelle.com/iphone/iphone-13-pro-max/other/iphone-13-pro-max-1tb-other/498082-gpid is put in the cell "A1".
When this formula is used, the value of 722 is retrieved.
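The same idea (find the script tag, parse its body as JSON, read `initState.defaultEstimatedValue`) can be sketched outside Sheets in Python. The function name `estimated_value` is hypothetical, and the sketch assumes the JSON sits directly inside the script tag with no surrounding JavaScript.

```python
import json
import re

# Matches the script tag that carries the JSON payload and captures its body.
_SCRIPT = re.compile(
    r'<script[^>]*data-component-name="GetOfferWrapper"[^>]*>(.*?)</script>',
    re.DOTALL,
)


def estimated_value(html: str):
    """Return initState.defaultEstimatedValue, or None if the tag is absent."""
    match = _SCRIPT.search(html)
    if match is None:
        return None
    data = json.loads(match.group(1))
    return data["initState"]["defaultEstimatedValue"]
```

Like the formula and the custom function, this depends on the page's current HTML and will break if that markup changes.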
Note:
The formula and custom function can be used for the current HTML. So, when the specification of HTML is changed, those might not be able to be used. Please be careful about this.
References:
IMPORTXML
REGEXEXTRACT
Custom Functions in Google Sheets
fetch(url)
JSON.parse()
You will need to find another site with the data you are attempting to scrape. The #N/A error results from Google Sheets not supporting the import of JavaScript-rendered elements. You can check for compatibility by disabling JavaScript in the site settings; usually only what remains can be scraped. In this case, nothing remains.

google sheet: make a local copy of image link from a shared sheets

My friend shared his Google Sheet with me, and the table contains images that are inserted as links (URLs). How can I make a copy of this sheet so that the images are copied to my own Google Drive automatically and the links won't break if he deletes his image files in the future? Right now, if I make a copy of the document, it still links to the original image source.
Is this possible? Of course I don't want to copy them manually one by one from the links. Is there a better and faster way?
https://docs.google.com/spreadsheets/d/1TkXwAd8rKbjnGfYEJVaOYBJwCZ7G7YfuSvmcDE6g8No/edit?usp=sharing
The OP wants to extract the image URL from a hyperlink formula, and save a copy of the image to their own Google Drive account.
This answer combines several elements from precedents on StackOverflow.
Since the images metadata is in the formula, the code uses the getFormulas() method rather than the "conventional" getValues(). Cells with no formula are empty strings; hence the test if (formula.length !=0){.
Get the file name without extension: REGEX: Capture Filename from URL without file extension. Ironically, this precedent doesn't use regular expressions but finds the position of the last / and the last . using lastIndexOf and getting a substring between those points. Note this solution fails on filenames with multiple periods, though there is an alternative solution for this scenario.
Get the file name from the url: Getting a Google Spreadsheet Cell's Image URL which combines regex and Javascript match.
Save a file to Google Drive: Need sheets script to save img to drive which is a simple and elegant solution for saving files.
Saving the file to Google Drive: When copying files using Apps Script from one folder to another any “Apps Script” files being copied end up in MyDrive not the specified folder - why? explains why the API is required to write the files to My Drive.
Note: In order to use this script, enable Drive API v2 at Advanced Google Services
On script editor, Resources -> Advanced Google Services; Turn on Drive API v2
function so5811567402() {
  var ss=SpreadsheetApp.getActiveSpreadsheet();
  var sheetName = "Table";
  var sh = ss.getSheetByName(sheetName);
  var rg=sh.getDataRange();
  var lastColumn = sh.getLastColumn();
  var lastRow = sh.getLastRow();
  var formulas = rg.getFormulas();
  for (var i in formulas) {
    for (var j in formulas[i]) {
      var formula = formulas[i][j];
      if (formula.length !=0){
        var regex = /image\("(.*)"/i;
        var matches = formula.match(regex);
        // Skip formulas that are not IMAGE("...") calls
        if (!matches) continue;
        var imgurl = matches[1];
        var filename = imgurl.substring(imgurl.lastIndexOf("/") + 1, imgurl.lastIndexOf("."));
        //Logger.log(filename);
        var image = UrlFetchApp.fetch(imgurl).getBlob().getAs('image/jpeg').setName(filename);
        var FolderId = "Folder ID goes here";
        var folder = DriveApp.getFolderById(FolderId);
        var file = DriveApp.createFile(image);
        Drive.Files.update({"parents": [{"id": folder.getId()}]}, file.getId());
      }
    }
  }
}
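To make the two string-handling steps easier to test in isolation, here is a Python sketch of pulling the URL out of an IMAGE formula and deriving a file name from it. Both function names are illustrative. Unlike the `lastIndexOf` approach, `file_stem` strips any query string before removing the extension, which is an assumption about what is wanted.

```python
import posixpath
import re
from urllib.parse import urlparse

# First quoted argument of an IMAGE("...") call, case-insensitive.
_IMAGE_FORMULA = re.compile(r'image\("([^"]*)"', re.IGNORECASE)


def image_url(formula: str):
    """Return the URL inside =IMAGE("..."), or None for other formulas."""
    match = _IMAGE_FORMULA.search(formula)
    return match.group(1) if match else None


def file_stem(url: str) -> str:
    """Return the last path segment of the URL without its extension."""
    name = posixpath.basename(urlparse(url).path)
    return posixpath.splitext(name)[0]
```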

AdWords PLACEMENT_PERFORMANCE_REPORT not pulling URLs

This should be extremely simple but for some reason it doesn't seem to work. I'm trying to pull the URLs of display placements using the PLACEMENT_PERFORMANCE_REPORT, but instead of URLs it's just returning "--".
The code I'm using is:
var report = AdWordsApp.report(
"SELECT CampaignName, Clicks, FinalAppUrls, FinalUrls " +
"FROM PLACEMENT_PERFORMANCE_REPORT " +
"WHERE Clicks > 0 " +
"DURING LAST_30_DAYS");
var rows = report.rows();
while (rows.hasNext()) {
var row = rows.next();
var url = row["FinalUrls"];
Logger.log(url);
}
I've tried logging the CampaignName and clicks and they're working as expected, so can't understand what the issue is here. The only thing I can think of is that in the reference guide it says:
List of final URLs of the main object of this row. UrlList elements are returned in JSON list format
I'm not entirely sure what JSON list format is, but when I log the typeof url it says it's a string, so thought it shouldn't be an issue.
The FinalAppUrls and FinalUrls list the target URLs that you set on the individual managed placements.
If you're interested in the URL (domain, rather) of the placement itself, you'll have to request either the Criteria or the DisplayName field in your report; both contain the domain of the placement.
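On the "JSON list format" point from the question: the field arrives as a string holding a JSON array, so it has to be parsed before use. A small Python sketch of that handling, treating the "--" placeholder seen in the question as "no URLs set"; `parse_url_list` is a hypothetical helper, not an AdWords function.

```python
import json


def parse_url_list(value: str) -> list:
    """Turn a UrlList report value into a Python list of URLs."""
    text = value.strip()
    if text == "--":
        # Placeholder the report returned when no final URLs are set.
        return []
    return json.loads(text)
```

In an AdWords script the equivalent step would be `JSON.parse(row["FinalUrls"])` after checking for the placeholder.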

Parse HTML to retrieve specific tags value with Google Apps Script

I'm trying to parse HTML to retrieve the value of a <b> tag in my Google Apps Script code. The <b> tag contains line breaks in its attributes and appears more than once, but I only want the first value. (In this case, only 'foo' is required.)
<b class="
"
>
foo
</b><b class="
"
>
var
</b>
On Google Apps Script, functions such as 'getElementByTagName' are not available. So I first thought of using a regexp, but it's not a wise option here.
Does anyone have an idea on how I can move forward? Any comment/guess would be highly appreciated!
How about using XmlService for your situation as a workaround? At XmlService, even if there are several line breaks in the tags, the value can be retrieved. I think that there are several workarounds for your situation. So please think of this as one of them.
The flow of sample script is as follows.
Flow :
Add the header of xml and a root element tag to the html.
Parse the created XML value using XmlService.
Retrieve the first value of tags using XmlService.
Sample script :
var html = '<b class="\n"\n>\nfoo\n</b><b class="\n"\n>\nvar\n</b>\n'; // Your sample value
var xml = '<?xml version="1.0"?><sampleContents>' + html + '</sampleContents>';
var res = XmlService.parse(xml).getRootElement().getChildren()[0].getText().trim();
Logger.log(res) // foo
Note :
In this sample script, your sample html was used. So if you use more complicated one, can you provide it? I would like to modify the script.
Reference :
XML Service
If this was not what you want, please tell me. I would like to modify it.
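The same wrap-and-parse flow can be sketched in Python with the standard XML parser, which may help when testing the approach away from Apps Script. `first_tag_text` is an illustrative name, and the fragment must be well-formed XML for this to work, which real-world HTML often is not.

```python
import xml.etree.ElementTree as ET


def first_tag_text(html_fragment: str) -> str:
    """Wrap the fragment in a root element and return the first child's text."""
    xml = '<?xml version="1.0"?><sampleContents>' + html_fragment + "</sampleContents>"
    root = ET.fromstring(xml)
    # root[0] is the first child element; its text may be None if empty.
    return (root[0].text or "").strip()
```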
Edit 1 :
Unfortunately, for the value retrieved from the URL, above script cannot be used. So I used "Parser" which is a GAS library for your situation. The sample script is as follows.
Sample script :
var url = "https://www.booking.com/searchresults.ja.html?ss=kyoto&checkin_year=2018&checkin_month=10&checkin_monthday=1&checkout_year=2018&checkout_month=10&checkout_monthday=2&no_rooms=1&group_adults=1&group_children=0";
var html = UrlFetchApp.fetch(url).getContentText();
var res = Parser.data(html).from("<b class=\"\n\"\n>").to("</b>").build().trim();
Logger.log(res) // US$11
Note :
Before you run this script, please install the "Parser" library. For how to install a library, see "Managing libraries" in the references below.
The project key of the library is M1lugvAXKKtUxn_vdAG9JZleS6DrsjUUV
References :
Parser
Managing libraries
google app script Exceeded memory limit
google script scrape parser with 2 classes with the same name
Edit 2 :
The 2nd URL in your comment differs from your 1st one, and the page at the new URL has no <b class=\"\n\"\n> tag, so the value you want cannot be retrieved with the script above. From the 1st URL in your comment, I guessed which value you want. Could you confirm the following script?
var url = "https://www.booking.com/searchresults.ja.html?ss=kyotogranvia&checkin_year=2018&checkin_month=10&checkin_monthday=1&checkout_year=2018&checkout_month=10&checkout_monthday=2&no_rooms=1&group_adults=1&group_children=0";
var html = UrlFetchApp.fetch(url).getContentText();
var res = Parser.data(html).from("<span class=\"lp-postcard-avg-price-value\">").to("</span>").build().trim();
Logger.log(res) // US$289
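The from/to extraction used in these scripts amounts to taking the text between the first occurrence of a start marker and the next occurrence of an end marker. A plain Python sketch of that technique (this is not the Parser library's code; `between` is a made-up name):

```python
def between(text: str, start: str, end: str):
    """Return the trimmed text between the first `start` and the next `end`."""
    begin = text.find(start)
    if begin == -1:
        return None
    begin += len(start)
    stop = text.find(end, begin)
    if stop == -1:
        return None
    return text[begin:stop].strip()
```

Because it matches literal markers, it breaks as soon as the site changes the class name or the whitespace inside the tag.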

New Google Spreadsheets publish limitation

I am testing the new Google Spreadsheets as there is a new feature I really need: the 200 sheets limit has been lifted (more info here: https://support.google.com/drive/answer/3541068).
However, I can't publish a spreadsheet to CSV like you can in the old version. I go to 'File>Publish to the web' and there are no longer options to publish 'all sheets' or certain sheets, and you can't specify cell ranges to publish to CSV, etc.
This limitation is not mentioned in the published 'Unsupported Features' documentation found at: https://support.google.com/drive/answer/3543688
Is there some other way this gets enabled or has it in fact been left out of the new version?
My use case is: we retrieve Bigquery results into the spreadsheets, we publish the sheets as a CSV automatically using the "publish automatically on update" feature which then produces the CSV URL which gets placed into charting tools that read the CSV URL to generate the visuals.
Does anyone know how to do this?
The new Google spreadsheets use a different URL (just copy your <KEY>):
New sheet : https://docs.google.com/spreadsheets/d/<KEY>/pubhtml
CSV file : https://docs.google.com/spreadsheets/d/<KEY>/export?gid=<GUID>&format=csv
The GUID of your spreadsheet relates to the tab number.
/!\ You have to share your document using the Anyone with the link setting.
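A tiny Python sketch for assembling that CSV export URL from the key and the tab's gid. `export_csv_url` is an illustrative helper, and it assumes the sheet is shared with anyone who has the link.

```python
from urllib.parse import urlencode


def export_csv_url(key: str, gid: int = 0) -> str:
    """Build the /export URL for one tab of a new-style spreadsheet."""
    query = urlencode({"gid": gid, "format": "csv"})
    return f"https://docs.google.com/spreadsheets/d/{key}/export?{query}"
```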
Here is the solution, just write it like this:
https://docs.google.com/spreadsheets/d/<KEY>/export?format=csv&id=<KEY>
I know it's weird to write the KEY twice, but it works perfectly. A teammate from work discovered this by opening the excel file in Google Docs, then File -> Download as -> Comma separated values. Then, in the downloads section of the browser appears a link to the CSV file, like this:
https://docs.google.com/spreadsheets/d/<KEY>/export?format=csv&id=<KEY>&gid=<SOME NUMBER>
But it doesn't work in this format, what my friend did was remove "&gid=<SOME NUMBER>" and it worked! Hope it helps everyone.
If you enable "Anyone with the link" sharing for the spreadsheet, here is a simple method to export a range of cells or columns (or whatever you feel like) as HTML, CSV, XML, or JSON via a query:
https://docs.google.com/spreadsheet/tq?key=YOUR-KEY&gid=1&tq=select%20A,%20B&tqx=reqId:1;out:html;%20responseHandler:webQuery
For tq variable read query language reference.
For tqx variable read request format reference.
The downside is that your doc is still available in full via the public link, but if you want to export/import data to, say, Excel, this is a perfect way.
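Hand-encoding the `tq` and `tqx` values is error-prone, so here is a Python sketch that lets `urlencode` do it. The helper name `query_url` is hypothetical; `out` selects the output format in `tqx` (for example `csv` or `html`).

```python
from urllib.parse import urlencode


def query_url(key: str, gid: int, query: str, out: str = "csv") -> str:
    """Build a /tq query URL with properly encoded parameters."""
    params = urlencode({"key": key, "gid": gid, "tq": query, "tqx": f"out:{out}"})
    return f"https://docs.google.com/spreadsheet/tq?{params}"
```

For example, `query_url("YOUR-KEY", 1, "select A, B")` encodes the spaces and the comma in the query for you.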
It's not going to help everyone, but I've made a PHP script to read the HTML into an array.
I've added converting back to a CSV at the end. Hopefully this will help some people who have access to PHP.
$html_link = "https://docs.google.com/spreadsheets/d/XXXXXXXXXX/pubhtml";
$local_html = "sheets.html";
$file_contents = file_get_contents($html_link);
file_put_contents($local_html,$file_contents);
$dom = new DOMDocument();
$html = @$dom->loadHTMLFile($local_html); //Added a @ to hide warnings - you might remove this when testing
$dom->preserveWhiteSpace = false;
$tables = $dom->getElementsByTagName('table');
$rows = $tables->item(0)->getElementsByTagName('tr');
$cols = $rows->item(0)->getElementsByTagName('td'); //You'll need to edit the (0) to reflect the row that your headers are in.
$row_headers = array();
foreach ($cols as $i => $node) {
if($i > 0 ) $row_headers[] = $node->textContent;
}
foreach ($rows as $i => $row){
if($i == 0 ) continue;
$cols = $row->getElementsByTagName('td');
$row = array();
foreach ($cols as $j => $node) {
$row[$row_headers[$j]] = $node->textContent;
}
$table[] = $row;
}
//Convert to csv
$csv = "";
foreach($table as $row_index => $row_details){
$comma = false;
foreach($row_details as $value){
$value_quotes = str_replace('"', '""', $value);
$csv .= ($comma ? "," : "") . ( strpos($value,",")===false ? $value_quotes : '"'.$value_quotes.'"' );
$comma = true;
}
$csv .= "\r\n";
}
//Save to a file and/or output
file_put_contents("result.csv",$csv);
print $csv;
Here is another temporary, non-PHP workaround:
Go to an existing NEW google sheet
Go to "File -> New -> Spreadsheet"
Under "File -> Publish to the web..." now has the option to publish a csv version
I believe this is actually creating an old Google sheet but for my purposes (importing google sheet data from clients or myself into R for statistical analysis) it works until they hopefully update this feature.
I posted this in a Google Groups forum also, please find it here:
https://productforums.google.com/forum/#!topic/docs/An-nZtjaupU
The correct URL for downloading a Google spreadsheet as CSV is:
https://docs.google.com/spreadsheets/export?id=<ID>&exportFormat=csv
The current answers do not work any longer. The following has worked for me:
Do File -> "Publish to the web" and select 'start publishing' and the format. I choose text (which is TSV)
Now just copy the URL there which will be similar to https://docs.google.com/spreadsheet/pub?key=YOUR_KEY&single=true&gid=0&output=txt
That new feature appears to have disappeared. I don't see any option to publish a csv/tsv version. I can download tsv/csv with the export, but that's not available to other people with merely the link (it redirects them to a google docs sign-in form).
I found a fix! So I discovered that old spreadsheets before this change were still allowing only publishing certain sheets. So I made a copy of an old spreadsheet, cleared the data out, copy and pasted my current info into it and now I'm happily publishing just a single sheet of my large spreadsheet. Yay
I was able to implement a query to the result, see this table
https://docs.google.com/spreadsheets/d/1LhGp12rwqosRHl-_N_N8eTjTwfFsHHIBHUFMMyhLaaY/gviz/tq?tq=select+A,B,I,J,K+where+B%3E=4.5&pli=1
The spreadsheet fetches earthquake data, but I only want earthquakes of magnitude 4.5 and above, so the query selects those rows and the columns I need. There is just one problem:
I cannot parse the result. I tried to decode it as JSON but was not able to parse it.
I would like to show this as HTML or CSV. How can I parse it, for example to plot it on a Google Map?
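One likely reason plain JSON decoding fails: by default the gviz endpoint, as far as I know, wraps its JSON in a JavaScript call such as `google.visualization.Query.setResponse(...)`, sometimes with a leading comment. A Python sketch that strips that wrapper and returns the cell values; `parse_gviz_response` is a made-up name, and the wrapper shape and the `table.rows[].c[].v` layout are assumptions based on that response format.

```python
import json


def parse_gviz_response(body: str) -> list:
    """Strip the function-call wrapper and return rows of raw cell values."""
    start = body.index("(") + 1  # first "(" opens the wrapper call
    end = body.rindex(")")       # last ")" closes it
    payload = json.loads(body[start:end])
    rows = []
    for row in payload["table"]["rows"]:
        # Empty cells are assumed to come back as null.
        rows.append([cell["v"] if cell else None for cell in row["c"]])
    return rows
```

Alternatively, appending `&tqx=out:csv` or `&tqx=out:html` to the query URL asks the endpoint for CSV or HTML directly, which avoids the wrapper entirely.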
