I have a Google Spreadsheet with data connected to a Data Studio panel. I'm using the following data flow to get the data:
Google SpreadSheets --> BigQuery External Table --> View To the External Table --> Data Studio (Updated every 10 minutes)
But for some reason that I don't know, sometimes, when executing a select on the BigQuery External Table I get the following error:
Resources exceeded during query execution: Google Sheets service overloaded for spreadsheet id:XXX
The Google Spreadsheet is only 1500x10 (rows x columns), which I think is pretty small. Also, there are only about 6 users.
What can cause that error? Any idea about how to solve this?
Thanks
The Google documentation has information about this error:
A BigQuery query can overload Sheets, resulting in an error like Resources exceeded during query execution: Google Sheets service overloaded. Consider simplifying your spreadsheet; for example, by minimizing the use of formulas.
It seems that along with the size of the sheet, its "complexity" also matters. We cannot know how complex your spreadsheet is without seeing it, but consider reducing your formula usage. This article also mentions a max result size of 10MB and other pivot table limits. You could also try to divide the data or, if the error rate is manageable, use some kind of retry strategy to query again until you get the results.
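As a rough illustration of the retry idea, here is a minimal Python sketch using exponential backoff. The names `SheetsOverloadedError`, `query_with_retry` and `run_query` are hypothetical: `run_query` stands for whatever function actually runs your BigQuery query, and you would need to map the real "Google Sheets service overloaded" error onto the exception yourself.

```python
import random
import time


class SheetsOverloadedError(Exception):
    """Hypothetical stand-in for the 'Google Sheets service overloaded' error."""


def query_with_retry(run_query, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call run_query() until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return run_query()
        except SheetsOverloadedError:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            # Exponential backoff with a little jitter: ~1s, ~2s, ~4s, ...
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 10))
```

Backing off between attempts matters here because retrying immediately would only add load to a Sheets service that is already overloaded.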
If this is not enough then you may have reached the limits of what you can do with Sheets. Digging deeper I found this Google issue tracker post which has a quote from their engineering team:
The BigQuery Engineering Team has stated that the current suggested approach is to simplify the spreadsheet. Sheets is designed for Web/Mobile use cases and not as a DB backend. Even a couple of thousand rows is large in this context, especially if there are formulas involved.
The post is a feature request to the Google engineering team to allow for more complexity, but these requests can take time and if they don't intend Sheets to be used that way it's also possible that they won't implement it. If you cannot reduce the spreadsheet's complexity enough to stop getting the error you may want to consider querying the data from a different source.
Related
I'm having trouble pulling just the price for these sites into a Google sheet. Instead, I'm pulling multiple rows/currencies, etc. and I don't know how to fix it
1---->
https://www.discountfilters.com/refrigerator-water-filters/models/ukf8001/
//main/div/div/div/div/div/div/div/div/div[1]/span/span/span
2---->
https://www.discountfilters.com/refrigerator-water-filters/models/ukf8001/
//div[1]/form/div/div/div[1]/div/div/div[2]/div[1]
3---->
https://filterbuy.com/air-filters/8x16x1/
//div[2]/div[1]/div[3]/span
I tried the xpaths above and it's giving me all the data instead of just the discounted price (row1) that I'm looking for.
try:
=INDEX(IMPORTXML(A1, "//div[@class='price mt-2 mt-md-0 mb-0 mb-md-3']"),,2)
As for the multiple websites you are trying to scrape, IMPORTXML is good for basic tasks, but it won't get you far if you are serious about scraping:
If the target website data requires some post-processing cleanup, things get complicated quickly, since you are now "programming with Excel formulas", a rather painful process compared to writing regular code in a conventional programming language
There is no proper launch & cache control so the function can be triggered occasionally and if the HTTP request fails, cells will be populated with ERR! values
The approach only works with most basic websites (no SPAs rendered in browsers can be scraped this way, any basic web scraping protection or connectivity issue breaks the process, no control over HTTP request geo location, or number of retries)
When IMPORTXML fails, the second approach to web scraping in Google Sheets is usually to write a custom Google Apps Script. This approach is much more flexible (you just write JavaScript code and deploy it as a Google Sheets add-on), but it takes a lot of time and is not easy to debug and iterate on. It is definitely not low code.
And the third approach is to use proper tools (automation framework + scraping engine) and use Google Sheets just for storage purposes:
https://youtu.be/uBC752CWTew
I recently decided to update my spreadsheet of games I need to complete. To ensure my data is always up to date I made use of the IMPORTXML function, but with the number of URLs I have, I have begun to encounter 'loading' issues.
This is the spreadsheet:
https://docs.google.com/spreadsheets/d/1ZdcsIf9Upn_0zqTFyLAm1TMMFu_MpyTEm23EU0nVaTA/edit?usp=sharing
(Columns B,E,G and I are usually hidden)
Column A is the url.
Column B scrapes the image url and column C displays it.
Columns D, E, G and I scrape the data I want, and it is displayed in columns D, F, H and J.
If my aim is to have upwards of 500 URLs, is this something that can only be accomplished with a script?
In this scenario you are encountering the limit of Google services. That quota is reached by aggregating the usage of all documents and projects. Also please be aware that there could be more than one import inside the same document, like one per every cell in your example.
To diminish that usage you could modify old documents so they don't refresh anymore (commenting out the relevant pieces and deactivating triggers). Alternatively you could just delete them. If you plan to run large amounts of imports, you could use Apps Script. Although this option is limited by the same quota discussed above, you could programmatically control when and how much to import in order to optimise your utilisation of Google services.
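To make "control when and how much to import" a bit more concrete, here is a minimal sketch of the batching logic. It is written in Python purely for illustration (Apps Script itself is JavaScript-based), and the function names and the batch size are assumptions, not part of any Google API. The idea is that each scheduled run refreshes only one slice of the URLs instead of all of them at once.

```python
def make_batches(urls, batch_size):
    """Split a list of URLs into consecutive batches of at most batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [urls[i:i + batch_size] for i in range(0, len(urls), batch_size)]


def batch_for_run(urls, batch_size, run_number):
    """Pick the batch to refresh on a given scheduled run, cycling through all batches."""
    batches = make_batches(urls, batch_size)
    return batches[run_number % len(batches)] if batches else []
```

With 500 URLs and a hypothetical batch size of 50, every URL would be refreshed once every 10 runs, which spreads the requests out over time.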
We are using Google Sheets for simultaneous data entry by several users. We observe that once a number of users start entering data at the same time, Google Sheets starts to slow down.
We have a stable internet connection. What can we do to keep Google Sheets from slowing down? Should we limit the number of users using the sheet at the same time? Is there any optimization we can do?
You are likely witnessing correlation, not causation, between the number of users entering information on the sheet and the speed of the sheet.
Is there conditional formatting on the sheet? Are there a lot of formulas? What kinds of functions are in the formulas?
The thing that is likely slowing the sheet down is the number of calculations being made that are dependent on the cells in which your users are entering information. This makes it difficult to both diagnose and advise on the next best course of action.
Is there a way to get around the 50 million cell count rule? Can this be done by using 2 separate workbooks?
We have a lead tracking system that we have built in a Google Sheets workbook, and with the way our leads get updated, we have already hit the 50mil record count in Google Sheets over the past 3 months. Deleting the data is not an option, as we have to analyze weekly, monthly, quarterly, and yearly stats.
I am pretty sure IMPORTRANGE would still hit the 50mil cell count limit.
Is there a way around this limit?
Update:
So a way to combat the cell limit is to delete all rows and columns that are empty and unused, trimming each sheet down to just the rows and columns you have filled in.
Apparently a cell still counts against your cell count even if it has no data in it.
This is not a solution per se, but it is a way to make sure empty cells are not counting against your cell count.
Answer:
There is no way around this. According to the Google File Size documentation[1], the limits on a Spreadsheet are:
Up to 5 million cells or 18,278 columns (column ZZZ) for spreadsheets that are created in or converted to Google Sheets.
Things I Tested:
Starting in 2019 it became possible to edit Office files natively in G Suite[2] so I thought I'd give it a test. According to the specifications and limits page for Microsoft Office Excel[3]:
Total number of rows and columns on a worksheet: 1,048,576 rows by 16,384 columns
Which totals 17,179,869,184 cells.
As Spreadsheets that are created on Google Drive have the Google Drive limit, I created an Excel workbook on my local machine, with the maximum number of possible cells and uploaded it to Drive to see if it could be edited natively. Unfortunately, while the file uploaded successfully, attempting to open the file resulted in the following page:
More Information, Workarounds & Similar Services:
Honestly if you need more than 5 million cells in a Spreadsheet (or even 50 million!) then you're not using the right tool for the job. With this much data, you're likely better off using a database or a cloud data warehouse such as Google BigQuery[4] or Cloud SQL[5]
That being said, if Google Sheets workbooks really are the only way forward for you, the only thing I can recommend is creating multiple Sheets files, split into more appropriate timeframes, for example with each file containing data for just one month. This will take a bit more time to set up (though you can use Apps Script to migrate data between the files), but in the long run it will mean you can use your data more effectively, and any data processing you need to do will complete within the Apps Script Quotas[6].
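The monthly split can be sketched as a simple grouping step. The Python below is only an illustration of the idea, assuming each record is a hypothetical `(date, payload)` pair; writing each group to its own Sheets file is left out.

```python
from collections import defaultdict
from datetime import date


def split_by_month(records):
    """Group (date, payload) records under a 'YYYY-MM' key, one group per file."""
    groups = defaultdict(list)
    for day, payload in records:
        groups[day.strftime("%Y-%m")].append(payload)
    return dict(groups)
```

Each key would then correspond to one file, so quarterly or yearly stats can still be computed by reading the relevant months.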
References:
[1] Google Drive Help - Files you can store in Google Drive
[2] G Suite Updates - "Office editing makes it easier to work with Office files in Docs, Sheets and Slides."
[3] Excel specifications and limits
[4] Google BigQuery
[5] Google Cloud SQL
[6] Google Apps Script - Quotas for Google Services
Or maybe there are some other options?
We want to use Google Docs for all office operations, but we are faced with the problem of big Excel files (more than 3,000 records).
Is it justified to use a DB with a web interface on top of it?
Yes, I think it's justifiable.
A Google Spreadsheet may have up to 400 thousand cells. So depending on how many columns your records have, they may or may not fit nicely in a Google Spreadsheet.
If you have more than 133 columns (4e5/3e3), it will not fit. In fact, even at around 100 columns (for 3e3 records) I would advise against using a Google Spreadsheet, as you would be too close to the limit.
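The arithmetic behind those numbers can be checked with a short Python sketch. The 400,000-cell limit and the 3,000 records are the figures quoted above; the function names are just for illustration.

```python
def max_columns(cell_limit, records):
    """Largest whole number of columns that fits under the cell limit."""
    return cell_limit // records


def usage_fraction(cell_limit, records, columns):
    """Share of the cell limit consumed by a records x columns table."""
    return records * columns / cell_limit


print(max_columns(400_000, 3_000))          # 133
print(usage_fraction(400_000, 3_000, 100))  # 0.75
```

So 100 columns would already use three quarters of the limit, leaving little room for the data to grow.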
Google Fusion Tables may be suitable for you, as it does not have such a limitation; its limit is instead a total size of 250MB per account.