I recently decided to update my spreadsheet of games I need to complete. To ensure my data was constantly up to date I made use of the IMPORTXML function, but with the number of URLs I have begun to encounter 'loading' issues.
This is the spreadsheet:
https://docs.google.com/spreadsheets/d/1ZdcsIf9Upn_0zqTFyLAm1TMMFu_MpyTEm23EU0nVaTA/edit?usp=sharing
(Columns B, E, G and I are usually hidden.)
Column A is the URL.
Column B scrapes the image URL and column C displays it.
Columns D, E, G and I scrape the data I want and display it in columns D, F, H and J.
Given that my aim is to have upwards of 500 URLs, is this something that can only be accomplished with a script?
In this scenario you are encountering the limits of Google services. That quota is reached by aggregating the usage of all your documents and projects. Also be aware that there can be more than one import inside the same document, like one per cell in your example.
To reduce that usage you could modify old documents so they no longer refresh (commenting out the relevant pieces and deactivating triggers). Alternatively you could just delete them. If you plan to run large numbers of imports, you could use Apps Script. Although this option is limited by the same quota discussed above, it lets you programmatically control when and how much to import in order to optimise your utilisation of Google services.
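For illustration, here is a minimal Apps Script sketch of that idea: a time-driven trigger refreshes only a limited batch of URLs per run and writes static values into the sheet, so nothing recalculates when the file is opened. The sheet name, column layout, batch size and the extraction regex are all assumptions you would adapt to your own file:

var BATCH_SIZE = 50; // URLs refreshed per run, tuned against your quota

function refreshBatch() {
  var props = PropertiesService.getScriptProperties();
  var start = Number(props.getProperty('cursor') || 0);
  var sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Games');
  var urls = sheet.getRange(2, 1, sheet.getLastRow() - 1, 1).getValues();

  for (var i = start; i < Math.min(start + BATCH_SIZE, urls.length); i++) {
    var url = urls[i][0];
    if (!url) continue;
    var resp = UrlFetchApp.fetch(url, { muteHttpExceptions: true });
    if (resp.getResponseCode() !== 200) continue; // keep the old value on failure
    // Placeholder extraction: grab the first <span class="price">...</span>
    var match = resp.getContentText().match(/<span class="price">([^<]+)<\/span>/);
    if (match) sheet.getRange(i + 2, 4).setValue(match[1]); // data starts at row 2
  }
  // Remember where this run stopped so the next run continues from there
  props.setProperty('cursor', String(start + BATCH_SIZE >= urls.length ? 0 : start + BATCH_SIZE));
}

function install() { // run once to schedule the batch every 6 hours
  ScriptApp.newTrigger('refreshBatch').timeBased().everyHours(6).create();
}

Because the script writes plain values instead of live formulas, the import quota is only consumed when the trigger runs, and you can tune BATCH_SIZE and the trigger frequency to stay inside it.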
Related
I'm having trouble pulling just the price from these sites into a Google Sheet. Instead, I'm pulling multiple rows, currencies, etc., and I don't know how to fix it.
1) https://www.discountfilters.com/refrigerator-water-filters/models/ukf8001/
//main/div/div/div/div/div/div/div/div/div[1]/span/span/span
2) https://www.discountfilters.com/refrigerator-water-filters/models/ukf8001/
//div[1]/form/div/div/div[1]/div/div/div[2]/div[1]
3) https://filterbuy.com/air-filters/8x16x1/
//div[2]/div[1]/div[3]/span
I tried the XPaths above and they give me all the data instead of just the discounted price (row 1) that I'm looking for.
Try:
=INDEX(IMPORTXML(A1, "//div[@class='price mt-2 mt-md-0 mb-0 mb-md-3']"),,2)
Note that # is not valid in an XPath attribute test; attributes are matched with @class. The outer INDEX(...,,2) then keeps only the second column of what IMPORTXML returns, which here should be the discounted price.
Regarding the issues on the multiple websites you are trying to scrape: IMPORTXML is good for basic tasks, but it won't get you far if you are serious about scraping:
If the target website's data requires some cleanup post-processing, things get very complicated, since you are now "programming with spreadsheet formulas", a rather painful process compared to writing regular code in a conventional programming language.
There is no proper launch and cache control, so the function can be triggered at unpredictable times, and if the HTTP request fails, the cells are populated with ERR! values.
The approach only works with the most basic websites (SPAs rendered in the browser cannot be scraped this way; any basic web-scraping protection or connectivity issue breaks the process; and there is no control over the HTTP request's geolocation or number of retries).
When IMPORTXML() fails, the second approach to web scraping in Google Sheets is usually to write some custom Google Apps Script. This approach is much more flexible: you just write JavaScript code and deploy it as a Google Sheets add-on. But it takes a lot of time, and it is not easy to debug and iterate on; it is definitely not low-code.
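As an illustration of that second approach, here is a minimal sketch of a custom function that fetches a page with basic retry handling. The function name, retry count and backoff delays are arbitrary choices for the example, not a published add-on:

function FETCHTEXT(url) {
  var attempts = 3;
  for (var i = 0; i < attempts; i++) {
    var resp = UrlFetchApp.fetch(url, { muteHttpExceptions: true });
    if (resp.getResponseCode() === 200) {
      // Cells hold at most 50,000 characters, so truncate the HTML
      return resp.getContentText().slice(0, 50000);
    }
    Utilities.sleep(1000 * (i + 1)); // simple linear backoff before retrying
  }
  return 'ERR! after ' + attempts + ' attempts';
}

Called from a cell as =FETCHTEXT(A1), this at least replaces silent ERR! values with controlled retries; parsing the returned HTML is still up to you.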
And the third approach is to use proper tools (an automation framework plus a scraping engine) and use Google Sheets just for storage purposes:
https://youtu.be/uBC752CWTew
I have a Google Spreadsheet whose data is connected to a Data Studio panel. I'm using the following data flow to get the data:
Google Sheets --> BigQuery External Table --> View on the External Table --> Data Studio (updated every 10 minutes)
But for some reason that I don't know, executing a SELECT on the BigQuery external table sometimes gives me the following error:
Resources exceeded during query execution: Google Sheets service overloaded for spreadsheet id:XXX
The Google Spreadsheet is only 1,500 rows by 10 columns, which I think is pretty small. Also, there are about 6 users.
What can cause that error? Any ideas on how to solve it?
Thanks
The Google documentation has information about this error:
A BigQuery query can overload Sheets, resulting in an error like Resources exceeded during query execution: Google Sheets service overloaded. Consider simplifying your spreadsheet; for example, by minimizing the use of formulas.
It seems that along with the size of the sheet, its "complexity" also matters. We cannot know how complex your spreadsheet is without seeing it, but consider reducing your formula usage. The article also mentions a maximum result size of 10 MB and other pivot table limits. You could also try dividing the data or, if the error rate is manageable, use some kind of retry strategy to query again until you get the results, as in the sketch below.
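As an illustration of such a retry strategy, here is a sketch written as Apps Script with the BigQuery advanced service enabled; the project ID, the query and the backoff values are placeholders, and the same pattern applies to any BigQuery client:

function queryWithRetry() {
  var projectId = 'your-project-id';
  var request = { query: 'SELECT * FROM dataset.view_on_external_table', useLegacySql: false };
  for (var attempt = 1; attempt <= 5; attempt++) {
    try {
      return BigQuery.Jobs.query(request, projectId).rows;
    } catch (e) {
      // Only retry the Sheets-overload error; rethrow anything else
      if (e.message.indexOf('Resources exceeded') === -1) throw e;
      Utilities.sleep(Math.pow(2, attempt) * 1000); // exponential backoff
    }
  }
  throw new Error('Sheets still overloaded after 5 attempts');
}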
If this is not enough, you may have reached the limits of what you can do with Sheets. Digging deeper, I found this Google issue tracker post, which has a quote from their engineering team:
The BigQuery Engineering Team has stated that the current suggested approach is to simplify the spreadsheet. Sheets is designed for Web/Mobile use cases and not as a DB backend. Even a couple of thousand rows is large in this context, especially if there are formulas involved.
The post is a feature request asking the Google engineering team to allow for more complexity, but these requests can take time, and if they don't intend Sheets to be used that way, it's also possible they won't implement it. If you cannot reduce the spreadsheet's complexity enough to stop the error, you may want to consider querying the data from a different source.
I am fairly new to Power Apps, and am trying to make a batch data entry form.
I am prototyping this now, and while in theory it should work, I keep running into technical errors.
The data source I'm using is Google Sheets. For prototyping purposes, there are three columns: item_id, item, and recorded_value.
The app pulls a list of standard values into a gallery, where the input values can then be selected.
The approach I have taken is to create a gallery, which is added to a collection using the code below:
ClearCollect(
    collection,
    ForAll(
        Filter(Gallery1.AllItems, true),
        {
            item: t_item.Text,
            item_id: t_item_id.Text,
            recorded_value: t_recorded_value.Text
        }
    )
)
This is then uploaded to Google Sheets; I have found "success" using the two methods below:
ForAll(collection, Patch(records, Defaults(records), {item: item, item_id: item_id, recorded_value: recorded_value}))
or
Collect(records, collection)
Overall I am seeing 2 main issues in testing:
The initial 'collect' sometimes fails to capture items. I don't know if it is cache-related or what, but it seems that unless I scroll all the way down the gallery it will leave some fields blank (maybe not an issue in real use, but it seems odd).
Uploading records can take excruciatingly long. While initially it was just straight up crashing due to the problems in issue 1, I have found that it will sometimes get to, say, item 85 before sitting for a minute or so and then going through the rest of the list. For just 99 items it takes several minutes to upload.
Ultimately I am looking to know whether there is a better approach for what I am doing. I basically just want to take a maximum of 99 rows and paste them onto the table, but it feels really inefficient right now due to the looping nature of the function. I am not sure if this is more of a Power Apps or a Google Sheets issue, but any advice would be appreciated.
From everything I could research, it seems like a batch upload of records like this is going to be time-consuming almost any way you approach it.
I was able to come up with a workaround, however, which more or less eliminates the problem.
Instead of uploading each individual record, I concatenate all the records in the collection into a single cell via a variable, using delimiters to differentiate the rows and columns (set the variable with the Concat function, then Patch the variable to the data source).
This method allows all of the data to be stored nearly instantaneously.
After that I just perform some basic ETL in Python to transform the data into a more standard format and load it into SQL Server, which is fairly trivial to do.
I recommend that others looking to take a 'batch insert' approach try something similar, as it now takes users essentially a second to load records rather than several minutes.
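The author does the unpacking in Python, but for anyone who would rather stay inside Google Sheets, the reverse transformation could look roughly like the Apps Script sketch below. The sheet names, the A1 cell address and the ';' / '|' delimiters are assumptions based on the description above:

function unpackBatch() {
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var payload = ss.getSheetByName('staging').getRange('A1').getValue();
  // ';' separates records, '|' separates fields; assumes every record has all 3 fields
  var rows = payload.split(';').map(function (rec) {
    return rec.split('|'); // -> [item_id, item, recorded_value]
  });
  ss.getSheetByName('records')
    .getRange(2, 1, rows.length, rows[0].length) // write below a header row
    .setValues(rows);
}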
I have been using Google Sheets for many years. It's powerful and free, so I like it.
Recently the IMAGE() function has been updating erratically and with a lot of lag.
Using the function on a PC with many browsers (Chrome, Firefox, Edge, etc.), the picture fails to display even after many minutes. It's erratic: sometimes faster, sometimes slower.
However, on an Android phone or tablet it always displays within about 10 seconds. I attached an image of my spreadsheet with the PC and tablet side by side.
Is anyone facing the same problem?
Any way to solve it?
Is Google restricting bandwidth, or is it a bug?
Thank you guys for reading my post :)
The fact that you have used Sheets every day for years has nothing to do with the way it works.
Sheets on a computer can be slow for various reasons: large amounts of stored data, functions that pull from other spreadsheets, and even connectivity issues.
The fact that the IMAGE function works as expected on one device and not on another can likewise have many causes. Since this issue seems particular to your situation, you might want to take a look at the data in your spreadsheet and at how you can improve its performance.
What you can do to improve performance:
Try to use closed ranges in formulas. For example, use A1:C3 if the data is stored in that range, instead of A:C.
Certain functions can also slow down a spreadsheet, especially the so-called volatile functions such as NOW(), TODAY() and RAND(), since these functions refresh every time there is a change in the sheet.
IMPORTRANGE, IMPORTHTML, IMPORTFEED, IMPORTDATA and IMPORTXML are also known to recalculate: IMPORTRANGE every 30 minutes, the others every hour.
References:
Sheets Limitations
Formulas Recalculation
Is there a way to get around the 50 million cell count rule? Can this be done by using 2 separate workbooks?
We have a lead tracking system built in a Google Sheets workbook, and with the way our leads get updated we have already hit the 50M cell count in Google Sheets over the past 3 months. Deleting the data is not an option, as we have to analyze weekly, monthly, quarterly and yearly stats.
I am pretty sure IMPORTRANGE would still hit the 50mil cell count limit.
Is there a way around this limit?
Update:
So a way to combat the cell limit is to delete all of the columns and rows that you do not use and that are empty, trimming the sheets down to just the rows and columns you have actually filled in.
Apparently a cell still counts against your cell count even when it has no data in it.
This is not a solution per se, but it is a way to make sure empty cells are not counting against your cell count.
Answer:
There is no way around this. According to the Google File Size documentation[1], the limits on a Spreadsheet are:
Up to 5 million cells or 18,278 columns (column ZZZ) for spreadsheets that are created in or converted to Google Sheets.
Things I Tested:
Starting in 2019 it became possible to edit Office files natively in G Suite[2], so I thought I'd give it a test. According to the specifications and limits page for Microsoft Office Excel[3]:
Total number of rows and columns on a worksheet: 1,048,576 rows by 16,384 columns
That totals 17,179,869,184 cells.
As spreadsheets created on Google Drive have the Google Drive limit, I created an Excel workbook on my local machine with the maximum number of possible cells and uploaded it to Drive to see if it could be edited natively. Unfortunately, while the file uploaded successfully, attempting to open it resulted in an error page.
More Information, Workarounds & Similar Services:
Honestly, if you need more than 5 million cells in a spreadsheet (or even 50 million!) then you're not using the right tool for the job. With this much data you're likely better off using a database or a cloud data warehouse such as Google BigQuery[4] or Cloud SQL[5].
That being said, if Google Sheets really is the only way forward for you, the only thing I can recommend is creating multiple Sheets files, separated into a more appropriate timeframe, with each file containing data for just a month. This will take a bit more time to set up (though you can use Apps Script for data migration between the files, as sketched below), but in the long run it will mean you can use your data more effectively, and any data processing you need to do will complete within the Apps Script quotas[6].
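A hedged starting point for that migration, assuming the archive file's ID, the sheet names and a date in column A (all placeholders for your own setup):

function archiveLastMonth() {
  var source = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Leads');
  var archive = SpreadsheetApp.openById('ARCHIVE_SPREADSHEET_ID').getSheetByName('Leads');
  var data = source.getDataRange().getValues();
  var cutoff = new Date();
  cutoff.setMonth(cutoff.getMonth() - 1);

  // Walk bottom-up so deleting a row doesn't shift the rows still to visit
  for (var r = data.length - 1; r >= 1; r--) { // row 0 is the header
    if (data[r][0] instanceof Date && data[r][0] < cutoff) {
      archive.appendRow(data[r]);
      source.deleteRow(r + 1); // +1: getValues() is 0-based, sheet rows are 1-based
    }
  }
}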
References:
[1] Google Drive Help - Files you can store in Google Drive
[2] G Suite Updates - "Office editing makes it easier to work with Office files in Docs, Sheets and Slides."
[3] Excel specifications and limits
[4] Google BigQuery
[5] Google Cloud SQL
[6] Google Apps Script - Quotas for Google Services