Google Sheets import issue with limit despite small amount of data - google-sheets

In a Google Sheet we have to build pivot-type capability from two tabs (sub-worksheets) of data. We ran into the silly 5 million cell limit in Google fairly quickly. So now we have deleted a lot of the data and have only imported data from our CSV for this year, since Jan 1.
In one tab this has 23000 rows with 10 columns of data.
In the other tab, we have about 1000 rows with 6 columns of data.
We deleted all the data from the old tabs, and are importing the new CSVs with "Replace current sheet". When we do this with the much smaller amounts of data above, we see this error:
Too large to import. Remove rows or columns and try again.
What may be going wrong? The main pivot tab also now has most of the data forced by pasting "as value only" from 2020. So it just has one month of data for now to deal with the ARRAYFORMULA calculations.

Related

Downloaded my dataset combine from BigQuery to Google Sheets using Export, but only showed me 500 rows instead of the 3,745,465 rows

I want to export some datasets I combined, which total 3,745,465 rows. When I open the result in Google Sheets, it only lets me see a 500-row "preview." When I click on my column filters, the system shows 3,745,465 rows, but when I download it as an Excel sheet I get only 500 rows instead of the whole data.
What I want is the full set of rows, not only the 500-row preview.
Google Sheets, compared to BigQuery, is a small ecosystem. There are limits, including cell limits: you are allowed 10 million cells per spreadsheet. So if you have only one sheet with a single column, you can have 10M cells, which equals 10M rows (in theory). In reality, you will experience significant performance issues after about 30k rows, and at ~80k you will be lucky if the whole spreadsheet doesn't crash.
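To put numbers on the limit described in this answer, here is a minimal sketch of the arithmetic. It assumes the 10 million cell figure quoted above; the function names are hypothetical and only illustrate the calculation.

```python
# Cell limit per spreadsheet, as quoted in the answer above.
CELL_LIMIT = 10_000_000


def max_columns_for(rows: int, cell_limit: int = CELL_LIMIT) -> int:
    """Largest number of full columns that fit for a given row count."""
    return cell_limit // rows


def fits(rows: int, columns: int, cell_limit: int = CELL_LIMIT) -> bool:
    """True if a rows x columns grid stays within the cell limit."""
    return rows * columns <= cell_limit


rows = 3_745_465  # row count from the question
print(max_columns_for(rows))  # 2
print(fits(rows, 3))          # False
```

Under that assumption, the full export would only fit in a single spreadsheet if it had at most two columns, before any performance concerns come into play.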

Google sheets max cells limits and mitigations

I'd read online that Google Sheets has a max cell limit of 5 million cells. A sheet that I'm currently working on has gone well past that limit (including blank cells).
What is the new limit?
Also I'd manually checked how many cells I was using. Is there any function or script that I can use to keep a check?
The sheet I'm working on is only going to get bigger and it's already lagging heavily. I'd love some suggestions on which platform I could move to next to handle such big data. There are so many options, it's mindboggling. I use Google Sheets mainly for its ease of collaboration, presentability and ease of use. Is there any other tool with these traits but with the ability to handle bigger data?
In the early years, the limit was 5 million cells. Last year this was upgraded to 10 million cells.
you can follow updates at: https://workspaceupdates.googleblog.com/search/label/Google%20Sheets
try:
Also take a look at Google DataStudio.
The easiest way I found to check the cell limit was to try and add a huge amount of lines at the end of the document, which gave me this error message:
This reads: "An error has occurred: This action would increase the number of cells in the worksheet above the limit of 10000000 cells".
However, when I used one more digit, I got a different message:
That one reads: "Oops, enter a number between 1 and 5000000", suggesting the maximum number of rows you can have is 5 million, while the maximum number of cells can be up to 10 million. I'm not sure about the columns, but I'd guess the limit is the same as the row limit.
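As a rough way to keep a check on usage, the count can be done from each tab's grid size, since blank cells count toward the limit too (as the question notes). This is a minimal sketch, assuming the 10 million cell figure from the first error message; the tab names and sizes are made-up examples, and the real dimensions would have to be read off each tab by hand or by a script.

```python
from typing import Dict, Tuple

CELL_LIMIT = 10_000_000  # from the first error message above


def cells_used(tabs: Dict[str, Tuple[int, int]]) -> int:
    """Sum rows * columns over every tab, blank cells included."""
    return sum(rows * cols for rows, cols in tabs.values())


def cells_remaining(tabs: Dict[str, Tuple[int, int]],
                    limit: int = CELL_LIMIT) -> int:
    """Cells that can still be added before the limit is reached."""
    return limit - cells_used(tabs)


# Hypothetical tabs: name -> (rows, columns) of the grid, not of the data.
tabs = {"Data": (50_000, 26), "Pivot": (1_000, 26), "Lookup": (1_000, 10)}

print(cells_used(tabs))       # 1336000
print(cells_remaining(tabs))  # 8664000
```

The point of counting grid size rather than filled cells is that a mostly empty tab with the default number of columns still uses up part of the allowance.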

Linking a fixed spreadsheet and a moving spreadsheet by cell value

I work for a small business that depends quite a lot on the weather. I’m trying to create a spreadsheet (in google sheets) to predict daily revenue. At the moment the spreadsheet simply multiplies last year’s corresponding day by the growth percentage to get this year’s daily values. I’m trying to include the weather forecast (15 days) in the calculation. I’ve managed to import the forecast successfully using an api tutorial that I found online (https://www.visualcrossing.com/weather-data), and include the forecast values in the revenue calculation.
Now for the issue:
*the revenue prediction is a spreadsheet with every row being a day of the year, making the spreadsheet 365 rows long
*the imported forecast (in another tab, not sure that’s relevant) is only 15 rows long and updates every day, meaning that it remains that length. No new row is added during the update: the data just shift one row up and the first day disappears while a new day takes the bottom row.
My formula in the revenue tab identifies the weather data in the weather tab by row but when I pull the formula down for the whole year (down 365 rows, that is), only the first 15 rows refer to existing data in the weather tab. -> Not only are the results obtained based on the wrong row as soon as the weather tab updates on day 2, but they are also referring to totally empty cells from day 16 on.
So my questions are:
*is there a way to force the daily weather forecast to remain on its row and add a new row every day, or failing that,
*is there a way for the formula to recognize the matching day cells and use the value of the cell x columns to the right of it, effectively skipping the row referencing altogether
I’ve added a very simplified sketch of the spreadsheets for the more visual helpers out here
Apologies if this is unclear, I do not have a tech background but am happy to clarify anything if needed.
-- Edit: Here is a screenshot of the revenue tab. The expected output is in column K. The values in it at the moment are incorrect.
Seems like you just need a dynamic way to match the dates? I can't tell if you meant to have prior year data in other tab, but this spreadsheet matches your example and returns the value irrespective of the row order.
The formula being used (in the top row) is:
=B1*index(WeatherTab!B:B,match(A1,WeatherTab!A:A,))*index(WeatherTab!C:C,match(A1,WeatherTab!A:A,))
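The idea behind this formula is to look the forecast up by date instead of by row position, so it keeps working when the forecast rows shift. A minimal sketch of the same logic outside of Sheets, assuming the empty last argument of MATCH means an exact match; the dates, factor values and function name are made up for illustration.

```python
from datetime import date
from typing import Optional

# Hypothetical forecast tab: date -> (value from column B, value from column C).
weather = {
    date(2021, 6, 1): (1.5, 0.5),
    date(2021, 6, 2): (0.5, 1.0),
}


def predicted_revenue(day: date, base: float) -> Optional[float]:
    """Mirror B1 * INDEX(B:B, MATCH(A1, A:A, 0)) * INDEX(C:C, MATCH(A1, A:A, 0))."""
    factors = weather.get(day)  # exact match on the date, row order is irrelevant
    if factors is None:
        return None  # no forecast for this day; the sheet formula would show an error
    factor_b, factor_c = factors
    return base * factor_b * factor_c


print(predicted_revenue(date(2021, 6, 1), 1000.0))  # 750.0
print(predicted_revenue(date(2021, 6, 2), 1000.0))  # 500.0
print(predicted_revenue(date(2021, 7, 1), 1000.0))  # None
```

Days beyond the 15-day forecast have no match, so the revenue tab would need a fallback for them, for example wrapping the lookup in IFERROR.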

How can I disable automatic recalculation on google sheets?

Simplified scenario:
Sheet Customer_Orders, has blocks of rows with each row having product code, count ordered, and size. Bunch of other stuff is looked up/calculated on the basis of these three tidbits. By the end of the season this sheet has about 5000 rows.
Sheet Raw_Inventory has start of year in the first 500 rows, and then does a query to Customer_orders. By season end this sheet has about 2000 rows.
Near as I can tell, this query runs every time I change one of the 3 fields in Cust_Orders.
Sheet Inv_Status is a pivot table that runs against Raw_Inventory, and again, I think that every time Raw_Inventory is modified, the pivot table is recalculated. (There are a couple of other pivot tables that use the same data.)
The result is that making a change on Cust_Orders can result in up to 2 minutes while the calculations catch up.
(Hardware: Mac Pro, 24 GB ram, 3.2 GHz, 4 core; Current version of Chrome running under Yosemite)
What I would like to do is one of the following:
Lengthen the time between updates.
Be able to recalculate sheet Raw_Inventory manually.
A partial workaround:
I've created a new sheet that imports raw_Inventory. This copy is used for the pivot table. ImportRange only runs every 30 minutes.
The next step will replace the query with 1 zillion simple assignment statements. I'm hoping that this will replace querying 3000 lines with querying a single line when I make a change in Cust_Orders.
There is no way to disable automatic recalculation in Google Sheets. One option is to replace the formulas by the values either by using copy/paste as value only or by using a script. The advantage of using a script is that it also could be used to add again the formulas when needed.
Related
How do I stop and start autoupdating in Google Sheets?
Formulas always recalculating when refreshing/loading spreadsheet
I had a similar problem, I solved it by creating an enabling cell and in that cell I put 0 or 1 and then I used that cell inside the formula. In such a way that:
    | A                               | B
  1 | enable formula                  | 0
  2 | = if(B1=0; 0; complex_formula1) | = if(B1=0; 0; complex_formula2)
  3 | = if(B1=0; 0; complex_formula1) | = if(B1=0; 0; complex_formula2)
This way, when I need to change the spreadsheet, I disable the formulas (putting 0 in B1), change the spreadsheet, and at the end I enable the formulas again (putting 1 in B1).
It's not the best solution, but it worked for me.
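The enable-cell pattern in this answer can be sketched as a guarded computation. This is only an illustration, assuming that Sheets skips the expensive branch of IF when the enable cell (B1 in the formulas above) is 0, which the answer implies but does not state; the function names are hypothetical.

```python
from typing import Callable, List

calls: List[int] = []  # records each time the costly work actually runs


def expensive() -> int:
    """Stand-in for complex_formula1: slow work we want to avoid while editing."""
    calls.append(1)
    return sum(range(1_000_000))


def gated(enabled: bool, compute: Callable[[], int]) -> int:
    """Return 0 without calling compute when disabled, like IF(B1=0; 0; formula)."""
    if not enabled:
        return 0
    return compute()


print(gated(False, expensive), len(calls))  # 0 0
print(gated(True, expensive), len(calls))   # 499999500000 1
```

The trade-off is the same as in the sheet: while the switch is off, every dependent cell shows 0 rather than a stale value.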
This is not an answer to my original question -- how to control recalculation, but is a workaround, and ultimately a better solution.
Quick restatement of problem:
CustOrders pulled descriptions of inventory off of RawInv sheet.
RawInv updated from CustOrders. This wasn't quite a circular dependency, as RawInv only updated quantities from CustOrders. But it meant that anytime a change was made in CustOrders, RawInv needed to be recalculated.
This was made worse by having one query per line creating descriptions.
The solution amounted to refactoring.
Another spreadsheet was created, CustSupport.
It kept RawInv and Trees -- the latter being the descriptions. It also had the master reference sheet for prices and round off tables. These two tabs are rarely changed, and are copied as needed to sheets that use them.
It imported a copy of CustOrders. Since this copy had no dependencies back to the main ordering sheet, I didn't have to wait for it to recalculate.
RawInv recalculated from this copy of CustOrders.
I did a wholesale replacement of QUERYs with VLOOKUPs. This required some rearrangement of columns.

I can't figure out how to filter or query in Google Sheets without returning a bunch of blank strings appended to actual data

I'm at my wit's end on trying to figure out why filtering/querying in Google Sheets is so broken. I have a sheet with some data about practice exams I'm taking and I'm attempting to pull some data from that sheet to another sheet for calculating statistics. I've made a shareable document with the pertinent stuff so you can see what I mean.
My raw data is in the TestScores sheet and I made a TESTSTATS sheet to test different methods of pulling data from TestScores. In my example, I'm only trying to pull unique dates from range TestScores!B2:B and I've added a few different methods to do so in TESTSTATS (removed the equal sign from each one so each can be tested on its own by putting in the equal sign).
The methods I've tried:
=UNIQUE(TestScores!B2:B)
=UNIQUE(FILTER(TestScores!B2:B, TestScores!B2:B<>""))
=UNIQUE(FILTER(TestScores!B2:B, TestScores!B2:B<>0))
=UNIQUE(FILTER(TestScores!B2:B, NOT(ISBLANK(TestScores!B2:B))))
=UNIQUE(QUERY(TestScores!B2:B, "select B"))
=ARRAY_CONSTRAIN(UNIQUE(QUERY(TestScores!B2:B, "select B")), ROWS(UNIQUE(TestScores!B2:B))+1,5)
You'll see that each one, when activated by adding the = in front of the formula returns the proper data, but also appends 500 empty rows which look empty, but are in fact blank strings (""). This makes it difficult to work with because there are a lot of calculations in my sheet that depend on one another. I also do not want to specify an explicit end to my ranges and would prefer to keep them open ended (B2:B instead of B2:B17) so everything updates automatically as new records are added.
What am I doing wrong? Why are the returned data appended with a bunch of empty cells, and why 500 specifically (seems arbitrary considering my source data is 29 or 30 rows depending on whether or not you include headers)?
TESTSTATS starts with only two rows, so more rows have to be added to give the output somewhere to go. It seems Google chose to add them 500 at a time (from the last required cell). "Why?" would have to be a matter for Google.
If you know 14 rows are required for the output and increase the size of TESTSTATS to 16, no more rows will be added. Since you want room for expansion, you can't fix the size at 16 and avoid further issues, but you could allow some room, say 30 rows, and delete the few extra. If 30 becomes insufficient (when the sheet shoots up to, say, 540 rows), delete the rows not required but set the sheet size to, say, 60 rows, and so on.
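For reference, the result that the UNIQUE(FILTER(...)) formulas in the question are meant to produce can be sketched outside of Sheets. This is a minimal illustration with made-up sample dates and a hypothetical function name; per the answer above, the extra rows are sheet rows added to hold the output, not values returned by the formula.

```python
from typing import Iterable, List


def unique_non_blank(values: Iterable[str]) -> List[str]:
    """Drop blank strings, then keep the first occurrence of each value in order."""
    seen = set()
    result = []
    for value in values:
        if value == "" or value in seen:
            continue
        seen.add(value)
        result.append(value)
    return result


# An open-ended column: a few dates followed by empty cells.
column = ["2021-03-01", "2021-03-01", "2021-03-04", "", "", ""]
print(unique_non_blank(column))  # ['2021-03-01', '2021-03-04']
```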
