I have a Google Sheet where a column may contain no information. While iterating through the rows and looking at that column, if the cell is blank, nothing is returned for it. Even worse, if I get a full row that includes that column, say 5 columns, I get back only 4 columns when any of the columns is empty. How do I return either NULL or an empty string if I'm getting a row of columns and one of the cells is empty?
// Build a new authorized API client service.
Sheets service = GoogleSheets.getSheetsService();
String range = "Functional Users!A3:E3";
ValueRange response = service.spreadsheets().values().get(spreadsheetId, range).execute();
List<List<Object>> values = response.getValues();
List<Object> cells = values.get(0);
I am getting 5 cells in the row, so cells.size() should ALWAYS return five. However, if any of the 5 cells are blank, it returns fewer cells. Say only the cell at B3 is empty: cells.size() will be 4. Next iteration, I get A4:E4 and cell D4 is empty. Again, cells.size() will be 4, with no way to know just which cell is missing. If A4 AND D4 AND E4 are empty, cells.size() will be 2.
How do I get it to return 5 cells regardless of empty cells?
I solved this issue by converting the values into a Pandas dataframe. I fetched the columns I wanted from Google Sheets, converted those values into a dataframe, did some data formatting, then converted the dataframe back into a list. Converting the list to a dataframe preserves every column, because Pandas already creates null values for empty trailing rows and columns. However, I also needed to convert the non-trailing blank values to null to keep things consistent.
import pandas as pd
from googleapiclient import discovery
from httplib2 import Http
from oauth2client.service_account import ServiceAccountCredentials

# Authenticate and create the service for the Google Sheets API
credentials = ServiceAccountCredentials.from_json_keyfile_name(KEY_FILE_LOCATION, SCOPES)
http = credentials.authorize(Http())
discoveryUrl = 'https://sheets.googleapis.com/$discovery/rest?version=v4'
service = discovery.build('sheets', 'v4',
                          http=http, discoveryServiceUrl=discoveryUrl)
spreadsheetId = 'id of your sheet'
rangeName = 'range of your dataset'
result = service.spreadsheets().values().get(
    spreadsheetId=spreadsheetId, range=rangeName).execute()
values = result.get('values', [])
# Convert values into a dataframe
df = pd.DataFrame(values)
# Replace all non-trailing blank values created by the Google Sheets API
# with null values
df_replace = df.replace([''], [None])
# Convert back to a list to insert into Redshift
processed_dataset = df_replace.values.tolist()
I've dabbled in Sheets v4 and this is indeed the behavior when you're reading a range of cells with empty data. It seems this is the way it has been designed. As stated in the Reading data docs:
Empty trailing rows and columns are omitted.
So if you can find a way to write a character that represents 'empty values', like zero, then that will be one way to do it.
I experienced the same issue using v4 of the Sheets API but was able to work around it by using an extra column at the end of my range and the valueRenderOption argument for the values.get API.
Given three columns, A, B and C any of which might contain a null value, add an additional column, D and add an arbitrary value here such as 'blank'.
Ensure you capture the new column in your range and add the additional parameter,
valueRenderOption: 'FORMATTED_VALUE'.
You should end up with a call similar to this:
sheets.spreadsheets.values.get({
spreadsheetId: SOME_SHEET_ID,
range: "AUTOMATION!A:D",
valueRenderOption: 'FORMATTED_VALUE'
}, (err, res) => {})
This should then give you a consistent length array for each value, returning a blank string "" in the place of the empty cell value.
If you pull a range from the Google Sheets API v4, empty row data IS included if it's at the beginning or middle of the selected range. Only cells which have no data at the end of the range are omitted. Using this assumption you can 'fill' the no-data cells in your app code.
For instance if you selected A1:A5 and A1 has no value it will still be returned in row data as {}.
If A5 is missing then you'll have an array of length 4 and so know to fill the empty A5.
If A4 & A5 are empty then you'll have an array of length 3 and so on.
If none of the range contains data then you'll receive an empty object.
I know that this is super late, but just in case someone else who has this problem in the future would like a fix for it, I'll share what I did to work past this.
What I did was increase the length of the range of cells I was looking for by one. Then within the Google Spreadsheet that I was reading off of, I added a line of "."s in the extra column (The column added to the array now that the desired range of cells has increased). Then I protected that line of periods so that it can't be changed from the "."
This way gives you an array with everything you are looking for, including null results, but does increase your array size by 1. But if that bothers you, you can just make a new one without the last index of the arrays.
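A minimal Python sketch of the read side of this workaround, assuming the last column of the requested range always holds the "." sentinel. `strip_sentinel` is a hypothetical helper name, and `rows` stands in for the list of rows returned by `values().get(...)`; the sample data is made up for illustration.

```python
def strip_sentinel(values, sentinel="."):
    """Drop the trailing sentinel cell from each row.

    Assumes every row ends with the sentinel, which holds when the
    protected extra column is filled in for every row of the range.
    """
    cleaned = []
    for row in values:
        if not row or row[-1] != sentinel:
            raise ValueError(f"row does not end with sentinel: {row!r}")
        cleaned.append(row[:-1])
    return cleaned


# Hypothetical response: three real columns plus the sentinel column.
rows = [["a", "", "c", "."], ["", "", "", "."]]
print(strip_sentinel(rows))  # [['a', '', 'c'], ['', '', '']]
```

Raising on a missing sentinel is a design choice: it surfaces rows where someone removed the "." instead of silently dropping a real value.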
The only solution I could find is writing your own function:
def _safe_get(data, r, c):
    # Return '' for any cell the API omitted from the response
    try:
        return data[r][c]
    except IndexError:
        return ''

def read(range_name, service):
    # service is a (sheets_service, spreadsheet_id) pair
    result = service[0].spreadsheets().values().get(spreadsheetId=service[1],
                                                    range=range_name).execute()
    return result.get('values', [])

def safe_read(sheet, row, col, to_row='', to_col='', service=None):
    range_name = '%s!%s%i:%s%s' % (sheet, col, row, to_col, to_row)
    data = read(range_name, service)
    if to_col == '':
        # default=0 avoids a ValueError when the range is completely empty
        cols = max((len(line) for line in data), default=0)
    else:
        cols = ord(to_col.lower()) - ord(col.lower()) + 1
    if to_row == '':
        rows = len(data)
    else:
        rows = to_row - row + 1
    return [[_safe_get(data, r, c)
             for c in range(cols)]
            for r in range(rows)]
If the last cell in a row has a value, the row is returned in full.
For example:
Rows:
|Nick|29 years|Minsk|
|Mike| |Pinsk|
|Boby| | |
Return:
[
["Nick", "29 years", "Minsk"],
["Mike", "", "Pinsk"],
["Boby"]
]
So when you add a new row whose trailing cells are empty, write a space " " in the last cell instead of leaving it empty ("" or null).
Then, when you read the values, map every space " " back to an empty string "".
Rows:
|Nick|29 years|Minsk|
|Mike| |Pinsk|
|Boby| |" " |
Return:
[
["Nick", "29 years", "Minsk"],
["Mike", "", "Pinsk"],
["Boby", "", " "]
]
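The read-side mapping described above could look roughly like this in Python. `blank_placeholders` is a hypothetical helper name, and the single-space placeholder is the convention this answer assumes.

```python
def blank_placeholders(values, placeholder=" "):
    """Replace cells equal to the placeholder with empty strings."""
    return [["" if cell == placeholder else cell for cell in row]
            for row in values]


rows = [
    ["Nick", "29 years", "Minsk"],
    ["Mike", "", "Pinsk"],
    ["Boby", "", " "],
]
print(blank_placeholders(rows))
# [['Nick', '29 years', 'Minsk'], ['Mike', '', 'Pinsk'], ['Boby', '', '']]
```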
Another option is iterating through the returned rows, checking the length of the row and appending whatever data you were expecting to be returned. I found this preferable to adding junk data to my dataset.
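As a rough sketch of that approach in Python, assuming you know how many rows and columns you requested: `pad_grid` is a hypothetical helper that fills in both the omitted trailing cells and any omitted trailing rows.

```python
def pad_grid(values, n_rows, n_cols, fill=""):
    """Pad a ragged list of rows out to n_rows x n_cols.

    Rows that are already long enough are left unchanged, because
    multiplying a list by a negative number yields an empty list.
    """
    padded = [list(row) + [fill] * (n_cols - len(row)) for row in values]
    padded += [[fill] * n_cols for _ in range(n_rows - len(padded))]
    return padded


rows = [["Nick", "29 years", "Minsk"], ["Mike", "", "Pinsk"], ["Boby"]]
for row in pad_grid(rows, n_rows=4, n_cols=3):
    print(row)
```

Here the "Boby" row gains two empty strings, and a fourth, fully empty row is appended to stand for a trailing row the API left out.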
I am super late to the party, but here goes another alternative:
def read_sheet(service, SPREADSHEET_ID, range) -> pd.DataFrame:
    result = service.spreadsheets().values().get(spreadsheetId=SPREADSHEET_ID, range=range).execute()
    rows = result.get('values', [])
    df = pd.DataFrame(rows[0:])
    df.columns = df.iloc[0]
    df = df.drop(axis=0, index=0)
    return df
For this solution to work you will need headers (column names) in all columns of the spreadsheet you want to read. It loads a pandas df without specifying headers (column names), replaces the column names with the first row, and then drops that row.
Sheets API v4 should return all blanks up to the last filled column.
This will fill out the blanks:
values = result.get('values', [])
print(values[1:5]) # [['Spinach Lasagna', '10', '5', '', 'x'], ['Hot Dish', '10', '5', '', '', '', 'x'], ['Tuna-Noodle Casserole', '10', '5', '', 'x', '', '', 'x'], ['Sausage and Peppers', '10', '3', '', '', '', '', '', 'x']]
# Choose ONE of the following ways to set n_col:
n_col = 14 # hard-coded
n_col = max([len(i) for i in values]) # if the last column is occupied at least once
n_col = len(values[0]) # if you have a header row
values = [lst + ([''] * (n_col - len(lst))) for lst in values]
print(values[1:4]) # [['Spinach Lasagna', '10', '5', '', 'x', '', '', '', '', '', '', '', '', ''], ['Hot Dish', '10', '5', '', '', '', 'x', '', '', '', '', '', '', ''], ['Tuna-Noodle Casserole', '10', '5', '', 'x', '', '', 'x', '', '', '', '', '', '']]
Just add:
values.add("");
before:
cells = values.get(0);
This will ensure that you do not query an empty list because of blank cell or a row.
Related
I'm writing a Google Sheets named function, GETTABLEOFCELL(), that takes in a cell reference, and returns the Named Range that cell exists in.
Since I could not find a native function to determine if a cell is within the specified range, I've defined a helper function called ISCELLINRANGE(range, cell). I've confirmed that this helper function works for cells and ranges within the same sheet--good enough for my case.
ISCELLINRANGE(range, cell)
=AND(
ROW(cell) >= ROW(range),
ROW(cell) < ROW(range) + ROWS(range),
COLUMN(cell) >= COLUMN(range),
COLUMN(cell) < COLUMN(range) + COLUMNS(range)
)
GETTABLEOFCELL(tableCell)
=ARRAYFORMULA(
IFS(
ISCELLINRANGE(DeathWaveUW, tableCell), {DeathWaveUW},
ISCELLINRANGE(BlackHoleUW, tableCell), {BlackHoleUW},
// ...
)
)
///
=ISCELLINRANGE(DeathWaveUW, D6) // => TRUE
=COLUMN(GETTABLEOFCELL(D6)) // => #VALUE!
=ARRAYFORMULA(
IFS(
ISCELLINRANGE(DeathWaveUW, D6), DeathWaveUW
)
) // => #N/A
As seen above, to debug GETTABLEOFCELL(), I simply copied a snippet of the formula into a cell with hard-coded values. It returns #N/A saying there is no match in the IFS() list, which I am guessing (read: hoping) is the root issue in GETTABLEOFCELL(). I've used both DeathWaveUW and {DeathWaveUW} syntaxes for the second argument of IFS; both return #N/A.
Any idea what I am doing wrong?
Issue:
IFS returns
#N/A when none of the conditions is satisfied
#VALUE! when the ranges are mismatched in size
Solution:
For #N/A, add a default value to IFS
For #VALUE!, fix the range sizes
Sample:
#N/A
Add a default value:
GETTABLEOFCELL(tableCell)
=ARRAYFORMULA(
IFS(
ISCELLINRANGE(DeathWaveUW, tableCell), {DeathWaveUW},
ISCELLINRANGE(BlackHoleUW, tableCell), {BlackHoleUW},
TRUE, "No range found"
)
)
This assumes all *UWs are of a single range. If not, TRUE and "No range found" should be changed to arrays (MAKEARRAY is an option).
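To see why the trailing TRUE pair removes the #N/A, here is a rough Python model of how IFS picks a value. It only models the scalar case (no arrays, no ARRAYFORMULA); `ifs` and `NoMatch` are hypothetical names, with `NoMatch` standing in for #N/A.

```python
class NoMatch(Exception):
    """Stands in for the #N/A error in this sketch."""


def ifs(*args):
    """Return the value paired with the first true condition."""
    if len(args) % 2:
        raise TypeError("ifs expects condition/value pairs")
    for condition, value in zip(args[0::2], args[1::2]):
        if condition:
            return value
    raise NoMatch("no condition was true")


# Without a default, two false conditions leave nothing to return.
try:
    ifs(False, "DeathWaveUW", False, "BlackHoleUW")
except NoMatch:
    print("#N/A")

# A final TRUE condition always matches, so it acts as the default.
print(ifs(False, "DeathWaveUW", False, "BlackHoleUW", True, "No range found"))
```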
#VALUE!
If DeathWaveUW, BlackHoleUW and all the other UWs are two-dimensional and of the same size, you can replace the aggregating function AND with a non-aggregating operator like * to maintain the array size:
ISCELLINRANGE(range, cell)
=
(ROW(cell) >= ROW(range))*
(ROW(cell) < ROW(range) + ROWS(range))*
(COLUMN(cell) >= COLUMN(range))*
(COLUMN(cell) < COLUMN(range) + COLUMNS(range))
Alternatively, reduce the array size of the named range to 1 by passing its name as a string and using INDIRECT:
GETTABLEOFCELL(tableCell)
=ARRAYFORMULA(
INDIRECT(
IFS(
ISCELLINRANGE(DeathWaveUW, tableCell), "DeathWaveUW",
ISCELLINRANGE(BlackHoleUW, tableCell), "BlackHoleUW",
TRUE, "No range found"
)
)
)
Related:
ArrayFormula and "AND" Formula in Google Sheets
Mismatch Range error on using IFs in sheets
I am facing a problem related to the dynamic array.
I have data in the below format.
And I want to convert to this format.
Here is the sheet link.
I am using this formula to filter Fruits category.
={FILTER(A5:D11,B5:B11="Fruits");SUM( FILTER(D5:D11,B5:B11="Fruits"))}
But it gives this error
In ARRAY_LITERAL, an Array Literal was missing values for one or more rows
NOTE: Data should be pulled dynamically from the formula, as the data may change.
To build the result table without hard coding category names in the formula, use the recently introduced lambda functions, like this:
={
lambda(
data, categories, headers, totalsHeader, blankRow, selectPrice,
reduce(
headers, query(unique(categories), "where Col1 is not null", 0),
lambda(
resultTable, filterKey,
{
resultTable;
lambda(
filterData,
{
filterData;
{ totalsHeader, query(filterData, selectPrice, 0) };
blankRow
}
)(filter(data, categories = filterKey))
}
)
)
)(
B5:D,
B5:B,
B4:D4,
{ "", "Total:" },
{ "", "", "" },
"select sum(Col3) label sum(Col3) '' "
);
{ "", "Grand Total:", sum(D5:D) }
}
See { array expressions }, filter(), query(), reduce() and lambda().
The formula will repeat each category name on several rows. If they get in the way, you can hide them from view by using a conditional formatting custom formula rule.
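For readers who find the nested lambda hard to follow, the same accumulation can be sketched in Python. This only mirrors the logic (start from the headers, then append each category's rows, a total row and a blank row, then a grand total); the function name, headers and sample rows are hypothetical, not the data from the linked sheet.

```python
def build_report(headers, rows):
    """rows: (category, item, price) tuples. Returns the result table."""
    # Unique non-blank categories in first-seen order.
    categories = list(dict.fromkeys(cat for cat, _, _ in rows if cat))
    table = [list(headers)]
    for cat in categories:
        block = [[c, item, price] for c, item, price in rows if c == cat]
        table.extend(block)
        table.append(["", "Total:", sum(price for _, _, price in block)])
        table.append(["", "", ""])  # blank separator row
    table.append(["", "Grand Total:", sum(price for _, _, price in rows)])
    return table


# Hypothetical sample data.
data = [
    ("Fruits", "Apple", 10),
    ("Vegetables", "Carrot", 5),
    ("Fruits", "Banana", 20),
]
for line in build_report(("Category", "Item", "Price"), data):
    print(line)
```

Every appended row has three cells, which is the same constraint the array expression has to satisfy: all stacked rows must have the same number of columns.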
I did some tests to add all the information in just one formula. It will change the format you want, but it will still divide all the information.
Here is the formula:
={"Fruits:","";QUERY(B5:D,"select C, D where B ='Fruits'");
{"Total:",SUMIF(B5:D,"Fruits",D5:D)};"","";
"Vegetables:","";QUERY(B5:D,"select C, D where B ='Vegetables'");
{"Total:",SUMIF(B5:D,"Vegetables",D5:D)};"","";
"condiments:","";QUERY(B5:D,"select C, D where B ='condiments'");
{"Total:",SUMIF(B5:D,"condiments",D5:D)};"","";
"Grand Total:",SUM(D5:D)}
Note:
I added : at the end of each category in the formula so they will look like Fruits: and the table will look like this:
The formula opens with { to open an array in Google Sheets, and you use , to separate columns to write a row of data, and ; to separate the rows to help you write a column of data. After that, you use } to close the array. For example:
{"1","2";"3","4"}
It will print 1 and 2 in the first row, and 3 and 4 in the second row.
So basically, I organize the data as arrays with the same number of columns. The first part breaks down as follows:
= { => To open the array.
"Fruits:",""; => This create a cell with "Fruits:" + an empty cell.
QUERY(B5:D,"select C, D where B ='Fruits'"); => which is
already on an array of 2 columns.
{"Total:",SUMIF(B5:D,"Fruits",D5:D)}; => Creates the "Total" cell + the sum
of values that has Fruits in column B.
"",""; => Which will create an empty row to separate the information
for the next set of arrays.
You do the same pattern for the other categories.
} => to end the initial array.
You can add a "Conditional formatting" that will change the text with : to bold automatically.
Reference:
QUERY function
SUMIF
ARRAYFORMULA
I suggest you read on: https://stackoverflow.com/a/58042211/5632629
the first part of your formula outputs a grid of 4×3 cells
the second part of your formula outputs a single cell
if you want to combine it properly use:
={FILTER(A5:D11, B5:B11="Fruits");
{"","","Totals",SUM(FILTER(D5:D11, B5:B11="Fruits"))}}
or:
={FILTER(B5:D11, B5:B11="Fruits");
{"","Totals",SUM(FILTER(D5:D11, B5:B11="Fruits"))}}
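The rule behind the error can be illustrated with a small Python sketch: stacking blocks vertically only works when every row has the same number of cells. `stack_rows` is a hypothetical helper, and the filtered rows are made-up sample data.

```python
def stack_rows(*blocks):
    """Stack blocks vertically; every row must have the same width."""
    rows = [row for block in blocks for row in block]
    widths = {len(row) for row in rows}
    if len(widths) > 1:
        raise ValueError(f"rows have different widths: {sorted(widths)}")
    return rows


# Hypothetical result of the FILTER: two rows, four columns.
filtered = [["F1", "Fruits", "Apple", 10], ["F2", "Fruits", "Banana", 20]]
total = sum(row[3] for row in filtered)

try:
    stack_rows(filtered, [[total]])  # 4 columns on top of 1 column
except ValueError as err:
    print(err)

# Padding the total row to four cells makes the stack valid.
print(stack_rows(filtered, [["", "", "Totals", total]]))
```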
I have the following Google Sheets table:
Old New New2
W01
W02 W04
W03 W05 W06
I want to create a formula that transforms the table into this one:
Old New
W02 W04
W03 W05
W05 W06
So every switch from Old to New or from New to New2 should be displayed.
I wrote the following formula but I always get an error:
= IFS(B1 = "";""; AND(NOT(B1 = ""); NICHT(C1 = ""));FILTER({A1\ B1}; NICHT(A1=""));NICHT(B1 = "");FILTER({B1\ C1}; NICHT(B1="")))
Has anybody an idea?
Concatenate the results of two Query calls:
={
QUERY(A1:B4,
"select A,B where B<>''");
QUERY(B1:C4,
"select B,C where C<>'' label B '', C ''", 1)
}
or in German locale:
={
QUERY(A1:B4;
"select A,B where B<>''");
QUERY(B1:C4;
"select B,C where C<>'' label B '', C ''"; 1)
}
The label statements in the second query are necessary to suppress the column labels, since you want to treat the values in New as Old and the values in New2 as New.
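The logic of the two stacked queries can be sketched in Python, using the table from the question with blank cells written as empty strings. `transitions` is a hypothetical name.

```python
def transitions(rows):
    """rows: (old, new, new2) tuples with '' for blank cells."""
    # First query: Old -> New wherever New is filled in.
    first = [(old, new) for old, new, _ in rows if new]
    # Second query: New -> New2 wherever New2 is filled in.
    second = [(new, new2) for _, new, new2 in rows if new2]
    return first + second


rows = [("W01", "", ""), ("W02", "W04", ""), ("W03", "W05", "W06")]
print(transitions(rows))
# [('W02', 'W04'), ('W03', 'W05'), ('W05', 'W06')]
```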
I need to build a table based on the following data:
|Ref|Product|
|R1|ProdA|
|R2|ProdC|
|R1|ProdB|
|R3|ProdA|
|R4|ProdC|
And here the result I need:
|My Product|All Ref|
|ProdA|R1#R3|
|ProdC|R2#R4|
The particularity is that the 'My Product' column is computed elsewhere. So I need an arrayformula based on 'My Product' column to look in the first table to build the 'All Ref' column. You follow me?
I know that Arrayformula is not compatible with filter and join ... I expect a solution like this one Google sheet array formula + Join + Filter but not sure to understand all steps and if really adapted to my case study.
Hope you can help.
You could try something like this:
CREDIT: player0 for the method shared to similar questions
=ARRAYFORMULA(substitute(REGEXREPLACE(TRIM(SPLIT(TRANSPOSE(
QUERY(QUERY({B2:B&"😊", A2:A&"#"},
"select max(Col2)
where Col1 !=''
group by Col2
pivot Col1"),,999^99)), "😊")), "#$", )," ",""))
Step by step:
Instead of the workaround hacks I implemented a simple joinMatching(matches, values, texts, [sep]) function in Google Apps Script.
In your case it would be just =joinMatching(MyProductColumn, ProductColumn, RefColumn, "#").
Source:
// Google Apps Script to join texts in a range where values in second range equal to the provided match value
// Solves the need for `arrayformula(join(',', filter()))`, which does not work in Google Sheets
// Instead you can pass a range of match values and get a range of joined texts back
const identity = data => data
const onRange = (data, fn, args, combine = identity) =>
  Array.isArray(data)
    ? combine(data.map(value => onRange(value, fn, args)))
    : fn(data, ...(args || []))
const _joinMatching = (match, values, texts, sep = '\n') => {
  const columns = texts[0]?.length
  if (!columns) return ''
  const row = i => Math.floor(i / columns)
  const col = i => i % columns
  const value = i => values[row(i)][col(i)]
  return (
    // JSON.stringify(match) +
    texts
      .flat()
      // .map((t, i) => `[${row(i)}:${col(i)}] ${t} (${JSON.stringify(value(i))})`)
      .filter((_, i) => value(i) === match)
      .join(sep)
  )
}
const joinMatching = (matches, values, texts, sep) =>
  onRange(matches, _joinMatching, [values, texts, sep])
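The grouping this function performs can be checked outside Sheets with a short Python sketch on the data from the question. `join_matching` here is a hypothetical single-column equivalent, not a port of the Apps Script above.

```python
def join_matching(matches, values, texts, sep="#"):
    """For each match, join the texts whose paired value equals it."""
    grouped = {}
    for value, text in zip(values, texts):
        grouped.setdefault(value, []).append(text)
    return [sep.join(grouped.get(match, [])) for match in matches]


refs = ["R1", "R2", "R1", "R3", "R4"]
products = ["ProdA", "ProdC", "ProdB", "ProdA", "ProdC"]
print(join_matching(["ProdA", "ProdC"], products, refs))
# ['R1#R3', 'R2#R4']
```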
I have a spreadsheet with criteria, a start and end date, and a value. The goal is to find the lowest value for each unique criteria and start date without overlapping dates (exclusive of end date). I made a pivot table to make it easier for myself but I know there is probably a way to highlight all valid rows that meet the above requirements with some formula or conditional formatting.
I have attached a google drive link where the spreadsheet can be found here and I have some images of the sheet as well. I know that it might be possible with conditional formatting but I just don't know how to combine everything I want it to do in a single formula.
Example below:
Row 2 is a valid entry because it has the lowest value for Item 1 starting on 03-15-2021, same with row 9.
Row 5 is valid because the start date does not fall within the date range of row 2 (exclusive of end date)
Row 7 is not valid because the start date is between the start and end date of row 6
You may add a bound script to your project. Then you can call it either with a picture/drawing that has the function assigned (button-like), or by adding a menu to Google Sheets.
From what you said in the question and the comments, this seems to do what you are trying. Notice that this requires the V8 runtime (which should be the default).
function validate() {
  // Get the correct sheet
  const spreadsheet = SpreadsheetApp.getActiveSpreadsheet()
  const sheet = spreadsheet.getSheetByName('Sheet1')
  // Get the data
  const length = sheet.getLastRow() - 1
  const range = sheet.getRange(2, 1, length, 4)
  const rows = range.getValues()
  const data = Array.from(rows.entries(), ([index, [item, start, end, value]]) => {
    /*
     * Row Index
     * 1 Criteria 1
     * 2 Item 1 0
     * 3 Item 1 1
     * 4 Item 1 2
     *
     * row = index + 2
     */
    return {
      row: index + 2,
      criteria: item,
      start: start.getTime(),
      end: end.getTime(),
      value: value
    }
  })
  // Sort the data by criteria (asc), start date (asc), value (asc) and end date (asc)
  data.sort((a, b) => {
    let order = a.criteria.localeCompare(b.criteria)
    if (order !== 0) return order
    order = a.start - b.start
    if (order !== 0) return order
    order = a.value - b.value
    if (order !== 0) return order
    order = a.end - b.end
    return order
  })
  // Iterate elements and extract the valid ones
  // Notice that because we sorted them, the first one of each criteria will always be valid
  const valid = []
  let currentCriteria
  let currentValid = []
  for (let row of data) {
    if (row.criteria !== currentCriteria) {
      // First of the criteria
      valid.push(...currentValid) // Move the valids from the old criteria to the valid list
      currentValid = [row] // The new list of valid rows is only the current one (for now)
      currentCriteria = row.criteria // Set the criteria
    } else {
      // The callback must return the comparison; without `return`,
      // some() always sees undefined and never reports a collision
      const startDateCollision = currentValid.some(valid => {
        return row.start >= valid.start && row.start < valid.end
      })
      if (!startDateCollision) {
        currentValid.push(row)
      }
    }
  }
  valid.push(...currentValid)
  // Remove any old marks
  sheet.getRange(2, 5, length).setValue('')
  // Mark the valid rows
  for (let row of valid) {
    sheet.getRange(row.row, 5).setValue('Valid')
  }
}
Algorithm rundown
We get the sheet that we have the data in. In this case we do it by name (remember to change it if it's not the default Sheet1)
We read the data and transform it into an array of objects, which for this case makes it easier to manage
We sort the data. This is similar to the transpose you made but in the code. It also forces a priority order and groups it by criteria
Iterate the rows, keeping only the valid:
We keep a list of all the valid ones (valid) and one for the current criteria only (currentValid) because we only have to check data collisions with the ones in the same criteria.
The first iteration will always enter the if block (because currentCriteria is undefined).
When changing criteria, we dump all the rows in currentValid into valid. We do the same after the loop with the last criteria
When changing criteria, currentValid is reset to an array with the current row as its only element, because the first row of a criteria will always be valid (because of sorting)
For the other rows, we check if the starting date is between the starting and ending date of any of the valid rows for that criteria. If it's not, add it to this criteria's valid rows
We remove all the current "Valid" marks in the validity column and fill it out with the new valid rows
The cornerstone of the algorithm is actually sorting the data. It allows us to not have to search for the best row, as it's always the next one. It also ensures things like that the first row of a criteria is always valid.
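To experiment with the selection logic outside of Apps Script, the sort-then-scan idea can be sketched in Python. The dictionary keys and sample rows are hypothetical; only the ordering (criteria, start, value, end) and the half-open collision check come from the rundown above.

```python
from datetime import date


def find_valid(rows):
    """rows: dicts with criteria, start, end and value. Returns valid rows."""
    ordered = sorted(
        rows, key=lambda r: (r["criteria"], r["start"], r["value"], r["end"])
    )
    valid = []
    current_criteria = object()  # sentinel that equals no real criteria
    current_valid = []
    for row in ordered:
        if row["criteria"] != current_criteria:
            # First row of a criteria is always valid after sorting.
            current_criteria = row["criteria"]
            current_valid = [row]
            valid.append(row)
        elif not any(v["start"] <= row["start"] < v["end"] for v in current_valid):
            # Start date is outside every accepted range (end exclusive).
            current_valid.append(row)
            valid.append(row)
    return valid


# Hypothetical sample rows, numbered like sheet rows.
sample = [
    {"row": 2, "criteria": "Item 1", "start": date(2021, 3, 15), "end": date(2021, 3, 20), "value": 5},
    {"row": 3, "criteria": "Item 1", "start": date(2021, 3, 15), "end": date(2021, 3, 25), "value": 7},
    {"row": 4, "criteria": "Item 1", "start": date(2021, 3, 18), "end": date(2021, 3, 22), "value": 1},
    {"row": 5, "criteria": "Item 1", "start": date(2021, 3, 20), "end": date(2021, 3, 25), "value": 9},
]
print([r["row"] for r in find_valid(sample)])  # [2, 5]
```

Row 5 is accepted because its start date equals the end date of row 2, and the end date is exclusive.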
Learning resources
Javascript tutorial (W3Schools)
Google Apps Script
Overview of Google Apps Script
Extending Google Sheets
Custom Menus in Google Workspace
Code references
Class SpreadsheetApp
Class Sheet
Sheet.getRange (notice the 3 overloads)
let ... of (MDN)
Spread syntax (...) (MDN)
Arrow function expressions (MDN)
Array.from() (MDN)
Array.prototype.push() (MDN)
Array.prototype.sort() (MDN)
Date.prototype.getTime() (MDN)
String.prototype.localeCompare() (MDN)