I am using a complex foreach statement to generate my data.
This is a sample row:
(2013-07-01)
Below is my code:
joined_data = foreach old_data {
date = old_data::date;
month = SUBSTRING(date, 5, 7);
generate date, month;
};
When I go ahead and use the table, I get the following error:
<file script.pig, line 24, column 66> Invalid field projection. Projected field [month] does not exist in schema: old_data::date:chararray,:chararray.
Why doesn't month have a name?
I clearly named it.
When I write:
joined_data = foreach data {
date = old_data::date;
month = SUBSTRING(date, 5, 7);
generate date, $1;
};
The code never finishes running (it cannot finish the reduce stage).
Any idea why this is happening and how I can make sure that Pig picks up on the name I gave to the month column?
Thanks.
Indeed, you can force the label with AS (for example, generate date, month AS month;), but that doesn't explain why $1 doesn't work.
I would recommend using REGEX_EXTRACT with an appropriate regex, and running DESCRIBE joined_data; and DESCRIBE old_data; to see how the fields are labelled.
I have more than 20k form responses (Google Sheets and Google Forms) in which some respondents selected the wrong value. I know it is wrong because they needed to select an activity (an attribute), but they selected a similarly named activity from the previous year (call it X) when they should have selected this year's activity (call it Y).
I know that after a certain date all the X activities are really Y, so I need to modify the data while importing it from the responses.
I tried conditional formatting on the data, but then IMPORTRANGE doesn't work, because it needs the target cells to be empty.
I learned about QUERY statements, but they don't allow UPDATE.
Please help me do this, I am okay if we need to use a macro. I'm looking for something like this (note that the following is the logic I'm looking for and not the actual code):
if date>"a date" and FC==X:
FC=Y
#FC being the column I wanna modify
Edit: I am unable to share the table as it's confidential. I can tell you that the first column is the date/time of the form submission and then there are 149 columns, one of which I need to modify based on the date. Let's assume it has just 2 columns, A: date, B: activity (there are 20 activities). So, if they selected activity "X" after a certain date, that activity should be changed to Y. I hope this helps in understanding.
Edit 2: Have put a dummy file as asked. So now the problem statement is after 21 May 2022 (inclusive) all "6" activity must be "2"
Try
function onOpen() {
SpreadsheetApp.getUi().createMenu('⇩ M E N U ⇩')
.addItem('👉 Update', 'myFunction')
.addToUi();
}
function myFunction() {
// parameters
var param = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Param')
var from = param.getRange('A2').getValue()
var before = param.getRange('B2').getValue()
var after = param.getRange('C2').getValue()
// data
var sh = SpreadsheetApp.getActiveSpreadsheet().getActiveSheet()
var range = sh.getDataRange()
var data = range.getValues()
data.filter(r => (r[0] >= from && r[1] == before)).forEach(r => r[1] = after)
range.setValues(data)
}
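To avoid hardcoding and potential issues with dates, I put all parameters in a new tab called "Param", matching the cells read by the script: the cutoff date in A2, the value to replace in B2, and the replacement value in C2.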
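The replacement logic can be tried outside Sheets on a plain 2D array. This is a minimal sketch, assuming the layout from the question's second edit (column 0 is the timestamp, column 1 is the activity, and every "6" on or after 21 May 2022 becomes "2"). The function name replaceAfterCutoff and the sample rows are hypothetical, and unlike the script above it returns new rows instead of editing them in place.

```javascript
// Hypothetical parameters, mirroring A2, B2 and C2 of the "Param" tab.
const cutoff = new Date(2022, 4, 21); // 21 May 2022 (JavaScript months are zero-based)
const before = 6; // value to replace
const after = 2;  // replacement value

// Returns a copy of rows in which column 1 is replaced on or after the cutoff.
function replaceAfterCutoff(rows, cutoff, before, after) {
  return rows.map(row =>
    // The instanceof check skips the header row, whose first cell is text.
    row[0] instanceof Date && row[0] >= cutoff && row[1] == before
      ? [row[0], after, ...row.slice(2)]
      : row
  );
}

// Hypothetical sample data standing in for range.getValues().
const rows = [
  ['Timestamp', 'Activity'],
  [new Date(2022, 4, 20, 9, 0), 6],
  [new Date(2022, 4, 21, 0, 0), 6],
  [new Date(2022, 4, 22, 15, 30), 6],
  [new Date(2022, 4, 23, 8, 0), 3],
];

const updated = replaceAfterCutoff(rows, cutoff, before, after);
console.log(updated.map(r => r[1])); // [ 'Activity', 6, 2, 2, 3 ]
```

In Apps Script, the same function could be applied to range.getValues() and the result passed to range.setValues().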
I have a table that changes dynamically based on the week we're on by using
=WEEKNUM(TODAY())
but I want to create a dropdown menu with all the week numbers, so that a user can select any week and check the data related to that week number. However, I'd like the sheet to show the current week number as the default every time it opens.
You might be able to use a non-script approach if you can adapt the technique outlined in detail at https://www.benlcollins.com/spreadsheets/default-values/. You will need to be able to add a column to the immediate left of the 'Week' column, and probably also to hide the column afterwards for neatness.
For instance, if the example table above was in columns B&C, add the following formula to A2, copy down the column as far as needed then hide column A:
={"",weeknum(today())}
You can then set up the data validation dropdown in the week column, and anything you select in those cells will overwrite the default value. This generates #REF! errors in column A because you are preventing a literal array expanding, but you can't see the error because it's hidden, and you don't care about it anyway. If you delete the value in any cell in the Week column, the array in the corresponding cell in column A will be able to expand again and show the default value once more.
Suggestion
As @player0 mentioned, this is possible using a script. You may use the quick sample script below as a reference on how to create a drop-down, fill it with the week numbers from the current week down to the first week of the year, and set it to show the current week number by default.
Sample Script:
function test() {
var currentDate = new Date();
var startDate = new Date(currentDate.getFullYear(), 0, 1);
var days = Math.floor((currentDate - startDate) / (24 * 60 * 60 * 1000));
var weekNumber = Math.ceil((startDate.getDay() + 1 + days) / 7); // Offset by the weekday of January 1, matching WEEKNUM's default (Sunday-start) numbering
var weeks = [];
while (weekNumber != 0) { //Get the week numbers from today to the first week of the year
weeks.push(weekNumber)
weekNumber--;
}
var cell = SpreadsheetApp.getActive().getRange('A1');
var rule = SpreadsheetApp.newDataValidation().requireValueInList(weeks, true).build();
cell.setDataValidation(rule).setValue(weeks[0]); //Sets the default value to the current week number
}
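The week-number arithmetic can be checked on its own, without a spreadsheet. The following is a sketch under the assumption that the goal is to match WEEKNUM's default numbering (weeks start on Sunday, and week 1 is the week containing January 1). The helper names weekNumber and weeksDownFrom are hypothetical.

```javascript
// Week number under the assumed default WEEKNUM convention:
// weeks start on Sunday and week 1 is the week containing January 1.
function weekNumber(date) {
  const year = date.getFullYear();
  // Compare UTC midnights so daylight-saving shifts cannot change the day count.
  const today = Date.UTC(year, date.getMonth(), date.getDate());
  const jan1 = Date.UTC(year, 0, 1);
  const dayOfYear = Math.round((today - jan1) / 86400000); // 0 for January 1
  const jan1Weekday = new Date(year, 0, 1).getDay();        // 0 = Sunday
  return Math.floor((dayOfYear + jan1Weekday) / 7) + 1;
}

// Week numbers from the given date's week down to week 1, for the drop-down list.
function weeksDownFrom(date) {
  const weeks = [];
  for (let w = weekNumber(date); w >= 1; w--) weeks.push(w);
  return weeks;
}

console.log(weekNumber(new Date(2023, 0, 7)));     // 1 (Saturday of the first week)
console.log(weekNumber(new Date(2023, 0, 8)));     // 2 (Sunday starts the second week)
console.log(weeksDownFrom(new Date(2023, 0, 20))); // [ 3, 2, 1 ]
```

The list returned by weeksDownFrom could be passed to requireValueInList, with its first element used as the default value.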
Reference
https://developers.google.com/apps-script/reference/spreadsheet/range#setvaluevalue
https://developers.google.com/apps-script/reference/spreadsheet/data-validation-builder#requirevalueinlistvalues
I want a formula that generates random birth dates within a specific range of years (for example, 1995 to 2002) and returns them as an array, like this:
Sheet URL: https://docs.google.com/spreadsheets/d/1XHoxD-hNmpUOMVm_u-cz-4ESrabodsrS0fIfaN-n4js/edit
That might not be the best approach but it will get you closer to what you want:
=DATE(RANDBETWEEN(1995,2002),RANDBETWEEN(1,12),RANDBETWEEN(1,31))
There are two issues with this approach:
you might get a day that does not exist for the particular month. For example, 2/28/2021 exists, but 2/29/2021 does not exist.
I wasn't able to generate an array but only drag down formulas. When I generate an array, the same random numbers are used and as a result the dates are the same.
For the first issue, you can use ISDATE to check whether the random date returned is valid. For example, 2/29/2021 is an invalid date (I hard-coded that date), and I guess you can then filter out the FALSE cases.
I really hope other people can come up with a better approach.
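The validity check described above can be sketched in JavaScript as a round-trip test: build a date from the random components and see whether the components come back unchanged. This is an illustration of the idea rather than the ISDATE function itself, and the names isRealDate and randomBirthDate are hypothetical.

```javascript
// Returns true only when year/month/day name a real calendar date.
// month is 1-based here (1 = January), unlike JavaScript's Date.
function isRealDate(year, month, day) {
  const d = new Date(Date.UTC(year, month - 1, day));
  // Date rolls invalid days over (29 Feb 2021 becomes 1 Mar 2021),
  // so the components only round-trip for real dates.
  return d.getUTCFullYear() === year &&
         d.getUTCMonth() === month - 1 &&
         d.getUTCDate() === day;
}

// Draws random components and retries until they form a real date,
// which plays the role of filtering out the FALSE cases.
function randomBirthDate(startYear, endYear) {
  const randInt = (lo, hi) => lo + Math.floor(Math.random() * (hi - lo + 1));
  while (true) {
    const year = randInt(startYear, endYear);
    const month = randInt(1, 12);
    const day = randInt(1, 31);
    if (isRealDate(year, month, day)) return { year, month, day };
  }
}

console.log(isRealDate(2021, 2, 28)); // true
console.log(isRealDate(2021, 2, 29)); // false
console.log(randomBirthDate(1995, 2002));
```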
You could try (as I demonstrated in your sheet):
=ARRAY_CONSTRAIN(SORT(SEQUENCE(DATE(1992,12,31)-DATE(1900,1,1),1,DATE(1900,1,1)),RANDARRAY(DATE(1992,12,31)-DATE(1900,1,1)),1),COUNTA(A2:A),1)
SEQUENCE(DATE(1992,12,31)-DATE(1900,1,1),1,DATE(1900,1,1)) - Is used to create an array of valid numeric representations of true dates between 1-1-1900 and 31-12-1992.
SORT(<TheAbove>,RANDARRAY(DATE(1992,12,31)-DATE(1900,1,1)),1) - Is used to sort the array we just created randomly.
ARRAY_CONSTRAIN(<TheAbove>, COUNTA(A2:A),1) - Is used to only return as many random birth-dates we need according to other data.
Note that this is volatile and will recalculate upon sheet-changes and such. Also note that this is just "slicing" a given array and may fall short when you try to use it on a dataset larger than the given array.
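The "build every valid date, shuffle, then slice" idea can be sketched in JavaScript as follows. This is an illustration under assumptions: it uses the 1995 to 2002 range from the question rather than the range in the formula, it swaps in an explicit Fisher-Yates shuffle where the formula sorts by random keys, and the function names are hypothetical. It also shows the limitation noted above: asking for more dates than the range contains simply returns fewer.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// All days from 1 January of startYear to 31 December of endYear, as UTC timestamps.
function allDays(startYear, endYear) {
  const days = [];
  const end = Date.UTC(endYear, 11, 31);
  for (let t = Date.UTC(startYear, 0, 1); t <= end; t += DAY_MS) days.push(t);
  return days;
}

// Shuffles the full list (Fisher-Yates) and keeps the first `count` entries,
// so the returned dates never repeat.
function randomDistinctDates(startYear, endYear, count) {
  const days = allDays(startYear, endYear);
  for (let i = days.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [days[i], days[j]] = [days[j], days[i]];
  }
  return days.slice(0, count).map(t => new Date(t));
}

const sample = randomDistinctDates(1995, 2002, 5);
console.log(sample.map(d => d.toISOString().slice(0, 10)));
```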
As Google Sheets can deal with dates as integers (~ number of days since 1900), choosing a random date between two dates can be a single call to RANDBETWEEN (with the output formatted as Date).
With your initial date written in B1 and your end date in B2, the formula is simply:
=RANDBETWEEN($B$1,$B$2)
You can paste this formula in as many cells as you want, to generate N different random dates.
Of course, as other answers involving random generators in your sheet, the formula will be recomputed at each change. My suggestion to overcome this would simply be to copy/paste the output, using the "Paste special > Values only" option (right click or "Edit" menu).
Script Solution
Just for sake of completeness, here is a solution using a script
Initial Considerations
This cannot work as an in-sheet custom function/formula, for the reasons given in the documentation:
https://developers.google.com/apps-script/guides/sheets/functions
Custom function arguments must be deterministic. That is, built-in spreadsheet functions that return a different result each time they calculate — such as NOW() or RAND() — are not allowed as arguments to a custom function. If a custom function tries to return a value based on one of these volatile built-in functions, it will display Loading... indefinitely.
A custom function cannot affect cells other than those it returns a value to. In other words, a custom function cannot edit arbitrary cells, only the cells it is called from and their adjacent cells. To edit arbitrary cells, use a custom menu to run a function instead.
So a normal script is needed.
The Script
/**
* Sets the values of a range to random dates.
*/
function generateRandomBdays(range, start, end) {
let height = range.getHeight();
let width = range.getWidth();
let output = [];
for (let i = 0; i != height; i++) {
let row = [];
for (let j = 0; j != width; j++) {
row.push(randomBday(start, end));
}
output.push(row)
}
range.setValues(output);
}
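/**
* Generates a random date between January 1 of the start year and December 31 of the end year
*/
function randomBday(start, end) {
// Build the bounds from numeric year components instead of parsing year strings,
// whose interpretation is implementation-dependent
let first = new Date(start, 0, 1);
let last = new Date(end + 1, 0, 1); // Exclusive upper bound, so the whole end year is included
let bday = new Date(
first.getTime() + (Math.random() * (last.getTime() - first.getTime()))
);
return bday;
}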
/**
* Gets active selection and fills with random dates
*/
function main(){
let file = SpreadsheetApp.getActive();
let sheet = file.getActiveSheet()
let range = sheet.getActiveRange();
// ============
generateRandomBdays(range, 1995, 2002); // Change these years to your liking
// ============
}
/**
* Creates menu when sheet is opened.
*/
function onOpen() {
var ui = SpreadsheetApp.getUi();
ui.createMenu('Generate Birthdays')
.addItem('Generate!', 'main')
.addToUi();
}
Which works like this:
Installation
You will have to copy it into your script editor and then run one of the functions to authorize the script with the permissions it needs. Then next time you open the sheet you should have the menu available.
Alternatively you can delete the onOpen function and just use it from the script editor.
Within the main function, customize the range of years you need.
References
Apps Script
Overview of Spreadsheet Service in Apps Script
I am working on adding the time I spend on my habits using google sheets. If you look at this example sheet, I am keeping my individual habits in columns 3-8 (see the offsets on the first row).
To add the food related habits times (columns 5 and 6), I can use the range in offset function (see formulae in D17 below "Food").
The question is: how do I add the numbers for exercise and sleep (column offsets 4, 7, and 8)? The number of columns here could be 2, 3, or more! And they might not be consecutive.
Thanks for any pointers.
To sum entries of the rows whose columns are in the given array, I would use
=SUMPRODUCT(COUNTIF({5,8,9},COLUMN(D3:J3))*(D3:J3))
This is the formula for E18 in your spreadsheet.
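The logic of that formula (keep only the entries whose column number appears in the list, then add them up) can be sketched in JavaScript. The function name sumSelectedColumns and the sample values are hypothetical; column numbers follow the sheet convention where D is 4 and J is 10.

```javascript
// Sums the entries of `row` whose sheet column number is in `wanted`.
// firstColumn is the sheet column number of row[0] (4 for column D).
function sumSelectedColumns(row, firstColumn, wanted) {
  const wantedSet = new Set(wanted);
  return row.reduce(
    (total, value, i) => wantedSet.has(firstColumn + i) ? total + Number(value) : total,
    0
  );
}

// Hypothetical values for D3:J3 (columns 4 through 10).
const d3ToJ3 = [10, 20, 30, 40, 50, 60, 70];
console.log(sumSelectedColumns(d3ToJ3, 4, [5, 8, 9])); // 20 + 50 + 60 = 130
```

Because the wanted columns are just a list, they do not need to be consecutive and there can be any number of them.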
Since the columns might not be consecutive and there can be a variable number of them, I think it is appropriate to use an Apps Script custom function, and use the spread syntax to account for the variable number of columns.
Just open the script bound to your file, copy this function and save the project:
function HABIT_TOTALS(...habitIndexes) {
const sheet = SpreadsheetApp.getActiveSheet();
const headers = sheet.getRange(1, 1, 1, sheet.getLastColumn()).getValues()[0];
let output = [];
for (let dayIndex = 0; dayIndex < 7; dayIndex++) {
let dayValue = 0;
habitIndexes.forEach(habitIndex => {
const columnIndex = headers.indexOf(habitIndex) + 1;
const dailyHabitValue = sheet.getRange(3, columnIndex).getValue();
const dayHabitValue = sheet.getRange(4 + dayIndex, columnIndex).getValue();
dayValue = Number(dayValue) + Number(dailyHabitValue) + Number(dayHabitValue);
});
output.push([dayValue]);
}
return output;
}
Notes:
This function can be used like any built-in formula in Sheets (e.g. =HABIT_TOTALS(4,7,8)).
This function gets, as arguments, the indexes of the habits to retrieve (in this case 4, 7, 8), to be found on the first row in the sheet.
It loops through all days of the week (dayIndex), returning the total amount for each day. Because of this, there's no need to drag the formula down.
For each day, it finds the column index based on the habit index provided as an argument, and adds the values for Daily and for the current day to the total value for the day.
After retrieving the total amount for the day, this value is pushed to output, the value returned by this function.
This function could be used for the Food habits, just changing the arguments: =HABIT_TOTALS(5,6), or for any other combination.
Reference:
Custom Functions in Google Sheets
Spread syntax (...)
For the calculation concerning food you can try in cell D18
=sum(filter(filter($D$3:$I$11, regexmatch($C$3:$C$11, "Daily|"&text($C18, "ddd"))), regexmatch($D$1:$I$1&"", "5|6")))
and fill down.
The numbers at the end refer to the column numbers you have in row 1. So in E18 (Sleep and exercise) you would have
=sum(filter(filter($D$3:$I$11, regexmatch($C$3:$C$11, "Daily|"&text($C18, "ddd"))), regexmatch($D$1:$I$1&"", "4|7|8")))
Of course, it is also possible to write the last part in a cell and then refer to that cell. That would mean you can enter in E18
=sum(filter(filter($D$3:$I$11, regexmatch($C$3:$C$11, "Daily|"&text($C18, "ddd"))), regexmatch($D$1:$I$1&"", D$17)))
and fill down AND to the right.
See if that helps?
I am trying to match a date within a google spreadsheet, but I am new to this, and I could use some help. I have a column that contains a list of birthdays. I have a cell that I am using to reference for the current date by using
=Today()
What I would like to do is compare the month and day, but ignore the year, and return the values in the two adjacent columns. I am using this query to try to get the information.
=QUERY(C2:E430; "select * where C = date '" & text(F2,"yyyy-MM-dd") & "'")
but it always returns an empty output, because the year never matches. How do I get it to ignore the year?
Thank you,
Paul
One way would be to compare the month and day separately (note that the MONTH() spreadsheet function is January = 1, while the month() function in the QUERY select clause is January = 0).
=QUERY(C2:E430;"select * where month(C) = "&(MONTH(F2)-1)&" and day(C) = "&DAY(F2))
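The same comparison can be sketched in JavaScript, whose Date.getMonth() is also zero-based (January = 0), like month() in the QUERY select clause. The function name birthdaysOn and the sample rows are hypothetical; each row stands for one line of C2:E430, with the birthday first.

```javascript
// Keeps the rows whose birthday falls on the same month and day as `today`,
// ignoring the year. Each row is [birthday, name, note].
function birthdaysOn(rows, today) {
  return rows.filter(([birthday]) =>
    birthday.getMonth() === today.getMonth() && // zero-based on both sides
    birthday.getDate() === today.getDate()
  );
}

// Hypothetical sample data.
const people = [
  [new Date(1990, 6, 14), 'Ada', 'colleague'],
  [new Date(1985, 6, 15), 'Ben', 'friend'],
  [new Date(2001, 6, 14), 'Cleo', 'family'],
  [new Date(1990, 7, 14), 'Dev', 'friend'],
];

const matches = birthdaysOn(people, new Date(2024, 6, 14)); // 14 July 2024
console.log(matches.map(row => row[1])); // [ 'Ada', 'Cleo' ]
```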