Spark join hangs - join

I have a table with n columns that I'll call A. In this table there are three columns that i'll need:
vat -> String
tax -> String
card -> String
vat or tax can be null, but not at the same time.
For every unique couple of vat and tax there is at least one card.
I need to alter this table, adding a column count_card in which I put a text based on the number of cards every unique combination of tax and vat has.
So I've done this:
val cardCount = A.groupBy("tax", "vat").count
val sqlCard = udf((count: Int) => {
if (count > 1)
"MULTI"
else
"MONO"
})
val B = cardCount.withColumn(
"card_count",
sqlCard(cardCount.col("count"))
).drop("count")
In the table B I have three columns now:
vat -> String
tax -> String
card_count -> Int
and every operation on this DataFrame is smooth.
Now, because I wanted to import the new column in A table, i performed the following join:
val result = A.join(B,
B.col("tax")<=>A.col("tax") and
B.col("vat")<=>A.col("vat")
).drop(B.col("tax"))
.drop(B.col("vat"))
Expecting to have the original table A with the column card_count.
Problem is that the join hangs, getting all system resources blocking the pc.
Additional details:
Table A has ~1.5M elements and is read from parquet file;
Table B has ~1.3M elements.
System is a 8 thread and 30GB of RAM
Let me know what I'm doing wrong

At the end, I didn't found out which was the issue, so I changed approach
val cardCount = A.groupBy("tax", "vat").count
val cardCountSet = cardCount.filter(cardCount.col("count") > 1)
.rdd.map(r => r(0) + " " + r(1)).collect().toSet
val udfCardCount = udf((tax: String, vat:String) => {
if (cardCountSet.contains(tax + " " + vat))
"MULTI"
else
"MONO"
})
val result = A.withColumn("card_count",
udfCardCount(A.col("tax"), A.col("vat")))
If someone knows a better approach let me know it

Related

Arrayformula to preserve a running/cumulative balance when inserting new rows

The top right cell (Natwest) is a list from a range using data validation.
The Opening Balance 1,000.00 is sourced from another sheet using a lookup formula.
Using simple if statements, the cumulative balance is then produced - according to the Amount column and whether the Natwest account occurs in the Dr(+) or Cr (-) column
i.e. =if(B4=$D$1,D3+A4,if(C4=$D$1,D3-A4,D3)) and copied down.
Natwest
Amount Dr Cr Balance
1,000.00
100.00 Natwest Account 1 1,100.00
200.00 Account 2 Natwest 900.00
400.00 Natwest Account 1 1,300.00
It works fine, except that when a new row is inserted, the if statement formula is not copied into the new row.
I am looking for an arrayformula solution (or other formula inside the cell solution), so that the Cumulative Balance still works, but doesn't need to be copied into column D new row - when a new row(s) are inserted.
(I don't mind the Natwest (drop down from the list) or the Opening Balance 1,000.00 to be moved elsewhere if required for a solution.)
Thanks for your help.
Something adding up in between the same range of the arrayformula is always going to be tricky with circular dependency. I suggest to get the initial value and add it the SUMIF of second column and substract the SUMIF of second column up to each value. With BYROW you can do it like this:
=BYROW(A4:A,LAMBDA(each,SUMIF(INDIRECT("B4:B"&ROW(each)),D1,A4:each)-SUMIF(INDIRECT("C4:C"&ROW(each)),D1,A4:each)+D3))
Alternate solution:
You can use this custom function from AppScript for automatically calculating cumulative balance
Code:
function customFunction(startnum, key, range) {
var res = [];
var current = startnum;
range.forEach((x) => {
res.push(x.map((y, index) => {
return y == key && index == 1 ? current = (current + x[0]) : (y == key && index == 2 ? current = (current - x[0]) : null)
}).filter(c => c))
})
return res;
}
Custom Function Parameters:
=customFunction(startnum, key, range)
startnum = opening balance
key = Account name
range = cell range
Sample output:
=customFunction(D3,D1,A4:C)

Set a sequential index within each subgroup of a List using Java 8 forEach or Stream API

I have an already sorted list of objects, and need to perform a "group by" and assign a sequence number to each object in the group.
[Essentially, I am trying to mimic the effect of a sql window function such as "seq = row_number() over (partition by cust_id)"]
Right now, I am achieving this using the below code, but would like to learn if there is a different way to do it using Java 8 features like stream() or forEach(). Thanks for the help.
int count = 0;
String prev_cust_id = "";
// OrdersList is already sorted by cust_id
for (Order o : orderList) {
// reset index for each new group
if (!o.cust_id.equals(prev_cust_id))
count = 0;
o.seq = ++count;
prev_cust_id = o.cust_id;
}
Update: I was also able to achieve the same with below code, but it is less readable and seems more inefficient. Is there is a better way.
orderList.get(0).seq = 1;
IntStream.range(1, orderList.size())
.forEach(i -> orderList.get(i).seq = orderList.get(i).cust_id.equals(orderList.get(i-1).cust_id) ? OrdersList.get(i-1).seq + 1 : 1);

Advance manual unpivot of data table [duplicate]

I am trying to produce a "reverse pivot" function. I have searched long and hard for such a function, but cannot find one that is already out there.
I have a summary table with anywhere up to 20 columns and hundreds of rows, however I would like to convert it into a flat list so I can import to a database (or even use the flat data to create more pivot tables from!)
So, I have data in this format:
Customer 1
Customer 2
Customer 3
Product 1
1
2
3
Product 2
4
5
6
Product 3
7
8
9
And need to convert it to this format:
Customer | Product | Qty
-----------+-----------+----
Customer 1 | Product 1 | 1
Customer 1 | Product 2 | 4
Customer 1 | Product 3 | 7
Customer 2 | Product 1 | 2
Customer 2 | Product 2 | 5
Customer 2 | Product 3 | 8
Customer 3 | Product 1 | 3
Customer 3 | Product 2 | 6
Customer 3 | Product 3 | 9
I have created a function that will read the range from sheet1 and append the re-formatted rows at the bottom of the same sheet, however I am trying to get it working so I can have the function on sheet2 that will read the whole range from sheet1.
No matter what I try, I can't seem to get it to work, and was wondering if anybody could give me any pointers?
Here is what I have so far:
function readRows() {
var sheet = SpreadsheetApp.getActiveSheet();
var rows = sheet.getDataRange();
var numRows = rows.getNumRows();
var values = rows.getValues();
heads = values[0]
for (var i = 1; i <= numRows - 1; i++) {
for (var j = 1; j <= values[0].length - 1; j++) {
var row = [values[i][0], values[0][j], values[i][j]];
sheet.appendRow(row)
}
}
};
I wrote a simple general custom function, which is 100% reusable you can unpivot / reverse pivot a table of any size.
In your case you could use it like this: =unpivot(A1:D4,1,1,"customer","sales")
So you can use it just like any built-in array function in spreadsheet.
Please see here 2 examples:
https://docs.google.com/spreadsheets/d/12TBoX2UI_Yu2MA2ZN3p9f-cZsySE4et1slwpgjZbSzw/edit#gid=422214765
The following is the source:
/**
* Unpivot a pivot table of any size.
*
* #param {A1:D30} data The pivot table.
* #param {1} fixColumns Number of columns, after which pivoted values begin. Default 1.
* #param {1} fixRows Number of rows (1 or 2), after which pivoted values begin. Default 1.
* #param {"city"} titlePivot The title of horizontal pivot values. Default "column".
* #param {"distance"[,...]} titleValue The title of pivot table values. Default "value".
* #return The unpivoted table
* #customfunction
*/
function unpivot(data,fixColumns,fixRows,titlePivot,titleValue) {
var fixColumns = fixColumns || 1; // how many columns are fixed
var fixRows = fixRows || 1; // how many rows are fixed
var titlePivot = titlePivot || 'column';
var titleValue = titleValue || 'value';
var ret=[],i,j,row,uniqueCols=1;
// we handle only 2 dimension arrays
if (!Array.isArray(data) || data.length < fixRows || !Array.isArray(data[0]) || data[0].length < fixColumns)
throw new Error('no data');
// we handle max 2 fixed rows
if (fixRows > 2)
throw new Error('max 2 fixed rows are allowed');
// fill empty cells in the first row with value set last in previous columns (for 2 fixed rows)
var tmp = '';
for (j=0;j<data[0].length;j++)
if (data[0][j] != '')
tmp = data[0][j];
else
data[0][j] = tmp;
// for 2 fixed rows calculate unique column number
if (fixRows == 2)
{
uniqueCols = 0;
tmp = {};
for (j=fixColumns;j<data[1].length;j++)
if (typeof tmp[ data[1][j] ] == 'undefined')
{
tmp[ data[1][j] ] = 1;
uniqueCols++;
}
}
// return first row: fix column titles + pivoted values column title + values column title(s)
row = [];
for (j=0;j<fixColumns;j++) row.push(fixRows == 2 ? data[0][j]||data[1][j] : data[0][j]); // for 2 fixed rows we try to find the title in row 1 and row 2
for (j=3;j<arguments.length;j++) row.push(arguments[j]);
ret.push(row);
// processing rows (skipping the fixed columns, then dedicating a new row for each pivoted value)
for (i=fixRows; i<data.length && data[i].length > 0; i++)
{
// skip totally empty or only whitespace containing rows
if (data[i].join('').replace(/\s+/g,'').length == 0 ) continue;
// unpivot the row
row = [];
for (j=0;j<fixColumns && j<data[i].length;j++)
row.push(data[i][j]);
for (j=fixColumns;j<data[i].length;j+=uniqueCols)
ret.push(
row.concat([data[0][j]]) // the first row title value
.concat(data[i].slice(j,j+uniqueCols)) // pivoted values
);
}
return ret;
}
That is basically array manipulation... below is a code that does what you want and writes back the result below existing data.
You can of course adapt it to write on a new sheet if you prefer.
function transformData(){
var sheet = SpreadsheetApp.getActiveSheet();
var data = sheet.getDataRange().getValues();//read whole sheet
var output = [];
var headers = data.shift();// get headers
var empty = headers.shift();//remove empty cell on the left
var products = [];
for(var d in data){
var p = data[d].shift();//get product names in first column of each row
products.push(p);//store
}
Logger.log('headers = '+headers);
Logger.log('products = '+products);
Logger.log('data only ='+data);
for(var h in headers){
for(var p in products){ // iterate with 2 loops (headers and products)
var row = [];
row.push(headers[h]);
row.push(products[p]);
row.push(data[p][h])
output.push(row);//collect data in separate rows in output array
}
}
Logger.log('output array = '+output);
sheet.getRange(sheet.getLastRow()+1,1,output.length,output[0].length).setValues(output);
}
to automatically write the result in a new sheet replace last line of code with these :
var ns = SpreadsheetApp.getActive().getSheets().length+1
SpreadsheetApp.getActiveSpreadsheet().insertSheet('New Sheet'+ns,ns).getRange(1,1,output.length,output[0].length).setValues(output);
google-sheets-formula
With the advent of new LAMBDA and MAKEARRAY functions, we can unpivot the data without string manipulation. This works by creating a sequence of appropriate index numbers for the new array, which should be faster than string manipulation.
=ARRAYFORMULA(LAMBDA(range,s_cols,
QUERY(
MAKEARRAY(ROWS(range)*(COLUMNS(range)-s_cols),s_cols+1,
LAMBDA(i,j,
TO_TEXT(
INDEX(range,
ROUNDDOWN(1+(i-1)/(COLUMNS(range)-s_cols)),
if(j>s_cols,MOD(i-1,COLUMNS(range)-s_cols)+s_cols+1,j)
)
)
)
),"where Col"&s_cols+1&" is not null"
)
)(A1:C10,2))
Or as a named function(UNPIVOT(range,s_cols)):
=ARRAYFORMULA(
QUERY(
MAKEARRAY(ROWS(range)*(COLUMNS(range)-s_cols),s_cols+1,
LAMBDA(i,j,
TO_TEXT(
INDEX(range,
ROUNDDOWN(1+(i-1)/(COLUMNS(range)-s_cols)),
if(j>s_cols,MOD(i-1,COLUMNS(range)-s_cols)+s_cols+1,j)
)
)
)
),"where Col"&s_cols+1&" is not null"
)
)
Arguments:
range: The range to unpivot. Eg:A1:C10
s_cols: The number of static columns on the left.Eg:2
google-apps-script
Using simple, yet powerful loops on V8 engine:
/**
* Unpivots the given data
*
* #return Unpivoted data from array
* #param {A1:C4} arr 2D Input Array
* #param {1=} ignoreCols [optional] Number of columns on the left to ignore
* #customfunction
*/
const unpivot = (arr, ignoreCols = 1) =>
((j, out) => {
while (++j < arr[0].length)
((i) => {
while (++i < arr.length)
out.push([arr[0][j], ...arr[i].slice(0, ignoreCols), arr[i][j]]);
})(0);
return out;
})(ignoreCols - 1, []);
Usage:
=UNPIVOT(A1:C4)
=UNPIVOT(A1:F4,3)//3 static cols on left
={{"Customer","Products","Qty"};UNPIVOT(A1:D4)}//add headers
Live demo:
/*<ignore>*/console.config({maximize:true,timeStamps:false,autoScroll:false});/*</ignore>*/
const arr = [
[' ', ' Customer 1 ', ' Customer 2 ', ' Customer 3'],
['Product 1 ', ' 1 ', ' 2 ', ' 3'],
['Product 2 ', ' 4 ', ' 5 ', ' 6'],
['Product 3 ', ' 7 ', ' 8 ', ' 9'],
];
console.log("Input table")
console.table(arr)
/**
* Unpivots the given data
*
* #return Unpivoted data from array
* #param {A1:C4} arr 2D Input Array
* #param {1=} ignoreCols [optional] Number of columns on the left to ignore
* #customfunction
*/
const unpivot = (arr, ignoreCols = 1) =>
((j, out) => {
while (++j < arr[0].length)
((i) => {
while (++i < arr.length)
out.push([arr[0][j], ...arr[i].slice(0, ignoreCols), arr[i][j]]);
})(0);
return out;
})(ignoreCols - 1, []);
console.log("Output table")
console.table(unpivot(arr));
console.log("Output table with 2 static columns")
console.table(unpivot(arr,2));
<!-- https://meta.stackoverflow.com/a/375985/ --> <script src="https://gh-canon.github.io/stack-snippet-console/console.min.js"></script>
Check history for older deprecated functions
Use FLATTEN. It converts any array into single column.
Here's the formula for unpivot:
=ARRAYFORMULA(SPLIT(FLATTEN(A2:A12&"💣"&B1:F1&"💣"&B2:F12),"💣"))
FLATTEN creates 1-column array of Item1💣Date1💣67455 strings, which we then split.
Please copy the sample file to try.
Shorter:
=index(SPLIT(FLATTEN(A2:A12&"💣"&B1:F1&"💣"&B2:F12),"💣"))
Please also see this solution.
It uses INDIRECT and settings, so the formula looks like a more general solution:
I didn't think you had enough array formula answers so here's another one.
Test Data (Sheet 1)
Formula for customer
=ArrayFormula(hlookup(int((row(indirect("1:"&Tuples))-1)/Rows)+2,{COLUMN(Sheet1!$1:$1);Sheet1!$1:$1},2))
(uses a bit of math to make it repeat and hlookup to find correct column in column headers)
Formula for product
=ArrayFormula(vlookup(mod(row(indirect("1:"&Tuples))-1,Rows)+2,{row(Sheet1!$A:$A),Sheet1!$A:$A},2))
(similar approach using mod and vlookup to find correct row in row headers)
Formula for quantity
=ArrayFormula(vlookup(mod(row(indirect("1:"&Tuples))-1,Rows)+2,{row(Sheet1!$A:$A),Sheet1!$A:$Z},int((row(indirect("1:"&Tuples))-1)/Rows)+3))
(extension of above approach to find both row and column in 2d array)
Then combining these three formulas into a query to filter out any blank values for quantity
=ArrayFormula(query(
{hlookup(int((row(indirect("1:"&Tuples))-1)/Rows)+2, {COLUMN(Sheet1!$1:$1);Sheet1!$1:$1},2),
vlookup(mod(row(indirect("1:"&Tuples))-1,Rows)+2,{row(Sheet1!$A:$A),Sheet1!$A:$A},2),
vlookup(mod(row(indirect("1:"&Tuples))-1,Rows)+2,{row(Sheet1!$A:$A),Sheet1!$A:$Z},int((row(indirect("1:"&Tuples))-1)/Rows)+3)},
"select * where Col3 is not null"))
Note
The named ranges Rows and Cols are obtained from the first column and row of the data using counta and Tuples is their product. The separate formulas
=counta(Sheet1!A:A)
=counta(Sheet1!1:1)
and
=counta(Sheet1!A:A)*counta(Sheet1!1:1)
could be included in the main formula if required with some loss of readability.
For reference, here is the 'standard' split/join solution (with 50K data limit) adapted for the present situation:
=ArrayFormula(split(transpose(split(textjoin("♫",true,transpose(if(Sheet1!B2:Z="","",Sheet1!B1:1&"♪"&Sheet1!A2:A&"♪"&Sheet1!B2:Z))),"♫")),"♪"))
This is also fairly slow (processing 2401 array elements). If you restrict the computation to the actual dimensions of the data, it is much faster for small datasets:
=ArrayFormula(split(transpose(split(textjoin("♫",true,transpose(if(Sheet1!B2:index(Sheet1!B2:Z,counta(Sheet1!A:A),counta(Sheet1!1:1))="","",Sheet1!B1:index(Sheet1!B1:1,counta(Sheet1!1:1))&"♪"&Sheet1!A2:index(Sheet1!A2:A,counta(Sheet1!A:A))&"♪"&Sheet1!B2:index(Sheet1!B2:Z,counta(Sheet1!A:A),counta(Sheet1!1:1))))),"♫")),"♪"))
=ARRAYFORMULA({"Customer", "Product", "Qty";
QUERY(TRIM(SPLIT(TRANSPOSE(SPLIT(TRANSPOSE(QUERY(TRANSPOSE(QUERY(TRANSPOSE(
IF(B2:Z<>"", B1:1&"♠"&A2:A&"♠"&B2:Z&"♦", )), , 999^99)), , 999^99)), "♦")), "♠")),
"where Col1<>'' order by Col1")})
Here another alternative:
=arrayformula
(
{ "PRODUCT","CUSTOMER","QTY";
split
( transpose ( split
( textjoin("✫" ,false,filter(Sheet2!A2:A,Sheet2!A2:A<>"") & "✤" &
filter(Sheet2!B1:1,Sheet2!B1:1<>""))
,"✫",true,false)),"✤",true,false
),
transpose ( split ( textjoin ( "✤", false, transpose ( filter
(
indirect( "Sheet2!B2:" & MID(address(1,COUNTA( Sheet2!B1:1)+1), 2,
FIND("$",address(1,COUNTA( Sheet2!B1:1)+1),2)-2)
)
, Sheet2!A2:A<>""
))),"✤",true,false)
)
}
)
Explanation:
1. "PRODUCT","CUSTOMER","QTY"
-- Use for giving title
2. split
( transpose ( split
( textjoin("✫" ,false,filter(Sheet2!A2:A,Sheet2!A2:A<>"") & "✤" &
filter(Sheet2!B1:1,Sheet2!B1:1<>""))
,"✫",true,false)),"✤",true,false
)
-- Use for distributing Row1 and ColumnA, to be Product and Customer Columns
3. transpose ( split ( textjoin ( "✤", false, transpose ( filter
(
indirect( "Sheet2!B2:" & MID(address(1,COUNTA( Sheet2!B1:1)+1), 2,
FIND("$",address(1,COUNTA( Sheet2!B1:1)+1),2)-2)
)
, Sheet2!A2:A<>""
))),"✤",true,false)
)
--use to distributed data qty to Qty Column
Sheet2 Pict:
Result Sheet Pict:
Input Sheet
This function will handle many customers and many products and it will sum the quantities of multiple customer/product entries and summarize it into one simple table.
The Code:
function rPVT() {
var ss=SpreadsheetApp.getActive();
var sh=ss.getSheetByName('Sheet1');
var osh=ss.getSheetByName('Sheet2');
osh.clearContents();
var vA=sh.getDataRange().getValues();
var itoh={};
var pObj={};
vA[0].forEach(function(h,i){if(h){itoh[i]=h;}});
for(var i=1;i<vA.length;i++) {
for(var j=1;j<vA[i].length;j++) {
if(!pObj.hasOwnProperty(itoh[j])){pObj[itoh[j]]={};}
if(!pObj[itoh[j]].hasOwnProperty(vA[i][0])){pObj[itoh[j]][vA[i][0]]=vA[i][j];}else{pObj[itoh[j]][vA[i][0]]+=(vA[i][j]);}
}
}
var oA=[['Customer','Product','Quantity']];
Object.keys(pObj).forEach(function(ik){Object.keys(pObj[ik]).forEach(function(jk){oA.push([ik,jk,pObj[ik][jk]]);});});
osh.getRange(1,1,oA.length,oA[0].length).setValues(oA);
}
Output Sheet:
The following function reads Sheet2 which is the output of the above function and returns it to the original format.
function PVT() {
var ss=SpreadsheetApp.getActive();
var sh2=ss.getSheetByName('Sheet2');
var sh3=ss.getSheetByName('Sheet3');
sh3.clearContents();
var vA=sh2.getRange(2,1,sh2.getLastRow()-1,sh2.getLastColumn()).getValues();
pObj={};
vA.forEach(function(r,i){if(!pObj.hasOwnProperty(r[1])){pObj[r[1]]={};}if(!pObj[r[1]].hasOwnProperty(r[0])){pObj[r[1]][r[0]]=r[2];}else{pObj[r[1]][r[0]]+=r[2];}});
var oA=[];
var ikeys=Object.keys(pObj);
var jkeys=Object.keys(pObj[ikeys[0]]);
var hkeys=jkeys.slice();
hkeys.unshift('');
oA.push(hkeys);
ikeys.forEach(function(ik,i){var row=[];row.push(ik);jkeys.forEach(function(jk,j){row.push(pObj[ik][jk]);});oA.push(row);});
sh3.getRange(1,1,oA.length,oA[0].length).setValues(oA);
}
If your data has a single unique key column, this spreadsheet may have what you need.
Your unpivot sheet will contain:
The key column =OFFSET(data!$A$1,INT((ROW()-2)/5)+1,0)
The column header column =OFFSET(data!$A$1,0,IF(MOD(ROW()-1,5)=0,5,MOD(ROW()-1,5)))
The cell value column =INDEX(data!$A$1:$F$100,MATCH(A2,data!$A$1:$A$100,FALSE),MATCH(B2,data!$A$1:$F$1,FALSE))
where 5 is the number of columns to unpivot.
I did not make the spreadsheet. I happened across it in the same search that led me to this question.
One range refrence
This will work regardless of the number of customers and products. with one range reference in this case (A1:D4)
=ArrayFormula({SPLIT("Customer|Product|Qty","|");
QUERY(LAMBDA(r,SPLIT(FLATTEN(
QUERY({r}, " Select Col1 ", 1)&"+"&
QUERY({r}, " select "& TEXTJOIN(",",1,REGEXREPLACE("Col#", "#", SEQUENCE(COLUMNS(QUERY(r, " select * limit 0 ", 1))-1,1,2,1)&""))&" limit 0 ", 1)&"+"&
QUERY({QUERY({r}, " Select "& TEXTJOIN(",",1,REGEXREPLACE("Col#", "#", SEQUENCE(COLUMNS(QUERY({r}, " select * where Col1 <> '' ", 1))-1,1,2,1)&""))&" ", 0)},
" Select * where Col1 is not null ")),"+"))(A1:D4)," Select * Where Col2 <> '' ")})
Demonstration
This woks well when you have this table "on the left" as an output of another formula.
in this case simulated with the range A1:G15
20 columns and hundreds of rows
Named function
Pending...
Used formulas help
ARRAYFORMULA - SPLIT - QUERY - LAMBDA - FLATTEN - TEXTJOINREGEXREPLACE - SEQUENCE - COLUMNS - NOT

Should this be a SUMIF formula?

I'm trying to make a formula that can recognize in Column A the name Brooke B for instance here, from there I'd like to SUM the values listed in Column I Cash Discounts for that specific user.
(Yes this user has no Cash Discounts, thus column I states "Non-Cash Payment").
There's about 80 users total here, so I'd prefer to automate the name recognition in Column A.
Sheet: https://docs.google.com/spreadsheets/d/1xzzHT7VjG24UJ4ZXaiZWsfzroTpn7jCJLexuTOf6SQs/edit?usp=sharing
Desired Results listed in Cash Discounts sheet, listed per user in column C.
You are trying to calculate the total amount of the Cash Discount per person given to people in a list. You have data that has been exported from a POS system to which that you have added a formula to calculate the amout of the discount on a line by line basis. You have speculated whether the discount totals could be calculated using SUMIFS formulae.
In my view, the layout of the spreadsheet and the format of the POS report do not lend themselves to isolating discrete data elements though Google sheets functions (though, no doubt, someone with greater skills than I will disprove this theory). Column A, containing names, also includes sub-groupings (and their sub-totals) as well as transaction dates. There are 83 unique persons and over 31,900 transaction lines.
This answer is a script-based solution which updates a sheet with the names and values of the discount totals. The elapsed execution time is #11 seconds.
function so5882893202() {
var ss = SpreadsheetApp.getActiveSpreadsheet();
// get the Discounts sheet
var discsheetname = "Discounts";
var disc = ss.getSheetByName(discsheetname);
//get the Discounts data
var discStartrow = 3;
var discLR = disc.getLastRow();
var discRange = disc.getRange(discStartrow, 1, discLR-discStartrow+1, 9);
var discValues = discRange.getValues();
// isolate Column A
var discnameCol = discValues.map(function(e){return e[0];});//[[e],[e],[e]]=>[e,e,e]
//Logger.log(discnameCol); // DEBUG
// isolate Column I
var discDiscounts = discValues.map(function(e){return e[8];});//[[e],[e],[e]]=>[e,e,e]
//Logger.log(discDiscounts); // DEBUG
// create an array to build a names list
var names =[]
// get the number of rows on the Discounts sheet
var discNumrows = discLR-discStartrow+1;
// Logger.log("DEBUG: number of rows = "+discNumrows);
// identify search terms
var searchPercent = "%";
var searchTotal = "Total";
// loop through Column A
for (var i=0; i<discNumrows; i++){
//Logger.log("DEBUG: i="+i+", content = "+discnameCol[i]);
// test if value is a date
if (Object.prototype.toString.call(discnameCol[i]) != "[object Date]") {
//Logger.log("it isn't a date")
// test whether the value contains a % sign
if ( discnameCol[i].indexOf(searchPercent) === -1){
//Logger.log("it doesn't have a % character in the content");
// test whether the value contains the word Total
if ( discnameCol[i].indexOf(searchTotal) === -1){
//Logger.log("it doesn't have the word total in the content");
// test whether the value is a blank
if (discnameCol[i] != ""){
//Logger.log("it isn't empty");
// this is a name; add it to the list
names.push(discnameCol[i])
}// end test for empty
}// end test for Total
} // end for percentage
} // end test for date
}// end for
//Logger.log(names);
// get the number of names
var numnames = names.length;
//Logger.log("DEBUG: number of names = "+numnames)
// create an array for the discount details
var discounts=[];
// loop through the names
for (var i=0;i<numnames;i++){
// Logger.log("DEBUG: name = "+names[i]);
// get the first row and last rows for this name
var startrow = discnameCol.indexOf(names[i]);
var endrow = discnameCol.lastIndexOf(names[i]+" Total:");
var x = 0;
var value = 0;
// Logger.log("name = "+names[i]+", start row ="+ startrow+", end row = "+endrow);
// loop through the Cash Discounts Column (Column I) for this name
// from the start row to the end row
for (var r = startrow; r<endrow;r++){
// get the vaue of the cell
value = discDiscounts[r];
// test that it is a value
if (!isNaN(value)){
// increment x by the value
x = +x+value;
// Logger.log("DEBUG: r = "+r+", value = "+value+", x = "+x);
}
}
// push the name and the total discount onto the array
discounts.push([names[i],x]);
}
//Logger.log(discounts)
// get the reporting sheet
var reportsheet = "Sheet10";
var report = ss.getSheetByName(reportsheet);
// define the range (allow row 1 for headers)
var reportRange = report.getRange(2,1,numnames,2);
// clear any existing content
reportRange.clearContent();
//update the values
reportRange.setValues(discounts);
}
Report Sheet - extract
Not everyone wants a script solution to their problem. This answer seeks to supply a repeatable solution using common garden-variety formula/functions.
As noted elsewhere, the layout of the spreadsheet does not lend itself to a quick/simple solution, but it IS possible to break down the data to compile a non-script answer. Though it may "seem" as though the following formula are less than "simple, when taken one-at-a-time they are logical, very easy to create, and very easy to verify successful outcomes.
Note: It is important to know at the outset that the first row of data = row#3, and the last row of data = row#31916.
Step#1 - get Text values from ColumnA
Enter this formula in Cell J3, and copy to row 31916
=if(isdate(A3),"",A3):
evaluates Column A, if the content is a date, returns blank, otherwise, returns the context
Taking Customer "AJ" as an example, the content at this point includes:
AJ
10% BuildingDiscount
10% BuildingDiscount Total:
Northwestern 10%
Northwestern 10% Total:
AJ Total:
Step#2 - ignore the values that contain "10%" (this removes both headings and sub-subtotals
Enter this formula in Cell K3 and copy to row 31916
=iferror(if(search("10%",J3)>0,"",J3),J3): searches for "10%" in Column J. Returns all values except those that containing "10%".
Taking Customer "AJ" as an example, the content at this point includes:
AJ
AJ Total:
**Step#3 - ignore the values that contain the word "Total"
Enter this formula in Cell L3 and copy to row 31916.
=iferror(if(search("total",K3)>0,"",K3),K3)
Taking Customer "AJ" as an example, the content at this point includes:
AJ
Results after Step#3
You might wonder, "couldn't this be done in a single formula?" and/or "an array formula would be more efficent". Both those thoughts are true, but we're looking at simple and easy, and a single formula is NOT simple (as shown below); and given that, an array formula is out-of-the-question unless/until an expert can wave a magic wand over the data.
FWIW - Combining Steps#1, 2 & 3
each of the Steps#1, 2 and 3 build on each other. So it is possible to create a single formula that combines these steps.
enter this formula in Cell J3, and copy dow to row #31916.
=iferror(if(search("total",iferror(if(search("10%",if(isdate(A3),"",A3))>0,"",if(isdate(A3),"",A3)),if(isdate(A3),"",A3)))>0,"",iferror(if(search("10%",if(isdate(A3),"",A3))>0,"",if(isdate(A3),"",A3)),if(isdate(A3),"",A3))),iferror(if(search("10%",if(isdate(A3),"",A3))>0,"",if(isdate(A3),"",A3)),if(isdate(A3),"",A3)))
As the image showed, step#3 concludes with mainly empty cells in Column L; the only populated cell is the first instance of the customer name at the start of their transactions - such as "Alec" in this example. However (props to #Rubén) it is possible to populate the blank transaction Cells in Column L. An arrayformula to find the previous non-empty cell in another column on Webapps explains how.
Step#4 - Create a customer name for each transaction row.
Enter this formula in Cell M3, it will automatically populate the cells to row#31916
=ArrayFormula(vlookup(ROW(3:31916),{IF(LEN(L3:L31916)>0,ROW(3:31916),""),L3:L31916},2))
Step#5 - Get the discount amount for each transaction value
The discount values are already displayed in Column I. They are interspersed with text values, so the formula for tests if this is a total line by testing the value in Column D; only if there is a vale (Product item) does the formula then test of there is a value in column I.
Enter this formula in Cell N3, it will automatically populate the cells to row#31916
=ArrayFormula(if(len(D3:D31914)>0,if(ISNUMBER(I3:I31916),I3:I31916,0),""))
Screenshot after step#5
Reporting by Query
Reporting is done via queries. These can go anywhere, but it is probably more convenient to put it on a separate sheet.
Step#6.1 - query the results to create report showing total by ALL customers
=query(Discounts_analysis!$M$2:$N$31916,"select M, sum(N) where N is not null group by M label M 'Customer', sum(N) 'Total Discount' ",1)
Step#6.2 - query the results to create report showing total by customer where the customer received a discount
=query(Discounts_analysis!$M$2:$N$31916,"select M, sum(N) where N >0 group by M label M 'Customer', sum(N) 'Total Discount' ",1)
Step#6.3 - query the results to create report showing customers with no discount
- `=query(query(Discounts_analysis!$M$2:$N$31916,"select M, sum(N) where N is not null group by M label M 'Customer', sum(N) 'Total Discount' ",1),"select Col1 where Col2=0")`
Queries screenshot

Create a Trigger to genearate a random alphanumeric string in informix

To create a trigger before insert using Informix database.
When we try to insert a record into the table it should insert random alphanumeric string into one of the field. Are there any built in functions?
The table consists of the following fields:
empid serial NOT NULL
age int
empcode varchar(10)
and I am running
insert into employee(age) values(10);
The expected output should be something as below:
id age empcode
1, 10, asf123*
Any help is appreciated.
As already commented there is no existing function to create a random string however it is possible to generate random numbers and then convert these to characters. To create the random numbers you can either create a UDR wrapper to a C function such as random() or register the excompat datablade and use the dbms_random_random() function.
Here is an example of a user-defined function that uses the dbs_random_random() function to generate a string of ASCII alphanumeric characters:
create function random_string()
returning varchar(10)
define s varchar(10);
define i, n int;
let s = "";
for i = 1 to 10
let n = mod(abs(dbms_random_random()), 62);
if (n < 10)
then
let n = n + 48;
elif (n < 36)
then
let n = n + 55;
else
let n = n + 61;
end if
let s = s || chr(n);
end for
return s;
end function;
This function can then be called from an insert trigger to populate the empcode column of your table.

Resources