Nested stored procedures - stored-procedures

I have three tables, dev1, PICK1 and rul1, and I need to move data between them.
For example: based on the result of a select on rul1, I take certain rows from dev1 and insert them into PICK1.
The selected rows in dev1 then have to be updated so that they are not picked again by the next select driven by rul1.
This is the structure of the tables:
dev1 {
[delivery]
[inlist]
[scot]}
rul1{
[group],
[field],
[logical_condition]
[value]}
PICK1 {
[id_delivery]
[group_num]
[lock]}
these are some of the values that I have in rul1
[group] [field] [logical_condition] [value]
1 scot = 1
1 inlist = 0
2 scot = 2
2 inlist = 0
3 scot = 3
Then when I run this stored procedure:
BEGIN
-- @count walks through the rule groups; @max is the highest group number
DECLARE @max int, @count int
SET @count = 1
SELECT @max = max([group]) from rul1
while (@count <= @max)
BEGIN
-- one row per rule in the current group, e.g. "scot = 1"
select field + ' ' + logical_condition + ' ' + value as [rule] from rul1 where [group] = @count
SET @count = (@count + 1)
END
END
I get this results for each group (1, 2, 3):
scot = 1
inlist = 0
scot = 2
inlist = 0
scot = 3
Now I want to combine these results into a string that becomes the WHERE clause used to select the matching rows from dev1 and insert them into PICK1,
and then update dev1. I guess this has to happen on every iteration of the while loop, but I don't know how to nest stored procedures or
whether this can be done some other way.
Any help is welcome.
Thank you very much.
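One way to picture the string-building step is to group the rule rows and join each group's conditions. The sketch below is JavaScript used only to illustrate the logic, not T-SQL. The function name buildWhereClauses and the in-memory rules array are hypothetical, and combining the rules of one group with AND is an assumption that the question does not state.

```javascript
// Hypothetical in-memory copy of the rul1 rows listed in the question.
const rules = [
  { group: 1, field: "scot", logical_condition: "=", value: "1" },
  { group: 1, field: "inlist", logical_condition: "=", value: "0" },
  { group: 2, field: "scot", logical_condition: "=", value: "2" },
  { group: 2, field: "inlist", logical_condition: "=", value: "0" },
  { group: 3, field: "scot", logical_condition: "=", value: "3" },
];

// Returns a Map from group number to a WHERE-clause string.
// Assumption: all rules of one group must hold together (AND).
function buildWhereClauses(rows) {
  const byGroup = new Map();
  for (const r of rows) {
    const condition = `[${r.field}] ${r.logical_condition} ${r.value}`;
    if (!byGroup.has(r.group)) byGroup.set(r.group, []);
    byGroup.get(r.group).push(condition);
  }
  const clauses = new Map();
  for (const [group, conditions] of byGroup) {
    clauses.set(group, conditions.join(" AND "));
  }
  return clauses;
}

for (const [group, clause] of buildWhereClauses(rules)) {
  console.log(`group ${group}: WHERE ${clause}`);
}
```

In SQL Server the same string could be assembled inside the loop and run as dynamic SQL, for example with sp_executesql. Because the rule text is concatenated into the statement, that is only reasonable if the contents of rul1 are trusted.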

Related

Advance manual unpivot of data table [duplicate]

I am trying to produce a "reverse pivot" function. I have searched long and hard for such a function, but cannot find one that is already out there.
I have a summary table with anywhere up to 20 columns and hundreds of rows, however I would like to convert it into a flat list so I can import to a database (or even use the flat data to create more pivot tables from!)
So, I have data in this format:
          | Customer 1 | Customer 2 | Customer 3
----------+------------+------------+-----------
Product 1 | 1          | 2          | 3
Product 2 | 4          | 5          | 6
Product 3 | 7          | 8          | 9
And need to convert it to this format:
Customer | Product | Qty
-----------+-----------+----
Customer 1 | Product 1 | 1
Customer 1 | Product 2 | 4
Customer 1 | Product 3 | 7
Customer 2 | Product 1 | 2
Customer 2 | Product 2 | 5
Customer 2 | Product 3 | 8
Customer 3 | Product 1 | 3
Customer 3 | Product 2 | 6
Customer 3 | Product 3 | 9
I have created a function that will read the range from sheet1 and append the re-formatted rows at the bottom of the same sheet, however I am trying to get it working so I can have the function on sheet2 that will read the whole range from sheet1.
No matter what I try, I can't seem to get it to work, and was wondering if anybody could give me any pointers?
Here is what I have so far:
function readRows() {
var sheet = SpreadsheetApp.getActiveSheet();
var rows = sheet.getDataRange();
var numRows = rows.getNumRows();
var values = rows.getValues();
heads = values[0]
for (var i = 1; i <= numRows - 1; i++) {
for (var j = 1; j <= values[0].length - 1; j++) {
var row = [values[i][0], values[0][j], values[i][j]];
sheet.appendRow(row)
}
}
};

I wrote a simple, general custom function that is fully reusable: it can unpivot (reverse pivot) a table of any size.
In your case you could use it like this: =unpivot(A1:D4,1,1,"customer","sales")
You can use it just like any built-in array function in a spreadsheet.
Here are two examples:
https://docs.google.com/spreadsheets/d/12TBoX2UI_Yu2MA2ZN3p9f-cZsySE4et1slwpgjZbSzw/edit#gid=422214765
The following is the source:
/**
* Unpivot a pivot table of any size.
*
* @param {A1:D30} data The pivot table.
* @param {1} fixColumns Number of columns, after which pivoted values begin. Default 1.
* @param {1} fixRows Number of rows (1 or 2), after which pivoted values begin. Default 1.
* @param {"city"} titlePivot The title of horizontal pivot values. Default "column".
* @param {"distance"[,...]} titleValue The title of pivot table values. Default "value".
* @return The unpivoted table
* @customfunction
*/
function unpivot(data,fixColumns,fixRows,titlePivot,titleValue) {
var fixColumns = fixColumns || 1; // how many columns are fixed
var fixRows = fixRows || 1; // how many rows are fixed
var titlePivot = titlePivot || 'column';
var titleValue = titleValue || 'value';
var ret=[],i,j,row,uniqueCols=1;
// we handle only 2 dimension arrays
if (!Array.isArray(data) || data.length < fixRows || !Array.isArray(data[0]) || data[0].length < fixColumns)
throw new Error('no data');
// we handle max 2 fixed rows
if (fixRows > 2)
throw new Error('max 2 fixed rows are allowed');
// fill empty cells in the first row with value set last in previous columns (for 2 fixed rows)
var tmp = '';
for (j=0;j<data[0].length;j++)
if (data[0][j] != '')
tmp = data[0][j];
else
data[0][j] = tmp;
// for 2 fixed rows calculate unique column number
if (fixRows == 2)
{
uniqueCols = 0;
tmp = {};
for (j=fixColumns;j<data[1].length;j++)
if (typeof tmp[ data[1][j] ] == 'undefined')
{
tmp[ data[1][j] ] = 1;
uniqueCols++;
}
}
// return first row: fix column titles + pivoted values column title + values column title(s)
row = [];
for (j=0;j<fixColumns;j++) row.push(fixRows == 2 ? data[0][j]||data[1][j] : data[0][j]); // for 2 fixed rows we try to find the title in row 1 and row 2
for (j=3;j<arguments.length;j++) row.push(arguments[j]);
ret.push(row);
// processing rows (skipping the fixed columns, then dedicating a new row for each pivoted value)
for (i=fixRows; i<data.length && data[i].length > 0; i++)
{
// skip totally empty or only whitespace containing rows
if (data[i].join('').replace(/\s+/g,'').length == 0 ) continue;
// unpivot the row
row = [];
for (j=0;j<fixColumns && j<data[i].length;j++)
row.push(data[i][j]);
for (j=fixColumns;j<data[i].length;j+=uniqueCols)
ret.push(
row.concat([data[0][j]]) // the first row title value
.concat(data[i].slice(j,j+uniqueCols)) // pivoted values
);
}
return ret;
}

That is basically array manipulation... below is a code that does what you want and writes back the result below existing data.
You can of course adapt it to write on a new sheet if you prefer.
function transformData(){
var sheet = SpreadsheetApp.getActiveSheet();
var data = sheet.getDataRange().getValues();//read whole sheet
var output = [];
var headers = data.shift();// get headers
var empty = headers.shift();//remove empty cell on the left
var products = [];
for(var d in data){
var p = data[d].shift();//get product names in first column of each row
products.push(p);//store
}
Logger.log('headers = '+headers);
Logger.log('products = '+products);
Logger.log('data only ='+data);
for(var h in headers){
for(var p in products){ // iterate with 2 loops (headers and products)
var row = [];
row.push(headers[h]);
row.push(products[p]);
row.push(data[p][h])
output.push(row);//collect data in separate rows in output array
}
}
Logger.log('output array = '+output);
sheet.getRange(sheet.getLastRow()+1,1,output.length,output[0].length).setValues(output);
}
To write the result to a new sheet automatically, replace the last line of code with these:
var ns = SpreadsheetApp.getActive().getSheets().length+1
SpreadsheetApp.getActiveSpreadsheet().insertSheet('New Sheet'+ns,ns).getRange(1,1,output.length,output[0].length).setValues(output);

google-sheets-formula
With the advent of new LAMBDA and MAKEARRAY functions, we can unpivot the data without string manipulation. This works by creating a sequence of appropriate index numbers for the new array, which should be faster than string manipulation.
=ARRAYFORMULA(LAMBDA(range,s_cols,
QUERY(
MAKEARRAY(ROWS(range)*(COLUMNS(range)-s_cols),s_cols+1,
LAMBDA(i,j,
TO_TEXT(
INDEX(range,
ROUNDDOWN(1+(i-1)/(COLUMNS(range)-s_cols)),
if(j>s_cols,MOD(i-1,COLUMNS(range)-s_cols)+s_cols+1,j)
)
)
)
),"where Col"&s_cols+1&" is not null"
)
)(A1:C10,2))
Or as a named function(UNPIVOT(range,s_cols)):
=ARRAYFORMULA(
QUERY(
MAKEARRAY(ROWS(range)*(COLUMNS(range)-s_cols),s_cols+1,
LAMBDA(i,j,
TO_TEXT(
INDEX(range,
ROUNDDOWN(1+(i-1)/(COLUMNS(range)-s_cols)),
if(j>s_cols,MOD(i-1,COLUMNS(range)-s_cols)+s_cols+1,j)
)
)
)
),"where Col"&s_cols+1&" is not null"
)
)
Arguments:
range: The range to unpivot. Eg:A1:C10
s_cols: The number of static columns on the left.Eg:2
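The index arithmetic inside the MAKEARRAY call can be followed more easily outside the spreadsheet. The JavaScript sketch below mirrors the same 1-based row and column calculation on a plain 2D array. The name unpivotByIndex is hypothetical, and unlike the formula it does not convert values with TO_TEXT; blank values are dropped, which is what the QUERY filter does.

```javascript
// Mirrors the formula's index math on a plain 2D array.
// range: 2D array; sCols: number of static columns on the left.
function unpivotByIndex(range, sCols) {
  const rows = range.length;
  const valueCols = range[0].length - sCols; // COLUMNS(range) - s_cols
  const out = [];
  for (let i = 1; i <= rows * valueCols; i++) {
    // ROUNDDOWN(1 + (i-1)/(COLUMNS(range)-s_cols)): the source row
    const srcRow = Math.floor(1 + (i - 1) / valueCols);
    const line = [];
    for (let j = 1; j <= sCols + 1; j++) {
      // Static columns map to themselves; the last output column cycles
      // through the value columns: MOD(i-1, valueCols) + s_cols + 1.
      const srcCol = j > sCols ? ((i - 1) % valueCols) + sCols + 1 : j;
      line.push(range[srcRow - 1][srcCol - 1]);
    }
    out.push(line);
  }
  // Equivalent of "where Col(s_cols+1) is not null": drop blank values.
  return out.filter((line) => line[sCols] !== "" && line[sCols] != null);
}

console.log(unpivotByIndex([["a", "x", 1, 2], ["b", "y", 3, ""]], 2));
```

Note that, like the formula, this keeps only the static columns and the value; the header of the value column is not part of the output.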
google-apps-script
Using simple, yet powerful loops on V8 engine:
/**
* Unpivots the given data
*
* @return Unpivoted data from array
* @param {A1:C4} arr 2D Input Array
* @param {1=} ignoreCols [optional] Number of columns on the left to ignore
* @customfunction
*/
const unpivot = (arr, ignoreCols = 1) =>
((j, out) => {
while (++j < arr[0].length)
((i) => {
while (++i < arr.length)
out.push([arr[0][j], ...arr[i].slice(0, ignoreCols), arr[i][j]]);
})(0);
return out;
})(ignoreCols - 1, []);
Usage:
=UNPIVOT(A1:C4)
=UNPIVOT(A1:F4,3)//3 static cols on left
={{"Customer","Products","Qty"};UNPIVOT(A1:D4)}//add headers
Live demo:
const arr = [
[' ', ' Customer 1 ', ' Customer 2 ', ' Customer 3'],
['Product 1 ', ' 1 ', ' 2 ', ' 3'],
['Product 2 ', ' 4 ', ' 5 ', ' 6'],
['Product 3 ', ' 7 ', ' 8 ', ' 9'],
];
console.log("Input table")
console.table(arr)
/**
* Unpivots the given data
*
* @return Unpivoted data from array
* @param {A1:C4} arr 2D Input Array
* @param {1=} ignoreCols [optional] Number of columns on the left to ignore
* @customfunction
*/
const unpivot = (arr, ignoreCols = 1) =>
((j, out) => {
while (++j < arr[0].length)
((i) => {
while (++i < arr.length)
out.push([arr[0][j], ...arr[i].slice(0, ignoreCols), arr[i][j]]);
})(0);
return out;
})(ignoreCols - 1, []);
console.log("Output table")
console.table(unpivot(arr));
console.log("Output table with 2 static columns")
console.table(unpivot(arr,2));
Check history for older deprecated functions

Use FLATTEN. It converts any array into a single column.
Here's the formula for unpivot:
=ARRAYFORMULA(SPLIT(FLATTEN(A2:A12&"💣"&B1:F1&"💣"&B2:F12),"💣"))
FLATTEN creates a 1-column array of Item1💣Date1💣67455 strings, which we then split.
Please copy the sample file to try.
Shorter:
=index(SPLIT(FLATTEN(A2:A12&"💣"&B1:F1&"💣"&B2:F12),"💣"))
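The join-then-split idea behind the FLATTEN formula can be sketched outside the spreadsheet as well. The JavaScript below is only an illustration with a hypothetical function name, unpivotByJoinSplit. It assumes the first row holds the column titles and the first column holds the row labels, and that the separator never occurs in the data. Unlike SPLIT in Sheets, the JavaScript split leaves every value as a string.

```javascript
// Joins row label, column title and cell value with a separator,
// flattens everything into one list, then splits each string again.
function unpivotByJoinSplit(table, sep = "💣") {
  const [header, ...rows] = table;
  const colTitles = header.slice(1); // skip the empty top-left cell
  const flattened = rows.flatMap((row) =>
    colTitles.map((title, j) => [row[0], title, row[j + 1]].join(sep))
  );
  return flattened.map((s) => s.split(sep));
}

console.log(
  unpivotByJoinSplit([
    ["", "Customer 1", "Customer 2"],
    ["Product 1", 1, 2],
    ["Product 2", 4, 5],
  ])
);
```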
Please also see this solution.
It uses INDIRECT and settings, so the formula looks like a more general solution:

I didn't think you had enough array formula answers so here's another one.
Test Data (Sheet 1)
Formula for customer
=ArrayFormula(hlookup(int((row(indirect("1:"&Tuples))-1)/Rows)+2,{COLUMN(Sheet1!$1:$1);Sheet1!$1:$1},2))
(uses a bit of math to make it repeat and hlookup to find correct column in column headers)
Formula for product
=ArrayFormula(vlookup(mod(row(indirect("1:"&Tuples))-1,Rows)+2,{row(Sheet1!$A:$A),Sheet1!$A:$A},2))
(similar approach using mod and vlookup to find correct row in row headers)
Formula for quantity
=ArrayFormula(vlookup(mod(row(indirect("1:"&Tuples))-1,Rows)+2,{row(Sheet1!$A:$A),Sheet1!$A:$Z},int((row(indirect("1:"&Tuples))-1)/Rows)+3))
(extension of above approach to find both row and column in 2d array)
Then combining these three formulas into a query to filter out any blank values for quantity
=ArrayFormula(query(
{hlookup(int((row(indirect("1:"&Tuples))-1)/Rows)+2, {COLUMN(Sheet1!$1:$1);Sheet1!$1:$1},2),
vlookup(mod(row(indirect("1:"&Tuples))-1,Rows)+2,{row(Sheet1!$A:$A),Sheet1!$A:$A},2),
vlookup(mod(row(indirect("1:"&Tuples))-1,Rows)+2,{row(Sheet1!$A:$A),Sheet1!$A:$Z},int((row(indirect("1:"&Tuples))-1)/Rows)+3)},
"select * where Col3 is not null"))
Note
The named ranges Rows and Cols are obtained from the first column and row of the data using counta and Tuples is their product. The separate formulas
=counta(Sheet1!A:A)
=counta(Sheet1!1:1)
and
=counta(Sheet1!A:A)*counta(Sheet1!1:1)
could be included in the main formula if required with some loss of readability.
For reference, here is the 'standard' split/join solution (with 50K data limit) adapted for the present situation:
=ArrayFormula(split(transpose(split(textjoin("♫",true,transpose(if(Sheet1!B2:Z="","",Sheet1!B1:1&"♪"&Sheet1!A2:A&"♪"&Sheet1!B2:Z))),"♫")),"♪"))
This is also fairly slow (processing 2401 array elements). If you restrict the computation to the actual dimensions of the data, it is much faster for small datasets:
=ArrayFormula(split(transpose(split(textjoin("♫",true,transpose(if(Sheet1!B2:index(Sheet1!B2:Z,counta(Sheet1!A:A),counta(Sheet1!1:1))="","",Sheet1!B1:index(Sheet1!B1:1,counta(Sheet1!1:1))&"♪"&Sheet1!A2:index(Sheet1!A2:A,counta(Sheet1!A:A))&"♪"&Sheet1!B2:index(Sheet1!B2:Z,counta(Sheet1!A:A),counta(Sheet1!1:1))))),"♫")),"♪"))

=ARRAYFORMULA({"Customer", "Product", "Qty";
QUERY(TRIM(SPLIT(TRANSPOSE(SPLIT(TRANSPOSE(QUERY(TRANSPOSE(QUERY(TRANSPOSE(
IF(B2:Z<>"", B1:1&"♠"&A2:A&"♠"&B2:Z&"♦", )), , 999^99)), , 999^99)), "♦")), "♠")),
"where Col1<>'' order by Col1")})

Here is another alternative:
=arrayformula
(
{ "PRODUCT","CUSTOMER","QTY";
split
( transpose ( split
( textjoin("✫" ,false,filter(Sheet2!A2:A,Sheet2!A2:A<>"") & "✤" &
filter(Sheet2!B1:1,Sheet2!B1:1<>""))
,"✫",true,false)),"✤",true,false
),
transpose ( split ( textjoin ( "✤", false, transpose ( filter
(
indirect( "Sheet2!B2:" & MID(address(1,COUNTA( Sheet2!B1:1)+1), 2,
FIND("$",address(1,COUNTA( Sheet2!B1:1)+1),2)-2)
)
, Sheet2!A2:A<>""
))),"✤",true,false)
)
}
)
Explanation:
1. "PRODUCT","CUSTOMER","QTY"
-- Provides the header row
2. split
( transpose ( split
( textjoin("✫" ,false,filter(Sheet2!A2:A,Sheet2!A2:A<>"") & "✤" &
filter(Sheet2!B1:1,Sheet2!B1:1<>""))
,"✫",true,false)),"✤",true,false
)
-- Distributes row 1 and column A into the Product and Customer columns
3. transpose ( split ( textjoin ( "✤", false, transpose ( filter
(
indirect( "Sheet2!B2:" & MID(address(1,COUNTA( Sheet2!B1:1)+1), 2,
FIND("$",address(1,COUNTA( Sheet2!B1:1)+1),2)-2)
)
, Sheet2!A2:A<>""
))),"✤",true,false)
)
-- Distributes the quantity data into the Qty column
Sheet2 Pict:
Result Sheet Pict:

Input Sheet
This function will handle many customers and many products and it will sum the quantities of multiple customer/product entries and summarize it into one simple table.
The Code:
function rPVT() {
var ss=SpreadsheetApp.getActive();
var sh=ss.getSheetByName('Sheet1');
var osh=ss.getSheetByName('Sheet2');
osh.clearContents();
var vA=sh.getDataRange().getValues();
var itoh={};
var pObj={};
vA[0].forEach(function(h,i){if(h){itoh[i]=h;}});
for(var i=1;i<vA.length;i++) {
for(var j=1;j<vA[i].length;j++) {
if(!pObj.hasOwnProperty(itoh[j])){pObj[itoh[j]]={};}
if(!pObj[itoh[j]].hasOwnProperty(vA[i][0])){pObj[itoh[j]][vA[i][0]]=vA[i][j];}else{pObj[itoh[j]][vA[i][0]]+=(vA[i][j]);}
}
}
var oA=[['Customer','Product','Quantity']];
Object.keys(pObj).forEach(function(ik){Object.keys(pObj[ik]).forEach(function(jk){oA.push([ik,jk,pObj[ik][jk]]);});});
osh.getRange(1,1,oA.length,oA[0].length).setValues(oA);
}
Output Sheet:
The following function reads Sheet2 which is the output of the above function and returns it to the original format.
function PVT() {
var ss=SpreadsheetApp.getActive();
var sh2=ss.getSheetByName('Sheet2');
var sh3=ss.getSheetByName('Sheet3');
sh3.clearContents();
var vA=sh2.getRange(2,1,sh2.getLastRow()-1,sh2.getLastColumn()).getValues();
var pObj={};
vA.forEach(function(r,i){if(!pObj.hasOwnProperty(r[1])){pObj[r[1]]={};}if(!pObj[r[1]].hasOwnProperty(r[0])){pObj[r[1]][r[0]]=r[2];}else{pObj[r[1]][r[0]]+=r[2];}});
var oA=[];
var ikeys=Object.keys(pObj);
var jkeys=Object.keys(pObj[ikeys[0]]);
var hkeys=jkeys.slice();
hkeys.unshift('');
oA.push(hkeys);
ikeys.forEach(function(ik,i){var row=[];row.push(ik);jkeys.forEach(function(jk,j){row.push(pObj[ik][jk]);});oA.push(row);});
sh3.getRange(1,1,oA.length,oA[0].length).setValues(oA);
}

If your data has a single unique key column, this spreadsheet may have what you need.
Your unpivot sheet will contain:
The key column =OFFSET(data!$A$1,INT((ROW()-2)/5)+1,0)
The column header column =OFFSET(data!$A$1,0,IF(MOD(ROW()-1,5)=0,5,MOD(ROW()-1,5)))
The cell value column =INDEX(data!$A$1:$F$100,MATCH(A2,data!$A$1:$A$100,FALSE),MATCH(B2,data!$A$1:$F$1,FALSE))
where 5 is the number of columns to unpivot.
I did not make the spreadsheet. I happened across it in the same search that led me to this question.

One range reference
This works regardless of the number of customers and products, with a single range reference (A1:D4 in this case):
=ArrayFormula({SPLIT("Customer|Product|Qty","|");
QUERY(LAMBDA(r,SPLIT(FLATTEN(
QUERY({r}, " Select Col1 ", 1)&"+"&
QUERY({r}, " select "& TEXTJOIN(",",1,REGEXREPLACE("Col#", "#", SEQUENCE(COLUMNS(QUERY(r, " select * limit 0 ", 1))-1,1,2,1)&""))&" limit 0 ", 1)&"+"&
QUERY({QUERY({r}, " Select "& TEXTJOIN(",",1,REGEXREPLACE("Col#", "#", SEQUENCE(COLUMNS(QUERY({r}, " select * where Col1 <> '' ", 1))-1,1,2,1)&""))&" ", 0)},
" Select * where Col1 is not null ")),"+"))(A1:D4)," Select * Where Col2 <> '' ")})
Demonstration
This works well when the table "on the left" is the output of another formula,
in this case simulated with the range A1:G15.
20 columns and hundreds of rows
Named function
Pending...
Used formulas help
ARRAYFORMULA - SPLIT - QUERY - LAMBDA - FLATTEN - TEXTJOIN - REGEXREPLACE - SEQUENCE - COLUMNS - NOT

Spark join hangs

I have a table with n columns that I'll call A. In this table there are three columns that I need:
vat -> String
tax -> String
card -> String
vat or tax can be null, but not both at the same time.
For every unique pair of vat and tax there is at least one card.
I need to alter this table by adding a column card_count containing a label based on the number of cards each unique combination of tax and vat has.
So I've done this:
val cardCount = A.groupBy("tax", "vat").count
val sqlCard = udf((count: Int) => {
if (count > 1)
"MULTI"
else
"MONO"
})
val B = cardCount.withColumn(
"card_count",
sqlCard(cardCount.col("count"))
).drop("count")
In the table B I have three columns now:
vat -> String
tax -> String
card_count -> String
and every operation on this DataFrame is smooth.
Now, because I wanted to bring the new column into table A, I performed the following join:
val result = A.join(B,
B.col("tax")<=>A.col("tax") and
B.col("vat")<=>A.col("vat")
).drop(B.col("tax"))
.drop(B.col("vat"))
I expected to get the original table A plus the column card_count.
The problem is that the join hangs, consuming all system resources and freezing the PC.
Additional details:
Table A has ~1.5M elements and is read from a parquet file;
Table B has ~1.3M elements.
The system has 8 threads and 30 GB of RAM.
Let me know what I'm doing wrong.

In the end I didn't find out what the issue was, so I changed my approach:
val cardCount = A.groupBy("tax", "vat").count
val cardCountSet = cardCount.filter(cardCount.col("count") > 1)
.rdd.map(r => r(0) + " " + r(1)).collect().toSet
val udfCardCount = udf((tax: String, vat:String) => {
if (cardCountSet.contains(tax + " " + vat))
"MULTI"
else
"MONO"
})
val result = A.withColumn("card_count",
udfCardCount(A.col("tax"), A.col("vat")))
If someone knows a better approach, please let me know.
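The lookup idea used in this workaround (count rows per (tax, vat) pair, then label each row from that count) is independent of Spark. The JavaScript sketch below illustrates it on plain objects. The name labelCardCounts is hypothetical, and building the key with JSON.stringify instead of joining with a space is a deliberate change so that null values and values containing spaces cannot collide.

```javascript
// Labels each row "MULTI" or "MONO" depending on how many rows
// share the same (tax, vat) pair. Either field may be null.
function labelCardCounts(rows) {
  // A JSON-encoded pair keeps null distinct from the string "null".
  const keyOf = (r) => JSON.stringify([r.tax, r.vat]);
  const counts = new Map();
  for (const r of rows) {
    counts.set(keyOf(r), (counts.get(keyOf(r)) || 0) + 1);
  }
  return rows.map((r) => ({
    ...r,
    card_count: counts.get(keyOf(r)) > 1 ? "MULTI" : "MONO",
  }));
}

console.log(
  labelCardCounts([
    { tax: "t1", vat: null, card: "c1" },
    { tax: "t1", vat: null, card: "c2" },
    { tax: null, vat: "v1", card: "c3" },
  ])
);
```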

Multiply after joining data in Pig

I am trying to multiply two fields and take their sum after joining three tables in Pig. However I keep on getting this error:
<file loyalty_program.pig, line 30, column 74> (Name: Multiply Type: null Uid: null)incompatible types in Multiply Operator left hand side:bag :tuple(new_details1::new_details::potential_customers::num_of_orders:long) right hand side:bag :tuple(products::price:int)
-- load the data sets
orders = LOAD '/dualcore/orders' AS (order_id:int,
cust_id:int,
order_dtm:chararray);
details = LOAD '/dualcore/order_details' AS (order_id:int,
prod_id:int);
products = LOAD '/dualcore/products' AS (prod_id:int,
brand:chararray,
name:chararray,
price:int,
cost:int,
shipping_wt:int);
recent = FILTER orders by order_dtm matches '2012-.*$';
customer = GROUP recent by cust_id;
cust_orders = FOREACH customer GENERATE group as cust_id, (int)COUNT(recent) as num_of_orders;
potential_customers = FILTER cust_orders by num_of_orders>=5;
new_details = join potential_customers by cust_id, recent by cust_id;
new_details1 = join new_details by order_id, details by order_id;
new_details2 = join new_details1 by prod_id, products by prod_id;
--DESCRIBE new_details2;
final_details = FOREACH new_details2 GENERATE potential_customers::cust_id, potential_customers::num_of_orders as num_of_orders,recent::order_id as order_id,recent::order_dtm,details::prod_id,products::brand,products::name,products::price as price,products::cost,products::shipping_wt;
grouped_data = GROUP final_details by cust_id;
member = FOREACH grouped_data GENERATE SUM(final_details.num_of_orders * final_details.price) ;
lim = limit member 10;
dump lim;
I even cast the result of COUNT to int, but it still throws this error. I have no clue how to fix it.

OK, I think you first want to multiply the number of purchases by the price of each product, and then take the total SUM of those multiplied values.
It is a strange requirement, but you can go with the approach below.
All you need to do is compute the multiplication in the final_details FOREACH statement itself and then simply apply SUM to that multiplied amount.
Based on your load statements, I created the input files below.
main_orders.txt
6666,100,2012-01-01
7777,101,2012-09-02
8888,100,2012-01-09
9999,101,2012-12-08
6666,101,2012-09-02
9999,100,2012-07-12
9999,100,2012-08-01
6666,100,2012-01-02
7777,100,2012-09-09
orders_details.txt
6666,6000
7777,7000
8888,8000
9999,9000
main_products.txt
6000,Nike,Shoes,3000,3000,1
7000,Adidas,Cap,1000,1000,1
8000,Rebook,Shoes,4000,4000,1
9000,Puma,Shoes,25000,2500,1
Below is the code
orders = LOAD '/user/cloudera/inputfiles/main_orders.txt' USING PigStorage(',') AS (order_id:int,cust_id:int,order_dtm:chararray);
details = LOAD '/user/cloudera/inputfiles/orders_details.txt' USING PigStorage(',') AS (order_id:int,prod_id:int);
products = LOAD '/user/cloudera/inputfiles/main_products.txt' USING PigStorage(',') AS(prod_id:int,brand:chararray,name:chararray,price:int,cost:int,shipping_wt:int);
recent = FILTER orders by order_dtm matches '2012-.*';
customer = GROUP recent by cust_id;
cust_orders = FOREACH customer GENERATE group as cust_id, (int)COUNT(recent) as num_of_orders;
potential_customers = FILTER cust_orders by num_of_orders>=5;
new_details = join potential_customers by cust_id, recent by cust_id;
new_details1 = join new_details by order_id, details by order_id;
new_details2 = join new_details1 by prod_id, products by prod_id;
DESCRIBE new_details2;
final_details = FOREACH new_details2 GENERATE potential_customers::cust_id, potential_customers::num_of_orders as num_of_orders,recent::order_id as order_id,recent::order_dtm,details::prod_id,products::brand,products::name,products::price as price,products::cost,products::shipping_wt, (potential_customers::num_of_orders * products::price ) as multiplied_price; -- the multiplication is computed in this last field
dump final_details;
grouped_data = GROUP final_details by cust_id;
member = FOREACH grouped_data GENERATE SUM(final_details.multiplied_price) ;
lim = limit member 10;
dump lim;
Just for clarity I am dumping the output of final_details foreach statement as well.
(100,6,6666,2012-01-01,6000,Nike,Shoes,3000,3000,1,18000)
(100,6,6666,2012-01-02,6000,Nike,Shoes,3000,3000,1,18000)
(100,6,7777,2012-09-09,7000,Adidas,Cap,1000,1000,1,6000)
(100,6,8888,2012-01-09,8000,Rebook,Shoes,4000,4000,1,24000)
(100,6,9999,2012-07-12,9000,Puma,Shoes,25000,2500,1,150000)
(100,6,9999,2012-08-01,9000,Puma,Shoes,25000,2500,1,150000)
final output is below
(366000)
This code may help you, but please clarify your requirement.
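The order of operations is the key point: multiply within each row first, then sum per customer, rather than trying to multiply two whole columns inside the aggregate. The JavaScript sketch below replays that on the final_details rows dumped above, keeping only the fields involved. The function name sumMultipliedByCustomer is hypothetical.

```javascript
// The six final_details rows from the dump, reduced to the relevant fields.
const finalDetails = [
  { cust_id: 100, num_of_orders: 6, price: 3000 },
  { cust_id: 100, num_of_orders: 6, price: 3000 },
  { cust_id: 100, num_of_orders: 6, price: 1000 },
  { cust_id: 100, num_of_orders: 6, price: 4000 },
  { cust_id: 100, num_of_orders: 6, price: 25000 },
  { cust_id: 100, num_of_orders: 6, price: 25000 },
];

// Step 1: multiply inside each row. Step 2: sum the products per customer.
function sumMultipliedByCustomer(rows) {
  const totals = new Map();
  for (const r of rows) {
    const multipliedPrice = r.num_of_orders * r.price;
    totals.set(r.cust_id, (totals.get(r.cust_id) || 0) + multipliedPrice);
  }
  return totals;
}

console.log(sumMultipliedByCustomer(finalDetails).get(100)); // 366000
```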

How can I count unique values in a table in Rails?

I have a table "stock" that contains many package_ids:
package_id = 1
package_id = 3
package_id = 2
package_id = 3
package_id = 3
package_id = 4
package_id = 2
What is the most elegant way to:
count each unique package_id in the db, e.g. package_id 1 = one time in the db; package_id 2 = two times in the db; package_id 3 = three times in the db ...
echo the top 3 package IDs afterwards
I have tried this step by step:
counting each single package_id (Stock.where(:package_id => 1).count)
putting all of that in an array
sorting that array from high to low (only the first 3 items)
However, this does not seem to be an efficient approach.

How about:
Stock.group(:package_id).count
It returns a hash with each package_id as the key and its count as the value:
{ package_id_1 => count_1, package_id_2 => count_2, ... }

Try this to get the 'Top 3':
Stock.select('package_id, count(*) as c').group(:package_id).order('c DESC').limit(3)
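To see what the grouped count, the descending order and the limit compute together, here is a small language-neutral illustration in JavaScript (not Rails code). The function name topPackages is hypothetical, and the input is the list of package_ids from the question. How ties are ordered is an artifact of this sketch; a database may return tied rows in any order.

```javascript
// Counts occurrences of each id, then returns the n most frequent
// as [id, count] pairs, highest count first.
function topPackages(packageIds, n = 3) {
  const counts = new Map();
  for (const id of packageIds) {
    counts.set(id, (counts.get(id) || 0) + 1);
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]).slice(0, n);
}

console.log(topPackages([1, 3, 2, 3, 3, 4, 2]));
```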

Strange execution time for summary query

Here is part of the query I am executing:
SELECT SUM(ParentTable.Field1),
(SELECT SUM(ChildrenTable.Field1)
FROM ChildrenTable INNER JOIN
GrandChildrenTable ON ChildrenTable.Id = GrandChildrenTable.ChildrenTableId INNER JOIN
AnotherTable ON GrandChildrenTable.AnotherTableId = AnotherTable.Id
WHERE ChildrenTable.ParentTableId = ParentTable.Id
AND AnotherTable.Type=1),
----
FROM ParentTable
WHERE some_conditions
Relationships:
ParentTable -> ChildrenTable = 1-to-many
ChildrenTable -> GrandChildrenTable = 1-to-many
GrandChildrenTable -> AnotherTable = 1-to-1
I am executing this query three times, while changing only the Type condition, and here are the results:
Number of records that are returned:
Condition | Total execution time (ms)
----------+--------------------------
Type = 1  | 973
Type = 2  | 78810
Type = 3  | 648318
If I execute just the inner join query, here is the count of joined records:
SELECT p.Type, COUNT(*)
FROM CycleActivities ca INNER JOIN
CycleActivityProducts cap ON ca.Id = CAP.CycleActivityId INNER JOIN
Products p ON cap.ProductId = p.Id
GROUP BY p.Type
Type
---- -----------
1 55152
2 13401
4 102730
So, why would the query with the Type = 1 condition execute much faster than the query with Type = 2, although it processes a 4x larger result set (Type is tinyint)?

The way your query is written instructs SQL Server to execute the sub-query with its JOINs for every row of the output.
Written this way it should be faster, if I understand correctly what you want (UPDATED):
with cte_parent as (
select
Id,
SUM (ParentTable.Field1) as Parent_Sum
from ParentTable
group by Id
),
cte_child as (
-- aggregate once per parent instead of once per output row
SELECT
ChildrenTable.ParentTableId as Id,
SUM (ChildrenTable.Field1) as Child_Sum
FROM ChildrenTable
INNER JOIN
GrandChildrenTable ON ChildrenTable.Id = GrandChildrenTable.ChildrenTableId
INNER JOIN
AnotherTable ON GrandChildrenTable.AnotherTableId = AnotherTable.Id
WHERE
AnotherTable.Type=1
AND
some_conditions
GROUP BY ChildrenTable.ParentTableId
)
select cte_parent.Id, Parent_Sum, Child_Sum
from cte_parent
join cte_child on cte_parent.Id = cte_child.Id
