Neo4j: Why difference in result? - neo4j

I have the Cypher query:
match(p:Product {StyleNumber : "Z94882A", Color: "Black"})--(stock:Stock {Retailer: "11"})
with sum(stock.Stockcount) as onstock, p
optional match(p)-->(s:Sale {Retailer : "11"})
where s.Date = 20170801
return p.Color,p.Size, onstock as stock, sum(s.Quantity) as sold
This gives correctly:
Color,Size,Stock,Sold
Black,M,3,0
Black,S,3,1
Black,L,1,1
Black,XL,5,2
But if I leave out the Size property in the return statement,and just return:
return p.Color, onstock as stock, sum(s.Quantity) as sold
This only returns 3 rows (Size "M" is missing):
Black,3,1
Black,1,1
Black,5,2
I can't figure out why there is a difference in the result?

Because you are using the sum() aggregation function.
Cypher doesn't have a GROUP BY clause (like traditional SQL databases), but when you use an aggregation function all non-aggregated fields are implicitly used as grouping fields.
So when you remove p.Size from return the first row is grouped with the second row because all values implicitly grouped are equals (p.Color = 'Black' and onstock = 3). Also, the values of the Sold column are used in the sum() function (0 + 1 = 1), producing the row:
Black,3,1

Related

Arrayformula to preserve a running/cumulative balance when inserting new rows

The top right cell (Natwest) is a list from a range using data validation.
The Opening Balance 1,000.00 is sourced from another sheet using a lookup formula.
Using simple if statements, the cumulative balance is then produced - according to the Amount column and whether the Natwest account occurs in the Dr(+) or Cr (-) column
i.e. =if(B4=$D$1,D3+A4,if(C4=$D$1,D3-A4,D3)) and copied down.
Natwest
Amount Dr Cr Balance
1,000.00
100.00 Natwest Account 1 1,100.00
200.00 Account 2 Natwest 900.00
400.00 Natwest Account 1 1,300.00
It works fine, except that when a new row is inserted, the if statement formula is not copied into the new row.
I am looking for an arrayformula solution (or other formula inside the cell solution), so that the Cumulative Balance still works, but doesn't need to be copied into column D new row - when a new row(s) are inserted.
(I don't mind the Natwest (drop down from the list) or the Opening Balance 1,000.00 to be moved elsewhere if required for a solution.)
Thanks for your help.
Something adding up in between the same range of the arrayformula is always going to be tricky with circular dependency. I suggest to get the initial value and add it the SUMIF of second column and substract the SUMIF of second column up to each value. With BYROW you can do it like this:
=BYROW(A4:A,LAMBDA(each,SUMIF(INDIRECT("B4:B"&ROW(each)),D1,A4:each)-SUMIF(INDIRECT("C4:C"&ROW(each)),D1,A4:each)+D3))
Alternate solution:
You can use this custom function from AppScript for automatically calculating cumulative balance
Code:
function customFunction(startnum, key, range) {
var res = [];
var current = startnum;
range.forEach((x) => {
res.push(x.map((y, index) => {
return y == key && index == 1 ? current = (current + x[0]) : (y == key && index == 2 ? current = (current - x[0]) : null)
}).filter(c => c))
})
return res;
}
Custom Function Parameters:
=customFunction(startnum, key, range)
startnum = opening balance
key = Account name
range = cell range
Sample output:
=customFunction(D3,D1,A4:C)

Finding lowest value with no overlapping dates

I have a spreadsheet with criteria, a start and end date, and a value. The goal is to find the lowest value for each unique criteria and start date without overlapping dates (exclusive of end date). I made a pivot table to make it easier for myself but I know there is probably a way to highlight all valid rows that meet the above requirements with some formula or conditional formatting.
I have attached a google drive link where the spreadsheet can be found here and I have some images of the sheet as well. I know that it might be possible with conditional formatting but I just don't know how to combine everything I want it to do in a single formula.
Example below:
Row 2 is a valid entry because it has the lowest value for Item 1 starting on 03-15-2021, same with row 9.
Row 5 is valid because the start date does not fall within the date range of row 2 (exclusive of end date)
Row 7 is not valid because the start date is between the start and end date of row 6
You may add a bounded script to your project. Then you can call it either with a picture/drawing that has the function assigned (button-like), or adding a menu to Google Sheets.
From what you said in the question and the comments, this seems to do what you are trying. Notice that this requires the V8 runtime (which should be the default).
function validate() {
// Get the correct sheet
const spreadsheet = SpreadsheetApp.getActiveSpreadsheet()
const sheet = spreadsheet.getSheetByName('Sheet1')
// Get the data
const length = sheet.getLastRow() - 1
const range = sheet.getRange(2, 1, length, 4)
const rows = range.getValues()
const data = Array.from(rows.entries(), ([index, [item, start, end, value]]) => {
/*
* Row Index
* 1 Criteria 1
* 2 Item 1 0
* 3 Item 1 1
* 4 Item 1 2
*
* row = index + 2
*/
return {
row: index + 2,
criteria: item,
start: start.getTime(),
end: end.getTime(),
value: value
}
})
// Sort the data by criteria (asc), start date (asc), value (asc) and end date (asc)
data.sort((a, b) => {
let order = a.criteria.localeCompare(b.criteria)
if (order !== 0) return order
order = a.start - b.start
if (order !== 0) return order
order = a.value - b.value
if (order !== 0) return order
order = a.end - b.end
return order
})
// Iterate elements and extract the valid ones
// Notice that because we sorted them, the first one of each criteria will always be valid
const valid = []
let currentCriteria
let currentValid = []
for (let row of data) {
if (row.criteria !== currentCriteria) {
// First of the criteria
valid.push(...currentValid) // Move the valids from the old criteria to the valid list
currentValid = [row] // The new list of valid rows is only the current one (for now)
currentCriteria = row.criteria // Set the criteria
} else {
const startDateCollision = currentValid.some(valid => {
row.start >= valid.start && row.start < valid.end
})
if (!startDateCollision) {
currentValid.push(row)
}
}
}
valid.push(...currentValid)
// Remove any old marks
sheet.getRange(2, 5, length).setValue('')
// Mark the valid rows
for (let row of valid) {
sheet.getRange(row.row, 5).setValue('Valid')
}
}
Algorithm rundown
We get the sheet that we have the data in. In this case we do it by name (remember to change it if it's not the default Sheet1)
We read the data and transform it in a more an array of objects, which for this case makes it easier to manage
We sort the data. This is similar to the transpose you made but in the code. It also forces a priority order and groups it by criteria
Iterate the rows, keeping only the valid:
We keep a list of all the valid ones (valid) and one for the current criteria only (currentValid) because we only have to check data collisions with the ones in the same criteria.
The first iteration will always enter the if block (because currentCriteria is undefined).
When changing criteria, we dump all the rows in currentValid into valid. We do the same after the loop with the last criteria
When changing criteria, the CurrentValid is an array with the current row as an element because the first row will always be valid (because of sorting)
For the other rows, we check if the starting date is between the starting and ending date of any of the valid rows for that criteria. If it's not, add it to this criteria's valid rows
We remove all the current "Valid" in the validity row and fill it out with the valids
The cornerstone of the algorithm is actually sorting the data. It allows us to not have to search for the best row, as it's always the next one. It also ensures things like that the first row of a criteria is always valid.
Learning resources
Javascript tutorial (W3Schools)
Google App Scripts
Overview of Google Apps Script
Extending Google Sheets
Custom Menus in Google Workspace
Code references
Class SpreadsheetApp
Class Sheet
Sheet.getRange (notice the 3 overloads)
let ... of (MDN)
Spread syntax (...) (MDN)
Arrow function expressions (MDN)
Array.from() (MDN)
Array.prototype.push() (MDN)
Array.prototype.sort() (MDN)
Date.prototype.getTime() (MDN)
String.prototype.localeCompare() (MDN)

Google Sheets: Count number of rows in a column that do not match corresponding row in another column?

Say we have the following spreadsheet in google sheets:
a a
b b
c
d e
e d
How would I build a formula that counts the number of rows in column B that do not match the corresponding row in column A, and are not blank? In other words I want to get the number of rows that changed to a new letter in column B. So in this example the formula should return 2.
Thank you for your help.
UPDATE:
Now suppose I have this spreadsheet:
a a
b b b
c a
d e e
e d e
How would I build on the last formula for the third column, where the value returned is:
(the number of rows in column 3 that don't match the corresponding row in column 2) + (if column 2 is blank, the number of rows in column 3 that do not match the corresponding row in column 1)
and I also don't want to count blanks in the third column.
The value returned in this case should be 2 (rows 3 and 5).
To me it sounds like you could use:
=SUMPRODUCT((B:B<>"")*(B:B<>A:A))
=IFNA(ROWS(FILTER(A:B,
(A:A<>B:B)*
(B:B<>"")
)),0)
FILTER by matching conditions * for AND + for OR.
ROWS counts rows
IFNA returns 0 if nothing was found.
or with QUERY
=INDEX(QUERY(A:B,"select count(B) where B<>A"),2)
Try this:
=ARRAYFORMULA(COUNTA($B$1:$B)-SUM(COUNTIFS($A$1:$A, $B$1:$B,$B$1:$B,"<>")))
I see 2 ways to complete this.
First you could add a function to each row to return 1 or 0 if the value changed and was not blank and then sum results. This unfortunately adds a messy column in your spreadsheet.
=if(A1<>IF(ISBLANK(B1),A1,B1),1,0)
Second you could create a function where you would pass the range as a string.
The call from the worksheet would look like this:
=myFunction("A1:B5")
Then create a script by opening Tools -> Script editor and use something like this
function myFunction(r) {
var sheet = SpreadsheetApp.getActiveSheet();
var range = sheet.getRange(r);
var numRows = range.getNumRows();
var areDifferent = 0;
for (let i=1; i<= numRows; i++) {
let currentValue = range.getCell(i,1).getValue();
let cmpValue = range.getCell(i,2).getValue();
if ((currentValue != cmpValue) && (cmpValue != "")) {
areDifferent++;
}
}
return areDifferent;
}

Post-Calculations on Aggregations

I'm new to Neo4j and I feel stucked in simple operations that I would solve in regular SQL using subqueries.
How can I subtract the two resulting rows? I have grouped the results and I would like to return the difference between them:
MATCH (seguidores:RelevantTwitterUser {location:"Madrid"})-[:FOLLOWS]->(seguidos:RelevantTwitterUser {location:"Barcelona"})
WITH COLLECT({origen:seguidores.location, user:seguidores.userId}) AS ROWS
MATCH (seguidores:RelevantTwitterUser {location:"Barcelona"})-[:FOLLOWS]->(seguidos:RelevantTwitterUser {location:"Madrid"})
WITH ROWS + COLLECT({origen:seguidores.location, user:seguidores.userId}) AS allRows
UNWIND allRows AS ROW
RETURN ROW.origen, COUNT(ROW.user)
with the output:
Answer using aggregating functions
WITH "Madrid" AS loc1, "Barcelona" AS loc2
MATCH (:RelevantTwitterUser{location:loc1})-[:FOLLOWS]->(:RelevantTwitterUser{location:loc2})
WITH loc1, loc2, COUNT(*) AS count1
MATCH (:RelevantTwitterUser{location:loc2})-[:FOLLOWS]->(:RelevantTwitterUser{location:loc1})
WITH loc1, loc2, count1, COUNT(*) AS count2
RETURN loc1, count1, loc2, count2, count1 - count2 AS diff
You should read the documentation on aggregating functions (like COUNT) if you want to understand how this query works, and to avoid getting the wrong counts if you need to modify this query. It is especially important to understand how grouping keys (e.g., locs and count in the last WITH clause) affect the behavior of aggregating functions.
Answer using SIZE() on pattern expressions
WITH "Madrid" AS loc1, "Barcelona" AS loc2
WITH loc1, loc2,
SIZE((:RelevantTwitterUser{location:loc1})-[:FOLLOWS]->(:RelevantTwitterUser{location:loc2})) AS count1,
SIZE((:RelevantTwitterUser{location:loc2})-[:FOLLOWS]->(:RelevantTwitterUser{location:loc1})) AS count2
RETURN loc1, count1, loc2, count2, count1 - count2 AS diff

Should this be a SUMIF formula?

I'm trying to make a formula that can recognize in Column A the name Brooke B for instance here, from there I'd like to SUM the values listed in Column I Cash Discounts for that specific user.
(Yes this user has no Cash Discounts, thus column I states "Non-Cash Payment").
There's about 80 users total here, so I'd prefer to automate the name recognition in Column A.
Sheet: https://docs.google.com/spreadsheets/d/1xzzHT7VjG24UJ4ZXaiZWsfzroTpn7jCJLexuTOf6SQs/edit?usp=sharing
Desired Results listed in Cash Discounts sheet, listed per user in column C.
You are trying to calculate the total amount of the Cash Discount per person given to people in a list. You have data that has been exported from a POS system to which that you have added a formula to calculate the amout of the discount on a line by line basis. You have speculated whether the discount totals could be calculated using SUMIFS formulae.
In my view, the layout of the spreadsheet and the format of the POS report do not lend themselves to isolating discrete data elements though Google sheets functions (though, no doubt, someone with greater skills than I will disprove this theory). Column A, containing names, also includes sub-groupings (and their sub-totals) as well as transaction dates. There are 83 unique persons and over 31,900 transaction lines.
This answer is a script-based solution which updates a sheet with the names and values of the discount totals. The elapsed execution time is #11 seconds.
function so5882893202() {
var ss = SpreadsheetApp.getActiveSpreadsheet();
// get the Discounts sheet
var discsheetname = "Discounts";
var disc = ss.getSheetByName(discsheetname);
//get the Discounts data
var discStartrow = 3;
var discLR = disc.getLastRow();
var discRange = disc.getRange(discStartrow, 1, discLR-discStartrow+1, 9);
var discValues = discRange.getValues();
// isolate Column A
var discnameCol = discValues.map(function(e){return e[0];});//[[e],[e],[e]]=>[e,e,e]
//Logger.log(discnameCol); // DEBUG
// isolate Column I
var discDiscounts = discValues.map(function(e){return e[8];});//[[e],[e],[e]]=>[e,e,e]
//Logger.log(discDiscounts); // DEBUG
// create an array to build a names list
var names =[]
// get the number of rows on the Discounts sheet
var discNumrows = discLR-discStartrow+1;
// Logger.log("DEBUG: number of rows = "+discNumrows);
// identify search terms
var searchPercent = "%";
var searchTotal = "Total";
// loop through Column A
for (var i=0; i<discNumrows; i++){
//Logger.log("DEBUG: i="+i+", content = "+discnameCol[i]);
// test if value is a date
if (Object.prototype.toString.call(discnameCol[i]) != "[object Date]") {
//Logger.log("it isn't a date")
// test whether the value contains a % sign
if ( discnameCol[i].indexOf(searchPercent) === -1){
//Logger.log("it doesn't have a % character in the content");
// test whether the value contains the word Total
if ( discnameCol[i].indexOf(searchTotal) === -1){
//Logger.log("it doesn't have the word total in the content");
// test whether the value is a blank
if (discnameCol[i] != ""){
//Logger.log("it isn't empty");
// this is a name; add it to the list
names.push(discnameCol[i])
}// end test for empty
}// end test for Total
} // end for percentage
} // end test for date
}// end for
//Logger.log(names);
// get the number of names
var numnames = names.length;
//Logger.log("DEBUG: number of names = "+numnames)
// create an array for the discount details
var discounts=[];
// loop through the names
for (var i=0;i<numnames;i++){
// Logger.log("DEBUG: name = "+names[i]);
// get the first row and last rows for this name
var startrow = discnameCol.indexOf(names[i]);
var endrow = discnameCol.lastIndexOf(names[i]+" Total:");
var x = 0;
var value = 0;
// Logger.log("name = "+names[i]+", start row ="+ startrow+", end row = "+endrow);
// loop through the Cash Discounts Column (Column I) for this name
// from the start row to the end row
for (var r = startrow; r<endrow;r++){
// get the vaue of the cell
value = discDiscounts[r];
// test that it is a value
if (!isNaN(value)){
// increment x by the value
x = +x+value;
// Logger.log("DEBUG: r = "+r+", value = "+value+", x = "+x);
}
}
// push the name and the total discount onto the array
discounts.push([names[i],x]);
}
//Logger.log(discounts)
// get the reporting sheet
var reportsheet = "Sheet10";
var report = ss.getSheetByName(reportsheet);
// define the range (allow row 1 for headers)
var reportRange = report.getRange(2,1,numnames,2);
// clear any existing content
reportRange.clearContent();
//update the values
reportRange.setValues(discounts);
}
Report Sheet - extract
Not everyone wants a script solution to their problem. This answer seeks to supply a repeatable solution using common garden-variety formula/functions.
As noted elsewhere, the layout of the spreadsheet does not lend itself to a quick/simple solution, but it IS possible to break down the data to compile a non-script answer. Though it may "seem" as though the following formula are less than "simple, when taken one-at-a-time they are logical, very easy to create, and very easy to verify successful outcomes.
Note: It is important to know at the outset that the first row of data = row#3, and the last row of data = row#31916.
Step#1 - get Text values from ColumnA
Enter this formula in Cell J3, and copy to row 31916
=if(isdate(A3),"",A3):
evaluates Column A, if the content is a date, returns blank, otherwise, returns the context
Taking Customer "AJ" as an example, the content at this point includes:
AJ
10% BuildingDiscount
10% BuildingDiscount Total:
Northwestern 10%
Northwestern 10% Total:
AJ Total:
Step#2 - ignore the values that contain "10%" (this removes both headings and sub-subtotals
Enter this formula in Cell K3 and copy to row 31916
=iferror(if(search("10%",J3)>0,"",J3),J3): searches for "10%" in Column J. Returns all values except those that containing "10%".
Taking Customer "AJ" as an example, the content at this point includes:
AJ
AJ Total:
**Step#3 - ignore the values that contain the word "Total"
Enter this formula in Cell L3 and copy to row 31916.
=iferror(if(search("total",K3)>0,"",K3),K3)
Taking Customer "AJ" as an example, the content at this point includes:
AJ
Results after Step#3
You might wonder, "couldn't this be done in a single formula?" and/or "an array formula would be more efficent". Both those thoughts are true, but we're looking at simple and easy, and a single formula is NOT simple (as shown below); and given that, an array formula is out-of-the-question unless/until an expert can wave a magic wand over the data.
FWIW - Combining Steps#1, 2 & 3
each of the Steps#1, 2 and 3 build on each other. So it is possible to create a single formula that combines these steps.
enter this formula in Cell J3, and copy dow to row #31916.
=iferror(if(search("total",iferror(if(search("10%",if(isdate(A3),"",A3))>0,"",if(isdate(A3),"",A3)),if(isdate(A3),"",A3)))>0,"",iferror(if(search("10%",if(isdate(A3),"",A3))>0,"",if(isdate(A3),"",A3)),if(isdate(A3),"",A3))),iferror(if(search("10%",if(isdate(A3),"",A3))>0,"",if(isdate(A3),"",A3)),if(isdate(A3),"",A3)))
As the image showed, step#3 concludes with mainly empty cells in Column L; the only populated cell is the first instance of the customer name at the start of their transactions - such as "Alec" in this example. However (props to #Rubén) it is possible to populate the blank transaction Cells in Column L. An arrayformula to find the previous non-empty cell in another column on Webapps explains how.
Step#4 - Create a customer name for each transaction row.
Enter this formula in Cell M3, it will automatically populate the cells to row#31916
=ArrayFormula(vlookup(ROW(3:31916),{IF(LEN(L3:L31916)>0,ROW(3:31916),""),L3:L31916},2))
Step#5 - Get the discount amount for each transaction value
The discount values are already displayed in Column I. They are interspersed with text values, so the formula for tests if this is a total line by testing the value in Column D; only if there is a vale (Product item) does the formula then test of there is a value in column I.
Enter this formula in Cell N3, it will automatically populate the cells to row#31916
=ArrayFormula(if(len(D3:D31914)>0,if(ISNUMBER(I3:I31916),I3:I31916,0),""))
Screenshot after step#5
Reporting by Query
Reporting is done via queries. These can go anywhere, but it is probably more convenient to put it on a separate sheet.
Step#6.1 - query the results to create report showing total by ALL customers
=query(Discounts_analysis!$M$2:$N$31916,"select M, sum(N) where N is not null group by M label M 'Customer', sum(N) 'Total Discount' ",1)
Step#6.2 - query the results to create report showing total by customer where the customer received a discount
=query(Discounts_analysis!$M$2:$N$31916,"select M, sum(N) where N >0 group by M label M 'Customer', sum(N) 'Total Discount' ",1)
Step#6.3 - query the results to create report showing customers with no discount
- `=query(query(Discounts_analysis!$M$2:$N$31916,"select M, sum(N) where N is not null group by M label M 'Customer', sum(N) 'Total Discount' ",1),"select Col1 where Col2=0")`
Queries screenshot

Resources