I am currently working with a football match data set and trying to get Cypher to return the teams with the most consecutive wins.
At the moment I have a collect statement which creates a list, e.g. [0,1,1,0,1,1,1], where '0' represents a loss and '1' represents a win. From that list I want to work out the longest run of consecutive wins for each team.
Here is what my code looks like at the moment:
MATCH(t1:TEAM)-[p:PLAYS]->(t2:TEAM)
WITH [t1,t2] AS teams, p AS matches
ORDER BY matches.time ASC
UNWIND teams AS team
WITH team.name AS teamName, collect(case when ((team = startnode(matches)) AND (matches.score1 > matches.score2)) OR ((team = endnode(matches)) AND (matches.score2 > matches.score1)) then +1 else 0 end) AS consecutive_wins
RETURN teamName, consecutive_wins
This returns a list for each team showing its win/loss record in the form explained above (e.g. [0,1,0,1,1,0]).
Any guidance or help in regards to calculating consecutive wins would be much appreciated.
Thanks
I answered a similar question here.
The key is using apoc.coll.split() from APOC Procedures, splitting on 0, which will yield a row per winning streak (list of consecutive 1's) as value. The size of each of the lists is the number of consecutive wins for that streak, so just get the max size:
// your query above
CALL apoc.coll.split(consecutive_wins, 0) YIELD value
WITH teamName, max(size(value)) as consecutiveWins
ORDER BY consecutiveWins DESC
LIMIT 1
RETURN teamName, consecutiveWins
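To make the split-on-0 idea concrete outside of Cypher, here is a minimal Python sketch of the same logic. It is only an illustration: the function name longest_win_streak is made up for this example, and itertools.groupby stands in for apoc.coll.split.

```python
from itertools import groupby


def longest_win_streak(results):
    """Return the length of the longest run of 1s in a list of 0/1 results."""
    # groupby yields maximal runs of equal values; keep only the runs of wins (1).
    streaks = [len(list(run)) for value, run in groupby(results) if value == 1]
    # default=0 covers a team that has no wins at all.
    return max(streaks, default=0)


print(longest_win_streak([0, 1, 1, 0, 1, 1, 1]))  # 3
```

Each run of 1s corresponds to one winning streak, and the longest run is the answer for that team.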
Your use case does not actually require the detection of the most consecutive 1s (and it also does not need to use UNWIND).
The following query uses REDUCE to directly calculate the maximum number of consecutive wins for each team (consW keeps track of the current number of consecutive wins, and maxW is the maximum number of consecutive wins found thus far):
MATCH (team:TEAM)-[p:PLAYS]-(:TEAM)
WITH team, p
ORDER BY p.time ASC
WITH team,
  REDUCE(s = {consW: 0, maxW: 0}, m IN COLLECT(p) |
    CASE WHEN (team = startnode(m) AND (m.score1 > m.score2)) OR (team = endnode(m) AND (m.score2 > m.score1))
      THEN {consW: s.consW+1, maxW: CASE WHEN s.consW+1 > s.maxW THEN s.consW+1 ELSE s.maxW END}
      ELSE {consW: 0, maxW: s.maxW} // a loss or draw ends the current streak
    END).maxW AS most_consecutive_wins
RETURN team.name AS teamName, most_consecutive_wins;
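For readers less used to REDUCE, the same fold can be sketched in Python with functools.reduce. This is an illustration only; the function name is hypothetical, and the keys consW and maxW are borrowed from the query. Note that the running count has to drop back to 0 whenever a game is not a win, otherwise it would simply count total wins.

```python
from functools import reduce


def most_consecutive_wins(results):
    """Fold over time-ordered results (truthy = win) and track the best streak."""

    def step(state, won):
        if won:
            cons = state["consW"] + 1
            return {"consW": cons, "maxW": max(cons, state["maxW"])}
        # A loss or draw ends the current streak but keeps the best one so far.
        return {"consW": 0, "maxW": state["maxW"]}

    return reduce(step, results, {"consW": 0, "maxW": 0})["maxW"]


print(most_consecutive_wins([1, 0, 1, 1, 0, 1]))  # 2
```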
I'm trying to create random transactions between bank accounts. I have created the following query:
//Create transactions
CALL apoc.periodic.iterate("
match (a:BANK_ACCOUNT)
WITH apoc.coll.randomItem(collect(a)) as sender
return sender", "
MATCH (b:BANK_ACCOUNT)
WHERE NOT sender = b
WITH apoc.coll.randomItem(collect(b)) as receiver
MERGE (sender)-[r:HAS_TRANSFERED {time: datetime()}]->(receiver)
set r.amount = rand()*1000",
{batchSize:100, parallel:false});
I would assume that it would create 100 random transactions between random bank accounts. Instead it creates 1 new bank account and 1 new relationship. What am I doing wrong and what should I do?
Thanks for your help !
The following query uses apoc.coll.randomItems to get 200 different random accounts at one time (which is much faster than getting one random account 200 times):
MATCH (ba:BankAccount)
WITH apoc.coll.randomItems(COLLECT(ba), 200) AS accts
WHERE SIZE(accts) > 1
UNWIND RANGE(0, SIZE(accts)/2*2-1, 2) AS i
WITH accts[i] AS sender, accts[i+1] AS receiver
CREATE (sender)-[:TRANSFERED_TO {time: datetime()}]->(receiver)
Notes:
This query uses CREATE instead of MERGE because it is unlikely that a TRANSFERED_TO relationship already exists with the current time as the time value. (You can choose to use MERGE anyway, if duplication is still possible.)
The WHERE SIZE(accts) > 1 test avoids errors when there are not at least 2 accounts.
The SIZE(accts)/2*2-1 calculation prevents the RANGE function from generating a list index (i) that exceeds the last valid index for a sender account.
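The index arithmetic in the last note can be checked with a small Python sketch. This is an illustration under assumptions: random_pairs is a made-up helper, and random.sample stands in for apoc.coll.randomItems (sampling without repeats).

```python
import random


def random_pairs(accounts, sample_size=200, rng=random):
    """Pick up to sample_size distinct accounts and pair them as (sender, receiver)."""
    picked = rng.sample(accounts, min(sample_size, len(accounts)))
    # Integer division then multiplication rounds the size down to an even number,
    # so picked[i + 1] always exists for every even index i that is generated.
    usable = len(picked) // 2 * 2
    return [(picked[i], picked[i + 1]) for i in range(0, usable, 2)]


pairs = random_pairs(list(range(7)), rng=random.Random(0))
print(len(pairs))  # 3
```

Python's range excludes its end value while Cypher's RANGE includes it, which is why the Cypher version subtracts 1 and this sketch does not. With 7 accounts the last one is simply left unpaired.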
I am getting the following error:
Neo.ClientError.Statement.SyntaxError: Invalid input 'r': expected
't/T' (line 4, column 9 (offset: 116)) "sum(if sr.WScore > sr.LScore
then 1 else 0 ) as wins"
Is my logic right????
MATCH (t:Teams),(sr:SeasonResults)
WHERE sr.WTeamID=t.TeamID and t.TeamName="x"
RETURN count(wins),
sum(if sr.WScore > sr.LScore then 1 else 0 ) as wins
Use CASE, not IF, and remember to close it with END:
sum(CASE WHEN sr.WScore > sr.LScore THEN 1 ELSE 0 END)
A WITH or RETURN clause cannot both assign to a variable (like wins) AND use the same variable. So, a clause like RETURN COUNT(wins), SUM(...) AS wins is not supported.
However, if your use case is just to count the number of times in which a related SeasonResults node had WScore > LScore, you don't need to use COUNT(), and this should be sufficient:
MATCH (t:Teams), (sr:SeasonResults)
WHERE sr.WTeamID=t.TeamID and t.TeamName="x"
RETURN SUM(CASE WHEN sr.WScore > sr.LScore THEN 1 END) AS wins
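The query above works without an ELSE branch because a CASE with no matching branch evaluates to null, and SUM skips nulls. A rough Python sketch of that behaviour, with None standing in for null (the rows and the helper cypher_sum are invented for illustration):

```python
def cypher_sum(values):
    """Mimic Cypher's SUM(): None (null) inputs are skipped; no inputs gives 0."""
    return sum(v for v in values if v is not None)


# Hypothetical SeasonResults rows for one team.
season_results = [
    {"WScore": 80, "LScore": 72},
    {"WScore": 65, "LScore": 65},
    {"WScore": 91, "LScore": 60},
]

# CASE WHEN ... THEN 1 END has no ELSE branch, so a non-match produces None.
flags = [1 if r["WScore"] > r["LScore"] else None for r in season_results]
wins = cypher_sum(flags)
print(flags, wins)  # [1, None, 1] 2
```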
I think counting the SeasonResults nodes where sr.WTeamID=t.TeamID will give you the required number of wins.
I am assuming WTeamID is the ID of the winning team, so a SeasonResults node whose WTeamID equals this team's ID represents a win for this team. The count of all such SeasonResults nodes is then the total number of wins for the team.
You can structure the query for the same as:
MATCH (t:Teams)
WHERE t.TeamName="x"
WITH t
MATCH (sr:SeasonResults)
WHERE sr.WTeamID=t.TeamID
RETURN count(sr) AS wins
I'm trying to create an algorithm along these lines:
-Create 8 participants
-Each participant has a set of interests
-Pair each participant with the participant they share the fewest interests with
So what I've done so far is create 2 classes, the Participant and Interest, where the Interest is Hashable so that I can create a Set with it. I manually created 8 participants with different names and interests.
I've made an array of participants selected and I've used a basic for in loop to somewhat pair them together using the intersection() function of sets. Somehow my index always kicks out of range and I'm positive there's a better way of doing this, but it's just so messy and I don't know where to start.
for i in 0..<participantsSelected.count {
if participantsSelected[i].interest.intersection(participantsSelected[i+1].interest) == [] {
participantsSelected.remove(at: i)
participantsSelected.remove(at: i+1)
print (participantsSelected.count)
}
}
My other issue is that a for loop seems a bit off for this specific algorithm: if every pair of participants has at least one interest in common, the intersection will never be equal to [].
Basically the output I'm trying is to remove them from the participants selected array once they're paired up, and for them to be paired up they would have to be with another participant with the least amount of interests with each other.
EDIT: Updated code, here's my attempt to improve my algorithm logic
for participant in 0..<participantsSelected {
var maxInterestIndex = 10
var currentIndex = 1
for _ in 0..<participantsSelected {
print (participant)
print (currentIndex)
let score = participantsAvailable[participant].interest.intersection(participantsAvailable[currentIndex].interest)
print ("testing score, \(score.count)")
if score.count < maxInterestIndex {
maxInterestIndex = score.count
print ("test, \(maxInterestIndex)")
} else {
pairsCreated.append(participantsAvailable[participant])
pairsCreated.append(participantsAvailable[currentIndex])
break
// participantsAvailable.remove(at: participant)
// participantsAvailable.remove(at: pairing)
}
currentIndex = currentIndex + 1
}
}
for i in 0..<pairsCreated.count {
print (pairsCreated[i].name)
}
Here is a solution, assuming that what you want is to pair all of your participants optimally with respect to your criterion.
The way to go is to find a perfect matching in a graph of participants.
Create a graph with n vertices, n being the number of participants. We can denote by u_p the vertex corresponding to participant p.
Then, create weighted edges as follows:
For each couple of participants p, q (p != q), create the edge (u_p, u_q), and weight it with the number of interests these two participants have in common.
Then, run a minimum weight perfect matching algorithm on your graph, and the job is done. You will obtain an optimal result (meaning the best possible, or one among the best possible matchings) in polynomial time.
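Minimum weight perfect matching algorithm: this problem is equivalent to the maximum weight perfect matching problem. Find the edge of maximum weight (let's denote its weight by C). Then replace the weight w of each edge by C-w, and run a maximum weight perfect matching algorithm on the resulting graph. Every perfect matching has exactly n/2 edges (n must be even), so its new total weight is nC/2 minus its old total weight, and maximizing one minimizes the other.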
I would suggest that you use Edmonds' blossom algorithm to find a perfect matching in your graph. First because it is efficient and well documented, second because I believe you can find implementations in most existing languages, but also because it truly is a very, very beautiful algorithm (it ain't called blossom for nothing).
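The weight flip described above (replace each weight w by C minus w) can be sanity-checked on a tiny example. The sketch below is illustrative only: the four-vertex graph and its weights are invented, and the three perfect matchings are listed by hand rather than found by a blossom implementation.

```python
def complement_weights(weights):
    """Replace every edge weight w by C - w, where C is the largest weight."""
    c = max(weights.values())
    return {edge: c - w for edge, w in weights.items()}


def total(matching, weights):
    return sum(weights[edge] for edge in matching)


# Hypothetical complete graph on 4 participants: weight = shared interests.
weights = {(0, 1): 3, (2, 3): 2, (0, 2): 1, (1, 3): 1, (0, 3): 0, (1, 2): 4}
# A complete graph on 4 vertices has exactly three perfect matchings.
matchings = [[(0, 1), (2, 3)], [(0, 2), (1, 3)], [(0, 3), (1, 2)]]

flipped = complement_weights(weights)
lightest = min(matchings, key=lambda m: total(m, weights))
heaviest_flipped = max(matchings, key=lambda m: total(m, flipped))
print(lightest == heaviest_flipped)  # True
```

The matching with the smallest original total is the one with the largest flipped total, which is what lets a maximum weight perfect matching routine solve the minimum weight problem.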
Another possibility, if you are sure that your number of participants will be small (you mention 8), is a brute-force algorithm that tests all possible ways to pair the participants.
Then the algorithm would look like:
find_best_matching(participants, interests, pairs):
    if all participants have been paired:
        return sum(cost(p1, p2) for p1, p2 in pairs), copy of pairs // cost(p1, p2) = number of interests in common
    else:
        p = some participant which is not paired yet
        best_sum = +infinity
        best_pairs = empty_set
        for all participants q which have not been paired, q != p:
            add (p, q) to pairs
            current_sum, current_pairs = find_best_matching(participants, interests, pairs)
            if current_sum < best_sum:
                best_sum = current_sum
                best_pairs = current_pairs
            remove (p, q) from pairs
        return best_sum, best_pairs
Call it initially with find_best_matching(participants, interests, empty_set_of_pairs) to get the best way to match your participants.
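As a rough illustration, the brute-force search can be written as runnable Python. This is a sketch under assumptions, not a definitive implementation: interests is taken to be a dict mapping each participant name to a set of interests, the sample names are invented, and with an odd number of participants one person is simply left unpaired.

```python
def find_best_matching(participants, interests, pairs=()):
    """Return (total shared interests, list of pairs) for the best full pairing."""
    paired = {p for pair in pairs for p in pair}
    unpaired = [p for p in participants if p not in paired]
    if len(unpaired) < 2:
        # Everyone who can be paired is paired: score this complete pairing.
        cost = sum(len(interests[a] & interests[b]) for a, b in pairs)
        return cost, list(pairs)
    p = unpaired[0]
    best_sum, best_pairs = float("inf"), []
    for q in unpaired[1:]:
        # Tuples are immutable, so each branch gets its own extended copy.
        current_sum, current_pairs = find_best_matching(
            participants, interests, pairs + ((p, q),)
        )
        if current_sum < best_sum:
            best_sum, best_pairs = current_sum, current_pairs
    return best_sum, best_pairs


interests = {
    "Ann": {"chess", "golf"},
    "Bob": {"chess", "golf"},
    "Cy": {"golf", "film"},
    "Dee": {"art"},
}
print(find_best_matching(list(interests), interests))
# (1, [('Ann', 'Cy'), ('Bob', 'Dee')])
```

Always pairing the first unpaired participant avoids exploring the same pairing in several orders, which keeps 8 participants down to 105 candidate pairings.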
I have two tables i.e.
1) Places data - 2.4 Million records
2) Office data - 40 thousand records
I have a Neo4j query that takes 3 inputs from the user through a UI and outputs the results after calculating the distance between places and the office from their latitude/longitude information. I want the distance to be calculated at run time only.
Below is the query:-
MATCH (c:places), (c2:office)
WHERE c2.office_id = {office}
AND c2.city = {city}
AND c.category = {category}
RETURN c.places_id as place_name, c.category as Category,
c.sub_category as Sub_Category, distance(c.location, c2.location)
as Distance_in_meters order by distance(c.location, c2.location) LIMIT 50
The above query takes about 10-15 seconds to return results to the UI, which is a bit annoying. Can you please help me optimize its performance?
You can try the following query, which filters inside the node patterns (category is a property of places, so it belongs on c rather than on the office):
MATCH (c:places {category: YourCategory}), (c2:office {office_id: YourOffice, city: YourCity})
RETURN c.places_id as place_name, c.category as Category,
c.sub_category as Sub_Category, distance(c.location, c2.location)
as Distance_in_meters ORDER BY Distance_in_meters ASC LIMIT 50
ASC returns the nearest places first; use DESC instead if you want the farthest first.
From the figure we can see that Arsenal have won three matches consecutively, but I could not write a query that returns this.
Here is a query that should return the maximum number of consecutive wins for Arsenal:
MATCH (a:Club {name:'Arsenal FC'})-[r:played_with]-(:Club)
WITH ((CASE a.name WHEN r.home THEN 1 ELSE -1 END) * (TOINT(r.score[0]) - TOINT(r.score[1]))) > 0 AS win, r
ORDER BY TOINT(r.time)
RETURN REDUCE(s = {max: 0, curr: 0}, w IN COLLECT(win) |
CASE WHEN w
THEN {
max: CASE WHEN s.max < s.curr + 1 THEN s.curr + 1 ELSE s.max END,
curr: s.curr + 1}
ELSE {max: s.max, curr: 0}
END
).max AS result;
The WITH clause sets the win variable to true iff Arsenal won a particular game. Notice that the ORDER BY clause converts the time property to an integer, because the ordering of numeric strings does not work properly if the strings could be of different lengths (I am being a bit picky here, admittedly). The REDUCE function is used to calculate the maximum number of consecutive wins.
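Two details of the query are easy to check in isolation: the sign trick that decides whether the club won, and the reason for converting time to an integer before sorting. The Python sketch below is only an illustration; is_win and the sample values are made up, with score given as a two-element list of numeric strings as in the data model described.

```python
def is_win(club, home, score):
    """True iff `club` won, given the home team's name and [home, away] scores."""
    sign = 1 if club == home else -1  # flip the goal difference for away games
    return sign * (int(score[0]) - int(score[1])) > 0


print(is_win("Arsenal FC", "Arsenal FC", ["2", "1"]))  # True  (home win)
print(is_win("Arsenal FC", "Chelsea FC", ["2", "1"]))  # False (away loss)

# Numeric strings of different lengths sort character by character...
times = ["9", "10", "2"]
print(sorted(times))           # ['10', '2', '9']
# ...so they need converting to integers to get chronological order.
print(sorted(times, key=int))  # ['2', '9', '10']
```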
======
Finally, here are some suggestions for improving your data model:
It looks like your played_with relationship always points from the home team to the away team. If so, you can get rid of the redundant home and away properties, and you can also rename the relationship type to HOSTED to make the direction of the relationship more clear.
The scores and time should be stored as integers, not strings. That would make your queries more efficient, and easier to write and understand.
You could also consider splitting the scores property into two scalar properties, say homeScore and awayScore, which would make your code more clear. There seems to be no advantage to storing the scores in an array.
If you made all the above suggested changes, then you would just need to change the beginning of the above query to this:
MATCH (a:Club {name:'Arsenal FC'})-[r:HOSTED]-(:Club)
WITH ((CASE a WHEN STARTNODE(r) THEN 1 ELSE -1 END) * (r.homeScore - r.awayScore)) > 0 AS win, r
ORDER BY r.time
...