This question is a follow-up to an earlier question, which received two answers.
Now I need to modify that query to return the items related to each HashTag, ordered by createdDate (all of those items have a createdDate property).
I've written this query:
MATCH (r:RateableEntity)<-[:TAG]-(h:HashTag:Featured)
WITH h, COUNT(h) AS Count
ORDER BY Count DESC
SKIP 2
LIMIT 3
WITH h, Count, h.tag as Name,
[(h)-[:TAG]->(m:RateableEntity {audience: 'world'}) | m][..3] AS Items
UNWIND Items as row
RETURN row, Name, Count, COLLECT(row.id)
ORDER BY row.createdDate
But the results are:
Name row.id Count
"vanessa" "cdd14968-404c-41e9-84d5-bf147030a023" 14
"vanessa" "qwd14968-2344-41e9-84d5-bftt34534566" 14
"vanessa" "cd14968-404c-41e9-84d5-certt4545455g" 14
"hash" "b7e74f38-44e4-4b7f-b2c4-8301023ffa9b" 15
"hash" "edr34334-2995-4202-b178-bb2a6f230ab0" 15
"hash" "htth5548-404c-41e9-84d5-bf147030a023" 15
"new" "oljj4968-2344-41e9-84d5-bftt34534566" 3
"new" "werr4968-404c-41e9-84d5-certt4545455" 3
"new" "be545b38-44e4-4b7f-b2c4-8301023ffa9b" 3
I can see that the count is correct and that SKIP and LIMIT work as I want, but I get 3 rows per hashtag instead of one row with 3 ids.
Also, ORDER BY is not working.
Any ideas would be appreciated.
UPDATE:
Actually, the result of this query will be nodes, which I then map in my code, so this is still not what I want.
I have this query:
MATCH (user:Users)-[buy:Sales]->(item:Items)<-[buy2:Sales]- (user2:Users)-[buy_other:Sales]->(item2:Items)
where item.category = item2.category
return
user.mail, item2.id
The idea is to get items that the first user could be interested in and that user2 also bought, but I want to limit the results to at most 2 item2 ids per user.
I know I can limit the results in general, with LIMIT 10 for example, but then those 10 results could all be for the same user.
Any help? Thanks in advance.
You can do this by collecting the items with COLLECT and taking the first n elements of the resulting list.
MATCH (user:Users)-[buy:Sales]->(item:Items)<-[buy2:Sales]- (user2:Users)-[buy_other:Sales]->(item2:Items)
WHERE item.category = item2.category
// this is where you collect and get some items of it
WITH user,COLLECT(item2)[0..2] AS item2s
UNWIND item2s AS item2
RETURN
user.mail, item2.id
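To make the idea behind COLLECT(...)[0..2] followed by UNWIND concrete, here is a minimal Python sketch of the same "at most n per group" logic. The helper name limit_per_key and the sample mail addresses and ids are made up for illustration; this only mirrors the logic and is not how Neo4j evaluates the query.

```python
from collections import defaultdict


def limit_per_key(rows, max_per_key):
    """Keep at most max_per_key values for each key, preserving input order."""
    kept = defaultdict(list)
    for key, value in rows:
        # Comparable to slicing the collected list with [0..max_per_key]
        if len(kept[key]) < max_per_key:
            kept[key].append(value)
    # Flatten back to (key, value) pairs, comparable to UNWIND
    return [(key, value) for key, values in kept.items() for value in values]


# Hypothetical (user.mail, item2.id) pairs, as the unrestricted MATCH might return them
rows = [
    ("a@example.com", 10),
    ("a@example.com", 11),
    ("a@example.com", 12),
    ("b@example.com", 20),
]
result = limit_per_key(rows, 2)
```

Each user ends up with at most two items, no matter how many rows other users contribute, which is the difference from a global LIMIT.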
I am trying to count distinct sessionIds from a measurement. Since sessionId is a tag, I count the distinct entries in a "parent" query, because distinct() doesn't work on tags.
In the subquery, I use group by sessionId limit 1 to still benefit from the index (if there is a more efficient technique, I'm all ears, but I'd still like to understand what's going on).
I have those two variants:
> select count(distinct(sessionId)) from (select * from UserSession group by sessionId limit 1)
name: UserSession
time count
---- -----
0 3757
> select count(sessionId) from (select * from UserSession group by sessionId limit 1)
name: UserSession
time count
---- -----
0 4206
To my understanding, those should return the same number, since group by sessionId limit 1 already returns distinct sessionIds (in the form of groups).
And indeed, if I execute:
select * from UserSession group by sessionId limit 1
I have 3757 results (groups), not 4206.
In fact, as soon as I put this in a subquery and re-select fields in a parent query, some sessionIds occur multiple times in the final result. Not all of them, since there are 17549 rows in total, but some do.
This suggests that the limit 1 is partly working, but some sessionIds still get multiple entries when re-selected. Maybe some kind of undefined behaviour?
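The gap between the two counts is what you would expect whenever the rows reaching the outer query contain repeated sessionIds. A tiny Python illustration, using made-up values rather than data from the measurement:

```python
# Hypothetical sessionId values as the outer query might see them
session_ids = ["s1", "s2", "s2", "s3", "s3", "s3"]

# Comparable to count(sessionId): every row is counted
total_count = len(session_ids)

# Comparable to count(distinct(sessionId)): repeats collapse into one
distinct_count = len(set(session_ids))
```

This only shows why the two numbers can differ; it does not explain why the subquery yields the repeated rows in the first place.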
I can confirm that I get the same result.
In my experience using nested queries does not always deliver what you expect/want.
Depending on how you use this you could retrieve a list of all values for a tag with:
SHOW TAG VALUES FROM UserSession WITH KEY=sessionId
Or to get the cardinality (number of distinct values for a tag):
SHOW TAG VALUES EXACT CARDINALITY FROM UserSession WITH KEY=sessionId
This returns a single row with a single column, count, containing a number. You can remove the EXACT modifier if you don't need an exact result; see SHOW TAG VALUES CARDINALITY in the InfluxDB documentation.
I have two tables joined via third in a many-to-many relationship. To simplify:
Table A
ID-A (int)
Name (varchar)
Score (numeric)
Table B
ID-B (int)
Name (varchar)
Table AB
ID-AB (int)
A (foreign key ID-A)
B (foreign key ID-B)
What I want is to display the B name and the sum of the "Score" values of all the As belonging to the given B. However, the following code does not do that:
WITH "Data" AS(
SELECT "B."."Name" As "BName", "A"."Name", "Score"
FROM "AB"
LEFT OUTER JOIN "A" ON "AB"."A" = "A"."ID-A"
LEFT OUTER JOIN "B" ON "AB"."B" = "B"."ID-B")
SELECT "BName", SUM("Score") AS "Total"
FROM "Data"
GROUP BY "Name", "Score"
ORDER BY "Total" DESC
The results show several rows for every "BName", with the score split into seemingly random increments across these rows. For example, if the desired result for Johnny is 12 and for April is 25, the query may show something like:
Johnny | 7
Johnny | 3
Johnny | 2
April | 19
April | 5
April | 1
etc.
Even after trying to nest the query and doing another SELECT with SUM("Score"), the results are the same. I'm not sure what I'm doing wrong?
Remove Score from the GROUP BY clause:
SELECT BName, SUM(Score) AS Total
FROM Data
GROUP BY BName
ORDER BY Total DESC;
The purpose of your query is to summarize by name, so name alone should appear in the GROUP BY clause. By also including the score, you will get a record in the output for each unique name/score combination.
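Here is a small sketch of the difference, run against an in-memory SQLite database from Python rather than Firebird. The table data is an assumed, flattened stand-in for the "Data" CTE, filled with the example scores from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed stand-in for the "Data" CTE: one row per (B name, A score)
conn.execute("CREATE TABLE data (bname TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [("Johnny", 7), ("Johnny", 3), ("Johnny", 2),
     ("April", 19), ("April", 5), ("April", 1)],
)

# Grouping by name AND score: one output row per distinct (name, score) pair
per_pair = conn.execute(
    "SELECT bname, SUM(score) AS total FROM data "
    "GROUP BY bname, score ORDER BY total DESC"
).fetchall()

# Grouping by name only: one output row per name, with the scores summed
per_name = conn.execute(
    "SELECT bname, SUM(score) AS total FROM data "
    "GROUP BY bname ORDER BY total DESC"
).fetchall()
```

With this data, per_pair has six rows (the fragmented output from the question), while per_name has one row each for April and Johnny with totals 25 and 12.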
Okay, I figured out my problem. Indeed, I had to GROUP BY "Name" only, but I thought Firebird wasn't letting me do that. It turned out to be just a typo. Oops.
I have a table called lead, which has about 500 thousand records, and we need the following query to run:
SELECT skip 300000 first 75 *
FROM lead
WHERE ((enrollment_period IS NULL) OR
(enrollment_period IN ('FT2015','F16','SUM2016','FALL2016','FALL2017','SP17')))
ORDER BY created_on DESC
The table lead has the id column as its primary key and thus has a clustered index on that column. The query was taking about 12-13 minutes. When I added a non-clustered index on the created_on and enrollment_period columns, it came down to 4-5 minutes. When I then changed the clustered index from the id column to this index, the execution time came down further, to about 50 seconds.
Is there any further scope for optimizing this query, or any other change that would make it run faster?
Thanks in Advance,
Manohar
So I have a standard users table structure, with a primary key id and whatnot, and the following persona table:
user_id | persona_id | time_inserted
2       | 1          | x
2       | 2          | x+1
2       | 3          | x+2
1       | 1          | x+3
5       | 8          | x+6
5       | 9          | x+1
What I'd like to do is retrieve the LAST inserted row, limited to ONE per user id. So, for the table above, the result I want would be:
[2, 3] because the last inserted row for user 2 was persona_id 3 (x+2), [1, 1], and [5, 8] because the last inserted row for user 5 was persona_id 8 (x+6).
This is my query:
to_return = Persona.select(to_get).where(to_condition)
This works, but retrieves them all. How can I restrict the query as asked? Thank you very much.
This should work:
to_return = Persona.select(to_get).where(to_condition).group('user_id').having('time_inserted = MAX(time_inserted)')
Update
You can't select a column unless you put it in the GROUP BY clause.
As you want to group by user_id only, one possible solution is to first select the user_ids with the maximum time_inserted, like this:
users_ids_relation = Persona.select('user_id').group('user_id').having('time_inserted = MAX(time_inserted)')
Then, join it with the personas table based on the condition and then select the required columns:
users_ids_relation.joins('personas').where(to_condition).select(to_get)
It will give you the expected result.
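For comparison, here is a sketch of the same "latest row per user" idea in plain SQL, run against an in-memory SQLite database from Python. This is not the ActiveRecord code above but a different, commonly used formulation: join the table to a subquery that holds each user's maximum time_inserted. The personas table name is taken from the joins call above; representing x as 0 and using integer timestamps are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE personas ("
    "user_id INTEGER, persona_id INTEGER, time_inserted INTEGER)"
)
# Rows from the question, with x taken as 0 (an assumption)
conn.executemany(
    "INSERT INTO personas VALUES (?, ?, ?)",
    [(2, 1, 0), (2, 2, 1), (2, 3, 2), (1, 1, 3), (5, 8, 6), (5, 9, 1)],
)

# The subquery m finds the newest time_inserted per user;
# the join then picks the full row that carries that timestamp.
latest = conn.execute(
    """
    SELECT p.user_id, p.persona_id
    FROM personas AS p
    JOIN (
        SELECT user_id, MAX(time_inserted) AS max_time
        FROM personas
        GROUP BY user_id
    ) AS m
      ON m.user_id = p.user_id AND m.max_time = p.time_inserted
    ORDER BY p.user_id
    """
).fetchall()
```

For the sample data this yields the pairs (1, 1), (2, 3) and (5, 8), matching the result asked for in the question. If two rows of one user share the same maximum timestamp, both would be returned.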