Getting random rows with YQL?

I want to use JavaScript to fetch data from Flickr with YQL,
e.g.
select id from flickr.photos.search(10) where text = 'music' and license=4
However, I would like to fetch 10 random rows rather than the latest ones, since the latest tend to be 10 photos from the same person.
Is that possible in YQL itself (I suspect not),
or is there a workaround that achieves the same effect?
(It does not have to be completely random; the main thing I want to avoid is getting 10 photos from the same poster.)

To get results from unique owners only, you can use the unique() function.
My suggestion would be to query for a larger result set (which is more likely to contain 10 unique people), then call unique() followed by truncate() to limit the output to 10 results, as below.
select id from flickr.photos.search(100) where text = 'music' and
license=4 | unique(field="owner") | truncate(count=10)
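For anyone who would rather do the same two steps client-side, here is a minimal Python sketch of "deduplicate by owner, then truncate". The photo dictionaries and the helper name unique_by_owner are made up for illustration; they are not part of YQL or the Flickr API.

```python
from typing import Dict, Iterable, List


def unique_by_owner(photos: Iterable[Dict[str, str]], count: int) -> List[Dict[str, str]]:
    """Keep the first photo seen per owner, then stop after `count` results."""
    seen = set()
    result: List[Dict[str, str]] = []
    for photo in photos:
        if len(result) >= count:
            break  # equivalent of truncate(count=...)
        owner = photo["owner"]
        if owner in seen:
            continue  # this owner already contributed a photo
        seen.add(owner)
        result.append(photo)
    return result


# Hypothetical sample data: three photos by "alice", one each by "bob" and "carol".
photos = [
    {"id": "1", "owner": "alice"},
    {"id": "2", "owner": "alice"},
    {"id": "3", "owner": "bob"},
    {"id": "4", "owner": "alice"},
    {"id": "5", "owner": "carol"},
]
picked = unique_by_owner(photos, count=2)
print([p["id"] for p in picked])
```

As with the query, fetching more rows than you need makes it more likely that enough distinct owners are present.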

Related

Neo4j- typical query - returning a node with the most appearances

I have to write a query that returns the club or clubs with the largest number of players who do not represent the country the club is from.
My query works fine, but I want to filter the result so that it contains ONLY the clubs with the largest size.
For now the biggest size is 4, and there are 4 clubs with 4 such players each.
The only thing that came to mind was adding LIMIT 1 at the end, but that cuts out the three other clubs that also satisfy the predicate.
MATCH (c: Club)<-[r: PLAYS_FOR]-(p: Player)-[r2: REPRESENTS]->(n: NationalTeam)
WHERE c.country<>n.country
WITH c,collect(p.name) as list_players,n.country as country,size(collect(p.name)) as size
RETURN c,list_players,country,size ORDER BY size DESC LIMIT 1
edit:
I managed to do something like this, don't know if it's optimal, but it is working:
MATCH (c: Club)<-[r: PLAYS_FOR]-(p: Player)-[r2: REPRESENTS]->(n: NationalTeam)
WHERE c.country<>n.country
WITH c,collect(p.name) as list_players,n.country as country,size(collect(p.name)) as size
WITH c,list_players,country,size ORDER BY size DESC LIMIT 1
WITH size
MATCH (c: Club)<-[r: PLAYS_FOR]-(p: Player)-[r2: REPRESENTS]->(n: NationalTeam)
WHERE c.country<>n.country
WITH size,c,collect(p.name) as list_players,n.country as country,size(collect(p.name)) as size2 WHERE size(collect(p.name)) = size
RETURN c,list_players,country,size
If you install APOC Procedures, there is an aggregation function you can use to get the items associated with a maximum value, and this works even when multiple items are tied for that value: apoc.agg.maxItems()
The trouble now is that all the club-specific data needs to be encapsulated into the item itself, so you'll need to add them to a map and use the map as the item, and the size of the person collection as the value.
Also, your aggregation isn't quite correct. You're collecting player names, but you have the country of the player as a part of the grouping key (when you aggregate, all non-aggregation terms form the grouping key), and that likely isn't what you want. Maybe you wanted the country of the club instead?
Try working from this:
MATCH (c: Club)<-[r: PLAYS_FOR]-(p: Player)-[r2: REPRESENTS]->(n: NationalTeam)
WHERE c.country<>n.country
WITH c,collect(p) as list_players
WITH apoc.agg.maxItems({club:c, players:list_players}, size(list_players)) as maxResults
UNWIND maxResults.items as result
WITH result.club as c, [player IN result.players | player.name] as list_players, maxResults.value as size
RETURN c,list_players,size
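To make the "keep every item tied for the maximum" idea concrete, here is a small Python sketch. It only illustrates the behaviour described above and is not APOC's implementation; the function name max_items and the club data are hypothetical.

```python
from typing import Dict, List, Tuple


def max_items(pairs: List[Tuple[str, int]]) -> Dict[str, object]:
    """Return every item tied for the largest value, plus that value."""
    best_value = None
    best_items: List[str] = []
    for item, value in pairs:
        if best_value is None or value > best_value:
            best_value, best_items = value, [item]  # new maximum, start over
        elif value == best_value:
            best_items.append(item)  # tie: keep this item as well
    return {"items": best_items, "value": best_value}


# Hypothetical club sizes (number of qualifying players per club).
club_sizes = [("Club A", 4), ("Club B", 2), ("Club C", 4), ("Club D", 3)]
result = max_items(club_sizes)
print(result)
```

Unlike ORDER BY ... LIMIT 1, this keeps both clubs that share the top value.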

Write Cypher query to display temperature values till it reaches set temperature

I have about 200,000 rows of 24 hour data as follows:
I can use a query to create a room node with time, roomtemp, and set temp as properties. I can also define the relationship of each room to its corresponding temperatures.
Now, I need to find:
all rows that show an update/increase/decrease from initial temperature till set temperature for all rooms. e.g. based on above data, I need:
Here I have discarded the 5th row because 16 was repeated and showed no update (increase or decrease) in the temperature value. The temperature values continue until they reach the set temperature '18'.
I can manually create the temperature states by giving the values one by one, but I am unsure how to MERGE the above requirement into the graph using Cypher.
Can I use another programming language in conjunction with Neo4j to obtain the same results?
Do I have to utilize in-graph time-tree for this scenario? Can I retrieve my results without creating a time tree?
Filter temperature by room and date (which can also be a date node)
Sort by time
Collect into a list
Filter by differences between two subsequent temperatures
Turn the list into rows
Here is a query that does this:
MATCH (r:Room)<-[:TEMP]-(t:Temparature)
WHERE t.time STARTS WITH "2016-01-01"
AND t.temp < r.temp AND t.temp > {initial}
WITH t ORDER BY t.time ASC
WITH collect(t) temps
WITH [idx in range(0,size(temps)-2) WHERE temps[idx].temp <> temps[idx+1].temp | temps[idx] ] as filtered
UNWIND filtered as t
RETURN t;
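The question also asks whether another language could be used alongside Neo4j. As a rough illustration, the same steps (sort by time, drop readings that repeat the previous temperature, stop at the set temperature) can be sketched in Python. This is a variant of the query above rather than a translation of it, and the readings are invented sample data, not the asker's.

```python
from typing import List, Tuple

Reading = Tuple[str, float]  # (timestamp, room temperature)


def temperature_changes(readings: List[Reading], set_temp: float) -> List[Reading]:
    """Keep readings whose temperature differs from the previous reading,
    stopping once the set temperature is reached."""
    ordered = sorted(readings, key=lambda r: r[0])  # sort by time
    kept: List[Reading] = []
    previous = None
    for time, temp in ordered:
        if previous is None or temp != previous:
            kept.append((time, temp))
        previous = temp
        if temp == set_temp:
            break  # set temperature reached, ignore later rows
    return kept


# Hypothetical readings for one room with a set temperature of 18.
readings = [
    ("2016-01-01T00:03", 16.0),
    ("2016-01-01T00:01", 14.0),
    ("2016-01-01T00:02", 15.0),
    ("2016-01-01T00:04", 16.0),
    ("2016-01-01T00:05", 17.0),
    ("2016-01-01T00:06", 18.0),
    ("2016-01-01T00:07", 18.0),
]
for row in temperature_changes(readings, set_temp=18.0):
    print(row)
```

No time tree is needed for this; sorting on the timestamp is enough.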

Modelling a forum with Neo4j

I need to model a forum with Neo4j. I have "forums" nodes which have messages and, optionally, these messages have replies: forum-->message-->reply
The cypher query I am using to retrieve the messages of a forum and their replies is:
start forum=node({forumId}) match forum-[*1..]->msg
where (msg.parent=0 and msg.ts<={ts} or msg.parent<>0)
return msg ORDER BY msg.ts DESC limit 10
This query retrieves the messages with time<=ts and all their replies (a message has parent=0 and a reply has parent<>0)
My problem is that I need to retrieve pages of 10 messages (limit 10) regardless of the number of replies.
For example, if I had 20 messages and the first one had 100 replies, the query would return only 10 rows (the first message and 9 replies), but I need the first 10 messages plus the 100 replies to the first one.
How can I limit the result based on the number of messages and not their replies?
The ts property is indexed, but is this query efficient when mixing it with other where clauses?
Do you know a better way to model this kind of forum with Neo?
Supposing you switch to labels and avoid IDs (as they can be recycled and therefore are not stable identifiers):
MATCH (forum:FORUM)<--(message:MESSAGE {parent:0})
WHERE forum.name = '%s' // where %s identifies the forum in a *stable* way
WITH message // splitting the query here applies LIMIT to the main messages only
ORDER BY message.ts DESC
LIMIT 10
OPTIONAL MATCH (message)<-[:REPLIES_TO]-(replies)
RETURN message, replies
The only important change here is to split the reply and message matching in two sub-queries, so that the LIMIT clause applies to the first subquery only.
However, you need to link the relevant replies to the matched main messages in the second subquery (I introduced a fictional relationship REPLIES_TO to link replies to messages).
When you need to fetch page 2, 3, 4, etc., you need an extra parameter: the smallest message timestamp of the previous page (let's say previous_timestamp), because the messages are sorted by ts in descending order.
The WHERE clause of the first sub-query becomes:
WHERE forum.name = '%s' AND message.ts < previous_timestamp
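Here is a small Python sketch of the paging logic, in case it helps to see it outside Cypher: limit the top-level messages first, then attach every reply to the messages that made the page. Because the page is ordered newest first, the cursor for the next page is the oldest timestamp of the previous one. The function fetch_page and the data layout are assumptions for illustration only.

```python
from typing import Dict, List, Optional


def fetch_page(messages: List[Dict], replies: List[Dict], page_size: int,
               before_ts: Optional[int] = None) -> List[Dict]:
    """Return one page of top-level messages (newest first) with all their replies."""
    candidates = [m for m in messages if before_ts is None or m["ts"] < before_ts]
    page = sorted(candidates, key=lambda m: m["ts"], reverse=True)[:page_size]
    return [
        {"message": m, "replies": [r for r in replies if r["parent"] == m["id"]]}
        for m in page
    ]


# Hypothetical data: five messages, message 5 has three replies.
messages = [{"id": i, "ts": i * 10} for i in range(1, 6)]
replies = [{"id": 100 + i, "parent": 5, "ts": 60 + i} for i in range(3)]

first = fetch_page(messages, replies, page_size=2)
oldest_ts = first[-1]["message"]["ts"]  # cursor for the next page
second = fetch_page(messages, replies, page_size=2, before_ts=oldest_ts)
print([p["message"]["id"] for p in first], [p["message"]["id"] for p in second])
```

The page size counts messages only, so a message with many replies never pushes other messages off the page.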

Sybase compare columns with duplicate row ids

So far I have a query with a result set (in a temp table) with several columns, but I am only concerned with four. One is a customer ID (varchar), one is Date (smalldatetime), one is Amount (money), and the last is Type (char). I have multiple rows with the same customer ID and want to evaluate them based on Date, Amount, and Type. For example:
Customer ID  Date     Amount  Type
A            1-1-10   200     blue
A            1-1-10   400     green
A            1-2-10   400     green
B            1-11-10  100     blue
B            1-11-10  100     red
For all occurrences of A I want to compare them to identify only one, first by earliest date, then by greatest Amount, then if still tied by comparing Types. I would then return one row for each customer.
I would provide some of the query but I am at home now after spending two days trying to get a correct result. It looks something like this:
(query to populate #tempTable)
GROUP BY customer_id
HAVING date_cd =
(SELECT MIN(date_cd)
FROM order_table ot
WHERE ot.customerID = #tempTable.customerID
)
OR date_cd IS NULL
I assume the HAVING would result in only one row per customer_id. This did not end up being the case since there were some ties there.
I am not sure I can do the OR - there are some with NULL values here - and it did not account for the step to the next comparison if they were all the same anyway. I am not seeing a way to avoid doing some row processing of the temp table with some kind of IF or WHERE loop.
As I write this, I am thinking that maybe I should use #tempTable.date_cd in the HAVING clause instead of looking at the original table, but that should return the same dates?
Am I on the right track or is there something missing? Suggestions? More info??
Try the query below:
select * from #tempTable
GROUP BY customer_id
HAVING isnull(date_cd,"1900/01/01") =min(isnull(date_cd,"1900/01/01"))
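The query above only looks at the date. For reference, the full tie-break described in the question (earliest date, then greatest amount, then type) can be sketched in Python as below. Two assumptions are mine: the sample dates are read as month-day-year, and "comparing Types" is taken to mean alphabetical order. NULL dates are not handled here.

```python
from datetime import date
from typing import Dict, List, Tuple

Row = Tuple[str, date, int, str]  # (customer_id, date, amount, type)


def pick_one_per_customer(rows: List[Row]) -> Dict[str, Row]:
    """Pick one row per customer: earliest date, then largest amount, then type."""
    best: Dict[str, Row] = {}
    for row in rows:
        customer_id, day, amount, kind = row
        key = (day, -amount, kind)  # smaller key wins
        current = best.get(customer_id)
        if current is None or key < (current[1], -current[2], current[3]):
            best[customer_id] = row
    return best


# Sample rows from the question; dates read as month-day-year (an assumption).
rows = [
    ("A", date(2010, 1, 1), 200, "blue"),
    ("A", date(2010, 1, 1), 400, "green"),
    ("A", date(2010, 1, 2), 400, "green"),
    ("B", date(2010, 1, 11), 100, "blue"),
    ("B", date(2010, 1, 11), 100, "red"),
]
for customer_id, row in sorted(pick_one_per_customer(rows).items()):
    print(customer_id, row[1].isoformat(), row[2], row[3])
```

Comparing a single composite key per row avoids a separate pass for each tie-break level.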

Select certain number of records for batch processing

Is it possible to select a certain number of rows using Entity Framework and/or LINQ? For example, I want to select rows 0 - 500000 and assign these records to the List VariableAList object, then select rows 500001 - 1000000 and assign them to the List VariableBList object, and so on.
Where the Numbers object is like ID,Number,DateCreated, DateAssigned, etc.
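Sounds like you're looking for the .Take(int) and .Skip(int) methods:
using (YourEntities db = new YourEntities())
{
    // Skip needs a deterministic order in LINQ to Entities, so sort first.
    var ordered = db.Numbers.OrderBy(n => n.ID);
    // First batch of 500000 rows
    var VariableAList = ordered
        .Take(500000)
        .ToList();
    // Second batch of 500000 rows
    var VariableBList = ordered
        .Skip(500000)
        .Take(500000)
        .ToList();
}
You may want to be wary of the size of these lists in memory.
Note: an .OrderBy call is needed before .Skip or .Take; LINQ to Entities rejects .Skip on unsorted input, and without an explicit order the batches are not guaranteed to be stable.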
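If the number of batches is not known up front, the Skip/Take pattern generalizes to a loop. Here is a minimal Python sketch of that idea using itertools.islice over rows that are already sorted; the helper name batches and the sample IDs are hypothetical and unrelated to Entity Framework.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batches(ordered_rows: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Yield consecutive batches of at most `batch_size` rows."""
    iterator = iter(ordered_rows)
    while True:
        batch = list(islice(iterator, batch_size))  # take the next batch_size rows
        if not batch:
            return  # input exhausted
        yield batch


# Hypothetical IDs standing in for rows already sorted by ID.
ids = range(1, 11)
for number, batch in enumerate(batches(ids, batch_size=4)):
    print(number, batch)
```

Only one batch is held in memory at a time, which helps with the list-size concern mentioned above.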
