I've got rooms that hold sub-rooms that users can post to. I want to only show user-posts that are less than 24 hours old to my users.
For example:
Cats
  kittens
    users : 10
    posts :
      ksjdflkjaslkdjf
        userUID : 123
        postCaption : "I like kittens"
        postTimestamp : 203940340930
So here, if a user went into the Cats room, then into the kittens subroom, and wanted to see the posts about kittens, I'd only want to return the posts whose timestamp is less than 24 hours old.
What I'm not sure about is the most effective way to structure the database. The problem with the current setup is that if there have been 10 million kitten posts, every time a user loads posts Firebase would have to go through each post and check whether its timestamp is still valid.
An alternative option might be:
Rooms:
  Cats
    kittens
      users : 10
      posts :
        post1UID : 209384938942024234 //The value here would be the timestamp
        post2UID : 309238942024234
Posts :
  209384938942024234
    userUID: 123
    postCaption : "....."
    postTimestamp : 209384938942024234
I'd then query the room's list for posts with a valid timestamp only, and use each post UID to go grab the post itself.
Any advice on the best way to do this before I get started?
Thanks in advance.
You could simply query the list of posts to only request the posts from the last 24 hours. While you can't enforce this with security rules, you can at least make your code do the right thing.
But it's quite wasteful: after your app has been live for more than 24 hours, you're querying data that you already know will never be matched.
To prevent this: in NoSQL databases such as Firebase, you often model your data for the specific use-cases of your app. So if you want your users to see a list of posts from the last 24 hours, you should model your database to have a list of the posts from the last 24 hours.
You can do this in two ways:
Only keep the most recent 24 hours of posts for each room. So just delete the data that is older than 24 hours, e.g. when a new post is added, or with a cron-triggered Cloud Function. Alternatively you could move the older posts to a separate node, or to a system like BigQuery, which is better suited to handling such historical data.
Keep your current structure, but add a secondary index: a list of post IDs for the last day's worth of posts. A simple way to do this is to keep the ID of each post as the key, and its timestamp as the value:
roomId1:
  post1UID: timestamp
  post2UID: timestamp
The difference from your approach (at least as far as I saw in my quick scan) is that the index would only hold posts from the most recent 24 hours, so you wouldn't have to query at all to read the posts (only to clean them up).
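The clean-up step for such an index can be sketched in plain Python. This is only an in-memory model of the idea, not Firebase API code: the dict stands in for the roomId1 node, and the function names and the use of timestamps in seconds are assumptions made for illustration.

```python
DAY_SECONDS = 24 * 60 * 60  # assumption: timestamps are in seconds


def add_post(index, post_id, timestamp):
    """Record a new post in the room's recent-posts index (post ID -> timestamp)."""
    index[post_id] = timestamp


def prune_index(index, now):
    """Remove entries older than 24 hours and return the removed post IDs."""
    cutoff = now - DAY_SECONDS
    expired = [post_id for post_id, ts in index.items() if ts < cutoff]
    for post_id in expired:
        del index[post_id]
    return expired


now = 1_700_000_000
recent_index = {}  # stands in for the roomId1 node
add_post(recent_index, "post1UID", now - 30 * 60 * 60)  # 30 hours old
add_post(recent_index, "post2UID", now - 2 * 60 * 60)   # 2 hours old
removed = prune_index(recent_index, now)
print(removed, list(recent_index))
```

Reading the room's recent posts is then just reading every key left in the index; only the pruning step has to compare timestamps.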
I have data in BigQuery with specific columns such as a timestamp and a user ID. Some users visit the website multiple times.
The goal is to find out the time difference of users visiting multiple times.
Even if they visit 14 times, I need to find the difference between every consecutive visit.
This is a sample of my data:
This should help (assuming you want the delta in minutes). You can always switch to whatever unit you need (hour, second, etc.)
Please note the use of the analytic function LAG over data partitioned by user_id and ordered by the timestamp ts. Also note that the first appearance of a user_id gets a difference of 0, because this is the first time the user showed up :). Hope it helps.
select user_id, coalesce(timestamp_diff(ts_a, ts_b, minute), 0) as diff_from_prv_visit_minutes from (
select user_id, ts as ts_a, lag(ts) over (partition by user_id order by ts) as ts_b
from `mydataset.mytable`
)
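For readers who want to check the logic outside BigQuery, here is a minimal Python sketch of the same idea: group by user, order by time, and subtract the previous visit. The function name and the sample data are made up for illustration.

```python
from datetime import datetime
from itertools import groupby
from operator import itemgetter


def visit_gaps_minutes(visits):
    """visits: iterable of (user_id, datetime) pairs.

    Returns (user_id, ts, minutes_since_previous_visit) rows; the first
    visit of each user gets 0, mirroring coalesce(..., 0) in the query.
    """
    rows = []
    ordered = sorted(visits, key=itemgetter(0, 1))  # by user, then by time
    for user_id, group in groupby(ordered, key=itemgetter(0)):
        previous = None
        for _, ts in group:
            if previous is None:
                diff = 0
            else:
                diff = int((ts - previous).total_seconds() // 60)
            rows.append((user_id, ts, diff))
            previous = ts
    return rows


sample_visits = [
    ("u1", datetime(2020, 1, 1, 10, 0)),
    ("u2", datetime(2020, 1, 1, 9, 0)),
    ("u1", datetime(2020, 1, 1, 10, 45)),
    ("u1", datetime(2020, 1, 1, 12, 0)),
]
gaps = [row[2] for row in visit_gaps_minutes(sample_visits)]
print(gaps)
```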
Due to the PowerShell methods of getting mailbox statistics from Office365 taking about 2 seconds per mailbox, I am working on getting the data from Office 365 Reporting web service, which takes only a few seconds for each 2000 mailboxes.
The problem I'm running into is that the stats are updated periodically and some historical data is kept, so there are numerous records for each user. I only want to get the latest record for each user, but I haven't been able to find a way to do that. The closest I've come is to use $filter=Date ge DateTime'2016-03-10T00:00:00' where the date is set to a couple of days ago. Theoretically, if I sort by Date desc I should get the latest records first, and if a user has a record for 3/10 and 3/11, the 3/11 record would get pulled first, which would work for me. But regardless of how I do the sort, it seems to come back with the older records first.
Ideally, I would like to be able to set criteria so that it only returns the latest record for each mailbox, but I can't seem to figure out or find how to do that. The closest I've been able to come is to just start running queries filtered on specific dates, walking the date back a day on each query.
If I can get the latest records to be returned first, I would be able to work with that because I can just discard a record if I've already received a later one.
https://reports.office365.com/ecp/reportingwebservice/reporting.svc/MailboxUsageDetail/
?DelegatedOrg=nnn.onmicrosoft.com&$select=Date,WindowsLiveID,CurrentMailboxSize
&$filter=Date ge DateTime'2016-03-08T00:00:00'&$orderby=Date desc
So the questions are:
Is there a way to specify criteria so that only the latest record for each user is returned?
Is there a way to get it to order by Date descending--what am I doing wrong with the $orderby?
Thanks!
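You can use $top=1 together with $orderby on Date (desc) to get the latest record; $filter and $skip may not be required in this case. Note that $top=1 returns only the single most recent record overall, not one record per mailbox.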
https://reports.office365.com/ecp/reportingwebservice/reporting.svc/MailboxUsageDetail/?DelegatedOrg=nnn.onmicrosoft.com&$select=Date,WindowsLiveID,CurrentMailboxSize&$orderby=Date desc&$top=1
Your query looks fine. Here is another example, from the OData sample service, that gets the employee with the most recent birth date:
http://services.odata.org/V4/Northwind/Northwind.svc/Employees?$select=EmployeeID,FirstName,LastName,BirthDate&$orderby=BirthDate%20desc&$top=1
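If the service cannot return one record per mailbox, the fallback mentioned in the question (discarding a record when a later one has already been received) can be done on the client. The following is a minimal sketch that assumes the response has already been parsed into dicts with the fields from the $select clause, and that Date values are ISO 8601 strings in one consistent format, so string comparison matches chronological order. The sample records are invented.

```python
def latest_per_mailbox(records):
    """Keep only the newest record for each WindowsLiveID, in any input order."""
    latest = {}
    for record in records:
        key = record["WindowsLiveID"]
        current = latest.get(key)
        # Assumption: same-format ISO 8601 strings sort chronologically.
        if current is None or record["Date"] > current["Date"]:
            latest[key] = record
    return latest


sample_records = [
    {"Date": "2016-03-10T00:00:00", "WindowsLiveID": "a@example.com", "CurrentMailboxSize": 100},
    {"Date": "2016-03-11T00:00:00", "WindowsLiveID": "a@example.com", "CurrentMailboxSize": 120},
    {"Date": "2016-03-09T00:00:00", "WindowsLiveID": "b@example.com", "CurrentMailboxSize": 50},
]
latest = latest_per_mailbox(sample_records)
print({user: rec["CurrentMailboxSize"] for user, rec in latest.items()})
```

Because the comparison is explicit, this works whether the service returns older or newer records first.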
I need to show the latest posts. In the future, there will be billions of posts.
So which is the most efficient way to show the list of latest posts?
By storing every post's month as a property such as 201506 and indexing it, or
By creating labels such as 201506 .. 201508 and storing each post under its particular label,
then retrieving the posts in descending order month by month? Or is there any other way to do this?
Also, if I have more labels, will that affect performance?
If you want to have an ordered list of all posts in your system (regardless of the author) you might organize it as a linked list representing your timeline:
(post1:Post) -[:PREV_POST]-> (post2:Post) -[:PREV_POST]-> ...
So the PREV_POST relationship connects the most recent post to the previous one.
Additionally you might have a timetree (see http://graphaware.com/neo4j/2014/08/20/graphaware-neo4j-timetree.html as a sample implementation). Since your maximum domain granularity is month, you have years and months in the timetree.
Only the first post for every month is then connected to the month node in the time tree. See below for a sample model:
To query e.g. the posts in descending order for Dec 2014, we first find the respective month (Dec 2014) in the timetree, then go to the next month (Jan 2015). From the two month nodes we go to the first post of each month and find everything in between:
MATCH (:TimeRoot)-[:HAS_YEAR]->(:Year {year: 2014})-[:HAS_MONTH]->(startMonth:Month {month: 'Dec'}),
(:TimeRoot)-[:HAS_YEAR]->(:Year {year: 2015})-[:HAS_MONTH]->(endMonth:Month {month: 'Jan'}),
(startMonth)<-[:FIRST_IN_MONTH]-(firstPost:Post),
(endMonth)<-[:FIRST_IN_MONTH]-()-[:PREV_POST]->(lastPost:Post),
path = (lastPost)-[:PREV_POST*0..]->(firstPost)
UNWIND nodes(path) AS post
RETURN post
Please note that I've not actually tested the query, so there might be some typos. The intention was to demo the model, not the full solution.
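To make the traversal easier to follow, here is a small in-memory Python model of the same structure. It is only a sketch of the idea, not Neo4j code: the Post class, the first_in_month dict (standing in for the FIRST_IN_MONTH relationships) and the month keys are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Post:
    title: str
    prev_post: Optional["Post"] = None  # plays the role of PREV_POST


def posts_in_month(first_in_month, month, next_month):
    """Return the posts of `month`, newest first.

    first_in_month maps a month key to the first (oldest) post of that month.
    """
    first_post = first_in_month[month]
    # The newest post of `month` is the one right before next month's first post.
    current = first_in_month[next_month].prev_post
    result = []
    while current is not None:
        result.append(current)
        if current is first_post:
            break
        current = current.prev_post
    return result


nov = Post("nov post")
dec_1 = Post("dec post 1", nov)
dec_2 = Post("dec post 2", dec_1)
jan = Post("jan post", dec_2)
first_in_month = {"2014-12": dec_1, "2015-01": jan}
titles = [p.title for p in posts_in_month(first_in_month, "2014-12", "2015-01")]
print(titles)
```

The walk only touches the posts of the requested month, which is the point of anchoring each month node to its first post.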
I asked this question a few days ago, but I may not have been clear, as I didn't get much response.
Let me try to rephrase my question:
I have a few customers who place orders for various products. I am interested in finding the orders placed for various combinations of products and customers, and then bucketing them into the number of orders in the last week, last fortnight, last month, etc.
I am able to query the orders based on my criteria, but I don't understand how to turn that result into the data I need.
Let's say my data looks like this:
(c:Customer)<-[:PLACED_BY]-(o:Order)-[:HAS_PRODUCT]->(p:Product), (o)-[:PLACED_ON]->(d:Date)
Assuming that I have successfully found the orders I am looking for, how do I efficiently get the counts I want out of these selected orders?
{... some queries that returns (o:Order) of interest ...}
With o
RETURN ??? as CountLastWeek, ??? as LastFortnight, ??? as LastMonth
BTW, I also have an OrderedDate property on the Order, if that helps simplify the query.
Is it even possible to achieve this in Cypher?
What you're looking to do is date time indexing. This is most certainly possible to achieve with Cypher.
You'll need to index your nodes via relationships. Each Date node is a day, and you need to batch those days into groups. Each day belongs to a week and to a month. You just need to add those groups to your dates and then aggregate over the collections of dates belonging to a week, a fortnight, or a month.
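The bucketing itself is simple to state. Here is a minimal Python sketch that counts orders per window once the order dates have been fetched; it is not a Cypher solution. The bucket names and the choice of 30 days for "last month" are assumptions made for illustration.

```python
from datetime import date, timedelta

# Assumption: windows are expressed in days counted back from `today`.
BUCKETS = {"last_week": 7, "last_fortnight": 14, "last_month": 30}


def count_recent_orders(order_dates, today):
    """Count how many order dates fall inside each trailing window."""
    counts = {}
    for name, days in BUCKETS.items():
        cutoff = today - timedelta(days=days)
        counts[name] = sum(1 for d in order_dates if cutoff < d <= today)
    return counts


today = date(2015, 6, 30)
order_dates = [date(2015, 6, 28), date(2015, 6, 20), date(2015, 6, 5), date(2015, 4, 1)]
counts = count_recent_orders(order_dates, today)
print(counts)
```

Note that the buckets are cumulative: an order from the last week is also counted in the fortnight and month windows.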
I'm looking to build an analytics dashboard for my data in a rails application.
Let's say I have a list of request types "Fizz", "Buzz", "Bang", "Bar".
I want to display a count for each day based on type.
How should I do this?
Here is what I plan on doing:
1. Add get_buzz_by_day, get_fizz_by_day, etc. to the appropriate models.
2. In each model, get all records of the given type (e.g. Fizz), then create an array that stores date and count.
3. Format the result in the view so a JS library can turn it into a pretty graph.
Does this sound reasonable?
Depending on the number of records, your dashboard could soon run into performance problems.
Step 1 is misleading. Don't fetch the data for each day individually; try to get it all at once.
In Step 2 you can have the database do the aggregation over days, with the group method.
See http://guides.rubyonrails.org/active_record_querying.html#group
Fizz.select("date(created_at) as fizzed_day, count(*) as day_count").
group("date(created_at)")
In Step 3 you need to take care that days without any fizzbuzz are still displayed, as they are not returned in the query.
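The gap filling in Step 3 can be sketched as follows, in Python rather than Ruby and independent of Rails. It assumes the grouped query result has already been turned into a mapping from day to count; the function name and the sample data are made up.

```python
from datetime import date, timedelta


def fill_missing_days(counts_by_day, start, end):
    """Return (day, count) pairs for every day from start to end inclusive.

    Days that are missing from counts_by_day get a count of 0.
    """
    total_days = (end - start).days + 1
    series = []
    for offset in range(total_days):
        day = start + timedelta(days=offset)
        series.append((day, counts_by_day.get(day, 0)))
    return series


query_result = {date(2015, 6, 1): 4, date(2015, 6, 3): 2}  # no rows for June 2
series = fill_missing_days(query_result, date(2015, 6, 1), date(2015, 6, 4))
print([count for _, count in series])
```

The resulting list has one entry per calendar day, so the graph shows an explicit zero instead of skipping the day.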