Indexing or labeling: which is better in Neo4j?

I need to show the latest posts. In the future, there will be around billions of posts.
What is the most efficient way to show the latest posts list?
Option 1: store every post's month as a property (e.g. 201506) and index it.
Option 2: create labels such as 201506 .. 201508 and give each post the label for its month.
Then retrieve the posts in descending order, month by month. Or is there another way to do this?
Also, if I have many labels, will that affect performance?

If you want an ordered list of all posts in your system (regardless of the author), you might organize it as a linked list representing your timeline:
(post1:Post) -[:PREV_POST]-> (post2:Post) -[:PREV_POST]-> ...
So each PREV_POST relationship points from a post to the one published just before it, starting from the most recent post.
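As a rough illustration of why the linked list makes "show the latest posts" cheap, here is a minimal in-memory Python sketch. The `Timeline`, `publish` and `latest` names, and the idea of keeping a direct reference to the newest post, are hypothetical and not part of the answer; how you locate the newest post in Neo4j is left open here.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Post:
    """Hypothetical in-memory stand-in for a (:Post) node."""
    post_id: int
    prev_post: Optional["Post"] = None  # plays the role of -[:PREV_POST]->


class Timeline:
    """Keeps a reference to the newest post; every post links to the one before it."""

    def __init__(self) -> None:
        self.head: Optional[Post] = None

    def publish(self, post_id: int) -> Post:
        # Prepending is O(1): the new post points at the previous newest post.
        post = Post(post_id, prev_post=self.head)
        self.head = post
        return post

    def latest(self, limit: int) -> Iterator[int]:
        # Follow prev_post from the newest post, touching at most `limit` posts.
        node = self.head
        count = 0
        while node is not None and count < limit:
            yield node.post_id
            node = node.prev_post
            count += 1


timeline = Timeline()
for post_id in range(1, 6):
    timeline.publish(post_id)
print(list(timeline.latest(3)))  # [5, 4, 3]
```

Reading the latest N posts touches only N nodes, however many posts exist in total.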
Additionally, you might maintain a timetree (see http://graphaware.com/neo4j/2014/08/20/graphaware-neo4j-timetree.html for a sample implementation). Since the finest granularity you need is a month, the timetree only needs year and month levels.
Only the first post of each month is then connected to that month's node in the timetree. See below for a sample model:
To query, e.g., the posts for Dec 2014 in descending order, we first find that month (Dec 2014) in the timetree, as well as the next month (Jan 2015). From each month node we go to that month's first post: the first post of Dec 2014 is the oldest post we want, and the post just before the first post of Jan 2015 is the newest. We then return everything in between:
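// Dec 2014 runs from its first post up to the post just before Jan 2015's first post
MATCH (:TimeRoot)-[:HAS_YEAR]->(:Year {year: 2014})-[:HAS_MONTH]->(startMonth:Month {month: 12}),
(:TimeRoot)-[:HAS_YEAR]->(:Year {year: 2015})-[:HAS_MONTH]->(endMonth:Month {month: 1}),
(startMonth)<-[:FIRST_IN_MONTH]-(firstPost:Post),
(endMonth)<-[:FIRST_IN_MONTH]-(:Post)-[:PREV_POST]->(lastPost:Post),
path = (lastPost)-[:PREV_POST*0..]->(firstPost)
UNWIND nodes(path) AS post
RETURN post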
Please note that I've not actually tested the query, so there might be some typos. The intention was to demo the model, not the full solution.
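To check the traversal logic of the query (not the Cypher itself), here is a small Python simulation. It assumes posts are added oldest first, that only the first post of each month is indexed (mirroring FIRST_IN_MONTH), and that months are `(year, month)` tuples; `build` and `posts_in_month` are hypothetical names. Unlike the query, it also handles the current month, which has no following month yet, by starting from the newest post, and it skips empty months when looking for the next indexed month.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

YearMonth = Tuple[int, int]


@dataclass
class Post:
    post_id: int
    prev_post: Optional["Post"] = None  # -[:PREV_POST]-> to the previous post


def build(posts: List[Tuple[int, YearMonth]]) -> Tuple[Optional[Post], Dict[YearMonth, Post]]:
    """Link posts (given oldest first) and index only the first post of each month."""
    head: Optional[Post] = None
    first_in_month: Dict[YearMonth, Post] = {}
    for post_id, month in posts:
        head = Post(post_id, prev_post=head)
        first_in_month.setdefault(month, head)  # like -[:FIRST_IN_MONTH]->
    return head, first_in_month


def posts_in_month(head: Optional[Post], first_in_month: Dict[YearMonth, Post],
                   month: YearMonth) -> List[int]:
    """Return the month's post ids, newest first."""
    first_post = first_in_month.get(month)
    if first_post is None:
        return []
    later = [m for m in first_in_month if m > month]
    if later:
        # Newest post of the month = the post before the next indexed month's first post.
        last_post = first_in_month[min(later)].prev_post
    else:
        last_post = head  # current month: start from the newest post overall
    result: List[int] = []
    node = last_post
    while node is not None:
        result.append(node.post_id)
        if node is first_post:
            break
        node = node.prev_post
    return result


head, index = build([(1, (2014, 11)), (2, (2014, 12)), (3, (2014, 12)),
                     (4, (2014, 12)), (5, (2015, 1)), (6, (2015, 1))])
print(posts_in_month(head, index, (2014, 12)))  # [4, 3, 2]
```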

Related

How do I get the nearest past date in a range for each entry in another list?

I collect customer feedback for my education business and add it to a Google Sheet. The feedback data has a submission date (A2:A) and some satisfaction metrics, which I visualize in a Google Data Studio dashboard.
The problem is that I want the feedback per cohort, but not everyone fills in the feedback form on the same day. I have a list of all courses with their respective dates (Cohorts!A2:A), and I want to assign each feedback submission to its respective cohort in a new column. It would be nice to also match it to the specific course type and country, but for now matching the cohort date would suffice.
I've tried using VLOOKUP and ARRAYFORMULA to go through the feedback dates and get the nearest past date to use as the "course date" for that student. All the solutions I've tried take either a single date or TODAY as the reference, but I have a whole list I'd like to fill in.
From my understanding, you are trying to round the timestamp, then match it to your course table?
To round a timestamp to a date:
=INT($A2)
When doing lookups like the one you describe, I often end up calculating the week as well. This formula returns the Sunday on which the week starts; I figured it might be helpful.
=text($A2+CHOOSE(WEEKDAY($A2),0,-1,-2,-3,-4,-5,-6),"m/d/yyyy")
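For the original question (the nearest past cohort date for every submission), the core operation is a "largest value less than or equal to the key" search in a sorted list. The Python sketch below shows that with `bisect`; `assign_cohorts` is a hypothetical name, and cohort dates are assumed to be plain dates. In Sheets, an approximate-match lookup against the sorted cohort list, something like `=ARRAYFORMULA(VLOOKUP(INT(A2:A), SORT(Cohorts!A2:A), 1, TRUE))`, should behave the same way, but treat that formula as an untested sketch.

```python
from bisect import bisect_right
from datetime import date, datetime
from typing import List, Optional, Sequence, Union


def assign_cohorts(submissions: Sequence[Union[date, datetime]],
                   cohort_dates: Sequence[date]) -> List[Optional[date]]:
    """For each submission, return the latest cohort date on or before it (None if none)."""
    ordered = sorted(cohort_dates)  # sort once, then binary-search per submission
    result: List[Optional[date]] = []
    for submitted in submissions:
        # Drop the time of day, as =INT($A2) does in the sheet.
        day = submitted.date() if isinstance(submitted, datetime) else submitted
        i = bisect_right(ordered, day)  # number of cohorts starting on or before `day`
        result.append(ordered[i - 1] if i > 0 else None)
    return result


cohorts = [date(2023, 1, 9), date(2023, 2, 6), date(2023, 3, 6)]
print(assign_cohorts([datetime(2023, 1, 12, 14, 30), date(2023, 1, 2)], cohorts))
# [datetime.date(2023, 1, 9), None]
```

The cohort dates here are hypothetical sample values. A submission dated before the first cohort gets `None`, the equivalent of `#N/A` from the lookup.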

Better design for a fact table where each row has a start and end date

My fact table contains details for clients who attend a course.
To ensure I can get a list of clients registered on any particular day, I have not related the date dimension to the fact table.
Instead, I created a measure that does basic between logic (where startDate <= selectedDate && endDate >= selectedDate).
This allows me to find all clients registered on one single selected day.
There are a few drawbacks to this, however:
- I have to ensure the report user selects only a single day, i.e. they cannot select a date range.
- I can't easily do counts for the same period last month or last year.
Is there a better design I should consider that will still let me see counts of registered clients on any given day, while also allowing me to use SamePeriodLastMonth/Year functionality?
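The between logic in the measure can be evaluated once per calendar day, which gives a count for every day in a range instead of only a single selected day. This Python sketch assumes registrations are `(start_date, end_date)` pairs with both ends inclusive; `active_on` and `active_per_day` are hypothetical names, not Power BI functions. Once every count is keyed on a calendar date, comparing a day with the same day last month or last year becomes a lookup by date, which is roughly what relating a Date dimension would give you.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple

Registration = Tuple[date, date]  # (start_date, end_date), both inclusive


def active_on(day: date, registrations: List[Registration]) -> int:
    """Same between logic as the measure: start_date <= day <= end_date."""
    return sum(1 for start, end in registrations if start <= day <= end)


def active_per_day(first: date, last: date,
                   registrations: List[Registration]) -> Dict[date, int]:
    """Count registered clients for every calendar day from `first` to `last`."""
    counts: Dict[date, int] = {}
    day = first
    while day <= last:
        counts[day] = active_on(day, registrations)
        day += timedelta(days=1)
    return counts


# Hypothetical sample registrations.
regs = [(date(2024, 1, 1), date(2024, 1, 10)),
        (date(2024, 1, 5), date(2024, 2, 5)),
        (date(2024, 2, 1), date(2024, 2, 28))]
print(active_per_day(date(2024, 1, 9), date(2024, 1, 11), regs)[date(2024, 1, 11)])  # 1
```

This scans every registration for every day, which is fine for a sketch; a real model would precompute the daily counts or rely on the date relationship.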
Would you mind uploading the structure of your fact and dim tables?
Just a thought bubble: if you would like to measure counts for a program over calendar years, I believe you would definitely need to create a Date dimension. Also, depending on your reporting needs, you might want to consider whether you need an Accumulating Snapshot Fact table.
Please find further details on this:
http://www.kimballgroup.com/2012/05/design-tip-145-time-stamping-accumulating-snapshot-fact-tables/
Cheers
Nithin

Get latest record for each user with ODATA

Because the PowerShell methods of getting mailbox statistics from Office 365 take about 2 seconds per mailbox, I am working on getting the data from the Office 365 Reporting web service, which takes only a few seconds per 2,000 mailboxes.
The problem I'm running into is that the stats are updated periodically and some historical data is kept, so there are numerous records for each user. I only want the latest record for each user, but I haven't been able to find a way to do that. The closest I've come is to use $filter=Date ge DateTime'2016-03-10T00:00:00', where the date is set to a couple of days ago. Theoretically, if I sort by Date desc, I should get the latest records first: if a user has records for 3/10 and 3/11, the 3/11 record would be pulled first, which would work for me. But regardless of how I sort, the older records seem to come back first.
Ideally, I would like to be able to set criteria so that it only returns the latest record for each mailbox, but I can't seem to figure out or find how to do that. The closest I've been able to come is to just start running queries filtered on specific dates, walking the date back a day on each query.
If I can get the latest records to be returned first, I would be able to work with that because I can just discard a record if I've already received a later one.
https://reports.office365.com/ecp/reportingwebservice/reporting.svc/MailboxUsageDetail/
?DelegatedOrg=nnn.onmicrosoft.com&$select=Date,WindowsLiveID,CurrentMailboxSize
&$filter=Date ge DateTime'2016-03-08T00:00:00'&$orderby=Date desc
So the questions are:
Is there a way to specify criteria so that only the latest record for each user is returned?
Is there a way to get it to order by Date descending? What am I doing wrong with the $orderby?
Thanks!
You can use $top=1 to get the latest record by applying $orderby on Date (desc). $filter and $skip may not be required in this case.
https://reports.office365.com/ecp/reportingwebservice/reporting.svc/MailboxUsageDetail/?DelegatedOrg=nnn.onmicrosoft.com&$select=Date,WindowsLiveID,CurrentMailboxSize&$orderby=Date desc&$top=1
Your query looks fine. Here is another example from the OData sample service that gets the employee with the most recent birth date.
http://services.odata.org/V4/Northwind/Northwind.svc/Employees?$select=EmployeeID,FirstName,LastName,BirthDate&$orderby=BirthDate%20desc&$top=1
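One caveat: `$top` limits the whole result set, so `$orderby=Date desc&$top=1` returns a single record in total, not one per mailbox. If you need one row per mailbox, the client-side fallback from the question (discard a record if a later one for the same user was already received) can be sketched in Python as below. It assumes the response has already been parsed into dicts keyed by the `$select` field names and that `Date` has been converted to `datetime`; `latest_per_user` is a hypothetical name. Because it compares dates instead of relying on arrival order, it works whether or not `$orderby` is honored.

```python
from datetime import datetime
from typing import Any, Dict, Iterable

Record = Dict[str, Any]  # keys follow the $select list: Date, WindowsLiveID, CurrentMailboxSize


def latest_per_user(records: Iterable[Record]) -> Dict[str, Record]:
    """Keep only the newest record per WindowsLiveID, regardless of arrival order."""
    latest: Dict[str, Record] = {}
    for rec in records:
        user = rec["WindowsLiveID"]
        current = latest.get(user)
        # Discard a record if a later one for the same mailbox was already seen.
        if current is None or rec["Date"] > current["Date"]:
            latest[user] = rec
    return latest


# Hypothetical sample rows, already parsed from the service response.
rows = [
    {"Date": datetime(2016, 3, 10), "WindowsLiveID": "a@nnn.onmicrosoft.com", "CurrentMailboxSize": 10},
    {"Date": datetime(2016, 3, 11), "WindowsLiveID": "a@nnn.onmicrosoft.com", "CurrentMailboxSize": 12},
    {"Date": datetime(2016, 3, 9), "WindowsLiveID": "b@nnn.onmicrosoft.com", "CurrentMailboxSize": 5},
]
print(latest_per_user(rows)["a@nnn.onmicrosoft.com"]["CurrentMailboxSize"])  # 12
```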

Retry: Neo4j query on the first query's result to get insight into what is returned

I asked this question a few days ago, but I didn't get much response, probably because I wasn't clear.
Let me try to rephrase it:
I have a few customers who place orders for various products. I am interested in knowing the orders placed for various combinations of products and customers, and then bucketing them into the number of orders in the last week, last fortnight, last month, etc.
I am able to query the orders based on my criteria, but I don't understand how to turn this result into the data I need.
Let's say my data looks like this:
(c:Customer)<-[:PLACED_BY]-(o:Order)-[:HAS_PRODUCT]->(p:Product), (o)-[:PLACED_ON]->(d:Date)
Assuming that I have successfully found the Orders I am looking for, how do I efficiently get the counts I want out of these selected orders?
{... some queries that return (o:Order) of interest ...}
WITH o
RETURN ??? AS CountLastWeek, ??? AS LastFortnight, ??? AS LastMonth
BTW, I also have an OrderedDate property on each Order, if that helps simplify the query.
Is it even possible to achieve this in Cypher?
What you're looking to do is date time indexing. This is most certainly possible to achieve with Cypher.
You'll need to index your nodes via relationships. Each date is a day, and you need to batch those days into groups: each day belongs to a week and to a month. Add those groups to your dates, then aggregate over the dates belonging to a week, a fortnight, or a month.
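Given the OrderedDate property mentioned in the question, the bucketing itself is conditional counting relative to a reference date. This Python sketch assumes "last week", "last fortnight" and "last month" mean the last 7, 14 and 30 days including today; those window lengths and the `bucket_counts` name are assumptions, not taken from the question. In Cypher the same idea would likely be a conditional sum per bucket in the RETURN clause.

```python
from datetime import date
from typing import Dict, Iterable

# Assumed window lengths in days (today counts as day 0); the question does not define them.
WINDOWS: Dict[str, int] = {"CountLastWeek": 7, "LastFortnight": 14, "LastMonth": 30}


def bucket_counts(order_dates: Iterable[date], today: date) -> Dict[str, int]:
    """Count orders whose OrderedDate falls inside each trailing window."""
    counts = {name: 0 for name in WINDOWS}
    for ordered_on in order_dates:
        age = (today - ordered_on).days
        if age < 0:
            continue  # ignore orders dated in the future
        for name, days in WINDOWS.items():
            if age < days:
                counts[name] += 1
    return counts


# Hypothetical sample OrderedDate values.
orders = [date(2024, 3, 30), date(2024, 3, 25), date(2024, 3, 20), date(2024, 3, 5), date(2024, 2, 1)]
print(bucket_counts(orders, today=date(2024, 3, 31)))
# {'CountLastWeek': 2, 'LastFortnight': 3, 'LastMonth': 4}
```

The windows overlap by design: an order from last week is also counted in the fortnight and month buckets.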

YQL Fantasy Hockey - League Standings by Month?

I'm trying to gather league standings by month (or a custom time period).
I know how to do it for a specific date, but I can't find a way to do it from x to y.
Is this possible (other than repeating the query for each day I want)?
It is not a head-to-head or rotisserie league, just straight overall points.
Edit:
Example query:
No, there is no way to fetch by month or for a date range. If you look at the YQL table fantasysports.leagues.scoreboard, you can see that the only parameter it accepts is the optional week parameter. This matches the Yahoo! Fantasy Sports API docs (search for 'scoreboard'), which show that it can give results for the current week or for another specified week.
I think this is because the Yahoo! Fantasy Sports scoreboards are all week-based, regardless of the actual frequency of games for the specific sport.
To capture scores by month, you can make one call per week and combine the results.
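Combining the weekly results into monthly standings could look like this Python sketch. It assumes you have already fetched each week's points per team and know each week's start date. A week that spans two months is assigned to the month in which it starts, which is an assumption on my part. `monthly_totals` and `standings` are hypothetical names, and no YQL call is shown.

```python
from collections import defaultdict
from datetime import date
from typing import DefaultDict, Dict, List, Tuple

YearMonth = Tuple[int, int]


def monthly_totals(weekly_points: Dict[int, Dict[str, float]],
                   week_start: Dict[int, date]) -> Dict[YearMonth, Dict[str, float]]:
    """Sum each team's weekly points into the month in which the week starts."""
    totals: DefaultDict[YearMonth, DefaultDict[str, float]] = defaultdict(lambda: defaultdict(float))
    for week, team_points in weekly_points.items():
        start = week_start[week]
        for team, points in team_points.items():
            totals[(start.year, start.month)][team] += points
    return {month: dict(teams) for month, teams in totals.items()}


def standings(month_totals: Dict[str, float]) -> List[Tuple[str, float]]:
    """Order teams by total points, highest first; ties are broken alphabetically."""
    return sorted(month_totals.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical weekly results, one entry per scoreboard call.
weekly = {1: {"A": 10.0, "B": 7.5}, 2: {"A": 5.0, "B": 9.0}, 3: {"A": 4.0}}
starts = {1: date(2015, 10, 5), 2: date(2015, 10, 12), 3: date(2015, 11, 2)}
print(standings(monthly_totals(weekly, starts)[(2015, 10)]))  # [('B', 16.5), ('A', 15.0)]
```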
