How would I model an email marketing graph in neo4j?

I am really drawn (and new) to neo4j as a way to model my data for easier analysis. Part of my job requires that I analyze our email marketing efforts.
As a simple data model, I think of my graph as having three kinds of nodes:
Lead - the customer in the database
Email - the email sent to the customer(s)
URL - a link contained in an email
With the relationships being:
(Lead) -[:SENT]-> (Email)
(Lead) -[:OPEN]-> (Email)
(Lead) -[:CLICKED_THRU]-> (Email)
(Email) -[:CONTAINS]-> (URL)
Now to my question. Using the data model I constructed above, how can I isolate the URLs that a Lead has clicked on? If I add another relationship (Lead) -[:CLICKED_ON]-> (URL), I do not know which email the URL was contained in (we send the same URL in multiple emails).
Right now, I have a traditional RDBMS implementation, where I know which lead clicked on which URLs from each email.
I want to try to learn neo4j using this business problem, but I am struggling as to how to relate the URL that was clicked on to the specific email.
Thanks in advance for any help. If this is not the proper forum, please let me know where I can direct my question.

The trouble with the model as you have it is that, for it to work, you must assume that each email has one and only one link. That may not be true in general.
Now, if it is true that every email has only one link, then you could do what you want this way:
MATCH (l:Lead)-[:CLICKED_THRU]->(e:Email)-[:CONTAINS]->(url:URL)
RETURN l, url
This would tell you who clicked on which URL. But notice that if there's more than one URL per email, this would make it look like every user who ever clicked on an email link clicked on every link in that email.
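To make the over-counting concrete, here is a small Python sketch with plain sets standing in for the graph (not Neo4j itself; the lead, email and URL names are made up). It mimics the MATCH above on one email that contains two links:

```python
# Hypothetical edges standing in for the graph in the question.
clicked_thru = {("alice", "email1")}                      # (Lead)-[:CLICKED_THRU]->(Email)
contains = {("email1", "url_a"), ("email1", "url_b")}    # (Email)-[:CONTAINS]->(URL)
actual_clicks = {("alice", "url_a")}                      # what really happened (not stored)

def inferred_clicks(clicked_thru, contains):
    """Mimic the MATCH: pair each lead with every URL in any email they clicked through."""
    return {(lead, url)
            for lead, email in clicked_thru
            for e, url in contains
            if e == email}

print(sorted(inferred_clicks(clicked_thru, contains)))
# [('alice', 'url_a'), ('alice', 'url_b')]
```

Alice only clicked url_a, but the traversal also credits her with url_b.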
A better way to model your data would be like this:
(Lead)-[:CLICKED_THRU]->(URL)
(Email)-[:CONTAINS]->(URL)
(Lead)-[:OPEN]->(Email)
This would let you ask which URLs were clicked (just by following CLICKED_THRU), and it would also tell you which emails were opened. Also, if URLs are unique to emails, by following the :CONTAINS relationship you could work out which email a clicked link came from.
Finally, for general modeling concerns in Neo4j, make sure to check out this presentation, which goes into depth on how to think about modeling and how it differs from relational modeling.
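A rough Python sketch of that last point, again with plain sets standing in for the graph and hypothetical names: build a URL-to-emails index from the :CONTAINS edges and only attribute a click to an email when exactly one email contains that URL.

```python
from collections import defaultdict

# Hypothetical edges for the alternative model.
clicked = {("alice", "url_a"), ("bob", "url_shared")}      # (Lead)-[:CLICKED_THRU]->(URL)
contains = {("email1", "url_a"), ("email1", "url_shared"),
            ("email2", "url_shared")}                       # (Email)-[:CONTAINS]->(URL)

def emails_containing(contains):
    """Reverse index: URL -> set of emails that contain it."""
    index = defaultdict(set)
    for email, url in contains:
        index[url].add(email)
    return index

def attribute_clicks(clicked, contains):
    """Map each click to its email if the URL is in exactly one email, else to None."""
    index = emails_containing(contains)
    result = {}
    for lead, url in clicked:
        candidates = index.get(url, set())
        result[(lead, url)] = next(iter(candidates)) if len(candidates) == 1 else None
    return result

print(attribute_clicks(clicked, contains)[("alice", "url_a")])  # email1
```

The click on url_shared comes back as None, which is exactly the ambiguity described in the question when the same URL is sent in multiple emails.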

I assume that at the point the URL is accessed you know which email it came from, that an email contains multiple URLs, and that the same URL may be present in multiple emails. You may want to model a hyperedge (something that links together more than two nodes) as a node:
(Lead)-[:CLICKED_URL_FROM]->(EmailLinkNode)
(EmailLinkNode)-[:FROM_EMAIL]->(email)
(EmailLinkNode)-[:CLICKED_URL]->(url)
I think that this is the only way to relate three nodes in a single 'relationship', but I am quite new to this myself.
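A minimal Python sketch of the hyperedge idea, assuming each click is recorded with its lead, email and URL at the time it happens. The EmailLinkClick class stands in for EmailLinkNode, and all names are illustrative:

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class EmailLinkClick:
    """Stands in for EmailLinkNode: one node per (lead, email, url) click."""
    lead: str   # (Lead)-[:CLICKED_URL_FROM]->(this node)
    email: str  # (this node)-[:FROM_EMAIL]->(email)
    url: str    # (this node)-[:CLICKED_URL]->(url)

clicks = [
    EmailLinkClick("alice", "email1", "url_shared"),
    EmailLinkClick("alice", "email2", "url_shared"),
    EmailLinkClick("bob", "email1", "url_a"),
]

def urls_by_email(clicks, lead):
    """For one lead, group the clicked URLs by the email they were clicked from."""
    grouped = defaultdict(set)
    for c in clicks:
        if c.lead == lead:
            grouped[c.email].add(c.url)
    return dict(grouped)

print(urls_by_email(clicks, "alice"))
# {'email1': {'url_shared'}, 'email2': {'url_shared'}}
```

Because every click node carries its email, the same URL clicked from two different emails shows up twice instead of being merged.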
Something similar is described on the NeoTechnology page here.
Given your data, you could also consider creating a new node to represent the concept of an EmailUrl: a dummy node used to uniquely identify a URL in the context of a specific email.
(email)-[:CONTAINS]->(EmailUrl)-[:FOR_URL]->(url)
This leads to a simple relationship between the lead and the now unique node, (lead)-[:CLICKED_THRU]->(EmailUrl), and therefore to simple queries that find out not only which URLs were clicked, but also which emails proved the most enticing to your leads.
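Sketching the EmailUrl approach in Python (dictionaries standing in for nodes and relationships; ids such as eu1 are hypothetical), both "which URLs were clicked" and "which emails were most enticing" become simple counts:

```python
from collections import Counter

# Hypothetical EmailUrl nodes: each id identifies one URL inside one email,
# i.e. (email)-[:CONTAINS]->(EmailUrl)-[:FOR_URL]->(url).
email_urls = {
    "eu1": ("email1", "url_a"),
    "eu2": ("email1", "url_shared"),
    "eu3": ("email2", "url_shared"),
}

# (lead)-[:CLICKED_THRU]->(EmailUrl)
clicked_thru = [("alice", "eu2"), ("bob", "eu2"), ("bob", "eu3"), ("carol", "eu1")]

def clicks_per_email(clicked_thru, email_urls):
    """Count click-throughs per email by following each EmailUrl back to its email."""
    return Counter(email_urls[eu][0] for _, eu in clicked_thru)

def clicks_per_url(clicked_thru, email_urls):
    """Count click-throughs per underlying URL, regardless of which email carried it."""
    return Counter(email_urls[eu][1] for _, eu in clicked_thru)

print(clicks_per_email(clicked_thru, email_urls).most_common(1))  # [('email1', 3)]
```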

Related

ASP.Net MVC Facebook-like activity stream

I would like to implement a Facebook-like news feed for my website, with social features such as share, like, comment and post, and I want to connect it to already created users (it would be nice to have it connected to Azure Active Directory).
Is there any ready solution for this problem?
Thanks in advance!
I don't know of anything specific, and StackOverflow is not the place for generalized library recommendations. However, it should be trivial enough to implement yourself. Activity streams are composed of four main components:
Actor
Verb
Object
Target
For example: "John (actor) shared (verb) a photo (object) with Mary (target)."
Just create an entity that can track these four aspects, and then add a record describing the action each time something happens. By making certain parts foreign keys (actor/target could be foreign keys to your user table), you can then pull actions specifically related to a particular user or other object in your system.
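A minimal sketch of such an entity, written in Python as a language-neutral illustration (the question is about ASP.NET MVC, and the field names and the feed_for_user helper are hypothetical). The actor and target ids play the role of the foreign keys mentioned above:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class Activity:
    actor_id: int                    # would be a foreign key to the user table
    verb: str                        # e.g. "shared", "liked", "commented"
    object_type: str                 # e.g. "photo"
    object_id: int
    target_id: Optional[int] = None  # optional foreign key to the user table
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def feed_for_user(activities, user_id):
    """Return activities where the user is the actor or the target, newest first."""
    related = [a for a in activities if a.actor_id == user_id or a.target_id == user_id]
    return sorted(related, key=lambda a: a.created_at, reverse=True)

# "John (actor 1) shared (verb) a photo (object 42) with Mary (target 2)."
john_shares = Activity(actor_id=1, verb="shared", object_type="photo", object_id=42, target_id=2)
print(len(feed_for_user([john_shares], 2)))  # 1
```

In a real app this would be a database table with indexes on actor_id and target_id, so the feed query does not scan every activity.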

Get Attendee Data w/ Website Workflow's 3rd Party Next Steps

I am working with the Eventbrite API and I have a need that I am trying to figure out the best approach for. Right now, I have an event that people will be registering for. At the final step of the registration process, I need to ask them some questions that are specific to my event. Sadly, these questions are data-driven from my website, so I am unable to use the packaged surveys in Eventbrite.
In a perfect world, I would use the basic flow detailed in the Website Workflow of the EB documentation, ending upon the "3rd Party Next Steps" step (redirect method).
http://developer.eventbrite.com/doc/workflows/
Upon landing on that page, I would like to be able to access the order data that we just created in order to update my database and to send emails to each person who purchased a seat. This email would contain the information needed to kick off the survey portion of my registration process.
Is this possible in the current API? Does the redirect post any data back to the 3rd party site? I saw a few SO posts that gave a few keywords that could be included in the redirect URL (is there a comprehensive list?). If so, is there a way to use that data to look up order information for that order only?
Right now, my only other alternative is to set up a polling service that would pull EB API data, check for new values, and then kick off the process on intervals. This would be pretty noisy for all parties involved, create delay for my attendees, and I would like to avoid it if possible. Thoughts?
Thanks!
Here is the full set of parameters we support after an attendee places an order:
http://yoursite.com/?eid=$event_id&attid=$attendee_id&oid=$order_id
It's possible that order_id and attendee_id will not be numeric values, in which case they will be set to "unknown". You'll always have the event_id, though.
If you want to get order-specific data after redirecting an attendee to your site, you can use the event_list_attendees method along with the modified_after parameter. You'll still have to look through the result set for the new order_id, but the result set will be much smaller and easier to navigate. You can get more information here: http://developer.eventbrite.com/doc/events/event_list_attendees/
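As an illustration only, here is a Python sketch of reading those parameters on the landing page with the standard library, treating "unknown" (or a missing parameter) as no value. The parse_redirect helper is hypothetical; the parameter names are the ones listed above:

```python
from urllib.parse import urlparse, parse_qs

def parse_redirect(url):
    """Extract event, attendee and order ids from the post-order redirect URL.
    Missing parameters and the literal value "unknown" both become None."""
    params = parse_qs(urlparse(url).query)

    def get(name):
        value = params.get(name, [None])[0]
        return None if value in (None, "unknown") else value

    return {"event_id": get("eid"), "attendee_id": get("attid"), "order_id": get("oid")}

print(parse_redirect("http://yoursite.com/?eid=123&attid=unknown&oid=456"))
# {'event_id': '123', 'attendee_id': None, 'order_id': '456'}
```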
You can pass the order_id in your redirect URL in order to solve this.
When you define a redirect URL, Eventbrite will automatically swap in the order_id value in place of the string "$order_id".
http://your3rdpartywebsite.com/welcome_back/?order_id=$order_id
or:
http://your3rdpartywebsite.com/welcome_back/$order_id/
When the user completes their transaction, they will be redirected to your external site, as shown here: http://developer.eventbrite.com/doc/workflows/
When your post-transaction landing page is loaded, grab the order_id from the request URL, and call the event_list_attendees API method to find the order information in the response.
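A small Python sketch of that landing-page step, covering both redirect URL shapes above. The attendee records are assumed to be dicts with an order_id key; that is a placeholder, not the documented shape of the event_list_attendees response, so adapt it to what the API actually returns:

```python
import re
from urllib.parse import urlparse, parse_qs

def order_id_from_request(url):
    """Return the order_id from either redirect shape, or None if absent."""
    parsed = urlparse(url)
    query_value = parse_qs(parsed.query).get("order_id")
    if query_value:                       # .../welcome_back/?order_id=987
        return query_value[0]
    match = re.search(r"/welcome_back/([^/]+)/?$", parsed.path)
    return match.group(1) if match else None  # .../welcome_back/987/

def attendees_for_order(attendees, order_id):
    """Keep only attendee records (assumed dicts) whose order_id matches."""
    return [a for a in attendees if str(a.get("order_id")) == str(order_id)]

print(order_id_from_request("http://your3rdpartywebsite.com/welcome_back/987/"))  # 987
```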

Fixing duplicate records in a rails app from an autocomplete form

I'm building a Rails 3.1 application that allows people to submit events. One of the fields for the event is a venue. On the create/edit form, the venue_name field has autocomplete functionality so it displays venues with a similar name, but the user is able to enter any name.
When the form is submitted, I'm using find_or_create_by_name when attaching the venue to the event model.
I'm doing this because it's not possible for us to maintain a complete list of venues and I don't want to prevent people from submitting an event because the venue isn't in the list.
The problem is that it's quite likely we'll get duplicates over time like "Venue Name" and "The Venue Name" or any number of other possibilities.
I was thinking that I probably just need to create an administrative tool that lets the admin review recent venues and, if he/she thinks two are duplicates, search for and select a master record, copy the duplicate record's associations over to the master record, and, once that succeeds, delete the duplicate record.
Is this a good approach? In terms of the data manipulation would it be best to handle this in a transaction? Would it be best to add this functionality in a sort of utility class - or directly in the Venue model?
Thanks for your time.
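On the transaction question: re-pointing the associations and deleting the duplicate belong in one transaction, so a failure halfway cannot leave events attached to a deleted venue. A minimal sketch using Python's built-in sqlite3 with hypothetical venues/events tables (in Rails the same idea would be wrapping both steps in a Venue.transaction do ... end block):

```python
import sqlite3

def merge_venues(conn, master_id, duplicate_id):
    """Move every event from the duplicate venue to the master, then delete the duplicate.
    The connection context manager commits on success and rolls back on an exception."""
    with conn:
        conn.execute("UPDATE events SET venue_id = ? WHERE venue_id = ?",
                     (master_id, duplicate_id))
        conn.execute("DELETE FROM venues WHERE id = ?", (duplicate_id,))

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE venues (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE events (id INTEGER PRIMARY KEY, title TEXT,
                         venue_id INTEGER REFERENCES venues(id));
    INSERT INTO venues VALUES (1, 'The Venue Name'), (2, 'Venue Name');
    INSERT INTO events VALUES (10, 'Gig', 1), (11, 'Show', 2);
""")
merge_venues(conn, master_id=1, duplicate_id=2)
print(conn.execute("SELECT venue_id FROM events ORDER BY id").fetchall())  # [(1,), (1,)]
```

Putting merge_venues on the model or in a separate service class is mostly a matter of taste; the important part is that both statements share one transaction.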
If I were going to put together a system like that, I'd probably try to find a unique identifier I could associate with each venue - perhaps an address or a phone number?
So, if I had "The Clubhouse" with a phone number 503-555-1212, and someone tried to input a new venue called "Clubhouse" with the phone number 503-555-1212, I might take them to an interstitial page where I ask them "Did you mean this location?"
Barring that, I might ask for a phone number or address first, then present a list of possible matches with the option to create a new venue.
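A rough Python sketch of the phone-number lookup, assuming venues are stored with a phone field. The normalization simply keeps digits, which is naive (it ignores country codes and extensions), and the helper names are hypothetical:

```python
import re

def normalize_phone(phone):
    """Reduce a phone number to its digits, so '503-555-1212' and '(503) 555 1212' compare equal."""
    return re.sub(r"\D", "", phone)

def possible_matches(venues, phone):
    """Return existing venues whose normalized phone equals the submitted one."""
    key = normalize_phone(phone)
    return [v for v in venues if normalize_phone(v["phone"]) == key]

venues = [{"name": "The Clubhouse", "phone": "503-555-1212"},
          {"name": "Other Place", "phone": "503-555-0000"}]
print([v["name"] for v in possible_matches(venues, "(503) 555 1212")])  # ['The Clubhouse']
```

A non-empty result is the cue to show the "Did you mean this location?" page instead of creating a new venue.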
Otherwise, you're introducing a lot of potential for error at the admin level, plus you run into a scalability problem. If your admin has to review 10 entries a month, maybe not so bad - but if your app takes off and that number goes to 1000, that becomes unmanageable fast!

Generating a new email address on the fly, but not really!

I have a blogging application. Once a blog-post is created by a user, it will be sent as an email to some of user's friends. I want a functionality where the friends will just reply to the email and the content of the email will go as comments for that particular blog-post.
One way to do this is to do something similar to what http://ohlife.com does. It basically creates a unique ID per user per day, sets the reply-to attribute of the email to post+{unique_id}@ohlife.com, and probably parses this field when the reply is received to know which user it is for. But it really has only one email address, which is post@ohlife.com. The part after the "+" gets ignored by email servers. This also applies to Gmail.
What I want to know is whether this property is specific to particular email servers or universal. If it is not universal, is there an email-server-independent way of implementing this? I would not want this to be based on the email subject, as that is the trivial solution I know of.
It depends on your mail server and how it is configured (although it is quite a standard feature). For example, in Postfix:
recipient_delimiter = +
You can set it to anything you like. I once configured it to be a dot so I could use it all over the web: http://www.postfix.org/postconf.5.html#recipient_delimiter
But you could simply make it configurable in your application as well.
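To illustrate making it configurable, a minimal Python sketch that builds the reply-to address and recovers the token using the same delimiter. The helper names and the example.com domain are hypothetical, and the delimiter has to match what the mail server is configured with (recipient_delimiter in Postfix):

```python
def make_reply_to(local, token, domain, delimiter="+"):
    """Build a reply-to address such as post+abc123@example.com."""
    return f"{local}{delimiter}{token}@{domain}"

def token_from_recipient(address, delimiter="+"):
    """Recover the token from a received recipient address, or None if there is none."""
    local_part = address.rsplit("@", 1)[0]
    if delimiter not in local_part:
        return None
    return local_part.split(delimiter, 1)[1]

addr = make_reply_to("post", "abc123", "example.com")
print(addr, token_from_recipient(addr))  # post+abc123@example.com abc123
```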
Besides using the email subject or address, one other easy way to accomplish this would be to just stick an identifier number at the bottom of the outgoing email's body. It would then come back to you in the quoted part of the response message. This is much less obtrusive than putting stuff in the subject or address, and if you're using HTML messages you can even make the code invisible.
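A minimal Python sketch of the body-marker approach for plain-text mail. The [ref:...] format is a made-up convention; it only works if the replying client quotes the original message, and you would still want to strip the quoted text before saving the reply as a comment:

```python
import re

# Hypothetical marker format; any distinctive, easy-to-match pattern would do.
MARKER = "[ref:{}]"
MARKER_RE = re.compile(r"\[ref:([A-Za-z0-9]+)\]")

def add_marker(body, post_id):
    """Append an identifier to the bottom of the outgoing plain-text body."""
    return f"{body}\n\n{MARKER.format(post_id)}"

def find_marker(reply_body):
    """Find the identifier in the reply (normally inside the quoted original)."""
    match = MARKER_RE.search(reply_body)
    return match.group(1) if match else None

outgoing = add_marker("New post: Hello world", "post42")
reply = "Great post!\n\n> " + outgoing.replace("\n", "\n> ")  # simulated quoted reply
print(find_marker(reply))  # post42
```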

MongoDB and embedded documents, good use cases

I am using embedded documents in MongoDB for a Rails 3 app. I like that with embedded documents all the values are returned with one query and there is less load on the database server. But what happens if I want my users to be able to update properties that really should be shared across documents? Is this sort of operation feasible with MongoDB, or would I be better off using normal ID-based relations? If ID-based relations are the way to go, would that affect performance to a great degree?
If you need to know anything else about the application or data I would be happy to let you know what I am working with.
Document that has many properties that all documents share.
Person
name: string
description: string
Document that wants to use these properties:
Post
(references many people)
body: string
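To show why shared, editable properties are the sticking point, here is a plain-Python sketch (dicts standing in for MongoDB documents; all names are illustrative) comparing how many documents an update touches in each layout:

```python
# Embedded layout: each post carries its own copy of the person's fields.
embedded_posts = [
    {"_id": 1, "body": "First", "people": [{"name": "Ann", "description": "old"}]},
    {"_id": 2, "body": "Second", "people": [{"name": "Ann", "description": "old"}]},
]

# Referenced layout: posts store ids; each person lives in one separate document.
people = {"p1": {"name": "Ann", "description": "old"}}
referenced_posts = [
    {"_id": 1, "body": "First", "person_ids": ["p1"]},
    {"_id": 2, "body": "Second", "person_ids": ["p1"]},
]

def update_embedded(posts, name, description):
    """Rewrite every embedded copy of the person; return how many posts were modified."""
    touched = 0
    for post in posts:
        hit = False
        for person in post["people"]:
            if person["name"] == name:
                person["description"] = description
                hit = True
        if hit:
            touched += 1
    return touched

def update_referenced(people, person_id, description):
    """Change the single person document; every post sees it through the id."""
    people[person_id]["description"] = description
    return 1

def people_for_post(post, people):
    """The extra lookup the referenced layout needs when reading a post."""
    return [people[pid] for pid in post["person_ids"]]
```

The trade-off runs the other way on reads: with references, displaying a post needs a second query (or an aggregation $lookup) to fetch the people, whereas the embedded version comes back in one read.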
This all depends on what you are going to do with your Person model later. I know of at least one working example (a blog using MongoDB) where the developer keeps user data inside the comments they make and uses one collection for the entire blog. Well, OK, he uses a second one for his "tag cloud" :) He just doesn't need to keep a centralized list of all commenters; he doesn't care. His blog contains consolidated data from all his previous sites and blogs, almost 6000 posts in total. Posts contain comments, comments contain users, and users have emails. He offers a "subscribe to comments" option to every user who comments on a post, authorization is handled by an external OpenID service aggregator (Loginza), and he stores the user's email from the Loginza response and keeps their "login token" in their cookies. So the functionality is pretty good.
So, the real question is: what are you going to do with your users later? If you really feel you need a separate collection (you're going to give users centralized control panels, site-based registration, user-centric features and so on), make it separate. If not, keep it simple and have fun :)
It depends on what user info you want to share across documents. Let's say you have a user, and the user has emails. It doesn't make sense to move emails into a separate collection, since there will be no more than 10, 20 or 100 emails per user. But if the user has some large related data that keeps growing, like blog posts, then it makes sense to move it into a separate collection.
So the answer depends on your user document structure. If you show your user document structure and what you plan to move into a separate collection, I will help you make the decision.
