In holochain-rust what is the best way get and show a list of all users? - dht

I am wanting to show a list of all users for a particular app in holochain to enable an active user to make an agreement with someone. What is a best practice for getting a list of all users given the linking nature of the data flow?
Would it make sense to create a central agent that links to all the users to get access to the full user list? Is there a way that seems better?

Link from the DNA Hash (not recommended; read about anchors below)
Each user could link from the DNA hash to themselves in the genesis() function -- the DNA hash is the one hash on the DHT that everyone knows. Then all you need to do is getLinks(App.DNA.Hash, "user") to get them all. (watch out, it could get to be a huge list. I also feel bad for the poor nodes who are in the neighborhood of the DNA hash... that's a lot of metadata to store.
Doing this in Genesis
This can be done in the genesis function. I'll do it in old holochain-proto language, if you don't mind:
function genesis() {
commit({ Links: [ { Base: App.DNA.Hash, Link: App.Key.Hash, Tag: "registered_user" } ] };
}
That will create a sort of 'registered agent' thing for each new person who joins.
The problem with this approach
it's a bit of an antipattern, because a redundant copy of the DNA lives in the DHT as well, and the poor souls whose nodes are in the DHT hash neighbourhood of the DNA will have a higher load than others. Right now I'd recommend anchors instead. Anchors are nothing but a pattern that consists of a string entry + a link. So you would create an anchor whose content is "registered_users", and link from that anchor to any agent who comes on board. Still creates a hotspot for those who hold the anchor entry, but it's expected that your app will have a few anchors like this, and at least they don't have to all hang off the one DNA hash.
Linking with Anchors
The anchors mixin (currently compatible with hc-proto only) features an idempotent function to create the anchor, so each user could safely call anchor() without re-creating an existing anchor.

Related

Delphi - What Structure allows for SAVING inverted index type of information?

Delphi XE6. Looking to implemented a limited style of search, specifically an edit field for the user to enter a business name which would get looked up. I need to allow the user to enter multiple words, or part of multiple words. For Example, on a business "First Bank of Kansas", user should be able to enter "Fir Kan", and it should return a match. This means an inverted index type of structure. I have some type of list of each unique word, then a (document ID, primary Key ID, etc, which is an integer). I am struggling with WHAT type of structure to make this... I have approximately 250,000 business names, which have 43,500 unique words. Word count will vary from 1 occurrence of a word to several thousand (company, corporation, etc) I have some requirements...
1). Assume the user enters BAN. I need to find ALL words that start with BAN. I need to return BANK, BANKER, etc... This means that whatever structure I use, I have to be able to find BAN and then move to the next alphabetic entry... and keep moving to the next until I find a value that does NOT start with BAN. This eliminates any type of HASH structure, correct?
2). I obviously want this to be fast. HASH is the fastest, but I can't use this, correct? See requirement 1.
3). Each entry in this structure needs to be able to hold a list of integers. If I end up going with a LinkedList, then each element has to hold a list of Integers.
4). I need to be able to save and load this structure. I don't want to have to build it each time I use it.
Whatever I end up with, it appears to have to be a NESTED structure, a higher level list (LinkedList?) with each node being an Integer List.
What am I looking for? What do commercial product use? Outlook, etc have search capabilities.
Every word is linked to a specific set of IDs, each representing a business name, right?.
I recommend using a binary tree data structure because effort for searching is normally log(n), which is quite fast. Especially, if business names are changing at runtime, an AVLTree should do well, although it's quite some work to implement it by yourself. But there should be many ready-to-use units on binary trees all over the internet.
For each successful search for a word in your tree data structure, you should take their list of IDs and aggregate those grouped by the entered word they succeeded for.
As the last step you take all those aggregated lists of IDs and do an intersection.
There should only be IDs left which are fitting to all entered words. Those IDs are referencing the searched business names.

Mahout - Item Similarity, but exclude Items a User has already "bought"

I want to create a video recommender, which recommends via similarity. The challenge is, that I want to exclude videos that the user has already seen. This seems like a pretty obvious case to me, but I don't find it covered.
Any hint is appreciated!
This is the default behavior of any recommender, to not return items that already appears in the user's input vector. Certainly it's how the ones I have worked on work.
Do you really mean how? It's just a filtering step. You just don't consider any item that exists when you look it up in the input.
You can always post-process results any way you want beyond this. Mahout/Myrrix both have an IDRescorer abstraction that lets you inject whatever logic you want to remove or boost items in the results. Here's a writeup on rescoring that applies to both.

Questions about implementing surrogate key in Ruby on Rails

For an upcoming project we need to have unique real world identifiers that are exposed to users for things like Account Numbers or Case Numbers (like a bug tracking ID). These will always be system generated and unchangeable. Right now we plan to run strictly on Heroku.
While (as my name would suggest) I am new to the wonderfulness that is Ruby on Rails, I have a long background in enterprise application development. I'm trying to bridge between what I have done in the past while doing in the "RoR way"
Obviously RoR has wonderful primary key support. I have read dozens of posts here recommending to adapt business requirements to just use the out of the box id/key methodology.
So let me describe what I am trying to accomplish and please let me know if you have faced similar objectives and what approach you took.
1) Would like to have a human readable key with a consistent length. There is value in always having an Account ID or Transaction ID that is the same length (for form validation, training sales staff, etc.) Using Ruby's innate key generation one could just add buffer characters (e.g. 100000 instead of 1).
2) Compactness: My initial plan was to go with a base 36 unique key (e.g. 36 values [0..9],[a..z]). As part of our API/interface we plan on exposing certain non-confidential objects based on a shortform URL (e.g. xx.co/000001). I like the idea of being able to have a five character identifier in base 36 vs. 7+ in decimal.
So I can think of two possible approaches:
a) add my own field and develop my own unique key generator (or maybe someone will point me to one).
b) Pad leading digits (and I assume I can force the unique key generation to start at 1xxxxxxx rather than 0000001). Then use the to_s(36) method to convert it to and from base 36 for all interactions with humans. Maybe even store the actual ID value in the database in the base 36 format to avoid ongoing conversions, but always do the conversion before a query to avoid the need to have another index.
I'm leaning towards approach B, as it seems like it would be optimal from a DB performance standpoint and that it would require the least investment in non-value added overhead. Once again, any real world experience with these topics and thoughts on the best approach would be greatly appreciated.
Thanks in advance!
I would never use the primary key in a Rails table for anything of business importance. There will come a day when someone on the business end will want to change it, and it'll end up being an enormous pain in the butt and will invalidate a bunch of URLs you and your users thought were canonical and will mess up all your foreign keys and blah blah blah. It's just a really bad idea and I would encourage you not to do it.
The Rails way to do this is have a new column, called something like number or bug_tracking_number or whatever strikes your fancy, and before_validation implement a callback that gives it a value. This is where you can let your creativity shine; something like this sounds like what you want:
before_validation( :on => :create ) do
self.number = CaseNumber.count + 1
end
You can pad the number there, ensure its uniqueness, or do whatever else you want.

Help me understand mnesia (NoSQL) modeling

In my Quest to understanding Mnesia, I still struggle with thinking in relational terms. So I will put my struggles up here and ask for the best way to solve them.
one-to-many-relations
Say I have a bunch of people,
-record(contact, {name, phone}).
Now, I know that I can define phone to always be saved as a list, so people can have multiple phone numbers, and I suppose that's the way to do it (is it? How would I then look this up the other way around, say, finding a name to a number?).
many-to-many-relations
now let's suppose I have multiple groups I can put people in. The group names don't have any significance, they are just names; the concept is "unix system groups" or "labels". Naively, I would model this membership as a proplist, like
{groups [{friends, bool()}, {family, bool()}, {work, bool()}]} %% and so on...
as a field within the "contact" record from above, for example. What is the best way to model this within mnesia if I want to be able to lookup all members of a group based on group name quickly, and also want to be able to lookup all group an individual is registered in? I also could just model this as a list containing just the group identifiers, of course. For use with mnesia, what is the best way to model this?
I apologize if this question is dumb. There's plenty of documentation on mnesia, but it's lacking (IMO) some good examples for the overall use.
For the first example, consider this record:
-record(contact, {name, [phonenumber, phonenumber, ...]}).
contact is a record with two fields, name and phone where phone is a list of phone numbers. As user425720 said it could make sense to store these as something else than strings, if you have extreme requirements for small storage footprint, for example.
Now here comes the part that is hard to "get" with key-value stores: you need to also store the inverse relationship. In other words, you need something similar to the following:
-record(phone, {phonenumber, contactname}).
If you have a layer in your application to abstract away database handling, you could make it always add/change the phone records when adding/changing a contact.
--
For the second example, consider these two records:
-record(contact, {uuid, name, [group_id, group_id]}).
-record(group, {uuid, name, [contact_id, contact_id]}).
The easiest way is to just store ids pointing to the related records. As Mnesia has no concept of referential integrity, this can become out of sync if you for example delete a group without removing that group from all users.
If you need to store the type of group on the contact record, you could use the following:
-record(contact, {name, [{family, [group_id, group_id]}, {work, [..]}]}).
--
Your second problem could also be solved by using a intermediate record, which you can think of as "membership".
-record(contact, {uuid, name, ...}).
-record(group, {uuid, name, ...}).
-record(membership, {contact_uuid, group_uuid}). # must use 'bag' table type
There can be any number of "membership" records. There will be one record for every users group.
First of all, you ask for key-value store design patters. Perfectly fine.
Before I will try to answer your question lets make it clear - what is Mnesia. It is k-v DB, which is included in OTP. Because it is native, it is very comfortable to use from Erlang. But be careful. This is old database with very ancient assumptions (e.g. data distribution with linear hashing). So go ahead, learn and play with it, but for production take your time and browse NoSQL shop to find the best for your needs.
#telephone example. Do not store stuff as strings (list()) - it is very heavy for GC. I would make couple fields like phone_1 :: < < binary > > , phone_2 :: < < binary > >, phone_extra :: [ < < binary > > ] and build index on the most frequent query-field. Also mnesia indicies are tricky - when node crashes and goes up, they need to rebuild themselves (it can take awfully lot of time).
#family example. It quite hard with flat namespace. You may play with more complex keys.. Maybe create separate table for TheGroup and keep identifiers of members? Or each member would have ids of groups he belongs (hard to maintain..). If you want to recognize friends I would implement some sort of contract before presenting data (A is B's friend iff B is A's friend) - this approach would cope with eventual consistency and conflicts in data.

How would you design a hackable url

Imagine you had a group of product categories organized in a nice tree hierarchy and you wanted to provide hackable urls to browse these. You could do something like this
/catalog/categorya/categoryb/categoryc
You could then quite easily figure out which category you should list the products for (note that the full URL is needed since you could have categories with the same name but at different locations in the hierarchy)
Now what would be a good approach to add product information in that as well? To give you an example, you wanted to display the product Oblivion for this category
/catalog/games/consoles/playstation/adventure
It's tempting to just add the product at the end of the url
/catalog/games/consoles/playstation/adventure/oblivion
but the moment you do so you loose the ability to know if its category or a product which is called oblivion. I personally feel that not being forced to add a suffix such as .html
/catalog/games/consoles/playstation/adventure/oblivion.html
would be the nicest solution and using some sort of prefix, such as
/catalog/games/consoles/playstation/adventure/product:oblivion
You could also add some sort of trigger like
/catalog/games/consoles/playstation/adventure/PRODUCT/oblivion
not as nice either and you would (even though its very unlikely it would be a problem) restrict yourself from having a category called product
So far a suffix solution looks like the most user-friendly approach that I can think of from the top of my head but I'm not fond of having to use an extension
What are your thoughts on this?
Deep paths irk me. They're hideous to share.
/product/1234/oblivion --> direct page
/product/oblivion --> /product/1234/oblivion if oblivion is a unique product,
--> ~ Diambiguation page if oblivion is not a unqiue product.
/product/1234/notoblivion -> /product/1234/oblivion
/categories/79/adventure --> playstation adventure games
/categories/75/games --> console games page
/categories/76/games --> playstation games page
/categories/games --> Disambiguation Page.
Otherwise, the long urls, while seeming hackable, require you to get all node elements right to hack it.
Take php.net
php.net/str_replace --> goes to
http://nz2.php.net/manual/en/function.str-replace.php
And this model is so hackable people use it all the time blindly.
Note: The .html suffix is regarded by the W3C as functionally meaningless and redundant, and should be avoided in URLs.
http://www.w3.org/Provider/Style/URI
Lets disect your URL in order to be more DRY (non-repetitive). Here is what you are starting with:
/catalog/games/consoles/playstation/adventure/oblivion
Really, the category adventure is redundant as the game can belong to multiple genres.
/catalog/games/consoles/playstation/oblivion
The next thing that strikes me is that consoles is also not needed. It probably isn't a good idea to differentiate between PC's and Console machines as a subsection. They are all types of machines and by doing this you are just adding another level of complexity.
/catalog/games/playstation/oblivion
Now you are at the point of making some decisions about your site. I would recommend removing the playstation category on your page, as a game can exist across multiple platforms and also the games category. Your url should look like:
/catalog/oblivion
So how do you get a list of all the action games for the Playstation?
/catalog/tags/playstation+adventure
or perhaps
/catalog/tags/adventure/playstation
The order doesn't really matter. You have to also make sure that tags is a reserved name for a product.
Lastly, I am assuming that you cannot remove the root /catalog due to conflicts. However, if your site is tiny and doesn't have many other sections then reduce everything to the root level:
/oblivion
/tags/playstation/adventure
Oh and if oblivion isn't a unique product just construct a slug which includes it's ID:
/1234-oblivion
Those all look fine (except for the one with the colon).
The key is what to do when they guess wrong -- don't send them to a 404 -- instead, take the words you don't know and send them to your search page results for that word -- even better if you can spell check there.
If you see the different pieces as targets then the product itself is just another target.
All targets should be accessable by target.html or only target.
catalog/games/consoles/playstation.html
catalog/games/consoles/playstation
catalog/games/consoles/playstation/adventure.html
catalog/games/consoles/playstation/adventure
catalog/games/consoles/playstation/adventure/oblivion.html
catalog/games/consoles/playstation/adventure/oblivion
And so on to make it consistent.
My 5 cents...
One problem is that your user's notion of a "group of product categories organized in a nice tree hierarchy" may match yours.
Here's a google tech talk by David Weinberger's "Everything is Miscellaneous" with some interesting ideas on categorizing stuff:
http://www.youtube.com/watch?v=x3wOhXsjPYM
#Lou Franco yeah either method needs a sturdy fallback mechanism and sending it to some sort of suggestion page or seach engine would be good candidates
#Stefan the problem with treating both as targets are how to distinguish them (like I described). At worst case scenario is that you first hit your database to see if there is a category which satisfies the path and if it doesn't then you check if there is a product which does. The problem is that for each product path you will end up making a useless call to the database to make sure its not a category.
#some yeah a delimiter could be a possible solution but then a .html suffix is more userfriendly and commonly known of.
i like /videogames/consolename/genre/title" and use the amount of /'s to distinguish between category or product. The only thing i would be worried about multi (or hard to distinguish) genre. I highly recommend no extension on title. You could also do something like videogames(.php)?c=x360;t=oblivion; and just guess the missing information however i like the / method as it looks more neat. Why are you adding genre? it may be easier to use the first letter of the title or just to do videogame/console/title/
My humble experience, although not related to selling games, tells me:
editors often don't use the best names for these "slugs", they don't chose them wisely.
many items belong (logically) to several categories, so why restrict them (technically) to a single category?
Better design item urls by ids, (i.e. /item/435/ )
ids are stable (generated by the db, not editable by the editor), so the url stands a much bigger chance at not being changed over time
they don't expose (or depend on) the organization of the objects in the database like the category/item_name style of urls does. What if you change the underlying design (object structure) to allow an item to belong to multiple categories? the category/item urls suddenly won't make sense anymore; you'll change your url design and old urls might not work anymore.
Labels are better than categories. That is to say, allowing an item to belong to several categories is a better approach than assigning one category to each item.
the problem with treating both as
targets are how to distinguish them
(like I described). At worst case
scenario is that you first hit your
database to see if there is a category
which satisfies the path and if it
doesn't then you check if there is a
product which does. The problem is
that for each product path you will
end up making a useless call to the
database to make sure its not a
category.
So what? There's no real need to make a hard distinction between products and categories, least of all in the URI, except maybe a performance concern over an extra database call. If that's really such a big deal to you, consider these two suggestions:
Most page views will presumably be on products, not categories. So doing the check for a product first will minimize the frequency with which you need to double up on the database lookups.
Add code to your app to display the time taken to generate each page, then go out to the nearest internet cafe (not your internal LAN!) with a stopwatch. Bring up some pages from your site and time how long each takes to come up. Subtract the time taken to generate the page. Also compare the time taken to generate one-database-lookup pages vs. two-database-lookup pages. Then ask yourself, when it takes maybe 1-2 seconds total to establish a network connection, generate the content, and download the content, does it really matter whether you're spending an extra 0.05 second or less for an additional database lookup or not?
Optimize where it matters, like making URLs that will be human-friendly (as in Chris Lloyd's answer). Don't waste your time trying to shave off the last possible fraction of a percent.

Resources