My question :
I need to know whether what I'm doing is the best way and, if it's not, what is.
The situation :
I have "Contacts" objects in an array. These contacts must be ordered alphabetically and can have multiple phone numbers. I'm splitting that array into 27 arrays of contacts, where each of them represents a letter of the alphabet. So I have all my "A" contacts, then "B", and so on.
During the splitting, I also add a reference to each contact in a dictionary, where the object is the contact and the key is his phone number.
Because one contact can have X phone numbers, the same contact can appear X times, under X different keys, in the dictionary. I need that so I can find any contact from any of his numbers.
All of the above works like a charm.
Now I need to compare all those numbers against my online database (note: I'm using Parse) to see whether some of these contacts are already users. If they are, they need to go in a specific section of my table view (the table view is just all the contacts, separated into letter sections, plus one "user" section). A contact cannot appear in both the user section AND a letter section: if a contact is a user, he must be moved out of his letter section.
What i'm asking vs What i'm doing :
Right now, I'm just re-looping over every array and comparing each element to all the users I've found online. This is a lot of looping and looks like a waste of time and resources.
What I would like to do: somehow clean my arrays of the users I've found, given that I have a reference to each contact object in my dictionary.
TL;DR:
My arrays :
Users in the first section, then contacts alphabetically:
[[user1, user2, user3, ...],[a1,a2,a3,...],[b1,b2,...],...]
My dictionary :
a1 - phone1
a1 - phone2
a1 - phone3
a2 - phone1
a3 - phone1
...
The ultimate question :
I can very easily find the contact object (since I have his number from my online DB). If I interact with the a1 from the dictionary, will it also change the a1 in the array of arrays?
More specifically, can I somehow REMOVE IT from the array, considering I don't know which array it is in?
I also add a reference of each contact in a dictionary, where the object is the contact, and the key is his phone number.
You need to be very careful with this approach. It is likely to have collisions. While cell phone numbers are often unique, sometimes they're shared. Home and work numbers are often shared. Phone numbers get reassigned, so your database can wind up with duplicates that way, too. And sometimes people just enter mistaken data. You have to make sure your system behaves reasonably and consistently in those cases. Many behaviors are fine, but "pick one at random" is generally not.
You should think carefully here about what is your model and what is your presentation. One data structure should be the single source of truth. Usually that's going to be your big list of contacts. That has nothing at all to do with how it's displayed. We call this the model. Often it's kept track of as an NSSet since order doesn't matter.
Above that you should have a "view model" that deals with grouping and sorting issues. It is not "truth." You should be willing to throw it away anytime the underlying data changes extensively. Its data structures should point to the underlying model, and should be stored in a way that exactly matches what the table view wants. Keeping the model and the view model separate is one of the best ways to keep your complexity under control. Then you know that there is exactly one place that data can change (the model), and everything else just reacts to that.
Regarding your partitioning problem: so you have a list of contacts and you want to separate them into two groups based on whether any of their phone numbers appear in another list. If your total list is only a few dozen entries long, frankly it doesn't matter how you do this. Even O(n^2) is fine for small enough n. Focus on making it simple and reliable first, then profile with Instruments to see where the real bottlenecks are.
That said, usually the fastest way to determine set intersection is to sort both sets and walk through them both at the same time. So you'd create a "contacts" array of "phone number + contact pointer" and a "users" array of just phone numbers. Sort them both by phone number. Walk over them, comparing the current element of each list, and then incrementing the index of the smaller one. If no match, put the contact in one list. If a match, put it in the other.
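To make the sorted walk concrete, here is a minimal sketch in Python rather than Objective-C. The function name and the (phone number, contact name) pair layout are assumptions for illustration. Because one contact can own several numbers, the sketch collects the matched contacts first and derives the non-users at the end, instead of filing each entry the moment it fails to match.

```python
def partition_by_sorted_walk(contact_entries, user_numbers):
    """contact_entries: list of (phone_number, contact_name) pairs.
    user_numbers: phone numbers of known users.
    Returns (users, non_users) as sets of contact names."""
    entries = sorted(contact_entries)      # sort by phone number
    numbers = sorted(set(user_numbers))    # sorted, duplicates removed
    matched = set()
    i = j = 0
    while i < len(entries) and j < len(numbers):
        phone, contact = entries[i]
        if phone == numbers[j]:
            matched.add(contact)
            i += 1  # keep j: another entry may share this number
        elif phone < numbers[j]:
            i += 1  # advance whichever side is smaller
        else:
            j += 1
    everyone = {contact for _, contact in entries}
    return matched, everyone - matched
```

After the two sorts, the walk itself touches each element at most once.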
But I'd probably just stick all the phone numbers in a set and use member: to look them up. It's just usually easier.
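A rough sketch of the set-based version, again in Python for brevity (NSSet's member: plays the role of `in` here). The dictionary layout of a contact and both function names are made up for the example, and names are assumed to be non-empty. It also shows the view-model idea: the letter sections are rebuilt from the model every time, so nothing ever has to be removed from an existing array.

```python
def split_contacts(contacts, user_numbers):
    """Split contacts into (users, others) by phone-number membership."""
    user_set = set(user_numbers)  # average O(1) membership checks
    users, others = [], []
    for contact in contacts:
        if any(number in user_set for number in contact["phones"]):
            users.append(contact)
        else:
            others.append(contact)
    return users, others


def build_sections(contacts, user_numbers):
    """Rebuild the whole view model: a users section plus letter sections."""
    users, others = split_contacts(contacts, user_numbers)
    sections = {}
    for contact in sorted(others, key=lambda c: c["name"].lower()):
        first = contact["name"][0].upper()
        key = first if first.isalpha() else "#"  # non-letters share one bucket
        sections.setdefault(key, []).append(contact)
    return users, sections
```

Whenever the contact list or the list of user numbers changes, call `build_sections` again and reload the table from the result.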
I'm looking to build an app that functions like a dating app:
User A fetches All Users.
User A removes Users B, C, and D.
User A fetches All Users again - excluding Users B, C, and D.
My goal is to perform a query that does not read the User B, C, and D documents in my fetch query.
I've read into array-contains-any, array-contains, not-in queries, but the 10 item limit prevents me from using these as options because the "removed users list" will continue to grow.
2 workaround options I've mulled over are...
Performing a paginated fetch on All User documents and then filtering out on the client side?
Store all User IDs (A, B, C, D) on 1 document in an array field, fetch the 1 document, and then filter client side?
Any guidance would be extremely appreciated either on suggestions around how I store my data or specific queries I can perform.
You can do it the other way around.
Instead of keeping a removed or ignored array on the current user's document, keep an ignoredBy (or removedBy) array on each of the other users' documents, and add the current user to it.
And when you fetch the users from the users collection, you just have to check if the requesting user is part of the array ignoredBy. So you don’t have tons of entries to check in the array, it is always just one.
Firestore may get a little pricey with the Tinder model but you can certainly implement a very extensible architecture, well enough to scale to millions of users, without breaking a sweat. So the user queries a pool of people, and each person is represented by their own document, this much is obvious. The user must then take an action on each person/document, and, presumably, when an action is taken that person should no longer reappear in the user's queries. We obviously can't edit the queried documents because there could be millions of users and that wouldn't scale. And we shouldn't use arrays because documents have byte limits and that also wouldn't scale. So we have to treat a collection like an array, using documents as items, because collections have no known limit to how many documents they can contain.
So when the user takes an action on someone, consider creating a new document in a subcollection in the user's own document (user A, the one performing the query) that contains the person's uid, and perhaps a boolean to determine if they liked or disliked that person (i.e. liked: true), and maybe a timestamp for UI purposes. This new document is the item in your limitless array.
When the user later performs another query, those same users are going to reappear in the results, which you need to filter out. You have no choice but to check if each person's uid is in this subcollection. If it is, omit the document and move to the next. But if your UI is configured like Tinder's, where there isn't a list of people to scroll through but instead cards stacked on top of each other, this is no big deal. The user will only be presented with one person at a time and they won't know how many you're filtering out behind the scenes. With a paginated list, the user may see odd behavior like uneven pages. The drawback is that you're now paying double for each query. Each query will cost you the original fetch and the subcollection-check fetch. But, hey, with this model you can scale to millions of users without ever breaking a sweat.
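The filtering loop can be sketched without any Firestore calls at all. The following is a plain in-memory Python model, not Firestore code: `SwipeStore` stands in for the per-user subcollection, each slice of `all_uids` stands in for one paginated query, and every name in it is hypothetical.

```python
class SwipeStore:
    """In-memory stand-in for a per-user subcollection of actions."""

    def __init__(self):
        self._swipes = {}  # user uid -> {target uid: liked flag}

    def record(self, user_uid, target_uid, liked):
        self._swipes.setdefault(user_uid, {})[target_uid] = liked

    def has_swiped(self, user_uid, target_uid):
        return target_uid in self._swipes.get(user_uid, {})


def next_candidates(store, user_uid, all_uids, wanted, page_size=2):
    """Page through candidates, skipping people already acted upon,
    until `wanted` fresh candidates are collected or the pool runs out."""
    results = []
    for start in range(0, len(all_uids), page_size):
        page = all_uids[start:start + page_size]  # one "paginated query"
        for uid in page:
            if uid == user_uid or store.has_swiped(user_uid, uid):
                continue  # one existence check per fetched candidate
            results.append(uid)
            if len(results) == wanted:
                return results
    return results
```

In a real backend, each `has_swiped` call would be the second read that the answer mentions paying for.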
(my reason for asking this question is based on having read this answer, which made me rethink my current setup)
I am currently developing a Ruby on Rails application in which there are many languages, each of which has a dictionary of base words attached to it, as well as a list of the words that map to each base word. The way I currently have it set up, there is a base_words table that contains the base_word as a string, along with the language_id as a foreign key. There is also a words table, each row of which contains a word string, along with the base_word_id as a foreign key. There is also an indexed language_id on each row of words, although I'm almost positive that this is superfluous given the language_id on base_words, so I'm planning to take it off (although this could be a bad assumption on my part).
In sum, contrary to the answer I mentioned at the beginning, the tables are not separated by language, because I've reasoned that I can simply pull out a language's words programmatically when the time comes. However, my application will also have translation(s) associated with each base word (as did the answer I referenced), so I'm doubting my structure: each translation will itself be a base_word in the same table, which means a translation would really be just the id of another base word in that table. This may be completely fine, or it may not be. I have no clue (this is my first ever programming project).
Is this ok? Do I need to separate my base_words into separate tables for each language, or can I leave it all in one table?
Another example: I also need to store many phrases for each language, along with their translations. Should I have one table where each row has the appropriate translation of the phrase, or one table where each row contains simply one phrase and a language_id, or multiple tables (one for each language)?
Regards,
Michael
As in the other scenario, you'll have a translations table. There is no technical reason it couldn't have multiple foreign keys to base_words (a source_word_id and target_word_id, perhaps). So yes, you can absolutely store all your words in one table. There are some minor side effects involved with translations being directional relationships: it becomes possible to have translations which only work one way, and there will be many pairs of entries with opposite source and target. Neither of these is much of a worry: the first is even potentially desirable in order to represent words with double meanings in one language but not the other, and as for the second, space is cheap and indexing is easy.
You are correct that you do not need words.language_id, so long as you always join base_words when you're querying words and the language matters. This obviously changes if you have a use case where it makes sense to leave base_words out, but that scenario sounds unlikely based on what you describe.
As for phrases: why should they be handled any differently than base_words?
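For illustration, the single-table layout with a directional translations table might look like the sketch below. It uses SQLite through Python's standard sqlite3 module rather than Rails migrations. The table names, source_word_id and target_word_id come from the discussion above; the languages table, its name column and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE languages (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE base_words (
    id INTEGER PRIMARY KEY,
    language_id INTEGER NOT NULL REFERENCES languages(id),
    base_word TEXT NOT NULL
);
CREATE TABLE words (              -- note: no language_id column here
    id INTEGER PRIMARY KEY,
    base_word_id INTEGER NOT NULL REFERENCES base_words(id),
    word TEXT NOT NULL
);
CREATE TABLE translations (       -- two foreign keys into the same table
    source_word_id INTEGER NOT NULL REFERENCES base_words(id),
    target_word_id INTEGER NOT NULL REFERENCES base_words(id),
    PRIMARY KEY (source_word_id, target_word_id)
);
""")
conn.executemany("INSERT INTO languages (id, name) VALUES (?, ?)",
                 [(1, "English"), (2, "Spanish")])
conn.executemany("INSERT INTO base_words (id, language_id, base_word) VALUES (?, ?, ?)",
                 [(1, 1, "speak"), (2, 2, "hablar")])
conn.executemany("INSERT INTO words (base_word_id, word) VALUES (?, ?)",
                 [(1, "spoke"), (2, "hablo")])
# Directional pairs: one row per direction.
conn.executemany("INSERT INTO translations VALUES (?, ?)", [(1, 2), (2, 1)])


def translations_of(word, language):
    """Resolve a word's language by joining through base_words."""
    rows = conn.execute("""
        SELECT target.base_word
        FROM words
        JOIN base_words AS source ON source.id = words.base_word_id
        JOIN languages ON languages.id = source.language_id
        JOIN translations ON translations.source_word_id = source.id
        JOIN base_words AS target ON target.id = translations.target_word_id
        WHERE words.word = ? AND languages.name = ?
    """, (word, language)).fetchall()
    return [row[0] for row in rows]
```

Phrases could follow the same pattern: one phrases table with a language_id, plus a pair table linking source and target phrases.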
Delphi XE6. I'm looking to implement a limited style of search: an edit field where the user enters a business name to look up. I need to allow the user to enter multiple words, or parts of multiple words. For example, for the business "First Bank of Kansas", the user should be able to enter "Fir Kan" and get a match. This means an inverted-index type of structure: a list of each unique word, each with a list of the records it appears in (document ID, primary key ID, etc., which is an integer). I am struggling with WHAT type of structure to use. I have approximately 250,000 business names, which contain 43,500 unique words. A word's occurrence count varies from 1 to several thousand (company, corporation, etc.). I have some requirements...
1). Assume the user enters BAN. I need to find ALL words that start with BAN. I need to return BANK, BANKER, etc... This means that whatever structure I use, I have to be able to find BAN and then move to the next alphabetic entry... and keep moving to the next until I find a value that does NOT start with BAN. This eliminates any type of HASH structure, correct?
2). I obviously want this to be fast. HASH is the fastest, but I can't use this, correct? See requirement 1.
3). Each entry in this structure needs to be able to hold a list of integers. If I end up going with a LinkedList, then each element has to hold a list of Integers.
4). I need to be able to save and load this structure. I don't want to have to build it each time I use it.
Whatever I end up with, it appears to have to be a NESTED structure, a higher level list (LinkedList?) with each node being an Integer List.
What am I looking for? What do commercial products use? Outlook and others have search capabilities.
Every word is linked to a specific set of IDs, each representing a business name, right?
I recommend a binary search tree, because a search normally takes O(log n), which is quite fast. Especially if business names change at runtime, a self-balancing tree such as an AVL tree should do well, although it's quite some work to implement yourself. There should be many ready-to-use units for binary trees on the internet.
For each successful search for a word in your tree data structure, you should take their list of IDs and aggregate those grouped by the entered word they succeeded for.
As the last step you take all those aggregated lists of IDs and do an intersection.
There should only be IDs left which are fitting to all entered words. Those IDs are referencing the searched business names.
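Here is a small sketch of that lookup-then-intersect flow. It is written in Python rather than Delphi, and it swaps the tree for a sorted word list searched with binary search, which gives the same "find the first match, then step forward alphabetically" behaviour for an index that is mostly read. All names are illustrative.

```python
from bisect import bisect_left


def build_index(businesses):
    """businesses: dict of business ID -> name.
    Returns (sorted unique words, word -> set of business IDs)."""
    postings = {}
    for business_id, name in businesses.items():
        for word in name.upper().split():
            postings.setdefault(word, set()).add(business_id)
    return sorted(postings), postings


def ids_for_prefix(words, postings, prefix):
    """Union of the IDs of every word that starts with prefix."""
    prefix = prefix.upper()
    ids = set()
    i = bisect_left(words, prefix)  # first word >= prefix, O(log n)
    while i < len(words) and words[i].startswith(prefix):
        ids |= postings[words[i]]
        i += 1                      # walk on until the prefix stops matching
    return ids


def search(words, postings, query):
    """IDs of businesses matching every prefix in the query."""
    result = None
    for prefix in query.split():
        ids = ids_for_prefix(words, postings, prefix)
        result = ids if result is None else result & ids
    return result or set()
```

Both structures are plain lists and dictionaries of integers and strings, so they can be saved and reloaded rather than rebuilt on every start.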
This is less a technical how-to question and more a question about which algorithmic approach to use, I guess.
I have a Photo model, which has an id, created_at and the image itself.
I want to allow the user to order their photos in whatever order they feel like. So I guess I can add an attribute which will note the order somehow, and then reorder it by that column. But how would I build that column in a way that is efficient?
My options as I see it are:
a simple integer to denote the order. so 1,2,3,4,5. If the user chooses to put photo#5 before photo#2, I need to reassign all photos with a new sequential numbering to match the new order. With many photos, and drag and drop, this could have a lot of writes to the DB, and could be slow and inefficient
Make it so that any photo that is first, will get a higher number, so when the user puts photo#5 before photo#2, #5 will get a higher number than #2 but smaller than #1, but this can also get messy pretty quick..
Allow only "bump to first place or bump to last place" and in the last place make it a larger number than the previous last, and in the first place make it a smaller number than the previous first place. seeing that users won't have millions of photos, using an integer could work.
Linked list: this could technically work, but only in the very limited situation where I want to use all the photos. If I need a custom-ordered subset of the photos, this won't work. I'd prefer a way to compare two photos with <=> in O(1) and know immediately how to sort them, without going through the whole list (which would be O(n^2)).
Is there a better way to do this?
I have done exactly the same thing in RoR. I think the approach you choose depends on which operation you will perform on the model most frequently.
I used the database to implement a doubly linked list. That means your Photo model has two more attributes, prev and next: prev is the id of the previous Photo item, and next is the id of the next Photo item. If that is still not clear, check any data structures book on doubly linked lists.
For this data structure, complexity for inserting is O(1), and querying is O(n).
Another approach is the one you mentioned in item 1: a simple integer to denote the order (1, 2, 3, 4, 5, ...). There, inserting is O(n) and querying is O(1).
Thus, if you insert more often than you query, choose my approach. Otherwise, choose your first approach.
There are two ways that I can think of to do this. Two basic data structures are arrays and linked lists. The main difference between the two is that an item in an array is referenced by its index, whereas an item in a linked list is referenced by what's in front of it and what's behind it.
Therefore you could have a location (or number, as you mentioned) that corresponds to the photo's position on the page or screen, store this number in the model, and change it whenever the user moves the image. To do this efficiently, you would not want to push every image back or forward by one, but instead only allow swapping. Two photos can swap places easily.
Another way to do this would be to have each photo have a pointer that points to the next photo, in essence creating a linked list. Then you would print out all the photos until you reach a null pointer. This would be very easy to move one photo around.
To insert a new photo, you set the pointer index of the photo being inserted to the next photo in the list, and then you change the photo that was pointing to the next photo, to the photo being inserted.
1 -> 2 -> 3 #insert 4 after 1, before 2
1 -> 2 -> 3 4 -> 2 # point 4's pointer to 2
1 -> 4 -> 2 -> 3 # change 1's pointer to 4
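The same three steps as a small Python sketch. Here `next_of` is a hypothetical in-memory stand-in for a `next` column on the photos table, mapping each photo id to the id that follows it (or None at the end of the list).

```python
def insert_after(next_of, new_id, after_id):
    """Insert new_id right after after_id by updating two pointers."""
    next_of[new_id] = next_of[after_id]  # point 4's pointer to 2
    next_of[after_id] = new_id           # change 1's pointer to 4


def in_order(next_of, head_id):
    """Follow the pointers from the head until a null pointer is reached."""
    order = []
    current = head_id
    while current is not None:
        order.append(current)
        current = next_of[current]
    return order
```

Starting from `{1: 2, 2: 3, 3: None}`, calling `insert_after(next_of, 4, 1)` changes only two entries, however long the list is.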
I was reading through the Rails Tutorial (http://ruby.railstutorial.org/book/ruby-on-rails-tutorial#sidebar-database_indices) but got confused by its explanation of database indices. Basically, the author proposes that rather than searching in O(n) time through a list of emails (for login), it's much faster to create an index, and gives the following example:
To understand a database index, it’s helpful to consider the analogy
of a book index. In a book, to find all the occurrences of a given
string, say “foobar”, you would have to scan each page for “foobar”.
With a book index, on the other hand, you can just look up “foobar” in
the index to see all the pages containing “foobar”.
source:
http://ruby.railstutorial.org/chapters/modeling-users#sidebar:database_indices
So what I understand from that example is that words can be repeated in text, so the "index page" consists of unique entries. However, in the railstutorial site, the login is set such that each email address is unique to an account, so how does having an index make it faster when we can have at most one occurrence of each email?
Thanks
Indexing isn't (much) about duplicates. It's about order.
When you do a search, you want to have some kind of order that lets you (for example) do a binary search to find the data in logarithmic time instead of searching through every record to find the one(s) you care about (that's not the only type of index, but it's probably the most common).
Unfortunately, you can only arrange the records themselves in a single order.
An index contains just the data (or a subset of it) that you're going to search on, plus pointers (of some sort) to the records containing the actual data. This allows you to (for example) search on as many different fields as you care about, and still be able to do a binary search on each of them, because each index is kept in order by its own field.
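As a toy illustration only (a real database would use something like a B-tree on disk, not a Python list), an index over a unique email column can be modelled as a sorted list of (email, row position) pairs searched with binary search. The rows and names below are made up.

```python
from bisect import bisect_left

# The "table": rows stay in insertion order, not sorted by email.
rows = [
    {"id": 1, "email": "zoe@example.com", "name": "Zoe"},
    {"id": 2, "email": "adam@example.com", "name": "Adam"},
    {"id": 3, "email": "mia@example.com", "name": "Mia"},
]

# The "index": just the searched column plus a pointer back to the row.
email_index = sorted((row["email"], position) for position, row in enumerate(rows))
index_keys = [email for email, _ in email_index]


def find_by_email(email):
    """Binary search in the index, then follow the pointer to the row."""
    i = bisect_left(index_keys, email)
    if i < len(index_keys) and index_keys[i] == email:
        return rows[email_index[i][1]]
    return None
```

Even though every email appears only once, the lookup takes O(log n) comparisons instead of a scan over all rows, which is the point about order rather than duplicates.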
Because the index, both in the DB and in the book example, is sorted alphabetically. The raw table / book is not. Then think: how do you search an index, knowing it is sorted? I guess you don't start reading at "A" and continue up to your point of interest. Instead you skip roughly to that point and start searching from there. Basically, a DB can do the same with an index.
It is faster because the index contains only values from the column in question, so it is spread across a smaller number of pages than the full table. Also, indexes are usually stored in a structure built for lookup, most commonly a B-tree (some databases also offer hash indexes), which limits the number of reads required.