I recently uploaded 37,000 strings of data to Firebase (the names of all cities in the USA), only to find that it takes way too long to go through each one using the observe .childAdded method to load it into a basic table view.
How can I get the data to my app faster? The data shouldn't ever change, so is there an alternative approach?
There is no way to load the same data faster. Firebase isn't artificially throttling your download speed, so the time it takes to read the 37,000 strings is the time it takes to read the 37,000 strings.
To make your application respond faster to the user, you will have to load less data. And since it's unlikely your user will read all 37,000 strings, a good first option is to only load the data that they will see.
Since you're describing an autocomplete scenario, I'd first look at using a query to retrieve only the child nodes that match what the user has already typed. In Firebase that'd be something like this:
let query = ref.queryOrdered(byChild: "name")
    .queryStarting(atValue: "stack")
    .queryEnding(atValue: "stack\u{f8ff}")
This code takes (on the server) all nodes under ref and orders them by name. It then finds the first one starting with stack and returns all child nodes until it finds one that no longer starts with stack.
With this approach the filtering happens on the server, and the client only has to download the data that matches the query.
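To wire that up, attach an observer to the query built above; a minimal sketch, assuming FirebaseDatabase is imported and ref points at the node holding the city names:

query.observeSingleEvent(of: .value) { snapshot in
    // snapshot.children preserves the query order
    for case let child as DataSnapshot in snapshot.children {
        print(child.key)
    }
}

observeSingleEvent reads the matching children once; use observe(.value) instead if you want to keep receiving updates as the data changes.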
This is something easily solvable using Algolia. Algolia can search large data sets without taking much time at all. This way, you query Algolia and never need to look at the Firebase database.
In a Cloud Function, listen for new child nodes at the location where you keep your city names, and when that function fires, add the new string to your Algolia index.
You can follow the Algolia docs here: Algolia Docs
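On the app side, querying the index is then just a few lines; a minimal sketch, assuming the AlgoliaSearchClient Swift package, with the index name and keys as placeholders:

import AlgoliaSearchClient

let client = SearchClient(appID: "YourAppID", apiKey: "YourSearchOnlyAPIKey")
let index = client.index(withName: "cities")

// Ask Algolia for cities matching what the user has typed so far.
index.search(query: Query("spring")) { result in
    if case .success(let response) = result {
        print(response.hits.count)
    }
}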
Related
I have a rather long and complex paginated query that I'm trying to optimize. In the worst case I first have to execute the data query in one call to Neo4j, and then execute pretty much the same query again for the count. Everything runs in one transaction. Still, I don't like the overall execution time, so I extracted the part common to both the data and count queries and execute it in a first call. This common query returns the IDs of nodes, which I then pass as parameters to the remainder of the data and count queries. Now everything works much faster. The one thing I don't like is that the common query can sometimes return quite a large set of IDs: 20k to 50k Long IDs.
So my question is: since I'm doing this in one transaction, is there a way to keep such a set of IDs somewhere in Neo4j between the common query and the data/count queries, and simply refer to it in the subsequent queries, without moving it between the app JVM and Neo4j?
Also, am I crazy for doing this, or is this a good approach to optimize a complex paginated query?
Only with a custom procedure.
Otherwise you'd need to return them.
But it's uncommon to provide both the data and the counts (even Google doesn't provide "real" counts).
One way is to just stream the results with the reactive driver as long as the user scrolls.
Otherwise I would just query for pageSize+1 and return "more than pageSize results".
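A minimal sketch of that pattern (shown in Swift for brevity; fetchPage is a hypothetical stand-in for running your query with LIMIT pageSize + 1):

func loadPage<T>(pageSize: Int, fetchPage: (Int) -> [T]) -> (items: [T], hasMore: Bool) {
    // Ask for one extra row; it only tells us whether another page exists,
    // so no separate count query is needed at all.
    let rows = fetchPage(pageSize + 1)
    let hasMore = rows.count > pageSize
    return (Array(rows.prefix(pageSize)), hasMore)
}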
If you stream the IDs back (rather than collecting them into an aggregation), you can start using the IDs you've already received to issue your follow-up queries, even in parallel.
Regarding the question asked here:
Suppose we have ProductCreated and ProductRenamed events, both of which contain the title of the product. Now we want to query EventStoreDB for all events of type ProductCreated and ProductRenamed with a given title. I want all these events in order to check whether any product in the system has been created with, or renamed to, the given title, so that I can throw a duplicate-title exception in the domain.
I am using MongoDB to build UI reports from all the published events, and everything is fine there. But to check some invariants, such as uniqueness of values, I have to query the event store for certain events matching some criteria and, by iterating over them, decide whether there is a product that was created with the same title and never renamed, or a product renamed to the same title.
For such queries, the only mechanism the event store provides is creating a one-time projection with the proper JavaScript code, which filters the required events and emits them to a new stream. Then all I have to do is fetch the events from the newly generated stream that the projection fills.
Now the odd thing is: projections are great for subscriptions and for generating new streams, but they seem awkward for real-time queries. Immediately after I create a projection with the HTTP API, I check the resulting stream for the query result, but it seems the workers haven't had a chance to produce it yet, and I get a 404 response. Only after waiting a few seconds does the new stream appear and get filled with the result.
There are too many things wrong with this approach:
First, it seems that if the event store is filled with millions of events across many streams, it won't be able to process and filter all of them into the resulting stream immediately. It doesn't even create the stream immediately, let alone populate it, so I have to wait for some time and poll for the result, hoping the projection is done.
Second, I have to fetch multiple times and issue multiple HTTP GET requests, which seems slow. The new JVM client is not ready yet.
Third, I have to delete the resulting stream once I'm done with the result; failing to do so will leave the event store with millions of orphaned query-result streams.
I wish I could pass the JavaScript to some API and get the result page by page, as when querying MongoDB, without worrying about the projection, the new streams, and the timing issues.
I have seen a query section in the Admin UI, but I don't know what it's for, and unfortunately the documentation doesn't help much.
Am I expecting the event store to do something that is impossible?
Do I have to create a read model inside the bounded context for doing such checks?
I am using my events to rehydrate the aggregates, and I'd like to use the same events for such simple queries without bringing in other techniques.
I believe it would not be a separate bounded context since the check you want to perform belongs to the same bounded context where your Product aggregate lives. So, the projection that is solely used to prevent duplicate product names would be a part of the same context.
You can indeed use a custom projection to check it but I believe the complexity of such a solution would be higher than having a simple read model in MongoDB.
It is also fine to use an existing projection, if you have one, to do the check. It might not be what you would otherwise prefer if the aim of the existing projection is to show things in the UI.
For the collection you use for the duplicates check, you can limit the document schema to the id alone (a string), which would be the product title. Since MongoDB collections are automatically indexed by _id, you won't need any additional indexes to support the duplicate-check query. When the product gets renamed, you'd delete the document for the old title and add a new one.
Again, there will be a small time window in which a duplicate can slip in. It's then up to the business to decide whether the concern is real (most of the time it isn't) and what the consequence would be if it happens one day. You'd be able to spot the duplicate quite easily when projecting events and decide what to do about it.
Practically, when you have such a projection, all it takes is to build a simple domain service bool ProductTitleAlreadyExists.
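A minimal sketch of those pieces (in Swift for illustration; the in-memory set stands in for the MongoDB collection, and all names are made up):

enum ProductEvent {
    case created(id: String, title: String)
    case renamed(id: String, oldTitle: String, newTitle: String)
}

final class ProductTitleReadModel {
    // Stand-in for the MongoDB collection whose _id is the product title.
    private var titles: Set<String> = []

    // Projection logic: keep the set of titles current as events arrive.
    func apply(_ event: ProductEvent) {
        switch event {
        case .created(_, let title):
            titles.insert(title)
        case .renamed(_, let oldTitle, let newTitle):
            titles.remove(oldTitle)
            titles.insert(newTitle)
        }
    }

    // The domain service from the answer: bool ProductTitleAlreadyExists.
    func productTitleAlreadyExists(_ title: String) -> Bool {
        titles.contains(title)
    }
}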
There are at least 2 main collection types used in Realm:
List
Results
The relevant description from the documentation on a Results object says:
Results is an auto-updating container type in Realm returned from
object queries.
Because I want my UITableView to respond to any changes on the Realm Object Server, I really think I want my UITableView to be backed by a Results object. In fact, I think I would always want a Results object to back my UI for this reason. This is only reinforced by the description of a List object in the documentation:
List is the container type in Realm used to define to-many
relationships.
Sure seems like a List is focused on data modeling... So, being new to Realm and just reading the API, I'm thinking the answer is to use the Results object, but the tutorial (Step 5) uses the List object while the RealmExamples sample code uses Results.
What am I missing? Should I be using List objects to back my UITableViews? If so, what are the reasons?
Short answer: use a List if one already exists that closely matches what you want to display in your table view, otherwise use a Results.
If the data represented by a List that's already stored in your Realm corresponds to what you want to display in your table view, you should certainly use that to back it. Lists have an interesting property in that they are implicitly ordered, which can sometimes be helpful, like in the tutorial you linked to above, where a user can reorder tasks.
Results contain the results of a query in Realm. Running this query typically has a higher runtime overhead than accessing a List, by how much depends on the complexity of the query and the number of items in the Realm.
That being said, mutating a List has performance implications too since it's writing to the file in an atomic fashion. So if this is something that will be changing frequently, a Results is likely a better fit.
You should use Results to back your UITableView, since Results is auto-updating. List is used to link child models inside a Realm model, whereas Results is what you get when you query Realm objects. You should also add a Realm notification token so you know when the Results are updated and can take the necessary action (reload the table view, etc.). Realm notifications are documented here: https://realm.io/docs/swift/latest/#notifications
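A minimal sketch of that wiring, using the current RealmSwift API and a hypothetical City model:

import RealmSwift
import UIKit

// Hypothetical model for illustration.
class City: Object {
    @objc dynamic var name = ""
}

class CitiesViewController: UITableViewController {
    var results: Results<City>!
    var token: NotificationToken?

    override func viewDidLoad() {
        super.viewDidLoad()
        let realm = try! Realm()
        results = realm.objects(City.self).sorted(byKeyPath: "name")
        // Fires on every change to the matching objects, including changes
        // synced from the Realm Object Server; fine-grained row updates are
        // also available from the change description if you prefer.
        token = results.observe { [weak self] _ in
            self?.tableView.reloadData()
        }
    }

    deinit {
        token?.invalidate()
    }
}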
P.S. The data in the RealmExamples sample is just static, so no changes are observed.
I'm a bit confused on the proper way to query Firebase with GeoFire results.
Reading the directions for GeoFire, I should keep locations and user data separate in the tree. So I have the following:
-location
  - userID
    - g
    - l
      - 0:lat
      - 1:long
-user
  - userID
    - ...
Using a GeoFire query, I have an array of userIDs that are nearby. This is where I get confused: I don't see any method for Firebase to query an array of items. I would like to retrieve the user data for every user in the array.
I could loop through the list and make multiple requests, but that seems to be an inefficient way to get the desired results.
I don't see any methods for Firebase to query an Array of items. I would like to retrieve the user data for every user in the array.
There indeed isn't a call for that.
I could loop through the list and make multiple requests, but that seems to be an inefficient way to get the desired results.
While it may seem inefficient, it actually is not. Firebase is quite efficient at loading multiple items, since it retrieves all of them over the same web socket connection and pipelines the requests.
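For example, a minimal sketch in Swift, assuming the /user branch from the question:

import FirebaseDatabase

let usersRef = Database.database().reference(withPath: "user")
let userIDs = ["uid1", "uid2"] // in practice, the IDs returned by the GeoFire query

// All of these reads share one web socket, so they are pipelined rather
// than paying a separate round trip per user.
for userID in userIDs {
    usersRef.child(userID).observeSingleEvent(of: .value) { snapshot in
        guard let user = snapshot.value as? [String: Any] else { return }
        print(user)
        // merge into your data source and reload the affected row here
    }
}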
For a more elaborate explanation, see Speed up fetching posts for my social network app by using query instead of observing a single event repeatedly
Currently I'm thinking about adding a json array column (I'm using Postgres) and just pumping log messages for the object into this attribute. I want to log progress: the object is an import report that does a lot of work and takes a while, so it's useful to have a sense of what's currently happening (how many rows have been imported, how many rows have been normalized, and so on).
The other option is to add one of the gems that let you watch logs streamed in a view, but I think this is less useful, since what I'm looking for is the history of this specific object.
Using a json column or json[] (PostgreSQL array of json) is a very bad idea for logging.
Each time you update it, the whole column contents must be read, modified in memory, and written out again in their entirety.
Instead, create a log table for objects of this kind, with a foreign key to the table being logged and a timestamp on each entry. Insert a row per log message.
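For example, a sketch of such a table (all table and column names are made up; adjust types to match your schema):

CREATE TABLE import_report_logs (
    id         bigserial PRIMARY KEY,
    report_id  bigint NOT NULL REFERENCES import_reports(id),
    logged_at  timestamptz NOT NULL DEFAULT now(),
    message    text NOT NULL
);

-- Each progress message is one cheap INSERT; nothing gets rewritten.
INSERT INTO import_report_logs (report_id, message)
VALUES (42, 'normalized 1000 of 37000 rows');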
BTW, if the report runs in a single transaction, other clients won't be able to see any of the log rows until the whole transaction commits, in which case it won't be good for progress monitoring; but neither will your original idea. You'd need to use NOTICE messages instead.