I wanted to create an application that will be able to run on multiple machines and store some information. I would then be able to query any given instance of it (knowing its address) and get back the data.
If that instance does not have the data, it should be able to query the other instances to see whether one of them has it.
The hard thing with this is that new instances might be opened or closed at any time.
During my earlier tests what I did was have static connections and each time one of those came online it would report the data it held to the others. This is not practical though.
What would be a good way to achieve this?
I am currently implementing a web application in .NET Core (C#) using Entity Framework. While working on the project I have run into quite a few challenges, but I will start with the ones I think are most important. My questions are as follows:
Instead of frequently loading data from the database, I keep a set of static objects that mirror the data in the database. However, it is tedious and error-prone to ensure that every change (adding, deleting, or modifying objects) is saved to the database in real time. Is there a good example or any advice I can refer to in order to improve this approach?
Another thing is that the values of some objects' properties will change on the fly according to the values of other objects' properties, something like a spreadsheet where a cell's value changes automatically when the cell its formula refers to changes. I do not have a solution for this yet, and I would appreciate any example I can refer to. This will also add another layer of complexity when syncing the in-memory changes to the database.
At the moment, I am unsure if there is any better approach. Appreciate if anyone can help. Thanks!
Basically, you're facing a problem that's called eventual consistency. Something changes and two or more systems need to be aware at the same time. The problem here is that both changes need to be applied in order to consider the operation successful. If either one fails, you need to know.
In your case, I would use Azure Service Bus. You can create queues and put messages on a queue, and an Azure Function would handle these queue messages. You would create two queues, one for database updates and one for the in-memory update (I think changing this to a cache service may be something to think of). The advantage of these queues is that you can easily drop messages on them from anywhere. Because you mentioned the objects are going to evolve, you may need to update these objects either in the database or in memory (cache).
Once you've done that, I'd create a topic, with two subscriptions. One forwarding messages to Queue 1, and the other to Queue 2. This will solve your primary problem. In case an object changes, just send it to the topic. Both changes (database and memory) will be executed automagically.
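The topic-with-two-subscriptions idea can be sketched in a few lines. This is only an in-process stand-in written for illustration; it does not use the Azure Service Bus SDK, and the names `Topic`, `drain`, `database` and `cache` are hypothetical.

```python
from collections import deque


class Topic:
    """In-process stand-in for a publish/subscribe topic (not the Azure SDK)."""

    def __init__(self):
        self._subscriptions = []

    def subscribe(self, queue):
        # Each subscription forwards every published message to its own queue.
        self._subscriptions.append(queue)

    def publish(self, message):
        for queue in self._subscriptions:
            queue.append(dict(message))  # each queue gets its own copy


def drain(queue, handler):
    """Handle every pending message; return the messages whose handler failed."""
    failed = []
    while queue:
        message = queue.popleft()
        try:
            handler(message)
        except Exception:
            failed.append(message)  # candidates for an alert or a retry
    return failed


database = {}  # hypothetical persistent store
cache = {}     # hypothetical in-memory copy


def apply_to_database(message):
    database[message["id"]] = message


def apply_to_cache(message):
    cache[message["id"]] = message


db_queue, cache_queue = deque(), deque()
topic = Topic()
topic.subscribe(db_queue)     # subscription forwarding to queue 1
topic.subscribe(cache_queue)  # subscription forwarding to queue 2

# One publish reaches both queues; each handler applies it independently.
topic.publish({"id": 1, "name": "widget"})
db_failures = drain(db_queue, apply_to_database)
cache_failures = drain(cache_queue, apply_to_cache)
```

The list returned by `drain` is where alerting would hook in: a non-empty list means a message was received but not applied.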
The only problem you have now is that you mentioned you wanted to update the database in real time. With this scenario, you're going to have to let that go, because the queue handlers apply the changes asynchronously.
Also, you need to make sure you have proper alerts in place for the queues so in case you did miss a message, or your functions didn't handle it well enough, you'll receive an alert to check & correct errors.
I totally agree with @nineedm's answer, but there are also other solutions.
If you introduce a cache, you will always face the cache invalidation problem: you have to mark cached entries as invalid when the underlying data changes. Sometimes this is easy, depending on the nature of the cached data and how often it changes.
If you have just a single application instance, MemoryCache with properly specified expiration options can be enough.
If there is a cluster, you have to look at distributed cache solutions, for example Redis. There is a Microsoft article about this: Distributed caching in ASP.NET Core.
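To make the expiration and invalidation ideas concrete, here is a minimal sketch of a single-process cache. It is not MemoryCache or Redis; `ExpiringCache`, `get_or_load`, `invalidate` and `load_from_db` are hypothetical names, and the clock is injected only so the behaviour can be shown without waiting.

```python
import time


class ExpiringCache:
    """Single-process cache with a time-to-live and explicit invalidation."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get_or_load(self, key, loader):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry[0]  # still fresh: serve from memory
        value = loader(key)  # missing or expired: go back to the source
        self._entries[key] = (value, self._clock() + self._ttl)
        return value

    def invalidate(self, key):
        # Call this whenever the underlying data for `key` is changed.
        self._entries.pop(key, None)


now = [0.0]   # fake clock so the example is deterministic
loads = []    # records every trip to the (hypothetical) database


def load_from_db(key):
    loads.append(key)
    return f"row-{key}-v{len(loads)}"


cache = ExpiringCache(ttl_seconds=30, clock=lambda: now[0])
first = cache.get_or_load("a", load_from_db)   # miss: loads from the source
second = cache.get_or_load("a", load_from_db)  # hit: no load
cache.invalidate("a")                          # data changed elsewhere
third = cache.get_or_load("a", load_from_db)   # miss again after invalidation
now[0] = 100.0                                 # move past the 30 second TTL
fourth = cache.get_or_load("a", load_from_db)  # miss again after expiry
```

With several application instances, `invalidate` on one instance does not reach the others, which is the reason a distributed cache is suggested for the cluster case.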
Has anyone posted a response to this problem? There have been other posts with no answers. Our situation is that we are pushing messages onto a topic that is backing a KTable in the first step of our stream process. We are then pulling a small amount of data from those messages and passing them along. We are doing multiple computations on that smaller amount of data for grouping and aggregation. At the end of the streaming process, we simply want to join back to that original topic via a KTable to pick up the full message content again. The results of the join are only a subset of the data because it can not find the entries in the KTable.
This is just the beginning of the problem. In another case, we are using KTables as indexes for lookups meant to enrich the data coming in. Think of these lookups as identifying whether we have seen a specific pattern in the streaming message before. If we have seen the pattern, we want to tag it with an ID (used for grouping) pulled from an existing KTable. If we have not seen the pattern before, we assign it an ID and place it back into the KTable to be used to tag future messages. What we have found is that there is no guarantee that the information will be present in the KTable for future messages. This lack of guarantee seems to make KTables useless. We cannot figure out why there is so little discussion of this on the forums.
Finally, none of this seemed to be a problem when running with a single instance of the streams application. However, as soon as our data got large and we were forced to have 10 instances of the app, everything broke. As well, there is no way that we could use things like GlobalKTables because there is too much data to be loaded into a single machine's memory.
What can we do? We are currently planning to abandon KTables altogether and use something like Hazelcast to store the lookup data. Should we just move to Hazelcast Jet and drop Kafka Streams altogether?
Adding flow: [image: Kafka data flow]
I'm sorry for this non-answer answer, but I don't have enough points to comment...
The behavior you describe is definitely inconsistent with my understanding and experience with streams. If you can share the topology (or a simplified one) that is causing the problem, there might be a simple mistake we can point out.
Once we get more info, I can edit this into a "real" answer...
Thanks!
-John
I have been testing neo4j for graph projects for one or two months now and it has been really efficient, but I'm having a hard time finding out how to solve one of my problems and I'm looking for advice.
I'm using neo4j to store graph databases and check that they follow some structural requirements. For example, I have a db modeling dependencies between items: the nodes are the items and the links are labeled "need" or "incompatible" to model the dependencies, and I want neo4j to check the coherence of the data.
I coded the checker in a server plugin and it works very well. But now I would like to allow users to connect to the database, modify the data (without saving the modifications yet), check that the modifications do not break the coherence, and then save the modifications.
I found the HTTP endpoint that can keep a transaction open, and it completely fits the "modifying the db without saving" need, but I can't find how to run my checker on the modified data. Is there a way to run something other than a Cypher query through the HTTP endpoint, or do I have to consider another way to solve this?
I know it would be possible to run my checker using the TransactionEventHandler beforeCommit, but that means the user could not know whether their data is okay without starting a commit. Also, the data is split between the unmodified db and the TransactionData that stores the modifications, which makes the checker tricky to apply.
So, if someone knows how I could solve this, it would be great.
Thank you.
Your options are to use an Unmanaged Extension and the Transaction Event API.
You can intercept an incoming transaction and read all the data in it. If the transaction breaks your rules, you can discard it.
I recommend using the GraphAware framework for that.
Here is a great article about it: http://graphaware.com/neo4j/transactions/2014/07/11/neo4j-transaction-event-api.html
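The "read the pending changes, check them, discard if they break the rules" flow can be sketched independently of neo4j. The code below is a plain in-memory model, not the Transaction Event API; the rule it checks (an item may not need something it is incompatible with) is a hypothetical example of a coherence rule, and `violations` and `try_commit` are made-up names.

```python
def violations(edges):
    """Return (a, b) pairs where a needs b but the two are marked incompatible.

    `edges` is an iterable of (source, label, target) tuples. The rule itself
    is a hypothetical example, not the checker from the question.
    """
    edges = list(edges)
    needs = {(a, b) for a, label, b in edges if label == "need"}
    incompatible = {frozenset((a, b)) for a, label, b in edges
                    if label == "incompatible"}
    return sorted(pair for pair in needs if frozenset(pair) in incompatible)


def try_commit(committed, added=(), removed=()):
    """Check the graph as it would look after the change, then keep or discard."""
    candidate = (set(committed) | set(added)) - set(removed)
    problems = violations(candidate)
    if problems:
        return set(committed), problems  # discard: committed data is untouched
    return candidate, []                 # accept: the change becomes visible


committed = {("A", "need", "B")}

# Breaks the rule: A needs B, but B is declared incompatible with A.
rejected_state, rejected_problems = try_commit(
    committed, added={("B", "incompatible", "A")})

# Harmless change: it is accepted and merged into the committed data.
accepted_state, accepted_problems = try_commit(
    committed, added={("A", "need", "C")})
```

The point of building `candidate` first is that the checker sees one merged view rather than the stored data and the pending changes separately.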
I'm currently using the DataApi to keep the data items synchronized between handheld and wearable.
Still, I want to make sure that all the data is stored and that nothing is lost in the process.
I'm currently reading GPS parameters while the wearable is not connected to the handheld, and when they connect, they sync the data items.
How reliable is DataAPI?
Is my idea of creating a local file doubling my effort?
How can I create a local file on my wear device and then access it?
Syncing data using DataApi is reliable and I recommend using it; if you come across a scenario where sync is not happening reliably, that should be considered a bug and reported as such. One issue that folks run into is that they create the same data item again and don't get the onDataChanged() callback, but that is by design: if the very same data is added multiple times, there is no change, hence no callback is triggered.
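The "no change, no callback" behaviour can be illustrated with a toy store. This is not the Wearable DataApi; `DataItemStore`, `put` and the `seq` field are hypothetical, and adding a changing field is shown only as one possible way to make each write count as a change.

```python
class DataItemStore:
    """Toy model of a synced key/value store (not the Wearable DataApi)."""

    def __init__(self):
        self._items = {}
        self._listeners = []

    def add_listener(self, listener):
        self._listeners.append(listener)

    def put(self, path, payload):
        if self._items.get(path) == payload:
            return False  # identical data: nothing changed, no callback
        self._items[path] = dict(payload)
        for listener in self._listeners:
            listener(path, dict(payload))
        return True


events = []
store = DataItemStore()
store.add_listener(lambda path, payload: events.append((path, payload)))

fix = {"lat": 48.1, "lon": 11.5}
changed_first = store.put("/gps", fix)   # new data: listener is called
changed_again = store.put("/gps", fix)   # same data again: listener is not called

# Including a field that differs on every write (here a sequence number)
# makes an otherwise identical payload count as a change.
changed_with_seq = store.put("/gps", {**fix, "seq": 1})
```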
Another factor you might want to consider is whether the data you create on one node is for consumption by all other nodes or only a targeted one; DataApi syncs data across all connected nodes so if I create a data item on watch1 and want to sync that with my phone and if there is a watch2 in the picture as well, watch2 also gets the same data.
If you end up using the DataApi, I strongly recommend putting in place a policy that removes the data once it is synced and consumed; otherwise data will accumulate with no supervision and you'll eventually run out of space.
To answer your questions:
1. I don't know how reliable it effectively is, but we had problems where data updates didn't trigger the appropriate listeners on the watch side. So I'm not sure. Maybe someone has an official statement on this?
2. I think it depends on the amount of data you want to store. So I suggest you first get clear about the amount and then choose the format. Keep in mind that there is also the possibility of storing data in the Shared Preferences.
3. There is a related question where someone tried to save an image on the watch; it makes no difference whether it is an image file, a text file, or any other kind of file.
Here's an example of what I am talking about:
Take Twitter for iOS. Whenever you tweet, the tweet is sent to the database, and then it is also displayed on your device as part of the list of tweets.
How is the list of tweets that you see on your device updated after just sending one tweet? Here are some possible ways I thought of, but what I'm asking is which one is the best method:
1. The whole list of recent Tweets is re-downloaded from the remote Twitter server after sending a tweet (I highly doubt this, as it would take a relatively long time when it really is just appending one Tweet to the array of Tweets displayed).
2. The local array that holds the Tweet objects is updated separately from the database (for example, the app updates the database and then updates its array with the same data it sent to the database, and never downloads the Tweet you just sent, because you already have it locally since you composed it).
3. Is Core Data capable of updating the remote data server AND the array in one (or relatively few) step(s)? (Sorry if this is the obvious answer and if it sounds like I didn't look into it, but I did read about Core Data and started a tutorial. It's just that there is so much content that it would take me a whole day or two just to figure out whether it's appropriate for my application.)
4. Is there an alternative way of managing this?
Also, if it's one of options 2 or 3 above, are you able to update the table view cells by just updating the local array and reloading the cells from that array, without loading your one tweet from the database? I'm just curious about what would be the most efficient way of doing this.
So again, my main question reworded is: how do you keep data that you sent to a remote database and the local data (stored in a mutable array) in sync whenever you do a tiny single update (such as sending a Tweet) without having to reload all of the data from the database (when there is other content [i.e. other Tweets]) already loaded.
(I am aware that no one except Twitter developers knows exactly how Twitter actually does it; I'm just using this Twitter functionality as an example. The same concept could be applied to any similar app.)
(Also, this is a conceptual question about dataflow, so I don't need to see any code, but suggestions to use different technologies like Core Data, or just updating an array will be appreciated.)
(I've been looking into this, and all the different ways of doing it, and it is becoming very time consuming, so I figured to ask you guys who have experience. Additionally, this could help someone else who has similar questions.)
(Sorry if it looks like I'm asking a bunch of questions, but I'm basically asking the same question in different ways, and offering possible solutions.)
Any insight is appreciated!
Immutable messages like tweets are actually quite easy to handle -- server side, and in your app.
When you send a tweet from your client to the server, you also update your "main context" (see "Managed Object Context"), which in turn sends notifications to your controller (see NSFetchedResultsController), which in turn updates your table view according to your local model residing in the Managed Object Context.
Updating from the server is just merging the local tweets with the new ones added in the meantime.
Since there are no mutable tweets, synchronization is really no big deal. As mentioned in the comment, if there were mutable tweets (or mutable messages of any kind), the synchronization would become much more complex.
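Merging local tweets with the ones fetched from the server can be sketched as a union by identifier. This is a language-neutral illustration in Python rather than Core Data code; the `id` and `created_at` fields and the function name `merge_tweets` are assumptions made for the example.

```python
def merge_tweets(local, remote):
    """Merge two lists of immutable tweets by id, newest first."""
    by_id = {tweet["id"]: tweet for tweet in local}
    for tweet in remote:
        # Tweets never change, so a copy we already hold is as good as the
        # server's copy; only ids we have not seen yet are added.
        by_id.setdefault(tweet["id"], tweet)
    return sorted(by_id.values(), key=lambda t: t["created_at"], reverse=True)


local = [
    {"id": 3, "created_at": 30, "text": "the tweet I just sent"},
    {"id": 1, "created_at": 10, "text": "older tweet"},
]
remote = [
    {"id": 2, "created_at": 20, "text": "someone else's tweet"},
    {"id": 3, "created_at": 30, "text": "the tweet I just sent"},
]

timeline = merge_tweets(local, remote)
timeline_ids = [tweet["id"] for tweet in timeline]
```

With mutable messages the `setdefault` line would no longer be enough, because two copies with the same id could differ and one of them would have to win.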
Core Data will NOT automatically update a remote server. But there are solutions to "view" a remote database through Core Data: see NSIncrementalStore and a related third-party library (AFIncrementalStore).
This is ridiculously trivial. You update your local database and send off the remote update at the same time.
You use the remote response to mark your local record as synched or try updating again later.
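A minimal sketch of that idea, assuming a hypothetical `send_remote` callable that returns a truthy value on success and raises `OSError` (for example `ConnectionError`) when the network is unavailable. The send is synchronous here for simplicity; a real app would issue it in the background.

```python
class LocalStore:
    """Keeps local records and tracks which ones the server has confirmed."""

    def __init__(self, send_remote):
        self._send_remote = send_remote  # hypothetical remote call
        self.records = []                # local data backing the list view

    def post(self, text):
        # Update the local data first so the UI can show the record at once.
        record = {"text": text, "synced": False}
        self.records.append(record)
        self._try_sync(record)
        return record

    def _try_sync(self, record):
        try:
            record["synced"] = bool(self._send_remote(record["text"]))
        except OSError:
            record["synced"] = False  # leave it pending for a later retry

    def retry_pending(self):
        """Re-send unsynced records; return the ones that are still pending."""
        for record in self.records:
            if not record["synced"]:
                self._try_sync(record)
        return [r for r in self.records if not r["synced"]]


server = []       # stands in for the remote database
online = [False]  # toggled to simulate connectivity


def send_remote(text):
    if not online[0]:
        raise ConnectionError("offline")
    server.append(text)
    return True


store = LocalStore(send_remote)
record = store.post("hello")         # shown locally even though the send fails
synced_while_offline = record["synced"]

online[0] = True
still_pending = store.retry_pending()  # the earlier record is sent now
```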