Microservices (application-level joins): do more API calls lead to more latency?

I have two microservices, one for Orders and one for Customers, exactly like the example below:
http://microservices.io/patterns/data/database-per-service.html
This works without any problem. I can list Customers data and Orders data based on an input CustomerId.
Now there is a new requirement to develop a screen that shows the Orders for an input date, with the CustomerName beside each Order.
In the implementation, I can fetch the list of Orders for the input date. But to show the corresponding CustomerNames for the list of CustomerIds, I make multiple API calls to the Customer microservice, each sending one CustomerId to get one CustomerName. This leads to more latency.
I know the above solution is a bad one. Any ideas, please?

The point of a microservices architecture is to split your problem domain into (technically, organizationally and semantically) independent parts. Making the "microservices" glorified (apified) tables actually creates more problems than it solves, if it solves any problem at all.
Here are a few things to do first:
List architectural constraints (i.e. the reasons for doing microservices). Is it the ability to scale parts separately, organizational problems, making teams independent, etc.?
List business-relevant boundaries in the problem domain (i.e. parts that theoretically don't need each other to work, or don't require synchronous communication).
With that information, here are a few ways to fix the problem:
Restructure the services based on business boundaries instead of technical ones. This means not using tables or layers or other technical stuff to split functions. Services should be a complete vertical slice of the problem domain.
Or as a work-around create a third system which aggregates data and can create reports.
Or if you find there is actually no reason to keep the microservices approach, just do it in a way you are used to.

The new requirement needs data from across domains. Here are the options:
Look up the customer name by ID on every call. The issue is latency, as there would be multiple round trips.
Keep a cache of all CustomerNames by ID in the Order service (I am assuming there is a finite number of customers). The issue is when to refresh or invalidate the cache; for that you may need to expose a REST call that invalidates entries. For new customers that are not yet in the cache, fetch from the DB and update the cache for future requests.
Use CQRS: all the needed data (orders, customers, etc.) goes to a separate table. In that schema you can write a single composite SQL query, which removes the round trips.
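The cache option above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: `CustomerNameCache`, `fetch_names` and `fake_customer_service` are hypothetical names, and the sketch assumes the Customer service can resolve several IDs in one round trip, which the original setup does not confirm.

```python
from typing import Callable, Dict, Iterable, Set


class CustomerNameCache:
    """Read-through cache of customer_id -> customer_name kept in the Order service."""

    def __init__(self, fetch_names: Callable[[Set[int]], Dict[int, str]]):
        # fetch_names is a hypothetical client for the Customer service
        # (assumption: it can look up many IDs in a single call).
        self._fetch_names = fetch_names
        self._names: Dict[int, str] = {}

    def names_for(self, customer_ids: Iterable[int]) -> Dict[int, str]:
        wanted = set(customer_ids)
        # Only IDs that are not cached yet cause a remote call.
        missing = wanted.difference(self._names)
        if missing:
            self._names.update(self._fetch_names(missing))
        return {cid: self._names[cid] for cid in wanted if cid in self._names}

    def invalidate(self, customer_id: int) -> None:
        # Would be triggered by the invalidation REST call mentioned above.
        self._names.pop(customer_id, None)


# Stand-in for the remote Customer service; records every call it receives.
calls = []


def fake_customer_service(ids: Set[int]) -> Dict[int, str]:
    calls.append(set(ids))
    directory = {1: "Alice", 2: "Bob"}
    return {cid: directory[cid] for cid in ids if cid in directory}


cache = CustomerNameCache(fake_customer_service)
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 2},
    {"order_id": 12, "customer_id": 1},
]
names = cache.names_for(o["customer_id"] for o in orders)
rows = [(o["order_id"], names.get(o["customer_id"])) for o in orders]
```

With this shape, rendering the orders screen costs at most one call to the Customer service per request, and none once the names are cached.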

Related

Using EFFECTIVE_TS and EXPIRATION_TS on FACT tables

I have a requirement to create a Fact table that stores the granted_share_qty awarded to employees. There are surrounding Dimensions: SPS Grant_dim, which stores info about each grant; SPS Plan Dim, which stores info about the Plan; SPS Client Dim, which stores info about the Employer; and SPS Customer Dim, which stores info about the customer. The DimKeys (surrogate keys) and DurableKeys (supernatural keys) from each Dimension are added to the Fact.
The reporting need is "as-of", i.e. on any given date, one should be able to see the granted_share_qty as of that date (similar to an account balance as of that date), along with point-in-time values of a few attributes from the Grant, Plan, Client and Customer dimensions.
First, we thought of creating a daily snapshot table where the data is repeated every day in the fact (unless the source sends any changes). However, since there could be more than 100 million grant records, repeating this every day was almost impossible. Moreover, the granted_share_qty doesn't change that often, so why copy it every day?
So instead of a daily snapshot we thought of adding an EFFECTIVE_DT and EXPIRATION_DT to the Fact table (like a TIMESPAN PERIODIC SNAPSHOT table, if such a thing exists).
This reduces the volume and perfectly satisfies the reporting need: "get me the granted_qty and the grant, client, plan and customer details as of 10/01/2022" translates to "select granted_qty from fact where 10/01/2022 between EFFECTIVE_DT and EXPIRATION_DT and Fact.DimKeys=Dim.DimKeys".
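To make the "as-of" lookup concrete, here is a small sketch. It uses an in-memory SQLite database only as a stand-in for Snowflake, and the table name `grant_fact`, its columns and the sample rows are all made up for illustration. Dates are stored as ISO strings so that BETWEEN compares them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE grant_fact (
        grant_durable_key INTEGER,
        granted_share_qty INTEGER,
        effective_dt      TEXT,  -- ISO dates sort correctly as text
        expiration_dt     TEXT   -- open-ended rows use a far-future date
    );
    INSERT INTO grant_fact VALUES
        (1, 100, '2022-01-01', '2022-06-30'),
        (1, 150, '2022-07-01', '9999-12-31'),
        (2,  40, '2022-03-15', '9999-12-31');
    """
)


def granted_qty_as_of(as_of_date: str):
    """Return (grant_durable_key, granted_share_qty) rows valid on as_of_date."""
    return conn.execute(
        "SELECT grant_durable_key, granted_share_qty "
        "FROM grant_fact "
        "WHERE ? BETWEEN effective_dt AND expiration_dt "
        "ORDER BY grant_durable_key",
        (as_of_date,),
    ).fetchall()


october = granted_qty_as_of("2022-10-01")
february = granted_qty_as_of("2022-02-01")
```

Each grant contributes one row per period in which its quantity was stable, so the row count grows with the number of changes rather than with the number of days.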
The challenge, however, is keeping the DimKeys of the Fact in sync with the DimKeys of the Dimensions. Even if the Fact doesn't change, any DimKey change due to versioning in any of the Dimensions needs to be tracked and versioned in the Fact. This has become an implementation nightmare.
(To make things worse, the Dims can undergo multiple intraday changes, so these have to be tracked in near real time :-( )
Any thoughts on how to handle such situations will be highly appreciated. (Database: Snowflake)
P.S. We could remove the DimKeys from the Fact and use DurableKeys + date to join between the Facts and the Type 2 Dims, but that proposal is not favored/approved as of now.
Thanks
Sunil
First, we thought of creating a daily snapshot table where the data is repeated everyday in the fact (unless source sends any changes). However
Stop right there. Whenever you know the right model but think it's unworkable for some reason, try harder. At a minimum, test your assumption that it would be "too much data", and consider not materializing the snapshot but leaving it as a view and computing it at query time.
... moreover the granted_share_qty doesnt change that often so why copy this everyday?.
And there's your answer. Use a monthly snapshot instead of a daily snapshot, and you've divided the data by 30.

Event Store DB : temporal queries

Regarding the question asked here:
Suppose that we have ProductCreated and ProductRenamed events, which both contain the title of the product. Now we want to query EventStoreDB for all events of type ProductCreated and ProductRenamed with a given title. I want all these events in order to check whether any product in the system has been created with, or renamed to, the given title, so that I can throw a duplicate-title exception in the domain.
I am using MongoDB to create UI reports from all the published events, and everything is fine there. But for checking some invariants, like checking for unique values, I have to query the event store for certain events matching some criteria and, by iterating over them, decide whether there is a product that was created with the same title and has not been renamed, or a product that was renamed to the same title.
For such queries, the only way that Event Store provides is to create a one-time projection with the proper JavaScript code, which filters and emits the required events to a new stream. Then all I have to do is fetch the events from the newly generated stream that the projection fills.
Now the odd thing is, projections are great for subscriptions and generating new streams, but they seem odd for doing real-time queries. Immediately after I create a projection with the HTTP API, I check the resulting stream for the query result, but it seems that the workers have not yet had the chance to process it, and I get a 404 response. After waiting for a few seconds, the new stream appears and gets filled with the result.
There are too many things wrong with this approach:
First, it seems that if the event store is filled with millions of events across many streams, it won't be able to process and filter all of them into the resulting stream immediately. It does not even create the stream immediately, let alone populate it. So I have to wait for some time and check for the result, hoping that the projection is done.
Second, I have to fetch multiple times and issue multiple HTTP GET requests, which seems to be slow. The new JVM client is not ready yet.
Third, I have to delete the resulting stream after I'm done with the result; failing to do so will leave the event store with millions of orphaned query result streams.
I wish I could pass the JavaScript to some API and get the result page by page, like querying MongoDB, without worrying about the projection, new streams and timing issues.
I have seen a query section in the Admin UI, but I don't know what it is for, and unfortunately the documentation doesn't help much.
Am I expecting the event store to do something that is impossible?
Do I have to create an inner read model in the bounded context for doing such checks?
I am using my events to rehydrate the aggregates, and I would like to use the same events for such simple queries without adopting other techniques.
I believe it would not be a separate bounded context since the check you want to perform belongs to the same bounded context where your Product aggregate lives. So, the projection that is solely used to prevent duplicate product names would be a part of the same context.
You can indeed use a custom projection to check it but I believe the complexity of such a solution would be higher than having a simple read model in MongoDB.
It is also fine to use an existing projection for the check if you have one. It might not be what you would otherwise prefer if the aim of the existing projection is to show things in the UI.
For the collection that you could use for duplicates check, you can have the document schema limited to the id only (string), which would be the product title. Since collections are automatically indexed by the id, you won't need any additional indexes to support the duplicate check query. When the product gets renamed, you'd need to delete the document for the old title and add a new one.
Again, you will get a small time window when the duplicate can slip in. It's then up to the business to decide if the concern is real (it's not, most of the time) and what's the consequence of the situation if it happens one day. You'd be able to find a duplicate when projecting events quite easily and decide what to do when it happens.
Practically, when you have such a projection, all it takes is to build a simple domain service bool ProductTitleAlreadyExists.
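As a rough sketch of that idea, the projection and the domain service could look like the following. An in-memory set stands in for the MongoDB collection keyed by title, and the class name `ProductTitleIndex` and the event fields (`type`, `product_id`, `title`) are assumptions made for the example, not the shape of any real event schema.

```python
from typing import Dict, Set


class ProductTitleIndex:
    """Read model that tracks which product titles are currently in use."""

    def __init__(self) -> None:
        self._title_by_product: Dict[str, str] = {}
        self._titles: Set[str] = set()  # stand-in for documents keyed by title

    def apply(self, event: dict) -> None:
        # Both event types carry the title that is now current for the product.
        if event["type"] in ("ProductCreated", "ProductRenamed"):
            old_title = self._title_by_product.get(event["product_id"])
            if old_title is not None:
                # On rename: delete the entry for the old title.
                self._titles.discard(old_title)
            self._title_by_product[event["product_id"]] = event["title"]
            self._titles.add(event["title"])

    def product_title_already_exists(self, title: str) -> bool:
        return title in self._titles


index = ProductTitleIndex()
for e in [
    {"type": "ProductCreated", "product_id": "p1", "title": "Lamp"},
    {"type": "ProductCreated", "product_id": "p2", "title": "Chair"},
    {"type": "ProductRenamed", "product_id": "p1", "title": "Desk Lamp"},
]:
    index.apply(e)

lamp_taken = index.product_title_already_exists("Lamp")
desk_lamp_taken = index.product_title_already_exists("Desk Lamp")
```

Because the index is updated from events after the fact, the small window for duplicates described above still exists; the sketch does not try to close it.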

Gave up DDD, but need some of its benefits

I'm giving up traditional DDD, which is often a massive timewaster, and forces me to do endless mapping: data layer <--> domain layer <--> presentation layer.
For even a small change I must change data models, domain models, presentation models / viewmodels, then the repositories, manager/service classes, and of course the AutoMapper maps, and then test the whole thing! Each call requires calling a layer which calls a layer which calls the underlying code. And I don't get anything in return other than "you might need it in the future". Meh.
My current approach is more pragmatic:
I don't worry about the difference between the "data layer" and "domain layer" any longer, as there's no point - the terms are interchangeable. I let EF do its thing, and add interfaces and repositories on top when needed.
I've merged my "data" and "domain" projects (into "core", boring name, I know), and I could almost swear that Visual Studio is actually running faster.
I allow EF entities to go up and down the stack, but, I still map them to presentation models / viewmodels as usual.
For simple operations I call repositories directly from controllers, for complex operations I use domain managers/services as usual; the repositories never expose IQueryable.
I define entities/POCOs as partial classes, so I can add domain behavior separately in corresponding partial classes.
The problem: I now use the entities all over the place, so client code can see their navigation properties. And the models are always materialized after they leave a repository, so those navigation properties are often null.
Possible solutions:
1. Live with it. It's ugly but preferable to the problems explained above.
2. For each entity, define an interface which hides the navigation properties; and make client code use the interfaces. But ironically, this means another layer (albeit thin and manageable).
3. What else?
I'm not used to this sort of fast-and-loose programming style, so maybe I'm missing some obvious tricks. Is there anything else I should take into account? I'm sure there are other problems I will encounter soon.
EDIT:
This question is not about DDD. And note that many struggle with a traditional DDD approach -- Seemann appears to arrive at the same conclusion, Rahien speaks about the "Useless Abstraction For The Sake Of Abstraction Anti Pattern", and Evans himself said DDD is only truly useful in 5% of cases. Also see this thread.
Some of the comments/answers are predictably about how I'm doing DDD wrong, or how I can tweak my system to do it right. However, I'm not asking about DDD or bashing it for the cases where it is suitable; rather, I'd like to know what others are doing in line with the thinking I've described above. It's not as if DDD is a panacea for all design ills; every decade a new process comes out (RUP anyone? XP, Agile, Booch, blah...). DDD is just the shiniest new one, and the most well known and used. But pragmatism should come first, as I'm trying to build salable products that ship on time and are easy to maintain. The most useful programming axiom I've learned, by far, is YAGNI. What I want is to change my system to a sort of "DDD-lite", where I get its strong design/OOP/pattern philosophy, but without the fat.
A typical persistence approach with DDD is to map the domain model directly to corresponding tables. Technically, the mappings are still there (and are usually declared in code), but there is no explicit data model, as pointed out by lazyberezovsky.
The problem with navigation properties can be resolved in a few different ways, regardless of whether you are employing DDD or not. I dislike approach 1 because it makes it more difficult to reason about your code - you never know which properties will be set and which won't. Approach 2 is much better in theory, because it makes it very explicit what a given query requires, and making things explicit is a good practice in general. A similar, but simpler and less brittle approach is to use read-models, which are just objects designed to fulfill the requirements of a given query or set of queries. Within the context of DDD, they allow you to decouple behavior-rich entities from queries, which are quite often at odds. Now proponents of DRY may scream heresy and come at you with torches and pitchforks, but in practice it is often much easier to maintain a read-model and an entity than to try to coerce entities to fulfill query requirements by way of interfaces or complex mapping strategies. Additionally, the responsibilities of a read-model and a behavior model are quite different, therefore DRY isn't applicable.
This is not to say that DDD is applicable in your scenario. It is often a wise decision to avoid full fledged DDD, especially in scenarios that are mostly CRUD. You are correct to be cautious, a good example of KISS and YAGNI. DDD reaps benefits when your domain consists of complex behavior, not just data. At any rate, the read-model pattern applies.
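The read-model idea is language independent, so here is a deliberately small sketch in Python rather than C#. All class and field names are hypothetical. The point is only that the read-model declares exactly what one query returns, so there is no navigation property that may or may not be loaded.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Order:
    order_id: int
    total: float


@dataclass
class Customer:
    """Entity: carries behavior and a navigation property that may be unloaded."""
    customer_id: int
    name: str
    city: str
    orders: Optional[List[Order]] = None  # None when the query did not load it

    def rename(self, new_name: str) -> None:
        if not new_name:
            raise ValueError("name must not be empty")
        self.name = new_name


@dataclass(frozen=True)
class CustomerListItem:
    """Read-model: only the fields the customer list view needs."""
    customer_id: int
    name: str
    city: str


def customer_list(rows: List[dict]) -> List[CustomerListItem]:
    # rows would come straight from a query that selects just these columns.
    return [CustomerListItem(r["customer_id"], r["name"], r["city"]) for r in rows]


rows = [
    {"customer_id": 1, "name": "Alice", "city": "Oslo"},
    {"customer_id": 2, "name": "Bob", "city": "Bergen"},
]
items = customer_list(rows)
```

The duplication between `Customer` and `CustomerListItem` is intentional: the two types change for different reasons, which is the argument made above for why DRY does not apply here.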
UPDATE
For implementations that don't employ a read-model, take a look at Fetching Strategy Design, where the notion of a fetching strategy allows the specification of exactly what is needed from the database, which mitigates issues with navigation properties. The material referenced in the linked post is also of interest. Overall, this attempts to avoid the layer of indirection present in other approaches. However, in my opinion, using the proposed fetching strategy is more complex than using a read-model, while the net result is the same.
Some thoughts about this point:
... the repositories never expose IQueryable ... the models are always
materialized after they leave a repository ...
Your question is tagged with "asp.net-mvc", so you have a web application in mind. 90% or more of all requests will be GET requests that are supposed to fetch some data from the database and show those data in a web view. How often are those needed data really entities rather than only bags of properties (a selection of properties of an entity type or perhaps composed of properties from multiple entities)?
Say, your application has 100 views. Only a minority of these will show complete entities:
50 of them are list views that show selected data (a customer with ID and address, but without the customer's contact person, phone number and sales volume)
20 of them contain autocomplete text boxes to select a reference (the customer for an order, but only the customer's name and city is shown in the autocomplete list, not the rest of the address nor contact person, phone number and sales volume and only the first 5 hits are displayed)
1 is an edit view for a customer that shows everything, but not the sales volume
1 is a details view for a customer with his last five orders
1 is a details view for an order including order items including product of each item but without the product's supplier name
1 is the same view but specialized for the purchasing department that wants to see the supplier for each item and item's product with average supplier's lead time for the last three months.
1 is a view for the service department that shows the order with only the order items of product category "repair service"
1 view for the Human Resources department shows employees including a photo stored as a big blob
1 view for personnel planning department shows a short version of the employee without photo
etc., etc.
As a UI programmer I would have all kinds of data requirements to render a view with the examples above:
I need only a selection of properties
I need even different selections of the same entity's properties for different views
I need an order including all items but without a reference to a product
I need an order including all items (but not all properties of the items) and including a reference to a product and to a supplier (but not all supplier's properties)
I need an order including only a filtered list of order items
I need a customer including the last five orders, not all 3000 orders he ever had
I need an employee but please without the big blob image
etc., etc.
How to fulfill these requirements as a data access/repository/service developer?
1. I only provide a handful of methods and materialize entities: load order header, load order header with items, load order header with items and product, load order header with items and product and supplier, load customer header (throw 15 of the 20 properties away, dear UI developer, if you only need five properties), load customer header with all 3000 orders (throw 2995 away, dear UI developer, if you only need five), etc., etc. I return interfaces from the repositories that hide not loaded navigation properties.
2. I care about every detail that the UI needs: I create repository/service methods like GetFiveCustomerPropertiesForAutoComplete, GetCustomerWithLastFiveOrders, etc. etc. I return interfaces from the repositories that hide the properties (also scalar) I haven't loaded. Or I return "DTOs" that contain the requested properties. I change the repository/services and create new DTOs every day when a UI developer calls with a data requirement for the next view.
3. I return IQueryable<TEntity> from the repositories and tell the UI developer "create the LINQ query yourself to fetch the data you need for your views". (Next morning the DBA is complaining about hundreds of terribly performing database queries.)
4. I return "prepared" IQueryable<TEntity>s from the repositories/services that cover - for example - security concerns like applying Where clauses for the user's access rights, or append a Where clause for a search term, or apply a NoTracking option to the query. I tell the UI developer: "You are allowed to extend the query with a) projections (Select), b) paging (Take and Skip) and perhaps c) sorting (OrderBy) because I consider those three query parts as UI concerns. All other query requirements (filtering, joining, grouping, etc.) have to be implemented in the repository/service layer and are forbidden in the UI layer." The most important pieces here are projections that materialize ViewModels directly through the LINQ/SQL query without an intermediate mapping layer and without the overhead of loading more than the needed columns/properties.
These are only some thoughts. Every approach has its benefits and downsides. Working in small teams where at least one or a few developers have an overview of what is happening in both the repository/service and the UI/"projection" layer, the last option works fine for me in my experience, although it doesn't always work with the strict rules described (for example, the filter by product category for included order items of an order requires applying a Where clause inside of the projection, i.e. in the UI layer). For POST requests and data modifications I would use DTOs that send the data collected from a view back to a service to be processed there.
For stricter separation of "query layer" and UI layer I would probably prefer something close to the second option, maybe not with an interface/DTO for every UI requirement, but somehow reduced to a set of DTOs for the most common requirements (with the price of a little overhead of sometimes unnecessarily loaded properties). However, I expect that to be more work than the last option due to the larger amount of necessary repository/service methods, the additional maintenance of (perhaps many) DTOs and the intermediate mapping between DTOs and ViewModels.
Personally I am concerned about materializing full entities, especially complex object graphs, when I don't need them 90% of the time. But my concern is not verified by extensive performance measurements proving that this approach is really a problem for a "normal" application that doesn't have special high performance needs.
How can anyone give you sound advice when we have no clue as to what it is you are building? In the grand scheme of things, you might be building the wrong solution (not saying you are). So do realize all we can relate to is technical design issues and similar past experiences.
Many people face your problem, indeed. The mapping is loose coupling tax in the land of static typing. Maybe a more dynamic language could solve some of your pain. Or maybe you might find virtue in automating more (DSL, MDA). You could also switch to client server instead.
Interfaces are not layers, rather abstractions. Use them wisely.
Personally, I'd never take these shortcuts. Been bitten too many times trying to skip steps. Logic starts popping up in odd places. If I have a data-driven app to develop, simple datasets come to mind, EF as well. But I don't call the objects aggregate or entity in the DDD sense, just entity in the ERD sense. Transaction Script might be a better fit than doing the partial method sprinkling. As for read model objects, these are not layers of indirection.
Overall, I get the feeling, and it is just that, you're making a mess of things because you fight the mapping friction by taking on a dependency on objects that don't reveal the required shape (navigation properties that are null) thereby causing problems in a different area.
I'll just try to be short: we went for method 2, i.e. adding a layer of interfaces that you use on the client. You can have EF generate them for you with just a little tweak of the .tt templates.
Yes, it creates (yet) another layer, but it's logic-free and adds no complexity. Of course, if your client needs to deserialize entities, you have to add (yet) another layer that will handle deserialization and reference both the entity definitions and the interfaces that it'll return to the client. But it's also thin, so we learned to live with it, because it turned out to work just fine, and the client really stays clean...
The problem: I now use the entities all over the place, so client code
can see their navigation properties.
I don't quite get why this is a problem and how it's related to EF entities in particular. By client code do you mean presentation layer code or any code consuming your entities ?
For UI code a simple solution is to define ViewModels that just don't expose these navigation properties (or only expose a few of them depending on the object graph depth your GUIs need).
For other code it's only normal to be able to see the navigation properties of entities. They are public for a reason. You can end up breaking the Law of Demeter if you abuse them, but it's a matter of developer discipline not to fall into that trap.
An entity contains its own contract - all code that has access to the entity is supposed to be able to use any part of this contract. If you feel like your entities are exposing too much and that you need to put interfaces on top of them to restrain access to certain parts, maybe it's just a different entity.
I don't worry about the difference between the "data layer" and "domain layer" any longer, as there's no point - the terms are
interchangeable. I let EF do it's thing, and add interfaces and
repositories on top when needed.
I've merged my "data" and "domain" projects (into "core", boring name, I know), and I could almost swear that Visual Studio is
actually running faster.
I allow EF entities to go up and down the stack, but, I still map them to presentation models / viewmodels as usual.
For simple operations I call repositories directly from controllers, for complex operations I use domain managers/services as
usual; the repositories never expose IQueryable.
I define entities/POCOs as partial classes, so I can add domain behavior separately in corresponding partial classes.
None of these things seems to be fundamentally anti-DDD to me, except the data/domain separation.
That exception matters especially if you do database-first EF: DDD is clearly a domain-centric approach, and you shouldn't define your tables before defining your entities. It's also not clear whether some of your domain entities talk to the database or EF directly (not compliant with DDD or, more generally, with a layered architecture) or whether you systematically have data access objects in between (DDD compliant).

Building a (simple) twitter-clone with CouchDB

I'm trying to build a (simple) twitter-clone which uses CouchDB as Database-Backend.
Because of its reduced feature set, I'm almost finished with coding, but there's one thing left I can't solve with CouchDB - the per user timeline.
As with Twitter, the per-user timeline should show the tweets of all the people I'm following, in chronological order. With SQL it's quite a simple SELECT statement, but I don't know how to reproduce this with CouchDB's Map/Reduce.
Here's the SQL statement I would use with an RDBMS:
SELECT * FROM tweets WHERE user_id IN (1, 5, 20, 33, ...) ORDER BY created_at DESC;
CouchDB schema details
user-schema:
{
"_id":"xxxxxxx",
"_rev":"yyyyyy",
"type":"user",
"user_id":1,
"username":"john",
...
}
tweet-schema:
{
"_id":"xxxx",
"_rev":"yyyy",
"type":"tweet",
"text":"Sample Text",
"user_id":1,
...
"created_at":"2011-10-17 10:21:36 +000"
}
With view collations it's quite simple to query CouchDB for a list of "all tweets with user_id = 1 ordered chronologically".
But how do I retrieve a list of "all tweets which belong to the users with the IDs 1,2,3,... ordered chronologically"? Do I need another schema for my application?
The best way of doing this would be to save the created_at as a timestamp and then create a view, and map all tweets to the user_id:
function (doc) {
  // Index every tweet under its author's user_id.
  if (doc.type === 'tweet') {
    emit(doc.user_id, doc);
  }
}
Then query the view with the user IDs as keys, and sort the results in your application however you want (most languages have a sort method for arrays).
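The application-side sort can be sketched as follows. The response below is hand-written sample data in the usual shape of a CouchDB view result (a `rows` list whose entries have `id`, `key` and `value`), and `build_timeline` is a hypothetical helper. It assumes `created_at` is stored as a numeric timestamp, as suggested above; the HTTP request itself is left out.

```python
from typing import Dict, List


def build_timeline(view_response: Dict, limit: int = 20) -> List[Dict]:
    """Merge the tweets of all followed users and order them newest first."""
    tweets = [row["value"] for row in view_response["rows"]]
    tweets.sort(key=lambda tweet: tweet["created_at"], reverse=True)
    return tweets[:limit]


# Sample result of querying the view with keys [1, 5]; rows arrive grouped by key.
sample_response = {
    "rows": [
        {"id": "t1", "key": 1, "value": {"user_id": 1, "text": "first", "created_at": 1318846896}},
        {"id": "t3", "key": 1, "value": {"user_id": 1, "text": "third", "created_at": 1318850000}},
        {"id": "t2", "key": 5, "value": {"user_id": 5, "text": "second", "created_at": 1318848000}},
    ]
}

timeline = build_timeline(sample_response)
texts = [tweet["text"] for tweet in timeline]
```

The trade-off is that every timeline request fetches all matching tweets before sorting, so this approach suits small follow lists better than large ones.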
Is that a CouchDB-only app, or do you use something in between for additional business logic? In the latter case, you could achieve this by running multiple queries.
This might include merging different views. Another approach would be to add a list of "private readers" to each tweet. It allows user-specific (partial) views, but also introduces the complexity of adding the list of readers to each new tweet, or even updating the list in case of new followers or unfollow operations.
It's important to think of the possible operations and their frequencies. So when you're mostly generating lists of tweets, it's better to shift the complexity into how you integrate the reader information into your documents (i.e. integrating the readers into your tweet doc) and then easily build efficient view indices.
If you have many changes to your data, it's better to design your database so that it does not update too many existing documents at the same time. Instead, try to add data by adding new documents and aggregate via complex views.
But you have shown an edge case where the simple (1-dimensional) list-based index is not enough. You'd actually need secondary indices to filter by time and user IDs (given the fact that you also need partial ranges for both). But this is not possible in CouchDB, so you need to work around it by shifting "query" data into your docs and using it when building the view.

Caching Consistency vs. Static Object with nHibernate/ASP.NET

I am a complete newbie to caching, NHibernate, and everything involving the two, so this question may be excessively stupid.
I have certain instances of objects that are used by multiple other objects in my system. So for instance..
class Template {
    // lots of data
}
class One {
    IList<Template> Templates { get; set; }
}
class Two {
    IList<Template> Templates { get; set; }
}
class Three {
    IList<Template> Templates { get; set; }
}
Now, then, certain instances of Template are going to be used very, very frequently. (think like, every 20 seconds) and it includes a lot of things that need to be mathematically computed.
My question is basically which approach will yield the least stress on my database/server.
Am I best to just leave everything to Level 2 Caching in nHibernate? Or am I wiser to retrieve the Template object and store it in a static variable when my ASP.NET application starts up, and refresh this variable if it changes?
I've looked at some of the other similar questions around SO but I am still very much in the dark. Most of the documentation on caching assumes a good deal of knowledge on the subject, so I'm having a difficult time discerning what the optimal process is.
Once every 20 seconds doesn't really sound very stressful. You need to weigh the need for up-to-date data against the stress you can live with on your database.
The 2nd level cache won't necessarily help you in this case, since you use collections of objects. In order to know which objects it needs, NHibernate still needs to query the database, and if it does that it might as well fetch the data anyway (unless there is a lot of raw data in the entities).
You basically have three different options:
1st level cache
For each connection/session that you make, NHibernate will always cache the unique entities that it has fetched. Every time you try to get a single entity by its identifier (primary key), it will first check its first-level cache. This does not apply to collections of entities, though, unless you can force NHibernate to get only the identifiers for the collection and then fetch the entities one by one (usually very slow).
2nd level cache
This cache is available to each and every connection/session, and NHibernate tries to fetch the data from it before hitting the database. The same rule applies as for the 1st level cache: you can't get the collections of an entity without querying the database unless they have already been loaded.
custom cache
You can always take care of caching yourself; however, that way you need to model your classes accordingly (storing the Template objects, with the collections keeping track only of identifiers instead of Template objects). If you refactor like this, the 2nd and 1st level caches would still be equally useful, though.
I will give you an example that shows you what I'm talking about:
if One contains templates with identifier [1,2,3,4]
Two contains templates with identifier [2,3]
Three contains templates with identifier [3,4,5]
In order for NHibernate to know that One needs templates 1,2,3,4, it needs to query the database. 1,2,3,4 will be cached individually here.
In order to know that Two needs entities 2 and 3, it still needs to query the database. It can't possibly know that 2 and 3 are also part of the collection in Two. So it won't fetch them from the cache, because it will select the Template objects that belong to Two, hence the full data. That is why caching won't help you here.
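The custom-cache refactoring described earlier, where owners keep only template identifiers and the templates themselves are cached once, might look like this in outline. It is a language-neutral sketch written in Python; `TemplateCache`, `load_template_from_db` and the ID lists are hypothetical, and this is not NHibernate's own caching API.

```python
from typing import Callable, Dict, List


class TemplateCache:
    """Process-wide cache: each template is loaded once and shared by all owners."""

    def __init__(self, load_template: Callable[[int], dict]):
        self._load_template = load_template  # hypothetical database loader
        self._by_id: Dict[int, dict] = {}

    def get(self, template_id: int) -> dict:
        if template_id not in self._by_id:
            self._by_id[template_id] = self._load_template(template_id)
        return self._by_id[template_id]

    def get_many(self, template_ids: List[int]) -> List[dict]:
        return [self.get(tid) for tid in template_ids]

    def refresh(self, template_id: int) -> None:
        # Call this when a template changes so the next get() reloads it.
        self._by_id.pop(template_id, None)


loads: List[int] = []


def load_template_from_db(template_id: int) -> dict:
    loads.append(template_id)  # record each simulated database hit
    return {"id": template_id, "data": "lots of data"}


cache = TemplateCache(load_template_from_db)

# Owners keep identifiers only, matching the example above.
one_ids, two_ids, three_ids = [1, 2, 3, 4], [2, 3], [3, 4, 5]
for ids in (one_ids, two_ids, three_ids):
    cache.get_many(ids)
```

With the identifier lists already known, the three owners together trigger five loads instead of nine, because templates 2, 3 and 4 are shared.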
I think you need to give more details on what kind of data it is that you will be processing, and how it will be stored and updated in order to get an answer that is useful.
Static variables would put the least stress on your server. However, that imposes some restrictions; specifically, it would be much harder to scale (web garden/farm). If you don't need to scale, that's the option you're looking for.

Resources