How to implement OData federation for application integration

I have to integrate various legacy applications with some newly introduced parts; they are silos of information and have been built at different times with varying architectures. At times these applications may need to get data from another system, if it exists, and display it to the user within their own screens, based on business needs.
I was looking to see if it's possible to implement a generic federation engine that abstracts the aggregation of data from various other OData endpoints and provides a single version of truth.
A simplistic example could be as below.
I am not really looking to do an ETL here, as that may introduce some data-related side effects in terms of staleness, etc.
Can someone share some ideas as to how this can be achieved, or point me to any article on the net that shows such a concept?
Regards
Kiran

Officially, the answer is to use either the reflection provider or a custom provider.
Support for multiple data sources (odata)
Allow me to expose entities from multiple sources
To decide between the two approaches, take a look at this article.
If you decide that you need to build a custom provider, the referenced article also contains links to a series of other articles that will help you through the learning process.
Your project seems non-trivial, so in addition I recommend looking at other resources like the WCF Data Services Toolkit to help you along.
By the way, from an architecture standpoint, I believe your idea is sound. Yes, you may have some domain logic behind OData endpoints, but I've always believed this logic should be thin as OData is primarily used as part of data access layers, much like SQL (as opposed to service layers which encapsulate more behavior in the traditional sense). Even if that thin logic requires your aggregator to get a little smart, it's likely that you'll always be able to get away with it using a custom provider.
That being said, if the aggregator itself encapsulates a lot of behavior (as opposed to simply aggregating and re-exposing raw data), you should consider using another protocol that is less data-oriented (but keep using the OData backends in that service). Since domain logic is normally heavily specific, there's very rarely a one-size-fits-all type of protocol, so you'd naturally have to design it yourself.
However, if the aggregated data is exposed mostly as-is or with essentially structural changes (little to no behavior besides assembling the raw data), I think using OData again for that central component is very appropriate.
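To make the aggregation idea concrete, here is a minimal fan-out sketch in Python. The endpoint URLs and the Customer entity set are my own invented placeholders, and it assumes OData v4-style JSON with the result set under "value"; a real aggregator would of course need proper conflict rules, paging, and error handling.

```python
# Minimal federation sketch: query each hypothetical OData backend with the
# same filter and merge the results on a shared natural key.
import requests

BACKENDS = [
    "https://erp.example.com/odata/Customers",   # hypothetical endpoint
    "https://crm.example.com/odata/Customers",   # hypothetical endpoint
]

def federated_customers(odata_filter=None):
    """Run the same query against every backend and merge the results,
    keyed on a shared natural identifier, so the aggregator stays stateless."""
    params = {"$filter": odata_filter} if odata_filter else {}
    merged = {}
    for base_url in BACKENDS:
        resp = requests.get(base_url, params=params,
                            headers={"Accept": "application/json"}, timeout=10)
        resp.raise_for_status()
        for entity in resp.json().get("value", []):
            # last-writer-wins merge; a real aggregator needs explicit conflict rules
            merged.setdefault(entity["CustomerNumber"], {}).update(entity)
    return list(merged.values())

if __name__ == "__main__":
    print(federated_customers("Country eq 'DE'"))
```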
Obviously, and as you can see in the comments to your question, not everybody would agree with all of this -- so as always, take it with a grain of salt.

Related

Zanzibar doubts about Tuple + Check API (authzed/spicedb)

We currently have a home-grown authz system in production that uses the OPA/Rego policy engine as the core for decision making (close to what Netflix has done). We have been looking at the Zanzibar ReBAC model to replace our OPA/policy-based decision engine, and AuthZed got our attention. Looking further at AuthZed, we like the idea of defining a schema of "resource + subject" types and their relations (like an OOP model). We like the simplicity of using a social graph between resource and subject to answer questions. But the more we dig in and think about real usage patterns, the more questions and gaps in clarity we run into. I have put those thoughts down below; I hope it's not confusing...
[Doubts/Questions]
[tuple-data] Resource data/metadata must be continuously added into the authz system in the form of tuple data.
E.g. doc{org,owner} must be added as a tuple to populate the relation in the decision graph. Assume I'm a CMS system: am I expected to insert (or update) a tuple in the authz engine for every single doc created in my CMS system, for its whole lifetime?
Resource-owning applications are kept on the hook (responsible) for continuous keep-it-current updates.
What about old/stale relation data (tuples)? The authz engine doesn't know whether they are stale or not... is it the app's burden to tidy them up?
[check-api] The authz check is answered by a graph-walking mechanism: a [resource --to--> subject] traversal path.
There is no dynamic element in the decision making, such as a Rego rule script deciding based on a JSON payload.
How can dynamic decisions be made based on a JSON payload?
You're correct about the application being responsible for the authorization data it "owns". If you intend to have a unique role/relationship for each document in your system, then you do need to write/delete those relationships as the referenced resources (or the roles on them, more likely) change, but if you are using an RBAC-like design for your schema, you'd have to apply these role changes anyway; you'd just apply them to SpiceDB, instead of to your database. Likewise, if you have a relationship between say, a document and its parent organization, you do have to write/delete those as well, but that should only occur when the document is created or deleted.
In practice, unless you intend to keep the relationships in both your database and in SpiceDB (which some users do), you'll generally only have to write them to one or the other. If you do intend to apply them to both, you can either just perform the updates to both at the same time, or use an outbox-like pattern to synchronize behind the scenes.
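As a rough illustration of that outbox-like pattern, here is a minimal Python sketch. SQLite stands in for the application database, and push_to_spicedb is a hypothetical stand-in for the real SpiceDB/authzed client call: the document row and the pending relationship change commit in the same local transaction, and a worker replays the outbox to the authorization system afterwards.

```python
# Outbox sketch: data change and pending authz tuple are committed together
# locally, then replayed to SpiceDB by a background worker.
import sqlite3

conn = sqlite3.connect("cms.db")
conn.execute("CREATE TABLE IF NOT EXISTS documents (id TEXT PRIMARY KEY, org TEXT, owner TEXT)")
conn.execute("CREATE TABLE IF NOT EXISTS authz_outbox "
             "(id INTEGER PRIMARY KEY AUTOINCREMENT, tuple TEXT, done INTEGER DEFAULT 0)")

def create_document(doc_id, org, owner):
    # One local transaction: the document and its pending tuple commit together.
    with conn:
        conn.execute("INSERT INTO documents VALUES (?, ?, ?)", (doc_id, org, owner))
        conn.execute("INSERT INTO authz_outbox (tuple) VALUES (?)",
                     (f"document:{doc_id}#owner@user:{owner}",))

def drain_outbox(push_to_spicedb):
    # A background worker replays pending tuples to the authorization system.
    rows = conn.execute("SELECT id, tuple FROM authz_outbox WHERE done = 0").fetchall()
    for row_id, tuple_text in rows:
        push_to_spicedb(tuple_text)   # e.g. a WriteRelationships call in the real client
        with conn:
            conn.execute("UPDATE authz_outbox SET done = 1 WHERE id = ?", (row_id,))

create_document("readme", org="acme", owner="alice")
drain_outbox(print)   # replace print with the real SpiceDB client call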
Having to be proactive in your applications about storing data in a centralized system is necessary for data consistency. The alternative is federated systems that reach into other services. Federated systems come with the trade-off of being eventually consistent and can also suffer from priority inversion. I covered the centralized vs federated trade-offs in a bit of depth, along with other design aspects of authorization systems, in my presentation on the cloud native authorization landscape.
Caveats are a new feature in SpiceDB that enable dynamic policy to be enforced on the relationship graph. Caveats are defined using Google's Common Expression Language, which is a language used for policy in other cloud-native projects like Kubernetes. You can also use caveats to make relationships that eventually expire, if you want to take some of the book-keeping out of your app code.

Comparison of OData and Semantic Web/Linked Data

I'm trying to get my head around two very different approaches to data sharing: OData and Semantic Web/Linked Data. Is there a good comparison of the two?
As I understand it, OData combines syndication/CRUD (AtomPub), serialisation formats (XML, JSON), a data model, a query language, and some semantics/conventions governing use of those existing technologies. It's primarily intended for exposing data from one system so that others can consume it.
Linked Data is a data model, a rigorous commitment to URIs, an (optional?) serialisation format (RDF/XML), but (correct me if I'm wrong) doesn't say anything about transport, CRUD, etc. It seems intended to allow inferencing across lots of little chunks of data drawn from a wide variety of sources. (Not something of major importance to us right now - we would be synchronising large slabs of data between a small number of sources, and wanting to preserve provenance information).
I'm interested in technologies for sharing data between certain data management platforms, some of which I work on directly. OData seems more appealing as it's very straightforward to explain to developers: implement this API, follow that Atom standard, serialise the data like this. We're already doing something very similar for one platform: sharing XML-serialised data on an Atom feed, with URL parameters used to filter.
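For concreteness, the kind of filtered, URL-addressable query I have in mind looks roughly like this. A minimal sketch against a hypothetical service root; $filter, $select, and $top are standard OData query options.

```python
# Sketch of consuming an OData feed over plain HTTP.
import requests

SERVICE_ROOT = "https://data.example.org/odata"   # hypothetical service root

def fetch_products(min_price):
    resp = requests.get(
        f"{SERVICE_ROOT}/Products",
        params={
            "$filter": f"Price ge {min_price}",   # server-side filtering
            "$select": "Name,Price",              # shape the payload
            "$top": 20,                           # page size
        },
        # ask for JSON; an Atom-flavoured feed comes back for application/atom+xml
        headers={"Accept": "application/json"},
        timeout=10,
    )
    resp.raise_for_status()
    body = resp.json()
    return body.get("value", body)                # OData v4 JSON wraps results in "value"

print(fetch_products(10))
```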
By contrast, my past experiences working with RDF have given me an impression of brittle, opaque (massive slabs of RDF/XML), inaccessible (using SPARQL vs SQL) technology - but perhaps I'm confusing the experience of working with a triplestore like Jena with simply exposing an existing database via a linked data API.
Any pointers, comments etc on the differences and similarities between these two approaches in terms of scope, technologies, ease, future potential etc would be great.
I think discussing this in depth is not really what Stackoverflow is meant for, but just to give you some pointers to interesting discussions about differences and overlap:
Oh - it is data on the Web
Microsoft, OData and RDF
One of the key differences seems to be that OData has no means to link data from different sources to each other. Essentially, you're still stuck in a silo.
It might also be interesting to check out various attempts to convert data between the two approaches. See, among others, http://answers.semanticweb.com/questions/1298/has-anyone-written-a-mapping-from-odata-to-rdf .
OData may be easier, but it's not better, by any means. SPARQL and RDF (forget RDF/XML; better to look at Turtle) satisfy everything in OData while providing many more cutting-edge features (a federation sketch follows at the end of this answer), such as:
Federation Extensions
Linked Data
Reasoning and Inference (for the more brave)
Equally, the software supporting the standards is actually quite sophisticated. Most people interested in OData generally come from a Microsoft background, so take a look at dotNetRdf.
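To illustrate the federation point from the list above: SPARQL 1.1's SERVICE keyword lets a single query pull part of its pattern from a remote endpoint. A minimal sketch, assuming hypothetical endpoint URLs and the standard SPARQL protocol (query parameter plus a JSON results Accept header):

```python
# Federated SPARQL sketch: the SERVICE clause asks the local endpoint to fetch
# part of the pattern from a second, remote endpoint.
import requests

LOCAL_ENDPOINT = "https://sparql.example.org/query"    # hypothetical
REMOTE_ENDPOINT = "https://other.example.net/sparql"   # hypothetical

QUERY = f"""
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name WHERE {{
  ?person a foaf:Person .
  SERVICE <{REMOTE_ENDPOINT}> {{ ?person foaf:name ?name }}
}}
LIMIT 10
"""

resp = requests.get(LOCAL_ENDPOINT,
                    params={"query": QUERY},
                    headers={"Accept": "application/sparql-results+json"},
                    timeout=30)
resp.raise_for_status()
for binding in resp.json()["results"]["bindings"]:
    print(binding["person"]["value"], binding["name"]["value"])
```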
Here's a comparison matrix:
http://uoccou.wordpress.com/2011/02/17/linked-data-odata-gdata-datarss-comparison-matrix/
Unfortunately the table formatting is pretty horrible, but the content is useful.

Arguments for using WCF/OData as an access layer instead of EF/L2S/NHibernate directly

We develop mostly low-traffic but highly specialized web applications. Normally we use L2S, EF, or NHibernate as the access layer and then throw ASP.NET MVC on top of it; for normal CRUD operations we query the ISession/DataContext directly, but more advanced functions/side effects go into some kind of service layer.
Now, I was thinking about publishing the data through OData (WCF Data Services), querying that from the controllers (or even from jQuery when a good template engine shows up), and publishing the service operations through a WCF service (or as custom methods on the WCF Data Service?). What advantages/disadvantages does this architecture pose?
Do I gain anything except higher complexity and latency? Better separation of concerns (or is that just an illusion)?
Edit:
Could it be a good idea to create a completely AJAX-driven solution with, e.g., WCF RIA Services? Or does one lose too much flexibility? It feels like you could completely detach your views from your logic then; heck, one should be able to just write pure HTML, so not even ASP.NET MVC would be needed? But I guess a lot of new problems arise?
Don't do it. Sorry, but this is a stupid, over-engineered approach. You are IN ONE PROCESS and you insist on running a network connection AND encoding all the data you pass into XML and back out, plus running it over an HTTP connection with limited query semantics? Don't tell anyone you even tried.
Separation of concern is an illusion here - you replace a highly optimized domain model with a simplified data layer.
THAT SAID: I love OData - great. But it is not an in-process technology, it is a FRONT-END technology, like ASP.NET MVC - just not for the end user, but for ANOTHER program to integrate with your data. It should be used in similar scenarios, and when exposing data over trust borders (Silverlight, for example, is a trust border, as the requests can be faked).
It is NOT optimized to replace in-process, high-end application runtime layers like NHibernate.
As TomTom mentions, you don't want to pay the cost of loopback for OData when within a process. If you have direct line-of-sight to your database and it's your own application's database, then there is no reason to put WCF Data Services in the middle. I would continue to use one of the other options you mentioned (L2S, EF, nHibernate).
Now, if you need to expose data over your http endpoint for other applications to consume, or even for your own application if you have some jQuery code in the client that needs to access data from the server, then definitely an OData endpoint may help and WCF Data Services is the simplest way to create one.
TomTom has a lot of votes and although he's not wrong, he's also not right, in spite of his persuasive tone.
In this particular instance, the OP appears to be writing an intranet LOB-style app that probably only stands to be impeded by an OData service mimicking the underlying database. But what if he were not mimicking the underlying database?
If he were building an application based on various or unknown future data sources, then the services layer can unify, re-present, simplify, and aggregate those services, even if a large proportion of queries eventually go back to a SQL Server in the next room.
Similarly, if you're building an application of massive scale - and by scale I mean millions of users expecting to wait a few seconds between actions, not millions of FX trades an hour - then placing a services layer between your application and the data is a common pattern. The scalability of the internet is based on many small stateless HTTP servers and the caching infrastructure in between.
In real life, the same queries are run countless times; people refresh pages or click the same link over and over. No one really asks for 10m rows, because not many humans can look at that much in one go. So working in small pages keeps the data flowing and requests interleaving. You also have the opportunity to introduce a shared in-RAM cache in the services layer, or even a RAM database.
You may even find that you need to shard your database or partition it between SQL and a key/value store. You can then do the joins in the middle tier, scaled out, and offload the joining and compute-intensive stuff away from the database server.
The rule with internet scale is that the database is your hot spot and you need to do everything you can to prevent anyone talking to it! Be that a local HTTP cache on an iPad, your ISP's proxy, the IIS output cache, or a Redis cache, all those layers help to spread the load and ease the burden.
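For instance, a services layer can put a simple read-through cache in front of the database so that repeated identical queries never reach it. A minimal in-process sketch, where fetch_from_database is a hypothetical stand-in for the real data access and the TTL is arbitrary:

```python
# Read-through cache sketch for a middle tier: identical queries within the
# TTL are served from RAM instead of hitting the database.
import time

CACHE = {}          # query-string -> (expires_at, rows)
TTL_SECONDS = 30

def fetch_from_database(query_string):
    print(f"hitting the database for {query_string!r}")
    return [{"id": 1, "name": "example row"}]

def cached_query(query_string):
    now = time.time()
    hit = CACHE.get(query_string)
    if hit and hit[0] > now:
        return hit[1]                      # served from RAM, DB never touched
    rows = fetch_from_database(query_string)
    CACHE[query_string] = (now + TTL_SECONDS, rows)
    return rows

cached_query("/Orders?$top=20")   # first call goes to the database
cached_query("/Orders?$top=20")   # the refresh/re-click is absorbed by the cache
```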
So if Carl came to interview with me and told me he'd considered putting an OData layer before his SQL boxes, I'd be interested to hear his reasoning.
WCF Data Services and OData support JSON, so you can minimize the payload by leveraging that. Plus, with WCF Data Services you can completely control your data access. You don't have to roll Entity Framework. You can customize everything. The benefit is that the protocol structure is completely handled for you by using WCF Data Services and OData. And consuming the service from MVC is an Add Service Reference away. WCF Data Services runs on WCF so you have the ability to do other web services beyond just OData type delivery, so it is extremely flexible.
There are limitations here and there that come with the nature of OData as well as the way WCF Data Services handles OData, but they are fairly specific and if they arise in your architecture there are ways around them.
If your solution is isolated to a single web application, then having the data layer embedded in that application works well. But if you have any need whatsoever for another app or process to hit the data layer or shared business logic, then exploring the option of putting your data layer in a WCF Data Service is well worth it. For example, you could write a PowerShell script to call a web service method in 2 lines of code. So if you have domain logic that you want to be able to run from your web app and from a command line or scheduled task, your WCF Data Service layer could handle that scenario for all of them without having to duplicate logic or code.
Many ways to skin a cat. I have used both approaches in business applications and would not say that one or the other should be avoided. They both work well and provide plenty of value without being detrimental.
To be fair, there are benefits to this approach that may outweigh the performance concerns, which are admittedly tremendous. An application built this way will have orders of magnitude more latency and may cost several times more in compute resources to execute than an in-process solution.
That having been said, in development scenarios where human resources are limited, this may work better. It allows contractors to be quickly hired on to write new screens or whole new applications very quickly, in whatever language suits them. Developers can get up to speed faster than with a proprietary homegrown solution. No more sa passwords in config files, injection of a custom security layer if required, unified logging and auditing, combining several data stores into one consistent resource. If you have a heterogeneous platform, you don't need to write SDKs; they have already been written in many important languages. OData works very well with MS Excel, which is a huge win at many organizations. Depending on your network topology, it might be cheaper and even faster to route out over the internet than to use a leased line if you're in a remote office, or behind a firewall (at a client site doing a demo, for instance).
For large datasets, the overhead of the request and packaging becomes less important - in reporting scenarios, for instance. While I have never designed something like this, I can see where it might be useful, depending on your corporate culture and available resources, to consume OData endpoints internally.

Object used for communication between business layer and presentation layer

This is a general question about design. What's the best way to communicate between your business layer and presentation layer? We currently have an object that gets passed into our business layer; the services read the info from the object and set the result into it. When the services are finished, we have an object populated with the results from the business layer, and the UI can then display according to those results.
Is this the best approach? What other approaches are out there?
Domain-Driven Design books (the Quickly version is freely available here) can give you insights into this.
In a nutshell, they suggest the following approach: the model objects traverse from the model tier to the view tier seamlessly (this can be tricky if you are using statically typed languages or different languages on client/server, but it is trivial with dynamic ones). Also, services should only be used to perform actions that do not belong to the model objects themselves (or when you have an action that involves lots of model objects).
Also, business logic should be put into the model tier (entities, services, value objects), in order to prevent the famous anemic domain model anti-pattern.
This is another approach. Whether it suits you depends a lot on the team, how much code has already been written, how much test coverage you have, how long the project is, whether your team is agile or not, and so on. Domain Driven Design Quickly discusses this even further, and any decision will be far less risky if you at least skim it first (getting the original book by Eric Evans will help if you choose to delve further).
We use a listener pattern, and have events in the business layer send information to the presentation layer.
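A minimal sketch of that listener idea (all class and event names are illustrative, not from any particular framework): the business layer raises an event when a result is ready, and the presentation layer subscribes to it, so neither side needs a shared result object.

```python
# Listener/observer sketch: the business layer knows nothing about the UI; it
# just notifies whoever subscribed.
class OrderService:
    def __init__(self):
        self._listeners = []

    def add_listener(self, callback):
        self._listeners.append(callback)

    def place_order(self, order_id, amount):
        # ... domain logic/validation would run here ...
        result = {"order_id": order_id, "amount": amount, "status": "accepted"}
        for notify in self._listeners:
            notify(result)                 # push the outcome to the UI layer

def render_confirmation(result):           # presentation-layer listener
    print(f"Order {result['order_id']} is {result['status']}")

service = OrderService()
service.add_listener(render_confirmation)
service.place_order("A-42", amount=99.0)
```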
It depends on your architecture.
Some people structure their code all in the same exe or dll and follow a standard n-tier architecture.
Others might split it out so that their services are all web services instead of just standard classes. The benefit of this is re-usable business logic installed in one place within your physical infrastructure, so a single change applies across all applications.
Software as a service and cloud computing are becoming the platforms things are moving towards. Amazon's Elastic Compute Cloud, Microsoft's Azure, and other cloud providers are all offering numerous services which may affect your architectural decisions.
One I'm about to use is
Silverlight UI
WCF Services - business logic here
NHibernate data access
Sql Server Database
We're only going to allow the layers of the application to talk via interfaces so that we can move up to Azure cloud services once they become more mature.

ASP.Net MVC with web service as model?

Does anyone have advice or tips on using a web service as the model in an ASP.Net MVC application? I haven't seen anyone writing about doing this. I'd like to build an MVC app, but not tie it to using a specific database, nor limit the database to the single MVC app. I feel a web service (RESTful, most likely ADO.Net Data Services) is the way to go.
How likely, or useful, is it for your MVC app to be decoupled from your database? How often have you seen, in an application's lifetime, a change from SQL Server to Oracle? In the last 10 years of projects I've delivered, it's never happened.
Architectures are like onions, they have layers of abstractions above things they depend on. And if you're going to use an RDBMS for storage, that's at the core of your architecture. Abstracting yourself from the DB so you can swap it around is very much a fallacy.
Now you can decouple your database access from your domain, and the repository pattern is one of the ways to do that. Most mature solutions use an ORM these days, so you may want to have a look at NHibernate if you want a mature technology, or ActiveRecord / linq2sql for a simpler active record pattern on top of your data.
Now that you have your data strategy in place, you have a domain of some sort. When you expose data to your client, you can choose to do so through an MVC pattern, where you'll usually send DTOs generated from your domain for rendering, or you can decide to leverage an architecture style like REST to provide more loosely coupled systems, by providing links and custom representations.
You go from tight coupling to looser coupling as you go towards the external layers of your solution.
If your question however was to build an MVC app on top of a REST architecture or web services, and use that as a model... Why bother? If you're going to have a domain model, why not reuse it in your system and your services where it makes sense?
Generating a UI from an MVC app and generating documents needed for a RESTful architecture are two completely different contexts; basing one on top of the other is just going to cause much more pain than needed. And you're sacrificing performance.
It depends on your exact scenario, but a remote XML-based service as the model in MVC is, from experience, not a good idea; it's probably over-engineering and disregards the need for a domain to start with.
Edit 2010-11-27; clarified my thoughts, which was really needed.
Most often, a web service exposes functionality across different types of applications; it is not an abstraction within one single application. You are probably thinking more of a way of encapsulating commands and reads so that they don't interfere with your controller/view programming.
Use a service from a service bus if you're after the decoupling, and use an async pattern in your async pages. See Rhino.ServiceBus, NServiceBus, and MassTransit for .NET-native implementations, and RabbitMQ for something different: http://blogs.digitar.com/jjww/2009/01/rabbits-and-warrens/.
Edit: I've had some time to try RabbitMQ out in a way that pushed messages to my service, which in turn pushed updates to the book-keeping app. RabbitMQ is a message broker, a.k.a. a MOM (message-oriented middleware), and you could use it to send messages to your application server.
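As a concrete sketch of that idea, here is the producer side of pushing a command through RabbitMQ with the pika client; the queue name and payload are illustrative, and the calls follow the standard RabbitMQ tutorial pattern.

```python
# Producer side: the web app drops a command on a queue instead of calling the
# back end synchronously; a worker on the application server consumes it later.
import json
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="bookkeeping", durable=True)

channel.basic_publish(
    exchange="",
    routing_key="bookkeeping",
    body=json.dumps({"command": "post_invoice", "invoice_id": 123}),
    properties=pika.BasicProperties(delivery_mode=2),  # mark the message persistent
)
connection.close()
```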
You can also simply provide service interfaces. Read Eric Evans's Domain-Driven Design for a more detailed description.
REST-ful service interfaces deal a lot with data, and more specifically with addressable resources. REST can greatly simplify your programming model and allows great control over output through the HTTP protocol. WCF's upcoming programming model uses true REST as defined in the original thesis, where each document should, to some extent, provide URIs for continued navigation. Have a look at this.
(In my first version of this post, I lamented REST for being 'slow', whatever that means.) REST-based APIs are also pretty much what CouchDB and Riak use.
ADO.NET is rather crap (!) [N+1 problems with lazy collections because of coding to the implementation, data-access leakage - you always need your DB context where your query code is, etc.] in comparison to, for example, LightSpeed (commercial) or NHibernate. Spring.NET also allows you to wrap service interfaces in their container with a web service facade, but (without having browsed it for a while) I think it's a bit too XML-heavy in its configuration.
Edit 1: With ADO.NET here I mean the default "best practice" with DataSets, DataAdapter, and iterating over lots of rows from a DataReader; it breeds rather ugly and hard-to-debug code. The N+1 stuff, yes, that is about the Entity Framework.
(Edit 2: EntityFramework doesn't impress me either!)
Edit 1: Create your domain layer in a separate assembly [aka Core] and provide all domain and application services there, then import this assembly from your specific MVC application. Wrap data access in some DAO/repository, behind an interface in your core assembly, which your data assembly then references and implements. Wire up the interface and implementation with IoC. You can even program something for dynamic service discovery with the above-mentioned service buses, to resolve the interfaces. WCF uses interfaces like this and so do most of the above service buses; you can provide a sub-component resolver in your IoC container to do this automatically.
Edit 2:
A great combo for the above would be CQRS+EventSourcing+ReactiveExtensions. Your write-model would take commands, your domain model would decide whether to accept them, it would push events to the reactive-extensions pipeline, perhaps also over RabbitMQ, which your read-model would consume.
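A very small sketch of that flow (all names invented for illustration, with no Rx or broker in the picture): a command handler validates against the write model, appends an event, and the read model is just a projection built from the event stream.

```python
# Tiny CQRS/event-sourcing sketch: accepted commands become events, and the
# read model is a projection built from those events.
EVENT_STORE = []                 # append-only log of domain events
READ_MODEL = {}                  # account_id -> current balance (projection)

def handle_deposit(account_id, amount):          # command handler (write side)
    if amount <= 0:
        raise ValueError("deposit must be positive")
    event = {"type": "MoneyDeposited", "account": account_id, "amount": amount}
    EVENT_STORE.append(event)                    # the event log is the source of truth
    project(event)                               # in real life: pushed via Rx/RabbitMQ

def project(event):                              # read-model consumer
    if event["type"] == "MoneyDeposited":
        READ_MODEL[event["account"]] = READ_MODEL.get(event["account"], 0) + event["amount"]

handle_deposit("acct-1", 100)
handle_deposit("acct-1", 50)
print(READ_MODEL["acct-1"])   # 150, answered from the read side only
```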
Update 2010-01-02 (edit 1)
The gist of my idea has been codified by something called MindTouch Dream. They have made a screencast in which they treat almost all parts of a web application as a (web) service, which is also exposed with REST.
They have created a highly parallel framework using co-routines to handle this, including their own elastic thread pool.
To all the nay-sayers in this question, in ur face :p! Listen to this screen-cast, especially at 12 minutes.
The actual framework is here.
If you are into this sort of programming, have a look at how monads work and their implementations in C#. You can also read up on CoRoutines.
Happy new year!
Update 2010-11-27 (edit 2)
It turned out that coroutines got productized with the Task Parallel Library from Microsoft. Task now implements the same features, as it implements IAsyncResult. Caliburn is a cool framework that uses them.
Reactive Extensions took the monad comprehensions to the next level of asynchronicity.
The ALT.Net world seems to be moving in the direction I talked about when I wrote this answer the first time, albeit with new types of architectures I knew little of.
You should define your models in a data access agnostic way, e.g. using Repository pattern. Then you can create concrete implementations backed by specific data access technologies (Web Service, SQL, etc).
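A minimal sketch of that separation (the repository interface, class names, and endpoint URL are all illustrative): the controller depends only on the abstraction, and a web-service-backed or in-memory implementation is chosen at composition time.

```python
# Repository sketch: the MVC side codes against the abstraction; concrete
# implementations (web service, SQL, in-memory) are swapped in at composition time.
from abc import ABC, abstractmethod
import requests

class ProductRepository(ABC):
    @abstractmethod
    def get(self, product_id): ...

class HttpProductRepository(ProductRepository):
    def __init__(self, base_url):
        self.base_url = base_url                      # e.g. an OData/REST service root

    def get(self, product_id):
        resp = requests.get(f"{self.base_url}/Products({product_id})",
                            headers={"Accept": "application/json"}, timeout=10)
        resp.raise_for_status()
        return resp.json()

class InMemoryProductRepository(ProductRepository):   # handy for tests
    def __init__(self, rows):
        self.rows = rows

    def get(self, product_id):
        return self.rows[product_id]

def product_controller(repo: ProductRepository, product_id):
    product = repo.get(product_id)                    # no idea what's behind the interface
    return f"<h1>{product['Name']}</h1>"

print(product_controller(InMemoryProductRepository({1: {"Name": "Widget"}}), 1))
```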
It really depends on the size of this MVC project. I would say keep the UI and domain in the same running environment if the website is going to be used by a small number of users (< 5000).
On the other side, if you are planning on a site that is going to be accessed by millions, you have to think distributed and that means you need to build your website in a way that it can scale up/out. That means you might need to use extra servers (Web, application and database).
For this to work nicely, you need to decouple your MVC UI site from the application. The application layer would usually contain your domain model and might be exposed through WCF or a service bus. I would prefer a service bus because it is more reliable and can use persistent queues like MSMQ.
I hope this helps

Resources