I have been working on a project that collects data from various third-party data sources and loads it into our data stores (DI). We have been using Pentaho for this.
I want to know if this can also be done with an ESB (Camel or Mule)?
And what other features does an ESB bring that DI does not offer?
I have read lots of articles on both ESB and DI, but none of them resolved this question. I have also read about Mule data connectors for third-party data sources.
DI (Data Integration, not 'dependency injection') or ETL approaches tend to be long-running, batch-style jobs for moving data from System A to System B. The ESB or lightweight integration approach is generally to break the task up into smaller pieces (blocks of data, or a single event per data item) and allow other systems to subscribe to the data stream, generally over an Enterprise Messaging System, without having to impact System A, System B, or the existing code project. This also means that there is no human dependency requirement in the project plan: if System C comes along, it does not necessarily require resources from the System B team to access the data stream.
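As a concrete illustration, here is a minimal sketch of that pattern as an Apache Camel route (assuming Camel 3.x with the camel-sql, camel-jackson, and camel-jms components on the classpath; the endpoint URIs, table name, and configured DataSource/JMS broker are all hypothetical):

```java
import org.apache.camel.builder.RouteBuilder;

// Poll System A for new rows, split the batch into one message per
// record, and publish each to a JMS topic that System B, C, ... can
// subscribe to independently, without touching System A again.
public class DataStreamRoute extends RouteBuilder {

    @Override
    public void configure() {
        from("sql:select * from customer_updates?delay=60000") // System A
            .split(body())                     // batch -> one message per record
            .marshal().json()                  // neutral wire format
            .to("jms:topic:customer-updates"); // downstream systems subscribe here
    }
}
```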
There are suitable use cases for both in any given environment. However, in my experience (and Big Data/MDM best practices tend to agree), if you have an originating stream of data, some other system will want to access that stream at some point as well. If the ability to access the data stream without having to change existing code, systems, or other teams within your organization sounds useful in your use case, then it would be a good idea to design for that up front and go with the ESB approach. This allows new interested consumers to come in without having to rewrite the process used by the existing systems. ESB/lightweight integration systems tend to support that design pattern more efficiently than DI/ETL tools.
Some random thoughts:
ESBs handle the "one bad record" problem by letting you route the offending record to an error queue, have a human look at it, and then republish it (see the sketch after this list)
ETL/DI tends to have a straight-line, happy-path speed advantage
ETL/DI starts getting complicated once you go past the simple point-to-point integration use case
IMHO: ESBs are better at supporting versioning of data sets, services, and data models
ETL/DI tends to have more mature UIs for non-technical users to perform data-mapping tasks
ESBs are really strong at supporting runtime decoupling of systems: if System B is down, the data just sits in a queue until it comes back up; there is no long-running blocking thread and no risk of having to restart a job
ESBs have a slightly higher ramp-up curve
ETL/DI generally leads to ESB eventually (most vendors offer both a DI and an ESB product)
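The error-queue point above might look like this in Camel (a hedged sketch; the queue names and the loader bean are hypothetical):

```java
import org.apache.camel.builder.RouteBuilder;

// A poison record that keeps failing is parked on an error queue for a
// human to inspect and republish; the rest of the stream keeps flowing.
public class BadRecordRoute extends RouteBuilder {

    public static class CustomerLoader {
        public void load(String json) {
            // hypothetical insert into the data store; throws on a bad record
        }
    }

    @Override
    public void configure() {
        // After three failed redeliveries, move the message aside instead
        // of blocking the whole stream behind it.
        errorHandler(deadLetterChannel("jms:queue:customer-updates.error")
                .maximumRedeliveries(3)
                .redeliveryDelay(1000));

        from("jms:queue:customer-updates")
            .bean(CustomerLoader.class, "load")
            .to("jms:queue:customer-updates.done");
    }
}
```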
I'm facing difficulties with finding the best CEP product for our problem. We need a distributed CEP solution with shared memory. The main reason for distribution isn't speeding up the process, but having a fallback in case of hardware or software problems on nodes. Because of that, all nodes should keep their own copy of the event-history.
Some less important requirements to the CEP product are:
- Open source is a big plus.
- It should run on a Linux system.
- Running in a Java environment would be nice.
Which CEP products are recommended?
A number of commercial, non-open-source products employ a distributed data grid to store the stateful event-processing data in a fault-tolerant manner. My personal experience is with TIBCO BusinessEvents, which internally uses TIBCO ActiveSpaces. Other products claim to do similar things, e.g., Oracle Event Processing uses Oracle Coherence.
As for open source solutions, I'm not aware of any that offers functionality like this out of the box. With the right skills you might be able to use one in conjunction with a data grid (I've seen people try to use Drools Fusion together with Infinispan), but there are quite a number of complexities you need to think about that a pre-integrated product would take care of for you (transaction boundaries, data access, keeping track of changes, data modeling).
An alternative you might consider, if performance doesn't dictate a distributed/load-balanced setup, could be to just run a hot standby, i.e., two engines performing the same CEP logic, with only one engine (the active one) actually triggering outgoing actions. The hot-standby engine would keep evaluating the CEP logic so that the data is in its memory, ready to take over in case of failure, but would not trigger outgoing actions as long as the other engine is running.
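A bare-bones sketch of that gating idea (the leader-election mechanism that flips the flag, e.g. ZooKeeper or a heartbeat, is deliberately left out; all names are hypothetical):

```java
// Both engines evaluate the same CEP logic and hold the same state,
// but only the currently active engine fires outgoing actions.
public class HotStandbyGate {

    private volatile boolean active; // flipped by your failover mechanism

    public void promote() { active = true; }
    public void demote()  { active = false; }

    // Call this from every rule/pattern match, on both engines.
    public void onMatch(Runnable outgoingAction) {
        if (active) {
            outgoingAction.run(); // only the active engine acts
        }
        // the standby stays warm: its state is current, actions suppressed
    }
}
```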
I've read this topic but still don't have the complete picture, and I would really appreciate your answer to the following question:
for what type of application should the SOA approach be used (get JSON from the server side and generate HTML on the client side using a JavaScript framework
like Knockout.js, AngularJS, and so on), and for what type is ASP.NET MVC on the server side the better architectural approach (generate pages entirely on the server side and return views as the result)?
For example, for my last SPA with rich client-side logic, WCF services + Knockout.js (client-side MVVM) produced great results. But which approach is better suited
for a CRUD application (for instance, several tables for adding and updating data, with different user roles in use)?
SOA is about a lot more than just sending JSON to a web client.
Imagine you have a business with a database-driven software system for things like sales, inventory, reporting, etc. Most systems start out small, with just a client or web app talking directly to the database... and that's okay.
However, as the system grows, there are some things you'll find that don't fit well inside this model: long-running batch processes that lock up the app or web page, scheduled jobs that involve more than just the database server, processes involving data living in outside sources, or complex reports that bog down your DB while they run.
At this point, you'll want to think about adding an application server to handle some of these tasks. An application server can take some of that workload off of your clients. It can also take certain loads off what is likely by this time to be an over-worked database, such that the application server requests or moves raw data to and from the DB, and your user-facing client requests/submits transformed data to and from the application server.
As the system grows even more, you'll also find that different parts of the system have unexpected side effects elsewhere as you maintain things. Even simple enhancements become more and more complicated to complete. Development slows, and bug counts increase. The application server now becomes a great place to centralize design efforts on how to make sure a change in one area has the expected consequences (and only the expected consequences) everywhere else.
At its outset, then, what an SOA really is, is taking that application server (which might happen to use JSON over HTTP, but might also offer a completely different interface or even automatically translate among several data transport technologies) and enforcing that all database access, not just some, goes through this application server: the service layer.
Once this access is enforced, and nothing else talks directly to the database any more (at least, nothing that's not specifically accounted for), the layer also becomes a great place to start enforcing business rules and system logic. It allows you to write traditional application-style code here that's easier to keep under source control than SQL and will automatically be shared among any applications using the system. The code all lives in about the same place, so it's easier to model changes and their effects through the system.
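To make that concrete, here is a small sketch of a service-layer method (in Java for illustration; OrderService, OrderRepository, and CreditService are hypothetical names):

```java
import java.math.BigDecimal;
import java.util.List;

// Hypothetical collaborators: the repository is the only code that talks
// to the database; the credit check might call an outside system.
interface OrderRepository { long insert(long customerId, List<BigDecimal> lines); }
interface CreditService  { boolean hasAvailableCredit(long customerId, BigDecimal amount); }

public class OrderService {

    private final OrderRepository orders;
    private final CreditService credit;

    public OrderService(OrderRepository orders, CreditService credit) {
        this.orders = orders;
        this.credit = credit;
    }

    // Every client (web app, desktop app, batch job) calls this method,
    // so the credit-limit rule is enforced in exactly one place.
    public long placeOrder(long customerId, List<BigDecimal> lines) {
        BigDecimal total = lines.stream().reduce(BigDecimal.ZERO, BigDecimal::add);
        if (!credit.hasAvailableCredit(customerId, total)) {
            throw new IllegalStateException("customer " + customerId + " is over their credit limit");
        }
        return orders.insert(customerId, lines);
    }
}
```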
As a bonus, this layer is often very easy to scale out to multiple redundant servers, especially compared to a traditional relational database server. The result is that scaling the application server becomes a way to improve and manage the performance and reliability of a large application. On the back end, it can also improve performance by simplifying and centralizing efforts to use database caching tools like Redis, making it easier to involve a dedicated DBA in performance tuning, and helping you centralize access to data that lives in multiple places.
At this point, your MVC web site is just one more app that connects to the application server in your SOA system. You might also have a legacy client-server app installed on some desktops, or your MVC app may be the public-facing sales site while actual sales and support reps use something completely different, billing uses a different app, and order fulfillment or procurement have yet another interface... but they all talk to the same service layer. An additional advantage here is that this service layer makes it easier to pull in data from multiple sources, so if your manufacturing system needs material availability information from an outside system, the service layer can know how to go find it, and front-end code doesn't have to know this data came from anywhere special.
The point of all this is it's not a case of either/or here. If you have an SOA, you can use MVC at one level of the system, and the interface provided by the SOA's service layer will determine some of what your MVC model looks like and how the controller behaves. If you don't have an SOA, MVC just happens to work okay at building the whole stack, from database to presentation, and in fact works such that the model becomes a microcosm for a larger service layer.
The question, then, of when to use JSON vs when to use ASP.Net MVC takes on a new form. ASP.Net MVC can be part of an SOA architecture, and service frameworks that offer JSON data are often implemented using client-side MVC libraries. You really want to know when it's more appropriate to do more on the client side vs more on the server side. Honestly, I think this is mostly personal preference, but there are trade-offs you should be aware of.
Doing more work client side can be great for performance and scalability, because it spreads some of the application's workload among your users' computers, and it can reduce the latency introduced by making round trips to a web server or application server.
On the other hand, doing more work server side is good for avoiding the latency of transferring larger data sets over slower public internet links. It can also make it easier to meet compliance requirements, such as accessibility mandates under the Americans with Disabilities Act (too much JavaScript can cause problems with accessible browsers) or privacy and security rules (pushing data to client systems may itself constitute a risk). Finally, it can make it easier to develop, deploy, and maintain new code when more of the processing happens within the same layer.
Client-side MV* (MVC, MVP, MVVM, etc.) architectures and server-side MV* architectures are the same as far as the SOA part of your architecture is concerned.
The model is where you communicate with and fetch data from the various services. The choice between client-side MV* and server-side MV* is orthogonal.
Is it beneficial to pull the data from a data warehouse for an analytical CRM application, or should it be pulled from the source systems without the need for a data warehouse? Please help me answer this.
For CRM it is better to fetch the data from a data warehouse, where data transformations have been developed according to the business needs using various ETL tools. Using these transformations, you can integrate the CRM analytics for analysing large chunks of data.
I guess the answer will lie in a few factors:
what data you need,
the granularity of that data, and
the ease of extraction
If you need data that spans more than one source system, then you will have to join that data across them. One big strength of getting the data from a DWH is that it tends to hold data from a number of source systems, well connected across those systems, with business rules applied consistently.
A data warehouse should hold data at the lowest granularity, but sometimes, for pragmatic reasons, decisions may have been taken to partly summarise the data, so you may not have the appropriate granularity.
The big advantage of a DWH is that it is a simple dimensional model structure (for a Kimball star schema, anyhow), so as long as the first two conditions are met, I would always get my data from the DWH.
Good luck!
Sharing my thoughts on the business case for pulling from a data warehouse rather than directly from the CRM system:
A DWH can hold a lot more indicators for decision making and analysis at the enterprise level, across various systems, than a single system like CRM can. Therefore, if you want to extend your analysis of CRM data, you can easily merge in information from other systems to perform better analytics/BI from the DWH.
It also helps if you want to bring conformity across systems, to see customer data in a single view. For example, you can have pipeline and sales information from CRM and then perform a revenue calculation in another system for the same customer. It's possible that you want both sets of details in a single place, with the same customer record linked to both measures. You might then want to add risk (credit information) from an external source to the same record in the DWH. It brings true scalability in terms of reporting and ad hoc requests.
It removes the non-core work and detaches the CRM production system from BI and reporting (not talking of specific CRM reports). This has various advantages, both in terms of operations and convenience. You can google this subject to understand the benefits further.
For now these are the only points that come to me. I will try adding more thoughts later.
P.S: I am more than happy to be corrected :-)
We develop mostly low-traffic but highly specialized web applications. Normally we use L2S, EF, or NHibernate as the access layer and then throw ASP.NET MVC on top; for normal CRUD operations we query the ISession/DataContext directly, but for more advanced functions/side effects we put them in some kind of service layer.
Now, I was thinking about publishing the data through OData (WCF Data Services), querying that from the controllers (or even from jQuery once a good template engine shows up), and publishing the service operations through a WCF service (or as custom methods on the WCF Data Service?). What advantages/disadvantages does this architecture pose?
Do I gain something besides higher complexity and latency? Better separation of concerns (or is it just an illusion)?
Edit:
Could it be a good idea to create a completely Ajax-driven solution with, e.g., WCF RIA Services? Or does one lose too much flexibility? It feels like you could completely detach your views from your logic then; heck, one should be able to just write pure HTML, and not even ASP.NET MVC would be needed? But I guess a lot of new problems would arise?
Don't do it. Sorry, but this is a stupid, over-engineered approach. You are IN ONE PROCESS and you insist on running a network connection AND coding all passing data into XML and back out, plus running it over an HTTP connection with limited query semantics? Don't tell anyone you even tried.
Separation of concerns is an illusion here: you replace a highly optimized domain model with a simplified data layer.
THAT SAID: I love OData - great. But it is not an in-process technology, it is a FRONT-END technology, like ASP.NET MVC - just not for the end user, but for ANOTHER program to integrate with your data. It should be used in similar scenarios, and when exposing data over trust borders (Silverlight, for example, is a trust border, as the requests can be faked).
It is NOT optimized to replace in-process, high-end application runtime layers like NHibernate.
As TomTom mentions, you don't want to pay the cost of loopback for OData when within a process. If you have direct line-of-sight to your database and it's your own application's database, then there is no reason to put WCF Data Services in the middle. I would continue to use one of the other options you mentioned (L2S, EF, nHibernate).
Now, if you need to expose data over your HTTP endpoint for other applications to consume, or even for your own application if you have some jQuery code in the client that needs to access data from the server, then an OData endpoint may definitely help, and WCF Data Services is the simplest way to create one.
TomTom has a lot of votes and although he's not wrong, he's also not right, in spite of his persuasive tone.
In this particular instance, the OP appears to be writing an intranet LOB-style app that probably only stands to be impeded by an OData service mimicking the underlying database. But what if he were not mimicking the underlying database?
If he were building an application based on various or unknown future data sources, then the services layer could unify, re-present, simplify, and aggregate those sources, even if a large proportion of queries eventually route back to a SQL Server in the next room.
Similarly, if you're building an application of massive scale, and by scale I mean millions of users expecting to wait a few seconds between actions, not millions of FX trades an hour, then placing a services layer between your application and the data is a common pattern. The scalability of the internet is built on many small stateless HTTP servers and the caching infrastructure in between.
In real life, the same queries are run countless times; people refresh pages or click the same link over and over. No one really asks for 10m rows, because not many humans can look at that much in one go. So working in small pages keeps the data flowing and requests interleaving. You also have the opportunity to introduce a shared in-RAM cache in the services layer, or even a RAM database.
You may even find that you need to shard your database or partition it between SQL and a key/value store. You can then do the joins in the middle tier, scaled out, and offload the joining and compute-intensive stuff away from the database server.
The rule with internet scale is that the database is your hot spot and you need to do everything you can to prevent anyone talking to it! Be that a local HTTP cache on an iPad, your ISP's proxy, the IIS output cache, or a Redis cache, all those layers help to spread the load and ease the burden.
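As a toy illustration of that point (a sketch only; real deployments would use something like Redis with expiry, as mentioned above):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Because the same queries run countless times, even a naive in-RAM
// cache in the services layer can absorb most repeat hits before they
// ever reach the database.
public class QueryCache<K, V> {

    private final ConcurrentMap<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // the expensive database call

    public QueryCache(Function<K, V> loader) {
        this.loader = loader;
    }

    public V get(K key) {
        // Only the first request for a key hits the DB; refreshes and
        // expiry are deliberately omitted from this sketch.
        return cache.computeIfAbsent(key, loader);
    }
}
```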
So if Carl came to interview with me and told me he'd considered putting an OData layer before his SQL boxes, I'd be interested to hear his reasoning.
WCF Data Services and OData support JSON, so you can minimize the payload by leveraging that. Plus, with WCF Data Services you can completely control your data access; you don't have to roll with Entity Framework, and you can customize everything. The benefit is that the protocol structure is completely handled for you by WCF Data Services and OData, and consuming the service from MVC is an Add Service Reference away. WCF Data Services runs on WCF, so you have the ability to do other web services beyond just OData-type delivery, which makes it extremely flexible.
There are limitations here and there that come with the nature of OData as well as the way WCF Data Services handles OData, but they are fairly specific and if they arise in your architecture there are ways around them.
If your solution is isolated to a single web application, then having the data layer embedded in that application works well. But if you have any need whatsoever for another app or process to hit the data layer or shared business logic, then exploring the option of putting your data layer in a WCF Data Service is well worth it. For example, you could write a PowerShell script to call a web service method in two lines of code. So if you have domain logic that you want to be able to run from your web app and from a command line or scheduled task, your WCF Data Service layer could handle that for all of them without having to duplicate logic or code.
Many ways to skin a cat. I have used both approaches in business applications and would not say that one or the other should be avoided. They both work well and provide plenty of value without being detrimental.
To be fair, there are benefits to this approach that may outweigh the performance concerns, which are admittedly tremendous. An application built this way will have orders of magnitude more latency and may cost several times more in compute resources to execute than an in-process solution.
That having been said, in development scenarios where human resources are limited, this may work better. It allows contractors to be quickly hired to write new screens or whole new applications, in whatever language suits them. Developers can get up to speed faster than with a proprietary homegrown solution. No more sa passwords in config files; injection of a custom security layer if required; unified logging and auditing; combining several data stores into one consistent resource. If you have a heterogeneous platform, you don't need to write SDKs; they have already been written in many important languages. OData works very well with MS Excel, which is a huge win at many organizations. Depending on your network topology, it might be cheaper and even faster to route out over the internet than to use a leased line if you're in a remote office, or behind a firewall (at a client site doing a demo, for instance).
For large data sets, the overhead of the request and packaging becomes less important; in reporting scenarios, for instance. While I have never designed something like this, I can see where it might be useful, depending on your corporate culture and available resources, to consume OData endpoints internally.
Does anyone have advice or tips on using a web service as the model in an ASP.Net MVC application? I haven't seen anyone writing about doing this. I'd like to build an MVC app, but not tie it to using a specific database, nor limit the database to the single MVC app. I feel a web service (RESTful, most likely ADO.Net Data Services) is the way to go.
How likely, or useful, is it for your MVC app to be decoupled from your database? How often have you seen, in your application lifetime, a change from SQL Server to Oracle? From the last 10 years of projects I've delivered, it's never happened.
Architectures are like onions, they have layers of abstractions above things they depend on. And if you're going to use an RDBMS for storage, that's at the core of your architecture. Abstracting yourself from the DB so you can swap it around is very much a fallacy.
Now you can decouple your database access from your domain, and the repository pattern is one of the ways to do that. Most mature solutions use an ORM these days, so you may want to have a look at NHibernate if you want a mature technology, or ActiveRecord / linq2sql for a simpler active record pattern on top of your data.
Now that you have your data strategy in place, you have a domain of some sort. When you expose data to your client, you can choose to do so through an MVC pattern, where you'll usually send DTOs generated from your domain for rendering, or you can decide to leverage an architecture style like REST to provide more loosely coupled systems, by providing links and custom representations.
You go from tight coupling to looser coupling as you go towards the external layers of your solution.
If your question however was to build an MVC app on top of a REST architecture or web services, and use that as a model... Why bother? If you're going to have a domain model, why not reuse it in your system and your services where it makes sense?
Generating a UI from an MVC app and generating the documents needed for a RESTful architecture are two completely different contexts; basing one on top of the other is just going to cause much more pain than needed. And you're sacrificing performance.
It depends on your exact scenario, but a remote XML-based service as the model in MVC is, from experience, not a good idea; it's probably over-engineering and disregards the need for a domain to start with.
Edit 2010-11-27; clarified my thoughts, which was really needed.
A web service most often exposes functionality across different types of applications, not as an abstraction within one single application. You are probably thinking more of a way of encapsulating commands and reads so that they don't interfere with your controller/view programming.
Use a service from a service bus if you're after the decoupling, and use an async pattern in your async pages. You can look at Rhino.ServiceBus, NServiceBus, and MassTransit for .NET-native implementations, and RabbitMQ for something different: http://blogs.digitar.com/jjww/2009/01/rabbits-and-warrens/.
Edit: I've had some time to try Rabbit out, in a setup that pushed messages to my service, which in turn pushed updates to the bookkeeping app. RabbitMQ is a message broker, a.k.a. a MOM (message-oriented middleware), and you could use it to send messages to your application server.
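A minimal sketch of that setup with the RabbitMQ Java client (5.x; host, queue name, and message shape are all hypothetical):

```java
import java.nio.charset.StandardCharsets;

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

// The web tier publishes a command message and returns at once; the
// application server consumes the queue and does the bookkeeping work
// asynchronously.
public class CommandPublisher {

    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");

        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel()) {

            // durable=true so queued commands survive a broker restart
            channel.queueDeclare("bookkeeping.commands", true, false, false, null);

            String command = "{\"type\":\"PostLedgerEntry\",\"amountCents\":4200}";
            channel.basicPublish("", "bookkeeping.commands", null,
                    command.getBytes(StandardCharsets.UTF_8));
            // the controller/page can return without waiting for the work
        }
    }
}
```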
You can also simply provide service interfaces. Read Eric Evans's Domain-Driven Design for a more detailed description.
REST-ful service interfaces deal a lot with data, and more specifically with addressable resources. REST can greatly simplify your programming model and allows great control over output through the HTTP protocol. WCF's upcoming programming model uses true REST as defined in the original thesis, where each document should, to some extent, provide URIs for continued navigation. Have a look at this.
(In my first version of this post, I lamented REST for being 'slow', whatever that means.) REST-based APIs are also pretty much what CouchDB and Riak use.
ADO.Net is rather crap (!) [N+1 problems with lazy collections because of coding to implementation, data-access leakage - you always need your DB context where your query code is, etc.] in comparison to, for example, LightSpeed (commercial) or NHibernate. Spring.Net also allows you to wrap service interfaces in their container with a web service facade, but (without having browsed it for a while) I think it's a bit too XML-y in its configuration.
Edit 1: By ADO.Net here I mean the default "best practice" with DataSets, DataAdapters, and iterating over lots of rows from a DataReader; it breeds rather ugly and hard-to-debug code. The N+1 stuff, yes, that is about the Entity Framework.
(Edit 2: EntityFramework doesn't impress me either!)
Edit 1: Create your domain layer in a separate assembly [a.k.a. Core] and provide all domain and application services there, then import this assembly from your specific MVC application. Wrap data access in a DAO/repository behind an interface in your core assembly, which your data assembly then references and implements. Wire up the interface and implementation with IoC. You can even program something for dynamic service discovery with the above-mentioned service buses, to resolve the interfaces. WCF uses interfaces like this, and so do most of the above service buses; you can provide a sub-component resolver in your IoC container to do this automatically.
Edit 2:
A great combo for the above would be CQRS + Event Sourcing + Reactive Extensions. Your write model would take commands, your domain model would decide whether to accept them, and it would push events into the Reactive Extensions pipeline, perhaps also over RabbitMQ, which your read model would consume.
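A very small sketch of that command/event flow (assuming Java 17; all names are hypothetical, and a real system would persist the event log and push events over a pipeline such as Rx or RabbitMQ):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// The write model validates a command and, if accepted, appends an event
// that the read model consumes to update its own view of the data.
public class TinyCqrs {

    record DepositCommand(String account, long amountCents) {}
    record MoneyDeposited(String account, long amountCents) {}

    static class WriteModel {
        private final List<Object> eventLog = new ArrayList<>(); // source of truth
        private final Consumer<Object> publish;

        WriteModel(Consumer<Object> publish) { this.publish = publish; }

        void handle(DepositCommand cmd) {
            if (cmd.amountCents() <= 0)
                throw new IllegalArgumentException("deposit must be positive");
            MoneyDeposited event = new MoneyDeposited(cmd.account(), cmd.amountCents());
            eventLog.add(event);
            publish.accept(event); // the read model consumes this
        }
    }

    static class ReadModel {
        private long balanceCents;
        void on(Object event) {
            if (event instanceof MoneyDeposited e) balanceCents += e.amountCents();
        }
        long balance() { return balanceCents; }
    }

    public static void main(String[] args) {
        ReadModel read = new ReadModel();
        WriteModel write = new WriteModel(read::on);
        write.handle(new DepositCommand("acct-1", 500));
        System.out.println(read.balance()); // prints 500
    }
}
```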
Update 2010-01-02 (edit 1)
The gist of my idea has been codified by something called MindTouch Dream. They have made a screencast in which they treat almost all parts of a web application as a (web) service, which is also exposed with REST.
They have created a highly parallel framework using co-routines to handle this, including their own elastic thread pool.
To all the nay-sayers in this question, in ur face :p! Listen to this screen-cast, especially at 12 minutes.
The actual framework is here.
If you are into this sort of programming, have a look at how monads work and at their implementations in C#. You can also read up on coroutines.
Happy new year!
Update 2010-11-27 (edit 2)
It turned out coroutines got productized in the Task Parallel Library from Microsoft. Your Task now implements the same features, as it implements IAsyncResult. Caliburn is a cool framework that uses them.
Reactive Extensions took monad comprehensions to the next level of asynchronicity.
The ALT.Net world seems to be moving in the direction I talked about when I wrote this answer the first time, albeit with new types of architectures I knew little of.
You should define your models in a data-access-agnostic way, e.g. using the Repository pattern. Then you can create concrete implementations backed by specific data access technologies (web service, SQL, etc.).
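A small sketch of that idea (in Java for brevity; all names are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// The rest of the application depends only on the interface; you can
// swap in an implementation backed by SQL, a web service, or an
// in-memory fake for tests.
interface CustomerRepository {
    Optional<Customer> findById(long id);
    void save(Customer customer);
}

record Customer(long id, String name) {}

// One concrete implementation; a SqlCustomerRepository or a
// WebServiceCustomerRepository would implement the same interface.
class InMemoryCustomerRepository implements CustomerRepository {

    private final Map<Long, Customer> store = new HashMap<>();

    @Override
    public Optional<Customer> findById(long id) {
        return Optional.ofNullable(store.get(id));
    }

    @Override
    public void save(Customer customer) {
        store.put(customer.id(), customer);
    }
}
```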
It really depends on the size of this MVC project. I would say keep the UI and domain in the same running environment if the website is going to be used by a small number of users (< 5,000).
On the other hand, if you are planning a site that is going to be accessed by millions, you have to think distributed, and that means you need to build your website in a way that can scale up/out. That means you might need extra servers (web, application, and database).
For this to work nicely, you need to decouple your MVC UI site from the application. The application layer would usually contain your domain model and might be exposed through WCF or a service bus. I would prefer a service bus, because it is more reliable and can use persistent queues like MSMQ.
I hope this helps.