Is there a mechanism for preserving/saving Processor state between calls?
In particular I want a reliable mechanism to know when my process last ran, even if the processor, or even NiFi itself has been restarted.
(Please don't give answers such as hBase or the file system. I am looking for something provided by NiFi, or that can be built with services provided by NiFi)
There is currently no out of the box functionality that automatically captures the listed information unilaterally throughout the application for all processors.
There are mechanisms that provide the capability of accomplishing this type of functionality in components via ControllerServices (think of these as components for cross-cutting concerns or aspects) like the DistributedMapCacheServer/Client or DistributedSetCacheServer/Client.
There are processors that make use of these controller services in manner analogous to your desired feature such as DetectDuplicate or ListHDFS.
This is where things stand currently. There is work under way for the next release (0.5.0) that brings more framework functionality to accomplish such tasks and its work is outlined in our State Management Feature Proposal.
If none of these items quite fits your desired functionality or you have some other ideas, I encourage you to share them with the community either via our mailing lists if you want to hash out your ideas and/or JIRA.
Related
Traditional categorization of processes is talking about integration, human centric and document centric processes, with the last one as a good candidate for placing inside the DMS system (of course, the prerequisite is that there is a built-in support for BPM).
But I was unable to find some concrete,more detailed explanation of the distinction between those options.
Imagine a company, that have Enterprise BPM solution , and also a DMS system with quite good support for BPM (i.e. Filenet DMS).
In both systems you can create user screens and workflows (process logic) as well.
Also, most processes working with documents are also quite "human-centric".
I am perfectly aware of the fact, that choosing the target platform always depends on the requirements and specific circumstances, but I wonder, if there are some general rules, or principles, based on which I can better decide where to put the process layer of the whole solution.
Additional clarification:
I don't want to implement any new platform. As I indicated a little bit in the previous post, we already have BPM platform (Oracle) and DMS as well (Filenet with BPM support - Case Foundation). So the question is not about choosing the new platform...but more about setting the rules for using the existing products/platforms. There are a lot new projects in the queue...and for some of them (that are touching the area of working with documents) we need to decide the target platform/s. For example, when you have a simple process with a few steps, and in all steps there is some work with an existing document (the document - or at least his original version, is also input to this process), the requirements on the front-end are not very complicated etc...it would simpler to build the whole solution in the Filenet platform( mostly because of the cost). But I am wondering if there are some similar rules....Like you should think about that or that... when you want use only the DMS platform...or both platforms etc. You can call these rules the principles for development, references architectures or something like that....that is guiding you when designing the target architecture/s.
Thank you
I'm reposting the answer because I don't see a reason for deletion (by #Bohemian).
I think it adds value to anyone asking the same question. #Bohemian could have at least specified why he deleted the post.
Here it goes:
You gave us rather small amount of information. And what exactly is
the question? What do you mean by "where to put the process layer"?
You shouldn't constrain yourself to only those DM systems that claim
to have BPM built-in. That's marketing speak behind which often lay
two half-baked products. You should instead question which
standards-based integration points the system has, so you can
integrate effortlessly. And then invest in best-of-breed DM and best
BPM separately. All-in-one solutions are often too closed, difficult
to extend and above all, they bring free vendor-lock-in with them.
What are your business requirements, i.e. what do you have to do?
Implement BPM inside organization that already has DM or not? Do you
have some BPM platform already? Do you have any
constraints/requirements when choosing either of those (vendor,
technology foundation, Gartner quadrant...)?
What are the options you're considering for DM and which options are
you evaluating (if any) as a BPM platform? Have you already settled on
IBM or you can go elsewhere? Is open source an option?
What is your role/responsibility in this project?
EDIT - after the author's clarifications:
I have not worked with Oracle's BPM, but I can tell you that, although Case Foundation is more suited to Case Management, you can develop a complete Process Management solution with it (workflows, tasks, roles, deadlines, in-baskets, etc.).
If you go that path and later come across the business need to allow business users to define their own case templates, take a look at IBM Case Manager, as it builds on top of Case Foundation, but also brings additional WebUI features (built on IBM Content Navigator), suitable for business users (although, more often than not, it turns out the IT does that job).
A few IBM redbooks about Case & Content management that might help you make an informed decision:
Introducing IBM FileNet Business Process Manager - this is the former name for Case Foundation - the same product, new version.
Advanced Case Management with IBM Case Manager
Customizing and Extending IBM Content Navigator - you'll need this one for customizations, if you decide to go with CF (instead of Oracle).
Building IBM Enterprise Content Management Solutions From End to End - from ingestion to case/process management (contains Case Manager).
I agree with #Robert regarding integration, after all, before version 5.2 FileNet Content Platform Engine was FN Content Engine + FN Process Engine.
The word of advice I can give you is to first document all features that business requires from BPM. Then do a due diligence on both products, noting down which of those features each of those products supports. Then the answer, if not laid out in front of you, will at least be much easier.
You also have to take into account that IBM is oriented towards IBM BPM (former Lombardi) when process management is concerned. Former FN BPM is now more pushed into Case Management (but those two are very similar paradigms).
You should definitely post back about your experience, whichever option you choose.
Good "luck" :)
I have a fairly complex windows service (written in .net 4) with several sub systems that run in parallel.
I have implemented pretty good logging throughout, but I'm feeling I'm needing more info about what each subsystem is currently doing. This would be very useful for times that I need to stop the service for upgrade/bug fixes.
It would be nice to have a gui app that will show me the status for each part of the application that I'm interested in. I've had some ideas for how I'm going to do this, but I'd like to hear some others' ideas as well.
I'm interested in a solution that would be easy to plop down in a future windows service and I'm not looking for anything very complex.
Are there any tools for this sort of thing?
Have you done this yourself?
What about interprocess communication?
Since Windows services can no longer interact with the user session, you'll need to have a separate application that does the interacting for you. Based on the details of your question, I think you understand this.
The big question is how to facilitate the communication between your Windows service and the application. There are all kinds of approaches - shared memory, socket, pipe, remoting, etc. What I have used successfully is WCF. If your UI is going to reside on the same machine as the service, use the NetNamedPipeBinding. If you ever need access from a remote machine, you can change to the NetTcpBinding. I've found this flow chart helpful in binding selection.
.
If you're looking for a more formal framework approach that just straight WCF, have a look at Juval Lowy's Publish-Subscribe WCF Framework, which is described in pretty good detail in this MSDN article. The code is available to look at via the article, or you can download the source and example from Lowy's website here. Go to the Downloads section, filter by the Discovery category, and you'll see it there.
I am trying to design an event driven system where the elements of the system communicate by generating events that are responded to by other components of the system. It is intended that the components be independent of each other - or as largely independent as I can make them. The system will initially be implemented on Windows 7, and is being written in Delphi. The generated events will be generated by the Delphi code. I understand how to implement a system of the type described on a single machine.
I wish to design the system so that it can readily be deployed on different machine architectures in particular with different components running on a distributed architecture, which may well be different to Windows 7. There is no requirement for the system ever to communicate with any systems external to itself.
I have tried investigating the architecture I need to consider and have looked at the questions mentioned below. These seem to point towards utilising named pipes as a mechanism for inter-hardware communications. As a result of these investigations I have sketched out the following to describe my system - the first part of the diagram is the system as I am developing it; the second part what I have deduced I would need for possible future implementations.
This leads to the following points:
Can you pass events via named pipes?
Is this an appropriate and sensible structure to tackle this problem?
Are there better alternatives?
What have I forgotten (at this level of granularity)?
How is event driven programming implemented?
How do I send a string from one instance of my Delphi program to another?
EDIT:
I had not given the points arising from "#I give crap answers" response sufficient consideration. My initial responses to his points are:
Synchronous v Asynchronous - mostly asynchronous
Events will always be in a FIFO queue.
Connection loss - is not terribly important - I can afford to deal with this non-rigourously.
Unbounded queues are a perfectly good way of dealing with events passed (if they can be) - there is no expectation of large volume of event generation.
For maximum deployment flexibility (operating-system independent), I recommend to take a look at popular open source message brokers which run on the Java platform. Using standard protocols. they integrate well with Delphi and other programming languages, can be used with web applications, and have a large installed user base and active community.
They are quite easy to install and configure in a few minutes, and free / commercial clients for Delphi are available.
Some examples are:
Apache ActiveMQ
OpenMQ
JBoss HornetQ
I also recommend the book "Enterprise Integration Patterns" by Martin Fowler as an overview and introduction, with many simple recipes to handle specific problems.
Note that I am a developer of commercial Delphi clients for enterprise messaging systems, such as xmlBlaster, RabbitMQ, Amazon Simple Queue Service and the three brokers mentioned above.
I can only answer for your point 4 here: You have not yet decided if an event is synchronous or asynchronous. In the async case, you have to decide what to do when messages arrive. Do you have a queue? How big is the queue? Can one grab arbitrary elements in the queue or is it strictly FIFO. What happens if a message is lost (somebody axes the network cable)?
In the sync variant, the advantage is that you got delivery guarantees, but then what do you do when connections are suddenly lost?
Connection loss is going to be a problem. The more machines you have, the greater is the chance that they will occur. Decide how you will handle that.
Another trouble may be what you do if you have a large event and several small. Is the order of transfer FIFO or smallest-first? Can events be reeordered? What are the assumptions here?
The aside is that I hack Erlang a lot. In Erlang all the event-handling is already solved but it also means a specific model is chosen for you (async, unbounded queues, no guaranteed delivery, but detection of connection loss).
I suggest to look at RabbitMQ, http://www.rabbitmq.com/. It has the server and client. Just need some wrapper codes in delphi and you are ready to build your business logic
Cheers
This is probably just an application for a message queue.
http://msdn.microsoft.com/en-us/library/ms632590(v=vs.85).aspx
So I've been tasked at work to write windows services to replace some old legacy VB6 WinForms apps currently running as services, consistently repeating tasks day-to-day. To give some general background, they have there own state machines built in to handle decision basing and not utilizing threading.
A lot of the senior developers here thought it would be worth a try to look into WorkFlow to replace the state machines rather than write my own business logic and try threading it programmaticly. So it's WF vs. the "Old College Try" I suppose.
My concern is that there aren't many books on the topic, and since it was implemented in .Net I've heard very little about it being used. I brought this up at work and another developer mentioned that it's because Biz Talk never really caught on and it was designed for that.
So is it broken? Do you think it will be supported long enough to not worry so much? I don't want an ill-functioning process injected into my services, my new babies at work, and then have WF's keel over. Leaving me with having to replace them with my own code in the event of an emergency; which does not seem like much of a grand scenario to me.
Any suggestions, recommendations would be super.
Workflow Foundation is used in Microsoft SharePoint, so I think they will continue supporting it.
There is an open source project called Stateless by Nicholas Blumhardt. It is quite flexible and very light weight. See my SO answer for details.
I chose this over Windows Workflow simply because I could define a state as State and thereby persist the state of my workflows back to the database using SubSonic. Configuration consists of one XML file. If I need to add tasks, I simply add nodes to the XML.
The each state can have a series of triggers that once satisfied will advance to appropriate state. This framework is a single assemble and fits nicely in your domain logic.
I need to build a web application with different process flows and different UI steps depending on the locale of the logged in user.
I have developed a number of ASP.NET applications in C# and like the separation of concerns an MVC approach would give me. So I am looking at using these technologies.
The kicker is that different users in different locales need to have very different experiences, despite accessing the same datasource. I'm also constrained by the requirement to have new process flows be able to be configured easily. The XAML based Windows Workflow Foundation looks like a good candidate, and would allow me to avoid developing my own process flow engine.
However, I am a little concerned about the performance implications of such an approach. Has anyone tried this sort of architecture? What sort of impact can I expect on request time, CPU utilization and memory consumption?
All opinions gratefully received, Thanks.
Howdo
The principle objection, and I suppose, performance risk, which although not specically related to your ask, is that windows workflow hasn't been designed for that type of scenario. Simply put, WF first design tenet, was to enable long running transactions, built around web service and SOA, i.e. transactional connectivity lasting days, and as such its runtime is optimised for it. It doesn't really make a good fit, especially when you will need to shoe horn it into working, and it will be much more resource hungry for short running workflows, i.e. in relation to state changes, caching, etc, in terms of comparable other process flow code/engines. If you do go ahead, get a pilot up and running and test to death using Mercury Load Runner.
scope_creep