Using Erlang to manage multiple instances of an external process

I have a single-threaded external program which takes an input file and produces an output file (it takes the input and output file paths as arguments). I want to use Erlang to create, manage, and close multiple instances of this program.
Basically, whenever a client needs the output file produced, the client connects to the Erlang server with the input and output paths; the server starts a new instance of the program, feeds it the paths, and then, once the instance is done, terminates it.
I have a basic understanding of how gen_server etc. work, but I want to know whether I can use Erlang to create and delete instances of an external process (e.g. a JAR). What library should I look into?

Look at ports. http://www.erlang.org/doc/man/erlang.html#open_port-2
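As a minimal sketch, driving a JAR through a port could look like this (the module, function, and JAR names are illustrative, not from the question):

    -module(jar_runner).
    -export([run/2]).

    %% Start one instance of the external program, pass it the input and
    %% output paths as arguments, and wait for it to exit.
    run(InPath, OutPath) ->
        Java = os:find_executable("java"),
        Port = open_port({spawn_executable, Java},
                         [{args, ["-jar", "converter.jar", InPath, OutPath]},
                          exit_status]),
        wait_for_exit(Port).

    %% The exit_status option makes the port deliver the program's exit
    %% code as a message when the OS process terminates.
    wait_for_exit(Port) ->
        receive
            {Port, {data, _Output}}     -> wait_for_exit(Port);
            {Port, {exit_status, 0}}    -> ok;
            {Port, {exit_status, Code}} -> {error, Code}
        end.

Each open_port call gives you a separate OS process, so a gen_server can start as many instances as it has jobs. Closing a port with port_close/1 closes the program's stdin, which a well-behaved program treats as a signal to exit.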

The os:cmd function is probably the closest; see http://www.erlang.org/doc/man/os.html. It does assume that your processes run and then finish - the "deleting" part is not covered.
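For a quick, blocking one-off run, an os:cmd/1 sketch (the JAR name and paths are placeholders) would be:

    %% Blocks until the command exits and returns its stdout as a string.
    Output = os:cmd("java -jar converter.jar /tmp/in.txt /tmp/out.txt").

Note that os:cmd/1 gives you neither the exit status nor a way to terminate the command early, which is why ports are the better fit here.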

Related

Orchestration/notification of processing events

I have the following SCDF use case.
I have a couple hundred files to process and put in the DB.
A producer will get a single file, read the first N rows and send them to the source (RabbitMQ), then read the next N rows and send them to the source again, and so on until done.
A consumer will receive these file chunks (from RabbitMQ), do some minor enriching, and write them to the DB (sink).
I will have some number of streams > 1 running (say 4, for example) for some parallel processing of these files.
My question is: does SCDF have a mechanism to know when all consumers are finished (and hence the queue(s) are exhausted), so I can know when to start some other process (could be another stream/task/anything) that needs the DB fully populated before it can begin?
Yes, sink1 is the only consumer of source1. In a streaming application, there is no concept of "COMPLETED". By definition, stream processing is logically unbounded, and stream apps (sources and sinks) are designed to run forever. Tasks, on the other hand, are short-lived, finite processes that exit when they are complete; the application logic defines when the task is complete. Processing a file, or a chunk of a file, is the most common use case. A stream can monitor a file system, or a remote file source such as SFTP or S3, and launch a task whenever a new file appears. The task processes the file and marks the execution as COMPLETE.
This type of use case is better suited for task/batch. See https://dataflow.spring.io/docs/recipes/batch/sftp-to-jdbc/, which details the recommended architecture. You can define a composed task to run the ingest and then the next task.
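As a sketch, a composed task in the SCDF shell might look like this (the task names ingest and report are placeholders for task apps you have already registered):

    dataflow:> task create --name file-pipeline --definition "ingest && report"
    dataflow:> task launch file-pipeline

The && in the composed-task DSL means report only launches after ingest completes successfully, which gives you the "DB fully populated" ordering the question asks about.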

Possibilities to accept input for COBOL batch program

I have a batch COBOL program which needs input in the form of a flat file. It works when I FTP a single file to the batch using an FTP client.
The problem is that, in the final solution, many concurrent users need to access the batch program, together or individually. For example, let's say 10 users need to run the batch.
They can FTP all of their files to a shared directory from where the mainframe can access them.
Now the problems are:
How can the mainframe job be triggered?
Since there will be 10 or more files, the job needs to run each one of them individually and generate a report.
How should the files be named? For example, if two files have the same name, one will overwrite the other when they are FTPed into the shared directory in the first place. On the other hand, if the file names are unique, the mainframe will not be able to differentiate between them.
The user will receive the report through e-mail; this is coded in the batch program, and the user's ID will be present in the input flat file.
Previously, the CICS functionality was done through an Excel macro (screen scraping). The whole point of this exercise is to eliminate the CICS usage to reduce MIPS.
Any help is appreciated.
Riffing off what @SaggingRufus said, if you have Control-M for scheduling, you can use CTMAPI to set an auto-edit variable to the name of your file and then order a batch job. You could do this via a web service in CICS using the SPOOLWRITE API to submit the job, or you could try FTPing to the JES spool.
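For reference, submitting a job by FTPing it to the JES spool looks roughly like this (the host name and JCL file are placeholders; the z/OS FTP server treats uploads as job submissions once FILETYPE=JES is set):

    ftp mainframe.example.com
    ftp> quote site filetype=jes
    ftp> put run_batch.jcl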
@BillWoodger is absolutely correct: get your production scheduling folks and your security folks involved. Don't roll your own architecture; use what your shop has decided is right for it.

Simple_one_for_one application

I have a supervisor which starts simple_one_for_one children. Each child is in fact a supervisor which has its own tree. Each child is started with a unique ID, so I can distinguish them. Each gen_server is then started with start_link(Id), where:
    -define(SERVER(Id), {global, {Id, ?MODULE}}).

    start_link(Id) ->
        gen_server:start_link(?SERVER(Id), ?MODULE, [Id], []).
So, each gen_server can easily be addressed with {global, {Id, module_name}}.
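For instance, a call to one of these servers looks like this (the ID, module name, and message are illustrative):

    gen_server:call({global, {sim_1, es_simulator_server}}, get_state)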
Now I'd like to turn this child supervisor into an application, so my mother supervisor would start applications instead of supervisors. That should be straightforward, except for one part: passing the ID to the application. Starting a supervisor with an ID is easy: supervisor:start_child(?SERVER, [Id]). How do I do it for an application? How can I start several applications of the same name (so they share the same .app file) with different IDs (so I can start my children with supervisor:start_child(?SERVER, [Id]))?
If my question is not clear enough, here is my code. So, currently, es_simulator_dispatcher starts es_simulator_sup. I'd like to have this: es_simulator_dispatcher starts es_simulator_app which starts es_simulator_sup. That's all there is to it :-)
Thanks in advance,
dijxtra
Applications don't run under anything else; they are a top-level abstraction. When you start an application with application:start/1, it is started by the application controller, which manages applications. Applications contain code and data, and maybe, at runtime, a supervision tree of processes doing the application's work. Running multiple invocations of an application does not really make sense because of the nature of applications.
I would suggest reading the OTP Design Principles User's Guide for a description of the components of OTP, how they relate, and how they are intended to be used.
I don't think applications were meant for dynamic construction like you want. I'd make a single application, because in Erlang, applications are bundles of code more than they are bundles of running processes (you could say they are an artifact of compile time more so than of runtime).
Usually you feed configuration to an application through the built-in configuration system. That is, you use application:get_env(Key) to read something it should use. There is also application:set_env(...) to feed specific configuration into one, but the preferred way is the config file on disk. This may or may not work in your case.
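A minimal sketch of that, assuming an application named es_simulator with an env key sim_id declared in its .app file:

    %% In ebin/es_simulator.app:
    %%   {env, [{sim_id, undefined}]}

    %% Load the app, override the key before starting, then read it at runtime:
    ok = application:load(es_simulator),
    ok = application:set_env(es_simulator, sim_id, sim_42),
    ok = application:start(es_simulator),
    {ok, Id} = application:get_env(es_simulator, sim_id).

Note this still gives you only one running instance of es_simulator per node, which is exactly the limitation discussed above.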
In some sense, what you are up to corresponds to creating 200 Apache configuration files and then spawning 200 Apache instances next to each other, rather than running a single one and handling the multiple domains inside it.

In Erlang, is it possible to send a running process to a different node?

I have been researching mobile agents and was wondering if it is possible to send a running process to another node in Erlang. I know it is possible to send a message to a process on another node. I know it is possible to load a module on all nodes in a cluster. But is it possible to move a process that might be in some state on a particular node to another node and resume its state? That is, does Erlang provide strong mobility? Or is it possible to provide strong mobility in Erlang?
Yes, it is possible, but there is no "move process to node" call. However, if the process is built with a feature for migration, you can certainly do it by sending the process's function and its state to another node and arranging for a spawn there. To get the identity of the process right, you will need to use either the global process registry or gproc, as the process will change pid.
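A minimal sketch of a process built with such a migration feature (the hopper module is illustrative and must be loaded on both nodes):

    -module(hopper).
    -export([start/0, loop/1]).

    start() ->
        Pid = spawn(?MODULE, loop, [0]),
        yes = global:register_name(hopper, Pid),
        Pid.

    loop(Count) ->
        receive
            {migrate, Node} ->
                %% Spawn a clone on the target node with the current state,
                %% hand the global name over to it, and let this copy end.
                NewPid = spawn(Node, ?MODULE, loop, [Count]),
                yes = global:re_register_name(hopper, NewPid);
            bump ->
                loop(Count + 1)
        end.

Clients that always look the process up with global:whereis_name(hopper) keep working across the hop, modulo messages in flight during the switch.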
There are other considerations as well: The process might be using an ETS table whose data are not present on the other node, or it may have stored stuff in the process dictionary (state from the random module comes to mind).
The general consensus in Erlang is that processes do not move between machines. Rather, one either arranges for a takeover of applications between nodes should a node die, or distributes the system so the data are already present on another machine. In any case, the main problem of making state persistent in the event of errors still holds, mobility or not, and distribution is a nice tool for solving the persistence problem.

Scheduled task or Windows service

My team is having a debate about which is better: a Windows service or scheduled tasks. We have a server dedicated to running jobs, and currently they are all scheduled tasks. Some jobs take files, rename them, and place them in other directories on the network. Other jobs extract data from SQL, modify it, and ship it elsewhere. Other jobs FTP files out. There is a lot of variety, but all in all, they are fairly straightforward.
I am partial to having each of these run as a Windows service instead of a scheduled task, because it is so much easier to monitor a Windows service than a scheduled task. Some are diametrically opposed. In the end, none of us has enough experience to provide an actual factual comparison between the two methods, so I am looking for some feedback on what others have experienced.
If it runs constantly - windows service.
If it needs to be run at various intervals - scheduled task.
Scheduled task - when the activity is to be carried out on some fixed/predefined schedule. It takes less OS memory and fewer resources, requires no installation, and can have a UI (e.g. sending a reminder mail to defaulters).
Windows service - when continuous monitoring is required. It keeps the OS busier by consuming more resources, requires installation/uninstallation when changing versions, and has no UI at all (e.g. processing a mail as soon as it arrives).
Use them wisely.
Scheduling jobs with the built-in functionality is a perfectly valid use. You would have to recreate the full functionality in order to create a good service, and unless you want to react to specific events, I see no reason to move a nightly job into a service.
It's different when you want to process a file after it was posted in a folder; that's something I would create a service for, using the file-system watcher to monitor the folder.
Otherwise, I think it's reinventing the wheel.
While there is nothing wrong with using the Task Scheduler, it is, itself, a service. We have the same requirements where I work, and we have a general-purpose program that does several of these jobs. I interpreted your post to say that you would run individual services for each task; I would instead consider writing a single, database-driven service program to do all your tasks. That way, when you add a new one, it is simply a data-entry chore, not a whole new program to write. If you practice change control, this difference can be significant. If you have more than a few tasks, the effort may be comparable. This approach will also allow you to craft a logging mechanism best suited to your operations.
This is a portion of our requirements document for our task program, to give you an idea of where to start:
This program needs to be database driven.
It needs to run as a windows service.
The program needs to be able to process "jobs" in the following manner:
Jobs need to be able to check for the existence of a source file and take action based on its existence or absence (i.e. proceed with processing, vs. report that the file isn't there, vs. ignore its absence because the file is not critical).
Jobs need to be able to copy a file from a source to a target location, or
copy a file from the source to a staging location, perform "processing", and then copy either the original file or a result of the "processing" to the target location, or
copy a file from the source to a staging location and perform "processing", where the processing is the end result.
The sources and destinations that jobs might copy to and from can be disparate: UNC, SFTP, FTP, etc.
The "processing" can be encrypting/decrypting a file, parsing a data file for correct format, feeding the file to the mainframe via terminal emulation, etc., usually implemented by calling a command line and passing parameters to an .exe.
Jobs need to be able to clean up after themselves, as required. i.e. delete intermediate or original files, copy files to an archive location, etc.
The program needs to be able to determine the success or failure of each phase of a job and take appropriate action: logging, possibly other notification, aborting further processing on failure, etc.
Jobs need to be configured to activate at certain set times, or at certain intervals (optionally during certain set hours), e.g. every 15 minutes from 9:00 to 5:00.
There needs to be a UI to add new jobs.
There needs to be a button to push to fire off a job as if a timer event had activated it.
The standard display of the program should show an operator what is going on and whether the program is functioning properly.
All of this is predicated on the premise that it is a given that you write your own software. There are several enterprise task scheduler programs available on the market, as well. Buying off the shelf may be a better solution for you.
