Mule ESB - Clear Memory of a batch process

My scenario: I have 4 batch jobs in one Mule flow. One of them, say batch 1, loaded about 10,000 records, but I then decided to force-stop it. Next I ran batch 2 in the same XML. Batch 2 runs, but the batch 1 records that were loaded earlier also get processed. Is this a bug, or is there a configuration to prevent it?

Are you running the batch in Studio?
If yes, go to Run Configurations in Studio, find the configuration for your project, scroll down to 'Clear Application Data', and set it to 'Always'.

Related

ordering message pub/sub GCP

I am new to Dataflow and Pub/Sub in GCP.
I need to migrate a current on-prem process to GCP.
The current process is as follows:
We have two types of data feeds:
Full Feed – an ad hoc job. The full XML is ~100 GB (a single, very complex XML file with the complete data; an ETL job processes this XML and loads it into ~60 tables).
Separate ETL jobs process the full feed: they create load-ready files, and all tables are truncated and reloaded.
Delta Feed – every 30 minutes we need to process delta files (XML files that contain only the changes from the last 30 minutes).
The source system pushes XML files every 30 minutes (more than one; each file has a timestamp). A scheduled ETL process picks up all the files produced by the source system, processes them, and creates 3 load-ready files (insert, delete and update) for each table.
Schedule – the ETL jobs are scheduled to run every 5 minutes; if the current run takes longer than 5 minutes, the next run is not triggered until the current one completes.
The order of file processing is very important (the ETL job takes care of this). All files must be processed in sequence.
At the end of the ETL process, the load-ready files are loaded into tables (mainframe).
I was asked to propose a design to migrate this to GCP. GCP needs the same two processes, full and delta, and my proposed solution should be suitable for both feeds.
Initially I thought of the design below:
Pub/Sub -> Dataflow -> MySQL/BigQuery
Then I learned that Pub/Sub does not guarantee that the files are processed in sequence. After some research I found that Google recently introduced the ordering key concept for Pub/Sub, which makes sure that messages are processed in order. The Google Cloud docs mention that this feature is in Beta.
I have a few questions:
Has anyone used the ordering key concept in Pub/Sub in a production environment? If yes, did you face any challenges while implementing it?
Is this design suitable for the above requirement, or is there a better solution in GCP?
Is there an alternative to Dataflow?
I also learned that Pub/Sub can handle messages of at most 10 MB, whereas each of our XML files is larger than ~5 GB.
As @guillaume blaquiere mentioned, the Beta launch phase brings some restrictions, but they mostly concern product support:
At beta, products or features are ready for broader customer testing
and use. Betas are often publicly announced. There are no SLAs or
technical support obligations in a beta release unless otherwise
specified in product terms or the terms of a particular beta program.
The average beta phase lasts about six months.
In general, the Cloud Pub/Sub message ordering feature works as intended. If you run into something that needs the developers' attention, sending a report via the Google Issue Tracker is highly appreciated.

Spring Cloud Dataflow - composed-task-runner doesn't start second task

I have a Dataflow pipeline consisting of two sequential batch jobs. The first batch gets completed successfully, but the second one doesn't start.
I have started Dataflow server with the embedded H2 DB. I've pointed Spring Batch to the same H2 instance via application.properties. After the first step in my pipeline gets completed, I can see batch execution logs in that same DB instance.
My composed-task-runner application seems to be getting Dataflow's datasource correctly. I can see that it inherits it from the Dataflow server, and the properties are shown in the Dashboard's task execution section.
There are no errors in the logs. Only log entries from successful execution of the first batch.
My TASK_EXECUTION entries:
What could be the problem? And why are there two entries in the TASK_EXECUTION table for the first step? Judging by the task_name, these entries belong to the first batch step only.
I was able to address this issue by rebuilding my batch task using Spring Initializr. Initially I was trying to use spring-cloud-task-app-starters as the base for my work, and that is probably not the right way of building Dataflow tasks.

Creating a structured Jenkins Failing Test Report

The situation right now:
Every Monday morning I manually check the jUnit results of the Jenkins jobs that ran over the weekend; using the Project Health plugin I can filter on the timeboxed runs. I then copy-paste this table into Excel and go over each test case's output log to see what failed and note down the failure cause. Every weekend gets another tab in Excel. All this makes traceability a nightmare and causes time-consuming manual labor.
What I am looking for (and hoping that already exists to some degree):
A database that stores all failed tests for all jobs I specify. It parses the output log of a failed test case and based on some regex applies a 'tag' e.g. 'Audio' if a test regarding audio is failing. Since everything is in a database I could make or use a frontend that can apply filters at will.
For example, if I want to see all tests regarding audio failing over the weekend (over multiple jobs and multiple runs) I could run a query that returns all entries with the Audio tag.
I'm OK with manually tagging failed tests and their cause, as well as writing my own frontend. Is there a way (the Jenkins API perhaps?) to grab the failed tests (jUnit format and Jenkins plugin) and create such a system myself if it does not exist?
A good question. Unfortunately, it is very difficult in Jenkins to get such "meta statistics" that span several jobs. There is no existing solution for that.
Basically, I see two options for getting what you want:
Post-processing Jenkins-internal data to get the statistics that you need.
Feeding a database on-the-fly with build execution data.
The first option basically means automating the tasks that you do manually right now:
you can use external scripting (Python, Perl, ...) to process Jenkins-internal data (via the REST or CLI APIs, or by directly reading on-disk data),
or you can run Groovy scripts internally (which will be faster and more powerful).
It's the most direct way to go. However, depending on the statistics that you need and on your requirements regarding data persistence, you may want to go for...
The second option: more flexible and completely decoupled from Jenkins' internal data storage. You could implement it by
introducing a Groovy post-build step for all your jobs;
that script parses the job results and puts the data of interest into a custom, external database.
You'd then get your statistics by querying that database.
Typically, you'd start with the first option. Once requirements grow, you'd slowly migrate to the second one (e.g., by collecting internal data via explicit post-processing scripts, putting that into a database, and then running queries on it). You'll want to cut this migration phase as short as possible, as it eventually requires the effort of implementing both options.
You may want to have a look at couchdb-statistics. It is far from a perfect fit, but at least seems to do partially what you want to achieve.

Is there a way to emulate 2 CPU Cores?

My app is ASP.NET MVC.
I am using a lot of parallel processing on my local machine (8 cores) and things are running very smoothly.
But when I roll out to an Azure Medium Instance (2 cores), during testing I get weird slowdowns and the program sometimes stops working.
Is there a way to emulate 1, 2 or another number of cores to match what will happen in production environment?
I guess you could try setting the process affinity for the development server. To do that, (assuming Windows 7) open task manager, right-click the server process and select Set Affinity... and select the cores you want it to run on.
I just managed to find a way around @Dai's answer above, but it means you'll have to start the development server yourself. This .bat file runs notepad.exe using two cores (you can verify that by checking its affinity within Task Manager):
start /affinity 0x03 notepad.exe
The 0x03 specifies core 1 and core 2. The help text was a bit confusing, but the value is a hexadecimal bitmask with one bit per core: the first core has the value 1, the second has the value 2, and the values of the selected cores are added up (1 + 2 = 3). So if you need a different set of cores, keep that in mind.
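To work out the mask for another set of cores, a small Python sketch like the one below may help. The helper name is hypothetical, and it counts cores from zero, so the answer's "core 1 and core 2" are indices 0 and 1 here.

```python
def affinity_mask(cores):
    """Build the hex mask for zero-based core indices (core 0 is bit 0)."""
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core index must be non-negative")
        mask |= 1 << core  # set the bit that belongs to this core
    return f"0x{mask:02X}"


# First two cores, as in the .bat file above.
print(affinity_mask([0, 1]))
```

The returned string is what you would pass after /affinity.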
@JohnH's method seems the best, but you'd have to do it every time w3wp.exe runs on your machine. An alternative is to restrict your operation to 2 threads (or 4 if using hyperthreading). We need more information about how you're processing information in parallel.
A parallelisable problem should simply run up to about 4 times as fast on an 8-core machine as on the 2-core Azure VM, so getting 'program stops working' situations means you've got a bug there.
I'm not sure if this will fully help the situation, but you can set MaxDegreeOfParallelism on your parallel operations. This way you can limit the number of threads that run.

multiple instances of cucumber for 3000 scenarios

I have a test pack consisting of more than 3000 scenarios. The problem is that when I run all the scenarios in one go, it takes approximately 10 hours to complete. I want to divide the scenarios into 4 blocks of approximately 750 scenarios each and run them in parallel in different windows/terminals (VMware). Is there a workaround?
This question has a selenium tag, so (assuming that's accurate) Selenium Grid would be an option for setting up a distributed parallel testing environment.
orde mentions Selenium Grid, and that's one piece of the puzzle. The main benefit you get out of it is that you only have to identify your server hub in your code when creating a new Selenium instance.
The next thing would be to actually execute your 4 blocks of 750 scenarios at once. I'd recommend using a CI tool like Jenkins to accomplish that, and you'll have your results together in Jenkins' web GUI.
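How the 4 blocks are formed is up to you. As a minimal sketch, assuming you can produce a list of scenario (or feature file) identifiers, a round-robin split in Python keeps the blocks evenly sized; the function and the identifiers below are hypothetical, and each resulting block would then be handed to its own runner or Jenkins job.

```python
def split_into_blocks(scenarios, block_count):
    """Distribute scenarios round-robin so block sizes differ by at most one."""
    if block_count < 1:
        raise ValueError("block_count must be at least 1")
    # Slice i takes every block_count-th item starting at offset i.
    return [scenarios[i::block_count] for i in range(block_count)]


# Hypothetical identifiers standing in for the real scenarios.
all_scenarios = [f"scenario_{n}" for n in range(3000)]
blocks = split_into_blocks(all_scenarios, 4)
```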
