How to handle a deadlock situation in SSIS? - ssis-2012

In my SSIS package, I have two data flow tasks that access the same table. When I run the package, I run into deadlock issues. How can I handle this situation in my SSIS packages?

Related

Difference between agents and worker threads

I'm running NUnit console runners from Jenkins. These tests connect to a Selenium Grid (which is also run by Jenkins), so I want to limit their level of parallelism to avoid agents starving while waiting for a free node on the grid.
So far I haven't managed to figure out exactly what the difference is between an agent and a worker thread in NUnit. I suspect the agent can manage threads, but that's only a guess. Thanks :)
An agent is a separate process running tests for an assembly. A worker is a thread, within a process, running the tests for a particular assembly.
Theoretically, an agent process could have multiple appdomains, each domain could have multiple assemblies and each assembly could have multiple worker threads.
Practically, however, the normal thing to do is to have one process per assembly, so there is no need for multiple domains, and each process runs a specified number of worker threads for that assembly's tests. In some contexts you may prefer to run only processes in parallel, with no parallelism within an assembly; that approach is the most likely to work without any change to tests that were not designed with parallelism in mind.
Agents do not "manage" threads. They simply run the framework in a process and the framework decides how many threads to use depending on the attributes you have applied.
Using multiple agents is the only way to run NUnit V2 tests in parallel, since the V2 framework is ignorant of parallelism.
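To make that concrete, here is a rough sketch of how NUnit 3 parallelism is typically capped from inside the test assembly; the attribute names are from the NUnit 3 framework, and the test fixture shown is illustrative only.

```csharp
// AssemblyInfo.cs (or any source file) in the test assembly.
// These assembly-level attributes are how the framework decides how many
// worker threads to use inside a single agent process.
using NUnit.Framework;

// Cap the number of worker threads the framework creates for this assembly.
[assembly: LevelOfParallelism(2)]

// Opt test fixtures in this assembly into parallel execution by default.
[assembly: Parallelizable(ParallelScope.Fixtures)]

namespace GridTests
{
    // An individual fixture can still opt out if it cannot run in parallel.
    [TestFixture, NonParallelizable]
    public class CheckoutTests
    {
        [Test]
        public void PlacesOrder()
        {
            // ...drive a Selenium Grid node here...
            Assert.Pass();
        }
    }
}
```

When several assemblies are run from the console runner, the number of agent processes and worker threads can also be limited with runner switches along the lines of --agents and --workers (check your nunit3-console version), which is one way to keep Jenkins builds from starving the grid.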

Scheduled Job equivalent functionality in MVC

I have a requirement in my MVC app.
I have an export-to-Excel feature that takes about 3 minutes (the user clicks an export button and waits).
The export downloads an Excel file that has multiple worksheets, after applying certain rules to the data.
These rules are data manipulations plus applying colors to the cells in certain columns.
To avoid the wait, I was asked to develop code within the MVC app that can run like a scheduled job.
This job has to export the Excel file to a dedicated folder on the network at the scheduled time (once daily).
I was also asked to develop a web page within the app that has links to download these Excel files.
Questions (any help would be appreciated):
I have chosen Quartz.NET to implement this requirement. As far as I know, it is an open-source library that provides the facility to schedule a job (a class developed in .NET). Is it the right choice, or would there be any implications in the future?
Is it really necessary to develop job-like code, or can some other approach address this?
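For reference, this is roughly the shape of a Quartz.NET job and trigger for what is described above (sketched against the synchronous Quartz.NET 2.x API; Quartz.NET 3.x uses an async variant of the same calls, and the export logic itself is left as a placeholder):

```csharp
using Quartz;
using Quartz.Impl;

// The unit of work Quartz.NET runs on a schedule.
public class DailyExcelExportJob : IJob
{
    public void Execute(IJobExecutionContext context)
    {
        // TODO: build the multi-worksheet workbook, apply the formatting
        // rules, and write it to the dedicated network folder.
    }
}

public static class ExportScheduler
{
    // Call once at application start-up.
    public static void Start()
    {
        IScheduler scheduler = StdSchedulerFactory.GetDefaultScheduler();
        scheduler.Start();

        IJobDetail job = JobBuilder.Create<DailyExcelExportJob>()
            .WithIdentity("dailyExcelExport")
            .Build();

        // Fire once a day at 02:00.
        ITrigger trigger = TriggerBuilder.Create()
            .WithIdentity("dailyExcelExportTrigger")
            .WithCronSchedule("0 0 2 * * ?")
            .Build();

        scheduler.ScheduleJob(job, trigger);
    }
}
```

Whether a scheduler like this should live inside the MVC process at all is exactly what the answer below takes issue with.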
I'm not very familiar with Quartz.net, but I do know that trying to run background/scheduled tasks from within the same process as the MVC application can be problematic.
Ref 1: http://haacked.com/archive/2011/10/16/the-dangers-of-implementing-recurring-background-tasks-in-asp-net.aspx/
Ref 2: http://www.hanselman.com/blog/HowToRunBackgroundTasksInASPNET.aspx
Essentially, you can't guarantee that the process will complete correctly, due to how IIS handles app pools (which is where your MVC process runs, assuming hosting on IIS anyway).
You mention running a scheduled task within your MVC app. Again, this is not the right approach. Why can't you just slap a console app project into the solution and drive the code from there, then put it on the server and run it with the Windows Task Scheduler?
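As a rough illustration of that suggestion (all names here are placeholders, not from the original post), the console project can be almost trivial and simply reuse the export code from a shared class library:

```csharp
using System;

// Minimal console entry point intended to be invoked by the Windows
// Task Scheduler (e.g. daily at 02:00).
internal static class Program
{
    private static int Main()
    {
        try
        {
            // Reuse the same export/report code the MVC app uses, ideally
            // from a shared class library in the solution, e.g.:
            // ReportExporter.ExportDailyWorkbook(@"\\fileserver\exports");
            Console.WriteLine("Export completed at {0:u}", DateTime.UtcNow);
            return 0;
        }
        catch (Exception ex)
        {
            Console.Error.WriteLine(ex);
            return 1; // non-zero exit code so the scheduler records the failure
        }
    }
}
```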
In terms of background tasks, the "correct" way to do this is to send a command from your MVC app to some sort of message queue, which can then ensure that the command doesn't get dropped. I've used RabbitMQ in the past (a middleware message broker). Perhaps this is the aim of Quartz.net.
This setup typically involves another app (for me, usually a console app run on the server) that receives the command message from the queue and runs in its own process, entirely separate from MVC and thus from the issues inherent in IIS app pools and background tasks.
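To make the message-queue idea concrete, here is a minimal sketch of publishing such a command from an MVC action using the RabbitMQ .NET client (the queue name and payload are illustrative; the API shown is the classic synchronous client, so check it against the client version you use):

```csharp
using System.Text;
using RabbitMQ.Client;

public static class ExportCommandPublisher
{
    // Called from the MVC controller action instead of running the export inline.
    public static void RequestExport(string requestedBy)
    {
        var factory = new ConnectionFactory { HostName = "localhost" };

        using (var connection = factory.CreateConnection())
        using (var channel = connection.CreateModel())
        {
            // Durable queue so queued export commands survive a broker restart.
            channel.QueueDeclare(queue: "excel-export-commands",
                                 durable: true,
                                 exclusive: false,
                                 autoDelete: false,
                                 arguments: null);

            var body = Encoding.UTF8.GetBytes(requestedBy);
            channel.BasicPublish(exchange: "",
                                 routingKey: "excel-export-commands",
                                 basicProperties: null,
                                 body: body);
        }
    }
}
```

The separate console worker then consumes from the same queue and performs the export in its own process.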
A lot of work, really... one would think it'd be easier, but that's the surefire way to do it and maintain the integrity of the task to be run.

Multiple export using google dataflow

Not sure whether this is the right place to ask, but I am currently trying to run a Dataflow job that partitions a data source into multiple chunks in multiple places. However, I feel that if I try to write to too many tables at once in one job, the Dataflow job is more likely to fail with an HTTP transport exception, and I assume there is some bound on how much I/O, in terms of sources and sinks, I can wrap into one job.
To avoid this scenario, the best solution I can think of is to split this one job into multiple Dataflow jobs, but that means I will need to process the same data source multiple times (once per Dataflow job). That is okay for now, but ideally I would like to avoid it if my data source grows huge later on.
Therefore I am wondering whether there is any rule of thumb for how many data sources and sinks I can group into one job, and whether there is any better solution for my use case.
From the Dataflow service description of structuring user code:
The Dataflow service is fault-tolerant, and may retry your code multiple times in the case of worker issues. The Dataflow service may create backup copies of your code, and can have issues with manual side effects (such as if your code relies upon or creates temporary files with non-unique names).
In general, Dataflow should be relatively resilient. You can Partition your data based on the location you would like it output. The writes to these output locations will be automatically divided into bundles, and any bundle which fails to get written will be retried.
If the location you want to write to is not already supported you can look at writing a custom sink. The docs there describe how to do so in a way that is fault tolerant.
There is a bound on how many sources and sinks you can have in a single job. Do you have any details on how many you expect to use? If it exceeds the limit, there are also ways to use a single custom sink instead of several sinks, depending on your needs.
If you have more questions, feel free to comment. In addition to knowing more about what you're looking to do, it would help to know if you're planning on running this as a Batch or Streaming job.
Our solution to this was to write a custom GCS sink that supports partitions, though given the responses I got I'm unsure whether that was the right thing to do: Writing Output of a Dataflow Pipeline to a Partitioned Destination

How to guarantee data integrity for concurrent Rails/Active Record operations

I need to implement a feature for a rails site that will involve reading and exporting most of my database.
I know this operation is going to take a while. That's fine; I've got Delayed Job for that.
What I'm worried about is the data changing during the running of the job, and the resulting export being corrupted because of that.
My initial thought was to do all of the reads within a transaction. However, I would also like to be running the reads concurrently, if possible. ActiveRecord docs say that Transactions cannot be shared between Connections, and Connections cannot be shared between Threads. So it looks as though I am restricted to a single thread with this approach.
Any suggestions for a workaround? Is there another way to give the job a consistent view of the data that doesn't involve transactions? Or is there some alternative to ActiveRecord/Mysql out there that can distribute transactions across threads?

Rollback informix database

I have some code that uses an Informix 11.5 database that I want to run some tests against.
If the tests fail they will often leave the database in an inconsistent state that needs to be manually resolved before the tests can be run again.
I would like to automate this so that the tests do not require manual intervention before they can be run again.
My current solution is to write some code that does the cleanup, but this means the code must be maintained whenever potential new inconsistent states can occur in new features.
The code runs a lot of stored procedures, which themselves often use transactions. As Informix does not support nested transactions I can't just wrap up all the work in one big transaction.
Is there another way to create a checkpoint which I can restore the database back to?
You could create a virtual machine with an undo disk, and after you run the tests you can close the virtual machine without saving the changes. It's as if you never ran the tests!
If this is a development only server, how about taking a Level 0 ontape system archive before the test? I think this can be done via the sysadmin functions too (not sure though), so it can be automated. After the tests you just restore the archive.
Changing database state - and resetting it back to a known state - is one of the reasons that the Unit Test community spends time and effort avoiding testing against databases. It is a tough problem.
Informix 11.50 does support savepoints; however, it does not support one BEGIN WORK after another without an intervening COMMIT or ROLLBACK.
To the extent possible, have the tests create and load a set of tables with the known data. One way of achieving that is to create a whole new database for the test. However, this is only borderline feasible if you need to test with high volumes of data.
I don't think this issue is in any way unique to Informix - it is a general problem with testing DBMS operations.
