How to do coverage testing in a distributed system? - code-coverage

I'm trying to figure out how to do code coverage testing in a distributed system - i.e. a system that consists of several executables that only do something meaningful together (and are actually started by one another). Has anyone tackled a similar problem, and what tools did you use?

You can obviously collect test coverage data on each distributed part independently. Your first problem is that the distributed pieces may be coded in different languages (e.g., C++ and Java), which means you need a test coverage tool for each language. If you only want to view the data from the distributed elements independently, this should be straightforward, modulo the inconvenience of managing the multiple parts. Use the test coverage display of each test coverage tool to display coverage for its corresponding piece.
If the question is, "how do you collect/see integrated test coverage data for the whole system", you need to collect test coverage data on the individual distributed pieces all at the same time you run a test. (Whether you are running unit or system tests doesn't matter).
Then you need to compose those results into a single overview for the whole system.
Many tools won't combine test coverage from different programs, so you will likely have to handle this on your own. If all your distributed code elements are in the same language, you have the advantage of only having one representation of the test coverage data, but the disadvantage that it probably isn't documented very well. Normally this data is viewed directly by a viewer, and the test coverage tool supplier has little reason to document how the test coverage data is encoded. People often end up building ad hoc tools to combine this data, if they can figure out the representation.
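If you can decode the per-piece formats, the merge itself is usually just a set union per source file. Here is a minimal sketch, assuming each distributed piece emits a JSON report mapping file paths to lists of executed line numbers (a hypothetical format; adapt the loader to whatever your tools actually write):

```python
import json
import sys
from collections import defaultdict

def merge_coverage(report_paths):
    """Union line-level coverage from several per-process reports.

    Each report is assumed to be a JSON object mapping source file
    paths to lists of executed line numbers (hypothetical format).
    """
    merged = defaultdict(set)
    for path in report_paths:
        with open(path) as f:
            report = json.load(f)
        for source_file, lines in report.items():
            merged[source_file].update(lines)
    # Normalize back to sorted lists for a single combined report.
    return {src: sorted(lines) for src, lines in merged.items()}

if __name__ == "__main__":
    # Usage: python merge_coverage.py piece1.json piece2.json ...
    combined = merge_coverage(sys.argv[1:])
    json.dump(combined, sys.stdout, indent=2)
```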
If you have multiple languages, and multiple test coverage tools, you likely have multiple test coverage representations and the ad hoc solution gets a lot harder if you can do it at all.
Our test coverage tools have the nice property that they cover a wide variety of languages and dialects (C, C++, C#, Java, PL/SQL, COBOL, PHP, ...), that they all use the same representation for test coverage vectors, and that the test coverage display tool provided will combine this information for you (a UI-selectable action). No guessing or ad-hocery, just a view of all of the code in the set of languages (even if on different machines) with the complete test coverage for the entire distributed application.

Related

Understanding how coverage modules work

I came across a new dynamic language and would like to create a coverage tool for it. I started reading the source code of the Perl 5 and Python coverage modules, but it got complicated. It's a dynamic scripting language, so I guess that the source code of coverage tools for static languages (like Java & C++) won't help me here. Also, as I understand it, each language is built in a different way, so the same ideas won't work directly. But the big concepts could be similar.
My question is as follows: how do I "attack" this task? What is the proper workflow I need to follow? What do I need to investigate? Are there any books or blogs I can read about this kind of thing?
There are two kinds of coverage collection mechanisms:
1) Real-time sampling of the program counter, typically by a clock running at 1-10ms intervals. Difficulties: a) mapping an actual PC value back to a source line, and b) sampling means you might not see execution of a rarely used bit of code, so your coverage reporting is inaccurate. Because of these issues, this approach isn't used very often.
2) Instrumenting the program so that it collects coverage as it runs. This is hard to do with object code: a) you have to decode the instructions to see where to put probes, and this can be very hard to do right; b) you have to patch the object code to include the probes (this can be really awkward; a "probe" might consist of a 5-byte subroutine call, but the probe has to replace a single-byte instruction); c) you still have to figure out how to map a probe location back to a source code line. A more effective way is to instrument the source code, which requires pretty sophisticated machinery to read source, make probe patches, and regenerate the instrumented code for execution/compilation.
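For languages that expose their own parser, source-level instrumentation is easy to prototype. As a minimal illustrative sketch (not a production design), here is a Python program that rewrites an AST to call a probe before every statement; a real tool would also place probes on branch arms:

```python
import ast

executed_lines = set()

def _probe(lineno):
    """Runtime probe: record that the statement at `lineno` ran."""
    executed_lines.add(lineno)

class Instrumenter(ast.NodeTransformer):
    """Insert a _probe(lineno) call before every statement in every block."""
    def generic_visit(self, node):
        super().generic_visit(node)  # instrument nested blocks first
        for field in ("body", "orelse", "finalbody"):
            block = getattr(node, field, None)
            if isinstance(block, list) and block and isinstance(block[0], ast.stmt):
                new_block = []
                for stmt in block:
                    call = ast.Expr(ast.Call(
                        func=ast.Name(id="_probe", ctx=ast.Load()),
                        args=[ast.Constant(stmt.lineno)], keywords=[]))
                    new_block += [call, stmt]
                setattr(node, field, new_block)
        return node

source = "x = 1\nif x > 0:\n    y = 2\nelse:\n    y = 3\n"
tree = Instrumenter().visit(ast.parse(source))
ast.fix_missing_locations(tree)
exec(compile(tree, "<instrumented>", "exec"), {"_probe": _probe})
print(sorted(executed_lines))  # -> [1, 2, 3]: the else branch never ran
```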
My technical paper Branch Coverage for Arbitrary Languages Made Easy provides explicit detail on how to do this in a general way. My company has built commercial test coverage tools for a wide variety of languages (C, Python, PHP, COBOL, Java, C++, C#, ProC, ...) using this approach. It covers most static and dynamic languages. Some dynamic mechanisms are extremely difficult to instrument, e.g., eval(), but that is true of every approach.
In addition to Ira's answer, there is a third coverage collection mechanism: the language implementation provides a callback that can inform you about program events. For example, Python has sys.settrace: you give it a function, and Python calls your function for every function call and return, and for every line executed.
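For instance, here is a minimal line-coverage collector built on sys.settrace (illustrative only; coverage.py is built on this same hook, with a C implementation for speed):

```python
import sys

covered = set()  # (filename, lineno) pairs observed during the run

def tracer(frame, event, arg):
    """Trace callback: record every executed line."""
    if event == "line":
        covered.add((frame.f_code.co_filename, frame.f_lineno))
    return tracer  # keep tracing within this frame

def demo(n):
    if n > 0:
        return "positive"
    return "non-positive"

sys.settrace(tracer)
demo(5)
sys.settrace(None)

for filename, lineno in sorted(covered):
    print(f"{filename}:{lineno}")
```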

Best practices for unit tests on custom functions for a drake workflow

A drake workflow can have several custom functions stored in its R directory. It would make sense to develop unit tests for these custom functions. There are well-established tools and structures for running testthat unit tests on an R package in RStudio (or from a command line).
But what are best practices for developing and running testthat unit tests for the custom functions in a drake workflow?
Any pointers to resources or examples would be greatly appreciated. Thanks!
The best practices for unit tests do not change much when drake enters the picture. Here are the main considerations.
If you are using drake, you are probably dealing with annoyingly long runtimes in your full pipeline. So one challenge is to construct tests that do not take forever. I recommend invoking your functions on a small dataset, a small number of iterations, or whatever will get the test done in a reasonable amount of time. You can run a lot of basic checks that way. To more thoroughly validate the answers that come from your functions, you can run an additional set of checks on the results of the drake pipeline.
If you are using testthat, you probably have your functions arranged in a package-like structure, or even a fully-fledged package, and you may even be loading your functions with devtools::load_all() or library(yourPackage). If you load your functions this way instead of individually sourcing your function scripts, be sure to call expose_imports() before make() so drake can analyze the functions for dependencies.

Unit Testing Strategy, Ideal Code Coverage Baseline

There's still not much information out there on real-world experiences with Xcode 7 and Swift 2.0 from a unit testing and code coverage perspective.
While there are plenty of tutorials and basic how-to guides available, I wonder what the experience and typical coverage stats are on different iOS teams that have actually tried to achieve reasonable coverage for their released iOS/Swift apps. I specifically wonder about this:
1) while code coverage percentage doesn't represent the overall quality of the code base, is this being used as an essential metric on your team? If not, what is the other measurable way to assess the quality of your code base?
2) For a bit more robust app, what is your current code coverage percentage? (just fyi, we have hard time getting over 50% for our current code base)
3) How do you test things like:
App life-cycle, AppDelegate methods
Any code related to push/local notifications, deep linking
Defensive programming practices, various peace-of-mind (hardly reproducible) safeguards, exception handling, etc.
Animations, transitions, rendering of custom controls (CG) etc.
Popups or Alerts that may include any additional logic
I understand some of the above is more of a subject for actual UI tests, but it makes me wonder:
Is there a reasonable way to get the above tested from the unit-testing perspective? Should we even be trying to satisfy an arbitrary minimal code coverage percentage with unit tests for the whole code base, or should we derive that percentage from what is reasonably achievable given the app's code base?
Is it reasonable to make the code base more inflexible in order to achieve higher coverage? (I'm not talking about a medical app where lives would be at stake here.)
Are there any good practices for testing all the things mentioned above, other than with UI tests?
Looking forward to a fruitful discussion.
You do ask a very big and good question. Although your question includes:
I wonder what is the experience and typical coverage stats on different iOS teams ...
I think the issue is language/OS agnostic. Sure, some languages and platforms are more unit-testable than others, so some are more expensive to unit test (as opposed to other forms of automated/coded testing). I think you are searching for a cost/benefit equation to maximize productivity. Ah, the fun of software development processes.
To jump to the end to give you the quick sound grab answer:
You should unit test all code that you want to work and that is appropriate for unit testing.
So now why the all and why the emphasis on unit testing ...
What is a unit test?
The language in the development community is corrupted, so please bear with me. Unit testing is just one type of automated testing. Others are automated acceptance tests, application tests, integration tests, and component tests. These all test different things. They have different purposes.
However, when I hear unit testing, two things pop into mind:
What is a unit test?
Is it part of TDD (Test Driven Development)?
TDD is about writing tests before writing code. It is a very low-level coding practice/process (from XP - eXtreme Programming): you write a test in order to write a statement, and then another test. It is very much a coding practice, but not an application/requirements practice, as it is about writing code that does what you intended, not what the product requirements are (oh gosh, I feel the points being lost).
Writing code and then unit testing it is ... in my experience ... fun and short-term team building, but not productive. Sure, some defects are found, but not many. TDD leads to better, "healthy" code.
My point here is that unit testing is:
A subset of automated/coded testing.
Is part of a coding process.
Is about code health (maintainability).
Does not prove that your application works (sound of falling points).
Why all?
If your team delivers zero-defect software (ZDFD is real and achievable ... but that's a flat-earth discussion) all the time without unit testing, then this is nonsense and you would not be asking any questions here.
The only valid reason for a team to include unit testing as part of its coding process is to improve productivity. If all team members commit to team productivity then the only issue is identifying which code profits from unit testing. This is the context of the all.
The easiest way I think to illustrate this is to list types I do not unit test:
Factories - They only instantiate types.
Builders / wiring (IoC) - Same as factories - no domain logic.
Third party libraries - We call 3rd party libraries as documented. If you want to test these then use integration/component tests.
Cyclomatic complexity of one - Every method of the type has a CC of 1, that is, no conditions. Unit tests will tell you nothing useful; peer review is more useful.
The practical answer
My teams have expected 100% unit test coverage on all new code that should be unit tested. This is achieved by attributing code that does not meet the unit testing criteria. All code must go through code review, and the attributes must be specific about which of the "why" options listed above applies. -- Simple.
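The attribution mechanism depends on the toolchain; as one illustrative possibility in Python, coverage.py honours exclusion pragmas, so the review rule amounts to requiring a stated reason next to each exclusion:

```python
class AuditLogger:
    """Stand-in dependency (hypothetical), exercised via integration tests."""

def make_audit_logger():  # pragma: no cover - excluded: factory, CC of 1, no domain logic
    """Straight-line construction only; reviewed rather than unit tested."""
    return AuditLogger()
```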
A long answer, and perhaps not easy to digest, nor what people want to hear. But, from long experience, I know it is the best answer that can lead to best profitability.
My answer is aimed at the unit testing aspects of the question. As for defensive programming and other practices, TDD is a process that mitigates these by making it harder to do the wrong thing. Build-system static code analysis tools may also help you capture such issues before they get to peer review (they can fail a build on new issues). Look at tools like SonarQube, ReSharper, CppDepend, and NDepend (yes, language-dependent).

Maintaining large numbers of Concordion scripts

I am currently working for a large organisation with about 2k developers in our IT department. We maintain many things, including our e-commerce platform, and there are about 30 projects currently impacting it.
Recently all of our teams have been instructed to deliver a series of automated tests using Concordion and Selenium WebDriver. For a while this went fairly well and many tests were created, but lately, maintaining the existing tests while our e-commerce platform constantly changes has been somewhat of a nightmare. We have thousands of test scripts covering many parts of our website, but there does not seem to be any facility in Concordion to split scripts into reusable compartments that could be maintained once, rather than having to make changes to hundreds of HTML files for one change.
How are other people approaching this?
The goal of Concordion is not to implement test scripts as HTML, but rather for the HTML to describe the behaviour that you are testing (what you are trying to achieve). The implementation details (how it is being tested) are implemented as Java code. This code can then be structured with an appropriate level of abstraction so that each change to the system under test only requires a change to one part of the code.
Your HTML specifications should only need to change on the rare occasions that the business rules change.
These concepts are described further on the Hints and Tips tab of the Concordion home page.
Thank you for sharing your experience with us. It’s great to hear / read about large scale application of behavior driven development / specification by example.
One approach that could help you is to focus on key examples (http://gojko.net/2014/05/05/focus-on-key-examples). During specification workshops, the entire team works to reach a common understanding of the new user needs and requirements. Then you go on and write specification documents containing key examples. There you should not try to cover everything, but write only as many examples as necessary to express the common understanding.
Additionally, you should try to identify the concepts on which the examples are based. If several examples relate to a similar topic, there is probably an underlying concept. It is often easier to understand the examples if they focus on just one concept (e.g., the validation of a card number). Each concept can usually be described with only a few examples.
Do you have any other types of automated tests (e.g. unit tests)? Are you experiencing the same maintainability challenges with these other tests? Could you use good practices from these other test types to improve your Concordion approach?
Could you tell us more about your setup? How many active specifications have you already created within your company?

How to choose between different test types with SpecFlow, Cucumber or other BDD acceptance test framework?

I am looking at the SpecFlow examples, and its MVC sample contains several alternatives for testing:
Acceptance tests based on validating results generated by controllers;
Integration tests using MvcIntegrationTestFramework;
Automated acceptance tests using Selenium;
Manual acceptance tests when tester is prompted to manually validate results.
I must say I am quite impressed with how well the SpecFlow examples are written (and I managed to run them within minutes of download; I just had to configure a database and install the Selenium Remote Control server). Looking at the test alternatives, I can see that most of them complement each other rather than being alternatives. I can think of the following combinations of these tests:
Controllers are tested in TDD style rather than using SpecFlow (I believe Given/When/Then tests should be applied at a higher, end-to-end level); these tests should provide good code coverage for the respective components;
MvcIntegrationTestFramework is useful for running integration tests during development sessions; these tests are also part of daily builds;
Although Selenium-based tests are automated, they are slow and are mainly started during QA sessions, to quickly validate that there is no broken logic in pages and the site workflow;
Manual acceptance tests, where the tester is prompted to confirm result validity, are mainly there to verify page look and feel.
If you use SpecFlow, Cucumber or another BDD acceptance test framework in your Web development, can you please share your practices regarding choosing between different test types?
Thanks in advance.
It's all behaviour.
Given a particular context, when an event occurs (within a particular scope), then some outcome should happen.
The scope can be a whole application, a part of a system, or a single class. Even a function behaves this way, with inputs as context and the output as outcome (you can use BDD for functional languages as well!)
I tend to use Unit frameworks (NUnit, JUnit, RSpec, etc.) at a class or integration level, because the audience is technical. Sometimes I document the Given / When / Then in comments.
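For instance, a minimal sketch of that style in Python with pytest (Account is a hypothetical class, included only to make the tests self-contained):

```python
import pytest

class Account:
    """Hypothetical domain class, just to make the tests runnable."""
    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount

def test_withdrawal_reduces_balance():
    # Given an account with a balance of 100
    account = Account(balance=100)
    # When 30 is withdrawn
    account.withdraw(30)
    # Then the remaining balance is 70
    assert account.balance == 70

def test_overdraft_is_rejected():
    # Given an account with a balance of 10
    account = Account(balance=10)
    # When / Then an oversized withdrawal is refused
    with pytest.raises(ValueError):
        account.withdraw(50)
```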
At a scenario level, I try to find out who actually wants to help read or write the scenarios. Even business stakeholders can read text containing a few dots and brackets, so the main reason for having a natural language framework like MSpec or JBehave is if they want to write scenarios themselves, or show them to people who will really be put off by the dots and brackets.
After that, I look at how the framework will play with the build system, and how we'll give the ability to read or write as appropriate to the interested stakeholders.
Here's an example I wrote to show the kind of thing you can do with scenarios using simple DSLs. This is just written in NUnit.
Here's an example in the same codebase showing Given, When, Then in class-level example comments.
I abstract the steps behind, then I put screens or pages behind those, and in the screens and pages I call whatever automation framework I'm using - which could be Selenium, Watir, WebRat, Microsoft UI Automation, etc. (there is a sketch of this layering below).
The example I provided is itself an automation tool, so the scenarios are demonstrating the behaviour of the automation tool through demonstrating the behaviour of a fake gui, just in case that gets confusing. Hope it helps anyway!
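Separate from the linked examples above, here is a hedged Python sketch of that step → page → framework layering, with a hypothetical driver interface standing in for Selenium, Watir, or any other automation framework:

```python
class FakeDriver:
    """Hypothetical driver interface standing in for Selenium, Watir, etc."""
    def type(self, selector, text):
        print(f"type {text!r} into {selector}")

    def click(self, selector):
        print(f"click {selector}")

class LoginPage:
    """Page layer: owns locators and talks to the automation framework."""
    def __init__(self, driver):
        self.driver = driver

    def sign_in(self, username, password):
        self.driver.type("#username", username)
        self.driver.type("#password", password)
        self.driver.click("#sign-in")

class LoginSteps:
    """Step layer: scenario vocabulary only; no locators, no framework calls."""
    def __init__(self, page):
        self.page = page

    def given_a_signed_in_user(self):
        self.page.sign_in("alice", "secret")

# Swapping in another automation framework only touches the driver/page layers.
LoginSteps(LoginPage(FakeDriver())).given_a_signed_in_user()
```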
Since acceptance tests are a kind of functional test, the general goal is to test your application with them end-to-end. On the other hand, you might need to consider the efficiency (how much effort it takes to implement the test automation), maintainability, performance, and reliability of the test automation. It is also important that the test automation can easily fit into the development process, so that it supports a "test first" approach (to support outside-in development).
So this is a trade off, that can be different for each situation (that's why we provided the alternatives).
I'm pretty sure that today the most widely fitting option is to test at the controller layer. (Maybe later, as UI and UI automation frameworks evolve, this will change.)