Reusing SpecFlow backgrounds, is this right?

I have been working with SpecFlow for a little over a month and got to the point where I configured a Background scenario to set up and verify common data in a database. The next step was to reuse that background across several feature files, to avoid cutting and pasting.
It has been asked before but I expected something else, more user-friendly, just as the Background scenario is easy to understand and update:
Background:
Given I have created the following currencies:
| Code | Name |
| USD | United States Dollar |
| EUR | Euro |
And I have created the following countries:
| Code | Currency | Name |
| US | USD | United States |
| ES | EUR | Spain |
| IT | EUR | Italy |
I found a quite naive solution that is working (or at least seems to, so far), but I'm concerned it may lead me the wrong way, because of my shallow knowledge of SpecFlow.
Taking a look at the generated code for a feature file I got to this:
Create a "feature" file that only has the background scenario, named something like "CommonDataSetup"
Create a step definition like:
[Given(@"common data configuration has been verified")]
public void GiveCommonDataConfigurationHasBeenVerified()
{
// this class is inside the generated feature file
var commonSetup = new CommonDataSetupFeature();
var scenarioInfo = new ScenarioInfo("Common data configuration", ((string[])(null)));
commonSetup.FeatureSetup();
commonSetup.ScenarioSetup(scenarioInfo);
commonSetup.FeatureBackground();
commonSetup.ScenarioCleanup();
commonSetup.FeatureTearDown();
}
In the Background of the other feature files write:
Background:
Given common data configuration has been verified
So now I can reuse the "common data configuration" step definition in as many feature files as I need, keeping things DRY, and background scenarios can be much shorter.
It seems to work fine, but I wonder: is this the right way to achieve background reuse?
Thanks in advance.

If you have a conversation with a business person who wants the feature, they probably don't say "Given common data configuration has been verified..."
They probably say something like, "Okay, you've got your standard currencies and country codes..."
Within that domain, as long as the idea of standard countries and currencies is really well-known and understood, you don't need to include it. It has to be the case that every single person on the team is familiar with these, though. The whole business needs to be familiar with them. If they're that completely, totally familiar, then re-introducing a table full of them at the beginning of every scenario would be waste.
Anything you can do to eliminate that waste and get to the interesting bits of the scenario is good. Remember that the purpose of the conversations is to surface uncertainty and misunderstandings, and nobody's likely to get these wrong. The automation is a record of those conversations, and you really don't even need to have much of a conversation for this step.
Do have the conversations, though. Even if it's just one line and everyone knows what it is, using the business language for it is important. Without that, you'll end up discussing these really boring bits to try and work out what you each mean by "common data configuration" and "verify" before you can move on to the interesting parts of the scenarios.
Short version: I'd expect to see something like:
Given standard currencies and country codes
When...
You don't even need to use background for that, and however you implement it is fine. If you have a similar situation with standard data that's slightly less familiar, then include it in each feature file; it's important not to hide magic. Remember that readability trumps DRY in tests (which are really records of conversations).

I understand where the need comes from, but reusing the same background in different feature files goes against the idea behind Gherkin.
See https://github.com/cucumber/cucumber/wiki/Gherkin
Gherkin is the language that Cucumber understands. It is a Business Readable, Domain Specific Language that lets you describe software’s behaviour without detailing how that behaviour is implemented.
The "Given common data configuration has been verified" step does not make the feature any more business readable.
Additionally, your current implementation messes with the internal state of SpecFlow. It happens to work now, but sooner or later you will get into trouble with it.
If you need something set up in every test, have you had a look at the various Hooks?
http://www.specflow.org/documentation/Hooks/
With a [BeforeScenario] hook you could set up your tests.

Related

How to identify text file format by its structure?

I have a few text file types with data such as product info, stock, supplier info etc. and they are all structured differently. There is no other identifier for the type except the structure itself (there are no headers, no filename convention etc.)
Some examples of these files:
(products and stocks)
2326 | 542212 | Bananas | 00023 | 1 | pack
2326 | 297875 | Apples | 00085 | 1 | bag
2326 | 028371 | Pineapple | 00007 | 1 | can
...
(products and prices)
12556 Meat, pork 0098.57
58521 Potatoes, mashed 0005.20
43663 Chicken wings 0009.99
...
(products and suppliers - here N is the separator)
03038N92388N9883929
28338N82367N2837912
23002N23829N9339211
...
(product information - multiple types of rows)
VIN|Mom & Pops|78 Haley str.
PIN|BLT Bagel|5.79|FRESH
LID|0239382|283746
... (repeats this type of info for different products)
And several others.
I want to make a function that identifies which of these types a given file is, using nothing but the content. Google has been no help, in part because I don't know what search term to use. Needless to say, "identify file type by content/structure" is of no help, it just gives me results on how to find jpgs, pdfs etc. It would be helpful if I saw some code that others wrote to deal with a similar problem.
What I have thought so far is to make a FileIdentifier class for each type, then when given a file try to parse it and if it doesn't work move on to the next type. But that seems error prone to me, and I would have to hardcode a lot of information. Also, what happens if another format comes along and is very similar to any of the existing ones, but has different information in the columns?
There really is no one-size-fits-all answer unless you can limit the file formats that can happen. You will always only be able to find a heuristic for identifying formats unless you can get whoever designs these formats to give it a unique identifier or you ask the user what format the file is.
That said, there are things you can do to improve your results, like make sure you try all instances of similar formats and then pick the best fit instead of the first match.
The general approach will always be the same: make each decode attempt as strict as possible, using as much knowledge as you have about not just the syntax but also the semantics. That is, if you know an item can only contain one of 5 values, or numbers in a certain range, use that knowledge for detection. Also, don't just call strtol() on a component and accept the result; check that it parsed the entire string. If it didn't, either fail right there, or maintain a "confidence" value and lower it if a file has any possibly invalid parts.
Then, in the end, go through all parse results and pick the one with the highest confidence percentage. Or, if you can't decide, ask the user to pick between the most likely formats.
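A minimal Ruby sketch of that confidence-based approach, assuming three of the sample formats from the question. The detector names, the field rules and the `identify_format` helper are illustrative assumptions, not part of any library; each detector matches a whole line strictly, and the file's confidence is the fraction of lines that pass.

```ruby
# Each detector returns 1.0 if a line fits its format end to end, else 0.0.
# Names and rules are assumptions inferred from the sample rows in the question.
DETECTORS = {
  products_and_stock: lambda do |line|
    fields = line.split('|').map(&:strip)
    return 0.0 unless fields.size == 6
    # Strict check: the numeric columns must consist of digits only.
    [0, 1, 3, 4].all? { |i| fields[i].match?(/\A\d+\z/) } ? 1.0 : 0.0
  end,
  products_and_prices: lambda do |line|
    # Numeric id, free-text name, price with exactly two decimals.
    line.match?(/\A\d+ .+ \d+\.\d{2}\z/) ? 1.0 : 0.0
  end,
  products_and_suppliers: lambda do |line|
    # Three digit runs separated by the letter N.
    line.match?(/\A\d+N\d+N\d+\z/) ? 1.0 : 0.0
  end
}.freeze

# Returns [format_name, confidence], where confidence is between 0.0 and 1.0.
def identify_format(lines)
  rows = lines.map(&:strip).reject(&:empty?)
  return [nil, 0.0] if rows.empty?

  scores = DETECTORS.transform_values do |detector|
    rows.sum { |row| detector.call(row) } / rows.size
  end
  # Pick the best fit across all detectors, not the first one that matches.
  scores.max_by { |_name, score| score }
end

name, confidence = identify_format(File.readlines(ARGV[0])) if ARGV[0]
puts "#{name} (confidence #{confidence})" if ARGV[0]
```

A caller would still need to choose a threshold below which the result is rejected or the user is asked, since a low score for the best candidate only means the other candidates fit even worse.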
PS - The file command line tool on Unixes does something similar: It looks at the start of a file and identifies common sequences that indicate certain file formats.

Using F# to Build a Highly Debuggable Business Rules Engine

The Problem
I have code in F# representing a logical tree. It’s a Business Rules Engine with some fairly simple mathematical functions. I would like to be able to run the rules of the tree many times and see how many times each specific route through the tree is taken.
The requirements are that the base rules should not be changed too much from the simple match statements I’m using at the moment. Tagging the important functions with an attribute would be fine, but adding a call to a logging function at every node is not. I want to be able to run the code in two modes, a highly performant standard mode which just gives answers, and then an “exploratory mode” which gives more detail behind each call. While I don’t mind complicated code to dynamically load and profile the rules, the rules code itself must look simple. Ideally I’d like to not rely on 3rd party libraries - powerpack is ok. The solution must also target the .NET 4.0 runtime.
Potential Solutions
Add a logging call to every function with the function name and arguments. I don’t like this because even if I could disable it in some kind of release mode, it still clutters the rules and means all new code has to be written in an unnatural way.
Have each function return its result together with a list of the names of the functions called so far. I don’t like this because it would look unnatural, and would carry a performance hit. I’m sure I could use a computation expression to do a lot of the plumbing, but that violates the requirement to keep the rules simple.
Parse the rules tree using quotations, and then build a new expression which is the old expression with a call to a logging function injected into the site of each tagged function. This is the best thing I’ve got so far, but I’m worried about compiling the resulting quotation so I can run it. I understand (please correct me if I’m wrong) that not all quotations can be compiled. I’d rather not have an unstable process that limits the rules code to a subset of the F# language. If the rules compile, I would like my solution to be able to deal with them.
I know this is a difficult problem with a fairly strict set of requirements, but if anyone has any inspiration for a solution, I would be very grateful.
Edit: Just to give an example of the sort of rules I might be using, if I owned a widget factory producing products A and B the simple following code might be used. I don't want to lose the readability and simplicity of the formulas by adorning this layer with helper functions and hooks.
type ProductType = | ProductA | ProductB
let costOfA quantity =
    100.0 * quantity
let costOfB quantity =
    if quantity < 100.0 then
        20.0 * quantity
    else
        15.0 * quantity
let calculateCostOfProduct productType quantity =
    match productType with
    | ProductA -> costOfA quantity
    | ProductB -> costOfB quantity

Profanity filter import

I am looking to write a basic profanity filter in a Rails-based application. This will use a simple search and replace mechanism whenever the appropriate attribute gets submitted by a user. My question is, for those who have written these before, is there a CSV file or some database out there where a list of profanity words can be imported into my database? We are submitting the words that we will replace the profanities with on our own. We more or less need a database of profanities, racial slurs and anything that's not exactly rated PG-13 to get triggered.
As the Tin Man suggested, this problem is difficult, but it isn't impossible. I've built a commercial profanity filter named CleanSpeak that handles everything mentioned above (leet speak, phonetics, language rules, whitelisting, etc). CleanSpeak is capable of filtering 20,000 messages per second on a low end server, so it is possible to build something that works well and performs well. I will mention that CleanSpeak is the result of about 3 years of on-going development though.
There are a few things I tell everyone that is looking to try and tackle a language filter.
Don't use regular expressions unless you have a small list and don't mind a lot of things getting through. Regular expressions are relatively slow overall and hard to manage.
Determine if you want to handle conjugations, inflections and other language rules. These often add a considerable amount of time to the project.
Decide what type of performance you need and whether or not you can make multiple passes on the String. The more passes you make, the slower your filter will be.
Understand the Scunthorpe and clbuttic problems and determine how you will handle these. This usually requires some form of language intelligence and whitelisting.
Realize that whitespace has a different meaning now. You can't use it as a word delimiter any more (b e c a u s e of this)
Be careful with your handling of punctuation because it can be used to get around the filter (l.i.k.e th---is)
Understand how people use ASCII art and Unicode to replace characters (\/ = v - those are slashes). There are a lot of Unicode characters that look like English characters and you will want to handle those appropriately.
Understand that people make up new profanity all the time by smashing words together (likethis) and figure out if you want to handle that.
You can search around StackOverflow for my comments on other threads as I might have more information on those threads that I've forgotten here.
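As a rough illustration of several points in the list above (whitespace, punctuation, lookalike characters and whitelisting), here is a minimal Ruby sketch. It is not how CleanSpeak works; the `LOOKALIKES` table, the `BLOCKED` and `ALLOWED` word lists and both helper methods are hypothetical stand-ins, and the word lists use mild placeholder words.

```ruby
# Hypothetical lookalike table; a real filter needs a far larger one,
# including Unicode characters that resemble Latin letters.
LOOKALIKES = {
  '4' => 'a', '@' => 'a', '3' => 'e', '0' => 'o',
  '5' => 's', '$' => 's', '1' => 'i'
}.freeze

BLOCKED = %w[darn heck].freeze  # placeholder "profanities"
ALLOWED = %w[heckler].freeze    # whitelist against Scunthorpe-style false positives

# Collapse text into a bare run of letters: whitespace and punctuation
# are dropped because they cannot be trusted as word delimiters.
def normalize(text)
  text.downcase
      .gsub('\/', 'v')                                # ASCII art: backslash + slash reads as "v"
      .gsub(Regexp.union(LOOKALIKES.keys), LOOKALIKES) # map lookalikes to letters
      .gsub(/[^a-z]/, '')                             # strip everything that is not a letter
end

def flagged?(text)
  collapsed = normalize(text)
  # Blank out whitelisted words first so "heckler" does not trip "heck".
  ALLOWED.each { |word| collapsed = collapsed.gsub(word, ' ') }
  BLOCKED.any? { |word| collapsed.include?(word) }
end

puts flagged?('h.3.c.k')          # punctuation and a lookalike digit
puts flagged?('The heckler left') # whitelisted word
```

Collapsing whitespace has a cost: letters from two adjacent clean words can now join into a blocked word, which is one reason the list above calls for language intelligence rather than plain substring matching.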
Here's one you could use: Offensive/Profane Word List from CMU site
Based on personal experience, you do understand that it's an exercise in futility?
If someone wants to inject profanity, there's a slew of words that are innocent in one context, and profane in another so you'll have to write a context parser to avoid black-listing clean words. A quick glance at CMU's list shows words I'd never consider rude/crude/socially unacceptable. You'll see there are many words that could be proper names or nouns, countries, terms of endearment, etc. And, there are myriads of ways to throw your algorithm off using L33T speak and such. Search Wikipedia and the internets and you can build tables of variations of letters.
Look at CMU's list and imagine how long the list would be if, in addition to the correct letter, every a could also be 4, o could be 0 or p, e could be 3, s could be 5. And, that's a very, very, short example.
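To put a number on that growth, the Ruby sketch below counts the spellings of a single word under exactly the four substitutions named above. The `SUBSTITUTES` table and `variant_count` are illustrative assumptions; each letter contributes a factor of one plus its number of substitutes, and the factors multiply.

```ruby
# Only the substitutions mentioned in the text; real tables are much larger.
SUBSTITUTES = {
  'a' => %w[4],
  'o' => %w[0 p],
  'e' => %w[3],
  's' => %w[5]
}.freeze

# Number of distinct spellings of a word when every letter may be
# either itself or any one of its substitutes.
def variant_count(word)
  word.downcase.each_char.reduce(1) do |total, char|
    total * (1 + SUBSTITUTES.fetch(char, []).size)
  end
end

puts variant_count('goose') # 1 * 3 * 3 * 2 * 2 = 36
```

A five-letter word already yields 36 entries with this tiny table, so a full word list with a realistic table grows multiplicatively per letter.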
I was asked to do a similar task and wrote code to generate L33T variations of the words, and generated a hit-list of words based on several profanity/offensive lists available on the internet. After running the generator, and being a little over 1/4 of the way through the file, I had over one million entries in my DB. I pulled the plug on the project at that point, because the time spent searching, even using Perl's Regexp::Assemble, was going to be ridiculous, especially since it'd still be so easy to fool.
I recommend you have a long talk with whoever requested that, and ask if they understand the programming issues involved, and low-likelihood of accuracy and success, especially over the long-term, or the possible customer backlash when they realize you're censoring them.
I have one that I've added to (obfuscated a bit) but here it is: https://github.com/rdp/sensible-cinema/blob/master/lib/subtitle_profanity_finder.rb

Should BDD scenarios include actual test data, or just describe it?

We've come to a point where we've realised that there are two options for specifying test data when defining a typical CRUD scenario:
Option 1: Describe the data to use, and let the implementation define the data
Scenario: Create a region
Given I have navigated to the "Create Region" page
And I have typed in a valid name
And I have typed in a valid code
When I click the "Save" button
Then I should be on the "Regions" page
And the page should show the created region details
Option 2: Explicitly state the test data to use
Scenario: Create a region
Given I have navigated to the "Create Region" page
And I have filled out the form as follows
| Label | Value |
| Name | Europe |
| Code | EUR |
When I click the "Save" button
Then I should be on the "Regions" page
And the page should show the following fields
| Name | Code |
| Europe | EUR |
In terms of benefits and drawbacks, what we've established is that:
Option 1 nicely covers the case when the definition of say a "valid name" changes. This could be more difficult to deal with if we went with Option 2 where the test data is in several places. Option 1 explicitly describes what's important about the data for this test, especially if it were a scenario where we were saying something like "has typed in an invalid credit card number". It also "feels" more abstract and BDD somehow, being more concerned with description than implementation.
However, Option 1 uses very specific steps which would be hard to re-use. For example "the page should show the created region details" will probably only ever be used by this scenario. Conversely we could implement Option 2's "the page should show the following fields" in a way that it could be re-used many times by other scenarios.
I also think Option 2 seems more client-friendly, as they can see by example what's happening rather than having to interpret more abstract terms such as "valid". Would Option 2 be more brittle though? Refactoring the model might mean breaking these tests, whereas if the test data is defined in code the compiler will help us with model changes.
I appreciate that there won't be a right or wrong answer here, but would like to hear people's opinions on how they would decide which to use.
Thanks!
I would say it depends. There are times when a Scenario might require a large amount of data to complete a successful run. Often the majority of that data is not important to the thing we are actually testing and therefore becomes noise distracting from the understanding we are trying to achieve with the Scenario. I started using something I call a Default Data pattern to provide default data that can be merged with data specific to the Scenario. I have written about it here:
http://www.cheezyworld.com/2010/11/21/ui-tests-default-dat/
I hope this helps.
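The general idea of merging defaults with scenario-specific values can be sketched in a few lines of Ruby. This is only an assumption about the shape of such a helper, not the code from the linked post; `DEFAULT_REGION`, its fields and `region_data` are hypothetical names.

```ruby
# Hypothetical defaults for a "region" form; every field has a valid value.
DEFAULT_REGION = { name: 'Europe', code: 'EUR', active: true }.freeze

# A scenario passes only the fields it cares about; the rest come from
# the defaults, so the noise stays out of the feature file.
def region_data(overrides = {})
  DEFAULT_REGION.merge(overrides)
end

p region_data              # all defaults
p region_data(code: 'EU')  # scenario-specific code, default name and flag
```

A step definition could turn the scenario's table into the `overrides` hash, so the scenario lists only the values that matter to the behaviour being described.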
I prefer option 2.
To the business user it is immediately clear what the inputs are and the outputs. With option 1 we don't know what valid data is, so your implementation may be wrong.
You can be even more expressive by adding invalid data too, when appropriate:
Scenario: Filter for Awesome
Given I have navigated to the "Show People" page
And I have the following data
| Name | Value |
| John | Awesome |
| Bob | OK |
| Jane | Fail |
When I click the "Filter" button
Then the list should display
| Name | Value |
| John | Awesome |
You should, however, keep the data described in terms of the domain rather than the specific implementation. This will allow you to test at different layers in your application, e.g. UI, service, etc.
Every time I think about this I change my mind. But if you think about it, the test is to prove that you can create a region, a criterion met by both options. But I agree that the visual cues with option 2 and developer friendliness are probably too good to turn down. In examples like this, at least.
I would suggest you take a step back and ask what stories and rules you are trying to illustrate with these scenarios. If there are rules about what makes a valid or invalid region code, and your stakeholders want to describe those using BDD, then you can use specific examples of valid and invalid region codes. If you want to describe what can happen after a region is created, then the exact data is not so interesting.
Your "Create a region" is not actually typical of the scenarios that we use in BDD. It can be characterised as "when I create a thing, then I can see the thing". It's not a useful scenario in that it doesn't by itself deliver anything valuable to the user. We look for scenarios in which something interesting or valuable is delivered to the end-user. Why is the user creating a region? What is the end goal? So that another user can assign other objects to that region, perhaps?
Example mapping, where stories are linked with rules and examples (where the examples become scenarios), is described in https://cucumber.io/blog/bdd/example-mapping-introduction/

Cucumber step for all scenarios

I have around 30 scenarios that all bar one require this step to be at the top of the Background:
Given I have an account:
| name | path |
| ticketee | ticketee |
For the one that doesn't require this step it is not important that it exists or doesn't exist, because it's the feature for creating accounts. I can simply use a different account name and path for this.
Now, I was thinking rather than putting this in every single feature file 29 times that I could make use of the Before method in Cucumber which would mean placing a file in features/support/create_account.rb which would have this code:
Before do
steps(%Q{
Given I have an account:
| name | path |
| ticketee | ticketee |
})
end
The only downside of this is that it moves what some would think belongs in the feature to a very difficult-to-track-down location, and it is probably not standard. But on the other hand, it saves quite a lot of repetition.
What should I do?
I'd use a tagged hook such as @with_ticketee_account. It still brings a bit of repetition, but it makes the background of the scenario more obvious than having it completely hidden.
If you wanted to make it so you only have to tag the one odd scenario, maybe create a tagged hook such as @without_ticketee_account which sets a variable that your generic Before hook could check before creating the ticketee account.
Tagged hook is IMHO a good idea.
Another idea is to make that one line instead of three, perhaps:
Given I have an account called "ticketee"
which does all the same stuff, but is succinct as well. I have never used those tables with cucumber because I can usually make stuff on one line that reads better.
I'd go ahead and extract it into the Before block as you have proposed, or use a factory (e.g. factory_girl).

Resources