Is there a Feature Outline? - specflow

In my application, the sets of tests for an Estimate and an Invoice are very similar. I can use a Scenario Outline with Examples to repeat a test across these types, but how do I repeat all the tests within a feature against the same examples without repeating the Examples table at every scenario outline?
For example, is there a way I can rewrite the tests below without having to state the examples twice?
Scenario Outline: Adding a sales line item
Given I have a <Transaction>
And Add Hours of quantity 2 and rate 3
When I save
Then the total is 6
Examples:
| Transaction |
| Invoice |
| Estimate |
Scenario Outline: Adding two sales line item
Given I have a <Transaction>
And Add Hours of quantity 2 and rate 3
And Add Hours of quantity 5 and rate 2
When I save
Then the total is 16
Examples:
| Transaction |
| Invoice |
| Estimate |
In other words, is there such a thing as, for lack of a better term, a Feature Outline?

Unfortunately, the Gherkin language does not support anything like this.

Related

Can you tag individual examples in a scenario outline in SpecFlow?

Scenario outlines are very handy for creating data driven tests, but the number of scenarios increases with the number of examples. I've gotten in the habit of tagging scenarios to make it easier to filter on major features of our application.
I would like to set up a "smoke test" that hits all the major use cases. Some of these use cases are captured in scenario outlines that perform boundary testing on dates or numbers, but I just want to hit that one prototypical case within the examples.
For instance say we've got a feature allowing us to add job openings to a job (basically a "hiring opportunity" versus "we have warm bodies filling this position").
On screen we have two form fields for the minimum experience: years and months. The user should not enter more than 11 months in the months field, otherwise they should put something in the years field (e.g. 18 months should actually be 1 year and 6 months).
@job-openings
Scenario Outline: Adding a job opening with experience
Given a job exists
When I add a job opening requiring <years> years and <months> months experience
Then a job opening should exist requiring <years> years and <months> months experience
Examples:
| years | months |
| 0 | 1 |
| 0 | 11 |
| 1 | 0 |
| 2 | 6 | # <-- the "prototypical" example I want to tag
| 99 | 0 |
| 99 | 11 |
| 100 | 0 |
Having those examples hitting the boundaries of the acceptable values for years and months is definitely useful from a regression testing standpoint, but not when performing a "smoke test" of the system. It would be nice to run a single example in the scenario outline that represents a typical use case. As some background info, we have a PowerShell script that developers use to run automated tests of all sorts, and a general "smoke test" hitting all the major features would be useful.
Is there a way to tag an individual example in a scenario outline?
This is the way I do it:
@job-openings
Scenario Outline: Adding a job opening with experience
Given a job exists
When I add a job opening requiring <years> years and <months> months experience
Then a job opening should exist requiring <years> years and <months> months experience
@smoketest @regression
Examples:
| years | months |
| 2 | 6 | # <-- the "prototypical" example I want to tag
@regression
Examples:
| years | months |
| 0 | 1 |
| 0 | 11 |
| 1 | 0 |
| 99 | 0 |
| 99 | 11 |
| 100 | 0 |
There are two Examples sections that both belong to the scenario outline; the smoke-test example has its own section. When running
dotnet test --filter "TestCategory=job-openings&TestCategory=smoketest"
it will only run the example with the smoketest tag. When running
dotnet test --filter "TestCategory=job-openings&TestCategory=regression"
it will run all the examples, including the smoke-test one, because it carries the regression tag too.
user1207289's method also works. I sometimes do it that way when a test breaks and I want to retest it later. When the tests are generated, the specific example you want to run gets a name (e.g. AddingAJob_ExampleYears2Months6). You can find the names of the generated unit tests with the -t flag, which lists all the tests:
dotnet test --filter "TestCategory=job-openings" -t
To run one specific test (technically, all tests with AddingAJob_ExampleYears2Months6 in the name):
dotnet test --filter AddingAJob_ExampleYears2Months6
I used the official dotnet CLI tool in the examples above, but it's pretty similar for the other test runners.
I am able to run a single example from a scenario outline with the command below:
C:\Users\..\..\bin\Debug>"C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\Common7\IDE\Extensions\TestPlatform\vstest.console.exe" yourTests.exe /Tests:yourTestName
where yourTestName is the name of the test that is generated in Test Explorer upon build, and yourTests.exe is the generated executable in /bin/Debug. I am using MSTest.
For more info on the generated names, look here.

prepare clickstream for k-means clustering

I'm new to machine learning algorithms and I'm trying to do user segmentation based on users' clickstreams on a news website. I have prepared the clickstreams so that I know which user ID read which news category and how many times.
so my table looks something like this:
-------------------------------------------------------
| UserID | Category 1 | Category 2 | ... | Category 20
-------------------------------------------------------
| 123 | 4 | 0 | ... | 2
-------------------------------------------------------
| 124 | 0 | 10 | ... | 12
-------------------------------------------------------
I'm wondering whether k-means works well with so many categories. Would it be better to use percentages instead of raw counts of read articles?
E.g. user 123 read 6 articles overall; 4 of the 6 were in category 1, so that's 66.6% interest in category 1.
Another idea would be to pick the 3 most-read categories of each user and transform the table into something like the following, where Interest 1 = 12 means that the user is most interested in Category 12:
-------------------------------------------------------
| UserID | Interest 1 | Interest 2 | Interest 3
-------------------------------------------------------
| 123 | 1 | 12 | 7
-------------------------------------------------------
| 124 | 12 | 13 | 20
-------------------------------------------------------
K-means will not work well, for two main reasons:
It is meant for continuous, dense data; your data is discrete.
It is not robust to outliers, and you probably have a lot of noisy data.
Well, the number of users is not defined because it's a theoretical approach, but since it's a news website let's assume there are millions of users...
Would there be another, better algorithm for clustering user groups based on their category interests? And if I prepare the data of the first table so that I have each user's interest in every category as a percentage, the data would be continuous rather than discrete - or am I wrong?
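For what it's worth, the percentage idea is just row-normalization of the count table, and it is easy to try before settling on an algorithm. A minimal sketch with NumPy (hypothetical counts, three categories instead of twenty):

```python
import numpy as np

# Hypothetical per-user read counts (rows: users, columns: categories)
counts = np.array([
    [4.0, 0.0, 2.0],    # user 123
    [0.0, 10.0, 12.0],  # user 124
])

# Divide each row by its total so every user becomes a distribution
# over categories; heavy readers no longer dominate the distances.
totals = counts.sum(axis=1, keepdims=True)
fractions = counts / np.where(totals == 0, 1, totals)  # guard empty rows
```

After this step user 123's first entry is 4/6 ≈ 0.666, the 66.6% from the question. With per-user distributions, histogram-friendly distances (cosine, Jensen-Shannon) are often a better fit than the plain Euclidean distance k-means uses.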

Design pattern for many-to-many association with additional fields and partial satisfaction

I am working on a project in which customers can get loans that are divided into n instalments.
The customer then makes payments.
class Instalment
has_and_belongs_to_many :payments
end
class Payment
has_and_belongs_to_many :instalments
end
He can pay the whole loan or a single instalment, but the idea is that he can pay any amount.
This means that some instalments will be partially paid.
I am not sure how to express this in the code.
E.g., the customer gets a loan of 50$, which is divided into 5 instalments of 10$. The customer decides to pay 25$ two times.
Instalment
--------------
ID | quantity
1 | 10$
2 | 10$
3 | 10$
4 | 10$
5 | 10$
Payment
----------------------
ID | quantity
1 | 25$
2 | 25$
Instalments_Payments
----------------------
payment_id | instalment_id
1 | 1
1 | 2
1 | 3 # note that this instalment_id and next one
2 | 3 # are the same, but we don't know how much quantity it satisfied in each one.
2 | 4
2 | 5
I have two ideas, but I don't like either of them:
Add two new fields to the Instalments_Payments table: one to flag whether the instalment has been fully paid, and the other to record how much of it was paid.
Instalments_Payments
----------------------
payment_id | instalment_id | full_paid | paid
1 | 1 | true | 10$
1 | 2 | true | 10$
1 | 3 | false | 5$
2 | 3 | false | 5$
2 | 4 | true | 10$
2 | 5 | true | 10$
Add a new model, PartialPayment, in which a payment has_many :partial_payments.
I am looking for the best approach to this issue.
If you want time efficiency, you should add a column to the database table recording the status; this serves frequent queries with better performance, and it is the common solution.
Otherwise, you can write a helper function that compares the money a customer should pay with the money he has paid, which costs more time at runtime.
Since under most conditions time is more precious than space, we usually choose to record the status and do the comparison asynchronously in the background, so users do not have to wait for the runtime computation to finish. Thus the first solution is usually preferred.
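Whichever schema is chosen, the per-row paid amounts from the first idea have to be computed when a payment arrives. A minimal sketch of that greedy allocation, in plain Python rather than ActiveRecord, with hypothetical (instalment_id, amount_due, amount_paid) tuples:

```python
def allocate_payment(amount, instalments):
    """Split a payment across instalments oldest-first.
    instalments: iterable of (instalment_id, amount_due, amount_paid).
    Returns a list of (instalment_id, applied_amount) rows, i.e. the
    content of the Instalments_Payments join table for this payment."""
    allocations = []
    for inst_id, due, paid in instalments:
        if amount <= 0:
            break
        outstanding = due - paid
        if outstanding <= 0:
            continue  # instalment already fully paid
        applied = min(amount, outstanding)
        allocations.append((inst_id, applied))
        amount -= applied
    return allocations
```

For the 50$ loan example, allocating the first 25$ payment over five fresh 10$ instalments yields [(1, 10), (2, 10), (3, 5)], which matches the join-table rows in the first idea.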

Time series database that computes integral

I have a data sets that contains the following information:
Device # | Timestamp | In Error (1=yes, 0=false)
1 | 1459972740 | 1
1 | 1459972745 | 1
1 | 1459972750 | 0
1 | 1459972755 | 1
2 | 1459972740 | 0
2 | 1459972745 | 1
2 | 1459972750 | 1
2 | 1459972755 | 1
...
I would like to compute the number of minutes a device has been in error over a specific period, i.e. "How much downtime (in minutes) did we have per device yesterday?". Which leads to "Which device had the most downtime yesterday?", "What is our average error time per device per day?", etc.
I would assume that this is a classic use case for time series, but I can't find any product that can compute an integral aggregation on this dataset. Note that the engine must be able to assume a value based on the previous snapshot. In my example, if I request the downtime per device between 1459972742 and 1459972752, the output should be 8 seconds for device #1 and 7 seconds for device #2.
Thanks!
VictoriaMetrics provides an integrate() function, which can be used to calculate the integral of a series over a given duration. For example, the following MetricsQL query calculates the integral of the given metric over the last 24 hours:
integrate(metric[24h])
Axibase Time Series Database provides both an API and visualization for threshold aggregation functions that can compute SLA/outage metrics.
Grafana: How to have the duration for a selected period
In your case the threshold would be 1:
"threshold": {
"max": 1
}
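For a cross-check, or when no TSDB is at hand, the last-observation-carried-forward integral the question asks for is simple to compute directly; a sketch in Python over the sample data above:

```python
def error_seconds(samples, start, end):
    """samples: time-sorted (timestamp, in_error) snapshots for one device.
    Integrates the 0/1 step function over [start, end], carrying the last
    observed value forward until the next snapshot."""
    total = 0
    current = 0  # value in effect at `start` (0 if no earlier sample)
    t = start
    for ts, value in samples:
        if ts <= start:
            current = value
            continue
        if ts >= end:
            break
        total += current * (ts - t)  # area of the step ending at ts
        t, current = ts, value
    total += current * (end - t)  # last step runs to the end of the window
    return total

device1 = [(1459972740, 1), (1459972745, 1), (1459972750, 0), (1459972755, 1)]
device2 = [(1459972740, 0), (1459972745, 1), (1459972750, 1), (1459972755, 1)]
```

error_seconds(device1, 1459972742, 1459972752) returns 8 and the same call on device2 returns 7, the expected downtimes from the question; dividing by 60 gives minutes.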

SVM Machine Learning: Feature representation in LibSVM

I'm working with LibSVM to classify written text (gender classification).
I'm having problems understanding how to create LibSVM training data with multiple features.
Training data in LibSVM is built like this:
label index1:value1 index2:value2
Let's say I want these features:
Top-k words: the k most-used words per label
Top-k bigrams: the k most-used bigrams
So, for example, the counts could look like this:
Word count
|-------|-------|----|----|
| index | text  | +1 | -1 |
|-------|-------|----|----|
| 1     | this  | 3  | 3  |
| 2     | forum | 1  | 0  |
| 3     | is    | 10 | 12 |
| ...   | ...   | .. | .. |
|-------|-------|----|----|
Bigram count
|-------|-------|----|----|
| index | text  | +1 | -1 |
|-------|-------|----|----|
| 4     | bi    | 6  | 2  |
| 5     | gr    | 10 | 3  |
| 6     | am    | 8  | 10 |
| ...   | ...   | .. | .. |
|-------|-------|----|----|
Let's say k = 2. Is this how a training instance would look? (These counts are not related to the table above.)
Label Top_kWords1:33 Top_kWords2:27 Top_kBigrams1:30 Top_kBigrams2:25
Or does it look like this (does it matter if the feature types are mixed up)?
Label Top_kWords1:33 Top_kBigrams1:30 Top_kWords2:27 Top_kBigrams2:25
I just want to know what the feature vector looks like with multiple, different features, and how to build it.
EDIT:
With the updated table above, is this training data correct?
Example
1 1:3 2:1 3:10 4:6 5:10 6:8
-1 1:3 2:0 3:12 4:2 5:3 6:10
The libSVM representation is purely numeric, so
label index1:value1 index2:value2
means that each "label", "index" and "value" has to be a number. In your case you have to enumerate your features, for example
1 1:23 2:47 3:0 4:1
If a feature has value 0, you can omit it:
1 1:23 2:47 4:1
Remember to keep the features in increasing index order.
In general, libSVM is not designed to work with text, and I would not recommend using it that way - rather, use an existing library that makes working with text easy and wraps around libSVM (such as NLTK or scikit-learn).
The k most-used words/bigrams in your training set may not be the most popular in your test set. If you use the most popular words in the English language you will end up with the, and, and so on. Maybe beer and football are more suitable for classifying males, even if they are less popular. This process step is called feature selection and has nothing to do with SVM. Once you have found selective features (beer, botox, ...), you enumerate them and feed them into SVM training.
For bigrams you could perhaps omit feature selection, as there are at most 26*26 = 676 bigrams, making 676 features. But again, I assume a bigram like be is not selective, as the selective match in beer is completely buried in lots of matches in to be. But that is speculation; you have to evaluate the quality of your features.
Also, if you use word/bigram counts, you should normalize them, i.e. divide by the overall word/bigram count of the document. Otherwise shorter documents in your training set will have less weight than longer ones.
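To make the normalized, sparse format concrete, here is a small helper (a hypothetical sketch, not part of libSVM itself) that turns a dict of enumerated feature counts into one training line, dividing by the total count, omitting zeros, and keeping indices ascending:

```python
def to_libsvm_line(label, counts):
    """counts: {feature_index: raw_count}. Emits one libSVM-format line
    with count-normalized values, indices ascending, zeros omitted."""
    total = sum(counts.values()) or 1  # avoid division by zero
    parts = [str(label)]
    for idx in sorted(counts):
        value = counts[idx] / total
        if value != 0:
            parts.append(f"{idx}:{value:g}")
    return " ".join(parts)
```

For example, to_libsvm_line(1, {1: 3, 2: 1}) produces "1 1:0.75 2:0.25".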
