I am trying to analyze a series of sentences by identifying the most common adverb-adjective-noun strings. I have managed to get answers for how to do so with random words but I think this is a standalone question, and it might better to be dealt with separately.
In this case, I would like to omit common word types like personal pronouns, articles, prepositions and even verbs. Ideally, the results should produce:
Most common nouns
Most common adjectives
Most common adverbs
Most common adjective+noun strings
Most common adverb+noun strings
I understand there is a way to do this by using an online dictionary but I have been unable to integrate that in my code to get the results I want. Is there any way of automating this without listing all the words that you want omitted? How could it be done?
Here's a link to the spreadsheet I'm using (for this particular query, see page 2) and a screenshot of the types of text I would like to analyze with a manual color-coded visualization of what I want to achieve:
How can I hit multiple apis like example.com/1000/getUser, example.com/1001/getUser in Gatling? They are Get calls.
Note: Numbers start from a non zero integer.
Hard to give good advice based on the small amount of information in your question, but I'm guessing that passing the userID's in with a feeder could be a simple, straightforward solution. Largely depends on how your API-works, what kind of tests you're planning, and how many users (I'm assuming the numbers are userId's) you need to test with.
If you need millions of users, a custom feeder that generates increments would probably be better, but beyond that the strategy would otherwise be the same. I advice you to read up on the feeder-documentation for more information both on usage in general, and how to make custom feeders: https://gatling.io/docs/3.0/session/feeder/
As an example, if you just need a relatively small amount of users, something along these lines could be a simple, straightforward solution:
Make a simple csv file (for example named userid.csv) with all your userID's and add it to the resources folder:
userid
1000
1001
1002
...
...
The .feed() step adds one value from the csv-file to your gatling user session, which you can fetch as you would work with session values ordinarily. Each of the ten users injected in this example will get an increment from the csv-file.
setUp(
scenario("ScenarioName")
.feed(csv("userid.csv"))
.exec{http("Name of your request").get("/${userid}/getUser")}
)
.inject(
atOnceUsers(10)
)
).protocols(http.baseUrl("example.com"))
I am writing a feature that will have scenarios with a parameter in common.
The step would be something like this:
Given the user is viewing the book <bookIdAdress>
When ...
Then ...
Examples:
| bookIdAddress |
| ... |
| ... |
I will have many scenarios like the above in my feature. And I want to test this feature with many books.
This same parameter would repeat for all scenarios of a feature. As far as my current knowledge of BDD is concerned, the only way is to keep putting the same examples in every single scenario. I was wondering if there was an option to have the Examples written once for the entire feature, or if I am completely wrong in doing it this way, what approach should I take?
I know I can use the Background tab to write a set up for the entire feature, but I don't know an option to just put the examples in a feature context only.
You can not share example tables in SpecFlow. I tried by adding the table to the background as a way to hack it but it didn't work.
One option to consider is telling each Scenario to get data from the same excel file. Then you can share a data source as well as hide long tables of data.
http://www.specflow.org/plus/Excel/
I have just read #PerformanceDBA's arguments re: 6NF and E-A-V. I am intrigued. I had previously been skeptical of 6NF as it was presented as "merely" sticking some timestamp columns on tables.
I have always worked with a data dictionary and do not need to be convinced to use one, or to generate SQL code. So I expect an answer that would require a dictionary (or catalog) that is used to generate code.
So I would like to know how 6NF would deal with an extremely simple example. A table of items, descriptions and prices. The prices change over time.
So anyway, what does the Items table look like when converted to 6NF? What is the "explosion of tables?" that happens here?
If the example does not work with a table this simple, feel free to add what is necessary to get the point across.
I actually started putting an answer together, but I ran into complications, because you (quite understandably) want a simple example. The problem is manifold.
First I don't have a good idea of your level of actual expertise re Relational Databases and 5NF; I don't have a starting point to take up and then discuss the specifics of 6NF,
Second, just like any of the other NFs, it is variegated. You can just barely step into it; you can implement 6NF for certan tables; you can go the full hog on every table, etc. Sure there is an explosion of tables, but then you Normalise that, and kill the explosion; that's an advanced or mature implementation of 6NF. No use providing the full or partial levels of 6NF, when you are asking for the simplest, most straight-forward example.
I trust you understand that some tables can be "in 5NF" while others are "in 6NF".
So I put one together for you. But even that needs explanation.
Now SQL barely supports 5NF, it does not support 6NF at all (I think dportas says the same thing in different words). Now I implement 6NF at a deep level, for performance reasons, simplified pivoting (of entire tables; any and all columns, not the silly PIVOT function in MS), columnar access, etc. For that you need a full catalogue, which is an extension to the SQL catalogue, to support the 6NF that SQL does not support, and maintain data Integrity and business Rules. So, you really do not want to implement 6NF for fun, you only do that if you have a need, because you have to implement a catalogue. (This is what the EAV crowd do not do, and this is why most EAV systems have data integrity problems. Most of them do not use the declarative Referential & Data Integrity that SQL does have.)
But most people who implement 6NF don't implement the deeper level, with a full catalogue. They have simpler needs, and thus implement a shallower level of 6NF. So, let's take that, to provide a simple example for you. Let's start with an ordinary Product table that is declared to be in 5NF (and let's not argue about what 5NF is). The company sells various different kinds of Products, half the columns are mandatory, and the other half are optional, meaning that, depending on the Product Type, certain columns may be Null. While they may have done a good job with the database, the Nulls are now a big problem: columns that should be Not Null for certain ProductTypes are Null, because the declaration states NULL, and their app code is only as good as the next guy's.
So they decide to go with 6NF to fix that problem, because the subtitle of 6NF states that it eliminates The Null Problem. Sixth Normal Form is the irreducible Normal Form, there will be no further NFs after this, because the data cannot be Normalised further. The rows have been Normalised to the utmost degree. The definition of 6NF is:
a table is in 6NF when the row contains the Primary Key, and at most one, attribute.
Notice that by that definition, millions of tables across the planet are already in 6NF, without having had that intent. Eg. typical Reference or Look-up tables, with just a PK and Description.
Right. Well, our friends look at their Product table, which has eight non-key attributes, so if they make the Product table 6NF, they will have eight sub-Product tables. Then there is the issue that some columns are Foreign Keys to other tables, and that leads to more complications. And they note the fact that SQL does not support what they are doing, and they have to build a small catalogue. Eight tables are correct, but not sensible. Their purpose was to get rid of Nulls, not to write a little subsytem around each table.
Simple 6NF Example
Readers who are unfamiliar with the Standard for Modelling Relational Databases may find IDEF1X Notation useful in order to interpret the symbols in the example.
So typically, the Product Table retains all the Mandatory columns, especially the FKs, and each Optional column, each Nullable column, is placed in a separate sub-Product table. That is the simplest form I have seen. Five tables instead of eight. In the Model, the four sub-Product tables are "in 6NF"; the main Product table is "in 5NF".
Now we really do not need every code segment that SELECTs from Product to have to figure out what columns it should construct, based on the ProductType, etc, so we supply a View, which essentially provides the 5NF "view" of the Product table cluster.
The next thing we need is the basic rudiments of an extension to the SQL catalog, so that we can ensure that the rules (data integrity) for the various ProductTypes are maintained in one place, in the database, and not dependent on app code. The simplest catalogue you can get away with. That is driven off ProductType, so ProductType now forms part of that Metadata. You can implement that simple structure without a catalogue, but I would not recommend it.
Update
It is important to note that I implement all Business Rules in the database. Otherwise it is not a database (the notion of implementing rules "in application code" is hilarious in the extreme, especially nowadays, when we have florists working as "developers"). Therefore all rules, etc are first and foremost implemented as SQL declarations, CHECK constraints, functions, etc. That preserves all Declarative Referential Integrity, and declarative Data Integrity. The extension to the SQL catalog covers the area that SQL does not have declarations for, and they are then implemented as SQL. Being a good data dictionary, it does much more. Eg. I do not write Views every time I change the tables or add or change columns or their characteristics, they are created directly from the catalog+extension using a simple code generator.
One more very important note. You cannot implement 6NF (or EAV properly, for that matter), without completing a full and faithful Normalisation exercise, to 5NF. The problem I see at every site is, they don't have a genuine 5NF state, they have a mish-mash of partial normalisation or no normalisation at all, but they are very attached to that. Creating either 6NF or EAV from that is a disaster. Creating EAV or 6NF from that without all business rules implemented in declarative SQL is a nuclear disaster, burning for years. You get what you pay for.
End update.
Finally, yes, there are at least four further levels of Normalisation (Normalisation is a Principle, not a mere reference to a Normal Form), that can be applied to that simple 6NF Product cluster, providing more control, less tables, etc. The deeper we go, the more extensive the catalogue. And higher levels of performance. When you are ready, just ask, I have already erected the models and posted details in other answers.
In a nutshell, 6NF means that every relation consists of a candidate key plus no more than one other (key or non-key) attribute. To take up your example, if an "item" is identified by a ProductCode and the other attributes are Description and Price then a 6NF schema would consist of two relations (* denotes the key in each):
ItemDesc {ProductCode*, Description}
ItemPrice {ProductCode*, Price}
This is potentially a very flexible approach because it minimises the dependencies. That's also its main disadvantage however, especially in a SQL database. SQL makes it hard or impossible to enforce many multi-table constraints. Using the above schema, in most cases it will not be possible to enforce a business rule that every product must always have a description AND a price. Similarly, you may not be able to enforce some compound keys that ought to apply (because their attributes could be split over multiple tables).
So in considering 6NF you have to weigh up which dependencies and integrity rules are important to you. In many cases you may find it more practical and useful to stick to 5NF and normalize no further than that.
I had previously been skeptical of 6NF
as it was presented as "merely"
sticking some timestamp columns on
tables.
I'm not quite sure where this apparent misconception comes from. Perhaps the fact that 6NF was introduced for the book "Temporal Data and The Relational Mode" by Date, Darwen and Lorentzos? Anyhow, I hope the other answers here have clarified that 6NF is not limited to temporal databases.
The point I wanted to make is, although 6NF is "academically respectable" and always achievable, it may not necessarily lead to the optimal design in every case (and not just when considering implementation using SQL either). Even the aforementioned discoverers and proponents of 6NF seem to agree e.g.
Chris Date: "For practical purposes, stick to 5NF (and 6NF)."
Hugh Darwen: "the 6NF decomposition around Date [not the person!] would be overkill... an optimal design for the soccer club is... 5-and-a-bit-NF!"
Hugh Darwen: "we are in 5NF but not in 6NF, and again 5NF is sufficient" (several similar examples).
Then again, I can also find evidence to the contrary:
Chris Date: "Darwen and I have both felt for some time that all base relvars should be in 6NF".
On a practical note, I recently extended the SQL schema of one of our products to add a minor feature. I adopted a 6NF to avoid nullable columns and ended up with six new tables where most (all?) of my colleagues would have used one table (or perhaps extended an existing table) with nullable columns. Despite me proving several 'helper' stored procs and a 'denormalized' VIEW with a INSTEAD OF triggers, every coder that has had to work with this feature at the SQL level has gone out of their way to curse me :)
These guys have it down: Anchor Modeling. Great academic papers on the subject, combined with practical examples. Their writings have finally pushed me over the edge to consider building a DW in 6nf on an upcoming project. The POC work I have done has validated (for me, at least) that the enormous benefits of 6nf don't outweigh the costs.
This is more of an open question. What is your opinion on query strings in a URL? While creating sites in ASP.NET MVC you spend a lot of time thinking about and crafting clean URLs only for them to be shattered the first time you have to use query strings, especially on a search form.
For example I recently did a fairly simple search form with half a dozen text field and two or three lists of checkboxes and selects. This produced the query string below when submitted
countrylcid=2057&State=England&StateId=46&Where=&DateFrom=&DateTo=&Tags=&Keywords=&Types
=1&Types=0&Types=2&Types=3&Types=4&Types=5&Costs=0.0-9.99&Costs=10.00-29.99&Costs=30.00-59.99&Costs=60.00-10000.00
Beautiful I think you'll agree. Half the fields had no information in them and the list inputs are very verbose indeed.
A while ago I implemented a simple solution to this for paging which produced a url such as
www.yourdomain.com/browse/filter-on/page-1/perpage-50/
This used a catchall route to grab what is essentially a replacement query string after the filter-on portion. Works quite well but breaks down when doing form submissions.
Id be keen to hear what other solutions people have come up with? There are lots of articles on clean urls but are aimed at asp.net developers creating basic restful urls which MVC has covered. I am half considering diving into model binding to produce a proper solution along those lines. With the above convention the large query string could be rewritten as:
filter-on/countrylcid-2057/state-England/stateId-46/types-{1,0,2,3,4,5}/costs-{0.0-9.99,10.00-29.99,30.00-59.99,60.00-10000.00}/
Is this worth the effort?
Thanks,
My personal view is that where users are likely to want to either bookmark or pass on URLs to other people then a nice, clean "friendly" URL is the way to go. Aesthetically they are much nicer. For simple pagination and ordering then a re-written URL is a good idea.
However, for pages that have a large number of temporary, dynamic fields (such as a search) then I think the humble query string is fine. Like wise for pages whose contents are likely to change significantly given the exact same URL in the future. In these cases, URLs with query strings are fine and, perhaps, even preferable as they at least indicate to the observant user that the page is dynamic. However, in these cases it may be better to use form POST variables, anyway, that way visitors are not tempted to "fiddle" with the values.
In addition to what others have said, a URL implies a hierarchy that is semantic. Whether true today or not, the ancestry is directories and people still think of it as such. That's why you have controller/action/id. Likewise, to me a querystring implies options or queries.
Personally, I think a rewritten URL is best when you can't tell if there's an interpreter behind it -- maybe it's just a generated HTML file?
So however you choose to do it (and it's a pain on the client in a search form -- I'd say more trouble than it's worth), I'd support you doing it for hierarchies.
E.g. /Search/Country/State/City
but once you start getting into prices and types, or otherwise having to preface a "directory" with the type of value (e.g. /prices=50.00/ or worse, with an array), then that's where you've lost me.
In fact, if all elements are filled in, then all you've really done is taken the querystring, replaced "&" with "/", and combined your arrays into a single field.
If you're going to be writing the javascript anyways, why don't you just loop through the form elements and:
Remove the empty ones, cleaning up the querystring from the "&price_low=&price_high=&" sorts of things.
Combine multiple values into an array structure
But then submit as a querystring.
James
Aren't the values of the different fields available in the FormsCollection anyway on post?