I have written a lot of steps whose wording causes me to confuse the Arrange and Assert parts of my code.
For example:
given file A exists
when I rename it
then file B exists
The Arrange part of my code should be making a file, and the Assert part of my code should be testing for the existence of a file.
However, because both steps translate as "File X Exists", I keep getting confused and writing assert code in my arrange steps.
How can I phrase things better so that I don't get confused?
I have thought of phrasing the Arrange part of the SpecFlow scenario as an action I perform.
For example:
given I make file A
However, the human-readable aspect doesn't feel right.
In your Given and your Then you are expressing different intents. "File A exists" is very concise but not really great for communicating with other people.
There are many ways of writing Cucumber. For me, the thing I think about when phrasing the "Then" parts is "what should have happened" - "should" is the important word to me.
How about
Given a file named "0001.mpg" exists
When I rename "0001.mpg" to "dance competition finals.mpg"
Then a file named "dance competition finals.mpg" should exist
Grammatically:
Given ... exists is in the simple present tense - i.e. this is true now, and isn't conditional on anything.
Then ... should exist expresses necessity in the present tense - i.e. if you check right now, this instant, it ought to be so. ("should" is a "deontic" modal verb, according to English Stack Exchange.)
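To tie this back to the code, the step bindings then split cleanly too. Here is a rough SpecFlow/C# sketch; the test directory and the assertion library are illustrative assumptions, not anything your project necessarily uses:

using System.IO;
using NUnit.Framework;
using TechTalk.SpecFlow;

[Binding]
public class FileRenameSteps
{
    // Illustrative test directory; a real project would manage this per scenario.
    private const string TestDirectory = @"C:\temp\specflow-tests";

    [Given(@"a file named ""(.*)"" exists")]
    public void GivenAFileNamedExists(string fileName)
    {
        // Arrange: create the file - no assertions here.
        File.WriteAllText(Path.Combine(TestDirectory, fileName), "dummy content");
    }

    [When(@"I rename ""(.*)"" to ""(.*)""")]
    public void WhenIRenameTo(string oldName, string newName)
    {
        File.Move(Path.Combine(TestDirectory, oldName),
                  Path.Combine(TestDirectory, newName));
    }

    [Then(@"a file named ""(.*)"" should exist")]
    public void ThenAFileNamedShouldExist(string fileName)
    {
        // Assert: only check for existence - never create anything here.
        Assert.That(File.Exists(Path.Combine(TestDirectory, fileName)), Is.True);
    }
}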
I am currently working on a personal writing project that has ended up with me maintaining a few different versions, because of differences between the platforms and output formats I want to support that aren't trivially solved. After several glances at pandoc and the sheer forest it represents, I have concluded that mere templates don't do what I need and, worse, that I seem to need a combination of a custom filter and writer. Suffice to say: messing with the AST is where I feel way out of my depth - enough so that, rather than asking specific "how do I do X" questions here, this is a question of "is X the right way to go about it, or what is the proper way to do it, and can you give an example of how it ties together?". So if this question is rather lengthy: my apologies.
My current goal is to have custom markup like the following, which is supposed to 'track' which character says something:
<paul|"Hi there">
If I convert to HTML, I'd want something similar to:
<span class="speech paul">"Hi there"</span>
to pop out (perhaps along with the <p> tags), whereas if the output is just pure markdown / plain text, I'd want it to silently disappear:
"Hi there"
Looking at the JSON AST structures I've studied, it would make sense for me to want a new element type, similar to the 'Emph' tag, called 'Speech', which allows whole blobs of text to be put inside it with a bit of extra information attached (the person speaking). So something like this:
{"t":"Speech","speaker":"paul","c":[ ... ] }
Problem #1: by the time a Lua filter sees the document, it has obviously already been distilled into an AST. This means that replacing items the way most macro-expander samples do cannot really work, since it would require reading forward. With that method I would just replace bits and pieces in place (<NAME| becomes a StartSpeech and the first solitary > that follows becomes an EndSpeech), but that would make malformed input a bigger potential problem because of silent-ish failures. Additionally, such tags would be completely at odds with how an AST is supposed to look.
To complicate matters even further, some of my characters end up learning a secondary language throughout the story, for which I apply a different format that pairs the spoken text with the perspective character's simplified understanding of what was said. Example:
<paul|"Heb je goed geslapen?"|"Did you ?????">
I could probably add a third 'UnderstoodSpeech' group to my filter, but (problem #2) at this point the relationship between the speaker, the original speech, and the understood translation is completely gone. As long as the final documents need these values in that order, and only that order, it is fine... but what if I want my HTML version to look like
"Did you?????"
with a tool-tip / hover-over effect containing the original speech? That would be nearly impossible to achieve, because the AST does not contain that kind of relational detail.
Whatever kind of AST I create in the filter is what I need to understand in my custom writer. Ideally, I want to re-use as much stock functionality of pandoc as possible for the writer, but I don't even know if that is feasible at this point.
So now my question: could someone with a good understanding of pandoc please give me an example of how to keep the relevant bits of data together and apply them in the correct manner? By this I mean a basic example of what needs to go in the lua-filter and lua-writer scripts for the following toolchain:
[CUSTOMIZED MARKDOWN INPUT] -> lua-filter -> lua-writer -> [CUSTOMIZED HTML5 OUTPUT]
I am writing a Clang tool, but I am quite new to it, so I have come across a problem that I couldn't find covered in the docs (yet).
I am using the great Matchers API to find some nodes that I will later want to manipulate in the AST. The problem is that the Clang tool will actually parse everything that belongs to the source file, including headers like iostream etc.
Since my manipulation will probably include some refactoring, I definitely do not want to touch every single thing the parser finds.
Right now I am dealing with this by comparing the source files of the nodes I matched against the arguments in argv, but needless to say this feels wrong, since the tool still parses through ALL the iostream code - it just ignores it while doing so. I just can't believe there is no way to tell the ClangTool something like:
"only match nodes whose location's source file is something the user fed to this tool"
Thinking about it, that would only make sense if it were possible to create ASTs for each source file individually, but I do need them to be aware of each other or share contextual knowledge, and I haven't figured out a way to do that either.
I feel like I am missing something very obvious here.
thanks in advance :)
There are several narrowing matchers that might help: isExpansionInMainFile and isExpansionInSystemHeader. For example, one could combine the latter with unless to limit matches to AST nodes that are not in system headers.
There are several examples of using these in the Code Analysis and Refactoring with Clang Tools repository. For example, see the file lib/callsite_expander.h around line 34, where unless(isExpansionInSystemHeader()) is used to exclude call expressions that are in system headers. Another example is at line 27 of lib/function_signature_expander.h, where the same is used to exclude function declarations in system headers that would otherwise match.
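In outline, usage looks something like this (a sketch only - the node matchers and binding names here are illustrative, not taken from that repository):

// Narrow matches to the user's own code rather than everything the tool parses.
#include "clang/ASTMatchers/ASTMatchers.h"

using namespace clang::ast_matchers;

// Only match function declarations spelled in the main (command-line) file:
auto MainFileFunctions =
    functionDecl(isExpansionInMainFile()).bind("userFunction");

// Or, more permissively, match call expressions anywhere except system headers:
auto NonSystemCalls =
    callExpr(unless(isExpansionInSystemHeader())).bind("userCall");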
I'm trying to implement the diff3 algorithm and am currently stuck at the chunk-creation stage. I already know how to get the LCS between the original file and the "other" file, and the LCS between the original file and "my" file. Which steps do I need to take to get the chunks?
I don't think this really answers your question, but Subversion implements this layering exactly as you describe here. It follows the theory quite closely, so you might be able to re-use some pieces.
See http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_diff/
I am looking to write a basic profanity filter in a Rails-based application. This will use a simple search-and-replace mechanism whenever the appropriate attribute gets submitted by a user. My question is, for those who have written these before: is there a CSV file or some database out there with a list of profanity words that can be imported into my database? We are supplying the words that we will replace the profanities with on our own. We more or less need a database of profanities, racial slurs, and anything that's not exactly rated PG-13 that should trigger the filter.
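Just to illustrate the kind of naive mechanism I have in mind (the hard-coded word list here is only a placeholder for whatever list I end up importing):

# Naive sketch: straight search-and-replace against an imported word list.
BANNED_WORDS = %w[badword1 badword2].freeze   # would be loaded from the CSV/DB
REPLACEMENT  = "****"

def filter_profanity(text)
  pattern = Regexp.union(BANNED_WORDS.map { |word| /\b#{Regexp.escape(word)}\b/i })
  text.gsub(pattern, REPLACEMENT)
end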
As the Tin Man suggested, this problem is difficult, but it isn't impossible. I've built a commercial profanity filter named CleanSpeak that handles everything mentioned above (leet speak, phonetics, language rules, whitelisting, etc.). CleanSpeak is capable of filtering 20,000 messages per second on a low-end server, so it is possible to build something that works well and performs well. I will mention that CleanSpeak is the result of about 3 years of ongoing development, though.
There are a few things I tell everyone who is looking to tackle a language filter:
Don't use regular expressions unless you have a small list and don't mind a lot of things getting through. Regular expressions are relatively slow overall and hard to manage.
Determine if you want to handle conjugations, inflections and other language rules. These often add a considerable amount of time to the project.
Decide what type of performance you need and whether or not you can make multiple passes on the String. The more passes you make, the slower your filter will be.
Understand the Scunthorpe and clbuttic problems and determine how you will handle them. This usually requires some form of language intelligence and whitelisting.
Realize that whitespace has a different meaning now. You can't use it as a word delimiter any more (b e c a u s e of this).
Be careful with your handling of punctuation, because it can be used to get around the filter (l.i.k.e th---is).
Understand how people use ASCII art and Unicode to replace characters (\/ = v - those are slashes). There are a lot of Unicode characters that look like English characters, and you will want to handle those appropriately.
Understand that people make up new profanity all the time by smashing words together (likethis) and figure out if you want to handle that.
You can search around StackOverflow for my comments on other threads as I might have more information on those threads that I've forgotten here.
Here's one you could use: the Offensive/Profane Word List from the CMU site.
Based on personal experience, you do understand that it's an exercise in futility?
If someone wants to inject profanity, there's a slew of words that are innocent in one context and profane in another, so you'll have to write a context parser to avoid black-listing clean words. A quick glance at CMU's list shows words I'd never consider rude/crude/socially unacceptable. You'll see there are many words that could be proper names or nouns, countries, terms of endearment, etc. And there are myriad ways to throw your algorithm off using L33T speak and such. Search Wikipedia and the internets and you can build tables of variations of letters.
Look at CMU's list and imagine how long it would be if, in addition to the correct letter, every a could also be 4, every o could be 0 or p, every e could be 3, and every s could be 5. And that's a very, very short example.
I was asked to do a similar task and wrote code to generate L33T variations of the words, and generated a hit-list of words based on several profanity/offensive lists available on the internet. After running the generator, and being a little over 1/4 of the way through the file, I had over one million entries in my DB. I pulled the plug on the project at that point, because the time spent searching, even using Perl's Regexp::Assemble, was going to be ridiculous, especially since it'd still be so easy to fool.
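To give a feel for why it blows up, here is a stripped-down sketch of that kind of generator (in Ruby, since this is a Rails question; my version was Perl, and the real substitution table was far larger):

# Rough sketch: generate every L33T spelling of a word from a tiny substitution table.
SUBSTITUTIONS = {
  "a" => %w[a 4 @],
  "e" => %w[e 3],
  "o" => %w[o 0],
  "s" => %w[s 5 $]
}.freeze

def leet_variations(word)
  word.chars
      .map { |c| SUBSTITUTIONS.fetch(c.downcase, [c]) }
      .inject([""]) { |prefixes, choices| prefixes.product(choices).map(&:join) }
end

leet_variations("glass").length   # => 27 variants for a single five-letter word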
I recommend you have a long talk with whoever requested this, and ask if they understand the programming issues involved, the low likelihood of accuracy and success (especially over the long term), and the possible customer backlash when they realize you're censoring them.
I have one that I've added to (obfuscated a bit) but here it is: https://github.com/rdp/sensible-cinema/blob/master/lib/subtitle_profanity_finder.rb
I maintain a third-party Informix driver that's written with ESQL-style (Informix API) calls. I'm working on a bug where, for TEXT fields, INSERTs work fine and UPDATEs fail. Stepping through the code, what I've found is that we check our sqlda structure to tell us whether and how to bind. After the call to sqli_describe_statement in the INSERT case, the sqlda.sqld variable contains 2 - the correct number of bound parameters for this INSERT call - and the parameters appear to be set up correctly, whereas in the UPDATE case the number returned is 0, with no parameter information (it should be 1, for the one parameter in: "UPDATE TESTTAB SET COLNAME = ? WHERE OTHERCOLNAME = 1").
Using the sqlda information, we correctly set up the required locator structure for the INSERT, but we can't for the update because the information isn't there. If I fake it out in the debugger and run the set-up-the-locator code for the update, it updates fine.
The statement certainly appears correct, and the same variable is being used for the INSERT bind as for the UPDATE bind. Moreover, sqli_prep has no problem with the UPDATE. For the describe, sqlca.sqlcode returns different non-negative numbers, 4 and 6, representing the different types of statements being described, as documented (i.e., not an error code), so there's no obvious problem there.
Is there something else I should be checking in the code ahead of this that might cause this weird behavior (other than special-case handling for the different queries -- there's nothing there)?
Am I missing something fundamental here about how one does UPDATEs on TEXT fields - such as that you have to create a locator object, find the row, click your heels together three times, and say "There's no place like IBM"?
So far Google Fu has turned up little in the documentation, but if you know of docs or samples that point the way, that's cool too.
This is one of the murky areas of Informix behaviour. Plain DESCRIBE is supposed to describe the output parameters of a statement (it is shorthand for DESCRIBE OUTPUT stmt INTO ...); to describe the input parameters, you would use DESCRIBE INPUT stmt INTO ... instead.
However, for various reasons extending back to the dawn of time (well, 1985, anyway), the INSERT statement got a special case exemption and plain DESCRIBE described its input parameters - unlike UPDATE or DELETE (or, these days, MERGE).
So, your code was probably written before DESCRIBE INPUT and DESCRIBE OUTPUT became feasible (that was circa 2000±3 years). In principle, using the directed DESCRIBE statements should fix the issue. There may be an ONCONFIG parameter to be set to get this behaviour.
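In outline, something like this (untested ESQL/C; the statement name and sqlda pointer are illustrative):

/* Prepare the UPDATE and describe its input parameters explicitly,
 * rather than relying on the INSERT-only special case of plain DESCRIBE. */
EXEC SQL PREPARE upd_stmt FROM
    "UPDATE TESTTAB SET COLNAME = ? WHERE OTHERCOLNAME = 1";
EXEC SQL DESCRIBE INPUT upd_stmt INTO upd_sqlda;   /* upd_sqlda is a struct sqlda * */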
I remember being grateful that the feature arrived, but also I remember thinking "Damn, I'm not going to be able to use that for a while - until the old versions without it are all retired". I think that has basically happened now - IDS 7.31 in particular is now obsolete, and so indeed are the IDS 9.x versions, so all available versions of IDS support the feature. OnLine 5.20 - a minority interest - still doesn't and won't ever support it. So, I need to review how to update my programs such as SQLCMD to exploit this. The code there includes what I call 'vignettes'; they're complete little programs that illustrate how to work with BYTE and TEXT blobs. You might find UPDBLOB or APPBLOB, for example, of some use.