I maintain a 3rd party Informix driver that's written with ESQL-style (Informix API) calls. I'm working on a bug where, for TEXT fields, INSERTs work fine and UPDATEs fail. Stepping through the code, what I've found is that we're checking our sqlda structure to tell us whether and how to bind, and after the call to sqli_describe_statement, the sqlda.sqld variable contains 2, the correct number of bound parameters for this insert call, and the parameters appear to be set up correctly whereas in the update case, the number returned is 0, with no parameter information (it should be 1, for the one param in: "UPDATE TESTTAB SET COLNAME = ? WHERE OTHERCOLNAME = 1 ").
Using the sqlda information, we correctly set up the required locator structure for the INSERT, but we can't for the update because the information isn't there. If I fake it out in the debugger and run the set-up-the-locator code for the update, it updates fine.
The statement certainly appears correct, and the same variable is being used for the INSERT as the UPDATE bind. Moreover sqli_prep has no problem with the update. For the describe, sqsla.code returns different non-negative numbers 4 and 6, representing the different types of statements being described, as documeneted (i.e., not an error code), so there's no obvious problem there.
Is there something else I should be checking in the code ahead of this that might cause this weird behavior (other than special case handling for the different queries -- nothing there)
Am I missing something fundamental here about how one does UPDATEs on TEXT fields, such as you have to create a locator object, find the row, and click your heels together three times and say "There's no place like IBM?"
So far Google Fu has turned up little in the documentation, but if you know of docs or samples that point the way, that's cool too.
This is one of the murky areas of Informix behaviour. The behaviour of DESCRIBE is supposed to describe output parameters (it is a shorthand for DESCRIBE OUTPUT stmt INTO ...); to describe the input parameters, you would use DESCRIBE INPUT stmt INTO ... instead.
However, for various reasons extending back to the dawn of time (well, 1985, anyway), the INSERT statement got a special case exemption and plain DESCRIBE described its input parameters - unlike UPDATE or DELETE (or, these days, MERGE).
So, your code was probably written before DESCRIBE INPUT and DESCRIBE OUTPUT became feasible (that was circa 2000±3 years). In principle, using the directed DESCRIBE statements should fix the issue. There may be an ONCONFIG parameter to be set to get this behaviour.
I remember being grateful that the feature arrived, but also I remember thinking "Damn, I'm not going to be able to use that for a while - until the old versions without it are all retired". I think that has basically happened now - IDS 7.31 in particular is now obsolete, and so indeed are the IDS 9.x versions, so all available versions of IDS support the feature. OnLine 5.20 - a minority interest - still doesn't and won't ever support it. So, I need to review how to update my programs such as SQLCMD to exploit this. The code there includes what I call 'vignettes'; they're complete little programs that illustrate how to work with BYTE and TEXT blobs. You might find UPDBLOB or APPBLOB, for example, of some use.
Related
What is the benefit of using paragraphs and sections for executing pieces of code, instead of using a subprogram instead? As far as I can see paragraphs and sections are dangerous because they have an non intuitive control flow, its easy to fall through and execute stuff you never meant to execute, and there is no variable (item) scoping, therefore it encourages a style of programming where everything is visible to everything else. Its a slippery soup.
I read a lot, but I could not find anything related to the comparative benefit of paragraphs/sections vs a subprogram. I also asked online some people in some COBOL forum, but their answers were along the lines of "is this a joke" or "go learn programming"(!!!).
I do not wish to engage in a discussion of stylistic preferences, everyone writes the way that their brain works, I only want to know, is there any benefit to using paragraphs/sections for flow control? As in, are there any COBOL operations that can be done only by using paragraphs/sections? Or is it just a remnant of an early way of thinking about code?
Because no other language I know of has mimicked that, so it either has some mechanical concrete essential reason to exist in COBOL, or it is a stylistic preference of the COBOL people. Can someone illuminate me on what is happening?
These are multiple questions... the two most important ones:
Are there any COBOL operations that can be done only by using paragraphs/sections?
Yes. A likely not complete list:
USE statements in DECLARATIVES can only apply to a paragraph or a section. These are used for handling file errors and exceptions. Not all compilers support this COBOL standard feature in full.
Segmentation (primary: a program that is only partially loaded in memory) is only possible with sections; but that is to be considered a "legacy feature" (at least I don't know of people actually using it this way explicitly); see the comment of Gilbert Le Blanc for more details on this
fall-through, many other languages have this feature with a kind of a switch statement (COBOL's EVALUATE, which is not the same as a common switch but can be used similar has an explicit break and no fall-through)
GO TO DEPENDING ON (could be recoded to achieve something similar with EVALUATE and then either PERFORM, if the paragraphs are expected to fall-through, which is not uncommon, then that creates a lot of extra code)
GO TO in general and especially nice - the old obsolete ALTER statement
PERFORM statement, format 1 "out-of-line"
file state is only shared between programs when you define it as EXTERNAL, and you often want to have a file state being limited to a single program
up to COBOL85: EXIT statement (plain without anything else, actually doing nothing else then a CONTINUE would)
What is the benefit of using paragraphs and sections for executing pieces of code, instead of using a subprogram instead?
shared data (I guess you know of programs with static data or otherwise (module)global data that is shared between functions/methods and also different source code files)
much less overhead than a CALL is
consistency:
you know what's in your code, you don't know what another program does (or at least: you cannot guarantee that it will do the same some years later exactly the same)
easier to extend/change: adding another variable (and also removing part of it, change its size) to a CALL USING means that you also have to adjust the called program - and all programs that call this, even when you place the complete definition in a copybook, which is very reasonable, this means you have to recompile all programs that use this
a section/paragraph is always available (it is already loaded when the program runs), a CALLed program may not be available or lead to an exception, for example because it cannot be loaded as its parameters have changed
less stuff to code
Note: While not all compilers support this you can work around nearly all of the runtime overhead and consistency issue when you use one source files with multiple program definitions in (possibly nested) and using a static call-convention. This likely gives you the "modern" view you aim for with scope-limitation of variables, within the programs either persistent (like local-static) when defined in WORKING-STORAGE or always passed when in LINKAGE or "local-temporary" when in LOCAL-STORAGE.
Should all code of an application be in one program?
[I've added this one to not lead to bad assumptions] Of course not!
Using sub-programs and also user-defined functions (possibly even nested providing the option for "scoped" and "shared" data) is a good thing where you have a "feature boundary" (for example: access to data, user-interface, ...) or with "modern" COBOL where you have a "language boundary" (for example: direct CALLs of C/Java/whatever), but it isn't "jut for limiting a counter to a section" - in this case: either define a variable which state is not guaranteed to be available after any PERFORM or define one for the section/paragraph; in both cases it would be reasonable to use a prefix telling you this.
Using that "separate by boundary" approach also takes care of the "bad habit of everything being seen by everyone" issue (which is in any case only true for "all sections/paragraphs in the same program).
Personal side note: I would only use paragraphs where it is a "shop/team rule" (it is better to stay consistent then to do things different "just because they are better" [still providing an option to possibly change the common rule]) or for GO TO, which I normally not use.
SECTIONs and EXIT SECTION + EXIT PERFORM [CYCLE] (and very rarely GOBACK/EXIT PROGRAM) make paragraphs nearly unnecessary.
very short answer. subroutines!!
Subroutines execute in the context of the calling routine. Two virtues: no parameter passing, easy to create. In some languages, subroutines are private to (and are part of) the calling (invoking) routine (see various dialects of BASIC).
direct answer: Section and Paragraph support a different way of thinking about programming. Higher performance than call subprogram. Supports overlays. The "fall thru" aspect can be quite useful, a feature rather than a vice. They may be necessary depending on what you are doing with a specific COBOL compiler.
See also PL/1, BAL/360, architecture 360/370/...
As a veteran Cobol dinosaur, I would say asking about the benefit is not the right question. I used paragraph (or section) differently than a subprogram. The right question in my opinion is when to use them logically. If I can make an analogy, if you have a Dog java class, you will write Dog-appropriate methods within it. If there's a cat involved, you may need a helper class. In this case the helper class is the subprogram. Though, you can instead code the helper class methods inside the Dog class, but that will be bad coding.
In any other language I would recommend putting self contained functions into subroutines.
However in COBOL not so much. If the code is very likely to be used in other programs then a subroutine is a good idea. Otherwise not!
The reason being the total lack of any checks on the number type or existence of passed parameters at compile time. Small errors in call statements lead to program crashes at run time. Limiting the use of sub-routines and carefully checking the calling code for errors makes for a more reliable program.
Using paragraphs any type mismatch will be flagged at compile time, or, an automatic conversion will occur.
I'm re-building a Lua to ES3 transpiler (a tool for converting Lua to cross-browser JavaScript). Before I start to spend my ideas on this transpiler, I want to ask if it's possible to convert Lua labels to ECMAScript 3. For example:
goto label;
:: label ::
print "skipped";
My first idea was to separate each body of statements in parts, e.g, when there's a label, its next statements must be stored as a entire next part:
some body
label (& statements)
other label (& statements)
and so on. Every statement that has a body (or the program chunk) gets a list of parts like this. Each part of a label should have its name stored in somewhere (e.g, in its own part object, inside a property).
Each part would be a function or would store a function on itself to be executed sequentially in relation to the others.
A goto statement would lookup its specific label to run its statement and invoke a ES return statement to stop the current statements execution.
The limitations of separating the body statements in this way is to access the variables and functions defined in different parts... So, is there a idea or answer for this? Is it impossible to have stable labels if converting them to ECMAScript?
I can't quite follow your idea, but it seems someone already solved the problem: JavaScript allows labelled continues, which, combined with dummy while loops, permit emulating goto within a function. (And unless I forgot something, that should be all you need for Lua.)
Compare pages 72-74 of the ECMAScript spec ed. #3 of 2000-03-24 to see that it should work in ES3, or just look at e.g. this answer to a question about goto in JS. As usual on the 'net, the URLs referenced there are dead but you can get summerofgoto.com [archived] at the awesome Internet Archive. (Outgoing GitHub link is also dead, but the scripts are also archived: parseScripts.js, goto.min.js or goto.js.)
I hope that's enough to get things running, good luck!
I'm relatively new to Ruby, so this is a pretty general question. I have found through the Ruby Docs page a lot of methods that seem to do the exact same thing or very similar. For example chars vs split(' ') and each vs map vs collect. Sometimes there are small differences and other times I see no difference at all.
My question here is how do I know which is best practice, or is it just personal preference? I'm sure this varies from instance to instance, so if I can learn some of the more important ones to be cognizant of I would really appreciate that because I would like to develop good habits early.
I am a bit confused by your specific examples:
map and collect are aliases. They don't "do the exact same thing", they are the exact same thing. They are just two names for the same method. You can use whatever name you wish, or what reads best in context, or what your team has decided as a Coding Standard. The Community seems to have settled on map.
each and map/collect are completely different, there is no similarity there, apart from the general fact that they both operate on collections. map transform a collection by mapping every element to a new element using a transformation operation. It returns a new collection (an Array, actually) with the transformed elements. each performs a side-effect for every element of the collection. Since it is only used for its side-effect, the return value is irrelevant (it might just as well return nil like Kernel#puts does, in languages like C, C++, Java, C♯, it would return void), but it is specified to always return its receiver.
split splits a String into an Array of Strings based on a delimiter that can be either a Regexp (in which case you can also influence whether or not the delimiter itself gets captured in the output or ignored) or a String, or nil (in which case the global default separator gets used). chars returns an Array with the individual characters (represented as Strings of length 1, since Ruby doesn't have an specific Character type). chars belongs together in a family with bytes and codepoints which do the same thing for bytes and codepoints, respectively. split can only be used as a replacement for one of the methods in this family (chars) and split is much more general than that.
So, in the examples you gave, there really isn't much similarity at all, and I cannot imagine any situation where it would be unclear which one to choose.
In general, you have a problem and you look for the method (or combination of methods) that solve it. You don't look at a bunch of methods and look for the problem they solve.
There'll typically be only one method that fits a specific problem. Larger problems can be broken down into different subproblems in different ways, so it is indeed possible that you may end up with different combinations of methods to solve the same larger problem, but for each individual subproblem, there will generally be only one applicable method.
When documentation states that 2 methods do the same, it's just matter of preference. To learn the details, you should always start with Ruby API documentation
I am looking to write a basic profanity filter in a Rails based application. This will use a simply search and replace mechanism whenever the appropriate attribute gets submitted by a user. My question is, for those who have written these before, is there a CSV file or some database out there where a list of profanity words can be imported into my database? We are submitting the words that we will replace the profanities with on our own. We more or less need a database of profanities, racial slurs and anything that's not exactly rated PG-13 to get triggered.
As the Tin Man suggested, this problem is difficult, but it isn't impossible. I've built a commercial profanity filter named CleanSpeak that handles everything mentioned above (leet speak, phonetics, language rules, whitelisting, etc). CleanSpeak is capable of filtering 20,000 messages per second on a low end server, so it is possible to build something that works well and performs well. I will mention that CleanSpeak is the result of about 3 years of on-going development though.
There are a few things I tell everyone that is looking to try and tackle a language filter.
Don't use regular expressions unless you have a small list and don't mind a lot of things getting through. Regular expressions are relatively slow overall and hard to manage.
Determine if you want to handle conjugations, inflections and other language rules. These often add a considerable amount of time to the project.
Decide what type of performance you need and whether or not you can make multiple passes on the String. The more passes you make the slow your filter will be.
Understand the scunthrope and clbuttic problems and determine how you will handle these. This usually requires some form of language intelligence and whitelisting.
Realize that whitespace has a different meaning now. You can't use it as a word delimiter any more (b e c a u s e of this)
Be careful with your handling of punctuation because it can be used to get around the filter (l.i.k.e th---is)
Understand how people use ascii art and unicode to replace characters (/ = v - those are slashes). There are a lot of unicode characters that look like English characters and you will want to handle those appropriately.
Understand that people make up new profanity all the time by smashing words together (likethis) and figure out if you want to handle that.
You can search around StackOverflow for my comments on other threads as I might have more information on those threads that I've forgotten here.
Here's one you could use: Offensive/Profane Word List from CMU site
Based on personal experience, you do understand that it's an exercise in futility?
If someone wants to inject profanity, there's a slew of words that are innocent in one context, and profane in another so you'll have to write a context parser to avoid black-listing clean words. A quick glance at CMU's list shows words I'd never consider rude/crude/socially unacceptable. You'll see there are many words that could be proper names or nouns, countries, terms of endearment, etc. And, there are myriads of ways to throw your algorithm off using L33T speak and such. Search Wikipedia and the internets and you can build tables of variations of letters.
Look at CMU's list and imagine how long the list would be if, in addition to the correct letter, every a could also be 4, o could be 0 or p, e could be 3, s could be 5. And, that's a very, very, short example.
I was asked to do a similar task and wrote code to generate L33T variations of the words, and generated a hit-list of words based on several profanity/offensive lists available on the internet. After running the generator, and being a little over 1/4 of the way through the file, I had over one million entries in my DB. I pulled the plug on the project at that point, because the time spent searching, even using Perl's Regex::Assemble, was going to be ridiculous, especially since it'd still be so easy to fool.
I recommend you have a long talk with whoever requested that, and ask if they understand the programming issues involved, and low-likelihood of accuracy and success, especially over the long-term, or the possible customer backlash when they realize you're censoring them.
I have one that I've added to (obfuscated a bit) but here it is: https://github.com/rdp/sensible-cinema/blob/master/lib/subtitle_profanity_finder.rb
I'm making a web app where the point is to change a given word by one letter. For example, if I make a post by selecting the word: "best," then the first reply could be "rest," while the one after that should be "rent," "sent", etc. So, the word a user enters must have changed by one letter from the last submitted word. It would be constantly evolving.
Right now you can make a game and respond just by typing a word. I coded up a custom validation using functionality from the Amatch gem:
http://flori.github.com/amatch/doc/index.html
Posts have many responses, and responses belong to a post.
here's the code:
def must_have_changed_by_one_letter
m = Amatch::Sellers.new(title.strip)
errors.add_to_base("Sorry, you must change the last submitted word by one letter")
if m.match(post.responses.last.to_s.strip) != 1.0
end
When I try entering a new response for a test post I made (original word "best", first response is "rest") I get this:
ActiveRecord::RecordInvalid in ResponsesController#create
Validation failed: Sorry, you must change the last submitted word by one letter
Any thoughts on what might be wrong?
Thanks!
Looks like there are a couple of potential issues here.
For one, is your if statement actually on a separate line than your errors.add_to_base... statement? If so, your syntax is wrong; the if statement needs to be in the same line as the statement it's modifying. Even if it is actually on the correct line, I would recommend against using a trailing if statement on such a long line; it will make it hard to find the conditional.
if m.match(post.responses.last.to_s.strip) != 1.0
errors.add_to_base("Sorry, you must change the last submitted word by one letter")
end
Second, doing exact equality comparison on floating point numbers is almost never a good idea. Because floating point numbers involve approximations, you will sometimes get results that are very close, but not quite exactly equal, to a given number that you are comparing against. It looks like the Amatch library has several different classes for comparing strings; the Sellers class allows you to set different weights for different kinds of edits, but given your problem description, I don't think you need that. I would try using the Levenshtein or Hamming distance instead, depending on your exact needs.
Finally, if neither of those suggestions work, try writing out to a log or in the response the exact values of title.strip and post.responses.last.to_s.strip, to make sure you are actually comparing the values that you think you're comparing. I don't know the rest of your code, so I can't tell you whether those are correct or not, but if you print them out somewhere, you should be easily able to check them yourself.