Determine location of Smart-LOB (Informix 11.5)

How, in Informix IDS 11.5, do I determine in which smart-LOB space (sbspace) a BLOB resides?
So really it's two questions:
How can I get something like the dbschema command to produce the PUT clause?
How can I find out which sbspace a particular smart LOB came from?

The answer to the dbschema (first) question is "with the '-ss' option", where 'ss' is mnemonic for 'server-specific'. This will include the data specific to IDS, such as the PUT clause.
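For example, to get the server-specific schema (including the PUT clause) for a single table, where mydb and mytab are placeholder names:

dbschema -d mydb -t mytab -ss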
The counter-question for the blobspace (second) question is:
Why do you think it matters which blobspace the blob comes from?
For an individual smart blob, you can find out which sbspace it is stored in as long as you are using ESQL/C or one of the related C-based APIs. The function to do this is ifx_lo_specget_sbspace(), and it is documented in the ESQL/C manual.
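In outline, the ESQL/C calls involved look like this (a sketch only: lofd is assumed to be a smart-LOB file descriptor from ifx_lo_open(), and error handling is elided):

ifx_lo_stat_t *stats = NULL;
ifx_lo_create_spec_t *cspec;
char sbspace[129];

if (ifx_lo_stat(lofd, &stats) == 0)
{
    cspec = ifx_lo_stat_cspec(stats);   /* storage characteristics of this LO */
    ifx_lo_specget_sbspace(cspec, sbspace, sizeof(sbspace));
    printf("stored in sbspace: %s\n", sbspace);
    ifx_lo_stat_free(stats);
}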
I don't know of an SQL-based way of determining the smart blobspace that holds a particular blob.

Related

Converting between M3 `loc` scheme and regular `loc` type?

The M3 Core module returns a sort of simplified loc representation in Rascal. For instance, a method in file MapParser might have the loc: |java+method:///MapParser/a()|.
However, this is evidently different from the other loc scheme I tend to see, which would look more or less like: |project://main-scheme/src/tests/MapParser.java|.
This wouldn't be a problem, except that some functions only accept one scheme or another. For instance, the function appendToFile(loc file, value V...) does not accept this scheme M3 uses, and will reject it with an error like: IO("Unsupported scheme java+method").
So, how can I convert between both schemes easily? I would like to preserve all information, like highlighted sections for instance.
Cheers.
There are two differences at play here.
Physical vs Logical Locations
java+method is a logical location, and project is a physical location. I think the best way to describe the difference is that a physical location describes the location of an actual file, or a subset of an actual file, while a logical location describes the location of a certain entity in the context of a bigger model; for example, a Java method in a Java class/project. Often logical locations can be mapped to a physical location, but that is not always true.
For M3, for example, you can use resolveLocation from IO to get the actual offset in the file that the logical location points to.
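For example (the method loc is taken from the question; the shape of the result is illustrative):

import IO;

loc physical = resolveLocation(|java+method:///MapParser/a()|);
// physical now points at the file region (offset/length) of the method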
Read-only vs writeable locations
Not all locations are writeable, I don't think any logical location is. But there are also physical locations that are read only. The error you are getting is generic in that sense.
Rascal does support writing in the middle of text files, but most likely you do not want to use appendToFile, as it will append after the location you point it to. Most likely you want to replace a section of the text with your new section, so a regular writeFile should work.
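A sketch of that replacement, assuming physical is a resolved location carrying offset and length fields into the file, and newSection is the replacement text (both placeholders):

import IO;

loc file = |project://main-scheme/src/tests/MapParser.java|;
str content = readFile(file);
// splice the new section over the old one using the resolved offset/length
str patched = content[..physical.offset] + newSection
            + content[physical.offset + physical.length..];
writeFile(file, patched);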
Some notes
Note that you would have to recalculate all the offsets in the file after every write. So the resolved physical locations for the logical locations would be outdated, as the file has changed since constructing the M3 model and its corresponding map between logical and physical locations.
So for this use case, you might want to think of a better way. The nicest solution is using a grammar, rewriting the parse trees of the file, and overwriting the old file after rewriting. Note that the most recent Java grammar shipped with Rascal is for Java 5, so this might be a bit more work than you would like. Perhaps frame your goal as a new Stack Overflow question, and we'll see what other options might be applicable.

Fast Search to see if a String Exists in Large Files with Delphi

I have a FindFile routine in my program which will list files, but if the "Containing Text" field is filled in, then it should only list files containing that text.
If the "Containing Text" field is entered, then I search each file found for the text. My current method of doing that is:
var
  FileContents: TStringList;
begin
  FileContents := TStringList.Create;  // the list must be created before loading into it
  try
    FileContents.LoadFromFile(Filepath);
    Found := Pos(TextToFind, FileContents.Text) > 0;
  finally
    FileContents.Free;
  end;
end;
The above code is simple, and it generally works okay. But it has two problems:
It fails for very large files (e.g. 300 MB)
I feel it could be faster. It isn't bad, but why wait 10 minutes searching through 1000 files, if there might be a simple way to speed it up a bit?
I need this to work for Delphi 2009 and to search text files that may or may not be Unicode. It only needs to work for text files.
So how can I speed this search up and also make it work for very large files?
Bonus: I would also want to allow an "ignore case" option. That's a tougher one to make efficient. Any ideas?
Solution:
Well, mghie pointed out my earlier question How Can I Efficiently Read The First Few Lines of Many Files in Delphi, and as I answered, it was different and didn't provide the solution.
But he got me thinking that I had done this before, and I had. I built a block-reading routine for large files that breaks them into 32 MB blocks. I use that to read the input file of my program, which can be huge. The routine works fine and fast. So step one is to do the same for these files I am looking through.
So now the question was how to efficiently search within those blocks. Well I did have a previous question on that topic: Is There An Efficient Whole Word Search Function in Delphi? and RRUZ pointed out the SearchBuf routine to me.
That solves the "bonus" as well, because SearchBuf has options which include Whole Word Search (the answer to that question) and MatchCase/noMatchCase (the answer to the bonus).
So I'm off and running. Thanks once again SO community.
The best approach here is probably to use memory mapped files.
First you need a file handle, use the CreateFile windows API function for that.
Then pass that to CreateFileMapping to get a file mapping handle. Finally use MapViewOfFile to map the file into memory.
To handle large files, MapViewOfFile is able to map only a certain range into memory, so you can e.g. map the first 32MB, then use UnmapViewOfFile to unmap it followed by a MapViewOfFile for the next 32MB and so on. (EDIT: as was pointed out below, make sure that the blocks you map this way overlap by a multiple of 4kb, and at least as much as the length of the text you are searching for, so that you are not overlooking any text which might be split at the block boundary)
To do the actual searching once (part of) the file is mapped into memory, you can make a copy of the source for StrPosLen from SysUtils.pas (it's unfortunately defined in the implementation section only and not exposed in the interface). Leave one copy as is, and in the other copy replace Wide with Ansi throughout. Also, if you want to be able to search in binary files which might contain embedded #0's, you can remove the '(Str1[I] <> #0) and' part of the condition.
Either find a way to identify if a file is ANSI or Unicode, or simply call both the Ansi and Unicode version on each mapped part of the file.
Once you are done with each file, call UnmapViewOfFile first, then CloseHandle on the file mapping handle, and finally CloseHandle on the file handle.
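Putting the pieces together, here is a sketch of the mapping loop in Delphi 2009 (it requires the Windows unit; DoSearchInBlock is a placeholder for your actual scan of the mapped bytes, e.g. via the StrPosLen copies above, and error handling is minimal):

// requires: uses Windows;
function FileContainsText(const FileName: string): Boolean;
const
  BlockSize = 32 * 1024 * 1024;  // 32 MB views; a multiple of the 64 KB allocation granularity
var
  hFile, hMap: THandle;
  SizeLo, SizeHi: DWORD;
  FileSize, Offset: Int64;
  ViewSize: Cardinal;
  View: Pointer;
begin
  Result := False;
  hFile := CreateFile(PChar(FileName), GENERIC_READ, FILE_SHARE_READ, nil,
    OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
  if hFile = INVALID_HANDLE_VALUE then
    Exit;
  try
    SizeLo := GetFileSize(hFile, @SizeHi);
    FileSize := (Int64(SizeHi) shl 32) or SizeLo;
    hMap := CreateFileMapping(hFile, nil, PAGE_READONLY, 0, 0, nil);
    if hMap = 0 then
      Exit;
    try
      Offset := 0;
      while (Offset < FileSize) and not Result do
      begin
        if FileSize - Offset < BlockSize then
          ViewSize := Cardinal(FileSize - Offset)
        else
          ViewSize := BlockSize;
        View := MapViewOfFile(hMap, FILE_MAP_READ,
          DWORD(Offset shr 32), DWORD(Offset and $FFFFFFFF), ViewSize);
        if View = nil then
          Break;
        try
          Result := DoSearchInBlock(View, ViewSize);  // placeholder for the real search
        finally
          UnmapViewOfFile(View);
        end;
        // NB: a real implementation must overlap consecutive views by at
        // least the length of the text sought (see the edit above), or a
        // match straddling a block boundary will be missed.
        Inc(Offset, BlockSize);
      end;
    finally
      CloseHandle(hMap);
    end;
  finally
    CloseHandle(hFile);
  end;
end;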
EDIT:
A big advantage of using memory mapped files instead of using e.g. a TFileStream to read the file into memory in blocks is that the bytes will only end up in memory once.
Normally, on file access, Windows first reads the bytes into the OS file cache, and then copies them from there into application memory.
If you use memory mapped files, the OS can directly map the physical pages from the OS file cache into the address space of the application without making another copy (reducing the time needed for the copy and halving memory usage).
Bonus Answer: By calling StrLIComp instead of StrLComp you can do a case insensitive search.
If you are looking for text string searches, look at the Boyer-Moore search algorithm. The implementation I use works on memory mapped files and has a really fast search engine; there are some Delphi units around that contain implementations of this algorithm.
To give you an idea of the speed: I currently search through 10-20 MB files and it takes on the order of milliseconds.
Oh, I just read that it might be Unicode. I'm not sure whether the implementations support that, but definitely look down this path.
This is a problem connected with your previous question How Can I Efficiently Read The First Few Lines of Many Files in Delphi, and the same answers apply. If you don't read the files completely but in blocks then large files won't pose a problem. There's also a big speed-up to be had for files containing the text, in that you should cancel the search upon the first match. Currently you read the whole files even when the text to be found is in the first few lines.
May I suggest a component? If so, I would recommend ATStreamSearch.
It handles ANSI and Unicode (and even EBCDIC and Korean and more).
Or the class TUTBMSearch from JclUnicode (Jedi JCL). It was mainly written by Mike Lischke (of VirtualTreeView). It uses a tuned Boyer-Moore algorithm that ensures speed. The bad point in your case is that it works entirely in Unicode (WideStrings), so the conversion from String to WideString risks being a penalty.
It depends on what kind of data you are going to search. To get really efficient results, you will need to let your program parse the interesting directories, including all the files in them, and keep the data in a database which you can query each time for a specific word against a specific list of files (which can be generated from the search path). A database statement can provide results in milliseconds.
The issue is that you will have to let it run and parse all files after installation, which may take more than an hour depending on the amount of data you wish to parse.
This database should be updated each time your program starts; that can be done by comparing the MD5 value of each file to see whether it has changed, so you don't have to parse all your files each time.
This way of working is interesting if you have all your data in a constant place and you analyse the same files more often than totally new ones. Some code analysers work like this, and they are really efficient. You invest some time in parsing and saving the interesting data, and can then jump to the exact place where a search word appears and provide a list of all the places it appears in, in a very short time.
If the files are to be searched multiple times, it could be a good idea to use a word index.
This is called "Full Text Search".
It will be slower the first time (text must be parsed and indexes must be created), but any future search will be immediate: in short, it will use only the indexes, and not read all text again.
You have the exact parser you need in The Delphi Magazine Issue 78, February 2002:
"Algorithms Alfresco: Ask A Thousand Times
Julian Bucknall discusses word indexing and document searches: if you want to know how Google works its magic this is the page to turn to."
There are several FTS implementations for Delphi:
Rubicon
Mutis
ColiGet
Google is your friend...
I'd like to add that most DBs have an embedded FTS engine. SQLite3 even has a very small but efficient implementation, with page ranking and such.
We provide direct access from Delphi, with ORM classes, to this Full Text Search engine, named FTS3/FTS4.
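To give a flavour of that engine at the SQL level, here is a minimal FTS4 sketch (table and column names are made up for illustration, and it assumes an SQLite build with FTS compiled in):

CREATE VIRTUAL TABLE docs USING fts4(path, body);
INSERT INTO docs(path, body) VALUES ('readme.txt', 'hello full text search');
SELECT path FROM docs WHERE body MATCH 'hello';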

Validate words against an English dictionary in Rails?

I've done some Google searching but couldn't find what I was looking for.
I'm developing a Scrabble-type word game in Rails, and was wondering whether there is a simple way to validate that what the player inputs in the game is actually a word. They'd be typing the word out.
Is validating against some sort of English-language dictionary database loaded within the app the best way to solve this problem? If so, are there any libraries that offer this kind of functionality? If not, what would you suggest?
Thanks for your help!
You need two things:
a word list
some code
The word list is the tricky part. On most Unix systems there's a word list at /usr/share/dict/words or /usr/dict/words -- see http://en.wikipedia.org/wiki/Words_(Unix) for more details. The one on my Mac has 234,936 words in it. But they're not all valid Scrabble words. So you'd have to somehow acquire a Scrabble dictionary, make sure you have the right license to use it, and process it so it's a text file.
(Update: The word list for LetterPress is now open source, and available on GitHub.)
The code is no problem in the simple case. Here's a script I whipped up just now:
words = {}
File.open("/usr/share/dict/words") do |file|
  file.each do |line|
    words[line.strip] = true
  end
end
p words["magic"]
p words["saldkaj"]
This will output
true
nil
I leave it as an exercise for the reader to make it into a proper Words object. (Technically it's not a Dictionary since it has no definitions.) Or to use a DAWG instead of a hash, even though a hash is probably fine for your needs.
A piece of language-agnostic advice: if you only care about the existence of a word (which in this case you do), and you plan to load the entire database into the application (which your question suggests you're considering), then a DAWG will let you check existence in O(n) time, where n is the length of the word. The dictionary size has no effect, so overall the lookup is essentially O(1). It is also a relatively minimal structure in terms of memory; indeed, some insertions will actually reduce the size of the structure: a DAWG for "top, tap, taps, tops" has fewer nodes than one for "tops, tap".
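To make the O(n) lookup concrete, here is a minimal Ruby trie; a DAWG is essentially a trie with shared suffixes, and the minimisation step is skipped here, but the lookup cost per word is the same:

class Trie
  def initialize
    @root = {}
  end

  # walk (or create) one hash node per character
  def add(word)
    node = @root
    word.each_char { |c| node = (node[c] ||= {}) }
    node[:end] = true
  end

  # O(word.length) regardless of dictionary size
  def include?(word)
    node = @root
    word.each_char do |c|
      node = node[c]
      return false unless node
    end
    node.key?(:end)
  end
end

trie = Trie.new
%w[top tap taps tops].each { |w| trie.add(w) }
p trie.include?("taps")  # => true
p trie.include?("ta")    # => false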

TEXT Update versus TEXT insert on Informix Dynamic Server

I maintain a 3rd-party Informix driver that's written with ESQL-style (Informix API) calls. I'm working on a bug where, for TEXT fields, INSERTs work fine and UPDATEs fail. Stepping through the code, I've found that we check our sqlda structure to tell us whether and how to bind. After the call to sqli_describe_statement, in the INSERT case the sqlda.sqld variable contains 2, the correct number of bound parameters, and the parameters appear to be set up correctly. In the UPDATE case, the number returned is 0, with no parameter information (it should be 1, for the one parameter in: "UPDATE TESTTAB SET COLNAME = ? WHERE OTHERCOLNAME = 1").
Using the sqlda information, we correctly set up the required locator structure for the INSERT, but we can't for the update because the information isn't there. If I fake it out in the debugger and run the set-up-the-locator code for the update, it updates fine.
The statement certainly appears correct, and the same variable is being used for the INSERT as for the UPDATE bind. Moreover, sqli_prep has no problem with the update. For the describe, sqlca.sqlcode returns different non-negative numbers, 4 and 6, representing the different types of statements being described, as documented (i.e., not an error code), so there's no obvious problem there.
Is there something else I should be checking in the code ahead of this that might cause this weird behavior (other than special-case handling for the different queries; there's nothing there)?
Am I missing something fundamental here about how one does UPDATEs on TEXT fields, such as: you have to create a locator object, find the row, click your heels together three times, and say "There's no place like IBM"?
So far Google Fu has turned up little in the documentation, but if you know of docs or samples that point the way, that's cool too.
This is one of the murky areas of Informix behaviour. The behaviour of DESCRIBE is supposed to describe output parameters (it is a shorthand for DESCRIBE OUTPUT stmt INTO ...); to describe the input parameters, you would use DESCRIBE INPUT stmt INTO ... instead.
However, for various reasons extending back to the dawn of time (well, 1985, anyway), the INSERT statement got a special case exemption and plain DESCRIBE described its input parameters - unlike UPDATE or DELETE (or, these days, MERGE).
So, your code was probably written before DESCRIBE INPUT and DESCRIBE OUTPUT became feasible (that was circa 2000±3 years). In principle, using the directed DESCRIBE statements should fix the issue. There may be an ONCONFIG parameter to be set to get this behaviour.
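In outline, the directed describe would look something like this (a sketch only: upd_stmt and u_sqlda are illustrative names, u_sqlda being a struct sqlda pointer that DESCRIBE allocates, and error checking is omitted):

EXEC SQL BEGIN DECLARE SECTION;
    char stmt_text[] = "UPDATE TESTTAB SET COLNAME = ? WHERE OTHERCOLNAME = 1";
EXEC SQL END DECLARE SECTION;
struct sqlda *u_sqlda;

EXEC SQL PREPARE upd_stmt FROM :stmt_text;
EXEC SQL DESCRIBE INPUT upd_stmt INTO u_sqlda;
/* u_sqlda->sqld should now be 1, describing the TEXT parameter */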
I remember being grateful that the feature arrived, but also I remember thinking "Damn, I'm not going to be able to use that for a while - until the old versions without it are all retired". I think that has basically happened now - IDS 7.31 in particular is now obsolete, and so indeed are the IDS 9.x versions, so all available versions of IDS support the feature. OnLine 5.20 - a minority interest - still doesn't and won't ever support it. So, I need to review how to update my programs such as SQLCMD to exploit this. The code there includes what I call 'vignettes'; they're complete little programs that illustrate how to work with BYTE and TEXT blobs. You might find UPDBLOB or APPBLOB, for example, of some use.

How to make a small engine like Wolfram|Alpha?

Let's say I have three models/tables: operating_systems, words, and programming_languages:
# operating_systems
name:string     created_by:string     family:string
Windows         Microsoft             MS-DOS
Mac OS X        Apple                 UNIX
Linux           Linus Torvalds        UNIX
UNIX            AT&T                  UNIX

# words
word:string     definitions:string
window          (serialized hash of definitions)
hello           (serialized hash of definitions)
UNIX            (serialized hash of definitions)

# programming_languages
name:string     created_by:string     example_code:text
C++             Bjarne Stroustrup     #include <iostream> etc...
HelloWorld      Jeff Skeet            h
AnotherOne      Jon Atwood            imports 'SORULEZ.cs' etc...
When a user searches hello, the system shows the definitions of 'hello'. This is relatively easy to implement. However, when a user searches UNIX, the engine must choose: word or operating_system. Also, when a user searches windows (with a small 'w'), the engine chooses word, but should also show "Assuming 'windows' is a word. Use as an operating system instead".
Can anyone point me in the right direction with parsing and choosing the topic of the search query? Thanks.
Note: it doesn't need to be able to perform calculations as WA can do.
Have a new index table called terms that contains a tokenised version of each valid term. That way, you only have to search one table.
# terms
Id    Name       Type                Priority
1     window     word                false
2     Windows    operating_system    true
Then you can see how close a match the user's search term is. E.g. "Windows" would be a 100% match with 2, so assume that, but it is also a close match to 1, so suggest that as an alternative. You'd have to write your own rules engine that decides how close a word matches (i.e. what gets assumed with "windows" vs "Windows"?). The Priority field could be the final decider if the rules engine can't decide, and could in theory be driven by user activity so it learns what users are more likely to be referring to.
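A rough Ruby sketch of that decision logic, assuming an ActiveRecord-style Term model over the terms table above (query, and the ranking rule itself, are illustrative):

exact = Term.where(name: query).to_a                            # "Windows" -> row 2
close = Term.where("LOWER(name) = ?", query.downcase).to_a - exact  # "windows" -> row 1

best = exact.max_by { |t| t.priority ? 1 : 0 } || close.first   # prefer exact, then Priority
alternatives = (exact + close) - [best]                         # offer these as suggestions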
And what about making a cache in the form of a database table where all the keywords would be?
The search query would be something like this:
SELECT * FROM keywords WHERE keyword = '<YourKeyWord>' /* mysql */
The keywords table would contain some kind of references to your modules.
The advantage of this approach is, of course, fast searching.
You may use two queries in order to simulate the behaviour you ask for:
Exact match (no problem in MySQL)
Case-insensitive search
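For instance, continuing the keywords example above (note that MySQL's default collations already compare case-insensitively, so BINARY is what forces the exact match):

SELECT * FROM keywords WHERE BINARY keyword = 'Windows' /* exact match */
SELECT * FROM keywords WHERE keyword = 'Windows'        /* case-insensitive */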
Wolfram Alpha is far more complex than your example... I'm not certain of its inner workings (I have done very little reading on it), but I believe it is a very large and complex automated inference system. Such systems are rather trivial to implement (Prolog is basically a general-purpose one you can put whatever data you need into), but very hard to make useful.
