Is it possible to provide the following type of functionality with Informix client tools?
As the user types the first two characters of a name, the drop-down list is empty. At the third character, the list fills with just the names beginning with those three characters. At the fourth character, MS-Access completes the first matching name (assuming the combo's AutoExpand is on). Once enough characters are typed to identify the customer, the user tabs to the next field.
The time taken to load the combo between keystrokes is minimal. This occurs once only for each entry, unless the user backspaces through the first three characters again.
If your list still contains too many records, you can reduce them by another order of magnitude by changing the value of constant conSuburbMin from 3 to 4.
This requires a combination of two things, only one of which is partially under the control of Informix the DBMS or Informix the Client API.
First of all, you need the gadget that is accepting user input to asynchronously generate a query which matches what the user has typed, fetch some of the results from the DBMS, and show them. Secondly, you need the DBMS to respond rapidly to such queries. Part of the issue is 'what form does the query take', but the basic functionality is:
SELECT TitleCaseName
FROM ReferenceTable
WHERE LowerCaseName[1,3] = 'abc';
You might or might not bother with 'first rows optimization'; you might or might not bother with an ORDER BY. Your code would only select the first N rows. You might do it with some prioritization information - most frequently used names, etc.
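For illustration, here's one way such a query might look in Informix syntax - the FIRST clause caps the number of rows returned, and the UseCount column is an invented stand-in for whatever prioritization information you keep:
-- Sketch only: fetch at most 10 candidate names for prefix 'abc',
-- most frequently used first (UseCount is a hypothetical column).
SELECT FIRST 10 TitleCaseName
FROM ReferenceTable
WHERE LowerCaseName[1,3] = 'abc'
ORDER BY UseCount DESC;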
But this logic is basically the same for any DBMS - give or take details such as the choice of technique for dealing with case-mapping (function call vs column) and the notation for substrings vs LIKE 'abc%'.
The tricky stuff, though, is the asynchronous combination of user-input plus collecting data from the DBMS; that is best handled with multiple threads, one dealing with the user input, one dealing with the DBMS and (possibly) one dealing with the display (or that might also be the one dealing with user input). And that requires hooking into the UI API - not something that the Informix APIs do of their own accord. The UI can get at Informix (or any other DBMS) easily enough through ODBC or any other faintly similar API.
In InfluxDB (1.5), I have a table where the fields have become inconsistently typed. Most rows in the table are Integer, however, some rows have become strings.
How is this possible? I thought that once a field's type was set (upon first insert), any insert into the table with incorrect typing would fail.
What do I do now? If I go back and attempt to overwrite the data in the inconsistent rows, I get errors saying the field is a string.
After some more research, here's what I've discovered:
Answer to Part 1:
InfluxDB uses a system they refer to as 'sharding' - while I don't know the specifics, I do know that data from the same measurement/table can be stored across multiple, different 'shards'.
According to the InfluxDB documentation, field types can differ between these shards, within the same field, on the same table.
Answer to Part 2:
In order to fix this, the currently-suggested answer is to make a new table, download all the data, and re-insert it while ensuring the data that gets inserted is of the proper type.
If you had a tag which changed type and became a field, this can be especially difficult to fix; the link above does not address that. To select only a tag or only a field, you can use tag_name::tag or field_name::field within a SELECT statement.
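For illustration, a rebuild along those lines might look like this in InfluxQL, where value, new_measurement, and old_measurement are placeholders (the ::field suffix forces the name to be read as a field):
-- Sketch only: copy points into a fresh measurement; GROUP BY *
-- keeps tags as tags. Mixed-type values may still need re-casting
-- on the client before re-insertion.
SELECT value::field
INTO new_measurement
FROM old_measurement
GROUP BY *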
The GROUP BY * clause suggested in the link is required in order to preserve tags, but it seemed to cause issues when I used it.
My current solution is a PHP script that uses curl: it downloads the points, chunks them, then re-inserts the points into the new table, ensuring each point that gets inserted is cast to the new, uniform type.
The best way to stop future issues is simply not to have them. I went looking for how to lock field types in all cases, across all shards, for a particular measurement table.
Unfortunately, it seems impossible to guarantee 100% type consistency across all current and future shards. "Don't make mistakes because it's really difficult to clean up" seems to be InfluxDB's modus operandi.
There is a Java Swing application which uses an Informix database. I have user rights granted for the Swing application (i.e. no source code), and read-only access to a mirror of the database.
Sometimes I need to find the database column which is backing a GUI element (TextBox, TableField, Label...). What would be the best approach to find out which database column and table hold the data shown, e.g., in a TextBox?
My general approach is to capture the state of the database. Commit a change using the GUI and then capture the state of the database again. Then I need to examine the difference. I've already tried:
Use the nrows field of systables: didn't work, because the number in nrows does not seem to be a real-time representation of the row count.
Create a script with SELECT COUNT(*) ... for all tables: didn't work because there are too many tables (> 5000). I also tried to optimize by removing empty tables, but there are still too many left.
Is there a simple solution that I'm missing?
Please look at the Change Data Capture API and check if this suits your needs.
There probably isn't a simple solution.
You probably need to build yourself a map of the database, or a data dictionary for it. It sounds as though you can eliminate many of the tables from consideration since they're empty, at least for a preliminary pass. If you're dealing with information in a text box, the chances are it is some sort of character data; you can analyze which (non-empty) tables contain longer character strings, and they'd be the primary targets of your searches. If the schema is badly designed with lots of VARCHAR(255) columns even though the columns normally only hold short strings, life is more difficult. Over time, you can begin to classify tables and columns so that you end up knowing where to look for parts of the application.
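As a starting point, a rough first pass over the system catalog might look like the sketch below (the coltype codes shown are the usual ones for CHAR and VARCHAR, but check them against your server's documentation, and note that collength is encoded differently for VARCHAR columns):
-- Sketch only: user tables (tabid >= 100) holding character columns.
SELECT t.tabname, c.colname, c.collength
FROM informix.systables t, informix.syscolumns c
WHERE c.tabid = t.tabid
AND t.tabid >= 100
AND MOD(c.coltype, 256) IN (0, 13)  -- 0 = CHAR, 13 = VARCHAR
ORDER BY t.tabname, c.colname;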
One problem to beware of: the tabid in informix.systables isn't necessarily as stable as you'd like. Your data dictionary needs to record its own dd_tabid for the table it describes, and can store the last known tabid from informix.systables, but it needs to be ready to find a new tabid value on occasion. You should probably only mark data in your dictionary for logical deletion.
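A minimal sketch of such a dictionary table, with all names invented (adjust the types to whatever DBMS ends up hosting it, per the next paragraph):
-- Sketch only: the dictionary keeps its own stable key and merely
-- remembers the last tabid it saw in informix.systables.
CREATE TABLE dd_tables (
    dd_tabid    SERIAL PRIMARY KEY,
    tabname     VARCHAR(128) NOT NULL,
    last_tabid  INTEGER,
    deleted     CHAR(1) DEFAULT 'N'   -- logical deletion flag
);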
To some extent, this assumes you can create a database in which to record this information. If you can't create an Informix database, you may have to use something else (MySQL, or SQLite, perhaps) to store the data dictionary. Alternatively, go to your DBA team and ask them for the information. Unless you're trying something self-evidently untoward, they're likely to help (but politics can get in the way — I've no idea how collegial your teams are).
Delphi XE6. I'm looking to implement a limited style of search: specifically, an edit field for the user to enter a business name which would get looked up. I need to allow the user to enter multiple words, or parts of multiple words. For example, for the business "First Bank of Kansas", the user should be able to enter "Fir Kan", and it should return a match. This implies an inverted index type of structure: some type of list of each unique word, where each entry holds a list of IDs (document ID, primary key ID, etc., which is an integer). I am struggling with WHAT type of structure to use. I have approximately 250,000 business names, which contain 43,500 unique words. Word counts vary from 1 occurrence of a word to several thousand (company, corporation, etc.). I have some requirements:
1). Assume the user enters BAN. I need to find ALL words that start with BAN. I need to return BANK, BANKER, etc... This means that whatever structure I use, I have to be able to find BAN and then move to the next alphabetic entry... and keep moving to the next until I find a value that does NOT start with BAN. This eliminates any type of HASH structure, correct?
2). I obviously want this to be fast. HASH is the fastest, but I can't use this, correct? See requirement 1.
3). Each entry in this structure needs to be able to hold a list of integers. If I end up going with a LinkedList, then each element has to hold a list of Integers.
4). I need to be able to save and load this structure. I don't want to have to build it each time I use it.
Whatever I end up with, it appears to have to be a NESTED structure, a higher level list (LinkedList?) with each node being an Integer List.
What am I looking for? What do commercial products use? Outlook, etc. have search capabilities.
Every word is linked to a specific set of IDs, each representing a business name, right?
I recommend using a binary tree data structure because the search effort is normally O(log n), which is quite fast. Especially if business names change at runtime, an AVL tree should do well, although it's quite some work to implement one yourself. But there should be many ready-to-use units on binary trees all over the internet.
For each successful search for a word in your tree data structure, you should take its list of IDs and aggregate those lists, grouped by the entered word they matched.
As the last step, you take all those aggregated lists of IDs and intersect them.
Only IDs which fit all entered words should be left; those IDs reference the searched business names.
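If it helps to see the two steps concretely, here is the same idea expressed against a hypothetical inverted-index table in SQL (an in-memory tree gives you the prefix scan for free, since its entries are kept sorted):
-- Sketch only: word_index(word, business_id), one row per pair.
-- Each SELECT unions the IDs for one typed prefix; INTERSECT keeps
-- only the business IDs matched by every entered word.
SELECT business_id FROM word_index WHERE word LIKE 'FIR%'
INTERSECT
SELECT business_id FROM word_index WHERE word LIKE 'KAN%';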
I am scanning an SQLite database looking for all matches and using
OneFound := False;
tbl1.First;
while not tbl1.Eof do
begin
  if tbl1.FieldByName('Name').AsString = 'jones' then
    OneFound := True;
  tbl1.Next;
end;
if OneFound then // Do something
or should I be using
if not(OneFound) then OneFound:=True;
Is it faster to just assign "True" to OneFound no matter how many times it is assigned, or should I do the comparison and only change OneFound the first time?
I know a better way would be to use FTS3, but for now I have to scan the database and the question is more on the approach to setting OneFound as many times as a match is encountered or using the compare-approach and setting it just once.
Thanks
Your question is, which is faster:
if not(OneFound) then OneFound:=True;
or
OneFound := True;
The answer is probably that the second is faster. Conditional statements involve branches, which risk branch mis-prediction.
However, that line of code is trivial compared to what is around it. Running across a database one row at a time is going to be outrageously expensive. I bet that you will not be able to measure the difference between the two options because the handling of that little Boolean is simply swamped by the rest of the code. In which case choose the more readable and simpler version.
But if you care about the performance of this code you should be asking the database to do the work, as you yourself state. Write a query to perform the work.
It would be better to change your SQL statement so that the work is done in the database. If you want to know whether there is a tuple which contains the value 'jones' in the field 'name', then a quicker query would be
with TQuery.Create(nil) do
begin
  SQL.Add('select name from tbl1 where name = :p1 limit 1');
  Params[0].AsString := 'jones';
  Open;
  OneFound := not IsEmpty;
  Close;
  Free;
end;
Your syntax may vary regarding the 'limit' clause but the idea is to return only one tuple from the database which matches the 'where' statement - it doesn't matter which one.
I used a parameter to avoid problems delimiting the value.
1. Search one field
If you want to search one particular field's content, using an INDEX and a SELECT will be the fastest.
SELECT * FROM MYTABLE WHERE NAME='Jones';
Do not forget to create an INDEX on the column, first!
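For instance, with the table above:
-- Without this index, the SELECT above degenerates into a full scan.
CREATE INDEX IF NOT EXISTS IDX_MYTABLE_NAME ON MYTABLE(NAME);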
2. Fast reading
But if you want to search within a field, or within several fields, you may have to read and check the whole content. In this case, what will be slow is calling FieldByName() for each data row: you had better use a local TField variable instead.
Or forget about TDataSet, and switch to direct access to SQLite3. In fact, using DB.pas and TDataSet requires a lot of data marshalling, so it is slower than direct access.
See e.g. DiSQLite3 or our DB classes, which are very fast, but a bit higher level. Or you can use our ORM on top of those classes. Our classes are able to read more than 500,000 rows per second from a SQLite3 database, including JSON marshalling into object fields.
3. FTS3/FTS4
But, as you guessed, the fastest would indeed be to use the FTS3/FTS4 feature of SQLite3.
You can think of FTS3/FTS4 as a "meta-index" or a "full-text index" on a supplied blob of text. Just like Google is able to find a word in millions of web pages: it does not use a regular database, but full-text indexing.
In short, you create a virtual FTS3/FTS4 table in your database, then you insert into this table the whole text of your main records in the FTS TEXT field, forcing the ID field to be that of the original data row.
Then, you will query for some words on your FTS3/FTS4 table, which will give you the matching IDs, much faster than a regular scan.
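A minimal sketch of that flow in SQLite SQL, assuming a main table tbl1 with a Name column as in the question:
-- Sketch only: build an FTS4 shadow table keyed by the original rowid.
CREATE VIRTUAL TABLE tbl1_fts USING fts4(Name);
INSERT INTO tbl1_fts(docid, Name) SELECT rowid, Name FROM tbl1;
-- The word query returns the matching row IDs without a table scan.
SELECT docid FROM tbl1_fts WHERE Name MATCH 'jones';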
Note that our ORM has dedicated TSQLRecordFTS3 / TSQLRecordFTS4 kind of classes for direct FTS process.
I am using Ruby on Rails and have a situation where I am wondering whether some sort of key-value store would be more appropriate than MySQL. I have users that have_many lists and each list has_many words. Some lists have hundreds of words and I want users to be able to copy a list. This is a heavy MySQL task because it is going to have to create these hundreds of word objects at one time.
As an alternative, I am considering some sort of key-value store where the key would just be the word. A list of words could be stored in a text field in MySQL. Each list could be a new key-value db? It seems like it would be faster to copy a key-value db this way rather than having to go through the database. It also seems like this might be faster in general. Thoughts?
The general way to solve this using a relational database would be to have a list table, a word table, and a list-words table relating the two. You are correct that there would be some overhead, but don't overestimate it; because the table structure is defined, there is very little actual storage overhead for each record, and records can be inserted very quickly.
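As a sketch, the conventional schema looks something like this in MySQL (all names invented):
-- Sketch only: each row is tiny; words are shared across lists.
CREATE TABLE lists      (id INT AUTO_INCREMENT PRIMARY KEY, user_id INT, name VARCHAR(255));
CREATE TABLE words      (id INT AUTO_INCREMENT PRIMARY KEY, word VARCHAR(255) UNIQUE);
CREATE TABLE list_words (list_id INT, word_id INT, PRIMARY KEY (list_id, word_id));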
If you want very fast copies, you could allow lists to be copied-on-write, meaning a single list could be referred to by multiple users, or multiple times by the same user. You only actually duplicate the list when the user tries to add, remove, or change an entry. Of course, this is premature optimization; start simple and only add complications like this if you find they are necessary.
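And given the hypothetical schema above, even an eager copy is two set-based statements rather than hundreds of individual word inserts (the user and list IDs here are made up):
-- Copy list 42 for user 7, then clone its word links in bulk.
INSERT INTO lists (user_id, name)
    SELECT 7, CONCAT(name, ' (copy)') FROM lists WHERE id = 42;
INSERT INTO list_words (list_id, word_id)
    SELECT LAST_INSERT_ID(), word_id FROM list_words WHERE list_id = 42;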
You could use a key-value store as you suggest. I would avoid trying to build one on top of a MySQL text field unless you have a very good reason; it will make any sort of searching by key very slow, as it would require string searching. A key-value data store like CouchDB or Tokyo Cabinet could do this very well, but it would most likely take up more space (as each record has to have its own structure defined, and each word has to be recorded separately in each list). The only dimension of performance I would expect to be better is if you need massively scalable reads and writes, but that's only relevant for the largest of systems.
I would use MySQL naively, and only make changes such as this if you need the performance and can prove that this method will actually be faster.