Sort a Delphi dataset

I have an unsorted dataset (a TMSQuery from Devart) that I cannot sort using ORDER BY, because I manipulate the records after opening the query, so the order given by ORDER BY is lost.
I don't want to rewrite the whole logic, so I need to find a way to sort a dataset.
I can Assign the dataset to a TMemDataSet descendant, TVirtualTable (both are Devart classes), but after that, how do I sort? (I need to sort by a date field.)
I read this question but it doesn't really contain the answer I am looking for.

Using IndexFieldNames I solved the problem; it was what I was looking for. Directly from the TMSQuery component:
MSQuery1.IndexFieldNames := 'EXECUTION_DATE'; //this does the job
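For the TVirtualTable route mentioned in the question, the same property should work on the in-memory copy, since Devart's TMemDataSet descendants support local index sorting; a sketch, with illustrative component names:
VirtualTable1.Assign(MSQuery1); // copy structure and data into the in-memory table
VirtualTable1.Open;
VirtualTable1.IndexFieldNames := 'EXECUTION_DATE'; // local sort on the copy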

Related

How to go to the last record in results of ado.locate (Delphi)

I located some records by this code:
ADOQuery1.Locate('field1',ADOQuery2.FieldByName('field2').Value,[])
How to go to the last one of these records?
You have a number of options. The best depends on a whole lot of considerations you haven't mentioned in your question. I'll provide a very brief overview of the options to avoid this becoming "too broad". It'll be up to you to make your choice and figure out the details. If you get stuck, you can ask a new, more specific question.
Using Locate
A solution involving Locate is only feasible if your dataset is sorted by the same field you're searching on.
Clearly your search value is not a unique key. So I'm guessing that you're trying to find the last row matching the search value in data sorted by some other unique field. (Otherwise the concept of last is meaningless.)
So it's highly probable this is not appropriate for you; unless your data is ordered by a composite key of your search field followed by a unique key.
The approach is simple: navigate forwards until you find a row where the search value doesn't match, then backtrack by 1 row.
if not DataSet.Locate(SearchField, SearchValue, []) then
  { handle not-found case as desired }
else
begin
  while (not DataSet.Eof) and (DataSet.FieldByName(SearchField).Value = SearchValue) do
    DataSet.Next;
  { Watch out for the case where the last row in the dataset matches the search value }
  if DataSet.FieldByName(SearchField).Value <> SearchValue then
    DataSet.Prior;
end;
Implement your own search
This is straightforward and will always work. But it is inefficient, having O(n) complexity, so it is not advised for large datasets.
DataSet.Last;
while (not DataSet.Bof) and (DataSet.FieldByName(SearchField).Value <> SearchValue) do
  DataSet.Prior;
NOTE: In order to mirror behaviour of Locate it would be advisable to enhance this method to deal with the case where a match is not found at all. In that case the active record should not be inadvertently changed as a side-effect of the search.
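For instance, a sketch of such an enhancement, using a bookmark to restore the active record when no match exists (names as above):
var
  Bookmark: TBookmark;
begin
  Bookmark := DataSet.GetBookmark;
  try
    DataSet.Last;
    while (not DataSet.Bof) and (DataSet.FieldByName(SearchField).Value <> SearchValue) do
      DataSet.Prior;
    // No match anywhere: restore the original active record
    if DataSet.FieldByName(SearchField).Value <> SearchValue then
      DataSet.GotoBookmark(Bookmark);
  finally
    DataSet.FreeBookmark(Bookmark);
  end;
end;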
Use filtering
Obviously this solution depends on whether filtering the dataset is appropriate for the rest of your code. But it is a fairly simple option and, depending on factors beyond the scope of this answer, it can be more performant than the previous option.
DataSet.Filtered := False;
{ The next line may be a little tricky.
  Ensure the filter string is appropriate for the data types involved. }
DataSet.Filter := '<string of the form SearchField = SearchValue>';
DataSet.Filtered := True;
DataSet.Last;
See documentation on the Filter property.
NOTE: It may be advisable to take precaution against setting the filter redundantly.
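As a concrete illustration, a sketch for a string-typed search field; quoting rules and the supported filter syntax vary between dataset implementations:
var
  NewFilter: string;
begin
  NewFilter := 'SearchField = ' + QuotedStr('SearchValue');
  // Avoid setting the filter redundantly (see NOTE above)
  if (not DataSet.Filtered) or (DataSet.Filter <> NewFilter) then
  begin
    DataSet.Filter := NewFilter;
    DataSet.Filtered := True;
  end;
  DataSet.Last;
end;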
Use a master-detail relationship
This option is included because your question code indicates the SearchValue comes from the active record of another dataset. You're using ADO, so this option is available to you.
DataSet.MasterSource := <Appropriate DataSource>;
DataSet.MasterFields := SearchField;
DataSet.Last;
See documentation on master-detail relationships and on ADO MasterFields.
Offload the work to the RDBMS
Finally, it's worth considering using a stored procedure to get the information you need directly from the database. The advantage is that the server can leverage available indexes, potentially providing the most performant option. Again though, a lot depends on the particulars of your application.
A query along the following lines can form the basis of your stored procedure.
select MAX(UniqueField) as RowKey
from Table
where SearchField = SearchValue
Then call your stored procedure, and use its result to find the desired row.
DataSet.Locate(UniqueField, RowKey, []);
NOTE: Don't forget to consider the stored procedure returning NULL if no rows with SearchValue exist.
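For instance, a sketch assuming the procedure returns its result as a single-row result set (component names are illustrative):
StoredProc.Open;
try
  // MAX() yields NULL when no row matches SearchValue
  if not StoredProc.FieldByName('RowKey').IsNull then
    DataSet.Locate(UniqueField, StoredProc.FieldByName('RowKey').Value, []);
finally
  StoredProc.Close;
end;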
General Disclaimer
All the above code is extremely brief and for illustrative purposes only. In many cases additional code is required for a robust implementation.
E.g. It might be necessary to DisableControls and enable them again.
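For example, a typical bracketing pattern:
DataSet.DisableControls; // suspend updates to linked UI controls
try
  // ... perform the navigation / search from one of the options above ...
finally
  DataSet.EnableControls;
end;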
NOTE: It's very important with all of the above to be aware of the actual ordering of the data in your datasets. Failure to take this into account can lead to incorrect behaviour. Even the last option may exhibit worse than expected performance if your dataset is not sorted by UniqueField.
If your table has an autoincrement identity field you can do this:
ADOQuery1.SQL.Clear;
ADOQuery1.SQL.Add('select top 1 * from yourtablename where field1=value1 and field2=value2 order by yourAIColumn desc');
ADOQuery1.Open; // a SELECT needs Open rather than ExecSQL
value1 and value2 are your desired values; pass them as parameters or put them in the command text.
This way you get only the row you want, and there is no need to loop.

How to override sort mechanism of TClientDataSet

Is it possible to change the way indexes are used in TClientDataSet to sort records? After reading this question, I thought it would be nice to be able to sort string fields logically in a client dataset. But I have no idea how to override default behavior of client dataset when it comes to indexes. Any ideas?
PS: My CDS is not linked to any provider. I'm looking for a way to modify the sort mechanism of the TClientDataSet (or the parent in which the mechanism is implemented) itself.
You cannot override the sort mechanism of a ClientDataSet - unless you rewrite the corresponding part of Midas.
To achieve the correct sorting (whatever "logical" means here), you can introduce a new field and set its values so that, sorted with the standard mechanism, they give the required sort order.
Read the excellent on-line article Understanding ClientDataSet Indexes by Cary Jensen.
It explains how to use various ways of sorting and indexing using IndexDefs, IndexFieldNames and IndexName.
Edit: reply to your comment.
You cannot override a sorting method in TClientDataSet, but you can do this:
If you want custom sorting on anything other than existing fields, add a calculated field (for a TClientDataSet index this needs to be an internally calculated field, fkInternalCalc, rather than a plain calculated field), perform the ordering calculation in the OnCalcFields event, then add that field to the IndexDefs.
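A minimal sketch of that approach, assuming a string field FieldN holding values like 'a_10'; the SortKey field name and the key derivation are illustrative:
// SortKey must be created with FieldKind = fkInternalCalc;
// plain fkCalculated fields cannot be used in a TClientDataSet index.
procedure TForm1.ClientDataSet1CalcFields(DataSet: TDataSet);
begin
  DataSet.FieldByName('SortKey').AsInteger :=
    StrToIntDef(Copy(DataSet.FieldByName('FieldN').AsString, 3, MaxInt), 0);
end;
Then sorting is simply a matter of setting:
ClientDataSet1.IndexFieldNames := 'SortKey';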
I would try to achieve the desired sort with an SQL statement that feed the ClientDataSet.
For example if I was dealing with the following strings in FieldN
a_1
a_20
a_10
a_2
and I wanted them sorted like this (I assume this is similar to what you mean by logically):
a_1
a_2
a_10
a_20
then I would write the SQL as
SELECT FieldA,
       FieldB,
       ... ,
       FieldN,
       CAST(SUBSTRING(FieldN, 3, 2) AS INTEGER) AS FieldM  -- pseudocode
FROM TableA
ORDER BY FieldM
The exact syntax of the SubString and Cast to Integer operations will depend on which DBMS you're using.

Is it faster to constantly assign a value or compare

I am scanning an SQLite database looking for all matches and using
OneFound := False;
if tbl1.FieldByName('Name').AsString = 'jones' then
begin
  OneFound := True;
  tbl1.Next;
end;
if OneFound then // Do something
or should I be using
if not(OneFound) then OneFound:=True;
Is it faster to just assign "True" to OneFound no matter how many times it is assigned, or should I do the comparison and only change OneFound the first time?
I know a better way would be to use FTS3, but for now I have to scan the database, and the question is more about the approach: setting OneFound every time a match is encountered, versus comparing and setting it just once.
Thanks
Your question is, which is faster:
if not(OneFound) then OneFound:=True;
or
OneFound := True;
The answer is probably that the second is faster. Conditional statements involve branches, which risk branch misprediction.
However, that line of code is trivial compared to what is around it. Running across a database one row at a time is going to be outrageously expensive. I bet that you will not be able to measure the difference between the two options because the handling of that little Boolean is simply swamped by the rest of the code. In which case choose the more readable and simpler version.
But if you care about the performance of this code you should be asking the database to do the work, as you yourself state. Write a query to perform the work.
It would be better to change your SQL statement so that the work is done in the database. If you want to know whether there is a tuple which contains the value 'jones' in the field 'name', then a quicker query would be
with TQuery.Create(nil) do
begin
  SQL.Add('select name from tbl1 where name = :p1 limit 1');
  Params[0].AsString := 'jones';
  Open;
  OneFound := not IsEmpty;
  Close;
  Free;
end;
Your syntax may vary regarding the 'limit' clause but the idea is to return only one tuple from the database which matches the 'where' statement - it doesn't matter which one.
I used a parameter to avoid problems delimiting the value.
1. Search one field
If you want to search one particular field content, using an INDEX and a SELECT will be the fastest.
SELECT * FROM MYTABLE WHERE NAME='Jones';
Do not forget to create an INDEX on the column, first!
2. Fast reading
But if you want to search within a field, or within several fields, you may have to read and check the whole content. In this case, what will be slow is calling FieldByName() for each data row: you had better use a local TField variable, as in the sketch below.
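For illustration, a sketch of the local TField pattern with names carried over from the question (DisableControls/EnableControls omitted for brevity):
var
  NameField: TField;
begin
  NameField := tbl1.FieldByName('Name'); // resolve the field once, outside the loop
  tbl1.First;
  while not tbl1.Eof do
  begin
    if NameField.AsString = 'jones' then
      OneFound := True;
    tbl1.Next;
  end;
end;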
Or forget about TDataSet and switch to direct access to SQLite3. In fact, using DB.pas and TDataSet requires a lot of data marshalling, so it is slower than direct access.
See e.g. DiSQLite3 or our DB classes, which are very fast, though a bit higher-level. Or you can use our ORM on top of those classes. Our classes are able to read more than 500,000 rows per second from a SQLite3 database, including JSON marshalling into object fields.
3. FTS3/FTS4
But, as you guessed, the fastest would indeed be to use the FTS3/FTS4 feature of SQLite3.
You can think of FTS3/FTS4 as a "meta-index" or a "full-text index" on a supplied blob of text. Just as Google is able to find a word in millions of web pages: it does not use a regular database, but full-text indexing.
In short, you create a virtual FTS3/FTS4 table in your database, then you insert into this table the whole text of your main records in the FTS TEXT field, forcing the ID field to be that of the original data row.
Then you query for some words on your FTS3/FTS4 table, which will give you the matching IDs, much faster than a regular scan.
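For illustration, a sketch of the steps executed through a generic query component; the table and column names are made up:
// One-time setup: a virtual full-text table
Query.SQL.Text := 'CREATE VIRTUAL TABLE ArticleFTS USING fts4(Content)';
Query.ExecSQL;
// Mirror the text of the main records, keeping the original row ID as docid
Query.SQL.Text := 'INSERT INTO ArticleFTS(docid, Content) SELECT ID, Content FROM Article';
Query.ExecSQL;
// MATCH uses the full-text index to return the matching IDs
Query.SQL.Text := 'SELECT docid FROM ArticleFTS WHERE Content MATCH ''jones''';
Query.Open;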
Note that our ORM has dedicated TSQLRecordFTS3 / TSQLRecordFTS4 classes for direct FTS processing.

How to filter a dataset based on a nested dataset record count?

I have a dataset to which I want to apply a filter based on a dataset-type field's record count, something like: 'NESTED_DATASET_FIELD.RecordCount > 0'
If the dataset comes from a SQL-based storage engine, use a SELECT DISTINCT query on the joined table with only fields from the master table in the result set. Let the SQL engine do the work for you.
Depending on your situation, you can use one of the following:
1. In the OnFilterRecord event you can have:
Accept := myDataSetField.NestedDataSet.RecordCount > 0;
2. If you have a SQL backend, you can use EXISTS or COUNT in order to fetch only the records which you need. This is perhaps the best approach if you are working over a network; however, I don't know what infrastructure you have.
3. In the OnFilterRecord event you can have:
Accept := not myDataSetField.IsNull;
//Just testing whether the dataset field is empty - one of the fastest ways to do it
...but this depends on the structure of your data / dataset etc.
4. Sometimes it is better to have a dedicated field in your DataSet / Table to hold this status, because getting such info from the nested dataset is usually expensive. (One must fetch it at least partially etc.)
5. Also, for the same considerations (see 4 above), perhaps you can have a stored procedure (if your DB backend permits) to get this info.
HTH

How to remove duplicate records in a grid?

Good morning !
What is the best way to remove duplicate records from a grid control? I use Delphi 2009 and the DevExpress QuantumGrid component.
I tried looping through all the records, adding each duplicate found to a list, and then applying a filter on the grid. I found this logic time-consuming, and there are two further downsides to this approach.
[1] When the duplicate records are considerably more numerous, say 10K records, applying the filter takes a lot of time because there are so many entries to filter out.
[2] Looping through all the records is itself time-consuming for a big result set like 1M rows.
The SQL query returns distinct rows, but when the user hides a column in the grid, it looks as if there are duplicate records (internally they are distinct).
Is there any other way of doing this?
Any ideas on this would be greatly helpful!
Thanks & Regards,
Pavan.
Can you alter your dataset to not return duplicate records in the first place? I would normally only return the records I want displayed instead of returning unwanted records from the database and then using a database grid to try to suppress unwanted records.
With thousands of rows, I would add an additional field to the DB called, say, Sum or Hash; or, if you can't change the DB, add a calculated field if it is a ClientDataSet, though this carries overhead at display time.
Calculate the contents of the hash field with something fast and simple, like a sum of all the chars in your text field. Duplicates are then easily identified as rows sharing the same value. Add this field to your Unique or Distinct query parameters, or filter on it (see the sketch below).
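A quick sketch of the hash calculation in an OnCalcFields handler (field names are illustrative); note that a plain character sum can collide, so equal sums only flag candidate duplicates:
procedure TForm1.ClientDataSet1CalcFields(DataSet: TDataSet);
var
  Txt: string;
  I, Sum: Integer;
begin
  Txt := DataSet.FieldByName('Txt').AsString;
  Sum := 0;
  for I := 1 to Length(Txt) do
    Inc(Sum, Ord(Txt[I])); // fast and simple, but collision-prone
  DataSet.FieldByName('Hash').AsInteger := Sum;
end;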
Just an Idea.
Checking for duplicates is always a bit tricky, for the reasons you just mentioned. The best way to do it in this particular case is probably to filter before the data reaches the grid.
If this grid is getting its records from a database, try tweaking your SQL query to not return any duplicate records. (The "distinct" keyword can be useful here.) The database server can usually do a much better job of it than you can.
If not, then you're probably loading your result set from some sort of object list. Try filtering the list and culling duplicate objects before you load it into the grid. Then it's over with and you don't have to filter the grid itself. This is a lot less time-consuming.
I have worked with DevExpress's QuantumGrid for some time, and their support forum http://www.devexpress.com/Support/Center/ is excellent. When you post questions, the DevExpress staff will answer you directly. With that said, I did a quick search for you and found some relevant articles.
how to hide duplicate row values: http://www.devexpress.com/Support/Center/p/Q142379.aspx?searchtext=Duplicate+Rows&p=T1|P0|83
highlight duplicate records in a grid: http://www.devexpress.com/Support/Center/p/Q98776.aspx
Unfortunately, it looks like you will have to iterate through the table in order to hide duplicate values. I would suggest that you try to clean the data up before it makes it to the grid. Ideally you would update the code/sql that produces the data. If that is not possible, you could write a TcxCustomDataSource that will scrub the data when it is first loaded. This should have better performance because you will not be using the grid's api to access the data.
Edit
ExpressQuantumGrid will not automatically hide rows that look like duplicates because the user hid a column. See: http://www.devexpress.com/Support/Center/p/Q205956.aspx?searchtext=Duplicate+Rows&p=T1|P0|83.
Poster:
For example, I have a dataset which contains two fields, ID and TXT. ID is a unique field and the TXT field may contain duplicate values. So when the dataset is connected to the grid with all columns visible, the records are unique. See image1.bmp. But if I hide the ID column then the grid shows duplicate rows. See image2.bmp.
DevExpress Team:
I'm sorry, but our ExpressQuantumGrid Suite doesn't support such functionality, because this task is very specific. However, you can implement it manually.
