create a procedure that retrieves all indexes on my table and rebuilds them - stored-procedures

I want to create a procedure that retrieves all indexes on my table and rebuilds them.
I retrieve all the indexes with this query:
select index_name from user_indexes where table_name='your_table_name'
and I rebuild each one with this query:
alter index <index_name> rebuild;
Thanks.

create or replace procedure rebuild_indexes(
    p_owner      in varchar2,
    p_table_name in varchar2
) as
begin
    for indexes_to_rebuild in
    (
        select index_name
        from all_indexes
        where owner = p_owner
        and table_name = p_table_name
    ) loop
        execute immediate 'alter index '||p_owner||'.'
            ||indexes_to_rebuild.index_name||' rebuild';
    end loop;
end;
/
This will only work with the simplest indexes, though. There are many restrictions on rebuilding. For example, if the index is partitioned you need to rebuild each partition or subpartition.
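For a partitioned index the rebuild has to be done one (sub)partition at a time, along these lines (a sketch; the index and partition names are hypothetical):
alter index my_owner.my_part_index rebuild partition p1;
alter index my_owner.my_part_index rebuild subpartition p1_sp1;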
And there are many options you may want to consider. For example, use ONLINE if you want others to use the index during the rebuild, add a PARALLEL option to rebuild faster (but this also changes the index's parallel setting, which can cause problems), etc.
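For instance, a rebuild that stays usable during the operation and runs in parallel might look like this (a sketch; the index name is hypothetical, and the second statement resets the parallel degree so it doesn't persist):
alter index my_owner.my_index rebuild online parallel 8;
alter index my_owner.my_index noparallel;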
Keep in mind that many of the top Oracle experts think rebuilding indexes is usually a waste of time. There are some rare cases where rebuilding can help an index, such as sparse deletions of monotonically increasing values. But most index rebuilding is done because of myths that can be dispelled by reading Richard Foote's presentation Index Internals - Rebuilding the Truth.
Rebuilding will make your indexes initially run faster and look smaller, but that's because of caching and the reduction of overheads like segment space allocation. A week later, your indexes will probably be right back to where they started.

Related

Rails Postgres. Remove indexes before loading data, then re-add them... tidily

We have an importing process which involves loading data from XML files into the database tables of our rails app. We remove the indexes from our tables and then re-add them when loading is complete. This seems to be the recommendation (and seems to be necessary for us) to speed up the loading process. But what is a good tidy way of doing this, which avoids evil duplication? Our current approach feels... kind of clever, but at the same time horribly hacky:
We have 6 different database tables which have total of 42 different indexes defined on them. These are all defined in the Rails migrations and schema.rb, and we want to be able to make changes to these, avoiding duplicating schema definitions elsewhere so...
Current approach:
Before entering the loading logic we have this little bit of black magic:
indexdefs = []
import_tables.each do |table_name|
  res = @conn.exec("SELECT indexname, indexdef FROM pg_indexes WHERE tablename='#{table_name}'")
  res.each do |row|
    indexdefs << row['indexdef']
    @conn.exec("DROP INDEX #{row['indexname']}")
  end
end
logger.info "#{indexdefs.size} indexes dropped"
Then we load data into the tables (takes a long time), and then...
indexdefs.each do |indexdef|
  logger.info "Re-adding index: #{indexdef}"
  @conn.exec(indexdef)
end
As mentioned above, the key thing we achieve with this is no explicit duplication of any knowledge about index definitions/schemas. We query pg_indexes (Postgres' internal schema tables) to get the index definitions as they were set up by migrations, and then we store an array of strings, indexdefs, of SQL CREATE statements which we run later.
...so good and yet so bad.
What's a better way of doing this? Maybe there's frameworks/gems I should be using, or completely different approaches to the problem.

Multi-column index vs separate indexes vs partial indexes

While working on my Rails app today I noticed that the paranoia gem says that indexes should be updated to add the deleted_at IS NOT NULL condition as a where clause on the index creation (github link). But it occurred to me that the inverted condition, when I do want with_deleted, won't benefit from the index.
This makes me wonder...
I know that this is somewhat obtuse because the answer is obviously "it depends on what you need" but I am trying to get an idea of the differences between Multi-column index vs separate indexes vs partial indexes on my web app backed by PostgreSQL.
Basically, I have 2 fields that I am querying on: p_id and deleted_at. Most of the time I am querying WHERE p_id=1 AND deleted_at IS NOT NULL - but sometimes I only query WHERE p_id=1. Very seldom, I will WHERE p_id=1 AND deleted_at=1/1/2017.
So, Am I better off:
Having an index on p_id and a separate index on deleted_at?
Having an index on p_id but add 'where deleted_at IS NOT NULL'?
Having a combined index on p_id and deleted_at together?
Note: perhaps I should mention that p_id is currently a foreign key reference to p.id. Which reminds me, in Postgres, is it necessary for foreign keys to also have indexes (or do they get an index derived from being a foreign key constraint - I've read conflicting answers on this)?
The answer depends on:
how often you use each of these queries, and how long they are allowed to run,
whether query speed is important enough that slow data changes can be tolerated.
The perfect indexes for the three clauses are:
1. WHERE p_id=1 AND deleted_at IS NOT NULL
CREATE INDEX ON mytable (p_id) WHERE deleted_at IS NOT NULL;
2. WHERE p_id=1 AND deleted_at=1/1/2017
CREATE INDEX ON mytable (p_id, deleted_at);
3. WHERE p_id=1
CREATE INDEX ON mytable (p_id);
The index created for 2. can also be used for 3., so if you need to speed up the second query as much as possible and a slightly bigger index doesn't bother you, create only the index from 2. for both queries.
However, the index from 3. will also speed up the query in 2., just not as much as possible, so if you can live with a slightly worse performance for the query in 2. and want the index as small and efficient as possible for the query in 3., create only the index in 3.
I would not create both the indexes from 2. and 3.; you should pick what is best for you.
The case with 1. is different, because that index can only be used for the first query. Create that index only if you want to speed up that query as much as possible, and it doesn't matter if data modifications on the table take longer, because an additional index has to be maintained.
Another indication to create the index in 1. is if only a small percentage of rows satisfies deleted_at IS NOT NULL. If not, the index in 1. doesn't have a great advantage over the one in 3., and you should just create the latter.
Having two separate indexes on the two columns is probably not the best choice – they can be used in combination only with a bitmap index scan, and it may well be that PostgreSQL only chooses to use one of the indexes (depends on the distribution, but probably the one on p_id), and the other one is useless.
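To see which of these indexes PostgreSQL actually chooses for your data distribution, run the query under EXPLAIN (a sketch; mytable and the literal values are placeholders):
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM mytable
WHERE p_id = 1 AND deleted_at IS NOT NULL;
-- A BitmapAnd node over two index scans in the plan means the planner is
-- combining separate single-column indexes via a bitmap index scan.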

Is it faster to constantly assign a value or compare

I am scanning an SQLite database looking for all matches and using
OneFound := False;
if tbl1.FieldByName('Name').AsString = 'jones' then
begin
  OneFound := True;
  tbl1.Next;
end;
if OneFound then // Do something
or should I be using
if not(OneFound) then OneFound:=True;
Is it faster to just assign "True" to OneFound no matter how many times it is assigned, or should I do the comparison and only change OneFound the first time?
I know a better way would be to use FTS3, but for now I have to scan the database and the question is more on the approach to setting OneFound as many times as a match is encountered or using the compare-approach and setting it just once.
Thanks
Your question is, which is faster:
if not(OneFound) then OneFound:=True;
or
OneFound := True;
The answer is probably that the second is faster. Conditional statements involve branches, which risk branch mis-prediction.
However, that line of code is trivial compared to what is around it. Running across a database one row at a time is going to be outrageously expensive. I bet that you will not be able to measure the difference between the two options because the handling of that little Boolean is simply swamped by the rest of the code. In which case choose the more readable and simpler version.
But if you care about the performance of this code you should be asking the database to do the work, as you yourself state. Write a query to perform the work.
It would be better to change your SQL statement so that the work is done in the database. If you want to know whether there is a tuple which contains the value 'jones' in the field 'name', then a quicker query would be
with TQuery.Create(nil) do
try
  SQL.Add('select name from tbl1 where name = :p1 limit 1');
  Params[0].AsString := 'jones';
  Open;
  OneFound := not IsEmpty;
  Close;
finally
  Free;
end;
Your syntax may vary regarding the 'limit' clause but the idea is to return only one tuple from the database which matches the 'where' statement - it doesn't matter which one.
I used a parameter to avoid problems delimiting the value.
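If you'd rather avoid the non-standard LIMIT clause, an EXISTS query expresses the same "stop at the first match" intent (a sketch, reusing the table, column, and parameter from above):
select exists(select 1 from tbl1 where name = :p1);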
1. Search one field
If you want to search one particular field content, using an INDEX and a SELECT will be the fastest.
SELECT * FROM MYTABLE WHERE NAME='Jones';
Do not forget to create an INDEX on the column, first!
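A minimal sketch, using the table and column above:
CREATE INDEX IDX_MYTABLE_NAME ON MYTABLE(NAME);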
2. Fast reading
But if you want to search within a field, or within several fields, you may have to read and check the whole content. In this case, what will be slow is calling FieldByName() for each data row: you had better use a local TField variable.
Or forget about TDataSet, and switch to direct access to SQLite3. In fact, using DB.pas and TDataSet requires a lot of data marshalling, so it is slower than direct access.
See e.g. DiSQLite3 or our DB classes, which are very fast, but a bit higher level. Or you can use our ORM on top of those classes. Our classes are able to read more than 500,000 rows per second from a SQLite3 database, including JSON marshalling into object fields.
3. FTS3/FTS4
But, as you guessed, the fastest would be indeed to use the FTS3/FTS4 feature of SQlite3.
You can think of FTS3/FTS4 as a "meta-index" or a "full-text index" on a supplied blob of text. Just as Google is able to find a word in millions of web pages: it does not use a regular database, but full-text indexing.
In short, you create a virtual FTS3/FTS4 table in your database, then you insert in this table the whole text of your main records in the FTS TEXT field, forcing the ID field to be the one of the original data row.
Then, you will query for some words on your FTS3/FTS4 table, which will give you the matching IDs, much faster than a regular scan.
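In plain SQL the whole cycle looks roughly like this (a sketch; mytable, its id and name columns, and the FTS table name are hypothetical):
-- virtual full-text table, with rowid forced to the original row's ID
CREATE VIRTUAL TABLE mytable_fts USING fts4(content);
INSERT INTO mytable_fts(rowid, content) SELECT id, name FROM mytable;
-- MATCH uses the full-text index instead of scanning every row
SELECT rowid FROM mytable_fts WHERE content MATCH 'jones';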
Note that our ORM has dedicated TSQLRecordFTS3 / TSQLRecordFTS4 kind of classes for direct FTS process.

Optimizing JOINs : comparison with indexed tables

Let's say we have a time consuming query described below :
(SELECT ...
FROM ...) AS FOO
LEFT JOIN (
SELECT ...
FROM ...) AS BAR
ON FOO.BarID = BAR.ID
Let's suppose that
(SELECT ...
FROM ...) AS FOO
returns many rows (let's say 10 M). Every single row has to be joined with data in BAR.
Now let's say we insert the result of
(SELECT ...
FROM ...) AS BAR
into a table, and add the ad hoc index(es) to it.
My question :
How would the performance of the "JOIN" with a live query differ from the performance of the "JOIN" to a table containing the result of the previous live query, to which ad hoc indexes would have been added ?
Another way to put it :
If a JOIN is slow, would there be any gain in actually storing and indexing the table we JOIN to?
The answer is 'Maybe'.
It depends on the statistics of the data in question. The only way you'll find out for sure is to actually load the first query into a temp table, stick a relevant index on it, then run the second part of the query.
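In SQL Server syntax the experiment could look like this (a sketch; every table and column name is hypothetical):
-- materialize the right-hand side of the join once
SELECT ID, SomeValue
INTO #bar
FROM SomeSource;
-- index the join key
CREATE INDEX IX_bar_ID ON #bar (ID);
-- re-run the join against the indexed, materialized result
SELECT f.*, b.SomeValue
FROM Foo AS f
LEFT JOIN #bar AS b ON f.BarID = b.ID;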
I can tell you that if speed is what you want, and if it's possible for you to load the results of your first query permanently into a table, then of course your query is going to be quicker.
If you want it to be even faster, depending on which DBMS you are using you could consider creating an index which crosses both tables - if you're using SQL Server they're called 'indexed views', and you can also look up 'materialized views' for other systems.
Finally, if you want the ultimate in speed, consider denormalising your data and eliminating the join that is occurring on the fly - basically you move the pre-processing (the join) offline at the cost of storage space and data consistency (your live table will be a little behind depending on how frequently you run your updates).
I hope this helps.

Optimize Searching Through Rails Database

I'm building a Rails project, and I have a database with a set of tables, each holding between 500k and 1M rows, and I am constantly creating new rows.
By the nature of the project, before each creation I have to search the table for duplicates (on one field), so I don't create the same row twice. Unfortunately, as my table grows, this is taking longer and longer.
I was thinking that I could optimize the search by adding indexes to the specific String fields through which I am searching, but I have heard that adding indexes increases the creation time.
So my question is as follows:
What is the trade-off between finding and creating rows with fields that are indexed? I know adding indexes to the fields will make my program faster with Model.find_by_name, but how much slower will it make my row creation?
Indexing slows down insertion of entries because the entry also has to be added to the index, which needs some resources, but once added, indexes speed up your select queries, just as you said. BUT maybe a B-tree isn't the right choice for you! Because a B-tree index only indexes the first X units of the indexed subject. That's great when you have integers, but text search is tricky. When you do queries like
Model.where("name LIKE ?", "#{params[:name]}%")
it will speed up selection, but when you use queries like this:
Model.where("name LIKE ?", "%#{params[:name]}%")
it won't help you, because the whole string has to be searched, and it can be longer than a few hundred chars; then it's no improvement to have the first 8 units of a 250-char string indexed! So that's one thing. But there's another...
You should add a UNIQUE INDEX, because the database is better at finding duplicates than Ruby is! It's optimized for sorting, and it's definitely the shorter and cleaner way to deal with this problem. Of course you should also add a validation to the relevant model, but that's not a reason to let things slide with the database.
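A minimal sketch of that unique index in plain SQL (assuming a models table with a name column; in Rails this would normally be created by a migration with unique: true):
-- duplicate inserts now fail in the database itself, instead of relying
-- on an application-side existence check
CREATE UNIQUE INDEX index_models_on_name ON models (name);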
// about index speed
http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html
You don't have a large set of options. I don't think the insert speed loss will be that great when you only need one index! But the select speed will increase proportionally!
