How to get the list of chunks of a table in Informix?

I need to find which chunks are occupied by a particular table in an Informix database.
My current method is to read the output of the oncheck -pe dbspace command, but this is very time-consuming when the dbspace has many chunks. Is there a single query or a quick way to list the chunks occupied by the extents of a particular table?

The systabextents table in the sysmaster database can be used to determine the chunks associated with a table. An example query:
select distinct te_chunk
from sysmaster:systabextents
where te_partnum != 0 and te_partnum in
(select partnum from systables where tabname = "<table>"
union
select partn from sysfragments f, systables t
where f.tabid = t.tabid and tabname = "<table>"
);
The first part of the union subquery handles tables that are not fragmented, whilst the second part handles index partitions and fragmented tables.
To get the chunk path name instead of the chunk number, this query can be used:
select distinct c.fname
from sysmaster:systabextents te, sysmaster:syschunks c
where te.te_chunk = c.chknum
and te_partnum != 0 and te_partnum in
(select partnum from systables where tabname = "<table>"
union
select partn from sysfragments f, systables t
where f.tabid = t.tabid and tabname = "<table>"
);
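The logic of the two queries above can be sketched in plain Python, as a rough illustration rather than a replacement for the SQL. The function name, the sample partition numbers, and the chunk paths are all hypothetical. The sketch assumes that a fragmented table records a partnum of 0 in systables, which is why 0 is filtered out.

```python
def chunk_paths(table_partnums, fragment_partnums, extents, chunks):
    """Return the distinct chunk paths used by a table's partitions.

    table_partnums    -- partnum values from systables (hypothetical sample data)
    fragment_partnums -- partn values from sysfragments
    extents           -- (te_partnum, te_chunk) pairs, as in systabextents
    chunks            -- mapping of chknum -> fname, as in syschunks
    """
    # Union of both sources, minus the placeholder 0 (te_partnum != 0).
    partnums = (set(table_partnums) | set(fragment_partnums)) - {0}
    # "select distinct": a set removes chunks that hold several extents.
    chunk_numbers = {chunk for partnum, chunk in extents if partnum in partnums}
    return sorted(chunks[number] for number in chunk_numbers)


paths = chunk_paths(
    table_partnums=[0],                      # assumed: fragmented table
    fragment_partnums=[0x200010, 0x300010],  # made-up partition numbers
    extents=[(0x200010, 2), (0x200010, 2), (0x300010, 3), (0x100005, 1)],
    chunks={1: "/dev/chunk1", 2: "/dev/chunk2", 3: "/dev/chunk3"},
)
print(paths)
```

The extent belonging to partition 0x100005 is ignored because that partition is not part of the table being examined.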

Related

How to get the size of an Informix table?

I need to find out how much disk space is occupied by a given table. How can I see it? Suppose I have a table called tb1 that currently uses 1000 2 KB pages. Then the table size should be reported as 2000 KB.
To add to Jonathan's comment: if the table does not store data in blob spaces or smart blob spaces, then the oncheck -pt command will give the required information. Look at the "Pagesize" and "Number of pages allocated" values for each fragment.
You can also get this information in SQL with a query such as:
select sum(pagesize * nptotal)
from sysmaster:sysptnhdr
where partnum in
( select partnum from systables
where tabname = '<table name>'
union
select partn from sysfragments f, systables t
where f.tabid = t.tabid
and t.tabname = '<table name>' );
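The arithmetic behind that query can be checked by hand. The sketch below is only an illustration: the function name and the fragment figures are made up, and it assumes the page size is reported in bytes.

```python
def table_size_kb(fragments):
    """Sum pagesize * allocated pages over all fragments, in KB.

    fragments -- iterable of (pagesize_in_bytes, pages_allocated) pairs
    """
    total_bytes = sum(pagesize * nptotal for pagesize, nptotal in fragments)
    return total_bytes / 1024


# The table from the question: 1000 pages of 2 KB each.
tb1_size_kb = table_size_kb([(2048, 1000)])

# A hypothetical table with two fragments that use different page sizes.
fragmented_size_kb = table_size_kb([(2048, 600), (4096, 200)])

print(tb1_size_kb, fragmented_size_kb)
```

Multiplying per fragment before summing matters when fragments live in dbspaces with different page sizes.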

PSQL - Select size of tables for both partitioned and normal

Thanks in advance for any help with this, it is highly appreciated.
So, basically, I have a Greenplum database and I am wanting to select the table size for the top 10 largest tables. This isn't a problem using the below:
select
sotaidschemaname schema_name
,sotaidtablename table_name
,pg_size_pretty(sotaidtablesize) table_size
from gp_toolkit.gp_size_of_table_and_indexes_disk
order by 3 desc
limit 10
;
However, I have several partitioned tables in my database, and with the above SQL these show up as their individual 'child tables', split into small fragments (though I know that together they make up the 2 largest tables). Is there a way of writing a script that selects tables (partitioned or otherwise) and their total size?
Note: I'd be happy to include some sort of join where I specify the partitioned table name explicitly, as there are only 2 partitioned tables. However, I would still need to take the top 10 (where I cannot assume the partitioned tables are up there), and I cannot specify any other table names since there are nearly a thousand of them.
Thanks again,
Vinny.
Your friend here is the pg_relation_size() function for getting the relation size. Select from pg_class, pg_namespace and pg_partitions, joining them together like this:
select schemaname,
tablename,
sum(size_mb) as size_mb,
sum(num_partitions) as num_partitions
from (
select coalesce(p.schemaname, n.nspname) as schemaname,
coalesce(p.tablename, c.relname) as tablename,
1 as num_partitions,
pg_relation_size(n.nspname || '.' || c.relname)/1000000. as size_mb
from pg_class as c
inner join pg_namespace as n on c.relnamespace = n.oid
left join pg_partitions as p on c.relname = p.partitiontablename and n.nspname = p.partitionschemaname
) as q
group by 1, 2
order by 3 desc
limit 10;
select * from
(
select schemaname,tablename,
pg_relation_size(schemaname||'.'||tablename) as Size_In_Bytes
from pg_tables
where schemaname||'.'||tablename not in (select schemaname||'.'||partitiontablename from pg_partitions)
and schemaname||'.'||tablename not in (select distinct schemaname||'.'||tablename from pg_partitions )
union all
select schemaname,tablename,
sum(pg_relation_size(schemaname||'.'||partitiontablename)) as Size_In_Bytes
from pg_partitions
group by 1,2) as foo
where Size_In_Bytes >= '0' order by 3 desc;
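Both queries rely on the same roll-up: each child partition's size is credited to its parent table, and non-partitioned tables are credited to themselves. A minimal Python sketch of that idea follows; the function name, table names, and sizes are invented for illustration.

```python
from collections import defaultdict


def top_tables(relation_sizes, partition_parent, limit=10):
    """Rank tables by total size, folding child partitions into parents.

    relation_sizes   -- mapping of relation name -> size in bytes
    partition_parent -- mapping of child partition name -> parent table name
    """
    totals = defaultdict(int)
    for relation, size in relation_sizes.items():
        # Plain tables have no entry and are counted under their own name.
        parent = partition_parent.get(relation, relation)
        totals[parent] += size
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:limit]


sizes = {
    "public.sales_1_prt_jan": 400,
    "public.sales_1_prt_feb": 500,
    "public.customers": 700,
    "public.orders": 300,
}
parents = {
    "public.sales_1_prt_jan": "public.sales",
    "public.sales_1_prt_feb": "public.sales",
}
ranking = top_tables(sizes, parents, limit=2)
print(ranking)
```

Neither partition of the hypothetical sales table would reach the top on its own, but their combined size does.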

How to use joins and averages together in Hive queries

I have two tables in hive:
Table1: uid,txid,amt,vendor
Table2: uid,txid
Now I need to join the tables on txid which basically confirms a transaction is finally recorded. There will be some transactions which will be present only in Table1 and not in Table2.
I need to find the average rate of transaction matches per user (uid) per vendor. Then I need the average of these averages: add up all the per-user averages and divide by the number of unique users per vendor.
Let's say I have the data:
Table1:
u1,120,44,vend1
u1,199,33,vend1
u1,100,23,vend1
u1,101,24,vend1
u2,200,34,vend1
u2,202,32,vend2
Table2:
u1,100
u1,101
u2,200
u2,202
Example for vendor vend1:
u1 -> Avg transaction find rate = 2 (matches found in both Table1 and Table2) / 4 (total occurrences in Table1) = 0.5
u2 -> Avg transaction find rate = 1/1 = 1
Avg of avgs = (0.5 + 1) (sum of avgs) / 2 (total unique users) = 0.75
Required output:
vend1,0.75
vend2,1
I can't work out how to get both the count of matches and the count of occurrences in Table1 alone, per user per vendor, in a single Hive query. I have got as far as this query and can't see how to take it further.
SELECT A.vendor,A.uid,count(*) as totalmatchesperuser FROM Table1 A JOIN Table2 B ON A.uid = B.uid AND B.txid =A.txid group by vendor,A.uid
Any help would be great.
I think you are running into trouble with your JOIN. When you JOIN by txid and uid, you lose the total number of rows per uid in each group. If I were you, I would add a column of 1's to table2, name it something like success or transaction, and do a LEFT OUTER JOIN. The joined table will then have a 1 in that column if there was a completed transaction and NULL otherwise. You can then use a case statement to convert these NULLs to 0.
Query:
select vendor
,(SUM(avg_uid) / COUNT(uid)) as avg_of_avgs
from (
select vendor
,uid
,AVG(complete) as avg_uid
from (
select uid
,txid
,amt
,vendor
,case when success is null then 0
else success
end as complete
from (
select A.*
,B.success
from table1 as A
LEFT OUTER JOIN table2 as B
ON B.txid = A.txid
) x
) y
group by vendor, uid
) z
group by vendor
Output:
vend1 0.75
vend2 1.0
B.success in line 17 is the column of 1's that I put into table2 before the JOIN. If you are curious about case statements in Hive you can find them here
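The approach can be tried on the sample data from the question without a Hive cluster. The sketch below uses SQLite through Python's sqlite3 module as a stand-in for Hive, which is an assumption made purely for convenience. It tests b.txid for NULL instead of adding a success column, and uses AVG over the per-user averages, which gives the same result as SUM / COUNT here because uid is never NULL.

```python
import sqlite3

# SQLite stands in for Hive; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (uid TEXT, txid INTEGER, amt INTEGER, vendor TEXT);
    CREATE TABLE table2 (uid TEXT, txid INTEGER);
    INSERT INTO table1 VALUES
        ('u1', 120, 44, 'vend1'), ('u1', 199, 33, 'vend1'),
        ('u1', 100, 23, 'vend1'), ('u1', 101, 24, 'vend1'),
        ('u2', 200, 34, 'vend1'), ('u2', 202, 32, 'vend2');
    INSERT INTO table2 VALUES
        ('u1', 100), ('u1', 101), ('u2', 200), ('u2', 202);
""")

query = """
    SELECT vendor, AVG(avg_uid) AS avg_of_avgs
    FROM (
        -- One row per (vendor, user): the share of that user's
        -- transactions that also appear in table2.
        SELECT a.vendor, a.uid,
               AVG(CASE WHEN b.txid IS NULL THEN 0.0 ELSE 1.0 END) AS avg_uid
        FROM table1 AS a
        LEFT OUTER JOIN table2 AS b ON b.txid = a.txid
        GROUP BY a.vendor, a.uid
    ) AS per_user
    GROUP BY vendor
    ORDER BY vendor
"""
rows = conn.execute(query).fetchall()
for vendor, avg_of_avgs in rows:
    print(vendor, avg_of_avgs)
```

The LEFT OUTER JOIN keeps unmatched table1 rows, so the denominator of each per-user average is the total number of that user's transactions rather than only the matched ones.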
Amazing and precise answer by GoBrewers14! Thank you so much. I was looking at it from the wrong perspective.
I made a few small changes to the query to get things working.
I didn't need to add a "success" column to table2. I checked B.txid in the above query instead of B.success. B.txid will be NULL if no match is found and will hold a value if a match is found, so it captures success and failure without adding a new column. I then map NULL to 0 and non-NULL to 1 in the enclosing subquery. I also changed some column names, as Hive was reporting them as ambiguous.
The final query looks like :
select vendr
,(SUM(avg_uid) / COUNT(usrid)) as avg_of_avgs
from (
select vendr
,usrid
,AVG(complete) as avg_uid
from (
select usrid
,txnid
,amnt
,vendr
,case when success is null then 0
else 1
end as complete
from (
select A.uid as usrid,A.vendor as vendr,A.amt as amnt,A.txid as txnid
,B.txid as success
from Table1 as A
LEFT OUTER JOIN Table2 as B
ON B.txid = A.txid
) x
) y
group by vendr, usrid
) z
group by vendr;

Most Efficient Version of PLSQL Stored Procedure

I am writing a PL/SQL stored procedure which will be called from within a .NET application.
My stored procedure must return:
the count of values in a table of part revisions, based on an input part number;
the name of the lowest revision level currently captured in this table for the input part number;
the name of the revision level for a particular unit in the database associated with this part number and an input unit ID.
The unit's revision level name is captured within a separate table with no direct relationship to the part revision table.
Relevant data structure:
Table Part has columns:
Part_ID int PK
Part_Number varchar2(30)
Table Part_Revisions:
Revision_ID int PK
Revision_Name varchar2(100)
Revision_Level int
Part_ID int FK
Table Unit:
Unit_ID int PK
Part_ID int FK
Table Unit_Revision:
Unit_ID int PK
Revision_Name varchar2(100)
With that said, what is the most efficient way for me to query these three data elements into a ref cursor for output? I am considering the following option 1:
OPEN o_Return_Cursor FOR
SELECT (SELECT COUNT (*)
FROM Part_Revisions pr
inner join PART pa on pa.part_id = pr.part_id
WHERE PA.PART_NO = :1 )
AS "Cnt_PN_Revisions",
(select pr1.Revision_Name from Part_Revisions pr1
inner join PART pa1 on pa1.part_id = pr1.part_id
WHERE PA1.PART_NO = :1 and pr1.Revision_Level = 0)
AS "Input_Revison_Level",
(select ur.Revision_Name from Unit_Revision ur
WHERE ur.Unit_ID = :2) as "Unit_Revision"
FROM DUAL;
However, Toad's Explain Plan returns Cost:2 Cardinality: 1, which I suspect is due to me using DUAL in my main query. Comparing that to option 2:
select pr.Revision_Name, (select count(*)
from Part_Revisions pr1
where pr1.part_id = pr.part_id) as "Count",
(select ur.Revision_Name
from Unit_Revision ur
where ur.Unit_ID = :2) as "Unit_Revision"
from Part_Revisions pr
inner join PART pa on pa.part_id = pr.part_id
WHERE PA.PART_NO = :1 and pr.Revision_Level = 0
Essentially, I don't really know how to compare the results from my execution plans in order to choose the best design. I have also considered a version of option 1 where, instead of joining twice to the Part table, I select the Part_ID into a local variable and simply query the Part_Revisions table based on that value. However, this is not something I can analyze with Explain Plan.
Your description and select statements look different... I based the procedure on the SQL statements.
PROCEDURE the_proc
(
part_no_in IN VARCHAR2
, revision_level_in IN NUMBER
, unit_id_in IN NUMBER
, part_rev_count_out OUT NUMBER
, part_rev_name_out OUT VARCHAR2
, unit_rev_name_out OUT VARCHAR2
)
AS
BEGIN
SELECT COUNT(*)
INTO part_rev_count_out
FROM part_revisions pr
WHERE EXISTS
(
SELECT 1
FROM part pa
WHERE pa.part_id = pr.part_id
AND pa.part_no = part_no_in
);
SELECT pr1.revision_name
INTO part_rev_name_out
FROM part_revisions pr1
WHERE pr1.revision_level = revision_level_in
AND EXISTS
(
SELECT 1
FROM part pa1
WHERE pa1.part_id = pr1.part_id
AND pa1.part_no = part_no_in
);
SELECT ur.revision_name
INTO unit_rev_name_out
FROM unit_revision ur
WHERE ur.unit_id = unit_id_in;
END the_proc;
It looks like you are obtaining scalar values. Rather than returning a cursor, just return the values through OUT parameters using clean SQL statements. I have done this numerous times from .NET and it works fine.
Procedure get_part_info(p_partnum in part.part_number%type
, ret_count out integer
, ret_revision_level out part_revisions.revision_level%type
, ret_revision_name out part_revisions.revision_name%type) as
begin
select count(*) into ret_count from ....;
select min(revision_level) into ret_revision_level from ...;
select revision_name into ret_revision_name...;
return;
end;

Hive Join returning zero records

I have two Hive tables and I am trying to join both of them. The tables are not clustered or partitioned by any field. Though the tables contain records for common key fields, the join query always returns 0 records. All the data types are 'string' data types.
The join query is simple and looks something like below
select count(*) cnt
from
fsr.xref_1 A join
fsr.ipfile_1 B
on
(
A.co_no = B.co_no
)
;
Any idea what could be going wrong? I have just one record (same value) in both the tables.
Below are my table definitions
CREATE TABLE xref_1
(
co_no string
)
clustered by (co_no) sorted by (co_no asc) into 10 buckets
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
CREATE TABLE ipfile_1
(
co_no string
)
clustered by (co_no) sorted by (co_no asc) into 10 buckets
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
You are using a star schema join. Try writing your query like this:
SELECT COUNT(*) cnt FROM A a JOIN B b ON (a.key1 = b.key1);
If the issue persists, enable automatic map-join conversion:
set hive.auto.convert.join=true;
select count(*) from A join B on (key1 = key2);
Please see Link for more detail.
