Rails: sum should return nil if the given column's values are null - ruby-on-rails

Currently the sum function in Rails returns 0.0 if the provided column's data is null
============================================================================
For example:
Tablename: Price
id | name | Cost
-----------------
1 | A | 1200
2 | A | 2500
3 | A | 3000
4 | B | 5000
5 | B | 7000
6 | C |
Now,
Price.group(:name).sum(:cost)
returns 6700, 12000, 0.0 instead of 6700, 12000, nil.
So here I want nil if all of the given column's values are NULL or empty.

SUM ignores NULL values, so a group whose cost values are all NULL always ends up as 0 in Rails (zero plus nothing is 0).
To work around this, I used a condition like:
Price.where("cost IS NOT NULL").group(:name).sum(:cost)
This query selects only the non-NULL cost values and sums them. After that, I can fill in NULL as the cost for the other records (the names missing from the result).
This way, if the cost really is 0.0, I get sum(cost) as 0.0 instead of NULL.

Because of how sum is implemented in Rails, it is impossible to get the values before they are type cast.
However, you can get the expected values by selecting the values with raw SQL:
Price.group(:name).select('name, sum(cost) AS total_cost')
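To see what the raw SQL returns before any type cast, here is a minimal sketch using Python's built-in sqlite3 module instead of Rails. The table name prices and the in-memory SQLite database are assumptions for illustration only; your actual database may format numbers differently, but standard SQL SUM yields NULL for a group with no non-NULL values.

```python
import sqlite3

# In-memory database standing in for the real one (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (id INTEGER PRIMARY KEY, name TEXT, cost REAL)")
conn.executemany(
    "INSERT INTO prices (id, name, cost) VALUES (?, ?, ?)",
    [(1, "A", 1200), (2, "A", 2500), (3, "A", 3000),
     (4, "B", 5000), (5, "B", 7000), (6, "C", None)],
)

# SUM skips NULLs; a group with only NULL values yields NULL (None in Python).
rows = conn.execute(
    "SELECT name, SUM(cost) AS total_cost FROM prices GROUP BY name ORDER BY name"
).fetchall()
totals = dict(rows)
print(totals)
```

Group C comes back as None here, which corresponds to the nil the question asks for.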

Related

Hive DB - struct datatype join - with different structure elements

I am pretty new to working with Hive DB and struct data types. I have only used basic SELECT statements until now.
I need to join two tables to combine them in my SELECT statement.
The tables have a struct column with the same name, but with different elements inside. This is what the tables look like:
TABLE 1
table_one(
eventid string,
new struct<color:string, size:string, weight:string, number:string, price:string>,
date string
)
11 | {"color":"yellow", "size":"xl", "weight":"10", "number":"1111", "price":"1"} | 08-21-2004
12 | {"color":"yellow", "size":"xxl", "weight":"12", "number":"2111", "price":"2"} | 08-21-2004
TABLE 2
table_two(
eventid string,
new struct<number:string, price:string>,
date string,
person string)
11 | {"number":"31", "price":"1"} | 08-21-2004 | john
12 | {"number":"32", "price":"2"} | 08-21-2004 | joe
With a SELECT query I need to get the value of element 'color' from table_one, but instead I am getting the value of element 'number' from table_two. The query is the following:
select
s.eventid,
v.date,
s.new.color,
s.new.size
from table_one s join table_two v where s.eventid = v.eventid;
With s.new.color, instead of getting, for example, the value 'yellow' from table_one, I am getting the value '31' from table_two. How am I supposed to get the wanted value from table_one?
Expected result:
11 | 08-21-2004 | yellow | xl
But I got:
11 | 08-21-2004 | 31 | 1
So how can I select the proper value from a struct column of the desired table?
(Please keep in mind that this is just a simplified example of my problem. I didn't provide the exact code or table structures, to make this clearer for anyone who tries to answer. I need to use a join because I need the proper values for some columns from table_two.)

How do I fill a column with a specific fixed value for a table using select

I want to select some data, but one of its columns has NULL values. I want the query to show a specific fixed value instead of NULL, without changing the database.
i.e.
V_fruits
number | fruit | three
1 | apple | <null>
2 | pinapple | <null>
3 | grape | <null>
4 | lemon | <null>
I am trying it using
Select "number","fruit",case when "three" is null then three='ofcourse' from V_fruits
I would like some guidance on this, please. It is Psql.
Expected
V_fruits
number fruit three
1 apple Of course
2 pinapple Of course
3 grape Of course
4 lemon Of course
Obtained
V_fruits
number fruit three
1 apple false
2 pinapple false
3 grape false
4 lemon false
Use this instead:
Select "number","fruit",case when "three" is null then 'ofcourse' END from V_fruits
The difference is that, in the THEN branch, three='ofcourse' is evaluated as a comparison rather than an assignment, so the query shows the result of that comparison (false in the output above) instead of the text 'ofcourse'.
Alternatively, you could use:
SELECT "number", "fruit", COALESCE(three, 'ofcourse') FROM v_fruits;
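Both forms can be tried out without touching real data. The following is a minimal sketch using Python's built-in sqlite3 module with an in-memory table standing in for V_fruits (an assumption; the question targets PostgreSQL, where CASE and COALESCE behave the same way for this purpose). An ELSE branch is added to the CASE so that non-NULL values would pass through unchanged.

```python
import sqlite3

# In-memory stand-in for the V_fruits view (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE v_fruits (number INTEGER, fruit TEXT, three TEXT)")
conn.executemany(
    "INSERT INTO v_fruits VALUES (?, ?, ?)",
    [(1, "apple", None), (2, "pinapple", None), (3, "grape", None), (4, "lemon", None)],
)

# CASE form: replace NULL with a fixed value, keep any existing value.
case_rows = conn.execute(
    "SELECT number, fruit, CASE WHEN three IS NULL THEN 'ofcourse' ELSE three END "
    "FROM v_fruits ORDER BY number"
).fetchall()

# COALESCE form: returns the first non-NULL argument.
coalesce_rows = conn.execute(
    "SELECT number, fruit, COALESCE(three, 'ofcourse') FROM v_fruits ORDER BY number"
).fetchall()

# The stored data is untouched: the column is still NULL in every row.
null_count = conn.execute("SELECT COUNT(*) FROM v_fruits WHERE three IS NULL").fetchone()[0]
print(case_rows, coalesce_rows, null_count)
```

Note that a CASE without ELSE returns NULL for rows that do not match the WHEN condition, which is why COALESCE is usually the shorter choice here.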

PostgreSQL: Order by multiple column weights

I'm using PostgreSQL 9.4. I've got a resources table which has the following columns:
id
name
provider
description
category
Let's say none of these columns is required (save for the id). I want resources to have a completion level, meaning that resources with NULL values for each column will be at 0% completion level.
Now, each column has a percentage weight. Let's say:
name: 40%
provider: 30%
description: 20%
category: 10%
So if a resource has a provider and a category, its completion level is at 40%.
These weight percentages could change at any time, so having a completion_level column which always holds the value of the completion level will not work out (there could be millions of resources). For example, at any moment, the percentage weight of description could decrease from 20% to 10% and category's could increase from 10% to 20%. Other columns might even be created later and have their own weights.
The final objective is to be able to order resources by their completion levels.
I'm not sure how to approach this. I'm currently using Rails so almost all interaction with the database has been through the ORM, which I believe is not going to be much help in this case.
The only query I've found that somewhat resembles a solution (and not really) is to do something like the following:
SELECT * from resources
ORDER BY CASE WHEN name IS NOT NULL AND
provider IS NOT NULL AND
description is NOT NULL AND
category IS NOT NULL THEN 100
WHEN name is NULL AND provider IS NOT NULL...
However, there I would have to enumerate every possible combination, and that's pretty bad.
Add a weights table as in this SQL Fiddle:
PostgreSQL 9.6 Schema Setup:
CREATE TABLE resource_weights
( id int primary key check(id = 1)
, name numeric
, provider numeric
, description numeric
, category numeric);
INSERT INTO resource_weights
(id, name, provider, description, category)
VALUES
(1, .4, .3, .2, .1);
CREATE TABLE resources
( id int
, name varchar(50)
, provider varchar(50)
, description varchar(50)
, category varchar(50));
INSERT INTO resources
(id, name, provider, description, category)
VALUES
(1, 'abc', 'abc', 'abc', 'abc'),
(2, NULL, 'abc', 'abc', 'abc'),
(3, NULL, NULL, 'abc', 'abc'),
(4, NULL, 'abc', NULL, NULL);
Then calculate your weights at runtime like this
Query 1:
select r.*
, case when r.name is null then 0 else w.name end
+ case when r.provider is null then 0 else w.provider end
+ case when r.description is null then 0 else w.description end
+ case when r.category is null then 0 else w.category end weight
from resources r
cross join resource_weights w
order by weight desc
Results:
| id | name | provider | description | category | weight |
|----|--------|----------|-------------|----------|--------|
| 1 | abc | abc | abc | abc | 1 |
| 2 | (null) | abc | abc | abc | 0.6 |
| 3 | (null) | (null) | abc | abc | 0.3 |
| 4 | (null) | abc | (null) | (null) | 0.3 |
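To check that the ordering really follows the weights table at query time, here is a minimal sketch using Python's built-in sqlite3 module rather than PostgreSQL (an assumption for illustration). It also stores the weights as whole percentages instead of fractions, purely to avoid floating-point noise in the comparison; the helper name ranked is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE resource_weights (
    id INTEGER PRIMARY KEY CHECK (id = 1),
    name INTEGER, provider INTEGER, description INTEGER, category INTEGER);
INSERT INTO resource_weights VALUES (1, 40, 30, 20, 10);
CREATE TABLE resources (
    id INTEGER, name TEXT, provider TEXT, description TEXT, category TEXT);
INSERT INTO resources VALUES
    (1, 'abc', 'abc', 'abc', 'abc'),
    (2, NULL, 'abc', 'abc', 'abc'),
    (3, NULL, NULL, 'abc', 'abc'),
    (4, NULL, 'abc', NULL, NULL);
""")

QUERY = """
SELECT r.id,
       CASE WHEN r.name IS NULL THEN 0 ELSE w.name END
     + CASE WHEN r.provider IS NULL THEN 0 ELSE w.provider END
     + CASE WHEN r.description IS NULL THEN 0 ELSE w.description END
     + CASE WHEN r.category IS NULL THEN 0 ELSE w.category END AS weight
FROM resources r
CROSS JOIN resource_weights w
ORDER BY weight DESC, r.id
"""

def ranked(connection):
    """Return (id, weight) pairs ordered by the current weights."""
    return connection.execute(QUERY).fetchall()

before = ranked(conn)
# Changing a weight only touches the single weights row, never the resources.
conn.execute("UPDATE resource_weights SET provider = 10, category = 30 WHERE id = 1")
after = ranked(conn)
print(before, after)
```

Because the weight is computed in the query, changing the weights row immediately changes the ordering without rewriting any resource rows.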
SQL's ORDER BY can order things by pretty much any expression; in particular, you can order by a sum. CASE is also fairly versatile (if somewhat verbose) and an expression so you can say things like:
case when name is not null then 40 else 0 end
which is more or less equivalent to name.nil? ? 0 : 40 in Ruby.
Putting those together:
order by case when name is not null then 40 else 0 end
+ case when provider is not null then 30 else 0 end
+ case when description is not null then 20 else 0 end
+ case when category is not null then 10 else 0 end
Somewhat verbose but it'll do the right thing. Translating that into ActiveRecord is fairly easy:
query.order(Arel.sql(%q{
case when name is not null then 40 else 0 end
+ case when provider is not null then 30 else 0 end
+ case when description is not null then 20 else 0 end
+ case when category is not null then 10 else 0 end
}))
or in the other direction:
query.order(Arel.sql(%q{
case when name is not null then 40 else 0 end
+ case when provider is not null then 30 else 0 end
+ case when description is not null then 20 else 0 end
+ case when category is not null then 10 else 0 end
desc
}))
You'll need the Arel.sql call to avoid deprecation warnings in Rails 5.2+. Rails no longer wants you to call order(some_string); it expects you to order by attributes, unless you jump through a hoop like Arel.sql to say that you really mean it.
Sum up weights like this:
SELECT * FROM resources
ORDER BY (CASE WHEN name IS NULL THEN 0 ELSE 40 END
+ CASE WHEN provider IS NULL THEN 0 ELSE 30 END
+ CASE WHEN description IS NULL THEN 0 ELSE 20 END
+ CASE WHEN category IS NULL THEN 0 ELSE 10 END) DESC;
This is how I would do it.
First: Weights
Since you say that the weights can change from time to time, you have to create a structure to handle the changes. It could be a simple table. For this solution, it will be called weights.
-- Table: weights
CREATE TABLE weights(id serial, table_name text, column_name text, weight numeric(5,2));
id | table_name | column_name | weight
---+------------+--------------+--------
1 | resources | name | 40.00
2 | resources | provider | 30.00
3 | resources | description | 20.00
4 | resources | category | 10.00
So, when you need to change category from 10 to 20 and/or description from 20 to 10, you update this table.
Second: completion_level
Since you say that you could have millions of rows, it is OK to have a completion_level column in the resources table, for efficiency.
A query that computes the completion_level works, and you could put it in a view. But when you need the data fast and simple and you have MILLIONS of rows, it is better to store the value in a column or in another table.
When you have a view, every time you query it, it recomputes the data. When the value is already in the table, it's fast and you don't have to recompute anything; you just query the data.
But how can you maintain completion_level? TRIGGERS
You would have to create a trigger for the resources table. So, whenever you update or insert data, it will compute the completion level.
First you add the column to the resources table
ALTER TABLE resources ADD COLUMN completion_level numeric(5,2);
And then you create the trigger:
CREATE OR REPLACE FUNCTION update_completion_level() RETURNS trigger AS $$
BEGIN
NEW.completion_level := (
CASE WHEN NEW.name IS NULL THEN 0
ELSE (SELECT weight FROM weights WHERE column_name='name') END
+ CASE WHEN NEW.provider IS NULL THEN 0
ELSE (SELECT weight FROM weights WHERE column_name='provider') END
+ CASE WHEN NEW.description IS NULL THEN 0
ELSE (SELECT weight FROM weights WHERE column_name='description') END
+ CASE WHEN NEW.category IS NULL THEN 0
ELSE (SELECT weight FROM weights WHERE column_name='category') END
);
RETURN NEW;
END $$ LANGUAGE plpgsql;
CREATE TRIGGER resources_completion_level
BEFORE INSERT OR UPDATE
ON resources
FOR EACH ROW
EXECUTE PROCEDURE update_completion_level();
NOTE: table weights has a column called table_name; it's just in case you want to expand this functionality to other tables. In that case, you should update the trigger and add AND table_name='resources' in the query.
With this trigger, every time you update or insert you would have your completion_level ready so getting this data would be a simple query on resources table ;)
Third: What about old data and updates on weights?
Since the trigger only works for update and inserts, what about old data? or what if I change the weights of the columns?
Well, for those cases you could use a function to recreate all completion_level for every row.
CREATE OR REPLACE FUNCTION update_resources_completion_level() RETURNS void AS $$
BEGIN
UPDATE resources set completion_level = (
CASE WHEN name IS NULL THEN 0
ELSE (SELECT weight FROM weights WHERE column_name='name') END
+ CASE WHEN provider IS NULL THEN 0
ELSE (SELECT weight FROM weights WHERE column_name='provider') END
+ CASE WHEN description IS NULL THEN 0
ELSE (SELECT weight FROM weights WHERE column_name='description') END
+ CASE WHEN category IS NULL THEN 0
ELSE (SELECT weight FROM weights WHERE column_name='category') END
);
END $$ LANGUAGE plpgsql;
So every time you update the weights, or need to refresh the OLD data, you just run the function:
SELECT update_resources_completion_level();
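The recompute step can be sketched outside PostgreSQL as well. The following uses Python's built-in sqlite3 module, so there is no plpgsql function here: the same UPDATE statement is simply kept in a string and re-run through a hypothetical recompute helper. It also adds the table_name filter mentioned in the note above and stores weights as whole numbers; all of that is an assumption for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE weights (
    id INTEGER PRIMARY KEY, table_name TEXT, column_name TEXT, weight INTEGER);
INSERT INTO weights (table_name, column_name, weight) VALUES
    ('resources', 'name', 40), ('resources', 'provider', 30),
    ('resources', 'description', 20), ('resources', 'category', 10);
CREATE TABLE resources (
    id INTEGER PRIMARY KEY, name TEXT, provider TEXT,
    description TEXT, category TEXT, completion_level INTEGER);
INSERT INTO resources (id, name, provider, description, category) VALUES
    (1, 'abc', 'abc', 'abc', 'abc'),
    (2, NULL, 'abc', NULL, 'abc');
""")

def weight_of(column):
    """SQL subquery that looks up the current weight of one column."""
    return ("(SELECT weight FROM weights "
            f"WHERE table_name = 'resources' AND column_name = '{column}')")

COLUMNS = ["name", "provider", "description", "category"]
RECOMPUTE = "UPDATE resources SET completion_level = " + " + ".join(
    f"CASE WHEN {col} IS NULL THEN 0 ELSE {weight_of(col)} END" for col in COLUMNS
)

def recompute(connection):
    """Rewrite completion_level for every row and return {id: level}."""
    connection.execute(RECOMPUTE)
    return dict(connection.execute(
        "SELECT id, completion_level FROM resources ORDER BY id"))

initial = recompute(conn)
# Swap the weights of description and category, then refresh the stored values.
conn.execute("UPDATE weights SET weight = 10 WHERE column_name = 'description'")
conn.execute("UPDATE weights SET weight = 20 WHERE column_name = 'category'")
refreshed = recompute(conn)
print(initial, refreshed)
```

The column names are interpolated from a fixed list in the code, not from user input, so building the statement with string formatting is acceptable in this sketch.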
Finally: What if I add columns?
Well, you would have to insert the new column in the weights table and update the functions (the trigger function and update_resources_completion_level()). Once everything is set, you run the function update_resources_completion_level() to set all completion levels according to the changes :D

Returning all results in one column, but using a where statement to filter another column

I'm using SQLite 3 and have a table that looks like the following:
PKEY|TS | A |B |C
1 |00:05:00|200|200|200
2 |00:10:00|100|100|100
3 |00:15:00| |25 |
4 |00:20:00| | |
Currently I'm using
select ts, (a+b+c) from tablename WHERE a !='null' AND b !='null' and c !='null';
which returns
TS | (a+b+c)
00:05:00|600
00:10:00|300
I want my results to look like the following though:
TS | total
00:05:00|600
00:10:00|300
00:15:00|
00:20:00|
So in other words, I always want to return everything from the TS column, but I don't want to return the total unless A,B, and C have values.
I think I might need a union or a join, but I can't seem to find an example where the results from a single column are always returned.
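Testing for a != 'null' compares the column with the string 'null', not with SQL NULL, and the comparison is never true for a row where a is NULL. To test for NULL, you need to use a is not null.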
However, to return a null total if any of the numbers are null, just do this:
select ts, a+b+c
from tablename;
Why does this work? Because in SQL, if any operand of an arithmetic expression is null, the whole expression is null. This makes sense when you consider that null means "unknown". Logically, when part of a calculation is unknown, the result is unknown.
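A quick way to confirm the NULL propagation is a minimal sketch with Python's built-in sqlite3 module. It assumes the blank cells in the question's table are real NULLs rather than empty strings; the column types are also an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename (pkey INTEGER PRIMARY KEY, ts TEXT, "
    "a INTEGER, b INTEGER, c INTEGER)"
)
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?, ?, ?)",
    [
        (1, "00:05:00", 200, 200, 200),
        (2, "00:10:00", 100, 100, 100),
        (3, "00:15:00", None, 25, None),
        (4, "00:20:00", None, None, None),
    ],
)

# No WHERE clause: every ts is returned, and a+b+c is NULL when any operand is NULL.
totals = conn.execute(
    "SELECT ts, a + b + c AS total FROM tablename ORDER BY pkey"
).fetchall()

# For comparison, the IS NOT NULL filter drops the incomplete rows entirely.
complete_only = conn.execute(
    "SELECT ts FROM tablename "
    "WHERE a IS NOT NULL AND b IS NOT NULL AND c IS NOT NULL ORDER BY pkey"
).fetchall()
print(totals, complete_only)
```

Row 3 has a value in b, but its total is still NULL because a and c are NULL.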

Selecting a date range where date is not null with Propel

Using Propel I would like to find records which have a date field which is not null and also between a specific range.
N.B. Unfortunately, as this is part of a larger query, I cannot utilise a custom SQL query here.
For example: I may have records like this:
---------------------
| ID | DUE_DATE |
---------------------
| 1 | NULL |
| 2 | 01/01/2010 |
| 3 | 02/01/2010 |
| 4 | NULL |
| 5 | 05/01/2010 |
---------------------
I may want to return all the rows with a due_date between 01/01/2010 and 02/01/2010 but I don't want to return those records where due_date is NULL.
In the example I only want to return rows 2 and 3.
However, Propel seems to overwrite my NOTNULL criteria.
Is it possible to do this with Propel?
Thanks!
Why do you create the separate Criterion objects?
$start_date = mktime(0, 0, 0, date("m") , date("d")+$start, date("Y"));
$end_date = mktime(0, 0, 0, date("m") , date("d")+$end, date("Y"));
$c = new Criteria();
$c->add(TaskPeer::DUE_DATE, $end_date, Criteria::LESS_EQUAL);
$c->addAnd(TaskPeer::DUE_DATE, $start_date, Criteria::GREATER_EQUAL);
$c->addAnd(TaskPeer::DUE_DATE, null, Criteria::ISNOTNULL);
When I try this in Propel 1.2, 1.3 or 1.4, I get the following SQL statement:
SELECT task.TASK_ID, task.DUE_DATE FROM task WHERE ((task.DUE_DATE<=:p1 AND task.DUE_DATE>=:p2) AND task.DUE_DATE IS NOT NULL )
The $c->add() method replaces the current criterion for the given field. You create your Criterions for TaskPeer::DUE_DATE, so they will always replace the previous ones.
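The generated WHERE clause can be checked against the sample rows from the question. This is a minimal sketch using Python's built-in sqlite3 module rather than Propel; the ISO date strings (reading the question's dates as day/month/year) and the lower-case column names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task (task_id INTEGER PRIMARY KEY, due_date TEXT)")
conn.executemany(
    "INSERT INTO task (task_id, due_date) VALUES (?, ?)",
    [(1, None), (2, "2010-01-01"), (3, "2010-01-02"), (4, None), (5, "2010-01-05")],
)

# Same shape as the statement above: upper bound, lower bound, then IS NOT NULL.
WHERE_CLAUSE = (
    "((task.due_date <= :p1 AND task.due_date >= :p2) "
    "AND task.due_date IS NOT NULL)"
)
params = {"p1": "2010-01-02", "p2": "2010-01-01"}

matching_ids = [
    row[0]
    for row in conn.execute(
        "SELECT task.task_id FROM task WHERE " + WHERE_CLAUSE
        + " ORDER BY task.task_id",
        params,
    )
]
print(matching_ids)
```

Only rows 2 and 3 match, which is the result the question asks for.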
I didn't get the "remove the null entries" section; I think it will produce: tasks.due_date IS NULL AND tasks.due_date IS NULL.
Anyway, maybe you can use Criteria::CUSTOM to write raw-SQL WHERE clause? Example from Propel documentation:
$con = Propel::getConnection(ReviewPeer::DATABASE_NAME);
$c = new Criteria();
$c->add(ReviewPeer::REVIEW_DATE, 'to_date('.ReviewPeer::REVIEW_DATE.', \'YYYY-MM-DD\') = '.$con->quote($date->format('Y-m-d')), Criteria::CUSTOM);
