Complex query melting my brain! Rails and Postgres

I apologize if I'm missing something really obvious here, but hopefully you'll humour me!
I have these models
Employee - with id, first_name, last_name
Shift Type - with id, shift_name
Date Indices - with id, date
Locations - with id, location
Allocated shifts - with employee_id, shift_type_id, date_index_id, location_id
Now I can write queries that show me allocated shifts, joined with locations, names etc., but what I want is to produce a table that takes dates as columns and employees as rows, giving a roster like this:
|employee |date 1     |date 2     |date 3     |
|'dave'   |early shift|late shift |day off    |
|'martha' |day off    |early shift|early shift|
etc.
I'm sure I'm just pretty dumb, but how can I create these 'virtual' columns and link them to the employee?

You are looking for a "pivot" or "crosstab" query. Postgres has the additional module tablefunc for that. More info in this related answer:
PostgreSQL Crosstab Query
And many links to similar questions on SO from there.
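A minimal sketch of such a crosstab, assuming the tables described in the question (all table, column, and output names here are illustrative, and the output column list must be spelled out to match the actual dates):
```
-- tablefunc ships with Postgres as an extension; enable it once.
CREATE EXTENSION IF NOT EXISTS tablefunc;

-- Pivot: one row per employee, one column per date.
SELECT *
FROM crosstab(
  $$SELECT e.last_name, d.date, s.shift_name
    FROM allocated_shifts a
    JOIN employees    e ON e.id = a.employee_id
    JOIN date_indices d ON d.id = a.date_index_id
    JOIN shift_types  s ON s.id = a.shift_type_id
    ORDER BY 1, 2$$,
  $$SELECT DISTINCT date FROM date_indices ORDER BY 1$$
) AS roster(employee text, date_1 text, date_2 text, date_3 text);
```
The two-argument form of crosstab() keeps rows aligned even when an employee has no shift on some date; those cells simply come back NULL.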

Related

How to delete all logs except last 100 for each user in single table?

I have a single logs table which contains entries for users. I want to prune it, deleting all but the last 100 entries for each user. I'd like to do this in the most efficient way (one statement using ActiveRecord if possible).
I know I can use the following:
.order(created_at: :desc) to get the records sorted
.offset(100) to get all records except the ones I want to keep
.ids to pluck the record ids
.select(:user_id).distinct to get a list of all users in the table
The table has id, user_id, created_at columns (and others not pertinent to this question).
Each user should have at least the last 100 log entries remaining in the logs table.
I'm not really sure how to do this using Ruby syntax with my Log model. If it can't be done efficiently in Ruby then I'll resort to the SQL equivalent.
Any help much appreciated.
In SQL, you could do this:
```
DELETE FROM logs
USING (
  SELECT id
  FROM (
    SELECT id,
           row_number() OVER (PARTITION BY user_id
                              ORDER BY created_at DESC) AS rownr
    FROM logs
  ) AS a
  WHERE rownr > 100
) AS b
WHERE logs.id = b.id;
```
If the table is large, this will be slow.
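If this runs regularly, an index matching the window function's partitioning and ordering can help. This is an addition beyond the original answer, so treat it as a suggestion to verify against your own workload:
```
-- Lets the row_number() window read each user's logs in order
-- instead of sorting the whole table.
CREATE INDEX IF NOT EXISTS idx_logs_user_created
  ON logs (user_id, created_at DESC);
```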

Rails: Polymorphic type relation or not?

I have the three tables below. They are basically "Class Table Inheritance":
Invoices
CardInvoices
BanktransferInvoices
```
Invoices
--------
- id
- total_amount
- invoiceable (either a CardInvoice *or* a BanktransferInvoice)
CardInvoices
--------
- id
- fee_amount
- gateway_error
- ...
BanktransferInvoices
--------
- id
- ...
```
Basically this is a has_one relation Invoice => BankInvoice/CardInvoice.
But I have to extract data from other tables and insert it into these tables (ETL work), so I use raw SQL for the inserts.
So I doubt whether the "polymorphic type model" above is clean, because:
First, metadata has to be inserted to make it work (e.g. invoice_type).
Second, constraints can't be applied.
Third, a direct join doesn't work.
So I'm thinking about changing the pointing direction.
Invoice <= BankInvoice / CardInvoice
With this, I can use the direct join, and check the constraint.
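For reference, the retrieval under that reversed direction could look something like this in SQL (a sketch; the foreign key column names are assumptions):
```
-- Each child table points at invoices, so foreign key constraints
-- and direct joins both work; unmatched sides come back NULL.
SELECT i.id,
       i.total_amount,
       ci.fee_amount,
       ci.gateway_error,
       bi.id AS banktransfer_invoice_id
FROM invoices i
LEFT JOIN card_invoices         ci ON ci.invoice_id = i.id
LEFT JOIN banktransfer_invoices bi ON bi.invoice_id = i.id;
```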
Now the question is:
With the second way, how can I efficiently retrieve an Invoice together with its BankInvoice / CardInvoice in Rails 4.2?
(I'm new to Rails. I think a left outer join would be good, but the more Invoice types are added, the more overhead there is.)
Or can you suggest which implementation would be the better solution, and why?
Thanks,

Select join table fields in SOLR

I have a SQL query, something like this:
```
SELECT p.id,
       p.code,
       l.parent_id
FROM properties p
LEFT JOIN locations l ON l.id = p.location_id;
```
I want to convert this query to a SOLR query. I can join two cores like this:
http://example.com:8999/solr/properties/select?q=*:*&fq={!join from=id to=location_id fromIndex=locations}p_id:12345
But I can't select the fields of the locations core. How can I do this? Any suggestions will be appreciated.
You can use a subquery in the fl parameter, something like this:
```
fl=*,locations:[subquery fromIndex=locations]&locations.q={!terms f=id v=$row.location_id}
```
More info here: https://lucene.apache.org/solr/guide/6_6/transforming-result-documents.html#TransformingResultDocuments-subquery
You can't. Solr does not support returning fields from both ends of a join. Solr is not a relational database, so you're usually better off trying not to use it as one.
Instead, index the information about each location onto each property, and query based on that.
If any location info changes (which, it turns out, usually happens very rarely), reindex the documents assigned to that location.
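Concretely, the extraction feeding the indexer could flatten the join from the question, so each property document already carries its location fields (a sketch; the alias names are assumptions):
```
-- One denormalized row per property; location fields are copied in,
-- so Solr never needs a join at query time.
SELECT p.id,
       p.code,
       l.id        AS location_id,
       l.parent_id AS location_parent_id
FROM properties p
LEFT JOIN locations l ON l.id = p.location_id;
```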

Joining four tables but excluding duplicates

I am trying to join four tables (users, user_payments, content_type and media_content) but I always get duplicates. Instead of seeing, for example, that user Smith purchased media_content_id_purchase 5011 for a price of 3.99 and streamed media_content_id_stream 5000 for a price of 0.001 per minute, I get multiple combinations: media_content_id_purchase 5011 costs 3.99, 1.99, 6.99 etc., with media_content_id_stream also carrying all sorts of prices.
This is my query:
```
SELECT u.surname, up.media_content_id_purchase, ct.purchase_price,
       up.media_content_id_stream, ct.stream_price, ct.min_price
FROM users u, user_payments up, content_type ct, media_content mc
WHERE u.user_id = up.user_id_purchase AND
      up.media_content_id_purchase = mc.media_content_id OR up.media_content_id_purchase IS NULL AND
      ct.content_type_id = mc.content_type_id;
```
My goal is to display each user and what they have consumed with the corresponding prices.
Thanks!!!
Perhaps you should try using SELECT DISTINCT?
http://www.w3schools.com/sql/sql_distinct.asp
As you can see there, SELECT DISTINCT is supposed to return only the different (distinct) values.
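Applied to the query above, it would look like the sketch below. One caveat beyond the linked page: in the original WHERE clause, AND binds tighter than OR, so the OR almost certainly needs parentheses; the grouping shown here is an assumption about the intended logic:
```
SELECT DISTINCT u.surname,
       up.media_content_id_purchase, ct.purchase_price,
       up.media_content_id_stream,   ct.stream_price, ct.min_price
FROM users u, user_payments up, content_type ct, media_content mc
WHERE u.user_id = up.user_id_purchase
  AND (up.media_content_id_purchase = mc.media_content_id
       OR up.media_content_id_purchase IS NULL)
  AND ct.content_type_id = mc.content_type_id;
```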

Change Data Capture with table joins in ETL

In my ETL process I am using Change Data Capture (CDC) to discover only rows that have been changed in the source tables since the last extraction. Then I do the transformation only for these rows. The problem arises when I have, for example, 2 tables which I want to join into one dimension, and only one of them has changed. For example, I have tables Countries and Towns as follows:
Countries:
ID Name
1 France
Towns:
ID Name Country_ID
1 Lyon 1
Now lets say a new row is added to Towns table:
ID Name Country_ID
1 Lyon 1
2 Paris 2
The Countries table has not changed, so CDC for these tables shows me only the row from the Towns table. The problem is that when I join Countries and Towns, there is no row in the Countries change set, so the join results in an empty set.
Do you have an idea how to solve this? Of course there might be more difficult cases, involving 3 or more tables and consequential joins.
This is a typical problem found when doing Realtime Change-Data-Capture, or even Incremental-only daily changes.
There's multiple ways to solve this.
One way would be to do your joins on the natural keys in the dimension or mapping table, to get the associated country (SELECT distinct country_name, [..other attributes..] from dim_table where country_id = X).
Another alternative would be to do the join as part of the change capture process - when a row is loaded to towns, a trigger goes off that loads the foreign key values into the associated staging tables (country, etc).
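A sketch of that trigger idea, assuming Postgres (the question names no DBMS) and a staging table with a unique key on country_id; all names are illustrative:
```
-- Whenever a town row arrives, record its country key in staging,
-- so the country can be (re)loaded even without a CDC row of its own.
CREATE OR REPLACE FUNCTION stage_country_for_town() RETURNS trigger AS $$
BEGIN
  INSERT INTO staging_countries (country_id)
  VALUES (NEW.country_id)
  ON CONFLICT DO NOTHING;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER towns_stage_country
AFTER INSERT ON towns
FOR EACH ROW EXECUTE PROCEDURE stage_country_for_town();
```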
There is a lot I could go on about, but I will be specific to what is in your question. I would suggest the following to get the results:
1st pass: everything that matches via the join.
Union All
2nd pass: all towns where there isn't a country (a left outer join with a WHERE condition that requires the ID in the Countries table to be null/missing).
You would default the Country ID value in that unmatched join to a designated "unmatched value"; typically 0 or -1 is used, or a series of standard negative numbers that you can assign descriptions to later to identify why the data is bad. For your example, -1 could be "Found Town Without Country".
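A sketch of that two-pass approach, assuming the change set arrives in a table such as towns_changes (the names and the -1 default are illustrative):
```
-- 1st pass: towns whose country is present.
SELECT t.id, t.name, c.id AS country_id, c.name AS country_name
FROM towns_changes t
JOIN countries c ON c.id = t.country_id
UNION ALL
-- 2nd pass: towns with no matching country; default the key to -1.
SELECT t.id, t.name, -1 AS country_id,
       'Found Town Without Country' AS country_name
FROM towns_changes t
LEFT JOIN countries c ON c.id = t.country_id
WHERE c.id IS NULL;
```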
