1:N joins of KSQL streams and tables - ksqldb

I'm currently using Debezium to stream events from two tables of a commercial application, INVOICE and INVOICE_LINE. As you can guess from the names, there is a 1:N relationship between the two tables: each invoice has one row in the INVOICE table and N rows in INVOICE_LINE.
INVOICE
-------------
ID
DESCRIPTION
INVOICE_DATE
STATUS
-------------
INVOICE_LINE
------------
ID
INVOICE_ID
DESCRIPTION
AMOUNT
PRICE
CURRENCY
------------
Invoices are initially created in status 'NEW' but they are not officially finished (and therefore ready for additional processing) until they are in status 'PROCESSED'.
Using the topics created by Debezium, I would like to provide my consumer with the invoices that have changed their status to 'PROCESSED' (header and lines). Several days can pass between the initial creation of an invoice and its status changing to 'PROCESSED'.
Because of this, my strategy was to create a KSQL table of the invoice lines and join it with a filtered KSQL stream of processed invoices. However, this does not seem to be possible, as only the key of the KSQL table can be used for joining. The usual recommendation is to re-key the KSQL table, but as there can be more than one INVOICE_LINE per INVOICE, re-keying by INVOICE_ID will not work in my case.
What's the recommended approach for a 1:N join like the one I described above?
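One way to reason about the 1:N case is to first collapse the lines into a single entry per INVOICE_ID, so that the invoice ID becomes the table key, and then join the filtered stream of 'PROCESSED' invoices against that table. The Python sketch below only simulates this idea in memory. It is not ksqlDB code, and the event dictionaries and function names are hypothetical.

```python
from collections import defaultdict

def build_lines_table(line_events):
    """Materialise INVOICE_LINE change events as a table keyed by INVOICE_ID.

    Each value maps line ID -> latest version of that line, so an update to a
    line overwrites the earlier version instead of being stored twice.
    """
    table = defaultdict(dict)
    for line in line_events:
        table[line["INVOICE_ID"]][line["ID"]] = line
    return table

def processed_invoices_with_lines(invoice_events, lines_table):
    """Yield (header, lines) for every invoice event whose STATUS is PROCESSED."""
    for invoice in invoice_events:
        if invoice["STATUS"] == "PROCESSED":
            lines = list(lines_table.get(invoice["ID"], {}).values())
            yield invoice, lines

# Hypothetical sample events (only the fields needed for the join are shown).
line_events = [
    {"ID": 1, "INVOICE_ID": 10, "DESCRIPTION": "Widget", "AMOUNT": 2},
    {"ID": 2, "INVOICE_ID": 10, "DESCRIPTION": "Gadget", "AMOUNT": 1},
    {"ID": 3, "INVOICE_ID": 11, "DESCRIPTION": "Cable", "AMOUNT": 5},
]
invoice_events = [
    {"ID": 10, "STATUS": "NEW"},
    {"ID": 11, "STATUS": "NEW"},
    {"ID": 10, "STATUS": "PROCESSED"},
]

lines_table = build_lines_table(line_events)
joined = list(processed_invoices_with_lines(invoice_events, lines_table))
```

Here `joined` holds one entry, the header of invoice 10 together with its two lines. Invoice 11 is left out because it never reaches 'PROCESSED'. Whether an equivalent aggregation is practical in KSQL depends on the version in use and on how updates and deletes of lines should be handled.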

Related

Is there a way to consolidate multiple rows of data in Microsoft Access?

I'm very new to Microsoft Access, and I'm trying to figure out how to output data from a table into a format that's more concise and easier to read.
I have transaction data throughout the year for specific customers. Based off of the Customer number and Catalog number, I would like to consolidate multiple instances of a customer purchasing the same product into one row, run a sum in the Quantity column for instances of the same product, and run a sum in the Price column for instances of the same product.
(Screenshots of the desired output and of the input data were attached here.)
I tried using an append Query with Customer # and Catalog # as the primary key to try to filter out duplicate entries, but this wasn't working. I'm not super familiar with writing functions in Access, but willing to try anything.
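What is described here is an aggregate query: group the rows by customer number and catalog number, and sum quantity and price within each group. The sketch below shows the shape of such a query using Python's built-in sqlite3 module rather than Access, so the table name, column names and sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "customer_no TEXT, catalog_no TEXT, quantity INTEGER, price REAL)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [
        ("C001", "A-100", 2, 20.0),
        ("C001", "A-100", 3, 30.0),  # same customer and product as the row above
        ("C001", "B-200", 1, 15.0),
        ("C002", "A-100", 4, 40.0),
    ],
)

# One output row per (customer, product) pair, with quantity and price summed.
totals = conn.execute(
    """
    SELECT customer_no, catalog_no,
           SUM(quantity) AS total_quantity,
           SUM(price)    AS total_price
    FROM transactions
    GROUP BY customer_no, catalog_no
    ORDER BY customer_no, catalog_no
    """
).fetchall()
```

The two rows for customer C001 and product A-100 collapse into one row with a quantity of 5 and a price of 50.0. No primary key or append step is needed, because the grouping itself removes the duplicates.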

joining hr_organization_information and mtl_parameters in Oracle apps

I have seen many join conditions using hr_all_organization_units and mtl_parameters,
but is it possible to join hr_organization_information with mtl_parameters?
From the documentation, I could not work out the difference between hr_all_organization_units and hr_organization_information.
select * from hr_organization_information hou, mtl_parameters mp where
mp.organization_id=hou.organization_id;
Is the above query logically correct in Oracle EBS?
hr_all_organization_units holds all organizations, regardless of their classification, e.g. Operating Units, HR Organizations, Inventory Organizations, etc.
mtl_parameters has records only for Inventory Organizations, to store additional inventory related information.
hr_organization_information is a generic table that stores attributes for each organization, e.g. org_information_context='CLASS' to define the type of organization.
You can link this table directly with mtl_parameters as you did in the example, but you would:
only find records for inventory organizations
have duplicate records as you have more than one type of org_information_context for each organization in the hr_organization_information table.
Please note that one organization in hr_all_organization_units can have different classifications at the same time, e.g. Operating Unit and Inventory Organization.
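The duplicate-row effect can be reproduced on toy data. The sketch below uses Python's sqlite3 module with heavily simplified stand-ins for the two tables; the sample rows, the organization code 'V1' and the classification values are made up for illustration and are not taken from a real EBS instance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified stand-ins for the EBS tables (only the columns needed here).
    CREATE TABLE mtl_parameters (organization_id INTEGER, organization_code TEXT);
    CREATE TABLE hr_organization_information (
        organization_id INTEGER,
        org_information_context TEXT,
        org_information1 TEXT
    );
    INSERT INTO mtl_parameters VALUES (204, 'V1');
    INSERT INTO hr_organization_information VALUES
        (204, 'CLASS', 'INV'),
        (204, 'CLASS', 'OPERATING_UNIT'),
        (204, 'Accounting Information', NULL),
        (999, 'CLASS', 'HR_ORG');  -- not an inventory organization
    """
)

join_sql = """
    SELECT mp.organization_code, hoi.org_information_context, hoi.org_information1
    FROM hr_organization_information hoi
    JOIN mtl_parameters mp ON mp.organization_id = hoi.organization_id
"""

# Plain join: one row per information record of each inventory organization.
all_rows = conn.execute(join_sql).fetchall()

# Restricting to a single context and classification leaves one row per organization.
inv_rows = conn.execute(
    join_sql
    + " WHERE hoi.org_information_context = 'CLASS'"
    + " AND hoi.org_information1 = 'INV'"
).fetchall()
```

The plain join returns three rows for the single inventory organization and none for organization 999, while the filtered join returns exactly one. On a real system, the filter values would need to match whatever contexts and classification codes are actually stored there.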
Here is an example dataset from an Oracle Vision environment, which has one record per organization and shows their classifications in columns G to Q:
https://www.enginatics.com/example/per-organizations/
The PER Organizations Blitz Report shows the link between the organization and the org information table.

Reducing over multiple joins in CouchDB

In my CouchDB database, I have the following models (implemented as documents in the database with different type fields):
Team: name, id (has many matches, has many fans)
Match: name, team_a, team_b, time (has many teams, has many tweets)
Fan: team_id (has many tweets)
Tweet: time, sentiment, fan_id
I want to average the tweet sentiment for each team. If I were using SQL I'd do it like this:
SELECT avg(sentiment)
FROM team
JOIN match on team.id = match.team_a OR team.id = match.team_b
JOIN fan on fan.team_id = team.id
JOIN tweet on (tweet.time BETWEEN match.time AND match.time + interval '1 hour') AND tweet.fan_id = fan.id
GROUP BY team.id
However, in CouchDB you can do at most one join in a view function, as explained in the docs (by emitting the join field as the key).
How can this be better modelled in CouchDB to allow for this query to work? I don't really want to denormalise too much, but I guess I will if I have to?
It's a bit complex, but I use what I call "tertiary indexes". The goal is to be able to write a view that is applied to another view. Unfortunately, the only way to do this is to use a view to write data to a secondary database and then have another view that works on that database. Doing this requires an outside process - I use a script that listens to the _changes feed of the primary database, and then updates the relevant documents in the secondary database when something changes.
So in your example your secondary database could consist of a single document for each team with all of the (or the latest) match/fan/tweet data in that one document. Then you write a view that extracts the sentiment (or whatever) from that secondary database.
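As a rough illustration of what the outside process and the second view would compute, here is an in-memory Python sketch. It makes no CouchDB calls at all: the documents are plain dictionaries, times are assumed to be epoch seconds, fan documents are assumed to carry an id, and every function name is hypothetical.

```python
from collections import defaultdict

HOUR = 3600  # seconds; tweet and match times are assumed to be epoch seconds

def build_team_docs(docs):
    """Roll primary-database documents up into one summary document per team."""
    teams = {
        d["id"]: {"_id": d["id"], "name": d["name"], "sentiments": []}
        for d in docs
        if d["type"] == "team"
    }
    # Start times of all matches each team plays in.
    match_times = defaultdict(list)
    for d in docs:
        if d["type"] == "match":
            match_times[d["team_a"]].append(d["time"])
            match_times[d["team_b"]].append(d["time"])
    fan_team = {d["id"]: d["team_id"] for d in docs if d["type"] == "fan"}
    for d in docs:
        if d["type"] != "tweet":
            continue
        team_id = fan_team.get(d["fan_id"])
        # Keep the tweet if it falls within one hour of any of the team's matches.
        if team_id in teams and any(
            t <= d["time"] <= t + HOUR for t in match_times[team_id]
        ):
            teams[team_id]["sentiments"].append(d["sentiment"])
    return list(teams.values())

def average_sentiment(team_docs):
    """Play the role of the view over the secondary database: average per team."""
    return {
        doc["_id"]: sum(doc["sentiments"]) / len(doc["sentiments"])
        for doc in team_docs
        if doc["sentiments"]
    }

docs = [
    {"type": "team", "id": "t1", "name": "Reds"},
    {"type": "team", "id": "t2", "name": "Blues"},
    {"type": "match", "name": "Reds v Blues", "team_a": "t1", "team_b": "t2", "time": 1000},
    {"type": "fan", "id": "f1", "team_id": "t1"},
    {"type": "fan", "id": "f2", "team_id": "t2"},
    {"type": "tweet", "time": 1500, "sentiment": 0.5, "fan_id": "f1"},
    {"type": "tweet", "time": 2000, "sentiment": 1.0, "fan_id": "f1"},
    {"type": "tweet", "time": 1200, "sentiment": -0.5, "fan_id": "f2"},
    {"type": "tweet", "time": 9000, "sentiment": 1.0, "fan_id": "f2"},  # outside the hour
]

averages = average_sentiment(build_team_docs(docs))
```

In a real setup, build_team_docs would be replaced by the script that follows the _changes feed and updates only the affected team documents, rather than rebuilding everything from scratch.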

How to store data in fact table with multiple products in an order in data warehouse

I am trying to design a dimensional model for a data warehouse for one of my projects (Sales Order). I'm new to this concept.
So far, I could understand that the product, customer and date can be stored in the dimension table and the order info will be in the fact table.
Date_dimension table structure will be
date_dim_id, date, week_number, month_number
Product_dimension table structure will be
product_dim_id, product_name, desc, sku
Order_fact table structure will be
order_id, product_dim_id(fk), date_dim_id(fk), order_quantity, order_total_price, etc
If an order is placed with two or more products, will there be repeated entries in the order_fact table for the same order_id and date_dim_id?
Please help with this, as I'm confused. I know that in a relational database, the order table will have one entry per order, and the relation between product and order will be maintained in a separate table with order_id and product_id as foreign keys.
Thanks in advance.
This is a classic case where you should (probably) have two fact tables:
FactOrderHeader and FactOrderDetail.
FactOrderHeader will have a record for each order, storing information regarding the value of the order and any order level discounts; though they could be expressed as an OrderDetail record in some cases.
FactOrderDetail will have a record for each order line, storing information regarding the product, product cost, product sale price, number of items, item discount, etc.
You may need to have a DimOrderHeader as well, if there are non-Fact pieces of information that you want to store, for example, date the order was taken, delivered, paid.
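A minimal sketch of the header/detail split, using Python's sqlite3 module, could look like the following. The table and column names are adapted from the question and are otherwise hypothetical, and the dimension tables are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One row per order.
    CREATE TABLE fact_order_header (
        order_id          INTEGER PRIMARY KEY,
        date_dim_id       INTEGER,
        order_total_price REAL,
        order_discount    REAL
    );
    -- One row per order line, so order_id repeats for multi-product orders.
    CREATE TABLE fact_order_detail (
        order_id       INTEGER,
        product_dim_id INTEGER,
        date_dim_id    INTEGER,
        order_quantity INTEGER,
        line_price     REAL,
        PRIMARY KEY (order_id, product_dim_id)
    );
    INSERT INTO fact_order_header VALUES (1001, 20240105, 70.0, 0.0);
    INSERT INTO fact_order_detail VALUES
        (1001, 1, 20240105, 2, 40.0),
        (1001, 2, 20240105, 1, 30.0);
    """
)

header_rows = conn.execute(
    "SELECT COUNT(*) FROM fact_order_header WHERE order_id = 1001"
).fetchone()[0]
detail_rows = conn.execute(
    "SELECT COUNT(*) FROM fact_order_detail WHERE order_id = 1001"
).fetchone()[0]
detail_total = conn.execute(
    "SELECT SUM(line_price) FROM fact_order_detail WHERE order_id = 1001"
).fetchone()[0]
```

So the repeated order_id and date_dim_id in the detail table are expected: an order with two products has one header row and two detail rows, and the detail lines add up to the header total.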

Granularity in Star Schema leads to multiple values in Fact Table?

I'm trying to understand star schema at the moment & struggling a lot with granularity.
Say I have a fact table that has session_id, user_id, order_id, product_id and I want to roll-up to sessions by user by week (keeping in mind that not every session would lead to an order or a product & the DW needs to track the sessions for non-purchasing users as well as those who purchase).
I can see no reason to track order_ids or session_ids in the fact table so it would become something like:
week_date, user_id, total_orders, total_sessions ...
But how would I then track product_ids if a user makes more than one purchase in a week? I assume I can't keep multiple product ids in an array (eg: "20/02/2012","5","3","PR01,PR32,PR22")?
I'm thinking it may have to be kept at 'every session' level but that could lead to a very large amount of data. How would you implement granularity for an example such as above?
Dimensional modelling requires Dimensions as well as Facts.
You need a Date/Calendar dimension, which includes columns like this:
calendar (id,cal_date,cal_year,cal_month,...)
The "grain" of your fact table is the key to data storage. If you have transactions, then the transaction should be the grain, and you store one row per transaction. Use proper (integer) surrogate keys to your dimensions, and your table won't be as large as you fear.
Now you can write a query like this, to sum sales of product by year:
select product_name,cal_year,sum(purchase_amount)
from fact_whatever
inner join calendar on calendar.id = fact_whatever.calendar_id
inner join product on product.id = fact_whatever.product_id
group by product_name,cal_year
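To connect this back to the weekly roll-up in the question, here is a sketch using Python's sqlite3 module. It assumes a fact table at session/product grain in which sessions without a purchase have NULL order and product keys. The table name, the cal_week column and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE calendar (id INTEGER PRIMARY KEY, cal_date TEXT, cal_week INTEGER);
    -- Detailed grain: one row per session and product (NULLs when nothing was bought).
    CREATE TABLE fact_session (
        calendar_id INTEGER, session_id TEXT, user_id INTEGER,
        order_id TEXT, product_id TEXT
    );
    INSERT INTO calendar VALUES
        (1, '2012-02-20', 8), (2, '2012-02-21', 8), (3, '2012-02-27', 9);
    INSERT INTO fact_session VALUES
        (1, 's1', 5, 'o1', 'PR01'),
        (1, 's1', 5, 'o1', 'PR32'),
        (2, 's2', 5, NULL, NULL),
        (2, 's3', 5, 'o2', 'PR22'),
        (3, 's4', 5, NULL, NULL),
        (1, 's5', 6, NULL, NULL);
    """
)

# Weekly roll-up derived at query time; COUNT(DISTINCT ...) ignores NULL order ids.
weekly = conn.execute(
    """
    SELECT c.cal_week, f.user_id,
           COUNT(DISTINCT f.session_id) AS total_sessions,
           COUNT(DISTINCT f.order_id)   AS total_orders
    FROM fact_session f
    INNER JOIN calendar c ON c.id = f.calendar_id
    GROUP BY c.cal_week, f.user_id
    ORDER BY c.cal_week, f.user_id
    """
).fetchall()
```

Because the fact table stays at the detailed grain, the product ids remain available as ordinary rows, and the weekly totals are computed on demand instead of being stored with a list of products packed into one column.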
