MonetDB : Partitioning Data by Time Intervals - time-series

I am using MonetDB for time series data. I want to partition the tables by time interval (e.g. by day), so that each partition contains the data of one particular day. Ideally, that would later speed up queries.
From the documentation, I can see that MonetDB provides partitioning as a feature, but I couldn't work out how to implement it. I tried, for example, PARTITION BY DAY, as implemented by other systems, but that didn't work.
How could a table be partitioned using a fixed time period interval in MonetDB?

MonetDB has no PARTITION BY DAY. What you can do instead is create a merge table partitioned by value and add one partition table per day:
CREATE MERGE TABLE merge_table (d DATE) PARTITION BY VALUES USING (d);
CREATE TABLE part_table_30_05_30(d DATE);
CREATE TABLE part_table_30_05_31(d DATE);
ALTER TABLE merge_table ADD TABLE part_table_30_05_30 AS PARTITION IN ('2022-05-30');
ALTER TABLE merge_table ADD TABLE part_table_30_05_31 AS PARTITION IN ('2022-05-31');
INSERT INTO merge_table VALUES ('2022-05-31');
SELECT * FROM merge_table;
+------------+
| d          |
+============+
| 2022-05-31 |
+------------+
1 tuple
SELECT * FROM part_table_30_05_31;
+------------+
| d          |
+============+
| 2022-05-31 |
+------------+
1 tuple
SELECT * FROM part_table_30_05_30;
+---+
| d |
+===+
+---+
0 tuples
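With many days, writing these statements by hand gets tedious. Below is a minimal sketch that only builds the SQL strings for a run of consecutive days, following the same statement pattern as above; it does not connect to MonetDB. The function name daily_partition_ddl and the partition naming scheme are assumptions made for illustration.

```python
from datetime import date, timedelta


def daily_partition_ddl(merge_table: str, column: str, start: date, days: int) -> list[str]:
    """Build DDL for a merge table with one value partition per day."""
    statements = [
        f"CREATE MERGE TABLE {merge_table} ({column} DATE) "
        f"PARTITION BY VALUES USING ({column});"
    ]
    for offset in range(days):
        day = start + timedelta(days=offset)
        # Hypothetical naming scheme: <merge_table>_YYYY_MM_DD
        part = f"{merge_table}_{day:%Y_%m_%d}"
        statements.append(f"CREATE TABLE {part} ({column} DATE);")
        statements.append(
            f"ALTER TABLE {merge_table} ADD TABLE {part} "
            f"AS PARTITION IN ('{day.isoformat()}');"
        )
    return statements


ddl = daily_partition_ddl("merge_table", "d", date(2022, 5, 30), 2)
print("\n".join(ddl))
```

The generated statements can then be run with whatever client you already use.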

TABLEAU - Joins on the fly on raw data

I have been trying to perform joins on the fly in Tableau to perform some online computation - with no luck so far.
I wonder if any of you is aware of a way to achieve this?
I have a typical transactions dataset ("MYDATA"), with user ID (user's identifier), transaction date (when the transaction occurred), and purchases (the transactions). Something like:
ID   TRANSACTION DATE  PURCHASES
123  20/03/2020        1
123  22/03/2020        4
234  20/03/2020        10
234  22/03/2020        1
345  22/03/2020        5
What I would like to achieve is to add to it a variable with the SUM of PURCHASES by ID (say field "PURCHASES PER ID").
Then, critically, I'd like to make this computation update dynamically as I filter by different values in TRANSACTION DATE from the UI.
Ultimately I'd like to create a chart displaying the count of users (field "ID") in each value of the field "PURCHASES PER ID" (like bins), where "PURCHASES PER ID" is re-computed according to the date ranges selected in the worksheet.
Something like:
Case 1 : FILTER Transaction date = 20/03/2020 AND 22/03/2020
|---------------------|------------------|
| count OF ID | SUM of PURCHASES |
|---------------------|------------------|
| 2 | 5 |
|---------------------|------------------|
| 1 | 11 |
|---------------------|------------------|
Case 2 : FILTER Transaction date = 20/03/2020
|---------------------|------------------|
| count OF ID | SUM of PURCHASES |
|---------------------|------------------|
| 1 | 1 |
|---------------------|------------------|
| 1 | 10 |
|---------------------|------------------|
I'd expect this to be doable in Tableau, as I'm able to do it with a much simpler (and cheaper) tool like Google Data Studio.
In Data Studio I'd simply do a join between "MYDATA" and the sum of PURCHASES grouped by ID, using ID as the key. Then I'd be able to use that calculated sum of purchases as a dimension and count the IDs in it.
Are you aware of a way to achieve the same in Tableau?
Many thanks
Think I got it.
My solution was:
Columns: ({FIXED [ID]: SUM([PURCHASES])})
Rows: CNTD(ID)
Filters: Add TRANSACTION DATE to Context
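This gives me the view I wanted. Because the date filter is added to context, it is applied before the FIXED expression is evaluated, so the per-ID sum is recomputed for the selected dates.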
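To sanity-check the numbers outside Tableau, the same logic can be mimicked as: filter the rows, sum per ID, then count IDs per total. The sketch below uses the sample rows from the question; the function name users_per_purchase_total is an assumption for illustration, not anything Tableau provides.

```python
from collections import Counter, defaultdict

# Sample rows from the question: (ID, TRANSACTION DATE, PURCHASES)
MYDATA = [
    (123, "20/03/2020", 1),
    (123, "22/03/2020", 4),
    (234, "20/03/2020", 10),
    (234, "22/03/2020", 1),
    (345, "22/03/2020", 5),
]


def users_per_purchase_total(rows, selected_dates):
    """Filter first, then sum per ID, then count IDs per sum."""
    per_id = defaultdict(int)
    for user_id, day, purchases in rows:
        if day in selected_dates:         # plays the role of the context filter
            per_id[user_id] += purchases  # plays the role of {FIXED [ID]: SUM([PURCHASES])}
    # Number of distinct IDs for each per-ID total, like CNTD(ID) per bin
    return dict(Counter(per_id.values()))


case1 = users_per_purchase_total(MYDATA, {"20/03/2020", "22/03/2020"})
case2 = users_per_purchase_total(MYDATA, {"20/03/2020"})
print(case1)  # {5: 2, 11: 1}
print(case2)  # {1: 1, 10: 1}
```

Both results match the two cases in the question.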

How can I perform the same operation on all cells in a row?

I am trying to calculate the cost of products based on the amount of products sold (in one row) and the cost of each item (in another row).
I have written a simple formula, but every time I add or remove columns, it must be manually adjusted.
=IF(COUNT(E4:AC4)>0,(E4*$E$3+F4*$F$3+G4*$G$3+H4*$H$3+I4*$I$3+J4*$J$3+K4*$K$3+L4*$L$3+M4*$M$3+N4*$N$3+O4*$O$3+P4*$P$3+Q4*$Q$3+R4*$R$3+S4*$S$3+T4*$T$3+U4*$U$3+V4*$V$3+W4*$W$3+X4*$X$3+Y4*$Y$3+Z4*$Z$3+AA4*$AA$3+AB4*$AB$3+AC4*$AC$29), "")
This is a problem best solved with ARRAYFORMULA.
Take this table:
______|_$5_|_$7_|_$2_|_$3_|_$5_|__TOTAL__
-----------------------------------------
Bob | | 2 | | 1 | | ?
-----------------------------------------
Alice | | | 2 | | | ?
-----------------------------------------
Eve | 1 | | 1 | | 3 | ?
How do we solve the total cost for each row?
In the total column of Bob's row (row 2), entering
=SUM(ARRAYFORMULA(B2:F2*B$1:F$1))
gives his total cost: $7*2 + $3*1 = $17.
Specifically, ARRAYFORMULA(B2:F2*B$1:F$1) produces a range composed of B2*B1 | C2*C1 | D2*D1 ..., which you could use e.g. in the line below Bob's order to show the price breakdown by item. SUM() adds those numbers together. You could extend this formula to add taxes, gratuity, shipping, service fees, etc.
With this formula in place, simply copy it down the 'Total' column into each new row.
When a new column is inserted inside the range (between the first and last item columns), the spreadsheet automatically adjusts the formula to cover the new range.
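For readers who think in code rather than formulas, here is a minimal sketch of what SUM(ARRAYFORMULA(...)) computes for each row: an element-wise product of quantities and prices, followed by a sum. Empty cells are represented as 0, which is an assumption about how the sheet treats blanks; the names prices, orders and row_total are illustrative only.

```python
# Row 1 of the example table: unit prices for columns B..F
prices = [5, 7, 2, 3, 5]

# Quantities per person; blank cells are written as 0 (assumption)
orders = {
    "Bob": [0, 2, 0, 1, 0],
    "Alice": [0, 0, 2, 0, 0],
    "Eve": [1, 0, 1, 0, 3],
}


def row_total(quantities, unit_prices):
    """Multiply each quantity by its price, then add the products."""
    breakdown = [q * p for q, p in zip(quantities, unit_prices)]
    return sum(breakdown)


totals = {name: row_total(q, prices) for name, q in orders.items()}
print(totals)  # {'Bob': 17, 'Alice': 4, 'Eve': 22}
```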

Count unique cells and display them in column

I am building a list of gigs I attended and I want to count how many times I've seen each band.
I know about UNIQUE, but because I keep each band in a separate column, it just copies each row.
Given the table (or screenshot of real data):
| Date | Venue | Bands |
|----------|--------|--------|--------|--------|--------|--------|
| 02.02.17 | Venue1 | Band A | Band B | Band C | Band D | Band E |
| 02.07.17 | Venue3 | Band D | Band C | | | |
The output I want:
| Band | Attended |
| | (times) |
|--------|----------|
| Band A | 1 |
| Band B | 1 |
| Band C | 2 |
| Band D | 2 |
| Band E | 1 |
I can change structure if needed.
What happens after using UNIQUE: https://i.stack.imgur.com/qmszk.png
Thanks in advance.
Step 1. Get list of all unique bands in one column, one per row
=ArrayFormula(UNIQUE(TRANSPOSE(SPLIT(CONCATENATE(Gigs!D2:Z&CHAR(9)); CHAR(9)))))
Step 2. Place this formula in next column, and drag it down
=SUM(COUNTIF(Gigs!D:Z; E2))
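The two steps amount to flattening the band columns into a single list and counting occurrences. A minimal sketch of that idea, using the two sample rows from the question (the names gigs and count_bands are assumptions for illustration):

```python
from collections import Counter

# Sample rows from the question: (date, venue, bands); "" marks an empty cell
gigs = [
    ("02.02.17", "Venue1", ["Band A", "Band B", "Band C", "Band D", "Band E"]),
    ("02.07.17", "Venue3", ["Band D", "Band C", "", "", ""]),
]


def count_bands(rows):
    """Flatten all band cells into one list, skip blanks, count each name."""
    flat = [band for _, _, bands in rows for band in bands if band]
    return Counter(flat)


attended = count_bands(gigs)
for band, times in sorted(attended.items()):
    print(band, times)
```

For the sample data this yields one visit each for Band A, Band B and Band E, and two each for Band C and Band D, matching the desired output.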
Transform your data into a simple table format to make data analysis easier.
A simple table uses the first row for column headers (a.k.a. fields) and has one and only one column for each entity, let's say only one column for band names.
The above could be done in a single formula, but it would be complex and hard to debug, so it's better to start with simple formulas and, once you are certain that everything works, think about building a complex formula or writing a script.
Related
Unpivot Matrix to Tabular. Using counts of two variables into individual rows
Generate a list of all unique values of a multi-column range and give the values a rating according to how many times they appear in the last X cols
Normalize (reformat) cross-tab data for Tableau without using Excel
How do you create a "reverse pivot" in Google Sheets?

Is it possible to aggregate expenses per category in google spreadsheets

I am trying to track my expenses manually. I looked for ready-made options, but I did not find any that I knew how to use or that covered what I want to do.
What I am doing is basically writing down by hand what appears in my bank statement, with the intention of categorizing the expenses myself since, as I said, I did not find a better way to do it.
So it looks like this:
Cinema | 11.95
Going out (restaurant1) | 26.55
Netflix | 13.95
Weekly purchases | 72.66
Going out | 9
Bill (type) | 29.16
Rent month | 650
Going out | 26.55
Bill (type2) | 66.45
Compra semanal | 81.09
Bill (type3) | 21.1
("|" separates two different cells.) What I would like now is to take the generic name that I gave the category (without the parentheses; I use those for myself, so I can track more specifically where the money was spent) and how much was spent.
In programming I would do this with a regex on the left cell, aggregating by name, and then plotting the data somehow. I am unsure if this is even possible here. Maybe I should use Excel, but Drive has the cloud advantage, so I would like some help as to where to start. I do not need anything too fancy: a new column with the category and the total spent would work wonders for me, but I have not found an easy way of doing it (and I doubt I am doing something so complex, so I assume I am thinking about this the wrong way). Best case scenario, I manage to plot it all so it is more visual, or I can have several columns plotted against each other (I have different columns for shared expenses, personal expenses, and so on).
If you can put the category (e.g. Bill) in a separate column from your details (e.g. type 1) then the Pivot Table feature is exactly what you need.
Start with something like this (the heading on each column is important):
Category | Details | Amount
Cinema | | 11.95
Going out | restaurant 1 | 26.55
Netflix | | 13.95
Weekly purchases | | 72.66
Going out | | 9
Bill | type | 29.16
Rent month | | 650
Going out | | 26.55
Bill | type2 | 66.45
Compra semanal | | 81.09
Bill | type3 | 21.1
Then click Data, Pivot Table. Under Rows, click Add and choose Category. Under Values, click Add and choose Amount. You should see a table like this:
Category | SUM of Amount
Bill | 116.71
Cinema | 11.95
Compra semanal | 81.09
Going out | 62.1
Netflix | 13.95
Rent month | 650
Weekly purchases | 72.66
Grand Total | 1008.46
Any unique value in the Category column creates a new row in the pivot table.
Further Details: https://support.google.com/docs/answer/1272900
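Alternatively, without restructuring the data, a single formula can strip the parenthesized detail from column A and sum column B per category:
=ARRAYFORMULA(QUERY({REGEXREPLACE(TRIM(A:A)," \(.*\)",),B:B},"Select Col1,sum(Col2) where Col1 is not null group by Col1"))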
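Since the question mentions doing this "with a regex ... and aggregating by name", here is a minimal sketch of that approach on the sample data: strip the parenthesized detail, then sum per category. Amounts are kept as Decimal to avoid floating-point rounding; the names expenses and totals_by_category are assumptions for illustration.

```python
import re
from collections import defaultdict
from decimal import Decimal

# Sample rows from the question: (label, amount as text)
expenses = [
    ("Cinema", "11.95"),
    ("Going out (restaurant1)", "26.55"),
    ("Netflix", "13.95"),
    ("Weekly purchases", "72.66"),
    ("Going out", "9"),
    ("Bill (type)", "29.16"),
    ("Rent month", "650"),
    ("Going out", "26.55"),
    ("Bill (type2)", "66.45"),
    ("Compra semanal", "81.09"),
    ("Bill (type3)", "21.1"),
]


def totals_by_category(rows):
    """Drop the ' (detail)' suffix from each label and sum amounts per category."""
    totals = defaultdict(Decimal)
    for label, amount in rows:
        category = re.sub(r" \(.*\)", "", label.strip())
        if category:  # skip blank labels
            totals[category] += Decimal(amount)
    return dict(totals)


totals = totals_by_category(expenses)
for category in sorted(totals):
    print(category, totals[category])
```

The per-category sums agree with the pivot table shown earlier (for example, Bill 116.71 and Going out 62.1).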

Structuring a query between multiple tabs to join values by name

I'm trying to write a SQL query in Google Sheets to try and get data for "matching" results from two different tabs, but running into some trouble.
This is a sheet that's basically an automated scoring engine for instructors who take a two-part test (written and practical). After the results are entered, I'd like to use some SQL to take the results from the two tabs and collate them into a final score.
Link to the sheet in question.
There's a "Practical Scores" tab (which takes all the data from the associated Google Form), and a "Written Scores" tab. I'd like to get the name of the instructors who match in both those tabs, and give the associated score for them, but I'm mostly having trouble with writing the correct SQL.
Most of what I'm trying to do is working fine. I'm able to pull the final practical scores via the following SQL:
=query(PracticalScores!A2:E, "select A, count(E),SUM(E)/3 group by A")
I can also pull the written scores as follows:
=query('Written Scores'!B2:C,"select B,C")
But I want the intersection of the two as well, and that's where I'm running into problems.
=query(A8:E, "select A,C,D where A = E")
will simply return the rows where the names match up, and I want the instances where the names match up, regardless of whether the rows do.
That is, I want all the rows where the names match from tab 1 to tab 2 and not just the few rows that happen to line up perfectly.
If I'm not explaining this well, please let me know and I can provide additional information. Any assistance would be very greatly appreciated!
Since the query function does not support joins, this can't all be done in one query. Instead, the following device can be used:
=arrayformula(vlookup(name column, table, # of column to extract, False))
For example, suppose I have a table
+---+-------+---+
| | A | B |
+---+-------+---+
| 2 | Jim | 3 |
| 3 | Sarah | 4 |
| 4 | Bob | 5 |
+---+-------+---+
to which I want to add another column, taking it from
+---+-------+---+
| | E | F |
+---+-------+---+
| 2 | Sarah | 9 |
| 3 | Bob | 8 |
| 4 | Jim | 7 |
+---+-------+---+
The basic idea is to put in cell C2 the formula
=arrayformula(vlookup(A2:A, E2:F, 2, false))
which looks up every name from the first table (column A) in column E and returns the matching value from column F. Result:
+---+-------+---+---+
| | A | B | C |
+---+-------+---+---+
| 2 | Jim | 3 | 7 |
| 3 | Sarah | 4 | 9 |
| 4 | Bob | 5 | 8 |
+---+-------+---+---+
In practice, one should filter out empty lookup values to improve performance:
=arrayformula(vlookup(filter(A2:A, len(A2:A)), E2:F, 2, false))
If the second table contains some names not present in the first, they will not be returned by the above formula. In this case it is better to prepare a full list of names, for example with
=sort(unique({Sheet1!A2:A; Sheet2!A2:A}))
which collects the names from A columns of two sheets, eliminating duplicates and sorting. Then look up those using vlookup as above.
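Conceptually, the VLOOKUP approach is a keyed lookup: build an index from the second table, then fetch a value for each name in the first. A minimal sketch using the Jim/Sarah/Bob example above (the names first, second and lookup_join are assumptions for illustration; a missing name yields None here, where the spreadsheet would show #N/A):

```python
# Example tables from above: columns A:B and E:F
first = [("Jim", 3), ("Sarah", 4), ("Bob", 5)]
second = [("Sarah", 9), ("Bob", 8), ("Jim", 7)]


def lookup_join(left, right):
    """For each name in `left`, append the first matching value from `right`."""
    index = {}
    for name, value in right:
        index.setdefault(name, value)  # keep the first match, as an exact-match VLOOKUP does
    # Skip empty names, like the filter(A2:A, len(A2:A)) variant
    return [(name, value, index.get(name)) for name, value in left if name]


joined = lookup_join(first, second)
print(joined)  # [('Jim', 3, 7), ('Sarah', 4, 9), ('Bob', 5, 8)]

# Full list of names from both tables, deduplicated and sorted
all_names = sorted({n for n, _ in first} | {n for n, _ in second})
print(all_names)  # ['Bob', 'Jim', 'Sarah']
```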
