Apache Kylin - Can't select partition date from date dimension

I am doing a project in Apache Kylin and I would like to use a date column from my date dimension as the partition column, because I don't have any date columns in my fact table. However, as you can see in the image, it only allows me to select my fact table as the partition table; I don't have the option of selecting any dimension table:
Also, I want to clarify that I have connected my fact table with the date dimension:
I have read here: https://www.programmersought.com/article/5338102381/ that it should be possible to choose the date field either from the fact table or dimension table... So does anyone know why I can't do that?

Kylin (as of 3.1.3) only supports a partition column from the fact table.
For the example above, the options I can see are:
1. Can CREATED_DATE_ID be a partition column? Likely not, as it does not look like a date column.
2. Create a view joining F_QUESTION and D_DATE, and use the view as the new fact table when creating the Kylin model.
At least the second option should work.
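The view-based option can be sketched as below. This is only an illustration run against an in-memory SQLite database so that it is self-contained; in a real Kylin setup the view would presumably be created in the underlying source (for example Hive) and then loaded into Kylin. Only F_QUESTION, D_DATE and CREATED_DATE_ID come from the question; the other column names (QUESTION_ID, DATE_ID, FULL_DATE) and the view name V_F_QUESTION are assumptions.

```python
import sqlite3

# Hypothetical schemas: only F_QUESTION, D_DATE and CREATED_DATE_ID
# are taken from the question; everything else is assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE D_DATE (DATE_ID INTEGER PRIMARY KEY, FULL_DATE TEXT);
CREATE TABLE F_QUESTION (QUESTION_ID INTEGER, CREATED_DATE_ID INTEGER);
INSERT INTO D_DATE VALUES (1, '2021-01-01'), (2, '2021-01-02');
INSERT INTO F_QUESTION VALUES (100, 1), (101, 2), (102, 2);

-- The view exposes a real date column next to the fact columns,
-- so it can play the role of the fact table with a partition date.
CREATE VIEW V_F_QUESTION AS
SELECT f.QUESTION_ID, f.CREATED_DATE_ID, d.FULL_DATE AS CREATED_DATE
FROM F_QUESTION f
JOIN D_DATE d ON d.DATE_ID = f.CREATED_DATE_ID;
""")

rows = conn.execute(
    "SELECT QUESTION_ID, CREATED_DATE FROM V_F_QUESTION ORDER BY QUESTION_ID"
).fetchall()
print(rows)
```

With a view like this as the fact table, CREATED_DATE would be the candidate partition column.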

Related

Google Sheets: Many to Many Table Join

I am trying to create a sheet to determine the amount of overlapping hours for employees.
I have one table with timeclock data for the Employees.
Table 1
And another with timeclock data for their Support Staff.
Table 2
This is the desired output. Each row from table A has all the date matches from table B. From here I would compute the number of overlapping hours in the final column and then roll that up into another sheet.
Table 3, Desired Output
(apologies for image links, I can't post inline images yet)
Sample sheet here. Please let me know if you have any ideas for me!
I know it's a combination of QUERY, ARRAYFORMULA, FILTER and more, but I just can't find the right combo.
Here's a way of doing this type of join using only built-in functions:
=arrayformula(lambda(employee,support,
lambda(datecomp,
lambda(rows,
{vlookup(index(rows,,1),{row(employee),employee},sequence(1,columns(employee),2),false),
vlookup(index(rows,,2),{row(support),support},sequence(1,columns(support),2),false)})(
split(filter(datecomp,datecomp<>""),"|")))(
flatten(if(index(employee,,1)=transpose(index(support,,1)),row(employee)&"|"&transpose(row(support)),))))(
Employee!A1:D6,Support!A1:E5))
There's a lot going on here, but the basic idea is that we are comparing the date columns of each table against each other in a 2D IF array, and where the dates match we are obtaining the row index of each table. After some manipulations we can use these row indexes on each table in two side-by-side VLOOKUPs to obtain the joined table.
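For readers who find the nested LAMBDAs hard to follow, the same idea can be sketched in Python. This is not a translation of the formula, just the logic it describes: compare every date in one table with every date in the other, keep the matching row-index pairs, then look up both rows and place them side by side. The row layout (date, name, clock-in hour, clock-out hour) and the sample values are assumptions; the last step computes the overlapping hours mentioned in the question.

```python
# Hypothetical rows: (date, name, clock_in_hour, clock_out_hour).
# The real sheet's columns may differ.
employee = [
    ("2023-01-02", "Ann", 9, 17),
    ("2023-01-03", "Ann", 9, 17),
]
support = [
    ("2023-01-02", "Bob", 8, 12),
    ("2023-01-02", "Cy", 13, 18),
    ("2023-01-04", "Bob", 8, 12),
]

# Step 1: compare every employee date with every support date and keep
# the index pairs that match (the role of the 2D IF array).
pairs = [
    (i, j)
    for i, e in enumerate(employee)
    for j, s in enumerate(support)
    if e[0] == s[0]
]

# Step 2: look each index up in its own table and put the rows side by
# side (the role of the two VLOOKUPs).
joined = [employee[i] + support[j] for i, j in pairs]

# Step 3: overlapping hours per joined row, never below zero.
overlaps = [
    max(0, min(employee[i][3], support[j][3]) - max(employee[i][2], support[j][2]))
    for i, j in pairs
]

for row, hours in zip(joined, overlaps):
    print(row, hours)
```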
DMac,
I wrote myself a QUERY replacement custom function that uses real SQL select syntax.
For your data it looks something like this (you need a tab called employee and a tab called support for this to work):
=gsSQL("select * from employee full join support on employee.date = support.date")
See my test worksheet: (line 164 on gsSqlTest sheet)
https://docs.google.com/spreadsheets/d/1Zmyk7a7u0xvICrxen-c0CdpssrLTkHwYx6XL00Tb1ws/edit?usp=sharing
You need to add one Apps Script file to your sheet to give you the custom function:
https://github.com/demmings/gsSQL/blob/main/dist/gssql.js
For more help using more features see:
https://github.com/demmings/gsSQL
For example, to change the column titles, it would look like:
select employee.name as name .... (rest of your select).

Google Data Studio: how to create time series chart with custom Big Query query

I have a Data Studio report with a Time Series added. The data source is a custom query using the BigQuery connector:
select user_dim.app_info.app_version, count(1) as count
from [my_app_domain_ANDROID.app_events_20160929]
group by 1
According to the Data Studio documentation at: https://support.google.com/360suite/datastudio/answer/6370296?hl=en
BigQuery supports querying across multiple tables, where each table has a single day of data. The tables have the format of YYYYMMDD. When Data Studio encounters a table that has the format of YYYYMMDD, the table will be marked as a multi-day table and only the name prefix_YYYYMMDD will be displayed in the table select.
When a chart is created to visualize this table, Data Studio will automatically create a default date range of the last 28 days, and properly query the last 28 tables. You can configure this setting by editing the report, selecting the chart, then adjust the Date Range properties in the chart's
However, in the Time Series Properties DATA tab, there is no valid "Time Dimension" to select. According to the documentation, I should not need to select a Time Dimension. It should query the right table automatically.
Is there something I am not understanding yet?
There are 2 issues with the query in the question:
1. To get a time series, you'll need to add a time-based column to the custom query.
For example:
SELECT created_at, COUNT(*) c
FROM [githubarchive:day.20160930]
WHERE type='WatchEvent'
GROUP BY 1
2. Data Studio won't do the 28-day expansion with custom queries. To get the expansion described in the documentation, you need to point to an actual table (and Data Studio will figure out the prefix and date expansion).
I left a working example at:
https://datastudio.google.com/open/0ByGAKP3QmCjLSjBPbmlkZjA3aUU

Joining Two Data-sets in PowerPivot by Month

I've got 2 different data sets, revenue and contracts sold, that I need to join based off of year and month in PowerPivot so when I use my slicers, they'll filter accordingly. I know part of this will involve coming up with some temp tables for year and month but I can't get those to work. In the contracts sold table, there is an actual date column which I'm then using to format the year/month in "MM-MMM" format:
However, the revenue comes in only as a YYYYMM format:
So the solution would have to take this aspect into account as well. It's been a while since I've dealt with PowerPivot, and I recall PowerPivotPro or Kasper de Jonge's site containing something about linking tables based on a common month, but I can't find those pages anymore. If anyone could point me in the right direction or give me some insight, it'd be greatly appreciated.
I'm using Excel 2010 with PowerPivot version 11.0.3000.0.
Thanks,
Joshua
Joshua, I think the solution can be quite simple:
1. In the contracts sold table, create a new calculated column (a new column within the PowerPivot window) that gives you the same date format as in the revenue table (YYYYMM).
2. Use the Create Time Dimension app in Excel 2013 -- this app creates a date table with unique dates, which makes everything much easier. As with the other table, create a new calculated column with the same format (YYYYMM).
3. Make a relationship between those tables -- the date table will be linked to revenue as well as contracts.
4. Create the required measures (like sums of revenue, number of contracts etc.).
5. Place a new pivot table. The rows will probably be date-based (YYYYMM); with measures coming from both tables, it should be easy to create the report you need.
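The core of this approach, deriving one shared YYYYMM key on both sides and combining on it, can be sketched outside PowerPivot as well. The Python below is only an illustration of that logic; the sample rows, the tuple layouts and the assumption that revenue stores YYYYMM as text are all hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical data: contracts carry a real date, revenue only a YYYYMM key.
contracts = [(date(2013, 1, 15), 2), (date(2013, 1, 28), 1), (date(2013, 2, 3), 4)]
revenue = [("201301", 1000.0), ("201302", 1500.0)]


def yyyymm(d: date) -> str:
    """Build the same key format the revenue data already uses."""
    return f"{d.year:04d}{d.month:02d}"


# Aggregate each side by the shared month key (the "measures").
contracts_by_month = defaultdict(int)
for sold_on, count in contracts:
    contracts_by_month[yyyymm(sold_on)] += count

revenue_by_month = defaultdict(float)
for month, amount in revenue:
    revenue_by_month[month] += amount

# The sorted set of keys plays the role of the date table linking both sides.
months = sorted(set(contracts_by_month) | set(revenue_by_month))
report = [
    (m, revenue_by_month.get(m, 0.0), contracts_by_month.get(m, 0))
    for m in months
]
print(report)
```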

how to report column across not down in crystal report

I need to create a Crystal report that lays the columns out across the page, not down. The report itself is very simple; there is no need to group or summarize. The only difference from a regular report is that it needs to display the columns across rather than down. I tried using a cross-tab and multiple columns, with no success. Is there any way I can do this in Crystal? Thanks
The regular report with columns going down:
I need it displayed like this:
Try the following:
Put the four field names from the first column in the page header and make that section underlay the following section. Then create four details sections, one per field, put the values right next to the names, and set the details to go across then down.
for instance
page header
Mth
Vendor
Trans#
Amt
end of page header
details1 date field
details2 vendor field
details3 trans field
details4 amt field

Summarized data between row labels in PowerPivot (V2) table?

What I have is:
A flat PowerPivot (V2) table to show the data as an ordinary Excel table (very much simplified, it's much wider):
|Starting date|Container|Color|Price|Price inc Tax|
|01.01.2009|container 240|blue|2,50 €|3,05 €|
|01.01.2009|container 240|red |3,60 €|4,39 €|
|01.01.2009|container 360|blue|4,20 €|5,12 €|
Might it be possible to format a PowerPivot table so that the summarized columns are not at the end of a row? I'm trying to make a price list/catalog tool. There are a lot of fields in the table, some are less important, and I'd like them to be shown after the prices. Starting date, Container and Color are column labels, and Price and Price inc Tax are summarized data.
Naturally I can't move the summarized data from the Values area to the Row or Column area in the field list, but is there any other way to reorder the columns so that I get the summarized data e.g. in between Starting date and Container?
Thanks!
It's possible but not totally straightforward.
Writing a measure that returns text is easy, for instance:
=VALUES(Table1[Container])
....will return the text from a Column called Container in 'Table1' but ONLY if the context which has been established means that there is only one value for container (VALUES returns a single column table of all values that haven't been filtered out by the current context).
To make this robust you would need to trap errors so your whole formula would look like:
=IF(COUNTROWS(VALUES(Table1[Container]))>1,BLANK(),VALUES(Table1[Container]))
Once perfected, this measure can be placed after the more important data.
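The guard in that formula boils down to "return the value only when the current context leaves exactly one distinct value, otherwise return blank". A minimal Python sketch of that rule, with a hypothetical helper name and None standing in for BLANK(), looks like this:

```python
def single_value_or_blank(values):
    """Return the only distinct value in the context, or None if there are several (or none)."""
    distinct = set(values)
    return next(iter(distinct)) if len(distinct) == 1 else None


# One container in context: the text can be shown.
print(single_value_or_blank(["container 240", "container 240"]))
# Several containers in context (e.g. a total row): show nothing.
print(single_value_or_blank(["container 240", "container 360"]))
```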
HTH
Jacob

Resources