Merging Two Files Together and Retaining All Columns


I think this surely must be a simple thing to achieve, but I have tried various appends and merges and can't seem to get it right.
I have two files, one titled 'Previous' and one titled 'Current'. Both show near identical data, like so :
ID Status Date_Changed
1 Closed 10/11/21
2 Open 10/01/21
3 Closed 10/03/21
4 Pending 10/15/21
I'd like to merge both files together, but retain all columns so that it is structured as below. This will allow me to show tables of what has changed etc.
ID Previous.Status Current.Status Previous.Date_Changed Current.Date_Changed
1 Closed Open 10/11/21 10/15/21
2 Open Closed 10/01/21 10/15/21
3 Closed Pending 10/03/21 10/14/21
I am aware this is probably due to my own naivety with Power BI. I have tried combining the data by connecting to the folder, but that seems to create a new dataset with the data stacked on top (i.e. with duplicate ID values). I tried using Merge Queries as New and joining by ID, but that didn't seem to give me the right output either.

You can start from the Current table and merge in the previous table joining on ID and then expand the columns. Rename and reorder columns as desired.
Here's an example you can paste into the Advanced Editor:
let
CurrentSource = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUfIvSM0DUoYG+oam+kYGRoZKsTrRSkZAIeec/OLUFEw5Y6BQQGpeSmZeOlTSBCFpglMyFgA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, Status = _t, Date_Changed = _t]),
Current = Table.TransformColumnTypes(CurrentSource,{{"ID", Int64.Type}, {"Status", type text}, {"Date_Changed", type date}}),
PreviousSource = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUXLOyS9OTQEyDA30DQ31jQyMDJVidaKVjIBC/gWpeRAZAyQZYzRdBsYIOROgUEBqXkpmXjrUSFOoZCwA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, Status = _t, Date_Changed = _t]),
Previous = Table.TransformColumnTypes(PreviousSource,{{"ID", Int64.Type}, {"Status", type text}, {"Date_Changed", type date}}),
#"Merged Queries" = Table.NestedJoin(Current, {"ID"}, Previous, {"ID"}, "Previous", JoinKind.LeftOuter),
#"Expanded Previous" = Table.ExpandTableColumn(#"Merged Queries", "Previous", {"Status", "Date_Changed"}, {"Previous.Status", "Previous.Date_Changed"}),
#"Renamed Columns" = Table.RenameColumns(#"Expanded Previous",{{"Status", "Current.Status"}, {"Date_Changed", "Current.Date_Changed"}}),
#"Reordered Columns" = Table.ReorderColumns(#"Renamed Columns",{"ID", "Previous.Status", "Current.Status", "Previous.Date_Changed", "Current.Date_Changed"})
in
#"Reordered Columns"
Note: I've defined Previous within the query above so that it's self-contained. Ordinarily, it would be a separate query.
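For readers who want to see the join logic outside Power Query, here is a minimal Python sketch of the same idea: start from Current, left-join Previous on ID, and prefix the columns. The function name merge_previous_into_current is made up for illustration, and the Current rows are inferred from the desired output in the question rather than taken from the encoded sample in the query above.

```python
# Sample rows; "previous" is the table shown in the question, "current" is
# inferred from the desired output (an assumption for illustration).
previous = [
    {"ID": 1, "Status": "Closed", "Date_Changed": "10/11/21"},
    {"ID": 2, "Status": "Open", "Date_Changed": "10/01/21"},
    {"ID": 3, "Status": "Closed", "Date_Changed": "10/03/21"},
    {"ID": 4, "Status": "Pending", "Date_Changed": "10/15/21"},
]
current = [
    {"ID": 1, "Status": "Open", "Date_Changed": "10/15/21"},
    {"ID": 2, "Status": "Closed", "Date_Changed": "10/15/21"},
    {"ID": 3, "Status": "Pending", "Date_Changed": "10/14/21"},
]


def merge_previous_into_current(current_rows, previous_rows, key="ID"):
    """Left outer join: keep every Current row, attach the matching Previous values."""
    previous_by_key = {row[key]: row for row in previous_rows}
    merged_rows = []
    for cur in current_rows:
        prev = previous_by_key.get(cur[key], {})  # empty dict when there is no match
        out = {key: cur[key]}
        for column in cur:
            if column == key:
                continue
            # Previous.* comes first so the column order matches the question.
            out[f"Previous.{column}"] = prev.get(column)
            out[f"Current.{column}"] = cur[column]
        merged_rows.append(out)
    return merged_rows


merged = merge_previous_into_current(current, previous)
for row in merged:
    print(row)
```

Because the join starts from Current, ID 4 (present only in Previous) does not appear in the result, which mirrors JoinKind.LeftOuter in the query above.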

You can try the following steps:
Create a new column called "Source Name" in each table, with a constant value indicating whether the data is from "Previous" or "Current".
Then append both tables in Power Query. This still stacks the tables on top of each other.
But now you can use the "Source Name" column to differentiate them in a Matrix visual. Add a Matrix visual with:
"Source Name" in the Columns field
"Date_Changed" and "Status" in the Values field
"ID" in the Rows field
This works well because you get all the data in a tabular format, which helps with any further calculations.
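The append-then-pivot idea can be sketched in plain Python as well. This is only an illustration of the data shape, not of how Power BI builds a Matrix visual; the helper names append_with_source and build_matrix are hypothetical, and the Current rows are inferred from the desired output in the question.

```python
def append_with_source(tables):
    """Stack several tables and tag each row with the name of its source."""
    stacked = []
    for source_name, rows in tables.items():
        for row in rows:
            stacked.append({"Source Name": source_name, **row})
    return stacked


def build_matrix(stacked, value_fields=("Status", "Date_Changed")):
    """Rows keyed by ID, columns keyed by Source Name, cells holding the value fields."""
    grid = {}
    for row in stacked:
        cells = grid.setdefault(row["ID"], {})
        cells[row["Source Name"]] = tuple(row[field] for field in value_fields)
    return grid


tables = {
    "Previous": [
        {"ID": 1, "Status": "Closed", "Date_Changed": "10/11/21"},
        {"ID": 2, "Status": "Open", "Date_Changed": "10/01/21"},
        {"ID": 3, "Status": "Closed", "Date_Changed": "10/03/21"},
        {"ID": 4, "Status": "Pending", "Date_Changed": "10/15/21"},
    ],
    "Current": [
        {"ID": 1, "Status": "Open", "Date_Changed": "10/15/21"},
        {"ID": 2, "Status": "Closed", "Date_Changed": "10/15/21"},
        {"ID": 3, "Status": "Pending", "Date_Changed": "10/14/21"},
    ],
}

stacked = append_with_source(tables)
grid = build_matrix(stacked)
for row_id, cells in grid.items():
    print(row_id, cells)
```

The stacked list still contains duplicate IDs, as the question observed; it is the "Source Name" tag that lets the two versions of each ID sit side by side in the grid.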


[Google Data Studio]: Can't create histogram as bin dimension is interpreted as metric

I would like to make a histogram of some data that is saved in a nested BigQuery table. In simplified form, the table can be created in the following way:
CREATE TEMP TABLE Sessions (
id int,
hits
ARRAY<
STRUCT<
pagename STRING>>
);
INSERT INTO Sessions (id, hits)
VALUES
( 1,[STRUCT('A'),STRUCT('A'),STRUCT('A')]),
( 2,[STRUCT('A')]),
( 3,[STRUCT('A'),STRUCT('A')]),
( 4,[STRUCT('A'),STRUCT('A')]),
( 5,[]),
( 6,[STRUCT('A')]),
( 7,[]),
( 8,[STRUCT('A')]),
( 9,[STRUCT('A')]),
(10,[STRUCT('A'),STRUCT('A')]);
and it looks like
id  hits.pagename
1   A
    A
    A
2   A
3   A
    A
and so on for the other ids. My goal is to obtain a histogram showing the distribution of A-occurrences per id in Data Studio. The report for the MWE can be seen here: link
So far I have created a calculated field called pageviews that evaluates the wanted occurrences for each session via SUM(IF(hits.pagename="A",1,0)). Looking at a table showing id and pageviews, I get the expected result:
table showing the number of occurences of page A for each id
However, the output of the calculated field is a metric, which might cause trouble in what follows. In the next step I wanted to follow the procedure presented in this post. Therefore, I created another field, bin, to assign my sessions to bins according to their pageviews:
CASE
WHEN pageviews = 0 OR pageviews = 1 THEN "bin 1"
WHEN pageviews = 2 OR pageviews = 3 THEN "bin 2"
END
According to this bin definition, I hope to obtain a histogram with 6 counts in bin 1 and 4 counts in bin 2. Well, in this particular example it will actually have 4 counts in bin 1, as ids 5 and 7 had "null" entries, but never mind. This won't happen in my real-world table.
As you can see in the next image, which shows the same table as above but now with a bin column, this assignment works as well: each id is assigned the correct bin. However, the output field is now a metric of type text, so the bar chart won't let me use it (it needs a dimension).
Assignment of each id to a bin
Somewhere I read about a workaround: create a self-joined blend, which outputs metrics as dimensions. This works in name only. My field is now a dimension and I can use it as such for the bar chart, but the bar chart won't load and shows a configuration error for the data source, as can be seen in this picture:
bar-chart of id-count over bin. In the configuration of the chart one can see that "bin" is now a dimension. The chart won't plot, however, as it shows a data configuration error (sorry for the German version of data studio).
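To make the intended result concrete, the following Python sketch reproduces the expected histogram from the sample data, outside Data Studio. It only checks the arithmetic of the bin definition; it says nothing about how Data Studio treats metrics and dimensions. Empty sessions are counted as 0 pageviews here, which is an assumption that differs from the "null" behaviour mentioned above.

```python
from collections import Counter

# Session id -> list of page names, copied from the INSERT statement above.
sessions = {
    1: ["A", "A", "A"],
    2: ["A"],
    3: ["A", "A"],
    4: ["A", "A"],
    5: [],
    6: ["A"],
    7: [],
    8: ["A"],
    9: ["A"],
    10: ["A", "A"],
}


def pageviews(hits, page="A"):
    """Equivalent of SUM(IF(hits.pagename="A",1,0)) for one session."""
    return sum(1 for name in hits if name == page)


def bin_label(count):
    """Same bins as the CASE expression; anything else falls through to None."""
    if count in (0, 1):
        return "bin 1"
    if count in (2, 3):
        return "bin 2"
    return None


# Count sessions (ids) per bin: this is the histogram the bar chart should show.
histogram = Counter(bin_label(pageviews(hits)) for hits in sessions.values())
print(histogram["bin 1"], histogram["bin 2"])
```

The key point is that the bin is computed per id first and the ids are counted per bin afterwards, so the bin acts as a grouping key (a dimension) rather than as an aggregated value.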

How do I remove rows based on a comma-separated list of values in a Power BI parameter in Power Query?

I have a list of data with a title column (among many other columns) and I have a Power BI parameter that has, for example, a value of "a,b,c". What I want to do is loop through the parameter's values and remove any rows that begin with those characters.
For example:
Title
a
b
c
d
Should become
Title
d
This comma separated list could have one value or it could have twenty. I know that I can turn the parameter into a list by using
parameterList = Text.Split(<parameter-name>,",")
but then I am unsure how to continue to use that to filter on. For one value I would just use
#"Filtered Rows" = Table.SelectRows(#"Table", each Text.StartsWith([key], <value-to-filter-on>))
but that only allows one value.
EDIT: I may have worded my original question poorly. The comma separated values in the parameterList can be any number of characters (e.g.: a,abcd,foo,bar) and I want to see if the value in [key] starts with that string of characters.

Try using List.Contains to check whether the starting character is in the parameter list.
each List.Contains(parameterList, Text.Start([key], 1))
Edit: Since you've changed the requirement, try this:
Table.SelectRows(
#"Table",
(C) => not List.AnyTrue(
List.Transform(
parameterList,
each Text.StartsWith(C[key], _)
)
)
)
For each row, this transforms the parameterList into a list of true/false values by checking if the current key starts with each text string in the list. If any are true, then List.AnyTrue returns true and we choose not to select that row.
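The same logic, written as a small Python sketch for comparison: split the parameter on commas, then drop every row whose key starts with any of the resulting strings. The function name remove_rows_with_prefixes and the sample rows are made up for illustration. Dropping empty entries from the split is an extra guard not present in the M code above; it is there because every string starts with the empty string.

```python
def remove_rows_with_prefixes(rows, parameter, column="key"):
    """Keep only rows whose `column` value starts with none of the prefixes."""
    # Ignore empty entries (e.g. from a trailing comma), which would match everything.
    prefixes = [prefix for prefix in parameter.split(",") if prefix]
    return [
        row
        for row in rows
        if not any(row[column].startswith(prefix) for prefix in prefixes)
    ]


rows = [
    {"key": "apple"},
    {"key": "abcd"},
    {"key": "banana"},
    {"key": "foobar"},
    {"key": "bar"},
    {"key": "zebra"},
]

kept = remove_rows_with_prefixes(rows, "a,abcd,foo,bar")
print([row["key"] for row in kept])
```

Here any(...) plays the role of List.AnyTrue over the transformed list, and the leading not inverts the selection so that matching rows are removed.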

Since you want to filter out all the values from the parameter, you can use something like:
= Table.SelectRows(#"Changed Type", each List.Contains(Parameter1,Text.Start([Title],1))=false)
Another way to do this would be to create a custom column in the table, which has the first character of title:
= Table.AddColumn(#"Changed Type", "FirstChar", each Text.Start([Title],1))
and then use this field in the filter step:
= Table.SelectRows(#"Added Custom", each List.Contains(Parameter1,[FirstChar])=false)
I tested this with a small sample set and it seems to be running fine. You can test both and see if it helps with the performance. If you are still facing performance issues, it would probably be easier if you can share the pbix file.

This seems to work fairly well:
= List.Select(Source[Title], each Text.Contains(Parameter1,Text.Start(_,1))=false)
Replace Source with the name of your table and Parameter1 with the name of your Parameter.

Virtual StringTreeView Component primary column editing and selection issue

I have created a Virtual StringTreeView component with multiple columns. Most columns, including the primary [0] column, have coEditable set to true. I have set up a test with 20 nodes. Once started, I can successfully edit the primary column for multiple nodes, provided I click on the primary column to select it. However, once I have selected another node by clicking on another column, I can no longer edit the primary column text on any node. When I select a new row by clicking on the primary column, the focus shifts to the selected row, but the previously used column is highlighted instead of the primary column I just clicked. If I select a row using any other column, the focus shifts to the correct column and I can edit the string.
TreeOptions.EditOptions is set to toDefaultEdit. In addition, the column also contains an image (although I do not see how that changes anything).
Can anyone shed light on any additional settings required or steps I have missed to control focus properly?
EDIT: Added current settings
VStringTree Options:
Auto Options = [toAutoDropExpand,toAutoScrollOnExpand,toAutoSort,toAutoTristateTracking,toAutoDeleteMovedNodes,toDisableAutoscrollOnFocus,toAutoChangeScale]
Edit Options = toDefaultEdit
Selection Options = [toExtendedFocus]
Misc. Options = [toEditable,toFullRepaintOnResize,toInitOnSave,toToggleOnDblClick,toWheelPanning,toEditOnClick]
String Options = [toSaveCaptions,toAutoAcceptEditChange]
Header Options:
[hoColumnResize,hoDblClickResize,hoDrag,hoHotTrack,hoShowSortGlyphs,hoVisible]
Column Options:
[coAllowClick,coEnabled,coParentBidiMode,coParentColor,coResizable,coShowDropMark,coVisible,coEditable]

Get Rapidminer to transpose/pivot a single attribute/column in a table

I have a table that looks like the following:
ID City Code
"1005AE" "Oakland" "Value1"
"1006BR" "St.Louis" "Value2"
"102AC" "Miami" "Value1"
"103AE" "Denver" "Value3"
And I want to transpose/pivot the Code examples/values into column attributes like this:
ID City Value1 Value2 Value3
"1005" "Oakland" 1 0 0
"1006" "St.Louis" 0 1 0
"1012" "Miami" 1 0 0
"1030" "Denver" 0 0 1
Note that the ID field is numeric values encoded as strings because Rapidminer had trouble importing bigint datatypes. So that is a separate issue I need to fix--but my focus here is the pivoting or transposing of the data.
I read through a few different Stackoverflow posts listed below. They suggested the Pivot or Transpose operations. I tried both of these, but for some reason I am getting either a huge table which creates City as a dummy variable as well, or just some subset of attribute columns.
How can I set the rows to be the attributes and columns the samples in rapidminer?
Rapidminer data transpose equivalent to melt in R
Any suggestions would be appreciated.

In pivoting, the group attribute parameter dictates how many rows there will be and the index attribute parameter dictates what the last part of the name of new attributes will be. The first part of the name of each new attribute is driven by any other regular attributes that are neither group nor index and the value within the cell is the value found in the original example set.
This means you have to create a new attribute with a constant value of 1; use Generate Attributes for this. Set the role of the ID attribute to ID so that it is no longer a regular attribute; use Set Role for this. In the Pivot operator, set the group attribute to City and the index attribute to Code. The end result is close to what you want. The final steps are, first, to set missing values to 0 (use Replace Missing Values) and, second, to rename the attributes to match what you want (use Rename).
You will have to join the result back to the original since the pivot operation loses the ID.
You can find a worked example here http://rapidminernotes.blogspot.co.uk/2011/05/worked-example-using-pivot-operator.html
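The sequence of steps above can be illustrated with a short Python sketch. It is not RapidMiner code and does not reproduce the operators' exact attribute naming; it only mirrors the data flow (constant attribute, pivot by City and Code, fill missing values with 0, join the ID back). The function name pivot_codes is hypothetical, and the sketch assumes each City appears only once, as in the sample table.

```python
rows = [
    {"ID": "1005AE", "City": "Oakland", "Code": "Value1"},
    {"ID": "1006BR", "City": "St.Louis", "Code": "Value2"},
    {"ID": "102AC", "City": "Miami", "Code": "Value1"},
    {"ID": "103AE", "City": "Denver", "Code": "Value3"},
]


def pivot_codes(examples):
    # Step 1: add a constant attribute of 1 (Generate Attributes).
    flagged = [{**example, "flag": 1} for example in examples]

    # Step 2: pivot with City as the group and Code as the index.
    codes = sorted({example["Code"] for example in flagged})
    by_city = {}
    for example in flagged:
        by_city.setdefault(example["City"], {})[example["Code"]] = example["flag"]

    # Step 3: replace missing cells with 0 (Replace Missing Values).
    pivoted = {
        city: {code: cells.get(code, 0) for code in codes}
        for city, cells in by_city.items()
    }

    # Step 4: join the pivoted values back to the original rows to recover the ID.
    return [
        {"ID": example["ID"], "City": example["City"], **pivoted[example["City"]]}
        for example in examples
    ]


result = pivot_codes(rows)
for row in result:
    print(row)
```

The new columns are named directly after the Code values here, which corresponds to the state after the final Rename step.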

ISQL Perform instruction: after editadd editupdate of table vs. after add update of table

INFORMIX-SQL 7.3 Perform Screens:
According to documentation, in an "after editadd editupdate of table" control block, its instructions are executed before the row is added or updated to the table, whereas in an "after add update of table" control block, its instructions are executed after the row has been added or updated to the table. Supposedly, this would mean that any instructions which would alter values of field-tags linked to table.columns would not be committed to the table, but field-tags linked to displayonly fields will change?
However, when using "after add update of table", I placed instructions which alter values for field-tags linked to table.columns and their displayed and committed values also changed! I would have thought that an "after add update of table" would only alter displayonly fields.
TABLES
customer
transaction
branch
interest
dates
ATTRIBUTES
[...]
q = transaction.trx_type, INCLUDE=("E","C","V","P","T"), ...;
tb = transaction.trx_int_table,
LOOKUP f1 = ta_days1_f,
t1 = ta_days1_t,
i1 = ta_int1,
[...]
JOINING *interest.int_table, ...;
[...]
INSTRUCTIONS
customer MASTER OF transaction
transaction MASTER OF customer
delimiters ". ";
AFTER QUERY DISPLAY ADD UPDATE OF transaction
if z = "E" then let q = "E"
if z = "C" then let q = "C"
if z = "1" then let q = "E"
[...]
END

Is 'z' a column in the transaction table?
Is the trouble that the value in 'z' is causing a change in the value of 'q' (aka transaction.trx_type), and the modified value is being stored in the database?
Is the value in 'z' part of the transaction table?
Have you verified that the value in the DB is indeed changed - using the Query Language option or a simple (default) form?
It might look as if it is because the instruction is also used AFTER DISPLAY, so when the values are retrieved from the DB, the value displayed in 'q' would be the mapped value corresponding to the value stored in 'z'. You would have to inspect the raw data to see past that mapping.
If this is not the problem, please:
Amend the question to show where 'z' comes from.
Also describe exactly what you do and see.
Confirm that the data in the database, as opposed to on the screen, is amended.
Please can you see whether this table plus form behaves the same for you as it does for me?
Table Transaction
CREATE TABLE TRANSACTION
(
trx_id SERIAL NOT NULL,
trx_type CHAR(1) NOT NULL,
trx_last_type CHAR(1) NOT NULL,
trx_int_table INTEGER NOT NULL
);
Form
DATABASE stores
SCREEN SIZE 24 BY 80
{
trx_id [f000]
trx_type [q]
trx_last_type [z]
trx_int_table [f001 ]
}
END
TABLES
transaction
ATTRIBUTES
f000 = transaction.trx_id;
q = transaction.trx_type, UPSHIFT, AUTONEXT,
INCLUDE=("E","C","V","P","T");
z = transaction.trx_last_type, UPSHIFT, AUTONEXT,
INCLUDE=("E","C","V","P","T","1");
f001 = transaction.trx_int_table;
INSTRUCTIONS
AFTER ADD UPDATE DISPLAY QUERY OF transaction
IF z = "E" THEN LET q = "E"
IF z = "C" THEN LET q = "C"
IF z = "1" THEN LET q = "E"
END
Experiments
[The parenthesized number is automatically generated by IDS/Perform.]
Add a row with data (1), V, E, 23.
Observe that the display is: 1, E, E, 23.
Exit the form.
Observe that the data in the table is: 1, V, E, 23.
Reenter the form and query the data.
Update the data to: (1), T, T, 37.
Observe that the display is: 1, T, T, 37.
Exit the form.
Observe that the data in the table is: 1, T, T, 37.
Reenter the form and query the data.
Update the data to: (1), P, 1, 49
Observe that the display is: 1, E, 1, 49.
Exit the form.
Observe that the data in the table is: 1, P, 1, 49.
Reenter the form and query the data.
Observe that the display is: 1, E, 1, 49.
Choose 'Update', and observe that the display changes to: 1, P, 1, 49.
I did the 'Observe that the data in the table is' steps using:
sqlcmd -d stores -e 'select * from transaction'
This generated lines like these (reflecting different runs):
1|V|E|23
1|P|1|49
That is my SQLCMD program, not Microsoft's upstart of the same name. You can do more or less the same thing with DB-Access, except it is noisier (13 extraneous lines of output) and you would be best off writing the SELECT statement in a file and providing that as an argument:
$ echo "select * from transaction" > check.sql
$ dbaccess stores check
Database selected.
trx_id trx_type trx_last_type trx_int_table
1 P 1 49
1 row(s) retrieved.
Database closed.
$
Conclusions
This is what I observed on Solaris 10 (SPARC) using ISQL 7.50.FC1; it matches what the manual describes, and is also what I suggested in the original part of the answer might be the trouble - what you see on the form is not what is in the database (because of the INSTRUCTIONS section).
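The difference between the stored and the displayed values in the experiments can be modelled with a toy Python sketch. This is not how Perform is implemented; it is only a model of the observed behaviour, under the assumption that the row is written first and the instructions then alter the screen buffer alone. The names apply_instructions and add_or_update are made up for illustration.

```python
def apply_instructions(screen):
    """Mirror of the IF ... THEN LET q = ... lines, applied to the screen only."""
    shown = dict(screen)
    z = screen["trx_last_type"]
    if z in ("E", "1"):
        shown["trx_type"] = "E"
    elif z == "C":
        shown["trx_type"] = "C"
    return shown


def add_or_update(table, row):
    """Store the row as entered, then return what the form would display."""
    table[row["trx_id"]] = dict(row)   # the row is written before the instructions run
    return apply_instructions(row)     # AFTER ADD UPDATE changes the display only


table = {}
observations = []  # (displayed trx_type, stored trx_type) for each experiment
for trx_type, last_type, int_table in [("V", "E", 23), ("T", "T", 37), ("P", "1", 49)]:
    row = {
        "trx_id": 1,
        "trx_type": trx_type,
        "trx_last_type": last_type,
        "trx_int_table": int_table,
    }
    shown = add_or_update(table, row)
    observations.append((shown["trx_type"], table[1]["trx_type"]))

print(observations)
```

In this model the first and third experiments show a displayed trx_type that differs from the stored one, while the second shows no difference because 'z' holds a value that triggers no mapping.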
Do you see something different? If so, then there could be a bug in ISQL that has been fixed since. Technically, ISQL 7.30 is out of support, I believe. Can you upgrade to a more recent version than that? (I'm not sure whether 7.32 is still supported, but you should really upgrade to 7.50; the current release is 7.50.FC4.)
Transcribing commentary before deleting it:
Up to a point, it is good that you replicate my results. The bad news is that in the bigger form we have different behaviour. I hope that ISQL validates all limits - things like number of columns etc. However, there is a chance that they are not properly validated, given the bug, or maybe there is a separate problem that only shows with the larger form. So, you need to ensure you have a supported version of the product and that the problem reproduces in it. Ideally, you will have a smaller version of the table (or, at least, of the form) that shows the problem, and maybe a still smaller (but not quite as small as my example) version that shows the absence of the problem.
With the test case (table schema and Perform screen that shows the problem) in hand, you can then go to IBM Tech Support with "Look - this works correctly when the form is small; and look, it works incorrectly when the form is large". The bug should then be trackable. You will need to include instructions on how to reproduce the bug similar to those I gave you. And there is no problem with running two forms - one simple and one more complex and displaying the bug - in parallel to show how the data is stored vs displayed. You could describe the steps in terms of 'Form A' and 'Form B', with Form A being Absolutely OK and Form B being Believed to be Buggy. So, add a record with certain values in Form B; show what is displayed in Form B after; show what is stored in the database in Form A after too; show that they are not different when they should be.
Please bear in mind that those who will be fixing the issue have less experience with the product than either you or me - so keep it as simple as possible. Remove as many attributes as you can; leave comments to identify data types etc.
