Get Rapidminer to transpose/pivot a single attribute/column in a table - transpose

I have a table that looks like the following:
ID City Code
"1005AE" "Oakland" "Value1"
"1006BR" "St.Louis" "Value2"
"102AC" "Miami" "Value1"
"103AE" "Denver" "Value3"
And I want to transpose/pivot the Code examples/values into column attributes like this:
ID City Value1 Value2 Value3
"1005" "Oakland" 1 0 0
"1006" "St.Louis" 0 1 0
"1012" "Miami" 1 0 0
"1030" "Denver" 0 0 1
Note that the ID field contains numeric values encoded as strings because RapidMiner had trouble importing bigint datatypes. That is a separate issue I need to fix, but my focus here is on pivoting or transposing the data.
I read through a few Stack Overflow posts, listed below, which suggested the Pivot or Transpose operators. I tried both, but I get either a huge table that also turns City into dummy variables, or just some subset of the attribute columns.
How can I set the rows to be the attributes and columns the samples in rapidminer?
Rapidminer data transpose equivalent to melt in R
Any suggestions would be appreciated.

In the Pivot operator, the group attribute parameter determines how many rows the result has, and the index attribute parameter determines the last part of each new attribute's name. The first part of each new attribute's name comes from any other regular attribute that is neither the group nor the index attribute, and each cell holds the value found in the original example set.
This means you first have to create a new attribute with a constant value of 1; use Generate Attributes for this. Then set the role of the ID attribute to ID so that it is no longer a regular attribute; use Set Role for this. In the Pivot operator, set the group attribute to City and the index attribute to Code. The result is close to what you want. Two steps remain: first, set missing values to 0 with Replace Missing Values; second, rename the attributes to match what you want with Rename.
You will have to join the result back to the original since the pivot operation loses the ID.
You can find a worked example here http://rapidminernotes.blogspot.co.uk/2011/05/worked-example-using-pivot-operator.html
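The steps above amount to one-hot encoding the Code column. As a rough illustration of that logic outside RapidMiner, here is a minimal plain-Ruby sketch. The method name `pivot_codes` and the hash-based row format are assumptions made for this example, not anything RapidMiner provides. Unlike the Pivot operator, the sketch keeps the ID on each row, so no join back is needed here.

```ruby
# Hypothetical helper: one-hot encode the :code field of each row.
# Rows are plain hashes; this only mimics what the Pivot workflow produces.
def pivot_codes(rows)
  # Collect the distinct codes in sorted order; each becomes a new column.
  codes = rows.map { |r| r[:code] }.uniq.sort

  rows.map do |r|
    out = { id: r[:id], city: r[:city] }
    # 1 where the row's code matches the column, 0 otherwise
    # (the 0 plays the role of Replace Missing Values).
    codes.each { |c| out[c] = (r[:code] == c ? 1 : 0) }
    out
  end
end

# Sample rows taken from the question's input table.
rows = [
  { id: "1005AE", city: "Oakland",  code: "Value1" },
  { id: "1006BR", city: "St.Louis", code: "Value2" },
  { id: "102AC",  city: "Miami",    code: "Value1" },
  { id: "103AE",  city: "Denver",   code: "Value3" }
]

pivot_codes(rows).each { |r| puts r.values.join(" ") }
```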

Related

How can I find the first instance (searching up) of a value in a column?

Picture linked below as this is a bit tangled:
I am working with a data set that has "nested" values. There are three different types of entries: categories, then subcategories that are nested under the categories, then individual items that are nested under the subcategories (picture linked below). The entries are matched up using a filter system. Column A has the entry type, column B has the actual value, column C has the filter. The filter is always the value of entry you are nesting under. So, for a subcategory entry, Column A= "Subcategory", Column B= [name of subcategory] Column C = Column B of the category type entry above (the name of category it belongs to).
I need a way to automatically fill in the filters.
The way I am thinking I could do this is to search Column A (moving up) for the first instance of the entry type I need, and then return the value of the Column B cell in that row. Is this possible?
Given your exact data above (looking only at A14:C), delete everything from C14:C (including the header) and place the following formula in C14:
=ArrayFormula({"FILTER"; IF((A15:A="") + (A15:A="Category"),, IF(A15:A="Subcategory", VLOOKUP(ROW(A15:A), FILTER({ROW(A15:A), B15:B}, A15:A="Category"), 2, TRUE), VLOOKUP(ROW(A15:A), FILTER({ROW(A15:A), B15:B}, A15:A="Subcategory"), 2, TRUE)))})
This will create the title (which you can edit within the formula itself as you like) and all results for non-null rows thereafter.
You'll need to adjust the 15 in ranges to whatever the starting row of your non-header data actually is in your sheet.
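The idea behind the formula, looking upward for the nearest row of the parent type, can also be expressed as a single top-to-bottom pass that remembers the last Category and Subcategory seen. A minimal Ruby sketch of that logic follows. The method name `fill_filters`, the type label "Item", and the made-up sample entries are assumptions for illustration only; they are not part of the sheet.

```ruby
# Hypothetical helper: entries is an array of [type, value] pairs
# (columns A and B). Returns the filter (column C) for each row.
def fill_filters(entries)
  last_category = nil
  last_subcategory = nil

  entries.map do |type, value|
    case type
    when nil, "", "Category"
      # Categories (and blank rows) get no filter.
      last_category = value if type == "Category"
      nil
    when "Subcategory"
      last_subcategory = value
      last_category      # nearest Category above this row
    else
      last_subcategory   # nearest Subcategory above this row
    end
  end
end

# Made-up sample data, purely for illustration.
entries = [
  ["Category", "Fruit"],
  ["Subcategory", "Citrus"],
  ["Item", "Lemon"],
  ["Item", "Lime"],
  ["Category", "Tools"],
  ["Subcategory", "Saws"],
  ["Item", "Hacksaw"]
]
p fill_filters(entries)
# prints [nil, "Fruit", "Citrus", "Citrus", nil, "Tools", "Saws"]
```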

How do I remove rows based on a comma-separated list of values in a Power BI parameter in Power Query?

I have a list of data with a title column (among many other columns) and I have a Power BI parameter that has, for example, a value of "a,b,c". What I want to do is loop through the parameter's values and remove any rows that begin with those characters.
For example:
Title
a
b
c
d
Should become
Title
d
This comma separated list could have one value or it could have twenty. I know that I can turn the parameter into a list by using
parameterList = Text.Split(<parameter-name>,",")
but then I am unsure how to continue to use that to filter on. For one value I would just use
#"Filtered Rows" = Table.SelectRows(#"Table", each Text.StartsWith([key], <value-to-filter-on>))
but that only allows one value.
EDIT: I may have worded my original question poorly. The comma separated values in the parameterList can be any number of characters (e.g.: a,abcd,foo,bar) and I want to see if the value in [key] starts with that string of characters.
Try using List.Contains to check whether the starting character is in the parameter list.
each List.Contains(parameterList, Text.Start([key], 1))
Edit: Since you've changed the requirement, try this:
Table.SelectRows(
    #"Table",
    (C) => not List.AnyTrue(
        List.Transform(
            parameterList,
            each Text.StartsWith(C[key], _)
        )
    )
)
For each row, this transforms the parameterList into a list of true/false values by checking if the current key starts with each text string in the list. If any are true, then List.AnyTrue returns true and we choose not to select that row.
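For readers more comfortable outside M, the same "drop a row if its key starts with any listed prefix" logic is sketched below in Ruby. The method name `reject_prefixed` and the second sample list are assumptions for illustration; this is not Power Query code.

```ruby
# Hypothetical helper: drop titles starting with any prefix in a
# comma-separated parameter string such as "a,abcd,foo".
def reject_prefixed(titles, parameter)
  prefixes = parameter.split(",")   # plays the role of Text.Split(parameter, ",")
  titles.reject do |title|
    # Plays the role of List.AnyTrue over Text.StartsWith for each prefix.
    prefixes.any? { |prefix| title.start_with?(prefix) }
  end
end

p reject_prefixed(%w[a b c d], "a,b,c")                    # prints ["d"]
p reject_prefixed(%w[foobar barn apple zebra], "foo,bar")  # prints ["apple", "zebra"]
```

One thing to watch in either language: an empty entry in the list (for example from "a,,b") is a prefix of every string, so it would remove every row.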
Since you want to filter out all the values from the parameter, you can use something like:
= Table.SelectRows(#"Changed Type", each List.Contains(Parameter1,Text.Start([Title],1))=false)
Another way to do this would be to create a custom column in the table, which has the first character of title:
= Table.AddColumn(#"Changed Type", "FirstChar", each Text.Start([Title],1))
and then use this field in the filter step:
= Table.SelectRows(#"Added Custom", each List.Contains(Parameter1,[FirstChar])=false)
I tested this with a small sample set and it seems to be running fine. You can test both and see if it helps with the performance. If you are still facing performance issues, it would probably be easier if you can share the pbix file.
This seems to work fairly well:
= List.Select(Source[Title], each Text.Contains(Parameter1,Text.Start(_,1))=false)
Replace Source with the name of your table and Parameter1 with the name of your Parameter.

Excluding one column from forEach

I'm using the following expression to return an md5 hash of a concatenation of all values in a row.
md5(forEach(row.columnNames,cn,if(isNull(cells[cn]),"",cells[cn].value)).join("|"))
This is to create an easy index for identifying duplicates (I do not wish to remove them at this stage). However, I've just realised that because one of the columns contains the unique index for the data set, I cannot hash every column as the inclusion of this column will obviously make every hash unique! (duh)
Is there a way to exclude a nominated column from the forEach loop? A sort of forEach except this...
Thanks
Assuming the column you want to exclude is the first one, you can subset row.columnNames like this:
md5(forEach(row.columnNames.slice(1),cn,if(isNull(cells[cn]),"",cells[cn].value)).join("|"))
If you prefer to exclude a column by its name (for example, "ID"), you should use filter():
md5(forEach(filter(row.columnNames, v, v!="ID"),cn,if(isNull(cells[cn]),"",cells[cn].value)).join("|"))
Similarly, you can use filter() to include or exclude column names based on conditions (here, excluding columns that contain a capital "C" in their name):
filter(row.columnNames, v, v.contains("C")==false)
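To make the effect of excluding the index column concrete, here is a small Ruby sketch of the same fingerprinting idea: join every value except the excluded columns with "|" and take the MD5. The method name `row_fingerprint`, the hash-based row format, and the sample rows are assumptions for illustration; this is not GREL.

```ruby
require "digest/md5"

# Hypothetical helper: hash every value in a row except the excluded columns.
# nil values become empty strings, mirroring the isNull check in the GREL.
def row_fingerprint(row, exclude: ["ID"])
  values = row.reject { |name, _| exclude.include?(name) }
              .values
              .map { |v| v.nil? ? "" : v.to_s }
  Digest::MD5.hexdigest(values.join("|"))
end

# Made-up rows that differ only in their unique ID.
a = { "ID" => 1, "Name" => "Ann", "City" => nil }
b = { "ID" => 2, "Name" => "Ann", "City" => nil }

puts row_fingerprint(a) == row_fingerprint(b)                            # prints true
puts row_fingerprint(a, exclude: []) == row_fingerprint(b, exclude: [])  # prints false
```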

Change search query after adding a new column to table rails

Right now I use this search to find items for certain category in my database
@items=Item.where(:category_id => @active_category_id).order(:price)
But recently I added a column called detail to the Item table. It is an integer that can be 0 or 1, or it can be empty, because I only just added it and already had some items in my db.
So now I need two searches: one that returns items with detail = 1,
and one that returns items where detail is not 1.
So I do it like this:
# for items with detail = 1
@items=Item.where(:category_id => @active_category_id)
  .where(:detail=> 1).order(:price)
It is working.
But now I need to find items with detail != 1
So I write
@items=Item.where(:category_id => @active_category_id)
  .where.not(:detail=> 1).order(:price)
And it is not working. What do I do?
NULL values will not match an equality or inequality comparison. You have to compare to NULL explicitly. Try Item.where(detail: nil).
If you need 0 OR NULL you might need to write raw SQL: Item.where("detail = 0 OR detail IS NULL")
You might also consider backfilling your db to eliminate the NULL values, then you can just compare with 1 and 0.
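To see why the NULL rows disappear, it can help to model SQL's three-valued logic in plain Ruby: a comparison involving NULL yields "unknown", and WHERE keeps only rows whose condition is true. The sketch below is not ActiveRecord. `sql_eq`, `sql_not`, and `where` are hypothetical helpers written for this illustration, with `nil` standing in for both NULL and "unknown".

```ruby
# Hypothetical model of SQL comparison: anything compared with NULL (nil)
# is unknown, represented here as nil.
def sql_eq(a, b)
  return nil if a.nil? || b.nil?
  a == b
end

# NOT unknown is still unknown.
def sql_not(x)
  x.nil? ? nil : !x
end

# WHERE keeps a row only when the condition is exactly true.
def where(rows)
  rows.select { |row| yield(row) == true }
end

details = [1, 0, nil]

p where(details) { |d| sql_eq(d, 1) }            # detail = 1                     prints [1]
p where(details) { |d| sql_not(sql_eq(d, 1)) }   # NOT (detail = 1)               prints [0]
p where(details) { |d| sql_eq(d, 0) || d.nil? }  # detail = 0 OR detail IS NULL   prints [0, nil]
```

In ActiveRecord itself, Item.where(detail: [0, nil]) should be another way to express the last condition without raw SQL; it is worth checking the generated query with to_sql before relying on it.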

Implementing dynamic methods in rails for key/value pair

Let's say I have a database table called 'options' with corresponding model called Option. Structure of this table is simple and as follows ...
id -> primary key, auto increment
name -> key
value -> value for the key
Sample data rows could be as follows ...
id name value
---- ---------------------------- -----------
1 default_view DAILY
2 show_registration_number 0
3 notification_method IMMEDIATE
What I want is that all the options (keys) should be accessible to me as the method names.
For example, if I do the following ...
@options = Option.find(:all)
is it possible to access the data like @options.default_view, which should return the value 'DAILY', and similarly @options.show_registration_number, which should return the value 0?
Also, if that is possible, would modification be permissible, so that @options.default_view = 'MONTHLY' updates the corresponding record in the database?
This will get you almost the answer you were looking for: http://code.dblock.org/how-to-define-enums-in-ruby
It relies on const_missing and assumes that the elements of your "enum" are defined as constants, in your case Option::default_view.
However, it is easy to see how to adapt this code to use method_missing so that you can call Option.default_view.
Another example of the same approach is contained in the rails-settings gem, so you can browse its code for the answer you are looking for.
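As a rough idea of what the method_missing adaptation could look like, here is a minimal sketch that runs without Rails. The class name `OptionStore` is hypothetical, and a plain hash stands in for the options table; a real version would look up and save Option records where the comments indicate.

```ruby
# Hypothetical stand-in for the options table: a plain hash replaces the
# database so the sketch runs without Rails.
class OptionStore
  def initialize(rows)
    @rows = rows   # e.g. { "default_view" => "DAILY" }
  end

  # Route unknown method calls to the stored keys.
  def method_missing(name, *args)
    key = name.to_s
    if key.end_with?("=") && @rows.key?(key.chomp("="))
      # Setter: a real model would update the matching record here.
      @rows[key.chomp("=")] = args.first
    elsif @rows.key?(key)
      # Getter: a real model would read the matching record here.
      @rows[key]
    else
      super   # unknown key: fall back to the usual NoMethodError
    end
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    @rows.key?(name.to_s.chomp("=")) || super
  end
end

options = OptionStore.new({
  "default_view" => "DAILY",
  "show_registration_number" => "0",
  "notification_method" => "IMMEDIATE"
})
puts options.default_view          # prints DAILY
options.default_view = "MONTHLY"
puts options.default_view          # prints MONTHLY
```

Only keys that already exist are exposed, so a typo such as options.defualt_view still raises NoMethodError instead of silently returning nil.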
