I'm using the following expression to return an md5 hash of a concatenation of all values in a row.
md5(forEach(row.columnNames,cn,if(isNull(cells[cn]),"",cells[cn].value)).join("|"))
This is to create an easy index for identifying duplicates (I do not wish to remove them at this stage). However, I've just realised that because one of the columns contains the unique index for the data set, I cannot hash every column as the inclusion of this column will obviously make every hash unique! (duh)
Is there a way to exclude a nominated column from the forEach loop? A sort of forEach except this...
Thanks
Assuming the column you want to exclude is the first one, you can subset row.columnNames like this:
md5(forEach(row.columnNames.slice(1),cn,if(isNull(cells[cn]),"",cells[cn].value)).join("|"))
If you prefer to exclude a column by its name (for example, "ID"), you should use filter() :
md5(forEach(filter(row.columnNames, v, v!="ID"),cn,if(isNull(cells[cn]),"",cells[cn].value)).join("|"))
Similarly, you can use filter() to include or exclude column names based on a condition (here: exclude columns that contain a capital "C" in their name):
filter(row.columnNames, v, v.contains("C")==false)
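For readers who want to check the logic outside OpenRefine, here is a minimal Python sketch of the same idea: join every value except the excluded column with "|" and hash the result. The function name `row_fingerprint`, the dict-based row, and the column names are assumptions made for illustration; this is not GREL and does not claim to reproduce OpenRefine's output byte for byte.

```python
import hashlib


def row_fingerprint(row, exclude=("ID",)):
    """Return an MD5 hex digest of all values in `row` except excluded columns.

    `row` is assumed to be a dict mapping column name -> value (hypothetical
    stand-in for an OpenRefine row). None is treated as an empty string,
    mirroring the isNull() check in the GREL expression.
    """
    parts = [
        "" if value is None else str(value)
        for name, value in row.items()
        if name not in exclude
    ]
    return hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()


# Two rows that differ only in the excluded "ID" column get the same hash.
row_a = {"ID": 1, "name": "Ann", "city": None}
row_b = {"ID": 2, "name": "Ann", "city": None}
same = row_fingerprint(row_a) == row_fingerprint(row_b)
```

Because the unique index is left out of the concatenation, duplicate rows end up with identical fingerprints and can be grouped on that value.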
Picture linked below as this is a bit tangled:
I am working with a data set that has "nested" values. There are three different types of entries: categories, then subcategories that are nested under the categories, then individual items that are nested under the subcategories (picture linked below). The entries are matched up using a filter system. Column A has the entry type, column B has the actual value, and column C has the filter. The filter is always the value of the entry you are nesting under. So, for a subcategory entry, Column A = "Subcategory", Column B = [name of subcategory], and Column C = Column B of the category entry above it (the name of the category it belongs to).
I need a way to automatically fill in the filters.
The way I am thinking I could do this is to search Column A (moving up) for the first instance of the entry type I need, and then return the value of the Column B cell in that row. Is this possible?
Given your exact data above (looking only at A14:C), delete everything from C14:C (including the header) and place the following formula in C14:
=ArrayFormula({"FILTER"; IF((A15:A="") + (A15:A="Category"),, IF(A15:A="Subcategory", VLOOKUP(ROW(A15:A), FILTER({ROW(A15:A), B15:B}, A15:A="Category"), 2, TRUE), VLOOKUP(ROW(A15:A), FILTER({ROW(A15:A), B15:B}, A15:A="Subcategory"), 2, TRUE)))})
This will create the title (which you can edit within the formula itself as you like) and all results for non-null rows thereafter.
You'll need to adjust the 15 in ranges to whatever the starting row of your non-header data actually is in your sheet.
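The lookup logic inside the formula can be hard to read, so here is a rough Python sketch of what it computes: walking down the rows, remember the most recent category and subcategory, and use them as the filter. The function name `fill_filters` and the sample data are assumptions for illustration. One difference to be aware of: where the sheet formula would show an error because no matching row exists above, this sketch returns an empty string.

```python
def fill_filters(rows):
    """Compute the filter column for (entry_type, value) pairs.

    - Blank and "Category" rows get an empty filter.
    - "Subcategory" rows get the nearest preceding category value.
    - Any other row (an item) gets the nearest preceding subcategory value.
    """
    last_category = ""
    last_subcategory = ""
    filters = []
    for entry_type, value in rows:
        if entry_type == "Category":
            last_category = value
            filters.append("")
        elif entry_type == "Subcategory":
            last_subcategory = value
            filters.append(last_category)
        elif entry_type == "":
            filters.append("")
        else:
            filters.append(last_subcategory)
    return filters


# Hypothetical sample data, not taken from the original sheet.
sample = [
    ("Category", "Fruit"),
    ("Subcategory", "Citrus"),
    ("Item", "Lemon"),
    ("Item", "Lime"),
    ("Subcategory", "Berries"),
    ("Item", "Strawberry"),
    ("", ""),
]
result = fill_filters(sample)
```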
I have a list of data with a title column (among many other columns) and I have a Power BI parameter that has, for example, a value of "a,b,c". What I want to do is loop through the parameter's values and remove any rows that begin with those characters.
For example:
Title
a
b
c
d
Should become
Title
d
This comma separated list could have one value or it could have twenty. I know that I can turn the parameter into a list by using
parameterList = Text.Split(<parameter-name>,",")
but then I am unsure how to continue to use that to filter on. For one value I would just use
#"Filtered Rows" = Table.SelectRows(#"Table", each Text.StartsWith([key], <value-to-filter-on>))
but that only allows one value.
EDIT: I may have worded my original question poorly. The comma separated values in the parameterList can be any number of characters (e.g.: a,abcd,foo,bar) and I want to see if the value in [key] starts with that string of characters.
Try using List.Contains to check whether the starting character is in the parameter list.
each List.Contains(parameterList, Text.Start([key], 1))
Edit: Since you've changed the requirement, try this:
Table.SelectRows(
#"Table",
(C) => not List.AnyTrue(
List.Transform(
parameterList,
each Text.StartsWith(C[key], _)
)
)
)
For each row, this transforms the parameterList into a list of true/false values by checking if the current key starts with each text string in the list. If any are true, then List.AnyTrue returns true and we choose not to select that row.
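The same "drop the row if any prefix matches" logic, written as a small Python sketch for comparison. The function name, the list-of-dicts table, and the `key` argument are assumptions for illustration, not Power Query. The sketch also skips empty entries (for example from a trailing comma), because every string starts with the empty string and such an entry would otherwise remove every row.

```python
def remove_rows_with_prefixes(rows, parameter, key="key"):
    """Keep only rows whose `key` value starts with none of the prefixes.

    `parameter` is a comma-separated string such as "a,abcd,foo".
    """
    prefixes = [p for p in parameter.split(",") if p]  # ignore empty entries
    return [
        row for row in rows
        if not any(row[key].startswith(prefix) for prefix in prefixes)
    ]


# Hypothetical sample data.
table = [{"key": "apple"}, {"key": "foobar"}, {"key": "bar"}, {"key": "date"}]
kept = remove_rows_with_prefixes(table, "a,foo")
```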
Since you want to filter out all the values from the parameter, you can use something like:
= Table.SelectRows(#"Changed Type", each List.Contains(Parameter1,Text.Start([Title],1))=false)
Another way to do this would be to create a custom column in the table, which has the first character of title:
= Table.AddColumn(#"Changed Type", "FirstChar", each Text.Start([Title],1))
and then use this field in the filter step:
= Table.SelectRows(#"Added Custom", each List.Contains(Parameter1,[FirstChar])=false)
I tested this with a small sample set and it seems to be running fine. You can test both and see if it helps with the performance. If you are still facing performance issues, it would probably be easier if you can share the pbix file.
This seems to work fairly well:
= List.Select(Source[Title], each Text.Contains(Parameter1,Text.Start(_,1))=false)
Replace Source with the name of your table and Parameter1 with the name of your Parameter.
I need to write a rails active record where clause where I have to fetch those rows where name (name is a column in my table) contains only one occurrence of the character '.'
For example, if there are two rows in the table where name is "a.b" and "a.b.c", then my query should return only the row with name "a.b".
Please help me to solve this.
Thanks in advance!
You can for example remove the dots and compare length.
SELECT * FROM table WHERE (char_length(name) - char_length(replace(name, '.', '')))=1
This is not very efficient though, because indexes can't be utilized.
To make things smoother, you could store the number of dots (depth?) in its own column with an index and query based on that. This could be done in insert/update trigger or in application layer, whatever suits your situation.
Alternatively, with a regular expression, you can try this:
select * from "table" where "name" ~ '^[^\.]*\.[^\.]*$'
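To sanity-check both ideas (the length difference and the regular expression) outside the database, here is a small Python sketch. The function names are made up for illustration, and the pattern is a Python equivalent of the one above rather than a database query.

```python
import re

# Exactly one dot: any non-dot characters, one dot, any non-dot characters.
ONE_DOT = re.compile(r"[^.]*\.[^.]*")


def has_one_dot_by_length(name):
    """Length difference after removing dots equals the number of dots."""
    return len(name) - len(name.replace(".", "")) == 1


def has_one_dot_by_regex(name):
    """Same check using the regular expression."""
    return ONE_DOT.fullmatch(name) is not None


names = ["a.b", "a.b.c", "ab", "."]
by_length = [n for n in names if has_one_dot_by_length(n)]
by_regex = [n for n in names if has_one_dot_by_regex(n)]
```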
I have a table that looks like the following:
ID        City        Code
"1005AE"  "Oakland"   "Value1"
"1006BR"  "St.Louis"  "Value2"
"102AC"   "Miami"     "Value1"
"103AE"   "Denver"    "Value3"
And I want to transpose/pivot the Code examples/values into column attributes like this:
ID      City        Value1  Value2  Value3
"1005"  "Oakland"   1       0       0
"1006"  "St.Louis"  0       1       0
"1012"  "Miami"     1       0       0
"1030"  "Denver"    0       0       1
Note that the ID field is numeric values encoded as strings because Rapidminer had trouble importing bigint datatypes. So that is a separate issue I need to fix--but my focus here is the pivoting or transposing of the data.
I read through a few different Stackoverflow posts listed below. They suggested the Pivot or Transpose operations. I tried both of these, but for some reason I am getting either a huge table which creates City as a dummy variable as well, or just some subset of attribute columns.
How can I set the rows to be the attributes and columns the samples in rapidminer?
Rapidminer data transpose equivalent to melt in R
Any suggestions would be appreciated.
In pivoting, the group attribute parameter dictates how many rows there will be and the index attribute parameter dictates what the last part of the name of new attributes will be. The first part of the name of each new attribute is driven by any other regular attributes that are neither group nor index and the value within the cell is the value found in the original example set.
This means you have to create a new attribute with a constant value of 1; use Generate Attributes for this. Set the role of the ID attribute to be ID so that it is no longer a regular attribute; use Set Role for this. In the Pivot operator, set the group attribute to be City and the index attribute to be Code. The end result is close to what you want. The final steps are, firstly to set missing values to be 0; use Replace Missing Values for this and, secondly to rename the attributes to match what you want; use Rename for this.
You will have to join the result back to the original since the pivot operation loses the ID.
You can find a worked example here http://rapidminernotes.blogspot.co.uk/2011/05/worked-example-using-pivot-operator.html
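To make the target shape concrete, here is a minimal Python sketch of the intended result: one row per original example, with one 0/1 column per distinct Code value. This only illustrates the output, it is not a RapidMiner process; the function name `pivot_codes` and the list-of-dicts representation are assumptions. The 0 for absent codes corresponds to the Replace Missing Values step described above.

```python
def pivot_codes(rows):
    """Turn the Code column into one 0/1 indicator column per distinct code."""
    codes = sorted({row["Code"] for row in rows})
    pivoted = []
    for row in rows:
        out = {"ID": row["ID"], "City": row["City"]}
        for code in codes:
            # 1 where the row carries this code, 0 where the pivot would
            # otherwise leave a missing value.
            out[code] = 1 if row["Code"] == code else 0
        pivoted.append(out)
    return pivoted


data = [
    {"ID": "1005AE", "City": "Oakland", "Code": "Value1"},
    {"ID": "1006BR", "City": "St.Louis", "Code": "Value2"},
    {"ID": "102AC", "City": "Miami", "Code": "Value1"},
    {"ID": "103AE", "City": "Denver", "Code": "Value3"},
]
wide = pivot_codes(data)
```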
I have heard that specifying records through tuples in the code is a bad practice: I should always use record fields (#record_name{record_field = something}) instead of plain tuples {record_name, value1, value2, something}.
But how do I match the record against an ETS table? If I have a table with records, I can only match with the following:
ets:match(Table, {'$1','$2','$3',something})
It is obvious that once I add some new fields to the record definition this pattern match will stop working.
Instead, I would like to use something like this:
ets:match(Table, #record_name{record_field=something})
Unfortunately, it returns an empty list.
The cause of your problem is what the unspecified fields are set to when you write #record_name{record_field=something}. This is the syntax for creating a record, so here you are creating a record (a tuple) which ETS will interpret as a pattern. When you create a record, all the unspecified fields get their default values: either the ones given in the record definition or, if none is given, the atom undefined.
So if you want to give fields specific values then you must explicitly do this in the record, for example #record_name{f1='$1',f2='$2',record_field=something}. Often when using records and ets you want to set all the unspecified fields to '_', the "don't care variable" for ets matching. There is a special syntax for this using the special, and otherwise illegal, field name _. For example #record_name{record_field=something,_='_'}.
Note that in your example you have set the record name element in the tuple to '$1'. The tuple representing a record always has the record name as the first element. This means that when you create the ets table you should set the key position with {keypos,Pos} to something other than the default 1; otherwise there won't be any indexing and, worse, if you have a table of type 'set' or 'ordered_set' you will only get 1 element in the table. To get the index of a record field you can use the syntax #Record.Field, in your example #record_name.record_field.
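The effect of the default values can be illustrated with a toy matcher. The Python sketch below is only a model of the idea, not of ETS itself: it supports the '_' wildcard but not the '$N' binding variables, and the field names f1 and f2 are hypothetical. It shows why a pattern whose unspecified fields default to undefined matches nothing, while one whose unspecified fields are '_' matches as expected.

```python
WILDCARD = "_"
FIELDS = ("f1", "f2", "record_field")  # hypothetical record definition


def make_record(default="undefined", **given):
    """Build a record-like tuple: record name first, then the fields in order.

    Fields not given get `default`, like unspecified fields in a record
    expression.
    """
    return ("record_name",) + tuple(given.get(name, default) for name in FIELDS)


def matches(pattern, obj):
    """A pattern element matches if it is the wildcard or equals the value."""
    return len(pattern) == len(obj) and all(
        p == WILDCARD or p == o for p, o in zip(pattern, obj)
    )


def select(table, pattern):
    return [obj for obj in table if matches(pattern, obj)]


table = [
    make_record(f1=1, f2=2, record_field="something"),
    make_record(f1=3, f2=4, record_field="other"),
]

# Unspecified fields default to "undefined": nothing in the table matches.
strict = select(table, make_record(record_field="something"))
# Unspecified fields set to the wildcard: the first record matches.
relaxed = select(table, make_record(default=WILDCARD, record_field="something"))
```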
Try using
ets:match(Table, #record_name{record_field=something, _='_'})
See this for explanation.
The format you are looking for is #record_name{record_field=something, _ = '_'}
http://www.erlang.org/doc/man/ets.html#match-2
http://www.erlang.org/doc/programming_examples/records.html (see 1.3 Creating a record)