H2O randomForest column/feature selection

In h2o.randomForest, let's say I have 5 input features x=c("A","B","C","D","E"). Is there any way to force the algorithm to always choose A, B and one of the remaining features?

h2o.randomForest only asks you to pass x (the list of columns to use as predictors) and y (the name of the column to predict), so whatever you pass as x is used as input.
What you are asking is really a Python question rather than an H2O one: you need to write the logic that builds the list of columns yourself. You can wrap the following in a function and reuse it as needed.
import random
myframe = ["a", "b", "c", "d", "e"]
# You can also set myframe to the frame's list of column names.
# myframe.remove(_use_response_column_name) would make this generic.
selectedkeys = ["a", "b"]
# Drop the forced columns so they cannot be picked a second time.
for item in selectedkeys:
    if item in myframe:
        myframe.remove(item)
# Add one randomly chosen column from those that remain.
selectedkeys.append(random.choice(myframe))
print(selectedkeys)
print(myframe)
You just need to pass selectedkeys as the input for x.
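The steps above can be wrapped in a small helper. This is only a minimal sketch and not part of the H2O API: the function name pick_features, its parameters, and the column names are hypothetical, and it does nothing more than build the column list.

```python
import random


def pick_features(all_columns, forced, response=None, rng=random):
    """Return the forced columns plus one randomly chosen remaining column.

    pick_features and its parameters are hypothetical names for this sketch.
    """
    # Candidates are every column that is neither forced nor the response.
    candidates = [c for c in all_columns if c not in forced and c != response]
    if not candidates:
        raise ValueError("no remaining columns to choose from")
    return list(forced) + [rng.choice(candidates)]


columns = ["a", "b", "c", "d", "e", "target"]  # assumed column names
selected = pick_features(columns, forced=["a", "b"], response="target")
print(selected)  # e.g. ['a', 'b', 'd']; the third entry varies per run
```

The resulting list is what you would pass as x, with the response column as y. Note that this fixes one feature subset for the whole model; it does not change how the individual trees sample columns.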

Related

How do I remove rows based on a comma-separated list of values in a Power BI parameter in Power Query?

I have a list of data with a title column (among many other columns) and I have a Power BI parameter that has, for example, a value of "a,b,c". What I want to do is loop through the parameter's values and remove any rows that begin with those characters.
For example:
Title
a
b
c
d
Should become
Title
d
This comma separated list could have one value or it could have twenty. I know that I can turn the parameter into a list by using
parameterList = Text.Split(<parameter-name>,",")
but then I am unsure how to continue to use that to filter on. For one value I would just use
#"Filtered Rows" = Table.SelectRows(#"Table", each Text.StartsWith([key], <value-to-filter-on>))
but that only allows one value.
EDIT: I may have worded my original question poorly. The comma separated values in the parameterList can be any number of characters (e.g.: a,abcd,foo,bar) and I want to see if the value in [key] starts with that string of characters.

Try using List.Contains to check whether the starting character is in the parameter list.
each List.Contains(parameterList, Text.Start([key], 1))
Edit: Since you've changed the requirement, try this:
Table.SelectRows(
    #"Table",
    (C) => not List.AnyTrue(
        List.Transform(
            parameterList,
            each Text.StartsWith(C[key], _)
        )
    )
)
For each row, this transforms the parameterList into a list of true/false values by checking if the current key starts with each text string in the list. If any are true, then List.AnyTrue returns true and we choose not to select that row.
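To make the logic easier to follow, here is the same "starts with none of the prefixes" test written in Python. This is an illustration only and not Power Query M: the function name drop_rows_with_prefix and the sample rows are made up for the sketch.

```python
def drop_rows_with_prefix(rows, parameter, key="key"):
    """Keep only rows whose key value starts with none of the prefixes."""
    # Split the comma-separated parameter; skip empty entries, since every
    # string starts with "" and an empty prefix would remove all rows.
    prefixes = [p for p in parameter.split(",") if p]
    return [
        row for row in rows
        if not any(row[key].startswith(p) for p in prefixes)
    ]


rows = [{"key": "abcdef"}, {"key": "foo1"}, {"key": "dog"}, {"key": "a"}]
kept = drop_rows_with_prefix(rows, "a,abcd,foo,bar")
print(kept)  # [{'key': 'dog'}]
```

The empty-prefix guard matters in the M version too: if the parameter can be blank or end with a comma, the split produces an empty string that every key starts with.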

Since you want to filter out all the values from the parameter, you can use something like:
= Table.SelectRows(#"Changed Type", each List.Contains(Parameter1,Text.Start([Title],1))=false)
Another way to do this would be to create a custom column in the table, which has the first character of title:
= Table.AddColumn(#"Changed Type", "FirstChar", each Text.Start([Title],1))
and then use this field in the filter step:
= Table.SelectRows(#"Added Custom", each List.Contains(Parameter1,[FirstChar])=false)
I tested this with a small sample set and it seems to be running fine. You can test both and see if it helps with the performance. If you are still facing performance issues, it would probably be easier if you can share the pbix file.

This seems to work fairly well:
= List.Select(Source[Title], each Text.Contains(Parameter1,Text.Start(_,1))=false)
Replace Source with the name of your table and Parameter1 with the name of your Parameter.

Syntax to add a new case to the data

Say I have a variable in SPSS with a name (My_Variable), a label (My Variable) and value labels (1: Yes, 2: No), but without data (the column in Data View is empty), and I want to add data using syntax. For example, I want to add a participant in the 1st row who answered "Yes", so I want 1 to be added. How can I do it?
I found similar questions, but the solutions refer to creating a new SPSS window and adding the values there. I don't want that: I want to add data to an existing variable without creating a new SPSS file.

Apparently there is no way to directly add cases to an SPSS dataset through syntax.
But the following seems to me pretty close - you don't create new files but you create a new dataset and add it to your original.
Let's first create a small dataset to demonstrate on:
Data list list/ID (a5) var1 var2 var3 (3f2).
begin data
"first" 1 17 7
"secnd" 5 5 12
"third" 34 11 91
end data.
dataset name originalDataset.
So this is your original data. Now imagine that you want to add a new case to the data, with the ID value of "hello" and the number 42 in all the columns. This is what you do:
* creating the new case in a separate dataset.
Data list list/ID (a5) var1 var2 var3 (3f2).
begin data
"hello" 42 42 42
end data.
dataset name addition.
* going back to original dataset and adding the new case.
dataset activate originalDataset.
add files /file=* /file=addition.
exe.
dataset close addition.

You don't have to create data in the first data set. Just create the variables and define them however you want.
DATASET CLOSE ALL.
INPUT PROGRAM.
NUMERIC My_Variable (F1).
VARIABLE LABELS My_Variable "I want this!".
VALUE LABELS My_Variable 1 "Yes" 2 "No".
END FILE.
END INPUT PROGRAM.
DATASET NAME Empty.
DATA LIST FREE /My_Variable.
BEGIN DATA.
1 2
END DATA.
APPLY DICTIONARY /FROM Empty
/SOURCE VARIABLES=My_Variable
/TARGET VARIABLES=My_Variable
/VARINFO VALLABELS=REPLACE VARLABEL.
DATASET CLOSE Empty.
FREQUENCIES VARIABLES ALL.
I used DATASET, but you could have saved the empty file to disk instead.
See the APPLY DICTIONARY command for more details about how it works.

Using Python, you can add data with the cases.append() method:
begin program.
import spss
spss.StartDataStep()
dataset = spss.Dataset()
dataset.cases.append([1])
spss.EndDataStep()
end program.
Say you have 3 variables; you can assign a value to each by passing a longer list to the method:
begin program.
import spss
spss.StartDataStep()
dataset = spss.Dataset()
dataset.cases.append([1,2,3])
spss.EndDataStep()
end program.
This would add a case with the value 1 in the first variable, 2 in the second variable and 3 in the third variable.
Note: the method will only work within an open datastep.

Check out the ADD FILES command. You can also add cases with Python code.

Ordering items by size name when the order is not alphabetical?

In my app, the admin may add sizes to his products in this order.
Variant.create(size_name: "L")
Variant.create(size_name: "S")
Variant.create(size_name: "XXL")
Variant.create(size_name: "XL")
Sizes could also be numeric (30, 24, 33, 31, 29).
In my product view, the select tag displays the sizes in the order they were created.
I would like to sort them from the smallest size to the biggest (S, M, L ...).
With the numeric sizes I can already order from the smallest to the biggest.
How am I supposed to make sure that both kinds of sizes (numeric and alphabetic) are sorted from the smallest to the biggest?

There are many ways to solve this, but at the core of any solution you need to define the order manually (or use a third party library which has already written this manual ordering for you?).
For example, you could somewhere define e.g.
SIZE_NAMES = %w[XS S M L XL XXL]
and then elsewhere in the code, use something like:
variants.sort_by { |variant| SIZE_NAMES.index(variant.size_name) }
For a more "advanced" solution, you could instead consider defining each size as a custom object rather than a regular String. Take a look at the Comparable module, and the <=> ("spaceship") operator.
By utilising this, you could potentially implement it in such a way that e.g. variants.sort will automatically compare variants by their "converted" size, and order them as you expect.
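Since the question mixes numeric and named sizes, here is the manual-ordering idea sketched in Python for illustration (it is not Rails code). The function name size_sort_key and the choice to put numeric sizes first and unknown labels last are assumptions made for this sketch.

```python
SIZE_NAMES = ["XS", "S", "M", "L", "XL", "XXL"]


def size_sort_key(size_name):
    """Sort key: numeric sizes by value, then named sizes by SIZE_NAMES order."""
    text = str(size_name).strip()
    if text.isdigit():
        return (0, int(text), "")
    upper = text.upper()
    if upper in SIZE_NAMES:
        return (1, SIZE_NAMES.index(upper), "")
    # Unknown labels go last, ordered alphabetically among themselves.
    return (2, 0, upper)


print(sorted(["L", "S", "XXL", "XL"], key=size_sort_key))
# ['S', 'L', 'XL', 'XXL']
print(sorted(["30", "24", "33", "31", "29"], key=size_sort_key))
# ['24', '29', '30', '31', '33']
```

Returning a tuple keeps the three groups separate, so a label missing from the list cannot break the sort the way a nil index would.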

If you wish to do sorting on db side then you have two options:
Predefined sort like so:
Variant.order(
  "CASE size_name
    WHEN 'S' THEN 1
    WHEN 'L' THEN 2
    WHEN 'XL' THEN 3
    WHEN 'XXL' THEN 4
    ELSE 10
  END, size, id"
)
You might want to move it into a scope, so that if you need to add another size_name there is only one place to change.
With Active Record enums:
enum size_name: { s: 0, l: 1, xl: 2, xxl: 3 }
That way, you can still assign the field by string/symbol, but the underlying data will actually be an integer, so you can just use order(:size_name, :size) to sort by size_name and size.
You can also add an index this way to speed up ordering.

How to get observations as a list in Stata?

Stata has r() macro for values that some commands return (return list after the command).
I need similar access to x after list x if y == 1, but list returns only r(N), not values themselves.
Is it possible to get the observations as a local or global macro to refer to it in the code?

Try the levelsof command to get the distinct values. It's the cat's pajamas.

One way to save values of all observations (i.e. including repeated) is with a loop:
clear
set more off
*----- example data -----
sysuse auto
keep rep78
list
*----- what you want -----
forvalues i = 1/`=_N' {
    local myvals `myvals' `=rep78[`i']'
}
display "`myvals'"
But more importantly, why do you think you need such a thing?

XPages repeat from array value field

I have a field whose value is an array of strings.
Example: Mom, dad, son, etc.
Is it possible to repeat a link for each of those values?
Example:
Mom
dad
son
And when I click on a link, it should have href=www."fieldvalue".com.
EDIT: it is not a Vector, it is an Array.

Create your repeat control. For the value, add in your field name. Something like:
document1.getItemValue("myMultiValueField")
I think that should repeat over your field, assuming it is a real multi-value field. A comma-delimited string would require more work, so I'm not covering that here.
Make sure the collection name / var name of the repeat is something like "rowData".
rowData should then be a String.
Drop a link control inside the repeat.
Compute the label to be simply rowData (no quotes in the code).
Compute the URL, which I think is "value" under All Properties of the link.
That's just JavaScript, so you should be able to do something like:
return "http://" + rowData + ".com"
That's rough and you'll have to play with it, but if I follow you correctly it should work.
For a comma-delimited String, you'd need to use SSJS or @Functions in the repeat control to break it into an array so the repeat can work on it.

In your repeat you'll need to map the value attribute to the Vector and set a var property, which is how you will reference each element. Note: a comma-separated string is a single value, and a repeat requires multiple values. So you'll need to convert it to a Vector or some other multi-value object.
Within the repeat you can use any other control and compute the value as you would elsewhere. To access each element in your repeat control's source (i.e. each String in your Vector, in this case), use the variable name you've defined in the var property.
