Help coding a decision tree in Python - machine-learning

I am not sure if this is the right place to post this, but I have been trying to code up a simple decision tree class for a while and keep getting lost at various points.
Specifically, I'm not sure what kind of data structure would represent a recursive tree that uses (feature, value) pairs as nodes.
from collections import Counter

import numpy as np


class DecisionTree():
    def entropy(self, data):
        # An empty or single-row region is pure, so its entropy is 0
        if len(data) <= 1:
            return 0
        target_col = data.iloc[:, -1]
        size = float(len(target_col))
        classes = Counter(target_col)
        # A region containing only one class is also pure: entropy 0
        if len(classes) == 1:
            return 0
        probs = [count / size for count in classes.values()]
        return np.sum([-p * np.log(p) for p in probs])

    def what_to_split_on(self, data):
        split_feature = -1
        split_val = None
        best_gain = 0.0
        base_entropy = self.entropy(data)
        # Try every (feature, value) pair, skipping the target column
        for f in range(data.shape[1] - 1):
            unique_vals = data.iloc[:, f].unique()
            for val in unique_vals:
                left, right = self.split(data, f, val)
                prop_left = float(len(left)) / (len(left) + len(right))
                prop_right = 1 - prop_left
                e_1 = prop_left * self.entropy(left)
                e_2 = prop_right * self.entropy(right)
                info_gain = base_entropy - e_1 - e_2
                if info_gain > best_gain:
                    best_gain = info_gain
                    split_feature, split_val = f, val
        if split_feature != -1:
            return split_feature, split_val
        return None

    def split(self, data, f, val):
        left = data[data.iloc[:, f] == val]
        right = data[data.iloc[:, f] != val]
        return left, right

    def create_tree(self, data):
        # A pure region becomes a leaf
        if self.entropy(data) == 0:
            return None
        best = self.what_to_split_on(data)
        if best is None:
            return None
        feature, value = best
        dt = Tree(feature, value)
        left_child, right_child = self.split(data, feature, value)
        dt.insert_left(self.create_tree(left_child))
        dt.insert_right(self.create_tree(right_child))
        return dt
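As for the data-structure question: a recursive tree is typically just a node object holding the (feature, value) split plus references to its children. Here is a minimal sketch of the Tree class that create_tree above assumes; the names Tree, insert_left, and insert_right come from that code, while the rest is one plausible design rather than a fixed API:

class Tree:
    """A binary decision-tree node keyed by a (feature, value) split.

    Rows where feature == value fall into the left subtree and all
    other rows into the right subtree; a child of None marks a leaf.
    """
    def __init__(self, feature, value):
        self.feature = feature
        self.value = value
        self.left = None
        self.right = None

    def insert_left(self, subtree):
        self.left = subtree

    def insert_right(self, subtree):
        self.right = subtree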


Spawning an Object with State TTS

I would like to combine two or more objects in Tabletop Simulator. When I spawn the objects, I can combine them manually as described on this page: https://kb.tabletopsimulator.com/host-guides/creating-states/. I would like to do this with Lua, so I need help. I spawn the objects as shown below, but I don't end up with one object that has 2 states.
function SpawnLevel1(Obj1, ID)
    CID = ID
    spawnparamslvl = {
        type = 'Custom_Assetbundle',
        position = Obj1.getPosition(),
        rotation = Obj1.getRotation(),
        scale = {x=1, y=1, z=1},
    }
    paramslvl = {
        assetbundle = data[CID].assetbundle,
        type = 1,
        material = 0,
    }
    Obj2 = spawnObject(spawnparamslvl)
    obj_name = data[CID].display_name
    Obj2.setDescription(obj_name)
    Obj2.setName(obj_name)
    Obj2.setCustomObject(paramslvl)
    Obj1.addAttachment(Obj2).SetState(1)
end
function deploy(PID)
    display_name = data[PID].display_name
    spawnparams = {
        type = 'Custom_Assetbundle',
        position = self.getPosition(),
        rotation = self.getRotation(),
        scale = {x=1, y=1, z=1},
    }
    params = {
        assetbundle = data[PID].assetbundle,
        type = 0,
        material = 0,
    }
    Spawning(spawnparams, params, display_name, PID)
end
function Spawning(spawnparams, params, display_name, PID)
    Obj1 = spawnObject(spawnparams)
    ID = PID
    Level1 = SpawnLevel1(Obj1, ID)
    Obj1.setCustomObject(params)
    Obj1.setName(display_name)
end
Thank you for your help
Radoan
You have to use spawnObjectData or spawnObjectJSON. The format of the "object data" follows the format of the save file.
Rather than hardcoding the large data structures needed to build an object, I'll use existing objects as templates. This is a common practice that's also reflected by the following common idiom for modifying an existing object:
local data = obj.getData()
-- Modify `data` here --
obj.destruct()
obj = spawnObjectData({ data = data })
The following are the relevant bits of "object data" for states:
{ -- Object data (current state)
    -- ...
    States = {
        ["1"] = { -- Object data (state 1)
            -- ...
        },
        ["3"] = { -- Object data (state 3)
            -- ...
        },
        ["4"] = { -- Object data (state 4)
            -- ...
        }
    },
    -- ...
}
So you could use this:
function combine_objects(base_obj, objs_to_add)
    if not objs_to_add[1] then
        return base_obj
    end
    local data = base_obj.getData()
    if not data.States then
        data.States = { }
    end
    -- Find the first free state number. The first gap in the numbering
    -- belongs to the current state itself (its number never appears in
    -- States), so skip past it and keep searching.
    local i = 1
    while data.States[tostring(i)] do i = i + 1 end
    i = i + 1 -- Skip current state.
    while data.States[tostring(i)] do i = i + 1 end
    for _, obj in ipairs(objs_to_add) do
        data.States[tostring(i)] = obj.getData()
        obj.destruct()
        i = i + 1
    end
    base_obj.destruct()
    return spawnObjectData({ data = data })
end
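For example, in the question's terms (a sketch, where Obj1 and Obj2 are the objects spawned by the code above):

-- Merge Obj2 into Obj1 as an additional state; the returned object
-- is the freshly respawned combination.
local combined = combine_objects(Obj1, { Obj2 })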

Pulling several keys from a table with lowest integer values

I've made a table like this
TableAlpha = {
    Alpha = 3648,
    Beta = 6593,
    Charlie = 2358,
    Delta = 6483,
    Echo = 4736
}
I'm wondering how I can pull out the 3 keys with the lowest values from the table?
local TableAlpha = {
    Alpha = 3648,
    Beta = 6593,
    Charlie = 2358,
    Delta = 6483,
    Echo = 4736
}

--- Returns the keys of tab sorted numerically by their values
local function ascending(tab)
    local list = {}
    for key, integer in pairs(tab) do
        table.insert(list, {integer, key})
    end
    table.sort(list, function(left, right) return left[1] < right[1] end)
    for i, tuple in ipairs(list) do
        list[i] = tuple[2]
    end
    return list
end

local unpack = unpack or table.unpack or error("Could not find an unpack function!")
print(unpack(ascending(TableAlpha), 1, 3))
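For the table above this prints Charlie, Alpha, and Echo: the three keys with the lowest values (2358, 3648, and 4736). The extra arguments to unpack select just elements 1 through 3 of the sorted key list, and the local unpack fallback keeps the snippet working on both Lua 5.1 (global unpack) and Lua 5.2+ (table.unpack).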

Emulate string to label dict

Since Bazel does not provide a way to map strings to labels, I am wondering how to work around this in Skylark.
Below is my partial, horrible "workaround".
First, the static definitions:
_INDEX_COUNT = 50

def _build_label_mapping():
    lmap = {}
    for i in range(_INDEX_COUNT):
        lmap["map_name%s" % i] = attr.string()
        lmap["map_label%s" % i] = attr.label(allow_files = True)
    return lmap

_LABEL_MAPPING = _build_label_mapping()
And in the implementation:
item_pairs = {}
for i in range(_INDEX_COUNT):
    id = getattr(ctx.attr, "map_name%s" % i)
    if not id:
        continue
    mapl = getattr(ctx.attr, "map_label%s" % i)
    if len(mapl.files):
        item_pairs[id] = list(mapl.files)[0].path
    else:
        item_pairs[id] = ""

if item_pairs:
    arguments += [
        "--map", str(item_pairs),  # Pass json data
    ]
And then the rule:
_foo = rule(
    implementation = _impl,
    attrs = dict({
        "srcs": attr.label_list(allow_files = True, mandatory = True),
    }.items() + _LABEL_MAPPING.items()),
)
Which needs to be wrapped like:
def foo(map={}, **kwargs):
    map_args = {}
    # TODO: Check whether order of items is defined
    for i, item in enumerate(map.items()):
        key, value = item
        map_args["map_name%s" % i] = key
        map_args["map_label%s" % i] = value
    return _foo(
        **dict(map_args.items() + kwargs.items())
    )
Is there a better way of doing that in Skylark?
To rephrase your question: you want a rule attribute that maps strings to labels?
This is currently not supported (see the list of attributes), but you can file a feature request for it.
Do you think label_keyed_string_dict is a reasonable workaround? (It won't work if you have duplicate keys.)
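For reference, a sketch of that workaround, assuming you can live with the inverted direction (label keys, string values); the attribute name map and the _impl body mirror the question's code, and depset.to_list() assumes a reasonably recent Bazel:

def _impl(ctx):
    item_pairs = {}
    # A label_keyed_string_dict attribute resolves to a dict of
    # Target -> string, so the mapping arrives inverted:
    # label key, string value.
    for target, name in ctx.attr.map.items():
        files = target.files.to_list()
        item_pairs[name] = files[0].path if files else ""
    # ... pass item_pairs on to the action as before ...

_foo = rule(
    implementation = _impl,
    attrs = {
        "srcs": attr.label_list(allow_files = True, mandatory = True),
        # Inverted direction: label -> string
        "map": attr.label_keyed_string_dict(allow_files = True),
    },
)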

Deeplearning4j LSTM output size

In my case, the input is a List<List<Float>> (a list of word-representation vectors) and the output for one sequence is a single Double.
So I build the following structure (first index: example number; second: sentence item number; third: word-vector element number): http://pastebin.com/KGdjwnki
And the output: http://pastebin.com/fY8zrxEL
But when I pass one of these (http://pastebin.com/wvFFC4Hw) to model.output, I get the vector [0.25, 0.24, 0.25, 0.25], not one value.
What could be wrong? Code attached (in Kotlin). classCount is one.
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork
import org.deeplearning4j.nn.conf.NeuralNetConfiguration.Builder
import org.deeplearning4j.nn.api.OptimizationAlgorithm
import org.deeplearning4j.nn.conf.Updater
import org.deeplearning4j.nn.weights.WeightInit
import org.deeplearning4j.nn.conf.layers.GravesLSTM
import org.deeplearning4j.nn.conf.layers.RnnOutputLayer
import org.deeplearning4j.nn.conf.BackpropType
import org.nd4j.linalg.api.ndarray.INDArray
import org.nd4j.linalg.cpu.nativecpu.NDArray
import org.nd4j.linalg.indexing.NDArrayIndex
import org.nd4j.linalg.factory.Nd4j
import org.nd4j.linalg.lossfunctions.LossFunctions
import java.util.*
class ClassifierNetwork(wordVectorSize: Int, classCount: Int) {

    data class Dimension(val x: Array<Int>, val y: Array<Int>)

    val model: MultiLayerNetwork
    val optimization = OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT
    val iterations = 1
    val learningRate = 0.1
    val rmsDecay = 0.95
    val seed = 12345
    val l2 = 0.001
    val weightInit = WeightInit.XAVIER
    val updater = Updater.RMSPROP
    val backpropType = BackpropType.TruncatedBPTT
    val tbpttLength = 50
    val epochs = 50
    var dimensions = Dimension(intArrayOf(0).toTypedArray(), intArrayOf(0).toTypedArray())

    init {
        val baseConfiguration = Builder().optimizationAlgo(optimization)
                .iterations(iterations).learningRate(learningRate).rmsDecay(rmsDecay).seed(seed).regularization(true).l2(l2)
                .weightInit(weightInit).updater(updater)
                .list()
        baseConfiguration.layer(0, GravesLSTM.Builder().nIn(wordVectorSize).nOut(64).activation("tanh").build())
        baseConfiguration.layer(1, GravesLSTM.Builder().nIn(64).nOut(32).activation("tanh").build())
        baseConfiguration.layer(2, GravesLSTM.Builder().nIn(32).nOut(16).activation("tanh").build())
        baseConfiguration.layer(3, RnnOutputLayer.Builder().lossFunction(LossFunctions.LossFunction.MCXENT)
                .activation("softmax").weightInit(WeightInit.XAVIER).nIn(16).nOut(classCount).build())
        val cfg = baseConfiguration.build()!!
        cfg.backpropType = backpropType
        cfg.tbpttBackLength = tbpttLength
        cfg.tbpttFwdLength = tbpttLength
        cfg.isPretrain = false
        cfg.isBackprop = true
        model = MultiLayerNetwork(cfg)
    }
    private fun dataDimensions(x: List<List<Array<Double>>>, y: List<Array<Double>>): Dimension {
        assert(x.size == y.size)
        val exampleCount = x.size
        assert(x.size > 0)
        val sentenceLength = x[0].size
        assert(sentenceLength > 0)
        val wordVectorLength = x[0][0].size
        assert(wordVectorLength > 0)
        val classCount = y[0].size
        assert(classCount > 0)
        return Dimension(
                intArrayOf(exampleCount, wordVectorLength, sentenceLength).toTypedArray(),
                intArrayOf(exampleCount, classCount).toTypedArray()
        )
    }

    data class Fits(val x: INDArray, val y: INDArray)

    private fun fitConversion(x: List<List<Array<Double>>>, y: List<Array<Double>>): Fits {
        val dim = dataDimensions(x, y)
        val xItems = ArrayList<INDArray>()
        for (i in 0..dim.x[0]-1) {
            val itemList = ArrayList<DoubleArray>()
            for (j in 0..dim.x[1]-1) {
                val rowList = ArrayList<Double>()
                for (k in 0..dim.x[2]-1) {
                    rowList.add(x[i][k][j])
                }
                itemList.add(rowList.toTypedArray().toDoubleArray())
            }
            xItems.add(Nd4j.create(itemList.toTypedArray()))
        }
        val xFits = Nd4j.create(xItems, dim.x.toIntArray(), 'c')
        val yItems = ArrayList<DoubleArray>()
        for (i in 0..y.size-1) {
            yItems.add(y[i].toDoubleArray())
        }
        val yFits = Nd4j.create(yItems.toTypedArray())
        return Fits(xFits, yFits)
    }
    private fun error(epoch: Int, x: List<List<Array<Double>>>, y: List<Array<Double>>) {
        var totalDiff = 0.0
        for (i in 0..x.size-1) {
            val source = x[i]
            val result = y[i]
            val realResult = predict(source)
            var diff = 0.0
            for (j in 0..result.size-1) {
                val elementDiff = result[j] - realResult[j]
                diff += Math.pow(elementDiff, 2.0)
            }
            diff = Math.sqrt(diff)
            totalDiff += Math.pow(diff, 2.0)
        }
        totalDiff = Math.sqrt(totalDiff)
        print("Epoch ")
        print(epoch)
        print(", diff ")
        println(totalDiff)
    }

    fun train(x: List<List<Array<Double>>>, y: List<Array<Double>>) {
        dimensions = dataDimensions(x, y)
        val (xFit, yFit) = fitConversion(x, y)
        for (i in 0..epochs-1) {
            model.input = xFit
            model.labels = yFit
            model.fit()
            error(i+1, x, y)
        }
    }
    fun predict(x: List<Array<Double>>): Array<Double> {
        val xList = ArrayList<DoubleArray>()
        for (i in 0..dimensions.x[1]-1) {
            val row = ArrayList<Double>()
            for (j in 0..dimensions.x[2]-1) {
                row.add(x[j][i])
            }
            xList.add(row.toDoubleArray())
        }
        val xItem = Nd4j.create(xList.toTypedArray())
        val y = model.output(xItem)
        // Copy the network's output into the returned array
        val result = ArrayList<Double>()
        for (i in 0 until y.length().toInt()) {
            result.add(y.getDouble(i))
        }
        return result.toTypedArray()
    }
}
Update: it seems the following example tackles a similar task, so I'll check it later and post a solution: https://github.com/deeplearning4j/dl4j-0.4-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/recurrent/word2vecsentiment/Word2VecSentimentRNN.java
LSTM input/output can only be rank 3; see http://deeplearning4j.org/usingrnns
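Per that page, the rank-3 features array for an RNN is shaped [miniBatchSize, inputSize, timeSeriesLength], so even a single example needs a leading batch dimension of 1. A small sketch in the question's terms (wordVectorSize and sentenceLength as used in the code above):

// Rank-3 input for one sentence: [miniBatchSize=1, inputSize, timeSeriesLength]
val single = Nd4j.create(intArrayOf(1, wordVectorSize, sentenceLength), 'c')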
Besides the recommendation to post this in the very active Gitter channel and Adam's hint to check out the great documentation, which explains how to set up rank-3 input and output, I want to point out a few other things in your code, as I was struggling with similar problems:
Check out the basic example in examples/recurrent/basic/BasicRNNExample.java; there you can see that for an RNN you don't use model.output(xItem) but model.rnnTimeStep(xItem).
With a class count of one you seem to be performing regression. For that, also check out the regression example at examples/feedforward/regression/RegressionSum.java and the documentation; there you can see that the activation function should be "identity". "softmax" normalizes the outputs to sum to one (see the glossary), so if you have just one output it will always output 1 (at least it did for my problem).
I'm not sure if I understand your requirements correctly, but if you want a single output (that is, to predict a number, i.e. regression), you usually go with the identity activation and the MSE loss function. You've used softmax, which is usually used for classification.
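Put together, a sketch of what that change could look like in the configuration from the question (only the output layer differs; whether the rest of the network suits the task is a separate matter):

// Regression-style output layer: identity activation + MSE loss,
// with a single output unit instead of softmax + MCXENT
baseConfiguration.layer(3, RnnOutputLayer.Builder()
        .lossFunction(LossFunctions.LossFunction.MSE)
        .activation("identity")
        .weightInit(WeightInit.XAVIER).nIn(16).nOut(1).build())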

Unusual behavior of image saving and loading in torch7

I noticed some unusual behavior in torch7. I know only a little about torch7, so I don't know how this behavior can be explained or corrected.
I am using the CIFAR-10 dataset. I simply fetched the data for one image from CIFAR-10 and saved it to my directory. When I loaded that saved image back, it was different.
Here is my code:
require 'image'
i1 = testData.data[2]      -- fetching data from CIFAR-10
image.save("1.png", i1)    -- saving the data as an image
i2 = image.load("1.png")   -- loading the saved image
if (i1 == i2) then         -- checking whether image1 (i1) and image2 (i2) are the same
    print("same")
end
Is this behavior expected? I thought PNG was supposed to be lossless.
If so, how can this be corrected?
Code for loading the CIFAR-10 dataset:
-- load dataset
trainData = {
    data = torch.Tensor(50000, 3072),
    labels = torch.Tensor(50000),
    size = function() return trsize end
}
for i = 0,4 do
    local subset = torch.load('cifar-10-batches-t7/data_batch_' .. (i+1) .. '.t7', 'ascii')
    trainData.data[{ {i*10000+1, (i+1)*10000} }] = subset.data:t()
    trainData.labels[{ {i*10000+1, (i+1)*10000} }] = subset.labels
end
trainData.labels = trainData.labels + 1

local subset = torch.load('cifar-10-batches-t7/test_batch.t7', 'ascii')
testData = {
    data = subset.data:t():double(),
    labels = subset.labels[1]:double(),
    size = function() return tesize end
}
testData.labels = testData.labels + 1
testData.data = testData.data:reshape(10000,3,32,32)
The == operator compares references to the two tensors, not their contents:
a = torch.Tensor(3, 5):fill(1)
b = torch.Tensor(3, 5):fill(1)
print(a == b)
> false
print(a:eq(b):all())
> true
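Applied to the question's check, an element-wise comparison would look like the sketch below. Note that image.save and image.load can also change the tensor's type and value range (for example bytes 0-255 versus floats in [0, 1]), so the contents may still legitimately differ even though PNG itself is lossless:

-- Compare tensor contents element-wise instead of references
if i1:eq(i2):all() then
    print("same")
end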
