Unable to get result of secondary1-nvinference-engine and secondary2-nvinference-engine on each frame - nvidia

I'm working on the NVIDIA DeepStream inference engine and am unable to get the classifier class; it always shows index 0. Any help is appreciated.
l_classifier = obj_meta.classifier_meta_list
print('First Classifier at: ', l_classifier)
classifier_cnt = 0
while l_classifier is not None:
    classifier_cnt += 1
    print('Parsing Classifier at: ', l_classifier)
    try:
        classifier_meta = pyds.glist_get_nvds_classifier_meta(
            l_classifier.data)
        print('Classifier Component ID: ' + str(classifier_meta.unique_component_id))
        # nxt_classifier = classifier_meta.next
        # print(nxt_classifier)
        # print(dir(classifier_meta))
    except Exception as ex:
        print('Could not parse MetaData: ', ex)
    l_label = classifier_meta.label_info_list
    uid = classifier_meta.unique_component_id
    numLabel = classifier_meta.num_labels
    classId = classifier_meta.class_id
    label_info = pyds.glist_get_nvds_label_info(l_label.data)
    classifier_class = label_info.result_class_id
    num_classes = label_info.num_classes
    label_id = label_info.label_id
    result_prob = label_info.result_prob
    print("1 l_label          :", l_label)
    print("1 u id ------------ :", uid)
    print("1 numLabel         :", numLabel)
    print("1 label_info       :", label_info)
    print("1 classifier_class :", classifier_class)
    print("1 num_classes      :", num_classes)
    print("1 label_id         :", label_id)
    print("classId ==>", classId)
    l_classifier = l_classifier.next
    print('Next Classifier: ', l_classifier)
Sample output:
1 l_label : <pyds.GList object at 0x7fa740cfcf80>
1 u id ------------ : 4
1 numLabel : 1
1 label_info : <pyds.NvDsLabelInfo object at 0x7fa740cfcf48>
1 classifier_class: 0
1 num_classes : 0
1 label_id : 0
The classifier class always appears as 0.

A bit late, but label_info holds the classifier data. You need to cast l_label.data to get label_info, and then add l_label = l_label.next so the loop moves on to the next item. The following link lists the fields you can obtain:
https://docs.nvidia.com/metropolis/deepstream/python-api/PYTHON_API/NvDsMeta/NvDsLabelInfo.html
l_label = class_meta.label_info_list
while l_label is not None:
    try:
        label_info = pyds.NvDsLabelInfo.cast(l_label.data)
    except StopIteration:
        break
    print(label_info.label_id)
    try:
        l_label = l_label.next
    except StopIteration:
        break
# back in the enclosing loop over the classifier metadata list:
try:
    l_class = l_class.next
except StopIteration:
    break
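For context, here is a minimal sketch of the full nested iteration, following the cast-based pattern from the DeepStream Python sample apps (obj_meta is assumed to come from the surrounding object-metadata loop):
l_class = obj_meta.classifier_meta_list
while l_class is not None:
    try:
        class_meta = pyds.NvDsClassifierMeta.cast(l_class.data)
    except StopIteration:
        break
    # each classifier meta can carry several labels
    l_label = class_meta.label_info_list
    while l_label is not None:
        try:
            label_info = pyds.NvDsLabelInfo.cast(l_label.data)
        except StopIteration:
            break
        print(class_meta.unique_component_id,
              label_info.result_class_id,
              label_info.result_label,
              label_info.result_prob)
        try:
            l_label = l_label.next
        except StopIteration:
            break
    try:
        l_class = l_class.next
    except StopIteration:
        break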


sklearn oneclass svm KeyError

My dataset is a set of system calls for both malware and benign samples. I preprocessed it, and now it looks like this:
NtQueryPerformanceCounter
NtProtectVirtualMemory
NtProtectVirtualMemory
NtQuerySystemInformation
NtQueryVirtualMemory
NtQueryVirtualMemory
NtProtectVirtualMemory
NtOpenKey
NtOpenKey
NtOpenKey
NtQuerySecurityAttributesToken
NtQuerySecurityAttributesToken
NtQuerySystemInformation
NtQuerySystemInformation
NtAllocateVirtualMemory
NtFreeVirtualMemory
Now I'm using TF-IDF to extract the features and n-grams to build sequences of them:
from __future__ import print_function
import numpy as np
import pandas as pd
from time import time
import matplotlib.pyplot as plt
from sklearn import svm, datasets
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.utils import shuffle
from sklearn.svm import OneClassSVM

nGRAM1 = 8
nGRAM2 = 10
weight = 4

main_corpus_MAL = []
main_corpus_target_MAL = []
main_corpus_BEN = []
main_corpus_target_BEN = []

my_categories = ['benign', 'malware']

# feeding corpus the testing data
print("Loading system call database for categories:")
print(my_categories if my_categories else "all")

import glob
import os

malCOUNT = 0
benCOUNT = 0
for filename in glob.glob(os.path.join('C:\\Users\\alika\\Documents\\testingSVM\\sysMAL', '*.txt')):
    fMAL = open(filename, "r")
    aggregate = ""
    for line in fMAL:
        linea = line[:(len(line)-1)]
        aggregate += " " + linea
    main_corpus_MAL.append(aggregate)
    main_corpus_target_MAL.append(1)
    malCOUNT += 1

for filename in glob.glob(os.path.join('C:\\Users\\alika\\Documents\\testingSVM\\sysBEN', '*.txt')):
    fBEN = open(filename, "r")
    aggregate = ""
    for line in fBEN:
        linea = line[:(len(line) - 1)]
        aggregate += " " + linea
    main_corpus_BEN.append(aggregate)
    main_corpus_target_BEN.append(0)
    benCOUNT += 1

# weight as determined in the top of the code
train_corpus = main_corpus_BEN[:(weight*len(main_corpus_BEN)//(weight+1))]
train_corpus_target = main_corpus_target_BEN[:(weight*len(main_corpus_BEN)//(weight+1))]
test_corpus = main_corpus_MAL[(len(main_corpus_MAL)-(len(main_corpus_MAL)//(weight+1))):]
test_corpus_target = main_corpus_target_MAL[(len(main_corpus_MAL)-len(main_corpus_MAL)//(weight+1)):]

def size_mb(docs):
    return sum(len(s.encode('utf-8')) for s in docs) / 1e6

# size of datasets
train_corpus_size_mb = size_mb(train_corpus)
test_corpus_size_mb = size_mb(test_corpus)

print("%d documents - %0.3fMB (training set)" % (
    len(train_corpus_target), train_corpus_size_mb))
print("%d documents - %0.3fMB (test set)" % (
    len(test_corpus_target), test_corpus_size_mb))
print("%d categories" % len(my_categories))
print()
print("Benign Traces: " + str(benCOUNT) + " traces")
print("Malicious Traces: " + str(malCOUNT) + " traces")
print()

print("Extracting features from the training data using a sparse vectorizer...")
t0 = time()
vectorizer = TfidfVectorizer(ngram_range=(nGRAM1, nGRAM2), min_df=1, use_idf=True, smooth_idf=True)
analyze = vectorizer.build_analyzer()
X_train = vectorizer.fit_transform(train_corpus)
duration = time() - t0
print("done in %fs at %0.3fMB/s" % (duration, train_corpus_size_mb / duration))
print("n_samples: %d, n_features: %d" % X_train.shape)
print()

print("Extracting features from the test data using the same vectorizer...")
t0 = time()
X_test = vectorizer.transform(test_corpus)
duration = time() - t0
print("done in %fs at %0.3fMB/s" % (duration, test_corpus_size_mb / duration))
print("n_samples: %d, n_features: %d" % X_test.shape)
print()
The output is:
Loading system call database for categories:
['benign', 'malware']
177 documents - 45.926MB (training set)
44 documents - 12.982MB (test set)
2 categories
Benign Traces: 72 traces
Malicious Traces: 150 traces
Extracting features from the training data using a sparse vectorizer...
done in 7.831695s at 5.864MB/s
n_samples: 177, n_features: 603170
Extracting features from the test data using the same vectorizer...
done in 1.624100s at 7.993MB/s
n_samples: 44, n_features: 603170
Now for the learning section I'm trying to use sklearn OneClassSVM:
print("==================\n")
print("Training: ")
classifier = OneClassSVM(kernel='linear', gamma='auto')
classifier.fit(X_test)
fraud_pred = classifier.predict(X_test)
unique, counts = np.unique(fraud_pred, return_counts=True)
print (np.asarray((unique, counts)).T)
fraud_pred = pd.DataFrame(fraud_pred)
fraud_pred= fraud_pred.rename(columns={0: 'prediction'})
main_corpus_target = pd.DataFrame(main_corpus_target)
main_corpus_target= main_corpus_target.rename(columns={0: 'Category'})
This is the output of fraud_pred and main_corpus_target:
prediction
0 1
1 -1
2 1
3 1
4 1
5 -1
6 1
7 -1
...
30 rows * 1 column
====================
Category
0 1
1 1
2 1
3 1
4 1
...
217 0
218 0
219 0
220 0
221 0
222 rows * 1 column
But when I try to calculate TP, TN, FP, FN:
## Performance check of the model
TP = FN = FP = TN = 0
for j in range(len(main_corpus_target)):
    if main_corpus_target['Category'][j] == 0 and fraud_pred['prediction'][j] == 1:
        TP = TP + 1
    elif main_corpus_target['Category'][j] == 0 and fraud_pred['prediction'][j] == -1:
        FN = FN + 1
    elif main_corpus_target['Category'][j] == 1 and fraud_pred['prediction'][j] == 1:
        FP = FP + 1
    else:
        TN = TN + 1
print(TP, FN, FP, TN)
I get this error:
KeyError Traceback (most recent call last)
<ipython-input-32-1046cc75ba83> in <module>
7 elif main_corpus_target['Category'][j]== 0 and fraud_pred['prediction'][j] == -1:
8 FN = FN+1
----> 9 elif main_corpus_target['Category'][j]== 1 and fraud_pred['prediction'][j] == 1:
10 FP = FP+1
11 else:
c:\users\alika\appdata\local\programs\python\python36\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
1069 key = com.apply_if_callable(key, self)
1070 try:
-> 1071 result = self.index.get_value(self, key)
1072
1073 if not is_scalar(result):
c:\users\alika\appdata\local\programs\python\python36\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
4728 k = self._convert_scalar_indexer(k, kind="getitem")
4729 try:
-> 4730 return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
4731 except KeyError as e1:
4732 if len(self) > 0 and (self.holds_integer() or self.is_boolean()):
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
KeyError: 30
1) I know the error occurs because the loop walks all 222 rows of main_corpus_target while fraud_pred only has 30 rows, so it looks up an index that doesn't exist, but I can't just insert some numbers into fraud_pred to handle this issue. Any suggestions?
2) Am I doing anything wrong that makes them not match?
3) I want to compare the results with other one-class classification algorithms. Given my method, which are the best ones I can use?
Edit: Before calculating the metrics, you could change your fit and predict calls to:
fraud_pred = classifier.fit_predict(X_test)
Also, your main_corpus_target and X_test should have the same length. Can you post the code where you create main_corpus_target, please?
It's created right after benCOUNT += 1:
main_corpus_target = main_corpus_target_MAL
main_corpus_target.extend(main_corpus_target_BEN)
This means that you are creating a main_corpus_target that includes MAL and BEN, and the error you get is:
ValueError: Found input variables with inconsistent numbers of samples: [30, 222]
The number of samples of fraud_pred is 30, so you should evaluate them with an array of 30. main_corpus_target contains 222.
Looking at your code, I see that you want to evaluate X_test, which comes from test_corpus (X_test = vectorizer.transform(test_corpus)). It would be better to compare your results to test_corpus_target, which is the target variable for that subset of your dataset and also has a length of 30.
These two lines of yours should produce outputs of the same length:
test_corpus = main_corpus_MAL[(len(main_corpus_MAL)-(len(main_corpus_MAL)//(weight+1))):]
test_corpus_target = main_corpus_target_MAL[(len(main_corpus_MAL)-len(main_corpus_MAL)//(weight+1)):]
May I ask why you are calculating the TP, TN, etc. by yourself?
You have a faster option:
1. Transform the fraud_pred series, replacing -1 with 0.
2. Use the confusion_matrix function that sklearn offers.
3. Use ravel to extract the values of the confusion matrix.
An example, after transforming the -1 values to 0:
from sklearn.metrics import confusion_matrix
# confusion_matrix expects the ground truth first, then the predictions
tn, fp, fn, tp = confusion_matrix(main_corpus_target['Category'].values, fraud_pred).ravel()
Also, if you are using the latest pandas version:
from sklearn.metrics import confusion_matrix
tn, fp, fn, tp = confusion_matrix(main_corpus_target['Category'].to_numpy(), fraud_pred).ravel()
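Putting the suggestions together, here is a minimal sketch of the whole evaluation (assuming, per the discussion above, that the predictions are compared against test_corpus_target rather than the full main_corpus_target, so both sides have the same length):
import numpy as np
from sklearn.metrics import confusion_matrix

fraud_pred = classifier.fit_predict(X_test)      # array of 1 (inlier) / -1 (outlier)
fraud_pred = np.where(fraud_pred == -1, 0, 1)    # map outliers (-1) to class 0
# labels=[0, 1] keeps the matrix 2x2 even if one class is missing from y_true
tn, fp, fn, tp = confusion_matrix(test_corpus_target, fraud_pred, labels=[0, 1]).ravel()
print(tn, fp, fn, tp)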

How to feed a dictionary to a Flux model in Julia

So I have a 20000x4 dataset, where the 4 columns contain strings. The first is a description and the other three are categories, the last one being the one I wish to predict. I tokenized every word of the first column and saved it in a dictionary with its respective Int value, and I changed the other columns to have numerical values. Now I'm having trouble understanding how to feed this data into a Flux model.
According to the documentation, I have to use a "collection of data to train with (usually a set of inputs x and target outputs y)". In the example, the data are separated into x and y. But how can I do that with a dictionary plus two numeric columns?
Edit:
Here is a minimal example of what I have right now:
using WordTokenizers
using DataFrames

dataframe = DataFrame(Description = ["It has pointy ears", "It has round ears"],
                      Size = ["Big", "Small"],
                      Color = ["Black", "Yellow"],
                      Category = ["Dog", "Cat"])

dict_x = Dict{String, Int64}()
dict_y = Dict{String, Int64}()

function words_to_numbers(data, column, dict)
    i = 1
    for row in range(1, stop=size(data, 1))
        array_of_words = tokenize(data[row, column])
        for (index, word) in enumerate(array_of_words)
            if haskey(dict, word)
                continue
            else
                dict[word] = i
                i += 1
            end
        end
    end
end

function categories_to_numbers(data, column, dict)
    i = 1
    for row in range(1, stop=size(data, 1))
        if haskey(dict, data[row, column])
            continue
        else
            dict[data[row, column]] = i
            i += 1
        end
    end
end

words_to_numbers(dataframe, 1, dict_x)
categories_to_numbers(dataframe, 4, dict_y)
I want to use dict_x and dict_y as my input and output for a Flux model
Consider this example:
using DataFrames

df = DataFrame()
df.food = rand(["apple", "banana", "orange"], 20)

multiplier(fruit) = (1 + (0.1 * rand())) * (fruit == "apple" ? 95 :
                                            fruit == "orange" ? 45 : 105)
df.calories = multiplier.(df.food)

# fixed: the parameter is f, not fruit (and the duplicate definition is dropped)
foodtoken(f) = f == "apple" ? 0 : f == "orange" ? 2 : 3
fooddict = Dict(fruit => (fruit == "apple" ? 0 : fruit == "orange" ? 2 : 3)
                for fruit in df.food)
Now we can add the token numeric values to the dataframe:
df.token = map(x -> fooddict[x], df.food)
println(df)
Now you should be able to run the prediction with df.token as an input and df.calories as an output.
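As a rough illustration (not from the original answer), here is a minimal Flux sketch for that prediction, assuming the era-appropriate Flux.params/Flux.train! API; Flux expects Float32 matrices with one column per sample, and the learning rate here is an arbitrary choice:
using Flux

x = Float32.(reshape(df.token, 1, :))       # 1×20 input matrix (one token per sample)
y = Float32.(reshape(df.calories, 1, :))    # 1×20 target matrix

model = Dense(1, 1)                         # one numeric input, one numeric output
loss(a, b) = Flux.mse(model(a), b)
opt = Descent(0.001)

for epoch in 1:100
    Flux.train!(loss, Flux.params(model), [(x, y)], opt)
end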
========== addendum after you posted further code: ===========
With your modified example, you just need a helper function:
function colvalue(s, dict)
    total = 0
    for (k, v) in dict
        if occursin(k, s)
            total += 10^v
        end
    end
    total
end
words_to_numbers(dataframe, 1, dict_x)
categories_to_numbers(dataframe, 4, dict_y)
dataframe.descripval = map(x -> colvalue(x, dict_x), dataframe.Description)
dataframe.catval = map(x -> colvalue(x, dict_y), dataframe.Category)
println(dataframe)

Returning a list of nearest neighbors from KNN

I am trying to use a KNN model to show the brands most closely related to brand X. I have read in my data and transposed it so that the format is like this:
User1 User2 User3 User4 User5
Brand1 1 0 0 0 1
Brand2 0 0 0 1 1
Brand3 0 0 1 1 1
Brand4 1 1 1 0 1
Brand5 0 0 0 1 1
I've then defined my model:
from sklearn.neighbors import NearestNeighbors
model_knn = NearestNeighbors(metric='cosine', algorithm='brute')
model_knn.fit(df_mini)
Then I use the following code to list the nearest 5 brands to a randomly selected brand:
query_index = np.random.choice(df_mini.shape[0])
distances, indices = model_knn.kneighbors(df_mini.iloc[query_index, :].values.reshape(1, -1), n_neighbors=6)
for i in range(0, len(distances.flatten())):
    if i == 0:
        print('Recommendations for {0}:\n'.format(df_mini.index[query_index]))
    else:
        print('{0}: {1}, with distance of {2}:'.format(i, df_mini.index[indices.flatten()[i]], distances.flatten()[i]))
Returning sample results like this:
Recommendations for BRAND_X:
1: BRAND_a, with distance of 1.0:
2: BRAND_b, with distance of 1.0:
3: BRAND_c, with distance of 1.0:
4: BRAND_d, with distance of 1.0:
5: BRAND_e, with distance of 1.0:
All my results show every brand with a distance of 1.0. Where have I gone wrong in my code for this to be the case? I have tried increasing the size of the sample data and the result stays the same, which makes me think it's a code error rather than a data quirk.
EDIT: Here is a fuller sample of my code:
import pandas as pd
import numpy as np  # needed for np.random.choice below

df = pd.read_csv('sample.csv')
print(df.head())
df_mini = df[:5000]
df_mini = df_mini.transpose()
df_mini = df_mini.drop('UserID', axis=0)

from sklearn.neighbors import NearestNeighbors
model_knn = NearestNeighbors(metric='cosine', algorithm='brute')
model_knn.fit(df_mini)

query_index = np.random.choice(df_mini.shape[0])
distances, indices = model_knn.kneighbors(df_mini.iloc[query_index, :].values.reshape(1, -1), n_neighbors=6)
for i in range(0, len(distances.flatten())):
    if i == 0:
        print('Recommendations for {0}:\n'.format(df_mini.index[query_index]))
    else:
        print('{0}: {1}, with distance of {2}:'.format(i, df_mini.index[indices.flatten()[i]], distances.flatten()[i]))
Sample data file:
https://drive.google.com/open?id=19KRJDGrsLNpDD0WNAz4be76O66fGmQtJ
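As a quick sanity check (a sketch using the toy matrix from the question rather than the linked file), you can compute the cosine distances directly and see which values the model ought to report:
import pandas as pd
from sklearn.metrics.pairwise import cosine_distances

brands = pd.DataFrame(
    [[1, 0, 0, 0, 1],
     [0, 0, 0, 1, 1],
     [0, 0, 1, 1, 1],
     [1, 1, 1, 0, 1],
     [0, 0, 0, 1, 1]],
    index=['Brand1', 'Brand2', 'Brand3', 'Brand4', 'Brand5'],
    columns=['User1', 'User2', 'User3', 'User4', 'User5'])

# Brands that share users should be well below 1.0; Brand2 and Brand5
# are identical rows, so their distance is exactly 0.
print(cosine_distances(brands))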

Minimum Change Maker Returning Optimal Solution and No Solution

I need help adding an if clause to my change maker so that if the coin denominations can't make the input value (for example, coins worth 2, 4, and 6 with a value of 1), it returns "Change Not Possible". I tried adding a clause (commented out below), but when I test it I get 1.#INF.
I am also curious how I can find the optimal coin solution, so that on top of the minimum number of coins it returns the optimal coin setup if there is one.
function ChangeMaking(D, n)
    --[[
    //Applies dynamic programming to find the minimum number of coins
    //of denominations d1 < d2 < . . . < dm where d1 = 1 that add up to a
    //given amount n
    //Input: Positive integer n and array D[1..m] of increasing positive
    //       integers indicating the coin denominations where D[1] = 1
    //Output: The minimum number of coins that add up to n
    ]]
    F = {} -- F is List Array of Coins
    m = tablelength(D)
    F[0] = 0
    for i = 1, n do
        temp = math.huge
        j = 1
        while j <= m and i >= D[j] do
            temp = math.min(F[i - D[j]], temp)
            j = j + 1
        end
        F[i] = temp + 1
    end
    --I wanted to catch the failed solution here but I return 1.#INF instead
    --if F[n] <= 0 and F[n] == 1.#INF then print("No Change Possible") return end
    return F[n]
end

function main()
    --[[
    //Prints a greeting, asks for denominations separated by spaces.
    //Iterates through the input and assigns values to a table.
    //The table is then input into ChangeMaker, and a while loop takes an n value from user input.
    //The user enters 0 to end the loop.
    ]]
    io.write("Hello, Welcome to the Change Maker - LUA Edition\nEnter a series of change denominations, separated by spaces: ")
    input = io.read()
    deno = {}
    for num in input:gmatch("%d+") do table.insert(deno, tonumber(num)) end
    local i = 1
    while i ~= 0 do
        io.write("Please Enter Total for Change Maker, When Finished Enter 0 to Exit: ")
        input2 = io.read("*n")
        if input2 ~= 0 then io.write("\nMinimum # of Coins: " .. ChangeMaking(deno, input2) .. "\n") end
        if input2 == 0 then i = 0 print("0 Entered, Exiting Change Maker") end
    end
end

function tablelength(T)
    --[[
    //Function for grabbing the total length of a table.
    ]]
    local count = 0
    for _ in pairs(T) do count = count + 1 end
    return count
end

main()
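For what it's worth, 1.#INF is simply how Lua on Windows prints math.huge, so the commented-out guard never matches: the check has to compare F[n] against math.huge itself. A minimal sketch of the guard (not from the original post; the caller must then handle nil before concatenating the result):
-- at the end of ChangeMaking, instead of the commented-out clause:
if F[n] == math.huge then
    print("No Change Possible")
    return nil
end
return F[n]
Recovering the optimal coin setup would additionally require remembering, for each amount i, which denomination D[j] achieved the minimum, and then walking that record back from n.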

Batch processing in Torch with ClassNLLCriterion

I'm trying to implement a simple NN in Torch to learn more about it. I created a very simple dataset: binary numbers from 0 to 15, and my goal is to classify the numbers into two classes - class 1 is numbers 0-3 and 12-15, class 2 is the remaining ones. The following code is what I have now (I have only removed the data loading routine):
require 'torch'
require 'nn'

data = torch.Tensor( 16, 4 )
class = torch.Tensor( 16, 1 )

network = nn.Sequential()
network:add( nn.Linear( 4, 8 ) )
network:add( nn.ReLU() )
network:add( nn.Linear( 8, 2 ) )
network:add( nn.LogSoftMax() )

criterion = nn.ClassNLLCriterion()

for i = 1, 300 do
    prediction = network:forward( data )
    --print( "prediction: " .. tostring( prediction ) )
    --print( "class: " .. tostring( class ) )
    loss = criterion:forward( prediction, class )
    network:zeroGradParameters()
    grad = criterion:backward( prediction, class )
    network:backward( data, grad )
    network:updateParameters( 0.1 )
end
This is what the data and class tensors look like:
0 0 0 0
0 0 0 1
0 0 1 0
0 0 1 1
0 1 0 0
0 1 0 1
0 1 1 0
0 1 1 1
1 0 0 0
1 0 0 1
1 0 1 0
1 0 1 1
1 1 0 0
1 1 0 1
1 1 1 0
1 1 1 1
[torch.DoubleTensor of size 16x4]
2
2
2
2
1
1
1
1
1
1
1
1
2
2
2
2
[torch.DoubleTensor of size 16x1]
Which is what I expect it to be. However, when running this code, I get the following error on the line loss = criterion:forward( prediction, class ):
torch/install/share/lua/5.1/nn/ClassNLLCriterion.lua:69: attempt to
perform arithmetic on a nil value
When I modify the training routine like this (processing a single data point at a time instead of all 16 in a batch), it works and the network successfully learns to recognize the two classes:
for k = 1, 300 do
    for i = 1, 16 do
        prediction = network:forward( data[i] )
        --print( "prediction: " .. tostring( prediction ) )
        --print( "class: " .. tostring( class ) )
        loss = criterion:forward( prediction, class[i] )
        network:zeroGradParameters()
        grad = criterion:backward( prediction, class[i] )
        network:backward( data[i], grad )
        network:updateParameters( 0.1 )
    end
end
I'm not sure what might be wrong with the "batch processing" I'm trying to do. A brief look at ClassNLLCriterion didn't help; it seems I'm giving it the expected input (see below), but it still fails. The input it receives (the prediction and class tensors) looks like this:
-0.9008 -0.5213
-0.8591 -0.5508
-0.9107 -0.5146
-0.8002 -0.5965
-0.9244 -0.5055
-0.8581 -0.5516
-0.9174 -0.5101
-0.8040 -0.5934
-0.9509 -0.4884
-0.8409 -0.5644
-0.8922 -0.5272
-0.7737 -0.6186
-0.9422 -0.4939
-0.8405 -0.5648
-0.9012 -0.5210
-0.7820 -0.6116
[torch.DoubleTensor of size 16x2]
2
2
2
2
1
1
1
1
1
1
1
1
2
2
2
2
[torch.DoubleTensor of size 16x1]
Can someone help me out here? Thanks.
Experience has shown that nn.ClassNLLCriterion expects the target to be a 1D tensor of size batch_size, or a scalar. Your class is a 2D tensor (batch_size x 1), but class[i] is a 1D one; that's why your non-batch version works.
So, this will solve your problem:
class = class:view(-1)
Alternatively, you can replace
network:add( nn.LogSoftMax() )
criterion = nn.ClassNLLCriterion()
with the equivalent:
criterion = nn.CrossEntropyCriterion()
The interesting thing is that nn.CrossEntropyCriterion is also able to take a 2D tensor. Why is nn.ClassNLLCriterion not?
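For completeness, here is the batch loop from the question with the one-line fix applied, so class is flattened once up front:
class = class:view( -1 )  -- now a 1D tensor of size 16

for i = 1, 300 do
    prediction = network:forward( data )
    loss = criterion:forward( prediction, class )
    network:zeroGradParameters()
    grad = criterion:backward( prediction, class )
    network:backward( data, grad )
    network:updateParameters( 0.1 )
end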
