Display nodes of a DFS (depth-first search) algorithm - graph-algorithm

While creating the adjacent nodes I built a vector in which to collect the adjacent nodes until the destination node is reached, but I do not understand why it ends up empty, since I only delete it at the end.
vettore_nuovo_percorso =
1 2 4 5
vettore_nuovo_percorso =
1 2 4 5
nodi =
1 2 4 5
ans =
1 1 1 1 1 1 1
0 2 4 2 2 2 2
0 0 0 3 4 4 4
0 0 0 0 0 3 5
nodi_sorgenti =
3 5
nodi_sorgenti =
3 5
'BLACK'
'BLACK'
'BLACK'
'BLACK'
'BLACK'
ans =
1 1 1 1 1 1 1
0 2 4 2 2 2 2
0 0 0 3 4 4 4
0 0 0 0 0 3 5
conta_nodo_trovato =
1
Crucial aspect of the code: the path vector should not vanish, yet here it does:
vettore_nuovo_percorso =
[]
nodi =
[]
ans =
1 1 1
0 2 4
nodi_sorgenti =
2 4
nodi_sorgenti =
2 4
'BLACK'
'BLACK'
'WHITE'
'BLACK'
'WHITE'
ans =
1 1 1
0 2 4
function [vertice,nodi]=DFS_Visit(edges,vertices,self,nodi_visitati,vb,conta_impo_colore,conta_nodo_trovato)
%with a loop I copy onto the vector nodi_sorgenti all the destinations
%reachable from the source; it is then passed as the third parameter to the call
%DFS_Visit(edges,vertices,self,nodi_visitati,vb,conta_impo_colore,conta_nodo_trovato)
for j6=1:length(nodi_visitati)
vertices.conta_righe=vertices.conta_righe+1;
%copy all the possible destinations onto the path
%vector only once
display(vertices.conta_righe);
if vertices.conta_righe==1
for j=1:length(nodi_visitati)
self.vettore_percorso(vertices.conta_righe,j)=nodi_visitati(j6);
display(self.vettore_percorso);
end
end
[n1,m1]=size(self.vettore_percorso);
for j2=1:m1
%fetch the source node
if nodi_visitati(j6)==self.vettore_percorso(n1,j2)
if nodi_visitati(j6)~=vb
conta_edge=0;
conta_nodi_sorgenti=0;
%row and column dimensions
[n1,m1]=size(self.vettore_percorso);
display(self.vettore_percorso);
%if it exists, fetch all the destinations of the node
edges2=ver.connectedEdges(edges,nodi_visitati(j6));
for j8=1:length(edges2)
%counter of source nodes
%insert the destinations into the vector nodi_sorgenti
conta_nodi_sorgenti=conta_nodi_sorgenti+1;
nodi_sorgenti(conta_nodi_sorgenti)=edges2(j8);
if nodi_sorgenti(conta_nodi_sorgenti)==vb
indice=conta_nodi_sorgenti;
end
%copy the value of the destination into a vector,
%on a new row and in the same column
display('edg');
display(edges2);
end
[n2,m2]=size(self.vettore_percorso);
%copy onto the vector all the values equal to
%the destination values, into the column j2
%where the found source resides
if self.vettore_percorso(n1,j2)~=vb
for k=1:n2
for j=1:length(edges2)
self.vettore_percorso(k,m2+j)=self.vettore_percorso(k,j2);
display(self.vettore_percorso);
end
if k==n2
for k3=1:length(edges2)
%copy the reachable
%destinations onto all
%the vectors in
%self.vettore_percorso
self.vettore_percorso(n2+1,m2+k3)=edges2(k3);
end
end
end
end
end
end
end
end
[n,m]=size(self.vettore_percorso);
%call the DFS function
for j=1:length(nodi_sorgenti)
if nodi_sorgenti(j)~=vb
self.conta_v=1;
display(conta_visita);
display('nkl');
display(conta_visita);
display(nodi_sorgenti);
else
self.conta_v=0;
end
end
if self.conta_v==0
DFS.DFS_Visit(edges,vertices,self,nodi_sorgenti,vb,conta_impo_colore,0);
end
end
%copy onto the vector all the paths until
%the destination is reached
[n3,m3]=size(self.vettore_percorso);
for j3=1:m3
if self.vettore_percorso(j4,j3)==vb
conta_nodo_trovato=1;
[n3,m3]=size(self.vettore_percorso);
display('j3');
conta_visita=1;
display(j3);
conta_percorso=conta_percorso+1;
for j7=1:m3
if j7==j3
for j8=1:n3
vettore_nuovo_percorso(conta_percorso,j8)=self.vettore_percorso(j8,j3);
display(vb);
end
end
end
end
end
end
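For reference, here is a minimal sketch (in Python rather than the MATLAB-style code above, with a hypothetical adjacency dictionary standing in for edges/vertices) of the usual way to keep every discovered path alive in a recursive DFS: the partial path is copied on each recursive call and completed paths are appended to a separate list, so nothing ever has to be emptied afterwards.

# Minimal sketch (not the poster's code): a recursive DFS that records every
# path from `src` to `dst`. The partial path is copied on each recursive call,
# so completed paths are never overwritten or cleared.
def dfs_paths(adj, src, dst, path=None, found=None):
    path = (path or []) + [src]          # copy, do not mutate the caller's path
    found = found if found is not None else []
    if src == dst:
        found.append(path)               # keep the completed path
        return found
    for nxt in adj.get(src, []):
        if nxt not in path:              # avoid cycles
            dfs_paths(adj, nxt, dst, path, found)
    return found

# example graph roughly matching the output pasted above: 1-2, 2-4, 4-3, 4-5
adj = {1: [2], 2: [4], 4: [3, 5]}
print(dfs_paths(adj, 1, 5))              # [[1, 2, 4, 5]]

The same idea carries over to the code above: if the path matrix is rebuilt or cleared inside the recursion, completed paths will appear to vanish.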

Related

What is this function doing in Lua?

function splitSat(str, pat, max, regex)
pat = pat or "\n" -- search pattern
max = max or #str
local t = {}
local c = 1
if #str == 0 then
return {""}
end
if #pat == 0 then
return nil
end
if max == 0 then
return str
end
repeat
local s, e = str:find(pat, c, not regex) -- within the string str, look for the pattern pat starting at position c
-- store the start index in s and the end index in e
max = max - 1
if s and max < 0 then
if #(str:sub(c)) > 0 then -- if the length of the portion of the string from c to the end is greater than 0
t[#t+1] = str:sub(c)
else
t[#t+1] = "" --create a table with empty values
end
else
if #(str:sub(c, s and s - 1)) > 0 then -- if the length of the portion of str between c and s - 1 is greater than 0
t[#t+1] = str:sub(c, s and s - 1)
else
t[#t+1] = "" --create a table with empty values
end
end
c = e and e + 1 or #str + 1
until not s or max < 0
return t
end
I'd like to know what this function is doing. I know that it builds a kind of table from a string and a pattern. In particular, I want to know what *t[#t+1] = str:sub(c, s and s - 1)* is doing.
From what I can tell, it splits a long string on occurrences of the pattern, returning the substrings that lie between the matches (the default pattern "\n" splits the string into lines). For example, splitting the string "a,b,c" on the pattern "," results in the table {"a", "b", "c"}.
t[#t+1] = <something> inserts a value at the end of table t, it's the same as table.insert(t, <something>)
#t returns the length of an array (that is, a table with consecutive numeric indices); for example, #{1, 2, 3} == 3
str:sub(c, s and s - 1) takes advantage of several of Lua's features. s and s - 1 evaluates to s - 1 if s is not nil, and to nil otherwise. Just s - 1 would throw an error if s were nil:
10 and 10 - 1 == 9
10 - 1 == 9
nil and nil - 1 == nil
nil - 1 -> throws an error
str:sub(a, b) just returns a substring starting at a and ending at b (a and b being numeric indices)
("abcde"):sub(2,4) == "bcd"

Do math on string count (and text parsing with awk)

I have a 4 column file (input.file) with a header:
something1 something2 A B
followed by many 4-column rows with the same format (e.g.):
ID_00001 1 0 0
ID_00002 0 1 0
ID_00003 1 0 0
ID_00004 0 0 1
ID_00005 0 1 0
ID_00006 0 1 0
ID_00007 0 0 0
ID_00008 1 0 0
Where "1 0 0" is representative of "AA", "0 1 0" means "AB", and "0 0 1" means "BB"
First, I would like to create a 5th column to identify these representations:
ID_00001 1 0 0 AA
ID_00002 0 1 0 AB
ID_00003 1 0 0 AA
ID_00004 0 0 1 BB
ID_00005 0 1 0 AB
ID_00006 0 1 0 AB
ID_00007 0 0 0 no data
ID_00008 1 0 0 AA
Note that the A's and B's need to be parsed from columns 3 and 4 of the header row, as they are not always A and B.
Next, I want to "do math" on the counts for (the new) column 5 as follows:
(2BB + AB) / 2(AA + AB + BB)
Using the example, the math would give:
(2(1) + 3) / 2(3 + 3 + 1) = 5/14 = 0.357
which I would like to append to the end of the desired output file (output.file):
ID_00001 1 0 0 AA
ID_00002 0 1 0 AB
ID_00003 1 0 0 AA
ID_00004 0 0 1 BB
ID_00005 0 1 0 AB
ID_00006 0 1 0 AB
ID_00007 0 0 0 no data
ID_00008 1 0 0 AA
B_freq = 0.357
So far I have this:
awk '{ if ($2 = 1) {print $0, $5="AA"} \
else if($3 = 1) {print $0, $5="AB"} \
else if($4 = 1) {print $0, $5="BB"} \
else {print$0, $5="no data"}}' input.file > output.file
Obviously, I was not able to figure out how to parse the info from row 1 (the header row), much less do the math.
Thanks guys!
A more structured approach...
NR==1 {a["100"]=$3$3; a["010"]=$3$4; a["001"]=$4$4; print; next}
{k=$2$3$4;
print $0, (k in a)?a[k]:"no data";
c[k]++}
END {printf "\nB freq = %.3f\n",
(2*c["001"]+c["010"]) / 2 / (c["100"]+c["010"]+c["001"])}
UPDATE
For non-binary data you can follow the same logic with some pre-processing. Something like this should work in the main block:
for(i=2;i<5;i++) v[i]=(($i-0.9)^2<=0.1^2)?1:0;
k=v[2] v[3] v[4];
...
Here the value is quantized to one for the range [0.8, 1] and to zero otherwise.
To capture "B" (or whatever string substitutes for it), set h=$4 in the first block and use it as printf "\n%s freq...", h, (2*c...

How does Weka evaluate a classifier model

I used the random forest algorithm and got this result:
=== Summary ===
Correctly Classified Instances 10547 97.0464 %
Incorrectly Classified Instances 321 2.9536 %
Kappa statistic 0.9642
Mean absolute error 0.0333
Root mean squared error 0.0952
Relative absolute error 18.1436 %
Root relative squared error 31.4285 %
Total Number of Instances 10868
=== Confusion Matrix ===
a b c d e f g h i <-- classified as
1518 1 3 1 0 14 0 0 4 | a = a
3 2446 0 0 0 1 1 27 0 | b = b
0 0 2942 0 0 0 0 0 0 | c = c
0 0 0 470 0 1 1 2 1 | d = d
9 0 0 9 2 19 0 3 0 | e = e
23 1 2 19 0 677 1 22 6 | f = f
4 0 2 0 0 13 379 0 0 | g = g
63 2 6 17 0 15 0 1122 3 | h = h
9 0 0 0 0 9 0 4 991 | i = i
I wonder how Weka evaluates errors (mean absolute error, root mean squared error, ...) for non-numerical class values ('a', 'b', ...).
I mapped each class to a number from 0 to 8 and computed the errors manually, but my results differed from Weka's.
How can I re-implement Weka's evaluation steps?
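As a hedged illustration (this is my understanding of how probability-based classification metrics are commonly defined, and it should be checked against weka.classifiers.Evaluation before relying on it): the error measures are computed from the predicted class-probability distribution compared against a 0/1 indicator vector for the true class, averaged over instances and classes, rather than from classes mapped to the numbers 0 to 8. A sketch of that idea in Python:

import numpy as np

# Sketch: MAE/RMSE from predicted class probabilities vs. 0/1 indicator vectors.
def prob_errors(probs, true_idx):
    probs = np.asarray(probs, dtype=float)          # shape (n_instances, n_classes)
    actual = np.zeros_like(probs)
    actual[np.arange(len(probs)), true_idx] = 1.0   # indicator vector per instance
    diff = probs - actual
    mae = np.abs(diff).mean()                       # averaged over instances and classes
    rmse = np.sqrt((diff ** 2).mean())
    return mae, rmse

# e.g. three instances, classes a/b/c encoded as 0/1/2
mae, rmse = prob_errors([[0.9, 0.1, 0.0],
                         [0.2, 0.7, 0.1],
                         [0.0, 0.3, 0.7]], [0, 1, 2])
print(mae, rmse)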

How to combine similar fields parsed from CSV using Ruby

I am parsing baseball statistics from a CSV file, and I need to account for players who played for multiple teams within a season. Currently my code looks like this:
require 'CSV'
CSV.foreach("Batting-07-12-resaved.csv",{:headers=>:first_row}) do |row|
if row[7].to_i != 0 && row[5] != 0 && row[1].to_i == 2009
avg = row[7].to_f / row[5].to_f
puts row[0] + ": " + avg.round(3).to_s[1..-1]
end
end
The CSV headers look like this, and a player is identified by a key that loosely resembles their name; the key may recur for the different teams they played for (here are a few lines, copied from the formatted file):
playerID yearID league teamID G AB R H 2B 3B HR RBI SB CS
aardsda01 2012 AL NYA 1
aardsda01 2010 AL SEA 53 0 0 0 0 0 0 0 0 0
aardsda01 2009 AL SEA 73 0 0 0 0 0 0 0 0 0
aardsda01 2008 AL BOS 47 1 0 0 0 0 0 0 0 0
aardsda01 2007 AL CHA 25 0 0 0 0 0 0 0 0 0
abadfe01 2012 NL HOU 37 7 0 1 0 0 0 0 0 0
abadfe01 2011 NL HOU 28 0 0 0 0 0 0 0 0 0
abadfe01 2010 NL HOU 22 1 0 0 0 0 0 0 0 0
abercre01 2008 NL HOU 34 55 10 17 5 0 2 5 5 2
abercre01 2007 NL FLO 35 76 16 15 3 0 2 5 7 1
abreubo01 2012 AL LAA 8 24 1 5 3 0 0 5 0 0
abreubo01 2012 NL LAN 92 195 28 48 8 1 3 19 6 2
So, for example, the bottom two lines, Bobby Abreu played for two different teams in the 2012 season.
How could I combine the numbers from these two rows under the same playerId for the 2012 season to calculate his 2012 batting average?
You need to keep a data structure that holds data about each playerID as you iterate through the CSV data. Using a hash would be perfect (see the Hash page in the ruby-doc.org manual).
require 'csv'
# Hashes are built into Ruby. Using a hash literal
# is more idiomatic than h = Hash.new()
h = {}
CSV.foreach("Batting-07-12-resaved.csv",{:headers=>:first_row}) do |row|
if row[7].to_i != 0 && row[5].to_i != 0 && row[1].to_i == 2009
playerData = h[row[0]]
if (!playerData)
playerData = [row[0], row[7].to_f, row[5].to_f]
else
playerData = [row[0], row[7].to_f+playerData[1], row[5].to_f+playerData[2]]
end
h[row[0]]=playerData
end
end
h.each {|key, value|
puts "#{value[0]} is #{value[1]/value[2]}"
}
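For comparison, a short Python sketch of the same accumulate-into-a-hash idea, keyed on (playerID, yearID) so that a player's stints with different teams in one season are summed; it assumes the same column layout as the question's file.

import csv
from collections import defaultdict

# Sketch: sum hits (column 7) and at-bats (column 5) per (playerID, yearID),
# then print each combined batting average.
totals = defaultdict(lambda: [0, 0])                # key -> [hits, at_bats]
with open("Batting-07-12-resaved.csv") as f:
    for row in csv.reader(f):
        if len(row) > 7 and row[0] != "playerID":   # skip header and short rows
            key = (row[0], row[1])
            totals[key][0] += int(row[7] or 0)
            totals[key][1] += int(row[5] or 0)

for (player, year), (hits, at_bats) in totals.items():
    if at_bats > 0:
        print("%s (%s): %.3f" % (player, year, hits / at_bats))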

Random Forest overfitting?

I'm facing the following problem: I'm training a random forest for binary prediction. The data is structured as follows:
> str(data)
'data.frame': 120269 obs. of 11 variables:
$ SeriousDlqin2yrs : num 1 0 0 0 0 0 0 0 0 0 ...
$ RevolvingUtilizationOfUnsecuredLines: num 0.766 0.957 0.658 0.234 0.907 ...
$ age : num 45 40 38 30 49 74 39 57 30 51 ...
$ NumberOfTime30.59DaysPastDueNotWorse: num 2 0 1 0 1 0 0 0 0 0 ...
$ DebtRatio : num 0.803 0.1219 0.0851 0.036 0.0249 ...
$ MonthlyIncome : num 9120 2600 3042 3300 63588 ...
$ NumberOfOpenCreditLinesAndLoans : num 13 4 2 5 7 3 8 9 5 7 ...
$ NumberOfTimes90DaysLate : num 0 0 1 0 0 0 0 0 0 0 ...
$ NumberRealEstateLoansOrLines : num 6 0 0 0 1 1 0 4 0 2 ...
$ NumberOfTime60.89DaysPastDueNotWorse: num 0 0 0 0 0 0 0 0 0 0 ...
$ NumberOfDependents : num 2 1 0 0 0 1 0 2 0 2 ...
- attr(*, "na.action")=Class 'omit' Named int [1:29731] 7 9 17 33 42 53 59 63 72 87 ...
.. ..- attr(*, "names")= chr [1:29731] "7" "9" "17" "33" ...
I split the data
index <- sample(1:nrow(data),round(0.75*nrow(data)))
train <- data[index,]
test <- data[-index,]
Then I run the model and try to make predictions:
model.rf <- randomForest(as.factor(train[,1]) ~ ., data=train,ntree=1000,mtry=10,importance=TRUE)
pred.rf <- predict(model.rf, test, type = "prob")
rfpred <- c(1:22773)
rfpred[pred.rf[,1]<=0.5] <- "yes"
rfpred[pred.rf[,1]>0.5] <- "no"
rfpred <- factor(rfpred)
test[,1][test[,1]==1] <- "yes"
test[,1][test[,1]==0] <- "no"
test[,1] <- factor(test[,1])
confusionMatrix(as.factor(rfpred), as.factor(test$Y))
What I get is the following output:
> print(model.rf)
Call:
randomForest(formula = as.factor(train[, 1]) ~ ., data = train, ntree = 1000, mtry = 10, importance = TRUE)
Type of random forest: classification
Number of trees: 1000
No. of variables tried at each split: 10
OOB estimate of error rate: 0%
Confusion matrix:
0 1 class.error
0 43093 0 0
1 0 25225 0
> head(pred.rf)
0 1
45868.1 1 0
112445 1 0
39001 1 0
133443 1 0
137460 1 0
125835.1 1 0
> confusionMatrix(as.factor(rfpred), as.factor(test$Y))
Confusion Matrix and Statistics
Reference
Prediction no yes
no 14570 0
yes 0 8203
Accuracy : 1
95% CI : (0.9998, 1)
No Information Rate : 0.6398
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 1
Mcnemar's Test P-Value : NA
Sensitivity : 1.0000
Specificity : 1.0000
Pos Pred Value : 1.0000
Neg Pred Value : 1.0000
Prevalence : 0.6398
Detection Rate : 0.6398
Detection Prevalence : 0.6398
Balanced Accuracy : 1.0000
'Positive' Class : no
Obviously the model cannot be this accurate! What's wrong with my code?
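One possibility worth checking, offered as a hedged note rather than a confirmed diagnosis: with the formula as.factor(train[,1]) ~ . and data=train, the dot may expand to every column of train, including the response column SeriousDlqin2yrs itself, so the forest can simply read the label back. The sketch below reproduces that target-leakage effect in Python with scikit-learn (assumed available); it is an illustration of the general phenomenon, not a rewrite of the R code.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Sketch: when the label leaks into the feature matrix, accuracy looks perfect.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 1000)                    # random binary labels
X_noise = rng.normal(size=(1000, 5))            # uninformative features
X_leaky = np.column_stack([X_noise, y])         # the label sneaks in as a feature

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_leaky, y)
print(clf.score(X_leaky, y))                    # ~1.0: the forest just reads the label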
