How to interpret weka classification result J48 - machine-learning

I need help to interpret result in weka using the J48
I dont know how to explain the result, I am using the dataset Heart Disease Data Set from http://archive.ics.uci.edu/ml/datasets/Heart+Disease
And the J48 tree
Please help me, with some points importants for this analyse
my result is:
=== Run information ===
Scheme: weka.classifiers.trees.J48 -C 0.25 -M 2
Relation: AnaliseCardiaca
Instances: 303
Attributes: 14
age
sex
cp
trestbps
chol
fbs
restecg
thalach
exang
oldpeak
slope
ca
thal
num
Test mode: 10-fold cross-validation
=== Classifier model (full training set) ===
J48 pruned tree
cp <= 3
| sex <= 0: 0 (57.0/2.0)
| sex > 0
| | slope <= 1
| | | fbs <= 0
| | | | trestbps <= 152
| | | | | thalach <= 162
| | | | | | ca <= 1
| | | | | | | age <= 56: 0 (12.0/1.0)
| | | | | | | age > 56: 1 (3.0/1.0)
| | | | | | ca > 1: 1 (2.0)
| | | | | thalach > 162: 0 (27.0)
| | | | trestbps > 152: 1 (4.0/1.0)
| | | fbs > 0: 0 (9.0)
| | slope > 1
| | | slope <= 2
| | | | ca <= 0
| | | | | fbs <= 0
| | | | | | chol <= 261
| | | | | | | oldpeak <= 2.5: 0 (11.61/1.0)
| | | | | | | oldpeak > 2.5: 1 (3.0)
| | | | | | chol > 261: 1 (4.0)
| | | | | fbs > 0: 0 (4.0)
| | | | ca > 0
| | | | | thal <= 6: 1 (6.0/1.0)
| | | | | thal > 6
| | | | | | thalach <= 145: 0 (3.39)
| | | | | | thalach > 145: 1 (5.0/1.0)
| | | slope > 2: 0 (8.0/1.0)
cp > 3
| thal <= 3
| | ca <= 2
| | | exang <= 0
| | | | sex <= 0
| | | | | chol <= 304: 0 (14.0)
| | | | | chol > 304: 1 (3.0/1.0)
| | | | sex > 0
| | | | | ca <= 0: 0 (10.0/1.0)
| | | | | ca > 0: 1 (3.0)
| | | exang > 0
| | | | restecg <= 1
| | | | | slope <= 1: 0 (2.0)
| | | | | slope > 1: 1 (5.37)
| | | | restecg > 1
| | | | | ca <= 0: 0 (4.0)
| | | | | ca > 0
| | | | | | ca <= 1
| | | | | | | thalach <= 113: 0 (2.0)
| | | | | | | thalach > 113: 1 (4.0)
| | | | | | ca > 1: 0 (2.0)
| | ca > 2: 1 (4.0)
| thal > 3
| | fbs <= 0
| | | ca <= 0
| | | | chol <= 278: 0 (23.0/8.0)
| | | | chol > 278: 1 (6.0)
| | | ca > 0: 1 (46.0/12.0)
| | fbs > 0
| | | ca <= 1: 1 (3.88)
| | | ca > 1: 0 (11.75/4.75)
Number of Leaves : 31
Size of the tree : 61
Result img

If you are using Weka Explorer, you can right click on the result row in the results list (located on the left of the window under the start button). Then select visualize tree. This will display an image of the tree.
If you still want to understand the results as they are shown in your question:
The results are displayed as tree. The root of the tree starts at the left and the first feature used is called cp. If cp is smaller or equal to 3, then the next feature in the tree is sex and so on. You can see that when you split by sex and sex <= 0 you reach a prediction. The prediction is 0 and the (57/2) means that 57 observations in the training set end up at this path and 2 were incorrectly classified, i.e. 55 had the label 0 and 2 had the label 1.
Here is how the start of the tree looks like:
--------start---------
| |
| |
|cp > 3 | cp <= 3
_________|______ ____|__________
| | | |
|thal>3 |thal<=3 |sex>0 |sex<=0
| | | |
... ... ... prediction 0 57(55,2)

The AndreyF's explanation is good. I want to add some information.
Why does the tree have float numbers in its leaves? Can an instance (individual) be split and get a float value? (in the reality a person can not be split)
When the instance has all the attributes set perfectly then there isn't a problem. But when the instance has missing attributes, then the classifier (J48) doesn't know the way of the tree for that attribute.
For example, if an instance has its "oldpeak" attribute like a missing attribute then when it reaches the "chol <= 261" node (previous node to the "oldpeak" node) the classifier will divide the instance according to a probability and a percentage of the instance will go to "oldpeak <= 2.5" and the other percentage will go to "oldpeak > 2.5".
How does the classifier calculate that probability? It calculates through the instances that don't have the missing attribute for the actual node. For this example will be the "oldpeak" attribute.
If we have 25% instances with no missing "oldpeak" attribute that were classified in the "oldpeak <= 2.5" node, and we have 75% instances with no missing "oldpeak" attribute that were classified in the "oldpeak > 2.5" node then when the classifier wants to classify an instance with "oldpeak" attribute missing then the 25% of this instance will go through "oldpeak <= 2.5" and the rest (75%) will go through "oldpeak > 2.5".
You can try to remove instances with missing attributes and you will see that the tree will only have integer numbers instead of float numbers.
Thank you.

Related

How to select all fields greater than specific value in Influxdb

I have a database in Influxdb 1.7 with several columns and no tag key.
I want to receive only fields where the value is greater than 0 (zero)
(I'm using grafana to show)
like:
table1:
| time | A | B | C | D | E | F |....| AD | AE | ...
|167344...| 0 | 0 | 2 | 0 | 12 | 0 |....| 0 | 0 |
|167356... | 0 | 0 | 2 | 0 | 0 | 0 |....| 0 | 12 |
|167376... | 0 | 0 | 2 | 0 | 1 | 0 |....| 0 | 0 |
|167384... | 1 | 4 | 0 | 0 | 0 | 0 |....| 0 | 1 |
I would like to receive:
| time | A | B | AE |
|167384... | 1 | 4 | 1 |
I've tried:
select last("*") from "table1" where * > 0
select last(*) from "table1" where * > 0
select last(*) as bbb from "table1" where bbb > 0
and other similar queries, but doen't work

Cypher query aggregate sum of values

I have a Cypher query that shows the following output:
+----------------
| usid | count |
+----------------
| "000" | 1 |
| "000" | 0 |
| "000" | 0 |
| "001" | 1 |
| "001" | 1 |
| "001" | 0 |
| "002" | 2 |
| "002" | 2 |
| "002" | 0 |
| "003" | 4 |
| "003" | 2 |
| "003" | 2 |
| "004" | 4 |
| "004" | 4 |
| "004" | 4 |
+----------------
How can I get the below result with the condition SUM(count) <= 9.
+----------------
| usid | count |
+----------------
| "000" | 1 |
| "001" | 2 |
| "002" | 4 |
| "003" | 8 |
+----------------
Note: I have used the below query to get the 1st table data.
MATCH (us:USER)
WITH us
WHERE us.count <= 4
RETURN us.id as usid, us.count as count;
I don't know how you get your original data, so I will just use a WITH clause and assume the data is there:
// original data
WITH usid, count
// aggregate and filter
WITH usid, sum(count) as new_count
WHERE new_count <= 9
RETURN usid, new_count
Based on the updated question, the new query would look like:
MATCH (us:USER)
WHERE us.count <= 4
WITH us.id as usid, sum(us.count) as count
WHERE new_count <= 9
RETURN usid, count
˙˙˙

Finding root of a tree in a directed graph

I have a tree structure like node(1)->node(2)->node(3). I have name as an property used to retrieve a node.
Given a node say node(3), i wanna retrieve node(1).
Query tried :
MATCH (p:Node)-[:HAS*]->(c:Node) WHERE c.name = "node 3" RETURN p LIMIT 5
But, not able to get node 1.
Your query will not only return "node 1", but it should at least include one path containing it. It's possible to filter the paths to only get the one traversing all the way to the root, however:
MATCH (c:Node {name: "node 3"})<-[:HAS*0..]-(p:Node)
// The root does not have any incoming relationship
WHERE NOT (p)<-[:HAS]-()
RETURN p
Note the use of the 0 length, which matches all cases, including the one where the start node is the root.
Fun fact: even if you have an index on Node:name, it won't be used (unless you're using Neo4j 3.1, where it seems to be fixed since 3.1 Beta2 at least) and you have to explicitly specify it.
MATCH (c:Node {name: "node 3"})<-[:HAS*0..]-(p:Node)
USING INDEX c:Node(name)
WHERE NOT (p)<-[:HAS]-()
RETURN p
Using PROFILE on the first query (with a numerical id property instead of name):
+-----------------------+----------------+------+---------+-------------------------+----------------------+
| Operator | Estimated Rows | Rows | DB Hits | Variables | Other |
+-----------------------+----------------+------+---------+-------------------------+----------------------+
| +ProduceResults | 0 | 1 | 0 | p | p |
| | +----------------+------+---------+-------------------------+----------------------+
| +AntiSemiApply | 0 | 1 | 0 | anon[23], c -- p | |
| |\ +----------------+------+---------+-------------------------+----------------------+
| | +Expand(All) | 1 | 0 | 3 | anon[58], anon[67] -- p | (p)<-[:HAS]-() |
| | | +----------------+------+---------+-------------------------+----------------------+
| | +Argument | 1 | 3 | 0 | p | |
| | +----------------+------+---------+-------------------------+----------------------+
| +Filter | 1 | 3 | 3 | anon[23], c, p | p:Node |
| | +----------------+------+---------+-------------------------+----------------------+
| +VarLengthExpand(All) | 1 | 3 | 5 | anon[23], p -- c | (c)<-[:HAS*]-(p) |
| | +----------------+------+---------+-------------------------+----------------------+
| +Filter | 1 | 1 | 3 | c | c.id == { AUTOINT0} |
| | +----------------+------+---------+-------------------------+----------------------+
| +NodeByLabelScan | 3 | 3 | 4 | c | :Node |
+-----------------------+----------------+------+---------+-------------------------+----------------------+
Total database accesses: 18
and on the second one:
+-----------------------+----------------+------+---------+-------------------------+------------------+
| Operator | Estimated Rows | Rows | DB Hits | Variables | Other |
+-----------------------+----------------+------+---------+-------------------------+------------------+
| +ProduceResults | 0 | 1 | 0 | p | p |
| | +----------------+------+---------+-------------------------+------------------+
| +AntiSemiApply | 0 | 1 | 0 | anon[23], c -- p | |
| |\ +----------------+------+---------+-------------------------+------------------+
| | +Expand(All) | 1 | 0 | 3 | anon[81], anon[90] -- p | (p)<-[:HAS]-() |
| | | +----------------+------+---------+-------------------------+------------------+
| | +Argument | 1 | 3 | 0 | p | |
| | +----------------+------+---------+-------------------------+------------------+
| +Filter | 1 | 3 | 3 | anon[23], c, p | p:Node |
| | +----------------+------+---------+-------------------------+------------------+
| +VarLengthExpand(All) | 1 | 3 | 5 | anon[23], p -- c | (c)<-[:HAS*]-(p) |
| | +----------------+------+---------+-------------------------+------------------+
| +NodeUniqueIndexSeek | 1 | 1 | 2 | c | :Node(id) |
+-----------------------+----------------+------+---------+-------------------------+------------------+
Total database accesses: 13

Neo4j/Cypher effective pagination with order by over large sub-graph

I have following simple relationship between (:User) nodes.
(:User)-[:FOLLOWS {timestamp}]->(:User)
If I paginate followers ordered by FOLLOWS.timestamp I'm running into performance problems when someone has millions of followers.
MATCH (u:User {Id:{id}})<-[f:FOLLOWS]-(follower)
WHERE f.timestamp <= {timestamp}
RETURN follower
ORDER BY f.timestamp DESC
LIMIT 100
What is suggested approach for paginating big sets of data when ordering is required?
UPDATE
follower timestamp
---------------------------------------
id(1000000) 1455967905
id(999999) 1455967875
id(999998) 1455967234
id(999997) 1455967123
id(999996) 1455965321
id(999995) 1455964123
id(999994) 1455963645
id(999993) 1455963512
id(999992) 1455961343
....
id(2) 1455909382
id(1) 1455908432
I want to slice this list down using timestamp which set on :FOLLOWS relationship. If I want to return batches of 4 followers I take current timestamp first and return 4 most recent, then 1455967123 and 4 most recent and so on. In order to do this the whole list should be order by timestamp which results in performance issues on millions of records.
If you're looking for the most recent followers, i.e. where the timestamp is greater than a given time, it only has to traverse the most recent ones.
You can do it with (2) in 20ms
If you are really looking for the oldest (first) followers it makes sense to skip ahead and don't look at the timestamp of every of the million followers (which takes about 1s on my system, see (3)). If you skip ahead the time goes down to 230ms, see (1)
In general we can see that on my laptop it does 2M db-operations per core and second.
(1) Look at first / oldest followers
PROFILE
> MATCH (u)<-[f:FOLLOWS]-(follower) WHERE id(u) = 0
> // skip ahead
> WITH f,follower SKIP 999000
> // do the actual check
> WITH f,follower WHERE f.ts < 500
> RETURN f, follower
> ORDER BY f.ts
> LIMIT 10;
+---------------------------------+
| f | follower |
+---------------------------------+
| :FOLLOWS[0]{ts:1} | Node[1]{} |
...
+---------------------------------+
10 rows
243 ms
Compiler CYPHER 2.3 Planner COST Runtime INTERPRETED
+-----------------+----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+-----------------+----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +ProduceResults | 1 | 10 | 0 | f, follower | f, follower |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Projection | 1 | 10 | 0 | anon[142], anon[155], anon[158], anon[178], f, follower, f, follower, u | anon[155]; anon[158] |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Top | 1 | 10 | 0 | anon[142], anon[155], anon[158], anon[178], f, follower, u | Literal(10); |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Projection | 1 | 499 | 499 | anon[142], anon[155], anon[158], anon[178], f, follower, u | anon[155]; anon[158]; anon[155].ts |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Projection | 1 | 499 | 0 | anon[142], anon[155], anon[158], f, follower, u | f; follower |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Filter | 1 | 499 | 0 | anon[142], f, follower, u | anon[142] |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Projection | 1 | 1000 | 1000 | anon[142], f, follower, u | f; follower; f.ts < { AUTOINT2} |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Skip | 1 | 1000 | 0 | f, follower, u | { AUTOINT1} |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Expand(All) | 1 | 1000000 | 1000001 | f, follower, u | (u)<-[ f#12:FOLLOWS]-( follower#24) |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +NodeByIdSeek | 1 | 1 | 1 | u | |
+-----------------+----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
Total database accesses: 1001501
(2) Look at most recent followers
PROFILE
> MATCH (u)<-[f:FOLLOWS]-(follower) WHERE id(u) = 0
> AND f.ts > 999500
> RETURN f, follower
> LIMIT 10;
+----------------------------------------------+
| f | follower |
+----------------------------------------------+
| :FOLLOWS[999839]{ts:999840} | Node[999840]{} |
...
+----------------------------------------------+
10 rows
23 ms
Compiler CYPHER 2.3 Planner COST Runtime INTERPRETED
+-----------------+----------------+-------+---------+----------------+---------------------------------------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+-----------------+----------------+-------+---------+----------------+---------------------------------------------------------------+
| +ProduceResults | 1 | 10 | 0 | f, follower | f, follower |
| | +----------------+-------+---------+----------------+---------------------------------------------------------------+
| +Limit | 1 | 10 | 0 | f, follower, u | Literal(10) |
| | +----------------+-------+---------+----------------+---------------------------------------------------------------+
| +Filter | 1 | 10 | 16394 | f, follower, u | AndedPropertyComparablePredicates(f,f.ts,f.ts > { AUTOINT1}) |
| | +----------------+-------+---------+----------------+---------------------------------------------------------------+
| +Expand(All) | 1 | 16394 | 16395 | f, follower, u | (u)<-[f:FOLLOWS]-(follower) |
| | +----------------+-------+---------+----------------+---------------------------------------------------------------+
| +NodeByIdSeek | 1 | 1 | 1 | u | |
+-----------------+----------------+-------+---------+----------------+---------------------------------------------------------------+
Total database accesses: 32790
(3) Find oldest followers without skipping ahead
PROFILE
> MATCH (u)<-[f:FOLLOWS]-(follower) WHERE id(u) = 0
> AND f.ts < 500
> RETURN f, follower
> LIMIT 10;
+-------------------------------------+
| f | follower |
+-------------------------------------+
...
| :FOLLOWS[491]{ts:492} | Node[492]{} |
+-------------------------------------+
10 rows
1008 ms
Compiler CYPHER 2.3 Planner COST Runtime INTERPRETED
+-----------------+----------------+--------+---------+----------------+---------------------------------------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+-----------------+----------------+--------+---------+----------------+---------------------------------------------------------------+
| +ProduceResults | 1 | 10 | 0 | f, follower | f, follower |
| | +----------------+--------+---------+----------------+---------------------------------------------------------------+
| +Limit | 1 | 10 | 0 | f, follower, u | Literal(10) |
| | +----------------+--------+---------+----------------+---------------------------------------------------------------+
| +Filter | 1 | 10 | 999498 | f, follower, u | AndedPropertyComparablePredicates(f,f.ts,f.ts < { AUTOINT1}) |
| | +----------------+--------+---------+----------------+---------------------------------------------------------------+
| +Expand(All) | 1 | 999498 | 999499 | f, follower, u | (u)<-[f:FOLLOWS]-(follower) |
| | +----------------+--------+---------+----------------+---------------------------------------------------------------+
| +NodeByIdSeek | 1 | 1 | 1 | u | |
+-----------------+----------------+--------+---------+----------------+---------------------------------------------------------------+
Total database accesses: 1998998

Truth Table For Switching Functions

Can someone explain how these concept works?
I have 1 question. But I don't know have any ideas on constructing the truth table.
f(A,B,C) = AB + A’C
The answer given was ABC + ABC' + A'BC + A'B'C
And i have no idea how it get there. :-(
1. Create a column for each of the inputs, each intermediate functions, and the final function:
A B C | AB | A' | A'C | AB + A'C
--------------------------------
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
2. Enumerate all input possibilities, and start filling in the intermediate function values and then the final function value:
A B C | AB | A' | A'C | AB + A'C
--------------------------------
0 0 0 | 0 | 1 | 0 | 0
0 0 1 | | | |
0 1 0 | | | |
0 1 1 | | | |
1 0 0 | | | |
1 0 1 | | | |
1 1 0 | | | |
1 1 1 | | | |
3. Now, you finish the truth table.
Update per OP's edit of question:
The "answer given" can be reduced as follows using Boolean Algebra:
ABC + ABC' + A'BC + A'B'C
AB(C + C') + A'C(B + B')
AB + A'C
...which is the same as the given f(A,B,C). Not sure why ABC + ABC' + A'BC + A'B'C would be considered to be the "answer," but this does show equivalence between the two formulae.

Resources