I am seeking your insight on the differences between the var and regress commands in Stata. Given the same variables and the same number of lags, what makes these models different (judging by the differences in their outputs)?
var y x1 x2, lags(1/7)
regress L(1/7).y L(1/7).x1 L(1/7).x2
The series were transformed to be stationary beforehand.
var y x1 x2, lags(1/7)
Vector autoregression
Sample: 9 - 159 No. of obs = 151
Log likelihood = -2461.622 AIC = 33.47844
FPE = 7.00e+10 HQIC = 34.01421
Det(Sigma_ml) = 2.90e+10 SBIC = 34.79725
Equation Parms RMSE R-sq chi2 P>chi2
---------------------------------------------------------------
y 22 627.086 0.4632 130.3037 0.0000
x1 22 16.4642 0.4150 107.1156 0.0000
x2 22 34.8932 0.3821 93.37647 0.0000
----------------------------------------------------------------
---------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
----------------+----------------------------------------------------------------
y |
y |
L1. | -.8034219 .0870606 -9.23 0.000 -.9740576 -.6327862
L2. | -.829339 .1112633 -7.45 0.000 -1.047411 -.611267
L3. | -.6881525 .1268751 -5.42 0.000 -.9368231 -.4394818
L4. | -.5958702 .1316349 -4.53 0.000 -.8538699 -.3378706
L5. | -.4941909 .1285658 -3.84 0.000 -.7461752 -.2422066
L6. | -.3478784 .1130961 -3.08 0.002 -.5695426 -.1262142
L7. | -.1273106 .0892459 -1.43 0.154 -.3022294 .0476083
|
x1 |
L1. | 2.814694 4.697886 0.60 0.549 -6.392995 12.02238
L2. | 13.40258 5.712821 2.35 0.019 2.205654 24.5995
L3. | 13.41822 6.119334 2.19 0.028 1.424542 25.41189
L4. | 7.634082 6.373183 1.20 0.231 -4.857128 20.12529
L5. | 2.001271 5.898859 0.34 0.734 -9.56028 13.56282
L6. | 3.421364 5.569404 0.61 0.539 -7.494468 14.3372
L7. | 4.068799 4.46953 0.91 0.363 -4.691319 12.82892
|
x2 |
L1. | -.5105249 2.210646 -0.23 0.817 -4.843312 3.822262
L2. | -2.108354 2.495037 -0.85 0.398 -6.998537 2.78183
L3. | -1.442043 2.592775 -0.56 0.578 -6.523789 3.639704
L4. | -.9065004 2.620667 -0.35 0.729 -6.042914 4.229913
L5. | -.0001391 2.53355 -0.00 1.000 -4.965806 4.965528
L6. | 2.146481 2.427015 0.88 0.376 -2.610381 6.903343
L7. | -1.118613 2.118762 -0.53 0.598 -5.271309 3.034084
|
_cons | 22.43668 48.04635 0.47 0.641 -71.73243 116.6058
----------------+----------------------------------------------------------------
x1 |
y |
L1. | .0036968 .0022858 1.62 0.106 -.0007833 .0081768
L2. | .0012158 .0029212 0.42 0.677 -.0045097 .0069413
L3. | .0035081 .0033311 1.05 0.292 -.0030208 .010037
L4. | .0032596 .0034561 0.94 0.346 -.0035142 .0100334
L5. | .0005852 .0033755 0.17 0.862 -.0060307 .007201
L6. | -.0018743 .0029693 -0.63 0.528 -.0076941 .0039455
L7. | -.0040389 .0023432 -1.72 0.085 -.0086314 .0005537
|
x1 |
L1. | -.5753736 .1233434 -4.66 0.000 -.8171223 -.3336249
L2. | -.3020477 .1499906 -2.01 0.044 -.5960239 -.0080714
L3. | -.3313213 .1606637 -2.06 0.039 -.6462164 -.0164263
L4. | -.1718872 .1673285 -1.03 0.304 -.4998451 .1560707
L5. | -.1834757 .1548751 -1.18 0.236 -.4870253 .1200739
L6. | .0489376 .1462252 0.33 0.738 -.2376586 .3355337
L7. | .1766427 .1173479 1.51 0.132 -.053355 .4066404
|
x2 |
L1. | -.1051509 .0580407 -1.81 0.070 -.2189086 .0086069
L2. | -.1006968 .0655074 -1.54 0.124 -.229089 .0276954
L3. | -.0906552 .0680736 -1.33 0.183 -.2240769 .0427665
L4. | -.1436015 .0688059 -2.09 0.037 -.2784585 -.0087445
L5. | -.0930764 .0665186 -1.40 0.162 -.2234505 .0372976
L6. | -.1018913 .0637215 -1.60 0.110 -.2267832 .0230006
L7. | -.1194924 .0556283 -2.15 0.032 -.2285218 -.0104629
|
_cons | 1.918878 1.261461 1.52 0.128 -.553541 4.391296
----------------+----------------------------------------------------------------
x2 |
y |
L1. | .0010281 .0048444 0.21 0.832 -.0084667 .0105228
L2. | -.0038838 .0061911 -0.63 0.530 -.0160181 .0082505
L3. | .0035605 .0070598 0.50 0.614 -.0102764 .0173974
L4. | .0041767 .0073246 0.57 0.569 -.0101793 .0185327
L5. | .0007593 .0071538 0.11 0.915 -.013262 .0147806
L6. | -.0027897 .0062931 -0.44 0.658 -.0151239 .0095445
L7. | .0018272 .004966 0.37 0.713 -.0079059 .0115603
|
x1 |
L1. | .3332696 .2614066 1.27 0.202 -.179078 .8456172
L2. | .6160613 .3178811 1.94 0.053 -.0069742 1.239097
L3. | .4139762 .3405009 1.22 0.224 -.2533934 1.081346
L4. | .2837896 .3546259 0.80 0.424 -.4112645 .9788436
L5. | .4448436 .3282329 1.36 0.175 -.1984811 1.088168
L6. | .6417029 .3099009 2.07 0.038 .0343084 1.249098
L7. | .4719593 .2487001 1.90 0.058 -.0154839 .9594025
|
x2 |
L1. | -.7465681 .123008 -6.07 0.000 -.9876594 -.5054769
L2. | -.6760273 .1388325 -4.87 0.000 -.948134 -.4039206
L3. | -.4367948 .144271 -3.03 0.002 -.7195607 -.1540289
L4. | -.4889316 .145823 -3.35 0.001 -.7747393 -.2031238
L5. | -.5310379 .1409755 -3.77 0.000 -.8073447 -.254731
L6. | -.4416263 .1350475 -3.27 0.001 -.7063146 -.1769381
L7. | -.3265204 .1178952 -2.77 0.006 -.5575907 -.09545
|
_cons | 3.568261 2.673465 1.33 0.182 -1.671634 8.808155
---------------------------------------------------------------------------------
regress L(1/7).y L(1/7).x1 L(1/7).x2
Source | SS df MS Number of obs = 151
-------------+------------------------------ F( 20, 130) = 7.23
Model | 49291082.3 20 2464554.11 Prob > F = 0.0000
Residual | 44322342.8 130 340941.099 R-squared = 0.5265
-------------+------------------------------ Adj R-squared = 0.4537
Total | 93613425.1 150 624089.501 Root MSE = 583.9
---------------------------------------------------------------------------------
L.y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
----------------+----------------------------------------------------------------
y |
L2. | -.8074369 .0868829 -9.29 0.000 -.9793244 -.6355494
L3. | -.7857941 .1076428 -7.30 0.000 -.9987525 -.5728357
L4. | -.6747462 .1186733 -5.69 0.000 -.9095271 -.4399654
L5. | -.5758927 .1192639 -4.83 0.000 -.811842 -.3399433
L6. | -.4199846 .1078154 -3.90 0.000 -.6332845 -.2066846
L7. | -.2444889 .0873128 -2.80 0.006 -.4172267 -.071751
|
x1 |
L1. | 9.174249 4.663798 1.97 0.051 -.0525176 18.40102
L2. | 6.026435 5.730833 1.05 0.295 -5.311334 17.3642
L3. | 13.03098 6.057813 2.15 0.033 1.046324 25.01564
L4. | 13.01178 6.318175 2.06 0.041 .5120225 25.51153
L5. | 6.146548 5.91807 1.04 0.301 -5.561646 17.85474
L6. | .8687361 5.610159 0.15 0.877 -10.23029 11.96776
L7. | -.6015264 4.502342 -0.13 0.894 -9.508873 8.30582
|
x2 |
L1. | 2.709283 2.214315 1.22 0.223 -1.671474 7.090041
L2. | 2.947753 2.500195 1.18 0.241 -1.998585 7.89409
L3. | .7449778 2.611172 0.29 0.776 -4.420914 5.910869
L4. | .8159876 2.639117 0.31 0.758 -4.405191 6.037166
L5. | 1.839693 2.54722 0.72 0.471 -3.199677 6.879062
L6. | 2.267241 2.436901 0.93 0.354 -2.553876 7.088358
L7. | 4.198018 2.102467 2.00 0.048 .0385389 8.357497
|
_cons | -3.078699 48.40164 -0.06 0.949 -98.83556 92.67816
---------------------------------------------------------------------------------
To me, these are two different specifications.
The first one (var) estimates three equations, one per variable: each of y, x1, and x2 is regressed on lags 1–7 of all three variables. The second one (regress) estimates a single equation: the first term, L1.y, becomes the dependent variable, which is then regressed on lags 2–7 of y plus lags 1–7 of x1 and x2. So the two commands have different dependent variables and different regressors. See the equations below (the first three are for the var command, the last one for the regress command):
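For reference, the systems implied by the two commands can be written out as follows (reconstructed from the output above; the coefficient symbols are only labels, not part of the original output):

```latex
% var y x1 x2, lags(1/7): three equations, lags 1-7 of every variable in each
\begin{aligned}
y_t     &= c_1 + \textstyle\sum_{i=1}^{7} a_{1i}\, y_{t-i} + \sum_{i=1}^{7} b_{1i}\, x1_{t-i} + \sum_{i=1}^{7} d_{1i}\, x2_{t-i} + u_{1t} \\
x1_t    &= c_2 + \textstyle\sum_{i=1}^{7} a_{2i}\, y_{t-i} + \sum_{i=1}^{7} b_{2i}\, x1_{t-i} + \sum_{i=1}^{7} d_{2i}\, x2_{t-i} + u_{2t} \\
x2_t    &= c_3 + \textstyle\sum_{i=1}^{7} a_{3i}\, y_{t-i} + \sum_{i=1}^{7} b_{3i}\, x1_{t-i} + \sum_{i=1}^{7} d_{3i}\, x2_{t-i} + u_{3t} \\
% regress L(1/7).y L(1/7).x1 L(1/7).x2: L1.y becomes the dependent variable
y_{t-1} &= \beta_0 + \textstyle\sum_{i=2}^{7} \beta_i\, y_{t-i} + \sum_{i=1}^{7} \gamma_i\, x1_{t-i} + \sum_{i=1}^{7} \delta_i\, x2_{t-i} + \varepsilon_t
\end{aligned}
```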
The OLS specification does not take into account the feedback effects that exist between the variables in your model. Although you may be interested in, say, the effect of x1 on y, x1 is also affected by y and its lagged values (the feedback effect). For this reason, a single-equation OLS regression can produce misleading (spurious) results here, whereas the VAR models this feedback explicitly.
I would like to find the median of a range of cells that correspond to a condition in another column.
For the example table below, that means finding the median of the numbers in Column2 that share the same value in Column1; the result is shown in Column3. The number of rows is dynamic, hence the need to key this off Column1.
Here is an example table:
|---------------------|------------------|------------------|
| Column1 | Column2 | Column3 |
|---------------------|------------------|------------------|
| 1 | 0.1 | 0.25 |
|---------------------|------------------|------------------|
| 1 | 0.2 | 0.25 |
|---------------------|------------------|------------------|
| 1 | 0.3 | 0.25 |
|---------------------|------------------|------------------|
| 1 | 0.4 | 0.25 |
|---------------------|------------------|------------------|
| 2 | 1 | 1.5 |
|---------------------|------------------|------------------|
| 2 | 2 | 1.5 |
|---------------------|------------------|------------------|
| 3 | 1.1 | 1.1 |
|---------------------|------------------|------------------|
I've tried using INDEX, MATCH like this
=INDEX(MEDIAN(B:B), MATCH(A1,A:A,0))
but it's (obviously) incorrect. Any help appreciated!
Use FILTER to return the correct range to the MEDIAN function:
=MEDIAN(FILTER($B$2:$B, $A$2:$A=A2))
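The same conditional-median logic can be sketched in Python for clarity (a toy illustration of what the FILTER + MEDIAN combination computes, using the example table's values):

```python
from statistics import median

col1 = [1, 1, 1, 1, 2, 2, 3]
col2 = [0.1, 0.2, 0.3, 0.4, 1, 2, 1.1]

# For each row, take the median of the Column2 values whose Column1 matches.
col3 = [round(median(b for a, b in zip(col1, col2) if a == key), 2)
        for key in col1]
print(col3)  # -> [0.25, 0.25, 0.25, 0.25, 1.5, 1.5, 1.1]
```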
I need help interpreting a result in Weka using J48.
I don't know how to explain the result. I am using the Heart Disease Data Set from http://archive.ics.uci.edu/ml/datasets/Heart+Disease and the J48 tree.
Could you point out the important aspects for this analysis?
My result is:
=== Run information ===
Scheme: weka.classifiers.trees.J48 -C 0.25 -M 2
Relation: AnaliseCardiaca
Instances: 303
Attributes: 14
age
sex
cp
trestbps
chol
fbs
restecg
thalach
exang
oldpeak
slope
ca
thal
num
Test mode: 10-fold cross-validation
=== Classifier model (full training set) ===
J48 pruned tree
cp <= 3
| sex <= 0: 0 (57.0/2.0)
| sex > 0
| | slope <= 1
| | | fbs <= 0
| | | | trestbps <= 152
| | | | | thalach <= 162
| | | | | | ca <= 1
| | | | | | | age <= 56: 0 (12.0/1.0)
| | | | | | | age > 56: 1 (3.0/1.0)
| | | | | | ca > 1: 1 (2.0)
| | | | | thalach > 162: 0 (27.0)
| | | | trestbps > 152: 1 (4.0/1.0)
| | | fbs > 0: 0 (9.0)
| | slope > 1
| | | slope <= 2
| | | | ca <= 0
| | | | | fbs <= 0
| | | | | | chol <= 261
| | | | | | | oldpeak <= 2.5: 0 (11.61/1.0)
| | | | | | | oldpeak > 2.5: 1 (3.0)
| | | | | | chol > 261: 1 (4.0)
| | | | | fbs > 0: 0 (4.0)
| | | | ca > 0
| | | | | thal <= 6: 1 (6.0/1.0)
| | | | | thal > 6
| | | | | | thalach <= 145: 0 (3.39)
| | | | | | thalach > 145: 1 (5.0/1.0)
| | | slope > 2: 0 (8.0/1.0)
cp > 3
| thal <= 3
| | ca <= 2
| | | exang <= 0
| | | | sex <= 0
| | | | | chol <= 304: 0 (14.0)
| | | | | chol > 304: 1 (3.0/1.0)
| | | | sex > 0
| | | | | ca <= 0: 0 (10.0/1.0)
| | | | | ca > 0: 1 (3.0)
| | | exang > 0
| | | | restecg <= 1
| | | | | slope <= 1: 0 (2.0)
| | | | | slope > 1: 1 (5.37)
| | | | restecg > 1
| | | | | ca <= 0: 0 (4.0)
| | | | | ca > 0
| | | | | | ca <= 1
| | | | | | | thalach <= 113: 0 (2.0)
| | | | | | | thalach > 113: 1 (4.0)
| | | | | | ca > 1: 0 (2.0)
| | ca > 2: 1 (4.0)
| thal > 3
| | fbs <= 0
| | | ca <= 0
| | | | chol <= 278: 0 (23.0/8.0)
| | | | chol > 278: 1 (6.0)
| | | ca > 0: 1 (46.0/12.0)
| | fbs > 0
| | | ca <= 1: 1 (3.88)
| | | ca > 1: 0 (11.75/4.75)
Number of Leaves : 31
Size of the tree : 61
If you are using Weka Explorer, you can right click on the result row in the results list (located on the left of the window under the start button). Then select visualize tree. This will display an image of the tree.
If you still want to understand the results as they are shown in your question:
The results are displayed as a tree. The root of the tree is the first feature used, cp. If cp is smaller than or equal to 3, the next feature in the tree is sex, and so on. When you split by sex and sex <= 0, you reach a leaf. The prediction there is 0, and the (57.0/2.0) means that 57 training observations ended up on this path and 2 of them were incorrectly classified, i.e. 55 had the label 0 and 2 had the label 1.
Here is how the start of the tree looks:
--------start---------
| |
| |
|cp > 3 | cp <= 3
_________|______ ____|__________
| | | |
|thal>3 |thal<=3 |sex>0 |sex<=0
| | | |
... ... ... prediction 0 57(55,2)
AndreyF's explanation is good; I want to add some information.
Why does the tree have floating-point numbers in its leaves? Can an instance (an individual) be split and get a fractional count? (In reality a person cannot be split.)
When an instance has all of its attribute values set, there is no problem. But when an instance has missing attributes, the classifier (J48) doesn't know which branch of the tree to follow for that attribute.
For example, if an instance is missing its oldpeak attribute, then when it reaches the "chol <= 261" node (the node just before the oldpeak split), the classifier will divide the instance according to a probability: a fraction of the instance goes down "oldpeak <= 2.5" and the rest goes down "oldpeak > 2.5".
How does the classifier calculate that probability? From the training instances that do not have the attribute missing at that node — in this example, the instances with oldpeak present.
If 25% of the instances with a non-missing oldpeak were routed to "oldpeak <= 2.5" and 75% to "oldpeak > 2.5", then when the classifier has to route an instance with oldpeak missing, 25% of its weight goes down "oldpeak <= 2.5" and the remaining 75% goes down "oldpeak > 2.5".
You can try removing the instances with missing attributes; you will see that the tree then only has integer counts in its leaves instead of fractional ones.
Thank you.
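The fractional routing described above can be sketched in Python (a toy illustration only, not Weka's actual implementation; the function name and structure are hypothetical):

```python
def route_weight(weight, child_counts):
    """Split an instance's weight across a node's children in proportion to
    the training instances (with the attribute present) that reached each child."""
    total = sum(child_counts)
    return [weight * c / total for c in child_counts]

# 25 training instances went left and 75 went right at this node, so an
# instance missing the split attribute is divided 0.25 / 0.75.
print(route_weight(1.0, [25, 75]))  # -> [0.25, 0.75]
```

This is why leaf counts such as (11.61/1.0) appear: they are sums of fractional instance weights, not whole instances.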
I have 2 tables.
One table lists all possible mistakes:
mistake|description
m1 | a
m2 | b
m3 | c
The second table is my data:
n | m1 | m2 | m3
1 | 1 | 0 | 1
2 | 0 | 1 | 1
3 | 1 | 1 | 0
where n is the row number, and for each m I put 1 if the mistake occurred and 0 if not.
I want to join them so that each mistake lists the row numbers (or other info) where it occurred.
Something like:
mistake | n
m1 |1
m1 |3
m2 |2
m2 |3
m3 |1
m3 |2
It looks to me like you are just asking to transpose the data. PROC TRANSPOSE gets you the long layout; a small DATA step then keeps only the rows flagged with 1:
data have;
  input n m1 m2 m3 ;
cards;
1 1 0 1
2 0 1 1
3 1 1 0
;
proc transpose data=have out=long name=mistake ;
  by n ;
  var m1 m2 m3 ;
run;
/* keep only the flagged rows and order by mistake */
data want ;
  set long ;
  if col1 = 1 ;
  keep mistake n ;
run;
proc sort data=want ;
  by mistake n ;
run;
I have the following statement, which generates the output below by averaging the data within every 20-minute range.
Statement:
SELECT record_no, date_time,
ROUND(AVG(UNIX_TIMESTAMP(date_time))) AS time_value,
ROUND(AVG(ph1_active_power),4) AS p1,
ROUND(AVG(ph2_active_power),4) AS p2,
ROUND(AVG(ph3_active_power),4) AS p3
FROM powerpro1
GROUP BY date_time DIV 2000
Portion of the output
+-----------+---------------------+------------+---------+----------+----------+
| record_no | date_time | time_value | p1 | p2 | p3 |
+-----------+---------------------+------------+---------+----------+----------+
| 1 | 2014-12-01 00:00:00 | 1417372770 | 72.6242 | -68.7428 | -72.6242 |
| 21 | 2014-12-01 00:20:00 | 1417373970 | 71.6624 | -69.7448 | -71.6624 |
| 41 | 2014-12-01 00:40:00 | 1417375170 | 70.6869 | -70.7333 | -70.6869 |
| 61 | 2014-12-01 01:00:00 | 1417376370 | 69.6977 | -71.7082 | -69.6977 |
| 81 | 2014-12-01 01:20:00 | 1417377570 | 68.6952 | -72.6692 | -68.6952 |
| 101 | 2014-12-01 01:40:00 | 1417378770 | 67.6794 | -73.6162 | -67.6794 |
| 121 | 2014-12-01 02:00:00 | 1417379970 | 66.6505 | -74.549 | -66.6505 |
| 141 | 2014-12-01 02:20:00 | 1417381200 | 65.5825 | -75.4901 | -65.5825 |
+-----------+---------------------+------------+---------+----------+----------+
Given the number of records in the table named "powerpro1", the above query selects 1368 records when executed (and this may grow as new records arrive).
My requirement is to create a Highcharts chart using time_value for the x-axis and p1, p2, and p3 for the y-axis, but I need to limit the number of points on the x-axis.
Can anyone help me show these 1368 points as 1000 points in the chart?
Unfortunately Highcharts has no built-in approximation of that kind, only the reverse (dataGrouping: if you have 100 points, it can return, say, 10). So you need to calculate it on your own and push all 1000 points into your data.
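One simple way to do that calculation yourself is to keep (at most) a target number of evenly spaced points (a sketch only; bucket-averaging, or an algorithm such as Largest-Triangle-Three-Buckets, would preserve the series shape better):

```python
def downsample(points, target):
    """Return at most `target` points, evenly spaced over the input."""
    n = len(points)
    if n <= target:
        return list(points)
    step = n / target
    return [points[int(i * step)] for i in range(target)]

series = list(range(1368))          # stand-in for the 1368 (time, value) rows
thinned = downsample(series, 1000)
print(len(thinned))  # -> 1000
```

The thinned list can then be pushed to the chart's series data as usual.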
Is it possible to do a union of two queries (on the same entity) in Core Data? In SQL terms, if the entity is called t, consider that t has the following data:
+------+------+------+
| x | y | z |
+------+------+------+
| 1 | 11 | 2 |
| 1 | 12 | 3 |
| 2 | 11 | 1 |
| 3 | 12 | 3 |
Then I am trying to run the following query (using Core Data, not SQLite):
select x, y, sum(z)
from t
group by 1, 2
union
select x, 1 as y, sum(z)
from t
group by 1, 2
order by x, y, 1
;
+------+------+--------+
| x | y | sum(z) |
+------+------+--------+
| 1 | 1 | 5 |
| 1 | 11 | 2 |
| 1 | 12 | 3 |
| 2 | 1 | 1 |
| 2 | 11 | 1 |
| 3 | 1 | 3 |
| 3 | 12 | 3 |
+------+------+--------+
7 rows in set (0.00 sec)
Is it possible?
Thanks!