scikit-learn GridSearchCV does not work properly with random forest - machine-learning

I have a grid search implementation for random forest models.
import time

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

train_X, test_X, train_y, test_y = train_test_split(features, target, test_size=.10, random_state=0)
# Standardization can yield a small performance gain
train_X, test_X = standarize(train_X, test_X)
tuned_parameters = [{
    'n_estimators': [5],
    'criterion': ['mse', 'mae'],
    'random_state': [0]
}]
scores = ['neg_mean_squared_error', 'neg_mean_absolute_error']
for n_fold in [5]:
    for score in scores:
        print("# Tuning hyper-parameters for %s with %d-fold" % (score, n_fold))
        start_time = time.time()
        print()
        # TODO: RandomForestRegressor
        clf = GridSearchCV(RandomForestRegressor(verbose=2), tuned_parameters, cv=n_fold,
                           scoring=score, verbose=2, n_jobs=-1)
        clf.fit(train_X, train_y)
... Rest omitted
Before using it for this grid search, I used the exact same dataset for many other tasks, so there should not be any problem with the data. As a test, I first ran the pipeline with LinearRegression to check that it works end to end, and it does. Then I switched to RandomForestRegressor with a very small number of estimators and tested again. Something very strange happened then; the verbose output is attached below. Performance dropped dramatically and I don't know why. There is no reason one small grid search should take 30+ minutes.
Fitting 5 folds for each of 2 candidates, totalling 10 fits
[CV] criterion=mse, n_estimators=5, random_state=0 ...................
building tree 1 of 5
[CV] criterion=mse, n_estimators=5, random_state=0 ...................
building tree 1 of 5
[CV] criterion=mse, n_estimators=5, random_state=0 ...................
building tree 1 of 5
[CV] criterion=mse, n_estimators=5, random_state=0 ...................
building tree 1 of 5
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 1.0s remaining: 0.0s
building tree 2 of 5
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 1.0s remaining: 0.0s
building tree 2 of 5
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 1.1s remaining: 0.0s
building tree 2 of 5
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 1.1s remaining: 0.0s
building tree 2 of 5
building tree 3 of 5
building tree 3 of 5
building tree 3 of 5
building tree 3 of 5
building tree 4 of 5
building tree 4 of 5
building tree 4 of 5
building tree 4 of 5
building tree 5 of 5
building tree 5 of 5
building tree 5 of 5
building tree 5 of 5
[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 5.0s finished
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 5.0s finished
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 5.0s finished
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 0.2s finished
[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 5.0s finished
[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 0.3s finished
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.1s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.1s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 0.3s finished
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.1s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 0.2s finished
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.1s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.1s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 0.8s finished
[CV] .... criterion=mse, n_estimators=5, random_state=0, total= 5.3s
[CV] criterion=mse, n_estimators=5, random_state=0 ...................
[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 0.8s finished
[CV] .... criterion=mse, n_estimators=5, random_state=0, total= 5.3s
building tree 1 of 5
[CV] criterion=mae, n_estimators=5, random_state=0 ...................
[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 0.9s finished
[CV] .... criterion=mse, n_estimators=5, random_state=0, total= 5.3s
building tree 1 of 5
[CV] criterion=mae, n_estimators=5, random_state=0 ...................
building tree 1 of 5
[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 0.9s finished
[CV] .... criterion=mse, n_estimators=5, random_state=0, total= 5.3s
[CV] criterion=mae, n_estimators=5, random_state=0 ...................
building tree 1 of 5
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 1.0s remaining: 0.0s
building tree 2 of 5
building tree 3 of 5
building tree 4 of 5
building tree 5 of 5
[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 5.3s finished
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 0.2s finished
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.1s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 0.5s finished
[CV] .... criterion=mse, n_estimators=5, random_state=0, total= 5.6s
[CV] criterion=mae, n_estimators=5, random_state=0 ...................
building tree 1 of 5
The log above is printed within a few seconds, then things seem to get stuck from here on...
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 7.4min remaining: 0.0s
building tree 2 of 5
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 7.5min remaining: 0.0s
building tree 2 of 5
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 7.5min remaining: 0.0s
building tree 2 of 5
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 7.8min remaining: 0.0s
building tree 2 of 5
building tree 3 of 5
building tree 3 of 5
building tree 3 of 5
building tree 3 of 5
building tree 4 of 5
building tree 4 of 5
building tree 4 of 5
building tree 4 of 5
building tree 5 of 5
building tree 5 of 5
building tree 5 of 5
These lines took more than 20 minutes.
By the way, each GridSearchCV run with linear regression takes less than 1 second.
Do you have any idea why performance drops that much?
Any suggestions and comments are appreciated. Thank you.

Try setting max_depth for the RandomForestRegressor. This should reduce fitting time. By default max_depth=None, which lets every tree keep growing until its leaves are pure or too small to split.
For example:
tuned_parameters = [{
    'n_estimators': [5],
    'criterion': ['mse', 'mae'],
    'random_state': [0],
    'max_depth': [4],
}]
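To see how each key in the grid affects the amount of work, the grid can be expanded by hand. The following is a minimal standard-library sketch of that enumeration (scikit-learn's own ParameterGrid does the equivalent); expand_grid is a hypothetical helper name, not part of scikit-learn.

```python
from itertools import product


def expand_grid(grids):
    """Yield one dict per candidate from a list of {param: [values]} grids."""
    for grid in grids:
        keys = sorted(grid)
        for values in product(*(grid[k] for k in keys)):
            yield dict(zip(keys, values))


tuned_parameters = [{
    'n_estimators': [5],
    'criterion': ['mse', 'mae'],
    'random_state': [0],
    'max_depth': [4],
}]

candidates = list(expand_grid(tuned_parameters))
n_fold = 5
# 2 candidates x 5 folds = 10 fits, the same count as in the question's log
print(len(candidates), "candidates,", len(candidates) * n_fold, "fits")
```

Adding max_depth with a single value leaves the number of fits unchanged; it only aims to make each individual fit cheaper.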
Edit: Also, by default RandomForestRegressor has n_jobs=1, so it builds one tree at a time. Try setting n_jobs=-1.
In addition, instead of looping over the scoring parameters to GridSearchCV, you can specify multiple metrics. When doing so, you must also specify the metric you want GridSearchCV to select on as the value of refit. You can then access all scores in the cv_results_ dictionary after the fit.
import numpy as np

clf = GridSearchCV(RandomForestRegressor(verbose=2), tuned_parameters,
                   cv=n_fold, scoring=scores, refit='neg_mean_squared_error',
                   verbose=2, n_jobs=-1)
clf.fit(train_X, train_y)
results = clf.cv_results_
print(np.mean(results['mean_test_neg_mean_squared_error']))
print(np.mean(results['mean_test_neg_mean_absolute_error']))
http://scikit-learn.org/stable/auto_examples/model_selection/plot_multi_metric_evaluation.html#sphx-glr-auto-examples-model-selection-plot-multi-metric-evaluation-py


Effect of --test_env and --test_arg on bazel cache

I'm naively passing along some variable test metadata to some py_test targets to inject that metadata into some test result artifacts that later get uploaded to the cloud. I'm doing so using either the --test_env or --test_arg values at the bazel test invocation.
Would this variable data negatively affect the way test results are cached such that running the same test back to back would effectively disturb the bazel cache?
Command Line Inputs
Command line inputs can indeed disturb cache hits. Consider the following set of executions:
BUILD file
py_test(
    name = "test_inputs",
    srcs = ["test_inputs.py"],
    deps = [
        ":conftest",
        "@pytest",
    ],
)
py_library(
    name = "conftest",
    srcs = ["conftest.py"],
    deps = [
        "@pytest",
    ],
)
Test module
import sys
import pytest

def test_pass():
    assert True

def test_arg_in(request):
    assert request.config.getoption("--metadata")

if __name__ == "__main__":
    args = sys.argv[1:]
    ret_code = pytest.main([__file__, "--log-level=ERROR"] + args)
    sys.exit(ret_code)
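The test module reads a --metadata option, which pytest only accepts if something registers it. The contents of conftest.py are not shown here; a minimal sketch of what it presumably contains, using pytest's pytest_addoption hook, might look like this (the default value and help text are assumptions):

```python
# conftest.py (hypothetical contents; the original file is not shown)

def pytest_addoption(parser):
    """Register --metadata so request.config.getoption("--metadata") works."""
    parser.addoption(
        "--metadata",
        action="store",
        default=None,
        help="free-form metadata string passed in via --test_arg",
    )
```

With a default of None, test_arg_in fails when the option is omitted, which matches the assertion in the test module.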
First execution
$ bazel test //bazel_check:test_inputs --test_arg --metadata=abc
INFO: Analyzed target //bazel_check:test_inputs (0 packages loaded, 0 targets configured).
INFO: Found 1 test target...
INFO: 2 processes: 1 internal (50.00%), 1 local (50.00%).
INFO: Cache hit rate for remote actions: -- (0 / 0)
INFO: Total action wall time 0.40s
INFO: Critical path 0.57s (setup 0.00s, action wall time 0.00s)
INFO: Elapsed time 0.72s (preparation 0.12s, execution 0.60s)
INFO: Build completed successfully, 2 total actions
//bazel_check:test_inputs PASSED in 0.4s
Executed 1 out of 1 test: 1 test passes.
INFO: Build completed successfully, 2 total actions
Second execution: same argument value, cache hit!
$ bazel test //bazel_check:test_inputs --test_arg --metadata=abc
INFO: Analyzed target //bazel_check:test_inputs (0 packages loaded, 0 targets configured).
INFO: Found 1 test target...
INFO: 1 process: 1 internal (100.00%).
INFO: Cache hit rate for remote actions: -- (0 / 0)
INFO: Total action wall time 0.00s
INFO: Critical path 0.47s (setup 0.00s, action wall time 0.00s)
INFO: Elapsed time 0.61s (preparation 0.12s, execution 0.49s)
INFO: Build completed successfully, 1 total action
//bazel_check:test_inputs (cached) PASSED in 0.4s
Executed 0 out of 1 test: 1 test passes.
INFO: Build completed successfully, 1 total action
Third execution: new argument value, no cache hit
$ bazel test //bazel_check:test_inputs --test_arg --metadata=kk
INFO: Analyzed target //bazel_check:test_inputs (0 packages loaded, 93 targets configured).
INFO: Found 1 test target...
INFO: 2 processes: 1 internal (50.00%), 1 local (50.00%).
INFO: Cache hit rate for remote actions: -- (0 / 0)
INFO: Total action wall time 0.30s
INFO: Critical path 0.54s (setup 0.00s, action wall time 0.00s)
INFO: Elapsed time 0.71s (preparation 0.14s, execution 0.57s)
INFO: Build completed successfully, 2 total actions
//bazel_check:test_inputs PASSED in 0.3s
Executed 1 out of 1 test: 1 test passes.
INFO: Build completed successfully, 2 total actions
Fourth execution: reused same argument as first two runs
Interestingly enough there is no cache hit despite the result being cached earlier. Somehow it did not persist.
$ bazel test //bazel_check:test_inputs --test_arg --metadata=abc
INFO: Analyzed target //bazel_check:test_inputs (0 packages loaded, 0 targets configured).
INFO: Found 1 test target...
INFO: 2 processes: 1 internal (50.00%), 1 local (50.00%).
INFO: Cache hit rate for remote actions: -- (0 / 0)
INFO: Total action wall time 0.34s
INFO: Critical path 0.50s (setup 0.00s, action wall time 0.00s)
INFO: Elapsed time 0.71s (preparation 0.17s, execution 0.55s)
INFO: Build completed successfully, 2 total actions
//bazel_check:test_inputs PASSED in 0.3s
Executed 1 out of 1 test: 1 test passes.
INFO: Build completed successfully, 2 total actions
Environment Inputs
Exactly the same behavior applies to --test_env inputs:
import os
import sys
import pytest

def test_pass():
    assert True

def test_env_in():
    assert os.environ.get("META_ENV")

if __name__ == "__main__":
    args = sys.argv[1:]
    ret_code = pytest.main([__file__, "--log-level=ERROR"] + args)
    sys.exit(ret_code)
First execution
$ bazel test //bazel_check:test_inputs --test_env META_ENV=33
INFO: Build option --test_env has changed, discarding analysis cache.
INFO: Analyzed target //bazel_check:test_inputs (0 packages loaded, 7285 targets configured).
INFO: Found 1 test target...
INFO: 2 processes: 1 internal (50.00%), 1 local (50.00%).
INFO: Cache hit rate for remote actions: -- (0 / 0)
INFO: Total action wall time 0.29s
INFO: Critical path 0.66s (setup 0.00s, action wall time 0.00s)
INFO: Elapsed time 1.26s (preparation 0.42s, execution 0.84s)
INFO: Build completed successfully, 2 total actions
//bazel_check:test_inputs PASSED in 0.3s
Executed 1 out of 1 test: 1 test passes.
INFO: Build completed successfully, 2 total actions
Second execution: same env value, cache hit!
$ bazel test //bazel_check:test_inputs --test_env META_ENV=33
INFO: Analyzed target //bazel_check:test_inputs (0 packages loaded, 0 targets configured).
INFO: Found 1 test target...
INFO: 1 process: 1 internal (100.00%).
INFO: Cache hit rate for remote actions: -- (0 / 0)
INFO: Total action wall time 0.00s
INFO: Critical path 0.49s (setup 0.00s, action wall time 0.00s)
INFO: Elapsed time 0.67s (preparation 0.15s, execution 0.52s)
INFO: Build completed successfully, 1 total action
//bazel_check:test_inputs (cached) PASSED in 0.3s
Executed 0 out of 1 test: 1 test passes.
INFO: Build completed successfully, 1 total action
Third execution: new env value, no cache hit
$ bazel test //bazel_check:test_inputs --test_env META_ENV=44
INFO: Build option --test_env has changed, discarding analysis cache.
INFO: Analyzed target //bazel_check:test_inputs (0 packages loaded, 7285 targets configured).
INFO: Found 1 test target...
INFO: 2 processes: 1 internal (50.00%), 1 local (50.00%).
INFO: Cache hit rate for remote actions: -- (0 / 0)
INFO: Total action wall time 0.29s
INFO: Critical path 0.62s (setup 0.00s, action wall time 0.00s)
INFO: Elapsed time 1.22s (preparation 0.39s, execution 0.83s)
INFO: Build completed successfully, 2 total actions
//bazel_check:test_inputs PASSED in 0.3s
Executed 1 out of 1 test: 1 test passes.
INFO: Build completed successfully, 2 total actions
Fourth execution: reused same env value as first two runs
$ bazel test //bazel_check:test_inputs --test_env META_ENV=33
INFO: Build option --test_env has changed, discarding analysis cache.
INFO: Analyzed target //bazel_check:test_inputs (0 packages loaded, 7285 targets configured).
INFO: Found 1 test target...
INFO: 2 processes: 1 internal (50.00%), 1 local (50.00%).
INFO: Cache hit rate for remote actions: -- (0 / 0)
INFO: Total action wall time 0.28s
INFO: Critical path 0.66s (setup 0.00s, action wall time 0.00s)
INFO: Elapsed time 1.25s (preparation 0.40s, execution 0.85s)
INFO: Build completed successfully, 2 total actions
//bazel_check:test_inputs PASSED in 0.3s
Executed 1 out of 1 test: 1 test passes.
INFO: Build completed successfully, 2 total actions
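The runs above are consistent with a cache that keys each test result on its arguments and environment but remembers only the most recent result per target. The toy model below illustrates that reading. It is an assumption drawn from the observed output, not Bazel's actual implementation, and names such as LastResultCache are made up.

```python
import hashlib


class LastResultCache:
    """Toy cache: one slot per target, holding the key of the latest run."""

    def __init__(self):
        self._latest = {}  # target -> digest of the inputs of the last run

    @staticmethod
    def _digest(test_args, test_env):
        # Any change to the args or env values produces a different digest.
        payload = repr((tuple(test_args), tuple(sorted(test_env.items()))))
        return hashlib.sha256(payload.encode()).hexdigest()

    def run(self, target, test_args=(), test_env=None):
        """Return 'cached' if the inputs match the previous run, else 'executed'."""
        key = self._digest(test_args, test_env or {})
        if self._latest.get(target) == key:
            return "cached"
        self._latest[target] = key  # overwrites the previous entry
        return "executed"


cache = LastResultCache()
target = "//bazel_check:test_inputs"
runs = [
    cache.run(target, test_args=["--metadata=abc"]),
    cache.run(target, test_args=["--metadata=abc"]),
    cache.run(target, test_args=["--metadata=kk"]),
    cache.run(target, test_args=["--metadata=abc"]),
]
print(runs)
```

The model reproduces the executed, cached, executed, executed sequence seen in both experiments. If the metadata differs on every invocation, this reading suggests that every such run will re-execute the test.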

Best way for logging CPU & GPU utilization every second in linux

I want to get the CPU and GPU utilisation of my cuda program and plot them like this.
What's the best way?
Here is my script:
### [1] Running my cuda program in background
./my_cuda_program &
PID_MY_CUDA_PROGRAM=$!
### [2] Getting CPU & GPU utilization in background
sar 1 | sed --unbuffered -e 's/^/SYSSTAT:/' &
PID_SYSSTAT=$!
nvidia-smi --format=csv --query-gpu=timestamp,utilization.gpu -l 1 \
| sed --unbuffered -e 's/^/NVIDIA_SMI:/' &
PID_NVIDIA_SMI=$!
### [3] waiting for the [1] process to finish,
### and then kill [2] processes
wait ${PID_MY_CUDA_PROGRAM}
kill ${PID_SYSSTAT}
kill ${PID_NVIDIA_SMI}
exit
That outputs:
SYSSTAT:Linux 4.15.0-176-generic (ubuntu00) 05/06/22 _x86_64_ (4 CPU)
NVIDIA_SMI:timestamp, utilization.gpu [%]
NVIDIA_SMI:2022/05/06 23:57:00.245, 7 %
SYSSTAT:
SYSSTAT:23:57:00 CPU %user %nice %system %iowait %steal %idle
SYSSTAT:23:57:01 all 8.73 0.00 5.74 7.48 0.00 78.05
NVIDIA_SMI:2022/05/06 23:57:01.246, 1 %
SYSSTAT:23:57:02 all 23.31 0.00 6.02 0.00 0.00 70.68
NVIDIA_SMI:2022/05/06 23:57:02.246, 16 %
SYSSTAT:23:57:03 all 25.56 0.00 3.76 0.00 0.00 70.68
NVIDIA_SMI:2022/05/06 23:57:03.246, 15 %
SYSSTAT:23:57:04 all 22.69 0.00 6.48 0.00 0.00 70.82
NVIDIA_SMI:2022/05/06 23:57:04.246, 21 %
SYSSTAT:23:57:05 all 25.81 0.00 3.26 0.00 0.00 70.93
It's a bit annoying to parse the log above.
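One way to tame the interleaved log is to split it by prefix and join the two streams on the wall-clock second. The sketch below assumes the exact line shapes shown above (24-hour sar timestamps, an "all" CPU row whose last column is %idle, and nvidia-smi CSV rows); parse_log is a made-up helper name, and CPU utilization is taken as 100 minus %idle.

```python
def parse_log(lines):
    """Merge prefixed sar / nvidia-smi lines into (time, cpu %, gpu %) rows."""
    cpu, gpu = {}, {}
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("SYSSTAT:"):
            parts = line[len("SYSSTAT:"):].split()
            # Data rows look like: 23:57:01 all 8.73 ... 78.05 (last is %idle)
            if len(parts) >= 3 and parts[1] == "all":
                cpu[parts[0]] = round(100.0 - float(parts[-1]), 2)
        elif line.startswith("NVIDIA_SMI:"):
            stamp, _, util = line[len("NVIDIA_SMI:"):].partition(", ")
            # Data rows look like: 2022/05/06 23:57:01.246, 1 %
            if util.endswith("%") and " " in stamp:
                second = stamp.split()[1].split(".")[0]
                gpu[second] = float(util.rstrip("% "))
    # Keep only the seconds for which both tools reported a value.
    return [(t, cpu[t], gpu[t]) for t in sorted(cpu) if t in gpu]


sample = [
    "SYSSTAT:23:57:00 CPU %user %nice %system %iowait %steal %idle",
    "NVIDIA_SMI:timestamp, utilization.gpu [%]",
    "NVIDIA_SMI:2022/05/06 23:57:00.245, 7 %",
    "SYSSTAT:23:57:01 all 8.73 0.00 5.74 7.48 0.00 78.05",
    "NVIDIA_SMI:2022/05/06 23:57:01.246, 1 %",
    "SYSSTAT:23:57:02 all 23.31 0.00 6.02 0.00 0.00 70.68",
    "NVIDIA_SMI:2022/05/06 23:57:02.246, 16 %",
]
rows = parse_log(sample)
for time_of_day, cpu_pct, gpu_pct in rows:
    print(time_of_day, cpu_pct, gpu_pct)
```

The resulting rows can be written to CSV or passed to a plotting library. Sorting by the time string is a simplification that would misorder a run crossing midnight.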

How to convert task-clock perf-event to seconds or milliseconds?

I am trying to use perf for performance analysis.
When I use perf stat it provides execution time
Performance counter stats for './quicksort_ver1 input.txt 10000':
7.00 msec task-clock:u # 0.918 CPUs utilized
2,679,253 cycles:u # 0.383 GHz (9.58%)
18,034,446 instructions:u # 6.73 insn per cycle (23.56%)
5,764,095 branches:u # 822.955 M/sec (37.62%)
5,030,025 dTLB-loads # 718.150 M/sec (51.69%)
2,948,787 dTLB-stores # 421.006 M/sec (65.75%)
5,525,534 L1-dcache-loads # 788.895 M/sec (48.31%)
2,653,434 L1-dcache-stores # 378.838 M/sec (34.25%)
4,900 L1-dcache-load-misses # 0.09% of all L1-dcache hits (20.16%)
66 LLC-load-misses # 0.00% of all LL-cache hits (6.09%)
<not counted> LLC-store-misses (0.00%)
<not counted> LLC-loads (0.00%)
<not counted> LLC-stores (0.00%)
0.007631774 seconds time elapsed
0.006655000 seconds user
0.000950000 seconds sys
However when I use perf record, I observe that for task-clock 45 samples and 14999985 events are collected.
Samples: 45 of event 'task-clock:u', Event count (approx.): 14999985
Children Self Command Shared Object Symbol
+ 91.11% 0.00% quicksort_ver1 quicksort_ver1 [.] _start
+ 91.11% 0.00% quicksort_ver1 libc-2.17.so [.] __libc_start_main
+ 91.11% 0.00% quicksort_ver1 quicksort_ver1 [.] main
Is there any way to convert task-clock events to seconds or milliseconds?
I found the answer with a bit of experimentation: the basic unit of the task-clock event is the nanosecond.
Stats collected with perf stat:
$ sudo perf stat -e task-clock:u ./bubble_sort input.txt 50000
Performance counter stats for './bubble_sort input.txt 50000':
11,617.33 msec task-clock:u # 1.000 CPUs utilized
11.617480215 seconds time elapsed
11.615856000 seconds user
0.002000000 seconds sys
Stats collected with perf record:
$ sudo perf report
Samples: 35K of event 'task-clock:u', Event count (approx.): 11715321618
Overhead Command Shared Object Symbol
73.75% bubble_sort bubble_sort [.] bubbleSort
26.15% bubble_sort bubble_sort [.] swap
0.07% bubble_sort libc-2.17.so [.] _IO_vfscanf
Note that the sample count differs between the two cases, but the event count closely tracks the measured run time.
perf stat reports an elapsed time of 11.617480215 seconds, and perf report reports a total task-clock event count of 11715321618.
11715321618 nanoseconds = 11.715321618 seconds, which is approximately equal to the 11.615856000 seconds of user time.
So the basic unit of the task-clock event is apparently the nanosecond.
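The conversion can be scripted. This is a small sketch that assumes, as the experiment above suggests, that the task-clock event count is in nanoseconds; task_clock_seconds is a made-up helper name, and the header string is the one from the perf report output above.

```python
import re

NS_PER_SECOND = 1_000_000_000


def task_clock_seconds(report_header):
    """Extract 'Event count (approx.)' from a perf report header, in seconds."""
    match = re.search(r"Event count \(approx\.\):\s*(\d+)", report_header)
    if match is None:
        raise ValueError("no event count found in header")
    return int(match.group(1)) / NS_PER_SECOND


header = "Samples: 35K of event 'task-clock:u', Event count (approx.): 11715321618"
seconds = task_clock_seconds(header)
print(f"{seconds:.3f} s = {seconds * 1000:.2f} ms")
```

Applied to the header from the question (event count 14999985), the same function gives about 0.015 seconds, i.e. roughly 15 ms.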

travis " Segmentation fault " but works fine locally

Hi there, I ran into a 'Segmentation fault' error when using travis-ci for my project: IPython-Dashboard.
There is no error message and it works fine locally, which I find a little confusing. Can anyone give me an idea for fixing this? Thanks.
Here is the Travis build log from the cloud:
travis-log
$ nosetests --with-coverage --cover-package=dashboard
../home/travis/build.sh: line 45: 3187 Segmentation fault (core dumped)
nosetests --with-coverage --cover-package=dashboard
The command "nosetests --with-coverage --cover-package=dashboard" exited with 139.
Here is the build log from my local machine [osx]:
taotao@mac007:~/Desktop/github/IPython-Dashboard$sudo nosetests --with-coverage --cover-package=dashboard
.../Users/chenshan/Desktop/github/IPython-Dashboard/dashboard/tests/testCreateData.py:78: Warning: Can't create database 'IPD_data'; database exists
conn.cursor().execute('CREATE DATABASE IF NOT EXISTS {};'.format(config.sql_db))
/Library/Python/2.7/site-packages/pandas/io/sql.py:599: FutureWarning: The 'mysql' flavor with DBAPI connection is deprecated and will be removed in future versions. MySQL will be further supported with SQLAlchemy engines.
warnings.warn(_MYSQL_WARNING, FutureWarning)
...
Name Stmts Miss Cover Missing
---------------------------------------------------------------------
dashboard.py 13 0 100%
dashboard/client.py 1 0 100%
dashboard/client/sender.py 11 3 73% 26-27, 33
dashboard/conf.py 0 0 100%
dashboard/conf/config.py 29 0 100%
dashboard/server.py 0 0 100%
dashboard/server/resources.py 0 0 100%
dashboard/server/resources/dash.py 35 10 71% 36, 55-56, 67-69, 86-89
dashboard/server/resources/home.py 40 12 70% 25, 28-30, 83-91
dashboard/server/resources/sql.py 27 11 59% 30, 52-75
dashboard/server/resources/status.py 8 1 88% 19
dashboard/server/resources/storage.py 13 5 62% 26-28, 43-47
dashboard/server/utils.py 79 18 77% 20-24, 78-80, 82-83, 86, 96, 99-100, 126-127, 140-142
dashboard/server/views.py 21 1 95% 16
---------------------------------------------------------------------
TOTAL 277 61 78%
----------------------------------------------------------------------
Ran 6 tests in 4.600s
OK
taotao@mac007:~/Desktop/github/IPython-Dashboard$
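The exit status 139 in the Travis log follows the usual shell convention of 128 plus the signal number, so it points at signal 11 (SIGSEGV) in the process itself rather than at a failing test assertion. A small sketch of that decoding follows; describe_exit_status is a made-up helper name.

```python
import signal


def describe_exit_status(status):
    """Map a shell exit status to a signal name when it encodes one."""
    if status > 128:
        try:
            return signal.Signals(status - 128).name
        except ValueError:
            return None  # not a signal number known on this platform
    return None


print(describe_exit_status(139))
```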

Trying to create my Haartraining OpenCV

I'm trying to create my cascade classifier with this command:
haartraining -data haarcascade -vec samples.vec -bg negatives.dat -nstages 20 -nsplits 2 -minhitrate 0.999 -maxfalsealarm 0.5 -npos 1000 -nneg 600 -w 20 -h 20 -nonsym -mem 2048 -mode ALL
I have 1500 samples created from one single image with this command:
createsamples -img foto.png -num 1500 -bg negatives.dat -vec samples.vec -maxxangle 0.6 -maxyangle 0 -maxzangle 0.3 -maxidev 100 -bgcolor 0 -bgthresh 0 -w 20 -h 20
This is the output at stage 3:
Tree Classifier
Stage
+---+
| 0|
+---+
Number of features used : 125199
Parent node: NULL
*** 1 cluster ***
POS: 1000 1000 1.000000
NEG: 600 1
**BACKGROUND PROCESSING TIME: 0.02**
Precalculation time: 41.39
+----+----+-+---------+---------+---------+---------+
| N |%SMP|F| ST.THR | HR | FA | EXP. ERR|
+----+----+-+---------+---------+---------+---------+
| 1|100%|-|-0.989933| 1.000000| 0.988333| 0.003125|
+----+----+-+---------+---------+---------+---------+
| 2|100%|-| 0.006064| 1.000000| 0.000000| 0.000000|
+----+----+-+---------+---------+---------+---------+
Stage training time: 40.66
Number of used features: 4
Parent node: NULL
Chosen number of splits: 0
Total number of splits: 0
Tree Classifier
Stage
+---+
| 0|
+---+
0
Parent node: 0
*** 1 cluster ***
POS: 1000 1000 1.000000
NEG: 600 0.0169943
**BACKGROUND PROCESSING TIME: 0.23**
Precalculation time: 37.19
+----+----+-+---------+---------+---------+---------+
| N |%SMP|F| ST.THR | HR | FA | EXP. ERR|
+----+----+-+---------+---------+---------+---------+
| 1|100%|-|-0.981031| 1.000000| 1.000000| 0.007500|
+----+----+-+---------+---------+---------+---------+
| 2|100%|-| 0.005864| 1.000000| 0.010000| 0.003750|
+----+----+-+---------+---------+---------+---------+
Stage training time: 36.25
Number of used features: 4
Parent node: 0
Chosen number of splits: 0
Total number of splits: 0
Tree Classifier
Stage
+---+---+
| 0| 1|
+---+---+
0---1
Parent node: 1
*** 1 cluster ***
POS: 1000 1000 1.000000
NEG: 600 0.000522
**BACKGROUND PROCESSING TIME: 7.54**
Precalculation time: 40.80
+----+----+-+---------+---------+---------+---------+
| N |%SMP|F| ST.THR | HR | FA | EXP. ERR|
+----+----+-+---------+---------+---------+---------+
| 1|100%|-|-0.895043| 1.000000| 1.000000| 0.051875|
+----+----+-+---------+---------+---------+---------+
| 2|100%|-|-1.818561| 1.000000| 0.978333| 0.026250|
+----+----+-+---------+---------+---------+---------+
| 3|100%|-|-2.601195| 1.000000| 0.676667| 0.010000|
+----+----+-+---------+---------+---------+---------+
| 4|100%|-|-1.673473| 1.000000| 0.033333| 0.003125|
+----+----+-+---------+---------+---------+---------+
Stage training time: 80.58
Number of used features: 8
Parent node: 1
Chosen number of splits: 0
Total number of splits: 0
Tree Classifier
Stage
+---+---+---+
| 0| 1| 2|
+---+---+---+
0---1---2
Parent node: 2
*** 1 cluster ***
POS: 1000 1000 1.000000
NEG: 600 4.19496e-005
**BACKGROUND PROCESSING TIME: 93.92**
Precalculation time: 40.82
+----+----+-+---------+---------+---------+---------+
| N |%SMP|F| ST.THR | HR | FA | EXP. ERR|
+----+----+-+---------+---------+---------+---------+
| 1|100%|-|-0.955309| 1.000000| 1.000000| 0.059375|
+----+----+-+---------+---------+---------+---------+
| 2|100%|-|-1.676803| 1.000000| 0.931667| 0.065000|
+----+----+-+---------+---------+---------+---------+
| 3|100%|-|-1.313002| 1.000000| 0.233333| 0.010625|
+----+----+-+---------+---------+---------+---------+
Stage training time: 63.21
Number of used features: 6
Parent node: 2
Chosen number of splits: 0
Total number of splits: 0
Tree Classifier
Stage
+---+---+---+---+
| 0| 1| 2| 3|
+---+---+---+---+
0---1---2---3
Parent node: 3
*** 1 cluster ***
POS: 1000 1000 1.000000
NEG: 600 1.23118e-005
**BACKGROUND PROCESSING TIME: 327.57**
Precalculation time: 41.54
+----+----+-+---------+---------+---------+---------+
| N |%SMP|F| ST.THR | HR | FA | EXP. ERR|
+----+----+-+---------+---------+---------+---------+
| 1|100%|-|-0.939509| 1.000000| 1.000000| 0.054375|
+----+----+-+---------+---------+---------+---------+
| 2|100%|-|-1.812912| 1.000000| 0.821667| 0.047500|
+----+----+-+---------+---------+---------+---------+
| 3|100%|-|-0.907906| 1.000000| 0.128333| 0.016875|
+----+----+-+---------+---------+---------+---------+
Stage training time: 61.52
Number of used features: 6
Parent node: 3
Chosen number of splits: 0
Total number of splits: 0
Tree Classifier
Stage
+---+---+---+---+---+
| 0| 1| 2| 3| 4|
+---+---+---+---+---+
0---1---2---3---4
Parent node: 4
*** 1 cluster ***
POS: 1000 1000 1.000000
0%
My question is:
Is it normal that the background processing time grows so quickly? At this rate, reaching stage 20 will take weeks! Is something wrong?
It could take even longer. There is a reason OpenCV ships with pre-calculated cascade files.
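The log itself hints at why the time explodes. Assuming the second number on each NEG line is the fraction of scanned background windows that the cascade trained so far still accepts (an interpretation of the output, not something the log states), collecting 600 negatives requires scanning roughly 600 divided by that fraction. A short sketch of the arithmetic, using the values from the log above:

```python
# (acceptance ratio from the NEG line, background processing time in seconds)
stages = [
    (1.0, 0.02),
    (0.0169943, 0.23),
    (0.000522, 7.54),
    (4.19496e-05, 93.92),
    (1.23118e-05, 327.57),
]
NEG_NEEDED = 600  # value passed as -nneg


def windows_scanned(ratio, needed=NEG_NEEDED):
    """Estimate how many background windows must be tried to keep `needed`."""
    return needed / ratio


for ratio, seconds in stages:
    scanned = windows_scanned(ratio)
    print(f"ratio={ratio:<12g} scanned~{scanned:>12,.0f} time={seconds:>7.2f}s")
```

Under that assumption the number of windows to scan grows with the inverse of the cascade's false-alarm rate, and the measured background processing times rise roughly in step with it, so later stages can be expected to take longer still.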
