What does each of these numbers mean in darknet?

I'm running the basic CIFAR example from the darknet tutorial at https://pjreddie.com/darknet/train-cifar/ and am getting a stream of output like this:
1382, 3.538: 0.954143, 0.969863 avg, 0.027415 rate, 0.559997 seconds, 176896 images
Loaded: 0.000033 seconds
1383, 3.540: 0.816129, 0.954489 avg, 0.027385 rate, 0.565249 seconds, 177024 images
Loaded: 0.000069 seconds
1384, 3.543: 0.961585, 0.955199 avg, 0.027355 rate, 0.564356 seconds, 177152 images
Loaded: 0.000037 seconds
What do these outputs mean and what is the actual interim accuracy?

1384 - iteration number (count of batches processed so far)
3.543 - most likely the number of epochs completed: images seen divided by the training-set size (177152 / 50000 ≈ 3.54 for CIFAR-10)
0.961585 - loss on the current batch
0.955199 avg - running average loss (error) - the lower, the better
0.027355 rate - current learning rate (it changes according to the policy in your .cfg file)
0.564356 seconds - time spent processing this batch
177152 images - total number of images seen during training so far
When you see that the average loss (0.xxxxxx avg) no longer decreases over many iterations, you should stop training.
For more details see the AlexeyAB/darknet documentation.
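If you want to track the average loss programmatically (for example, to decide when it has plateaued), the log lines are easy to parse. A minimal sketch, assuming the training output was redirected to a hypothetical train.log file:

import re

# Matches lines like:
# 1384, 3.543: 0.961585, 0.955199 avg, 0.027355 rate, 0.564356 seconds, 177152 images
LINE = re.compile(
    r"(?P<iter>\d+), (?P<epoch>[\d.]+): (?P<loss>[\d.]+), (?P<avg>[\d.]+) avg, "
    r"(?P<rate>[\d.]+) rate, (?P<secs>[\d.]+) seconds, (?P<images>\d+) images")

avg_losses = []
with open("train.log") as f:  # hypothetical log file
    for line in f:
        m = LINE.search(line)
        if m:
            avg_losses.append((int(m.group("iter")), float(m.group("avg"))))

for it, avg in avg_losses[-5:]:  # print the last few iterations
    print(f"iter {it}: avg loss {avg}")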

Related

Why does a larger batch size not speed up evaluation time on Huggingface significantly?

I'm trying to evaluate my model on the SQuAD dataset:
import torch
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForQuestionAnswering,
                          TrainingArguments, Trainer)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name).to(device)
model.eval()  # eval mode

# trainer arguments
args = TrainingArguments(output_dir='tmp', per_device_eval_batch_size=256)

# trainer
trainer = Trainer(
    model=model,
    args=args,
    tokenizer=tokenizer)

with torch.no_grad():
    # load dataset
    squad = load_dataset('squad')
    squad_validation = squad['validation']
    # preprocess (preprocess_validation_examples is defined elsewhere)
    original_validation_dataset = squad_validation.map(
        preprocess_validation_examples,
        batched=True,
        remove_columns=squad_validation.column_names)
    trainer_output = trainer.predict(original_validation_dataset)
However, it seems like it doesn't matter whether my per_device_eval_batch_size is 1, 8, 16, 256, or 512; it always results in about 1 min of evaluation time. A batch size of 1 takes 1 min 32 sec, a batch size of 8 takes 1 min 16 sec, and a batch size of 512 takes 1 min 6 sec.
I'd expect the batch size to matter much more than a few seconds' difference, so I'm wondering if I'm doing something wrong. Loading the model takes 4.3 seconds, and with a batch size of 1024 I run out of GPU memory.
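One way to make the comparison explicit is to time trainer.predict across batch sizes in a loop. A minimal timing sketch, reusing the model, tokenizer, and preprocessed dataset from the snippet above (nothing new is assumed beyond those):

import time

# model, tokenizer, and original_validation_dataset as defined above
for bs in (1, 8, 16, 256, 512):
    args = TrainingArguments(output_dir='tmp', per_device_eval_batch_size=bs)
    trainer = Trainer(model=model, args=args, tokenizer=tokenizer)
    start = time.perf_counter()
    trainer.predict(original_validation_dataset)
    print(f"batch size {bs}: {time.perf_counter() - start:.1f} s")

If the forward pass already saturates the GPU at small batch sizes (long sequences, large model), larger batches mainly shave off Python and dataloader overhead, which would be consistent with a difference of only a few seconds.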

Momentum Score exploration AFL

I want to build an exploration AFL (an AmiBroker Formula Language exploration). Below is the scenario.
Momentum Score:
Monthly momentum values are calculated as cumulative returns over the past 12 months.
The momentum score is calculated in 3 steps:
1) We calculate gross monthly returns by adding one to the percent monthly return. For example, a monthly return of 5% (0.05) gives a gross monthly return of 1.05 (0.05 + 1), while a monthly return of -5% (-0.05) gives a gross monthly return of 0.95 (-0.05 + 1).
2) We multiply all the gross monthly returns of past 12 months.
3) We subtract one from the resultant value from step 2 to get the net 12-month momentum score.
To illustrate this calculation, let's say AUROPHARMA (Aurobindo Pharma) stock has moved by 2%, -5%, 4.3%, 5%, 10.1%, -2.2%, -6%, 3.6%, 0.1%, 0.4%, 1.4%, -2.6% over the past 12 months. Then we add 1 to each monthly return, multiply them all together, and subtract one from the result to get the momentum score.
Momentum Score = (1.02)(0.95)(1.043)(1.05)(1.101)(0.978)(0.94)(1.036)(1.001)(1.004)(1.014)(0.974) - 1
This gives a momentum score of 10.45% (0.1045) for the Aurobindo Pharma stock. (A quick Python check of this arithmetic follows the AFL snippet below.)
Can someone please help?
TimeFrameSet(inMonthly);
TtD_Change = 100 * (Close - Ref(Close, -12) ) / Ref(Close, -12);
_SECTION_BEGIN("Explorer");
Filter = 1;
AddColumn(TtD_Change,"Momentum",1.2,IIf(TtD_Change>0,colorGreen,colorRed));
_SECTION_END();
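Note that the product of gross monthly returns telescopes: (1+r1)(1+r2)...(1+r12) = Close / Ref(Close, -12), so the 12-month percent change computed by TtD_Change above is exactly the momentum score (times 100). A quick Python check of the worked example, using the monthly returns from the question:

# Monthly returns from the worked example above
returns = [0.02, -0.05, 0.043, 0.05, 0.101, -0.022,
           -0.06, 0.036, 0.001, 0.004, 0.014, -0.026]

# Steps 1-3: gross returns, product, minus one
score = 1.0
for r in returns:
    score *= 1.0 + r
score -= 1.0
print(round(score, 4))  # 0.1045

# Same number via the telescoped price ratio used in the AFL code
prices = [100.0]
for r in returns:
    prices.append(prices[-1] * (1.0 + r))
print(round((prices[-1] - prices[0]) / prices[0], 4))  # 0.1045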

Select an integer number of periods

Suppose we have a sinusoid with frequency 100 Hz and a sampling frequency of 1000 Hz. That means our signal has 100 periods per second and we take 1000 samples per second. Therefore, to select one complete period I have to take fs/f = 10 samples. Right?
What if the sampling frequency is not an integer multiple of the signal frequency (like 550 Hz)? Do I have to find the least common multiple M of f and fs and then take M samples?
My goal is to select an integer number of periods in order to be able to replicate them without changes.
You have f periods a second, and fs samples a second.
If you take M samples, they cover M/fs of a second, i.e. P = f * (M/fs) periods. You want P to be an integer.
So the smallest M that works is M = fs / gcd(f, fs).
For your example, M = 1000 / gcd(100, 1000) = 1000 / 100 = 10, i.e. 10 samples covering exactly P = 100 * 10 / 1000 = 1 complete period.
If you have a 60 Hz signal and an 80 Hz sampling frequency, M = 80 / gcd(60, 80) = 80 / 20 = 4 -- 4 samples cover 4 * 1/80 = 1/20 of a second, and that is exactly 3 periods.
If you have a 113 Hz signal and a 512 Hz sampling frequency, you are out of luck: since gcd(113, 512) = 1, you'll need M = 512 samples, covering a whole second and 113 periods.
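A minimal sketch of that arithmetic in Python, reproducing the three examples above:

from math import gcd

def whole_periods(f, fs):
    """Smallest sample count M covering an integer number of periods P."""
    m = fs // gcd(f, fs)   # number of samples
    p = f * m // fs        # number of whole periods those samples cover
    return m, p

for f, fs in ((100, 1000), (60, 80), (113, 512)):
    m, p = whole_periods(f, fs)
    print(f"f={f} Hz, fs={fs} Hz -> {m} samples = {p} periods")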
In general, an arbitrary frequency will not have an integer number of periods in any whole number of samples; if the ratio f/fs is irrational, the sampled waveform never repeats exactly at all. So some means other than concatenating buffers one period in length will be needed to synthesize exactly periodic waveforms of arbitrary frequency. Approximation by interpolation at fractional phase offsets is one possibility.
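A sketch of that interpolation idea, assuming a single-period sine wavetable and simple linear interpolation of the fractional phase (the table length and frequencies here are arbitrary choices for illustration):

import numpy as np

fs = 1000.0  # sample rate (Hz)
f = 113.0    # desired frequency (Hz); no integer number of periods at fs
N = 2048     # wavetable length holding one exact period

table = np.sin(2 * np.pi * np.arange(N) / N)     # one exact period
phase = (f / fs * np.arange(500)) % 1.0          # fractional phase of each output sample
idx = phase * N                                  # fractional index into the table
wrapped = np.append(table, table[0])             # wrap the table so index N is valid
out = np.interp(idx, np.arange(N + 1), wrapped)  # linear interpolation between entries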

SVM and peaks in electricity consumption

I'm using SVMs, specifically libsvm, in order to predict peaks in electricity consumption. In the training set, each vector has 24 values, representing the accumulated kWh for each hour. The vector is labeled "peak", if the next value is defined as a peak (basic outlier detection).
Sample vectors from the training set:
1 1:4.05 2:2.75 3:2.13 4:1.82 5:1.5 6:2.92 7:1.78 8:1.71 9:2.1 10:2.74 11:2.75 12:2.41 13:2.38 14:2.37 15:3.57 16:2.38 17:2.48 18:2.44 19:2.35 20:2.78 21:3.03 22:2.29 23:2.41 24:2.71
0 1:2.75 2:2.13 3:1.82 4:1.5 5:2.92 6:1.78 7:1.71 8:2.1 9:2.74 10:2.75 11:2.41 12:2.38 13:2.37 14:3.57 15:2.38 16:2.48 17:2.44 18:2.35 19:2.78 20:3.03 21:2.29 22:2.41 23:2.71 24:(3.63)<- Peak
0 1:2.13 2:1.82 3:1.5 4:2.92 5:1.78 6:1.71 7:2.1 8:2.74 9:2.75 10:2.41 11:2.38 12:2.37 13:3.57 14:2.38 15:2.48 16:2.44 17:2.35 18:2.78 19:3.03 20:2.29 21:2.41 22:2.71 23:3.63 24:(1.53)<- No peak
The training seems fine and I get ~85% accuracy when performing cross-validation. However, when I try to classify the test set, the predicted class labels are all the same; no peaks are discovered.
I'm using the default radial basis function and haven't changed any parameters.
Output from training.model (without the vectors):
svm_type c_svc
kernel_type rbf
gamma 0.0416667
nr_class 2
total_sv 174
rho -0.883122
label 0 1
nr_sv 122 52
Am I doing something fundamentally wrong here?
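One common culprit with default RBF parameters on unscaled, imbalanced data is that accuracy looks decent while the model collapses to the majority class (plain accuracy is misleading when far more vectors are labeled 0 than 1, as the 122 vs 52 support-vector split above suggests). A sketch of a sanity check, swapping in scikit-learn purely for illustration and assuming the training data sits in a hypothetical train.libsvm file in the format shown above:

from sklearn.datasets import load_svmlight_file
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_svmlight_file("train.libsvm")  # hypothetical path

# Scale features and weight classes inversely to their frequency
pipe = make_pipeline(
    StandardScaler(with_mean=False),  # with_mean=False keeps the sparse matrix sparse
    SVC(kernel="rbf", class_weight="balanced"))

# F1 on the positive (peak) class is more telling than plain accuracy here
print("accuracy:", cross_val_score(pipe, X, y, cv=5, scoring="accuracy").mean())
print("f1:      ", cross_val_score(pipe, X, y, cv=5, scoring="f1").mean())

In raw libsvm the analogous knobs are svm-scale for feature scaling and the -wi option of svm-train for per-class weights.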

Finding standard deviation using only mean, min, max?

I want to find the standard deviation:
Minimum = 5
Mean = 24
Maximum = 84
Overall score = 90
I just want to find out my grade by using the standard deviation. Thanks.
A standard deviation cannot, in general, be computed from just the min, max, and mean. This can be demonstrated with two sets of scores that have the same min, max, and mean but different standard deviations (population standard deviations shown):
1 2 4 5 : min=1 max=5 mean=3 stdev≈1.5811
1 3 3 5 : min=1 max=5 mean=3 stdev≈1.4142
Also, what does an 'overall score' of 90 mean if the maximum is 84?
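The two sets are easy to verify with the Python standard library (population standard deviation):

from statistics import mean, pstdev

for scores in ([1, 2, 4, 5], [1, 3, 3, 5]):
    print(min(scores), max(scores), mean(scores), round(pstdev(scores), 4))
# -> 1 5 3 1.5811
# -> 1 5 3 1.4142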
I actually did a quick-and-dirty calculation of the type M Rad mentions. It involves assuming the distribution is Gaussian or "normal." This does not apply to your situation, but it might help others asking the same question. (You can tell your distribution is not normal because the distance from the mean to the max and the distance from the mean to the min are very different.) Even if it were normal, you would need something you don't mention: the number of samples (the number of tests taken, in your case).
Those readers who DO have a normal population can use the table below for a rough estimate: divide the difference between your measured minimum and your calculated mean by the "expected distance" for your sample size, and the result approximates the standard deviation. On average, the estimate will be off by the fraction shown in the "expected error" column. (I have no idea whether the estimate is biased; change the code below to compute the error without the abs to get a guess.)
Num Samples Expected distance Expected error
10 1.55 0.25
20 1.88 0.20
30 2.05 0.18
40 2.16 0.17
50 2.26 0.15
60 2.33 0.15
70 2.38 0.14
80 2.43 0.14
90 2.47 0.13
100 2.52 0.13
This experiment shows that the "rule of thumb" of dividing the range by 4 to get the standard deviation is in general incorrect -- even for normal populations. In my experiment it only holds for sample sizes between 20 and 40 (and then loosely). This rule may have been what the OP was thinking about.
You can modify the following Python code to generate the table for different values (change max_sample_size), with more accuracy (change num_simulations), or without the limitation to multiples of 10 (change the parameters to range in the final for loop):
#!/usr/bin/env python3
import random

# Return the distance of the minimum of samples from its mean.
# samples must have at least one entry.
def min_dist_from_estd_mean(samples):
    total = 0
    sample_min = samples[0]
    for sample in samples:
        total += sample
        sample_min = min(sample, sample_min)
    estd_mean = total / len(samples)
    return estd_mean - sample_min  # positive, because the min cannot exceed the mean

num_simulations = 4095
max_sample_size = 100

# Calculate expected distances
sum_of_dists = [0] * (max_sample_size + 1)  # +1 so we can index by sample size
for iternum in range(num_simulations):
    samples = [random.normalvariate(0, 1)]
    while len(samples) <= max_sample_size:
        sum_of_dists[len(samples)] += min_dist_from_estd_mean(samples)
        samples.append(random.normalvariate(0, 1))
expected_dist = [total / num_simulations for total in sum_of_dists]

# Calculate the average relative error when using that expected distance
sum_of_errors = [0] * len(sum_of_dists)
for iternum in range(num_simulations):
    samples = [random.normalvariate(0, 1)]
    while len(samples) <= max_sample_size:
        ave_dist = expected_dist[len(samples)]
        if ave_dist > 0:
            sum_of_errors[len(samples)] += \
                abs(1 - (min_dist_from_estd_mean(samples) / ave_dist))
        samples.append(random.normalvariate(0, 1))
expected_error = [total / num_simulations for total in sum_of_errors]

cols = " {0:>15}{1:>20}{2:>20}"
print(cols.format("Num Samples", "Expected distance", "Expected error"))
cols = " {0:>15}{1:>20.2f}{2:>20.2f}"
for idx in range(10, len(expected_dist), 10):
    print(cols.format(idx, expected_dist[idx], expected_error[idx]))
You can obtain an estimate of the geometric mean, sometimes called the geometric mean of the extremes or GME, from the Min and the Max alone: GME = $\sqrt{Min \cdot Max}$. The SD can then be estimated from your arithmetic mean (AM) and the GME as:
$$SD = \frac{AM}{GME} \sqrt{AM^2 - GME^2}$$
This approach works well for log-normal distributions, or generally as long as the GME, GM, or median is smaller than the AM.
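Plugging the numbers from the question into that formula, purely as an illustration (the grade distribution above is probably not log-normal, so treat the result with suspicion):

from math import sqrt

am, mn, mx = 24, 5, 84
gme = sqrt(mn * mx)                     # geometric mean of extremes ≈ 20.49
sd = (am / gme) * sqrt(am**2 - gme**2)  # ≈ 14.6
print(gme, sd)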
In principle, you can estimate the standard deviation from the mean/min/max and the number of elements in the sample. If you assume normality, the min and max of a sample are random variables whose statistics follow from the mean, the stddev, and the number of samples. So given the latter, one can compute (after slogging through the math or running a bunch of Monte Carlo scripts) a confidence interval for the former (e.g., it is 80% probable that the stddev is between 20 and 40).
That said, it probably isn't worth doing except in extreme situations.
