Detectron2 - Same Code & Data // Different Platforms // Highly Divergent Results

I use different hardware to benchmark several setups. The code runs in a Jupyter notebook.
When I evaluate the losses, I get highly divergent results across platforms.
I also checked the full config with cfg.dump() - it is identical on every platform.
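A minimal sketch of how such a check can be done (the file names are placeholders; cfg is the config object shown below):
# Dump the resolved config on each platform, then diff the files afterwards
with open("config_thisplatform.yaml", "w") as f:
    f.write(cfg.dump())
# e.g. on the command line: diff config_azure.yaml config_colab.yaml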
Detectron2 Parameters:
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/retinanet_R_101_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("dataset_train",)
cfg.DATASETS.TEST = ("dataset_test",)
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/retinanet_R_101_FPN_3x.yaml") # Let training initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.00025 # pick a good LR (alternative: 0.00125)
cfg.SOLVER.MAX_ITER = 1200 # the tutorial's 300 iterations are enough for a toy dataset; train longer for a practical one
cfg.SOLVER.STEPS = [] # do not decay learning rate
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 512 # faster, and good enough for this toy dataset (default: 512)
#cfg.MODEL.ROI_HEADS.NUM_CLASSES = 25 # leftover from the balloon tutorial, which only has one class (see https://detectron2.readthedocs.io/tutorials/datasets.html#update-the-config-for-new-datasets)
cfg.MODEL.RETINANET.NUM_CLASSES = 3
# NOTE: this config value is the number of classes; a few popular unofficial tutorials incorrectly use num_classes+1 here.
cfg.OUTPUT_DIR = "/content/drive/MyDrive/Colab_Notebooks/testrun/output"
cfg.TEST.EVAL_PERIOD = 25
cfg.SEED=5
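For context, a minimal sketch of how a config like this is typically consumed; the trainer code is not shown in the original notebook, so the DefaultTrainer subclass and COCOEvaluator here are assumptions (the evaluator is needed because cfg.TEST.EVAL_PERIOD is set):
import os
from detectron2.engine import DefaultTrainer
from detectron2.evaluation import COCOEvaluator

class CocoTrainer(DefaultTrainer):
    @classmethod
    def build_evaluator(cls, cfg, dataset_name, output_folder=None):
        # Evaluate with COCO metrics every cfg.TEST.EVAL_PERIOD iterations
        if output_folder is None:
            output_folder = os.path.join(cfg.OUTPUT_DIR, "eval")
        return COCOEvaluator(dataset_name, output_dir=output_folder)

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = CocoTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()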
1. Environment: Azure
Microsoft Azure - Machine Learning
STANDARD_NC6
Torch: 1.9.0+cu111
Results:
Training Log: Log Azure
2. Environment: Colab
Google Colab (free tier)
Torch: 1.9.0+cu111
Results:
Training Log: Log Colab
EDIT:
3. Environment: Ubuntu
Ubuntu 22.04
RTX 3080
Torch: 1.9.0+cu111
Results:
Training Log: https://pastebin.com/PwXMz4hY
New dataset
The issue is not reproducible with a larger dataset.

Related

How to write an SVR loss function?

Recently I have been studying SVM models, and I understand the mathematics of SVC and SVR, but I am confused about the loss functions.
For example, the SVC loss is:
SVCLoss = \sum_{i} a_i - \frac{1}{2} \sum_{i,j} a_i a_j y_i y_j K(x_i, x_j)
import numpy as np
from sklearn.svm import SVC

# Train a quantum support vector classifier
svc = SVC()
svc.fit(samples, labels)
# Get dual coefficients
dual_coefs = svc.dual_coef_[0]
# Get support vectors
support_vecs = svc.support_
# Prune kernel matrix of non-support-vector entries
# (kmatrix is the precomputed kernel matrix over all training samples)
kmatrix = kmatrix[support_vecs, :][:, support_vecs]
# Calculate loss
loss = np.sum(np.abs(dual_coefs)) - 0.5 * (dual_coefs.T @ kmatrix @ dual_coefs)
Now I am trying to write the corresponding loss function code for SVR in Python, but I ran into a problem I could not solve.
SVR loss function:
from sklearn.svm import SVR

# Train a quantum support vector regressor
svr = SVR(kernel="rbf")
svr.fit(samples, labels)
# Get dual coefficients
dual_coefs = svr.dual_coef_[0]
# Get support vectors
support_vecs = svr.support_
# Prune kernel matrix of non-support-vector entries
# (kmatrix is again the precomputed kernel matrix over all training samples)
kmatrix = kmatrix[support_vecs, :][:, support_vecs]
# Calculate loss
loss = ???
I have been trying to work this out for a long time, and searching online has not helped. My questions are:
How should dual_coefs and support_vecs be represented in SVR_Loss?
How can I write SVR_Loss in Python, analogous to the SVC code above?
Please help or give some ideas on how to achieve this. Thanks in advance.
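One plausible sketch, mirroring the SVC snippet: the dual objective of epsilon-SVR is \sum_i y_i (a_i - a_i^*) - \epsilon \sum_i (a_i + a_i^*) - \frac{1}{2} \sum_{i,j} (a_i - a_i^*)(a_j - a_j^*) K(x_i, x_j). In scikit-learn, svr.dual_coef_ stores (a_i - a_i^*) for the support vectors, and since at most one of a_i, a_i^* is nonzero, \sum_i (a_i + a_i^*) equals np.sum(np.abs(dual_coefs)). Assuming that interpretation, and that kmatrix has already been pruned as above:
y_sv = np.asarray(labels)[support_vecs]                 # targets of the support vectors
loss = (np.dot(y_sv, dual_coefs)                        # sum_i y_i (a_i - a_i*)
        - svr.epsilon * np.sum(np.abs(dual_coefs))      # - eps * sum_i (a_i + a_i*)
        - 0.5 * (dual_coefs.T @ kmatrix @ dual_coefs))  # - 1/2 quadratic term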

Unexpected results from multiprocessed environments regarding time and rewards

I set up a custom gym environment, trained a Stable Baselines3 (SB3) PPO agent on a GPU, and it worked quite well.
Now I want to speed up training with multiprocessing, following the example from the SB3 library. To find the best number of processes, I trained multiple models with varying numbers of workers. Because the result was surprising and I wanted to rule out "bad luck" of an agent at a given number of workers, I wrapped everything in a loop, ran the code 10 times, and averaged over the runs.
This is my code:
max_num_processes = 16
n_timesteps = 100_000
iterations = 10
processes = range(max_num_processes)
processing_time = np.zeros(max_num_processes)
rewards = np.zeros(max_num_processes)

for l in range(iterations):
    for num_p in processes:
        env = SubprocVecEnv([make_env(env_parameter, rank=i) for i in range(num_p+1)])
        model = PPO(policy, env, verbose=verbose, tensorboard_log=log_dir)
        # Multiprocessed RL Training
        start_time = time.time()
        model.learn(n_timesteps)
        total_time_multi = time.time() - start_time
        mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=eval_steps)
        processing_time[num_p] += total_time_multi
        rewards[num_p] += mean_reward

processing_time = processing_time/iterations
rewards = rewards/iterations
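The snippet assumes a make_env factory like the one in the SB3 multiprocessing example; a minimal sketch of such a factory, where MyCustomEnv stands in for the custom gym environment (hypothetical name) and env.seed follows the older Gym API used by that example:
from stable_baselines3.common.utils import set_random_seed

def make_env(env_parameter, rank, seed=0):
    def _init():
        env = MyCustomEnv(env_parameter)  # hypothetical custom gym environment
        env.seed(seed + rank)             # give each worker its own seed
        return env
    set_random_seed(seed)
    return _init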
I expected the runtime graph to drop from 1 worker until a sweet spot (say 4) where the runtime is lowest, and then rise again. But the results look random. These are the plots:
I ran the experiment multiple times and the result is always the same: there is no sweet spot. But why? Can you only expect one when training on a CPU? When I run the example from the SB3 library in Colab on a GPU, there is a rapid decrease in duration. Why not with my code?

What is the standard way to train a PyTorch script until convergence?

What is the standard way to detect if a model has converged? I was going to record the last 5 losses with a 95% confidence interval for each, and if they all agree then I'd halt the script. I assume training until convergence is already implemented somewhere in PyTorch or PyTorch Lightning. I don't need a perfect solution, just the standard way to do this automatically, i.e. halt when converged.
My solution is easy to implement. First create a criterion and change the reduction to 'none', so it outputs a tensor of size [B]. Every time you log, record that tensor's mean and its 95% confidence interval (or the std if you prefer, though that is less accurate). Each time you add a new loss with its confidence interval, keep only the most recent 5 (or 10) and check that the 5 losses are within each other's 95% CIs. If that is true, halt.
You can compute the CI with this:
import scipy.stats
from torch import Tensor

def torch_compute_confidence_interval(data: Tensor,
                                      confidence: float = 0.95
                                      ) -> tuple[Tensor, Tensor]:
    """
    Computes the mean and confidence-interval half-width for a sample of data.
    """
    n = len(data)
    mean: Tensor = data.mean()
    # se: Tensor = scipy.stats.sem(data)  # standard error via scipy
    # std, mean = torch.std_mean(data, unbiased=True)  # (returns std, not SE)
    se: Tensor = data.std(unbiased=True) / (n ** 0.5)  # standard error of the mean
    t_p: float = float(scipy.stats.t.ppf((1 + confidence) / 2., n - 1))
    ci = t_p * se
    return mean, ci
and you can create the criterion as follows:
loss: nn.Module = nn.CrossEntropyLoss(reduction='none')
so the train loss now has size [B].
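A minimal sketch of the halting check described above; train_loader, model, optimizer, and log_every are assumed to exist, and the per-sample loss comes from the criterion with reduction='none':
from collections import deque

window = deque(maxlen=5)  # most recent (mean, ci) pairs

def losses_agree(history):
    # every recorded mean lies within every other recorded 95% CI
    return all(abs(m_i - m_j) <= ci_j
               for m_i, _ in history
               for m_j, ci_j in history)

for step, (x, y) in enumerate(train_loader):
    per_sample_loss = loss(model(x), y)  # tensor of size [B]
    per_sample_loss.mean().backward()
    optimizer.step()
    optimizer.zero_grad()
    if step % log_every == 0:
        mean, ci = torch_compute_confidence_interval(per_sample_loss.detach())
        window.append((mean.item(), ci.item()))
        if len(window) == window.maxlen and losses_agree(window):
            break  # halt: recent losses agree within their confidence intervals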
Note that I know how to train for a fixed number of epochs, so I am not looking for that - just an automatic halting criterion for when the model looks converged, roughly what a person would do when looking at the learning curve, but automated.
ref:
https://forums.pytorchlightning.ai/t/what-is-the-standard-way-to-halt-a-script-when-it-has-converged/1415
Set an EarlyStopping callback (https://pytorch-lightning.readthedocs.io/en/stable/api/pytorch_lightning.callbacks.EarlyStopping.html#pytorch_lightning.callbacks.EarlyStopping) in your trainer:
callbacks = [
    EarlyStopping(
        monitor="val_f1_score",
        min_delta=0.01,
        patience=10,  # NOTE: counted in validation epochs, not training epochs
        verbose=False,
        mode="max",   # val_f1_score is a score, so stop when it stops increasing
    ),
]
trainer = pl.Trainer(callbacks=callbacks)
This will monitor val_f1_score during training (note that you have to log this value with self.log("val_f1_score", val_f1) in your pl.LightningModule). It will stop training if the monitored quantity fails to improve by at least min_delta for more than the number of validation epochs specified by patience.
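For example, a hypothetical validation_step that logs the monitored metric (compute_f1 is a placeholder for whatever metric function you use):
import pytorch_lightning as pl

class LitModel(pl.LightningModule):
    def validation_step(self, batch, batch_idx):
        x, y = batch
        preds = self(x)
        val_f1 = compute_f1(preds, y)     # hypothetical metric function
        self.log("val_f1_score", val_f1)  # makes the value visible to EarlyStopping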

Running a pre-trained neural net model crashes the whole system

I am fairly new to machine learning and have been trying to run a GAN from the source code here: https://github.com/tkarras/progressive_growing_of_gans
I have all the dependencies as far as I can tell, and I receive no errors when I run their import script. However, when I reach the line marked below, which generates images from the loaded generator, my system shuts off abruptly.
I get no error logs or system events other than a kernel power-loss event. I have tested some of the CUDA utility examples for bandwidth and device testing and they run without problems, which leads me to believe it is not a hardware issue.
import pickle
import numpy as np
import tensorflow as tf
import PIL.Image

# Initialize TensorFlow session.
tf.InteractiveSession()

# Import official CelebA-HQ networks.
with open('karras2018iclr-celebahq-1024x1024.pkl', 'rb') as file:
    G, D, Gs = pickle.load(file)

# Generate latent vectors.
latents = np.random.RandomState(1000).randn(1000, *Gs.input_shapes[0][1:]) # 1000 random latents
latents = latents[[477, 56, 83, 887, 583, 391, 86, 340, 341, 415]] # hand-picked top-10

# Generate dummy labels (not used by the official networks).
labels = np.zeros([latents.shape[0]] + Gs.input_shapes[1][1:])

# Run the generator to produce a set of images.
# !!! SYSTEM CRASHES ON THIS INSTRUCTION !!!
images = Gs.run(latents, labels)

# Convert images to PIL-compatible format.
images = np.clip(np.rint((images + 1.0) / 2.0 * 255.0), 0.0, 255.0).astype(np.uint8) # [-1,1] => [0,255]
images = images.transpose(0, 2, 3, 1) # NCHW => NHWC

# Save images as PNG.
for idx in range(images.shape[0]):
    PIL.Image.fromarray(images[idx], 'RGB').save('img%d.png' % idx)
However, I have had the same power-loss issue when running a different ML implementation that used Caffe. So at the moment I am at a loss as to what the core issue might be. Any ideas about what else I could test would be greatly appreciated.
System specs:
- Windows 7
- 2x Intel Xeon X5680 @ 3.33 GHz
- 2x NVIDIA Quadro M6000 GPUs
- 24 GB memory
- 1250 W power supply
- Miniconda 3 with Python 3.6.4
- CUDA 9.0
- cuDNN 7
- tensorflow_gpu 1.7

CUDA_ERROR_OUT_OF_MEMORY: How to activate multiple GPUs from Keras in Tensorflow

I am running a large model on TensorFlow using Keras, and toward the end of training the Jupyter notebook kernel stops; in the command line I see the following error:
2017-08-07 12:18:57.819952: E tensorflow/stream_executor/cuda/cuda_driver.cc:955] failed to alloc 34359738368 bytes on host: CUDA_ERROR_OUT_OF_MEMORY
This, I guess, is simple enough - I am running out of memory. I have 4 NVIDIA 1080 Ti GPUs. I know that TF uses only one of them unless told otherwise. Therefore, I have two questions:
1. Is there a good working example of how to utilise all GPUs in Keras?
2. In Keras, it seems to be possible to set gpu_options.allow_growth=True, but I cannot see exactly how to do this (I understand this is being a help-vampire, but I am completely new to DL on GPUs).
see CUDA_ERROR_OUT_OF_MEMORY in tensorflow
See this Official Keras Blog
Try this:
import keras.backend as K

# Let TensorFlow grow GPU memory as needed instead of grabbing it all up front
config = K.tf.ConfigProto()
config.gpu_options.allow_growth = True
session = K.tf.Session(config=config)
K.set_session(session)  # make Keras use this session
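For the first question, a minimal sketch using the multi_gpu_model utility from Keras 2.x (removed in later versions); build_model, x_train, and y_train are placeholders:
from keras.utils import multi_gpu_model

with K.tf.device('/cpu:0'):
    base_model = build_model()  # hypothetical function that builds the Keras model
parallel_model = multi_gpu_model(base_model, gpus=4)  # replicate across the 4 GPUs
parallel_model.compile(optimizer='adam', loss='categorical_crossentropy')
parallel_model.fit(x_train, y_train, batch_size=256, epochs=10)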
