Check the progress while training the model using tqdm - machine-learning

I know how to track the progress of a loop using tqdm:
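import time
from tqdm.notebook import tqdm as tqdm_notebook  # `from tqdm import tqdm_notebook` is the older, deprecated spelling

for i in tqdm_notebook(range(100)):
    time.sleep(0.1)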
I want to check the progress of training my Random Forest model, something like:
from sklearn.ensemble import RandomForestRegressor

# (pseudocode) tqdm_notebook starts the progress bar
RF_model = RandomForestRegressor(max_features='sqrt', n_estimators=100, oob_score=True)
RF_model.fit(x_train, y_train)
# (pseudocode) tqdm_notebook stops the progress bar

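You can use the verbose parameter for this. Starting from your code, just add one more argument:
from sklearn.ensemble import RandomForestRegressor

RF_model = RandomForestRegressor(max_features='sqrt', n_estimators=100, oob_score=True, verbose=2)
RF_model.fit(x_train, y_train)
With verbose=2, scikit-learn prints a "building tree i of 100" line for each tree as it is fitted.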
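If you want an actual tqdm bar rather than verbose log lines, one workaround (a sketch, not a built-in scikit-learn feature) is to grow the forest in chunks with warm_start=True, which keeps the already fitted trees and only adds new ones each time fit() is called with a larger n_estimators. The toy data from make_regression stands in for your x_train / y_train, and the chunk size step is an arbitrary choice. Adding oob_score=True should still work, but the OOB estimate is then recomputed on each fit call, which costs extra time.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from tqdm import tqdm  # in Jupyter: from tqdm.notebook import tqdm

# Toy data standing in for x_train / y_train.
x_train, y_train = make_regression(n_samples=500, n_features=10, random_state=0)

total_trees = 100
step = 10  # trees added per progress-bar tick (arbitrary choice)

# warm_start=True keeps previously fitted trees and only fits the new ones
# each time fit() is called with a larger n_estimators.
rf_model = RandomForestRegressor(max_features='sqrt', n_estimators=step,
                                 warm_start=True, random_state=0)

for n_trees in tqdm(range(step, total_trees + 1, step), desc="trees"):
    rf_model.set_params(n_estimators=n_trees)
    rf_model.fit(x_train, y_train)

print(len(rf_model.estimators_))  # 100
```

The bar advances once per chunk, so a smaller step gives a smoother bar at the cost of more fit() calls.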

Related

tqdm progress bar with multiple position

I am using tqdm with multiple positions to show progress bars. I only have 4 different positions, but more than 4 lines get printed. How can I solve this problem?
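For reference, here is a minimal sketch of the usual fixed-position pattern, assuming a plain terminal (notebook front ends render bars differently): each of the four bars is created exactly once with its own position and then updated in place, rather than a new bar being created on every pass of a loop. The names n_bars, steps and the "worker" labels are illustrative only.

```python
import time
from tqdm import tqdm

n_bars = 4
steps = 20

# Create each bar once, pinned to its own line via `position`.
bars = [tqdm(total=steps, position=i, desc=f"worker {i}") for i in range(n_bars)]

for _ in range(steps):
    for bar in bars:
        bar.update(1)  # advance the existing bar instead of creating a new one
    time.sleep(0.01)

# Close the bars so tqdm can release their screen lines.
for bar in bars:
    bar.close()
```

If extra lines still appear, a terminal narrower than the bar width (causing wrapped lines) is another thing worth ruling out.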

Huggingface Dataset.map shows red progress bar when batched=True

I have the following simple code copied from Huggingface examples:
model_checkpoint = "distilgpt2"
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, use_fast=True)
def tokenize_function(examples):
    return tokenizer(examples["text"])
from datasets import load_dataset
datasets = load_dataset('wikitext', 'wikitext-2-raw-v1')
tokenized_datasets = datasets.map(tokenize_function, batched=False, num_proc=4, remove_columns=["text"])
When I set batched=False, the progress bar is green, which indicates success, but if I set batched=True, the progress bar is red and does not reach 100%. Does that mean my map function failed, or is it something else?
It is likely a bug in the printing logic, not in processing itself.
There is relevant discussion of this on discuss.huggingface.co and on GitHub.

How to downsample a Highstock Flag Series

I have a chart with several series on it, one of which is a flag series. The data on the flag series is fairly sparse, but very bursty. As a result, when I am showing a large amount of data, ~10 flags tend to line up next to each other, all pointing to basically the same point on the graph.
What I'd like is for those flags to get downsampled (in a sense) so I only show 1 flag indicator that points to the general area where all the flag points are, then when the user zooms in, all the flag points are displayed since it's now possible to actually distinguish what they are pointing at.
This seems like a job for data grouping: when I am zoomed out and showing large data ranges, all my other series get downsampled by data grouping. However, grouping isn't being applied to the flag series, I suspect because the series doesn't qualify, since it has relatively few points across the range being shown.
Does anyone know if there's anything built in that will help me achieve this? Or do I have to write my own downsampling that's tied into the setExtremes event somehow?
Thanks.
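If it does come down to writing your own downsampling, the core is just bucketing. The sketch below shows only that bucketing logic, written in Python for illustration; in Highstock you would port it to JavaScript and re-run it with the new min/max from the setExtremes event. The function name group_flags and the choice of the bucket mean as the representative position are assumptions, not Highstock APIs.

```python
def group_flags(timestamps, min_x, max_x, n_buckets):
    """Collapse flag timestamps in [min_x, max_x] into at most one flag per bucket.

    Returns (representative_x, count) pairs; representative_x is the mean
    timestamp of the flags that fell into that bucket.
    """
    if max_x <= min_x or n_buckets < 1:
        raise ValueError("need max_x > min_x and at least one bucket")
    width = (max_x - min_x) / n_buckets
    buckets = {}
    for t in timestamps:
        if not (min_x <= t <= max_x):
            continue  # flag is outside the visible range
        idx = min(int((t - min_x) // width), n_buckets - 1)
        buckets.setdefault(idx, []).append(t)
    return [(sum(ts) / len(ts), len(ts)) for _, ts in sorted(buckets.items())]


print(group_flags([1, 2, 3, 50, 51, 99], 0, 100, 10))
# [(2.0, 3), (50.5, 2), (99.0, 1)]
```

Tying n_buckets to the chart's pixel width means that zooming in (a narrower min/max range) naturally splits the groups back into individual flags.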

Correlate time series with Graphite

Does Graphite have a way to visualize correlation between two time series?
I would like something like this:
In this SlideShare presentation there's a mention of a correlate data transform function (slide 11), but I can't find any documentation about it.
The trick to displaying events in Graphite is to apply the drawAsInfinite() function on the red metric. This displays events as a vertical line at the time of the event.
Update:
Perhaps you mean timeShift().
"..what if we want to directly correlate the activity between now and the same time two weeks ago? This is where the timeShift() function comes in. Let's take a look at the same 4-week period, but this time we'll review two weeks of current data and overlay it with a time-shifted span of the two weeks prior."
Source.
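As a sketch of how the two suggestions combine in a single graph, the snippet below builds a Graphite render API URL with three targets: a series, the same series shifted back two weeks with timeShift(), and an event series drawn as vertical lines with drawAsInfinite(). The host graphite.example.com and the metric names app.requests.count and events.deploys are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Hypothetical Graphite host and metric names; substitute your own.
GRAPHITE_URL = "http://graphite.example.com/render"
metric = "app.requests.count"
deploys = "events.deploys"

params = [
    ("target", metric),
    # The same series shifted back two weeks, overlaid on the current data.
    ("target", f'timeShift({metric}, "14d")'),
    # Events drawn as vertical lines spanning the full height of the graph.
    ("target", f"drawAsInfinite({deploys})"),
    ("from", "-14d"),
]
url = f"{GRAPHITE_URL}?{urlencode(params)}"
print(url)
```

Repeating the target parameter is how the render API overlays several series on one graph; urlencode takes care of escaping the parentheses and quotes.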
To answer my own question: it is not possible and would not fit Graphite's vision.
From their GitHub issue tracker:
If the X axis isn't time then it isn't a time series... Graphite is a graphing tool for time series data.
Divide one series by the other: the closer the resulting line is to flat, the more related the two series are. That assumes the relationship is proportional, of course; it could be logarithmic or anything else. But in those cases, your two-axis example wouldn't work either.
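To make the "divide one by the other" idea concrete, here is a small offline sketch with NumPy on synthetic data (all series are hypothetical): a series proportional to B gives a perfectly flat ratio, while an unrelated series gives a ratio that wanders. The flatness measure used here (standard deviation of the ratio divided by its mean) is an illustrative choice, not something Graphite computes; in Graphite itself the ratio line would come from a target such as divideSeries(app.series_a, app.series_b), with placeholder metric names.

```python
import numpy as np

rng = np.random.default_rng(0)
b = rng.uniform(50, 100, size=200)             # hypothetical series B
a_related = 3.0 * b                            # proportional to B
a_unrelated = rng.uniform(150, 300, size=200)  # independent of B


def ratio_spread(a, b):
    """Coefficient of variation of a / b: close to 0 means a nearly flat ratio line."""
    ratio = a / b
    return ratio.std() / ratio.mean()


print(ratio_spread(a_related, b))    # ~0 (up to floating-point rounding)
print(ratio_spread(a_unrelated, b))  # clearly larger
```

As the answer notes, this only detects a proportional relationship; a logarithmic one would not produce a flat ratio.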

Color selection for matplotlib that prints well

I am using pandas and matplotlib to generate bar-graphs with lots of bars.
I know how to cycle through a list of selected colors (How to give a pandas/matplotlib bar graph custom colors).
The question is which colors to select so that my graph prints nicely on paper (it is for a research paper). I am most interested in sufficient contrast between the columns and a selection of colors that looks pleasant. I would like multiple colors instead of grayscale or single-hue color schemes.
Are there any predetermined schemes to select from that people use?
So your requirements are "lots of colors" and "no two colors should map to the same grayscale value when printed", right? The second criterion should be met by any "sequential" colormap (one whose luminance increases or decreases monotonically). I think out of all the choices in matplotlib, you are left with cubehelix (already mentioned), gnuplot, and gnuplot2:
The white line is the luminance of each color, so you can see that each color will map to a different grayscale value when printed. The black line is hue, showing they cycle through a variety of colors.
Note that cubehelix is actually a function (from matplotlib._cm import cubehelix), and you can adjust the parameters of the helix to produce more widely-varying colors, as shown here. In other words, cubehelix is not a colormap, it's a family of colormaps. Here are 2 variations:
For less wildly varying colors (more pleasant for many things, but maybe not for your bar graphs), maybe try the three-hue ColorBrewer maps YlOrRd, PuBuGn, and YlGnBu:
https://www.flickr.com/photos/omegatron/7298887952/
I wouldn't recommend relying on color alone to identify the bars, though. You should always use text labels as the primary identifier. Also note that some of these maps produce white bars that blend completely into the background, since they are intended for heatmaps, not chart colors:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

# Make the data: 10 columns, each with one non-NaN value
x = [{i: np.random.randint(1, 5)} for i in range(10)]
df = pd.DataFrame(x)

# Sample the colormap at evenly spaced indices, one color per column.
# Valid lookup-table indices run from 0 to cmap.N - 1.
cmap = plt.get_cmap('cubehelix')
indices = np.linspace(0, cmap.N - 1, len(x))
my_colors = [cmap(int(i)) for i in indices]

# Pass this list of colors as the `color` option to `plot`.
df.plot(kind='bar', stacked=True, color=my_colors)
plt.show()
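To build one of the cubehelix variations mentioned above, the function from matplotlib._cm can be wrapped in a LinearSegmentedColormap. This is a sketch: _cm is a private module, so its contents may change between matplotlib releases, and the parameter values r=-3.0 and h=1.5 are an arbitrary example of "more widely varying" settings (the defaults are s=0.5, r=-1.5, h=1.0).

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
from matplotlib import _cm  # private module: may change between releases
from matplotlib.colors import LinearSegmentedColormap

# Hypothetical parameter choice: more rotations (r) and more hue saturation (h)
# than the defaults, giving more widely varying colours.
wide_helix = LinearSegmentedColormap("cubehelix_wide", _cm.cubehelix(s=0.5, r=-3.0, h=1.5))

n_bars = 10
# Stop short of 1.0 so the last bar is not (near) white.
my_colors = wide_helix(np.linspace(0.0, 0.85, n_bars))  # (n_bars, 4) RGBA array
```

my_colors can be passed as the color argument to df.plot exactly like the list built from the stock cubehelix map. Higher h values push more colours into the clipped region, so it is worth checking the printed result.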
And these are the new guys:
In 1.5 matplotlib will ship with 4 new rationally designed color maps:
'viridis' (default color map as of 2.0)
'magma'
'plasma'
'inferno'.
The process of designing these colormaps is presented in A Better Default Colormap for Matplotlib | SciPy 2015.
The tool developed for this process can be installed by pip install viscm.
I would suggest the cubehelix color map. It is designed to have correct luminosity ordering in both color and gray-scale.
I am not aware of predetermined schemes. I usually use a few colours for publication plots. I mostly take two things into consideration when choosing colours:
Colour-blindness: this page on Wikipedia has lots of good info about choosing colours that are distinguishable to most colour-blind people. In the "tips for editors" section, you'll notice that once you take the guidelines into account, only a few sets of colours remain available. (A good rule of thumb is to never mix red and green!) You can also use the linked colour-blind simulators to check whether your plot is clearly visible.
Luminance: most of the journals in my field publish in B&W by default. Even though most people read the papers online, I still like to make sure that the plots can be understood when printed in grayscale, so I take care to use colours with different luminances. A good test is simply to desaturate the image produced, which gives a good idea of how it will look when printed in grayscale. In many cases (particularly line or scatter plots), I also use things other than colour to distinguish between sets (e.g. line styles, different markers).
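Besides desaturating the rendered image, you can estimate grey levels numerically before plotting. The sketch below assumes the Rec. 601 luma weights (0.299, 0.587, 0.114) applied directly to the sRGB values, which is a rough approximation; a journal's print pipeline may convert to grey differently. The function names and the sample palette are illustrative.

```python
import numpy as np
from matplotlib.colors import to_rgb


def grayscale_values(colors):
    """Approximate grey level (0 = black, 1 = white) of each colour via Rec. 601 luma weights."""
    rgb = np.array([to_rgb(c) for c in colors])  # accepts names, hex strings, RGB(A) tuples
    return rgb @ np.array([0.299, 0.587, 0.114])


def min_gray_gap(colors):
    """Smallest difference between any two grey levels; small values risk lookalike bars."""
    g = np.sort(grayscale_values(colors))
    return float(np.min(np.diff(g)))


palette = ["#1b9e77", "#d95f02", "#7570b3", "black", "white"]
print(np.round(grayscale_values(palette), 3))
print(min_gray_gap(palette))
```

A gap of only a few hundredths between two colours is a hint that those bars may be hard to tell apart once printed in grayscale.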
If no colours are specified in matplotlib plots, it has a default set of colours that it cycles through. This answer has a good explanation on how to change that default set of colours. You can customise that to your preferred set of colours, so the plots would use them in turn.
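As a sketch of changing that default set, the property cycle can be replaced via rcParams["axes.prop_cycle"] with a cycler. Here each colour is paired with a line style, which also covers the grayscale-printing concern above. The four colours are taken from the Okabe-Ito colour-blind-friendly palette; both the palette and the line-style pairing are illustrative choices, and two of these colours may still print as similar greys, which is why the line styles are included.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
from cycler import cycler

# Pair each colour with its own line style; adding cyclers of equal length
# combines them element-wise.
my_cycle = (cycler(color=["#000000", "#e69f00", "#0072b2", "#009e73"])
            + cycler(linestyle=["-", "--", ":", "-."]))
plt.rcParams["axes.prop_cycle"] = my_cycle

# Axes created after this point use the new cycle.
fig, ax = plt.subplots()
for k in range(4):
    ax.plot([0, 1], [k, k + 1])  # each call takes the next colour/linestyle pair
```

Putting the same axes.prop_cycle setting in a matplotlibrc file or a style sheet applies it to every script instead of just one session.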
