Huggingface Dataset.map shows red progress bar when batched=True


I have the following simple code copied from Huggingface examples:
model_checkpoint = "distilgpt2"
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, use_fast=True)
def tokenize_function(examples):
    return tokenizer(examples["text"])
from datasets import load_dataset
datasets = load_dataset('wikitext', 'wikitext-2-raw-v1')
tokenized_datasets = datasets.map(tokenize_function, batched=False, num_proc=4, remove_columns=["text"])
When I set batched=False, the progress bar turns green, which indicates success, but if I set batched=True, the progress bar turns red and does not reach 100%. Does that mean my map function failed, or is it something else?

It is likely a bug in the progress-bar display logic, not in the processing itself.
There is relevant discussion of this on discuss.huggingface.co and on GitHub.

Related

I can't figure out how to get the background colour in Python turtle to work

I've been trying to figure out how to set the background colour for a day or two now.
I have been given a booklet to work through as I learn, but once I reached the chapter on setting a screen size or background colour, it simply won't work.
I was told to use the code bgcolor("colour here"), and when that didn't work, I searched online and found turtle.bgcolor("colour here"), which didn't work either.
All I get back is an error saying the names aren't defined. setup(number 1, number 2) was also giving the same error, but I couldn't find a substitute for it online. Could somebody please help me?
My teacher said to just fill a shape for the background, but I feel it would be more of an achievement if I figured it out instead. Thank you.
Below are a few clips of my code.

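The "not defined" errors mean the turtle functions were never imported into your script. Put from turtle import * at the top (conventionally written with a space before the *); that line is what lets you call setup and bgcolor directly. If you would rather write turtle.bgcolor(...), you need import turtle instead.
So your code will be:
from turtle import *
setup(800, 500)
bgcolor("black")
done()  # keep the window open until you close it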

Remove color cast using libvips

I have sRGB images with color casts. To remove them manually, I usually use Photoshop Levels adjustments. Photoshop also has tools for that: Auto Contrast or, even better, Auto Tone, which also takes shadows, midtones and highlights into account.
When I remove the cast manually, I adjust each of the RGB channels individually so that the darkest pixels are set to pure black and the lightest to pure white, and then redistribute all other values (spreading the histogram). This is a simple approach, but it gives good results for my images.
In my node.js app I'm using sharp for image processing, which uses libvips as its processing engine. I tried to remove the cast with .normalize(), but this command works on all channels together rather than on each RGB channel individually, so it doesn't work for me.
I also asked this question on the sharp project page. I tested lovell's suggestion to try hist_local, but the results are not usable for me.
Now I would like to find out how this could be done using native libvips. I've played around with the nip2 GUI and different commands but could not figure out how to achieve it:
Histogram > Equalise Histogram > Global => picture looks oversaturated
Image > Levels > Scale to 0 - 255 => channels do not all spread across 0 - 255 (I don't understand exactly what this command does)
Thanks for every hint!
Addition
Here is an example with pictures from Photoshop to show what I want.
The source image is a picture of a frame from a film negative.
Image before processing
Step1 Invert image
Image after inversion
Step 2: Auto Tone in Photoshop (works the same way as the manual color-cast removal described above)
Image after Auto Tone
This last picture is ok for me.

nip2 has a menu item for this.
Load your image and mark a region on it containing the area you'd like to be neutral. It can be any lightness, it doesn't need to be white.
Use File / Open to get the file dialog and you should see the image loaded in your workspace as a thumbnail.
Doubleclick on the thumbnail to open an image view window.
In the view window, zoom and pan to the right spot. The user guide (press F1) has a section on image navigation.
Hold down CTRL and click and drag down and right to mark a rectangular region.
Back in the main window, click Toolkits / Tasks / Capture / White balance. You should see something like:
You can drag and resize your region to change the neutral point. Use the colour picker to set what white means. You can make other whites with (for example) Colour / New / Colour from CCT and link them together.
Click Colour / New / Colour from CCT to make a colour picker from CCT (correlated colour temperature) -- the temperature in Kelvin of that white.
Set it to something interesting, like 4800 for warm white.
Click on the formula for A5.white to edit it, and enter the cell of your CCT widget (A7 in this case).
Now you can drag the region to adjust the pixels to set the neutral from, and drag the CCT slider to set the temperature.
It can be annoying to find things in the toolkit menu. There's a thing for searching toolkits: in the main window, click View / Toolkit browser. You can enter something like "white" and it'll show related toolkit entries.
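For anyone wanting the same idea in code, here is a rough NumPy sketch of the simplest form of neutral-region white balance: per-channel gains chosen so that the average of a marked region becomes grey. This is an assumption about the general technique, not nip2's actual implementation (which also lets you pick the target white by CCT); the function name `white_balance` and the region coordinates are hypothetical.

```python
import numpy as np

def white_balance(image, region, target=None):
    """Scale each channel so the mean of `region` becomes neutral.

    image:  float array of shape (height, width, 3), values in 0..255
    region: (top, left, height, width) of an area that should be neutral
    target: RGB value the region mean should map to; defaults to grey at
            the region's own average level
    """
    top, left, h, w = region
    patch = image[top:top + h, left:left + w].reshape(-1, 3)
    mean = patch.mean(axis=0)                  # average R, G, B of the region
    if target is None:
        target = np.full(3, mean.mean())       # grey with the same average level
    # Assumes no channel in the region averages to zero (would divide by zero).
    gains = np.asarray(target, dtype=float) / mean
    return np.clip(image * gains, 0, 255)

# Hypothetical test image: a uniform bluish cast over everything.
img = np.empty((4, 4, 3))
img[...] = [100.0, 110.0, 140.0]
balanced = white_balance(img, region=(0, 0, 2, 2))
print(balanced[0, 0])  # the three channels are (approximately) equal
```

Passing a warmer or cooler `target` plays roughly the role of the CCT picker: it decides what "white" the neutral region should map to.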

Here's another answer, but using pyvips and responding to the previous comments. I didn't want to delete the first answer as it still seemed useful.
This version finds the image histogram, searches for thresholds which will select 0.5% and 99.5% of pixels in each image band, then rescales the image so that those pixel values become 0 and 255.
import sys
import pyvips
# trim off this percentage of pixels from the top and bottom
trim_percent = 0.5
def percent(hist, percentage):
    """From a histogram, find the threshold below which lie
    `percentage` percent of the pixels."""
    # normalised cumulative histogram
    norm = hist.hist_cum().hist_norm()
    # column and row profile over percentage
    c, r = (norm > norm.width * percentage / 100).profile()
    return r.avg()
image = pyvips.Image.new_from_file(sys.argv[1])
# photographic negative
image = image.invert()
# find image histogram, split to set of separate bands
bands = image.hist_find().bandsplit()
# for each band, the low and high thresholds
low = [percent(band, trim_percent) for band in bands]
high = [percent(band, 100 - trim_percent) for band in bands]
# rescale image
scale = [255.0 / (h - l) for h, l in zip(high, low)]
image = (image - low) * scale
image.write_to_file(sys.argv[2])
It seems to give roughly similar results to the PS button. If I run:
$ ./autolevel.py ~/pics/before.jpg x.jpg
I see:

In the meantime I've found the Simplest Color Balance algorithm, which describes exactly the problem with color casts, and where you can also find C source code.
It is exactly the same solution as John describes in his second answer, but as a small piece of C code.
I'm now trying to use it as C/C++ addon with N-API under node.js.
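The per-channel trim-and-stretch idea from the pyvips answer (and from Simplest Color Balance) can also be sketched in NumPy, which may be easier to follow when porting to C. This is an illustrative sketch under stated assumptions: an 8-bit RGB array that has already been inverted, thresholds taken with `np.percentile`, and a hypothetical function name `simplest_color_balance`.

```python
import numpy as np

def simplest_color_balance(image, trim_percent=0.5):
    """Stretch each channel independently after clipping the darkest and
    brightest `trim_percent` percent of its pixels.

    image: uint8 array of shape (height, width, channels)
    """
    out = np.empty(image.shape, dtype=np.uint8)
    for c in range(image.shape[2]):
        band = image[..., c].astype(np.float64)
        # low/high thresholds for this channel only
        low, high = np.percentile(band, [trim_percent, 100 - trim_percent])
        if high <= low:
            # flat channel: nothing to stretch
            out[..., c] = image[..., c]
            continue
        stretched = (band - low) * 255.0 / (high - low)
        out[..., c] = np.clip(np.rint(stretched), 0, 255).astype(np.uint8)
    return out
```

For a negative stored as `neg` (uint8), usage would be `simplest_color_balance(255 - neg)`. Unlike the pyvips version, this clips and rounds back to 8 bits inside the function.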

Check the progress while training the model using tqdm

I know how to track the progress of loop iterations using tqdm:
import time
from tqdm import tqdm_notebook
for i in tqdm_notebook(range(100)):
    time.sleep(0.1)
I want to check the progress of training my Random Forest model, something like:
from sklearn.ensemble import RandomForestRegressor
# tqdm_notebook starts the progress bar
RF_model = RandomForestRegressor(max_features='sqrt', n_estimators=100, oob_score=True)
RF_model.fit(x_train, y_train)
# tqdm_notebook stops the progress bar

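You can use the verbose parameter for this. Building on your code, just add one more parameter:
RF_model = RandomForestRegressor(max_features='sqrt', n_estimators=100, oob_score=True, verbose=2)
RF_model.fit(x_train, y_train)
With verbose=2, scikit-learn prints a line such as "building tree 1 of 100" as each tree is built, which gives a text progress indicator rather than a tqdm bar.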
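If you specifically want a tqdm bar rather than printed lines, one workaround (not part of the answer above) is to grow the forest in chunks with `warm_start=True`, which makes `fit` keep the existing trees and only add new ones when `n_estimators` increases. A sketch with synthetic data standing in for `x_train`/`y_train`; the chunk size of 10 is an arbitrary choice, and `oob_score` is left out because it would be recomputed on every call to `fit`:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from tqdm.auto import tqdm  # notebook widget in Jupyter, text bar elsewhere

# Synthetic stand-in for x_train / y_train (hypothetical data).
rng = np.random.default_rng(0)
x_train = rng.normal(size=(200, 5))
y_train = 2.0 * x_train[:, 0] + rng.normal(scale=0.1, size=200)

total_trees = 100
step = 10  # trees added per fit() call (arbitrary)

# warm_start=True: each fit() keeps the trees already built and only
# adds the difference when n_estimators grows.
RF_model = RandomForestRegressor(max_features="sqrt", warm_start=True, random_state=0)

with tqdm(total=total_trees) as pbar:
    for n in range(step, total_trees + 1, step):
        RF_model.set_params(n_estimators=n)
        RF_model.fit(x_train, y_train)
        pbar.update(step)
```

Smaller steps give a smoother bar at the cost of more `fit` calls.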

Highcharts sample

Looking for a Highcharts sample program. I haven't used Highcharts before, but it seems there are no built-in sliders.
Note that Highcharts is, in general, not free. To me this looks like a simple bullet graph; in Highcharts you can build one from a bar chart with a scatter point.

The bullet concept will work for this, though it will take some work to get styling like that. There are plenty of useful options for styling such a chart though without relying on the physical gauge metaphor.
A quick variation on the bullet chart approach that puts them into a single chart and removes the banding:
http://jsfiddle.net/jlbriggs/kwtZr/41/
It relies on a custom extension to produce the 'line' marker type:
Highcharts.Renderer.prototype.symbols.line = ...
Edit in response to comments below:
Updated example with some additional formatting options and cleanup:
http://jsfiddle.net/jlbriggs/kwtZr/55/
Be wary of using multiple colors unless the colors truly mean something.
Using additional color to highlight items that require attention is a good use of color.
Using color to highlight every possible status of something (shades of green, fading to shades of yellow, fading to shades of red...) is a bad use of color that is sadly overused and even expected by some.
FWIW
It's also important to reiterate that the purpose of this type of display is very well handled by a bullet chart, which is definitely worth looking into migrating to somewhere along the way. Reference:
http://www.perceptualedge.com/articles/misc/Bullet_Graph_Design_Spec.pdf
http://en.wikipedia.org/wiki/Bullet_graph

Color selection for matplotlib that prints well

I am using pandas and matplotlib to generate bar-graphs with lots of bars.
I know how to cycle through a list of selected colors (How to give a pandas/matplotlib bar graph custom colors).
The question is what colors to select so that my graph prints nicely on a paper (it is for a research paper). What I am most interested in is sufficient contrast between the columns and a selection of colors that looks pleasant. I would like to have multiple colors instead of gray-scale or single-hue colorschemes.
Are there any predetermined schemes to select from that people use?

So your requirements are "lots of colors" and "no two colors should map to the same grayscale value when printed", right? The second criterion should be met by any "sequential" colormap (one whose luminance increases or decreases monotonically). I think that, out of all the choices in matplotlib, you are left with cubehelix (already mentioned), gnuplot, and gnuplot2:
The white line is the luminance of each color, so you can see that each color will map to a different grayscale value when printed. The black line is hue, showing they cycle through a variety of colors.
Note that cubehelix is actually a function (from matplotlib._cm import cubehelix), and you can adjust the parameters of the helix to produce more widely-varying colors, as shown here. In other words, cubehelix is not a colormap, it's a family of colormaps. Here are 2 variations:
For less wildly-varying colors (more pleasant for many things, but maybe not for your bar graphs), maybe try the ColorBrewer 3-color maps, YlOrRd, PuBuGn, YlGnBu:
https://www.flickr.com/photos/omegatron/7298887952/
I wouldn't recommend relying on color alone to identify the bars, though. You should always use text labels as the primary identifier. Also note that some of these produce white bars that completely blend in with the background, since they are intended for heatmaps, not chart colors:
from matplotlib import pyplot as plt
import pandas
import numpy as np
# Make the data
x = [{i: np.random.randint(1, 5)} for i in range(10)]
df = pandas.DataFrame(x)
# Pick one colour per column (stacked bar segment) by sampling
# the colormap evenly over its full range [0, 1].
cmap = plt.get_cmap('cubehelix')
my_colors = [cmap(v) for v in np.linspace(0, 1, df.shape[1])]
# Specify this list of colors as the `color` option to `plot`.
df.plot(kind='bar', stacked=True, color=my_colors)
plt.show()
And these are the new guys:

Matplotlib 1.5 ships with four new, rationally designed color maps:
'viridis' (the default color map as of 2.0)
'magma'
'plasma'
'inferno'
The process of designing these color maps is presented in A Better Default Colormap for Matplotlib | SciPy 2015 .
The tool developed for this process can be installed with pip install viscm.
I would suggest the cubehelix color map. It is designed to have correct luminosity ordering in both color and gray-scale.

I am not aware of predetermined schemes. I usually use a few colours for publication plots. I mostly take two things into consideration when choosing colours:
Colour-blindness: this page on Wikipedia has lots of good info about choosing colours that are distinguishable to most colour-blind people. If you look at the "tips for editors" section, you'll see that once you take the guidelines into account, only a few sets of colours remain. (A good rule of thumb is to never mix red and green!) You can also use the linked colour-blind simulators to check whether your plot would be clearly visible.
Luminance: most of the journals in my field publish in B&W by default. Even though most people read the papers online, I still like to make sure that the plots can be understood when printed in grayscale, so I take care to use colours that have different luminances. A good way to test this is to desaturate the image you produced; that gives you a good idea of how it will look when printed in grayscale. In many cases (particularly line or scatter plots), I also use things other than colour to distinguish between sets (e.g. line styles, different markers).
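The desaturation test can also be approximated numerically: sample a colormap, compute the relative luminance of each sample, and check that it only ever rises or only ever falls, so that neighbouring samples print as distinct greys. A sketch assuming matplotlib 3.5 or later (for `matplotlib.colormaps`); it uses the standard sRGB decoding curve and Rec. 709 weights, which approximate but are not identical to what a given printer produces. The helper names are hypothetical.

```python
import numpy as np
import matplotlib

def relative_luminance(rgb):
    """Relative luminance of sRGB colours with components in 0..1."""
    rgb = np.asarray(rgb, dtype=float)[..., :3]   # drop alpha if present
    linear = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    return linear @ np.array([0.2126, 0.7152, 0.0722])

def luminance_is_monotonic(cmap_name, n=11):
    """True if `n` evenly spaced samples of the colormap have strictly
    increasing or strictly decreasing luminance."""
    cmap = matplotlib.colormaps[cmap_name]
    lum = relative_luminance(cmap(np.linspace(0, 1, n)))
    steps = np.diff(lum)
    return bool(np.all(steps > 0) or np.all(steps < 0))

for name in ["gray", "cubehelix", "viridis", "jet", "hsv"]:
    print(name, luminance_is_monotonic(name))
```

Maps like `jet` and `hsv` fail because their brightest colours sit in the middle of the range, so two different hues end up as the same grey.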
If no colours are specified in matplotlib plots, it has a default set of colours that it cycles through. This answer has a good explanation on how to change that default set of colours. You can customise that to your preferred set of colours, so the plots would use them in turn.
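The linked answer isn't reproduced here, but a minimal sketch of the usual mechanism is to set the `axes.prop_cycle` rcParam. The Okabe-Ito palette below is an example choice swapped in here (it is not from the answer), picked because it addresses the colour-blindness point; its yellow is fairly light and may print faintly.

```python
import matplotlib as mpl
from cycler import cycler

# Okabe-Ito palette: a widely used colour-blind-friendly set of colours.
okabe_ito = ["#000000", "#e69f00", "#56b4e9", "#009e73",
             "#f0e442", "#0072b2", "#d55e00", "#cc79a7"]

# Axes created after this line cycle through these colours in order
# whenever a plot call doesn't specify a colour.
mpl.rcParams["axes.prop_cycle"] = cycler(color=okabe_ito)
```

Putting the same setting in a matplotlibrc file or a style sheet makes it apply to every script without repeating the code.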
