Histogram interpretation of Weka - machine-learning

I have a question about interpreting data in Weka. The data set I worked on is as follows:
outlook   temperature  humidity  windy  play
---------------------------------------------
sunny     hot          high      FALSE  no
sunny     hot          high      TRUE   no
overcast  hot          high      FALSE  yes
rainy     mild         high      FALSE  yes
rainy     cool         normal    FALSE  yes
rainy     cool         normal    TRUE   no
overcast  cool         normal    TRUE   yes
sunny     mild         high      FALSE  no
sunny     cool         normal    FALSE  yes
rainy     mild         normal    FALSE  yes
sunny     mild         normal    TRUE   yes
overcast  mild         high      TRUE   yes
overcast  hot          normal    FALSE  yes
rainy     mild         high      TRUE   no
The histograms that Weka generated from the above data set are not clear to me.
I know that blue means one can play and red means one cannot play. To draw a histogram, we need to find the frequency of each value.
In the above picture, the sunny count is 5 because the outlook attribute has 5 sunny values in the data set. Likewise, the overcast count is 4 because outlook has 4 overcast values. If the outlook is overcast, one can always play; there is no overcast instance where one cannot play, so the overcast bar is pure blue. However, if the outlook is sunny, there are 2 instances where one can play and 3 where one cannot. Therefore, the sunny bar is a mixture of blue and red.
Now, how can I tell, just by looking at the bar, how many sunny instances are yes and how many are no?
Also, how could I draw such a histogram in Excel?
Thank you.

The bar-plot (nominal class) or histogram (numeric class) in Weka's Explorer is only there to give you an idea about the data. At the time of writing, there is no way to tell the various counts for the associated class labels from the graph apart from the total, which is displayed on top.
I don't use Excel, so I can't comment on that.
For reference: the responsible class in Weka's source code for drawing these is weka.gui.AttributeVisualizationPanel. The inner classes BarCalc and HistCalc perform the respective calculations.
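Since the plot itself does not show the per-class split, one option is to compute it directly from the data. The following is a minimal sketch in plain Python, not anything Weka provides: it hard-codes the (outlook, play) pairs from the table in the question and tallies them. The variable names are illustrative assumptions.

```python
from collections import Counter

# The 14 instances from the question, reduced to (outlook, play) pairs,
# in the same order as the rows of the table.
rows = [
    ("sunny", "no"), ("sunny", "no"), ("overcast", "yes"), ("rainy", "yes"),
    ("rainy", "yes"), ("rainy", "no"), ("overcast", "yes"), ("sunny", "no"),
    ("sunny", "yes"), ("rainy", "yes"), ("sunny", "yes"), ("overcast", "yes"),
    ("overcast", "yes"), ("rainy", "no"),
]

# Count how often each (outlook, play) combination occurs.
counts = Counter(rows)

for outlook in ("sunny", "overcast", "rainy"):
    yes = counts[(outlook, "yes")]
    no = counts[(outlook, "no")]
    print(f"{outlook:<9} yes={yes} no={no} total={yes + no}")
```

The totals match the numbers Weka prints on top of each bar, and the yes/no split is what the blue and red segments represent. These per-class counts are also the numbers a stacked bar chart in another tool would be drawn from.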

Related

Is there a way to overlay an on/off sensor over a graph in grafana?

I am using InfluxDB together with Grafana to track temperatures in my house. I am also tracking some switches. One of the switches is connected to a heater, and I want to see how the temperature outside relates to how often and how long the heater is running. The heater just publishes a value of 0/1 for on/off, so it is a little difficult to see the relationship in one graph.
I want to know if there is a way to have a filled-in background behind the temperature graph whenever the heater switch is on. Something similar to the time region feature of grafana, only with values from another source not time. A workaround (which I want to avoid), to better illustrate what I want, would be to have the value of 1 replaced by 200 and limit the graph to only display up to a more sane number. The result would be that the higher number of 200 is "off the charts" and I would see the fill of that graph.
Here is an image of the time region graph which is similar to how I picture what I am looking to do.
You can achieve the background-color effect by assigning the heater switch series to the right Y axis in the Visualization tab, setting that axis's limits to 0 and 1, and filling the area below the line with a solid color.

How to downsample a Highstock Flag Series

I have a chart with several series on it, one of which is a flag series. The data on the flag series is reasonably sparse, but very bursty. As a result, when I am showing a large amount of data, ~10 flags tend to line up next to each other, all pointing to basically the same point on the graph.
What I'd like is for those flags to get downsampled (in a sense) so I only show 1 flag indicator that points to the general area where all the flag points are, then when the user zooms in, all the flag points are displayed since it's now possible to actually distinguish what they are pointing at.
This seems like a job for data grouping and when I am zoomed out and showing large data ranges, all my other series end up getting downsampled by data grouping. However, this isn't being applied to the flag series, I suspect because the series doesn't qualify since it has relatively few points across the range being shown.
Does anyone know if there's anything built in that will help me achieve this? Or do I have to write my own downsampling that's tied into the setExtremes event somehow?
Thanks.

How to increase contrast between colors generated from image?

Some details:
I'm making a small prototype in Framer, a kind of wallpaper app. I use vibrant.js to automatically pick colors from the images to add a bit of a tint to my interface. I use two vibrant.js color profiles: "DarkMuted" for the backgrounds and "Vibrant" for active controls, accents, etc.
Unfortunately, the color combination sometimes looks dull and desaturated, and active elements don't stand out as much as I want.
So my first decision was to
Blindly edit colors.
I convert them to hsl and explicitly set s and l values.
s: .2, l: .2 # DarkMuted
s: .6, l: .8 # Vibrant
This creates enough contrast between the two, but also has a drawback: sometimes colors look a bit oversaturated and distorted (compared to the input).
At this link you can find pairs of screenshots showing the difference between the "original" color pair returned by vibrant.js and the colors with adjusted s and l values.
I've already asked on another forum whether it's possible to apply automated adjustments to the color to normalize perceived bias for some color ranges. The answer was "almost impossible".
I would say that subjectively acceptable color rate is ~ 65% but the result is too unpredictable. Since it's an automatic solution I can't rely on that too much.
So I decided to approach it another way:
Generate a bunch of colors and filter one
The problem here is:
I've not found how to generate more than one color per profile with vibrant.js
Also, I've tried the color-thief.js library to generate a palette of dominant colors and then filter, what I call, a "vibrant" color.
# Threshold values I used
thr = {minL: .4, maxL: .8, minS: .6, maxS: .8}
But here another problem occurs: not every image has colors that fall within my thresholds. Some images have a pastel palette or are black and white, and return nothing.
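To make the filtering step concrete, here is a rough Python sketch of the threshold check described above. It is not the Framer code: the function names and the sample palette are assumptions for illustration, the input is assumed to be 8-bit RGB tuples such as a palette library might return, and only the threshold values come from the snippet above.

```python
import colorsys

# Threshold values from the question.
THR = {"minL": 0.4, "maxL": 0.8, "minS": 0.6, "maxS": 0.8}

def is_vibrant(rgb, thr=THR):
    """Return True if an 8-bit RGB color falls inside the HSL thresholds."""
    r, g, b = (v / 255.0 for v in rgb)
    # colorsys returns (hue, lightness, saturation), in that order.
    _h, l, s = colorsys.rgb_to_hls(r, g, b)
    return thr["minL"] <= l <= thr["maxL"] and thr["minS"] <= s <= thr["maxS"]

def filter_vibrant(palette, thr=THR):
    """Keep only the palette colors that pass the threshold check."""
    return [rgb for rgb in palette if is_vibrant(rgb, thr)]

# Hypothetical palette: a saturated blue, a mid gray, and pure red.
palette = [(92, 122, 224), (128, 128, 128), (255, 0, 0)]
vibrant = filter_vibrant(palette)
print(vibrant)
```

The gray is rejected because its saturation is 0, and pure red is rejected because its saturation of 1.0 exceeds maxS. A palette made only of grays or pastels yields an empty list, which is the failure case described above.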
So,
Can I overcome the vibrant.js limitation of 1 color per profile to have a bunch of "Vibrant" colors and then pick one that suits my requirements?
Or, maybe, there is another / better solution of doing it?
There is a specification for minimum contrast between colors (WCAG); you can find it here. So a possible strategy would be to extract the colors with vibrant.js and then check their contrast with a function. You can find a guide to building a color-contrast function here. The last step would probably be to generate color variations with good contrast, based on the results of the contrast function. You can generate variations using this lib.
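As a sketch of what such a contrast function could look like, the following Python follows the WCAG 2.x definitions of relative luminance and contrast ratio. The function names are illustrative, and the input is assumed to be 8-bit sRGB tuples; in the prototype the same arithmetic would have to be ported to the language used there.

```python
def _linearize(channel_8bit):
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel_8bit / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(rgb):
    """Relative luminance of an (r, g, b) tuple with 0-255 channels."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(rgb_a, rgb_b):
    """WCAG contrast ratio, from 1.0 (identical) up to 21.0 (black on white)."""
    la, lb = relative_luminance(rgb_a), relative_luminance(rgb_b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)

# Black against white gives the maximum possible ratio.
print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 2))
```

A candidate accent color could then be accepted only if its ratio against the background reaches a chosen minimum; WCAG level AA, for example, asks for 4.5:1 for normal text and 3:1 for large text.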

Correlate time series with Graphite

Does Graphite have a way to visualize correlation between two time series?
I would want something like this:
In this SlideShare presentation there's a mention of a correlate data transform function (slide 11) however I can't find documentation about it.
The trick to displaying events in Graphite is to apply the drawAsInfinite() function on the red metric. This displays events as a vertical line at the time of the event.
Update-
Perhaps you mean timeShift().
"..what if we want to directly correlate the activity between now and
the same time two weeks ago? This is where the timeShift() function
comes in. Let's take a look at the same 4-week period, but this time
we'll review two weeks of current data and overlay it with a
time-shifted span of the two weeks prior."
Source.
To answer my own question: it is not possible and would not fit Graphite's vision.
From their GitHub issue tracker:
If the X axis isn't time then it isn't a time series... Graphite is a graphing tool for time series data.
Divide one by the other. The straighter that line is, the more related they are, provided the correlation is linear, of course. It could be logarithmic or anything else, but in those cases your two-axis example wouldn't work either.
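To illustrate the divide-one-by-the-other idea outside Graphite, here is a small Python sketch. It is an offline calculation on made-up sample values, not a Graphite feature; the series names and numbers are assumptions. It computes the point-by-point ratio and, for comparison, a Pearson correlation coefficient.

```python
from math import sqrt

def ratio_series(a, b):
    """Point-by-point a/b; None where the divisor is zero."""
    return [x / y if y != 0 else None for x, y in zip(a, b)]

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length series.

    Raises ZeroDivisionError if either series is constant.
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

# Hypothetical samples taken at the same timestamps.
requests = [10.0, 20.0, 30.0, 40.0]
cpu = [5.0, 10.0, 15.0, 20.0]

ratios = ratio_series(requests, cpu)
r = pearson(requests, cpu)
print(ratios, r)
```

When one series is a constant multiple of the other, the ratio is a flat line and the coefficient is 1. A ratio that wanders suggests the relationship is weaker or not linear.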

Color selection for matplotlib that prints well

I am using pandas and matplotlib to generate bar-graphs with lots of bars.
I know how to cycle through a list of selected colors (How to give a pandas/matplotlib bar graph custom colors).
The question is what colors to select so that my graph prints nicely on a paper (it is for a research paper). What I am most interested in is sufficient contrast between the columns and a selection of colors that looks pleasant. I would like to have multiple colors instead of gray-scale or single-hue colorschemes.
Are there any predetermined schemes to select from that people use?
So your requirements are "lots of colors" and "no two colors should map to the same grayscale value when printed", right? The second criterion should be met by any "sequential" colormap (one that increases or decreases monotonically in luminance). I think out of all the choices in matplotlib, you are left with cubehelix (already mentioned), gnuplot, and gnuplot2:
The white line is the luminance of each color, so you can see that each color will map to a different grayscale value when printed. The black line is hue, showing they cycle through a variety of colors.
Note that cubehelix is actually a function (from matplotlib._cm import cubehelix), and you can adjust the parameters of the helix to produce more widely-varying colors, as shown here. In other words, cubehelix is not a colormap, it's a family of colormaps. Here are 2 variations:
For less wildly-varying colors (more pleasant for many things, but maybe not for your bar graphs), maybe try the ColorBrewer 3-color maps, YlOrRd, PuBuGn, YlGnBu:
https://www.flickr.com/photos/omegatron/7298887952/
I wouldn't recommend using only color to identify the bars, though. You should always use text labels as the primary identifier. Also note that some of these produce white bars that completely blend in with the background, since they are intended for heatmaps, not chart colors:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

# Make the data: ten columns, each holding one random value from 1 to 4.
x = [{i: np.random.randint(1, 5)} for i in range(10)]
df = pd.DataFrame(x)
# Sample one color per data item, evenly spaced across the colormap.
# Floats in [0, 1] cover the whole colormap, so no index runs past its end.
cmap = plt.get_cmap('cubehelix')
my_colors = [cmap(v) for v in np.linspace(0, 1, len(x))]
# Specify this list of colors as the `color` option to `plot`.
df.plot(kind='bar', stacked=True, color=my_colors)
plt.show()
And these are the new guys:
Since version 1.5, matplotlib ships with 4 new rationally designed color maps:
'viridis' (default color map as of 2.0)
'magma'
'plasma'
'inferno'.
The process of designing these color maps is presented in A Better Default Colormap for Matplotlib | SciPy 2015 .
The tool developed for this process can be installed by pip install viscm.
I would suggest the cubehelix color map. It is designed to have correct luminosity ordering in both color and gray-scale.
I am not aware of predetermined schemes. I usually use a few colours for publication plots. I mostly take two things into consideration when choosing colours:
Colour-blindness: this page on wikipedia has lots of good info about choosing colours that are distinguishable to most color-blind people. If you notice on the "tips for editors" section, once you take the guidelines into account there are only a few sets of colours available. (A good rule of thumb is to never mix red and green!) You can also use the linked colour-blind simulators to see if your plot would be well visible.
Luminance: most of the journals in my field publish in B&W by default. Even though most people read the papers online, I still like to make sure that the plots can be understood when printed in grayscale. So I take care to use colours that have different luminances. A good way to test this is to desaturate the image produced; that gives you a good idea of how it will look when printed in grayscale. In many cases (particularly line or scatter plots), I also use things other than colour to distinguish between sets (e.g. line styles, different markers).
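The desaturation test can also be approximated numerically before plotting. The sketch below is an assumption-laden illustration: it uses the common Rec. 601 luma weights as a stand-in for whatever conversion a printer or image editor actually applies, and the function names and sample colours are made up. It reports the smallest gap in gray level between any two colours in a set.

```python
def gray_level(rgb):
    """Approximate gray level of an (r, g, b) colour with channels in [0, 1].

    Uses Rec. 601 luma weights; real print conversion may differ.
    """
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b

def min_gray_gap(colors):
    """Smallest difference in gray level between any two colours."""
    levels = sorted(gray_level(c) for c in colors)
    return min(hi - lo for lo, hi in zip(levels, levels[1:]))

# Hypothetical candidate set: dark blue, orange, pale yellow.
candidate = [(0.0, 0.0, 0.5), (0.8, 0.4, 0.0), (1.0, 0.9, 0.4)]
# Pure red and a mid green look very different in colour
# but land on almost the same gray.
clash = [(1.0, 0.0, 0.0), (0.0, 0.5, 0.0)]

print(min_gray_gap(candidate), min_gray_gap(clash))
```

A set whose smallest gap is close to zero contains at least two colours that will be hard to tell apart in a grayscale printout.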
If no colours are specified in matplotlib plots, it has a default set of colours that it cycles through. This answer has a good explanation on how to change that default set of colours. You can customise that to your preferred set of colours, so the plots would use them in turn.

Resources