Adding regression to multiple series imported from CSV

I have a multi-series Highcharts chart working fine when it just grabs data from a CSV file I create. I am trying to add a (loess) regression curve to it with the highcharts-regression plugin, but simply enabling the default regression results in the chart not showing up at all. The application is at http://bmcnoldy.rsmas.miami.edu/vk/
First, the proper JS file for the plugin is called in my HTML: https://rawgithub.com/phpepe/highcharts-regression/master/highcharts-regression.js
None of the examples for highcharts-regression use multiple series that were imported from the built-in CSV import.
The guts of the chart-making (if regression:false then the chart works and shows up, if regression:true it breaks):
$.get('chart.csv', function (csv) {
    $('#container').highcharts({
        data: {
            csv: csv
        },
        series: [{
            // (basic series options that work)
            regression: true
        }, {
            // (basic series options that work)
            regression: true
        }, {
            // ...
        }]
    });
});
Here's a snippet of my five-series CSV file for reference:
Date,Record High,Average High,Daily Average,Average Low,Record Low
"01-Jan-2000",80.2000,75.0000,72.0000,68.2000,45.5000
"02-Jan-2000",79.7000,75.0000,72.1000,68.3000,49.1000
"03-Jan-2000",79.2000,73.7000,70.0000,65.6000,46.4000
"04-Jan-2000",79.0000,72.0000,67.8000,63.7000,43.7000
"05-Jan-2000",80.2000,71.8000,67.4000,62.5000,44.2000
"06-Jan-2000",78.3000,73.0000,68.7000,63.5000,41.0000
"07-Jan-2000",78.3000,71.9000,67.5000,62.3000,45.5000
Inside of each series config, I tried adding
data: [],
just so the data object was present... it didn't matter. But, is there a way to set data to be the proper columns in the csv object like
data: [[csv[0]],[csv[1]]],
or something like that? Would that matter?
I just wanted to add a Loess regression curve to each of the five series, which looked so straightforward from the examples at https://www.highcharts.com/products/plugin-registry/single/22/Highcharts%20regression!
Thanks!

First of all, please notice that linearRegression indicator serves for finding single values (points, not lines) for the given period.
This demo illustrates how to get the regression line in Highstock (it will also work with data passed as data.csv; I left that out for clarity): http://jsfiddle.net/BlackLabel/w0ohb647/
Highstock offers three indicators that will help us to find the line: linearRegressionSlope, linearRegressionIntercept and linearRegressionAngle. If we set their params.period to be the same as the data length then each of these indicator series will have only one point. It turns out that we can use y values of these points (slope, angle, intercept) to find the equation of the straight line we need: y = slope * x + intercept.
this.addSeries({
    type: 'linearRegressionSlope',
    linkedTo: 'recordHigh'
}, false);
this.addSeries({
    type: 'linearRegressionIntercept',
    linkedTo: 'recordHigh'
}, false);
this.addSeries({
    type: 'linearRegressionAngle',
    linkedTo: 'recordHigh'
}, false);
Highstock doesn’t offer any structures for representing infinite straight lines so we have to mimic it as a line segment:
data: [regressionLineStart, regressionLineEnd]
The parameter that you might find strange is interceptOffset. It is needed because the regression line crosses the mathematical y axis (x = 0) on 1 Jan 1970 (timestamp = 0), so we have to “pretend” that the y axis sits at x = Date.UTC(2018) (for the purposes of my workaround).
Notice that auxiliary series (linearRegressionSlope, linearRegressionIntercept & linearRegressionAngle) don't ever appear thanks to setting redraw argument to false in addSeries and remove methods.
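To make the arithmetic behind y = slope * x + intercept concrete, here is a minimal Python sketch of an ordinary least-squares fit and the two end points of the resulting line segment. It is not Highstock's implementation: fit_line, regression_segment and x_origin are names made up for this illustration, and x_origin only mimics the idea of measuring the intercept somewhere other than timestamp 0. The sample values are the first seven "Record High" rows from the CSV in the question.

```python
from datetime import datetime, timezone


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def regression_segment(xs, ys, x_origin=0.0):
    """Return the (x, y) end points of the fitted line over the data range.

    x_origin is a hypothetical parameter: x values are shifted by it before
    fitting, so the intercept is the line's value at x == x_origin instead
    of at timestamp 0.
    """
    shifted = [x - x_origin for x in xs]
    slope, intercept = fit_line(shifted, ys)
    start = (xs[0], slope * shifted[0] + intercept)
    end = (xs[-1], slope * shifted[-1] + intercept)
    return start, end


# First seven "Record High" values from the question's CSV, one per day.
DAY_MS = 24 * 60 * 60 * 1000
start_ms = int(datetime(2000, 1, 1, tzinfo=timezone.utc).timestamp() * 1000)
xs = [start_ms + i * DAY_MS for i in range(7)]
ys = [80.2, 79.7, 79.2, 79.0, 80.2, 78.3, 78.3]

segment = regression_segment(xs, ys, x_origin=start_ms)
print(segment)
```

The two tuples in segment play the role of regressionLineStart and regressionLineEnd above.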

Related

Changing label names of Kmean clusters

I am doing k-means clustering with sklearn in Python and am wondering how to change the generated label names for the clusters. For example:
data Cluster
0.2344 1
1.4537 2
2.4428 2
5.7757 3
And I want to get
data Cluster
0.2344 black
1.4537 red
2.4428 red
5.7757 blue
I don't mean directly setting 1 -> black; 2 -> red when printing. I am wondering whether it is possible to set different cluster names in the k-means clustering model by default.

No
There isn't any way to change the default labels.
You have to map them separately using a dictionary.
You can take a look at all available methods in the documentation here.
None of the available methods or attributes allows you to change the default labels.
Solution using dictionary:
# Code
a = [0,0,1,1,2,2]
mapping = {0:'black', 1:'red', 2:'blue'}
a = [mapping[i] for i in a]
# Output
['black', 'black', 'red', 'red', 'blue', 'blue']
If you change your data or number of clusters:
First we will see the visualizations:
Code:
Importing and generating random data:
from sklearn.cluster import KMeans
import numpy as np
import matplotlib.pyplot as plt
# 10 random 2-D points with coordinates between 0 and 100
x = np.random.uniform(0, 100, size=(10, 2))
Applying Kmeans algorithm
kmeans = KMeans(n_clusters=3, random_state=0).fit(x)
Getting cluster centers
arr = kmeans.cluster_centers_
Your cluster centroids look like this:
array([[23.81072765, 77.21281171],
[ 8.6140551 , 23.15597377],
[93.37177176, 32.21581703]])
Here, 1st row is the centroid of cluster 0, 2nd row is centroid of cluster 1 and so on.
Visualizing centroids and data:
plt.scatter(x[:,0],x[:,1])
plt.scatter(arr[:,0], arr[:,1])
You get a graph that looks like this:
[Figure: scatter plot of the data points with the cluster centroids overlaid]
As you can see, you have access to the centroids as well as the training data. If your training data and number of clusters are constant, these centroids don't really change.
But if you add more training data or more clusters, you will have to create a new mapping according to the centroids that are generated.
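One way to rebuild the mapping after a refit is to derive it from the centroids instead of hard-coding label numbers. The sketch below is only an illustration under an assumed rule: names are handed out in order of each centroid's first coordinate. names_by_centroid_order is a made-up helper, not part of sklearn, and the centroid values are the ones printed above.

```python
def names_by_centroid_order(centroids, names):
    """Map each cluster label (row index in centroids) to a name.

    Assumed rule for this sketch: the centroid with the smallest first
    coordinate gets names[0], the next smallest gets names[1], and so on.
    """
    if len(centroids) != len(names):
        raise ValueError("need exactly one name per centroid")
    order = sorted(range(len(centroids)), key=lambda k: centroids[k][0])
    return {label: name for label, name in zip(order, names)}


# Centroids as printed above; row i belongs to cluster label i.
centroids = [
    [23.81072765, 77.21281171],
    [8.6140551, 23.15597377],
    [93.37177176, 32.21581703],
]
mapping = names_by_centroid_order(centroids, ["black", "red", "blue"])

# Example labels, as kmeans.labels_ would hold them.
labels = [0, 0, 1, 2, 1]
named = [mapping[label] for label in labels]
print(mapping)
print(named)
```

Because the mapping is recomputed from the centroids, the same rule can be applied again whenever the data or the number of clusters changes, as long as you supply one name per cluster.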

Check out the top response on this related post.
sklearn doesn't include this functionality but you can map the values to your dataframe in a fairly straightforward manner.
current_labels = [1, 2, 3]
desired_labels = ['black', 'red', 'blue']
# create a dictionary for your corresponding values
map_dict = dict(zip(current_labels, desired_labels))
map_dict
>>> {1: 'black', 2: 'red', 3: 'blue'}
# map the desired values back to the dataframe
# note this will replace the original values
data['Cluster'] = data['Cluster'].map(map_dict)
# alternatively you can map to a new column if you want to preserve the old values
data['NewNames'] = data['Cluster'].map(map_dict)

Filling numpy arrays slower than for loop (h.fill vs h.fill.numpy)

It looks like filling a histogram with .fill is faster than filling with .fill.numpy.
For both cases my data is in a namedtuple:
Event = namedtuple("Event", ['nHGPulses', 'HGs1',
'HGs2', 'nHGs1', 'nHGs2', 'area_phd', 'width'])
and the histogram I am trying to fill is
h2_areawidth_pulses = hg.Bin(100, 0, 500, lambda x: x[0], hg.Bin(1000, 0, 5000, lambda x: x[1]))
for event in events:
    for a, w in zip(event.area_phd, event.width):
        h2_areawidth_pulses.fill((a, w))
or for the numpy case
h2_areawidth_pulses = hg.Bin(100, 0, 500, lambda event: event.area_phd, hg.Bin(1000, 0, 5000, lambda event: event.width))
for event in events:
    h2_areawidth_pulses.fill.numpy(event)
Under identical conditions .fill runs in 10s while .fill.numpy takes 195s.
Am I doing something wrong or is this behaviour expected?

That can happen in cases with large numbers of bins. In Histogrammar's Numpy filling, the data to be sent to each bin is separately masked: with 100 bins, you run over the data 100 times. (That's not the case for the jit-compiled algorithms, such as cling and cuda.)
The culprit for this bad algorithm is Histogrammar's generality: at that level of structure, I don't know what's below it, so I have to provide separate inputs to each bin.
This is not the case for histbook, Histogrammar's successor. Now that I've added SparkSQL-filling to histbook, it may satisfy your needs. When it's a complete replacement, I'll put a redirect on Histogrammar's homepage, but for now, I've been spreading the word however I can.
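To illustrate why per-bin masking gets expensive, here is a rough pure-Python sketch contrasting the two strategies. It is not Histogrammar's or histbook's code; bin_index, fill_by_masking and fill_single_pass are names invented for this example. The masking version scans all the data once per bin, while the single-pass version computes each value's bin index directly.

```python
def bin_index(value, num_bins, low, high):
    """Index of the bin holding value, or None if value is outside [low, high)."""
    if not (low <= value < high):
        return None
    index = int((value - low) * num_bins / (high - low))
    return min(index, num_bins - 1)  # guard against floating-point round-up


def fill_by_masking(values, num_bins, low, high):
    """One full pass over the data per bin: cost grows as len(values) * num_bins."""
    counts = []
    for b in range(num_bins):
        mask = [bin_index(v, num_bins, low, high) == b for v in values]
        counts.append(sum(mask))
    return counts


def fill_single_pass(values, num_bins, low, high):
    """One pass over the data in total: cost grows as len(values)."""
    counts = [0] * num_bins
    for v in values:
        index = bin_index(v, num_bins, low, high)
        if index is not None:
            counts[index] += 1
    return counts


values = [0.0, 1.0, 2.5, 4.5, 5.0, -1.0, 3.0]
masked = fill_by_masking(values, 5, 0.0, 5.0)
single = fill_single_pass(values, 5, 0.0, 5.0)
print(masked, single)
```

Both functions produce the same counts; only the amount of work differs. With the nested 100 x 1000 binning from the question, the masking approach has far more bins to visit than in this toy case.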

Can you make negative values into positive values for easy comparison in a line chart in SPSS?

Let's say you want to create a line graph which plots a line for the amount of money coming in, and a line for the amount of money going out.
The values of the variable for money coming in (moneyIn) are positive, like '30,000', but the values for money being spent (moneyOut) are negative, like '-19,000'.
When I use a line graph to plot these results against each other over time, one line is plotted way below in the negative numbers and the other way above in the positive numbers, so they're difficult to compare.
Is there a way to change the negative values into positive ones JUST for the line graph, without computing a new variable or changing the database? I think it would essentially be a sum of (moneyOut*-1), but I don't know if this can be implemented JUST for the chart?

You can use the TRANS statement in inline GPL code to flip the sign. Example below.
DATA LIST FREE / In Out (2F5.0) Time (F1.0).
BEGIN DATA
1000 -1500 1
2000 -2500 2
3000 -3500 3
4000 -4500 4
END DATA.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=Time In Out
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: Time=col(source(s), name("Time"), unit.category())
DATA: In=col(source(s), name("In"))
DATA: Out=col(source(s), name("Out"))
TRANS: OutPos = eval(Out*-1)
GUIDE: axis(dim(1), label("Time"))
GUIDE: axis(dim(2), label("Values"))
SCALE: linear(dim(2), include(0))
ELEMENT: line(position(Time*In))
ELEMENT: line(position(Time*OutPos), color(color.blue))
END GPL.

Is a 1D flag chart possible?

I would like to create a chart similar to [image] (the line shapes, colors, flag shape are not important).
Each of the flags represents an x value and the x axis boundaries are fixed. This is very similar to a Highcharts demo (the round and rectangular flags along the x axis), but it looks like the flags are added (via options.series.push) to existing data.
The documentation mentions however that
Used alone flag series will make no sense.
Based on the demo, I tried to create a simple example by forcing the type to flags but it does not render
$('#container').highcharts({
    series: [{
        type: 'flags',
        data: [{
            x: 10,
            title: 'hello',
            text: 'say hello'
        }, {
            x: 20,
            title: 'world',
            text: 'say world'
        }, {
            x: 50,
            title: 'bonjour',
            text: 'say bonjour'
        }],
        shape: 'circlepin',
        width: 16
    }]
});
Is there a direct way to create such 1D flag charts?

If you check the console for errors you can see that you are getting this error:
Uncaught TypeError: $(...).highcharts is not a function
I wrapped your highcharts builder code in a function tag and it loads fine now. See this live demo.

Out of memory, plotting 24 images in 1 plot

Hi, I want to plot 24 images in 1 plot using subplot.
I've already made the empty plots using this method:
# Import everything from matplotlib (numpy is accessible via 'np' alias)
from pylab import *
# create new figure of a3 size.
figure(figsize=(16.5, 11.7), dpi=300)
# do plotting for 24 figs in 1 plot
for i in range(1, 25):
    #print i
    subplot(4, 6, i)
Now I want to fill my subplots with the same data in every plot (a background to plot against) as a line plot.
I do this using the following line:
plot(myData)
Once I run the program, it crashes telling me:
"_tkinter.TclError: not enough free memory for image buffer"
So after searching the web I read that I need to close the plots after I make them so that the memory can be reused.
However, how do I do this when using subplots?
Frank
Edit:
I think it would be easily solved if I could make 2 lists: one with each unique item in myData, and a second with the number of occurrences of each unique item. Does anyone have tips on that?
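The two-list idea from the edit can be sketched with collections.Counter from the standard library. This only shows how to build the lists; whether it resolves the memory error is not established here. unique_with_counts is a made-up helper name and my_data is invented sample data standing in for myData.

```python
from collections import Counter


def unique_with_counts(values):
    """Return (uniques, occurrences): sorted unique items and how often each appears."""
    counts = Counter(values)
    uniques = sorted(counts)
    occurrences = [counts[u] for u in uniques]
    return uniques, occurrences


# Invented sample data standing in for myData.
my_data = [3, 1, 3, 2, 1, 3]
uniques, occurrences = unique_with_counts(my_data)
print(uniques)
print(occurrences)
```

The two lists line up by position, so occurrences[i] is the number of times uniques[i] appears in the input.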
