I have multiple time-series data, plotted as shown below:
Is there a better plot type that shows the curves a, b, c, and d more clearly? Are there other plot types that make the "twisted" points easy to see?
I see two options:
1)
Use subplots, one per curve.
pylab_examples example code: subplots_demo.
2)
Keep all curves on the same plot and apply a rolling mean (also called a moving average) to each curve.
Rolling Windows
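As a minimal sketch of option 2, here is a trailing moving average written in plain Python. The function name `rolling_mean` and the sample values are made up for illustration; if your data is already in pandas, `Series.rolling(window).mean()` should give the same kind of result.

```python
from collections import deque


def rolling_mean(values, window):
    """Return the trailing moving average of `values` over `window` samples.

    The first window - 1 outputs are None because the window is not yet full.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    buf = deque()   # samples currently inside the window
    total = 0.0     # running sum of the samples in `buf`
    out = []
    for v in values:
        buf.append(v)
        total += v
        if len(buf) > window:
            total -= buf.popleft()  # drop the sample that left the window
        out.append(total / window if len(buf) == window else None)
    return out


# Hypothetical noisy curve, for illustration only.
noisy = [1, 9, 2, 8, 3, 7, 4, 6]
smoothed = rolling_mean(noisy, window=2)
print(smoothed)
```

A larger window gives a smoother curve but also hides short-lived features, so it is worth trying a few window sizes.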
One more piece of advice:
You can limit the y-axis range of the graph, so that the plot area is used more effectively.
plt.ylim([0,20])
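The subplot option and the y-axis limit can be combined. The sketch below assumes four curves named a, b, c and d; the numbers are invented placeholders, not the data from the question.

```python
import matplotlib.pyplot as plt

# Hypothetical sample data standing in for the curves a, b, c and d.
series = {
    "a": [3, 5, 4, 6, 8, 7],
    "b": [10, 9, 11, 12, 10, 13],
    "c": [15, 14, 16, 13, 17, 15],
    "d": [2, 4, 3, 5, 4, 6],
}

# One row per curve; sharex/sharey keep the panels directly comparable.
fig, axes = plt.subplots(len(series), 1, sharex=True, sharey=True, figsize=(6, 8))
for ax, (name, values) in zip(axes, series.items()):
    ax.plot(values)
    ax.set_ylabel(name)
    ax.set_ylim(0, 20)  # same limits as plt.ylim([0, 20]) above
axes[-1].set_xlabel("time step")
fig.tight_layout()
```

Call `plt.show()` or `fig.savefig(...)` afterwards to display or save the figure.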
Related
I have 20 years of data. I want to find the linear trend of the percentages as a single number. E.g., if you were to plot the linear trend, there would be a coefficient by which the line increases or decreases over time.
Google sheets has a trend function, but it's used for creating new data based on predicting trends.
Your question is too vague to answer clearly and precisely for what you want. Are you looking for the formula for the trend line? Just the correlation coefficient? Or a future value based on the info? The slope of the trend line?
What you have described is linear regression. I would suggest browsing the Insert drop-down menu for formulas > statistics. There are formulas for each piece of information you want to extract (except one that writes out the trend-line formula for you).
An easy, if superficial, way of obtaining the correlation coefficient and the actual formula (and thus the slope of the linear trend line) is to use Excel. Copy your data table into Excel and create a scatter plot from it. Go into the settings for the scatter plot and check the box for “trendline”. Then go into the trendline settings for the plot, where you can select which type of regression you want Excel to use. You want linear. Towards the bottom of that menu, check the boxes that say “show formula on chart” and “show R coefficient”, or something along those lines. Excel will then print your formula and coefficient in a text box on the chart. Your slope is the coefficient of the x variable.
Hope this helps! Regression is a wormhole. I’d love to get more in depth if you’re interested!
NOTE: The outlier for year 2003 will have a significant impact on a linear regression line. Consider removing it from the data to create a line that will be more accurate for future predictions.
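If you would rather compute the slope directly than read it off a chart, ordinary least squares is only a few lines. This is a sketch with made-up years and percentages, not the asker's data; `linear_trend` is a hypothetical helper name. Google Sheets also has a built-in `SLOPE` function that should return the same quantity.

```python
def linear_trend(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented example values: the percentage rises by 2 points per year.
years = [2000, 2001, 2002, 2003, 2004]
percentages = [10.0, 12.0, 14.0, 16.0, 18.0]
slope, intercept = linear_trend(years, percentages)
print(slope)
```

The slope is the "single number" asked for: the average change in the percentage per year. To see how much an outlier matters, run it once with and once without that year.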
I have an explanatory variable x and a response variable y. I am trying to find which power of the feature I should train with. You can ignore the colors for my question. The scatter data is from the sensor, and the line plot is the theoretical curve from the lab, which you can also ignore for my question.
For this answer, I understand that you want to obtain a polynomial curve going through the croissant-shaped zone where the points are dense.
I also assume that the independent variable is on the horizontal axis and the dependent variable is on the vertical one. Otherwise, as you can see from the blue line, no function could give you this curve, because a function cannot return several values for the same input.
Now to select the degree of polynomial you can use stepwise regression.
This means running the regression with more or fewer features, one at a time (i.e., decreasing or increasing the degree of the polynomial in this case), and calculating a score such as AIC, BIC, or even adjusted R² to assess whether adding or removing that feature is worthwhile.
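A rough sketch of the forward version of this procedure with NumPy is shown below. It assumes Gaussian errors, so that AIC can be computed as n·ln(RSS/n) + 2k, and it uses synthetic quadratic data in place of the sensor data; `aic_for_degree` and `select_degree` are hypothetical helper names.

```python
import numpy as np


def aic_for_degree(x, y, degree):
    """Fit a polynomial of the given degree and return its AIC (Gaussian errors assumed)."""
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    rss = float(np.sum(residuals ** 2))
    n = len(x)
    k = degree + 1  # number of fitted coefficients
    return n * np.log(rss / n) + 2 * k


def select_degree(x, y, max_degree=6):
    """Raise the degree one step at a time; stop when AIC no longer improves."""
    best_degree = 1
    best_score = aic_for_degree(x, y, best_degree)
    for degree in range(2, max_degree + 1):
        score = aic_for_degree(x, y, degree)
        if score >= best_score:
            break  # the extra term is not worth its penalty
        best_degree, best_score = degree, score
    return best_degree, best_score


# Synthetic data (an assumption for illustration): a quadratic plus noise.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200)
y = 1.0 + 0.5 * x + 2.0 * x ** 2 + rng.normal(scale=1.0, size=x.size)

degree, score = select_degree(x, y)
print(degree, score)
```

Swapping the `2 * k` penalty for `np.log(n) * k` gives BIC, which penalizes extra terms more heavily and tends to pick lower degrees.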
I have numerous return time series spanning a couple of years. I want to see how stable these series are across time. So far I have winsorized and z-scored my data and created histograms and Avg vs. StdDev graphs. Using the histograms, I can see how the distribution looks and check for positive or negative skew. With the Avg vs. StdDev chart, I tried to get some kind of density measure within the data set (each data point represents a point in time), i.e., a large, spread-out blob means the series is less stable than a dense one.
I am looking for other ways to visualise my data. Any ideas are welcome.
I have two 2-dimensional feature vectors obtained from MFCC. How can I apply Dynamic Time Warping (DTW) to them? Can I express the similarity between the two vectors as a percentage?
DTW has a cost associated with aligning time points.
You can essentially put in any distance function there, not only the absolute or squared difference. In particular, you can use $(a_1(t)-b_1(t'))^2+(a_2(t)-b_2(t'))^2$, the squared Euclidean distance between the two 2-dimensional frames.
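To make this concrete, here is a minimal sketch of the standard DTW dynamic-programming recurrence with that squared Euclidean cost plugged in. The function name `dtw_distance` and the example sequences are assumptions for illustration, and it returns the accumulated alignment cost, not a percentage.

```python
def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature frames.

    `a` and `b` are sequences of equal-dimension tuples, e.g. (a1(t), a2(t)).
    The local cost is the squared Euclidean distance between frames.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = sum((p - q) ** 2 for p, q in zip(a[i - 1], b[j - 1]))
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] aligned with an already-used b frame
                cost[i][j - 1],      # b[j-1] aligned with an already-used a frame
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


# Hypothetical 2-dimensional frames; `slow` repeats one frame of `fast`.
fast = [(0, 0), (1, 1), (2, 2)]
slow = [(0, 0), (1, 1), (1, 1), (2, 2)]
print(dtw_distance(fast, slow))
```

Because the repeated frame can be aligned with its twin at no cost, the two example sequences end up with an accumulated cost of zero even though their lengths differ.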
I'm looking to create an animated series of photos that are as similar as possible. While researching, I've come across two methods:
Generate a pHash of the images and do a nearest neighbor using the Hamming distance of the hash.
Create color histograms and do an n-dimensional nearest neighbor using Euclidean distance.
Many commenters on the question at https://stackoverflow.com/questions/6971966/how-to-measure-percentage-similarity-between-two-images claim that the two processes are essentially the same. I'm looking for a little more insight on this, because they seem like different processes to me.
Thoughts?