It is said that when searching an entire image for the window that best matches the current window, the location where the correlation is maximised is the best match.
window: [22 12 14]
image:
[ 22  12  34  54]
[112  34  54 111]
[ 12  22  12  34]
[ 11  22  12  14]
But correlation is the sum of products of corresponding values in the two windows. So if a window contains high intensity values, it will always give a better match; in the example above, the second row yields the higher correlation value.
You probably need normalized cross-correlation (NCC); with NCC the maximum is in the 4th row.
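A minimal sketch in R contrasting plain correlation with zero-mean normalized cross-correlation on the numbers above (raw_corr and ncc are illustrative helper names):
w <- c(22, 12, 14)
img <- rbind(c( 22, 12, 34,  54),
             c(112, 34, 54, 111),
             c( 12, 22, 12,  34),
             c( 11, 22, 12,  14))

raw_corr <- function(a, b) sum(a * b)      # plain correlation: sum of products
ncc <- function(a, b) {                    # zero-mean normalized cross-correlation
  a <- a - mean(a); b <- b - mean(b)
  sum(a * b) / sqrt(sum(a^2) * sum(b^2))
}

# Slide the window across every 3-pixel position in every row
for (r in 1:nrow(img)) {
  for (j in 1:(ncol(img) - length(w) + 1)) {
    patch <- img[r, j:(j + length(w) - 1)]
    cat(sprintf("row %d pos %d  raw = %4.0f  ncc = %5.2f\n",
                r, j, raw_corr(w, patch), ncc(w, patch)))
  }
}
# raw peaks at 3628 in row 2 (bright pixels), while ncc peaks at 1.00
# in row 4, where the window matches exactly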
I am new to this field. I am trying to explore time series data: find the missing values, count them, study the distribution of their lengths, and fill in these gaps. I have, let's say, 10 .txt files, and for each file I have 2 columns as follows:
C1 C2
944 0
920 1
920 2
928 3
912 7
920 8
920 9
880 10
888 11
920 12
944 13
and so on, let's say up to 100; the 10 files do not necessarily have the same number of observations.
Here, for example, the missing values are 4, 5 and 6 in C2 (gaps do not necessarily appear in all the files I have), along with the corresponding first column C1 (measured in milliseconds, so the value of 928 ms is not a time neighbour of 912 ms). So I want to find those gaps (the total missing values in all 10 files) and show a histogram of their lengths.
I wrote a piece of code in R, but the problem is that I don't get the exact total number of missing values that I should.
path <- "files path"
out.file <- data.frame(TS = 0, Index = 0, File = '')   # dummy first row, skipped below
file.names <- dir(path, pattern = ".txt")
for (i in 1:length(file.names)) {
  file <- cbind(read.table(file.path(path, file.names[i]),
                           header = FALSE,
                           sep = "\t",
                           stringsAsFactors = FALSE),
                file.names[i])
  colnames(file) <- c('TS', 'Index', 'File')
  out.file <- rbind(out.file, file)
}
d <- dim(out.file)[1]
misDa <- 0
for (i in 2:(d - 1)) {
  # counts one per gap, regardless of the gap's length, and also fires
  # where Index resets at a file boundary
  if (abs(out.file$Index[i] - out.file$Index[i + 1]) > 1)
    misDa <- misDa + 1
}
It is hard to give specific hints without a more extensive example of your data that contains some of the actual NAs.
Since you seem to be using R, the naniar and imputeTS packages offer nice functions for missing data visualization.
The naniar package is especially good for multivariate data, while the imputeTS package is especially good for time series data.
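A rough sketch of typical calls (function names assume naniar and a recent imputeTS version; airquality ships with base R and tsAirgap with imputeTS):
library(naniar)
library(imputeTS)

# naniar: multivariate overviews of missingness
vis_miss(airquality)              # heatmap of missing vs. observed cells
gg_miss_var(airquality)           # NA count per variable

# imputeTS: time-series-specific views, including gap lengths
statsNA(tsAirgap)                 # printed summary of the NA gaps
ggplot_na_distribution(tsAirgap)  # where the NAs sit in the series
ggplot_na_gapsize(tsAirgap)       # distribution of gap lengths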
I am doing some research on implementing a Savitzky-Golay filter for images. As far as I have read, the main application of this filter is signal processing, e.g. smoothing audio files.
The idea is to fit a polynomial through a defined neighbourhood around a point P(i) and to set this point to its new value P_new(i) = polynomial(i).
The problem in 2D space is, in my opinion, that there is not only one direction in which to do the fitting. You can use different "directions" to find a polynomial. For example, for
[51 52 11 33 34]
[41 42 12 24 01]
[01 02 PP 03 04]
[21 23 13 43 44]
[31 32 14 53 54]
It could be:
[01 02 PP 03 04], (horizontal)
[11 12 PP 23 24], (vertical)
[51 42 PP 43 54], (diagonal)
[41 42 PP 43 44], (semi-diagonal?)
but also
[41 02 PP 03 44], (semi-diagonal as well)
So my question is: does the Savitzky-Golay filter even make sense in 2D space, and if yes, is there any defined generalized form of this filter for higher dimensions and larger filter masks?
Thank you!
A first option is to use SG filtering in a separable way, i.e. filtering once along the horizontal rows, then a second time along the vertical columns.
A second option is to rewrite the equations with a bivariate polynomial (e.g. bicubic) and solve for the coefficients by least squares.
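A minimal sketch of the first, separable option in R, assuming the signal package (whose sgolayfilt() is a port of the Octave function) and a numeric matrix img:
library(signal)

sg2d_separable <- function(img, p = 2, n = 5) {
  rows <- t(apply(img, 1, sgolayfilt, p = p, n = n))  # smooth each row
  apply(rows, 2, sgolayfilt, p = p, n = n)            # then each column
}

smoothed <- sg2d_separable(matrix(rnorm(100), 10, 10))
For the second option, each neighbourhood yields a least-squares system whose design matrix holds the monomials x^i * y^j up to the chosen degree, and the fitted value at the centre pixel gives the smoothed output.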
forecast(model) from the forecast package returns a point forecast along with upper and lower forecast intervals. Is there a way to extract the exact distribution for each forecast value, so that I can make a histogram for every row of the forecast? Having the intervals isn't sufficient to make such histograms.
> forecast(mod,12)
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
12 0.000284821 0.0002356356 0.0003340064 2.095985e-04 0.0003600435
13 0.000284821 0.0002237453 0.0003458967 1.914137e-04 0.0003782283
14 0.000284821 0.0002138190 0.0003558230 1.762328e-04 0.0003934092
15 0.000284821 0.0002051195 0.0003645225 1.629281e-04 0.0004067140
16 0.000284821 0.0001972803 0.0003723617 1.509390e-04 0.0004187030
17 0.000284821 0.0001900876 0.0003795544 1.399388e-04 0.0004297033
18 0.000284821 0.0001834037 0.0003862383 1.297167e-04 0.0004399253
19 0.000284821 0.0001771339 0.0003925081 1.201278e-04 0.0004495142
20 0.000284821 0.0001712096 0.0003984324 1.110674e-04 0.0004585746
21 0.000284821 0.0001655793 0.0004040627 1.024565e-04 0.0004671855
22 0.000284821 0.0001602030 0.0004094390 9.423428e-05 0.0004754077
23 0.000284821 0.0001550494 0.0004145927 8.635240e-05 0.0004832896
The forecast distribution is normal for all ARIMA models, provided the residuals are normally distributed. So you can easily obtain the mean and variance of all future time periods from the point forecast and the upper/lower bounds.
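For example, under normality you can back the standard deviation out of the 95% bound (a sketch, assuming mod is the fitted model from the question):
library(forecast)

fc <- forecast(mod, h = 12)
mu <- as.numeric(fc$mean)                            # forecast mean at each horizon
sigma <- as.numeric((fc$upper[, "95%"] - fc$mean) / qnorm(0.975))  # implied sd
hist(rnorm(10000, mu[1], sigma[1]), breaks = "FD")   # histogram for horizon h = 1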
If you are unwilling to assume the residuals are normal, you can simulate the future distributions as follows.
library(forecast)

nsim <- 1000
mod <- auto.arima(WWWusage)
sim <- matrix(NA, nrow = 9, ncol = nsim)   # one column per simulated future path
for (i in 1:nsim)
  sim[, i] <- simulate(mod, nsim = 9, future = TRUE, bootstrap = TRUE)

# Histogram of the simulated distribution at each forecast horizon h = 1..9
par(mfrow = c(3, 3))
for (i in 1:9)
  hist(sim[i, ], breaks = "FD", main = paste("h =", i))
I have a list of sporting matches by time, with result and margin. I want Tableau to keep a running count of the number of matches since the last x (say, since the last draw, where margin = 0).
This will mean that on every record the running count increases by one, unless that match is a draw, in which case it drops back to zero.
I have not found a method of achieving this. The only way I can see to restart counts is via dates (e.g. a new year).
As an aside, I can easily achieve this by creating a running count tally OUTSIDE of Tableau, e.g. with the sketch below.
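A minimal sketch of that external tally in R (a hypothetical games data frame mirroring the margins above; the count here starts from 0 rather than mid-stream):
games <- data.frame(GameID = 48:53,
                    Margin = c(54, 12, 0, 17, 23, 9))

running <- integer(nrow(games))
since_last_draw <- 0
for (i in seq_len(nrow(games))) {
  running[i] <- since_last_draw   # count as of the start of this match
  since_last_draw <- if (games$Margin[i] == 0) 0 else since_last_draw + 1
}
games$RunningCount <- running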
The interesting thing is that Tableau then doesn't quite deal with this well when there is more than one result on the same day.
For example, if the structure is:
GameID  Date      Margin  Running count
...
48      01-01-15  54      122
49      08-01-15  12      123
50      08-01-15  0       124
51      08-01-15  17      0
52      08-01-15  23      1
53      15-01-15  9       2
...
Then when trying to plot running count against date, Tableau rearranges the data to show:
GameID  Date      Margin  Running count
...
48      01-01-15  54      122
51      08-01-15  17      0
52      08-01-15  23      1
49      08-01-15  12      123
50      08-01-15  0       124
53      15-01-15  9       2
...
I assume it is doing this because by default it sorts the running count data in ascending order when dates are identical.
Just looking for some brief advice to put me back on the right track. I have been working on a solution to a problem where I have a very sparse input matrix (~25% of the information filled, the rest 0's) stored in a scipy sparse.coo_matrix:
from scipy import sparse
# note that .toarray() turns the sparse matrix into a dense one
sparse_matrix = sparse.coo_matrix((value, (rater, blurb))).toarray()
After some work on building this array from my data set and messing around with some other options, I currently have my NMF model fitter function defined as follows:
from sklearn.decomposition import NMF
import numpy as np

def nmf_model(matrix):
    # Factor into non-negative W and H, then reconstruct the full matrix
    model = NMF(init='nndsvd', random_state=0)
    W = model.fit_transform(matrix)
    H = model.components_
    result = np.dot(W, H)
    return result
Now, the issue is that my output doesn't seem to account for the 0 values correctly. Any value that was a 0 gets bumped to some value less than 1, and my known values fluctuate from the actual quite a bit (all data are ratings between 1 and 10). Can anyone spot what I am doing wrong? From the scikit-learn documentation, I assumed using the nndsvd initialization would help account for the empty values correctly. Sample output:
#Row / Column / New Value
35 18 6.50746917334 #Actual Value is 6
35 19 0.580996641675 #Here down are all "estimates" of my function
35 20 1.26498699492
35 21 0.00194119935464
35 22 0.559623469753
35 23 0.109736902936
35 24 0.181657421405
35 25 0.0137801897011
35 26 0.251979684515
35 27 0.613055371646
35 28 6.17494590041 #Actual value is 5.5
I appreciate any advice that more experienced ML coders can offer!