How to estimate a "simple" nonlinear regression with parameter constraints and AR(1) residuals? (nls/gnls)

I am new to this site, so please bear with me. I want to estimate the nonlinear model shown in the link https://i.stack.imgur.com/cNpWt.png, i.e. x(t) = (a*R(t) - b*F(t)) / (1 - gamma1*R(t)) + ξ(t), while imposing the constraints a > 0, b > 0 and gamma1 in [0, 1].
In the nonlinear model [1] the dependent variable is x(t), the regressors are R(t) and F(t), and ξ(t) is the error term.
An example of the dataset (68 rows of time series) is shown here: https://i.stack.imgur.com/2Vf0j.png
To estimate the nonlinear regression I use the nls() function with no problem as shown below:
NLM1 = nls(Xt ~ (a*Rt - b*Ft) / (1 - gamma1*Rt),
           start = list(a = 10, b = 10, gamma1 = 0.5),
           algorithm = "port",
           lower = c(0, 0, 0), upper = c(Inf, Inf, 1),
           data = temp2)
I want to estimate NLM1 while also allowing for an AR(1) process on the residuals.
Basically I want the same step as going from lm() to gls(). My problem is that in gnls() I don't know how to impose constraints on the model parameters a, b and gamma1, and without them the model estimates wrong values.
nls() has options for lower and upper bounds, but I can't do the same in gnls(). In gnls() I need to add constraints like nls()'s lower = c(0, 0, 0), upper = c(Inf, Inf, 1):
NLM1_AR1 = gnls(model = Xt ~ (a*Rt - b*Ft) / (1 - gamma1*Rt),
                data = temp2,
                start = list(a = 13, b = 10, gamma1 = 0.5),
                correlation = corARMA(p = 1))
Does anyone know how to do this?
Thank you
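One workaround, since gnls() has no lower/upper arguments, is to reparameterize so the bounds hold automatically (e.g. estimate log(a), log(b) and a logit-transformed gamma1 via plogis()). Alternatively, the same constrained fit can be cross-checked outside nlme. Below is a minimal Python/scipy sketch, not gnls() itself, that minimizes the conditional sum of squared AR(1) innovations under box constraints; the data are synthetic stand-ins for temp2, not the real series.

# A sketch only: conditional least squares for the model
# x_t = (a*R_t - b*F_t)/(1 - gamma1*R_t) + xi_t with AR(1) errors
# xi_t = rho*xi_{t-1} + eps_t, under the box constraints from the question.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 68                                   # same length as the posted dataset
Rt = rng.uniform(0.1, 0.9, n)            # synthetic stand-ins for temp2
Ft = rng.uniform(0.1, 0.9, n)
Xt = (5 * Rt - 3 * Ft) / (1 - 0.4 * Rt) + rng.normal(0, 0.1, n)

def css(theta):
    """Conditional sum of squares of the AR(1) innovations."""
    a, b, gamma1, rho = theta
    xi = Xt - (a * Rt - b * Ft) / (1 - gamma1 * Rt)   # raw residuals xi_t
    eps = xi[1:] - rho * xi[:-1]                      # innovations eps_t
    return np.sum(eps ** 2)

# a > 0, b > 0, gamma1 in [0, 1], rho in (-1, 1)
bounds = [(1e-8, None), (1e-8, None), (0.0, 1.0), (-0.99, 0.99)]
fit = minimize(css, x0=[10.0, 10.0, 0.5, 0.0], method="L-BFGS-B", bounds=bounds)
print(dict(zip(["a", "b", "gamma1", "rho"], fit.x)))

The estimates from such a fit can serve as a sanity check on whatever gnls() returns, or as starting values for it.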

Related

cvxpy: Possible sign flip for Lagrange Multiplier/dual variable

I have encountered a very strange problem when using cvxpy. Consider the following two problems:
import cvxpy as cvx

x = cvx.Variable(1, "x")
obj = cvx.Minimize(x)
cons = [x == 1]
prob = cvx.Problem(obj, cons)
prob.solve()
print(cons[0].dual_value)
Output: -1
x = cvx.Variable(1, "x")
obj = cvx.Maximize(x)
cons = [x == 1]
prob = cvx.Problem(obj, cons)
prob.solve()
print(cons[0].dual_value)
Output: 1
The only difference is that one is a minimization problem and the other is a maximization problem, yet the sign of the dual variable is flipped.
Conceptually this shouldn't happen, since in both cases the Lagrangian is L = x + lambda*(x - 1), but I cannot find documentation on how the dual is defined.
Does anyone have an explanation of why this is happening?
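A plausible explanation, worth verifying against the cvxpy docs: cvxpy canonicalizes Maximize(f) as Minimize(-f) and reports dual values for the minimization form, so the Lagrangian of the second problem is L = -x + lambda*(x - 1), whose stationarity condition gives lambda = 1. A quick check of this hypothesis:

import cvxpy as cvx

# If Maximize(x) is internally rewritten as Minimize(-x), then solving
# Minimize(-x) directly should reproduce the dual value of the Maximize case.
x = cvx.Variable(1, "x")
cons = [x == 1]
prob = cvx.Problem(cvx.Minimize(-x), cons)
prob.solve()
print(cons[0].dual_value)  # expected: 1, the same as for Maximize(x)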

The compiler cannot call 'pivothigh' with these arguments

I've been trying to make a pivot-point high/low multi-timeframe indicator, but I'm still a new learner and have no idea how to fix this.
I tried putting the 'tf' input in multiple places in the code, but it's not working.
//@version=4
study("Pivot Prices", overlay=true)
tf=input('120')
leftbars = input(10, minval=1, title='Bars to the left')
rightbars = input(2, minval=1, title='Bars to the right')
phigh = pivothigh(high, tf, leftbars,rightbars)
plow = pivotlow(low, tf, leftbars, rightbars)
if phigh
    label1 = label.new(bar_index[rightbars], high[rightbars], text=tostring(high[rightbars]), style=label.style_labeldown, color=color.orange)
if plow
    label2 = label.new(bar_index[rightbars], low[rightbars], text=tostring(low[rightbars]), style=label.style_labelup, color=color.green)
I want it to show a multi-timeframe perspective, but I couldn't figure out what is wrong in the code.
Read the documentation. pivothigh() and pivotlow() can take two or three arguments.
pivothigh(source, leftbars, rightbars) → series[float]
pivothigh(leftbars, rightbars) → series[float]
You are passing four arguments: drop tf from both calls. The pivot functions work on the chart's timeframe, so for a true multi-timeframe version you would need to request the other timeframe with security().

Is it possible to vectorize this calculation in numpy?

Can the following expression of numpy arrays be vectorized for speed-up?
k_lin1x = [2*k_lin[i]*k_lin[i+1]/(k_lin[i]+k_lin[i+1]) for i in range(len(k_lin)-1)]
import numpy as np

x1 = k_lin
x2 = np.roll(k_lin, -1)       # shift left by one, so x2[i] == x1[i + 1]
s = len(k_lin) - 1
result1 = x2[:s] + x1[:s]     # your divisor: sums of consecutive pairs
result2 = x2[:s] * x1[:s]     # your numerator: products of consecutive pairs
# in one line
result = 2 * x2[:s] * x1[:s] / (x2[:s] + x1[:s])
The last element of x2 (the one that wrapped around) is excluded from the calculation by slicing with [:s]; after the roll, x2[0] == x1[1], x2[1] == x1[2], and so on.
This is just a demo of the approach; see the numpy.roll documentation. Instead of slicing x2 with s, you could also simply drop its last element, since it is useless for the calculation.
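An equivalent, arguably cleaner vectorization skips np.roll entirely and uses plain slicing; a minimal sketch with made-up example values:

import numpy as np

k_lin = np.array([1.0, 2.0, 4.0, 8.0])  # hypothetical example data

# k_lin[:-1] is element i, k_lin[1:] is element i + 1, so this computes the
# harmonic-mean-style expression for every consecutive pair at once.
k_lin1x = 2 * k_lin[:-1] * k_lin[1:] / (k_lin[:-1] + k_lin[1:])
print(k_lin1x)  # [1.333..., 2.666..., 5.333...]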

K-means initialization with farthest-first traversal and k-means++

I am confused about k-means++ initialization. I understand k-means++ chooses the farthest data point as the next center. But what about outliers? And what is the difference between initialization with farthest-first traversal and k-means++?
I saw someone explain in this way:
Here is a one-dimensional example. Our observations are [0, 1, 2, 3, 4]. Let the first center, c1, be 0. The probability that the next cluster center, c2, is x is proportional to ||c1 - x||^2. So, P(c2 = 1) = 1a, P(c2 = 2) = 4a, P(c2 = 3) = 9a, P(c2 = 4) = 16a, where a = 1/(1+4+9+16).
Suppose c2 = 4. Then, P(c3 = 1) = 1a, P(c3 = 2) = 4a, P(c3 = 3) = 1a, where a = 1/(1+4+1).
What if the array is [0, 1, 2, 4, 5, 6, 100]? Obviously, 100 is the outlier in this case, and it will be chosen as a center at some point. Can someone give a better explanation?
k-means++ does not deterministically take the farthest point (that is farthest-first traversal); it chooses points with probability proportional to their squared distance from the nearest center already picked.
But yes, with extreme outliers it is likely to choose the outlier.
That is fine, because so will k-means itself: most likely the best SSQ solution has a one-element cluster containing only that point.
If you have such data, the k-means solutions tend to be rather useless, and you should probably choose another algorithm such as DBSCAN instead.
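A minimal sketch of the two seeding rules on that outlier example (the helper name kmeanspp_centers is made up for illustration):

import numpy as np

rng = np.random.default_rng(0)
points = np.array([0, 1, 2, 4, 5, 6, 100], dtype=float)

def kmeanspp_centers(points, k, rng):
    """k-means++ seeding: sample each new center with probability
    proportional to squared distance from the nearest chosen center."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = np.min((points[:, None] - np.array(centers)[None, :]) ** 2, axis=1)
        # Farthest-first traversal would instead take points[np.argmax(d2)]
        # deterministically; k-means++ only biases the random draw toward it.
        centers.append(rng.choice(points, p=d2 / d2.sum()))
    return np.array(centers)

print(kmeanspp_centers(points, 3, rng))  # 100 is very likely among the centers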

show feature names after feature selection

I need to build a classifier for text. Right now I'm using TfidfVectorizer and SelectKBest to select the features, as follows:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2

# charset_error was renamed to decode_error in newer scikit-learn versions
vectorizer = TfidfVectorizer(sublinear_tf=True, max_df=0.5,
                             stop_words='english', decode_error='strict')
X_train_features = vectorizer.fit_transform(data_train.data)
y_train_labels = data_train.target
ch2 = SelectKBest(chi2, k=1000)
X_train_features = ch2.fit_transform(X_train_features, y_train_labels)
I want to print out the selected feature names (the text) after selecting the k best features. Is there any way to do that? I just need to print the selected feature names; maybe I should use CountVectorizer instead?
The following should work (in newer scikit-learn versions, use get_feature_names_out() instead of get_feature_names()):
np.asarray(vectorizer.get_feature_names())[ch2.get_support()]
To expand on @ogrisel's answer: the returned list of features is in the same order in which they were vectorized. The code below gives you the top-ranked features sorted by their chi-squared scores in descending order, along with the corresponding p-values:
top_ranked_features = sorted(enumerate(ch2.scores_), key=lambda x: x[1], reverse=True)[:1000]
top_ranked_features_indices = [idx for idx, score in top_ranked_features]
feature_names = np.asarray(train_vectorizer.get_feature_names())[top_ranked_features_indices]
for feature_pvalue in zip(feature_names, ch2.pvalues_[top_ranked_features_indices]):
    print(feature_pvalue)
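For completeness, a self-contained sketch on a toy two-document corpus (the documents and names here are made up for illustration; it assumes a recent scikit-learn with get_feature_names_out()):

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2

docs = ["the cat sat on the mat", "dogs chase the ball in the park"]
labels = [0, 1]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)
selector = SelectKBest(chi2, k=4).fit(X, labels)

# get_support() is a boolean mask over the vocabulary; indexing the
# feature-name array with it yields the names of the k kept features.
selected_names = np.asarray(vectorizer.get_feature_names_out())[selector.get_support()]
print(selected_names)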
