Invalid order in Rapidsai cuml ARIMA - time-series

I'm trying to find right parameters for ARIMA but not able to use parameters higher than 4. Here is the code.
from cuml.tsa.arima import ARIMA
p = 5
q = 0
P = 1
Q = 0
model = ARIMA(train, order=(p,0,q), seasonal_order=(P,0,Q,24), simple_differencing= False)
model.fit()
forecast_df = model.forecast(10)
forecast_df
Error message
ValueError: ERROR: Invalid order. Required: p,q,P,Q <= 4
Is there any way to use parameters higher than 4. I have used higher parameters with statsmodel library but as my data is large I need GPU support provided by this library.

I am the main contributor to this model.
Unfortunately, it is currently impossible to use values greater than 4 for these parameters due to implementation reasons.
I see that you have opened a GitHub issue, thanks for that. We will consider adding support for higher parameter values and keep you updated on the GitHub issue.

Related

LDA: Coherence Values using u_mass v c_v

I am currently attempting to record and graph coherence scores for various topic number values in order to determine the number of topics that would be best for my corpus. After several trials using u_mass, the data proved to be inconclusive since the scores don't plateau around a specific topic number. I'm aware that CV ranges from -14 to 14 when using u_mass, however my values range from -2 to -1 and selecting an accurate topic number is not possible. Due to these issues, I attempted to use c_v instead of u_mass but I receive the following error:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
This is my code for computing the coherence value
cm = CoherenceModel(model=ldamodel, texts=texts, dictionary=dictionary,coherence='c_v')
print("THIS IS THE COHERENCE VALUE ")
coherence = cm.get_coherence()
print(coherence)
If anyone could provide assistance in resolving my issues for either c_v or u_mass, it would be greatly appreciated! Thank you!

Find the importance of each column to the model

I have a ML.net project and as of right now everything has gone great. I have a motor that collects a power reading 256 times around each rotation and I push that into a model. Right now it determines the state of the motor nearly perfectly. The motor itself only has room for 38 values on it at a time so I have been spending several rotations to collect the full 256 samples for my training data.
I would like to cut the sample size down to 38 so every rotation I can determine its state. If I just evenly space the samples down to 38 my model degrades by a lot. I know I am not feeding the model the features it thinks are most important but just making a guess and randomly selecting data for the model.
Is there a way I can see the importance of each value in the array during the training process? I was thinking I could use IDataView for this and I found the below statement about it (link).
Standard ML schema: The IDataView system does not define, nor prescribe, standard ML schema representation. For example, it does not dictate representation of nor distinction between different semantic interpretations of columns, such as label, feature, score, weight, etc. However, the column metadata support, together with conventions, may be used to represent such interpretations.
Does this mean I can print out such things as weight for each column and how would I do that?
I have actually only been working with ML.net for a couple weeks now so I apologize if the question is naive, I assure you I have googled this as many ways as I can think to. Any advice would be appreciated. Thanks in advance.
EDIT:
Thank you for the answer I was going down a completely useless path. I have been trying to get it to work following the example you linked to. I have 260 columns with numbers and one column with the conditions as one of five text strings. This is the condition I am trying to predict.
The first time I tried it threw an error "expecting single but got string". No problem I used .Append(mlContext.Transforms.Conversion.MapValueToKey("Label", "Label")) to convert to key values and it threw the error expected Single, got Key UInt32. any ideas on how to push that into this function?
At any rate thank you for the reply but I guess my upvotes don't count yet sorry. hopefully I can upvote it later or someone else here can upvote it. Below is the code example.
//Create MLContext
MLContext mlContext = new MLContext();
//Load Data
IDataView data = mlContext.Data.LoadFromTextFile<ModelInput>(TRAIN_DATA_FILEPATH, separatorChar: ',', hasHeader: true);
// 1. Get the column name of input features.
string[] featureColumnNames =
data.Schema
.Select(column => column.Name)
.Where(columnName => columnName != "Label").ToArray();
// 2. Define estimator with data pre-processing steps
IEstimator<ITransformer> dataPrepEstimator =
mlContext.Transforms.Concatenate("Features", featureColumnNames)
.Append(mlContext.Transforms.NormalizeMinMax("Features"))
.Append(mlContext.Transforms.Conversion.MapValueToKey("Label", "Label"));
// 3. Create transformer using the data pre-processing estimator
ITransformer dataPrepTransformer = dataPrepEstimator.Fit(data);//error here
// 4. Pre-process the training data
IDataView preprocessedTrainData = dataPrepTransformer.Transform(data);
// 5. Define Stochastic Dual Coordinate Ascent machine learning estimator
var sdcaEstimator = mlContext.Regression.Trainers.Sdca();
// 6. Train machine learning model
var sdcaModel = sdcaEstimator.Fit(preprocessedTrainData);
ImmutableArray<RegressionMetricsStatistics> permutationFeatureImportance =
mlContext
.Regression
.PermutationFeatureImportance(sdcaModel, preprocessedTrainData, permutationCount: 3);
// Order features by importance
var featureImportanceMetrics =
permutationFeatureImportance
.Select((metric, index) => new { index, metric.RSquared })
.OrderByDescending(myFeatures => Math.Abs(myFeatures.RSquared.Mean));
Console.WriteLine("Feature\tPFI");
foreach (var feature in featureImportanceMetrics)
{
Console.WriteLine($"{featureColumnNames[feature.index],-20}|\t{feature.RSquared.Mean:F6}");
}
I believe what you are looking for is called Permutation Feature Importance. This will tell you which features are most important by changing each feature in isolation, and then measuring how much that change affected the model's performance metrics. You can use this to see which features are the most important to the model.
Interpret model predictions using Permutation Feature Importance is the doc that describes how to use this API in ML.NET.
You can also use an open-source set of packages, they are much more sophisticated than what is found in ML.NET. I have an example on my GitHub how-to use R with advanced explainer packages to explain ML.NET models. You can get local instance as well as global model breakdown/details/diagnostics/feature interactions etc.
https://github.com/bartczernicki/BaseballHOFPredictionWithMlrAndDALEX

How to fetch values using Permutation Feature Importance

I have a dataset with 5K (and 60 features) records focused on binary classification.
Please note that this solution doesn't work here
I am trying to generate feature importance using Permutation Feature Importance. However, I get the below error. Can you please look at my code and let me know whether I am making any mistake?
import eli5
from eli5.sklearn import PermutationImportance
logreg =LogisticRegression()
model = logreg.fit(X_train_std, y_train)
perm = PermutationImportance(model, random_state=1)
eli5.show_weights(perm, feature_names = X.columns.tolist())
I get an error like as shown below
AttributeError: 'PermutationImportance' object has no attribute 'feature_importances_'
Can you help me resolve this error?
If you look at your attributes of PermutationImportance object via
ord(perm)
you can see all attributes and methods BUT after you fit your PI object, meaning that you need to do:
perm = PermutationImportance(model, random_state=1).fit(X_train,y)

Can I get a solution using "timeout" when using Optimize.minimize()?

I'm trying to minimize a variable, but z3 takes to long in order to give me a solution.
And I would like to know if it's possible to get a solution when timeout gets triggered.
If yes how can i do that?
Thx in advance!
If by "solution" you mean the latest approximation of the optimal value, then you may be able to retrieve it, provided that the optimization algorithm being used finds any intermediate solution along the way. (Some optimization algorithms --like, e.g., maxres-- don't find any intermediate solution).
Example:
import z3
o = z3.Optimize()
o.add(...very hard problem...)
cf = z3.Int('cf')
o.add(cf = ...)
obj = o.minimize(cf)
o.set(timeout=...)
res = o.check()
print(res)
print(obj.upper())
Even when res = unknown because of a timeout, the objective instance contains the latest approximation of the optimum value found by z3 before the timeout.
Unfortunately, I am not sure whether it is also possible to retrieve the corresponding sub-optimal model with o.model() (or any other method).
For OptiMathSAT, I show how to retrieve the latest approximation of the optimum value and the corresponding model in the unit-test timeout.py.

"Contrast Error" message with lsmeans Tukey Test on GLM

I have defined a generalised linear model as follows:
glm(formula = ParticleCount ~ ParticlePresent + AlgaePresent +
ParticleTypeSize + ParticlePresent:ParticleTypeSize + AlgaePresent:ParticleTypeSize,
family = poisson(link = "log"), data = PCB)
and I have the below significant interactions
Df Deviance AIC LRT Pr(>Chi)
<none> 666.94 1013.8
ParticlePresent:ParticleTypeSize 6 680.59 1015.4 13.649 0.033818 *
AlgaePresent:ParticleTypeSize 6 687.26 1022.1 20.320 0.002428 **
I am trying to proceed with a posthoc test (Tukey) to compare the interaction of ParticleTypeSize using the lsmeans package. However, I get the following message as soon as I proceed:
library(lsmeans)
leastsquare=lsmeans(glm.particle3,~ParticleTypeSize,adjust ="tukey")
Error in `contrasts<-`(`*tmp*`, value = contrasts.arg[[nn]]) :
contrasts apply only to factors
I've checked whether ParticleTypeSize is a valid factor by applying:
l<-sapply(PCB,function(x)is.factor(x))
l
Sample AlgaePresent ParticlePresent ParticleTypeSize
TRUE FALSE FALSE TRUE
ParticleCount
FALSE
I'm stumped and unsure as to how I can rectify this error message. Any help would be much appreciated!
That error happens when the variable you specify is not a factor. You tested and found that it is, so that's a mystery and all I can guess is that the data changed since you fit the model. So try re-fitting the model with the present dataset.
All that said, I question what you are trying to do. First, you have ParticleTypeSize interacting with two other predictors, which means it is probably not advisable to look at marginal means (lsmeans) for that factor. The fact that there are interactions means that the pattern of those means changes depending on the values of the other variables.
Second, are AlgaePresent and ParticlePresent really numeric variables? By their names, they seem like they ought to be factors. If they are really indicators (0 and 1), that's OK, but it is still cleaner to code them as factors if you are using functions like lsmeans where factors and covariates are treated in distinctly different ways.
BTW, the lsmeans package is being deprecated, and new developments are occurring in its successor, the emmeans package.

Resources