GLM Poisson thinks I have negative values in my dataset, throws error

I am trying to fit a Poisson GLM, yet I keep getting this error:
Poisson1 <- glm(Number.Flowers ~ Site, data = Flowering2, family="poisson")
Error in eval(family$initialize) : negative values not allowed for the 'Poisson' family
My data are counts, so the values are all zeros or positive. What could be going on?
Is it possible for my CSV file to contain hidden negative values?

Your CSV might be flawed in some way. Try a different method of importing it into R (fread, read.table, etc.), check for NA or NaN values, and compare the number of rows each method reports.
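A quick way to verify this from inside R, using the data frame and column names from the question:
# sanity checks on the imported data
str(Flowering2)                        # confirm Number.Flowers is numeric, not character/factor
summary(Flowering2$Number.Flowers)     # Min. should be >= 0
which(Flowering2$Number.Flowers < 0)   # rows holding negative values
sum(is.na(Flowering2$Number.Flowers))  # NAs introduced by the import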

Related

Problem with VSN: Error in vsnML(sv) : L-BFGS-B needs finite values of 'fn'

I am analyzing a proteomics dataset with 3 conditions and a total of 10 samples (columns) and 9650 proteins (rows). I want to apply VSN to my data table for normalization, but I got this error message: Error in vsnML(sv) : L-BFGS-B needs finite values of 'fn'
I checked whether my data contains infinite values, and whether all values in one column are identical, but neither is the case.
Can someone tell me why vsn cannot run on this specific dataset and what I can do? (This is something like the 20th proteomics dataset I am normalizing for my thesis, and I have never had this problem before.)
norm_dt <- justvsn(as.matrix(dt))
Error that I get:
vsn2: 9650 x 10 matrix (1 stratum).
Error in vsnML(sv) : L-BFGS-B needs finite values of 'fn'
Some details about my session (I am building a complete workflow, therefore I am not posting my whole sessionInfo() here):
R version 4.2.2
vsn version 3.66.0
If someone needs the data, I can send it; it is too big to post here.
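For reference, the checks described above can be run along these lines (a sketch; dt is the data table from the question, no column names are assumed):
m <- as.matrix(dt)
sum(!is.finite(m))                                        # count Inf/-Inf/NaN/NA cells
range(m, finite = TRUE)                                   # value range over the finite cells
apply(m, 2, function(col) length(unique(na.omit(col))))   # 1 means a constant column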

Sheets is doing weird things with my datasets when I attempt STDEV, STDEVA, STDEVP, STDEVPA

I've built a function in a Google Sheet that takes pages of data and passes them through to pull metrics from the datasets (which are based on pivot tables of extracted data). All data is pulled programmatically, formatted the same way, and named by standardized conventions. I want the standard deviation of all values in the set that are above 100, and in most cases the function I've built works:
=STDEV(IF(INDIRECT($A7&"!B:B")>100,INDIRECT($A7&"!B:B"),""))
However, with some smaller datasets STDEV starts throwing errors that suggest it isn't being passed enough arguments. I've tried debugging by pulling out pieces, eliminating thresholds, and trying other variants of STDEV (STDEVA gives me a DIV/0 error; STDEVP and STDEVPA return 0 as the standard deviation). When I pull out the IF statement, it looks like it's returning FALSE, as though no data in the set fits the criteria. Except, when I lower the threshold to >0 or eliminate it entirely it still doesn't work, and I know there are 4+ values above 100 in every erroring dataset. In fact, the same call sums to non-zero in the column right next to it. What's more, the function works everywhere else but these datasets.
What gives?
For extra info here's a viewable link to the sheet:
https://docs.google.com/spreadsheets/d/1b_456W9UlkuIc6W_FjmFwgycY1xAUkD_W9aMSxG0N6o/edit?usp=sharing
And this is the error the STDEV is throwing:
"Function STDEV parameter 1 expects number values. But '' is a text and cannot be coerced to a number."
Halp
Use:
=IFERROR(AVERAGEIF(INDIRECT($A15 & "!B:B"), ">100"))
and:
=IFERROR(STDEV(IF(INDIRECT($A15&"!B:B")>100, INDIRECT($A15&"!B:B"), )))
Leaving the IF's third argument empty (instead of "") stops STDEV from being handed text empty strings, which is exactly what the quoted error complains about; the IFERROR wrapper then covers the datasets where too few values pass the filter.

Using a sequence in CreateML to record device motion

I want to train an MLClassifier to identify a specific device motion.
What I did was record the motion data and label every recording accordingly. When that didn't work quite as I hoped, I started to realize that I have to record the "motion" itself over time, not only a single moment.
So I packed 5 samples (dictionaries) in a row and made that my new training feature. Or so I thought; when trying to train on my new data I got this error while creating my classifier:
Value encountered in column 's' is of type 'dictionary' cannot be
mapped to a categorical value. Categorical values must be integer,
strings, or None.
Now I'm slowly giving up... Does anyone have a suggestion, or know why I can't use sequences (arrays) as features?
...
Btw, here is some sample data of my JSON:
[{"s":[{"rZ":-1.0,"p":0.2,"aY":-0.0,"rX":1.5,"y":0.1,"r":-1.3,"aZ":0.2,"rY":-2.8,"aX":0.6},{"rZ":-1.9,"p":0.2,"aY":0.0,"rX":2.0,"y":0.2,"r":-1.4,"aZ":0.0,"rY":-3.2,"aX":0.5},{"rZ":-1.8,"p":0.3,"aY":0.0,"rX":2.4,"y":0.2,"r":-1.5,"aZ":0.9,"rY":-4.8,"aX":0.5},{"rZ":-1.6,"p":0.3,"aY":0.0,"rX":2.5,"y":0.3,"r":-1.6,"aZ":0.9,"rY":-3.8,"aX":0.6},{"rZ":-1.8,"p":0.3,"aY":0.1,"rX":2.2,"y":0.3,"r":-1.7,"aZ":0.1,"rY":-3.0,"aX":0.6}],"v":0}]
And the code I use to create my model:
do {
    let a = try MLDataTable(contentsOf: dummyJSONurl)
    let recognizer = try MLClassifier(trainingData: a, targetColumn: "v")
} catch let er {
    print(er)  // report the error; a bare `er` expression does nothing
}
You can't use sequences because MLClassifier isn't a model that can work on sequences. Perhaps Apple will add this in a future release, but for now it appears that you'll have to use a more capable tool.

XGBoost prediction always returning the same value - why?

I'm using SageMaker's built-in XGBoost algorithm with the following training and validation sets:
https://files.fm/u/pm7n8zcm
Running the prediction model that comes out of training on the above datasets always produces the exact same result.
Is there something obvious in the training or validation datasets that could explain this behavior?
Here is an example code snippet where I'm setting the hyperparameters:
{
{"max_depth", "1000"},
{"eta", "0.001"},
{"min_child_weight", "10"},
{"subsample", "0.7"},
{"silent", "0"},
{"objective", "reg:linear"},
{"num_round", "50"}
}
And here is the source code: https://github.com/paulfryer/continuous-training/blob/master/ContinuousTraining/StateMachine/Retrain.cs#L326
It's not clear to me which hyperparameters might need to be adjusted.
This screenshot shows that I'm getting a result with 8 indexes. But when I add the 11th one, it fails. This leads me to believe that I have to train the model with the zero values included instead of removing them, so I'll try that next.
Update: retraining with zero values included doesn't seem to help; I'm still getting the same value every time. I also noticed I can't send more than 10 values to the prediction endpoint or it will return an error: "Unable to evaluate payload provided". So at this point, using the libsvm format has only added more problems.
You've got a few things wrong there.
Using {"num_round", "50"} with such a small eta {"eta", "0.001"} will give you nothing: each boosting round moves the predictions by only an eta-sized step, so after 50 tiny steps every prediction is still sitting near the initial base score.
{"max_depth", "1000"} is insane! (The default value is 6.)
Suggesting:
{"max_depth", "6"},
{"eta", "0.05"},
{"min_child_weight", "3"},
{"subsample", "0.8"},
{"silent", "0"},
{"objective", "reg:linear"},
{"num_round", "200"}
Try this and report your output
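For comparison, here is a sketch of those suggested parameters using the open-source xgboost R package instead of SageMaker's built-in algorithm (synthetic data stands in for the linked training set):
library(xgboost)
set.seed(42)
x <- matrix(rnorm(500 * 10), nrow = 500)                 # synthetic stand-in features
y <- as.numeric(x %*% rnorm(10)) + rnorm(500, sd = 0.1)  # synthetic target
dtrain <- xgb.DMatrix(data = x, label = y)
params <- list(
  max_depth        = 6,     # the default; 1000 is far too deep
  eta              = 0.05,  # steps big enough to move off the base score
  min_child_weight = 3,
  subsample        = 0.8,
  objective        = "reg:linear"
)
model <- xgb.train(params = params, data = dtrain, nrounds = 200)
summary(predict(model, dtrain))  # with eta = 0.001 and 50 rounds, these would all sit near the base score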
In my case, grouping time series at certain frequencies created gaps in the data. I solved this issue by filling all the NaNs.
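A minimal sketch of that fix (the data frame here is synthetic; in practice the gaps appear after grouping a series to a coarser frequency):
ts_df <- data.frame(t = 1:6, value = c(1.2, NA, 0.8, NA, NA, 2.1))
ts_df$value[is.na(ts_df$value)] <- 0  # fill the gaps before training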

Fortran entries of array change seemingly at random

I have been working with a FORTRAN program. I have noticed seemingly random changes in a 1D matrix I'm working with. It is a matrix of 4000 integers. Values are added to the matrix one by one, starting with index 1 and iterating by 1 for each added value. The matrix does not get fully "filled", currently only 100 values are placed into the matrix. So one would expect that the first 100 entries of the matrix will be non-zero (all added values are non-zero) and the remaining 3900 entries will be 0. However, several of the entries of the matrix end up being large negative numbers, but I'm certain that no portion of my code touches these entries.
What could be causing this issue? I'm sorry but I can't post the code for you all to work with.
The code has several other large matrices, taking up a total of ~100 MB of space. Could this potentially be a memory issue?
Thanks!
You have to initialize your array, otherwise it will almost always contain garbage. This would do it:
array = 0.0e0 ! real array
or
array = 0.0d0 ! double precision array
or
array = 0 ! integer array
A "matrix" is two-dimensional; your array is one-dimensional.
Things do not change unless you ask them to change.
FORTRAN does not initialize variables, other than (as I recall) in a labeled COMMON, so you must assume they start out with garbage values. Try initializing your data with a DATA statement. If you have to initialize a labeled COMMON, you will have to supply a BLOCK DATA subprogram.
