Time series in R: How to convert raw data from int type to time type - time-series

For subject 1 in the training data, I am trying to plot nine time series corresponding to nine different features. The data is supposed to be a time series but R is not reading it as such. The first two columns are not time-related, but the rest should be. How do I convert these columns to a time type in R or R Markdown (I assume the approach is the same in both)?
I tried plotting it:
ggplot(train_ds, aes(x = Activity, y = TimeBodyAccelerometer-mean-X) +
theme_minimal() +
geom_point()
)
but I get this error:
Error in ggplot():
! mapping should be created with aes().
✖ You've supplied a object
Backtrace:
ggplot2::ggplot(...)
ggplot2:::ggplot.default(...)

Getting error "no method or default for coercing 'patchwork' to 'dgCMatrix'" in scRNA analysis, using Seurat, normalization step

I have a scRNA dataset with 10 healthy controls and 17 patients. I am doing the comparative analysis. I did the following:
Created 10 seurat objects for 10 healthy controls and merged them to create one (healthy)
Created 17 seurat objects for 17 patients and merged them to create one (patients)
Created a list of the two objects: data <- list(healthy, patients)
Normalized the data:
data <- lapply(data, function(x) {
    x <- NormalizeData(x)
    x <- FindVariableFeatures(x, selection.method = "vst", nfeatures = 2000)
})
I am getting the following error:
Error in as(object = data, Class = "dgCMatrix") : no method or default for coercing “patchwork” to “dgCMatrix”
Please help
After some trial and error I was able to reproduce the same error by running this line of code before your lapply call:
data <- list(p1 + p2 , p2)
Where p1 and p2 are ggplot objects.
It looks to me like the elements of your data list are not Seurat objects.
You should check for mistakes in the code you used to generate your list of Seurat objects.
I hope this helps :)

Naive Bayes - no samples for class label 1

I am using Accord.NET. I have successfully implemented the two decision tree algorithms, ID3 and C4.5, and now I am trying to implement the Naive Bayes algorithm. While there is a lot of sample code on the site, most of it seems to be out of date or to have various issues.
The best sample code I have found on the site so far has been here:
http://accord-framework.net/docs/html/T_Accord_MachineLearning_Bayes_NaiveBayes_1.htm
However, when I try and run that code against my data I get:
There are no samples for class label 1. Please make sure that class
labels are contiguous and there is at least one training sample for
each label.
from line 228 of this file:
https://github.com/accord-net/framework/blob/master/Sources/Accord.MachineLearning/Tools.cs
when I call
learner.Learn(inputs, outputs) in my code.
I have already run into the null-reference bugs that Accord has when implementing the other two decision trees, and my data has been sanitized against that issue.
Does any Accord.NET expert have an idea what would trigger this error?
An excerpt from my code:
var codebook = new Codification(fulldata, AllAttributeNames);
/*
 * Get list of all possible combinations
 * Status software blows up if it encounters a value it has not seen before.
 */
var attributList = new List<IUnivariateFittableDistribution>();
foreach (var attr in DeciAttributeNames)
{
    /*
     * By default we'll use a standard static list of values for this column
     */
    var cntLst = codebook[attr].NumberOfSymbols;
    // no decisions can be made off of the variable if it is a constant value
    if (cntLst > 1)
    {
        KeptAttributeNames.Add(attr);
        attributList.Add(new GeneralDiscreteDistribution(cntLst));
    }
}
var data = fulldata.Copy(); // this is a DataTable
/*
 * Translate our training data into integer symbols using our codebook
 */
DataTable symbols = codebook.Apply(data, AllAttributeNames);
double[][] inputs = symbols.ToJagged<double>(KeptAttributeNames.ToArray());
int[] outputs = symbols.ToArray<int>(OutAttributeName);
progBar.PerformStep();
/*
 * Create a new instance of the learning algorithm
 * and build the algorithm
 */
var learner = new NaiveBayesLearning<IUnivariateFittableDistribution>()
{
    // Tell the learner how to initialize the distributions
    Distribution = (classIndex, variableIndex) => attributList[variableIndex]
};
var alg = learner.Learn(inputs, outputs);
EDIT: After further experimentation, it seems as though this error only occurs when I process a certain number of rows. If I process 60 rows or fewer I am fine, and if I process 500 rows or more I am also fine, but in between that range I get this error. Depending on the amount of data I choose, the index number in the error message changes; I have seen it range from 0 to 2.
All the data is coming from the same SQL Server datasource; the only thing I am adjusting is the Select Top ### portion of the query.
You will receive this error in multi-class scenarios when you have a defined class label with no sample data. With a small data set, your random sampling may by chance exclude all observations with a given label.
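As a sanity check, you can verify that precondition on your outputs array before calling Learn. Here is a minimal sketch in Python (not Accord.NET code; validate_labels is a hypothetical helper) of the same contiguity check:

```python
def validate_labels(outputs):
    """Check that class labels are 0-based, contiguous, and that each
    label has at least one sample -- the precondition the error describes."""
    present = set(outputs)
    n_classes = max(present) + 1
    missing = [c for c in range(n_classes) if c not in present]
    if missing:
        raise ValueError(
            "There are no samples for class label(s) %s. Make sure class "
            "labels are contiguous and each has a training sample." % missing)
    return n_classes
```

Running the equivalent check over each row-count subset of your query would show which labels drop out of the mid-sized samples.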

How to randomly get a value from a table [duplicate]

I am working on programming a Markov chain in Lua, and one element of this requires me to uniformly generate random numbers. Here is a simplified example to illustrate my question:
example = function(x)
    local r = math.random(1, 10)
    print(r)
    return x[r]
end
exampleArray = {"a","b","c","d","e","f","g","h","i","j"}
print(example(exampleArray))
My issue is that when I re-run this program multiple times (mash F5) the exact same random number is generated resulting in the example function selecting the exact same array element. However, if I include many calls to the example function within the single program by repeating the print line at the end many times I get suitable random results.
This is not my intention as a proper Markov pseudo-random text generator should be able to run the same program with the same inputs multiple times and output different pseudo-random text every time. I have tried resetting the seed using math.randomseed(os.time()) and this makes it so the random number distribution is no longer uniform. My goal is to be able to re-run the above program and receive a randomly selected number every time.
You need to run math.randomseed() once before using math.random(), like this:
math.randomseed(os.time())
From your comment I see that the first number is still the same. This is caused by the implementation of the random generator on some platforms.
The solution is to pop some random numbers before using them for real:
math.randomseed(os.time())
math.random(); math.random(); math.random()
Note that the standard C library rand() is usually not so uniformly random; a better solution is to use a better random generator if your platform provides one.
Reference: Lua Math Library
The standard C random number generator used in Lua isn't guaranteed to be good for simulation. The words "Markov chain" suggest that you may need a better one. Here's a generator widely used for Monte Carlo calculations:
local A1, A2 = 727595, 798405 -- 5^17=D20*A1+A2
local D20, D40 = 1048576, 1099511627776 -- 2^20, 2^40
local X1, X2 = 0, 1
function rand()
    local U = X2*A2
    local V = (X1*A2 + X2*A1) % D20
    V = (V*D20 + U) % D40
    X1 = math.floor(V/D20)
    X2 = V - X1*D20
    return V/D40
end
It generates a number between 0 and 1, so r = math.floor(rand()*10) + 1 would go into your example.
(That's a multiplicative random number generator with period 2^38, multiplier 5^17 and modulus 2^40; original Pascal code by http://osmf.sscc.ru/~smp/)
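For anyone who wants to sanity-check the arithmetic outside Lua, here is a line-by-line port of the generator above to Python (the Lua version is the one to use in your program; the class name Rand is my own):

```python
class Rand:
    """Multiplicative congruential generator (multiplier 5^17, modulus 2^40)
    with the 40-bit state split into two 20-bit halves, as in the Lua code."""
    A1, A2 = 727595, 798405          # 5^17 = D20*A1 + A2
    D20, D40 = 1 << 20, 1 << 40      # 2^20, 2^40

    def __init__(self, x1=0, x2=1):
        self.X1, self.X2 = x1, x2    # reseed by picking a different state

    def __call__(self):
        U = self.X2 * self.A2
        V = (self.X1 * self.A2 + self.X2 * self.A1) % self.D20
        V = (V * self.D20 + U) % self.D40
        self.X1 = V // self.D20
        self.X2 = V - self.X1 * self.D20
        return V / self.D40          # a float in [0, 1)
```

Because the state is explicit, the sequence is reproducible for a given seed, which is handy when debugging a Markov chain.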
math.randomseed(os.clock()*100000000000)
for i = 1, 3 do
    math.random(10000, 65000)
end
This always produces new random numbers; changing the seed value ensures randomness. Don't rely on os.time() here, because it is the epoch time and only changes once per second, whereas os.clock() won't return the same value at two nearby instants.
There's the Luaossl library solution: (https://github.com/wahern/luaossl)
local rand = require "openssl.rand"
local randominteger
if rand.ready() then -- rand has been properly seeded
    -- Returns a cryptographically strong uniform random integer in the interval [0, n−1].
    randominteger = rand.uniform(99) + 1 -- a random integer in the range 1 to 99
end
http://25thandclement.com/~william/projects/luaossl.pdf

Parsing numeric data from text file using Python

I am attempting to build a database from a numeric model output text file. The text file has four (4) rows of title block data followed by many rows (41,149) of data blocks which are each separated by the word 'INTERNAL' followed by some numeric data as shown below:
Line1: Title block
Line2: Title block
Line3: Title block
Line4: Title block
Line5: INTERNAL 1.0 (10E16.9) -1
Line6: data data data data
Line7: data data data data
Line8 to Line25: data data data data
Line26: data data data data
Line27: INTERNAL 1.0 (10E16.9) -1
Line28: data data data data
..etc all the way down to line 41,149
The data blocks are not of consistent size (i.e., some have more rows of data than others). Thanks to a lot of help from this site, I have been able to take the 41,149 rows of data and organize each data block into separate lists that I can parse through and build the database from. My problem is that this operation takes a very long time to run. I was hoping someone could look at the code I have below and give me suggestions on how I might be able to run it more efficiently. I can attach the model output file if needed. Thanks!
inFile = 'CONFINED_AQIFER.DIS'
strings = ['INTERNAL']
rowList = []
#Create a list of each row number where a data block begins
with open(inFile) as myFile:
    for num, line in enumerate(myFile, 1):
        if any(s in line for s in strings):
            rowList.append(num)
#Function to get line data from row number
def getlineno(filename, lineno):
    if lineno < 1:
        raise TypeError("First line is line 1")
    f = open(filename)
    lines_read = 0
    while 1:
        lines = f.readlines(100000)
        if not lines:
            return None
        if lines_read + len(lines) >= lineno:
            return lines[lineno - lines_read - 1]
        lines_read += len(lines)
#Organize each data block into a unique list and append to a final list (fList)
fList = []
for row in range(len(rowList[1:])):
    combinedList = []
    i = rowList[row]
    data = []
    while i < rowList[row+1]:
        line = getlineno(inFile, i)
        data.append(line.split())
        i += 1
    for d in range(len(data))[1:]:
        for x in data[d]:
            combinedList.append(x)
    fList.append(combinedList)
Some comments:
In Python 2, xrange is always better than range: range builds the entire list while xrange just returns an iterator.
Use list methods instead of explicit append loops: change
for x in data[d]:
    combinedList.append(x)
to
combinedList.extend(data[d])
See if you can extrapolate these techniques to more of your code.
In general you don't want to allocate memory (make new lists) inside of for loops.
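Putting those comments together: the main cost in the original code is that getlineno re-reads the file from the top for every single line, which makes the whole operation roughly quadratic in file size. The entire job can be done in one pass. Here is a sketch under the same assumptions as above (a 4-line title block, blocks introduced by 'INTERNAL'); note that, unlike the original loop, it also keeps the final block after the last 'INTERNAL' line:

```python
def parse_blocks(path, header_rows=4, marker="INTERNAL"):
    """Read the file once, collecting the whitespace-separated tokens of
    each data block (the lines between marker lines) into its own list."""
    blocks = []
    current = None
    with open(path) as f:
        for i, line in enumerate(f, 1):
            if i <= header_rows:
                continue                  # skip the title block
            if marker in line:
                if current is not None:
                    blocks.append(current)
                current = []              # a marker line starts a new block
            elif current is not None:
                current.extend(line.split())
    if current is not None:
        blocks.append(current)
    return blocks
```

With this, fList = parse_blocks(inFile) replaces both the rowList scan and the getlineno loop.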

Reading Pixel Value Method?

I'm having a problem reading pixel values from an image of width w=30 and height h=10. Suppose I'm using
(1) int readValue = cvGetReal2D(img, y, x); and
(2) int readValue = data[y*step + x];
Let's say I am trying to access the pixel value at w=35, h=5 using methods (1) and (2).
Method (1) outputs an index-out-of-range error. But why does method (2) not output an index-out-of-range error?
After that, I'm trying to use try...catch()...
You have a continuous block of memory of
size = w*h = 300
At w = 35 and h = 5 your equation gives
data[5*30 + 35] = data[185], and 185 < 300,
so there is no error. If this is C++, then even if your index into data were larger than 299 it wouldn't throw an error. In that case you would be accessing the data beyond its bounds, which results in undefined behavior.
I assume cvGetReal2D(img, y, x) is smart enough to tell you that one of your indices is larger than the defined size of that dimension, even though it could still be resolved to a valid address.
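The safe fix is to validate coordinates before indexing, rather than relying on try/catch to notice a wrapped read. A small Python sketch of the idea (checked_read is a hypothetical helper; in C++ you would do the same comparison before the array access):

```python
def checked_read(data, x, y, w, h):
    """Read pixel (x, y) from a row-major buffer of a w-by-h image,
    rejecting coordinates outside the image instead of wrapping rows."""
    if not (0 <= x < w and 0 <= y < h):
        raise IndexError("(%d, %d) is outside a %dx%d image" % (x, y, w, h))
    return data[y * w + x]
```

For w=30, h=10 the raw index of (x=35, y=5) is 5*30 + 35 = 185, which is still inside the 300-element buffer, so an unchecked data[y*step + x] silently returns a pixel from the next row; the checked version raises instead.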
