Compare two columns of unequal length in R using a logical operator (time series)

I am dealing with a big time series dataset and would like to compare two columns.
My first column looks like this:
timeperiod timefortreatment
2014-08-01 00:00:00 102.81818
2014-08-01 01:00:00 12.34483
2014-08-01 02:00:00 35.67568
2014-08-01 03:00:00 125.57692
2014-08-01 04:00:00 97.56250
2014-08-01 05:00:00 36.66667
And the second column looks like this:
arrivaltime
2014-08-01 00:14:00
2014-08-01 00:22:00
2014-08-01 00:47:00
2014-08-01 01:07:00
2014-08-01 01:19:00
2014-08-01 01:53:00
The two columns have different lengths, and the second is longer than the first. I need to compare the first column with the second to get a final table like the one below. The rule is: if an arrival time in the second column is at or after the start of a period in the first column and before the start of the next one (each period is 1 hour here), it gets that period's time for treatment.
arrival timefortreatment
2014-08-01 00:14:00 102.81818
2014-08-01 00:22:00 102.81818
2014-08-01 00:47:00 102.81818
2014-08-01 01:07:00 12.34483
2014-08-01 01:19:00 12.34483
2014-08-01 01:53:00 12.34483
I have written logic based on two for loops, and it takes forever for 50k+ values:
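for (i in 1:nrow(date))
{
  for (j in 1:nrow(period))
  {
    # keeps overwriting z[i,], so the last period starting at or before the arrival wins
    # (assumes the periods are in ascending order)
    if (date[i,1] >= period[j,])
    {
      z[i,] = t[j,]
    }
  }
  # no manual i = i + 1 or j = j + 1 needed: for() advances its index itself
}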
Is there another way to do this? Any help would be highly appreciated.
Edit: I am extending the question to cover cases where the time periods are not evenly spaced:
timeperiod timefortreatment
2014-08-01 00:14:00 75
2014-08-01 00:19:00 143
2014-08-01 00:44:00 126
2014-08-01 01:04:00 125
2014-08-01 01:19:00 125
2014-08-01 01:49:00 122
For this case, the output should be as shown below, based on the same logic (arrival >= timeperiod):
arrival timefortreatment
2014-08-01 00:14:00 75
2014-08-01 00:22:00 143
2014-08-01 00:47:00 126
2014-08-01 01:07:00 125
2014-08-01 01:19:00 125
2014-08-01 01:53:00 122
Let me know if more details are needed.

Here is a solution with only one for loop (faster solutions exist):
df1 = data.frame(timeperiod = seq(as.POSIXct("2014-08-01 00:00:00"), as.POSIXct("2014-08-01 05:00:00"), by = "1 hour"),
timefortreatment = c(102.81818, 12.34483, 35.67568, 125.57692, 97.56250, 36.66667))
df2 = data.frame(arrivaltime = c(as.POSIXct("2014-08-01 00:14:00"), as.POSIXct("2014-08-01 00:22:00"), as.POSIXct("2014-08-01 00:47:00"), as.POSIXct("2014-08-01 01:07:00"), as.POSIXct("2014-08-01 01:19:00"), as.POSIXct("2014-08-01 01:53:00")))
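# truncate each arrival time to the start of its hour (base R, no stringr needed)
df2$time_min = as.POSIXct(trunc(df2$arrivaltime, units = "hours"))
for (i in 1:nrow(df2))
{
  # look up the treatment time of the hourly period that contains this arrival
  df2$timefortreatment[i] = df1$timefortreatment[df1$timeperiod == df2$time_min[i]]
}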
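EDIT
When the time periods are not evenly spaced, you can use the difftime function to pick, for each arrival, the period whose start is closest. Note that this is a nearest match rather than the question's arrival >= timeperiod rule: it gives the expected result for this data, but an arrival a few minutes before a period starts would be assigned to that later period.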
df1 = data.frame(timeperiod = c(as.POSIXct("2014-08-01 00:14:00"), as.POSIXct("2014-08-01 00:19:00"), as.POSIXct("2014-08-01 00:44:00"), as.POSIXct("2014-08-01 01:04:00"), as.POSIXct("2014-08-01 01:19:00"), as.POSIXct("2014-08-01 01:49:00")), timefortreatment = c(75, 143, 126, 125, 125, 122))
df2 = data.frame(arrivaltime = c(as.POSIXct("2014-08-01 00:14:00"), as.POSIXct("2014-08-01 00:22:00"), as.POSIXct("2014-08-01 00:47:00"), as.POSIXct("2014-08-01 01:07:00"), as.POSIXct("2014-08-01 01:19:00"), as.POSIXct("2014-08-01 01:53:00")))
for (i in 1:nrow(df2))
{
  # index of the period start closest to this arrival
  df2$timefortreatment[i] = df1$timefortreatment[which.min(abs(difftime(df2$arrivaltime[i], df1$timeperiod)))]
}
# APPLY solution
my_function = function(value)
{
  df1$timefortreatment[which.min(abs(difftime(value, df1$timeperiod)))]
}
# sapply over the POSIXct column; apply() would first turn df2 into a character matrix
df2$timefortreatment = sapply(df2$arrivaltime, my_function)
> df2
arrivaltime timefortreatment
1 2014-08-01 00:14:00 75
2 2014-08-01 00:22:00 143
3 2014-08-01 00:47:00 126
4 2014-08-01 01:07:00 125
5 2014-08-01 01:19:00 125
6 2014-08-01 01:53:00 122
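As a cross-language sketch of the question's rule (take the latest period that starts at or before each arrival), a sorted map answers each lookup in O(log n) instead of scanning every period as the nested loops in the question do. It is written in Java. The class and method names are hypothetical, and the sample values come from the edited question. In R itself, base R's findInterval() performs the same kind of lookup against a sorted vector of period starts.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class AsOfLookup {
    // For each arrival, return the value of the latest period that starts at or before it
    // (arrival >= timeperiod). Arrivals earlier than every period get null.
    static List<Double> lookup(TreeMap<LocalDateTime, Double> periods, List<LocalDateTime> arrivals) {
        List<Double> result = new ArrayList<>();
        for (LocalDateTime arrival : arrivals) {
            // floorEntry: greatest key <= arrival, found in O(log n)
            Map.Entry<LocalDateTime, Double> period = periods.floorEntry(arrival);
            result.add(period == null ? null : period.getValue());
        }
        return result;
    }

    // Sample data from the edited question (irregular period starts).
    static List<Double> demo() {
        TreeMap<LocalDateTime, Double> periods = new TreeMap<>();
        String[] starts = {"00:14", "00:19", "00:44", "01:04", "01:19", "01:49"};
        double[] treatment = {75, 143, 126, 125, 125, 122};
        for (int i = 0; i < starts.length; i++) {
            periods.put(LocalDateTime.parse("2014-08-01T" + starts[i]), treatment[i]);
        }
        List<LocalDateTime> arrivals = new ArrayList<>();
        for (String t : new String[] {"00:14", "00:22", "00:47", "01:07", "01:19", "01:53"}) {
            arrivals.add(LocalDateTime.parse("2014-08-01T" + t));
        }
        return lookup(periods, arrivals);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [75.0, 143.0, 126.0, 125.0, 125.0, 122.0]
    }
}
```

With hourly period starts, as in the first example, the same floorEntry lookup reproduces the hourly table too.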

Related

How to get the difference of weekdays?

I have this information:
Days Dec'15 Jan'16
---------------------
Sun 27
Mon 28
Tue 29
Wed 30
Thu 31
Fri 1
Sat 2
I have 1st Jan'16, so I need to get Fri and then the number of days since Sunday. In this case the difference should be 5, because there are 5 other days before Friday. For 2nd Jan'16 it should be 6, and so on.
How can I do this easily with date functions?
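The following code may help you. Calendar numbers weekdays from 1 (Sunday) to 7 (Saturday), so subtract 1 to get the number of days since Sunday:
import Foundation

extension Date {
    // weekday component: 1 = Sunday ... 7 = Saturday in the Gregorian calendar
    func weekdayDifference() -> Int {
        return Calendar.current.component(.weekday, from: self) - 1
    }
}
Example
let d = Date().weekdayDifference()
print(d)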
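For comparison, here is a sketch of the same calculation with Java's java.time API. The class and method names are hypothetical. java.time numbers days ISO-style (Monday = 1 ... Sunday = 7), so taking the value modulo 7 turns Sunday into 0 and leaves Monday through Saturday as 1 through 6.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;

public class WeekdayOffset {
    // Days since the most recent Sunday: Sunday = 0, Monday = 1, ..., Saturday = 6.
    // DayOfWeek.getValue() is ISO-8601 numbered (Monday = 1 ... Sunday = 7), so % 7 maps Sunday to 0.
    static int daysSinceSunday(LocalDate date) {
        DayOfWeek day = date.getDayOfWeek();
        return day.getValue() % 7;
    }

    public static void main(String[] args) {
        System.out.println(daysSinceSunday(LocalDate.of(2016, 1, 1))); // Friday   -> 5
        System.out.println(daysSinceSunday(LocalDate.of(2016, 1, 2))); // Saturday -> 6
    }
}
```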

DataTables in mvc

I have a string named s, which looks like this:
"HD-M-16-H-000001*1001*1 HD-M-16-H-000001*1001 HD-M-16-H 000001 15 JUL 201614 JUL 20170816ACHEONG SIEW FUNG 12345678I 22 DEC 1960SPO F-
HD-M-16-H-000001*1001*2 HD-M-16-H-000001*1001 HD-M-16-H 000001 15 JUL 201614 JUL 20170816ACHEONG SIEW FUNG 12345678I 22 DEC 1960SPO F-
HD-M-16-H-000001*1001*3 HD-M-16-H-000001*1001 HD-M-16-H 000001 15 JUL 201614 JUL 20170816ACHEONG SIEW FUNG 12345678I 22 DEC 1960SPO F- "
I split the string into rows with this code:
string s = sr.ReadToEnd();
string[] Mem = s.Split('\r');
So now the string[] Mem looks like this:
"HD-M-16-H-000001*1001*1 HD-M-16-H-000001*1001 HD-M-16-H 000001 15 JUL 201614 JUL 20170816ACHEONG SIEW FUNG 12345678I 22 DEC 1960SPO F- "
"HD-M-16-H-000001*1001*2 HD-M-16-H-000001*1001 HD-M-16-H 000001 15 JUL 201614 JUL 20170816ACHEONG SIEW FUNG 12345678I 22 DEC 1960SPO F- "
"HD-M-16-H-000001*1001*3 HD-M-16-H-000001*1001 HD-M-16-H 000001 15 JUL 201614 JUL 20170816ACHEONG SIEW FUNG 12345678I 22 DEC 1960SPO F- "
Now I want to create a DataTable with n columns that holds each word of each string in Mem.
The code below takes each line of the string[] and stores it in a string. Then I split the string:
string firststr = Mem.First();
string secondstr = Mem[1];
string thirdstr = Mem[2];
string str = firststr.Substring(0, 30);
string[] m = str.Split('\r');
What I actually want is to store all three lines in one string[] and split each line with Substring inside a loop (foreach or while). How can I achieve this?
Any help appreciated. Thanks in advance!
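The idea behind the requested loop, sketched in Java rather than C#: split the text into lines, then split each line on runs of whitespace, so that each line becomes one row with one cell per word. A List<String[]> stands in for the DataTable here, and toRows/columnCount are hypothetical helper names. Fields that are glued together in the data (for example 201614 or 20170816ACHEONG) are not separated by a whitespace split. Those would need fixed Substring positions instead.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitToRows {
    // Splits text into lines (\R matches \r\n, \n or \r), then each line into
    // whitespace-separated words; each String[] is one row, one cell per word.
    static List<String[]> toRows(String text) {
        List<String[]> rows = new ArrayList<>();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines, e.g. from a trailing newline
            }
            rows.add(trimmed.split("\\s+"));
        }
        return rows;
    }

    // The table needs as many columns as the longest row has words.
    static int columnCount(List<String[]> rows) {
        int max = 0;
        for (String[] row : rows) {
            max = Math.max(max, row.length);
        }
        return max;
    }

    public static void main(String[] args) {
        // Shortened excerpt of the question's data.
        List<String[]> rows = toRows("HD-M-16-H-000001*1001*1 15 JUL\r\nHD-M-16-H-000001*1001*2 15 JUL\r\n");
        System.out.println(rows.size() + " rows, " + columnCount(rows) + " columns");
        for (String[] row : rows) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```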

Get hours and minutes from NSDate [duplicate]

This question already has answers here:
Getting date from [NSDate date] off by a few hours
(3 answers)
Closed 6 years ago.
I saved a date in an SQLite database. Now I am trying to get the hours and the minutes, but the hours are shifted by 2.
print(calendar.timeZone)
while result.next() {
var hour = 0
var minute = 0
let calendar = NSCalendar.currentCalendar()
if #available(iOS 8.0, *) {
calendar.getHour(&hour, minute: &minute, second: nil, nanosecond: nil, fromDate: result.dateForColumn("time"))
print(result.dateForColumn("time"))
print("the hour is \(hour) and minute is \(minute)")
}
}
I get the following output:
Europe/Berlin (GMT+2) offset 7200 (Daylight)
2016-08-17 18:44:57 +0000
the hour is 20 and minute is 44
2016-08-18 18:44:57 +0000
the hour is 20 and minute is 44
2016-08-19 15:44:57 +0000
the hour is 17 and minute is 44
2016-08-18 16:44:57 +0000
the hour is 18 and minute is 44
2016-08-17 18:44:57 +0000
the hour is 20 and minute is 44
2016-08-18 18:44:57 +0000
the hour is 20 and minute is 44
2016-08-19 15:44:57 +0000
the hour is 17 and minute is 44
2016-08-18 16:44:57 +0000
the hour is 18 and minute is 44
The time zone is correct. I tried two other solutions, but the problem is always the same.
The value of result.dateForColumn("time") is printed in UTC (note the +0000), whereas the hour and minute are computed in the calendar's time zone (Europe/Berlin, GMT+2). Both describe the same instant; only the time zone used for display differs.
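The same effect can be reproduced with java.time in Java, which is chosen here only as an illustration since the question uses Swift. One stored instant yields a different hour of day depending on the time zone it is viewed in. The hourIn helper is a hypothetical name.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class SameInstant {
    // Hour of day of an instant as seen in the given time zone.
    static int hourIn(Instant instant, ZoneId zone) {
        return instant.atZone(zone).getHour();
    }

    public static void main(String[] args) {
        // The first value printed in the question, shown with +0000 (UTC).
        Instant stored = Instant.parse("2016-08-17T18:44:57Z");
        System.out.println(hourIn(stored, ZoneOffset.UTC));             // 18
        System.out.println(hourIn(stored, ZoneId.of("Europe/Berlin"))); // 20 (CEST, UTC+2 in August)
    }
}
```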

Is my gradient checking method correct and my gradient calculation wrong, or vice versa?

My network only achieves around 80% accuracy, but the reported best score is around 85%. I am using the same input data and the same initialization. I don't know what's wrong, so I tried to check my gradients and implemented what is recommended for gradient checking: http://ufldl.stanford.edu/tutorial/supervised/DebuggingGradientChecking/
But I'm not sure if my implementation is correct:
public void gradientchecking(double[] theta){
System.out.println("Gradient Checking started");
//costfunction returns cost and gradients
IPair<Double, double[]> org = costfunction(theta);
double[] theta_pos = new double[theta.length];
double[] theta_neg = new double[theta.length];
for (int i = 0; i < theta.length; i++) {
theta_pos[i]= theta[i];
theta_neg[i]=theta[i];
}
double mu = 1e-5;
for (int k = 0; k < 20; k++) {
theta_pos[k] = theta_pos[k] + mu;
theta_neg[k] = theta_neg[k] - mu;
IPair<Double, double[]> pos = costfunction(theta_pos);
IPair<Double, double[]> neg = costfunction(theta_neg);
System.out.println("Org: "+org.getSecond()[k] +" check:"+ ((pos.getSecond()[k]-neg.getSecond()[k])/(2*mu)));
theta_pos[k] = theta_pos[k] - mu;
theta_neg[k] = theta_neg[k] + mu;
}
}
}
I got the following result with a freshly initialized theta:
Gradient Checking started
Cost: 1.1287071297725055 | Wrong: 124 | start: Thu Jul 30 22:57:08 CEST 2015 |end: Thu Jul 30 22:57:18 CEST 2015
Cost: 1.128707130295382 | Wrong: 124 | start: Thu Jul 30 22:57:18 CEST 2015 |end: Thu Jul 30 22:57:28 CEST 2015
Cost: 1.1287071292496391 | Wrong: 124 | start: Thu Jul 30 22:57:28 CEST 2015 |end: Thu Jul 30 22:57:38 CEST 2015
Org: 5.2287135944026004E-5 check:1.0184607936733826E-4
Cost: 1.1287071299252593 | Wrong: 124 | start: Thu Jul 30 22:57:38 CEST 2015 |end: Thu Jul 30 22:57:47 CEST 2015
Cost: 1.1287071296197628 | Wrong: 124 | start: Thu Jul 30 22:57:47 CEST 2015 |end: Thu Jul 30 22:57:56 CEST 2015
Org: 1.5274823511207024E-5 check:1.141254586229615E-4
Cost: 1.1287071299063134 | Wrong: 124 | start: Thu Jul 30 22:57:56 CEST 2015 |end: Thu Jul 30 22:58:05 CEST 2015
Cost: 1.1287071296387077 | Wrong: 124 | start: Thu Jul 30 22:58:05 CEST 2015 |end: Thu Jul 30 22:58:14 CEST 2015
Org: 1.3380293717695182E-5 check:1.0008639478696018E-4
Cost: 1.1287071297943114 | Wrong: 124 | start: Thu Jul 30 22:58:14 CEST 2015 |end: Thu Jul 30 22:58:23 CEST 2015
Cost: 1.1287071297507094 | Wrong: 124 | start: Thu Jul 30 22:58:23 CEST 2015 |end: Thu Jul 30 22:58:32 CEST 2015
Org: 2.1800899147740388E-6 check:9.980780136716263E-5
That indicates that either my gradient calculation or the gradientchecking() method has an error. I'm not sure which; can somebody help me?
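Note that the loop above divides the difference of two gradient arrays (pos.getSecond()[k] - neg.getSecond()[k]) by 2*mu. The gradient-checking recipe in the linked UFLDL tutorial instead differences the scalar cost, (J(θ + ε·e_k) − J(θ − ε·e_k)) / (2ε). In the question's code that would presumably be pos.getFirst() - neg.getFirst(), assuming getFirst() holds the cost.

A minimal sketch in Java follows. A hypothetical ToDoubleFunction<double[]> stands in for costfunction, and a toy quadratic cost J(θ) = Σθ² with analytic gradient 2θ stands in for the network.

```java
import java.util.function.ToDoubleFunction;

public class GradientCheck {
    // Central-difference estimate of dJ/dtheta_k for every k:
    // (J(theta + eps * e_k) - J(theta - eps * e_k)) / (2 * eps).
    // It differences the scalar cost J, not gradient vectors.
    static double[] numericalGradient(ToDoubleFunction<double[]> cost, double[] theta, double eps) {
        double[] grad = new double[theta.length];
        double[] work = theta.clone(); // private copy, so the caller's theta is never modified
        for (int k = 0; k < work.length; k++) {
            double original = work[k];
            work[k] = original + eps;
            double plus = cost.applyAsDouble(work);
            work[k] = original - eps;
            double minus = cost.applyAsDouble(work);
            work[k] = original; // restore exactly instead of adding/subtracting eps again
            grad[k] = (plus - minus) / (2 * eps);
        }
        return grad;
    }

    // ||a - b|| / ||a + b||: a scale-free way to compare analytic and numerical gradients
    // (undefined if both vectors are all zeros).
    static double relativeError(double[] a, double[] b) {
        double diff = 0, sum = 0;
        for (int i = 0; i < a.length; i++) {
            diff += (a[i] - b[i]) * (a[i] - b[i]);
            sum += (a[i] + b[i]) * (a[i] + b[i]);
        }
        return Math.sqrt(diff) / Math.sqrt(sum);
    }

    public static void main(String[] args) {
        // Toy cost J(theta) = sum of theta_i^2, whose analytic gradient is 2 * theta.
        ToDoubleFunction<double[]> cost = th -> {
            double s = 0;
            for (double v : th) s += v * v;
            return s;
        };
        double[] theta = {0.5, -1.0, 2.0};
        double[] analytic = {1.0, -2.0, 4.0};
        double[] numeric = numericalGradient(cost, theta, 1e-5);
        System.out.println("relative error: " + relativeError(analytic, numeric));
    }
}
```

A correct analytic gradient should give a tiny relative error here. A large one points at the gradient code rather than the check.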
In Java, arrays are reference types.
int[] arr = { 8,7,6,5,4,3,2,1,8};
int[] b = arr;
b [0] = -10;
for (int i:arr) {
System.out.print (' ');
System.out.print (i);
}
This outputs -10 7 6 5 4 3 2 1 8
So I think you are creating the arrays incorrectly:
double[] theta_pos = theta;
double[] theta_neg = theta;
They are just references to theta, so changing their contents also changes theta, and +mu-mu = 0. Use the clone() method when copying an array:
double[] theta_pos = theta.clone();
double[] theta_neg = theta.clone();
But remember that clone() may not work as you expect in some cases: it makes a shallow copy, which is ideal for arrays of simple (non-reference) types. See this question:
Does calling clone() on an array also clone its contents?
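Here is a small Java sketch of the clone() caveat. Cloning a double[] copies the primitive values. Cloning a double[][] copies only the outer array, so the rows are still shared. deepCopy is a hypothetical helper that copies each row as well.

```java
public class CloneDepth {
    // Copies the outer array and every row, so no row is shared with the source.
    static double[][] deepCopy(double[][] source) {
        double[][] copy = source.clone();
        for (int i = 0; i < copy.length; i++) {
            copy[i] = source[i].clone();
        }
        return copy;
    }

    public static void main(String[] args) {
        double[] flat = {1.0, 2.0};
        double[] flatCopy = flat.clone();
        flatCopy[0] = 99.0;
        System.out.println(flat[0]); // 1.0: primitive elements were copied

        double[][] nested = {{1.0, 2.0}, {3.0, 4.0}};
        double[][] shallow = nested.clone();
        shallow[0][0] = 99.0;
        System.out.println(nested[0][0]); // 99.0: the row is shared with the clone

        double[][] deep = deepCopy(nested);
        deep[1][0] = -1.0;
        System.out.println(nested[1][0]); // 3.0: the deep copy has its own rows
    }
}
```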

Script to parse a large text file, from rows to columns

I want a parsing script or code solution in any programming language.
The file is too large to open in Excel.
Problem:
I have a large (300 MB) text file that looks like this:
[0] 2014 Jul 23 08:15:16.675
Current SFN = 604
Current Subframe Number = 3
Is Restricted = false
Cell Timing[0] = 298955
[1] 2014 Jul 24 08:15:16.675
Current SFN = 605
Current Subframe Number = 4
Is Restricted = false
Cell Timing[0] = 298900
[2] 2014 Jul 25 08:15:16.675
Current SFN = 700
Current Subframe Number = 7
Is Restricted = false
Cell Timing[0] = 39025
Wanted output:
Five columns (Date, Current SFN, Current Subframe Number, Is Restricted, Cell Timing[0]):
Date Current SFN Current Subframe Number Is Restricted Cell Timing[0]
2014 Jul 23 08:15:16.675 604 3 TRUE 298955
2014 Jul 24 08:15:16.675 605 4 FALSE 298900
2014 Jul 25 08:15:16.675 700 7 FALSE 39025
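Here is a minimal streaming sketch in Java. It assumes every record starts with a line like "[0] 2014 Jul 23 08:15:16.675" followed by "name = value" lines. It reads one line at a time, so a 300 MB file never has to fit in memory, and it writes tab-separated rows that Excel or other spreadsheet tools can open. The class and method names and the tab separator are illustrative choices. Values are copied as they appear in the input, so false stays lowercase.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RecordsToColumns {
    // Record header, e.g. "[0] 2014 Jul 23 08:15:16.675"; group 1 is the date.
    private static final Pattern HEADER = Pattern.compile("\\[\\d+\\]\\s+(.*)");
    private static final String[] FIELDS = {
        "Current SFN", "Current Subframe Number", "Is Restricted", "Cell Timing[0]"
    };

    // Reads records line by line and writes one tab-separated row per record.
    static void convert(Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        out.write("Date\t" + String.join("\t", FIELDS) + "\n");
        String date = null;
        Map<String, String> values = new HashMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            String trimmed = line.trim();
            Matcher header = HEADER.matcher(trimmed);
            if (header.matches()) {
                if (date != null) writeRow(out, date, values); // flush the previous record
                date = header.group(1);
                values.clear();
            } else {
                int eq = trimmed.indexOf('=');
                if (eq >= 0) {
                    values.put(trimmed.substring(0, eq).trim(), trimmed.substring(eq + 1).trim());
                }
            }
        }
        if (date != null) writeRow(out, date, values); // flush the last record
    }

    private static void writeRow(Writer out, String date, Map<String, String> values) throws IOException {
        StringBuilder row = new StringBuilder(date);
        for (String field : FIELDS) {
            row.append('\t').append(values.getOrDefault(field, "")); // empty cell if a field is missing
        }
        out.write(row.append('\n').toString());
    }

    // Usage: java RecordsToColumns input.txt output.tsv
    public static void main(String[] args) throws IOException {
        try (Reader in = Files.newBufferedReader(Paths.get(args[0]));
             Writer out = Files.newBufferedWriter(Paths.get(args[1]))) {
            convert(in, out);
        }
    }
}
```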
