I have a dataframe of daily item sales; the goal is to forecast future sales so the warehouse can be supplied properly. I'm using XGBoost as the regressor.
date                 qta      prezzo  year  day  dayofyear  month  week  dayofweek  festivo
2014-01-02 00:00:00  6484.8   1       2014  2    2          1      1     3          1
2014-01-03 00:00:00  5300     1       2014  3    3          1      1     4          1
2014-01-04 00:00:00  2614.9   1.1     2014  4    4          1      1     5          1
2014-01-07 00:00:00  114.3    1.1     2014  7    7          1      2     1          0
2014-01-09 00:00:00  11490    1       2014  9    9          1      2     3          0
The date is also the index of my dataframe. Qta is the label (the dependent variable) and all the others are the features.
As you can see, the sampling is daily but some days are missing (e.g. the 5th, 6th and 8th).
Could this be a problem when fitting and when predicting future days?
Am I supposed to fill the missing days with qta = 0?
I'm having trouble filtering a column by month/year and counting the unique values. I started trying with ARRAYFORMULA, then with QUERY, but without success.
A           B           C         D         E        F            G
Date        Start Time  End Time  Duration  Month    Worked Days  Total Duration
01/06/2022  05:06       08:56     3h50min   06/2022  9 days       31h47min
02/06/2022  05:08       08:43     3h35min   07/2022  5 days       24h36min
02/06/2022  15:25       16:57     1h32min
03/06/2022  05:13       08:24     3h11min
04/06/2022  05:11       09:24     4h13min
06/06/2022  13:05       14:36     1h31min
07/06/2022  05:20       08:27     3h07min
08/06/2022  05:08       08:52     3h44min
09/06/2022  05:09       09:17     4h08min
10/06/2022  05:11       08:07     2h56min
01/07/2022  05:10       09:43     4h33min
02/07/2022  05:23       07:43     2h20min
04/07/2022  05:08       07:41     2h33min
04/07/2022  20:57       21:59     1h02min
05/07/2022  05:13       09:54     4h41min
06/07/2022  05:10       09:38     4h28min
06/07/2022  15:11       18:05     2h54min
06/07/2022  20:00       22:05     2h05min
Columns A to D are what I have. Columns E to G are what I expect.
One of the problems is that the same day is sometimes repeated.
Try:
=ARRAYFORMULA(QUERY({TEXT(A3:A; "mm/e")\
IF(COUNTIFS(A3:A; A3:A; ROW(A3:A); "<="&ROW(A3:A))=1; 1; 0)\ C3:C-B3:B};
"select Col1,sum(Col2),sum(Col3) where Col3>0
group by Col1 label sum(Col2)'',sum(Col3)''
format sum(Col3)'[h]\hmm\min'"))
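To check the expected E to G columns outside Sheets, the same grouping can be sketched in plain Ruby. This is only an illustration: the constant ROWS and the helpers minutes and summarize are made-up names, the rows are a subset of the sample table, and the dates are assumed to be dd/mm/yyyy.

```ruby
require "date"

# Subset of the sample rows: [date (dd/mm/yyyy), start time, end time].
ROWS = [
  ["02/06/2022", "05:08", "08:43"],
  ["02/06/2022", "15:25", "16:57"],
  ["01/07/2022", "05:10", "09:43"],
  ["02/07/2022", "05:23", "07:43"],
  ["04/07/2022", "05:08", "07:41"],
  ["04/07/2022", "20:57", "21:59"],
  ["05/07/2022", "05:13", "09:54"],
  ["06/07/2022", "05:10", "09:38"],
  ["06/07/2022", "15:11", "18:05"],
  ["06/07/2022", "20:00", "22:05"]
].freeze

# Converts "hh:mm" to minutes since midnight.
def minutes(hhmm)
  hours, mins = hhmm.split(":").map(&:to_i)
  hours * 60 + mins
end

# Returns [month, distinct days worked, total duration] for each month.
def summarize(rows)
  by_month = rows.group_by do |date, _start, _finish|
    Date.strptime(date, "%d/%m/%Y").strftime("%m/%Y")
  end
  by_month.map do |month, entries|
    worked_days = entries.map(&:first).uniq.size # repeated days count once
    total = entries.sum { |_date, start, finish| minutes(finish) - minutes(start) }
    [month, worked_days, format("%dh%02dmin", *total.divmod(60))]
  end
end

summarize(ROWS).each { |month, days, total| puts "#{month}  #{days} days  #{total}" }
```

With all eight July rows included, the July line agrees with the expected 5 days and 24h36min from the table.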
I have data which can be used to find the amount of snowfall in a particular month.
MONTH SNOWFALL INDEX
Jan 0.25
Feb 0.1
Mar 0.6
Apr 0.99
May 0.2
Jun 0.2
Jul 0.01
Aug 0.09
Sep 1.0
Oct 0.5
Nov 0.8
Dec 0.39
To calculate how much snow falls in each month, I have the following formula:
snowfall_amount = (130 - snowfall_index) / 90
I want to write a formula which adds up the amount of snowfall between the months of March and April. Normally, I would create a third column and make the formula:
=(130 - $B2) / 90
and then drag that formula down. Then my solution would be:
=SUM($C4:$C5)
However here I am looking for a one-cell solution. Intuitively it seems like this is the job for a Summation but I don't see any way to do that through formulas.
Try
=ArrayFormula(sum((130-index(B2:B,match(C2,A2:A,0)):index(B2:B,match(D2,A2:A,0)))/90))
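As a cross-check of what the one-cell formula should return, here is a small Ruby sketch of the same summation. The names SNOWFALL_INDEX and snowfall_between are hypothetical; the data and the (130 - index) / 90 formula come from the question.

```ruby
# Snowfall index per month, copied from the question's table.
SNOWFALL_INDEX = {
  "Jan" => 0.25, "Feb" => 0.1, "Mar" => 0.6, "Apr" => 0.99,
  "May" => 0.2, "Jun" => 0.2, "Jul" => 0.01, "Aug" => 0.09,
  "Sep" => 1.0, "Oct" => 0.5, "Nov" => 0.8, "Dec" => 0.39
}.freeze

# Sums (130 - index) / 90 over the months from first_month to last_month inclusive.
def snowfall_between(first_month, last_month)
  months = SNOWFALL_INDEX.keys
  span = months.index(first_month)..months.index(last_month)
  SNOWFALL_INDEX.values[span].sum { |index| (130 - index) / 90 }
end

puts snowfall_between("Mar", "Apr")
```

For March to April this is (130 - 0.6) / 90 + (130 - 0.99) / 90 = 258.41 / 90, roughly 2.87.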
I'm scratching my head trying to work with time functions within Cognos 10.2.1 (Report Studio), using an Informix db as a data source.
My time field is stored as a smallint, 4 digits, representing the 24 hour clock. I am trying to get the time to display as 6:00pm, 11:30am, 3:00pm, etc. I have a separate data expression that calculates the string 'AM' or 'PM' depending on the hour value, but I'm running into some errors when doing the overall concat/substring function.
case when char_length([Query1].[beg_tm]) = 4
then (substring(cast([StartTime], char(5)), 1, 2)) || ':' || (substring (cast ([StartTime], char(5)), 3, 2)) || ([beg_AMPMcalc])
when char_length([Query1].[beg_tm]) = 3
then (substring(cast([StartTime], char(5)), 1, 1)) || ':' || (substring(cast ([StartTime], char(5)), 3, 2)) || ([beg_AMPMcalc])
else '--'
end
Why not use DATETIME HOUR TO MINUTE; at least you then only have to deal with converting 24 hour clock to 12 hour clock. Is midnight stored as 0 and noon as 1200, and the minute before midnight as 2359? Cognos uses a fairly modern version of Informix, I believe, so you should be able to use the TO_CHAR function:
DROP TABLE IF EXISTS times;
CREATE TEMP TABLE times(p_time SMALLINT);
INSERT INTO times VALUES(0);
INSERT INTO times VALUES(59);
INSERT INTO times VALUES(100);
INSERT INTO times VALUES(845);
INSERT INTO times VALUES(1159);
INSERT INTO times VALUES(1200);
INSERT INTO times VALUES(1259);
INSERT INTO times VALUES(1300);
INSERT INTO times VALUES(1815);
INSERT INTO times VALUES(2359);
SELECT TO_CHAR(CURRENT HOUR TO MINUTE, "%I:%M %p"),
p_time,
DATETIME(00:00) HOUR TO MINUTE + MOD(p_time, 100) UNITS MINUTE + (p_time/100) UNITS HOUR,
TO_CHAR(DATETIME(00:00) HOUR TO MINUTE + MOD(p_time, 100) UNITS MINUTE + (p_time/100) UNITS HOUR, "%I:%M %p")
FROM times;
Output:
03:49 AM 0 00:00 12:00 AM
03:49 AM 59 00:59 12:59 AM
03:49 AM 100 01:00 01:00 AM
03:49 AM 845 08:45 08:45 AM
03:49 AM 1159 11:59 11:59 AM
03:49 AM 1200 12:00 12:00 PM
03:49 AM 1259 12:59 12:59 PM
03:49 AM 1300 13:00 01:00 PM
03:49 AM 1815 18:15 06:15 PM
03:49 AM 2359 23:59 11:59 PM
I'm using a database server that has its local time set to UTC, and I'm in time zone -07:00 (US/Pacific); the current time isn't the middle of the night where I am.
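The arithmetic behind that conversion (split the smallint into hours and minutes, then map the 24-hour value to a 12-hour one) can be sketched outside the database too. The Ruby below is only an illustration with a made-up helper name, format_hhmm; it assumes midnight is stored as 0 and noon as 1200, as asked above.

```ruby
# Formats a 24-hour clock value stored as an integer (e.g. 1815) as "06:15 PM".
def format_hhmm(p_time)
  hour, minute = p_time.divmod(100)   # 1815 -> [18, 15]
  suffix = hour < 12 ? "AM" : "PM"
  hour12 = hour % 12
  hour12 = 12 if hour12.zero?         # 0 and 12 both display as 12
  format("%02d:%02d %s", hour12, minute, suffix)
end

[0, 59, 845, 1200, 1300, 1815, 2359].each do |value|
  puts "#{value} -> #{format_hhmm(value)}"
end
```

The values it prints line up with the last column of the output above, for example 1300 becomes 01:00 PM.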
I want to schedule Jenkins to run a certain job at 8:00 am every Monday, Wednesday, Thursday and Friday, and at 8:00 am every other Tuesday.
Right now, the best I can think of is:
# 8am every Monday, Wednesday, Thursday, and Friday:
0 8 * * 1,3-5
# 8am on specific desired Tuesdays, one line per month:
0 8 13,27 3 2
0 8 10,24 4 2
0 8 8,22 5 2
0 8 5,19 6 2
0 8 3,17,31 7 2
0 8 14,28 8 2
0 8 11,25 9 2
0 8 9,23 10 2
0 8 6,20 11 2
0 8 4,18 12 2
which is fine (if ugly) for the remainder of 2012, but it almost certainly won't do what I want in 2013.
Is there a more concise way to do this, or one that's year-independent?
This is something that comes up quite often, see e.g. this document, this forum thread or this stackoverflow question.
The answer is basically no. What I would do in your situation is to run the job every Tuesday and have the first build step check whether to actually run by e.g. checking whether a file exists and only running if it doesn't. If it exists, it would be deleted so that the job can run the next time this check occurs. You would of course also have to check whether it's Tuesday.
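A minimal sketch of that first build step, written in Ruby purely for illustration. The function name run_this_tuesday? and the marker file path are hypothetical, and it assumes that a run which does go ahead creates the marker file so that the following Tuesday is skipped.

```ruby
require "date"

# Returns true when the job should really run. Tuesdays alternate by toggling
# a marker file; any other weekday returns false and leaves the marker alone.
def run_this_tuesday?(flag_path, today = Date.today)
  return false unless today.tuesday?

  if File.exist?(flag_path)
    File.delete(flag_path)            # skip this week, allow the next one
    false
  else
    File.write(flag_path, today.to_s) # assumption: running creates the marker
    true
  end
end
```

The build step would then exit early whenever the function returns false. The marker has to live somewhere that survives between builds, for example outside a workspace that is wiped on each run.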
I got you fam: crontab.guru
10 22 1-7,14-21,28-31 * 6
If you abandon every other Tuesday, and can be satisfied with the first and third Tuesdays a month, the following should work:
0 9 1-7 * 2
0 9 15-21 * 2
You're running every day from 1-7, but only on Tuesday, and every day from 15-21, again only on Tuesday. A Tuesday will occur only once in each of those intervals.
Yes, it's not strictly every other week, as a 5-Tuesday month will throw off your cadence, but here you have a predictable job schedule that doesn't need to be adjusted in Jenkins as time goes on.
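To see where the cadence slips, the intended dates can be listed with a few lines of Ruby. This only models the dates the two lines are meant to select (Tuesdays falling on days 1-7 or 15-21); it does not model how Jenkins evaluates cron fields, and first_and_third_tuesdays is a made-up name.

```ruby
require "date"

# Lists the Tuesdays of a month that fall on days 1-7 or 15-21.
def first_and_third_tuesdays(year, month)
  first_day = Date.new(year, month, 1)
  last_day = Date.new(year, month, -1) # -1 means the last day of the month
  (first_day..last_day).select do |date|
    date.tuesday? && ((1..7).cover?(date.day) || (15..21).cover?(date.day))
  end
end

puts first_and_third_tuesdays(2012, 7).map(&:to_s).inspect
puts first_and_third_tuesdays(2012, 8).map(&:to_s).inspect
```

July 2012 has five Tuesdays, so the gap between the run on 17 July and the next one on 7 August is three weeks rather than two.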
I use Excel to generate the cron expressions. The following formulas generate every other Monday at 8:00 AM starting from Oct 22.
   A       B           C         D
1  41204   =MONTH(A1)  =DAY(A1)  =CONCATENATE("0 8 ", C1, " ", B1, " 1")
2  =A1+14  =MONTH(A2)  =DAY(A2)  =CONCATENATE("0 8 ", C2, " ", B2, " 1")
This generates
   A       B   C   D
1  22-Oct  10  22  0 8 22 10 1
2  5-Nov   11  5   0 8 5 11 1
Just auto fill Row 2 to get additional days. I'm not sure how many separate expressions you can give to Jenkins. I know it works with 26 expressions.
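The same list can be produced with a short script instead of a spreadsheet. A sketch in Ruby, where biweekly_cron_lines is a hypothetical helper and the start date is taken to be 22 October 2012 (the date that serial 41204 stands for):

```ruby
require "date"

# Builds one cron line per occurrence, stepping 14 days from start_date.
def biweekly_cron_lines(start_date, count, hour: 8, minute: 0)
  Array.new(count) do |i|
    date = start_date + 14 * i
    "#{minute} #{hour} #{date.day} #{date.month} #{date.wday}"
  end
end

puts biweekly_cron_lines(Date.new(2012, 10, 22), 4)
```

The first two lines it prints are the same as column D above. Like the spreadsheet version, the list still has to be regenerated when it runs out.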
I need to calculate the number of business days between two dates. How can I pull that off using Ruby (or Rails...if there are Rails-specific helpers).
Likewise, I'd like to be able to add business days to a given date.
So if a date fell on a Thursday and I added 3 business days, it would return the next Tuesday.
Take a look at business_time. It can be used for both the things you're asking.
Calculating business days between two dates:
wednesday = Date.parse("October 17, 2018")
monday = Date.parse("October 22, 2018")
wednesday.business_days_until(monday) # => 3
Adding business days to a given date:
4.business_days.from_now
8.business_days.after(some_date)
Historical answer
When this question was originally asked, business_time didn't provide the business_days_until method so the method below was provided to answer the first part of the question.
This could still be useful to someone who didn't need any of the other functionality from business_time and wanted to avoid adding an additional dependency.
# Counts weekdays in the range (date1, date2], walking back one day at a time.
def business_days_between(date1, date2)
  business_days = 0
  date = date2
  while date > date1
    business_days += 1 unless date.saturday? || date.sunday?
    date -= 1.day # 1.day comes from ActiveSupport (Rails)
  end
  business_days
end
This can also be fine-tuned to handle the cases that Tipx mentions in the way that you would like.
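For the second part of the question, a dependency-free counterpart could look like the sketch below. It is not part of business_time; add_business_days is a made-up name, and it treats only Saturday and Sunday as non-working days (no holidays).

```ruby
require "date"

# Steps forward one calendar day at a time, counting only Monday to Friday.
def add_business_days(date, count)
  result = date
  remaining = count
  while remaining > 0
    result += 1
    remaining -= 1 unless result.saturday? || result.sunday?
  end
  result
end

thursday = Date.new(2018, 10, 18)
puts add_business_days(thursday, 3) # the following Tuesday
```

Starting from a Thursday and adding 3 business days lands on the next Tuesday, as the question expects.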
We used to use the algorithm suggested in mikej's answer and discovered that calculating 25,000 ranges of several years each takes 340 seconds.
Here's another algorithm with asymptotic complexity O(1). It does the same calculations in 0.41 seconds.
# Calculates the number of business days in range (start_date, end_date]
#
# @param start_date [Date]
# @param end_date [Date]
#
# @return [Integer]
def business_days_between(start_date, end_date)
  days_between = (end_date - start_date).to_i
  return 0 unless days_between > 0

  # Assuming we need to calculate days from 9th to 25th, 10-23 are covered
  # by whole weeks, and 24-25 are extra days.
  #
  # Su Mo Tu We Th Fr Sa    # Su Mo Tu We Th Fr Sa
  #        1  2  3  4  5    #        1  2  3  4  5
  #  6  7  8  9 10 11 12    #  6  7  8  9 ww ww ww
  # 13 14 15 16 17 18 19    # ww ww ww ww ww ww ww
  # 20 21 22 23 24 25 26    # ww ww ww ww ed ed 26
  # 27 28 29 30 31          # 27 28 29 30 31
  whole_weeks, extra_days = days_between.divmod(7)

  unless extra_days.zero?
    # Extra days start from the week day next to start_date,
    # and end on end_date's week day. The position of the
    # start date in a week can be either before (the left calendar)
    # or after (the right one) the end date.
    #
    # Su Mo Tu We Th Fr Sa    # Su Mo Tu We Th Fr Sa
    #        1  2  3  4  5    #        1  2  3  4  5
    #  6  7  8  9 10 11 12    #  6  7  8  9 10 11 12
    # ## ## ## ## 17 18 19    # 13 14 15 16 ## ## ##
    # 20 21 22 23 24 25 26    # ## 21 22 23 24 25 26
    # 27 28 29 30 31          # 27 28 29 30 31
    #
    # If some of the extra_days fall on a weekend, they need to be subtracted.
    # In the first case only corner days can be days off,
    # and in the second case there are indeed two such days.
    # Note: Date#tomorrow comes from ActiveSupport (Rails).
    extra_days -= if start_date.tomorrow.wday <= end_date.wday
                    [start_date.tomorrow.sunday?, end_date.saturday?].count(true)
                  else
                    2
                  end
  end

  (whole_weeks * 5) + extra_days
end
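One way to gain confidence in a constant-time formula like this is to compare it with a day-by-day count over many ranges. The sketch below does that with the standard library only: business_days_fast restates the same arithmetic with start_date + 1 in place of ActiveSupport's tomorrow, and business_days_slow is a brute-force reference. Both names are made up for this check.

```ruby
require "date"

# Constant-time count of weekdays in (start_date, end_date], stdlib only.
def business_days_fast(start_date, end_date)
  days_between = (end_date - start_date).to_i
  return 0 unless days_between > 0

  whole_weeks, extra_days = days_between.divmod(7)
  unless extra_days.zero?
    first_extra = start_date + 1
    extra_days -= if first_extra.wday <= end_date.wday
                    [first_extra.sunday?, end_date.saturday?].count(true)
                  else
                    2
                  end
  end
  whole_weeks * 5 + extra_days
end

# Brute-force reference: walks every day in (start_date, end_date].
def business_days_slow(start_date, end_date)
  ((start_date + 1)..end_date).count { |date| (1..5).cover?(date.wday) }
end

# Compare both versions for every start weekday and range lengths up to 60 days.
base = Date.new(2016, 1, 1)
mismatches = (0...7).flat_map do |offset|
  (0..60).reject do |length|
    start_date = base + offset
    business_days_fast(start_date, start_date + length) ==
      business_days_slow(start_date, start_date + length)
  end
end
puts "mismatches: #{mismatches.size}"
```

Because the formula depends only on the starting weekday and the range length, seven start offsets cover every case up to the chosen maximum length.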
business_time has all the functionality you want.
From the readme:
#you can also calculate business duration between two dates
friday = Date.parse("December 24, 2010")
monday = Date.parse("December 27, 2010")
friday.business_days_until(monday) #=> 1
Adding business days to a given date:
some_date = Date.parse("August 4th, 1969")
8.business_days.after(some_date) #=> 14 Aug 1969
Here is my (non gem and non holiday) weekday count example:
first_date = Date.new(2016,1,5)
second_date = Date.new(2016,1,12)
count = 0
(first_date...second_date).each{|d| count+=1 if (1..5).include?(d.wday)}
count
Take a look at Workpattern. It allows you to specify working and resting periods and can add/subtract durations to/from a date as well as calculate the minutes between two dates.
You can set up workpatterns for different scenarios such as mon-fri working or sun-thu and you can have holidays and whole or part days.
I wrote this as a way to learn Ruby. Still need to make it more Ruby-ish.
Based on @mikej's answer. But this also takes into account holidays, and returns a fraction of a day (accurate to the hour):
def num_days(hi, lo)
  num_hours = 0
  while hi > lo
    num_hours += 1 if hi.workday? && !hi.holiday?
    hi -= 1.hour
  end
  num_hours.to_f / 24
end
This uses the holidays and business_time gems.
Simple script to calculate total number of working days
require 'date'
(DateTime.parse('2016-01-01')...DateTime.parse('2017-01-01')).
  inject({}) do |s, e|
    s[e.month] ||= 0
    if (1..5).include?(e.wday)
      s[e.month] += 1
    end
    s
  end
# => {1=>21, 2=>21, 3=>23, 4=>21, 5=>22, 6=>22, 7=>21, 8=>23, 9=>22, 10=>21, 11=>22, 12=>22}
There are two problems with the most popular solutions listed above:
They involve loops to count every single day between each date (meaning that performance gets worse the further apart the dates are).
They are unclear about whether they count from the beginning of the day or the end. If you count from the morning, there is one weekday between Friday and Saturday. If you count from the night, there are zero weekdays between Friday and Saturday.
After stewing over it, I propose this solution that addresses both problems. The below takes a reference date and an other date and calculates the number of weekdays between them (returning a negative number if other is before the reference date). The argument eod_base controls whether counting is done from end of day (eod) or start of day. It could be written more compactly but hopefully it's relatively easy to understand and it doesn't require gems or rails.
require 'date'
def weekdays_between(ref, otr, eod_base = true)
  dates = [ref, otr].sort
  return 0 if dates[0] == dates[1]
  full_weeks = ((dates[1] - dates[0]) / 7).floor
  # Skip past the whole weeks so that only the partial week is walked below;
  # otherwise those days would be counted twice.
  dates[0] += full_weeks * 7
  dates[eod_base ? 0 : 1] += (eod_base ? 1 : -1)
  part_week = Range.new(dates[0], dates[1])
                   .inject(0) { |m, v| (v.wday >= 1 && v.wday <= 5) ? (m + 1) : m }
  (otr <=> ref) * (full_weeks * 5 + part_week)
end