Forecasting monthly sales - time-series

The data contains order amounts for each customer from January to December 2016 and from January to June 2017. Most of the order amounts are 0, which means most customers only order once or twice a year. Some customers are new in 2017, so there are no 2016 records for them.
The data looks like this (in dollars): 0,0,0,100,0,0,0,0,70,0,0,0,0,0,0,0,0,0
How can I forecast the order amount for each customer for July-December 2017?

What you are looking for is called intermittent demand forecasting. Check out the R package tsintermittent.
This link will be helpful: https://stats.stackexchange.com/questions/127337/explain-the-croston-method-of-r
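The answer points to R; purely as an illustration of the idea, here is a minimal from-scratch sketch of Croston's method in Python. This is not the tsintermittent implementation; the smoothing parameter alpha and the simple initialization are assumptions.

```python
# Minimal from-scratch sketch of Croston's method for intermittent demand.
# Not the tsintermittent implementation; alpha = 0.1 and the
# initialization below are assumptions for illustration only.

def croston(series, alpha=0.1):
    """Return a flat per-period demand forecast for an intermittent series."""
    z = None  # smoothed demand size
    p = None  # smoothed inter-demand interval
    q = 1     # periods elapsed since the last non-zero demand
    for y in series:
        if y > 0:
            if z is None:              # first observed demand
                z, p = float(y), float(q)
            else:
                z += alpha * (y - z)   # update demand-size estimate
                p += alpha * (q - p)   # update interval estimate
            q = 1
        else:
            q += 1
    if z is None:
        return 0.0                     # no demand observed at all
    return z / p                       # expected demand per period

orders = [0, 0, 0, 100, 0, 0, 0, 0, 70, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(croston(orders))                 # one flat monthly rate, ~23.7
```

Note that Croston produces a flat demand rate per period rather than a month-by-month pattern, which is usually the best you can do with this much zero inflation.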


Given a set of daily sales, how to predict future 'Weekly' sales?

I have a dataset of daily sales in a company. The columns are: category code (4 categories), item code (195 items), day ID (from 1st Sep 2021 to 1st Feb 2022), and daily sales in qty.
In the val and test sets, I have to predict WEEKLY sales from 14th Feb 2022 to 13th March 2022. The columns are category code, item code, and week number (w1, w2, w3, w4). In the val set, I have weekly sales in qty, and in the test set, I have to predict weekly sales in qty.
Because my train set has DAILY sales and no week number, I am confused about how to approach this problem. I don't have historical sales data for the months covered by the val and test sets.
Should I map days in the train set to weeks as w1, w2, w3, w4 for each month? Are there any other good methods?
I tried expanding the val set by dividing weekly sales by 7 and replacing each week row with 7 daily rows, but it gave me very bad results.
I have to use the MAPE metric.
Welcome to the community!
Since you are asked to predict on a weekly basis, it is better to transform your training data to weeks.
A pandas method for this is resample(); you can learn more about it in the documentation here. You can change the offset string to the one that matches the way the validation set was built. All the available choices can be found here.
You may find this useful too.
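For example, a minimal sketch of that transformation, assuming a frame with date, category, item, and qty columns (the names are assumptions) and Monday-to-Sunday validation weeks:

```python
import pandas as pd

# Toy daily sales; assumed column names: date, category, item, qty.
df = pd.DataFrame({
    "date": pd.date_range("2021-09-01", periods=14, freq="D"),
    "category": "c1",
    "item": "i1",
    "qty": range(14),
})

# Roll daily sales up to weekly totals per (category, item).
# 'W-SUN' means weeks ending Sunday, i.e. Monday-to-Sunday weeks;
# match the offset string to how the validation weeks were defined
# (w1..w4 starting 14th Feb 2022).
weekly = (
    df.set_index("date")
      .groupby(["category", "item"])["qty"]
      .resample("W-SUN")
      .sum()
      .reset_index()
)
print(weekly)
```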

Is a regression model best for inventory data?

I'm very new to machine learning, and I currently have a real-time project that needs a forecast of the inventory required for the coming months.
I have a dataset for the last 3 years. The dataset is very clean (no nulls, no duplicates).
Example of the data:
year_month   device_type   person_type   # of devices
2017-01      laptop        employee      2
2017-01      desktop       temp          5
I have data like this up to December 2019, and now I need to predict for February 2020.
Could someone suggest which model should be used?
As for me, I'm looking at linear regression.
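No answer was recorded for this question; purely as a sketch of the linear-regression idea the asker mentions, one could fit a simple trend per (device_type, person_type) group on a month index. The toy data, the column name n_devices, and the month encoding below are assumptions.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy frame in the shape of the example; real data runs 2017-01..2019-12.
df = pd.DataFrame({
    "year_month": ["2017-01", "2017-02", "2017-03", "2017-04"],
    "device_type": "laptop",
    "person_type": "employee",
    "n_devices": [2, 3, 3, 4],   # stands in for '# of devices'
})

# Encode year_month as a monotonically increasing month ordinal.
df["t"] = pd.PeriodIndex(df["year_month"], freq="M").astype("int64")

# Fit one simple trend line per (device_type, person_type) group
# and extrapolate it to the target month.
target = pd.Period("2020-02", freq="M").ordinal
for (dev, per), g in df.groupby(["device_type", "person_type"]):
    model = LinearRegression().fit(g[["t"]].values, g["n_devices"].values)
    pred = model.predict([[target]])[0]
    print(dev, per, round(pred, 1))
```

With three years of monthly history, it is also worth checking for seasonality before committing to a plain trend line.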

How to build a query in the TFS Query Editor to show stories that were in the active state for less than 10 minutes

I am using Microsoft Visual Studio Team Foundation Server, Version 16.131.27701.1.
I would like to query the stories that are in the resolved state but were in the active state for less than 10 minutes. I am using the TFS Query Editor. Does the TFS Query Editor have a macro for minutes? If one is not available, then just getting active states of less than a day would be good. I am not sure how to use the [Field] option in the operator field to get what I want. This is the TFS query I have tried.
So, I am looking for results that look like the following: basically, stories that close very fast, within 10 minutes.
Work Item   Activated Date         Resolved Date
---------------------------------------------------------------
Story 1     1/1/2019 5:00:00 PM    1/1/2019 5:02:00 PM
Story 2     1/3/2019 4:00:00 PM    1/3/2019 4:02:00 PM
Story 3     1/5/2019 3:00:00 PM    1/5/2019 3:05:00 PM
Can you help?
You can filter work items by the date on which they were changed or by a specific time period. Limiting the scope of your query helps performance by returning only the results that fit the date range you want to include.
However, the filter cannot be precise to minutes or hours; you can only query at day granularity, for example items resolved today.
If you still want those user stories whose active state lasted less than 10 minutes, as a workaround you could export the query, with both the Activated Date and Resolved Date columns, to Excel, and then use Excel to filter for the stories where the time between Activated and Resolved is less than 10 minutes.
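If you'd rather script that last filtering step than do it by hand in Excel, here is a rough equivalent in Python; the CSV file name and the 'Work Item' column name are assumptions:

```python
import pandas as pd

# Assumes the TFS query was exported to CSV with these column names.
items = pd.read_csv("exported_query.csv",
                    parse_dates=["Activated Date", "Resolved Date"])

# Keep stories that went from Active to Resolved in under 10 minutes.
elapsed = items["Resolved Date"] - items["Activated Date"]
fast = items[elapsed < pd.Timedelta(minutes=10)]
print(fast[["Work Item", "Activated Date", "Resolved Date"]])
```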

CMS Payroll-Based Journal

I work for the State of Washington Dept. of Social & Health Services as development lead for the nursing home rates section. I need to download the Payroll-Based Journal (PBJ) from data.cms.gov to a SQL table. I am already doing this for the Star and QM tables, through a nightly job that checks whether newer data is available on the cloud and updates my tables if it is.
I have found the URLs for individual views for a specific quarter of PBJ, but I haven't found the entire dataset -- is that available (as it is for STAR and QM)?
My current method downloads the JSON for the views, but it only has Q1 and Q2 for PBJ. When I use the browse interface, there are Q1, Q2, and Q3 -- why the difference? Can I download the JSON source for the browse results instead of the views?
Your help is greatly appreciated.
PBJ data is submitted and published on a quarterly basis. So far, CMS has published 3 quarters' worth of data. For more information, you can view the Public Use File documentation here: https://data.cms.gov/Special-Programs-Initiatives-Long-Term-Care-Facili/PBJ-PUF-Documentation-2018-01-23/ygny-gzks.
The three quarters they have published are labeled with the calendar quarter actually covered rather than the expected fiscal quarter. I have contacted CMS to have this issue corrected.
Here are the three quarters they have published:
Dataset Label: PBJ Daily Nurse Staffing 2017 Q1
Actual Quarter: 2017 Q2
Expected Date Range: 10/1/2016-12/31/2016
Included Date Range: 1/1/2017-3/31/2017
Link: https://data.cms.gov/Special-Programs-Initiatives-Long-Term-Care-Facili/PBJ-Daily-Nurse-Staffing-2017-Q1/afnb-m6tn
JSON API: https://data.cms.gov/resource/pre7-wbdk.json
Dataset Label: PBJ Daily Nurse Staffing 2017 Q2
Actual Quarter: 2017 Q3
Expected Date Range: 1/1/2017-3/31/2017
Included Date Range: 4/1/2017-6/30/2017
Link: https://data.cms.gov/Special-Programs-Initiatives-Long-Term-Care-Facili/PBJ-Daily-Nurse-Staffing-2017-Q2/utrm-5phx
JSON API: https://data.cms.gov/resource/9h2r-8ja3.json
Dataset Label: PBJ Daily Nurse Staffing 2017 Q3
Actual Quarter: 2017 Q4
Expected Date Range: 4/1/2017-6/30/2017
Included Date Range: 7/1/2017-9/30/2017
Link: https://data.cms.gov/Special-Programs-Initiatives-Long-Term-Care-Facili/PBJ-Daily-Nurse-Staffing-2017-Q3/bpnu-uej3
JSON API: https://data.cms.gov/resource/krv7-6tjb.json
I work for ezPBJ, a company that specializes in assisting facilities with PBJ preparation, validation, and submission. We will soon be releasing a free tool that will allow users to compare their PBJ data with that of any other facility in the US. If you'd like to be included on our list of beta testers, send me an email at jaime#ezpbj.com.
Thanks,
Jaime
CIO, ezPBJ
I think I have figured out a solution:
http://api.us.socrata.com/api/catalog/v1?q=pbj
will give me a JSON stream that I can put into a dataset to get all the URLs I am looking for.
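For reference, a sketch of that idea in Python, assuming the response follows the usual Socrata Discovery API shape (a results list whose entries carry a resource id and a domain):

```python
import requests

# Query the Socrata discovery catalog for every PBJ dataset.
resp = requests.get("http://api.us.socrata.com/api/catalog/v1",
                    params={"q": "pbj"})
resp.raise_for_status()

# Each result carries a resource id; the id maps onto the same
# https://<domain>/resource/<id>.json endpoints listed above.
for result in resp.json()["results"]:
    res = result["resource"]
    domain = result["metadata"]["domain"]
    print(res["name"], "->", f"https://{domain}/resource/{res['id']}.json")
```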

Is there a concept of a slowly changing FACT in data warehousing?

In data warehousing, we have the concept of slowly changing dimensions. I am just wondering why there is no jargon for 'slowly/rapidly changing FACTs', because the same Type 1 and Type 2 techniques can be used to track changes in the FACT table.
According to the DW gods, there are 3 types of FACT tables:
Transaction: your basic measurements with dimension references; measurements not rolled up or summarized, with lots of DIM relations.
Periodic: rolled-up summaries of transaction fact tables over a defined period of time.
Accumulating Snapshot: measurements associated with two or more defined time periods.
From these, we have at least 2 options that will result in something pretty similar to a slowly changing fact table. It all depends on how your source system is set up.
Option 1: Transaction-based Source System
If your source system tracks changes to measurements via a series of transactions (i.e., an initial value plus a change in value, plus another change in value, etc.), then each of these transactions ends up in the transaction fact. This is then used by the periodic fact table to reflect the 'as of period' measures.
For example, if your source system tracks money in and out of an account, you would probably have a transaction fact table that pretty much mirrors the source money-in/money-out table. You would also have a periodic fact table that is updated every period (in this case, every month) to reflect the total value of the account for that period.
The periodic fact table is your slowly changing fact table.
Source               DW_Transaction_Fact     DW_Periodic_Fact
---------------   -> -------------------  -> --------------------
Acnt1 Jan +10$       Acnt1 Jan +10$          Acnt1 Jan 10$
Acnt1 Feb -1 $       Acnt1 Feb -1 $          Acnt1 Feb  9$
Acnt1 Apr +2 $       Acnt1 Apr +2 $          Acnt1 Mar  9$
                                             Acnt1 Apr 11$
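Not part of the original answer, but as a concrete sketch of that roll-up, a pandas fragment that cumulatively sums the transaction fact and carries the balance forward into the silent month (March):

```python
import pandas as pd

# Transaction fact: one row per source transaction (Option 1 example).
tx = pd.DataFrame({
    "month": pd.PeriodIndex(["2017-01", "2017-02", "2017-04"], freq="M"),
    "amount": [10, -1, 2],
})

# Roll up to a periodic fact: a running balance for every month,
# carrying the last balance into months with no transactions (Mar).
months = pd.period_range("2017-01", "2017-04", freq="M")
periodic = (tx.groupby("month")["amount"].sum()
              .cumsum()                 # balance as of each active month
              .reindex(months)          # insert the silent months
              .ffill()                  # carry the balance forward
              .rename("balance"))
print(periodic)
# 2017-01    10.0
# 2017-02     9.0
# 2017-03     9.0
# 2017-04    11.0
```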
Option 2: CRUD/Overwriting Source System
It's more likely that you have a source system that lets users directly update/replace the business measurements. At any point in time, according to the source system, there was and is only one value for each measure. You can manufacture transactions from this with some clever trickery in your ETL process, but you're only ever going to get a transaction window limited by your ETL schedule.
In this case, you could go with either a periodic fact table OR an accumulating fact table.
Let's stick with our account example, but instead of transactions, the table just stores an amount against each account. This is updated as required in the source system, so that for Acnt1 the amount was 10$ in January, 9$ in February, and 11$ in April.
Sticking with the transaction and periodic fact tables, we would end up with the following data (as at the end of April). Again, the periodic fact table is your slowly changing fact table.
DW_Transaction_Fact     DW_Periodic_Fact
-------------------  -> --------------------
Acnt1 11$               Acnt1-Jan-10$
                        Acnt1-Feb-09$
                        Acnt1-Mar-09$
                        Acnt1-Apr-11$
But we could also go with an accumulating fact table, which could contain all month values for a given year.
DW_Accumulative_Fact_CrossTab
Year   Acnt    Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
2001   Acnt1   10   9    9    11   -    -    -    -    -    -    -    -
Or a more Type 3-ish version (as at the end of April, the current value is 11):
DW_Accumulative_Fact_CrossTab
Acnt    Year   YearStartVal   CurrentVal
Acnt1   2001   10             11
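Not from the original answer, but either cross-tab shape falls out of a pivot over the periodic rows; a pandas sketch of the monthly version (toy data from the example):

```python
import pandas as pd

# Periodic fact rows from the example above.
periodic = pd.DataFrame({
    "acnt": "Acnt1",
    "year": 2001,
    "month": ["Jan", "Feb", "Mar", "Apr"],
    "balance": [10, 9, 9, 11],
})

# Accumulating cross-tab: one row per (year, acnt), one column per month.
crosstab = (periodic.pivot(index=["year", "acnt"],
                           columns="month", values="balance")
                    [["Jan", "Feb", "Mar", "Apr"]])   # fix column order
print(crosstab)
```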
Kind of relevant
In my experience, this sort of question comes up in the following common business scenario:
There is a Core Business System with a DATABASE.
The business periodically issues reports that summarize values by time period from the Core Business System.
The Core Business System allows retrospective updating of data, which is handled by overwriting values.
The business demands to know why the January figures in the same report run in June no longer match the January figures from the report run in February.
Note that you are now dealing with FOUR sets of time (the initial report period, the measurement as at the initial period, the current report period, and the measurement as at the current period), which will be hard enough for you to explain, let alone for your end users to understand.
Try to step back, explain to your end users which business measures change over time, listen to what results they want, and build your facts accordingly. Note that you may end up with multiple fact tables for the same measure; that is OK and good.
Reference:
http://www.kimballgroup.com/2008/11/fact-tables/
http://www.zentut.com/data-warehouse/fact-table/
