CMS Payroll-Based Journal - soda

I work for the State of Washington Department of Social & Health Services as the development lead for the nursing home rates section. I need to download the Payroll-Based Journal (PBJ) from data.cms.gov to a SQL table. I am already doing this for the STAR and QM tables, via a nightly job that checks whether newer data is available on the cloud and updates my tables if it is.
I have found the URLs for individual views for a specific quarter of PBJ, but I haven't found the entire dataset -- is that available (like it is for STAR and QM)?
My current method downloads the JSON for the views, but it only has Q1 and Q2 for PBJ. When I browse the site, Q1, Q2, and Q3 are all listed -- why the difference? Can I download the JSON source for the browse results instead of the views?
Your help is greatly appreciated.

PBJ data is submitted and published on a quarterly basis. So far, CMS has published three quarters' worth of data. For more information, you can view the Public Use File documentation here: https://data.cms.gov/Special-Programs-Initiatives-Long-Term-Care-Facili/PBJ-PUF-Documentation-2018-01-23/ygny-gzks.
The three quarters they have published cover calendar-quarter date ranges rather than the expected fiscal-quarter date ranges. I have contacted CMS to have this issue corrected.
Here are the three quarters they have published:
Dataset Label: PBJ Daily Nurse Staffing 2017 Q1
Actual Quarter: 2017 Q2
Expected Date Range: 10/1/2016 - 12/31/2016
Included Date Range: 1/1/2017 - 3/31/2017
Link: https://data.cms.gov/Special-Programs-Initiatives-Long-Term-Care-Facili/PBJ-Daily-Nurse-Staffing-2017-Q1/afnb-m6tn
JSON API: https://data.cms.gov/resource/pre7-wbdk.json

Dataset Label: PBJ Daily Nurse Staffing 2017 Q2
Actual Quarter: 2017 Q3
Expected Date Range: 1/1/2017 - 3/31/2017
Included Date Range: 4/1/2017 - 6/30/2017
Link: https://data.cms.gov/Special-Programs-Initiatives-Long-Term-Care-Facili/PBJ-Daily-Nurse-Staffing-2017-Q2/utrm-5phx
JSON API: https://data.cms.gov/resource/9h2r-8ja3.json

Dataset Label: PBJ Daily Nurse Staffing 2017 Q3
Actual Quarter: 2017 Q4
Expected Date Range: 4/1/2017 - 6/30/2017
Included Date Range: 7/1/2017 - 9/30/2017
Link: https://data.cms.gov/Special-Programs-Initiatives-Long-Term-Care-Facili/PBJ-Daily-Nurse-Staffing-2017-Q3/bpnu-uej3
JSON API: https://data.cms.gov/resource/krv7-6tjb.json
I work for ezPBJ, a company that specializes in assisting facilities with PBJ preparation, validation, and submission. We will soon be releasing a free tool that will allow users to compare their PBJ data with that of any other facility in the US. If you'd like to be included on our list of beta testers, send me an email at jaime#ezpbj.com.
Thanks,
Jaime
CIO, ezPBJ

I think I have figured out a solution:
http://api.us.socrata.com/api/catalog/v1?q=pbj
will give me a JSON stream that I can load into a dataset to get all the URLs I am looking for.
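As a sketch of that approach: the snippet below (Python, stdlib only) queries the Socrata Discovery API endpoint above and pulls dataset names and `/resource/<id>.json` URLs out of the response. The `results[].resource` / `metadata.domain` fields match the Discovery API's documented response shape, but treat the parsing as an assumption and verify it against a live response.

```python
import json
import urllib.parse
import urllib.request

CATALOG_URL = "http://api.us.socrata.com/api/catalog/v1"

def extract_links(catalog_json):
    """Pull (dataset name, JSON API URL) pairs out of a Discovery API response."""
    links = []
    for item in catalog_json.get("results", []):
        res = item.get("resource", {})
        domain = item.get("metadata", {}).get("domain", "")
        links.append((res.get("name"),
                      "https://{}/resource/{}.json".format(domain, res.get("id"))))
    return links

def find_datasets(query):
    """Search the catalog (e.g. find_datasets('pbj')) and return dataset links."""
    url = CATALOG_URL + "?" + urllib.parse.urlencode({"q": query})
    with urllib.request.urlopen(url) as resp:
        return extract_links(json.load(resp))
```

The returned URLs can then be polled by the nightly job the same way as the STAR and QM endpoints.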

Related

Jira JQL: how to find the busiest hours of a queue?

Jira Server v7.12.1#712002
We have noticed that at certain periods of the day there are more tickets assigned to "Operations" queue than usual, so we need to back this impression with real statistics.
We extracted all the tickets that at some point were assigned to "Operations" queue via the following query:
project = "Client Services" AND assignee WAS "Operations"
The results of the query above include a timestamp in the "Updated" field; however, that field reflects the last time the ticket was updated, which is not what we want. We want a timestamp showing when the ticket arrived in the "Operations" queue.
The tickets can arrive in two ways:
1) The ticket may come from another team. In such cases, under the History tab we can observe the fields changing their values. For example, if the ticket comes from a certain Joe Smith, it looks like this:

Joe Smith made changes - 09/04/2020 12:08
FIELD      ORIGINAL VALUE   NEW VALUE
Assignee   Joe Smith        Operations
2) The ticket may be created directly (by other teams). In such cases, the first two entries under the History tab always follow this pattern:

Joe Smith created issue - 02/04/2020 19:27
Joe Smith made changes - 02/04/2020 19:27
FIELD   ORIGINAL VALUE   NEW VALUE
Link                     Referred from ABC-12345

The pattern above is that "created issue" and "made changes" always have identical timestamps.
Based on these examples, is there some way to extract the timestamps of all tickets' arrival to "Operations" queue? If not with JQL, maybe some other solution/tool exists?
There are two ways you could achieve most of what you've asked for:
1) Use the Recently Created Chart JIRA gadget. This gives you a clear picture of the number of tickets you get in each hour of the day.
2) Use the built-in Created vs. Resolved JIRA report. This helps bring out better information from the tickets for further analysis.
You can find more details in this answer on the Atlassian Community forum. Hope this answer helps!
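For the underlying question of getting the actual arrival timestamps: JQL alone cannot return field-change times, but the Jira REST API can when an issue is fetched with `expand=changelog` (GET /rest/api/2/issue/<KEY>?expand=changelog). The helper below is a hedged Python sketch that assumes the standard changelog shape (`histories[].items[]` with `field`/`toString`); adapt it to your instance.

```python
def arrival_times(changelog, queue="Operations"):
    """Given one issue's changelog (the 'changelog' object returned when the
    issue is fetched with expand=changelog), return the timestamps at which
    the assignee was changed to the given queue."""
    times = []
    for history in changelog.get("histories", []):
        for item in history.get("items", []):
            if item.get("field") == "assignee" and item.get("toString") == queue:
                times.append(history.get("created"))
    return times
```

Running this over every issue returned by the JQL query, then bucketing the timestamps by hour, gives the "busiest hours" statistics directly.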

Get Exact Answer from IBM Watson Discovery

I use IBM Watson Discovery with my own document collection. When I enter the query "When was Stephen Hawking born?", Discovery returns related passages, one of which is "Stephen Hawking was born on 8th January 1942". What I want to learn is whether I could return just "8th January 1942" from this passage via the "DATE" entity type.
The best way to do this is probably to chunk the documents and annotate the chunks at ingestion time. Then search the chunked documents instead of using the passage retrieval feature. Passage retrieval does not currently identify entities within passages.
Another option is to try adjusting the passages.characters field. The disadvantage of this approach is that the text will probably not be truncated around the date, or at least not consistently.
Another option is to post-process the returned passages to extract/annotate the date entities from the results.
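As a sketch of the post-processing option: a simple pattern match over the returned passage texts can pull out dates written like the example above. This is only a rough illustration; a real pipeline would use a proper date parser or an entity-extraction service rather than a hand-rolled regex.

```python
import re

# Very rough matcher for dates written like "8th January 1942". Only covers
# this one day-month-year format; extend or replace for other formats.
DATE_RE = re.compile(
    r"\b\d{1,2}(?:st|nd|rd|th)?\s+"
    r"(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December)\s+\d{4}\b")

def extract_dates(passages):
    """Return all date-like strings found in a list of passage texts."""
    dates = []
    for text in passages:
        dates.extend(DATE_RE.findall(text))
    return dates
```

Applied to the passage from the question, this returns just the date string, which is what the asker wants to surface.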

Fact Table Design - How to capture a fact which precedes the data start date

We have a fact table that records when an employee selected a benefit. The problem we are trying to solve is how to count the total benefits selected by all employees.
We do have a BenefitSelectedOnDay flag, and ordinarily we can SUM it to get a result, but this only works for benefit selections made since we started loading the data.
For Example:
Suppose Client#1 has been using our analytics tool since October 2016. We have 4 months of data in the platform.
When the data is loaded in October, the Benefits source data will show:
Employee#1 selected a benefit on 4th April 2016.
Employee#2 selected a benefit on 3rd October 2016.
Setting the BenefitSelectedOnDay flag for Employee#2 is very straight forward.
The issue is what to do with Employee#1, because we can’t set a flag on a day that doesn’t exist for that client in the fact table. Client#1's data starts on 1st October 2016.
Counting the benefit selection is problematic in some scenarios. If we’re filtering the report by date and only looking at benefit selections in Q4 2016, we have no problem. But if we want a total benefit selection count, we have a problem, because we haven’t set a flag for Employee#1: the selection date precedes Client#1’s dataset range (currently Oct 1st 2016 - Jan 31st 2017).
Two approaches seem logical in your scenario:
Load some historical data going back as far as the first benefit selection date that is still relevant to current reporting. While it may take some work and extra space, this may be your only solution if employees qualify for different benefits based on how long the benefit has been active.
Add records for a single day prior to the join date (Sept 30 in this case) and flag all benefits that were selected before and are active on the Client join date (Oct 1) as being selected on that date. They will fall outside of the October reporting window but count for unbounded queries. If benefits are a binary on/off thing this should work just fine.
Personally, I would go with option 1 unless the storage requirements are ridiculous. Even then, you could load only the flagged records into the fact table. Your client might get confused if he is able to select a period prior to the joining date and get broken data, but you can explain/justify that.
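A hedged sketch of option 2 in Python (the function and column names are made up for illustration): selections that predate the client's start date are re-dated to the day before the start, so they fall outside dated reporting windows but still count in unbounded totals.

```python
from datetime import date, timedelta

def fact_rows(selections, client_start):
    """Build fact rows from (employee, selection_date) pairs, re-dating any
    selection made before client_start to the day before client_start."""
    baseline = client_start - timedelta(days=1)  # e.g. Sept 30 for an Oct 1 start
    rows = []
    for employee, selected_on in selections:
        rows.append({
            "employee": employee,
            "date": selected_on if selected_on >= client_start else baseline,
            "benefit_selected_flag": 1,
        })
    return rows
```

With this shape, a Q4-filtered report excludes the baseline rows, while an unfiltered SUM of the flag counts every selection, matching the behavior described in option 2.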

How to show multiple Date/Times per location?

Using Google Spreadsheets, I need to enter data structured like the example below.
There will be multiple "quadrants"
Each "quadrant" can contain one or many "days",
Each "day" can contain one or many "times".
This data will ultimately be imported into some backend db (e.g. Access DB, SQL Server, MySQL).
Question: For each day, how do I represent multiple times? Do I create a new row?
Quadrant One Team Schedules
Sunday
10:00 AM - Red Team
3:00 PM - Green Team
Monday
6:00 AM - Red Team
10:00 AM - Yellow Team
3:30 PM - Green Team
Tuesday
Wednesday
6:00 PM - Yellow Team
Thursday
1:00 PM - Red Team
Friday
Saturday
10:00 AM - Blue Team
3:00 PM - Red Team
I’m not quite sure what answer you are expecting, but wanting to post an image (and probably the length!) is why this is not a comment.
Poor data layout that requires changes for legibility, or changes to facilitate further processing, is, IMO, a very big issue – much more so than novices seem to appreciate (see perhaps Dunning-Kruger). Again, merely my opinion, but I think about half of all questions on SO have data layout as an issue, in whole or in part.
Some suggestions:
With databases, always have an index (ID) to identify unique records (rows). Often added automatically.
Try to ensure each record is complete for every field (nulls may cause issues). ID6 seems not required.
Use dates rather than days of the week (it is easier to get the day from the date than the date from the day!)
(Personal preference – not always viable) Use ‘scientific’ notation for dates (YYYYMMDD) to avoid ambiguity between ‘US’ and ‘UK’ systems – and the difficulties in switching between them.
Use the 24-hour clock (saves the space for AM and PM, reduces ambiguity and generally is easier to process).
Not so important nowadays, but consider codes (with a lookup table if desired) such as YL for Yellow rather than indeterminate-length strings – this saves on data storage, so less cost and more speed: win/win.
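To answer the "do I create a new row?" question directly: yes – the usual normalized layout is one row per quadrant/date/time combination. The sketch below (with made-up dates and hypothetical team codes, following the suggestions above) shows the shape such rows might take, and how to group them back by date.

```python
# One record per (quadrant, date, time, team) combination -- a new row per
# time slot. Dates use YYYYMMDD and times the 24-hour clock, per the
# suggestions above; team codes (RD, GN) stand in for full colour names.
rows = [
    {"id": 1, "quadrant": 1, "date": "20230903", "time": "10:00", "team": "RD"},
    {"id": 2, "quadrant": 1, "date": "20230903", "time": "15:00", "team": "GN"},
    {"id": 3, "quadrant": 1, "date": "20230904", "time": "06:00", "team": "RD"},
]

# Grouping back by date reproduces the day/time layout from the question.
by_date = {}
for r in rows:
    by_date.setdefault(r["date"], []).append((r["time"], r["team"]))
```

Rows in this shape import cleanly into any of the backends mentioned, and days with no times (like the Tuesday and Friday in the example) simply have no rows.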

Delta Extraction + Business Intelligence

What does Delta Extraction mean with regard to Data Warehousing?
Only picking up data that has changed since the last run. This saves you the effort of processing data you've already extracted. For example, if your last extract of customer data ran at April 1 00:00:00, your delta run would extract all customers who were added or had details updated since April 1 00:00:00. To do this, you either need an attribute that stores when a record was last updated, or you need a log scraper.
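A minimal sketch of the "last updated" approach, assuming each record carries a `last_updated` timestamp (the column name is hypothetical):

```python
from datetime import datetime

def delta_extract(records, last_run):
    """Return only the records added or updated since the previous run,
    using each record's last_updated timestamp as the watermark."""
    return [r for r in records if r["last_updated"] > last_run]
```

After each run, the job stores the new high-water mark so the next run's `last_run` picks up exactly where this one left off.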
