If I pull up a task for a particular project, there is a field called Original Estimate. The only place I can find the original estimate is in the Tfs_Warehouse database, in the FactWorkItemHistory table under the Microsoft_VSTS_Scheduling_OriginalEstimate column.
Can someone clarify how this value works? The reason I ask is that it changes in the FactWorkItemHistory table for the same task: there are some positive entries and some negative entries. If I sum up all of the Microsoft_VSTS_Scheduling_OriginalEstimate values by Task and ProjectNodeSK, I come up with the amount that shows up in the TFS task UI. I'm just wondering why the value changes across entries. In fact, I'm also curious why there are multiple WorkItem entries for the same task. I figured (incorrectly) that the WorkItem table would be one-to-one with the tasks in the TFS UI.
Because it's a data warehouse. It's showing you how the value changed over time, not just a snapshot of the current value. Each row is an additive delta (a revision that lowers the estimate shows up as a negative entry), which is why summing the entries per work item reproduces the value you see in the UI. This enables deeper reporting on trends.
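If it helps, here is a minimal sketch of that summation, assuming the column names from the question (the work-item id column may differ in your Tfs_Warehouse schema):

SELECT System_Id, ProjectNodeSK,
       SUM(Microsoft_VSTS_Scheduling_OriginalEstimate) AS CurrentOriginalEstimate
FROM FactWorkItemHistory
GROUP BY System_Id, ProjectNodeSK

Running this per task should match the Original Estimate shown on the work item form.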
I have a requirement to create a Fact table which stores the granted_share_qty awarded to employees. There are surrounding Dimensions like SPS Grant_dim, which stores info about each grant, SPS Plan Dim, which stores info about the Plan, SPS Client Dim, which stores info about the Employer, and SPS Customer Dim, which stores info about the Customer. The DimKeys (surrogate keys) and DurableKeys (supernatural keys) from each Dimension are added to the Fact.
The reporting need is "as-of", i.e. on any given date one should be able to see the granted_share_qty as of that date (similar to an account balance as of that date), along with point-in-time values of a few attributes from the Grant, Plan, Client, and Customer dimensions.
First, we thought of creating a daily snapshot table where the data is repeated every day in the fact (unless the source sends any changes). However, since there could be more than 100 million grant records, repeating this every day seemed almost impossible; moreover, the granted_share_qty doesn't change that often, so why copy it every day?
So instead of a daily snapshot we thought of adding an EFFECTIVE_DT and EXPIRATION_DT to the Fact table (like a TIMESPAN PERIODIC SNAPSHOT table, if such a thing exists).
This reduces the volume and perfectly satisfies the reporting need: "get me the granted_qty and the grant, client, plan, and customer details as of 10/01/2022" translates to "select granted_qty from fact where 10/01/2022 between EFFECTIVE_DT and EXPIRATION_DT and Fact.DimKeys = Dim.DimKeys".
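For illustration, a hedged sketch of that as-of query in Snowflake SQL (FACT_GRANT, DIM_GRANT, DIM_CLIENT and their columns are placeholder names, not our real schema):

SELECT f.granted_share_qty,
       g.grant_name,
       c.client_name
FROM FACT_GRANT f
JOIN DIM_GRANT g ON g.GRANT_DIM_KEY = f.GRANT_DIM_KEY
JOIN DIM_CLIENT c ON c.CLIENT_DIM_KEY = f.CLIENT_DIM_KEY
WHERE DATE '2022-10-01' BETWEEN f.EFFECTIVE_DT AND f.EXPIRATION_DT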
The challenge, however, is to keep the DimKeys of the Fact in sync with the DimKeys of the Dimensions. Even if the Fact doesn't change, any DimKey change due to versioning in any of the Dimensions needs to be tracked and versioned in the Fact. This has become an implementation nightmare.
(To make things worse, the Dims can undergo multiple intraday changes, so these have to be tracked in near real time :-( )
Any thoughts on how to handle such situations would be highly appreciated (database: Snowflake).
P.S.: We could remove the DimKeys from the Fact and use DurableKeys + Date to join between the Facts and the Type 2 Dims, but that proposal is not favored/approved as of now.
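(For clarity, this is roughly what that rejected alternative would look like, with placeholder table and column names:

SELECT f.granted_share_qty, g.grant_name
FROM FACT_GRANT f
JOIN DIM_GRANT g
  ON g.GRANT_DURABLE_KEY = f.GRANT_DURABLE_KEY
 AND DATE '2022-10-01' BETWEEN g.EFFECTIVE_DT AND g.EXPIRATION_DT
WHERE DATE '2022-10-01' BETWEEN f.EFFECTIVE_DT AND f.EXPIRATION_DT

i.e. the Type 2 dimension row is picked by durable key plus the as-of date, instead of by a versioned surrogate key.)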
Thanks
Sunil
First, we thought of creating a daily snapshot table where the data is repeated every day in the fact (unless the source sends any changes). However
Stop right there. Whenever you know the right model but think it's unworkable for some reason, try harder. At a minimum, test your assumption that it would be "too much data", and consider not materializing the snapshot but leaving it as a view and computing it at query time.
... moreover, the granted_share_qty doesn't change that often, so why copy it every day?
And there's your answer. Use a monthly snapshot instead of a daily snapshot, and you've divided the data by 30.
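If you do want to test the view idea first, a minimal sketch in Snowflake SQL might look like this (FACT_GRANT_TIMESPAN, DIM_DATE and the column names are assumptions, not your schema):

CREATE OR REPLACE VIEW FACT_GRANT_MONTHLY_SNAPSHOT AS
SELECT d.MONTH_END_DT AS SNAPSHOT_DT,
       f.GRANT_DIM_KEY,
       f.CLIENT_DIM_KEY,
       f.granted_share_qty
FROM FACT_GRANT_TIMESPAN f
JOIN (SELECT DISTINCT LAST_DAY(CALENDAR_DT) AS MONTH_END_DT FROM DIM_DATE) d
  ON d.MONTH_END_DT BETWEEN f.EFFECTIVE_DT AND f.EXPIRATION_DT

Each grant then appears once per month-end it was effective, without materializing a row per day.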
I always wonder whether the FactInternetSales table of AdventureWorksDW is an accumulating snapshot table. It has a ShipDateKey in it.
The AdventureWorks OLTP documentation says that the ShipDate of SalesOrderHeader is the date "the order was shipped to customer". I interpret this as: when the order is shipped, the ship date will be updated.
That also means the rows in the DW FactInternetSales table will need to be updated as well. The ship date marks an important milestone of an order, and this is clearly the behavior of an accumulating snapshot fact table.
So should this table be considered an accumulating snapshot fact table? If so, is it a problem that there is no real transaction fact table?
In Kimball's Data Warehouse Toolkit book, for this kind of problem, he separates the order transaction fact table and the shipping fact table very strictly: the order transaction fact table contains only the information recorded when the order is placed and is never updated. The dates in the order transaction fact table are always expected dates, not actual dates. The shipping fact table contains the true ship date of an item. On top of that, there is an accumulating snapshot fact table that contains all the important milestones of an order, not only the ship date but the other important milestones as well. By having the dates of the important milestones, we can of course tell the current status of the order.
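For reference, a rough sketch of what such an accumulating snapshot might look like (illustrative names only, not taken from AdventureWorksDW or the book):

CREATE TABLE FactOrderFulfillment (
    OrderKey         INT NOT NULL,
    CustomerKey      INT NOT NULL,
    OrderDateKey     INT NOT NULL,      -- set once, when the order is placed
    ShipDateKey      INT NULL,          -- updated when the shipping milestone is reached
    DeliveryDateKey  INT NULL,          -- updated when the delivery milestone is reached
    OrderAmount      DECIMAL(18,2),
    CurrentStatus    VARCHAR(20)        -- e.g. 'Ordered', 'Shipped', 'Delivered'
)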
In my personal opinion, an order fact table that does not contain the current status of the order is practically useless. What is the point of knowing the total amount of orders if you cannot tell how much comes from fulfilled (shipped) orders and how much from unfulfilled ones? In my experience, users (data analysts) will just use the accumulating snapshot table all the time, as a predicate on "current status" is never absent from their queries.
In my real-world work, I usually design this order (information) fact table as an accumulating snapshot straight away, skipping the separate transaction fact table (rather than strictly separating things the way Kimball does), as I feel that is very time-consuming and of little use. The transaction fact tables are usually just the actions performed on the order (for example, shipping).
What do you think about this?
No, it's not an accumulating snapshot fact table
We have a scenario where we want to frequently change the tag of a (single) measurement value.
Our goal is to create a database that stores prognosis values. It should never lose data and should track changes to already written data, such as updates or overwrites.
Our current plan is to have an additional field "write_ts", which indicates at which point in time the measurement value was inserted or changed, and a tag "version" which is updated with each change.
Furthermore, version '0' should always contain the latest value.
name: temperature
-----------------
time                  write_ts (val)  current_mA (val)  version (tag)  machine (tag)
2015-10-21T19:28:08Z  1445506564      25                0              injection_molding_1
So let's assume I have an updated prognosis value for this example value.
So, I do:
SELECT curr_measurement
INSERT curr_measurement with new tag (version = 1)
DROP curr_measurement
//then
INSERT new_measurement with version = 0
Now my question:
If I lose the connection for whatever reason in between the SELECT, INSERT, DROP:
I would get double records.
(Or if I do SELECT, DROP, INSERT: I lose data.)
Is there any method to prevent that?
Transactions don't exist in InfluxDB
InfluxDB is a time-series database, not a relational database. Its main use case is not one where users are editing old data.
In a relational database that supports transactions, you are protecting yourself against UPDATE and similar operations. Data comes in, existing data gets changed, you need to reliably read these updates.
The main use case in time-series databases is a lot of raw data coming in, followed by some filtering or transforming to other measurements or databases. Picture a one-way data stream. In this scenario, there isn't much need for transactions, because old data isn't getting updated much.
How you can use InfluxDB
In cases like yours, where there is additional data being calculated based on live data, it's common to place this new data in its own measurement rather than as a new field in a "live data" measurement.
As for version tracking and reliably getting updates:
1) Does the version number tell you anything the write_ts number doesn't? Consider not using it, if it's simply a proxy for write_ts. If version only ever increases, it might be duplicating the info given by write_ts, minus the usefulness of knowing when the change was made. If version is expected to decrease from time to time, then it makes sense to keep it.
2) Similarly, if you're keeping old records: does write_ts tell you anything that the time value doesn't?
3) Logging. Do you need to overwrite (update) values? Or can you get what you need by adding new lines, increasing write_ts or version as appropriate? The latter is a more "InfluxDB-ish" approach.
4) Reading values. You can read all values as they change with updates. If a client app only needs to know the latest value of something that's being updated (and the time it was updated), querying becomes something like:
SELECT LAST(write_ts), current_mA, machine FROM temperature
You could also try grouping the machine values together:
SELECT LAST(*) FROM temperature GROUP BY machine
So what happens instead of transactions?
In InfluxDB, inserting a point with the same measurement, tag set, and timestamp overwrites any existing data with the same field keys, and adds new field keys. So when duplicate entries are written, the last write "wins".
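For example (a hedged illustration with made-up field values): writing these two line-protocol points with the same tag set and timestamp leaves only current_mA=26 stored, because the second write overwrites the first:

temperature,machine=injection_molding_1 current_mA=25 1445506564000000000
temperature,machine=injection_molding_1 current_mA=26 1445506564000000000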
So instead of the traditional SELECT, UPDATE approach, it's more like: SELECT A, then calculate on A, and put the results in B (possibly with a new timestamp) with INSERT B.
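Applied to your example, a rough sketch of that flow (the prognosis measurement name and values are assumptions):

SELECT LAST(current_mA) FROM temperature WHERE machine = 'injection_molding_1'

Then compute the new prognosis in the client and write the result as a new point into its own measurement via the line protocol, e.g.:

temperature_prognosis,machine=injection_molding_1 current_mA=25.4,write_ts=1445506600 1445506600000000000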
Personally, I've found InfluxDB excellent for its ability to accept streams of data from all directions, and its simple protocol and schema-free storage mean that new data sources are almost trivial to add. But if my use case has old data being regularly updated, I use a relational database.
Hope that clears up the differences.
We have a customized TFS Process (template) based on the newest Agile Process, and everything is working beautifully except for the order of the valid values within the State field (see screenshot).
I've searched high and low and cannot seem to find any tips on how these values are ordered. I would like to order them in their proper flow, the same way the Kanban board orders the Process's State values.
Any ideas? I've looked within the .xml files for the WIT definitions and the Process flow but nothing is standing out. Thanks for any info!
Image: Customized State values, not exactly ordered how we'd wish.
There is no such setting to control the sort order of the State field. It is arranged in alphabetical order.
Is there a way in TFS in VS2010 to specify that a particular iteration is the current one, and then return that for use in queries, similar to the way @Project works? If not, is there a way to do sub-queries in TFS work item queries?
Looks like Microsoft listened. @CurrentIteration is being added as a token.
That's great, of course. When looking to write a query against the current sprint, however, you are in danger of losing sight of unclosed work items in previous sprints. When you reach for @CurrentIteration, you probably just mean "all unfinished work that has been committed to a sprint." If you filter to a single sprint, you'll miss any stragglers you failed to close or move forward from previous sprints.
Consider using the following pattern, where "ScrumOfScrums\Release 1.0.0.0" is your backlog path and all of your sprint paths are children of that:
Filter for work items under your backlog iteration node, but not equal to the backlog iteration node. That will give you all items committed to a sprint.
This will also catch any items that weren't closed in your previous sprints. Since the goal is to close every item in a sprint before moving to the next one, this query pattern will generally be better than using @CurrentIteration, unless you're looking to find the closed items in the current iteration.
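In the query editor, that pattern looks roughly like this (the backlog path comes from the example above, and the State clause is just one way to express "unfinished"; adjust both to your process):

Team Project = @Project
And Iteration Path Under ScrumOfScrums\Release 1.0.0.0
And Iteration Path <> ScrumOfScrums\Release 1.0.0.0
And State <> Closed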
P.S. While this is an old question, it was my top hit when I searched for info on querying the current iteration in TFS.
I'm afraid there is no such macro. I personally just have a few "X in current iteration" team queries and then edit those queries to point to the new iteration path at the start of each iteration.
I am going to try using a standard name for the current iteration, such as 'Current'. The queries for this iteration would all reference this name. Once the iteration is completed, I will rename it using a naming convention that includes the date, for example, and the next iteration would then be created with the name 'Current' (or renamed to this if it already exists). The queries would then return results from the new iteration.
- 2010-49
- Current
- 2010-51
I am not sure whether renaming iterations this way will cause any conflicts or confuse the data warehouse, for example, but it would save having to create or modify a heap of queries at the start of each iteration.
I would be very interested to hear feedback on this approach!
Query for Sprint in a date interval as shown here:
Team Project = @Project
And Work Item Type = Sprint
And Start Date <= @Today
And Finish Date >= @Today
I have found that Telerik's free Work Item Manager provides an elegant solution to this problem.
Just define your queries as you usually would but leave out any filters relating to iterations (note that this also applies to areas). There is a treeview pane named 'Area/Iteration Filters' that will add extra, recursive filtering based on the iteration (or area) that you select there.
Note that if the pane is not visible then you can enable it via the View menu.