I have raw data in Tableau that looks like:
Month,Total
2021-08,17
2021-09,34
2021-10,41
2021-11,26
2021-12,6
And by using the following calculation
RUNNING_SUM(
    COUNTD(IF [Inserted At] >= [Parameters].[Start Date]
           AND [Inserted At] <= [End Date]
           THEN [Id] ELSE NULL END)
)
/
LOOKUP(RUNNING_SUM(
    COUNTD(IF [Inserted At] >= [Parameters].[Start Date]
           AND [Inserted At] <= [End Date]
           THEN [Id] ELSE NULL END)
), -1) * 100 - 100
I get
Month,My_Calc
2021-08,NULL
2021-09,200
2021-10,80.4
2021-11,28.3
2021-12,5.1
And all I really want is 5.1 (last monthly value) as one big metric (% Month-Over-Month Growth).
How can I accomplish this?
I'm relatively new to Tableau and don't know how to combine calculated fields with the date groupings to express that I want month-over-month growth. I've tried the native year-over-year growth running-total table calculation, but that didn't produce the same result, since I think my calculation method is different.
First a brief table calc intro, and then the answer at the end.
Most calculations in Tableau are actually performed by the data source (e.g. database server), and the results are then returned to Tableau (i.e. the client) for presentation. This separation of responsibilities allows high performance, even when facing very large data sets.
By contrast, table calculations operate on the table of query results that were returned from the server. They are executed late in the order-of-operations pipeline. That is why table calcs operate on aggregated data -- i.e. you have to ask for WINDOW_SUM(SUM([Sales])) and not WINDOW_SUM([Sales]).
Table calcs give you an opportunity to make final passes of calculation over the query results returned from the data source before presentation to the user. You can, for instance, calculate a running total, or make the visualization layout depend dynamically on the contents of the query results. This flexibility comes at a cost: the calculation itself is only one part of defining a table calc. You also have to specify how to apply the calculation to the table of summary results, known as partitioning and addressing. The Tableau online help has a useful definition of partitioning and addressing.
Essentially, table calcs are applied to blocks of summary data at a time, aka vectors or windows. Partitioning is how you tell Tableau how you wish to break up the summary query results into windows for purposes of applying your table calc. Addressing is how you specify the order in which you wish to traverse those partitions. Addressing is important for some table calcs, such as RUNNING_SUM, and unimportant for others, such as WINDOW_SUM.
Besides understanding partitioning and addressing very well, it is also helpful to learn about the functions INDEX(), SIZE(), FIRST(), LAST(), WINDOW_SUM(), LOOKUP() and (eventually) PREVIOUS_VALUE() to really understand table calcs. If you really understand them, you'll be able to implement all of these functions using just two of them as the fundamental ones.
Finally, to partially address your question:
You can use the boolean formula LAST() = 0 to tell if you are at the last value of your partition. If you use that formula as a filter, you can hide all the other values. You'll have to get partitioning and addressing specified correctly. You would essentially be fetching a batch of data from your server, using it in calculations on the client side, but only displaying part of it. This can be a bit brittle depending on which fields are on which shelves, but it can work.
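For example, a minimal sketch of such a filter calc (field and setup names are illustrative):

// True only for the last row in the partition, e.g. the latest month
LAST() = 0

Set it to compute along the months, drag it to the Filters shelf, and keep only True.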
Normally, it is more efficient to use a calculation that can be performed server-side, such as an LOD calc, if that lets you avoid fetching data purely for client-side calculations. But if the data is already fetched for another purpose, or if the calculation requires table calc features, such as the ability to depend on the order of the values, then table calcs are a good tool.
However you do it, the % month-over-month change from 2021-11 (a value of 26) to 2021-12 (a value of 6) is not 5.1%.
It's ((6 - 26) / 26) * 100 = -76.9%. (The 5.1 comes from comparing the running totals instead: (124 - 118) / 118 * 100 ≈ 5.1, i.e. the growth of the cumulative count, not of the monthly value.)
OK, starting from scratch, this works for me. (I don't know how to get exactly the table format I want without using Show Me and flipping the axes, but it works. Anyone else?)
Drag Date to Rows and change it to the combined Month(Date).
Drag Sales to the Columns shelf.
In Show Me, select text tables.
Flip rows for columns using the toolbar button.
That gets a table like the one you show above.
Drag Sales to Color (this is a trick, simply to hold it for a minute),
click the down-arrow on the new Sales pill in the Marks card,
select "Add Table Calculation",
select Running Total of Sum, computed using Table (down), but don't close the popup window yet.
Click the Add Secondary Calculation checkbox at the bottom,
select Percent Difference From,
computed using Table (down),
relative to Previous.
Accept your work by closing the popup (x).
Now change the new pill in the Marks card from Color to Text;
you can see the 5.1% at the bottom. Almost done.
Reformat again by clicking text tables in Show Me
and flipping the axes.
Click the Sales column header and hide it.
Create a new calculated field
labeled 'rows-from-bottom'
with the formula LAST(),
and close the popup.
Drag the new rows-from-bottom pill to the Filters shelf,
select the range 0 to 0,
and close the popup.
Done.
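If you'd rather build the number in a single calculated field instead of the stacked table calcs above, a sketch like this should give the same 5.1 (assuming a Sum of Sales and Table (down) addressing):

// % change of the running total vs. the previous month
(RUNNING_SUM(SUM([Sales])) / LOOKUP(RUNNING_SUM(SUM([Sales])), -1) - 1) * 100

Then use the LAST() = 0 filter described above to keep only the final month.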
For the next two weeks you can see the finished workbook here
https://public.tableau.com/app/profile/wade.schuette/viz/month-to-month/hiderows?publish=yes
I have a Prometheus metric called device_number. What I want is to show the difference in value between now and one day/week/month etc ago. Which means subtracting two values with two different timestamps.
Looking around, I haven't found any useful documentation on how to do it.
Something I tried, which doesn't work, is:
sum(device_number) - sum(device_number[$__range])
I found that offset is the correct keyword.
A query like this:
sum(vss_device_number) - sum(vss_device_number offset 1d)
will return the difference between now and yesterday.
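The same pattern works for any lookback, e.g. (metric name as above):

sum(vss_device_number) - sum(vss_device_number offset 1w)
sum(vss_device_number) - sum(vss_device_number offset 30d)

The first returns the difference versus one week ago, the second versus 30 days ago.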
Docs.
PromQL also provides the delta() function, which returns the difference between the current value and the value at the start of the lookback window specified in square brackets. For example, the following query should return the delta for vss_device_number over the last day (see [1d]):
delta(vss_device_number[1d])
The query returns a delta for each matching time series. If you need the summary delta across all the matching time series, then wrap the query in sum():
sum(delta(vss_device_number[1d]))
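Note that delta() is intended for gauge metrics; for monotonically increasing counters, increase() is the analogous function.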
I wish to store time series data with versioning. By versioning, I mean that I might have a metric energy_mwh with a tag meter_id=123 and a field set something like time=2016-01-01 10:00, mwh=20.50, read-time=2016-01-01 20:15, and if I re-read the meter at a later time I want to keep both the new and the old version of the meter reading. Later, when I query the data, I will mostly just be interested in the mwh value with the highest read-time for any given time. If I query over a range of times, the read-time is going to vary.
I am thinking of using InfluxDB or some other time series database with a similar data model.
Is there a right way of doing this? I believe that I must keep read-time as a tag - not a field - or I will lose the older version of the data. I guess that is the answer, but it doesn't feel right to me to have what I see as a piece of data (read-time) sitting in an identifier, specifically a tag. Am I on the right track?
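For illustration, here is how the two versions might look in InfluxDB line protocol with read-time as a tag (the second read_time and its mwh value are invented):

energy_mwh,meter_id=123,read_time=2016-01-01T20:15:00Z mwh=20.50 1451642400000000000
energy_mwh,meter_id=123,read_time=2016-01-02T06:30:00Z mwh=20.75 1451642400000000000

Both points carry the same measurement timestamp (2016-01-01 10:00 UTC, in nanoseconds), but because the read_time tag differs they form separate series, so the second write does not overwrite the first. If read-time were a field, the second point would replace the first.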
I've been working on a spreadsheet to examine something. I was converting a recursive formula into a linear one. The initial value, or first state, was 60. To get to the second state you get 240: take the previous state, double it, and add 120. That leads to third and fourth states of 600 and 1320. This base formula was clear enough that (60+120)*2^(n-1)-120 accurately expresses it.
My second part comes from needing to add the ability to decrease the costs while still staying true to the state, so the last formula only works when the cost reduction is 0. After considerable effort (I kept having minor rounding errors) I arrived at:
(ROUND(60-60*0.015*B$2)+(120-round(120*rounddown(B$2/3)*0.03,0)))*2^($A2-1)-(120-round(120*rounddown(B$2/3)*0.03,0))
To test the formulas I created a table with the following values: A2=1 to A5=4, and B2=0 to H2=6. I was using Google Sheets to examine the information. When I populated the table I found that all the values were correct with the formula, except in column G, where the values are identical to column F.
To try to correct this I deleted the information from the cells, deleted the columns, and even tried again in a new spreadsheet. But in all cases G=F when it should not be. I can't figure out why I'm getting a duplicate column.
Row 3 holds the values that the formula should be using.
The expected values are G4=55, G5=226, G6=568, G7=1252.
In case anyone wanted to know, I finally managed to solve the issue. I needed to round in one more place. The following is the formula that has worked for my current testing.
sum((ROUND(60-round(60*0.015*A$2))+(120-round(120*rounddown(A$2/3)*0.03,0)))*2^($A25-1)-(120-round(120*rounddown(A$2/3)*0.03,0)))
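To see why the extra ROUND matters, check the first term in column G (assuming the cost-reduction values run 0 to 6 across columns B to H, so F=4 and G=5):

Without the inner round: ROUND(60 - 60*0.015*5) = ROUND(55.5) = 56, identical to column F's ROUND(60 - 60*0.015*4) = ROUND(56.4) = 56, which is why G duplicated F.
With the inner round: 60 - ROUND(60*0.015*5) = 60 - ROUND(4.5) = 60 - 5 = 55, matching the expected G4=55.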
I am having a hard time generating precisely the frequency table I am looking for using SPSS.
The data in question: cases (n = ~800) with categorical variables DX_n (n = 1-15), each containing ICD9 codes, many of which are the same code. I would like to create a frequency table that groups the DX_n variables such that I can view frequency of every diagnosis in this sample of cases.
The next step is to test the hypothesis that the clustering of diagnoses in this sample differs from that of another sample. If you have any advice as to how to test this, that would be really appreciated as well!
Thanks!
Edit: My attempts:
1) Analyze -> Descriptive Statistics -> Frequencies; then add variables DX_n (1-15) and display frequency charts. The output is frequencies of each ICD9 code per DX_n variable (so 15 tables are generated - I'm hoping to just have one grouped table).
2) I tried adjusting the output format to organize by variable and also to compare variables but neither option gives the output I'm looking for.
I think what you are looking for is CTABLES. It can do parallel columns of frequencies, and it includes a column proportions test that can show whether the distributions differ.
Thank you, JKP! You set me on exactly the right track. I'm not sure how I overlooked that menu. Just to clarify in case anyone else comes along needing to figure this out:
Group diagnosis variables into a multiple response set using Analyze > Custom Tables > Multiple Response Sets. Code the variables as categories.
http://i.imgur.com/ipE9suf.png
Create a custom table with your new multiple response set as a row and the subsets to compare as columns. I set summary statistics to compute from rows and added the column n% column (sorted descending).
http://i.imgur.com/hptIkfh.png
Under test statistics, include a column proportions z-test as JKP suggested.
http://i.imgur.com/LYI6ZRl.png
Behold, your results:
http://i.imgur.com/LgkBA8X.png
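For anyone who prefers syntax to the GUI, a rough equivalent might look like this (a sketch, untested; DX_1 TO DX_15 and the grouping variable SAMPLE are assumed names):

* Define the multiple response set, then build the table with a column proportions test.
MRSETS
  /MCGROUP NAME=$DX LABEL='Diagnoses' VARIABLES=DX_1 TO DX_15.
CTABLES
  /TABLE $DX [COUNT COLPCT.COUNT] BY SAMPLE
  /CATEGORIES VARIABLES=$DX ORDER=D KEY=COUNT
  /COMPARETEST TYPE=PROP ALPHA=0.05.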
Thanks again, and best of luck to anyone else who runs across this.
-GCH
p.s. Sorry everyone, I was going to post images inline but don't have enough reputation points yet. Images detailing the steps in the GUI can be found at the links above.