What are the columns in perf-stat when run with -r and -x - perf

I'm trying to interpret the results of perf-stat run on a program. I know that it was run with -r 30 and -x. According to https://perf.wiki.kernel.org/index.php/Tutorial, the stddev will be reported when run with -r, but I'm not sure which of these columns that is, and I'm having trouble finding information on the output when run with -x. One example of the output I've received is this:
19987,,cache-references,0.49%,562360,100.00
256,,cache-misses,10.65%,562360,100.00
541747,,branches,0.07%,562360,100.00
7098,,branch-misses,0.78%,562360,100.00
60,,page-faults,0.43%,560411,100.00
0.560244,,cpu-clock,0.28%,560411,100.00
0.560412,,task-clock,0.28%,560411,100.00
My guess is that the % column is the standard deviation as a percentage of the first column, but I'm not sure.
My question, in summary: what do the columns represent, and which column is the standard deviation?

You are very close. Here are some blanks filled in.
1. Arithmetic mean of the measured values.
2. The unit, if known (e.g. on my system it shows 'msec' for 'cpu-clock').
3. Event name.
4. Standard deviation, scaled so that 100% corresponds to the mean.
5. Time during which counting this event was actually running.
6. Fraction of the enabled time during which this event was actually running (in %).
The last two are relevant for multiplexing: if more counters are selected than can be recorded concurrently, that percentage drops below 100.
On my system (Linux 5.0.5; I'm not sure since when this is available), some events also produce a shadow stat, i.e. a derived metric. For example, cpu-clock yields CPUs utilized, and branch-misses yields the fraction of all branches that are missed. These appear as two additional columns:
7. Shadow stat value
8. Shadow stat description
Note that this format changes with some other options. For example, if you display the metrics with a more fine-grained grouping (e.g. per CPU), information about these groups is prepended in additional columns.
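As a quick illustration of that column order, here is a minimal Python sketch that parses the -x, output from the question; the field names are labels I chose to match the list above, not anything perf itself defines.

import csv, io

# Sample lines from the question's output (perf stat run with -r 30 and -x,).
raw = """19987,,cache-references,0.49%,562360,100.00
256,,cache-misses,10.65%,562360,100.00
0.560412,,task-clock,0.28%,560411,100.00"""

# Field labels follow the column meanings listed above; the names are mine, not perf's.
fields = ["mean", "unit", "event", "stddev_pct", "time_running", "pct_running"]

for row in csv.reader(io.StringIO(raw)):
    rec = dict(zip(fields, row))
    print(f"{rec['event']}: mean={rec['mean']}, stddev={rec['stddev_pct']} of the mean, "
          f"counted during {rec['pct_running']}% of the enabled time")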

Related

Ran a MANOVA where Pillai's/Wilks isn't significant, but one of the DVs is very significant in my output table of between-subjects effects

I'm a stats newb and was told by my professor to run a MANOVA for something I was checking out. Basically, I wanted to see if there was an interaction between ethnicity and a certain quadrant grouping for a set of outcome variables that are subscales of an overall measure (ders_tot).
An ANCOVA (one DV) already found an interaction between ethnicity and the quadrant grouping for ders_tot.
My MANOVA output is showing me that with Pillai's/Wilks there is no significance (p = .098 for both), but in SPSS there is also a table of between-subjects effects automatically generated that indicates strong interaction significance for one particular outcome variable (p = .003). The other DVs are far from significance (some as high as p = .27 or p = .66).
Is my MANOVA significance (or lack thereof) being seriously skewed by the highly nonsignificant variables? Am I still "allowed" to run analysis on that one particular variable included in the MANOVA that suggests strong significance? I also have data viz/chart output that makes a strong case for analyzing that particular variable.
(EDIT: BELOW PROBLEM HAS BEEN FIXED)
[Also, I've noticed that one of my covariates is always being run in SPSS with 1 df when it should be 2. I've triple checked the variable type and added labels and all that, and can't get it to run appropriately. When I run the same analysis in R, df = 2. This isn't affecting my sig. findings by much, but it's driving me crazy!]

Time series- Not periodic, despite having included frequency

This is actually part of my thesis research, where I have to run a time series analysis on pollution and economic growth of a single country.
I have data of over 144 years of the two variables with each value representing a single year. I imported, set the values as numeric and attached the dataset through the console and ran:
ts_gdp <- ts(data = `GDP per capita`, start = 1871, end = 2014, frequency = 1, names = "gdp")
I get to see all the values for the first variable and then follow up with stl(), but I get the error below. Any clues why this shows up, even though I have set frequency=1, which is the number of observations per unit of time, in this case a year? Thank you in advance!
Error in stl(GDP, s.window = "periodic") :
series is not periodic or has less than two periods

Scale-building in SPSS

I'm using Cronbach's alpha to analyze data in order to build/refine a scale. This is a tedious process in SPSS, since it doesn't automatically optimize the scale, so I'm hoping there is a way to use syntax to speed it up.
So I start with a set of items, set up the OMS control panel to capture the item-total statistics table, and then run the alpha analysis. This pushes the item-total stats into a new dataset. Then I check the alpha value and use it in syntax to screen out items whose alpha-if-deleted value is greater.
Then I re-run the analysis with only the items that passed the screening, and I repeat until all the items pass. Here is the syntax:
* First syntax sets up OMS, and then runs the alpha analysis.
* In the reliability syntax, I have to manually add the variables and the Scale name.
* OMS.
DATASET DECLARE alpha_worksheet.
OMS
/SELECT TABLES
/IF COMMANDS=['Reliability'] SUBTYPES=['Item Total Statistics']
/DESTINATION FORMAT=SAV NUMBERED=TableNumber_
OUTFILE='alpha_worksheet' VIEWER=YES.
RELIABILITY
/VARIABLES=
points_18618
points_18618
points_3286
points_3290
points_3583
points_4018
points_7775
points_7789
points_7792
points_18631
points_18652
/SCALE('2017 Fall CRN 4157 Exam 01 v. 1.0') ALL
/MODEL=ALPHA
/SUMMARY=TOTAL.
* Second syntax identifies any variables in the OMS dataset that are LTE the alpha value.
* I have to manually enter the alpha value...
DATASET ACTIVATE alpha_worksheet.
IF (CronbachsAlphaifItemDeleted <= .694) Keep =1.
EXECUTE.
SORT CASES BY Keep(D).
Ideally, instead of having to repeat this process over and over, I'd like syntax that would automate this process.
Hope that makes sense, and if you have a solution thanks in advance (this has been bugging me for years!) Cheers
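For what it's worth, the screening loop described above can be sketched outside SPSS. Below is a rough Python version, assuming the item responses are columns of a pandas DataFrame; the function names, df, and item_columns are hypothetical, and the alpha formula is the standard Cronbach's alpha.

import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    # Standard formula: k/(k-1) * (1 - sum of item variances / variance of the total score)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

def screen_items(items: pd.DataFrame):
    cols = list(items.columns)
    while len(cols) > 2:
        alpha = cronbach_alpha(items[cols])
        # alpha-if-deleted for each remaining item
        aid = {c: cronbach_alpha(items[[x for x in cols if x != c]]) for c in cols}
        # drop items whose removal would raise alpha (the manual screening step)
        drop = [c for c, a in aid.items() if a > alpha]
        if not drop:
            break
        cols = [c for c in cols if c not in drop]
    return cols, cronbach_alpha(items[cols])

# kept_items, final_alpha = screen_items(df[item_columns])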

SPSS "No cases were input" warning - Is it possible to get a table with 0 counts?

I am running a huge syntax, with lots of CTABLES and FREQUENCIES commands. Some of them have a filter:
TEMPORARY.
SELECT IF [condition].
FREQUENCIES VAR1.
In some cases, this results in no cases being selected, so the output is just a warning text. Is it possible to still get a table with 0 counts...?
If all cases are screened out, a procedure never gets a chance to run. However, suppose you create one case with everything missing but a filter value of 1. Then use CTABLES instead of FREQUENCIES and specify that empty categories should be shown (on the Categories subdialog if using the GUI).
If you want to make this perfectly accurate, create a weight variable with case 1 weighted by a very small value (1e-8, say) and all the other cases with a weight of 1.

How does OpenTSDB downsample data

I have a 2 part question regarding downsampling on OpenTSDB.
The first is I was wondering if anyone knows whether OpenTSDB takes the last end point inclusive or exclusive when it calculates downsampling, or does it count the end data point twice?
For example, if my time interval is 12:30pm-1:30pm, I get DPs every 5 min starting at 12:29:44pm, and my downsample interval sums every 10-minute block, does the system take the DPs from 12:30-12:39 and sum them, then 12:40-12:49 and sum them, etc., or does it take the DPs from 12:30-12:40, then from 12:40-12:50, etc.? Yes, I know my data is off by 15 seconds, but I don't control that.
I've tried to calculate it by hand but the data I have isn't helping me. The numbers I'm calculating aren't adding up to the above, nor is it matching what the graph is showing. I don't have access to the system that's pushing numbers into OpenTSDB so I can't setup dummy data to check.
The second question is how downsampling plots its points on the graph given my time range and downsample interval. I set downsample to sum 10-minute blocks and set my range to 12:30pm-1:30pm. The graph shows the first point of the downsampled graph at 12:35pm, which makes logical sense. I then changed the range to 12:24pm-1:29pm and expected the first point to start at 12:30, but the first point shown is 12:25pm.
Hopefully someone can answer these questions for me. In the meantime, I'll continue trying to find some data in my system that helps show/prove how downsampling should work.
Thanks in advance for your help.
Downsampling isn't currently working the way you expect, although since this is a reasonable and commonly made expectation, we are thinking of changing this in a later release of OpenTSDB.
You're assuming that if you ask for a "10 min sum", the data points will be summed up within each "round" (or "aligned") 10 minute block (e.g. 12:30-12:39 then 12:40-12:49 in your example), but that's not what happens. What happens is that the code will start a 10-minute block from whichever data point is the first one it finds. So if the first one is at time 12:29:44, then the code will sum all subsequent data points until 600 seconds later, meaning until 12:39:44.
Within each 600 second block, there may be a varying number of data points. Some blocks may have more data points than others. Some blocks may have unevenly spaced data points, e.g. maybe all the data points are within one second of each other at the beginning of the 600s block. So in order to decide what timestamp will result from the downsampling operation, the code uses the average timestamp of all the data points of the block.
So if your data points are evenly spaced throughout the 600s block, the average timestamp will fall somewhere in the middle of the block. But if, say, all the data points are within one second of each other at the beginning of the 600s block, then the returned timestamp will reflect that, by virtue of being an average. Just to be clear, the code takes an average of the timestamps regardless of which downsampling function you picked (sum, min, max, average, etc.).
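To make that concrete, here is a small Python sketch of the behaviour described above; it is an illustration of the logic, not OpenTSDB's actual code. Blocks start at the first data point rather than at aligned boundaries, and each block is reported at the average of its points' timestamps.

def downsample(points, interval=600, agg=sum):
    """points: (timestamp_in_seconds, value) pairs; interval in seconds."""
    points = sorted(points)
    out, i = [], 0
    while i < len(points):
        block_start = points[i][0]            # block begins at the first point found
        block = []
        while i < len(points) and points[i][0] < block_start + interval:
            block.append(points[i])
            i += 1
        ts = sum(t for t, _ in block) / len(block)    # average timestamp of the block
        out.append((ts, agg(v for _, v in block)))    # apply the chosen aggregator (sum here)
    return out

# One data point every 5 minutes starting 44 s past the first minute boundary,
# roughly like the 12:29:44 series in the question (times in seconds from an arbitrary origin).
pts = [(44 + 300 * n, 1.0) for n in range(12)]
print(downsample(pts, interval=600))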
If you want to experiment quickly with OpenTSDB without writing to your production system, consider setting up a single-node OpenTSDB instance. It's very easy to do as is shown in the getting started guide.