Can't get correct statistics from fastparquet - dask

I am getting None statistics (min / max) when reading file from S3 using fastparquet.
When calling
fp.ParquetFile(fn=path, open_with=myopen).statistics['min']
Most of the values are None, and some of the values are valid.
However, when I read the same file with another framework, I am able to get the correct min/max for all values.
How can I get all the statistics?
Thanks

The full set of row groups is available as the list
pf = fp.ParquetFile(fn=path, open_with=myopen)
pf.row_groups
and each row group has a .columns attribute, whose entries in turn have meta_data; so you can dig around to see what the individual min/max values of the columns are.
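A minimal sketch of that digging, assuming fastparquet exposes the Parquet thrift layout as row_groups → columns → meta_data → statistics (field names here follow the Parquet thrift spec; min/max may come back as raw bytes for some column types). The stand-in objects at the bottom are only there to illustrate the shape:

```python
from types import SimpleNamespace as NS  # stand-in for the thrift objects

def collect_stats(row_groups):
    """Walk row_groups -> columns -> meta_data -> statistics and pool
    min/max per column, skipping row groups whose statistics are missing."""
    pooled = {}
    for rg in row_groups:
        for col in rg.columns:
            md = col.meta_data
            name = ".".join(md.path_in_schema)
            st = md.statistics
            if st is None or st.min is None or st.max is None:
                continue  # this row group wrote no stats for the column
            lo, hi = pooled.get(name, (st.min, st.max))
            pooled[name] = (min(lo, st.min), max(hi, st.max))
    return pooled

# With a real file: pooled = collect_stats(fp.ParquetFile(path, open_with=myopen).row_groups)
# Illustrative stand-ins mimicking the thrift layout:
rgs = [
    NS(columns=[NS(meta_data=NS(path_in_schema=["x"], statistics=NS(min=1, max=5)))]),
    NS(columns=[NS(meta_data=NS(path_in_schema=["x"], statistics=None))]),
    NS(columns=[NS(meta_data=NS(path_in_schema=["x"], statistics=NS(min=0, max=9)))]),
]
```

Row groups with missing statistics simply drop out of the pooled result, which is why pooling by hand can recover min/max values that the top-level `.statistics` dict reports as None.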

Related

Google Data Studio Dates and Text

I am having trouble with two things while trying to set up my report in Google Data Studio.
1) Using numbers with dates.
I have some numeric values to use with dates. However, I have the same date for different values, and Google sums those values up; I need them separated.
Also, when a certain day does not have any value, Google gives it the value 0, and I can't have that because the value isn't 0; I just need it to skip those days. I have generated a graph as an example.
2) I can't generate a graph using text as a dimension. I think I am doing something wrong, but I wasn't able to figure out what.
Table:
Graph:
1) On the line chart, set the date as the dimension, the city as the breakdown dimension, and the clicks as the metric.
2) Create a chart filter and exclude rows where the clicks are null.

Weighted Issue Prioritization in Jira

I'm a product manager, and I want to be able to collect data on my issues in Jira and automatically see a calculated priority value for each issue:
I have a set of standardized metrics that I can apply to my tickets:
Urgency, Count of users impacted, Strategic Focus, Issue Value, LOE
and the like.
Each of these metrics has a numerical value that can be added up to get a score. The numerical value is a representation of a picklist, e.g. Hotfix = 5, Critical = 3, Trivial = 1. Each metric also has a weight relative to the others, e.g. Urgency counts 15% towards the final score, LOE counts 20%. I use these values to calculate a priority score like this:
(Metric1 * WeightMetric1) + (Metric2 * WeightMetric2) + (Metric3 * WeightMetric3) + (Metric4 * WeightMetric4) + (Metric5 * WeightMetric5) = SCORE
The metric values and weights are global values that I want to be able to change at times to accommodate shifting priorities and focuses.
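The weighted sum is easy to prototype outside Jira. Here is a sketch in Python; all picklist mappings, metric names, and weights below are made-up illustrations, not Jira fields:

```python
# Hypothetical picklist mapping and weights (illustrative values only).
URGENCY = {"Hotfix": 5, "Critical": 3, "Trivial": 1}
WEIGHTS = {
    "urgency": 0.15,
    "users_impacted": 0.25,
    "strategic_focus": 0.20,
    "issue_value": 0.20,
    "loe": 0.20,
}

def priority_score(metrics, weights):
    """Weighted sum: sum(metric_value * weight) over all metrics."""
    return sum(metrics[name] * weight for name, weight in weights.items())

# One hypothetical issue: Hotfix urgency, mid-range everything else.
score = priority_score(
    {"urgency": URGENCY["Hotfix"], "users_impacted": 4,
     "strategic_focus": 3, "issue_value": 5, "loe": 2},
    WEIGHTS,
)
# 5*0.15 + 4*0.25 + 3*0.20 + 5*0.20 + 2*0.20 = 3.75
```

Keeping the weights in one shared dictionary mirrors the "global values" requirement: changing a weight re-scores every issue on the next calculation.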
I would like to be able to:
STORE global values for Metric and WeightMetric
CALCULATE a score based on the global values selected for each issue
CHANGE global values as needed
Anyone ever tried that? Anyone a clue if Jira can pull this off?
This is not something that you'll get from JIRA out of the box, but you can script that behaviour if you're willing to use a commercial add-on like Script Runner.
Script Runner supports "script fields", which are fields that calculate their value based on whatever script you put together. Once you have that, you'll be able to use your field in search filters, gadgets, reports, etc.
More info is available in the documentation.

SPSS Frequency Plot Complication

I am having a hard time generating precisely the frequency table I am looking for using SPSS.
The data in question: cases (n = ~800) with categorical variables DX_n (n = 1-15), each containing ICD9 codes, many of which are the same code. I would like to create a frequency table that groups the DX_n variables such that I can view frequency of every diagnosis in this sample of cases.
The next step is to test the hypothesis that the clustering of diagnoses in this sample is different than that of another. If you have any advice as to how to test this, that would be really appreciated as well!
Thanks!
Edit: My attempts:
1) Analyze -> Descriptive Statistics -> Frequencies; then add variables DX_n (1-15) and display frequency charts. The output is frequencies of each ICD9 code per DX_n variable (so 15 tables are generated - I'm hoping to just have one grouped table).
2) I tried adjusting the output format to organize by variable and also to compare variables but neither option gives the output I'm looking for.
I think what you are looking for is CTABLES. It can do parallel columns of frequencies, and it includes a column proportions test that can show whether the distributions differ.
Thank you, JKP! You set me on exactly the right track. I'm not sure how I overlooked that menu. Just to clarify in case anyone else comes along needing to figure this out:
Group diagnosis variables into a multiple response set using Analyze > Custom Tables > Multiple Response Sets. Code the variables as categories.
http://i.imgur.com/ipE9suf.png
Create a custom table with your new multiple response set as a row and the subsets to compare as columns. I set summary statistics to compute from rows and added the column n% column (sorted descending).
http://i.imgur.com/hptIkfh.png
Under test statistics, include a column proportions z-test as JKP suggested.
http://i.imgur.com/LYI6ZRl.png
Behold, your results:
http://i.imgur.com/LgkBA8X.png
Thanks again, and best of luck to anyone else who runs across this.
-GCH
p.s. Sorry everyone, I was going to post images but don't have enough reputation points yet. Images detailing the steps in the GUI can be found at the links above.
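For anyone wanting to sanity-check the pooled frequencies outside SPSS, the grouping step (pooling all DX_n variables into a single frequency table) can be sketched in Python; the cases and ICD9 codes below are hypothetical:

```python
from collections import Counter

# Hypothetical sample: each row is one case, DX_1..DX_3 hold ICD9 codes
# (None where a slot is unused). The real data has DX_1..DX_15.
cases = [
    {"DX_1": "250.00", "DX_2": "401.9", "DX_3": None},
    {"DX_1": "401.9",  "DX_2": None,    "DX_3": None},
    {"DX_1": "250.00", "DX_2": "272.4", "DX_3": "401.9"},
]

# Pool every DX_n variable into one multiset: each diagnosis is counted
# once per mention, regardless of which DX_n column it appeared in.
counts = Counter(code for case in cases for code in case.values() if code)
```

This is the same reshaping that the multiple response set performs: one grouped table instead of fifteen per-variable tables.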

Ad-words Average order value and Max CPC [API]

I am pulling data from the AdWords report API. I can get everything that I need (clicks, impressions, position) for my keywords. But there are two things that I can't find in those reports:
Average order value.
Max CPC (hourly)
Could you please recommend where I can get this data?
Thanks for any suggestions!
The query item MaxCpc was changed within the last year or so. It was replaced with the query item CpcBid. I'm not sure what to do about order value, but the documentation containing all of the query items for the different report types can be found at:
https://developers.google.com/adwords/api/docs/appendix/reports
Hope this helps!
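As a rough illustration, an AWQL report definition pulling CpcBid alongside the usual keyword metrics might look like the string below; the exact field names should be verified against the report-type appendix linked above:

```python
# AWQL report query for the AdWords report API (a sketch; verify the
# field names in the reports appendix before use).
query = (
    "SELECT Criteria, CpcBid, Clicks, Impressions, AveragePosition "
    "FROM KEYWORDS_PERFORMANCE_REPORT "
    "DURING LAST_7_DAYS"
)
```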

blackberry reading a text file and updating after sort

I am successfully able to read and print the contents of a text file. My text file contains 5 data entries such as
Rashmi 120
Prema 900
It must sort only the integers in descending order and swap the names attached to them accordingly; the first column of serial numbers must remain the same. Each time a new entry is made, its score must be compared against the existing 5 records and placed accordingly with the new name and score.
Since this is BlackBerry programming and the BlackBerry APIs don't support Collections.sort, please tell me how I can do this. I tried using SimpleSortingVector but I was unable to put it into code.
I believe you need to start with your own logic, like:
1) sorting depends on comparison
2) before making any comparison, you need to split each string by spaces
3) after splitting, save the names and numbers in different arrays
4) compare the numbers and sort accordingly
5) after this, merge the array contents using indexing
I'm just giving you one approach; it may not be perfect, but drilling down may refine the logic and the usage of the API.
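The steps above can be sketched in plain Python for clarity; on BlackBerry/Java ME you would port the same logic to arrays and a hand-written sort (or SimpleSortingVector), since Collections.sort isn't available:

```python
def resort(entries, new_entry=None, keep=5):
    """Parse 'name score' lines, optionally insert a new entry, sort by
    score descending, and renumber serials 1..n so the serial column
    stays fixed while names and scores move."""
    recs = []
    for line in entries:                       # step 2: split each string
        name, score = line.split()
        recs.append((name, int(score)))        # step 3: name + number pairs
    if new_entry is not None:
        name, score = new_entry.split()
        recs.append((name, int(score)))
    recs.sort(key=lambda rec: -rec[1])         # step 4: compare and sort
    return ["%d %s %d" % (i + 1, name, score)  # step 5: merge back with serials
            for i, (name, score) in enumerate(recs[:keep])]
```

For example, `resort(["Rashmi 120", "Prema 900"])` yields `["1 Prema 900", "2 Rashmi 120"]`, and passing `new_entry="Amit 500"` slots the new score into its place while keeping only the top 5 records.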
