I would like to make a histogram of some data that is stored in a nested BigQuery table. In a simplified form, the table can be created in the following way:
CREATE TEMP TABLE Sessions (
  id int,
  hits ARRAY<STRUCT<pagename STRING>>
);
INSERT INTO Sessions (id, hits)
VALUES
( 1,[STRUCT('A'),STRUCT('A'),STRUCT('A')]),
( 2,[STRUCT('A')]),
( 3,[STRUCT('A'),STRUCT('A')]),
( 4,[STRUCT('A'),STRUCT('A')]),
( 5,[]),
( 6,[STRUCT('A')]),
( 7,[]),
( 8,[STRUCT('A')]),
( 9,[STRUCT('A')]),
(10,[STRUCT('A'),STRUCT('A')]);
and it looks like
id | hits.pagename
 1 | A
   | A
   | A
 2 | A
 3 | A
   | A
and so on for the other ids. My goal is to obtain a histogram showing the distribution of A-occurrences per id in Data Studio. The report for the MWE can be seen here: link
So far I have created a calculated field called pageviews that evaluates the wanted occurrences for each session via SUM(IF(hits.pagename="A",1,0)). Looking at a table showing id and pageviews, I get the expected result:
table showing the number of occurrences of page A for each id
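For reference, the same per-id count can presumably be computed directly against the table above with a correlated subquery over the nested hits; a sketch (the alias pageviews mirrors the calculated field, and empty hits arrays come out as 0 here):

-- per-id count of pagename 'A' inside the nested hits array
SELECT
  id,
  (SELECT COUNTIF(h.pagename = 'A') FROM UNNEST(hits) AS h) AS pageviews
FROM Sessions
ORDER BY id;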
However, the output of the calculated field is a metric, which might cause trouble for what follows. In the next step I wanted to follow the procedure presented in this post. Therefore, I created another field, bin, to assign my sessions to bins according to their pageviews:
CASE
WHEN pageviews = 0 OR pageviews = 1 THEN "bin 1"
WHEN pageviews = 2 OR pageviews = 3 THEN "bin 2"
END
According to this bin definition I hope to obtain a histogram with 6 counts in bin 1 and 4 counts in bin 2. Well, in this particular example it will actually have 4 counts in bin 1, since ids 5 and 7 have null entries, but never mind: this won't happen in my real-world table.
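To double-check the expected histogram on the toy data, the binning can also be sketched in plain BigQuery SQL (same assumed names as above; because empty hits arrays count as 0 here, this yields the hoped-for 6/4 split):

-- count sessions per bin, reusing the per-id pageviews count
SELECT
  CASE
    WHEN pageviews IN (0, 1) THEN 'bin 1'
    WHEN pageviews IN (2, 3) THEN 'bin 2'
  END AS bin,
  COUNT(*) AS sessions
FROM (
  SELECT id,
         (SELECT COUNTIF(h.pagename = 'A') FROM UNNEST(hits) AS h) AS pageviews
  FROM Sessions
)
GROUP BY bin
ORDER BY bin;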
As you can see in the next image, which shows the same table as above but now with a bin column, this assignment works as well: each id is assigned the correct bin. However, the output field is still a metric, now of type Text, so the bar chart won't let me use it (it needs a dimension).
Assignment of each id to a bin
Somewhere I read about the workaround of creating a self-joined blend, which outputs metrics as dimensions. This works only in name: my field is now a dimension and I can use it as such for the bar chart, but the chart won't load and shows a configuration error of the data source, which can be seen in this picture:
bar chart of id-count over bin. In the configuration of the chart one can see that "bin" is now a dimension. The chart won't plot, however, as it shows a data configuration error (sorry for the German version of Data Studio).
I have the following situation: a loop (stacked data) with only one index variable and with multiple items corresponding to the statements, as in the picture below (sorry it is Excel, but it is the same as in SPSS):
stacked data: cases on multiple lines, but never filling all the columns for one respondent
I want to reach the following situation, but without using CASESTOVARS to restructure, because that creates a lot of empty variables. I remember that in older versions there was a command, something like UPDATE, which moved the cases up to reach the following result:
reducing the cases per respondent
Like starting from this:
ID Index Q1_1 Q1_2 Q1_3 Q1_4 Q1_5 Q1_6
1  1     1    1
1  2               1    1
1  3                         1    1
To reach this:
ID Q1_1 Q1_2 Q1_3 Q1_4 Q1_5 Q1_6
1  1    1    1    1    1    1
But without using casestovars. Is there any command in SPSS syntax for this?
Thank you very much, have a nice day!
Not entirely sure how variable your data structure is likely to be in reality, but if, as demoed, you have only a single response for each of q1_1 to q1_6 per respondent ID, then the below would be sufficient:
dataset declare dsAgg.
aggregate outfile="dsAgg" /break=respid /q1_1 to q1_6=max(q1_1 to q1_6).
Also, I'm not sure of the significance of duplicate index values within the same respondent ID, whether this was intended or not.
The following syntax could do the job -
* first we'll recreate your example data.
data list list/respid index q1_1 to q1_6.
begin data
1,1,1,,,,,
1,2,,2,,,,
1,3,,,1,,,
1,4,,,,2,,
1,5,,,,,1,
1,6,,,,,,2
2,1,3,,,,,
2,1,,4,,,,
2,2,,,5,,,
2,2,,,,4,,
2,3,,,,,3,
2,3,,,,,,2
end data.
* now to work: first thing is to make sure the data from each ID are together.
sort cases by respid index.
* the loop will fill down the data to the last line of each ID.
do repeat qq=q1_1 to q1_6.
if respid=lag(respid) and missing(qq) qq=lag(qq).
end repeat.
* the following lines will help recognize the last line for each ID and select it.
compute lineNR=$casenum.
aggregate /outfile=* mode=ADDVARIABLES/break=respid/MXlineNR=max(lineNR).
select if lineNR=MXlineNR.
exe.
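Run against the example data above, the result should be one line per respondent, roughly (ignoring the helper variables index, lineNR and MXlineNR):

respid q1_1 q1_2 q1_3 q1_4 q1_5 q1_6
1      1    2    1    2    1    2
2      3    4    5    4    3    2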
I have a series, disk, that contains a path (/mnt/disk1, /mnt/disk2, etc.) and the total space of a disk. It also includes free and used values. These values are updated at a specified interval. What I would like to do is query the sum of the last() total of each path. I would also like to do the same for free and for used, to get an aggregate of the total size, free space, and used space of all of the disks on my server.
I have a query here that will get me the last(total) of all the disks, grouped by its path (for distinction):
select last(total) as total from disk where path =~ /(mnt\/disk).*/ group by path
Currently, this returns 5 series, each containing one row (the latest) with the value of its total. I then want to take the sum across those series, but I cannot just wrap last(total) in a sum() call. Is there a way to do this that I am missing?
Carrying on from my comment above about nested functions.
Building a toy example:
CREATE DATABASE FOO
USE FOO
Assuming your data is updated at intervals of more than a minute[1]:
CREATE CONTINUOUS QUERY disk_sum_total ON FOO
BEGIN
SELECT sum("total") AS "total_1m" INTO disk_1m_total FROM "disk"
GROUP BY time(1m)
END
Then push some values in:
INSERT disk,path="/mnt/disk1" total=30
INSERT disk,path="/mnt/disk2" total=32
INSERT disk,path="/mnt/disk3" total=33
And wait more than a minute. Then:
INSERT disk,path="/mnt/disk1" total=41
INSERT disk,path="/mnt/disk2" total=42
INSERT disk,path="/mnt/disk3" total=43
And wait a minute+ again. Then:
SELECT * FROM disk_1m_total
name: disk_1m_total
-------------------
time total_1m
1476015300000000000 95
1476015420000000000 126
The two values are 30+32+33=95 and 41+42+43=126.
From there, it's trivial to query:
SELECT last(total_1m) FROM disk_1m_total
name: disk_1m_total
-------------------
time last
1476015420000000000 126
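Since you also want the same aggregate for free and used, the pattern should extend to a single CQ that sums several fields at once; a sketch (disk_1m_all is a made-up target measurement, and free/used are the field names from your question):

CREATE CONTINUOUS QUERY disk_sum_all ON FOO
BEGIN
SELECT sum("total") AS "total_1m", sum("free") AS "free_1m", sum("used") AS "used_1m"
INTO disk_1m_all FROM "disk"
GROUP BY time(1m)
END

last(total_1m), last(free_1m) and last(used_1m) can then be read from disk_1m_all in the same way as above.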
Hope that helps.
[1] Picking a CQ interval smaller than the update interval prevents minor timing jitter from causing data to be accidentally summed twice for a given group. There might be some "zero update" intervals, but no "double counting" intervals. I typically run the query twice as fast as the updates. If the CQ sees no data for a window, no point is written for that window, so last() will still give the correct answer. For example, I left the CQ running overnight and pushed no new data in: last(total_1m) gives the same answer, not zero for "no new data".
If I have a model that has_many another, how would I go about getting back only the parent rows whose ids actually appear in the related table?
Example:
tier_tbl
| id | name |
|  1 | low  |
|  2 | med  |
|  3 | high |

randomdata_tbl
| id | tier_id | name |
|  1 |       1 | xxx  |
|  2 |       1 | yyy  |
|  3 |       2 | zzz  |
I would like to build a query that returns, in the case of the above example, only rows 1 and 2 from tier_tbl, because only ids 1 and 2 appear in the tier_id column.
I'm new to ActiveRecord and, short of writing a loop, I don't know a good way of doing this. Does Rails allow this kind of query to be built in an easier way?
The reasoning behind this is so that I can list only the menu items that relate to the specific object I am dealing with. If the object I am dealing with has only the items contained in randomdata_tbl, there is no reason to display the third tier name, so I'd like to omit it completely. I need to go this direction because of the way the models are set up; the example I'm dealing with is slightly more complicated.
Thanks
Let's call your first table tiers and your second table randoms.
If Tier has many randoms and you want to find all tiers whose id is present in the randoms table, you can do it this way:
# database query only
Tier.joins(:randoms).uniq
or
# with some ruby code
Tier.select{ |t| t.randoms.any? }
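For what it's worth, the first version translates to roughly the following SQL (assuming the renamed tiers/randoms tables and the conventional tier_id foreign key):

-- distinct parent rows that have at least one related row
SELECT DISTINCT tiers.*
FROM tiers
INNER JOIN randoms ON randoms.tier_id = tiers.id;

The second version, by contrast, loads every Tier and filters in Ruby, so the join is usually preferable when the tables are large.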
We have a big challenge: the application is too slow because of the huge number of records :(
We are looking for the best data access pattern for a web application (ASP.NET MVC 3) with big data (a SQL Server view with more than 66 million records).
Customers have forms with search fields and see the results in a grid on the web page.
So far we have used the following data access approaches:
1- Entity Framework, with some features like compiled queries
--get result
GridModelsResult = VwLoanInquiryList.Skip(pageIndex * pageSize).Take(pageSize).ToList();
--get count
int totalRecords = VwLoanInquiryList.Count();  // presumably; the count has to come from a separate query
int totalPages = (int)Math.Ceiling((float)totalRecords / (float)pageSize);
2- a SQL Server stored procedure for pagination that returns only the current page's results
--get result
select t1.* into #t
from (
    select *
    from VwLoanInquiry
    where CustomerNumber like '%' + @CustomerNumber + '%'
) as [t1]
where (@PageIndex = -1
       or (t1.ID between (@PageIndex - 1) * @RecordPerPage
                     and ((@PageIndex) * @RecordPerPage) - 1))
--get count
select count(*) from #t
3- OLAP; we built a cube in Analysis Services (SSAS),
but here our problem is showing records with string-typed columns within the cube cells (it is impossible to define non-numeric fields as measures).
In addition, to get results faster, our view is indexed and the column data types have been set appropriately.
Please help us figure out which of these approaches is better, or, if there is another way, what it is.
thanks