Filter values where combination of two columns in one table matches another table - google-sheets

I'm working in Google Sheets and trying to create a FILTER function that returns only the results from a second table where a pair of values exists in the first table. Here's a simplified example:
SpellsInitial (Table 1)
Level
Name
1
Heal
2
Flaming Sphere
3
Fireball
SpellsHeightened (Table 2)
Level
Name
1
Heal
2
Flaming Sphere
2
Heal
3
Fireball
3
Flaming Sphere
3
Heal
And I want to filter SpellsHeightened to return only the results that are in SpellsInitial—essentially "(Level=Level)*(Name=Name)=1".
I have a FILTER function taking a level value as input to print a list of names, but I can't seem to get the ArrayFormula part to work.
=TRANSPOSE(FILTER(SpellsHeightened_Name, A30=SpellsHeightened_Level, (SpellsHeightened_Name=SpellsInitial_Name)*(A30=SpellsInitial_Level)))
I know what I actually need on the last line is "the value on a given line in SpellsHeightened_Name" because otherwise it's the whole array, but I guess I'm struggling to identify and pass in that value using only a level value as input. I tried nesting one FILTER (to get the list of names from Heightened) inside a second FILTER (to match the names up with Initial) but could get that figured out either.
Here's the actual thing in practice.

Perhaps:
=FILTER( SpellsHeightened,
ISNUMBER( MATCH( SpellsHeightened_Level&SpellsHeightened_Name,
SpellsInitial_Level&SpellsInitial_Name, 0 ) ) )

Related

[Google Data Studio]: Can't create histogram as bin dimension is interpreted as metric

I like to make a histogram of some data that is saved in a nested BigQuery table. In a simplified manner the table can be created in the following way:
CREATE TEMP TABLE Sessions (
id int,
hits
ARRAY<
STRUCT<
pagename STRING>>
);
INSERT INTO Sessions (id, hits)
VALUES
( 1,[STRUCT('A'),STRUCT('A'),STRUCT('A')]),
( 2,[STRUCT('A')]),
( 3,[STRUCT('A'),STRUCT('A')]),
( 4,[STRUCT('A'),STRUCT('A')]),
( 5,[]),
( 6,[STRUCT('A')]),
( 7,[]),
( 8,[STRUCT('A')]),
( 9,[STRUCT('A')]),
(10,[STRUCT('A'),STRUCT('A')]);
and it looks like
id
hits.pagename
1
A
A
A
2
A
3
A
A
and so on for the other ids. My goal is to obtain a histogram showing the distribution of A-occurences per id in data studio. The report for the MWE can be seen here: link
So far I created a calculated field called pageviews that evaluates the wanted occurences for each session via SUM(IF(hits.pagename="A",1,0)). Looking at a table showing id and pageviews I get the expected result:
table showing the number of occurences of page A for each id
However, the output of the calculated field is a metric, which might cause trouble for the following. In the next step I wanted to follow the procedure presented in this post. Therefore, I created another field bin to assign my sessions to bins according to the homepageviews as:
CASE
WHEN pageviews = 0 OR pageviews = 1 THEN "bin 1"
WHEN pageviews = 2 OR pageviews = 3 THEN "bin 2"
END
According to this bin-defintion I hope to obtain a histogram having 6 counts in bin 1 and 4 counts in bin 2. Well, in this particular example it will actually have 4 counts in bin one as ids 5 and 7 had "null" entries, but never mind. This won't happen in my real world table.
As you can see in the next image showing the same table as above, but now with a bin-column, this assignment works as well - each id is assigned the correct bin, but now the output field is a metric of type text. Therefore, the bar-chart won't let me use it (it needs it as dimension).
Assignment of each id to a bin
Somewhere I read the workaround to create a selfjoined blend, which outputs metrics as dimension. This works only by name: my field is now a dimension and i can use it as such for the bar-chart, but the bar chart won't load and shows a configuration error of the data source, which can be seen in this picture:
bar-chart of id-count over bin. In the configuration of the chart one can see that "bin" is now a dimension. The chart won't plot, however, as it shows a data configuration error (sorry for the German version of data studio).

How can I find the first instance (searching up) of a value in a column?

Picture linked below as this is a bit tangled:
I am working with a data set that has "nested" values. There are three different types of entries: categories, then subcategories that are nested under the categories, then individual items that are nested under the subcategories (picture linked below). The entries are matched up using a filter system. Column A has the entry type, column B has the actual value, column C has the filter. The filter is always the value of entry you are nesting under. So, for a subcategory entry, Column A= "Subcategory", Column B= [name of subcategory] Column C = Column B of the category type entry above (the name of category it belongs to).
I need a way to automatically fill in the filters.
The way I am thinking I could do this is to search Column A (moving up) for the first instance of the entry type I need, and then return the value of the Column B cell in that row. Is this possible?
Given your exact data above (looking only at A14:C), delete everything from C14:C (including the header) and place the following formula in C14:
=ArrayFormula({"FILTER"; IF((A15:A="") + (A15:A="Category"),, IF(A15:A="Subcategory", VLOOKUP(ROW(A15:A), FILTER({ROW(A15:A), B15:B}, A15:A="Category"), 2, TRUE), VLOOKUP(ROW(A15:A), FILTER({ROW(A15:A), B15:B}, A15:A="Subcategory"), 2, TRUE)))})
This will create the title (which you can edit within the formula itself as you like) and all results for non-null rows thereafter.
You'll need to adjust the 15 in ranges to whatever the starting row of your non-header data actually is in your sheet.

stack data and restructure without using var to cases or casestovar in SPSS

I have the following situation: a loop (stack data) with only 1 index variable and with multiple items corresponding to the statements, as in the picture below (sorry it is Excel, but is the same as in SPSS):
stack data - cases on multiple lines, but never filling for 1 respondent all the columns
I want to reach to the following situation but without using casestovars to restructure, because that creates a lot of empty variables. I remember for older versions it was a command like Update, which was moving up the cases, to reach the following result:
reducing the cases per respondent
Like starting from this:
ID Index Q1_1 Q1_2 Q1_3 Q1_4 Q1_5 Q1_6
1 1 1 1
1 2 1 1
1 3 1 1
To reach to this:
ID Q1_1 Q1_2 Q1_3 Q1_4 Q1_5 Q1_6
1 1 1 1 1 1 1
But without using casestovars. Is there any command in SPSS syntax for this?
Thank you very much, have a nice day!
Not entirely sure how variable your data structure is likely to be in reality but if as demo'ed where you have only a single response for each q1_1 to q1_6 per respondent ID, then the below would be sufficient:
dataset declare dsAgg.
aggregate outfile="dsAgg" /break=respid /q1_1 to q1_6=max(q1_1 to q1_6).
Also not sure of the significance of duplicate index values within the same respondent IDs, if this was intended or not.
The following syntax could do the job -
* first we'll recreate your example data.
data list list/respid index q1_1 to q1_6.
begin data
1,1,1,,,,,
1,2,,2,,,,
1,3,,,1,,,
1,4,,,,2,,
1,5,,,,,1,
1,6,,,,,,2
2,1,3,,,,,
2,1,,4,,,,
2,2,,,5,,,
2,2,,,,4,,
2,3,,,,,3,
2,3,,,,,,2
end data.
* now to work: first thing is to make sure the data from each ID are together.
sort cases by respid index.
* the loop will fill down the data to the last line of each ID.
do repeat qq=q1_1 to q1_6.
if respid=lag(respid) and missing(qq) qq=lag(qq).
end repeat.
* the following lines will help recognize the last line for each ID and select it.
compute lineNR=$casenum.
aggregate /outfile=* mode=ADDVARIABLES/break=respid/MXlineNR=max(lineNR).
select if lineNR=MXlineNR.
exe.

Get Rapidminer to transpose/pivot a single attribute/column in a table

I have a table that looks like the following:
ID City Code
"1005AE" "Oakland" "Value1"
"1006BR" "St.Louis" "Value2"
"102AC" "Miami" "Value1"
"103AE" "Denver" "Value3"
And I want to transpose/pivot the Code examples/values into column attributes like this:
ID City Value1 Value2 Value3
"1005" "Oakland" 1 0 0
"1006" "St.Louis" 0 1 0
"1012" "Miami" 1 0 0
"1030" "Denver" 0 0 1
Note that the ID field is numeric values encoded as strings because Rapidminer had trouble importing bigint datatypes. So that is a separate issue I need to fix--but my focus here is the pivoting or transposing of the data.
I read through a few different Stackoverflow posts listed below. They suggested the Pivot or Transpose operations. I tried both of these, but for some reason I am getting either a huge table which creates City as a dummy variable as well, or just some subset of attribute columns.
How can I set the rows to be the attributes and columns the samples in rapidminer?
Rapidminer data transpose equivalent to melt in R
Any suggestions would be appreciated.
In pivoting, the group attribute parameter dictates how many rows there will be and the index attribute parameter dictates what the last part of the name of new attributes will be. The first part of the name of each new attribute is driven by any other regular attributes that are neither group nor index and the value within the cell is the value found in the original example set.
This means you have to create a new attribute with a constant value of 1; use Generate Attributes for this. Set the role of the ID attribute to be ID so that it is no longer a regular attribute; use Set Role for this. In the Pivot operator, set the group attribute to be City and the index attribute to be Code. The end result is close to what you want. The final steps are, firstly to set missing values to be 0; use Replace Missing Values for this and, secondly to rename the attributes to match what you want; use Rename for this.
You will have to join the result back to the original since the pivot operation loses the ID.
You can find a worked example here http://rapidminernotes.blogspot.co.uk/2011/05/worked-example-using-pivot-operator.html

Sybase compare columns with duplicate row ids

So far I have a query with a result set (in a temp table) with several columns but I am only concerned with four. One is a customer ID(varchar), one is Date (smalldatetime), one is Amount(money) and the last is Type(char). I have multiple rows with the same custmer ID and want to evaluate them based on Date, Amount and Type. For example:
Customer ID Date Amount Type
A 1-1-10 200 blue
A 1-1-10 400 green
A 1-2-10 400 green
B 1-11-10 100 blue
B 1-11-10 100 red
For all occurrences of A I want to compare them to identify only one, first by earliest date, then by greatest Amount, then if still tied by comparing Types. I would then return one row for each customer.
I would provide some of the query but I am at home now after spending two days trying to get a correct result. It looks something like this:
(query to populate #tempTable)
GROUP BY customer_id
HAVING date_cd =
(SELECT MIN(date_cd)
FROM order_table ot
WHERE ot.customerID = #tempTable.customerID
)
OR date_cd IS NULL
I assume the HAVING would result in only one row per customer_id. This did not end up being the case since there were some ties there.
I am not sure I can do the OR - there are some with NULL values here - and it did not account for the step to the next comparison if they were all the same anyway. I am not seeing a way to avoid doing some row processing of the temp table with some kind of IF or WHERE loop.
As I write I am thinking maybe I use #tempTable.date_cd in the HAVING clause instead of looking at the original table. but that should return the same dates?
Am I on the right track or is there something missing? Suggestions? More info??
try below query :-
select * from #tempTable
GROUP BY customer_id
HAVING isnull(date_cd,"1900/01/01") =min(isnull(date_cd,"1900/01/01"))

Resources