I have data in the following format:
ID Var1
1 a
1 a
1 b
1 b
2 c
2 c
2 c
I'd like to convert it (restructure it) to the following format in SPSS:
ID Var1_1 Var1_2 Var1_3 Total_Count
1 n(a)=2 n(b)=2 n(c)=0 4
2 n(a)=0 n(b)=0 n(c)=3 3
First I'll create some fake data to work with:
data list list/ID (f1) Var1 (a1).
begin data
1 a
1 a
1 b
1 b
2 c
2 c
2 c
3 b
3 c
3 c
3 c
end data.
dataset name ex.
Now you can run the following - aggregate, restructure, create the string with the counts:
aggregate outfile=* /break ID Var1/n=n.
sort cases by ID Var1.
casestovars /id=ID /index=var1.
recode a b c (miss=0).
string Var1_1 Var1_2 Var1_3 (a10).
do repeat abc=a b c/Var123=Var1_1 Var1_2 Var1_3/val="a" "b" "c".
compute Var123=concat("n(", val, ")=", ltrim(string(abc, f3))).
end repeat.
compute total_count=sum(a, b, c).
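To check the expected result outside SPSS, the same aggregate-then-pivot logic can be sketched in Python. This is only an illustration of the reshaping, not a translation of the syntax above; the function name `pivot_counts` and the dictionary layout are assumptions made for this example.

```python
from collections import Counter

# Sample rows (ID, Var1), matching the fake data above.
rows = [(1, "a"), (1, "a"), (1, "b"), (1, "b"),
        (2, "c"), (2, "c"), (2, "c"),
        (3, "b"), (3, "c"), (3, "c"), (3, "c")]


def pivot_counts(rows):
    """Return one dict per ID with a count string per category and a total.

    Hypothetical helper: mirrors aggregate + casestovars + concat.
    """
    categories = sorted({value for _, value in rows})
    counts = Counter(rows)  # (ID, value) -> number of occurrences
    result = []
    for case_id in sorted({case_id for case_id, _ in rows}):
        record = {"ID": case_id}
        total = 0
        for position, value in enumerate(categories, start=1):
            n = counts[(case_id, value)]  # Counter gives 0 for missing pairs
            record[f"Var1_{position}"] = f"n({value})={n}"
            total += n
        record["Total_Count"] = total
        result.append(record)
    return result


wide = pivot_counts(rows)
for record in wide:
    print(record)
```

Missing ID/category pairs come out as zero counts, which is the role `recode a b c (miss=0)` plays in the syntax.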
If you're doing this in SPSS Modeler, the following stream works. The order is:
1. Create the data set using a User Input node, setting ID to integer and Var1 to string.
2. Restructure by Var1 values to generate the fields Var1_a, Var1_b, and Var1_c.
3. Aggregate using key field ID to sum the counts in Var1_a, Var1_b, and Var1_c, allowing the Record Count field to be generated.
4. Output to a Table node.
[Image: Restructure and Aggregate in SPSS Modeler]
The transpose node comes in handy if you use version 18.1.
As it is a simple pivot, you can go to "Fields and Records", then place the ID in "Index", Var1 in "Fields" and see if you can add another field for Count aggregation. If not, just derive it.
In Column A I have the id of the home team, B the name of the home team, C the id of the visiting team and in D the name of the visiting team:
12345 Borac Banja Luka 98765 B36
678910 Panevezys 43214 Milsami
1112131415 Flora 7852564 SJK
1617181920 Magpies 874236551 Dila
I want to create a column of ids and another of names but keeping the sequence of who will play with whom:
12345 Borac Banja Luka
98765 B36
678910 Panevezys
43214 Milsami
1112131415 Flora
7852564 SJK
1617181920 Magpies
874236551 Dila
Currently (the model works) I'm joining the columns with a special character, using flatten and finally split:
=ARRAYFORMULA(SPLIT(FLATTEN({
FILTER(A1:A&"§§§§§"&B1:B,(A1:A<>"")*(B1:B<>"")),
FILTER(C1:C&"§§§§§"&D1:D,(C1:C<>"")*(D1:D<>""))
}),"§§§§§"))
Is there a cleaner, more correct way to handle this kind of case?
889 A 5687 C
532 B 8723 D
Stack up the columns using {} and SORT them by a SEQUENCE of 1,2,1,2:
=SORT({A1:B2;C1:D2},{SEQUENCE(ROWS(A1:B2));SEQUENCE(ROWS(A1:B2))},1)
889 A
5687 C
532 B
8723 D
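Why the 1,2,1,2 sort key interleaves the two blocks may be easier to see outside the spreadsheet. The sketch below is a rough Python analogue, assuming (as the result above suggests) that the sort keeps tied rows in their stacked order; the names `home`, `away`, and `interleave` are made up for this illustration.

```python
# Two column blocks, as in A1:B2 and C1:D2.
home = [(889, "A"), (532, "B")]
away = [(5687, "C"), (8723, "D")]


def interleave(home, away):
    """Stack both blocks, then sort by each row's position within its block."""
    stacked = home + away  # like {A1:B2;C1:D2}
    # Like {SEQUENCE(ROWS(...));SEQUENCE(ROWS(...))}: 1,2,1,2
    keys = list(range(1, len(home) + 1)) + list(range(1, len(away) + 1))
    # sorted() is stable, so rows sharing a key keep their stacked order
    # (home row first, then the matching away row).
    order = sorted(range(len(stacked)), key=lambda i: keys[i])
    return [stacked[i] for i in order]


paired = interleave(home, away)
for row in paired:
    print(row)
```

Each home row ends up directly above the away row with the same sequence number, which is what keeps the fixtures together.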
You can also try the QUERY function. Enter this formula in F1:
={QUERY((A1:B), "SELECT * WHERE A IS NOT NULL and B IS NOT NULL",1);
QUERY((C1:D), "SELECT * WHERE C IS NOT NULL and D IS NOT NULL",1)}
I have a model, view, and controller in my Rails app, which takes user-entered data from the view and stores it in a database table. The table, named "table", looks like this:
__X__|__Y__|__Created_At__
A | 1 | 2021-01-02
B | 5 | 2021-01-02
C | 3 | 2021-01-02
A | 4 | 2021-01-01
What I need is a function that finds the unique values in the X column (a single A, B, C, ..., each taken from its most recently entered row, so here the row I'd want for A is the first one, since it was created most recently) and then sums the Y values of those rows.
This is what I was trying but it didn't work:
Table.distinct(:x).sum(:y)
The issue is that it seems to just sum all values in Y instead of disregarding the bottom duplicate A.
If it makes any difference, I'm using Rails 6.0.0.
I'm not sure I'm understanding this perfectly: you have a Table model with an X:string column. You want to get all instances of Table
Table.all
and return an array of them
Table.all.collect {|x| x }
You then want to sort them by created_at
Table.all.collect {|x| x }.sort_by{|x| x.created_at}
and keep the most recent ones. This should do everything so far:
Table.all.collect {|x| x }.sort_by{|x| x.created_at}.reverse.uniq{|x| x.x}
What do you mean by "take these unique valued rows and pull the Y values and sum them together"? Sum all the Y values? Or sum the Xs and the Ys? Are the X integers to be added?
If you're using postgres it supports "distinct on" which makes it relatively easy to do this directly in the database. That would look like:
SELECT SUM(distinct_on_x.Y)
FROM (
SELECT distinct ON (X) Y
FROM table
ORDER BY X, created_at DESC
) AS distinct_on_x
You can execute this directly with select_values, something like:
select_distinct_count_sql = <<-SQL
SELECT SUM(distinct_on_x.Y)
FROM (
SELECT distinct ON (X) Y
FROM table
ORDER BY X, created_at DESC
) AS distinct_on_x
SQL
Table.connection.select_values(select_distinct_count_sql)
If you just want the distinct rows so you can sum them in memory you can do:
Table.select("DISTINCT ON (X) *").order("X, created_at DESC")
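To make the intended result concrete, here is a small Python sketch of the "latest row per X, then sum Y" logic on the sample table from the question. It is a plain in-memory illustration, not Rails or PostgreSQL code, and the function name `sum_latest_y` is an assumption for this example.

```python
from datetime import date

# (X, Y, Created_At) rows from the question's sample table.
rows = [
    ("A", 1, date(2021, 1, 2)),
    ("B", 5, date(2021, 1, 2)),
    ("C", 3, date(2021, 1, 2)),
    ("A", 4, date(2021, 1, 1)),
]


def sum_latest_y(rows):
    """Keep the most recently created row for each X, then sum the Y values."""
    latest = {}  # X -> (Y, Created_At) of the newest row seen so far
    for x, y, created_at in rows:
        if x not in latest or created_at > latest[x][1]:
            latest[x] = (y, created_at)
    return sum(y for y, _ in latest.values())


total = sum_latest_y(rows)
print(total)
```

For the sample data the older A row (Y = 4) is dropped, so the total is 1 + 5 + 3 = 9 rather than 13.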
I'm using a QUERY function in Google Sheets. I have a named data range ("Contributions", in a table on another sheet) that consists of many columns, but I'm only concerned with two of them. For simplicity's sake, it looks something like this:
I have another table that contains the unique set of names (e.g.: "Fred", "Ginger", etc. each only once) and I want to extract the level # (column B) from the above table to insert the most recent (largest number) in this second table.
Right now, my query looks like this:
=QUERY(Contributions, "select B,C where C='"&A5&"' order by B desc limit 1",1)
The problem is, that it outputs both B & C data - e.g.:
11 Fred
But since I already have the name (in column A of this other table) I only want it to output the value from B - e.g.:
11
Is there a way to output only a subset (in this case 1 of 2) of the columns of output based on a directive within the query itself (as opposed to doing post-processing of the results)?
Outputting a Subset of Columns Used in Query
In order to output only certain columns of a query result, the query only needs to select the columns to be displayed while the constraints / conditions may utilize other columns of data.
For example (as an answer to my own question) - I have a table like this:
I needed to get the data from the row with a name matching another cell (on another sheet) and with the latest (largest) number - but I only want to output the number part.
My initial attempt was:
=QUERY(Contributions, "select B,C where C='"&A5&"' order by B desc limit 1",1)
But that output both B & C where I only wanted B. The answer (thanks to @Calculuswhiz) was to continue using C for the condition but only select on B:
=QUERY(Contributions, "select B where C='"&A5&"' order by B desc limit 1",1)
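The same idea, filtering on one column while returning another, can be sketched in Python. The rows below are hypothetical stand-ins for the Contributions range (the original table is not reproduced here), and `latest_level` is a name invented for this example.

```python
# Hypothetical (level, name) rows standing in for columns B and C.
contributions = [(3, "Fred"), (11, "Fred"), (7, "Ginger"), (9, "Ginger")]


def latest_level(rows, name):
    """Filter on the name column but return only the largest level."""
    levels = [level for level, row_name in rows if row_name == name]
    # Equivalent to "order by B desc limit 1"; None when no row matches.
    return max(levels) if levels else None


print(latest_level(contributions, "Fred"))
```

The name is used only in the condition and never appears in the output, just as column C is used only in the `where` clause.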
I have a file in a format something like the one below.
test.txt
1 | ABC | A, B, C, D
I need a stored procedure that inserts the records into a details table row by row, e.g.:
ID Name Type
1 ABC A
1 ABC B
1 ABC C
1 ABC D
Is this possible with a stored procedure in SQL? Any help will be appreciated. Thanks in advance.
You can either:
1. Split it in your code and then insert the rows, or
2. Bulk insert the file into a temporary table and split all the rows like this:
-- SAMPLE Data
declare @data table(id int, name varchar(10), type varchar(100))
insert into @data(id, name, type) values
(1, 'ABCD', 'A, B, C, D')
, (2, 'EFG', 'E, F, G')
, (3, 'HI', 'H, I')
-- Split All Rows and Types
Select ID, Name, ltrim(rtrim(value)) As Type
From (
-- Turn 'A, B, C' into <x>A</x><x> B</x><x> C</x> so it can be shredded as XML
Select *, Cast('<x>'+Replace(d.type,',','</x><x>')+'</x>' As XML) As types
From @data d
) x
Cross Apply (
-- One output row per <x> node
Select types.x.value('.', 'varchar(10)') as value
From x.types.nodes('x') as types(x)
) c
Output:
ID Name Type
1 ABCD A
1 ABCD B
1 ABCD C
1 ABCD D
2 EFG E
2 EFG F
2 EFG G
3 HI H
3 HI I
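For the first option (splitting in application code before inserting), a minimal Python sketch might look like the following. It assumes the pipe-delimited layout shown in test.txt; the function name `split_line` is hypothetical, and the actual insert into the details table is left out.

```python
def split_line(line):
    """Turn '1 | ABC | A, B, C, D' into one (ID, Name, Type) tuple per type."""
    record_id, name, types = [part.strip() for part in line.split("|")]
    # Split the comma-separated list and trim spaces, like ltrim(rtrim(value)).
    return [(int(record_id), name, t.strip())
            for t in types.split(",") if t.strip()]


detail_rows = split_line("1 | ABC | A, B, C, D")
for row in detail_rows:
    print(row)
```

Each resulting tuple corresponds to one row to insert into the details table.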
I have a database (*.mdb). The connection scheme I use in my program is:
TADOConnection -> TADOTable
DB has a table named Table1, which is connected by ADOTable. In Table1 there are fields A, B, C - floating point values. I need to sort the table by sums of these numbers.
For example:
Name A B C
------ --- --- ---
John 1 2 5
Nick 1 5 3
Qwert 1 5 2
Yuiop 2 3 1
I need to sort them so that the name with the largest A+B+C comes first.
Sorted variant:
Name A B C
------ --- --- ---
Nick 1 5 3
John 1 2 5
Qwert 1 5 2
Yuiop 2 3 1
How can I do this?
While writing this, I understood what to do: I need a calculated field in the table, which is equal to A+B+C, and I must sort the table using it.
I do not have MS Access, but with other database systems I would use SQL to achieve this.
There are several SO answers along these lines for MS Access (try Microsoft Access - grand total adding multiple fields together)
So start with something like this:
Select Name, (A+B+C) as total, A, B, C
from table1
order by (A+B+C) desc
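The query can be tried out quickly with Python's built-in sqlite3 module. Note the assumptions: SQLite stands in for MS Access here, so this only demonstrates the ordering logic, and a secondary sort on Name is added (not part of the original query) so that rows with equal sums come out in a predictable order.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the .mdb file (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (Name TEXT, A REAL, B REAL, C REAL)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [("John", 1, 2, 5), ("Nick", 1, 5, 3), ("Qwert", 1, 5, 2), ("Yuiop", 2, 3, 1)],
)

# Sort by the sum of the three fields, largest first.
# The extra "Name" key only breaks ties (John and Qwert both sum to 8).
ranked = conn.execute(
    "SELECT Name, (A + B + C) AS total, A, B, C "
    "FROM table1 ORDER BY (A + B + C) DESC, Name"
).fetchall()

names = [row[0] for row in ranked]
print(names)
conn.close()
```

With the sample data, Nick (sum 9) comes first and Yuiop (sum 6) last, matching the sorted variant in the question.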