Suppose I have the following dataset lookup:
ID T001 T002 T003 T004 T005
1 0 1 2 3 4
2 1 2 3 4 5
And I want to merge this onto my main dataset main:
proc sql;
create table main as
select a.*, b.*
from main as a
left join lookup as b on a.ID = b.ID;
quit;
However, this will merge the variables as "T001", "T002", "T003" etc.
I am trying to rename the variables as part of the merge/join, without having to rename each of them manually, as there are hundreds of these variables in the dataset. I am looking to get something like
ID V1 V2 V3 V4 V5
1 0 1 2 3 4
2 1 2 3 4 5
You can change the variable names dynamically with a simple macro after the join:
data have;
input ID T001 T002 T003 T004 T005;
datalines;
1 0 1 2 3 4
2 1 2 3 4 5
;
%macro rn;
%do i = 1 %to 5;
T00&i. = V&i.
%end;
%mend;
proc datasets lib=work nolist;
modify have;
rename %rn;
run;quit;
EDIT: for a longer list of zero-padded names (here T001 to T586), format the index with z3.:
data have;
array t T001-T586 (586*100);
run;
%macro rn;
%do i=1 %to 586;
T%sysfunc(putn(&i., z3.)) = V&i.
%end;
%mend;
proc datasets lib=work nolist;
modify have;
rename %rn;
run;quit;
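To check what the macro expands to, the same name mapping can be sketched outside SAS. This is only an illustration in Python, not part of the SAS solution; the function name `rename_pairs` and its parameters are hypothetical.

```python
def rename_pairs(count, old_prefix="T", new_prefix="V", width=3):
    """Build SAS-style 'old=new' rename pairs such as T001=V1.

    Old names are zero-padded to `width` digits, mirroring the z3. format
    used in the macro; new names are left unpadded.
    """
    return [
        f"{old_prefix}{i:0{width}d}={new_prefix}{i}"
        for i in range(1, count + 1)
    ]


pairs = rename_pairs(586)
# Show the first few pairs and the last one.
print(" ".join(pairs[:3]), "...", pairs[-1])
# T001=V1 T002=V2 T003=V3 ... T586=V586
```

The printed pairs are the text that the RENAME statement should receive after macro expansion.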
I am building an MS Access database to manage part numbers of mixtures. It's pretty much a bill of materials. I have a table, tblMixtures, that references itself in the PreMixtures field. I set this up so that a mixture can be a pre-mixture in another mixture, which can in turn be a pre-mixture in another mixture, etc. Each PartNumber in tblMixtures is related to many Components in tblMixtureComponents by the PartNumber. The Components and their associated data are stored in tblComponentData. I have put example data in the tables below.
tblMixtures
PartNumber  Description  PreMixtures
----------  -----------  -----------
1           Mixture 1    4, 5
2           Mixture 2    4, 6
3           Mixture 3
4           Mixture 4    3
5           Mixture 5
6           Mixture 6
tblMixtureComponents
ID  PartNumber  Component  Concentration
--  ----------  ---------  -------------
1   1           A          20%
2   1           B          40%
3   1           C          40%
4   2           A          40%
5   2           B          30%
6   2           D          30%
tblComponentData
ID  Name  Density  Category
--  ----  -------  --------
1   A     1.5      O
2   B     2        F
3   C     2.5      I
4   D     1        F
I have built the queries needed to pull the information together for the final mixture and even display the details of the pre-mixtures and components used for each mixture. However, with literally tens of thousands of part numbers, there can be a lot of overlap in the pre-mixtures used for mixtures. In other words, Mixture 4 can be used as a pre-mixture for Mixture 1, Mixture 2 and many more. I want to build a query that identifies all mixtures that could be used as a pre-mixture in a selected mixture. So I want a list of all the mixtures that have the same components as the selected mixture, or a subset of them. The pre-mixture doesn't have to contain all the components of the mixture, but it can't contain any component that is not in the mixture.
If you haven't solved it yet...
The PreMixtures column storing a collection of data is a sign that you need to "normalize" your database design a little more. If you are going to get the premixture data from a query, then you do not need to store it as table data. If you did, you would be forced to update the premix data every time your mixtures or components changed.
Also, we need to address the fact that tblMixtures doesn't have an id field. Consider the following table changes:
tblMixture:
id  description
--  -----------
1   Mixture 1
2   Mixture 2
3   Mixture 3
tblMixtureComponent:
id  mixtureId  componentId
--  ---------  -----------
1   1          A
2   1          B
3   1          C
4   2          A
5   2          B
6   2          D
7   3          A
8   4          B
I personally like to use column naming that exposes primary-to-foreign-key relationships. tblMixture.id is clearly related to tblMixtureComponent.mixtureId. I am lazy, so I would probably abbreviate everything too.
Now, as far as the query goes, first let's get the components of mixture 1:
SELECT tblMixtureComponent.mixtureId, tblMixtureComponent.componentId
FROM tblMixtureComponent
WHERE tblMixtureComponent.mixtureId = 1
Should return:
mixtureId  componentId
---------  -----------
1          A
1          B
1          C
We could change the WHERE clause to the id of any mixture we wanted. Next we need to get all the mixture ids with bad components. So we will build a join to compare around the last query:
SELECT tblMixtureComponent.mixtureId
FROM tblMixtureComponent LEFT JOIN
    (SELECT tblMixtureComponent.mixtureId,
            tblMixtureComponent.componentId
     FROM tblMixtureComponent
     WHERE tblMixtureComponent.mixtureId = 1) AS GoodComp
ON tblMixtureComponent.componentId = GoodComp.componentId
WHERE GoodComp.componentId Is Null
Should return:
mixtureId
2
Great, so now we have the ids of all the mixtures we don't want. Let's add another join to get the inverse:
SELECT tblMixture.id
FROM tblMixture LEFT JOIN
    (SELECT tblMixtureComponent.mixtureId
     FROM tblMixtureComponent LEFT JOIN
        (SELECT tblMixtureComponent.mixtureId,
                tblMixtureComponent.componentId
         FROM tblMixtureComponent
         WHERE tblMixtureComponent.mixtureId = 1) AS GoodComp
     ON tblMixtureComponent.componentId = GoodComp.componentId
     WHERE GoodComp.componentId Is Null) AS BadMix
ON tblMixture.id = BadMix.mixtureId
WHERE BadMix.mixtureId Is Null AND tblMixture.id <> 1
Should return:
id
3
4
What's left are the ids of all mixtures, other than mixture 1 itself, whose components are all also components of mixture 1.
Sorry, I did this on a phone...
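The subset check can also be tried outside Access. The sketch below is an assumption-laden illustration: it uses Python's built-in sqlite3 module instead of Access, loads the tblMixtureComponent sample rows from the answer, and swaps the nested LEFT JOINs for a NOT EXISTS formulation of the same rule. The function name `candidate_premixtures` is hypothetical, and mixture ids are read from tblMixtureComponent, so a mixture with no components would not be listed.

```python
import sqlite3

# In-memory stand-in for the Access database (assumption: SQLite, not Access).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblMixtureComponent "
    "(id INTEGER PRIMARY KEY, mixtureId INTEGER, componentId TEXT)"
)
conn.executemany(
    "INSERT INTO tblMixtureComponent VALUES (?, ?, ?)",
    [
        (1, 1, "A"), (2, 1, "B"), (3, 1, "C"),
        (4, 2, "A"), (5, 2, "B"), (6, 2, "D"),
        (7, 3, "A"), (8, 4, "B"),
    ],
)


def candidate_premixtures(conn, selected_id):
    """Return ids of mixtures whose components are all in the selected mixture."""
    sql = """
        SELECT DISTINCT mc.mixtureId
        FROM tblMixtureComponent AS mc
        WHERE mc.mixtureId <> :selected
          AND NOT EXISTS (
              -- any component of this mixture that the selected one lacks
              SELECT 1
              FROM tblMixtureComponent AS other
              WHERE other.mixtureId = mc.mixtureId
                AND other.componentId NOT IN (
                    SELECT componentId
                    FROM tblMixtureComponent
                    WHERE mixtureId = :selected
                )
          )
        ORDER BY mc.mixtureId
    """
    return [row[0] for row in conn.execute(sql, {"selected": selected_id})]


print(candidate_premixtures(conn, 1))  # [3, 4]
```

Mixture 2 is rejected because it contains component D, which mixture 1 does not have.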
I have two ActiveRecord relation objects, built as follows:
@obj1 = User.select('user.X, table2.Y, table2.Z, count(*)')
            .merge(@some_variable)
            .joins(:table1, :table2)
            .group(1, 2, 3)
@obj2 = User.select('user.X, table2.Y, count(*)')
            .merge(@some_variable)
            .joins(:table1, :table2)
            .group(1, 2)
Basically, the only difference between @obj1 and @obj2 is that @obj2 does not select the table2.Z column.
Here is sample data that I would like the two objects to contain:
@obj1
-------------------------------------
user.X table2.Y table2.Z count
-------------------------------------
1 1 A 1
1 1 B 1
2 1 A 1
2 1 B 1
2 1 C 1
@obj2
-------------------------
user.X table2.Y count
-------------------------
1 1 2
2 1 3
Currently the queries above work fine, but I believe the code could be refactored further. For example, could @obj2 get its records from @obj1's data without running a similar SQL query again? I'd appreciate any input on this. Many thanks in advance.
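# Columns shared by both queries; count(*) is appended separately.
columns = %w(users.X table2.Y table2.Z)
# Common part of both relations.
base = User.merge(@some_variable).joins(:table1, :table2)
@obj1 = base.group(*columns).select(*columns, 'count(*)')
# Chained select/group calls accumulate rather than replace, so @obj2 is
# derived from base instead of from @obj1.
reduced = columns - ['table2.Z']
@obj2 = base.group(*reduced).select(*reduced, 'count(*)')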
A further step in refactoring would be to use Arel to replace the string conditions for portability.
I have a database (*.mdb) that I use in my program through the following connection scheme:
TADOConnection -> TADOTable
The database has a table named Table1, which is connected through the ADOTable. Table1 has fields A, B and C, all floating-point values. I need to sort the table by the sum of these numbers.
For example:
Name A B C
------ --- --- ---
John 1 2 5
Nick 1 5 3
Qwert 1 5 2
Yuiop 2 3 1
I need to sort them so that the row with the largest A+B+C comes first.
Sorted variant:
Name A B C
------ --- --- ---
Nick 1 5 3
John 1 2 5
Qwert 1 5 2
Yuiop 2 3 1
How can I do this?
While writing this, I understood what to do: I need a calculated field in the table, equal to A+B+C, and I must sort the table by it.
I do not have MS Access, but with other database systems I would use SQL to achieve this. There are several SO answers along these lines for MS Access (try "Microsoft Access - grand total adding multiple fields together").
So start with something like this. The sum is repeated in the ORDER BY clause because, as far as I know, Access does not accept a column alias there:
Select Name, (A+B+C) as total, A, B, C
from table1
order by (A+B+C) desc
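A quick way to check the sort-by-sum idea against the sample rows is to run the query on a throwaway database. This sketch assumes SQLite (through Python's built-in sqlite3 module) as a stand-in for the .mdb file. Ordering by the expression itself and the extra `Name` tiebreaker are additions of this sketch: John and Qwert both sum to 8, so without a tiebreaker their relative order is not guaranteed. In the Delphi program, a query like this would presumably go into a TADOQuery rather than a TADOTable.

```python
import sqlite3

# Throwaway in-memory table holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Name TEXT, A REAL, B REAL, C REAL)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?, ?)",
    [("John", 1, 2, 5), ("Nick", 1, 5, 3), ("Qwert", 1, 5, 2), ("Yuiop", 2, 3, 1)],
)

# Sort by the computed sum, largest first; Name breaks ties deterministically.
rows = conn.execute(
    """
    SELECT Name, A + B + C AS total
    FROM Table1
    ORDER BY A + B + C DESC, Name
    """
).fetchall()

for name, total in rows:
    print(name, total)
```

With the sample data this lists Nick first (sum 9), then John and Qwert (sum 8 each), then Yuiop (sum 6).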
I'm trying to find a nice way to store word compositions of the following form:
exhaustcleaningsystem
exhaust cleaning system
exhaustcleaning system
exhaust cleaningsystem
The combinations are given by a default per case. Every word in a composition is stored as a unique row in table 'labels'.
labels
id value
--------------------------
1 exhaustcleaningsystem
2 exhaust
3 cleaning
4 system
5 exhaustcleaning
6 cleaningsystem
I thought about a new table called 'compositions':
compositions
id domain_id range
----------------------
1 1 2,3,4
2 1 5,4
etc...
But storing multiple separated values in a column isn't normalized design. Any ideas for that?
BTW: I'm using MySQL and ActiveRecord/Rails.
The design you propose is not even in first normal form, since range is not atomic.
The schema I'd use here would be:
compositions
id domain_id
-------------
1 1
2 1
compositions_content
composition_id rank label_id
------------------------------------------
1 1 2
1 2 3
1 3 4
2 1 5
2 2 4
with composition_id referencing compositions.id and label_id referencing labels.id.
The rank column is optional and should be present if and only if the order of the labels within a composition matters.
With this design, you get some referential integrity at the database level.
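The normalized layout can be tried out with a small script. This is a minimal sketch under several assumptions: it uses SQLite through Python's sqlite3 module rather than MySQL, names the link table `compositions_content` (a hyphen would need quoting), treats `domain_id` as a reference to `labels.id`, and quotes `rank` because some databases reserve that word (MySQL 8 does). The helper `splits_for` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript(
    """
    CREATE TABLE labels (id INTEGER PRIMARY KEY, value TEXT NOT NULL UNIQUE);
    CREATE TABLE compositions (
        id INTEGER PRIMARY KEY,
        domain_id INTEGER NOT NULL REFERENCES labels(id)
    );
    CREATE TABLE compositions_content (
        composition_id INTEGER NOT NULL REFERENCES compositions(id),
        "rank" INTEGER NOT NULL,
        label_id INTEGER NOT NULL REFERENCES labels(id),
        PRIMARY KEY (composition_id, "rank")
    );
    """
)
conn.executemany(
    "INSERT INTO labels VALUES (?, ?)",
    [(1, "exhaustcleaningsystem"), (2, "exhaust"), (3, "cleaning"),
     (4, "system"), (5, "exhaustcleaning"), (6, "cleaningsystem")],
)
conn.executemany("INSERT INTO compositions VALUES (?, ?)", [(1, 1), (2, 1)])
conn.executemany(
    "INSERT INTO compositions_content VALUES (?, ?, ?)",
    [(1, 1, 2), (1, 2, 3), (1, 3, 4), (2, 1, 5), (2, 2, 4)],
)


def splits_for(conn, domain_id):
    """Return every stored split of a label as an ordered list of words."""
    rows = conn.execute(
        """
        SELECT c.id, l.value
        FROM compositions AS c
        JOIN compositions_content AS cc ON cc.composition_id = c.id
        JOIN labels AS l ON l.id = cc.label_id
        WHERE c.domain_id = ?
        ORDER BY c.id, cc."rank"
        """,
        (domain_id,),
    )
    splits = {}
    for composition_id, word in rows:
        splits.setdefault(composition_id, []).append(word)
    return list(splits.values())


print(splits_for(conn, 1))
# [['exhaust', 'cleaning', 'system'], ['exhaustcleaning', 'system']]
```

Each split is one row in compositions plus one ordered row per word in the link table, so no column holds a comma-separated list.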
Well, this is as far as I can think of in terms of normalisation:
sets
id domain_id
--------------
1 1
2 1
etc...
compositions
id set_id label_id order
---------------------------
1 1 2 1
2 1 3 2
3 1 4 3
4 2 5 1
5 2 4 2
etc...
What query should I execute in a MySQL database to get a result containing the partial (running) sums of a source table?
For example, when I have this table:
Id|Val
1 | 1
2 | 2
3 | 3
4 | 4
I'd like to get result like this:
Id|Val
1 | 1
2 | 3 # 1+2
3 | 6 # 1+2+3
4 | 10 # 1+2+3+4
Right now I get this result with a stored procedure containing a cursor and while loops. I'd like to find a better way to do this.
You can do this by joining the table on itself. The SUM will add up all rows up to this row:
select cur.id, sum(prev.val)
from TheTable cur
left join TheTable prev
on cur.id >= prev.id
group by cur.id
MySQL also allows the use of user variables to calculate this, which is more efficient but considered something of a hack:
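select
    id
,   @running_total := @running_total + val AS RunningTotal
from TheTable
-- initialise the variable once; without this it starts as NULL
cross join (select @running_total := 0) init
order by id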
The same self-join can also be written with an inner join and an explicit ordering:
SELECT l.Id, SUM(r.Val) AS Val
FROM your_table AS l
INNER JOIN your_table AS r
ON l.Id >= r.Id
GROUP BY l.Id
ORDER BY l.Id
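The self-join approach can be checked against the sample table without a MySQL server. This is a sketch that assumes SQLite (through Python's built-in sqlite3 module) as a stand-in; the table and column names follow the first answer, and `itertools.accumulate` is used only as an independent cross-check.

```python
import sqlite3
from itertools import accumulate

# Sample table from the question, loaded into an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TheTable (id INTEGER PRIMARY KEY, val INTEGER)")
conn.executemany("INSERT INTO TheTable VALUES (?, ?)", [(1, 1), (2, 2), (3, 3), (4, 4)])

# Each row is joined to itself and to every earlier row, then summed per row.
running = conn.execute(
    """
    SELECT cur.id, SUM(prev.val) AS val
    FROM TheTable AS cur
    JOIN TheTable AS prev ON prev.id <= cur.id
    GROUP BY cur.id
    ORDER BY cur.id
    """
).fetchall()

print(running)  # [(1, 1), (2, 3), (3, 6), (4, 10)]

# Cross-check against a plain Python running sum over the same values.
expected = list(accumulate([1, 2, 3, 4]))
```

The join compares every row with every earlier row, so its cost grows quadratically with table size. On MySQL 8.0 and later, a window function such as `SUM(val) OVER (ORDER BY id)` is another option.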