LEFT JOIN in SAS using PROC SQL - join

I am new to SAS and have this basic problem. I have a list of NYSE trading dates in table A as follows -
trading_date
1st March 2012
2nd March 2012
3rd March 2012
4th March 2012
5th March 2012
6th March 2012
I have another table B that has share price information as -
Date ID Ret Price
1st March 2012 1 … …
3rd March 2012 1 … …
4th March 2012 1 … …
5th March 2012 1 … …
6th March 2012 1 … …
1st March 2012 2 … …
3rd March 2012 2 … …
4th March 2012 2 … …
... has numeric data related to price and returns.
Now I need to join the NYSE Data table to the above table to get the following table -
Date ID Ret Price
1st March 2012 1 … …
2nd March 2012 1 0 0
3rd March 2012 1 … …
4th March 2012 1 … …
5th March 2012 1 … …
6th March 2012 1 … …
1st March 2012 2 … …
2nd March 2012 2 0 0
3rd March 2012 2 … …
4th March 2012 2 … …
i.e. a simple left join. The zeros stand in for missing values, which SAS will show as a period (.), but you get the idea. But if I use the following command -
proc sql;
create table joined as
select table_a.trading_date, table_b.* from table_a LEFT OUTER join table_b on table_a.trading_date=table_b.date;
quit;
The join happens only for the first ID (i.e. ID=1) while for the rest of the IDs, the same data is maintained. But I need to insert the trade dates for all IDs.
How can I get the final data without running a do-while loop over all IDs? I have 1000 IDs, and looping and joining 1000 times is not an option due to limited memory.

Joe is right that you also need to take ID into consideration, but his solution cannot produce 2nd March 2012 because no one traded that day. You can do everything in a single SQL step (which will take a bit longer): cross join the trading dates with the distinct IDs, then left join the price data onto that grid.
proc sql;
create table final as
select d.trading_date, d.ID, t.Price, t.Ret
from
(
select trading_date, ID
from table_a, (select distinct ID from table_b)
) d
left join
(
select *
from table_b
) t
on t.Date=d.trading_date and t.ID=d.ID
order by d.id, d.trading_date;
quit;
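For readers without SAS at hand, the same cross-join-then-left-join idea can be tried in plain SQL. The following is only a minimal sketch using Python's built-in sqlite3 module; the sample rows, the ISO date strings, and the shortened three-day calendar are made up for illustration and are not the asker's data.

```python
import sqlite3

# In-memory database with tiny, made-up stand-ins for table_a and table_b.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (trading_date TEXT);
    CREATE TABLE table_b (date TEXT, id INTEGER, ret REAL, price REAL);
    INSERT INTO table_a VALUES ('2012-03-01'), ('2012-03-02'), ('2012-03-03');
    INSERT INTO table_b VALUES
        ('2012-03-01', 1, 0.01, 10.0),
        ('2012-03-03', 1, 0.02, 10.2),
        ('2012-03-01', 2, 0.03, 20.0),
        ('2012-03-03', 2, 0.04, 20.8);
""")

# Step 1 (subquery d): every trading date paired with every distinct ID.
# Step 2: left join the prices onto that grid, matching on date AND ID.
rows = conn.execute("""
    SELECT d.trading_date, d.id, t.ret, t.price
    FROM (
        SELECT a.trading_date, i.id
        FROM table_a AS a
        CROSS JOIN (SELECT DISTINCT id FROM table_b) AS i
    ) AS d
    LEFT JOIN table_b AS t
        ON t.date = d.trading_date AND t.id = d.id
    ORDER BY d.id, d.trading_date
""").fetchall()

for row in rows:
    print(row)
```

Each ID gets one row per trading date, and the dates with no trade come back with NULL (None in Python) for ret and price, which corresponds to the missing values SAS would show.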

Your left join doesn't work since it doesn't take ID into account. SAS (or rather SQL) doesn't know that it should repeat by ID.
The easiest way to get the full combination is PROC FREQ with SPARSE, assuming someone has a trade on every valid trading day.
/* table_b stores the trading day in the variable date */
proc freq data=table_b noprint;
tables id*date/sparse out=table_all(keep=id date);
run;
Then join that to the original table_b by id and date.
Alternately, you can use PROC MEANS, which can get your numerics (it can't get characters this way, unless you can use them as a class value).
Using table_b as created by Anton (With ret and price variables):
/* completetypes emits every id*date combination, even those with no rows */
proc means data=table_b noprint completetypes nway;
class id date;
var ret price;
output out=table_allmeans sum=;
run;
This will output missing for missing rows and values for present rows, and will have a _FREQ_ variable (0 for combinations that do not occur in table_b) that allows you to differentiate whether a row is really present in the trading dataset or not.

I suppose there must be something off with the data because your query looks fine and worked on the testing data I generated along the lines you described:
data table_a;
format trading_date date9.;
do trading_date= "01MAR2012"d to "06MAR2012"d;
output;
end;
run;
data table_b;
format date date9.;
ret = 0;
price = 0;
do date= "01MAR2012"d to "06MAR2012"d;
do ID = 1 to 4;
if ranuni(123) < 0.3 then
output;
end;
end;
run;
Below is what I get after running your query copied verbatim:
trading_date date ret price ID
01MAR2012 01MAR2012 0 0 3
02MAR2012 02MAR2012 0 0 2
03MAR2012 03MAR2012 0 0 1
03MAR2012 03MAR2012 0 0 2
04MAR2012 04MAR2012 0 0 2
05MAR2012 05MAR2012 0 0 3
06MAR2012 . . . .
It is worth checking how your dates are stored. Are they numeric? If they are character, are they formatted the same way in both tables? If they are numeric, are they dates, or datetimes with some odd format applied?

Related

select all columns based on a specific row value Google-Sheets

Hi everybody, I am trying to query an already formatted Google Sheets spreadsheet. I am able to filter some of the data (I used =query(x,select * where ... )). The output I get is the following:
(blank)  (blank)  may  may  june  june  july  july  july
planned  name     1    0    1     1     2     3     1
Now I want to refer to all the numbers under may (or june or july) in order to do some operation on them. I can't just select the values by hand because I need to automate it.
How can I get all the columns containing a specific marker (in my case, the name of the month)? If that is not possible, can you suggest a different way to do it? (I am not very experienced with Google Sheets or Excel.)
QUERY filters rows, not columns, so transpose the range first, filter the rows you want, and then transpose the result back if needed:
Input:
(blank)  (blank)  may  may  june  june  july  july  july
planned  name     1    0    1     1     2     3     1
Formula(select columns >0):
=QUERY(TRANSPOSE(A27:I28),"Select * where Col2>0")
Output:
(blank)  planned name
may      1
june     1
june     1
july     2
july     3
july     1
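The question itself asks for the values under one month rather than for values greater than 0. Outside of Sheets, that selection can be sketched in a few lines of Python. This is only an illustration: the two-row layout (a row of month markers above a row of values, with two leading label cells) is assumed from the A27:I28 range in the formula, and values_under is a hypothetical helper name.

```python
# Assumed layout: row 1 holds the month markers, row 2 the labels and values.
header = ["", "", "may", "may", "june", "june", "july", "july", "july"]
values = ["planned", "name", 1, 0, 1, 1, 2, 3, 1]


def values_under(marker):
    """Return the values whose column header equals the given marker."""
    return [v for h, v in zip(header, values) if h == marker]


print(values_under("may"))        # values in the columns headed "may"
print(sum(values_under("july")))  # any operation can follow, e.g. a sum
```

Pairing each header cell with the value beneath it is the same idea as transposing the range and filtering on the first column.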

Split revenue by year with Google Sheets' QUERY function

I get monthly revenue data from the finance department that I have to clean before entering it into a reporting format. It's monthly data that lists all revenue in a single column. I need to split out the revenue by year (2018, 2019, etc.).
I believe that I need to use a query function for this but if you have some other solution, then I'm open to that too.
The data looks like this:
Client Source Month Year Revenue
abc Google 1 2019 100
abc Google 1 2018 100
abc Facebook 1 2018 50
abc Facebook 2 2018 50
And I need it to look like this:
Client Source Month 2018 Revenue 2019 Revenue
abc Google 1 100 100
abc Facebook 1 50 0
abc Facebook 2 50 0
I'm familiar with query functions but I can't wrap my head around how to do this.
The pseudo code for this would be something like:
select Client,
Source,
Month,
sum(case when Year = 2019 then Revenue else 0 end) as "2019 Revenue",
sum(case when Year = 2018 then Revenue else 0 end) as "2018 Revenue"
from Data
Group by Client, Source, Month
Please let me know if I need to provide any additional information. And I appreciate your help with this problem.
=QUERY(A1:E, "select A,B,C,sum(E) where A is not null group by A,B,C pivot D", 1)
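To make the group-and-pivot step concrete, here is a small Python sketch of the same reshaping on the sample rows from the question. It is an illustration of the logic only, not of how QUERY works internally; the variable names are arbitrary, and missing year/key combinations are filled with 0 to match the desired output.

```python
from collections import defaultdict

# Sample rows from the question: (Client, Source, Month, Year, Revenue).
rows = [
    ("abc", "Google", 1, 2019, 100),
    ("abc", "Google", 1, 2018, 100),
    ("abc", "Facebook", 1, 2018, 50),
    ("abc", "Facebook", 2, 2018, 50),
]

# Group by (Client, Source, Month) and sum revenue separately per year.
totals = defaultdict(lambda: defaultdict(int))
for client, source, month, year, revenue in rows:
    totals[(client, source, month)][year] += revenue

# One output column per distinct year, in ascending order.
years = sorted({row[3] for row in rows})
pivot = [
    key + tuple(by_year.get(year, 0) for year in years)
    for key, by_year in totals.items()
]

print(("Client", "Source", "Month") + tuple(f"{y} Revenue" for y in years))
for line in pivot:
    print(line)
```

The grouping key plays the role of "group by A,B,C" and the per-year columns play the role of "pivot D".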

What datatype is used to insert only 'YEAR' in Informix?

I am trying to create a table in which one column accepts only a year (e.g. 2017, 2018, 2019), with the current year as its default value.
I am unable to create such a column. Can you please help me?
Expected Output :
ID Today_Year
----- -----------
100 2017
101 2018
102 2018
The database server is Informix.
You could do that as follows:
create table t2 (c1 datetime year to year default current year to year, c2 int);
insert into t2 (c2) values(100);
insert into t2 values ('2017',101);
insert into t2 values (current,102);
select * from t2;
c1 c2
2018 100
2017 101
2018 102
You can use any of a number of types. The 'obvious' one is:
DATETIME YEAR TO YEAR
The least obvious one is:
INTERVAL YEAR TO YEAR — not recommended
The other alternative is simply an integer type:
SMALLINT
INTEGER
BIGINT — overkill; not recommended
INT8 — deprecated; do not use
The integer types may be easier to use in the long run than DATETIME YEAR TO YEAR; I'd almost certainly use INTEGER or SMALLINT.
How can I create a table with a 'year' column and the default value in that column is the current year?
Hmmm; that's hard to do with an INTEGER type. At that point, DATETIME comes to the rescue:
CREATE TABLE YearTable
(
RowNumber SERIAL NOT NULL PRIMARY KEY,
YearColumn DATETIME YEAR TO YEAR NOT NULL DEFAULT CURRENT YEAR TO YEAR,
DateColumn DATE NOT NULL DEFAULT TODAY
);
INSERT INTO YearTable(RowNumber) VALUES(0);
SELECT * FROM YearTable;
That yields, for example (assuming DBDATE=Y4MD- in the environment):
1 2018 2018-03-05
You can't use arbitrary expressions for defaults even when they seem reasonable — which is irksome. Hence, if you need the default, the DATETIME YEAR TO YEAR type is the type of choice. (That's what jrenaut suggested in their answer.)
You could arrange for a trigger with an integer type; that is considerably harder work, though.

Insert a column from another table using calculated column as a key in Spotfire

I have two tables (A and B) that are related through the following four columns:
TECHNOLOGY - LAYER - YEAR - WORKWEEK
YEAR and WORKWEEK are calculated columns in both of the tables.
I have a column (WAFCOUNT) in table A that I want to insert into table B based off of those four related columns.
I've tried Insert->Columns, but it won't allow me to join the YEAR and WORKWEEK columns. I know this will work if I freeze them, but I'm trying not to do that so the tables don't become embedded.
It's my goal to keep this library item as dynamic as possible.
Here's a data sample for table A.
TECHNOLOGY LAYER YEAR WORKWEEK WAFCOUNT
XV-15 A 2016 1 23
XV-15 A 2016 2 14
XV-15 B 2016 2 49
XV-20 A 2016 1 7
XV-20 B 2016 1 19
Here's a data sample for table B.
TECHNOLOGY LAYER YEAR WORKWEEK
XV-20 A 2016 1
XV-20 B 2016 1
XV-15 A 2016 1
XV-15 A 2016 2
XV-15 B 2016 2
I have created a 'Unique_ID' column concatenating TECHNOLOGY - LAYER - YEAR - WORKWEEK in both the tables. Using the Unique_ID column, I have added 'WAFCOUNT' column into B. Please let me know if this helps.
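The composite-key idea is independent of Spotfire, so it can be illustrated in Python with the sample rows from the question. This is a sketch of the matching logic only, not of Spotfire's Insert Columns feature; a tuple of the four columns is used as the key here, which is equivalent to the concatenated Unique_ID string.

```python
# Sample rows from the question.
# table_a: (TECHNOLOGY, LAYER, YEAR, WORKWEEK, WAFCOUNT)
table_a = [
    ("XV-15", "A", 2016, 1, 23),
    ("XV-15", "A", 2016, 2, 14),
    ("XV-15", "B", 2016, 2, 49),
    ("XV-20", "A", 2016, 1, 7),
    ("XV-20", "B", 2016, 1, 19),
]
# table_b: (TECHNOLOGY, LAYER, YEAR, WORKWEEK)
table_b = [
    ("XV-20", "A", 2016, 1),
    ("XV-20", "B", 2016, 1),
    ("XV-15", "A", 2016, 1),
    ("XV-15", "A", 2016, 2),
    ("XV-15", "B", 2016, 2),
]

# Build a lookup from the four-column key to WAFCOUNT.
wafcount_by_key = {
    (tech, layer, year, week): wafcount
    for tech, layer, year, week, wafcount in table_a
}

# Append WAFCOUNT to each row of table_b; None marks a key with no match.
joined = [row + (wafcount_by_key.get(row),) for row in table_b]

for row in joined:
    print(row)
```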
Use a transformation instead of a calculated column: add it via Insert > Transformations > Calculate new column, and then try the join again.

"Join" facts and dimensions in olap cube

I have to calculate and present the number of teams through the years in an OLAP cube.
My team fact is structured this way:
TeamId DateFrom DateTo (FactTeams)
1 2012 2015
2 2012 2015
3 2012 2015
4 2015 2018
1 2018 2019
The cube must be able to answer, for example, how many teams were active in 2012 (3 teams).
I have prepared another helper fact table that contains all combinations of team IDs and their dates.
TeamId DateRange (FactTeamDates)
1 2012
1 2013
1 2014
1 2015
1 2018
1 2019
2 2012
2 2013
2 2014
2 2015
...and so on ...
I have created two facts, FactTeams and FactTeamDates. I also have a standard date dimension. Here is my data source view:
https://www.dropbox.com/s/5d2gzumxv5fejdq/teams.jpg
FactTeams.TeamId is linked to FactTeamDates.TeamId and FactTeamDates.DateRange is linked to DimDates.DateKey.
I have a measure “Team Number” that is a distinct count on the TeamId column of FactTeams.
My desired MDX query output for measure Team Number on COLUMNS and Years ON ROWS is:
Team Number
Year 2012 3
Year 2013 3
Year 2014 3
Year 2015 4
My question: How to organize my fact and set dimension usage in my cube to get desired output?
SQL query that produce desired output:
SELECT
d.CalendarYear
,COUNT(DISTINCT TeamId)
FROM FactTeams zt
INNER JOIN FactTeamDates td ON zt.TeamId = td.TeamId
INNER JOIN DimDates d ON d.DateKey = td.DateRange
GROUP BY d.CalendarYear ORDER BY 1
Note that I know I could create a data view based on the SQL query above (with joins) and end up with one joined fact table, but I want some kind of join between my cube's dimensions and facts, i.e. joins at the cube (OLAP) level only, not in SQL (database or cube data view).
Thanks in advance
You can create a distinct count measure and this should solve your problem
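To check what a distinct count per year should return, the expected numbers can be reproduced outside the cube. The sketch below is plain Python on the FactTeams sample rows, not MDX or cube configuration; it expands each DateFrom/DateTo range into individual years (as the helper table FactTeamDates does) and counts distinct team IDs per year.

```python
from collections import defaultdict

# FactTeams sample rows from the question: (TeamId, DateFrom, DateTo).
fact_teams = [
    (1, 2012, 2015),
    (2, 2012, 2015),
    (3, 2012, 2015),
    (4, 2015, 2018),
    (1, 2018, 2019),
]

# Expand each range into one (year, team) pair, collecting teams in a set
# so that a team active in several ranges is counted once per year.
teams_by_year = defaultdict(set)
for team_id, date_from, date_to in fact_teams:
    for year in range(date_from, date_to + 1):
        teams_by_year[year].add(team_id)

# Distinct count of teams per year, in ascending year order.
team_number = {year: len(teams) for year, teams in sorted(teams_by_year.items())}

for year, count in team_number.items():
    print(f"Year {year}", count)
```

For 2012 through 2015 this matches the desired output in the question (3, 3, 3, 4).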

Resources