How to get observations as a list in Stata? - return

Stata stores values that some commands return in r() results (shown by return list after the command).
I need similar access to x after list x if y == 1, but list returns only r(N), not the values themselves.
Is it possible to get the observations as a local or global macro to refer to it in the code?

Try the levelsof command to get distinct values. It's the cat's pajamas.

One way to save the values of all observations (i.e., including repeated values) is with a loop:
clear
set more off
*----- example data -----
sysuse auto
keep rep78
list
*----- what you want -----
forvalues i = 1/`=_N' {
local myvals `myvals' `=rep78[`i']'
}
display "`myvals'"
But more importantly, why do you think you need such a thing?
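To make the difference between the two answers concrete, here is a rough Python analogue (not Stata code). It assumes a variable is a plain list, with None standing in for Stata's missing value ".", and the function names are hypothetical. levelsof keeps distinct non-missing values in sorted order, while the forvalues loop keeps every observation in data order, repeats and missings included.

```python
# Rough Python analogue of the two Stata answers; not Stata code.
# Assumption: a variable is a list, with None standing in for Stata's missing ".".

def levels_of(values):
    """Like levelsof: distinct non-missing values, sorted, space-separated."""
    return " ".join(str(v) for v in sorted({v for v in values if v is not None}))

def all_values(values):
    """Like the forvalues loop: every observation in data order, repeats kept."""
    return " ".join("." if v is None else str(v) for v in values)

rep78 = [3, 3, None, 3, 4, 3]  # made-up sample, not the real auto data
print(levels_of(rep78))   # 3 4
print(all_values(rep78))  # 3 3 . 3 4 3
```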


Print ResponseIDs of missing values SPSS

I'm creating a variable that will hold missing values from a specific variable. Currently this works, but it gives the missing values a value of 1. How do I tell SPSS to print the respondent's ResponseID instead?
My code below:
COMPUTE Q_2_MIS = MISSING(Q_2).
EXECUTE.
Thanks
Your code returns a value of 1 because the condition MISSING(Q_2) evaluates to TRUE (1).
Try this:
DO IF MISSING(Q_2).
COMPUTE Q_2_MIS = ResponseID .
END IF.
EXECUTE.
or (as per eli-k's comment) simply use IF:
IF MISSING(Q_2) Q_2_MIS = ResponseID .
EXECUTE.
Note that you might need to create the Q_2_MIS variable first, if you do not have it in your dataset.
Alternatively, if you want to print out the IDs of the respondents with a missing value on Q_2:
TEMPORARY.
SELECT IF missing(q_2).
LIST ResponseID q_2.
You will see a list of IDs in the SPSS Output, with a (blank) Q_2 next to each ID.
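The logic of both SPSS variants can be sketched in Python (this is not SPSS syntax). It assumes each case is a dict with made-up IDs and None standing in for a missing Q_2. Cases where the DO IF condition is false simply keep Q_2_MIS missing, and the TEMPORARY/SELECT IF/LIST variant corresponds to filtering the cases before printing.

```python
# Rough Python analogue of the SPSS answer; not SPSS syntax.
# Assumption: each case is a dict; None stands in for a missing Q_2.

cases = [
    {"ResponseID": "R1", "Q_2": 5},
    {"ResponseID": "R2", "Q_2": None},
    {"ResponseID": "R3", "Q_2": None},
]

# DO IF MISSING(Q_2) / COMPUTE Q_2_MIS = ResponseID:
# cases where the condition is false keep Q_2_MIS missing (None).
for case in cases:
    case["Q_2_MIS"] = case["ResponseID"] if case["Q_2"] is None else None

# TEMPORARY / SELECT IF missing(q_2) / LIST: show only the matching IDs.
missing_ids = [c["ResponseID"] for c in cases if c["Q_2"] is None]
print(missing_ids)  # ['R2', 'R3']
```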

Interpolating numeric values in Stata without creating new variables

I have a longitudinal data set with recurring observations (id 1,2,3...) per year. I have thousands of variables of all types. Some rows (indicated by a variable to_interpolate == 1) need to have their numeric variables linearly interpolated (they are empty) based on values of the same id from previous and next years.
Since I can't name all variables, I created a varlist of numeric variables. Also, I do not want to recreate thousands of extra variables, so I need to replace the existing missing values.
What I did so far:
quietly ds, has(type numeric)
local varlist `r(varlist)'
sort id year
foreach var of local varlist {
by id: ipolate `var' year replace(`var') if to_interpolate==1
}
No matter what I do, I get an error message:
factor variables and time-series operators not allowed
r(101);
My questions:
Is 'replace' even proper syntax? If not, how can I replace the existing variable values instead of creating new variables?
If the error means that factor variables exist in my varlist, how can I detect them?
If not, how can I get around this?
Thanks!
As @William Lisowski underlines, there is no replace() option to ipolate. Whatever is not allowed by its syntax diagram is forbidden. In any case, keeping a copy of the original is surely to be commended as part of an audit trail.
sort id
quietly ds, has(type numeric)
foreach var in `r(varlist)' {
by id: ipolate `var' year, gen(`var'2)
}
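A rough Python sketch of what by id: ipolate y year does, combined with the pattern of interpolating into a copy and then overwriting only the flagged rows. In Stata terms that pattern would be generating the interpolation into a temporary variable and then running replace ... if to_interpolate==1. The row layout (id, year, value, flag), the sample data, and the helper names are assumptions for illustration; it mimics ipolate only for the simple case of one observation per id and year.

```python
# Rough sketch of "by id: ipolate value year", then replacing only flagged rows.
# Assumptions: rows are (id, year, value, to_interpolate) tuples; None = missing.
from collections import defaultdict

def ipolate(xs, ys):
    """Linearly interpolate missing ys on xs; no extrapolation (no epolate)."""
    known = sorted((x, y) for x, y in zip(xs, ys) if y is not None)
    out = []
    for x, y in zip(xs, ys):
        if y is not None:
            out.append(y)
            continue
        lo = [(kx, ky) for kx, ky in known if kx < x]
        hi = [(kx, ky) for kx, ky in known if kx > x]
        if lo and hi:
            (x0, y0), (x1, y1) = lo[-1], hi[0]
            out.append(y0 + (y1 - y0) * (x - x0) / (x1 - x0))
        else:
            out.append(None)  # outside the observed range: left missing
    return out

def interpolate_flagged(rows):
    """Interpolate within each id, but overwrite only rows with to_interpolate == 1."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[row[0]].append(i)
    result = list(rows)
    for idxs in groups.values():
        filled = ipolate([rows[i][1] for i in idxs], [rows[i][2] for i in idxs])
        for i, new in zip(idxs, filled):
            pid, year, _, flag = rows[i]
            if flag == 1:
                result[i] = (pid, year, new, flag)
    return result

rows = [
    (1, 2000, 10.0, 0),
    (1, 2001, None, 1),
    (1, 2002, None, 1),
    (1, 2003, 40.0, 0),
    (2, 2000, 5.0, 0),
    (2, 2001, None, 1),  # no later value for id 2, so it stays missing
]
for row in interpolate_flagged(rows):
    print(row)
```

Unlike averaging neighbours, this handles gaps of several years and unequal spacing between years, because each missing value is placed on the line between the nearest known years on either side.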
Ok, this is a workaround since I can't find a way to replace values with ipolate that is feasible for thousands of variables:
quietly ds, has(type double float long int)
local varlist `r(varlist)'
sort id year
foreach var of local varlist {
quietly by id: replace `var' = (`var'[_n-1] + `var'[_n+1])/2 if to_interpolate==1
}
This is a linear interpolation that works for single-year gaps but not for two consecutive missing years; for my purposes, though, it is enough. I will be very happy to see a better solution :)
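Why the neighbour-averaging workaround breaks on consecutive gaps can be seen in a small Python sketch (not Stata code). It assumes one id's values sorted by year, with None as missing, and processes rows in order the way replace does. For two missing years in a row, each flagged row has a missing neighbour, and missing plus anything is missing. It also assumes the years are equally spaced.

```python
# Rough sketch of replace var = (var[_n-1] + var[_n+1])/2 if to_interpolate==1,
# processed row by row. Assumption: one id's values sorted by year; None = missing.

def neighbour_average(values, flags):
    v = list(values)
    for i in range(len(v)):
        if flags[i] and 0 < i < len(v) - 1:
            prev, nxt = v[i - 1], v[i + 1]
            # Any missing neighbour makes the result missing, as in Stata.
            v[i] = None if prev is None or nxt is None else (prev + nxt) / 2
    return v

print(neighbour_average([10.0, None, 30.0], [0, 1, 0]))          # [10.0, 20.0, 30.0]
print(neighbour_average([10.0, None, None, 40.0], [0, 1, 1, 0]))  # [10.0, None, None, 40.0]
```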

H2O randomForest column/feature selection

In h2o.randomForest, let's say I have 5 input features x=c("A","B","C","D","E"). Is there any way to force the algorithm to always choose A, B AND one of the remaining features?
h2o.randomForest simply asks you to pass x (the list of columns to use as predictors) and y (the name of the column to predict), so whatever you pass in x will be used as input.
What you are asking is really a Python question: you need to write your own logic to build the list of columns you pass. You can define the following as a function and use it as needed.
import random

myframe = ["a", "b", "c", "d", "e"]
# You can also set myframe to the list of column names
# myframe.remove(_use_response_column_name)  # this will make it generic
selectedkeys = ["a", "b"]
# Remove the always-used columns from the candidates
for item in selectedkeys:
    if item in myframe:
        myframe.remove(item)
# Add one randomly chosen remaining column
selectedkeys.append(random.choice(myframe))
print(selectedkeys)
print(myframe)
You just need to pass selectedkeys as the input for x.
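The same logic wrapped as a reusable function, as the answer suggests. This is a minimal sketch: select_features and its parameters are hypothetical names, and the optional response argument plays the role of the commented-out remove() line. Note that it picks the extra column once, before training.

```python
import random

def select_features(columns, required, response=None, rng=random):
    """Return the required columns plus one randomly chosen remaining column.

    Hypothetical helper: `response` (if given) is excluded from the candidates.
    """
    remaining = [c for c in columns if c not in required and c != response]
    if not remaining:
        raise ValueError("no optional columns left to choose from")
    return list(required) + [rng.choice(remaining)]

x = select_features(["A", "B", "C", "D", "E"], ["A", "B"])
print(x)  # ['A', 'B'] plus one of 'C', 'D', 'E'
```

Passing rng=random.Random(seed) makes the choice reproducible between runs.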

Java 8- forEach method iterator behaviour

I recently started checking new Java 8 features.
I've come across the forEach method, which iterates over a Collection.
Let's say I have an ArrayList<Integer> with the values {1, 2, 3, 4, 5}:
list.forEach(i -> System.out.println(i));
This statement iterates over the list and prints the values in it.
I'd like to know how I can specify that it should iterate over only some specific values.
For example, I want it to start from the 2nd value and iterate until the 2nd-to-last value, or iterate over alternate elements.
How can I do that?
To iterate on a section of the original list, use the subList method:
list.subList(1, list.size() - 1)
.stream() // This line is optional since List already has a forEach method taking a Consumer as parameter
.forEach(...);
This is the concept of streams. After one operation, the results of that operation become the input for the next.
So for your specific example, you can follow @Joni's answer. But if you're asking in general, you can use a filter to keep only the values you want to loop over.
For example, if you only wanted to print the even numbers, you could filter the stream before calling forEach on it, like this:
List<Integer> intList = Arrays.asList(1,2,3,4,5);
intList.stream()
.filter(e -> (e & 1) == 0)
.forEach(System.out::println);
You can similarly pick out the stuff you want to loop over before reaching your terminal operation (in your case the forEach) on the stream. I suggest you read this stream tutorial to get a better idea of how they work: http://winterbe.com/posts/2014/07/31/java8-stream-tutorial-examples/

Adding second variable in foreach command - Stata

I have datasets like this:
C:\temp\SalesFigures FY13.dta
C:\temp\SalesFigures FY14.dta
C:\temp\SalesFigures FY15.dta
etc.
Each file contains sales data from 50 states. I often need to run a block of code for just some of the states in these files. I specify those states in a file called StatesToRun.dta (e.g., AK, CA, WA) and use a foreach command to loop through each state. I also use a macro to specify the FY .dta file I want to use.
For example:
* Specify file to run.
local FY "FY14"
* Run code only for the states I list in StatesToRun.dta.
use "C:/temp/StatesToRun.dta", clear
levelsof state, local(statelist)
foreach MyState of local statelist {
use "C:/temp/SalesFigures `FY'.dta", clear
keep if state == `"`MyState'"'
* etc. ...
}
THE NEED
I sometimes need to run my code for several of the FY files in C:\temp. So I'd like to create a loop for that, too. For example, if I wanted to run the code for AK, CA, and WA, for the FY14 and FY15 .dta files, I'd enter "AK", "CA", and "WA" for state in StatesToRun.dta, and "FY14" and "FY15" for a variable I could call "FY" in StatesToRun.dta. I'm just not sure how to incorporate this second variable into the loop. I read you can nest foreach statements, but I'm not sure if that's the best approach.
Since I'm rather new to Stata, this is my best guess:
* Run code only for the states and FYs I list in StatesToRun.dta.
use "C:/temp/StatesToRun.dta", clear
levelsof state, local(statelist)
levelsof FY, local(FYlist)
foreach MyState of local statelist {
foreach MyFY of local FYlist {
use "C:/temp/SalesFigures `MyFY'.dta", clear
keep if state == `"`MyState'"'
* etc. ...
}
}
Am I on the right path?
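Logically, the nested foreach loops visit every state and FY combination, so each FY file is opened once per state (3 states times 2 FYs gives 6 loads). A rough Python sketch of that iteration order, with hypothetical lists standing in for the two levelsof results; itertools.product produces the same pairs in the same order.

```python
from itertools import product

# Hypothetical values standing in for the two levelsof results.
statelist = ["AK", "CA", "WA"]
fylist = ["FY14", "FY15"]

# Nested loops: every state is paired with every FY file.
nested = []
for my_state in statelist:
    for my_fy in fylist:
        nested.append((my_state, f"C:/temp/SalesFigures {my_fy}.dta"))

# itertools.product yields the same pairs in the same order.
flat = [(s, f"C:/temp/SalesFigures {fy}.dta") for s, fy in product(statelist, fylist)]
print(nested == flat, len(nested))  # True 6
```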
You don't need a loop (or a macro) to keep the observations listed in another dataset. You can use merge:
clear
set more off
*----- example file with list of interest ----
sysuse auto
keep make
drop in 6/69
list
tempfile MakesToRun
save "`MakesToRun'"
*---- work with selected observations ----
clear
set more off
sysuse auto
keep make price mpg rep78
list
// keep only observations that appear in the list of interest
merge 1:1 make using "`MakesToRun'", keep(matched)
list
Check help merge and the corresponding manual entry to get a good grasp of how it works.
You can do this for multiple files using a loop.
Maybe there's a better way to setup the whole thing, but we don't have enough information.
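When the using file contains only the key variable, merge ... keep(matched) acts as a filter: it keeps the rows whose key appears in the list of interest. A rough Python sketch of that idea applied to several files in a loop, assuming each FY dataset is a list of dicts and the list of interest is a set of keys; the data and the keep_matched name are made up for illustration.

```python
# Rough analogue of "merge ... using list, keep(matched)" applied per file.
# Assumptions: each FY dataset is a list of dicts; the list of interest is a
# set of keys. All data here are made up for illustration.

states_to_run = {"AK", "CA", "WA"}

datasets = {
    "FY14": [{"state": "AK", "sales": 1}, {"state": "NY", "sales": 2}],
    "FY15": [{"state": "CA", "sales": 3}, {"state": "TX", "sales": 4}],
}

def keep_matched(rows, keys, key="state"):
    """Keep only rows whose key appears in the list of interest."""
    return [r for r in rows if r[key] in keys]

for fy in ["FY14", "FY15"]:
    kept = keep_matched(datasets[fy], states_to_run)
    print(fy, [r["state"] for r in kept])
```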
