Consider the following toy data:
input strL Country Population Median_Age Sex_Ratio GDP Trade year
"United States of America" 3999 55 1.01 5000 13.1 2012
"United States of America" 6789 43 1.03 7689 7.6 2013
"United States of America" 9654 39 1.00 7689 4.04 2014
"Afghanistan" 544 24 0.76 457 -0.73 2012
"Afghanistan" 720 19 0.90 465 -0.76 2013
"Afghanistan" 941 17 0.92 498 -0.81 2014
"China" 7546 44 1.01 2000 10.2 2012
"China" 10000 40 0.96 3400 14.3 2013
"China" 12000 38 0.90 5900 16.1 2014
"Canada" 7546 44 1.01 2000 1.2 2012
"Canada" 10000 40 0.96 3400 3.1 2013
"Canada" 12000 38 0.90 5900 8.5 2014
end
I run different regressions (using three different independent variables):
*reg1
local var "GDP Trade"
foreach ii of local var{
qui reg `ii' Population i.year
est table, b p
outreg2 Population using table, drop(i.year*) bdec(3) sdec(3) nocons tex(nopretty) append
}
*reg2
local var "GDP Trade"
foreach ii of local var{
qui reg `ii' Median_Age i.year
est table, b p
outreg2 Median_Age using table2, drop(i.year*) bdec(3) sdec(3) nocons tex(nopretty) append
}
*reg3
local var "GDP Trade"
foreach ii of local var{
qui reg `ii' Sex_Ratio i.year
est table, b p
outreg2 Sex_Ratio using table3, drop(i.year*) bdec(3) sdec(3) nocons tex(nopretty) append
}
I use the append option to append different dependent variables that are to be regressed on the same set of independent variables. Hence, I obtain three different tables.
I wish to "merge" these tables when I compile in LaTeX, so that they appear as a single table, with three different panels, one below the other.
Table1
Table2
Table3
I can use the tex(frag) option of the community-contributed command outreg2, but that will not give me the desired outcome.
Here is a simple way of doing this, using the community-contributed command esttab:
clear
input strL Country Population Median_Age Sex_Ratio GDP Trade year
"United States of America" 3999 55 1.01 5000 13.1 2012
"United States of America" 6789 43 1.03 7689 7.6 2013
"United States of America" 9654 39 1.00 7689 4.04 2014
"Afghanistan" 544 24 0.76 457 -0.73 2012
"Afghanistan" 720 19 0.90 465 -0.76 2013
"Afghanistan" 941 17 0.92 498 -0.81 2014
"China" 7546 44 1.01 2000 10.2 2012
"China" 10000 40 0.96 3400 14.3 2013
"China" 12000 38 0.90 5900 16.1 2014
"Canada" 7546 44 1.01 2000 1.2 2012
"Canada" 10000 40 0.96 3400 3.1 2013
"Canada" 12000 38 0.90 5900 8.5 2014
end
local var "GDP Trade"
foreach ii of local var{
regress `ii' Population i.year
matrix I = e(b)
matrix A = nullmat(A) \ I[1,1]
local namesA `namesA' Population_`ii'
}
matrix rownames A = `namesA'
local var "GDP Trade"
foreach ii of local var{
regress `ii' Median_Age i.year
matrix I = e(b)
matrix B = nullmat(B) \ I[1,1]
local namesB `namesB' Median_Age_`ii'
}
matrix rownames B = `namesB'
local var "GDP Trade"
foreach ii of local var{
regress `ii' Sex_Ratio i.year
matrix I = e(b)
matrix C = nullmat(C) \ I[1,1]
local namesC `namesC' Sex_Ratio_`ii'
}
matrix rownames C = `namesC'
matrix D = A \ B \ C
Results:
esttab matrix(D), refcat(Population_GDP "Panel 1" ///
Median_Age_GDP "Panel 2" ///
Sex_Ratio_GDP "Panel 3", nolabel) ///
gaps noobs nomtitles ///
varwidth(20) ///
title(Table 1. Results)
Table 1. Results
---------------------------------
c1
---------------------------------
Panel 1
Population_GDP .3741343
Population_Trade .0009904
Panel 2
Median_Age_GDP 202.1038
Median_Age_Trade .429315
Panel 3
Sex_Ratio_GDP 18165.85
Sex_Ratio_Trade 27.965
---------------------------------
Using the tex option:
\begin{table}[htbp]\centering
\caption{Table 1. Results}
\begin{tabular}{l*{1}{c}}
\hline\hline
& c1\\
\hline
Panel 1 & \\
[1em]
Population\_GDP & .3741343\\
[1em]
Population\_Trade & .0009904\\
[1em]
Panel 2 & \\
[1em]
Median\_Age\_GDP & 202.1038\\
[1em]
Median\_Age\_Trade & .429315\\
[1em]
Panel 3 & \\
[1em]
Sex\_Ratio\_GDP & 18165.85\\
[1em]
Sex\_Ratio\_Trade & 27.965\\
\hline\hline
\end{tabular}
\end{table}
EDIT:
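This preserves the original format. Because nullmat() reuses a matrix that already exists, run it in a fresh session or first drop the matrices A, B and C (and clear the namesA, namesB and namesC locals) left over from the code above: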
local var "GDP Trade"
foreach ii of local var{
regress `ii' Population i.year
matrix I = e(b)
matrix A = (nullmat(A) , I[1,1])
local namesA `namesA' `ii'
}
matrix rownames A = Population
matrix colnames A = `namesA'
local var "GDP Trade"
foreach ii of local var{
regress `ii' Median_Age i.year
matrix I = e(b)
matrix B = nullmat(B) , I[1,1]
local namesB `namesB' `ii'
}
matrix rownames B = "Median Age"
matrix colnames B = `namesB'
local var "GDP Trade"
foreach ii of local var{
regress `ii' Sex_Ratio i.year
matrix I = e(b)
matrix C = nullmat(C) , I[1,1]
local namesC `namesC' `ii'
}
matrix rownames C = "Sex Ratio"
matrix colnames C = `namesC'
matrix D = A \ B \ C
Table 1. Results
--------------------------------------
GDP Trade
--------------------------------------
Population .3741343 .0009904
Median Age 202.1038 .429315
Sex Ratio 18165.85 27.965
--------------------------------------
The European Nucleotide Archive (ENA) provides annotated coding sequences (.cds) of many genomes at https://ftp.ebi.ac.uk/pub/databases/ena/coding/con-std_latest/con/.
A piece of one such file:
ID BAM65753; SV 1; linear; genomic DNA; CON; PRO; 1074 BP.
XX
PA BA000057.1
XX
DT 02-NOV-2012 (Rel. 114, Created)
DT 07-NOV-2012 (Rel. 114, Last updated, Version 2)
XX
DE Ralstonia pickettii outer membrane protein (porin)
XX
KW .
XX
OS Ralstonia pickettii
OC Bacteria; Proteobacteria; Betaproteobacteria; Burkholderiales;
OC Burkholderiaceae; Ralstonia.
XX
RN [1]
RA Hatta T., Hara H., Takizawa N.;
RT ;
RL Submitted (11-OCT-2011) to the INSDC.
RL Contact:Takashi Hatta Okayama University of Science, Department of
RL Biomedical Engineering, Faculty of Engineering; Ridai-cho 1-1, Okayama,
RL Okayama 700-0005, Japan
XX
RN [2]
RX PUBMED; 22738955.
RA Hatta T., Fujii E., Takizawa N.;
RT "Analysis of two gene clusters involved in 2,4,6-trichlorophenol
RT degradation by Ralstonia pickettii DTP0602";
RL Biosci. Biotechnol. Biochem. 76(5):892-899(2012).
XX
DR MD5; f9c860c4130219abd3d574f26fa6df85.
XX
FH Key Location/Qualifiers
FH
FT source 1..1074
FT /organism="Ralstonia pickettii"
FT /strain="DTP0602"
FT /mol_type="genomic DNA"
FT /db_xref="taxon:329"
FT CDS BA000057.1:333324..334397
FT /codon_start=1
FT /transl_table=11
FT /product="outer membrane protein (porin)"
FT /db_xref="GOA:G9M5T3"
FT /db_xref="InterPro:IPR023614"
FT /db_xref="InterPro:IPR033900"
FT /db_xref="UniProtKB/TrEMBL:G9M5T3"
FT /protein_id="BAM65753.1"
FT /translation="MAKRPRNAALCTALLTAGLGFNANAQSSVTLYGQVDSYIGSTRAA
FT GGERALVVGAGGMQTSYWGMKGVEDLGSGMRAIFDLNGFYRVDTGRSGRSDTDGFFTRS
FT AFVGLQSNRYGTVKLGRNTTPYFLSTILFNPLVDSYAFGPSIFHTYKAATNGQVYDPGI
FT IGDSGWSNSVVYSTPTFGGLTANLIYAFGEQAGSTGQSKWGGNLTYFNGAFGATAAFQQ
FT VKFNATPGDVTAPSALVGFNKQNAAQVGLSYDFKVVKMFAQGQYIKTDINGGAGDIRHT
FT NAQLGASVPLGAGSVLLSYAYGRTRHGTNDFSRNTAAIAYDYNLSKRTDLYAAYFYDKL
FT TSQSHGDAFGVGMRHRF"
XX
SQ Sequence 1074 BP; 218 A; 340 C; 318 G; 198 T; 0 other;
atggccaaaa gaccgcgcaa cgctgcactg tgcaccgccc tgctgacagc gggactaggc 60
ttcaatgcca atgcgcaatc gagcgtgacg ctgtacgggc aagtcgattc ctacatcggc 120
agcacacgcg ccgcgggcgg ggaacgcgcc ttggtcgtcg gtgcaggcgg tatgcagacg 180
tcctactggg ggatgaaggg cgtcgaggat cttgggagcg gcatgcgtgc catcttcgac 240
ctgaacgggt tctaccgcgt cgatacgggg cgatccggca gatcggatac tgacggcttc 300
ttcacccgca gcgccttcgt gggcctgcag agcaatcgct acggtacggt caagctgggc 360
cgcaacacca cgccatactt cctgtcgacg atcctgttca acccgctggt cgattcgtac 420
gcgttcgggc catcgatctt tcatacctac aaggccgcca ccaacggaca ggtctacgac 480
cccggcatca ttggcgactc cggctggtcg aactccgtcg tgtactcgac gccgacgttc 540
ggcggcctga ccgccaacct catctacgcc ttcggcgagc aggccggcag taccggccag 600
agcaagtggg gcggaaacct gacctatttc aacggcgcat tcggagccac ggcagcgttc 660
cagcaagtca agttcaatgc gacaccagga gacgtcaccg ctcccagcgc cctggttggc 720
ttcaacaagc agaatgcggc ccaggtcgga ctgtcttacg atttcaaggt ggtcaagatg 780
tttgcccagg gtcagtacat caagaccgat atcaatgggg gcgcgggcga catcagacac 840
acgaacgccc agctcggcgc ctcggttccc cttggcgctg gcagcgtctt gctgtcatac 900
gcgtacggcc ggaccaggca tggcactaac gacttcagca ggaataccgc ggcaatcgcc 960
tatgactaca acctgtcaaa gcgcaccgac ttgtacgcgg cctactttta cgacaagctg 1020
acttcccaat cccatggcga tgcgttcggg gtggggatgc ggcatcgctt ctga 1074
//
How can I parse the file without missing any information? My goal is to map each UniProtKB accession to its nucleotide sequence.
I tried to parse this file with SeqIO from Biopython. My code:
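# Bio.__version__ = '1.79'
from Bio import SeqIO

# The file is an EMBL flat file (ID/XX/FT/SQ line codes), so the format name is "embl", not "gb"
with open("/data3/jsun/spgen/ena_data/CON_PRO_1.cds") as cds_file:
    for record in SeqIO.parse(cds_file, "embl"):
        print(record.id)
        break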
However, the db_xref information of CDS is missing in record.features. Is there any way I can get this information using the SeqIO parser? Thanks.
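If the parser keeps dropping the CDS qualifiers, one fallback is to read the flat file line by line. The following is only a minimal sketch, assuming every record follows the layout shown above (an ID line, FT qualifier lines, an SQ block, and a closing //). The function name parse_embl_cds and the abbreviated sample record are illustrative and not part of Biopython or the ENA tooling.

```python
import re

def parse_embl_cds(text):
    """Yield (record_id, uniprot_accessions, sequence) for each '//'-terminated record."""
    rec_id, accessions, seq_parts, in_seq = None, [], [], False
    for line in text.splitlines():
        if line.startswith("//"):
            # End of record: emit what was collected and reset the state
            if rec_id is not None:
                yield rec_id, accessions, "".join(seq_parts)
            rec_id, accessions, seq_parts, in_seq = None, [], [], False
        elif in_seq:
            # Sequence lines hold blocks of bases plus a running position count
            seq_parts.append(re.sub(r"[^a-zA-Z]", "", line))
        elif line.startswith("ID"):
            rec_id = line[2:].split(";")[0].strip()
        elif line.startswith("SQ"):
            in_seq = True
        elif line.startswith("FT"):
            # Keep only UniProtKB cross-references (Swiss-Prot or TrEMBL)
            m = re.search(r'/db_xref="UniProtKB/[^:"]+:([^"]+)"', line)
            if m:
                accessions.append(m.group(1))

# Abbreviated sample record (only the first sequence line is kept)
sample = '''ID   BAM65753; SV 1; linear; genomic DNA; CON; PRO; 1074 BP.
XX
FT   CDS             BA000057.1:333324..334397
FT                   /db_xref="GOA:G9M5T3"
FT                   /db_xref="UniProtKB/TrEMBL:G9M5T3"
XX
SQ   Sequence 1074 BP; 218 A; 340 C; 318 G; 198 T; 0 other;
     atggccaaaa gaccgcgcaa cgctgcactg tgcaccgccc tgctgacagc gggactaggc        60
//
'''

records = list(parse_embl_cds(sample))
for rec_id, accessions, sequence in records:
    print(rec_id, accessions, len(sequence))
```

For the real file, pass the output of open(...).read() instead of the sample string; for very large files the same loop can be run over the file object line by line.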
I am running a ttest command and exporting results to LaTeX using estpost and the community-contributed command esttab.
I am testing for a difference in means (of the variable height, by child gender) for several years and would like the years to be displayed vertically (in rows) rather than horizontally.
My code is given below:
foreach i in 2009 2010 2013 {
use "`i'.dta", clear
global year `i'
eststo _$year : estpost ttest height, by(child_gender)
}
esttab . using "trends.tex", nonumber append
Data for 2009:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(child_gender height)
0 156
1 135
0 189
1 168
0 157
1 189
1 135
1 145
0 124
1 139
end
Data for 2010:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(child_gender height)
0 151
1 162
0 157
1 134
0 157
1 189
1 135
1 145
0 143
1 166
end
Data for 2013:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(child_gender height)
0 177
0 135
0 189
0 168
0 157
1 189
1 135
1 145
1 124
1 127
end
I would like the output arranged as follows (but in LaTeX):
Any suggestions on how to make this work?
One way to do this is shown below. You will need to adjust the esttab options to polish the table further.
First define the program append_ttests, which is a quickly modified version of appendmodels, Ben Jann's program for stacking models:
program append_ttests, eclass
version 8
syntax namelist
tempname b V tmp
foreach name of local namelist {
qui est restore `name'
mat `tmp' = e(b)
local eq1: coleq `tmp'
gettoken eq1 : eq1
mat `tmp' = `tmp'[1,"`eq1':"]
local cons = colnumb(`tmp',"_cons")
if `cons'<. & `cons'>1 {
mat `tmp' = `tmp'[1,1..`cons'-1]
}
mat `b' = nullmat(`b') , `tmp'
mat `tmp' = e(t)
mat `tmp' = `tmp'["`eq1':","`eq1':"]
if `cons'<. & `cons'>1 {
mat `tmp' = `tmp'[1..`cons'-1,1..`cons'-1]
}
capt confirm matrix `V'
if _rc {
mat `V' = `tmp'
}
else {
mat `V' = ///
( `V' \ ///
`tmp' )
}
}
mat `b' = `b''
mat A = `b' , `V'
mat rown A = `0'
ereturn matrix results = A
eret local cmd "append_ttests"
end
Then run your loop and append the t-tests:
foreach i in 2009 2010 2013 {
use "`i'.dta", clear
estpost ttest height, by(child_gender)
estimates store year`i'
}
append_ttests year2009 year2010 year2013
See the results as follows:
esttab e(results), nonumber mlabels(none) ///
varlabels(year2009 2009 year2010 2010 year2013 2013) ///
collabels("Height" "t statistic")
--------------------------------------
Height t statistic
--------------------------------------
2009 4.666667 .3036859
2010 -3.166667 -.2833041
2013 21.2 1.415095
--------------------------------------
Add the tex option to see the LaTeX output.
I need to convert the notes of a chord into semitone offsets above its root note (which counts as 0), using Lua.
For example, from the MIDI data we get the notes of a C13 chord:
input: C, E, G, A#, D, F, A
The root note is C, so numbering starts at C = 0.
Below are two octaves of a piano keyboard (12 notes each) on which chords are played:
0C 1C# 2D 3D# 4E 5F 6F# 7G 8G# 9A 10A# 11B 12C 13C# 14D 15D# 16E 17F 18F# 19G 20G# 21A 22A# 23B
so C is the root note 0
D,F,A are played on the next octave
result: 0,4,7,10,14,17,21
so if we have a D chord
input: D,F#,A
D is the root note 0
all notes played on the first octave
0D 1D# 2E 3F 4F# 5G 6G# 7A 8A# 9B 10C 11C# 12D 13D# 14E 15F 16F# 17G 18G# 19A 20A# 21B 22C 23C#
result: 0,4,7
G#m7#9 Chord
input: G#,B,D#,F#,B
0G# 1A 2A# 3B 4C 5C# 6D 7D# 8E 9F 10F# 11G 12G# 13A 14A# 15B 16C 17C# 18D 19D# 20E 21F 22F# 23G
result: 0,3,7,10,15
Something like this may work:
local function notes2nums(input)
  local map = {A = 9, ["A#"] = 10, B = 11, C = 0, ["C#"] = 1, D = 2, ["D#"] = 3, E = 4, F = 5, ["F#"] = 6, G = 7, ["G#"] = 8}
  local base, prev
  return (input:gsub("([^,]+)", function(note)
    local num = map[note] or error(("Unexpected note value '%s'"):format(note))
    base = base or num
    num = num - base
    if prev and num < prev then num = num + 12 end
    prev = num
    return tostring(num)
  end))
end
print(notes2nums("D,F#,A"))
print(notes2nums("C,E,G,A#,D,F,A"))
print(notes2nums("G#,B,D#,F#,B"))
This prints:
0,4,7
0,4,7,10,14,17,21
0,3,7,10,15
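For comparison, here is the same idea as a minimal Python sketch. It differs from the Lua version in one respect, which is an assumption about the desired behaviour rather than something the question states: it keeps adding 12 until a note is no lower than the previous one, so a chord that climbs past two octaves still gets increasing numbers. The names NOTE_TO_SEMITONE and notes_to_offsets are illustrative.

```python
# Semitone position of each note name within one octave, with C = 0
NOTE_TO_SEMITONE = {
    "C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
    "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11,
}

def notes_to_offsets(notes):
    """Return semitone offsets above the first (root) note, never descending."""
    base = None
    prev = None
    offsets = []
    for note in notes:
        num = NOTE_TO_SEMITONE[note]  # raises KeyError for an unknown note name
        if base is None:
            base = num
        num = (num - base) % 12       # offset within the root's own octave
        # Assumption: each note is played at or above the previous one,
        # so lift it by whole octaves until that holds
        while prev is not None and num < prev:
            num += 12
        prev = num
        offsets.append(num)
    return offsets

print(notes_to_offsets("D,F#,A".split(",")))
print(notes_to_offsets("C,E,G,A#,D,F,A".split(",")))
print(notes_to_offsets("G#,B,D#,F#,B".split(",")))
```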
I would like to export summary statistics produced with the xtsum command:
webuse nlswork, clear
xtsum hours birth_yr
Variable | Mean Std. Dev. Min Max | Observations
-----------------+--------------------------------------------+----------------
hours overall | 36.55956 9.869623 1 168 | N = 28467
between | 7.846585 1 83.5 | n = 4710
within | 7.520712 -2.154726 130.0596 | T-bar = 6.04395
| |
birth_yr overall | 48.08509 3.012837 41 54 | N = 28534
between | 3.051795 41 54 | n = 4711
within | 0 48.08509 48.08509 | T-bar = 6.05689
Is there a way to do this in Stata?
Below you can find an implementation, which uses the community-contributed command esttab (type ssc install estout to download) for exporting the produced (LaTeX) table.
First define program xtsum2:
program define xtsum2, eclass
syntax varlist
foreach var of local varlist {
xtsum `var'
tempname mat_`var'
matrix mat_`var' = J(3, 5, .)
matrix mat_`var'[1,1] = (`r(mean)', `r(sd)', `r(min)', `r(max)', `r(N)')
matrix mat_`var'[2,1] = (., `r(sd_b)', `r(min_b)', `r(max_b)', `r(n)')
matrix mat_`var'[3,1] = (., `r(sd_w)', `r(min_w)', `r(max_w)', `r(Tbar)')
matrix colnames mat_`var'= Mean "Std. Dev." Min Max "N/n/T-bar"
matrix rownames mat_`var'= `var' " " " "
local matall `matall' mat_`var'
local obw `obw' overall between within
}
if `= wordcount("`varlist'")' > 1 {
local matall = subinstr("`matall'", " ", " \ ",.)
matrix allmat = (`matall')
ereturn matrix mat_all = allmat
}
else ereturn matrix mat_all = mat_`varlist'
ereturn local obw = "`obw'"
end
You can then run xtsum2 and get the results with esttab:
xtsum2 hours birth_yr
esttab e(mat_all), mlabels(none) labcol2(`e(obw)') varlabels(r2 " " r3 " ")
------------------------------------------------------------------------------------------
Mean Std. Dev. Min Max N/n/T-bar
------------------------------------------------------------------------------------------
hours overall 36.55956 9.869623 1 168 28467
between . 7.846585 1 83.5 4710
within . 7.520712 -2.154726 130.0596 6.043949
birth_yr overall 48.08509 3.012837 41 54 28534
between . 3.051795 41 54 4711
within . 0 48.08509 48.08509 6.056888
------------------------------------------------------------------------------------------
For LaTeX output, simply add the tex option:
esttab e(mat_all), mlabels(none) labcol2(`e(obw)') varlabels(r2 " " r3 " ") tex
\begin{tabular}{lc*{5}{c}}
\hline\hline
& & Mean& Std. Dev.& Min& Max& N/n/T-bar\\
\hline
hours & overall & 36.55956& 9.869623& 1& 168& 28467\\
& between & .& 7.846585& 1& 83.5& 4710\\
& within & .& 7.520712& -2.154726& 130.0596& 6.043949\\
birth\_yr & overall & 48.08509& 3.012837& 41& 54& 28534\\
& between & .& 3.051795& 41& 54& 4711\\
& within & .& 0& 48.08509& 48.08509& 6.056888\\
\hline\hline
\end{tabular}
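To make the three rows easier to read, the decomposition can be sketched outside Stata. This is an illustration in Python under my understanding of the xtsum definitions, not a reimplementation of the command: "between" summarizes the panel means, and "within" summarizes each observation's deviation from its panel mean with the overall mean added back. That is why a time-invariant variable such as birth_yr shows a within standard deviation of 0 and a within min and max equal to the overall mean. The function name xtsum_sketch and the toy data are hypothetical.

```python
from statistics import mean, stdev

def xtsum_sketch(panel_ids, values):
    """Overall, between and within summaries for one variable in long format."""
    overall_mean = mean(values)

    # Group the observations by panel and take each panel's mean
    groups = {}
    for pid, v in zip(panel_ids, values):
        groups.setdefault(pid, []).append(v)
    panel_means = {pid: mean(vs) for pid, vs in groups.items()}
    between = list(panel_means.values())

    # Within transformation: x_it - xbar_i + xbar (assumed definition)
    within = [v - panel_means[pid] + overall_mean
              for pid, v in zip(panel_ids, values)]

    return {
        "overall": (overall_mean, stdev(values), min(values), max(values)),
        "between": (stdev(between), min(between), max(between)),
        "within": (stdev(within), min(within), max(within)),
    }

# Toy panel: two individuals observed twice each
ids = [1, 1, 2, 2]
varying = xtsum_sketch(ids, [1, 3, 5, 7])
constant = xtsum_sketch(ids, [41, 41, 54, 54])  # time-invariant within each panel
print(varying)
print(constant)
```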
I have a time-series dataset in this format:
Time Val1 Val2
0 0.68 0.39
30 0.08 0.14
35 0.12 0.07
40 0.17 0.28
45 0.35 0.31
50 0.14 0.45
100 1.01 1.31
105 0.40 1.20
110 2.02 0.57
115 1.51 0.58
130 1.32 2.01
Using this dataset, I want to extract (not predict) the Time at which FC1=1 and FC2=1. Here is a plot that I created, with the points I would like to extract annotated.
I am looking for a solution in R, or a function, that interpolates to find these intercepts. For example, if I draw a horizontal line at fold change 1 on the y-axis, I want to extract every point on the x-axis where a curve crosses that line.
Looking forward to suggestions, and thanks in advance!
You can use approxfun to interpolate and uniroot to find a single root (a place where the curve crosses the line). You would need to run uniroot several times to find all the crossings; the rle function can help choose the search intervals.
The FC values in your data never get close to 1 let alone cross it, so you must either have a lot more data than shown, or mean a different value.
If you can give more detail (possibly include a plot showing what you want) then we may be able to give more detailed help.
Edit
OK, here is some R code that finds where the lines cross:
con <- textConnection(' Time Val1 Val2
0 0.68 0.39
30 0.08 0.14
35 0.12 0.07
40 0.17 0.28
45 0.35 0.31
50 0.14 0.45
100 1.01 1.31
105 0.40 1.20
110 2.02 0.57
115 1.51 0.58
130 1.32 2.01')
mydat <- read.table(con, header=TRUE)
with(mydat, {
plot( Time, Val1, ylim=range(Val1,Val2), col='green', type='l' )
lines(Time, Val2, col='blue')
})
abline(h=1, col='red')
afun1 <- approxfun( mydat$Time, mydat$Val1 - 1 )
afun2 <- approxfun( mydat$Time, mydat$Val2 - 1 )
points1 <- cumsum( rle(sign(mydat$Val1 - 1))$lengths )
points2 <- cumsum( rle(sign(mydat$Val2 - 1))$lengths )
xval1 <- numeric( length(points1) - 1 )
xval2 <- numeric( length(points2) - 1 )
for( i in seq_along(xval1) ) {
  tmp <- uniroot(afun1, mydat$Time[ points1[c(i, i+1)] ])
  xval1[i] <- tmp$root
}
for( i in seq_along(xval2) ) {
  tmp <- uniroot(afun2, mydat$Time[ points2[c(i, i+1)] ])
  xval2[i] <- tmp$root
}
abline( v=xval1, col='green' )
abline( v=xval2, col='blue')
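Because approxfun interpolates linearly between neighbouring points, each crossing can also be computed in closed form without a root finder. The following Python sketch illustrates that arithmetic on the same data. The function name crossings is hypothetical, and the treatment of a point lying exactly on the line (it is reported as a crossing) is my own assumption.

```python
def crossings(times, values, level=1.0):
    """Times at which the piecewise-linear curve through the points hits `level`."""
    pts = list(zip(times, values))
    roots = []
    for (t0, y0), (t1, y1) in zip(pts, pts[1:]):
        d0, d1 = y0 - level, y1 - level
        if d0 == 0:
            roots.append(t0)  # assumption: a point exactly on the line counts
        elif d0 * d1 < 0:
            # Sign change: solve the straight line through both points for `level`
            roots.append(t0 + (t1 - t0) * (-d0) / (d1 - d0))
    if pts and pts[-1][1] == level:
        roots.append(pts[-1][0])
    return roots

time = [0, 30, 35, 40, 45, 50, 100, 105, 110, 115, 130]
val1 = [0.68, 0.08, 0.12, 0.17, 0.35, 0.14, 1.01, 0.40, 2.02, 1.51, 1.32]
val2 = [0.39, 0.14, 0.07, 0.28, 0.31, 0.45, 1.31, 1.20, 0.57, 0.58, 2.01]

x1 = crossings(time, val1)
x2 = crossings(time, val2)
print(x1)
print(x2)
```

With this data each series crosses the line three times, which should agree with the vertical lines drawn by the R code up to uniroot's tolerance.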