Separating grouped histogram data in rows in Stata - histogram

I have some histogram data that needs transforming in order for me to use the Stata command DCdensity.
Here is the current form of the data:
-5--56-
-10--70-
-15--60-
-20--67-
-25--62-
But I need it such that I have 56 rows of 5, 70 rows of 10, 60 rows of 15 etc...
How could I make this transformation? The alternative is to edit the source code for the command but that would be far more complicated.

As suggested, you should use expand. Supposing that your variable is named var1, this is how I would do it:
replace var1 = subinstr(subinstr(var1,"--","_",.),"-","",.)
split var1, p("_")
destring var1?, replace
expand var12
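For readers who want to see what the split-then-expand steps do, here is a minimal JavaScript sketch of the same logic outside Stata. The function name expandGrouped and the sample input are illustrative assumptions, not part of Stata or DCdensity.

```javascript
// Hypothetical helper: turns strings like "-5--56-" into 56 copies of 5.
function expandGrouped(lines) {
  const rows = [];
  for (const line of lines) {
    // Mirror the two subinstr() calls: "--" becomes "_", remaining "-" are dropped.
    const cleaned = line.split("--").join("_").split("-").join("");
    // Mirror split + destring: first part is the value, second is the count.
    const [value, count] = cleaned.split("_").map(Number);
    // Mirror expand: emit the value once per counted observation.
    for (let i = 0; i < count; i++) {
      rows.push(value);
    }
  }
  return rows;
}

const expanded = expandGrouped(["-5--56-", "-10--70-"]);
```

Here expanded holds 56 entries of 5 followed by 70 entries of 10, which is the row layout the question asks for.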


Convert text to equation and return the sum result in Google Sheets

Let's say I have text in the following format in Column A, imported from another spreadsheet (it is impossible to add = manually because the data is imported automatically and changes):
45+5
45+3
90+2
90+7
Is there any formula that can convert this text into an equation that gives the result of the sum in Column B?
For example:
=ARRAYFORMULA(FUNCTIONTOCONVERTTEXTTOEQUATION(A1:A))
Expected Result:
50
48
92
97
Note: Each text will always be a number, then the + sign, then another number.
Given your response to my clarifying question above, let's assume that your raw data is in A2:A. Place the following in the Row-2 cell (e.g., B2) of an otherwise empty column:
=ArrayFormula(IF(A2:A="",,MMULT(IFERROR(TRIM(SPLIT(A2:A,"+"))*1,0),SEQUENCE(COLUMNS(SPLIT(A2:A,"+")),1,1,0))))
MMULT is a powerful yet underused function. I'll include a graphic that explains what it does better than words might:
SPLIT will form the elements of the first matrix, while SEQUENCE will simply create the second matrix consisting of a column of 1's the same length as the number of horizontal elements formed by the SPLIT (which, in your case, will apparently always be 2).
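The split-then-multiply idea can be sketched in plain JavaScript as follows. This is only an illustration of the logic, not how Google Sheets implements MMULT; splitRow and rowSumsByOnes are hypothetical names.

```javascript
// Hypothetical stand-in for SPLIT: "45+5" -> [45, 5]. Non-numeric parts
// become 0, roughly like IFERROR(... * 1, 0) in the formula.
function splitRow(text) {
  return text.split("+").map((part) => {
    const n = Number(part.trim());
    return Number.isNaN(n) ? 0 : n;
  });
}

// Multiply a matrix by a column of ones, which yields one sum per row.
function rowSumsByOnes(matrix) {
  const width = Math.max(...matrix.map((row) => row.length));
  const ones = Array.from({ length: width }, () => 1); // the SEQUENCE(..., 1, 1, 0) column
  return matrix.map((row) =>
    ones.reduce((acc, one, j) => acc + (row[j] || 0) * one, 0)
  );
}

const sums = rowSumsByOnes(["45+5", "45+3", "90+2", "90+7"].map(splitRow));
```

With the sample data from the question, sums is [50, 48, 92, 97].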
Try, assuming the imported data starts at A1
=arrayformula(sum(value(split(A1,"+"))))
or, in a single formula at the top of the column
=mmult(arrayformula(value(split(A1:A4,"+"))),sequence(2,1,1,0))

Can I change the shape of a range with ARRAYFORMULA() in Google Sheets?

My intention is to convert a single row of data into rows consisting of a specific number of columns in Google Sheets.
For example, starting with the raw data:
     A     B         C         D     E         F
1    id1   attr1-1   attr1-2   id2   attr2-1   attr2-2
And the expected result is:
(by dividing columns by three)
     A     B         C
1    id1   attr1-1   attr1-2
2    id2   attr2-1   attr2-2
I already know that it can be done somewhat manually, like:
=ARRAYFORMULA({A1:C1;D1:F1})
But I have to start over with it every time the target range is moved OR the subset size needs to be changed (in the case above it was three)!
So I guess there must be a more graceful way (i.e. a formula that does not require manual updates) to do the same thing, and I suspect ARRAYFORMULA() is the key.
Any help will be appreciated!
I added a new sheet ("Erik Help") where I reduced your manually entered parameters from two to one (leaving only # of columns to be entered in A2).
The formula that reshapes the grid:
=ArrayFormula(IFERROR(VLOOKUP(SEQUENCE(ROUNDUP(COUNTA(7:7)/A2),A2),{SEQUENCE(COUNTA(7:7),1),FLATTEN(FILTER(7:7,7:7<>""))},2,FALSE)))
SEQUENCE is used to shape the grid according to whatever is entered in A2. The number of rows is the count of items in Row 7 divided by the number in A2, rounded up to the next whole number; the number of columns is just whatever number is entered in A2.
Example: If there are 11 items in Row 7 and you want 4 columns, ROUNDUP(11/4)=3 rows to the SEQUENCE and your requested 4 columns.
Then, each of those numbers in the grid is VLOOKUP'ed in a virtual array consisting of a vertical SEQUENCE of ordered numbers matching the number of data pieces in Row 7 (in Column 1) and a FLATTENed (vertical) version of the Row-7 data pieces themselves (in Column 2). Matches are filled into the original SEQUENCE grid, while non-matches are left blank by IFERROR.
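As a rough illustration of that lookup logic outside of Sheets, the sketch below builds the same grid in JavaScript. The function name reshapeRow and the sample letters are assumptions made for the example; the comments point at the formula part each step loosely mirrors.

```javascript
// Hypothetical helper mirroring the SEQUENCE + VLOOKUP + IFERROR steps.
function reshapeRow(row, columns) {
  const items = row.filter((cell) => cell !== "");     // FILTER(7:7, 7:7<>"")
  const rowCount = Math.ceil(items.length / columns);  // ROUNDUP(COUNTA(7:7)/A2)
  const grid = [];
  for (let r = 0; r < rowCount; r++) {
    const line = [];
    for (let c = 0; c < columns; c++) {
      // 0-based counterpart of the number SEQUENCE puts in this grid cell.
      const position = r * columns + c;
      // A missing match is left blank, as IFERROR does in the formula.
      line.push(position < items.length ? items[position] : "");
    }
    grid.push(line);
  }
  return grid;
}

const grid = reshapeRow(["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k"], 4);
```

With 11 items and 4 columns, grid has 3 rows, and the last row is padded with one blank cell.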
Though it's a bit messy, I managed to get it done thanks to the SEQUENCE() function.
It constructs a grid from a given number of rows and columns, and that was exactly what I was looking for.
For reference, I set up a sheet with the sample data here:
https://docs.google.com/spreadsheets/d/1p972tYlsPvC6nM39qLNjYRZZWGZYsUnGaA7kXyfJ8F4/edit#gid=0
Use a custom formula
Although you have already solved this, if you are doing this kind of thing a lot, it could be beneficial to look into Apps Script and custom formulas.
In this case you could use something like:
function transposeSingleRow(range, size) {
  // initialize new range
  let newRange = [];
  // initialize counter to keep track
  let count = 0;
  // start while loop to go through row (range[0])
  while (count < range[0].length) {
    // add a slice of the original range to the new range
    newRange.push(
      range[0].slice(count, count + size)
    );
    // increment counter
    count += size;
  }
  return newRange;
}
Which works like this:
The nice thing about the formula here is that you select the range, and then you put in a number to represent its throw, or how many elements make up a complete row. So if instead of 3 attributes you had 4, instead of calling:
=transposeSingleRow(A7:L7, 3)
you could do:
=transposeSingleRow(A7:L7, 4)
Additionally, if you want this conversion to be permanent and not dependent on formula recalculation, it would be necessary to run it fully in Apps Script without using formulas.

ARRAYFORMULA with SUM OR SUMIF?

I'm trying to use ARRAYFORMULA with SUM (or SUMIF?). I basically want to lock C1 and always SUM from C1 down.
In D3 the formula is:
=ARRAYFORMULA((SUM(C1:C2) + 1)&":"&(SUM(C1:C3)))
In D4 it is:
=ARRAYFORMULA((SUM(C1:C3) + 1)&":"&(SUM(C1:C4)))
Here is a sample sheet, and below is a visual.
Col C is 50, 20, 16, etc.
Col D is 2:50, 51:70, 71:86, etc.
https://docs.google.com/spreadsheets/d/1DANMNEahYAoYBCQO1BsfXfUrgPj2mVWNKjn7VuYIIyI/edit#gid=0
units desired_result
50 2:50
20 51:70
16 71:86
8 87:94
2 95:96
If you could give a brief explanation on logic that'd be great. Google's is confusing (as always) and Youtube is limited.
This gives a result close to the one you want, but will need a bit of tweaking if you want to get 2:50 in F2 and 163:163 further down
=arrayformula(if(C2:C="","",sumif(row(C2:C),"<"&row(C2:C),C2:C)+1&":"&sumif(row(C2:C),"<="&row(C2:C),C2:C)))
I think it should be fairly self explanatory - the first part of the formula gives the sum for all rows where row number is less than row number of the current row and the second part of the formula gives the sum for all rows less than or equal to the current row. The slightly tricky thing is to realise that when the criteria part "<"&row(C2:C) of the SUMIF is itself an array, the SUMIF is evaluated separately for each array element and gives a new row in the resulting output array.
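The two SUMIF parts amount to a running total, which can be sketched in JavaScript like this. The function name cumulativeRanges is hypothetical, and the sketch assumes the units are already plain numbers.

```javascript
// Hypothetical helper: builds "start:end" labels from a running total.
function cumulativeRanges(units) {
  let runningTotal = 0;
  return units.map((u) => {
    const start = runningTotal + 1; // sum of all rows above this one, plus 1
    runningTotal += u;              // sum of all rows up to and including this one
    return `${start}:${runningTotal}`;
  });
}

const ranges = cumulativeRanges([50, 20, 16, 8, 2]);
```

For the sample units this gives 1:50, 51:70, 71:86, 87:94 and 95:96. The first label starts at 1 rather than 2, which is the part that would still need tweaking to match the desired output exactly.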
To lock a range, use $
=(SUM($C$1:C2) + 1)&":"&(SUM($C$1:C3))
Drag fill down.

Find the sum of each row in a spreadsheet

I'm new to Sheets and I don't know any terminology yet so I wasn't sure how to look this up.
If I have:
A1[=SUM(B1:1)]
How do I automatically copy that to A2 so that:
A2[=SUM(B2:2)]
And the same thing continues either indefinitely or until I declare a stopping point?
First of all, if you simply copy-paste the formula from A1 to A2 (or several cells below), it will automatically change as you want. This is how relative references work.
But it's also possible to get all the sums with one formula.
The following formula, entered in A1, will put the sums of the first seven rows in column A. To change the number of rows summed, replace 7 in B1:7 with another number.
=arrayformula(mmult(B1:7 + 0, transpose(B1:1 * 0 + 1)))
Explanation:
B1:7 + 0 coerces the entries to numbers (so that blank cells become 0).
transpose(B1:1 * 0 + 1) creates a column vector of 1s of suitable size.
matrix multiplication mmult by a column of 1s amounts to summing each row.
the wrapper arrayformula indicates that the operations are to be done on arrays.
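The steps above can be sketched in JavaScript with an explicit matrix product. The mmult function here is a hypothetical stand-in written for this example, not the Sheets implementation, and the sample data is made up.

```javascript
// Plain matrix product, a stand-in for MMULT.
// Assumes the width of a equals the height of b.
function mmult(a, b) {
  return a.map((row) =>
    b[0].map((_, j) => row.reduce((acc, x, k) => acc + x * b[k][j], 0))
  );
}

// Sample block with blank cells, like B1:7 might contain.
const data = [
  [1, 2, ""],
  [4, "", 6],
];

// Like B1:7 + 0: coerce every entry to a number, so blanks become 0.
const numeric = data.map((row) => row.map((cell) => Number(cell)));

// Like transpose(B1:1 * 0 + 1): a column of 1s, one entry per data column.
const ones = numeric[0].map(() => [1]);

// Multiplying by the column of 1s sums each row.
const rowSums = mmult(numeric, ones);
```

The result is a single column with one sum per row, here [[3], [10]].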

Small in arrayformula (Google Spreadsheet)

I have 5 columns of numbers that I want to sort per row into another set of columns. I figured I need to use small() (e.g. small(a2:e2,1) for f2; small(a2:e2,2) for g2 and so on). Is there a way to iterate this for the next rows, if possible using only native Google Spreadsheet formulas?
Thanks in advance
I was able to make a temporary work around, but I had to use 3 cheat columns. It looks ok for now but I imagine it will be troublesome for really huge numbers.
Here's a sample sheet for reference: https://docs.google.com/spreadsheets/d/1MQTP2XkRsPRAnPQ5wLhkR8JoNVY6YOExVlOkkX8UeRs/edit#gid=0
The original data are in A3:E
The first cheat column (G3:G) simply creates a column of numbers from 1 to the largest number found in the source data. 1-9 is changed to 01-09 for easier searching. "#" is then added at the end; this will come in handy later:
Cheat Column 1 =filter(if(row(A:A)=max(A:E)+1,"#",text(row(A:A),"00")),row(A:A)<=max(A:E)+1)
The second cheat column (H3:H) combines each row into a string separated by "-" with a "#" marker:
Cheat Column 2 =filter(text(A3:A,"00")&"-"&text(B3:B,"00")&"-"&text(C3:C,"00")&"-"&text(D3:D,"00")&"-"&text(E3:E,"00")&"#",A3:A<>"")
The last cheat column (I3:I) sorts each line (from cheat column 2) by finding each number from cheat column 1, from 01 up to the max number, and then the "#" character (this ensures that each line will still have the # end marker). "Find" will return the position of each number, or an error if it's not found. By using "if", we can make it return the actual number or "" instead.
=filter(arrayformula(if(iferror(find(transpose(filter(G3:G,G3:G<>"")),H3:H),""), transpose(filter(G3:G,G3:G<>"")),"")),A3:A<>"")
The formula above creates as many columns as there are numbers from cheat column 1. To prevent this, a "-" is added to each number then "Concatenate" is used to combine everything into one massive string with each set separated by "#". The string is then split using the "#" marker.
Cheat Column 3 =transpose(split(concatenate(filter(arrayformula(if(iferror(find(transpose(filter(G3:G,G3:G<>"")),H3:H),""),"-"&transpose(filter(G3:G,G3:G<>"")),"")),A3:A<>"")),"#"))
Each number is then separated into each corresponding column by using mid().
Small 1 =filter(mid(I3:I,2,2)*1,A3:A<>"")
Small 2 =filter(mid(I3:I,5,2)*1,A3:A<>"")
Small 3 =filter(mid(I3:I,8,2)*1,A3:A<>"")
Small 4 =filter(mid(I3:I,11,2)*1,A3:A<>"")
Small 5 =filter(mid(I3:I,14,2)*1,A3:A<>"")
Note that the formula above is only for numbers 1-99. For larger numbers, the Text() formulas should have more zeroes to correspond to the number of digits of the biggest number. The Mid() formulas should also be adjusted accordingly.
I would like to stress that I am very far from being a spreadsheet expert and that this solution is very "unoptimized". It requires several cheat columns; with the first one even having more rows than the original data. If anyone can help me get rid of the cheat columns (or at least the first one) I will be very grateful.
How about using SMALL like you mentioned in your question?
=small($A3:$E3,column()-columns($A3:$G3))
You will need to change the ranges accordingly. The last reference, $G3, is the cell just before the cell where the formula is placed.
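Filling SMALL(row, 1), SMALL(row, 2) and so on across a row is the same as sorting that row in ascending order. A minimal JavaScript sketch of that idea, with sortEachRow as a hypothetical name and made-up sample rows:

```javascript
// Hypothetical equivalent of filling SMALL(row, 1..n) across each row.
function sortEachRow(rows) {
  // Copy each row first so the source data is left untouched,
  // then sort numerically (the default sort would compare as strings).
  return rows.map((row) => [...row].sort((a, b) => a - b));
}

const source = [
  [5, 3, 9, 1, 7],
  [10, 2, 8, 4, 6],
];
const sorted = sortEachRow(source);
```

Each row of sorted holds the same five numbers as the matching source row, from smallest to largest.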