How do I find out the longest run of a number? - google-sheets

This seemed like a trivial question to me, but I cannot get it done correctly. Part of my dataset looks like this
1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0
and contains two “runs” of 1 (not sure if that’s the correct word), one with a length 3, the other with a length of 5.
How can I use Google Docs or similar spreadsheet applications to find the longest of those runs?

In Excel you can use a single formula to get the maximum number of consecutive 1s, i.e.
=MAX(FREQUENCY(IF(A2:A100=1,ROW(A2:A100)),IF(A2:A100<>1,ROW(A2:A100))))
confirmed with CTRL+SHIFT+ENTER
In Google Sheets you can use the same formula but wrap in arrayformula rather than use CSE, i.e.
=arrayformula(MAX(FREQUENCY(IF(A2:A100=1,ROW(A2:A100)),IF(A2:A100<>1,ROW(A2:A100)))))
Assumes data in A2:A100 without blanks
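To see why the FREQUENCY formula works, here is a minimal Python sketch of the same idea (the function name and the `first_row` parameter are illustrative, not part of any spreadsheet API). The row numbers of the cells equal to 1 act as the data, the row numbers of all other cells act as the bins, and each bin interval then counts the 1s that sit between two non-1 cells.

```python
from bisect import bisect_left

def longest_run_frequency(values, target=1, first_row=2):
    """Mimic MAX(FREQUENCY(...)) for a column that starts at `first_row`."""
    rows = range(first_row, first_row + len(values))
    # Row numbers of matching cells play the role of FREQUENCY's data array.
    hits = [r for r, v in zip(rows, values) if v == target]
    # Row numbers of non-matching cells play the role of the bins array.
    bins = [r for r, v in zip(rows, values) if v != target]
    # One counter per bin, plus one for data above the last bin.
    counts = [0] * (len(bins) + 1)
    for r in hits:
        # A hit row never equals a bin row, so this is the interval it falls in.
        counts[bisect_left(bins, r)] += 1
    return max(counts)

data = [1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0]
print(longest_run_frequency(data))  # 5
```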

EDIT: whuber's suggestion is too simple for me not to update this response. Use a single IF statement that checks whether the value in the current row equals 1. If it does, the counter is incremented (the value from the prior row + 1); if it does not, the counter resets to 0.
You just need to initialize the first cell, B1, to 1 or 0 (matching A1). Write the formula once in B2 and fill it down; the cell references update automatically for the remaining rows.
So you would start out with:
A B
1 1
1 =IF(A2=1, B1+1, 0)
1
0
0
1
1
1
1
0
0
0
Then fill down:
A B
1 1
1 =IF(A2=1, B1+1, 0)
1 =IF(A3=1, B2+1, 0)
0 =IF(A4=1, B3+1, 0)
0 =IF(A5=1, B4+1, 0)
1 =IF(A6=1, B5+1, 0)
1 =IF(A7=1, B6+1, 0)
1 =IF(A8=1, B7+1, 0)
1 =IF(A9=1, B8+1, 0)
0 =IF(A10=1, B9+1, 0)
0 =IF(A11=1, B10+1, 0)
0 =IF(A12=1, B11+1, 0)
And here is the result in column B:
A B
1 1
1 2
1 3
0 0
0 0
1 1
1 2
1 3
1 4
0 0
0 0
0 0
The longest run is then the maximum of column B, e.g. =MAX(B1:B12), which is 4 for this example. The same logic should carry over to Google Sheets.
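The running-counter idea can also be sketched outside a spreadsheet. The following Python snippet is only an illustration (the function name is made up); it rebuilds column B from column A and then takes the maximum.

```python
def run_lengths(values, target=1):
    """Return the running count of consecutive `target` values (column B)."""
    counter = 0
    column_b = []
    for v in values:
        # Same rule as =IF(A2=1, B1+1, 0): extend the run or reset to 0.
        counter = counter + 1 if v == target else 0
        column_b.append(counter)
    return column_b

column_a = [1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0]
column_b = run_lengths(column_a)
print(column_b)       # [1, 2, 3, 0, 0, 1, 2, 3, 4, 0, 0, 0]
print(max(column_b))  # 4
```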

Related

Possibility of only dealing with specific region of binary image

I have recently been studying image processing.
The problem of hole filling confuses me (I assume anyone able to answer is familiar with the procedure, so I will skip straight to the problem):
Let's say if I have a binary image like this:
0 0 0 0 0 0 0
0 0 1 1 0 0 0
0 1 0 0 1 0 0
0 1 0 0 1 0 0
0 0 1 0 1 0 0
0 0 1 0 1 0 0
0 1 0 0 0 1 0
0 1 0 0 0 1 0
0 1 1 1 1 0 0
0 0 0 0 0 0 0
The book says to start from a point inside the hole, perform the dilation operation, and set a bound so that it does not fill the whole image.
I have no problem understanding the process, but when I try to code it, how can I restrict the operation to a specific region (the inside of the hole, in this case)? Or would an actual implementation use a different method?
If you can assume that the object with holes does not touch the border of the image, you can create an intermediate image where you call flood fill (with value e.g. 2) on the top left pixel. Any remaining '0' pixels have to be inside the contour. Take the position of the first encountered remaining '0' pixel and flood fill it in the original image.
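A minimal Python sketch of this approach follows, assuming the image is a list of lists of 0/1, the top-left pixel is background, and the object does not touch the border. The function names are illustrative. The background is filled with 4-connectivity, because the contour in the example is only 8-connected and an 8-connected fill would leak through its diagonal steps.

```python
from collections import deque

def flood_fill(image, start, new_value):
    """Fill the 4-connected region containing `start`, in place."""
    rows, cols = len(image), len(image[0])
    r0, c0 = start
    old_value = image[r0][c0]
    if old_value == new_value:
        return
    image[r0][c0] = new_value
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and image[nr][nc] == old_value:
                image[nr][nc] = new_value
                queue.append((nr, nc))

def fill_hole(image):
    """Return a copy of `image` with the first hole found set to 1."""
    marked = [row[:] for row in image]
    flood_fill(marked, (0, 0), 2)  # mark the outside background with 2
    result = [row[:] for row in image]
    for r, row in enumerate(marked):
        for c, value in enumerate(row):
            if value == 0:  # a 0 that the outside fill could not reach
                flood_fill(result, (r, c), 1)
                return result
    return result

image = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
for row in fill_hole(image):
    print(*row)
```

As in the answer, only the first hole encountered is filled; an image with several holes would need the scan to continue instead of returning.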

Dynamic QUERY range

I have a spreadsheet, and in one of the tabs I have a table with data computed from other tabs. It is a small table with 11 columns. Row 1 is the header row, column A is the list of items, and columns B to J are the types. The data consists of numbers only.
Because the data is computed, from time to time the values in some of columns B to J can be entirely zero. I want to create a subset of this table with QUERY, using a dynamic range that contains only the columns with at least one value greater than zero.
I'm aware that a range can be built as an array like {A:A\B:B\D:D}, but in my case I don't know in advance which columns will have values greater than zero, and I don't want to include columns that are entirely zero.
I have created an expression that concatenates this array as text in a cell, but I can't use it with the QUERY formula, either with the INDEX or the TEXT function. The table looks like this:
Items TypeA TypeB TypeC TypeD
Bronze 0 0 0 0
Silver 0 0 1 0
Gold 0 0 1 0
Titanimum 1 0 0 0
For this snapshot of the table, I want the QUERY range to be {A:A\B:B\D:D}. However, because the data is computed, the table may look like this two hours later or the next day:
Items TypeA TypeB TypeC TypeD
Bronze 1 0 0 1
Silver 0 0 1 0
Gold 0 1 1 0
Titanimum 1 0 0 0
So, for this snapshot of the table, I want the QUERY range to be {A:A\B:B\C:C\D:D\E:E}.
Is this doable? And how can I construct a dynamic QUERY range?
Thanks, everyone.
You can remove columns from a range based on a criterion using the FILTER function.
Unfiltered
Items TypeA TypeB TypeC TypeD TypeE TypeF TypeG
Bronze 1 0 0 1 0 0 1
Silver 1 1 0 1 0 0 1
Gold 1 0 0 1 0 0 1
Titan 1 0 0 1 1 0 1
1 4 1 0 4 1 0 4
Filtered to remove columns with total of 0
Items TypeA TypeB TypeD TypeE TypeG
Bronze 1 0 1 0 1
Silver 1 1 1 0 1
Gold 1 0 1 0 1
Titan 1 0 1 1 1
The 'trick' is to sum each column's data (row 6 in this example) and then test the sum for >0.
The filter expression is:
=FILTER(A1:H5,A6:H6 >0)
By way of explanation:
A1:H5 is the range to be filtered;
A6:H6 >0 selects all columns that have a value > 0 in row 6
I placed a 1 in A6 to make sure colA is included.
You can now do queries on the range returned by the above expression.
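The column filter can be illustrated with a short Python sketch. This is not how Sheets evaluates FILTER internally, just the same logic under the assumption that the table is a list of rows with a header and a label column first; the function name is made up.

```python
def drop_zero_columns(table):
    """Keep the label column plus every numeric column whose total is > 0."""
    header, *rows = table
    # The label column is always kept (the answer forces this with a 1 in A6).
    keep = [True] + [
        sum(row[i] for row in rows) > 0 for i in range(1, len(header))
    ]
    return [[cell for cell, k in zip(line, keep) if k] for line in table]

table = [
    ["Items", "TypeA", "TypeB", "TypeC", "TypeD", "TypeE", "TypeF", "TypeG"],
    ["Bronze", 1, 0, 0, 1, 0, 0, 1],
    ["Silver", 1, 1, 0, 1, 0, 0, 1],
    ["Gold", 1, 0, 0, 1, 0, 0, 1],
    ["Titan", 1, 0, 0, 1, 1, 0, 1],
]
for line in drop_zero_columns(table):
    print(*line)
```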

Selecting a specific row with a condition ? LibreOffice Calc

I have this LibreOffice Calc file with rows full of zeros:
raw1 raw2 raw3 raw4 raw5 raw6 raw7 raw8 raw9
0 0 0 0 C 0 0 0 0
0 0 0 0 0 0 0 W 0
I want to print only the character in each row, like this:
Result
C
W
I tried an IF condition:
IF(CD2:CR16 = 1, CD2:CR16)
but it gives me an error.
Use MATCH to find the column that contains a character, and then INDEX to get the cell's value.
=INDEX(CD2:CR2, MATCH("[A-Z]", CD2:CR2, 0))
For this to work, go to Tools -> Options -> LibreOffice Calc -> Calculate, and choose Enable regular expressions in formulas.
EDIT:
According to https://help.libreoffice.org/Common/List_of_Regular_Expressions, [:print:] represents any printable character, so it grabs the first zero, which is probably why it does not seem to do what you want.
To match one of several words, the regular expression should be like this:
"word1|word2|word3"
Or for any word consisting of one or more letters:
"[:alpha:]+"
EDIT 2:
To grab C and 8 from 0 0 C 0 and 8 0 0 0 respectively, use "[A-Z1-9]".
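For comparison, the MATCH plus INDEX lookup can be sketched in Python, assuming each row is a list of cell strings and the pattern has to match the whole cell. The function name and default pattern are illustrative.

```python
import re

def first_matching_cell(row, pattern=r"[A-Z1-9]"):
    """Return the first cell that matches `pattern` entirely, else None."""
    regex = re.compile(pattern)
    for cell in row:
        # fullmatch mirrors a whole-cell match: "0" does not match [A-Z1-9].
        if regex.fullmatch(str(cell)):
            return cell
    return None

print(first_matching_cell(["0", "0", "0", "0", "C", "0", "0", "0", "0"]))  # C
print(first_matching_cell(["8", "0", "0", "0"]))                           # 8
```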

SPSS Syntax - Identify duplicate responses and systematically identify cases to keep

I have a large set of survey data in SPSS where around 15% of respondents answered the survey more than once (this was not intended). I have formulated a systematic method to determine which cases to keep but am not sure how to write the loop to perform this task.
The variables I have are:
ID: unique identifier for every individual (some repeated submissions)
SurveyComplete: 0/1 (is the survey complete)
Duplicate: 0/1 (are they a person who submitted more than one survey)
PrimaryFirst: 0/1 (identifies first submission)
MatchSequence: integer (numerical indicator of which submission number the survey is)
Date: date of submission
keep: 0/1 (yet-to-be-created indicator of whether or not the record is being retained)
Here are what my data look like:
ID SurveyComplete Duplicate PrimaryFirst MatchSequence Date keep
123 1 1 1 1 07162015 .
123 1 1 0 2 07182015 .
456 0 1 1 1 07152015 .
456 1 1 0 2 07192015 .
789 0 1 1 1 07112015 .
789 0 1 0 2 07182015 .
789 0 1 0 3 07212015 .
012 1 0 1 1 07122015 .
Theoretically, I would like to determine the following in the order below:
IF PrimaryFirst = 1 AND SurveyComplete = 1 THEN keep = 1. Other submissions for this ID keep = 0.
ELSE IF PrimaryFirst = 0 AND SurveyComplete = 1 THEN keep = 1. Other submissions for this ID keep = 0.
ELSE (where SurveyComplete = 0 for all responses) keep most recent submission.
And here is the resulting keep column:
ID SurveyComplete Duplicate PrimaryFirst MatchSequence Date keep
123 1 1 1 1 07162015 1
123 1 1 0 2 07182015 0
456 0 1 1 1 07152015 0
456 1 1 0 2 07192015 1
789 0 1 1 1 07112015 0
789 0 1 0 2 07182015 0
789 0 1 0 3 07212015 1
012 1 0 1 1 07122015 1
Ideally I would like to be able to complete this in SPSS syntax without plugins as my work doesn't take kindly to software add-ons. Any help that can be provided is much appreciated!
After every step an AGGREGATE function determines for every ID if a decision was already made. An ID that already has a decision will be taken out of the game, undecided IDs go on to the next step:
* creating fake data to play around with.
* note I added an extra line for ID=456 to demonstrate choice between multiple non-primary lines.
DATA LIST list (", ") / ID SurveyComplete Duplicate PrimaryFirst MatchSequence Date.
begin data
123, 1, 1, 1, 1, 7162015
123, 1, 1, 0, 2, 7182015
456, 0, 1, 1, 1, 7152015
456, 1, 1, 0, 2, 7192015
456, 1, 1, 0, 3, 7192015
789, 0, 1, 1, 1, 7112015
789, 0, 1, 0, 2, 7182015
789, 0, 1, 0, 3, 7212015
12, 1, 0, 1, 1, 7122015
end data.
execute.
* now starting work on defining the KEEP variable.
if (PrimaryFirst = 1 AND SurveyComplete = 1) keep=1.
if (PrimaryFirst = 0 AND SurveyComplete = 1) NotPrimarySeq=MatchSequence.
aggregate /outfile=* mode=addvariables /break=ID /decided=max(keep)/NotPrimarySeq_min=min(NotPrimarySeq).
if missing(decided) and (PrimaryFirst = 0 AND SurveyComplete = 1) keep=(NotPrimarySeq=NotPrimarySeq_min).
aggregate/outfile=* mode=addvariables overwritevars=yes /break=ID/decided=max(keep)/Date_max=MAX(Date).
if missing(decided) keep=(date=date_max).
recode keep (miss=0).
execute.
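For readers who want to check the decision order outside SPSS, here is a rough Python sketch of the same three steps. It is an illustration, not a translation of the syntax above: the function names are made up, dates are assumed to be MMDDYYYY numbers, and ties are resolved by keeping a single record (the lowest MatchSequence among completed non-primary submissions, as the answer does).

```python
from collections import defaultdict
from datetime import datetime

def parse_date(value):
    """Parse an MMDDYYYY value such as 7162015 (leading zero may be missing)."""
    return datetime.strptime(str(value).zfill(8), "%m%d%Y")

def mark_keep(records):
    """Set rec['keep'] to 1 for exactly one record per ID, 0 for the rest."""
    by_id = defaultdict(list)
    for rec in records:
        rec["keep"] = 0
        by_id[rec["ID"]].append(rec)
    for group in by_id.values():
        primary = [r for r in group
                   if r["PrimaryFirst"] == 1 and r["SurveyComplete"] == 1]
        later = [r for r in group
                 if r["PrimaryFirst"] == 0 and r["SurveyComplete"] == 1]
        if primary:    # step 1: completed first submission
            chosen = primary[0]
        elif later:    # step 2: earliest completed later submission
            chosen = min(later, key=lambda r: r["MatchSequence"])
        else:          # step 3: nothing completed, keep the most recent
            chosen = max(group, key=lambda r: parse_date(r["Date"]))
        chosen["keep"] = 1
    return records

fields = ["ID", "SurveyComplete", "Duplicate", "PrimaryFirst", "MatchSequence", "Date"]
raw = [
    (123, 1, 1, 1, 1, 7162015), (123, 1, 1, 0, 2, 7182015),
    (456, 0, 1, 1, 1, 7152015), (456, 1, 1, 0, 2, 7192015),
    (456, 1, 1, 0, 3, 7192015), (789, 0, 1, 1, 1, 7112015),
    (789, 0, 1, 0, 2, 7182015), (789, 0, 1, 0, 3, 7212015),
    (12, 1, 0, 1, 1, 7122015),
]
records = mark_keep([dict(zip(fields, row)) for row in raw])
print([r["keep"] for r in records])  # [1, 0, 0, 1, 0, 0, 0, 1, 1]
```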

Torch tensors swapping dimensions

I came across these two lines (back-to-back) of code in a torch project:
im4[{1,{},{}}] = im3[{3,{},{}}]
im4[{3,{},{}}] = im3[{1,{},{}}]
What do these two lines do? I assumed they did some sort of swapping.
This is covered under indexing in the Torch Tensor documentation.
Indexing using the empty table {} is shorthand for all indices in that dimension. Below is a demo which uses {} to copy an entire row from one matrix to another:
> a = torch.Tensor(3, 3):fill(0)
0 0 0
0 0 0
0 0 0
> b = torch.Tensor(3, 3)
> for i=1,3 do for j=1,3 do b[i][j] = (i - 1) * 3 + j end end
> b
1 2 3
4 5 6
7 8 9
> a[{1, {}}] = b[{3, {}}]
> a
7 8 9
0 0 0
0 0 0
This assignment is equivalent to: a[1] = b[3].
Your example is similar:
im4[{1,{},{}}] = im3[{3,{},{}}]
im4[{3,{},{}}] = im3[{1,{},{}}]
which is more clearly stated as:
im4[1] = im3[3]
im4[3] = im3[1]
The first line assigns the values from im3's third row (a 2D sub-matrix) to im4's first row and the second line assigns the first row of im3 to the third row of im4.
Note that this is not a swap, as im3 is never written and im4 is never read from.
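The same behaviour can be mimicked in plain Python, with nested lists standing in for tensors and 0-based indices instead of Lua's 1-based ones. The function name and sample values are hypothetical.

```python
import copy

def copy_reversed_outer(src, dst):
    """Copy src[2] into dst[0] and src[0] into dst[2], leaving src untouched."""
    # Torch assignment copies values, so deep copies are used here as well.
    dst[0] = copy.deepcopy(src[2])
    dst[2] = copy.deepcopy(src[0])
    return dst

# Three 2 x 2 slices each, like a small 3 x 2 x 2 tensor.
im3 = [[[1, 1], [1, 1]], [[2, 2], [2, 2]], [[3, 3], [3, 3]]]
im4 = [[[0, 0], [0, 0]] for _ in range(3)]

copy_reversed_outer(im3, im4)
print(im4)  # first and third slices come from im3; the middle one is untouched
print(im3)  # unchanged
```

Because only im4 is written, its middle slice keeps whatever it held before, which is why the two lines are not a swap on their own.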

Resources