I'm downloading a CSV file from an external API. It returns a table with the following structure:
foo
bar 1 4
baz 2 3
Is there any way to use a QUERY (or some other function) to get the first two rows below the foo cell?
There are several other bar and baz rows elsewhere, which is why I only want the ones directly below the foo cell. Is this doable?
Here is one alternative: use REGEXEXTRACT and CONCATENATE, with REPT to repeat the capture group once for each value, assuming the URL is in A1:
=regexextract(concatenate(IMPORTDATA(A1)),"(Basic)"&rept("(\d\.\d{1,3})",6))
=regexextract(concatenate(IMPORTDATA(A1)),"(Diluted)"&rept("(\d\.\d{1,3})",6))
The result looks like this:
If you want to be doubly sure, add "Earnings per share" in front of "Basic":
=regexextract(concatenate(IMPORTDATA(A1)),"Earnings per share(Basic)"&rept("(\d\.\d{1,3})",6))
=regexextract(concatenate(IMPORTDATA(A1)),"Earnings per shareBasic.*(Diluted)"&rept("(\d\.\d{1,3})",6))
To have the output all on one line, one after the other:
=regexextract(concatenate(IMPORTDATA(A1)),"Earnings per share(Basic)"&rept("(\d\.\d{1,3})",6)&"(Diluted)"&rept("(\d\.\d{1,3})",6))
If you want to see the raw data, remove the regex part and leave just CONCATENATE and IMPORTDATA. The regex skips over the beginning of the text and then uses parentheses to specify which pieces to capture; these are called capture groups. Text outside the groups still has to match, but it is not returned.
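A rough Python analogue can make the capture groups easier to see. It assumes a hypothetical `raw` string standing in for the concatenated download, with every value written to three decimals so the back-to-back numbers split unambiguously. Python's `re` module stands in for the RE2 engine behind REGEXEXTRACT; the two should agree on a pattern this simple.

```python
import re

# Hypothetical text produced by CONCATENATE(IMPORTDATA(A1)); the real feed will differ.
raw = ("Earnings per shareBasic1.2501.5002.0000.9750.7701.100"
       "Diluted1.2401.4901.9900.9500.7501.080")

# Mirrors "Earnings per share(Basic)"&REPT("(\d\.\d{1,3})",6):
# one group for the label, then the same number group repeated six times.
value = r"(\d\.\d{1,3})"
basic_pattern = r"Earnings per share(Basic)" + value * 6

# Mirrors the one-line version that also captures the Diluted row.
combined_pattern = basic_pattern + r"(Diluted)" + value * 6

basic_match = re.search(basic_pattern, raw)
combined_match = re.search(combined_pattern, raw)

# Like REGEXEXTRACT, each capture group becomes one output value.
basic = basic_match.groups() if basic_match else ()
combined = combined_match.groups() if combined_match else ()
print(basic)
print(combined)
```

Text outside the parentheses ("Earnings per share") has to be present for a match, but only the grouped pieces come back.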
Try this formula:
=QUERY(FILTER(A1:C13,row(A1:C13)>MATCH("foo",A1:A13,0)),"select * limit 2")
example workbook
To use imported data instead of a range, use:
=QUERY(FILTER(Data,row(Data)>MATCH("foo",query(Data,"select Col1"),0)),"select * limit 2")
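The FILTER/MATCH logic can be sketched in Python as well. This is an illustration only: `rows_below_marker` and the sample table are hypothetical, and, as in the formula, it assumes the range starts at row 1 so that ROW() numbers and MATCH positions line up.

```python
def rows_below_marker(table, marker, limit=2):
    """Rough analogue of
    QUERY(FILTER(A1:C13, ROW(A1:C13) > MATCH(marker, A1:A13, 0)), "select * limit 2").
    """
    first_column = [row[0] for row in table]
    # MATCH(..., 0) is an exact match returning the 1-based position of the first hit.
    # list.index raises ValueError if the marker is missing (Sheets shows #N/A).
    marker_row = first_column.index(marker) + 1
    # FILTER keeps only rows whose row number is greater than the marker's row.
    below = [row for number, row in enumerate(table, start=1) if number > marker_row]
    # "select * limit 2" keeps the first two of those rows.
    return below[:limit]


# Hypothetical data: bar/baz rows appear both above and below foo.
table = [
    ["bar", 9, 9],
    ["baz", 8, 8],
    ["foo", "", ""],
    ["bar", 1, 4],
    ["baz", 2, 3],
    ["bar", 7, 7],
]
print(rows_below_marker(table, "foo"))
```

Only the first "foo" counts, because MATCH stops at the first exact hit.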
Try:
=query(A2:C,"select * where A='bar' or A='baz' limit 2")
I'm trying to remove a piece of text ("Performance") from a column in a Google Spreadsheet whose cells contain "XX Performance", where XX is a number such as 89. I'm using:
=REGEXREPLACE(D:D, " Performance "," - ")
But no love...
Try this Example Sheet
=ArrayFormula(IF(D2:D="",, REGEXEXTRACT(D2:D, "[0-9]+")))
You can use the expression \D+:
\D matches any character that's not a digit (equivalent to [^0-9])
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed
The formula will be like:
=REGEXREPLACE(D:D, "\D+","")
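Both regex approaches can be tried outside Sheets with Python's `re` module, on the assumption that it behaves like Sheets' RE2 engine for these simple patterns. The helper names are hypothetical. The two differ when a cell holds more than one number: removing every non-digit run glues the numbers together, while extracting returns only the first.

```python
import re


def strip_non_digits(text):
    # Like REGEXREPLACE(D2, "\D+", ""): delete every run of non-digit characters.
    return re.sub(r"\D+", "", text)


def first_number(text):
    # Like REGEXEXTRACT(D2, "[0-9]+"): return the first run of digits, or None if there is none.
    match = re.search(r"[0-9]+", text)
    return match.group(0) if match else None


print(strip_non_digits("89 Performance"))  # 89
print(first_number("89 Performance"))      # 89
print(strip_non_digits("12 of 34"))        # 1234
print(first_number("12 of 34"))            # 12
```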
UPDATE
I did put it in another column; otherwise it creates a circular dependency. The data is imported via API from another app.
Then you will need to create another sheet or use a hidden column to hold that data, and then apply the regex in the column where you want the final result.
I have a Google spreadsheet where I am trying to count the number of rows where a certain value is present in at least one column. The number of columns with data varies by row.
Let's use the following sheet as an example:
https://docs.google.com/spreadsheets/d/1yUnYBsmjKIOF_PubYQ6G41fmIPcw_7hzkiQ6qMIoN64/edit?usp=sharing
Each row represents a task, and the data for who worked on the project is added by adding additional columns.
I would like to count how many tasks each person has worked on at least once. (If Person A worked on Task A multiple times, it would only count as 1.)
I've tried using formulas such as COUNTIFS or COUNTUNIQUEIFS, but am being thrown off by the fact that the number of columns can vary.
Any ideas of how I can accomplish this?
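See if this helps (it counts the tasks for the person whose name is in A1; ">0" means the name appears at least once in the row):
=countif(ArrayFormula(mmult(N(Sheet1!E2:100=A1), transpose(column(Sheet1!E2:2)^0))), ">0")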
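Here is a rough Python model of the MMULT trick, with hypothetical sample rows and a hypothetical helper name. Each row's hits are summed by multiplying with a column of ones, as MMULT does with COLUMN(...)^0. Counting rows whose total is above zero gives "worked on the task at least once"; a ">1" threshold would count only tasks where the name appears at least twice. Unlike `=` in Sheets, the comparison here is case-sensitive.

```python
def tasks_worked_on(rows, name, threshold=0):
    # N(range = name): 1 where the cell holds the name, 0 elsewhere.
    indicator = [[1 if cell == name else 0 for cell in row] for row in rows]
    width = max(len(row) for row in rows)
    ones = [1] * width  # plays the role of TRANSPOSE(COLUMN(E2:2)^0)
    # MMULT(indicator, ones): one total per row, i.e. hits per task.
    per_task = [sum(a * b for a, b in zip(row, ones)) for row in indicator]
    # COUNTIF(per_task, ">threshold")
    return sum(1 for total in per_task if total > threshold)


# Hypothetical tasks; each inner list holds the names entered on one row.
tasks = [
    ["Chris", "John", "Chris"],
    ["Erik", ""],
    ["John", "Chris", ""],
    ["Ryan", "Ryan"],
]
print(tasks_worked_on(tasks, "Chris"))               # 2
print(tasks_worked_on(tasks, "Chris", threshold=1))  # 1
```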
See the added sheet "Erik Help," cell A1, for the following array formula:
=ArrayFormula(QUERY({{"Employee";UNIQUE(QUERY({Sheet1!E2:E;Sheet1!H2:H;Sheet1!K2:K;Sheet1!N2:N;Sheet1!Q2:Q;Sheet1!T2:T;Sheet1!W2:W;Sheet1!Z2:Z;Sheet1!AC2:AC;Sheet1!AF2:AF},"Select * Where Col1 Is Not Null",0))},{"Projects";COUNTIF({Sheet1!E2:E&Sheet1!H2:H&Sheet1!K2:K&Sheet1!N2:N&Sheet1!Q2:Q&Sheet1!T2:T&Sheet1!W2:W&Sheet1!Z2:Z&Sheet1!AC2:AC&Sheet1!AF2:AF},"*"&UNIQUE(QUERY({Sheet1!E2:E;Sheet1!H2:H;Sheet1!K2:K;Sheet1!N2:N;Sheet1!Q2:Q;Sheet1!T2:T;Sheet1!W2:W;Sheet1!Z2:Z;Sheet1!AC2:AC;Sheet1!AF2:AF},"Select * Where Col1 Is Not Null",0))&"*")}},"Select * Order By Col2 Desc"))
I will break it down somewhat here, and then encourage you to further dissect it if deeper learning is required.
Obviously, it's an array formula. In this case, that means that one formula is producing the entire report.
The outermost QUERY is just putting the results inside the double curly brackets {{ }} in order by project count: QUERY( {{...}} ,"Select * Order By Col2 Desc")
Notice those double curly brackets. Really, though, it's an outer set of curly brackets containing two more sets of curly brackets: { {...},{...} } The inner two arrays create the first column and second column of the report, respectively. The comma means to place them side by side. You will notice that the first element of each of those inner arrays produces the header for the respective columns (i.e., "Employee" and "Projects").
The semicolon following each of these headers means to place what follows underneath rather than side by side. You will notice a lot of those semicolons in the first inner array. Because you said that there would never be any more than 10 people working on any one project, we can predetermine all columns that might hold names and virtually "stack" them with those semicolons, forming one long virtual column. Of course, many of those columns will be empty, because many projects won't have a full ten names of people attributed to them. So this virtual column of names is wrapped in its own QUERY that will weed out nulls:
QUERY({Sheet1!E2:E;Sheet1!H2:H;Sheet1!K2:K;Sheet1!N2:N;Sheet1!Q2:Q;Sheet1!T2:T;Sheet1!W2:W;Sheet1!Z2:Z;Sheet1!AC2:AC;Sheet1!AF2:AF},"Select * Where Col1 Is Not Null",0)
To this, I applied UNIQUE, which provides the first-column list of unique names (rather than every time a name appears):
UNIQUE(QUERY({Sheet1!E2:E;Sheet1!H2:H;Sheet1!K2:K;Sheet1!N2:N;Sheet1!Q2:Q;Sheet1!T2:T;Sheet1!W2:W;Sheet1!Z2:Z;Sheet1!AC2:AC;Sheet1!AF2:AF},"Select * Where Col1 Is Not Null",0))
So the complete first inner virtual array (which forms the complete first column of the final report) looks like this:
{"Employee";UNIQUE(QUERY({Sheet1!E2:E;Sheet1!H2:H;Sheet1!K2:K;Sheet1!N2:N;Sheet1!Q2:Q;Sheet1!T2:T;Sheet1!W2:W;Sheet1!Z2:Z;Sheet1!AC2:AC;Sheet1!AF2:AF},"Select * Where Col1 Is Not Null",0))}
The second inner virtual array uses the ampersand to join all names assigned to each project into one long string. So, for instance, if Chris, John and Ryan all worked on a project (some of them multiple times), that project row's (unseen) concatenation might look like this: ChrisChrisJohnChrisRyanChris.
We run a COUNTIF over this virtual array of concatenations, and the criterion is built largely from the entire UNIQUE clause of the first inner virtual array (the shortlist of all possible names). You will notice that it is wrapped front and back in asterisks, like this: "*"&UNIQUE(...)&"*". Asterisks are wildcards for any number of characters, so this searches each long concatenated string for the name anywhere inside it; as soon as the name is found, COUNTIF counts that project... once (not each time the name appears in the string).
So that second inner virtual array looks like this in isolation:
{"Projects";COUNTIF({Sheet1!E2:E&Sheet1!H2:H&Sheet1!K2:K&Sheet1!N2:N&Sheet1!Q2:Q&Sheet1!T2:T&Sheet1!W2:W&Sheet1!Z2:Z&Sheet1!AC2:AC&Sheet1!AF2:AF},"*"&UNIQUE(QUERY({Sheet1!E2:E;Sheet1!H2:H;Sheet1!K2:K;Sheet1!N2:N;Sheet1!Q2:Q;Sheet1!T2:T;Sheet1!W2:W;Sheet1!Z2:Z;Sheet1!AC2:AC;Sheet1!AF2:AF},"Select * Where Col1 Is Not Null",0))&"*")}
Without that outermost QUERY I mentioned up front, you'd still get accurate results; they'd just be listed in whatever order the names first appear in the projects. I felt it would make more sense to order them from whoever had the most projects to whoever had the least.
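To experiment with the logic outside Sheets, here is a rough Python model of the whole report. The sample rows and the function name are hypothetical. One caveat of the wildcard match: "*Ann*" also matches a string containing "Anna", so a name that is a substring of another name can be over-counted, as the sample shows.

```python
def project_report(projects):
    """projects: one list of names per project row (blank cells as "")."""
    # First column: stack every name column, drop blanks, keep each name once.
    names = list(dict.fromkeys(name for row in projects for name in row if name))
    # Second column: concatenate each project's names with & into one string.
    joined = ["".join(row) for row in projects]
    # COUNTIF(joined, "*"&name&"*"): number of projects whose string contains the name.
    # (Case-sensitive here; COUNTIF itself ignores case.)
    counts = [sum(1 for text in joined if name in text) for name in names]
    # Outer QUERY "Order By Col2 Desc": most projects first (ties keep first-seen order).
    return sorted(zip(names, counts), key=lambda pair: pair[1], reverse=True)


# Hypothetical project rows.
projects = [
    ["Chris", "John", "Chris", "Ryan", "Chris"],
    ["John", "Chris"],
    ["Ann"],
    ["Anna"],
]
report = project_report(projects)
print(report)
```

Here "Ann" is credited with two projects because her name is also found inside "Anna"; adding a delimiter around each name before joining is one way to avoid that.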
I have two ranges: one is a list of headers (column A), and one is a list of categories (on a separate sheet, in column B). Both are generated from other sources and can be dynamic in length (i.e., they cannot be guaranteed to be the same size).
I need to make a summary sheet from these. I want to take the first value of headers, then add all the categories, then the second value of headers, then all the categories, etc. Similar to:
HEADER 1
Role 1
Role 2
Role ...
HEADER 2
Role 1
Role 2
Role ...
And so on.
I've tried various options, and I currently have this:
=ARRAYFORMULA( SPLIT(JOIN("|", A1:A6), "|") & "+" & JOIN("~", UNIQUE(Roles!B3:B)) )
This gets me one row with a column for each header with the entire roles range appended. For instance, column one has:
ON-SITE+Project Management & Creative Design~Production Staff~Video~Audio~
and so on across the sheet.
Ideally, I'd add two more SPLIT functions to separate this to a bunch of columns, then simply transpose into a single column and be done. However, it seems you only get one instance of SPLIT in an ARRAYFORMULA. When I add another SPLIT function:
=ARRAYFORMULA( SPLIT(SPLIT(JOIN("|", $A$1:$A$6), "|") & "+" & JOIN("~", UNIQUE(Roles!$B$4:$B)), "+") )
It simply splits the first column into two, then ignores the rest. If I add a second split to that, I only get the Header. It appears you only get one use of SPLIT inside ARRAYFORMULA, then it breaks down. I've read several things about how JOIN and SPLIT don't seem to play nicely inside ARRAYFORMULA.
Is there something I can add or reorder in this to make it work as desired? I'm also open to other methods, such as QUERY or REGEX (though I know very little about REGEX). I attempted to create a literal array using TEXTJOIN and {}, but passing it via INDIRECT never seemed to work. I also need to solve this inside Google Sheets; no scripting, unfortunately.
Editable Sheet Here
Try:
=ARRAYFORMULA(TRANSPOSE(SPLIT(QUERY("♦"&TRANSPOSE(UNIQUE(Roles!C1:H1))&"♦"&
TEXTJOIN("♦", 1, UNIQUE(Roles!B2:B)),,99^99), "♦")))
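The formula builds one long "♦"-delimited string and splits it back into a column. A rough Python model of those steps follows; the function name and sample values are hypothetical, and the join with spaces reflects my understanding of how QUERY(..., , 99^99) merges all rows into a single header.

```python
SEP = "♦"


def interleave(headers, categories):
    # TEXTJOIN("♦", 1, UNIQUE(...)): unique non-blank categories joined by the separator.
    cats = SEP.join(dict.fromkeys(c for c in categories if c))
    # "♦" & header & "♦" & cats: one chunk per header.
    chunks = [SEP + h + SEP + cats for h in headers]
    # QUERY(..., , 99^99) collapses the column into one string, joined by spaces.
    blob = " ".join(chunks)
    # SPLIT on "♦" (dropping empty pieces) and TRANSPOSE into one column;
    # strip() drops the joining space left at the end of each chunk.
    return [piece.strip() for piece in blob.split(SEP) if piece.strip()]


print(interleave(["HEADER 1", "HEADER 2"], ["Role 1", "Role 2", "Role 1", ""]))
```

Because each header carries its own copy of the category list, the two ranges never need to be the same size.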
I have a large list of people where each person has a line like this.
Bill Gates, IT Manager, Microsoft, <https://www.linkedin.com/in/williamhgates>
I want to extract the company name in a specific cell. In this example, it would be Microsoft, which is between the second and third delimiters (in this case, the delimiter is ", "). How can I do this?
Right now I'm using SPLIT (=SPLIT(A2, ", ",false)), but it gives me four different cells of information. I would like a formula that outputs only the company, in one cell. Can anyone help? I have tried different things, but I can't seem to find anything that works.
Maybe some regex can do it, but I'm not into regex.
Short answer
Use INDEX and SPLIT to get the value between two separators. In this example the company is the third item, so:
=INDEX(SPLIT(A1,", ",FALSE),3)
Explanation
SPLIT returns a 1 × n array.
The first argument of INDEX could be a range or an array.
The second and third arguments of INDEX are optional. If the first parameter is an array that has only one row or one column, it will assume that the second argument corresponds to the larger side of the array, so there is no need to use the third argument.
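The same INDEX/SPLIT idea in Python, as a sketch with a hypothetical helper name. SPLIT with split_by_each set to FALSE splits on the whole ", " string, and INDEX counts from 1, so the company in the sample line is field 3.

```python
def nth_field(text, n, sep=", "):
    # SPLIT(text, ", ", FALSE): split on the whole ", " string, not on each character.
    # SPLIT also drops empty pieces by default, so do the same here.
    parts = [part for part in text.split(sep) if part]
    # INDEX(..., n) counts from 1; Python lists count from 0.
    return parts[n - 1]


line = "Bill Gates, IT Manager, Microsoft, <https://www.linkedin.com/in/williamhgates>"
print(nth_field(line, 3))  # Microsoft
```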
A bit nasty, but this formula works, assuming data in cell D3.
=MID(D3,FIND(",",D3,FIND(",",D3)+1)+2,FIND(",",D3,FIND(",",D3,FIND(",",D3)+1)+1)-FIND(",",D3,FIND(",",D3)+1)-2)
Broken down, this is what it does:
Take a substring of D3: =MID(D3
starting two characters after the 2nd comma (skipping the comma and the space): FIND(",",D3,FIND(",",D3)+1)+2
with a length equal to the number of characters between the 2nd and 3rd commas, excluding the leading space: FIND(",",D3,FIND(",",D3,FIND(",",D3)+1)+1)-FIND(",",D3,FIND(",",D3)+1)-2)
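For comparison, the same position arithmetic in Python (the helper name is hypothetical). Python's `str.find` is 0-based and returns -1 instead of an error when nothing is found, but the offsets work out the same: start two characters after the second comma and stop at the third.

```python
def between_second_and_third_comma(text):
    first = text.find(",")                # FIND(",", D3)
    second = text.find(",", first + 1)    # FIND(",", D3, FIND(",", D3) + 1)
    third = text.find(",", second + 1)    # the nested FIND for the 3rd comma
    if -1 in (first, second, third):
        return None                       # Sheets would show #VALUE! here
    start = second + 2                    # skip the comma and the following space
    return text[start:third]              # length third - second - 2, as in MID


print(between_second_and_third_comma(
    "Bill Gates, IT Manager, Microsoft, <https://www.linkedin.com/in/williamhgates>"
))
```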
I'll add my favourite ARRAYFORMULA approach, which expands down the list automatically without dragging the formula. Assumptions:
your data is in the range A1:A20
every entry has the same syntax: "...,Company Name, <..."
In this case, you could paste this array formula in cell B1:
=ArrayFormula(REGEXEXTRACT(A1:A20,", ([^,]+), <"))
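A Python sketch of the same pattern, assuming Python's `re` matches RE2 here; the helper name and the second sample row are hypothetical. Because `[^,]+` cannot cross a comma, the only place the pattern fits is the field just before ", <".

```python
import re

COMPANY = re.compile(r", ([^,]+), <")


def extract_company(line):
    # Like REGEXEXTRACT(A1, ", ([^,]+), <"): the text between ", " and ", <".
    match = COMPANY.search(line)
    return match.group(1) if match else None  # REGEXEXTRACT would return #N/A instead


# Hypothetical rows standing in for A1:A20.
rows = [
    "Bill Gates, IT Manager, Microsoft, <https://www.linkedin.com/in/williamhgates>",
    "Jane Doe, Engineer, Example Corp, <https://example.com/jane>",
]
print([extract_company(row) for row in rows])
```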
If your data doesn't always look like "...,Company Name, <..." or you want a different output, use this formula in cell B1:
=QUERY(QUERY(TRANSPOSE(SPLIT(JOIN(", ",A1:A20),", ",0)),"offset 2"),"skipping 4")
In this formula:
change the 2 in offset 2 to 0, 1, 2, or 3 to get the name, position, company, or link, respectively
in skipping 4, the 4 is the number of items per row.
Number of items can be counted by formula:
=len(A1)-len(SUBSTITUTE(A1,",",""))+1
and the final formula is:
=QUERY(QUERY(TRANSPOSE(SPLIT(JOIN(", ",A1:A20),", ",0)),"offset 2"),
"skipping "&len(A1)-len(SUBSTITUTE(A1,",",""))+1)
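The offset/skipping combination can be modelled on a flat Python list. This sketch (hypothetical helper names and sample rows) follows the intent of the nested QUERY calls: drop the first `offset` fields, then keep every nth field, where n is the comma count plus one. It assumes no field itself contains ", ".

```python
def items_per_row(line):
    # LEN(A1) - LEN(SUBSTITUTE(A1, ",", "")) + 1: number of commas plus one.
    return line.count(",") + 1


def column_of_field(rows, offset, sep=", "):
    # JOIN(", ", A1:A20) then SPLIT(..., ", ", 0): one flat list of fields.
    fields = [field for field in sep.join(rows).split(sep) if field]
    step = items_per_row(rows[0])
    # "offset 2" drops the first two fields; "skipping 4" then keeps every 4th,
    # starting with the first remaining one.
    return fields[offset:][::step]


# Hypothetical rows standing in for A1:A20.
rows = [
    "Bill Gates, IT Manager, Microsoft, <https://www.linkedin.com/in/williamhgates>",
    "Jane Doe, Engineer, Example Corp, <https://example.com/jane>",
]
print(column_of_field(rows, 2))  # the company column
```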
I have a column XXX like this:
XXX
A
Aruin
Avolyn
B
Batracia
Buna
...
I would like to count a cell only if the string in the cell has a length > 1.
How can I do that?
I'm trying:
COUNTIF(XXX1:XXX30, LEN(...) > 1)
But what should I write instead of ... ?
Thank you in advance.
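For ranges that contain strings, I have used a formula like the one below, which counts any value that starts with one character (the ?) followed by zero or more characters (the *), i.e. any non-empty string. I haven't tested it on ranges that contain numbers.
=COUNTIF(range,"=?*")
To count only strings longer than one character, as the question asks, add a second ?: =COUNTIF(range,"=??*")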
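Python's `fnmatch` uses the same `?` and `*` wildcards, so it can serve as a rough stand-in for COUNTIF's text criteria (an approximation: `fnmatch` also treats `[...]` specially, which COUNTIF does not). The helper name and sample column are hypothetical; the leading "=" in the Sheets criterion is the comparison operator, so it is left out of the pattern.

```python
from fnmatch import fnmatchcase


def countif_wildcard(values, pattern):
    # ? = exactly one character, * = any run of characters (including none).
    return sum(1 for value in values if fnmatchcase(value, pattern))


# Hypothetical column contents (with one blank cell).
column = ["A", "Aruin", "Avolyn", "B", "Batracia", "Buna", ""]
print(countif_wildcard(column, "?*"))   # 6: every non-empty string
print(countif_wildcard(column, "??*"))  # 4: strings of length 2 or more
```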
To do this in one cell, without needing to create a separate column or use ARRAYFORMULA, you can use SUMPRODUCT.
=SUMPRODUCT(LEN(XXX1:XXX30)>1)
If you have an array of True/False values then you can use -- to force them to be converted to numeric values like this:
=SUMPRODUCT(--(LEN(XXX1:XXX30)>1))
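The SUMPRODUCT idea in Python terms, as a sketch with a hypothetical helper and sample column: build one TRUE/FALSE per cell, convert to 1/0 (what `--` does), and add them up. Blank cells have length 0, so they are never counted.

```python
def count_longer_than(values, min_exclusive=1):
    # LEN(range) > 1 produces TRUE/FALSE for each cell ...
    flags = [len(value) > min_exclusive for value in values]
    # ... and -- (double unary minus) turns them into 1/0 so they can be summed.
    return sum(int(flag) for flag in flags)


# Hypothetical column contents (with one blank cell).
column = ["A", "Aruin", "Avolyn", "B", "Batracia", "Buna", ""]
print(count_longer_than(column))  # 4
```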
Credit to greg, who posted this in the comments; I think it is arguably the best answer and should be displayed as such. SUMPRODUCT is a powerful function that can often be used to get around shortcomings in COUNTIF-type formulas.
Create another list using =ARRAYFORMULA(LEN(XXX1:XXX30)>1) and then do a COUNTIF based on that new list: =COUNTIF(XXY1:XXY30,TRUE()).
A simple formula that works for my needs is =ROWS(FILTER(range,LEN(range)>X))
The Google Sheets criteria syntax seems inconsistent, because the expression that works fine with FILTER() gives an erroneous zero result with COUNTIF().
Here's a demo worksheet
Another approach is to use the QUERY function.
This way you can write a simple SQL-like statement to achieve this.
For example:
=QUERY(XXX1:XXX30,"SELECT COUNT(X) WHERE X MATCHES '.{1,}'")
To explain the MATCHES criteria:
It is a regex that matches every cell that contains 1 or more characters.
The . operator matches any character.
The {1,} quantifier specifies that you only want to match cells that have 1 or more characters in them. To count only values longer than one character, as the question asks, use '.{2,}' instead.
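As I understand it, QUERY's MATCHES keyword tests whether the regular expression matches the whole value, much like Python's `re.fullmatch`. The sketch below relies on that assumption, with a hypothetical helper and sample column, to compare the two quantifiers.

```python
import re


def count_matches(values, regex):
    # Assumes MATCHES compares the regex against the entire value, like re.fullmatch.
    return sum(1 for value in values if re.fullmatch(regex, value))


# Hypothetical column contents (with one blank cell).
column = ["A", "Aruin", "Avolyn", "B", "Batracia", "Buna", ""]
print(count_matches(column, ".{1,}"))  # 6: any non-empty value
print(count_matches(column, ".{2,}"))  # 4: length greater than one
```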
Here is a link to another SO question that describes this method.