Split transpose not considering first character as separator - any alternative?

Split transpose not considering first character as separator - any alternative? - google-sheets

I have this table:
In each of the cells there is a long list separated by ",":
Rosângela Lima, Ana Carolina Chagas, Sergio Vasquez, Rosana Corrêa, Michele Kida
Analista De Treinamento E …, A procura de nova oportunidade no mercado, Comercial na BRASILMAXI Logística Ltda., Executiva de Vendas e Solucionadora do Grupo BRASILMAXI, Executiva de Vendas na Brasilmaxi Logística LTDA
, achagas#brasilmaxi.com.br, sergio.vasquez#brasilmaxi.com.br, rosana.correa#brasilmaxi.com.br, ,
#ros%C3%A2ngela-lima-4a90a324, #ana-carolina-chagas-24997022, #sergio-vasquez-25a040a4, #rosanacorrea, #michele-kida-83929230
I want it to look like this:
Here is the sample data
I tried to use transpose(split()) however it doesn't work the email line properly:
Google Sheets is not considering the first character as separator.

From "MY TRY" of your shared Spreadsheet, how about modifying the formula at "C15" as follows?
From:
=TRANSPOSE(SPLIT(C3;","))
To:
=TRANSPOSE(SPLIT(C3;",";;FALSE))
About 4th argument, the official document says as follows.
remove_empty_text - [ OPTIONAL - TRUE by default ] - Whether or not to remove empty text messages from the split results. The default behavior is to treat consecutive delimiters as one (if TRUE). If FALSE, empty cells values are added between consecutive delimiters.
Reference:
SPLIT
Added:
In order to retrieve the front space of each row, how about the following modified formulas?
=TRANSPOSE(SPLIT(SUBSTITUTE(C3;" ";"");",";;FALSE))
or
=TRANSPOSE(ARRAYFORMULA(TRIM(SPLIT(C3;",";;FALSE))))

Related

How can I split a string and sum all numbers from that string?

I'm making a list for buying groceries in Google Sheets and have the following value in cell B4.
0.95 - Lemon Juice
2.49 - Pringle Chips
1.29 - Baby Carrots
9.50 - Chicken Kebab
What I'm trying to do is split using the dash character and combine the costs (0.95+2.49+1.29+9.50).
I've tried to use Index(SPLIT(B22,"-"), 7) and SPLIT(B22,"-") but I don't know how to use only numbers from the split string.
Does someone know how to do this? Here's a sample sheet.

Answer
The following formula should produce the result you desire:
=SUM(ARRAYFORMULA(VALUE(REGEXEXTRACT(SPLIT(B4,CHAR(10)),"(.*)-"))))
Explanation
The first thing to do is to split the entry in B4 into its component parts. This is done by using the =SPLIT function, which takes the text in B4 and returns a separate result every time it encounters a specific delimiter. In this case, that is =CHAR(10), the newline character.
Next, all non-number information needs to be removed. This is relatively easy in your sample data because the numbers always appear to the left of a dash. =REGEXEXTRACT uses a regular expression to only return the text to the left of the dash.
Before the numbers can be added together, however, they must be converted to be in a number format. The =VALUE function is used to convert each result from a text string containing a number to an actual number.
All of this is wrapped in an =ARRAYFORMULA so that =VALUE and =REGEXEXTRACT parse each returned value from =SPLIT, rather than just the first.
Finally, all results are added together using =SUM.
Functions used:
=CHAR
=SPLIT
=REGEXEXTRACT
=VALUE
=ARRAYFORMULA
=SUM

Firstly you can add , symbols start and ends of numbers with below code:
REGEXREPLACE(B4,"([0-9\.]+)",",$1,")
Then split it based of , sign.
SPLIT(A8, ",")

Try below formula (see your sheet)-
=SUM(ArrayFormula(--REGEXEXTRACT(SPLIT(B4,CHAR(10)),"-*\d*\.?\d+")))

Extracting numbers with REGEXEXTRACT that might have a comma or dot

I have a list of numbers in a few formats that may or may not include a dot and a comma. The numbers are locked in a string. For example:
hello 1,000 goodbye
hola 2,000.12 ciao
Hallo 3000.00 Auf Wiedersehen
How can I extract the numbers?
I don't care if the comma is added but the dot is obviously important.
I need the regular_expression to be used in REGEXEXTRACT (and the rest of the REGEX formulas.
The output should be:
1000
2000.12
3000.00

Supposing that your raw data is in A2:A, use this in B2 (or the second cell) of an otherwise empty column:
=ArrayFormula(IF(A2:A="",,IFERROR(VALUE(REGEXEXTRACT(A2:A,"\d[\d,\.]*\d")))))
The REGEX portion reads, in plain English, "Extract any portion that starts with a digit followed by any number of digits, commas or periods (or none of these) and ends with a digit."
You will likely want to apply Format > Number > Currency to the results column.

How to make a variable non delimited file to be a delimited one

Hello guys I want to convert my non delimited file into a delimited file
Example of the file is as follows.
Name. CIF Address line 1 State Phn Address line 2 Country Billing Address line 3
Alex. 44A. Biston NJ 25478163 4th,floor XY USA 55/2018 kenning
And so on all the data are in this format.
First three lines are metadata and then the data.
How can I make it delimited in proper format using logic.

There are two parts in the problem:
how to find the column widths
how to split each line into fields and output a new line with delimiters
I could not propose an automated solution for the first one, because (not knowing anything about the metadata format), there is no clear way to find where one column ends and the next one begins. Some of the column headings contain multiple space-separated words and space is also used as a separator between the headings (and apparently one cannot use the rule "more than one space means the end of a heading name" because there's only one space between "Address line 2" and "Country" - and they're clearly separate columns. Clearly, finding the correct column widths requires understanding English and this is not something that you can write a program for.
For the second problem, things are much easier - once you have the column positions. If you figure the column positions manually (or programmatically, if you know something about the metadata that I don't - and you have a simple method for finding what's a column heading), then a program written in AWK can do this, for example:
cols="8,15,32,40,53,66,83,105"
awk_prog='BEGIN {
nt=split(cols,tabs,",")
delim=","
ORS=""
}
{ o=1 ;
for (i in tabs) { t=tabs[i] ; f=substr($0,o,t-o); sub(" *$","",f) ; print f
delim ; o=t } ;
print substr($0, o) "\n"
}'
awk -v cols="$cols" "$awk_prog" input_file
NOTE that the above program does not deal correctly with the case when the separator character (e.g. ",") appears inside the data. If you decide to use this as-is, be sure to use a separator that is not present in the input data. It may be better to modify the code to escape any separator characters found in the input data (there are different ways to do this - depends on what you plan to feed the output file to).

Extracting text from APA citation

I have a spreadsheet containing APA citation style text and I want to split them into author(s), date, and title.
An example of a citation would be:
Parikka, J. (2010). Insect Media: An Archaeology of Animals and Technology. Minneapolis: Univ Of Minnesota Press.
Given this string is in field I2 I managed to do the following:
Name: =LEFT(I2, FIND("(", I2)-1) yields Parikka, J.
Date: =MID(I2,FIND("(",I2)+1,FIND(")",I2)-FIND("(",I2)-1) yields 2010
However, I am stuck at extracting the name of the title Insect Media: An Archaeology of Animals and Technology.
My current formula =MID(I2,FIND(").",I2)+2,FIND(").",I2)-FIND(".",I2)) only returns the title partially - the output should show every character between ).and the following ..
I tried =REGEXEXTRACT(I2, "\)\.\s(.*[^\.])\.\s" ) and this generally works but does not stop at the first ". " - Like with this example:
Sanders, E. B.-N., Brandt, E., & Binder, T. (2010). A framework for organizing the tools and techniques of participatory design. In Proceedings of the 11th biennial participatory design conference (pp. 195–198). ACM. Retrieved from http://dl.acm.org/citation.cfm?id=1900476
Where is the mistake?

The title can be found (in the two examples you've given, at least) with this:
=MID(I2,find("). ",I2)+3,find(". ",I2,find("). ",I2)+3)-(find("). ",I2)+3)+1)
In English: Get the substring starting after the first occurrence of )., up to and including the first occurrence of . following.
If you wish to use REGEXEXTRACT, then this works (on your two examples). (You can also see a Regex101 demo.):
=REGEXEXTRACT(I3,"(?:.*\(\d{4}\)\.\s)([^.]*\.)(?: .*)")
Where is the mistake?
In your expression, you were capturing (.*[^\.]), which greedily includes any number of characters followed by a character in the character class not (backslash or dot), which means that multiple sentences can be captured. The expression finished with \.\s, which wasn't captured, so the capture group would end before a period-then-space, rather than including it.

Try:
=split(SUBSTITUTE(SUBSTITUTE(I2, "(",""), ")", ""),".")
If you don't replace the parentheses around 2010, it thinks it is a negative number -2010.
For your Title try adding index split to your existing formula:
=index(split(REGEXEXTRACT(A5, "\)\.\s(.*[^\.])\.\s" ),"."),0,1)&"."

Is --- Cobol picture valid

I'm running some tests on Cobol pictures and wondering if --- is a valid picture. Am I right in saying that this picture accepts values in the range of -99 through to +99. If it is valid then it is possible for the picture to accept 3 spaces as a value?
For example:
12 would return 12
1 would return 1
Cheers

Yes --- is a valid PICTURE clause. The variable corresponding to this PICTURE will accept assignments of numeric values in the range -99 through to +99. It cannot be assigned non-numerics (space for example). However, if you were to DISPLAY this variable after assigning a numeric value to it, leading zeros will be replaced by spaces. Consequently, if you MOVE ZERO to this item it will DISPLAY only spaces. Attempting to MOVE SPACES to this item will result in a compile error (incompatible data types). This last bit may seem a little counter intutive, but remember that this type of PICTURE clause implies a USAGE of display - basically items defined in this manner are used to 'pretty print' numbers. About the only operations you can preform with USAGE DISPLAY items is MOVE to or from and DISPLAY them.
EDIT - Response to Comment
A PICTURE of ---X(2) is invalid. The chart below illustrates combinations and the order that symbols may appear in a PICTURE string. Notice that parenthesis are not in the chart. Logically you can replace them with the corresponding number of occurences of the preceding character before reading the string. For example X(3) is read as XXX. If you really want to parse out a PICTURE string properly, you can use this chart to construct a BNF grammar specifically for them.

If this is a numeric picture, it won't accept spaces.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

Split transpose not considering first character as separator - any alternative? - google-sheets

Related

How can I split a string and sum all numbers from that string?

Extracting numbers with REGEXEXTRACT that might have a comma or dot

How to make a variable non delimited file to be a delimited one

Extracting text from APA citation

Is --- Cobol picture valid

Categories

Resources