Google Sheets: turning HTML into plain text - google-sheets

I have some short HTML lists from which I want the tags removed.
Basically, I've tried to replace anything inside <> tags with nothing.
I've been using REGEXREPLACE and REGEXEXTRACT with the following regular expression, <.+>(.+)</.+>, to try to get everything in between, but it doesn't work.
The full formula is =join("",REGEXEXTRACT(S74,REGEXREPLACE(S74,"<.+>[(.+)]{2,9}</.+>","($1)")))
And the HTML looks like this:
PRODUCT INFO Honey wear me long sleeves Press closure Item care: Machine Washable

I've managed to solve it, so in case anyone comes across this, here's what I did (simpler than I thought):
=REGEXREPLACE(S74,"(<.+>)","")
Hope this helps someone at some point :-)
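A note on why the pattern matters, sketched in Python since Sheets formulas can't be run here. Python's re module and the RE2 engine behind REGEXREPLACE agree on the parts used: .+ is greedy, and . does not match a newline by default. So (<.+>) removes everything from the first < to the last > on a line. That is harmless when every tag sits on its own line, but it also deletes the text when several tags share a line. A per-tag pattern such as <[^>]+> (in Sheets, something like =REGEXREPLACE(S74,"<[^>]+>","")) avoids that. The sample strings below are made up.

```python
import re


def strip_tags_greedy(html: str) -> str:
    # Same pattern as the formula above: .+ is greedy, so each match runs
    # to the LAST ">" on its line ("." does not cross a newline).
    return re.sub(r"(<.+>)", "", html)


def strip_tags_per_tag(html: str) -> str:
    # [^>]+ cannot cross a ">", so each tag is removed on its own.
    # Tags become spaces and whitespace runs are collapsed so words don't fuse.
    text = re.sub(r"<[^>]+>", " ", html)
    return " ".join(text.split())


if __name__ == "__main__":
    one_line = "<ul><li>Press closure</li><li>Machine Washable</li></ul>"
    print(repr(strip_tags_greedy(one_line)))   # '' (the text is gone too)
    print(repr(strip_tags_per_tag(one_line)))  # 'Press closure Machine Washable'
```

With tags on separate lines, for example "<ul>\n<li>\nPress closure\n</li>\n</ul>", both versions keep "Press closure".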

Related

Extracting Wanted Data from the Raw Cell

I have been trying to extract the required data from a single cell. I have tried some common formulas, but they don't work correctly for every cell.
I would appreciate your help in this regard.
Google Sheet
Formula 1
=LEFT(A2,FIND(C2,A2)-1)
Formula 2
=SUBSTITUTE(TRIM(SUBSTITUTE(SUBSTITUTE(LEFT(RIGHT(A2,len(A2)-FIND(") ",A2)),6),")",""),"(","")),"|","")
I duplicated your tab and entered the following formula in cell E2:
=ArrayFormula(ifna(regexextract(A2:A,"\[\s\]\s(.+)?\s\((.+)\)")))
Explanation
\[\s\]\s - find "[ ]" followed by whitespace
(.+)?\s\( - extract everything after it up to the last " (" (.+ is greedy, so it stretches as far right as it can while the rest of the pattern still matches)
(.+)\) - extract everything after that ( and before the last )
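To see the greedy behaviour in action, here is the same pattern run through Python's re module, which treats these constructs the same way RE2 does. The sample rows are hypothetical, since the original sheet isn't reproduced here. Note how the first group keeps "(Large)" and the second group only picks up the contents of the last parentheses, which is why the formula copes with rows that have more than one "(".

```python
import re

# Pattern copied from the formula above.
PATTERN = re.compile(r"\[\s\]\s(.+)?\s\((.+)\)")


def split_row(text):
    """Return (name, bracket_contents) like REGEXEXTRACT's two capture groups.

    Returns None when nothing matches, roughly what IFNA(...) turns into a blank.
    """
    match = PATTERN.search(text)
    return match.groups() if match else None


if __name__ == "__main__":
    print(split_row("[ ] Blue Widget (Large) (SKU-123)"))  # ('Blue Widget (Large)', 'SKU-123')
    print(split_row("[ ] Plain Item (ABC)"))               # ('Plain Item', 'ABC')
```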
EDIT: The first time I tried ztiaa's answer it didn't work... I don't know why. I kept investigating regex, gave it another try, and it worked. You'd probably prefer that one. I'm leaving my answer here just for the record, in case it's useful to someone else in another scenario.
Honestly, I'm not as good with regex as ztiaa, but what I found difficult about your example is that there is sometimes more than one opening parenthesis... that's why I looked for a way to find the last occurrence of "(". You can learn more about this workaround here.
I replaced "#" with "CUT HERE" in my version, just in case "#" appears in your data. With that in mind, you can use these two formulas:
=ArrayFormula(IF(A3:A="","",MID(A3:A,5,FIND("CUT HERE",SUBSTITUTE(A3:A,"(","CUT HERE",LEN(A3:A)-LEN(SUBSTITUTE(A3:A,"(",""))))-5)))
=arrayformula(if(A3:A="","",mid(A3:A,FIND("CUT HERE",SUBSTITUTE(A3:A,"(","CUT HERE",LEN(A3:A)-LEN(SUBSTITUTE(A3:A,"(",""))))+1,FIND(")",A3:A,FIND("CUT HERE",SUBSTITUTE(A3:A,"(","CUT HERE",LEN(A3:A)-LEN(SUBSTITUTE(A3:A,"(",""))))+1)-FIND("CUT HERE",SUBSTITUTE(A3:A,"(","CUT HERE",LEN(A3:A)-LEN(SUBSTITUTE(A3:A,"(",""))))-1)))
The second one is really long because it has to compute the number of characters between the brackets, but it appears to work. There's probably a more elegant way with regex, I repeat :)
Look in columns J and K of your example.
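The "CUT HERE" trick can be followed step by step in a small Python sketch. It counts the "(" characters with LEN minus LEN-after-SUBSTITUTE, replaces only that last occurrence with a marker, then FINDs the marker. substitute_nth is a hypothetical helper that mimics SUBSTITUTE's fourth (occurrence) argument; positions are 1-based like FIND and MID, and the sample row is made up. In plain Python, str.rfind would do the same job in one call.

```python
MARKER = "CUT HERE"


def substitute_nth(text: str, old: str, new: str, n: int) -> str:
    """Replace only the n-th (1-based) occurrence of old, like SUBSTITUTE(text, old, new, n)."""
    if n < 1:
        return text
    pos = -1
    for _ in range(n):
        pos = text.find(old, pos + 1)
        if pos == -1:
            return text  # fewer than n occurrences: nothing to replace
    return text[:pos] + new + text[pos + len(old):]


def last_paren_pos(text: str) -> int:
    """1-based position of the last "(", or 0 if there is none."""
    count = len(text) - len(text.replace("(", ""))  # LEN(A)-LEN(SUBSTITUTE(A,"(",""))
    if count == 0:
        return 0  # the Sheets formula would return an error here
    marked = substitute_nth(text, "(", MARKER, count)
    return marked.find(MARKER) + 1  # FIND("CUT HERE", ...) is 1-based


def name_part(text: str) -> str:
    """First formula: MID(A, 5, pos - 5), i.e. after "[ ] " up to the last "("."""
    pos = last_paren_pos(text)
    if pos == 0:
        return ""
    return text[4:pos - 1]


def bracket_part(text: str) -> str:
    """Second formula: the text between the last "(" and the next ")"."""
    pos = last_paren_pos(text)
    if pos == 0:
        return ""
    close = text.find(")", pos)  # FIND(")", A, pos + 1), as a 0-based index
    if close == -1:
        return ""
    return text[pos:close]


if __name__ == "__main__":
    row = "[ ] Blue Widget (Large) (SKU-123)"
    print(repr(name_part(row)))     # 'Blue Widget (Large) '
    print(repr(bracket_part(row)))  # 'SKU-123'
```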

=LEN function in Google Sheets not working correctly

I have a column with lots of rows containing text. I want to highlight cells with more than x characters, but how? The formula I'm using with conditional formatting doesn't work consistently: sometimes it highlights text over x characters and sometimes it doesn't, so I'm doing something wrong. In the example below, x is 300.
You may also need to lock the rows, like this:
=LEN(E$1:E$170)>300
Silly me... I found the answer myself. I needed to use the same range in the formula as in the rule. For the range E1:E170, the formula needs to be: =LEN(E1:E170)>300
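One way to reason about a rule that highlights "sometimes": a conditional-formatting custom formula is written for the top-left cell of the range it applies to, and its relative references shift by the same amount for every other cell. If the reference doesn't line up with that top-left cell (say the rule covers E1:E170 but the formula says E2), each cell is judged by a neighbour's length. The Python below only models that shifting; it is not Sheets itself, and highlighted_rows and its offset parameter are made up for illustration. In Sheets, the usual per-cell form for a rule applied to E1:E170 would be =LEN(E1)>300.

```python
def highlighted_rows(column, formula_row_offset=0, limit=300):
    """Model a rule like =LEN(E<1+offset>)>limit applied to E1:E<len(column)>.

    For the cell in row i, the relative reference is shifted so it points at
    row i + offset. offset=0 means the formula is anchored at the range's own
    top-left cell (E1); offset=1 models writing E2 by mistake.
    Returns the 1-based rows that would be highlighted.
    """
    rows = []
    for i, _ in enumerate(column):
        j = i + formula_row_offset  # 0-based row the shifted reference points at
        text = column[j] if 0 <= j < len(column) else ""  # outside the data: empty
        if len(text) > limit:
            rows.append(i + 1)
    return rows


if __name__ == "__main__":
    cells = ["short", "x" * 301, "y" * 10, "z" * 400]
    print(highlighted_rows(cells))     # [2, 4]: the long cells themselves
    print(highlighted_rows(cells, 1))  # [1, 3]: each cell judged by the one below
```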

Countif function not counting word with an apostrophe

I’ve looked through the forum, but haven’t found a solution. I’ve got some survey responses in a table like so:
It’s okay
I don't like school
It’s okay
Good, I like it
I’m using a countif function to count the number of times each response was received in the survey. The thing is my function works well with these values:
I don’t like school
Good, I like it
but my function does not pick up the phrase
It’s okay
As I’m using named ranges, the formula I am using is:
=COUNTIF(Question,"It's okay")
Please see this shared link for the example file and check out sheet 2 for the actual formula.
https://drive.google.com/open?id=1e1ccJh3TDeOsIrcn0f5ewQ3M6xOuBrfKBqqJ3mzXfV0
Initially, I thought the issue was that the countif function wasn’t working because of the apostrophe in the word “it’s okay”. As you can see from my example, there are other words with apostrophes in them that get counted so I’m baffled as to why this function is not working for the phrase “it’s okay”.
Has anyone seen this problem before, or any ideas as to how I could accomplish the same thing using another process?
I’ve also tried to escape the apostrophe like so :
=COUNTIF(Question,"It''s okay")
=COUNTIF(Question,"It\'s okay")
But neither case made any difference.
Many thanks in advance
That's because your data and your formula use different apostrophes:
’ ("Right single quotation mark", U+2019; code 146 in Windows-1252, not ASCII) in the data
' ("Single quote", ASCII code 39) in the formula
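A quick way to check this kind of mismatch, sketched in Python: the two characters have different code points, so an exact text match treats "It’s okay" and "It's okay" as different strings. count_exact is a simplified, hypothetical stand-in for COUNTIF (plain exact match, no wildcards or case folding). Normalizing the curly quote before comparing makes both spellings count; in the sheet itself, typing the criterion with the same ’ the data uses should have the same effect.

```python
RIGHT_SINGLE_QUOTE = "\u2019"  # ’  U+2019, byte 146 (0x92) in Windows-1252
APOSTROPHE = "\u0027"          # '  ASCII 39


def normalize_apostrophes(text: str) -> str:
    """Turn curly right single quotes into plain ASCII apostrophes."""
    return text.replace(RIGHT_SINGLE_QUOTE, APOSTROPHE)


def count_exact(values, criterion, normalize=False):
    """Count values equal to criterion, optionally normalizing apostrophes on both sides."""
    if normalize:
        criterion = normalize_apostrophes(criterion)
        values = [normalize_apostrophes(v) for v in values]
    return sum(1 for v in values if v == criterion)


if __name__ == "__main__":
    responses = ["It\u2019s okay", "I don't like school", "It\u2019s okay", "Good, I like it"]
    print(count_exact(responses, "It's okay"))                  # 0
    print(count_exact(responses, "It's okay", normalize=True))  # 2
```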

Filter and logical operators

EZ stuff, but after an hour... =filter(May15!A:S , May15!E:E="Authorization") is returning a richly populated sheet. However, I can't get OR working, despite it working elsewhere in the sheet. I'd like to allow other values in the same filter. I tried several variations, including OR, this way:
=filter(May15!A:S , OR(May15!E:E="Authorization" , May15!E:E="bigwhale", May15!E:E="hi"))
...to no avail. Any help appreciated. Also, I read somewhere that OR can be expressed with a "+", and that sounded like a neat method. Thanks!
One possible way is to use RegEx:
=filter(H:H , REGEXMATCH(E:E,JOIN("|",A1:A3)))
put in A1:A3:
Authorization
bigwhale
hi
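This trick is useful when you need to add conditions: just paste one more value into cell A4 and use the range A1:A4.
Another way is to use the plus sign. OR() collapses the whole column into a single TRUE or FALSE instead of testing each row, which is why it doesn't work inside FILTER; adding the comparisons gives a per-row result that is non-zero whenever any condition holds: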
=FILTER(H:H,(E:E="Authorization")+(E:E="bigwhale")+(E:E="hi"))
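A rough Python model of the approaches above (the values in column_e are made up). OR() reduces everything to one TRUE/FALSE, the plus-sign version produces one number per row, and the REGEXMATCH version is an unanchored search, so "hi" in the list also matches a cell containing "this". If only whole-cell matches are wanted, anchoring the joined list should help; in Sheets that would look like REGEXMATCH(E:E,"^("&JOIN("|",A1:A3)&")$"). Also note that = on text ignores case in Sheets, while REGEXMATCH does not.

```python
import re


def filter_rows(rows, mask):
    """Keep the rows whose mask entry is truthy, like FILTER(range, condition)."""
    return [row for row, keep in zip(rows, mask) if keep]


def or_function(values, wanted):
    """What OR(E=a, E=b, ...) gives: ONE boolean for the whole column."""
    return any(v.lower() == w.lower() for v in values for w in wanted)


def plus_mask(values, wanted):
    """(E=a)+(E=b)+...: one 0/1 per condition, summed per row (case-insensitive like =)."""
    return [sum(v.lower() == w.lower() for w in wanted) for v in values]


def regex_mask(values, wanted, anchored=False):
    """REGEXMATCH(E, JOIN("|", list)): unanchored, case-sensitive search."""
    pattern = "|".join(wanted)  # like JOIN: list values are not regex-escaped
    if anchored:
        pattern = "^(" + pattern + ")$"
    return [re.search(pattern, v) is not None for v in values]


if __name__ == "__main__":
    column_e = ["Authorization", "bigwhale", "hi", "this", "other"]
    wanted = ["Authorization", "bigwhale", "hi"]
    print(or_function(column_e, wanted))                      # True (a single value)
    print(filter_rows(column_e, plus_mask(column_e, wanted)))  # the three wanted rows
    print(filter_rows(column_e, regex_mask(column_e, wanted))) # also includes 'this'
```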

Sanitize pasted text from MS-Word

Here's my wild and wacky pseudo-code. Does anyone know how to make this real?
Background:
This dynamic content comes from a CKEditor, and a lot of folks paste Microsoft Word content into it. No worries: if I just output the attribute untouched, it renders nicely. The catch is that I want it abbreviated to just 125 characters. When I add truncation, all of the Microsoft Word scripts start popping up. Then I added simple_format, sanitize and truncate, and even made my controller spot specific variables that MS Word generates and gsub them out. But there are too many of them, and it seems like an awfully messy way to accomplish this. So, realizing that on its own the text renders clean, I thought: why not just slice it? However, the Microsoft Word text becomes blank but still holds its position in the string. So I came up with the (probably awful) solution below.
It's in three steps:
1. When the text is parsed, none of the MS Word junk is displayed, but that text still occupies positions that a slice statement counts. So I want to use a regexp to find the first actual character.
2. Take that character and find its index in the whole string.
3. Use a slice statement to cut from there.
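def about_us_truncated
  # Index of the first "actual" character; /[[:alnum:]]/ is only an assumed stand-in for that regexp
  start = about_us.index(/[[:alnum:]]/) || 0
  # Take 125 characters from there (about_us[y..125] would instead stop at index 125)
  about_us[start, 125]
end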
The only other idea I've got is a regex statement that explicitly slices only actual characters, like so:
about_us([a-zA-Z][0..125]), but that is definitely not how it is written.
Here is some sample text of MS Word junk:
&Lt;! [If Gte Mso 9]>&Lt;Xml>&Lt;Br /> &Lt;O:Office Document Settings>&Lt;Br /> &Lt;O:Allow Png/>&Lt;Br /> &Lt;/O:Off...
You haven't provided much information to go off of, but don't be too leery of trying to build this regex on your own before you seek help...
Take your sample text and paste it in Rubular in the test string area and start building your regex. It has a great quick reference at the bottom.
I stumbled across this:
http://gist.github.com/139987
It looks like it requires the sanitize gem.
This is technically not a straight answer, but it seems like the best possible one you can find.
To keep MS Word markup out, you should be using CKEditor's built-in MS Word sanitizer. Writing a regex for it can be very complicated, and you can very easily break tags in half and destroy your site with it.
As a workaround, I forced pasting as plain text in CKEditor.
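The answers above all come down to removing the Word markup before truncating, rather than slicing around it. Here is a minimal sketch of that order of operations in Python (not the question's Ruby/Rails stack), using only the standard library's html.parser. Comments, which include Word's <!--[if gte mso 9]>...<![endif]--> blocks, are ignored by default, <style>/<script> contents are skipped, and only then is the text cut to 125 characters. plain_text_preview and _TextExtractor are hypothetical names, and this is not a security sanitizer. If the stored text is itself entity-escaped, as the &Lt; in the sample suggests, it would need html.unescape first; that case isn't handled here.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes; comments are dropped (handle_comment is a no-op by default)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip = 0  # >0 while inside <style> or <script>

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("style", "script") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def plain_text_preview(html_fragment: str, limit: int = 125) -> str:
    """Strip markup first, collapse whitespace, then truncate to `limit` characters."""
    parser = _TextExtractor()
    parser.feed(html_fragment)
    parser.close()
    text = " ".join("".join(parser.parts).split())
    return text[:limit]


if __name__ == "__main__":
    word_paste = (
        "<!--[if gte mso 9]><xml><o:OfficeDocumentSettings><o:AllowPNG/>"
        "</o:OfficeDocumentSettings></xml><![endif]-->"
        "<style>p.MsoNormal {margin:0}</style>"
        '<p class="MsoNormal">We are a <b>small</b> team.</p>'
    )
    print(plain_text_preview(word_paste))  # We are a small team.
```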
