How can I simplify statements like these in an =OR() statement? - google-sheets

isnumber(search("-tr",right(j2,3
))),isnumber(search("-trus",right(j2,5))),isnumber(search(" ll",right(j2,3))),isnumber(search(" homes",right(j2,6))),isnumber(search("the ",left(j2,4))),isnumber(search(" hoa",right(j2,4))),isnumber(search("b ch",right(j2,4))),isnumber(search(" ch",right(j2,3))),isnumber(search("-trs",right(j2,4))),isnumber(search(" prop",right(j2,5))),isnumber(search(" st",right(j2,3))),isnumber(search(" av",right(j2,3))),isnumber(search(" ave",right(j2,4))),isnumber(search(" servi",right(j2,6))),isnumber(search(" maint",right(j2,6))),isnumber(search(" home",right(j2,5))),isnumber(search(" tr",right(j2,3))),isnumber(search(" assn",right(j2,5))),isnumber(search(" co",right(j2,3))),isnumber(search(" trus",right(j2,5))),isnumber(search(" trs",right(j2,4))),isnumber(search("-trs",right(j2,4))),isnumber(search(" tru",right(j2,4))),isnumber(search("jtrs",right(j2,4))),isnumber(search(" est of",right(j2,7))),isnumber(search(" trs",right(j2,4))),isnumber(value(LEFT(j2,1))),isnumber(search(" apts",right(j2,5))),isnumber(value(right(j2,3))),isnumber(search(" grp",right(j2,4))),isnumber(value(left(right(j2,4),1))),isnumber(search(" mgmt",right(j2,5))),isnumber(search(" props",right(j2,6))),isnumber(search(" tr",right(j2,3))),isnumber(search(" dev",right(j2,4))),isnumber(search(" tr",right(j2,3))),isnumber(search(" fdn",right(j2,4))),isnumber(search(" ent",right(j2,4))),isnumber(search(" PRPTS",right(j2,6))),isnumber(search(" ARPTS",right(j2,6))),isnumber(search(" univ",right(j2,5)))
So I have this giant =OR() statement containing a bunch of isnumber(search()) statements checking to see if the string in a cell ends in these phrases. It is for the purpose of identifying company names in lists that contain both people's names and company names. I feel like there must be a more efficient way. Adding them all together in one isnumber(search()) in this format {item1|item2|item3} does not work.

Building on the answer provided here, matching the end of the string can be done by using the $ sign (which means "end of the string" in regular expressions). Matching the beginning of the string, on the other hand, is done by providing a pattern after a caret (^), which indicates the beginning of the string.
So, if you want to add both to the formula provided in the other thread:
(LP|JT/RS)$ : match LP OR JT/RS at the end of the string
^(ABC|DEF) : match ABC OR DEF at the beginning of the string
That would make the formula look something like:
=REGEXMATCH(A2, "(?i)LLC|CORPORATION|COMPANY|HOLDINGS|PARTNERS|EQUITY|(LP|JT/RS)$|^(ABC|DEF)")
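The same anchoring idea can be sketched outside of Sheets. The Python snippet below is only an illustration: the SUFFIXES and PREFIXES lists are a small hypothetical sample of the phrases in the question, and build_pattern and looks_like_company are made-up helper names. It builds one alternation anchored with $ for endings and one anchored with ^ for beginnings, mirroring the structure of the REGEXMATCH pattern above.

```python
import re

# Hypothetical sample of the phrases from the question; extend as needed.
SUFFIXES = ["-tr", " tr", " ll", " homes", " hoa", " assn", " mgmt"]
PREFIXES = ["the "]


def build_pattern(suffixes, prefixes):
    """Compile one case-insensitive regex: (suffix alternatives)$ or ^(prefix alternatives)."""
    # re.escape keeps characters such as "-" or "." literal.
    suffix_alt = "|".join(re.escape(s) for s in suffixes)
    prefix_alt = "|".join(re.escape(p) for p in prefixes)
    return re.compile(rf"(?:{suffix_alt})$|^(?:{prefix_alt})", re.IGNORECASE)


PATTERN = build_pattern(SUFFIXES, PREFIXES)


def looks_like_company(name):
    """Return True if the name ends with a listed suffix or starts with a listed prefix."""
    return PATTERN.search(name) is not None


for candidate in ["Sunrise HOA", "The Oaks", "John Smith"]:
    print(candidate, looks_like_company(candidate))
```

Keeping the phrases in lists means a new suffix is a one-line change rather than another nested function call.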
REFERENCE:
REGEXMATCH()
RE2 SYNTAX

Related

Generate Random String with Specific Restrictions

Trying to generate a random string, but it needs to be formatted a specific way.
N = number
L = Capital Letter
must be NL-NN
needs hyphen as well
examples: 5K-22, 9L-19, 0R-66
Every method I have tried has just generated a string without the hyphen. I know it is probably something simple, but my brain just hurts thinking about it, so I thought I'd see if one of y'all could give me a hand.
Thanks
Try this:
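-- Seed the generator once. In Lua 5.3 and earlier the default seed is fixed,
-- so without this every run would print the same string.
math.randomseed(os.time())

-- Return a random character between a and b (inclusive), by byte value.
function randomchar(a,b)
  return string.char(math.random(string.byte(a),string.byte(b)))
end
a=randomchar('0','9')
b=randomchar('A','Z')
c=randomchar('0','9')
d=randomchar('0','9')
print(a..b..'-'..c..d)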

BBEdit: how to write a replacement pattern when a back reference is immediately followed by a number

I'm new to GREP in BBEdit. I need to find a string inside an XML file. The string is enclosed in quotes, and I need to replace only what's inside the quotes.
The problem is that the replacement string starts with a number, which confuses BBEdit when I put together the replacement pattern. Example:
Original string in XML looks like this:
What I need to replace it with:
01 new file name.png
My grep search and replace patterns:
Using the replacement pattern above, BBEdit wrongly thinks that the first backreference is "\101", when what I really need it to understand is that I mean "\01".
TIA for any help.
Your example is highly artificial because in fact there is no need for your \1 or \3 as you know their value: it is " and you can just type that directly to get the desired result.
"01 new file name.png"
However, just for the sake of completeness, the answer to your actual question (how to write a replacement group number followed by a number) is that you write this:
\0101 new file name.png\3
The reason that works is that there can only be 99 capture groups, so \0101 is parsed as \01 (the first capture group) followed by literal 01.

how to call the deepest parentheses python

I'm trying to do this example:
sentence="{My name is {Adam} and I don't work here}"
The result should be 'Adam'.
So what I'm trying to say is: however many pairs of braces exist, I want the result to show the value inside the deepest pair.
It's not clear from your question, but if there can only ever be one set of outer braces at any level (i.e. "{My name} {is {Adam}}" and "{My {name} is {Adam}}" are invalid input), you can take advantage of the fact that what you want is the last opening brace in the sentence.
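def deepest(sentence):
    # Everything after the last "{" begins inside the deepest pair.
    intermediate = sentence.rpartition("{")[-1]
    # Cut at the first "}" that follows, which closes that pair.
    return intermediate[:intermediate.index("}")]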
deepest("{My name is {Adam} and I don't work here}")
# 'Adam'
deepest("{Someone {set us {{up} the bomb}!}}")
# 'up'
The regex answer also makes this assumption, though regex is likely to be much slower. If multiple outer braces are possible, please make your question clearer.
You can't just index strings like that... The best way is to use a clever regex:
>>> import re
>>> re.search(r'{[^{}]*}', "{My name is {Adam} and I don't work here}").group()
'{Adam}'
This regex pattern matches a pair of braces whose contents include neither { nor }, that is, an innermost pair. re.search returns the first such match, with the braces included.
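If several groups can appear at the same level, one alternative is to track nesting depth explicitly. The sketch below is not taken from either answer: deepest_group is a hypothetical helper that scans the string once, keeps a stack of open-brace positions, and remembers the contents of the first pair that closes at the greatest depth. It assumes the braces are balanced and raises ValueError otherwise.

```python
def deepest_group(sentence):
    """Return the text inside the most deeply nested {...} pair.

    On ties, the first pair reached at that depth wins.
    Returns None if the sentence contains no braces.
    """
    stack = []        # positions of "{" characters that are still open
    best_depth = 0
    best = None
    for i, ch in enumerate(sentence):
        if ch == "{":
            stack.append(i)
        elif ch == "}":
            if not stack:
                raise ValueError("closing brace without a matching opening brace")
            start = stack.pop()
            depth = len(stack) + 1   # depth of the pair that was just closed
            if depth > best_depth:
                best_depth = depth
                best = sentence[start + 1:i]
    if stack:
        raise ValueError("opening brace without a matching closing brace")
    return best


print(deepest_group("{My name is {Adam} and I don't work here}"))
print(deepest_group("{a {b {c}}} {d}"))
```

Unlike the last-opening-brace shortcut, this does not depend on the deepest pair also being the last one opened.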

(F) Lex, how do I match negation?

Some language grammars use negations in their rules. For example, in the Dart specification the following rule is used:
~('\'|'"'|'$'|NEWLINE)
Which means: match anything that is not one of the rules inside the parentheses. Now, I know that in flex I can negate character classes (e.g. [^ab]), but some of the rules I want to negate could be more complicated than a single character, so I don't think I could use character classes for that. For example, I may need to negate the sequence '"""' for multiline strings, but I'm not sure how to do that in flex.
(TL;DR: Skip down to the bottom for a practical answer.)
The inverse of any regular language is a regular language. So in theory it is possible to write the inverse of a regular expression as a regular expression. Unfortunately, it is not always easy.
The """ case, at least, is not too difficult.
First, let's be clear about what we are trying to match.
Strictly speaking "not """" would mean "any string other than """". But that would include, for example, x""".
So it might be tempting to say that we're looking for "any string which does not contain """". (That is, the inverse of .*""".*). But that's not quite correct either. The typical usage is to tokenise an input like:
"""This string might contain " or ""."""
If we start after the initial """ and look for the longest string which doesn't contain """, we will find:
This string might contain " or "".""
whereas what we wanted was:
This string might contain " or "".
So it turns out that we need "any string which does not end with " and which doesn't contain """", which is actually the conjunction of two inverses: (~.*" ∧ ~.*""".*)
It's (relatively) easy to produce a state diagram for that:
(Note that the only difference between the above and the state diagram for "any string which does not contain """" is that in that state diagram, all the states would be accepting, and in this one states 1 and 2 are not accepting.)
Now, the challenge is to turn that back into a regular expression. There are automated techniques for doing that, but the regular expressions they produce are often long and clumsy. This case is simple, though, because there is only one accepting state and we need only describe all the paths which can end in that state:
([^"]|\"([^"]|\"[^"]))*
This model will work for any simple string, but it's a little more complicated when the string is not just a sequence of the same character. For example, suppose we wanted to match strings terminated with END rather than """. Naively modifying the above pattern would result in:
([^E]|E([^N]|N[^D]))* <--- DON'T USE THIS
but that regular expression will match the string
ENENDstuff which shouldn't have been matched
The real state diagram we're looking for is
and one way of writing that as a regular expression is:
([^E]|E(E|NE)*([^EN]|N[^ED]))*
Again, I produced that by tracing all the ways to end up in state 0:
[^E]: stays in state 0
E: in state 1:
    (E|NE)*: stay in state 1
    [^EN]: back to state 0
    N[^ED]: back to state 0 via state 2
This can be a lot of work, both to produce and to read. And the results are error-prone. (Formal validation is easier with the state diagrams, which are small for this class of problems, rather than with the regular expressions which can grow to be enormous).
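Because hand-derived expressions like these are easy to get wrong, a brute-force check can help. The Python sketch below is an assumption-laden illustration, not part of the original answer: it treats the goal as "does not contain END and does not end with a partial terminator (E or EN)", which is my reading of accepting only in state 0, and it uses the repeated (starred) form of the traced expression. It compares each regex against that definition for every short string over a small alphabet.

```python
import re
from itertools import product

# The traced expression, repeated so it can consume a whole string.
CORRECT = re.compile(r"([^E]|E(E|NE)*([^EN]|N[^ED]))*")
# The naive adaptation of the triple-quote pattern.
NAIVE = re.compile(r"([^E]|E([^N]|N[^D]))*")


def expected(s):
    """Assumed goal: no END anywhere, and no partial END (E or EN) at the tail."""
    return "END" not in s and not s.endswith("E") and not s.endswith("EN")


def mismatches(pattern, alphabet="ENDx", max_len=6):
    """List every string up to max_len where the regex and the definition disagree."""
    bad = []
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if (pattern.fullmatch(s) is not None) != expected(s):
                bad.append(s)
    return bad


print(len(mismatches(CORRECT)))
print("ENENDx" in mismatches(NAIVE))
```

An exhaustive check over short strings is not a proof, but it catches slips such as the naive pattern accepting a string that contains END.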
A practical and scalable solution
Practical Flex rulesets use start conditions to solve this kind of problem. For example, here is how you might recognize Python triple-quoted strings:
%x TRIPLEQ
start \"\"\"
end \"\"\"
%%
{start}        { BEGIN( TRIPLEQ ); /* Note: no return, flex continues */ }
<TRIPLEQ>.|\n  { /* Append the next token to yytext instead of
                  * replacing yytext with the next token
                  */
                 yymore();
                 /* No return yet, flex continues */
               }
<TRIPLEQ>{end} { /* We've found the end of the string, but
                  * we need to get rid of the terminating """
                  */
                 yylval.str = malloc(yyleng - 2);
                 memcpy(yylval.str, yytext, yyleng - 3);
                 yylval.str[yyleng - 3] = 0;
                 BEGIN( INITIAL ); /* Leave the start condition before returning */
                 return STRING;
               }
This works because the . rule in start condition TRIPLEQ will not match " if the " is part of a string matched by {end}; flex always chooses the longest match. It could be made more efficient by using [^"]+|\"|\n instead of .|\n, because that would result in longer matches and consequently fewer calls to yymore(); I didn't write it that way above simply for clarity.
This model is much easier to extend. In particular, if we wanted to use <![CDATA[ as the start and ]]> as the terminator, we'd only need to change the definitions
start "<![CDATA["
end "]]>"
(and possibly the optimized rule inside the start condition, if using the optimization suggested above.)
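For readers without flex at hand, the mode-switching idea can be approximated in a few lines of Python. This is a loose analogue and not a translation of the scanner above: scan_delimited is a hypothetical helper, and it relies on str.find rather than on longest-match rules. It does share the two properties discussed here: the first occurrence of the terminator ends the block, and changing delimiters only means changing the two arguments.

```python
def scan_delimited(text, start='"""', end='"""'):
    """Yield the contents of each start...end block found in text, in order."""
    pos = 0
    while True:
        # "INITIAL" mode: skip ahead to the next start delimiter.
        begin = text.find(start, pos)
        if begin == -1:
            return
        body_start = begin + len(start)
        # Inside the block: the first end delimiter terminates it.
        finish = text.find(end, body_start)
        if finish == -1:
            raise ValueError("unterminated block starting at offset %d" % begin)
        yield text[body_start:finish]
        # Back to "INITIAL" mode, just past the terminator.
        pos = finish + len(end)


sample = 'x = """This string might contain " or ""."""'
print(list(scan_delimited(sample)))
print(list(scan_delimited("<![CDATA[a]]b]]>", "<![CDATA[", "]]>")))
```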

Matching function in erlang based on string format

I have user information coming in from an outside source and I need to check if that user is active. Sometimes I have a User and a Server and other times I have User@Server. The former case is no problem, I just have:
active(User, Server) ->
    do whatever.
What I would like to do with the User@Server case is something like:
active([User, "@", Server]) ->
    active(User, Server).
This doesn't seem to work. When calling active in the Erlang shell with "a@b", for example, I get an error that there is no match. Any help would be appreciated!
You can tokenize the string to get the result:
active(UserString) ->
    [User,Server] = string:tokens(UserString,"@"),
    active(User,Server).
If you need something more elaborate, or with better handling of something like email addresses, it might then be time to delve into using regular expressions with the re module.
active(UserString) ->
    RegEx = "^([\\w\\.-]+)@([\\w\\.-]+)$",
    {match, [User,Server]} = re:run(UserString,RegEx,[{capture,all_but_first,list}]),
    active(User,Server).
Note: The supplied regex is hardly sufficient for email address validation; it's just an example that allows all alphanumeric characters including underscores (\\w), dots (\\.), and dashes (-), separated by an at symbol. It will also fail if the match doesn't stretch the whole length of the string (^ to $).
A note on the pattern matching: for the real solution to your problem, I think @chops's suggestions should be used.
When matching patterns against strings, I think it's useful to keep in mind that Erlang strings are really lists of integers. So the string "@" is actually the same as [64] (64 being the ASCII code for @).
This means that your match pattern [User, "@", Server] will match lists like [97,[64],98], but not "a@b" (which in list form is [97,64,98]).
To match the string you need to do [User,$@,Server]. The $ prefix gives you the ASCII value of the character that follows it.
However, this match pattern limits the matching string to one character followed by @ and then one more character.
It can be improved by doing [User, $@ | Server], which allows the server part to have arbitrary length, but the User variable will still only match one single character (and I don't see a way around that).
