I'm using the following ARRAYFORMULA to create an image path:
=ARRAYFORMULA(
if(row(A:A)=1,"#Icon",IF(
B:B="",,SUBSTITUTE(
"../../../../../../_Assets/Icons/"& LOWER(B:B&".png"), " ", "_")
)
)
)
What it does
It adds a path before the text and replaces all spaces with an underscore _. Here is an example:
Name | #Icon
A Tit(l)e | ../../../../../../_Assets/Icons/a_tit(l)e.png
Title - Subtitle | ../../../../../../_Assets/Icons/title_-_subtitle.png
Title text/string - Subtitle | ../../../../../../_Assets/Icons/title_text/string_-_subtitle.png
What I want it to do
If possible, I would like to achieve the following:
Replace the characters in the list below, such as the forward slash /, with an underscore _ (see the last row in my example above).
It already replaces all spaces with an underscore _, which is good. But when it sees a space followed by a - and another space, it outputs _-_, and in that case I want only a -.
So the table above would become:
Name | #Icon
A Tit(l)e | ../../../../../../_Assets/Icons/a_tit(l)e.png
Title - Subtitle | ../../../../../../_Assets/Icons/title-subtitle.png
Title text/string - Subtitle | ../../../../../../_Assets/Icons/title_text_string-subtitle.png
List of characters to be avoided/replaced with an underscore _:
# pound
% percent
& ampersand
{ left curly bracket
} right curly bracket
\ back slash
< left angle bracket
> right angle bracket
* asterisk
? question mark
/ forward slash
blank spaces
$ dollar sign
! exclamation point
' single quotes
" double quotes
: colon
@ at sign
+ plus sign
` backtick
| pipe
= equal sign
Any help/suggestion would be much appreciated!
Put the list of characters to avoid into a column (D2:D23 here) and use REGEXREPLACE:
=ARRAYFORMULA(if(row(A:A)=1,"#Icon",IF(A:A="",,"../../../../../../_Assets/Icons/"&LOWER(REGEXREPLACE(REGEXREPLACE(A:A," - ","-"),TEXTJOIN("|\",0,D2:D23),"_")) & ".png")))
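To check the logic outside Sheets, here is a minimal Python sketch of the same two steps: first collapse " - " to "-", then turn every listed character into _. It uses Python's re module rather than the RE2 engine behind REGEXREPLACE, and the names ICON_DIR, AVOIDED and icon_path are made up for illustration.

```python
import re

ICON_DIR = "../../../../../../_Assets/Icons/"

# The 22 characters from the question's list; the space stands for "blank spaces".
AVOIDED = "#%&{}\\<>*?/ $!'\":@+`|="

# One alternation of escaped characters, similar to what TEXTJOIN("|\", ...) builds.
AVOIDED_PATTERN = "|".join(re.escape(c) for c in AVOIDED)


def icon_path(name):
    """Return the icon path for a name, or "" for an empty cell."""
    if name == "":
        return ""
    collapsed = name.replace(" - ", "-")                # step 1: " - " becomes "-"
    cleaned = re.sub(AVOIDED_PATTERN, "_", collapsed)  # step 2: listed chars become "_"
    return ICON_DIR + cleaned.lower() + ".png"


for name in ["A Tit(l)e", "Title - Subtitle", "Title text/string - Subtitle"]:
    print(icon_path(name))
```

Because " - " is handled before the character replacement, the spaces around the hyphen never get the chance to become underscores.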
try:
=ARRAYFORMULA({"#Icon",
IF(B2:B="",,SUBSTITUTE(SUBSTITUTE(
"../../../../../../_Assets/Icons/"&LOWER(B2:B&".png"), " ", "_"), "_-_", "-", 1))})
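Note that the trailing 1 in the outer SUBSTITUTE replaces only the first _-_, and this formula leaves characters such as / untouched. A small Python sketch that mirrors the nested SUBSTITUTE calls (the name icon_path_substitute is hypothetical) makes both effects visible:

```python
ICON_DIR = "../../../../../../_Assets/Icons/"


def icon_path_substitute(name):
    """Mirror of the nested SUBSTITUTE formula for a single cell."""
    if name == "":
        return ""
    path = ICON_DIR + (name + ".png").lower()
    path = path.replace(" ", "_")         # inner SUBSTITUTE: every space becomes "_"
    return path.replace("_-_", "-", 1)    # outer SUBSTITUTE(..., 1): first "_-_" only


print(icon_path_substitute("Title - Subtitle"))
print(icon_path_substitute("A - B - C"))
```

Dropping the 1 in the formula (or the count argument in Python) would replace every occurrence.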
The word ." prints a string. More precisely, it compiles (.") followed by the string up to the next " into the word currently being compiled.
But how can I print
That's the "question".
with Forth?
In a Forth-2012 System (e.g. Gforth) you can use string literals with escaping via the word s\" as:
: foo ( -- ) s\" That's the \"question\"." type ;
In a Forth-94 system (the majority of standard systems), you can use arbitrary parsing and the word sliteral:
: foo ( -- ) [ char | parse That's the "question".| ] sliteral type ;
A string can also be extracted up to the end of the line (with no printable delimiter), and multi-line strings can be extracted too.
Specific helpers for particular cases can be easily defined.
For example, see the word s$ for string literals that are delimited by any arbitrary printable character, e.g.:
s$ `"test" 'passed'` type
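The definition of s$ is not shown here, so the following is only a Python model of the behaviour described: skip leading spaces, take the next character as the delimiter, and return the text up to its next occurrence together with the remaining input. The function name parse_delimited is hypothetical, and this is not Forth code.

```python
def parse_delimited(source):
    """Model of an s$-style literal: returns (string, rest of the input)."""
    source = source.lstrip(" ")
    if not source:
        raise ValueError("missing delimiter")
    delimiter = source[0]                 # any printable character works
    end = source.find(delimiter, 1)       # next occurrence closes the string
    if end == -1:
        raise ValueError("unterminated string")
    return source[1:end], source[end + 1:]


text, rest = parse_delimited("`\"test\" 'passed'` type")
print(text)
```

Because the delimiter is chosen per literal, both kinds of quotes can appear inside the string without any escaping.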
Old school:
34 emit
Output:
"
Using gforth:
: d 34 emit ;
cr ." That's the " d ." question" d ." ." cr
Output:
That's the "question".
I am attempting to use this string:
var passwordRegex = "^[A-Za-z0-9 !\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~].{8,}$"
as my regular expression pattern, but it keeps failing, saying the pattern is invalid. I used the \ character to escape the characters " and \, but when I escape regex metacharacters such as ^ & [ ] | . the same way, it throws the error "invalid escape sequence in literal".
What am I missing in order to allow the characters:
! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~
(including space) in my regex? I assume it has something to do with how I am escaping, but I can't find anything about these characters in regard to Swift's regular expressions.
The problem is that characters [, \, and ] need to be escaped because they have special meaning in a regular expression.
So you need \[, \\, and \] in the regular expression. But since this is inside a Swift string, each \ needs to be escaped with a \.
So [\] becomes \[\\\] in the regular expression which becomes \\[\\\\\\] in the Swift string.
The final valid string is:
var passwordRegex = "^[A-Za-z0-9 !\"#$%&'()*+,-./:;<=>?@\\[\\\\\\]^_`{|}~].{8,}$"
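The two escaping layers can be checked mechanically. This Python sketch (the helper name to_quoted_literal is illustrative) takes the regex-level text, escapes \ and " for a double-quoted literal as Swift requires, and uses Python's re as a rough sanity check that the character class accepts every symbol in the question. Swift's NSRegularExpression uses ICU, whose syntax may differ from Python's in details.

```python
import re


def to_quoted_literal(regex_text):
    # Escape backslashes first, so the backslashes added for quotes are not doubled.
    return regex_text.replace("\\", "\\\\").replace('"', '\\"')


# Regex-level pattern: inside the brackets, [ \ ] are written as \[ \\ \].
REGEX_TEXT = r"""^[A-Za-z0-9 !"#$%&'()*+,-./:;<=>?@\[\\\]^_`{|}~].{8,}$"""

# The character class alone, for a quick check with Python's re module.
ALLOWED_CHAR = re.compile(r"""[A-Za-z0-9 !"#$%&'()*+,-./:;<=>?@\[\\\]^_`{|}~]""")

# Space plus every symbol listed in the question.
SYMBOLS = " !\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"

print(to_quoted_literal(REGEX_TEXT))
print(all(ALLOWED_CHAR.fullmatch(c) for c in SYMBOLS))
```

The printed literal body is exactly the string in the answer, which confirms the rule "write the regex first, then double every backslash".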
I made a pastebin site where each entry gets a random string, for example:
example.com/ds34
example.com/sdf-2zA
example.com/234+_2
My question is: what is the grammar rule for these strings?
Can they start with anything? Which characters are and aren't allowed?
See the RFC and w3.org. In short: any ASCII symbol excluding the reserved ones ! * ' ( ) ; : @ & = + $ , / ? % # [ ]. Other symbols can be percent-encoded.
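A conservative option for the random strings is to draw them only from the RFC 3986 unreserved set, which never needs encoding. This is a minimal Python sketch, assuming six characters is long enough for the site; random_slug is a hypothetical name.

```python
import secrets
import string

# RFC 3986 unreserved characters (section 2.3): usable in a path without encoding.
UNRESERVED = string.ascii_letters + string.digits + "-._~"


def random_slug(length=6):
    """Return a random path segment made only of unreserved characters."""
    while True:
        slug = "".join(secrets.choice(UNRESERVED) for _ in range(length))
        # "." and ".." are dot-segments with special meaning in paths; retry if drawn.
        if slug not in (".", ".."):
            return slug


print("example.com/" + random_slug())
```

secrets is used instead of random so the strings are hard to guess.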
Does anyone know the full list of characters that can be used within a GET without being encoded? At the moment I am using A-Z a-z and 0-9... but I am looking to find out the full list.
I am also interested in whether a specification has been released for the upcoming addition of Chinese and Arabic URLs (as obviously that will have a big impact on my question).
EDIT: As @Jukka K. Korpela correctly points out, RFC 1738 was updated by RFC 3986.
This expanded and clarified the characters valid for host. Unfortunately it's not easily copied and pasted, but I'll do my best.
In first matched order:
host = IP-literal / IPv4address / reg-name
IP-literal = "[" ( IPv6address / IPvFuture ) "]"
IPvFuture = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
IPv6address = 6( h16 ":" ) ls32
/ "::" 5( h16 ":" ) ls32
/ [ h16 ] "::" 4( h16 ":" ) ls32
/ [ *1( h16 ":" ) h16 ] "::" 3( h16 ":" ) ls32
/ [ *2( h16 ":" ) h16 ] "::" 2( h16 ":" ) ls32
/ [ *3( h16 ":" ) h16 ] "::" h16 ":" ls32
/ [ *4( h16 ":" ) h16 ] "::" ls32
/ [ *5( h16 ":" ) h16 ] "::" h16
/ [ *6( h16 ":" ) h16 ] "::"
ls32 = ( h16 ":" h16 ) / IPv4address
; least-significant 32 bits of address
h16 = 1*4HEXDIG
; 16 bits of address represented in hexadecimal
IPv4address = dec-octet "." dec-octet "." dec-octet "." dec-octet
dec-octet = DIGIT ; 0-9
/ %x31-39 DIGIT ; 10-99
/ "1" 2DIGIT ; 100-199
/ "2" %x30-34 DIGIT ; 200-249
/ "25" %x30-35 ; 250-255
reg-name = *( unreserved / pct-encoded / sub-delims )
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" <---This seems like a practical shortcut, most closely resembling original answer
reserved = gen-delims / sub-delims
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="
pct-encoded = "%" HEXDIG HEXDIG
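The reg-name rule translates directly into a regular expression. This Python sketch (is_reg_name is a hypothetical name) checks only the RFC 3986 syntax; DNS imposes further rules on real host names that it does not check.

```python
import re

UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="
PCT_ENCODED = r"%[0-9A-Fa-f]{2}"

# reg-name = *( unreserved / pct-encoded / sub-delims )
REG_NAME = re.compile(r"(?:[" + UNRESERVED + SUB_DELIMS + r"]|" + PCT_ENCODED + r")*")


def is_reg_name(host):
    """True if host matches reg-name (the empty string matches, as *( ) allows)."""
    return REG_NAME.fullmatch(host) is not None


print(is_reg_name("example.com"), is_reg_name("exa mple"))
```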
Original answer from RFC 1738 specification:
Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
reserved characters used for their reserved purposes may be used
unencoded within a URL.
^ obsolete since 1998.
The characters allowed in a URI are either reserved or unreserved (or a percent character as part of a percent-encoding)
http://en.wikipedia.org/wiki/Percent-encoding#Types_of_URI_characters
says these are the RFC 3986 unreserved characters (sec. 2.3), plus reserved characters (sec. 2.2) when they need to retain their special meaning, and a percent character as part of a percent-encoding.
The full list of the 66 unreserved characters is in RFC3986, here: https://www.rfc-editor.org/rfc/rfc3986#section-2.3
This is any character in the following regex set:
[A-Za-z0-9_.\-~]
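The count of 66 can be checked against that regex directly. A short Python sketch that tests every ASCII character:

```python
import re

# The character set from the answer above.
UNRESERVED_RE = re.compile(r"[A-Za-z0-9_.\-~]")

ascii_matches = "".join(chr(c) for c in range(128) if UNRESERVED_RE.fullmatch(chr(c)))
print(len(ascii_matches))  # 66
```

That is 26 + 26 + 10 letters and digits plus the four symbols - . _ ~.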
I tested it by requesting my website (Apache) with all available characters on my German keyboard as a URL parameter:
http://example.com/?^1234567890ß´qwertzuiopü+asdfghjklöä#<yxcvbnm,.-°!"§$%&/()=? `QWERTZUIOPÜ*ASDFGHJKLÖÄ\'>YXCVBNM;:_²³{[]}\|µ@€~
These were not encoded:
^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ,.-!/()=?`*;:_{}[]\|~
Not encoded after urlencode():
0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ.-_
Not encoded after rawurlencode():
0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ.-_~
Note: Before PHP 5.3.0, rawurlencode() encoded ~ because of RFC 1738. But that was replaced by RFC 3986, so it's safe to use now. But I do not understand why, for example, {} are encoded by rawurlencode(), because they are not mentioned in RFC 3986.
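For comparison, Python's urllib.parse.quote behaves much like rawurlencode() since PHP 5.3: as of Python 3.7, with safe="" it leaves only letters, digits and -._~ unencoded. Characters such as { and } are neither reserved nor unreserved in RFC 3986, so they are not allowed literally in a URI at all, which is why such encoders percent-encode them. quote_plus additionally turns spaces into +, as form encoding does, although unlike PHP's urlencode() it keeps ~.

```python
from urllib.parse import quote, quote_plus

sample = "a b~{}"

# Percent-encode everything except unreserved characters (Python 3.7+ keeps "~").
print(quote(sample, safe=""))       # a%20b~%7B%7D
# Same, but a space becomes "+" as in application/x-www-form-urlencoded data.
print(quote_plus(sample, safe=""))  # a+b~%7B%7D
```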
An additional test I made was regarding auto-linking in mail texts. I tested Mozilla Thunderbird, aol.com, outlook.com, gmail.com, gmx.de and yahoo.de and they fully linked URLs containing these chars:
0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ.-_~+#,%&=*;:@
Of course the ? was linked, too, but only if it was used once.
Some people would now suggest using only the rawurlencode() characters, but have you ever heard of anyone having problems opening these websites?
Asterisk
http://wayback.archive.org/web/*/http://google.com
Colon
https://en.wikipedia.org/wiki/Wikipedia:About
Plus
https://plus.google.com/+google
At sign, Colon, Comma and Exclamation mark
https://www.google.com/maps/place/USA/@36.2218457,...
Because of that, these characters should be usable unencoded without problems. Of course you should not use & or ; because of encoding sequences like &amp;. The same applies to %, as it is used to encode characters in general, and to =, as it assigns a value to a parameter name.
Finally, I would say it's OK to use these unencoded:
0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ.-_~!+,*:@
But if you expect randomly generated URLs you should not use punctuation marks like .!, because some mail apps will not auto-link them:
http://example.com/?foo=bar! < last char not linked
From RFC 1738:
Thus, only alphanumerics, the special characters $-_.+!*'(),
and reserved characters used for their
reserved purposes may be used unencoded within a URL.
RFC3986 defines two sets of characters you can use in a URI:
Reserved Characters: :/?#[]@!$&'()*+,;=
reserved = gen-delims / sub-delims
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
The purpose of reserved characters is to provide a set of delimiting characters that are distinguishable from other data within a URI. URIs that differ in the replacement of a reserved character with its corresponding percent-encoded octet are not equivalent.
Unreserved Characters: A-Za-z0-9-_.~
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
Characters that are allowed in a URI but do not have a reserved purpose are called unreserved.
These are listed in RFC3986. See the Collected ABNF for URI to see what is allowed where and the regex for parsing/validation.
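One practical consequence of the reserved/unreserved split: a percent-encoded unreserved character (e.g. %7E) means the same as the character itself (~), so it can be decoded when comparing URIs, whereas an encoded reserved character such as %2F must stay encoded. A minimal Python sketch of that normalization, following RFC 3986 section 6.2.2 (the function name is hypothetical):

```python
import re

UNRESERVED = set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~")


def normalize_pct_encoding(uri):
    """Decode %XX escapes of unreserved characters and uppercase all other escapes."""
    def fix(match):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char                       # %7E and ~ are equivalent
        return "%" + match.group(1).upper()   # reserved or other: keep it encoded

    return re.sub(r"%([0-9A-Fa-f]{2})", fix, uri)


print(normalize_pct_encoding("/%7euser/a%2fb"))  # /~user/a%2Fb
```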
This answer discusses which characters may be included in the fragment part of a URL without being escaped. I'm posting a separate answer since this part is slightly different from (and can be used in conjunction with) the other excellent answers here.
The fragment is not sent to the server; it consists of the characters after # in this example:
https://example.com/#STUFF-HERE
Specification
The relevant specifications in RFC 3986 are:
fragment = *( pchar / "/" / "?" )
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
This also references rules in RFC 2234
ALPHA = %x41-5A / %x61-7A ; A-Z / a-z
DIGIT = %x30-39 ; 0-9
Result
So the full list, excluding escapes (pct-encoded), is:
A-Z a-z 0-9 - . _ ~ ! $ & ' ( ) * + , ; = : @ / ?
For your convenience here is a PCRE expression that matches a valid, unescaped fragment:
/^[A-Za-z0-9\-._~!$&'()*+,;=:@\/?]*$/
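The same check in Python, as a sketch (is_unescaped_fragment is a hypothetical name). In Python the / needs no escaping, and like the PCRE above it does not accept %XX escapes:

```python
import re

# Same character class as the PCRE above; pct-encoded escapes are not accepted.
FRAGMENT_UNESCAPED = re.compile(r"[A-Za-z0-9\-._~!$&'()*+,;=:@/?]*")


def is_unescaped_fragment(fragment):
    """True if every character is allowed unescaped in a fragment (the text after #)."""
    return FRAGMENT_UNESCAPED.fullmatch(fragment) is not None


print(is_unescaped_fragment("STUFF-HERE"))  # True
print(is_unescaped_fragment("a b"))         # False
```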
Encoding
Counting this up, there are:
26 + 26 + 10 + 19 = 81 code points
You could use base 81 to efficiently encode data here.
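Here is one hypothetical way to do that: treat the bytes as a big integer and write it in base 81 over the fragment alphabet, similar in spirit to Base58. Each character then carries log2(81) ≈ 6.34 bits, slightly more than Base64's 6. The scheme and the names are illustrative, not a standard.

```python
import string

# The 81 characters allowed unescaped in a fragment: A-Z a-z 0-9 plus 19 symbols.
ALPHABET = (string.ascii_uppercase + string.ascii_lowercase + string.digits
            + "-._~!$&'()*+,;=:@/?")
BASE = len(ALPHABET)  # 81


def encode_base81(data):
    """Encode bytes as base-81 text; each leading zero byte becomes one ALPHABET[0]."""
    n = int.from_bytes(data, "big")
    digits = []
    while n:
        n, r = divmod(n, BASE)
        digits.append(ALPHABET[r])
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return ALPHABET[0] * leading_zeros + "".join(reversed(digits))


def decode_base81(text):
    """Inverse of encode_base81."""
    n = 0
    for ch in text:
        n = n * BASE + ALPHABET.index(ch)
    leading_zeros = len(text) - len(text.lstrip(ALPHABET[0]))
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return b"\x00" * leading_zeros + body


token = encode_base81(b"hello")
print(token, decode_base81(token))
```

Leading zero bytes are kept explicitly because the integer conversion would otherwise drop them.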
The upcoming change is for Chinese and Arabic domain names, not URIs. Internationalised URIs are called IRIs and are defined in RFC 3987. That said, I'd recommend not doing this yourself but relying on an existing, tested library, since there are lots of choices in URI encoding/decoding, and what is considered safe by the specification differs from what is safe in actual use (browsers).
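As a rough illustration of the IRI-to-URI mapping in RFC 3987 (section 3.1), which converts non-ASCII characters to UTF-8 and percent-encodes the bytes, Python's quote can handle the path part. This is a sketch, not a tested library: the name iri_to_uri is hypothetical, it also percent-encodes ASCII characters outside the listed set (such as spaces), it does not validate existing %XX escapes, and a non-ASCII host name would normally be converted with IDNA (Punycode) instead.

```python
from urllib.parse import quote

# RFC 3986 reserved characters; kept as-is so the URI structure survives.
RESERVED = ":/?#[]@!$&'()*+,;="


def iri_to_uri(iri):
    """UTF-8 percent-encode everything except unreserved, reserved and '%'."""
    # '%' stays literal so existing %XX escapes are not encoded a second time.
    return quote(iri, safe=RESERVED + "%")


print(iri_to_uri("http://example.com/北京"))  # http://example.com/%E5%8C%97%E4%BA%AC
```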
If you'd like to give users a special kind of experience, you could use pushState to put a wide range of characters into the browser's URL:
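var u = "";
var tt = 168;                          // picks the run of code points: 250 * 168 = 42000 (U+A410)
for (var i = 0; i < 250; i++) {
  var x = i + 250 * tt;                // code point of the i-th character
  console.log(x);                      // debug: print each code point
  var c = String.fromCharCode(x);      // one UTF-16 code unit (x stays below U+FFFF)
  u += c;
}
history.pushState({}, "", 250 * tt + u); // new history entry with URL "42000" + u, no reload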
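The address bar shows these characters as Unicode, but they are not ASCII, so in the URL that is actually sent (for example on a reload) they are UTF-8 percent-encoded, as the WHATWG URL Standard requires for non-ASCII code points in a path. This Python sketch rebuilds the same string to show the encoded form; it uses Python's quote, not the browser's own encoder.

```python
from urllib.parse import quote

tt = 168
start = 250 * tt                                  # 42000, i.e. U+A410
u = "".join(chr(start + i) for i in range(250))   # same 250 characters as the JS loop
path = str(start) + u                             # the relative URL passed to pushState

# Percent-encoded UTF-8 form of the same path (first 14 characters shown).
print(quote(path)[:14])                           # 42000%EA%90%90
```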