How to avoid Hugo stripping unicode characters from page's slug? - url

I am working with Chinese content (using UTF-8), while most of the time it generates the right url, sometimes it strips certain Chinese characters from URL.
Some examples of these characters are:
〇
○
〡
〤
〢
⺮
〣
When generating a page for each character, i.e.: example.com/〇 it generates empty paths example.com// .
To reproduce this behaviour, add
slug: "foo〇○〡〤〢⺮〣21三bar"
in the front matter of any page Hugo will generate the following stripped path:
http://localhost:1313/foo21三bar/`
removing 〇○〡〤〢⺮〣.
Tested with latest Hugo release: Hugo Static Site Generator v0.30.2 linux/amd64 BuildDate: 2017-10-19T08:34:27-03:00
(x-post at discourse.hugo.com)

I found workaround to disable encoding/escaping of urls that contain UTF-8 (non english) chars, e.g.
<img {{ printf "src='%s%s'" .Site.BaseURL .imageUrl | safeHTMLAttr }} >

Related

importxml of url with Hebrew returns in encoding other than UTF-8 that chrome doesn't recognize

For example, in the dummy spreadsheet (tab 'desired outcome'), under "Link 1" you will see this URL:
http://www.promotion-il.co.il/service/%5DE%5E4%5D9%5E5-%5E8%5D9%5D7-%5D7%5E9%5DE%5DC%5D9-%5DC%5E2%5E1%5E7%5D9%5DD/
However, the actual URL in UTF-8 is:
http://www.promotion-il.co.il/service/%D7%9E%D7%A4%D7%99%D7%A5-%D7%A8%D7%99%D7%97-%D7%97%D7%A9%D7%9E%D7%9C%D7%99-%D7%9C%D7%A2%D7%A1%D7%A7%D7%99%D7%9D/
The actual URL string that contains Hebrew is:
http://www.promotion-il.co.il/service/מפיץ-ריח-חשמלי-לעסקים/
I will also add that the same URL has returned with a proper UTF-8 encoding for other blog posts. (See second example in the same tab).
Why is it happening?
How can it be fixed?
Thanks in advance!
This is the solution I came up with eventually:
I saw that for the imported urls - in order to fix a broken url 2 repalcements were needed:
5D --> D7%9
5E --> D7%A
I used this formula in a separate column to achieve it:
==ARRAYFORMULA(SUBSTITUTE(SUBSTITUTE((<COLUMN WITH IMPORTED URLS HERE>),"5D","D7%9"),"5E","D7%A"))

regex to extract URLs from text - Ruby

I am trying to detect the urls from a text and replace them by wrapping in quotes like below:
original text: Hey, it is a url here www.example.com
required text: Hey, it is a url here "www.example.com"
original text show my input value and required text represents the required output. I searched a lot on web but could not find any possible solution. I already have tried URL.extract feature but that doesn't seem to detect URLs without http or https. Below are the examples of some of urls I want to deal with. Kindly let me know if you know the solution.
ANQUETIL-DUPERRON Abraham-Hyacinthe, KIEFFER Jean-Luc, www.hominides.net/html/actualites/outils-preuve-presence-hominides-asie-0422.php,Les Belles lettres, 2001.
https://www.ancient-code.com/indian-archeologists-stumbleacross-ruins-great-forgotten-civilization-mizoram/
www.jstor.org/stable/24084454
www.biorespire.com/2016/03/22/une-nouvelle-villeantique-d%C3%A9couverte-en-inde/
insu.cnrs.fr/terre-solide/terre-et-vie/de-nouvellesdatations-repoussent-l-age-de-l-apparition-d-outils-surle-so
www.cerege.fr/spip.php?page=pageperso&id_user=94
Find words who look like urls:
str = "ANQUETIL-DUPERRON Abraham-Hyacinthe, KIEFFER Jean-Luc, www.hominides.net/html/actualites/outils-preuve-presence-hominides-asie-0422.php,Les Belles lettres, 2001.\n\nhttps://www.ancient-code.com/indian-archeologists-stumbleacross-ruins-great-forgotten-civilization-mizoram/\n\nwww.jstor.org/stable/24084454\n\nwww.biorespire.com/2016/03/22/une-nouvelle-villeantique-d%C3%A9couverte-en-inde/\n\ninsu.cnrs.fr/terre-solide/terre-et-vie/de-nouvellesdatations-repoussent-l-age-de-l-apparition-d-outils-surle-so\n\nwww.cerege.fr/spip.php?page=pageperso&id_user=94"
str.split.select{|w| w[/(\b+\.\w+)/]}
This will give you an array of words which have no spaces and include a one or more . characters which MIGHT work for your use case.
puts str.split.select{|w| w[/(\b+\.\w+)/]}
www.hominides.net/html/actualites/outils-preuve-presence-hominides-asie-0422.php,
https://www.ancient-code.com/indian-archeologists-stumbleacross-ruins-great-forgotten-civilization-mizoram/
www.jstor.org/stable/24084454
www.biorespire.com/2016/03/22/une-nouvelle-villeantique-d%C3%A9couverte-en-inde/
insu.cnrs.fr/terre-solide/terre-et-vie/de-nouvellesdatations-repoussent-l-age-de-l-apparition-d-outils-surle-so
www.cerege.fr/spip.php?page=pageperso&id_user=94
Updated
Complete solution to modify your string:
str_with_quote = str.clone # make a clone for the `gsub!`
str.split.select{|w| w[/(\b+\.\w+)/]}
.each{|url| str_with_quote.gsub!(url, '"' + url + '"')}
Now your cloned object wraps urls inside double quotes
puts str_with_quote
Will give you this output
ANQUETIL-DUPERRON Abraham-Hyacinthe, KIEFFER Jean-Luc, "www.hominides.net/html/actualites/outils-preuve-presence-hominides-asie-0422.php,Les" Belles lettres, 2001.
"https://www.ancient-code.com/indian-archeologists-stumbleacross-ruins-great-forgotten-civilization-mizoram/"
"www.jstor.org/stable/24084454"
"www.biorespire.com/2016/03/22/une-nouvelle-villeantique-d%C3%A9couverte-en-inde/"
"insu.cnrs.fr/terre-solide/terre-et-vie/de-nouvellesdatations-repoussent-l-age-de-l-apparition-d-outils-surle-so"
"www.cerege.fr/spip.php?page=pageperso&id_user=94"

Getting "Swiss Standard German, ss" character with UTF-8 collation

After searching in the archives I found a few resources that recommended replacing the character encoding (#1273 - Unknown collation: 'utf8mb4_unicode_ci' Cpanel).
This is what my collation was (CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_520_ci;) and this is what I converted it to (CHARSET=utf8 COLLATE=utf8_unicode_ci). I also changed the collation in phpMyAdmin to utf8_unicode_ci. HTML meta charset is UTF-8. The problem persists. Issue can be viewed here by looking for words that have 'ss' in them: http://photonew.rasdesignmedia.com/about-roger-aguirre-smith/
Further testing shows that the problem does not exist with iOS. It persists in Chrome, Safari, Opera and Firefox.
The problem resided in my CSS and had nothing to do with collation at any level. Here is a link to more detail of what I found regarding CSS font-feature-settings: https://wordpress.stackexchange.com/a/281076/126173

Encoding issue with wkhtml2pdf footer-html/header-html parameters

I'm generating PDF from HTML file using utf-8 encoding. As a matter of fact, some of my titles contain non-ascii characters. In main PDF, everything is fine, encoding is right, rendering is OK.
When setting my footer.html file, I retreive the current section which does not display right whenever there is a non-ascii character in section name.
This section name is passed as a "section" URL parameter to footer.html call ; its encoding is already wrong. For example, I changed my ToC titel to the french "Table des matières" in my xsl file (that's the only change made to default xsl).
ToC page is fine in PDF, the "è" is correct. But in my footer, I retreive URI via JS which give me before unencoding : mati%C3%A8res ( when I expected it to be mati%E8res), which of course displays in my PDF footer as matières
I've found numerous pages about encoding issues in templates, main html file or header/footer html file, none of them was related to these parameters. How can I make it right in PDF ?
Sample run :
F:\test>wkhtmltopdf.exe --encoding "UTF-8" --footer-html footer.html \
--print-media-type toc --xsl-style-sheet default.xsl test.html test.pdf
Loading pages (1/6)
Counting pages (2/6)
Loading TOC (3/6)
Resolving links (4/6)
Loading headers and footers (5/6)
Printing pages (6/6)
Done
The problem was from my JS code. On parameters I used unencode() instead of using the correct function decodeURIcomponent()

xcode info.plist build variable ${PRODUCT_NAME:rfc1034identifier} seems completely undocumented?

I'm trying to find documentation that describe the syntax and possibilities suggested by the construction ${PRODUCT_NAME:rfc1034identifier}. Obviously this turns into some version of the product name, but where is the documentation that describes how? I just grepped the entire /Developer directory, and got nothing useful.
I'm not looking for the narrow definition of what happens to this particular variable, I want to know about all such modifiers like rfc1034identifier.
By using strings I also dug out the following things that look like they're related to :rfc1034identifier:
:quote - adds backslashes before whitespaces (and more), for use in shell scripts
:identifier - replaces whitespace, slashes (and more) with underscores
:rfc1034identifier - replaces whitespace, slashes (and more) with dashes
:dir - don't know, observed replace with ./ in some cases
:abs - don't know
Exact command:
strings /Developer/Library/PrivateFrameworks/DevToolsCore.framework/Versions/A/DevToolsCore|grep '^:'
There are more things that look like interesting modifiers (for example, :char-range=%#), but I couldn't get these to work. There's only one example of :char-range on the net, and it's from a crash log for Xcode.
Someone asked how do we know it's a modifier specification. Well, we know because it works on multiple variables in build settings. Plist preprocessor probably uses the same mechanisms to resolve build variables as does the build system.
Hack Saw, if you get a response via that bug report, don't forget to keep us informed :-)
Looks like you can stack these as well. The useful case floating around out there is
com.yourcompany.${PRODUCT_NAME:rfc1034identifier:lower}
such that a product name of "Your App" becomes com.yourcompany.your-app.
At long last, Apple produced some documentation on this. This is in the "Text Macros" section of the Xcode manual, as of this date.
Text macro format reference
A text macro can contain any valid unicode text. It can also contain other text macros.
Including other text macros
To include another text macro, add three underscore (_) characters before and after the macro name:
___<MacroName>___
Modifying text macro expansion
You can modify the final expansion of the text macro by adding one or more modifiers. Add a modifier to a text macro by placing a colon (:) at the end of the macro followed by the modifier. Add multiple modifiers by separating each one with a comma (,).
<MACRO>:<modifier>[,<modifier>]…
For example, the following macro will remove the path extension from the FILENAME macro:
FILENAME:deletingPathExtension
To turn the modified macro above into a valid C identifier, add the identifier macro:
FILENAME:deletingPathExtension,identifier
Modifiers
bundleIdentifier: Replaces any non-bundle identifier characters with a hyphen (-).
deletingLastPathComponent: Removes the last path component from the expansion string.
deletingPathExtension: Removes any path extension from the expansion string.
deletingTrailingDot: Removes any trailing dots (.).
identifier: Replaces any non-C identifier characters with an underscore (_).
lastPathComponent: Returns just the last path component of the expansion string.
pathExtension: Returns the path extension of the expansion string.
rfc1034Identifier: Replaces any non-rfc1034 identifier characters with a hyphen (-).
xml: Replaces special xml characters with the corresponding escape string. For example, less-than (<) is replaced with <
TEXT MACROS
Text macros reference
COPYRIGHT
A copyright string that uses the company name of the team for the project. If there is no company name, the string is blank.
The example shows a copyright string when the company is set to “Apple”.
Copyright © 2018 Apple. All rights reserved.
DATE
The current date.
DEFAULTTOOLCHAINSWIFTVERSION
The version of Swift used for the default toolchain.
FILEBASENAME
The name of the current file without any extension.
FILEBASENAMEASIDENTIFIER
The name of the current file encoded as a C identifier.
FILEHEADER
The text placed at the top of every new text file.
FILENAME
The full name of the current file.
FULLUSERNAME
The full name of the current macOS user.
NSHUMANREADABLECOPYRIGHTPLIST
The entry for the human readable copyright string in the Info.plist file of a macOS app target. The value of the macro must include the XML delimiters for the plist. For example, a valid value is:
'''
<key>NSHumanReadableCopyright</key>
<string>Copyright © 2018 Apple, Inc. All rights reserved.</string>
'''
Notice that the value includes a newline.
ORGANIZATIONNAME
The name for your organization that appears in boilerplate text throughout your project folder. The organization name in your project isn’t the same as the organization name that you enter in App Store Connect.
PACKAGENAME
The name of the package built by the current scheme.
PACKAGENAMEASIDENTIFIER
A C-identifier encoded version of the package name built by the current scheme.
PRODUCTNAME
The app name of the product built by the current scheme.
PROJECTNAME
The name of the current project.
RUNNINGMACOSVERSION
The version of macOS that is running Xcode.
TARGETNAME
The name of the current target.
TIME
The current time.
USERNAME
The login name for the current macOS user.
UUID
Returns a unique ID. The first time this macro is used, it generates the ID before returning it. You can use this macro to create multiple unique IDs by using a modifier. Each modifier returns an ID that is unique for that modifier. For example, the first time the UUID:firstPurpose modifier is used, the macro generates and returns a unique ID for that macro and modifier combination. Subsequent uses of the UUID:firstPurpose modifier return the same ID. Adding the UUID:secondPurpose modifier generates and returns a different ID that will be unique to UUID:secondPurpose, and different from the ID for UUID:firstPurpose.
WORKSPACENAME
The name of the current workspace. If there is only one project open, then the name of the current project.
YEAR
The current year as a four-digit number.
$ strings /Developer/Library/PrivateFrameworks/DevToolsCore.framework/Versions/A/DevToolsCore
PRODUCTNAME
PRODUCTNAMEASIDENTIFIER
PRODUCTNAMEASRFC1034IDENTIFIER
PRODUCTNAMEASXML
It seems that there are :identifier, :rfc1034identifier and :xml modifiers. But I have no clue except this.
After stumbling over this question and its existing answers, I have to say: Apples documentation did not improve on this topic over the recent years. We are currently at Xcode 13 and there is still no complete list of all modifiers available.
Therefore I did some spelunking and found the supported modifiers in DVTFoundation.framework which I will list below.
I've tested them all in Xcode 13.3 build settings and used the following two macros to illustrate their impact:
MY_MACRO = Some "text" with umlauts äöüçñ and special characters are ',/|\-_:;%&<>.!
MY_SOURCE = /Applications/Xcode.app/Contents/Frameworks/../SharedFrameworks/DVTFoundation.framework
Retrieval operators/modifiers
Retrieval modifiers are used to extract and/or transform all or parts of a macro/variable/setting.
They are applied using the following syntax: $(<VARIABLE>:<MODIFIER>)
quote: Escapes all characters which have a special meaning in shell scripts/commands like space, colon, semicolon and backslash.
RESULT_quote = $(MY_MACRO:quote)
Some\ \"text\"\ with\ umlauts\ äöüçñ\ and\ special\ characters\ are\ \',/|\\-_:;%&<>.!
upper: Transforms all characters to their uppercase equivalents.
RESULT_upper = $(MY_MACRO:upper)
SOME "TEXT" WITH UMLAUTS ÄÖÜÇÑ AND SPECIAL CHARACTERS ARE ',/|\-_:;%&<>.!
lower: Transforms all characters to their lowercase equivalents.
RESULT_lower = $(MY_MACRO:lower)
some "text" with umlauts äöüçñ and special characters are ',/|\-_:;%&<>.!
identifier: Replaces any non-C identifier characters with an underscore (_).
RESULT_identifier = $(MY_MACRO:identifier)
Some__text__with_umlauts_______and_special_characters_are________________
rfc1034identifier: Replaces any non-rfc1034 identifier characters with a hyphen (-)
RESULT_rfc1034identifier = $(MY_MACRO:rfc1034identifier)
Some--text--with-umlauts-------and-special-characters-are----------------------------
c99extidentifier: Replaces any non-C99 identifier characters with an underscore (_). Umlauts are allowed as C99 uses Unicode!
RESULT_c99extidentifier = $(MY_MACRO:c99extidentifier)
Some__text__with_umlauts_äöüçñ_and_special_characters_are___________________________
xml: According to Apple documentation it should replace special xml characters with the corresponding escape string. For example, less-than (<) is replaced with <. But in my examples this didn't work.
RESULT_xml = $(MY_MACRO:xml)
Some "text" with umlauts äöüçñ and special characters are ',/|\-_:;%&<>.!
dir: Extracts the directory part of a path
RESULT_dir = $(MY_SOURCE:dir)
/Applications/Xcode.app/Contents/Frameworks/../SharedFrameworks/
file: Extracts the filename part of a path
RESULT_file = $(MY_SOURCE:file)
DVTFoundation.framework
base: Extracts the filename base part of a path (=filename without suffix/extension)
RESULT_base = $(MY_SOURCE:base)
DVTFoundation
suffix: Extracts the filename extension/suffix a path or filename
RESULT_suffix = $(MY_SOURCE:suffix)
.framework
standardizepath: Standardizes the path (e.g. ../ and tilde (~) are resolved)
RESULT_standardizepath = $(MY_SOURCE:standardizepath)
/Applications/Xcode.app/Contents/SharedFrameworks/DVTFoundation.framework
Replacement operators/modifiers
Beside above extracting/transforming operators, there is support built into the build settings system to replace specific parts of a directory which are matched using a modifier.
They are applied using the following syntax: $(<VARIABLE>:<MODIFIER>=<VALUE>)
dir=<VALUE>: Replaces the directory part of a path with <VALUE> and returns the new path
RESULT2_dir = $(MY_SOURCE:dir=/Developer/SharedFrameworks)
/Developer/SharedFrameworks/DVTFoundation.framework
file=<VALUE>: Replaces the filename part of a path and returns the new path
RESULT2_file = $(MY_SOURCE:file=my_file.txt)
/Applications/Xcode.app/Contents/Frameworks/../SharedFrameworks/my_file.txt
base=<VALUE>: Replaces the filename base part of a path (=filename without suffix/extension) and returns the new path
RESULT2_base = $(MY_SOURCE:base=Dummy)
/Applications/Xcode.app/Contents/Frameworks/../SharedFrameworks/Dummy.framework
suffix=<VALUE>: Replaces the filename extension/suffix a path and returns the new path
RESULT2_suffix = $(MY_SOURCE:suffix=.txt)
/Applications/Xcode.app/Contents/Frameworks/../SharedFrameworks/DVTFoundation.txt
I hope this list will help more people looking at Xcodes build settings and wondering how they can be transformed.

Resources