Why does CommonMark consider this 7 space indent a code block? - commonmark

I'm storing some notes with CommonMark and I noticed that this snippet seems to render differently on SO (echo is indented 7 spaces).
1. Print Windows folder path
echo %windir%
Here it is interpreted as a code block on http://spec.commonmark.org/dingus/:
And here it is on Stack Overflow:
If I indent echo by 8 spaces instead, it will now show as a code block on Stack Overflow:
But on http://spec.commonmark.org/dingus/ it now has a leading space (I've selected it to show):
Is this because SO isn't actually using the full CommonMark spec (yet?)?
Or is there a CommonMark setting to make it render the way SO does?
This is a little annoying because many of my notes are indeed text that could find their way into a question or answer somewhere on Stack Exchange. So I'm just hoping to figure out what's going on here.

Turns out this was answered by #balpha in this June 2015 post:
https://meta.stackexchange.com/a/258587/879
List items
Currently, this will create a list item with two paragraphs:
1. This is the first paragraph
And this is the second one.
With CommonMark (and even in a significant number of other Markdown
implementations),
the "second one" will not be part of the list item, but a stand-alone
paragraph after the list. To make it part of the list item, you have
to indent it to the same margin as the first paragraph like this:
1. This is the first paragraph
And this is the second one.
I was confused by what seemed like a somewhat arbitrary "7 spaces" but now I realize it's in fact 3 spaces to bring it under alignment with the list item, and then the standard 4 spaces which identifies it as a code block.
Confirmed by this little test:
This actually makes good sense to me, I think I like it.
So that answers that, I guess SE still uses its own variant that doesn't involve this type of alignment.

Related

How do I get auto-conversion of Part [[ double-brackets ]] on paste?

A pet peeve of mine is the use of double square brackets for Part rather than the single character \[LeftDoubleBracket] and \[RightDoubleBracket]. I would like to have these automatically replaced when pasting plain-text code (from StackOverflow for example) into a Mathematica Notebook. I have been unable to configure this.
Can it be done with ImportAutoReplacements or another automatic method (preferred), or will I need use a method like the "Paste Tabular Data Palette" referenced here?
Either way, I am not good with string parsing, and I want to learn the best way to handle bracket counting.
Sjoerd gave Defer and Simon gave Ctrl+Shift+N which both cause Mathematica to auto-format code. These are fine options.
I am still interested in a method that is automatic and/or preserves as much of the original code as possible. For example, maintaining prefix f#1, infix 1 ~f~ 2, and postfix 1 // f functions in their original forms.
A subsection of this question was reposted as Matching brackets in a string and received several good answers.
Not really an answer, but a thread on entering the double [[ ]] pair (with the cursor between both pairs) using a single keystroke occurred a couple of weeks ago on the mathgroup. It didn't help me, but for others this was a solution apparently.
EDIT
to make good on my slightly off-topic first response here's a pattern replacement that seems to do the job (although I have difficulties myself to understand why it should be b and not b_; the latter doesn't work):
Defer[f[g[h[[i[[j[2], k[[1, m[[1, n[2]]]]]]]]]]]] /.
HoldPattern[Part[b, a_]] -> HoldPattern[b\[LeftDoubleBracket]a\[RightDoubleBracket]]
I leave the automation part to you.
EDIT 2
I discovered that if you add the above rule to ImportAutoReplacements and paste your SO code in a notebook in a Defer[] and evaluate this, you end up with a usable form with double brackets which can be used as input somewhere else.
EDIT 3
As remarked by Mr.Wizard invisibly below in the comments, the replacement rule isn't necessary. Defer does it on its own! Scientific progress goes "Boink", to cite Bill Watterson.
EDIT 4
The jury is still out on Defer. It has some peculiar side effects, and doesn't work well on all expressions. try the "Paste Tabular Data Palette" in the toolbag question for instance. Pasting this block of code in Defer and executing gives me this:
It worked much better in another code snippet from the same thread:
The second part is how it looks after turning it in to input by editing the output of the first block (basically, I inserted a couple of returns to restore the format). This turns it into Input. Please notice that all double brackets turned into the correct corresponding symbol, but notice also the changing position of ReleaseHold.
Simon wrote in a comment, but declined to post as an answer, something fairly similar to what I requested, though it is not automatic on paste, and is not in isolation from other formatting.
(One can) select the text and press Ctrl+Shift+N to translate to StandardForm

Sanitize pasted text from MS-Word

Here's my wild and whacky psuedo-code. Anyone know how to make this real?
Background:
This dynamic content comes from a ckeditor. And a lot of folks paste Microsoft Word content in it. No worries, if I just call the attribute untouched it loads pretty. But the catch is that I want it to be just 125 characters abbreviated. When I add truncation to it, then all of the Microsoft Word scripts start popping up. Then I added simple_format, and sanitize, and truncate, and even made my controller start spotting out specific variables that MS would make and gsub them out. But there's too many of them, and it seems like an awfully messy way to accomplish this. Thus so! Realizing that by itself, its clean. I thought, why not just slice it. However, the microsoft word text becomes blank but still holds its numbered position in the string. So I came up with this (probably awful) solution below.
It's in three steps.
When the text parses, it doesn't display any of the MSWord junk. But that text still holds a number position in a slice statement. So I want to use a regexp to find the first actual character.
Take that character and find out what its numbered position is in the total string.
Use a slice statement to cut it from.
def about_us_truncated
x = self.about_us.find.first(regExp representing first actual character)
x.charCount = y
self.about_us[y..125]
end
The only other idea i got, is a regex statement that allows it to explicitly slice only actual characters like so :
about_us([a-zA-Z][0..125]) , but that is definately not how it is written.
Here is some sample text of MS Word junk :
≪! [If Gte Mso 9]>≪Xml>≪Br /> ≪O:Office Document Settings>≪Br /> ≪O:Allow Png/>≪Br /> ≪/O:Off...
You haven't provided much information to go off of, but don't be too leery of trying to build this regex on your own before you seek help...
Take your sample text and paste it in Rubular in the test string area and start building your regex. It has a great quick reference at the bottom.
Stumbled across this
http://gist.github.com/139987
it looks like it requires the sanitize gem.
This is technically not a straight answer, but it seems like the best possible one you can find.
In order to prevent MS Word, you should be using CK Editor's built-in MS word sanitizer. This is because writing regex for it can be very complicated and you can very easily break tags in half and destroy your site with it.
What I did as a workaround, is I did a force paste as plain text in the CK Editor.

Latex - Apply an operation to every character in a string

I am using LaTeX and I have a problem concerning string manipulation.
I want to have an operation applied to every character of a string, specifically
I want to replace every character "x" with "\discretionary{}{}{}x". I want to do
this because I have a long string (DNA) which I want to be able to separate at
any point without hyphenation.
Thus I would like to have a command called "myDNA" that will do this for me instead of
inserting manually \discretionary{}{}{} after every character.
Is this possible? I have looked around the web and there wasnt much helpful
information on this topic (at least not any I could understand) and I hoped
that you could help.
--edit
To clarify:
What I want to see in the finished document is something like this:
the dna sequence is CTAAAGAAAACAGGACGATTAGATGAGCTTGAGAAAGCCATCACCACTCA
AATACTAAATGTGTTACCATACCAAGCACTTGCTCTGAAATTTGGGGACTGAGTACACCAAATACGATAG
ATCAGTGGGATACAACAGGCCTTTACAGCTTCTCTGAACAAACCAGGTCTCTTGATGGTCGTCTCCAGGT
ATCCCATCGAAAAGGATTGCCACATGTTATATATTGCCGATTATGGCGCTGGCCTGATCTTCACAGTCAT
CATGAACTCAAGGCAATTGAAAACTGCGAATATGCTTTTAATCTTAAAAAGGATGAAGTATGTGTAAACC
CTTACCACTATCAGAGAGTTGAGACACCAGTTTTGCCTCCAGTATTAGTGCCCCGACACACCGAGATCCT
AACAGAACTTCCGCCTCTGGATGACTATACTCACTCCATTCCAGAAAACACTAACTTCCCAGCAGGAATT
just plain linebreaks, without any hyphens. The DNA sequence will be one
long string without any spaces or anything but it can break at any point.
This is why my idea was to inesert a "\discretionary{}{}{}" after every
character, so that it can break at any point without inserting any hyphens.
This takes a string as an argument and calls \discretionary{}{}{} after each character. The input string stops at the first dollar sign, so you should not use that.
\def\hyphenateWholeString #1{\xHyphenate#1$\wholeString}
\def\xHyphenate#1#2\wholeString {\if#1$%
\else\say{#1}\discretionary{}{}{}%
\takeTheRest#2\ofTheString
\fi}
\def\takeTheRest#1\ofTheString\fi
{\fi \xHyphenate#1\wholeString}
\def\say#1{#1}
You’d call it like \hyphenateWholeString{CTAAAGAAAACAGGACG}.
Instead of \discretionary{}{}{} you can also try \hspace{0pt}, if you like that more (and are in a latex environment). In order to align the right margin, I think you’d need to do some more fine tuning (but see below). The effect is of course minimised by using a font of fixed width.
Revision:
\def\hyphenateWholeString #1{\xHyphenate#1$\wholeString\unskip}
\def\xHyphenate#1#2\wholeString {\if#1$%
\else\transform{#1}%
\takeTheRest#2\ofTheString\fi}
\def\takeTheRest#1\ofTheString\fi
{\fi \xHyphenate#1\wholeString}
\def\transform#1{#1\hskip 0pt plus 1pt}
Steve’s suggestion of using \hskip sounds like a very good idea to me, so I made a few corrections. Note that I’ve renamed the \say macro and made it more useful in that it now actually does the transformation. (However, if you remove the \hskip from \transform, you’ll also need to remove the \unskip in the main macro definition.
Edit:
There is also the seqsplit package which seems to be made for printing DNA data or long numbers. They also bring a few options for nicer output, so maybe that is what you’re looking for…
Debilski's post is definitely a solid way to do it, although the \say is not necessary. Here's a shorter way that makes use of some LaTeX internal shortcuts (\#gobble and \#ifnextchar):
\makeatletter
\def\hyphenatestring#1{\xHyphen#te#1$\unskip}
\def\xHyphen#te{\#ifnextchar${\#gobble}{\sw#p{\hskip 0pt plus 1pt\xHyphen#te}}}
\def\sw#p#1#2{#2#1}
\makeatother
Note the use of \hskip 0pt plus 1pt instead of \discretionary - when I tried your example I ended up with a ragged margin because there's no stretchability. The \hskip adds some stretchable glue in between each character (and the \unskip afterwards cancels the extra one we added). Also note the LaTeX style convention that "end user" macros are all lowercase, while internal macros have an # in them somewhere so that users don't accidentally call them.
If you want to figure out how this works, \#gobble just eats whatever's in front of it (in this case the $, since that branch is only run when a $ is the next char). The main point is that \sw#p is only given one argument in the "else" branch, so it swaps that argument with the next char (that isn't a $). We could just as well have written \def\hyphenate#next#1{#1\hskip...\xHyphen#te} and put that with no args in the "else" branch, but (in my opinion) \sw#p is more general (and I'm surprised it's not in standard LaTeX already).
There is a contrib package on CTAN that deals with typesetting DNA sequences. It does a little more than just line-breaking, for example, it also supports colouring. I'm not sure if it is possible to get the output you are after though, and I have no experience in the DNA-sequence-typesetting area, but is one long string the most readable representation?
Assuming your string is the same, in your preamble, use the \newcommand{}{}. Like this:
\newcommand{\myDNA}{blah blah blah}
if that doesn't satisfy your requirements, I suggest:
2. Break the strings down to the smallest portion, then use the \newcommand and then call the new commands in sequence: \myDNA1 \myDNA2.
If that still doesn't work, you might want to look at writing a perl script to satisfy your string replacement needs.

LaTeX sometimes puts too much or too little space after periods

LaTeX tries to guess whether a period ends a sentence, in which case it puts extra space after it.
Here are two examples where it guesses wrong:
I watched Superman III. Then I went home.
(Too little space after "Superman III.".)
After brushing teeth etc. I went to bed.
(Too much space after "etc.".)
Note that it doesn't matter how much whitespace you use in the LaTeX source since LaTeX ignores that.
I found the answer here: http://john.regehr.org/latex/. Excerpt:
When a non-sentence-ending period is to be followed by a space, the space must be an explicit blank.
So the second example should be:
After brushing teeth etc.\ I went to bed.
The converse of this problem happens when a capital letter precedes a sentence-ending period in the input, as in the first example.
In this case LaTeX assumes that the period terminates an abbreviation and follows it with inter-word space rather than inter-sentence space.
The fix is to put "\#" before the period.
So the first example should be
I watched Superman III\#. Then I went home.
A handy way to find this error is:
grep '[A-Z]\.' *.tex
You can sidestep the spacing issue if you prefer single spaces at the end of sentences: put \frenchspacing on (for older versions of Latex this was a fragile command). Knuth was following the traditional naming in calling it French spacing, although calling double spacing after sentences French spacing has become dominant in publishing.
Dirk Margulis wrote a nice post summarising some of the reasons for the prevalance of single spacing: Space between sentences.
I like the answer from dreeves and the handy search he suggests too. I don't have the Stackoverflow "rep" points to comment, but...
Since lines in raw *.tex tend to be very long, the output from grep can be overwhelming (i.e., entire paragraphs); I suggest a variation to only display the words ending in '[A-Z].' (followed by one or more space, followed by a new capitalized word). It is,
grep -o -E '[A-Z]+\. +[A-Z]+[A-Za-z]+' *.tex

Latex multicols. Can I group content so it won't split over cols and/or suggest colbreaks?

I'm trying to learn LaTeX. I've been googling this one for a couple days, but I don't speak enough LaTeX to be able to search for it effectively and what documentation I have found is either too simple or goes way over my head (http://www.uoregon.edu/~dspivak/files/multicol.pdf)
I have a document using the multicol package. (I'm actually using multicols* so that the first col fills before the second begins instead of trying to balance them, but I don't think that's relevant here.) The columns output nicely, but I want to be able to indicate that some content won't be broken up into different columns.
For instance,
aaaaaaaa bbbbbbb
aaaaaaaa bbbbbbb
aaaaaaaa
ccccccc
bbbbbbbb ccccccc
That poor attempt at ascii art columns is what's happening. I'd like to indicate that the b block is a whole unit that shouldn't be broken up into different columns. Since it doesn't fit under the a block, the entirety of the b block should be moved to the second column.
Should b be wrapped in something? Is there a block/float/section/box/minipage/paragraph structure I can use? Something specific to multicol? Alternatively is there a way that I can suggest a columnbreak? I'm thinking of something like \- that suggests a hyphenated line break if its convenient, but this would go between blocks.
Thanks!
Would putting the text inside a minipage not work for this?
\begin{minipage}{\columnwidth}
text etc
\end{minipage}
Forcing a column break is as easy as \columnbreak.
There are some gentler possibilities here.
If you decide to fight LaTeX algorithms to the bitter end, there is also this page on preventing page breaks. You can try the \samepage command, but as the page says, "it turns out to be surprisingly tricky".

Resources