How do I get auto-conversion of Part [[ double-brackets ]] on paste? - parsing

A pet peeve of mine is the use of double square brackets for Part rather than the single character \[LeftDoubleBracket] and \[RightDoubleBracket]. I would like to have these automatically replaced when pasting plain-text code (from StackOverflow for example) into a Mathematica Notebook. I have been unable to configure this.
Can it be done with ImportAutoReplacements or another automatic method (preferred), or will I need use a method like the "Paste Tabular Data Palette" referenced here?
Either way, I am not good with string parsing, and I want to learn the best way to handle bracket counting.
Sjoerd gave Defer and Simon gave Ctrl+Shift+N which both cause Mathematica to auto-format code. These are fine options.
I am still interested in a method that is automatic and/or preserves as much of the original code as possible. For example, maintaining prefix f#1, infix 1 ~f~ 2, and postfix 1 // f functions in their original forms.
A subsection of this question was reposted as Matching brackets in a string and received several good answers.

Not really an answer, but a thread on entering the double [[ ]] pair (with the cursor between both pairs) using a single keystroke occurred a couple of weeks ago on the mathgroup. It didn't help me, but for others this was a solution apparently.
EDIT
to make good on my slightly off-topic first response here's a pattern replacement that seems to do the job (although I have difficulties myself to understand why it should be b and not b_; the latter doesn't work):
Defer[f[g[h[[i[[j[2], k[[1, m[[1, n[2]]]]]]]]]]]] /.
HoldPattern[Part[b, a_]] -> HoldPattern[b\[LeftDoubleBracket]a\[RightDoubleBracket]]
I leave the automation part to you.
EDIT 2
I discovered that if you add the above rule to ImportAutoReplacements and paste your SO code in a notebook in a Defer[] and evaluate this, you end up with a usable form with double brackets which can be used as input somewhere else.
EDIT 3
As remarked by Mr.Wizard invisibly below in the comments, the replacement rule isn't necessary. Defer does it on its own! Scientific progress goes "Boink", to cite Bill Watterson.
EDIT 4
The jury is still out on Defer. It has some peculiar side effects, and doesn't work well on all expressions. try the "Paste Tabular Data Palette" in the toolbag question for instance. Pasting this block of code in Defer and executing gives me this:
It worked much better in another code snippet from the same thread:
The second part is how it looks after turning it in to input by editing the output of the first block (basically, I inserted a couple of returns to restore the format). This turns it into Input. Please notice that all double brackets turned into the correct corresponding symbol, but notice also the changing position of ReleaseHold.

Simon wrote in a comment, but declined to post as an answer, something fairly similar to what I requested, though it is not automatic on paste, and is not in isolation from other formatting.
(One can) select the text and press Ctrl+Shift+N to translate to StandardForm

Related

LaTeX - Define a list of words to use a certain font

Problem
I'm writing an essay/documentation about an application. Within this doc there are a lot of code-words which I want to be highlighted using a different font. Currently I work with:
{\fontfamily{cmtt}\selectfont SOME-KEY-WORD}
Which is a bit of work to use every time.
Question
I'm looking for a way to declare a list of words to use a specific font within the text.
I know that I can use the listings package and define morekeywords which will be highlighted within the listings-environment but I need it in the text.
I thought of something like this:
\defineList{\fontfamily{cmtt}}{
SOME-KEY-WORD-1,
SOME-Key-word-2,
...
}
EDIT
I forgot to mention that I already tried something like:
\def\somekeyword{\fontfamily{cmtt}\selectfont some\_key\_word\normalfont}
which is a little bit better then the first attempt but I still need to use \somekeyword in the text.
EDIT 2
I came upon a workaround:
\newcommand{\cmtt}[1]{{\fontfamily{cmtt}\selectfont #1\normalfont}}
It's a little better then EDIT but still not the perfect solution.
Substitution every time a word occurs, without providing any clues to TeX, might be difficult and is beyond my skills (though I'd be interested to see someone come up with a solution).
But why not simply create a macro for each of those words?
\newcommand\somekeyword{\fontfamily{cmtt}\selectfont SOME-KEY-WORD}
Use like this:
Hello, \somekeyword{} is the magic word!
The trailing {} are unfortunately necessary to prevent eating the subsequent whitespace; even the built-in \LaTeX command requires them.
If you have very many of these words and are worried about maintainability, you can even create a macro to create the macros:
\newcommand\declareword[2][]{%
\expandafter\newcommand%
\csname\if\relax#1\relax#2\else#1\fi\endcsname%
{{\fontfamily{cmtt}\selectfont #2}}%
}
\declareword{oneword} % defines \oneword
\declareword{otherword} % defines \otherword
\declareword[urlspy]{urls.py} % defines \urlspy
...
The optional argument indicates the name of the command, in case the word itself contains characters like . which cannot be used in the name of a command.

How to properly do custom markdown markup

I currently work on a personal writing project which has ended up with me maintaining a few different versions due to the differences of the relevant platforms and output formats I want to support that are not trivially solved. After several instances of me glancing at pandoc and the sheer forest that it represents, I have concluded mere templates don't do what I need, and worse, that I seem to need a combination of a custom filter and writer... suffice to say: messing with the AST is where I feel way out of my depth. Enough so that, rather than asking specific questions of 'how do I do X' here, this is a question of 'is X the right way to go about it, or what is the proper way to do it, and can you give an example of how it ties together?'... so if this question is rather lengthy: my apologies.
My current goal is to have custom markup like the following which is supposed to 'track' which character says something:
<paul|"Hi there">
If I convert to HTML, I'd want something similar to:
<span class="speech paul">"Hi there"</span>
to pop out (and perhaps the <p> tags), whereas if it is just pure markdown / plain text, I'd want it to silently disappear:
"Hi there"
Looking at the JSON AST structures I've studied, it would make sense that I'd want a new structure type similar to the 'Emph' tag called 'Speech' which allows whole blobs of text to be put inside of it with a bit of extra information attached (the person speaking). So something like this:
{"t":"Speech","speaker":"paul","c":[ ... ] }
Problem #1: At the point a lua-filter sees the document, it is obviously already distilled to an AST. This means replacing the items in a manner similar to what most macro expander samples do cannot really work since it would require reading forward. With this method, I just replace bits and pieces in place (<NAME| becomes a StartSpeech and the first solitary > that follows becomes an EndSpeech, but that would make malformed input a bigger potential problem because of silent-ish failures. Additionally, these tags would be completely out of sorts with how an AST is supposed to look.
To complicate matters even further, some of my characters end up learning a secondary language throughout the story, for which I apply a different format that contains a simplified understanding of the spoken text with perspective-characters understanding of what was said. Example:
<paul|"Heb je goed geslapen?"|"Did you ?????">
I could probably add a third 'UnderstoodSpeech' group to my filter, but (problem #2) at this point, the relationship between the speaker, the original speech, and the understood translation is completely gone. As long as the final documents need these values in these respective orders and only in these orders, it is fine... but what if I want my HTML version to look like
"Did you?????"
with a tool-tip / hover-over effect containing the original speech? That would be near impossible to achieve because the AST does not contain that kind of relational detail.
Whatever kind of AST I create in the filter is what I need to understand in my custom writer. Ideally, I want to re-use as much stock functionality of pandoc as possible for the writer, but I don't even know if that is feasible at this point.
So now my question: could someone with great pandoc understanding please give me an example on how to keep relevant data-bits together and apply them in the correct manner? By this I mean show a basic example of what needs to be put in the lua-filter and lua-writer scripts in the following toolchain
[CUSTOMIZED MARKDOWN INPUT] -> lua-filter -> lua-writer -> [CUSTOMIZED HTML5 OUTPUT]

Is there a solution for transpiling Lua labels to ECMAScript3?

I'm re-building a Lua to ES3 transpiler (a tool for converting Lua to cross-browser JavaScript). Before I start to spend my ideas on this transpiler, I want to ask if it's possible to convert Lua labels to ECMAScript 3. For example:
goto label;
:: label ::
print "skipped";
My first idea was to separate each body of statements in parts, e.g, when there's a label, its next statements must be stored as a entire next part:
some body
label (& statements)
other label (& statements)
and so on. Every statement that has a body (or the program chunk) gets a list of parts like this. Each part of a label should have its name stored in somewhere (e.g, in its own part object, inside a property).
Each part would be a function or would store a function on itself to be executed sequentially in relation to the others.
A goto statement would lookup its specific label to run its statement and invoke a ES return statement to stop the current statements execution.
The limitations of separating the body statements in this way is to access the variables and functions defined in different parts... So, is there a idea or answer for this? Is it impossible to have stable labels if converting them to ECMAScript?
I can't quite follow your idea, but it seems someone already solved the problem: JavaScript allows labelled continues, which, combined with dummy while loops, permit emulating goto within a function. (And unless I forgot something, that should be all you need for Lua.)
Compare pages 72-74 of the ECMAScript spec ed. #3 of 2000-03-24 to see that it should work in ES3, or just look at e.g. this answer to a question about goto in JS. As usual on the 'net, the URLs referenced there are dead but you can get summerofgoto.com [archived] at the awesome Internet Archive. (Outgoing GitHub link is also dead, but the scripts are also archived: parseScripts.js, goto.min.js or goto.js.)
I hope that's enough to get things running, good luck!

Sanitize pasted text from MS-Word

Here's my wild and whacky psuedo-code. Anyone know how to make this real?
Background:
This dynamic content comes from a ckeditor. And a lot of folks paste Microsoft Word content in it. No worries, if I just call the attribute untouched it loads pretty. But the catch is that I want it to be just 125 characters abbreviated. When I add truncation to it, then all of the Microsoft Word scripts start popping up. Then I added simple_format, and sanitize, and truncate, and even made my controller start spotting out specific variables that MS would make and gsub them out. But there's too many of them, and it seems like an awfully messy way to accomplish this. Thus so! Realizing that by itself, its clean. I thought, why not just slice it. However, the microsoft word text becomes blank but still holds its numbered position in the string. So I came up with this (probably awful) solution below.
It's in three steps.
When the text parses, it doesn't display any of the MSWord junk. But that text still holds a number position in a slice statement. So I want to use a regexp to find the first actual character.
Take that character and find out what its numbered position is in the total string.
Use a slice statement to cut it from.
def about_us_truncated
x = self.about_us.find.first(regExp representing first actual character)
x.charCount = y
self.about_us[y..125]
end
The only other idea i got, is a regex statement that allows it to explicitly slice only actual characters like so :
about_us([a-zA-Z][0..125]) , but that is definately not how it is written.
Here is some sample text of MS Word junk :
&Lt;! [If Gte Mso 9]>&Lt;Xml>&Lt;Br /> &Lt;O:Office Document Settings>&Lt;Br /> &Lt;O:Allow Png/>&Lt;Br /> &Lt;/O:Off...
You haven't provided much information to go off of, but don't be too leery of trying to build this regex on your own before you seek help...
Take your sample text and paste it in Rubular in the test string area and start building your regex. It has a great quick reference at the bottom.
Stumbled across this
http://gist.github.com/139987
it looks like it requires the sanitize gem.
This is technically not a straight answer, but it seems like the best possible one you can find.
In order to prevent MS Word, you should be using CK Editor's built-in MS word sanitizer. This is because writing regex for it can be very complicated and you can very easily break tags in half and destroy your site with it.
What I did as a workaround, is I did a force paste as plain text in the CK Editor.

When you write TeX source, how do you use your editor's word wrap?

Do you use "hard wrapping" (either yourself or automatically by your editor) by inserting newlines into your source document at a certain line length, or do you write your paragraphs in one continual line and let your editor "soft-wrap" for you?
Also, what editor do you use for this?
Note: I'm interested in how you wrap lines in your TeX source code (.tex file, general prose), not how TeX wraps lines for the final document.
I recently switched to hard-wrapping per sentence (i.e., newline after sentence end only; one-to-one mapping between lines and sentences) for two reasons:
softwrap for a whole paragraph makes typos impossible to spot in version control diffs.
hardwrapped paragraphs look nice until you start to edit them, and if you re-flow a hard wrapped paragraph you end up with a whole bunch of lines changed in the diff for a possibly one word change.
Only wrapping per sentence fixes these two problems:
Small changes are comparatively easy to spot in a diff.
No re-flowing of text, only changes to, insertions of, or removal of single lines.
Looks a bit weird when you first look at it, but is the only compromise I've seen that addresses the two problems of soft and hard wrapping.
Of course, if you're working collaboratively, the answer is to use whatever the other people are using.
I use Emacs (with AUCTeX). After editing or writing a paragraph, I hit M-q to hard-wrap it. It also handles indenting items, and it also formats commented paragraphs. I don't like soft wraps, because they are visually indistinguishable from real newline characters, but behave differently.
I generally let my LaTeX editor softwrap the lines. I think part of it is due to the fact that I had some bad experiences with significant whitespace when I was first learning LaTeX, and part of it is because I don't like heavily-jagged right-margins when I'm editing the text file.
Depending on what os you use, i recommend winedt (windows) and kile (linux). Both of these soft wrap, and there is no need for hard wraps. (That is, i leave my paragraphs as long lines in the source) Latex sorts out line breaks in the output and when i read the source, i use my editor.
The only possible reason to use hard line breaks is to make it easier to find errors in the code (which the compiler indicates by line number) but they are generally not hard to find, if it's mainly text, errors are rare anyway.
Typically I have my editor insert newlines. That is, I try not to hit the "enter" key for a new line, but when the editor soft-wraps, it actually inserts a newline character.
I use vim to accomplish this, and I don't know if other editors have this feature or how they work. In specific, i use the wrapmargin feature.
I typically try to keep my lines of code (TeX or otherwise) at n-characters long for clarity and consistency. I tend to go with 80 characters, but that is up to you.
More vim-related line-breaking docs:
http://www.vim.org/htmldoc/usr_25.html
http://www.vim.org/htmldoc/options.html#%27textwidth%27
I tend to do hard-wrapping with TeX, but that's rooted more in my obsession with text formatting than any real gain of efficiency. One major thing that I don't like about soft-wrapping is that it tends (in my opinion, obviously) to make things harder to read by wrapping in semantically-random places.
Although I would prefer to use soft wrapping I end up using hard wrapping for one practical reason: all of my collaborators do the same. So, when I work on an article with someone it would be a big pain for me to soft wrap while the other person hard wraps. The second reason is that Emacs was until recently able to handle properly on hard wrapping. Emacs 23 which I currently use changes this but it will be a long time before everybody upgrades to 23 so I can sneak soft wrapped texts to them.
The way I actually use hard wrapping is to have auto-fill-mode turned on. Furthermore M-q is bound to LaTeX-fill-paragraph (in the AucTeX mode - but I don't remember if this is a standard binding or one of my bindings - I'm pretty sure it's the latter). Combining these two I manage to keep my TeX source more or less decently formatted.
By the way, I have heard the suggestion to always start a new sentence at the beginning of a line. In other words a period at the end of a sentence should be followed by a hard return. The benefit is that it works well with version control systems since changes to a sentence can remain localized. I think that this is in principle a nice idea but I have not managed to use it because of my obsessive-compulsive usage of M-q.
I use Kile under Linux with hard wrapping (called static word wrap in Kile) because apparently in my work environment everybody do like that. Soft wrapping makes much more sense to me, so if I could choose I would use that rather than hard wrapping.
I work in joe mostly. I from time to time press enter automatically, and if it doesn't look good I press auto-format (ctrl-k j).
Joe has autowrap modes, but I don't even bother.
I use Auctex with automatic line breaking switched off, and insert line breaks by hand. I avoid auto-formatting, since I want as few changes to where line breaks occur between edits to the document, which makes diffs less cluttered.
Using a smarter diff, one that doesn't care about tex-irrelevant whitespace, would be better, but that's the tool I use.
I like Will's suggestion of hard wrapping per sentence. I thought about it before, but I am fixed in my habits.

Resources