Better writing, editing, and thinking through the power of line breaks
2015/05/13
With *LaTeX and markup languages in general, the (partial) separation of content from format allows us to write in ways that are
 Conceptually powerful; and
 Excellently structured for editing (which is when the best writing most often happens).
In this post, I'll provide some of what I find to be good practices for writing *LaTeX documents.
Writing is coding:
A framing first: written text is code—an algorithm intended to elicit specific thinking (which I'll say includes emotion, a fundamental kind of thinking) or a variety of possible thoughts in a reader's mind. With computer programs, we have the clean distinction of human readable/editable code and resultant compiled binaries. For written text, and perhaps surprisingly given we're now talking about people communicating with each other and not machines, being human editable and being human readable are also not exactly the same thing.
Linebreakfulness:
Let's start with a most excellent implement: \n, the line break.
A simple, vital element of writing
enabled by *LaTeX is that we can start each
new sentence or phrase
on a separate line.
(This paragraph provides an example,
even if we're in text-wrapping htmlspace.)
The benefits are that it's then:
 easy to move sentences and phrases around (Emacs and other powerful text editors make this a pleasure); and
 reflective of the actual structure of the thought process that goes into writing. Phrases are a natural base unit, so breaking at commas and semicolons makes sense, and long phrases should have carriage returns applied liberally. When line breaks are used well, sentences and phrases are clearly rendered as the core material of a text.
Note: Emacs and presumably other editors can be extended to make line breaks occur automatically, and to repack paragraphs with line breaks after each phrase-ending element.
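As a minimal sketch of why this layout costs nothing in the output: TeX treats a single line break in the source as an inter-word space, while a blank line ends the paragraph. The document below is illustrative only; the second paragraph is placeholder text.

```latex
\documentclass{article}
\begin{document}
% One sentence or phrase per source line:
% a single line break is treated as a space.
A simple, vital element of writing
enabled by LaTeX is that we can start each
new sentence or phrase
on a separate line.

% A blank line, by contrast, ends the paragraph.
This sentence starts a second paragraph.
\end{document}
```

The first four text lines are typeset as one ordinary paragraph, so the line breaks serve only the writer.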
My PhD advisor Dan Rothman pointed out the blocking by sentences idea and, over time, I've found many kinds of *LaTeX structures can be laid out in ways I find better for writing, rewriting, and, inextricably, thinking.
I'll go through an example for equations and then add a few examples of other environments and elements.
Equations:
Here's an initial form of an equation for the Jensen-Shannon divergence from one of our papers on Google Books:
\begin{equation}D_{JS,i}(P||Q) = -m_i\log m_i + \frac12\left(p_i\log p_i+q_i\log q_i\right).\end{equation}
Here's the output which is in decent shape: \begin{equation}D_{JS,i}(P||Q) = -m_i\log m_i + \frac12\left(p_i\log p_i+q_i\log q_i\right).\end{equation}
The LaTeX code is compact and does the job, but it is difficult to read and edit. Let's help ourselves (the machines will be fine) and step through some improvements.
First, we need to separate the environment, indent the equation, and add a label for potential referencing:
\begin{equation}
  D_{JS,i}(P||Q) = -m_i\log m_i + \frac{1}{2}\left(p_i\log p_i+q_i\log q_i\right).
  \label{JSequation}
\end{equation}
I like to add the label at the end of environments
that use them (figures, tables, etc.).
I've also added curly braces to the
\frac
command;
\frac{1}{2}
is clearer and allows for more complicated
arguments.
As for sentences, we can deploy line breaks to leave
the equation both easier to read and edit.
Here's a simple start:
\begin{equation}
  D_{JS,i}(P||Q) =
  - m_i \log m_i +
  \frac{1}{2}\left(p_i \log p_i + q_i \log q_i \right).
  \label{eq:googlebooks.JSequation}
\end{equation}
The main pieces of the equation (blob = blob + blob) now have
their own lines.
But we can do more and break the equation
across lines into its smallest functional units.
We'll do these things:
 Give equalities and operations their own lines.
 As for the equation environment, place enclosing bracket structures on separate lines, and allow the editor to indent things nicely.
\begin{equation}
  D_{\textrm{JS},i}
  (P\,||\,Q)
  =
  - m_i \log_{2} m_i
  +
  \frac{1}{2}
  \left(
    p_i \log_{2} p_i
    +
    q_i \log_{2} q_i
  \right).
  \label{eq:googlebooks.JSequation}
\end{equation}
The output has changed in just a few small ways: \begin{equation} D_{\textrm{JS},i} (P\,||\,Q) = - m_i \log_{2} m_i + \frac{1}{2} \left( p_i \log_{2} p_i + q_i \log_{2} q_i \right). \end{equation} Both reading and editing are now simple. A few notes:
 As for sentences, we can easily move functional units around by cutting lines or sets of lines. If we wanted to swap the order of $p_i \log_{2} p_i$ and $q_i \log_{2} q_i$, we would just cut and paste lines (some C-k, C-y action).
 I've kept the form $p_i \log_{2} p_i$ together as this is a conceptually clear element for entropy.
 $D_{\textrm{JS},i}$ and $(P\,||\,Q)$ are on separate lines to make future editing easier, and we've given the $P$ and $Q$ some breathing room with the small space "\,".
 We've also converted $JS$ to $\textrm{JS}$ so that this subscript is set in normal text rather than math text.
 Last: we've made the $\log$ into $\log_{2}$ to be clear. Again, even when a single term is a subscript or an argument it's best to use curly braces for clarity and future editing.

If we use mildly complex expressions even more than
a few times, it's a good idea to turn them into a command.
We may find we have a general structure that could take
in arguments as well.
So for example we could replace
D_{\textrm{JS},i}(P\,||\,Q)
with the command
\DJS{P}{Q}
by placing
\newcommand{\DJS}[2]{D_{\textrm{JS},i}(#1\,||\,#2)}
in the preamble (I like to have a separate settings file; more on this elsewhere).

In fact, for any repeated structure, no matter how simple, it's useful
to create a command to simplify wholesale document changes.
For example, if you've been using $d$ for density and decide that
$\rho$ would be a loftier symbol, then search and replace will bring sadness.
Much better to have started off with:
\newcommand{\density}{d}
and then be able to move to
\newcommand{\density}{\rho}
with one simple change.
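Here is a minimal, compilable sketch of both ideas together, assuming the `amsmath` package is available for text inside math. The sentence in the body is placeholder text; the command names follow the ones used above.

```latex
\documentclass{article}
\usepackage{amsmath} % better handling of text inside math

% Preamble: semantic commands.
% \DJS takes two arguments, the two distributions being compared.
\newcommand{\DJS}[2]{D_{\textrm{JS},i}(#1\,||\,#2)}
% To switch the density symbol, change d to \rho on this line only.
\newcommand{\density}{d}

\begin{document}
The divergence contribution is
$
\DJS{P}{Q}
$
and the density is
$
\density
$.
\end{document}
```

Note the argument count `[2]` comes after the command name in `\newcommand`.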
Along the way, I created a richer reference description for the label.
As a rule, I use this format:
\label{eq:papertag.tag}
\label{fig:papertag.tag}
\label{tab:papertag.tag}
\label{sec:papertag.tag}
\label{subsec:papertag.tag}
where papertag gives a semantically reasonable pointer to the paper. Having this extra level of identification is useful in various ways including (1) being able to search for a certain kind of reference (e.g., just figures), and (2) when combining documents to form, for example, an edited volume or thesis.
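As a small sketch of how these labels read in use, the hypothetical fragment below defines a section and an equation and then refers to both. The tag `papertag`, the section title, and the equation are placeholders only.

```latex
\documentclass{article}
\begin{document}

\section{Results}
\label{sec:papertag.results}

Equation~\ref{eq:papertag.example}
in Sec.~\ref{sec:papertag.results}
is a placeholder.
\begin{equation}
  a = b + c
  \label{eq:papertag.example}
\end{equation}

\end{document}
```

Searching the source for `\ref{eq:` or `\ref{fig:` then finds every reference of one kind, and the `papertag` prefix keeps labels from colliding when documents are merged.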
All right. Here's a selection of example formats, including a few more equations:
More Equations:
From our charming paper on Limited Imitation Contagion:
In Fig.~\ref{fig:updownrfn_network02}A,
we show an example of a probabilistic response function,
the tent map, which is defined as
$
T_r(x)
=
rx
$
for
$
0 \le x \le \frac{1}{2}
$
and
$
r(1-x)
$
for
$
\frac{1}{2} \le x \le 1.
$
While breakable, the ranges for $x$ make for reasonable phrases
so they both stay intact on a single line.
Here's the output:
In Fig. 1A, we show an example of a probabilistic response function, the tent map, which is defined as $ T_r(x) = rx $ for $ 0 \le x \le \frac{1}{2} $ and $ r(1-x) $ for $ \frac{1}{2} \le x \le 1. $
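The same line-break layout carries over if the tent map is set as a displayed piecewise definition. The sketch below is an alternative presentation, not the form used in the paper; it assumes the `amsmath` package for the `cases` environment, and the label name is hypothetical.

```latex
\documentclass{article}
\usepackage{amsmath} % provides the cases environment
\begin{document}
\begin{equation}
  T_r(x)
  =
  \begin{cases}
    rx
    & \text{for } 0 \le x \le \frac{1}{2},
    \\
    r(1-x)
    & \text{for } \frac{1}{2} \le x \le 1.
  \end{cases}
  \label{eq:papertag.tentmap}
\end{equation}
\end{document}
```

Each branch and each range sits on its own line, and both branches give $r/2$ at $x = \frac{1}{2}$, so the two pieces agree where they meet.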
From my course Beamerized Principles of Complex Systems, part of a calculation for Herbert Simon's Rich-gets-Richer model:
Preamble (included in a separate settings file):
\newcommand{\avg}[1]{\left\langle#1\right\rangle}
\newcommand{\simonalpha}{\rho}
Calculation:
$$
\avg{N_{k,t+1} - N_{k,t}}
=
(1-\simonalpha)
\left(
(k-1)\frac{N_{k-1,t}}{t}
-
k\frac{N_{k,t}}{t}
\right)
$$
becomes
$$
n_k(t+1) - n_k t
=
(1-\simonalpha)
\left(
(k-1)\frac{n_{k-1}t}{t}
-
k\frac{n_{k}t}{t}
\right)
$$
Output: $$ \newcommand{\avg}[1]{\left\langle#1\right\rangle} \newcommand{\simonalpha}{\rho} \avg{N_{k,t+1} - N_{k,t}} = (1-\simonalpha) \left( (k-1)\frac{N_{k-1,t}}{t} - k\frac{N_{k,t}}{t} \right) $$ becomes $$ n_k(t+1) - n_k t = (1-\simonalpha) \left( (k-1)\frac{n_{k-1}t}{t} - k\frac{n_{k}t}{t} \right) $$
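Carrying the second display one step further shows how the one-unit-per-line layout keeps the algebra easy to follow. This is a sketch of the follow-through algebra only, and may not match how the course slides present it.

```latex
\documentclass{article}
\newcommand{\simonalpha}{\rho}
\begin{document}
Since
$
n_k (t+1) - n_k t
=
n_k
$
and the factors of $t$ cancel on the right,
\begin{equation}
  n_k
  =
  (1-\simonalpha)
  \left(
    (k-1) n_{k-1}
    -
    k n_{k}
  \right),
\end{equation}
which rearranges to
\begin{equation}
  \frac{n_k}{n_{k-1}}
  =
  \frac{
    (k-1)(1-\simonalpha)
  }{
    1 + (1-\simonalpha) k
  }.
\end{equation}
\end{document}
```

The rearrangement collects the two $n_k$ terms on the left, giving $n_k \left( 1 + (1-\simonalpha) k \right) = (1-\simonalpha)(k-1) n_{k-1}$ before dividing through.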
Figures and Tables:
Here's a draft example figure environment, one spanning two columns
in our
PNAS paper on the positivity of human language (Fig. 3). Fairly simple: centre the figure and then give the
caption plenty of linebreakage.
The long figure name and labels are no problem to handle and mitigate the possibility
of overlap later on (note the paper tag mlhap).
Giving figures long names (lumping tags together) makes finding
them later on (if and when one's memory fails) much simpler (using, for example, locate).
Table environments can be laid out in the same way, with
some attention paid to tabular environments.
Some good practices for structuring work directories will appear elsewhere.
\begin{figure*}
\centering
\includegraphics[width=\textwidth]{fighappinessdist_jellyfish_words_havg_multilanguage_example001_noname.pdf}
\caption{
Examples of how word happiness varies little
with usage frequency.
Above each plot is a histogram of average happiness $h_{\textrm{avg}}$
for the 5000 most frequently used words in the given corpus, matching
Fig.~\ref{fig:mlhap.happinessdist_comparison}.
Each point locates a word by its rank $r$ and average happiness
$h_{\textrm{avg}}$,
and we show some regularly spaced example words.
The descending gray curves of these jellyfish plots
indicate deciles for windows of 500 words of
contiguous usage rank,
showing that the overall histogram's form is
roughly maintained at all scales.
The `kkkkkk...' words represent laughter in Brazilian Portuguese,
in the manner of `hahaha...'.
See
Fig.~\ref{fig:mlhap.jellyfish_translated}
for an English translation, Figs.~\ref{fig:mlhap.happinessdist_jellyfish_words_havg_multilanguage001_table1}--\ref{fig:mlhap.happinessdist_jellyfish_words_havg_multilanguage001_table4}
for all corpora,
and Figs.~\ref{fig:mlhap.happinessdist_jellyfish_words_hstd_multilanguage001_table1}--\ref{fig:mlhap.happinessdist_jellyfish_words_hstd_multilanguage001_table4}
for the equivalent plots for standard deviation of word happiness
scores.
}
\label{fig:mlhap.jellyfish}
\end{figure*}
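For the table layout mentioned above, a hypothetical sketch in the same spirit might look like the following: one row per line, the caption broken into phrases, and the label at the end. The contents, caption, and label name are placeholders and are not taken from the paper.

```latex
\documentclass{article}
\usepackage{amsmath} % better handling of text inside math
\begin{document}
\begin{table}
  \centering
  \begin{tabular}{lr}
    Quantity & Symbol \\
    \hline
    Rank & $r$ \\
    Average happiness & $h_{\textrm{avg}}$ \\
  \end{tabular}
  \caption{
    Placeholder caption,
    broken into phrases,
    one per line.
  }
  \label{tab:papertag.example}
\end{table}
\end{document}
```

Keeping each row on its own line means rows can be reordered by cutting and pasting whole lines, just as with sentences.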
Okay, that's enough:
Nutshell: line breaks are unexpectedly good friends.
Using them well with sophisticated markup languages will enable faster and (hopefully) better writing and editing.