Saturday, May 06, 2006

Anti-Literacy Program

Literate programming is a paradigm for writing and
documenting programs in which the documentation is primary. The
"source" for the program is an essay about how it works, in which all
the code is embedded. The code segments are placed in whatever order
makes sense for the exposition. A program called the tangler
then extracts those segments and puts them in the proper order; you
can think of this as the first pass of the compiler. (Why Donald Knuth,
the inventor of literate programming, did not choose the term
"untangler" has always been a mystery to me, but
perhaps he was deliberately trying to point out that the order
compilers want is not necessarily the best. As a Lisp programmer,
I've never felt that constrained by the compiler, so the point is lost
on me.)
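
To make the tangler's job concrete, here is a minimal sketch in Python. The chunk notation is borrowed from noweb (`<<name>>=` opens a chunk, a lone `@` closes it, and `<<name>>` on a line of its own refers to another chunk) purely for illustration; it is not LitLisp's notation. The names `parse_chunks`, `tangle`, and `ESSAY` are hypothetical, and the sketch assumes every referenced chunk is defined and that references are not circular.

```python
import re

CHUNK_DEF = re.compile(r"^<<(.+)>>=\s*$")      # line that opens a named chunk
CHUNK_REF = re.compile(r"^(\s*)<<(.+)>>\s*$")  # line that refers to a chunk


def parse_chunks(essay):
    """Collect named code chunks in whatever order the essay presents them."""
    chunks = {}
    current = None  # name of the chunk being read; None while reading prose
    for line in essay.splitlines():
        opened = CHUNK_DEF.match(line)
        if opened:
            current = opened.group(1)
            chunks.setdefault(current, [])
        elif line.strip() == "@":
            current = None
        elif current is not None:
            chunks[current].append(line)
    return chunks


def tangle(chunks, root="*", indent=""):
    """Expand the root chunk recursively into the order the compiler wants.

    Assumes no circular references; a missing chunk raises KeyError.
    """
    out = []
    for line in chunks[root]:
        ref = CHUNK_REF.match(line)
        if ref:
            out.extend(tangle(chunks, ref.group(2), indent + ref.group(1)))
        else:
            out.append(indent + line)
    return out


ESSAY = """\
We start with what the program does.
<<main>>=
print(greet("world"))
@
The helper is a detail, so the essay defers it.
<<greet>>=
def greet(name):
    return "Hello, " + name
@
The order the compiler wants comes last.
<<*>>=
<<greet>>
<<main>>
@
"""

program = "\n".join(tangle(parse_chunks(ESSAY)))
```

Here the essay presents `main` before `greet`, while the tangled `program` puts the definition of `greet` first. All the prose between chunks is simply dropped.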



Meanwhile, the essay can be written as ordinary text, HTML, or LaTeX.
Writing it in a WYSIWYG system such as Word would be more of a
challenge for the tangler, but that's not what I want to talk about
here.



I have flirted with literate programming for the last couple of years,
and wrote a Lisp-based system ("LitLisp") to support it. I don't
recommend my system, really, but the manual is available in case
anyone is interested in looking at it.



My conclusion after writing (or at least documenting) several programs
using literate programming is that it just doesn't work. Here are my
reasons:

  1. During program development, I tend to build a partial solution to a
    problem, then realize it's wrong and discard it or turn it inside
    out. It's very hard to force yourself to write a bunch of prose
    during this process; not only is the writing mostly wasted, it
    slows down your thought processes.

  2. It may or may not be unnatural for the only copy of a
    program to be a paper about that program. It is certainly
    impossible for it to be the only paper about it. For this
    reason, LitLisp allows two paradigms: the main representation is
    the essay, from which the program is extracted; or the main
    representation is the program, from which the essay may quote. If
    done right, the reader can't tell which paradigm was used. But
    this seemingly attractive idea requires the code in the paper to be
    marked up with all possible fragments one later quotes in some
    paper or other. (Remember, paper #1 is the only stable
    representation of the code to be quoted in paper #2 or #14.)

  3. A paper about a complex algorithm often presents several versions
    of the algorithm, starting with a sketch, and gradually adding
    complexity, until something like the final version is reached. How
    do we express the actual algorithm this way? In some cases the
    preliminary version of a program can be presented as the final
    version with some segments postponed to a later section. But there
    are many cases where some of the program fragments are "fake," in
    the sense that they appear in the paper as if they were part of the
    final program, but in fact they aren't. The literate-programming
    system must incorporate devices for indicating "versioning"
    information (e.g., that this segment is version 3 of a function,
    with pointers to versions 2 and 4). I gave up on LitLisp before
    trying to implement this.

  4. Perhaps this objection is due to my being mired in obsolete habits,
    but here it is anyway: I find it hard to think of the original
    essay as the "real" program. It's easier to look at the
    unscrambled version, which appears in the order God meant programs
    to appear in. Making a change to the program then requires going
    back to the essay and finding the place in the essay where (e.g.)
    you put the damned special-variable declarations. If one yields to
    the temptation to make a little change in the program and fix it in
    the essay later, one is on the road to perdition. I once forgot
    for several weeks that I was committing this sin, and had to
    painstakingly merge lots of little program changes back into my
    essay.

  5. There is a tension between two purposes of one's essay: to
    enlighten humans or to document the program. If you try for both,
    then you run into the temptation to hide big boring chunks of the
    code. LitLisp provides facilities to do that, but using them means
    departing from the basic literate-programming idea. The hidden
    parts of the program can be seen in the source for the essay, but
    not the essay itself. That means you have to work with two
    documents, one of which (the source) looks suspiciously like the
    unadorned code one was trying to avoid.

  6. One word: CVS! Has anyone ever even attempted to use literate
    programming in concert with a group of collaborators? I have trouble
    imagining this scenario, but I would like to hear from anyone who
    tried it or actually made it work.

8 comments:

Anonymous said...

Axiom is a large, general-purpose
computer algebra system written in
Common Lisp. It was converted to
use literate programming years ago
in an effort to merge the research
work on an algorithm with the code
that implements the algorithm.

We currently use noweb but I'm
reaching the stage of writing our
own version of noweb in lisp.

Would you care to share your effort
and save me some work?

Tim Daly
daly@axiom-developer.org
Axiom Lead Developer

GratefulFrog said...

I've also done some literate programming in Common Lisp, and built some add-ons for noweb to make it better understand Lisp definitions (available at http://gratefulfrog.net).

I sympathize with the author of this blog, but disagree. I believe that the "cost" of a literate program is less than or at most equal to the cost of writing a properly documented system. When writing code in, say, Lisp, my average ratio of comment lines to code lines is 5 to 1. Most of these comments are detailed explanations around the code, with nearly none embedded in the functions. The use of noweb, for example, makes it easy to write excellent documentation, whereas commenting in code files is far more difficult and produces inferior results.

But the underlying issue, in my opinion, is the expected lifecycle of your system. I say system because I'm not talking about a few lines of script. A system can live for a long time. It will require updating, by the original authors and by others. My experience in software project management has taught me the cost of maintaining undocumented stuff, even (especially) my own stuff! Today, I feel that it's just not worth writing any code, even a short script or a simple makefile, without documenting it in a fully professional manner. Literate programming is, in my opinion, just that.

I use the tool chain: noweb, emacs, LaTeX. It was very easy to add onto Norman Ramsey's noweb so that it is better adapted to Lisp, i.e. autodefs and finduses set up for Lisp syntax. Check my home page for downloads.

I see no need to re-write noweb in Lisp. What is your goal there?

Anonymous said...

I've considered myself a literate lisp programmer for some time
(since shortly after the original article on literate programming),
but I almost immediately gave up on using any software to do
tangling or formatting. I consider the benefits of those things
to be small. The source is read by the (unmodified) lisp reader.

The real difference between my previous and current programs is
large extended comments that describe what I'd call design:
what I'm trying to do, why, alternative approaches I explored and
their relative merits. I now view this as just proper documentation.
It's exactly what will be useful to someone who later wants to
understand or modify the program.

I would like to address your argument about building a partial
solution and then realizing it's wrong. I try to write the
design documentation first and the code later. Often, while
doing the former, I find bugs in the design and have to change it.
That's not a bad thing. I view writing the documentation as part of
thinking about the design. Admittedly there's some cost to writing
it down, but I think there's also a benefit, even in the design
process itself, especially if the design itself is large.
Perhaps what you're saying is that you write exploratory code as
part of a conscious design process. In that case I would not
complain if you failed to document the design of the exploratory
code. But I'd hope that you would write down what you learned
from the exploration.

Admittedly, the process of maintenance is somewhat different.
The usual case is a small incremental change. If it's mostly
independent of the design then I think you're justified in simply
adding a small amount of incremental documentation, typically
describing the original code and what was wrong with it. If the
change represents a change to a small part of the design, it may
be reasonable to add a similar increment to the documentation,
describing what was wrong with the previous design and how you
now fix it. The problem is that a large number of increments
imposes a large cognitive cost on the reader. Since the reader
is most often the maintainer, at some point it becomes worthwhile
for the maintainer to go back and improve the design
documentation, which often results in improving the design itself.

(That's more than I intended to write. Sorry if it's more than
you wanted to read.)

airfoyle said...

[gratefulfrog]
> I see no need to re-write noweb in
> Lisp. What is your goal there?

I don't remember! Either it was just an
attempt to see how easy it would be to
define a literate-programming system on
top of my existing text-Lisp
preprocessor, or there was some feature
I wanted that noweb didn't provide. It
turned out that my existing preprocessor
just got in the way (but it's still
there). It also turned out that once I
understood how tanglers worked I got
very deeply into adding features. The
result, although a pain to use, provides
a lot of functionality other systems
don't provide — although it leaves
many features out that other people seem
to want.

airfoyle said...

[anonymous]
> (That's more than I intended to
> write. Sorry if it's more than
> you wanted to read.)

Nope. Sometimes I think I'm a very bad
programmer, based on how often I tear
large programs apart and put them back
together again, which bespeaks a basic
inability to anticipate what it is I
really wanted. (I would use the word
"refactoring" here, but I think that's
supposed to be reserved for when a
program's functionality stays the same,
and mine rarely keep the same
functionality over time.)

So a document describing all the
previous iterations of one of my hacks
would read like a nightmare blend of
Proust and Neal Stephenson. I'm afraid
what I would have to do is discard the
entire previous draft and rewrite it.
And I'm supposed to do this _in
flagrante delicto_, with the program
lying on the floor in dysfunctional
fragments? It's not going to happen.

Anonymous said...

I used noweb for a number of Python, C, and C++ projects at my old job. All were source-controlled (mostly SVN, one in Git). We never tried WYSIWYG documentation, which, as you say, would make many things harder to cope with.

noweb is almost always written with code and documentation on different lines. That pretty much makes diffing and VCS work. Offhand, the only problem was line re-wrapping, an annoying but manageable source of noise and occasional conflicts. What problems did you have in mind?

After a few years, everyone was hurting from a lack of a true reverse for the tangler. If there were a utility that took source code files (possibly with chunks marked in some way) and merged it back into the noweb paper, that would be a much easier way to edit source naturally.

One of our projects had modules extensively re-written, which turned out to be slightly difficult to merge. We developed some conventions that helped, but there was general agreement that a reverse tangle merger would greatly lower the cost of LP.
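
For the simplest case, the "reverse tangle" idea can be sketched as follows. This is only an illustration under strong assumptions, not an existing noweb utility: it supposes the tangled source carries hypothetical marker comments of the form `# <<begin name>>` and `# <<end name>>`, that chunks are flat (no chunk refers to another), and that each chunk is defined once in the paper. The function names `read_marked_source` and `merge_back` are made up for this sketch.

```python
import re

BEGIN = re.compile(r"^# <<begin (.+)>>$")   # hypothetical marker left in the source
END = re.compile(r"^# <<end (.+)>>$")       # hypothetical closing marker
CHUNK_DEF = re.compile(r"^<<(.+)>>=\s*$")   # noweb-style chunk header in the paper


def read_marked_source(source):
    """Map each chunk name to its (possibly edited) body lines in the source file."""
    bodies = {}
    current = None  # name of the chunk being read; None between markers
    for line in source.splitlines():
        begin = BEGIN.match(line)
        if begin:
            current = begin.group(1)
            bodies[current] = []
        elif END.match(line):
            current = None
        elif current is not None:
            bodies[current].append(line)
    return bodies


def merge_back(paper, bodies):
    """Replace chunk bodies in the paper with the edited ones; prose is untouched."""
    out = []
    skipping = False  # True while discarding the old body of a replaced chunk
    for line in paper.splitlines():
        header = CHUNK_DEF.match(line)
        if header and header.group(1) in bodies:
            out.append(line)
            out.extend(bodies[header.group(1)])
            skipping = True
        elif skipping and line.strip() == "@":
            out.append(line)
            skipping = False
        elif not skipping:
            out.append(line)
    return "\n".join(out)


paper = "\n".join([
    "Prose about greeting.",
    "<<greet>>=",
    "def greet(name):",
    "    return 'Hello, ' + name",
    "@",
    "More prose.",
])
edited_source = "\n".join([
    "# <<begin greet>>",
    "def greet(name):",
    "    return 'Hi, ' + name",
    "# <<end greet>>",
])
merged = merge_back(paper, read_marked_source(edited_source))
```

Nested chunks, re-indentation, and chunks split across several definitions are exactly the cases this sketch leaves out, and they are presumably where the real merging difficulty lies.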

The thing I most disliked was the bugginess of the various Emacs weak multimode systems. But that's me.

airfoyle said...

[anonymous]
> What problems did you have in mind?

Well, this is one, although it really falls under my item (4):

> After a few years, everyone was
> hurting from a lack of a true
> reverse for the tangler. If there
> were a utility that took source
> code files (possibly with chunks
> marked in some way) and merged it
> back into the noweb paper, that
> would be a much easier way to
> edit source naturally.

The other problem is that different programmers might have different ways of describing the same piece of code, and find it natural to "untangle" it in different orders. You would have to agree on the composition of the paper at the same time as the composition of the source, which would lead to a huge distraction about issues that might be irrelevant to the actual code being cranked out. If CVS found conflicts between two versions of a file, a careful analysis would be required to determine whether the conflict was about the code, about the description of the code, or both.