Will.Whim

A weblog by Will Fitzgerald

Monthly Archives: April 2005

We th ppl f th US (Compressing the US Constitution)

As promised, a quick test of compressing the US Constitution to compare it to the compression rates seen by Jean Véronis for the European constitution.

[Figure: bar chart of compression gains]

It’s interesting to note that the compression ratio went down with the addition of the Bill of Rights. Not surprisingly, adding the other amendments gives the biggest compression ratio. There are a number of copies of “The Congress shall have power to enforce this article by appropriate legislation.” in the later amendments. I’ve never understood why some amendments have this proviso and others don’t.

In any case, even at its highest ratio (indicating the most redundant text), the US Constitution does well compared with the European constitution. And taking the original constitutions head-to-head? No contest: 65% for the US vs. 75% for the EU.

Of course, take this with a grain of salt. Compression was done using gzip. By the way, Jean Véronis’s text compression (for his post in French) is 52%. Apparently, he’s a good writer.
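For anyone who wants to try this at home, here is a minimal sketch of the measurement in Python. It assumes the "ratio" quoted in these posts is the fraction of bytes saved by gzip, which fits the reading that a higher number means more redundant text. The function name `compression_gain` and the sample strings are mine, made up for illustration, and this is not necessarily the exact procedure behind the numbers above.

```python
import gzip


def compression_gain(text: str) -> float:
    """Return the fraction of bytes gzip saves on `text` (assumed definition).

    0.0 means no savings; values near 1.0 mean highly redundant text.
    Very short inputs can come out negative because of gzip's fixed overhead.
    """
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("cannot measure an empty text")
    packed = gzip.compress(raw)
    return 1.0 - len(packed) / len(raw)


# Illustrative inputs only: one copy of the enforcement proviso,
# and the same sentence repeated many times.
proviso = (
    "The Congress shall have power to enforce this article "
    "by appropriate legislation. "
)
repeated = proviso * 50

print(f"single copy: {compression_gain(proviso):.0%}")
print(f"fifty copies: {compression_gain(repeated):.0%}")
```

The repeated text should score much higher than the single sentence, which is the same effect the later amendments have on the Constitution as a whole.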

Compression as indicator of document quality

Jean Véronis describes an informal experiment in compressing different texts, including the European constitution. (Original in French, auto-translated to English.) He notes that French texts normally compress by about 60-65%, but the EU constitution compresses by about 75%. He puts this down to “jargon, puffery, redundancy…” in the constitution. I wonder what the ratio is for the US Constitution (before and after the amendments added after the Bill of Rights). To be determined…

By the way, Google’s translation provider translates “jargon, baratin, redondance…” as “jargon, sweet talk, redundancy…” Which makes me wonder about the compression ratios achievable on flattery.

Dropping Lambda

Shriram Krishnamurthi, that bad boy of the Scheme community, announced today (April 1) that PLT Scheme plans on removing LAMBDA from PLT Scheme 3.0.

So I posted this note:

A group of us are pretty upset that the PLT Scheme Team has announced plans to drop LAMBDA from PLT Scheme v300.

We agree that dropping FILTER and MAP is a good idea, although for different reasons. We think that (filter P S) is almost always written more clearly as:

(((lambda (f)
    ((lambda (x) (f (x x)))
     (lambda (x) (f (lambda (y) ((x x) y))))))
  (lambda (f)
    (lambda (r)
      (if (null? r) '()
          (if (p (car r))
              (cons (car r) (f (cdr r)))
              (f (cdr r))))))) s)

So, we are announcing that we will be forking the PLT Scheme codebase into a new project, tentatively named “Lorenzo’s Oil.” Version 1 will remove MAP, etc. We expect Version 2 to further cleanse the language. We are embarrassed, for example, that we left in the IF statements.

— The Lorenzo’s Oil Team

So, are you in, or are you out?