A question from Angus Grieve-Smith posted to the Riders-Tech listserv inspired me to post this page. Please add to the list if you know of any other tools for cleaning up the HTML that Word produces.
- Boxer might do it. [1]
- Dreamweaver has long offered a "Clean Up Word HTML" feature, which works well. [2]
- HTML Tidy looks useful and is free, open-source software. [2]
- Word 2000 can output "compact HTML" if a free filter is installed. [2]
- Word 2003 can save to "filtered HTML" that is pretty clean. (Does anyone know whether Word 2002 can do this?) [2]
- Wordcleaner was made for this task. [2]
[1] Suggested by Ozzie Sutcliffe.
[2] Suggested by Zac Mutrux.
[3] Suggested by Adam Brin.

More on cleaning Word HTML
Unfortunately, none of these tools worked for me. I wound up hacking together a Perl script to strip out the extra markup, and then replacing Microsoft's long style sheet with a ten-line one. Here's the script:
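#!/usr/local/bin/perl
# Strip yucky MS Word HTML formatting: reads HTML on STDIN, prints the result.
use strict;
use warnings;

# Read the whole document so the patterns can match across line breaks.
my $file = '';
while (<STDIN>) {
    $file .= $_;
}
$_ = $file;
s/<span .+?>//sg;        # opening <span ...> tags that carry attributes
s/<\/span>//sg;          # closing </span> tags
s/class=\w+//gs;         # unquoted class attributes, e.g. class=MsoNormal
s/style='.+?'//gs;       # single-quoted inline style attributes
s/<td width=\d+/<td/gs;  # fixed widths on table cells
s/ cellpadding=0//sg;    # cellpadding=0 attributes
print;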
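
To see what those substitutions actually do, here is a minimal sketch that applies them to a short fragment. The helper name clean_word_html and the sample markup are my own inventions for illustration; the sample only imitates the kind of thing Word writes and is not real Word output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): applies the same
# substitutions to a string and returns the cleaned copy.
sub clean_word_html {
    my ($html) = @_;
    for ($html) {                  # alias $_ to our private copy
        s/<span .+?>//sg;          # opening <span ...> tags with attributes
        s/<\/span>//sg;            # closing </span> tags
        s/class=\w+//gs;           # unquoted class attributes
        s/style='.+?'//gs;         # single-quoted style attributes
        s/<td width=\d+/<td/gs;    # fixed widths on table cells
        s/ cellpadding=0//sg;      # cellpadding=0 attributes
    }
    return $html;
}

# Made-up sample in the style of Word's HTML, for illustration only.
my $sample = <<'END_HTML';
<table cellpadding=0><tr><td width=319 valign=top>
<p class=MsoNormal style='margin-left:.5in'><span lang=EN-US
style='font-size:12.0pt'>Hello, world</span></p>
</td></tr></table>
END_HTML

print clean_word_html($sample);
```

The attributes are removed but the spaces around them are not, so tags such as `<p  >` keep some stray whitespace, which browsers ignore. Note also that the patterns are plain text matches, not an HTML parser: they assume unquoted class values and single-quoted style values, and `class=\w+` would also hit that text if it happened to appear in the body of the document.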