Tools for cleaning up Word HTML

Submitted by zac on 2005, August 30 - 1:24pm.
Groups: Toolbox

A question from Angus Grieve-Smith posted to the Riders-Tech listserv inspired me to post this page. Please add to the list if you know of any tools.

[1] Suggested by Ozzie Sutcliffe.

[2] Suggested by Zac Mutrux.

[3] Suggested by Adam Brin.

More on cleaning Word HTML

Submitted by grvsmth on 2005, August 30 - 7:20pm.

Unfortunately, none of these tools worked. I wound up hacking together a Perl script to do it, and then replacing Microsoft's long style sheet with a ten-line one. Here's the script:


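#!/usr/local/bin/perl

# Strip yucky MS HTML formatting. Reads HTML on STDIN, writes to STDOUT.

# Slurp the whole input so the patterns can match across line breaks.
while (<STDIN>) {
    $file .= $_;
}

$_ = $file;
s/<span .+?>//sg;          # opening span tags
s/<\/span>//sg;            # closing span tags (the slash must be escaped)
s/class=\w+//gs;           # unquoted class attributes
s/style='.+?'//gs;         # single-quoted inline styles
s/<td width=\d+/<td/gs;    # fixed cell widths
s/ cellpadding=0//sg;      # table cell padding
print;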
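
For anyone who wants to see what those substitutions do before running them on a real file, here is a minimal sketch that applies the same patterns to a short sample string. The helper name clean_word_html and the sample markup are made up for illustration and are not part of the original script; real Word output will be messier.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original post): applies the same
# substitutions to a string and returns the cleaned copy.
sub clean_word_html {
    my ($html) = @_;
    $html =~ s/<span .+?>//sg;          # opening span tags
    $html =~ s/<\/span>//sg;            # closing span tags
    $html =~ s/class=\w+//gs;           # unquoted class attributes
    $html =~ s/style='.+?'//gs;         # single-quoted inline styles
    $html =~ s/<td width=\d+/<td/gs;    # fixed cell widths
    $html =~ s/ cellpadding=0//sg;      # table cell padding
    return $html;
}

# Made-up sample in the style of Word's HTML export.
my $sample = "<p class=MsoNormal><span style='color:red'>Hi</span></p>";
print clean_word_html($sample), "\n";   # prints: <p >Hi</p>
```

Note the stray space left behind in `<p >`: the class pattern removes the attribute but not the space in front of it. Browsers ignore it, but adding a leading space to the pattern would tidy it up.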
