Tools for cleaning up Word HTML

Posted by zac on August 30, 2005 - 1:24pm.
Groups: Toolbox

A question from Angus Grieve-Smith posted to the Riders-Tech listserv inspired me to post this page. Please add to the list if you know of any tools.

1. Suggested by Ozzie Sutcliffe.

2. Suggested by Zac Mutrux.

3. Suggested by Adam Brin.

More on cleaning Word HTML

Posted by grvsmth on August 30, 2005 - 7:20pm.

Unfortunately, none of these tools worked. I wound up hacking together a Perl script to do it, and then replacing Microsoft's long style sheet with a ten-line one. Here's the script:


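#!/usr/local/bin/perl

# strip yucky MS HTML format

# Slurp all of STDIN so the substitutions can match across line breaks.
while (<STDIN>) {
    $file .= $_;
}

$_ = $file;
s/<span .+?>//sg;          # opening <span ...> tags that carry attributes
s/<\/span>//sg;            # closing </span> tags (the slash must be escaped)
s/class=\w+//gs;           # unquoted class attributes, e.g. class=MsoNormal
s/style='.+?'//gs;         # inline single-quoted style attributes
s/<td width=\d+/<td/gs;    # fixed widths on table cells
s/ cellpadding=0//sg;      # cellpadding=0 on tables
print;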
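
The post mentions replacing Microsoft's long style sheet with a short one but does not show that step. The following is only a minimal sketch of how it might be done in the same regex-based spirit. The sub name replace_stylesheet, the contents of $short_css, and the sample input are all hypothetical and are not taken from the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical short stylesheet; the post does not show the real one.
my $short_css = <<'CSS';
<style type="text/css">
body { font-family: serif; }
td   { vertical-align: top; }
</style>
CSS

# Replace the first <style>...</style> block with $css and drop any others.
# /s lets "." span newlines, /i tolerates upper-case tags such as <STYLE>,
# and /e evaluates the replacement as code so only the first match gets $css.
sub replace_stylesheet {
    my ($html, $css) = @_;
    my $count = 0;
    $html =~ s{<style\b[^>]*>.*?</style>}{ $count++ ? '' : $css }gsie;
    return $html;
}

# Small demonstration on an inline sample (assumed input, not from the post).
my $sample = "<html><head><style>\np.MsoNormal {margin:0in;}\n</style></head>"
           . "<body><p>Hi</p></body></html>";
print replace_stylesheet($sample, $short_css);
```

Like the script above, this treats the HTML as plain text rather than parsing it, so it is best suited to one-off cleanup of a known file.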
