Search-and-replace cleanup expressions for various HTML editors


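In Notepad++, the following string was useful in search and replace. Note the \r\n order: carriage return first, then line feed. Watch out: a line that merely wraps on screen does not contain a \r\n. This is not a Normal search string; select the Extended search mode radio button. The text had already been run through Tidy from the TextFX plugin.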

style=\r\n"border-top: 1px solid #6b2394; border-bottom: 1px solid #6b2394; border-left: 1px solid #6b2394; border-right: 1px solid #6b2394"
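A minimal Python sketch of what Extended mode does with that string: `\r\n` becomes a real carriage return followed by a line feed, and everything else is matched as plain text, not as a regex. The text does not say what the string was replaced with, so the CSS `border` shorthand used here is only an illustrative assumption, and `collapse_borders` is a hypothetical helper name.

```python
# Extended mode turns \r\n into a real CR + LF pair; the rest is plain text, not a regex.
FIND = (
    'style=\r\n"border-top: 1px solid #6b2394; border-bottom: 1px solid #6b2394; '
    'border-left: 1px solid #6b2394; border-right: 1px solid #6b2394"'
)
# Hypothetical replacement: the border shorthand sets all four sides at once.
REPLACE = 'style="border: 1px solid #6b2394"'


def collapse_borders(html: str) -> str:
    """Replace every exact occurrence of FIND with REPLACE."""
    return html.replace(FIND, REPLACE)


crlf = "<td " + FIND + ">x</td>"
lf_only = crlf.replace("\r\n", "\n")  # same text saved with Unix (LF-only) line endings

print(collapse_borders(crlf))              # <td style="border: 1px solid #6b2394">x</td>
print(collapse_borders(lf_only) == lf_only)  # True: no \r in the text, so nothing matches
```

The second call shows why the exact characters matter: a file saved with LF-only endings, or a line that only wraps on screen, contains no `\r\n`, so the search finds nothing.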

In the Extended search mode, the following string finds two <br /> tags with a hard return between them. This pattern results from running Tidy from Notepad++, with no htmltidy.cfg set up, on a Thunderbird email saved in HTML format.

<br />\r\n<br />
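The same search written as a Python regular expression, as a sketch. Two things here are assumptions, not part of the recipe above: the `\r?` so LF-only files also match, and the `</p>`/`<p>` replacement, since the text does not say what the pair was replaced with. `replace_double_br` is a hypothetical helper name.

```python
import re

# The Extended-mode string <br />\r\n<br /> written as a regex.
# \r? is an added assumption so the search also matches LF-only files.
DOUBLE_BR = re.compile(r"<br />\r?\n<br />")


def replace_double_br(html: str, replacement: str) -> tuple[str, int]:
    """Return the cleaned text and the number of <br /> pairs replaced."""
    return DOUBLE_BR.subn(replacement, html)


text = "line one<br />\r\n<br />line two<br />\n<br />line three"
# Hypothetical replacement: close one paragraph and open the next.
cleaned, count = replace_double_br(text, "</p>\r\n<p>")
print(count)  # 2
```

Passing the replacement as a parameter keeps the search itself separate from whatever markup you decide should take the place of the double break.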

While cleaning up OpenOffice 4 Writer output, I found the following regular expression useful:

 style=\"[a-z]+\-[a-z]+: [0-9]\.[0-9]+in; [a-z]+-[a-z]+: [0-9]\.[0-9]+in\"
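A short Python sketch of that expression at work, with the decimal point escaped as `\.` so it matches only a literal period. The sample markup is hypothetical, chosen to look like lowercased Writer output; the leading space in the pattern removes the space before the attribute along with it.

```python
import re

# Leading space included so the space before the attribute is removed too.
WRITER_MARGINS = re.compile(
    r' style="[a-z]+-[a-z]+: [0-9]\.[0-9]+in; [a-z]+-[a-z]+: [0-9]\.[0-9]+in"'
)

# Hypothetical sample of Writer output after Tidy has lowercased it.
sample = (
    '<p style="margin-top: 0.08in; margin-bottom: 0.08in">Text</p>\n'
    '<p style="margin-bottom: 0in">Next</p>'
)
print(WRITER_MARGINS.sub("", sample))
```

Only the first paragraph is cleaned. The second style has a single property and a value without a decimal point, which shows how narrow the pattern is.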

That expression is clumsy and limited, as is my skill set. For more expressions, see the next section.

Stripping special codes from OpenOffice Calc spreadsheets saved in HTML format

Suppose you want clean HTML 4.01 pages as verified by Tidy, use HTML-Kit as your main HTML editor, and prefer to let CSS control table layout, but often lay out complex tables in Calc. Then you will need to strip SDVAL, SDNUM, and other attributes from OpenOffice's HTML. With thanks to the starting points I found on using Gvim and in various online regex tutorials, here are some more clumsily assembled expressions that work in Chami's HTML-Kit to strip these codes. Feel free to improve them, and let me know if you want to share your improvements.

These regular expressions go in the Find box; leave the Replace box blank. Check the RegExp check box.
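The same find-with-blank-replace workflow can be sketched in Python for one of these attributes. This is not one of the original HTML-Kit expressions; it assumes Calc writes the attribute as `sdnum="..."` with a double-quoted value and that Tidy has already lowercased it. `strip_sdnum` and the sample row are hypothetical.

```python
import re

# Assumed attribute shape, e.g. sdnum="1033;0;#,##0.00" (value in double quotes).
# The leading space is part of the match so no stray space is left in the tag.
SDNUM = re.compile(r' sdnum="[^"]*"')


def strip_sdnum(html: str) -> str:
    # Empty replacement, like leaving HTML-Kit's Replace box blank.
    return SDNUM.sub("", html)


row = '<td align="right" sdnum="1033;0;#,##0.00">1,234.50</td>'
print(strip_sdnum(row))  # <td align="right">1,234.50</td>
```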

200512282231 regex openoffice code strippers 

Note that I usually run Tidy first with a switch set to convert everything to lowercase. The expressions above will not match uppercase text unless they are edited to account for it. Obviously they could be tighter and more general: the repeated parts could be written with quantifiers, and the SDVAL expression would need a rewrite to handle decimal numbers. I will work on that when I need it. In the meantime, these are a starting point for others working in this tiny niche.
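A sketch of how the two limitations above might be handled in Python: a case-insensitive flag instead of a lowercase pre-pass, and an optional decimal part for SDVAL. The pattern assumes SDVAL holds a plain number such as `42` or `-3.75` in double quotes; it is an illustration, not a tested replacement for the HTML-Kit expressions.

```python
import re

# Hypothetical pattern: optional minus sign, digits, optional decimal part.
# re.IGNORECASE lets it match SDVAL as well as sdval, so no lowercase pre-pass is needed.
SDVAL = re.compile(r' sdval="-?[0-9]+(?:\.[0-9]+)?"', re.IGNORECASE)

cells = '<TD SDVAL="42">42</TD><td sdval="-3.75">-3.75</td>'
print(SDVAL.sub("", cells))  # <TD>42</TD><td>-3.75</td>
```

The `(?:\.[0-9]+)?` group is one way of writing the repeats as a quantifier; it matches whole numbers and decimals with a single expression.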

Other odds and ends that get cleaned up:

Find <td><b> and replace with <th>
Find </b></td> and replace with </th>
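Assuming those tags are meant as two find/replace pairs that turn bold table cells into header cells, a minimal Python sketch of the same cleanup follows. It assumes the tags appear exactly as written: lowercase, no attributes, and nothing between `<td>` and `<b>`. `bold_cells_to_headers` is a hypothetical name.

```python
# Each pair is (find, replace); plain text, no regex needed.
PAIRS = [("<td><b>", "<th>"), ("</b></td>", "</th>")]


def bold_cells_to_headers(html: str) -> str:
    for find, replace in PAIRS:
        html = html.replace(find, replace)
    return html


print(bold_cells_to_headers("<tr><td><b>Year</b></td><td>2005</td></tr>"))
# <tr><th>Year</th><td>2005</td></tr>
```

Running Tidy afterwards is a reasonable check: a cell where the bold text does not fill the whole cell would be left with mismatched tags.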