Word Processing

Current Word Processors are a joke. What we need is an html based Word Processor.

html is an application of sgml - perhaps that is the way to go.

MS Word

I had a small MS Word file, added about 100 bytes in MS Word and saved it. As far as I can tell, both documents were saved as Word.Document.8 files. The second file was saved with unicode (2 bytes per character). Cleaning up the html document (fixing the formatting) only took about an hour.
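To put a number on the "2 bytes per character" remark, here is a minimal sketch. It assumes the one-byte form is Windows code page 1252 and the two-byte form is UTF-16LE. The function name `encoded_sizes` is made up for this illustration, and the snippet says nothing about what else Word writes into the file.

```python
def encoded_sizes(text):
    """Return (one_byte_size, two_byte_size) for the same text.

    Assumption: cp1252 stands in for the 8-bit form and UTF-16LE
    for the "unicode" form. This only counts the text itself.
    """
    one_byte = len(text.encode("cp1252"))
    two_byte = len(text.encode("utf-16-le"))
    return one_byte, two_byte


# Example: plain English text doubles in size when stored as UTF-16LE.
small, large = encoded_sizes("Current Word Processors are a joke.")
print(small, large)
```

For plain English text the second number is always twice the first, so the text alone doubles before any formatting overhead is counted.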

With this kind of performance, why would anyone use MS Word?

(In a typical month, I use MS Word only as a spelling checker. All my other word processing is done with notepad/html.)

Bugs and Features

You won't believe the problems reported at Woody's Watch

Corrupt Templates

MS Word is vulnerable to various viruses and file corruption. If it isn't acting right, try renaming the default template. With Office 2000, try [...]. With Office XP on Windows XP, try [...]. When Normal.dot cannot be found, MS Word will simply create a new copy. In general, this will fix missing menu options. In one case I read about, Windows crashed every time MS Word started until the template was deleted.

Normal.dot (the default template)


This is interesting - open a .doc file in notepad and look at what is in it. There are several items I find unacceptable from a privacy point of view. That's right, every time you save an MS Word document, Microsoft hides information inside it that can be used to violate your privacy. Using SaveAs makes no difference - the information in the original file is saved in the new file. It appears that the _PID_GUID is determined by the machine that the file was originally created on. This identifier does not change when you edit the file on another machine.

For instance, I don't want everyone who gets a copy of my resume to know the name of the directory on my machine where it is stored.
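If scrolling through a .doc file in notepad is too tedious, the same inspection can be scripted. This is only a rough sketch: it pulls out runs of printable characters, both 8-bit and 2-bytes-per-character, and knows nothing about the real Word file layout. The names `printable_strings` and `strings_in_file` and the minimum run length of 6 are my own choices for the illustration.

```python
import re


def printable_strings(data, min_len=6):
    """Return runs of printable text found in raw bytes.

    Two patterns are tried: plain 8-bit ASCII runs, and UTF-16LE
    runs (each ASCII character followed by a zero byte).
    """
    ascii_run = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    utf16_run = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = [m.group().decode("ascii") for m in ascii_run.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in utf16_run.finditer(data)]
    return found


def strings_in_file(path, min_len=6):
    """Read a file as raw bytes and return its printable runs."""
    with open(path, "rb") as handle:
        return printable_strings(handle.read(), min_len)
```

Calling `strings_in_file("resume.doc")` (the file name is just an example) returns a list you can search for your own name, your machine name, or a directory path.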

Some of this information can be removed via File / Properties. Title, Author, and Company are automatically populated by Word - Just delete them from the Summary tab. Be sure to check the other tabs to make sure that all identifying information is removed.

You think that I am overreacting. Consider this - A company wants to distribute a resume with the applicant's name removed. Most people simply remove the name from the document. Sounds simple. I have received resumes like this. In order to retrieve the applicant's name, you just

The only way I know to delete the history and to get a new _PID_GUID is to copy the contents of the file to the clipboard and to paste it into a new document. Unfortunately, it will still have the computer name and path for the new file.

The "Miscellaneous data from other files" was even more surprising. One of my doc files actually had registry entries in it. My best guess on this is that Word allocates blocks of data and simply writes data to them without first erasing them. Talk about leaking private information to the world.
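To show what that guess would look like in practice, here is a toy model. It is not Word's code, just an illustration of the suspected mechanism: a block that previously held something else is reused, new data is written at the front, and the whole block is saved. The function `write_block` and the sample registry-style string are invented for the example.

```python
def write_block(block, payload, clear_first=False):
    """Write payload at the start of a reused block and return the block.

    block is a bytearray that may still hold older data. Unless
    clear_first is set, the bytes after the payload are left alone,
    so whatever was there before ends up in the saved output.
    """
    if len(payload) > len(block):
        raise ValueError("payload does not fit in block")
    if clear_first:
        block[:] = bytes(len(block))  # overwrite the old contents with zeros
    block[:len(payload)] = payload
    return bytes(block)


# A block that previously held unrelated data (hypothetical content).
stale = bytearray(b"HKEY_LOCAL_MACHINE\\Software\\Demo")
saved = write_block(stale, b"Dear Sir,")
print(saved)
```

The saved block starts with the new text, but the tail of the old registry-style string is still sitting behind it. Passing `clear_first=True` wipes the block before writing, which is all it would take to avoid the leak in this toy model.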

You have to realize that a 4K text file contains the same number of letters as a 34K Word file. (I've seen the same file change from 23K to 35K just by using SaveAs.)

(I have valid reasons why I prefer to use html for text processing.)


Well, at least you can see the edit codes with this. I no longer use it on a regular basis because my company won't let me. The main reason is that our client (the US Government) REQUIRES us to use MS Word. (Well, that's one way to create a monopoly.)


Only about 5 additional tags are necessary to make html a full word processing language. Some additional functions that would be nice are

Author: Robert Clemenzi - clemenzi@cpcug.org
URL: http://cpcug.org/user/clemenzi/technical/WordProcessing.html