Stripping all formatting from Word doc

Kurt

Office 2004. Might be a stupid question, but what's the easy way to convert
a document to plain text. I've tried saving as plain text, but I still
get a somewhat formatted document. I want to strip all the MS code from
it.
As it is now, I copy and paste into TextEdit, where this can be easily
done.
 
CyberTaz

Are you perhaps being deceived by the fact that the text file remains open
on screen and continues to "appear" to retain its formatting?

If you use Save As and select any of the 'Text...' options the formatting is
removed as per the warning that pops up when you click Save. Close the file
& reopen it and you will see that it no longer retains any formatting.

You can also Select All and use the Formatting Palette to 'Clear Formatting'
by clicking the drop-down arrow of the 'Current style of selected text'
display field. This, however, actually reformats to Normal using your
default font & font size, whereas the Save As file types convert the text to
Courier.

HTH |:>)
 
Helpful Harry

CyberTaz said:
Are you perhaps being deceived by the fact that the text file remains open
on screen and continues to "appear" to retain its formatting?

If you use Save As and select any of the 'Text...' options the formatting is
removed as per the warning that pops up when you click Save. Close the file
& reopen it and you will see that it no longer retains any formatting.

You can also Select All and use the Formatting Palette to 'Clear Formatting'
by clicking the drop-down arrow of the 'Current style of selected text'
display field. This, however, actually reformats to Normal using your
default font & font size, whereas the Save As file types convert the text to
Courier.

HTH |:>)

Using Save As to a Text file doesn't retain any font information, so
the font you see when re-opening the file in any application is simply
that application's default font.



Helpful Harry
Hopefully helping harassed humans happily handle handiwork hardships ;o)
 
Kurt

What does "somewhat formatted" mean?

Bill

Not sure how to describe this, but cutting and pasting Word plain text
into, say, my web program, GoLive, produces text that I need to reformat
to the existing CSS. The same text cut and pasted from TextEdit assumes
the CSS definitions of wherever I paste it.
Wonder why there should be any difference between plain text in Word and
the same in TextEdit?
 
Helpful Harry

Kurt said:
Not sure how to describe this, but cutting and pasting Word plain text
into, say, my web program, GoLive, produces text that I need to reformat
to the existing CSS. The same text cut and pasted from TextEdit assumes
the CSS definitions of wherever I paste it.
Wonder why there should be any difference between plain text in Word and
the same in TextEdit?

TextEdit is a text editor - it doesn't format the document when you
open it.

Word, on the other hand, will reformat the Text document back into a Word
document when you open it - so when you copy it you're not copying only
the "text", but Word-formatted text instead. Then when you paste it into
GoLive you're actually pasting formatted / styled text, not plain text.


Helpful Harry
 
Kurt

Helpful Harry said:
TextEdit is a text editor - it doesn't format the document when you
open it.

Word, on the other hand, will reformat the Text document back into a Word
document when you open it - so when you copy it you're not copying only
the "text", but Word-formatted text instead. Then when you paste it into
GoLive you're actually pasting formatted / styled text, not plain text.
That's what I figured was going on.
So it is not possible to completely strip all the formatting out of Word
in order to cut and paste plain text without going through an
intermediary like TextEdit. Not the end of the world, but inefficient.

I did find a great script for InDesign that removes all the formatting
from Word when I cut and paste, but will have to see if one might exist
for GoLive.
 
Helpful Harry

Kurt said:
That's what I figured was going on.
So it is not possible to completely strip all the formatting out of Word
in order to cut and paste plain text without going through an
intermediary like TextEdit. Not the end of the world, but inefficient.

You can't open a plain text document in Word - it reformats it to
"Word" format when it opens. The same probably happens in AppleWorks
and most other word processor applications.

To get the plain text you either need to open the document in a text
editing application ready for copy / paste OR use the Import command
(aka Place in Adobe applications) to import the plain text into the
destination application ... BUT in the case of web design applications
it's even easier (see below).


I did find a great script for InDesign that removes all the formatting
from Word when I cut and paste, but will have to see if one might exist
for GoLive.

Technically, an HTML document for a web page *IS* just a plain text
document, so once you've saved the Word file as plain text you can
simply go directly to GoLive, use the Open command, and the text will
appear in a new window ready for copy / paste or reformatting as a web
page - no need for any intermediate application at all. :o)

You could even save the original document as an HTML version from Word
if you wanted to keep the styling (bold, italic) and SIMPLE layout.
Then you can open that HTML document in GoLive as well.



Helpful Harry
 
Klaus Linke

You can't open a plain text document in Word - it reformats it to
"Word" format when it opens. The same probably happens in
AppleWorks and most other word processor applications.

That's not so. Word won't touch the text, apart from some control characters
(namely ASCII 7, and maybe carriage returns/line feeds ... most "simple text
editors" mess up a lot more).
If you save in the original encoding, and then do a byte-for-byte compare
with the original file, you're unlikely to find a difference.
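Klaus's byte-for-byte check is easy to script. The following is a minimal Python sketch, not something from the thread. It assumes you kept a copy of the original text file and re-saved the other copy from Word as Text Only in the same encoding. The function name is made up for illustration.

```python
import filecmp

def same_bytes(path_a: str, path_b: str) -> bool:
    """Return True if the two files have byte-for-byte identical contents.

    shallow=False makes filecmp read and compare the file contents instead
    of treating files with identical os.stat() signatures (type, size,
    modification time) as equal.
    """
    return filecmp.cmp(path_a, path_b, shallow=False)
```

A call such as `same_bytes("original.txt", "resaved.txt")` (hypothetical file names) should return True if Word left the text untouched.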

To get the plain text you either need to open the document in a text
editing application ready for copy / paste OR use the Import command
(aka Place in Adobe applications) to import the plain text into the
destination application ... BUT in the case of web design applications
it's even easier (see below).

Plain simple text editors don't know anything but plain text, and don't put
anything but plain text on the clipboard. Word puts stuff on the clipboard
in a variety of formats, and the application you paste into gives you a
choice on what to paste, or simply makes the decision for you (as Matt
said).

Regards,
Klaus
 
Helpful Harry

"Klaus Linke" said:
That's not so. Word won't touch the text, apart from some control characters
(namely ASCII 7, and maybe carriage returns/line feeds ... most "simple text
editors" mess up a lot more).
If you save in the original encoding, and then do a byte-for-byte compare
with the original file, you're unlikely to find a difference.

The file itself IS a plain text version ... BUT once you open it in
Word again it becomes a Word document (while it's on-screen or if saved
in another file format), so any copying of the text also takes the
formatting, which in the case of a plain text file is the default font,
size, etc.



Plain simple text editors don't know anything but plain text, and don't put
anything but plain text on the clipboard. Word puts stuff on the clipboard
in a variety of formats, and the application you paste into gives you a
choice on what to paste, or simply makes the decision for you (as Matt
said).

Most applications will simply paste in the same as was copied (within
the application's limits, of course). Text copied in Word includes the
font, size, etc. as above, so it will then paste into GoLive using that
formatting (where possible).

Copying the text from a Text Editor obviously doesn't contain any
formatting, so GoLive will paste it using its own defaults. You'll get
the same effect if you simply open the plain text document in GoLive
itself. BUT if you paste the plain text into the middle of an
established Style section within a GoLive page, then the new plain text
will pick up the formatting of that style.


Just as another example, if you copy some text from Word and then paste
it into a PageMaker document (and somehow manage to do it without
PageMaker crashing!), then PageMaker not only picks up the text and
formatting, but also any defined styles (e.g. Normal, Heading, etc.) ...
which can be a total pain in the behind - that's why I always save the
Word document as plain text and then Place it into PageMaker and redo
the formatting there.


Helpful Harry
 
Kurt

I did find a great script for InDesign that removes all the formatting
from Word when I cut and paste, but will have to see if one might exist
for GoLive.

Technically, an HTML document for a web page *IS* just a plain text
document, so once you've saved the Word file as plain text you can
simply go directly to GoLive, use the Open command, and the text will
appear in a new window ready for copy / paste or reformatting as a web
page - no need for any intermediate application at all. :o)

You could even save the original document as an HTML version from Word
if you wanted to keep the styling (bold, italic) and SIMPLE layout.
Then you can open that HTML document in GoLive as well.


Say I have an area in a GoLive page, that is already CSS formatted.
With plain text copied from TextEdit, the pasted text (pasted after a
letter or two of existing text, to help preserve the formatting)
assumes the CSS attributes.

The same text from Word pasted in the same way produces generic text,
that needs to be reformatted with CSS.

I generally get website text from clients in Word, created by people who
have various skill levels. Some over-format, others use tabs, spaces and
line breaks in ways you can't even imagine. (Most use this method)

My job is to get this easily into simple ASCII text so that I can fix
the line breaks and place it where I need to in the HTML.
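For a cleanup like the one Kurt describes, a small script can handle the mechanical part before the text goes into the HTML. This is a hypothetical Python sketch, not a tool mentioned in the thread. It assumes the Word file has already been saved as Text Only and read in as a string. The character mapping is only a starting point, and other non-ASCII characters, such as accented letters, are left alone.

```python
import re

# Typographic characters Word commonly substitutes, mapped to plain ASCII.
# This list is an assumption; extend it for the documents you actually get.
ASCII_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "--",  # en dash, em dash
    "\u2026": "...",                # ellipsis
    "\u00a0": " ",                  # non-breaking space
})

def to_plain_ascii(text: str) -> str:
    # Normalize Windows (\r\n) and classic Mac (\r) line endings to \n.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    text = text.translate(ASCII_MAP)
    # Drop control characters (including ASCII 7) but keep tab and newline.
    text = re.sub(r"[\x00-\x08\x0b-\x1f\x7f]", "", text)
    # Collapse tabs and runs of spaces into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Trim spaces around line breaks.
    text = re.sub(r" *\n *", "\n", text)
    # Allow at most one blank line between paragraphs.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

This sketch flattens tabs into single spaces. If a client's tabs carry meaning, such as columns, that rule needs to change.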
 
Kurt

Helpful Harry said:
Just as another example, if you copy some text from Word and then paste
it into a PageMaker document (and somehow manage to do it without
PageMaker crashing!), then PageMaker not only picks up the text and
formatting, but also any defined styles (e.g. Normal, Heading, etc.) ...
which can be a total pain in the behind - that's why I always save the
Word document as plain text and then Place it into PageMaker and redo
the formatting there.

And InDesign has a script that strips all that formatting anyway, so not
a problem. InDesign will also pick up the same formatting (it doesn't
crash, either), but as you remark, all the bloat that comes with it can
often be far more work to remove than it would have been to bring in
plain text and format it.

Am I the only one still lamenting that MS should have stopped at Word 5?
Fast startup, fine features, and easy to edit.
 
Beth Rosengard

I don't see this using Dreamweaver. I recently wrote a page of text in Word
and when I copied and pasted it into my DW document (formatted via an
external CSS), it immediately assumed the correct destination formatting.
The only thing I needed to do was apply Heading styles, which I hadn't done
in Word.

Beth
 
Klaus Linke

Hi again [comments inline],

Helpful Harry said:
The file itself IS a plain text version ... BUT once you open it in
Word again it becomes a Word document (while it's on-screen
or if saved in another file format), so any copying of the text
also takes the formatting, which in the case of a plain text file
is the default font, size, etc.

Maybe I'm splitting hairs here, but Word doesn't mess with the text file
(bytes) until and unless you save in *.doc format.
And it copies the text to the clipboard in a variety of formats, including
plain text. That other programs that you paste into use the richest format
available as a default may be a good thing, but that some don't allow you to
paste the plain text version at all is a shortcoming of those programs, not
of Word.
Most applications will simply paste in the same as was copied (within
the application's limits, of course). Text copied in Word includes the
font, size, etc. as above, so it will then paste into GoLive using that
formatting (where possible).

See above. It works that way in Windows, and I'm pretty sure the Mac
clipboard also contains different formats after you copy in Word.
Copying the text from a Text Editor obviously doesn't contain any
formatting, so GoLive will paste it using its own defaults. You'll get
the same effect if you simply open the plain text document in GoLive
itself. BUT if you paste the plain text into the middle of an
established Style section within a GoLive page, then the new plain text
will pick up the formatting of that style.


Just as another example, if you copy some text from Word and then paste
it into a PageMaker document (and somehow manage to do it without
PageMaker crashing!), then PageMaker not only picks up the text and
formatting, but also any defined styles (e.g. Normal, Heading, etc.) ...
which can be a total pain in the behind - that's why I always save the
Word document as plain text and then Place it into PageMaker and redo
the formatting there.

No argument here. But PageMaker (and GoLive) could offer the option to paste
the plain text version quite easily, since it is on the clipboard.

Greetings,
Klaus
 
Helpful Harry

"Klaus Linke" said:
Hi again [comments inline],

Helpful Harry said:
The file itself IS a plain text version ... BUT once you open it in
Word again it becomes a Word document (while it's on-screen
or if saved in another file format), so any copying of the text
also takes the formatting, which in the case of a plain text file
is the default font, size, etc.

Maybe I'm splitting hairs here, but Word doesn't mess with the text file
(bytes) until and unless you save in *.doc format.

Yes, but I'm not talking about the saved file.

When you open the text file, Word (or most applications that aren't
"text only") converts it IN MEMORY to Word format using Word's default
fonts, sizes, etc. The ON-SCREEN version is then in Word format, regardless of
what format the saved file you opened is in - you can add tables,
images, play with fonts, etc. and Word won't complain ... UNTIL you try
and save it back to that original text format. That means when you
select the text in the window and copy it, you're actually copying the
Word formatted version, not the plain text version that's still stored
on the disk.



Helpful Harry
 
Klaus Linke

Maybe I'm splitting hairs here, but Word doesn't mess with the text file
Yes, but I'm not talking about the saved file.

When you open the text file, Word (or most applications that aren't
"text only") converts it IN MEMORY to Word format using Word's default
fonts, sizes, etc. The ON-SCREEN version is then in Word format, regardless of
what format the saved file you opened is in - you can add tables,
images, play with fonts, etc. and Word won't complain ... UNTIL you try
and save it back to that original text format.

As long as you just edit the text (and don't format something in ways that
can't be saved in plain text format), Word will save in the same (text)
format you opened, without any warning (since there's no formatting to
lose) and without any changes.
That means when you select the text in the window and copy it, you're
actually copying the Word formatted version, not the plain text version
that's still stored on the disk.

As I said, in Windows Word puts the text on the clipboard in a variety of
formats (HTML, RTF, plain text, and a couple more), and the application you
paste into makes a choice, or lets the user choose, the format it pastes.
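Klaus's point can be modelled in a few lines. This is a toy Python sketch, not real clipboard code. The format names are illustrative stand-ins for the flavors Word actually places on the Mac or Windows clipboard. `paste` is a hypothetical function that shows how a target application picks one of them.

```python
# Toy model only: one copied selection held on the clipboard in several
# formats at once, keyed by illustrative (not real) format names.
RICHEST_FIRST = ("text/html", "text/rtf", "text/plain")

def paste(clipboard: dict, plain_only: bool = False) -> str:
    """Return the data a paste would use.

    By default the richest available format wins, which is what carries
    Word's fonts and styles along. plain_only=True models a "paste as
    plain text" command that takes the plain-text copy instead.
    """
    order = ("text/plain",) if plain_only else RICHEST_FIRST
    for fmt in order:
        if fmt in clipboard:
            return clipboard[fmt]
    raise KeyError("no supported format on the clipboard")

# What a copy from Word might offer, versus a copy from a plain text editor.
from_word = {
    "text/html": "<p style='font-family:Times'>Hello</p>",
    "text/rtf": r"{\rtf1 {\f0 Hello}}",
    "text/plain": "Hello",
}
from_text_editor = {"text/plain": "Hello"}
```

In this model, `paste(from_word)` returns the styled HTML and `paste(from_word, plain_only=True)` returns just `"Hello"`. Offering the second option is all Klaus is asking of the receiving application.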

Maybe we are just looking at the same thing from two different perspectives.
For the case at hand it doesn't make much difference.

I just posted because there's a widespread misconception that Word is a
lousy text editor. Quite the contrary in my experience. If it weren't for
ASCII 7 (which Word needs for end-of-cell markers in tables), you could
probably edit/patch executables in Word, save, and run them, much like using
a hex editor.

Regards,
Klaus
 
Elliott Roper

Am I the only one still lamenting that MS should have stopped at Word 5?
Fast startup, fine features, and easy to edit.

Hell no! Fond memories of 5 are a prerequisite for posting here.

Cranking up an old Mac and a copy of 5 is therapeutic. It did not seem
as wonderful as I remembered it when last I did so. I hated Word 5 for
its bloat when it first came out.

Word has completely lost its way. Sadly, I don't think there is any
going back. The whole point of Office on Mac is to permit interworking
with Windows versions.

I'd like to see a Cocoa word processor that would read and write a
subset of Word's upcoming XML-based document format. I'd really like to
see it from Microsoft's Mac BU, so they have some chance of keeping it
in step with the full product.
If that were done in the /spirit/ of Word 5, I would not complain.
 
Daiya Mitchell

Thanks, Beth. I've been sitting on a snarky comment about how this works
perfectly in Dreamweaver for a day or so..... :) In fact, one of the
reasons I quickly abandoned GoLive was because this didn't work so
transparently (I think). But I'm guessing these people are committed to the
Adobe CS and thus GoLive. (heading styles come over great, by the way,
though in fact, it doesn't all work perfectly :)

GoLive, however, does not do as good a job with this. GoLive uses <b>
instead of <strong>. Heading styles came over fine in both programs--DW
dropped style-applied bullets entirely but GL brought them as direct
formatting (argh)

Interestingly, copying plain text from TextEdit got me a whole bunch of <br>
instead of <p> in both programs, definitely not what I wanted. But I think
the OP is doing something rather complicated--I'm really not sure how the
text can assume existing attributes unless you are pasting a few sentences
into an existing paragraph. I think this might depend on the formatting
that is wanted and the exact process.
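Suppose each line of the exported text is meant to be its own paragraph. That matches Daiya's test, where every paragraph ended in a single line break. In that case, wrapping the lines in `<p>` yourself avoids the `<br>` problem. Here is a minimal Python sketch under that assumption; the function name is made up for illustration. It would wrongly split a paragraph that was hard-wrapped across several lines.

```python
import html

def lines_to_paragraphs(text: str) -> str:
    """Wrap each non-empty line of plain text in its own <p> element.

    html.escape turns &, < and > (and quotes) into entities so the
    client's text can't accidentally inject markup.
    """
    lines = (line.strip() for line in text.splitlines())
    return "\n".join(f"<p>{html.escape(line)}</p>" for line in lines if line)
```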

Daiya
 
Kurt

Daiya Mitchell said:
Thanks, Beth. I've been sitting on a snarky comment about how this works
perfectly in Dreamweaver for a day or so..... :) In fact, one of the
reasons I quickly abandoned GoLive was because this didn't work so
transparently (I think). But I'm guessing these people are committed to the
Adobe CS and thus GoLive. (heading styles come over great, by the way,
though in fact, it doesn't all work perfectly :)

GoLive, however, does not do as good a job with this. GoLive uses <b>
instead of <strong>. Heading styles came over fine in both programs--DW
dropped style-applied bullets entirely but GL brought them as direct
formatting (argh)

I use CSS almost exclusively these days, and do prefer GL to Dreamweaver
for my workflow. (my design company manages quite a few sites).
But also considering that we do a lot of print production, with many
elements that eventually go to the web or html formatted email, CS makes
it easy to integrate all the Adobe products.
Interestingly, copying plain text from TextEdit got me a whole bunch of <br>
instead of <p> in both programs, definitely not what I wanted. But I think
the OP is doing something rather complicated--I'm really not sure how the
text can assume existing attributes unless you are pasting a few sentences
into an existing paragraph. I think this might depend on the formatting
that is wanted and the exact process.
I think that has more to do with the original text brought into
TextEdit. I've never had that problem unless I was bringing in text
copied from an email.
 
Daiya Mitchell

I think that has more to do with the original text brought into
TextEdit. I've never had that problem unless I was bringing in text
copied from an email.

I saved as Text Only out of Word, opened in TextEdit, then copy and pasted.
Every paragraph ended with a line break. Not that it matters, since I have a
workflow that works for me.

Re your problem, Kurt:
I wonder whether formatting the doc in Word, and then bringing it over might
be an efficient approach? Figure out what GL will bring over as you want it,
and use that formatting, copy directly from Word, then finish off the
formatting in GL. (I've kinda lost track of this thread, but it doesn't
seem to have been resolved)

All formatting in Word set to Normal style and no character formatting
should just come over as a plain paragraph, and it's pretty easy to wipe a
Word doc down to that. What types of CSS formatting are you trying to get
the text to automatically pick up? I'm having a hard time picturing what you
wrote below.

Although, once a workflow is decided upon, it should be easy to automate
whether it's Word or TextEdit, but I would assume using one less program
would be slightly simpler.

Daiya
 
Daiya Mitchell

Am I the only one still lamenting that MS should have stopped at Word 5?
Hell no! Fond memories of 5 are a prerequisite for posting here.

I don't have any. :)
Word has completely lost its way. Sadly, I don't think there is any
going back. The whole point of Office on Mac is to permit interworking
with Windows versions.

Yes, very true. And not necessarily a bad thing, since it helps keep the Mac
platform strong and since so many companies are offering alternatives for
those who don't need to exchange with Windows versions.

Daiya
 
