Untidy Text

Uddhava

Dear all,

I have MS Word 2002. Tidy text means one space between each word like this
across the length of a line. However when I copy
text from a web page it sometimes
comes out like this, i.e. with multiple spaces between words
and
short lines. Or this might happen when
the text I am copying (for example here:
http://ccbs.ntu.edu.tw/FULLTEXT/JR-PHIL/inada4.htm) has short lines
or blank
space at the
start of a line (indentation).

Is there some way to correct this? What I want the text to do is to run
together, i.e. to have one space between each word, cut out the empty space
and fill the length of the line. Is there some way to define these conditions
for a block of text? I have tried a few things but with no success, so I am
having to do corrections manually, which in a long document takes forever.
Any suggestions?

Thank you
 
Uddhava

Dear all,

X = space between words

When IXXXXXX posted XXthe XXaboveXX I XXXXadded XXXXXXextra XXXspaces
XXbetweenXXXX words to illustrate my problem. Ironically, the forum software
detected these unnatural spaces and automatically deleted them, i.e. tidied up
my text. This is just what I am trying to achieve in MS Word.
 
Suzanne S. Barnhill

I'm seeing multiple spaces in your original post. I assume the problem is
not just that the lines are justified. You can remove excess spaces by
searching for " {2,}" (without the quotation marks; note the space before
the opening brace) and replacing with a single space.
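
For anyone who would rather tidy the text outside Word, the same search can be sketched in Python. This is only an illustration, not part of the advice above: the function name collapse_spaces and the sample string are made up, and Python's regular expressions are not the same thing as Word's wildcards, although the pattern " {2,}" (a space, then {2,}) means "two or more spaces" in both.

```python
import re

def collapse_spaces(text: str) -> str:
    """Replace every run of two or more spaces with a single space."""
    # " {2,}" = a space repeated at least twice (note the leading space).
    return re.sub(r" {2,}", " ", text)

sample = "When I   posted  the    above"
print(collapse_spaces(sample))
```

Single spaces are left alone, so running it twice changes nothing further.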
 
Uddhava

Suzanne S. Barnhill said:
You can remove excess spaces by
searching for " {2,}" (without the quotation marks; note the space before
the opening brace) and replacing with a single space.


Dear Suzanne,

Thanks for your reply.
You wrote: "I assume the problem is not just that the lines are justified."

Well it is sometimes something to do with justifying in the sense that I am
trying to undo the justifying embedded in the original text. For example take
the first paragraph of the link that I mentioned before -
http://ccbs.ntu.edu.tw/FULLTEXT/JR-PHIL/inada4.htm
This is written in short lines and the multiple spaces are there in the
process of justifying it, but the problem is that this formatting (short
lines and multiple spaces) which I don't want is carried into Word. I can
manage to lose the indentation by pushing the whole block of text to the left
but I am still stuck with the horrible short lines and multiple spaces.

Anyway I tried your {2,}
space, curly bracket, 2, comma, curly bracket
but it doesn't seem to work - I get '0 replacements were made' :(
 
RobertVA

Uddhava said:
Dear Suzanne,

Thanks for your reply.


Well it is sometimes something to do with justifying in the sense that I am
trying to undo the justifying embedded in the original text. For example take
the first paragraph of the link that I mentioned before -
http://ccbs.ntu.edu.tw/FULLTEXT/JR-PHIL/inada4.htm
This is written in short lines and the multiple spaces are there in the
process of justifying it, but the problem is that this formatting (short
lines and multiple spaces) which I don't want is carried into Word. I can
manage to lose the indentation by pushing the whole block of text to the left
but I am still stuck with the horrible short lines and multiple spaces.

Anyway I tried your {2,}
space, curly bracket, 2, comma, curly bracket
but it doesn't seem to work - I get '0 replacements were made' :(

You ARE going to get large gaps between words if you use full
justification on relatively narrow columns. That's the way it makes the
last word even with the right margin. The effect is particularly
noticeable if there is a long first word on the next line. If you don't
like the gaps you have to switch to Left Justify and put up with the
jagged right edge.
 
Uddhava

RobertVA said:
You ARE going to get large gaps between words if you use full
justification on relatively narrow columns. That's the way it makes the
last word even with the right margin. The effect is particularly
noticeable if there is a long first word on the next line. If you don't
like the gaps you have to switch to Left Justify and put up with the
jagged right edge.


Dear Robert,

Yes I understand why the multiple spaces are there in the original text
(written by someone else) but the question is how to get rid of them - the
align left button will get rid of the multiple spaces when the text is
originated by me but doesn't seem to work with text pasted from a web page.
Anyway there is still the problem of short lines. When I write a Word doc,
the right indent marker is at say 15 but some text pasted from the web
ignores my marker at 15 and stops short at say 11.
 
Uddhava

Suzanne S. Barnhill said:
Sorry, I forgot to mention that you have to check "Use wildcards" in the
Replace dialog.

OK using the first paragraph of my link above I get 44 replacements - however
all this achieves is to shift the spaces around, I still have numerous
multiple spaces but in a different arrangement :(
 
RobertVA

Uddhava said:
Dear Robert,

Yes I understand why the multiple spaces are there in the original text
(written by someone else) but the question is how to get rid of them - the
align left button will get rid of the multiple spaces when the text is
originated by me but doesn't seem to work with text pasted from a web page.
Anyway there is still the problem of short lines. When I write a Word doc,
the right indent marker is at say 15 but some text pasted from the web
ignores my marker at 15 and stops short at say 11.

"...my marker at 15..." You are either using a VERY wide page or metric
measurements in your ruler. Unless the text contains some long words
justification typically works pretty well as long as the column is over
about 3 inches or 75 mm. Unless you use a really small font you will get
some obvious gaps with narrower columns. The text won't usually extend
to the right margin where there's a typed return. You can see spaces and
typed returns by clicking the Paragraph symbol on the tool bar (looks
like a backwards "P" with two vertical lines). With that function on
spaces will look like periods.

There are multiple ways of adding spaces in web pages, and the results
you get from a cut and paste may vary with how the page author achieved
the spaces. If the page source contains multiple spaces between words or
even returns your browser will normally compress the spaces to a single
space. The page author could potentially override this behavior in a
couple of ways, but I'm not sure you want an HTML lesson.

Some people I'm acquainted with clear the formatting from text by
pasting it to a Notepad document, copying the text to the clipboard
again from the Notepad document and THEN pasting the text in Word. This
will eliminate things like justification, fonts and attributes like
italics and boldface. It probably won't eliminate actual multiple
consecutive spaces, tabs or typed returns.

I don't have the latest version of Word (I have Word 97) BUT I have a
"Paste Special..." option on my "Edit" menu. The pop-up that appears
when I click that option offers an ability to "Paste as unformatted
text". Again this should eliminate everything but multiple spaces, tabs
and typed returns. Maybe later versions of Word have the same item on
their "Edit" menu. In Word 97 I would eliminate multiple consecutive
spaces by typing "^w" (without quotes - "w" stands for "White space") in
the "Find What:" field of the "Replace" pop-up, and " " (a single space)
in the "Replace with:" field. For tabs entering "^t" in the "Find what:"
field should work (do this before "^w").

When there are typed returns within what I believe should be paragraphs
I replace the returns I want with an unusual character like a tilde (~)
or a pipe (|). I then use Word's "Replace" function to change typed
returns to a space, replace white space with a single space and THEN
replace my chosen unusual character with returns.
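
The placeholder trick in the last paragraph can also be sketched in Python, for text that has been saved out of Word. This is a rough sketch under stated assumptions: the name reflow is made up, the tilde is assumed not to occur in the text, and a blank line is assumed to mark each return worth keeping (in Word those returns are picked by hand).

```python
import re

PLACEHOLDER = "~"  # assumed not to occur anywhere in the text

def reflow(text: str) -> str:
    """Join short lines into paragraphs, keeping blank-line paragraph breaks."""
    if PLACEHOLDER in text:
        raise ValueError("placeholder character already occurs in the text")
    # 1. Mark the returns worth keeping (a blank line between paragraphs).
    marked = re.sub(r"\n[ \t]*\n\s*", PLACEHOLDER, text)
    # 2. Turn every remaining typed return into a space.
    joined = marked.replace("\n", " ")
    # 3. Collapse any run of spaces or tabs into a single space.
    collapsed = re.sub(r"[ \t]+", " ", joined)
    # 4. Put the kept returns back, trimming spaces left beside them.
    parts = [p.strip() for p in collapsed.split(PLACEHOLDER)]
    return "\n\n".join(p for p in parts if p)

sample = "short   lines\n   and\nmultiple  spaces\n\nsecond\nparagraph"
print(reflow(sample))
```

The order matters for the same reason it does in Word: the returns have to become spaces before the white space is collapsed, otherwise the join can leave doubled spaces behind.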
 
Doug Robbins - Word MVP

If you turn on the display of paragraph marks by clicking on the ¶ button,
you will probably only see one dot between each word meaning that there is
only one space between each word. What appears to be additional spaces, but
is not multiple spaces, is caused by the text being justified as has been
mentioned earlier in this thread.

--
Hope this helps.

Please reply to the newsgroup unless you wish to avail yourself of my
services on a paid consulting basis.

Doug Robbins - Word MVP
 
Uddhava

Doug Robbins - Word MVP said:
If you turn on the display of paragraph marks by clicking on the ¶ button,
you will probably only see one dot between each word meaning that there is
only one space between each word.

Dear Doug,

No there are multiple dots between the words - if anyone wants to see what I
mean please try the first paragraph of this link
http://ccbs.ntu.edu.tw/FULLTEXT/JR-PHIL/inada4.htm as a random example of web
text.

When you paste something you get the 'paste options' icon and I choose
'keep text only' but in fact I don't get the text only but also the original
formatting which I don't want, namely short lines and the multiple spaces
that go with the short lines in the original justifying.
 
Charles Kenyon

This is a separate issue. While paste unformatted text will give you no
formatting from the original, it will retain spaces between words and line
or paragraph breaks. Those spaces include spaces which were used in the
original instead of tabs. See
http://word.mvps.org/FAQs/Formatting/CleanWebText.htm.

Did you look at the page Doug referred you to? The multiple dots are
multiple spaces. They are what happens when one attempts to use a computer
like a typewriter.
http://www.shaunakelly.com/word/concepts/introduction/index.html
--
Charles Kenyon

Word New User FAQ & Web Directory: http://addbalance.com/word

Intermediate User's Guide to Microsoft Word (supplemented version of
Microsoft's Legal Users' Guide) http://addbalance.com/usersguide




--------- --------- --------- --------- --------- ---------
This message is posted to a newsgroup. Please post replies
and questions to the newsgroup so that others can learn
from my ignorance and your wisdom.
 
Uddhava

Charles Kenyon said:
This is a separate issue. While paste unformatted text will give you no
formatting from the original, it will retain spaces between words and line
or paragraph breaks. Those spaces include spaces which were used in the
original instead of tabs. See
http://word.mvps.org/FAQs/Formatting/CleanWebText.htm.

Did you look at the page Doug referred you to? The multiple dots are
multiple spaces. They are what happens when one attempts to use a computer
like a typewriter.
http://www.shaunakelly.com/word/concepts/introduction/index.html

Dear Charles,

I don't think Doug mentioned a page. Anyway the ¶ are turned on and the web
text has multiple dots, i.e. multiple spaces between words, so I guess I need
some kind of command to say 'replace all multiple spaces with a single
space'. I think this is what Suzanne's idea was trying to do except it
doesn't seem to work.

Re the short lines, I can use Replace ^p with nothing. This will stretch out
the short lines to my right indent marker (say 15 instead of 11). However on
the down side, this causes numerous words to merge together without a space.
 
Charles Kenyon

Replace ^p with a space then replace two spaces with one space. This will
take care of everything except a single space at the beginning of a
paragraph.

http://word.mvps.org/FAQs/Formatting/NonPrintChars.htm
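
As a rough Python stand-in for those two replacements (the name join_and_tidy is made up, and a newline plays the part of ^p), the sketch below shows one thing worth knowing: a single pass of "two spaces to one" only halves a longer run, so four spaces become two, which is why the step may need repeating until nothing is left to replace.

```python
def join_and_tidy(text: str) -> str:
    """Replace each return with a space, then squeeze double spaces."""
    # Step 1: every paragraph mark (a newline here) becomes a space.
    text = text.replace("\n", " ")
    # Step 2: replace two spaces with one, repeating until no pair remains.
    while "  " in text:
        text = text.replace("  ", " ")
    return text

# One pass alone leaves a double space behind in a run of four.
print(repr("a    b".replace("  ", " ")))
print(repr(join_and_tidy(" one\n  two   three\nfour")))
```

The single space at the very start of the text survives, which is the one leftover mentioned above.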

 
Uddhava

Suzanne S. Barnhill said:
As I posted later, the search term I gave you *does* work provided you have
"Use wildcards" checked. See
http://word.mvps.org/FAQs/Formatting/CleanWebText.htm for more on taking
care of short lines.

Dear Suzanne,

Yes you are right it does work perfectly! I think my problem before was that
I had stray spaces in the 'replace with' box.

Anyway I am using first
replace ^p with (space)
and then, with "Use wildcards" checked,
replace (space){2,} with (space)

This is doing a great job on the text - giving me long lines and no multiple
spaces - all that remains is to re-insert the paragraph breaks = easy peasy.
Thanks v much to all for your help. :)
 