Deleting text in bookmarks very slow...

Alex

Hello,

I've run into a problem in my C# add-in for Word.

One of the operations the add-in has to perform is going over all the bookmarks in a document and removing each bookmark and its content if certain conditions are true.
Originally, I tried the following:

bookmark.Range.Text = "";

Unfortunately, it did not work correctly in all cases (for example, if the bookmark spanned table cells).
So I used the following:

Microsoft.Office.Interop.Word.Range range = bookmark.Range;
bookmark.Delete();
range.Delete(ref missing, ref missing);

where "missing" was initialized as follows:

object missing = System.Type.Missing;

That works correctly but, according to the profiler, is 4-5 times slower!

Since this operation is performed in a tight loop, on a document with a lot of bookmarks that need to be deleted, this slowdown is VERY noticeable.

The breakdown of the timing between the 3 lines is roughly:

Microsoft.Office.Interop.Word.Range range = bookmark.Range; // 3.6%
bookmark.Delete(); // 13.8%
range.Delete(ref missing, ref missing); // 82.6%

So the culprit is the range.Delete() operation.

I really need to speed this up!

Any help is appreciated.

Thanks,
Alex.
 
Peter Huang [MSFT]

Hi

I think you could take a look at the Application.Run method, which runs
a VBA macro inside the Office process and may speed up range.Delete.

HOW TO: Run Office Macros by Using Automation from Visual C# .NET
http://support.microsoft.com/?kbid=306683

You may want to give it a try.

Also, I think your approach to deleting a bookmark is correct. A bookmark
can be considered a position, so deleting a bookmark should not delete
its content, just as in Word's UI.

So we need two steps: first delete the bookmark, then delete the
content.

Best regards,

Peter Huang
Microsoft Online Partner Support

Get Secure! - www.microsoft.com/security
This posting is provided "AS IS" with no warranties, and confers no rights.
 
Alex

"Peter Huang" said:
Hi

I think you could take a look at the Application.Run method, which runs
a VBA macro inside the Office process and may speed up range.Delete.

HOW TO: Run Office Macros by Using Automation from Visual C# .NET
http://support.microsoft.com/?kbid=306683

You may want to give it a try.

This is interesting but I will need to somehow inject the macros into the document.
Can I do it programmatically?
 
Alex

"Peter Huang" said:
Hi

I think you could take a look at the Application.Run method, which runs
a VBA macro inside the Office process and may speed up range.Delete.

HOW TO: Run Office Macros by Using Automation from Visual C# .NET
http://support.microsoft.com/?kbid=306683

You may want to give it a try.

I did.

The C# code was:

System.Type appType = typeof(Microsoft.Office.Interop.Word.Application);
// args[0] is the macro name; args[1] is filled in for each bookmark.
object[] args = new object[2] { "MyMacro", null };
foreach (Microsoft.Office.Interop.Word.Bookmark bookmark in myWordApp.ActiveDocument.Bookmarks)
{
    if (/* bookmark satisfies condition */)
    {
        args[1] = bookmark;
        // Late-bound call to Application.Run("MyMacro", bookmark).
        appType.InvokeMember(
            "Run",
            System.Reflection.BindingFlags.Default | System.Reflection.BindingFlags.InvokeMethod,
            null, myWordApp, args);
    }
}

The Macro was:

Option Explicit
Sub MyMacro(bm As Bookmark)
    Dim rng As Range
    Set rng = bm.Range
    bm.Delete
    rng.Delete
End Sub

The net result: 3.5% slower on average.
Plus Word was hanging on exit.

So, to make a long story short, it did not help.
 
Peter Huang [MSFT]

Hi

Since a Word macro runs inside the Word process, VBA can be considered
Word's intrinsic language. If the macro is not faster either, I think
that may simply be Word's nature. Word (like the other Office products) is
designed as a desktop product aimed at interactive use by an end user;
using it as a server is not where its strengths lie.

You may also want to contact MSPSS to see if they have other ideas about
your performance issue.
http://support.microsoft.com

If you still have any concerns, please feel free to post here.

Thanks for your understanding!


Best regards,

Peter Huang
Microsoft Online Partner Support

 
Cindy M -WordMVP-

Hi Alex,
Since this operation is performed in a tight loop, on a document with a lot
of bookmarks that need to be deleted, this slowdown is VERY noticeable.
The breakdown of the timing between the 3 lines is roughly:

Microsoft.Office.Interop.Word.Range range = bookmark.Range; // 3.6%
bookmark.Delete(); // 13.8%
range.Delete(ref missing, ref missing); // 82.6%

So the culprit is the range.Delete() operation.

Hmmm. When you delete a range, if you (or Word) are tracking any other range
objects, all these object references need to be "updated" so that you don't
lose the text to which they're pointing. And for each bookmark, Word probably
has to start calculating again from the beginning of the document to figure out
exactly which characters are the range.

The only suggestion I can make is, try looping another way. Sometimes, when
Word gives us problems, it makes sense to approach it "upside down". Rather
than For...Each (in my pseudo C# syntax):
int nrBkm = doc.Bookmarks.Count;
for (int counter = nrBkm; counter >= 1; counter--)
{
    object index = counter; // Word collections are 1-based
    doc.Bookmarks.get_Item(ref index).Range.Delete(ref missing, ref missing);
    doc.Bookmarks.get_Item(ref index).Delete();
}

Note that you may first need to check for the existence of this bookmark
(bookmarks.exists) since deleting the range may well delete the bookmark.

Cindy Meister
INTER-Solutions, Switzerland
http://homepage.swissonline.ch/cindymeister (last update Jun 8 2004)
http://www.word.mvps.org

This reply is posted in the Newsgroup; please post any follow question or reply
in the newsgroup and not by e-mail :)
 
Alex

Hello Cindy,
Cindy M -WordMVP- said:
Hmmm. When you delete a range, if you (or Word) are tracking any other range
objects, all these object references need to be "updated" so that you don't
lose the text to which they're pointing. And for each bookmark, Word probably
has to start calculating again from the beginning of the document to figure out
exactly which characters are the range.

That sounds reasonable.
However, as I wrote in the original post, I was previously deleting the bookmarks and content using
bookmark.Range.Text = "";

That should also have caused Word to update the other references but was 4-5 times faster.
(Unfortunately, I could not use it because it did not play nicely with tables.)

The only suggestion I can make is, try looping another way. Sometimes, when
Word gives us problems, it makes sense to approach it "upside down". Rather
than For...Each (in my pseudo C# syntax):

That is an interesting suggestion. I shall try it.
By the way, feel free to post your suggestion in the syntax you're most comfortable with.
Unless there will be significant functional differences between VB and C#, I'll figure it out :)
int nrBkm = doc.Bookmarks.Count;
for (int counter = nrBkm; counter >= 1; counter--)
{
    object index = counter; // Word collections are 1-based
    doc.Bookmarks.get_Item(ref index).Range.Delete(ref missing, ref missing);
    doc.Bookmarks.get_Item(ref index).Delete();
}

Note that you may first need to check for the existence of this bookmark
(bookmarks.exists) since deleting the range may well delete the bookmark.

That is a problem that I struggled with earlier.
According to my tests, deleting the range *sometimes* (not always) deletes the bookmark.
If it does, any access to the bookmark throws an exception.
This is problematic because it seems to me that there is a significant overhead in handling .NET exceptions.

My solution was to save the range in a variable, delete the bookmark first and then delete the saved range.

Best wishes,
Alex.
 
Alex

Followup:

My latest optimization follows:

using System;
using System.Collections;
using System.Runtime.InteropServices;
using Word = Microsoft.Office.Interop.Word;

Word.Document doc = myWordApp.ActiveDocument;
Word.Range[] ranges = new Word.Range[doc.Bookmarks.Count];
int numRanges = 0;
foreach (Word.Bookmark bm in doc.Bookmarks)
{
    if (/* bm satisfies condition */)
    {
        // Save the range before deleting the bookmark itself.
        ranges[numRanges++] = bm.Range;
        bm.Delete(); // [2]
    }
}

// Order the saved ranges from the end of the document to the beginning.
Array.Sort(ranges, 0, numRanges, new RangeComparer());
for (int i = 0; i < numRanges; ++i)
{
    Word.Range range = ranges[i];
    if (range.Start != range.End)
        range.Delete(ref missing, ref missing); // [1]
    Marshal.ReleaseComObject(range);
}

Where:

private class RangeComparer: IComparer
{
    // Orders ranges by End descending; ties are broken by Start ascending.
    public int Compare(object x, object y)
    {
        Word.Range left = (Word.Range) x;
        Word.Range right = (Word.Range) y;
        int left_end = left.End;
        int right_end = right.End;
        return left_end == right_end ? left.Start - right.Start : right_end - left_end;
    }
}

The sorting improved the timing slightly.

Anyway, this is the best I could come up with.

[1] This is still the biggest time consumer.

[2] This line also consumes a lot of time, which is strange since it only removes the bookmarks,
not the content. Weird...
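
A possible refinement of the sort step, offered as a sketch rather than a measured result: the comparer above reads End (and sometimes Start) on every comparison, and each of those reads goes through COM interop. Reading the positions once per range and sorting on the cached values keeps the number of interop calls proportional to the number of ranges. The helper below is hypothetical (`SortByCachedKeys` and `SortBackToFront` are names made up for illustration) and is written generically so it can be tried without Word.

```csharp
using System;
using System.Collections.Generic;

public static class SortByCachedKeys
{
    // Sorts the first `count` items so that a larger End comes first and,
    // for equal End, a smaller Start comes first (same order as RangeComparer).
    // getStart and getEnd are each called exactly once per item.
    public static void SortBackToFront<T>(
        T[] items, int count, Func<T, int> getStart, Func<T, int> getEnd)
    {
        // Key = End, Value = Start, captured once up front.
        var keys = new KeyValuePair<int, int>[count];
        for (int i = 0; i < count; i++)
        {
            keys[i] = new KeyValuePair<int, int>(getEnd(items[i]), getStart(items[i]));
        }

        // Array.Sort(keys, items, ...) reorders items to follow the sorted keys.
        Array.Sort(keys, items, 0, count,
            Comparer<KeyValuePair<int, int>>.Create(
                (a, b) => a.Key == b.Key
                    ? a.Value.CompareTo(b.Value)
                    : b.Key.CompareTo(a.Key)));
    }
}
```

With the code above, the call would look like `SortByCachedKeys.SortBackToFront(ranges, numRanges, r => r.Start, r => r.End);` in place of the `Array.Sort` line. Whether it makes a noticeable difference would have to be profiled.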


Best wishes,
Alex.
 
Peter Huang [MSFT]

Hi Alex,

Thanks for sharing the code.
As I suggested before, if you are concerned about the performance issue, I
suggest you contact MSPSS directly.
http://support.microsoft.com

If you still have any other concerns, please feel free to post here.



Best regards,

Peter Huang
Microsoft Online Partner Support

 
Cindy M -WordMVP-

Hi Alex,
However, as I wrote in the original post, I was previously deleting the bookmarks and content using
bookmark.Range.Text = "";

That should also have caused Word to update the other references but was 4-5 times faster.
(Unfortunately, I could not use it because it did not play nicely with tables.)

Yes... But you aren't "fiddling" with a collection of Ranges in this case.

That is a problem that I struggled with earlier.
According to my tests, deleting the range *sometimes* (not always) deletes the bookmark.
If it does, any access to the bookmark throws an exception.

But as I said, you can check directly whether a bookmark exists; there's actually a
method for it that returns true/false:
doc.Bookmarks.Exists(sBookmarkName)

So no need to go with the overhead of an exception, or looping through the collection to
determine if it's there.

Cindy Meister
 
Cindy M -WordMVP-

Hi Alex,
[1] This is still the biggest time consumer.

Yes, but since you can check whether or not a bookmark exists, I'd still try
looping through the bookmark ranges, deleting them as you go, then check if the
bookmark still exists and delete it, if that's the case. My gut feeling is that
this would be fastest.

[2] This line also consumes a lot of time, which is strange since it only removes the bookmarks,
not the content. Weird...

As I said, try looping through backwards in a For...Next (no Each).

Cindy Meister
 
Alex

Hi Cindy,

Cindy M -WordMVP- said:
Hi Alex,

Yes... But you aren't "fiddling" with a collection of Ranges in this case.

Sorry?
Why does bookmark.Range.Delete "fiddle with a collection of Ranges" when bookmark.Range.Text = "" does not?

But as I said, you can check directly whether a bookmark exists; there's actually a
method for it that returns true/false:
doc.Bookmarks.Exists(sBookmarkName)

I haven't timed it but I doubt that it's free.


Best wishes,
Alex.
 
Alex

Hi Cindy,

Cindy M -WordMVP- said:
Hi Alex,
[2] This line also consumes a lot of time, which is strange since it only removes the bookmarks,
not the content. Weird...
As I said, try looping through backwards in a For...Next (no Each).

Cindy, I am not sure what it will buy me.

Please work with me on this one:

Your original reply:
http://groups.google.ca/group/microsoft.public.office.developer.com.add_ins/msg/5ad3824b78c75f56

If I understand you correctly, you said that removing a bookmark's *content* is a slow operation because the bookmarks and ranges that follow it have to be updated. Correct?

So, in order to minimize the updating, the order of removing the bookmarked elements should be from the end of the document to the beginning.
E.g., if I have added 4 bookmarks (bm1, bm2, bm3 and bm4) in the following positions: ---[bm3]---[bm1]---[bm4]---[bm2]---
I should remove bm2 first, then bm4, bm1 and finally bm3.

However, the ActiveDocument.Bookmarks collection holds the bookmarks in the order they were *added*, not by their position in the document.
I tested it with the following macro:
For i = ActiveDocument.Bookmarks.Count To 1 Step -1
MsgBox (ActiveDocument.Bookmarks(i))
Next

and the order returned was: bm4, bm3, bm2 and bm1.

Since the order of iterating through the bookmarks is dependent on the order of their creation and unrelated to their locations, I cannot see how reversing the iteration will speed it up.

What I did instead was collect the ranges of all the bookmarks into an array, then sort the array by position in the document.
That *did* speed it up somewhat.

I was also puzzled as to why the bookmark.Delete() operation would be slow, as it only deletes the actual bookmark, leaving the content intact, so it should not affect the layout of the document (and positions of any other ranges) at all.
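
The end-to-beginning ordering described above can be illustrated without Word at all. The following is a minimal sketch, assuming plain character offsets in a string stand in for Word ranges and that the spans do not overlap; `TextSpan` and `DeleteBackToFront` are hypothetical names used only for illustration. Because the span closest to the end is removed first, the offsets of the spans still waiting to be removed never need adjusting.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for a Word range: half-open offsets [Start, End).
public struct TextSpan
{
    public readonly int Start;
    public readonly int End;
    public TextSpan(int start, int end) { Start = start; End = end; }
}

public static class BackToFront
{
    // Removes every span from the text, working from the end of the text
    // toward the beginning. Assumes the spans do not overlap.
    public static string DeleteBackToFront(string text, IEnumerable<TextSpan> spans)
    {
        var ordered = new List<TextSpan>(spans);
        // Larger End first; ties broken by smaller Start.
        ordered.Sort((a, b) => a.End == b.End
            ? a.Start.CompareTo(b.Start)
            : b.End.CompareTo(a.End));

        var sb = new StringBuilder(text);
        foreach (TextSpan s in ordered)
        {
            // Everything before s.Start is untouched so far, so the
            // offsets of the remaining spans are still correct.
            sb.Remove(s.Start, s.End - s.Start);
        }
        return sb.ToString();
    }
}
```

For the layout `---[bm3]---[bm1]---[bm4]---[bm2]---`, passing the four spans in creation order (bm1 to bm4) removes them in the order bm2, bm4, bm1, bm3 and leaves only the dashes.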
 
Alex

Cindy M -WordMVP-

Hi Alex,
Found the answer:
http://groups.google.ca/group/microsoft.public.mac.office.word/msg/acf4ae50fd60410a
: in the case of a physical object such as a bookmark or a hyperlink,
: if you delete it, Word instantly renumbers them all.

Ah, interesting. Thanks for posting this :)

OTOH, as far as I'm aware, Word doesn't actually NUMBER either of these, any more
than any other object. However, in the case of bookmarks, I'm sure Word has to rebuild
two sets of indexes, one of which is the one alphabetized by name. That would
certainly take some time.

Cindy Meister
 
Alex

Hello Cindy,

Cindy M -WordMVP- said:
Hi Alex,

Ah, interesting. Thanks for posting this :)

Doing what I can to share the knowledge I stumble upon.

As you probably remember, I was a card carrying member of the "huh?" club myself not so long ago.
It is a little frustrating when one encounters a newsgroup full of problems and a disproportionately small number of solutions.

OTOH, as far as I'm aware, Word doesn't actually NUMBER either of these, any more
than any other object. However, in the case of bookmarks, I'm sure Word has to rebuild
two sets of indexes, one of which is the one alphabetized by name. That would
certainly take some time.

I guess he used "renumber" in the figurative sense.

However, that brings me back to the original issue:
Is there a way to temporarily suppress this processing in order to speed up the deletion of a large number of bookmarks?
 
Peter Huang [MSFT]

Hi Alex,

As far as I know, Word is designed as a desktop application that
interacts with an end user; it is generally not well suited to use as a
large batch document processor.

Also, if you do not get any other response, as I suggested before, you may
want to contact MSPSS directly.
http://support.microsoft.com

Thanks for your understanding!

Best regards,

Peter Huang
Microsoft Online Partner Support

 
Cindy M -WordMVP-

Hi Alex,
However, that brings me back to the original issue:
Is there a way to temporarily suppress this processing in order to speed up
the deletion of a large number of bookmarks?

No change to my answer, either :) Work from the back, forwards, and don't use
the form of the bookmarks collection that relies on the alphabetical list. That's
all you can do...
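
One way to read this advice, sketched in C# and not timed against the figures in this thread: walk the bookmarks by position, last to first. The sketch assumes that the Bookmarks collection of a Range (here `doc.Content.Bookmarks`) is ordered by location in the document, unlike `doc.Bookmarks`; that assumption should be checked against the Word object model documentation. `BookmarkCleanup`, `DeleteFromBack` and `shouldDelete` are hypothetical names, and the `get_Item(ref index)` call follows the interop style of the C# code elsewhere in this thread.

```csharp
using System;
using Word = Microsoft.Office.Interop.Word;

public static class BookmarkCleanup
{
    // Deletes matching bookmarks and their content, walking from the end of
    // the document toward the start. shouldDelete is a hypothetical stand-in
    // for the add-in's own condition.
    public static void DeleteFromBack(Word.Document doc, Predicate<Word.Bookmark> shouldDelete)
    {
        object missing = Type.Missing;

        // Assumption: Range.Bookmarks is ordered by position in the document.
        Word.Bookmarks byLocation = doc.Content.Bookmarks;

        for (int i = byLocation.Count; i >= 1; i--) // Word collections are 1-based
        {
            // Deleting content can remove other bookmarks too, so re-check the count.
            if (i > byLocation.Count)
                continue;

            object index = i;
            Word.Bookmark bm = byLocation.get_Item(ref index);
            if (!shouldDelete(bm))
                continue;

            // Save the range first, then delete the bookmark, then its content.
            Word.Range range = bm.Range;
            bm.Delete();
            if (range.Start != range.End)
                range.Delete(ref missing, ref missing);
        }
    }
}
```

Saving the range before deleting the bookmark mirrors Alex's approach and avoids both the exception and the Exists check discussed earlier.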

It's essential for Word to maintain the pointers on this collection
immediately, and cleanly, as bookmarks underlie a lot of Word's features (TOCs,
cross-references, links between files and OLE services, just to mention a few).

Can you remind me what you're using the bookmarks for? Maybe what we need to
look for is an alternative to using bookmarks...

Cindy Meister
 
Alex

Hi Cindy,

Cindy M -WordMVP- said:
It's essential for Word to maintain the pointers on this collection
immediately, and cleanly, as bookmarks underlie a lot of Word's features (TOCs,
cross-references, links between files and OLE services, just to mention a few).

Too bad I could not find a "delete all" operation.
Can you remind me what you're using the bookmarks for? Maybe what we need to
look for is an alternative to using bookmarks...

Adding persistent metadata to selected pieces of text in the document.
 
Cindy M -WordMVP-

Hi Alex,
Adding persistent metadata to selected pieces of text in the document.
It sorta depends on how volatile things could be (how great the danger is, that
someone will delete it), but have you ever considered using a SET field? It might
be faster to delete a SET field, than a bookmark. (Might not be, since a SET
field defines a bookmark, but you never know.) Or possibly an XE (Index marker)
field.

Cindy Meister
 
