Extract links and .asp file names from text

Maxi

I have a large amount of text in a Word file. I want to extract all the
links that start with http:// and all file names ending with the .asp
extension, and place them at the end of the document. Is this possible in
Word VBA?
 
Jezebel

You don't actually need VBA:

Method 1: use wildcard searching (tick "Use wildcards" in the Find dialog).
Search for http[! ]{1,} to find the links and [a-zA-Z]{1,}.asp to find the
file names. You could use this technique to apply a special format to the
matches, then delete everything that doesn't have that format. (You can also
use VBA for this, if you're extracting the items and doing something else
with them.)
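
For anyone who wants to check what those two patterns pick up before running them on a real document, here is a rough sketch in Python rather than Word. The regular expressions are only approximate translations of the Word wildcards (my assumption, not an exact mapping), and the function name and sample text are made up for illustration.

```python
import re

# Approximate regex equivalents of the Word wildcard patterns (assumed mapping):
#   http[! ]{1,}      -> http\S+           "http" followed by non-whitespace characters
#   [a-zA-Z]{1,}.asp  -> [A-Za-z]+\.asp\b  letters followed by a literal ".asp"
# The trailing \b is an addition that is not in the Word pattern; it stops
# the second pattern from matching the start of ".aspx".
LINK_RE = re.compile(r"http\S+")
ASP_RE = re.compile(r"[A-Za-z]+\.asp\b")

def extract_items(text):
    """Return (links, asp_names) found in text, in order of appearance."""
    links = LINK_RE.findall(text)
    asp_names = ASP_RE.findall(text)
    return links, asp_names

# Hypothetical sample text, only for demonstration.
sample = "See http://example.com/page for details, then open default.asp and login.asp."
links, asp_names = extract_items(sample)
print(links)
print(asp_names)
```

Note that, like the Word pattern, the letters-only class will miss file names containing digits or underscores, so widen it if your names need that.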

Method 2: convert all whitespace to paragraph marks, to turn the document
into a flat list of words. Copy and paste into Excel. Do a unique sort. All
your http lines will come together. Use the Find() function in the adjacent
column to find the cells containing ".asp".
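
The same steps (split on whitespace, unique sort, then filter) can be sketched outside Word and Excel. This is only an illustration in Python under my own assumptions; the function names and the sample text are hypothetical.

```python
def unique_sorted_words(text):
    """Split on whitespace, drop duplicates, and sort."""
    # str.split() with no arguments splits on any run of whitespace, which
    # plays the role of replacing whitespace with paragraph marks.
    return sorted(set(text.split()))

def pick_items(words):
    """Return (links, asp_names) from a list of words."""
    # Words starting with "http://" sort next to each other, as in the sorted list.
    links = [w for w in words if w.startswith("http://")]
    # Substring test, similar in spirit to using Find() on each cell.
    asp_names = [w for w in words if ".asp" in w]
    return links, asp_names

# Hypothetical sample text, only for demonstration.
sample = "open login.asp or http://example.com/a then login.asp again http://example.com/a"
words = unique_sorted_words(sample)
links, asp_names = pick_items(words)
print(words)
print(links)
print(asp_names)
```

Because the split is on whitespace only, any punctuation attached to a word (a trailing full stop or comma, say) stays attached, just as it would in the Word version.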
 
Maxi

I like Method 2. Can you tell me how to convert all whitespace to
paragraph marks? Is there any particular symbol that I can replace
using the Find/Replace option?
 
Jonathan West

Maxi said:
I like Method 2. Can you tell me how to convert all whitespace to
paragraph marks? Is there any particular symbol that I can replace
using the Find/Replace option?

Place ^w in the Find What box and ^p in the Replace With box. Click Replace
All.

In fact you don't even need to copy & paste into Excel. Select the entire
document, go to Table, Sort and sort the paragraphs there. (Yes, the
paragraphs aren't in a table, but they can still be sorted!)


--
Regards
Jonathan West - Word MVP
www.intelligentdocuments.co.uk
Please reply to the newsgroup
Keep your VBA code safe, sign the ClassicVB petition www.classicvb.org
 
Jezebel

Sorting in Excel has the advantage that you can eliminate the duplicates.
Also the .asp lines will be easier to find.
 
