Strip or Convert HTML

D

Djmask

I need to convert or remove html tags in excel fields. I found a
suggestion here to copy the data into notepad and save as .html. This
works great however is not practical as I numerous files. Could this
possible done with VBA? Thanks for any help. I have been searching the
internet for days now trying to figure this out.

scott
 
B

Bob Phillips

This might help.

Run it a few times to pick up any strays

Dim cell As Range
Dim ipos As Long
Dim iEnd As Long

For Each cell In Selection
ipos = InStr(1, cell.Value, "<")
If ipos > 0 Then
For iEnd = ipos + 1 To Len(cell.Value)
If Mid(cell.Value, iEnd, 1) = ">" Then
Exit For
End If
Next iEnd
If ipos > 1 Then
cell.Value = Left(cell.Value, ipos - 1) & Right(cell.Value,
Len(cell.Value) - iEnd)
Else
cell.Value = Right(cell.Value, Len(cell.Value) - iEnd)
End If
End If
Next cell
 
D

David McRitchie

Hi Scott,
after running Bob's macro you might also want to run
(or incorporate macro into Bob's) the TRIMALL macro
http://www.mvps.org/dmcritchie/excel/join.htm#trimall

and also include replacements for &amp as &
and &nbsp; as a space

Curious how you ended up with HTML code within Excel.
---
HTH,
David McRitchie, Microsoft MVP - Excel [site changed Nov. 2001]
My Excel Pages: http://www.mvps.org/dmcritchie/excel/excel.htm
Search Page: http://www.mvps.org/dmcritchie/excel/search.htm

Bob Phillips said:
This might help.

Run it a few times to pick up any strays

Dim cell As Range
Dim ipos As Long
Dim iEnd As Long

For Each cell In Selection
ipos = InStr(1, cell.Value, "<")
If ipos > 0 Then
For iEnd = ipos + 1 To Len(cell.Value)
If Mid(cell.Value, iEnd, 1) = ">" Then
Exit For
End If
Next iEnd
If ipos > 1 Then
cell.Value = Left(cell.Value, ipos - 1) & Right(cell.Value,
Len(cell.Value) - iEnd)
Else
cell.Value = Right(cell.Value, Len(cell.Value) - iEnd)
End If
End If
Next cell
 
D

David McRitchie

Hi Scott,
after running Bob's macro you might also want to run
(or incorporate macro into Bob's) the TRIMALL macro
http://www.mvps.org/dmcritchie/excel/join.htm#trimall

to see if I missed something but

actually you might want to include this within Bob's
remember VBA is case sensitive
Selection.Replace What:=Chr(160), Replacement:=Chr(32), _
LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False Selection.Replace What:="&nbsp;"), Replacement:=Chr(32), _
LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Selection.Replace What:="&amp;"), Replacement:="&", _
LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Selection.Replace What:="<br>", Replacement:=Chr(10), _
LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False


Curious how you ended up with HTML code within Excel.
---
HTH,
David McRitchie, Microsoft MVP - Excel [site changed Nov. 2001]
My Excel Pages: http://www.mvps.org/dmcritchie/excel/excel.htm
Search Page: http://www.mvps.org/dmcritchie/excel/search.htm

Bob Phillips said:
This might help.

Run it a few times to pick up any strays

Dim cell As Range
Dim ipos As Long
Dim iEnd As Long

For Each cell In Selection
ipos = InStr(1, cell.Value, "<")
If ipos > 0 Then
For iEnd = ipos + 1 To Len(cell.Value)
If Mid(cell.Value, iEnd, 1) = ">" Then
Exit For
End If
Next iEnd
If ipos > 1 Then
cell.Value = Left(cell.Value, ipos - 1) & Right(cell.Value,
Len(cell.Value) - iEnd)
Else
cell.Value = Right(cell.Value, Len(cell.Value) - iEnd)
End If
End If
Next cell
 
B

Bob Phillips

It is odd isn't it? I assumed he had imported it as a text file.

Bob

David McRitchie said:
Hi Scott,
after running Bob's macro you might also want to run
(or incorporate macro into Bob's) the TRIMALL macro
http://www.mvps.org/dmcritchie/excel/join.htm#trimall

and also include replacements for &amp as &
and &nbsp; as a space

Curious how you ended up with HTML code within Excel.
---
HTH,
David McRitchie, Microsoft MVP - Excel [site changed Nov. 2001]
My Excel Pages: http://www.mvps.org/dmcritchie/excel/excel.htm
Search Page: http://www.mvps.org/dmcritchie/excel/search.htm

This might help.

Run it a few times to pick up any strays

Dim cell As Range
Dim ipos As Long
Dim iEnd As Long

For Each cell In Selection
ipos = InStr(1, cell.Value, "<")
If ipos > 0 Then
For iEnd = ipos + 1 To Len(cell.Value)
If Mid(cell.Value, iEnd, 1) = ">" Then
Exit For
End If
Next iEnd
If ipos > 1 Then
cell.Value = Left(cell.Value, ipos - 1) & Right(cell.Value,
Len(cell.Value) - iEnd)
Else
cell.Value = Right(cell.Value, Len(cell.Value) - iEnd)
End If
End If
Next cell
 
D

David McRitchie

Hi Scott,
Perhaps you could post a small sample of say 5 rows x 25 characters
of what you started with in the way of source HTML. What you see in
Excel and what you wanted in Excel.

If you were dealing strictly with source HTML and not Excel the following
would fail if you tried to strip everything between < > on a line by line
basis.
see <a href="#abc" title="def ghi
jkl mno"> fails since < and > are not matched on same line, you get bad results.

In the original question wouldn't you save as text.

If you want to extract some wording from HTML document from a
browser, You could use PureText to remove all formatting and just
have text -- same result of pasting to notepad, and then copying from
notepad.

PureText Home Page, strips all formatting out of the clipboard so you can copy from and HTML display and paste plain text. Invoke
with Window+V shortcut or with [PT] on taskbar next to clock.
http://stevemiller.net/puretext/

If you want do edit the HTML source HTML-Kit with Tidy provides a means of
stripping out unwanted HTML if you get into a bit more than I have. .
 
Top