Unable to Retrieve Complete Web Page

Alan

I am trying to completely retrieve a web page and search it using
VBScript regular expressions. However, I do not get the complete web
page.

Am I running into some VBA string length limit or what? Is there some
way around it?

My Sub may be found below.

Alan

Sub GetGoogleHomePage()

    Dim oIE As SHDocVw.InternetExplorer
    Dim sPage As String

    ' Create a new (hidden) instance of IE
    Set oIE = New SHDocVw.InternetExplorer

    ' Open the web page
    oIE.Navigate "http://www.google.com"

    ' Wait for the page to complete loading
    Do Until oIE.ReadyState = READYSTATE_COMPLETE
        DoEvents
    Loop

    ' Retrieve the text of the web page
    sPage = oIE.Document.body.InnerHTML

    ' Display the HTML
    Debug.Print sPage

End Sub
 
Leith Ross

Hello Alan,

Here is another method that overcomes the string/character limitations.
This uses the WinHTTP COM object to retrieve and store the web page's
source code into a text file (.txt). The file created is "C:\temp
URL.txt". You can change the path and file name to what you want.

================================
'Written: September 04, 2009
'Author: Leith Ross
'Summary: Saves a web page's source code to a text file.

Sub SaveServerDataAsFile()

    'Create an array to hold the response data.
    Dim d() As Byte
    Dim objReq As Object
    Dim FileNum As Integer

    'Try WinHTTP 5.1 first, then fall back to 5.0.
    On Error Resume Next
    Set objReq = CreateObject("WinHttp.WinHttpRequest.5.1")
    If objReq Is Nothing Then
        Set objReq = CreateObject("WinHttp.WinHttpRequest.5")
    End If
    Err.Clear
    On Error GoTo 0

    'Assemble an HTTP Request.
    objReq.Open "GET", "http://www.thecodecage.com/", False

    'Send the HTTP Request.
    objReq.Send

    'Show the status code and status text.
    MsgBox objReq.Status & " - " & objReq.StatusText

    'Put response data into a file.
    FileNum = FreeFile
    Open "C:\temp URL.txt" For Binary As #FileNum
    d() = objReq.ResponseBody
    Put #FileNum, 1, d()
    Close #FileNum

End Sub
===============================
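
If the goal is to search the page rather than keep a copy on disk, the same WinHTTP object can also hand the response back as a string through its ResponseText property. The following is only a sketch: GetPageText and ShowPageLength are hypothetical names, and it assumes the server returns text in an encoding WinHTTP can decode.

```vba
' Hypothetical helper (not part of the code above): returns the page
' source as a single string, or an empty string if the status is not 200.
Function GetPageText(ByVal sUrl As String) As String
    Dim objReq As Object

    Set objReq = CreateObject("WinHttp.WinHttpRequest.5.1")
    objReq.Open "GET", sUrl, False   ' False = synchronous request
    objReq.Send

    If objReq.Status = 200 Then
        GetPageText = objReq.ResponseText
    End If
End Function

Sub ShowPageLength()
    Dim sPage As String

    sPage = GetPageText("http://www.thecodecage.com/")
    ' Print the length rather than the text itself.
    Debug.Print "Characters retrieved: " & Len(sPage)
End Sub
```

Printing the length instead of the whole string is a quick way to see how much was actually retrieved.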

--
Sincerely,
Leith Ross

'The Code Cage' (http://www.thecodecage.com/)
 
Alan

Leith,
It seems like I will run into the same problem (too long)
or have problems with text broken over multiple lines when I read the
data back from the file.

But, I'll give this a try.

Could you please explain the "WinHttp.WinHttpRequest.5.1" vs.
"WinHttp.WinHttpRequest.5"?

Thanks, Alan
 
Leith Ross

Hello Alan,

Here is a link to a page that explains the differences in detail:

'WinHTTP Versions (Windows)'
(http://msdn.microsoft.com/en-us/library/aa384276(VS.85).aspx)

 
Alan

I do seem to have the same truncation problem when I read the file
back in with this code:

' Read each line of the file, looking for the description
Dim myFileName As String
Dim myLine As String
Dim FileNum As Long
Dim count As Long

myFileName = ThisWorkbook.Path & "\URL.txt"
FileNum = FreeFile
Open myFileName For Input As FileNum
count = 0
Do While Not EOF(FileNum)
    count = count + 1
    Line Input #FileNum, myLine
    Debug.Print myLine
    Debug.Print "===========================================" & vbCrLf
    Debug.Print count & vbCrLf
    Debug.Print vbCrLf & "===========================================" & vbCrLf
    myLine = ExtractCoDescr(myLine)
    If Len(myLine) > 0 Then
        GetCoDescription = myLine
        Exit Do
    End If
Loop
Close FileNum
 
Leith Ross

Hello Alan,

The code I wrote downloads the web page in binary format using unsigned
bytes. This is all stored in memory before being saved to a disk file
with a ".txt" extension. Web page size is limited only by available
memory. The advantage of binary is that all information is brought into
the file, not just text, and this could be what is causing your
truncation problems. The file reading method you are using expects the
file data to be in a specific format. What information are you trying to
locate or extract from the file?
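
One format detail that matters here: Line Input # only treats a carriage return or a carriage return/line feed pair as the end of a line, so a page saved with bare line feeds comes back as one very long line. A way to avoid depending on line endings at all is to read the whole file into a single string. This is a minimal sketch, with ReadWholeFile as a hypothetical helper name, and it assumes the file holds single-byte (ANSI/ASCII) text.

```vba
' Hypothetical helper: reads an entire file into one string,
' whatever line terminators the file uses.
Function ReadWholeFile(ByVal sPath As String) As String
    Dim iFile As Integer
    Dim sText As String

    iFile = FreeFile
    Open sPath For Binary Access Read As #iFile

    ' Size the buffer to the file length, then fill it in one read.
    sText = Space$(LOF(iFile))
    Get #iFile, , sText
    Close #iFile

    ReadWholeFile = sText
End Function
```

The returned string can then be searched as a whole, for example with InStr or a regular expression, instead of line by line.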

 
Alan

I am trying to extract text following a series of HTML tags and
keywords.

If you can explain how I get started on properly reading it, I
would appreciate it.

Alan
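
For readers with the same goal, this kind of extraction can be sketched with the VBScript regular expression object mentioned in the first post, run over the whole page held in one string. The tag (title) and the function name ExtractTitle are assumptions chosen for illustration; the real pattern would depend on the tags and keywords being searched for.

```vba
' Hypothetical helper: returns the text between the first <title> and
' </title> tags, or an empty string if no match is found.
Function ExtractTitle(ByVal sHtml As String) As String
    Dim oRegex As Object
    Dim oMatches As Object

    Set oRegex = CreateObject("VBScript.RegExp")
    ' [\s\S] is used instead of "." so the match can span line breaks.
    oRegex.Pattern = "<title[^>]*>([\s\S]*?)</title>"
    oRegex.IgnoreCase = True
    oRegex.Global = False

    Set oMatches = oRegex.Execute(sHtml)
    If oMatches.Count > 0 Then
        ExtractTitle = Trim$(oMatches(0).SubMatches(0))
    End If
End Function
```

Because the pattern does not rely on line structure, it behaves the same whether the text came from a file or straight from the HTTP response.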
 
Leith Ross

Hello Alan,

The easiest method would be to use Word as the file editor. You can
view the file best by going to View > Web Layout, and search using the
Find option.

 
Alan

Here is the code that worked:

Sub ReadWebFileTextStream()
    Dim fs As Object ' scripting.filesystemobject
    Dim txtin As Object ' scripting.textstream
    Dim strline As String

    Set fs = CreateObject("scripting.filesystemobject")
    ' 1 is for Reading
    Set txtin = fs.opentextfile(ThisWorkbook.Path & "\URL.txt", 1)

    Do While Not txtin.atendofstream
        strline = txtin.readline
        '
        ' Process data here . . .
        '
    Loop
    txtin.Close
    Set txtin = Nothing
    Set fs = Nothing
End Sub
 
Ron Rosenfeld

Am I running into some VBA string length limit or what? Is there some
way around it?

I believe there is a limit as to how much data the immediate window can
display.

However, I've had no problems parsing long documents using the innerHTML or
innerText property.
--ron
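
A quick way to check this is to inspect the string itself rather than what the Immediate window shows. The sketch below uses a hypothetical helper, DescribeText, that reports the length of a string and its last few characters, so the end of the page can be seen without printing all of it.

```vba
' Hypothetical helper: summarises a long string without printing all of it.
Function DescribeText(ByVal sText As String, _
                      Optional ByVal lTail As Long = 200) As String
    ' Right$ returns the whole string if it is shorter than lTail.
    DescribeText = "Length: " & Len(sText) & vbCrLf & _
                   "Ends with: " & Right$(sText, lTail)
End Function
```

For example, Debug.Print DescribeText(sPage) placed after sPage is assigned would show whether the closing tags of the page are present in the string.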
 
