HtmlZap sample program

This little code snippet opens all the files that match the filespec
Anybody who wonders why I would be doing things like this has never worked with HTML files created by Microsoft Word's "Save as HTML" command... <g>

Fetching web pages

HtmlZap users frequently ask me how to use the component to parse pages retrieved directly from the Internet. HtmlZap itself doesn't have this functionality: it can only parse HTML contained in a local disk file or a string. However, there are many components (commercial and otherwise) that can fetch web pages and return them in a form HtmlZap can use. The simplest approach may be to use the Microsoft Internet Transfer Control, which is included with Visual Studio. If you were to place an Internet Transfer Control named ITC on your form, you could write:

    HZ.LoadBuffer ITC.OpenURL("http://www.google.com/", icByteArray)

This will cause the Internet Transfer Control to connect to the target URL (Google's main page in the example), retrieve its contents as a Byte Array, then pass the array to the HtmlZap component named HZ.

If you'd rather not use the Internet Transfer Control, which doesn't work from scripting languages, you can try a freeware component like AspTear, though I've never used it myself. Several commercial IP component suites include tools that can retrieve pages over HTTP; I've had really good luck with the IP*Works suite from /n software.
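
To make the fetch-then-parse idea a little more concrete, here is a minimal sketch that lists the link targets on a fetched page. It assumes a form holding an Internet Transfer Control named ITC, an HtmlZap component named HZ, and a ListBox named lstLinks (the ListBox and the ListLinks routine are hypothetical names used only for illustration). It also assumes that the EOF/Next loop used for disk files works the same way after LoadBuffer, and that Param returns an empty string when a tag has no such attribute; neither is confirmed here, so check both against the HtmlZap documentation.

```vb
' Sketch only: ListLinks and lstLinks are illustrative names, not part of HtmlZap.
Private Sub ListLinks(ByVal Url As String)
    Dim href As String

    lstLinks.Clear

    ' Fetch the page as a Byte array and hand it straight to HtmlZap
    HZ.LoadBuffer ITC.OpenURL(Url, icByteArray)

    While Not HZ.EOF                        ' Walk every slice of the page
        If HZ.IsTag Then
            If LCase$(HZ.TagName) = "a" Then
                ' Assumption: Param returns "" when the attribute is missing
                href = HZ.Param("href")
                If Len(href) > 0 Then lstLinks.AddItem href
            End If
        End If
        HZ.Next                             ' Get the next slice
    Wend

    HZ.Reset                                ' Leave the parser ready for reuse
End Sub
```

Calling ListLinks "http://www.google.com/" from a button's Click handler would fill the list box. OpenURL blocks until the transfer finishes or times out, so a slow server will stall the form while the call runs.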

The source

    Private Sub Demo()
        '
        ' Parse the intro
        '
        Dim of As Integer
        Dim sf As String, pfn As String
        Dim pict As String
        Const SrcDir = "j:\intro\"              ' Source directory
        HZ.CompressWS = False                   ' Don't compress whitespace
        sf = Dir(SrcDir + "page*.htm")          ' Get all page*.htm files
        While sf <> ""
            HZ.Load SrcDir + sf                 ' Load a file
            of = FreeFile                       ' Open the copy
            Open SrcDir + "copy\" + sf For Output As #of
            While Not HZ.EOF                    ' Loop through the entire source file
                If HZ.IsTag Then                ' This is a tag
                    Select Case HZ.TagName      ' Which one?
                        Case "font", "/font", "p"
                            ' Do nothing... just remove these
                        Case "/p"
                            Print #of, "<p>";   ' Convert </p> to <p>
                        Case "img"
                            pfn = LCase$(HZ.Param("src"))
                            FileCopy SrcDir + pfn, "j:\intro\send\" + pfn
                            pict = SrcDir + pfn
                            ' Use an invisible picture control to get picture sizes
                            PicSizer.Picture = LoadPicture(pict)
                            Print #of, "<img src="""; pfn; """ width="; _
                                Format$(PicSizer.ScaleWidth); " height="; _
                                Format$(PicSizer.ScaleHeight); ">";
                            DoEvents
                        Case Else               ' All other tags
                            Print #of, "<"; HZ.ToString; ">";
                    End Select
                Else
                    Print #of, HZ.Text;         ' Just transcribe text unchanged
                End If
                HZ.Next                         ' Get the next slice
            Wend
            Close #of                           ' Close the copy
            HZ.Reset                            ' Reset the Html Zapper
            DoEvents
            sf = Dir                            ' Get next filename
        Wend
        MsgBox "Done", vbOKOnly, "Demo Copy"
    End Sub

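
The image branch above reads the size from the invisible PicSizer control. That presumably depends on the control having AutoSize set to True and a pixel ScaleMode, which the listing doesn't show. As a hedged alternative, the sketch below gets pixel dimensions from the picture object itself, using standard VB6 calls (LoadPicture, ScaleX, ScaleY). GetPictureSize is a hypothetical helper name, not part of the original sample, and it is assumed to live in the form's code module so that ScaleX and ScaleY resolve to the form's methods.

```vb
' Hypothetical helper: returns the pixel size of an image file
' without relying on a PictureBox's AutoSize or ScaleMode settings.
Private Sub GetPictureSize(ByVal FileName As String, _
                           ByRef PixWidth As Long, _
                           ByRef PixHeight As Long)
    Dim pic As StdPicture

    ' LoadPicture handles the formats VB6 knows about (BMP, GIF, JPEG, ...)
    Set pic = LoadPicture(FileName)

    ' StdPicture reports Width and Height in HiMetric units; convert to pixels
    PixWidth = CLng(ScaleX(pic.Width, vbHimetric, vbPixels))
    PixHeight = CLng(ScaleY(pic.Height, vbHimetric, vbPixels))
End Sub
```

In the "img" case you could then call GetPictureSize pict, w, h (with w and h declared As Long) and print Format$(w) and Format$(h) in place of the PicSizer values.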
Last revised: 24 September 2002