Using VB.NET or C#, how do I get the generated HTML source?
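To get a page's HTML source I can use the code below, but that does not return the generated source: it contains none of the HTML that JavaScript adds dynamically in the browser. How can I get the final, generated HTML source?
Thanks.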
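using System.IO;
using System.Net;

// Downloads the raw server response only; no script is executed.
WebRequest req = WebRequest.Create("http://www.asp.net");
WebResponse res = req.GetResponse();
StreamReader sr = new StreamReader(res.GetResponseStream());
string html = sr.ReadToEnd();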
If I try the following instead, it returns the document without the content injected by JavaScript:
Public Class Form1
    Dim WB As WebBrowser = Nothing

    Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
        WB = New WebBrowser()
        Me.Controls.Add(WB)
        AddHandler WB.DocumentCompleted, AddressOf WebBrowser1_DocumentCompleted
        WB.Navigate("mysite/Default.aspx")
    End Sub

    Private Sub WebBrowser1_DocumentCompleted(sender As Object, e As WebBrowserDocumentCompletedEventArgs)
        'Dim htmlcode As String = WebBrowser1.Document.Body.OuterHtml()
        Dim s As String = WB.DocumentText
    End Sub
End Class
HTML returned:
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head runat="server">
    <title></title>
</head>
<body>
    <form id="form1" runat="server">
        <div id="center_text_panel">
            //test text this text should be here
        </div>
    </form>
</body>
</html>
<script type="text/javascript">
    document.getElementById("center_text_panel").innerText = "test text";
</script>
Solution
You can use WebKit.NET. See its official tutorial.
It not only retrieves the source but also runs the page's JavaScript as the page loads.
webKitBrowser1.Navigate(MyURL)
Then handle the DocumentCompleted event and read:
Dim documentContent As String = webKitBrowser1.DocumentText
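Put together, the two steps above might look like the following C# sketch. It assumes the WebKit.NET assembly is referenced and that its WebKitBrowser control exposes Navigate, DocumentCompleted, and DocumentText as the snippets above suggest. The form class name, the handler name, and the URL are placeholders chosen for illustration, and whether DocumentText reflects script changes should be checked against the WebKit.NET version in use.

```csharp
using System;
using System.Windows.Forms;
using WebKit; // WebKit.NET assembly (assumed to be referenced by the project)

// Hypothetical form that loads a page and prints its markup once loading completes.
public class RenderedSourceForm : Form
{
    private readonly WebKitBrowser webKitBrowser1 = new WebKitBrowser();

    public RenderedSourceForm()
    {
        webKitBrowser1.Dock = DockStyle.Fill;
        Controls.Add(webKitBrowser1);

        // Read the markup only after the page has finished loading.
        webKitBrowser1.DocumentCompleted += OnDocumentCompleted;

        // Navigate once the form (and the hosted control) has been created.
        Load += (sender, e) => webKitBrowser1.Navigate("http://www.asp.net");
    }

    // Hypothetical handler name; the signature assumes the WinForms-style event args.
    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        string documentContent = webKitBrowser1.DocumentText;
        Console.WriteLine(documentContent);
    }

    [STAThread]
    public static void Main()
    {
        Application.Run(new RenderedSourceForm());
    }
}
```

Scripts that modify the page after the load event (for example on a timer or after an AJAX call) may not have run yet when DocumentCompleted fires, so a short delay or a check for the expected element may be needed in that case.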
Edit: this may be a better open-source WebKit option: http://code.google.com/p/open-webkit-sharp/