In a C# Windows Forms application, I can get the contents of a web page with:
- string content = webClient.DownloadString(url);
I can get the HTTP status code with:
- HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
- request.Method = "GET";
- string response = ((HttpWebResponse)request.GetResponse()).StatusCode.ToString();
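Is there a way to get both the content and the HTTP status code (in case of failure) in one trip to the server instead of two?
Thanks.
Solution
You can read the data from the Stream of the HttpWebResponse object: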
- HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
- request.Method = "GET";
- using (var response = request.GetResponse())
- using (var stream = response.GetResponseStream())
- using (var reader = new StreamReader(stream))
- {
- HttpStatusCode statusCode = ((HttpWebResponse)response).StatusCode;
- string contents = reader.ReadToEnd();
- }
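The StreamReader above assumes UTF-8 unless it finds a byte order mark, so with this approach you have to detect the encoding yourself or use a library to do it. You can read the encoding as a string from the HttpWebResponse object; when present, it is in the ContentType property. If the page is HTML, you will also have to parse it for a possible encoding change declared at the top of the document or inside the head.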
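The HTML case can be sketched roughly as follows. This is only an illustration: `HtmlCharsetSniffer` and `DetectFromMeta` are hypothetical names, the 1024-byte prefix is an arbitrary assumption, and a regular expression is not a real HTML parser, so treat the result as a best guess.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

static class HtmlCharsetSniffer
{
    // Hypothetical helper: looks for <meta charset="..."> or
    // <meta http-equiv="Content-Type" content="...; charset=..."> near the start of the body.
    public static Encoding DetectFromMeta(byte[] body, Encoding fallback)
    {
        // Decode only a prefix as ISO-8859-1 so that every byte maps to exactly one char.
        int prefixLength = Math.Min(body.Length, 1024);
        string head = Encoding.GetEncoding("ISO-8859-1").GetString(body, 0, prefixLength);

        Match match = Regex.Match(
            head,
            @"<meta[^>]*charset\s*=\s*[""']?([^""'\s;/>]+)",
            RegexOptions.IgnoreCase);
        if (!match.Success)
            return fallback;

        try
        {
            return Encoding.GetEncoding(match.Groups[1].Value);
        }
        catch (ArgumentException)
        {
            // Unknown or unsupported charset name: keep the fallback.
            return fallback;
        }
    }
}
```

To use it, you would read the response stream into a byte array first (for example by copying it into a MemoryStream), call `DetectFromMeta` with the encoding from the header as the fallback, and only then decode the bytes into a string.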
Handling the encoding by reading it from the ContentType header
- var request = (HttpWebRequest)WebRequest.Create(url);
- request.Method = "GET";
- string content;
- HttpStatusCode statusCode;
- using (var response = request.GetResponse())
- using (var stream = response.GetResponseStream())
- {
- var contentType = response.ContentType;
- Encoding encoding = null;
- if (contentType != null)
- {
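- // Capture only the charset token: skip optional quotes and stop at ';' or whitespace.
- var match = Regex.Match(contentType, @"charset\s*=\s*""?([^;""\s]+)", RegexOptions.IgnoreCase);
- if (match.Success)
- encoding = Encoding.GetEncoding(match.Groups[1].Value);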
- }
- encoding = encoding ?? Encoding.UTF8;
- statusCode = ((HttpWebResponse)response).StatusCode;
- using (var reader = new StreamReader(stream,encoding))
- content = reader.ReadToEnd();
- }
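One caveat for the "in case of failure" part of the question: GetResponse throws a WebException when the server answers with an error status such as 404 or 500, so the snippets above only reach the status code on success. A minimal sketch of one way to handle this, assuming you want the body of the error page as well, is to read the response attached to the exception. `PageFetcher`, `Fetch`, and `Read` are hypothetical names, and the encoding handling is left out to keep it short.

```csharp
using System;
using System.IO;
using System.Net;

static class PageFetcher
{
    // Hypothetical helper: returns the status code and body for both
    // successful responses and HTTP error responses, using one request.
    public static HttpStatusCode Fetch(string url, out string content)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        try
        {
            using (var response = (HttpWebResponse)request.GetResponse())
                return Read(response, out content);
        }
        catch (WebException ex) when (ex.Response is HttpWebResponse)
        {
            // The server did answer, but with an error status (4xx/5xx).
            using (var errorResponse = (HttpWebResponse)ex.Response)
                return Read(errorResponse, out content);
        }
    }

    private static HttpStatusCode Read(HttpWebResponse response, out string content)
    {
        using (var stream = response.GetResponseStream())
        using (var reader = new StreamReader(stream))
        {
            content = reader.ReadToEnd();
            return response.StatusCode;
        }
    }
}
```

Failures that never produce an HTTP response (DNS errors, timeouts, refused connections) have no status code, so in this sketch they still surface as a WebException for the caller to handle.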