.NET provides low-level classes for working with HTTP natively. Two of them are worth mentioning:
- HttpWebRequest
- HttpWebResponse
To send a request to a target URL, we create an instance of HttpWebRequest as follows:
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://shaktitanwar.com/");
The next step is to set the headers that the target site requires. We can do that as below:
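request.Headers.Set(HttpRequestHeader.AcceptEncoding, "gzip, deflate");
// Decode gzip/deflate responses automatically; otherwise the body is read as raw compressed bytes.
request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
request.Headers.Set(HttpRequestHeader.AcceptLanguage, "en-US,en;q=0.8");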
We can also make the site treat us as a different browser by changing the user agent as below:
request.UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.152 Safari/537.36";
Getting the response from the site: we can get the response by calling the GetResponse method of the request as follows:
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
This gives us the response object, and we can read the site's HTML from it using the code below. Later on we can extract information from the HTML using regular expressions or custom parsing logic:
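// Stream and StreamReader live in the System.IO namespace.
Stream responseStream = response.GetResponseStream();
StreamReader streamReader = new StreamReader(responseStream);
string html = streamReader.ReadToEnd();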
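Putting the steps above together, a minimal sketch could look like the following. The class name PageFetcher and its method names are made up for illustration and are not part of .NET. The sketch also assumes the page is UTF-8 encoded, and it lets the framework negotiate compression through AutomaticDecompression instead of setting Accept-Encoding by hand.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical helper class; the names are illustrative only.
public static class PageFetcher
{
    // Builds the request with the headers discussed above. Nothing is sent yet.
    public static HttpWebRequest BuildRequest(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Headers.Set(HttpRequestHeader.AcceptLanguage, "en-US,en;q=0.8");
        request.UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.152 Safari/537.36";
        // Let the framework advertise and decode gzip/deflate bodies.
        request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
        return request;
    }

    // Reads a whole stream as text (assumption: UTF-8, the StreamReader default).
    public static string ReadAll(Stream stream)
    {
        using (StreamReader reader = new StreamReader(stream))
        {
            return reader.ReadToEnd();
        }
    }

    // Sends the request and returns the response body as a string.
    public static string DownloadHtml(string url)
    {
        HttpWebRequest request = BuildRequest(url);
        // The using blocks release the connection even if reading fails.
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (Stream responseStream = response.GetResponseStream())
        {
            return ReadAll(responseStream);
        }
    }
}
```

With this in place, the call would be string html = PageFetcher.DownloadHtml("http://shaktitanwar.com/");. Wrapping the response and stream in using blocks is a design choice here, so that connections are not left open when we scrape many pages in a row.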
Once we have all the data, we will try to extract information from it. I will be using an open-source library called HtmlAgilityPack.
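As a first taste of HtmlAgilityPack, here is a small sketch that pulls every link out of the downloaded HTML. The class LinkExtractor and the choice of extracting href values are my own illustration, not something the library prescribes, and the code assumes the HtmlAgilityPack NuGet package is referenced by the project.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper class; the name is illustrative only.
public static class LinkExtractor
{
    // Returns the href value of every <a> element that has one.
    public static List<string> ExtractLinks(string html)
    {
        List<string> links = new List<string>();

        HtmlDocument document = new HtmlDocument();
        document.LoadHtml(html);

        // XPath: all anchor elements that carry an href attribute.
        HtmlNodeCollection anchors = document.DocumentNode.SelectNodes("//a[@href]");

        // SelectNodes returns null, not an empty collection, when nothing matches.
        if (anchors == null)
        {
            return links;
        }

        foreach (HtmlNode anchor in anchors)
        {
            links.Add(anchor.GetAttributeValue("href", string.Empty));
        }
        return links;
    }
}
```

We would call it as List<string> links = LinkExtractor.ExtractLinks(html); where html is the string we read from the response earlier. The null check matters because a page without any matching nodes would otherwise crash the foreach loop.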