Automating applications is serious business. Most businesses need data from various sources, and there are usually two techniques used to fetch it:
1.) Calling the Provider API:
This is the white-hat approach to getting data from data sources.
Pros:
It’s more reliable, since the provider publishes a contract that doesn’t change often.
Cons:
Usually comes at a price.
Providers don’t expose all of their data via APIs.
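To make the API route concrete, here is a minimal sketch of building an authenticated request with Java’s native `HttpRequest` class. The endpoint and API key are hypothetical, not a real provider:

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

public class ApiCallExample {
    // Hypothetical provider endpoint -- not a real service.
    static final String ENDPOINT = "https://api.example-provider.com/v1/items";

    static HttpRequest buildRequest(String apiKey) {
        // The provider publishes the contract (path, headers, auth scheme),
        // which is why this approach is more reliable than scraping.
        return HttpRequest.newBuilder()
                .uri(URI.create(ENDPOINT))
                .header("Authorization", "Bearer " + apiKey)
                .header("Accept", "application/json")
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildRequest("demo-key");
        System.out.println(request.uri());
        // To actually execute it:
        // HttpResponse<String> response = HttpClient.newHttpClient()
        //         .send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```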
2.) Scraping
This is the black-hat approach, and most providers prohibit it in their terms of service. Scraping is the process of extracting data that is publicly available on website pages.
Pros:
It’s free, and we can scrape whatever data is visible on the website.
Cons:
Providers usually don’t expose all of their data on the website, i.e. they show fewer results than they actually hold. For example, Google serves only around 60 pages of search results for a query, even when millions of matches exist.
You have to deal with anti-bot measures such as CAPTCHAs and IP blocks.
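One common way to reduce the chance of IP blocks is to avoid machine-regular traffic: rotate the User-Agent header and randomize the delay between requests. Below is a small sketch of that idea; the User-Agent pool and the 2–5 second delay window are assumptions for illustration, not values from any provider:

```java
import java.util.List;
import java.util.Random;

public class PoliteScraper {
    // A small pool of browser-like User-Agent strings to rotate through,
    // so successive requests don't all carry the same fingerprint.
    static final List<String> USER_AGENTS = List.of(
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
            "Mozilla/5.0 (X11; Linux x86_64)");

    static final Random RANDOM = new Random();

    static String pickUserAgent() {
        return USER_AGENTS.get(RANDOM.nextInt(USER_AGENTS.size()));
    }

    // A randomized pause between 2 and 5 seconds between requests,
    // so traffic doesn't look machine-regular.
    static long nextDelayMillis() {
        return 2000 + RANDOM.nextInt(3001);
    }

    public static void main(String[] args) {
        System.out.println("User-Agent: " + pickUserAgent());
        System.out.println("Next delay: " + nextDelayMillis() + " ms");
    }
}
```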
I will focus on the black-hat approach, since there are few resources available on the internet for it.
Approaches are:
- Browser automation
- Headless automation using tools like Selenium
- Sending raw HTTP requests using the native HttpRequest and HttpResponse classes
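The last approach can be sketched with Java’s built-in `java.net.http` classes: fetch the raw HTML of a page, then pull data out of it. The URL and the regex-based title extraction are illustrative assumptions; a real scraper would use a proper HTML parser:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PageFetcher {
    static final Pattern TITLE = Pattern.compile("<title>(.*?)</title>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Fetch the raw HTML of a page, sending a browser-like User-Agent.
    static String fetch(String url) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(url))
                .header("User-Agent", "Mozilla/5.0 (compatible; demo-scraper)")
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    // Pull the <title> text out of an HTML document; a regex is enough
    // for a sketch, though fragile for real-world pages.
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : "";
    }

    public static void main(String[] args) throws Exception {
        // Parsing demonstrated on an inline sample so the example
        // runs without network access:
        String sample = "<html><head><title>Demo Page</title></head></html>";
        System.out.println(extractTitle(sample));
        // Live usage (requires network):
        // System.out.println(extractTitle(fetch("https://example.com")));
    }
}
```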