Art of Web Scraping
Introduction
Understanding Web Scraping
1 Web Scraping 101
2 Scraping Philosophy
3 Best Practices
4 Ethics & Legal Issues
Python Scraping Ecosystem
5 Making Requests
6 Parsing HTML
7 Other Libraries
Appendices
A CSS & XPath Selectors
Table of contents
3 Best Practices
3.1 Scraper Design
3.2 Developer Ergonomics
3.3 Being a Good Citizen
3.3.1 Scraping != Adversarial
3.3.2 Identifying Yourself
3.3.3 robots.txt
3.3.4 Rate Limiting
  What is Rate Limiting?
  Naive Rate Limiting
  Proper Rate Limiting
  Enhancements
3.3.5 Caching
  HTTP Caching
  Caching for Scrapers
  Caveats