Python: Webscraping With BeautifulSoup

Some experiments in web scraping using Python 2.7 with BeautifulSoup 3.2. The first function demonstrates various manipulations of an HTML page, including saving a scrubbed copy to disk. The second is a simple crawler that attempts to traverse a domain and build a sitemap from the hyperlinks it encounters in the pages. Includes some commentary on page encoding, parsers, and alternative approaches to some of the tasks.
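
To make the crawler idea concrete, here is a minimal sketch of a sitemap builder. It is not the code from the post: it assumes Python 3 and uses the standard library's html.parser in place of BeautifulSoup, so that it runs without extra dependencies. The names LinkCollector, extract_links, build_sitemap, fetch and max_pages are all hypothetical, chosen for illustration only.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html, base_url):
    """Return absolute, fragment-free http(s) URLs for the links in one page."""
    parser = LinkCollector()
    parser.feed(html)
    absolute = (urldefrag(urljoin(base_url, href))[0] for href in parser.links)
    return [url for url in absolute if urlparse(url).scheme in ("http", "https")]


def build_sitemap(start_url, fetch, max_pages=50):
    """Breadth-first crawl restricted to the host of start_url.

    fetch is a caller-supplied function (an assumption of this sketch) that
    maps a URL to its HTML text, or returns None when the page is unavailable.
    Returns a dict mapping each visited URL to the sorted same-host links on it.
    """
    host = urlparse(start_url).netloc
    sitemap = {}
    queue = deque([start_url])
    while queue and len(sitemap) < max_pages:
        url = queue.popleft()
        if url in sitemap:
            continue  # already visited via another link
        html = fetch(url)
        if html is None:
            sitemap[url] = []  # record the dead page, but do not expand it
            continue
        internal = sorted({link for link in extract_links(html, url)
                           if urlparse(link).netloc == host})
        sitemap[url] = internal
        queue.extend(link for link in internal if link not in sitemap)
    return sitemap


if __name__ == "__main__":
    # Offline demo: a dict of canned pages stands in for real HTTP requests.
    demo_pages = {
        "http://example.com/": '<a href="/about">About</a>',
        "http://example.com/about": '<a href="/">Home</a>',
    }
    for page, links in build_sitemap("http://example.com/", demo_pages.get).items():
        print(page, "->", links)
```

Passing fetch in as a parameter keeps the traversal logic separate from network access and from decoding. In real use, fetch would probably wrap an HTTP request and decode the response bytes using the page's declared character set, which is where the encoding issues mentioned above come into play.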
