Bypassing Cloudflare When Web Scraping with Python, requests, & BeautifulSoup

This bug fix is brought to you by the developer of MoneyPhone: Personal Expense Tracking for Android and iOS. Consistently spend less than you earn each month and get your finances under control when you start monitoring your expenses with MoneyPhone!

After work today I started playing around with web scraping in Python using requests and BeautifulSoup, following along with the tutorials in the book Web Scraping with Python by Ryan Mitchell. Specifically, I want to be able to scrape the AngelList website to create my own angel investor database for a project I’m calling AngelDB.xyz. However, I quickly ran into problems with Cloudflare and its anti-bot protections.

This isn’t the first time Cloudflare has foiled one of my legitimate projects (they pretty much rendered my Chrome extension, the Internet Archivist’s Intrepid Extension, useless after a few months). Luckily, this time around I found a pretty sweet library to help me bypass Cloudflare and scrape on:

cloudflare-scrape by Anorov

I haven’t had a chance to play with the library yet, as I just discovered it a few minutes ago and wanted to bookmark it here. However, once I get Node.js installed on my Windows machine, I plan on taking it for a spin. Wish me luck!
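For reference, here is a rough sketch of how I expect basic usage to look, going by the library's README. The package installs as cfscrape, and create_scraper() is supposed to return a requests-style session that solves the Cloudflare challenge (which is why it needs Node.js). The fetch_html helper, its scraper parameter, and the example.com URL are my own placeholders, not part of the library, and I have not verified that this gets past Cloudflare.

```python
def fetch_html(url, scraper=None):
    """Fetch a page through cloudflare-scrape and return (status_code, html).

    `scraper` is a hypothetical hook so that any requests-like session
    can be passed in; by default a cfscrape session is created.
    """
    if scraper is None:
        # Imported lazily so this file loads even without the library.
        # Assumes `pip install cfscrape` and Node.js on the PATH.
        import cfscrape
        scraper = cfscrape.create_scraper()

    # The scraper is used just like requests.Session.
    response = scraper.get(url, timeout=30)
    return response.status_code, response.text


if __name__ == "__main__":
    # Placeholder URL; swap in the page you actually want to scrape.
    status, html = fetch_html("https://example.com/")
    print(status, len(html))
```

The returned html string can then be handed to BeautifulSoup(html, "html.parser") the same way as a normal requests response.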

UPDATE (5/4/2019 10:43PM): After playing around with cloudflare-scrape for a little bit, I could not get it to bypass Cloudflare’s bot-security measures. I ended up receiving the same Cloudflare HTML as before instead of the page I actually wanted. So if anyone happens to stumble upon this blog post, I’m skeptical that cloudflare-scrape will actually work for you.
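If you want your scraper to notice when it got the Cloudflare page instead of the real one, a rough heuristic like the sketch below might help. The marker strings and status codes are assumptions based on what Cloudflare challenge pages have typically looked like, not an official or complete list, and the function name is just something I made up.

```python
# Phrases that have commonly appeared in Cloudflare challenge/block pages.
# This list is an assumption and may go stale as Cloudflare changes things.
CHALLENGE_MARKERS = (
    "checking your browser",
    "just a moment",
    "cf-browser-verification",
    "attention required",
)


def looks_like_cloudflare_challenge(status_code, headers, body):
    """Guess whether a response is a Cloudflare challenge page.

    Takes plain values (int, mapping, str) so it works with requests,
    urllib, or anything else. Heuristic only: it can be wrong both ways.
    """
    # Find the Server header without caring about its capitalization.
    server = ""
    for name, value in headers.items():
        if name.lower() == "server":
            server = str(value).lower()

    served_by_cloudflare = "cloudflare" in server
    blocked_status = status_code in (403, 429, 503)
    text = body.lower()
    has_marker = any(marker in text for marker in CHALLENGE_MARKERS)

    # Require challenge wording in the body plus one other signal.
    return has_marker and (served_by_cloudflare or blocked_status)
```

With requests, you would call it as looks_like_cloudflare_challenge(r.status_code, r.headers, r.text) before passing r.text to BeautifulSoup.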

 

topherPedersen