Proxy Configuration Examples – Python

Why do you need ProxyEgg for your web scraping projects?

If you are extracting data from the web at scale, you have probably already figured out the answer: IP bans. The website you are targeting may not like that you are extracting its data, even if what you are doing is entirely ethical and legal. When your scraper is banned, it can really hurt your business, because the incoming data flow you rely on suddenly stops.

Setting up proxies in Scrapy

Setting up a proxy in Scrapy is easy. There are two ways to do it: passing the proxy info as a request parameter, or implementing a custom proxy middleware.

Option 1: Via request parameters

Normally, when you send a request in Scrapy, you just pass the URL you are targeting and maybe a callback function. If you want to use a specific proxy for that URL, you can pass it in the request's meta parameter, like this:

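from scrapy import Request

def start_requests(self):
    for url in self.start_urls:
        # yield (not return) so that every start URL is requested, not just the first
        yield Request(url=url, callback=self.parse,
                      headers={"User-Agent": "My UserAgent"},
                      # replace the placeholders with your proxy credentials and host
                      meta={"proxy": "http://<proxy_user>:<proxy_pass>@<proxy_host>:8080"})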

This works because Scrapy ships with a middleware called HttpProxyMiddleware, which reads the proxy meta parameter from the request object and routes the request through that proxy. The middleware is enabled by default, so there is no need to set it up.
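
Credentials that contain characters such as @ or : would break a proxy URL written by hand. Here is a minimal sketch of one way to build the URL safely. The helper name build_proxy_url and the host proxy.example.com are assumptions made for illustration; neither comes from Scrapy or ProxyEgg. The helper percent-encodes the user name and password with the standard library before placing them in the URL:

```python
from urllib.parse import quote


def build_proxy_url(user, password, host, port, scheme="http"):
    """Return a proxy URL with percent-encoded credentials.

    Hypothetical helper for illustration; not part of Scrapy.
    """
    # safe="" makes quote() also encode "/", "@" and ":" inside credentials
    user_enc = quote(user, safe="")
    pass_enc = quote(password, safe="")
    return f"{scheme}://{user_enc}:{pass_enc}@{host}:{port}"


# Example with made-up values; pass the result as meta={"proxy": proxy_url}
proxy_url = build_proxy_url("user", "p@ss:word", "proxy.example.com", 8080)
```

The resulting string can be used as the proxy meta value of a request. Recent Scrapy versions should decode percent-encoded credentials in the proxy URL before building the authentication header, but it is worth verifying this against the version you run.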

Option 2: Create custom middleware

Another way to use proxies while scraping is to create your own middleware. This keeps the proxy logic modular and isolated from your spiders. Essentially, the middleware needs to do the same thing as when passing the proxy as a meta parameter:

from w3lib.http import basic_auth_header

class CustomProxyMiddleware(object):
    def process_request(self, request, spider):
        # route every request through the proxy
        request.meta["proxy"] = "http://p.proxyegg.com:8080"
        # send the proxy credentials as a Basic auth header
        request.headers["Proxy-Authorization"] = basic_auth_header(
            "<proxy_user>", "<proxy_pass>")

In the code above, we define the proxy URL and the necessary authentication info. Make sure that you also enable this middleware in the settings and put it before the HttpProxyMiddleware by giving it a lower order number, since middlewares with lower numbers process requests first:

DOWNLOADER_MIDDLEWARES = { 
    'myproject.middlewares.CustomProxyMiddleware': 350, 
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 400, 
}
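
Hard-coding the proxy address and credentials in the middleware works, but keeping them in the project settings is often easier to maintain. The following is a minimal sketch under stated assumptions, not a definitive implementation: the class name SettingsProxyMiddleware and the setting names PROXY_URL, PROXY_USER and PROXY_PASS are hypothetical, and the header is built with the standard library instead of w3lib so that the snippet stands alone.

```python
import base64


class SettingsProxyMiddleware:
    """Sketch of a proxy middleware configured from Scrapy settings.

    PROXY_URL, PROXY_USER and PROXY_PASS are hypothetical setting names.
    """

    def __init__(self, proxy_url, user, password):
        self.proxy_url = proxy_url
        # Build the Basic auth value once (stdlib stand-in for
        # w3lib.http.basic_auth_header)
        token = base64.b64encode(f"{user}:{password}".encode("latin-1"))
        self.auth_header = b"Basic " + token

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls from_crawler, if defined, to build the middleware
        settings = crawler.settings
        return cls(
            settings.get("PROXY_URL"),
            settings.get("PROXY_USER", ""),
            settings.get("PROXY_PASS", ""),
        )

    def process_request(self, request, spider):
        # Leave requests alone if they already chose a proxy
        if "proxy" in request.meta:
            return None
        request.meta["proxy"] = self.proxy_url
        request.headers["Proxy-Authorization"] = self.auth_header
        # Returning None lets Scrapy continue processing the request
        return None
```

To try it, you would add the three hypothetical settings to settings.py and register this class in DOWNLOADER_MIDDLEWARES in place of CustomProxyMiddleware. Requests that already carry their own proxy meta value are left untouched, so per-request proxies keep working.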
