Setting up proxies in Scrapy

Setting up a proxy in Scrapy is straightforward. There are two common ways to do it: passing the proxy as a request parameter, or implementing a custom proxy middleware.

Option 1: Via request parameters

Normally when you send a request in Scrapy you just pass the URL you are targeting and maybe a callback function. If you want to use a specific proxy for that URL you can pass it as a meta parameter, like this:

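# At the top of the spider module
from scrapy import Request

def start_requests(self):
    for url in self.start_urls:
        # yield (not return) so that every start URL gets a request
        yield Request(url=url, callback=self.parse,
                      headers={"User-Agent": "My UserAgent"},
                      meta={"proxy": "http://user:[email protected]:8080"})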

This works because Scrapy ships with a downloader middleware called HttpProxyMiddleware, which reads the proxy meta key from the request object and routes the request through that proxy. The middleware is enabled by default, so there is nothing extra to set up.
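
To make the credential handling more concrete, here is a minimal sketch of how a proxy URL with embedded credentials can be split into a clean proxy address and a Basic auth header value. It uses only the Python standard library and is not Scrapy's actual implementation; the function name split_proxy_credentials and the host proxy.example.com are made up for illustration.

```python
import base64
from urllib.parse import urlsplit, urlunsplit


def split_proxy_credentials(proxy_url):
    """Return (proxy URL without credentials, Proxy-Authorization value or None).

    Illustrative helper only; it does not handle IPv6 literals.
    """
    parts = urlsplit(proxy_url)
    # Rebuild the network location without the "user:pass@" prefix.
    host = parts.hostname or ""
    netloc = f"{host}:{parts.port}" if parts.port else host
    clean_url = urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )
    if parts.username is None:
        return clean_url, None
    # HTTP Basic auth is the base64 encoding of "user:password".
    token = f"{parts.username}:{parts.password or ''}".encode("utf-8")
    return clean_url, b"Basic " + base64.b64encode(token)


if __name__ == "__main__":
    # Hypothetical proxy address, used only to exercise the helper.
    print(split_proxy_credentials("http://user:secret@proxy.example.com:8080"))
```

The takeaway is that the credentials travel in a Proxy-Authorization header rather than in the proxy address itself, which is also what the custom middleware in Option 2 sets by hand.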

Option 2: Create custom middleware

Another way to use proxies while scraping is to create your own middleware. This keeps the proxy logic modular and isolated from the spider code. Essentially, the middleware does the same thing as passing the proxy as a meta parameter:

from w3lib.http import basic_auth_header


class CustomProxyMiddleware(object):
    def process_request(self, request, spider):
        # Proxy URL to route this request through
        request.meta["proxy"] = ""
        # Credentials are sent to the proxy as a Basic auth header
        request.headers["Proxy-Authorization"] = basic_auth_header(
            "<proxy_user>", "<proxy_pass>")

In the code above, we define the proxy URL and the necessary authentication info. Make sure that you also enable this middleware under DOWNLOADER_MIDDLEWARES in your project settings, and give it a lower number than HttpProxyMiddleware so that it runs first:

DOWNLOADER_MIDDLEWARES = {
    'myproject.middlewares.CustomProxyMiddleware': 350,
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 400,
}
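
If the numbers look arbitrary, the sketch below shows the idea behind them: for outgoing requests, Scrapy calls process_request on downloader middlewares in increasing order of their number, and an entry set to None is disabled. The helper request_order is hypothetical and written only for this illustration; it ignores the built-in defaults that Scrapy merges in.

```python
# Settings fragment with the same priorities as above (order in the dict
# is deliberately shuffled to show that only the numbers matter).
DOWNLOADER_MIDDLEWARES = {
    "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware": 400,
    "myproject.middlewares.CustomProxyMiddleware": 350,
}


def request_order(middlewares):
    """Return middleware paths in the order process_request would be called."""
    # Entries mapped to None are treated as disabled, so skip them.
    enabled = {path: prio for path, prio in middlewares.items() if prio is not None}
    # Lower numbers come first for outgoing requests.
    return sorted(enabled, key=enabled.get)


if __name__ == "__main__":
    for path in request_order(DOWNLOADER_MIDDLEWARES):
        print(path)
```

With 350 and 400, the custom middleware sees each request before HttpProxyMiddleware does, so the proxy meta key is already in place when the built-in middleware runs.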

