How to Retry Failed Python Requests Effectively with ProxyTee

The requests library in Python simplifies making HTTP requests to web servers, often in just a few lines of code. However, web requests can fail, particularly during web scraping tasks. To avoid manually rerunning your code after every failure, it's worth building retry behavior into your use of the requests module. This article will guide you through setting up a Python requests retry mechanism with HTTPAdapter, writing your own custom retry function, and routing failed requests through ProxyTee.

Common Error Codes and Solutions

403 Forbidden
  Cause: Access to the requested resource is denied, possibly due to insufficient permissions.
  Solutions:
  • Use proxies.

429 Too Many Requests
  Cause: You've sent too many requests in a short amount of time, often due to a website's rate-limiting policies.
  Solutions:
  • Send fewer requests.
  • Use proxies.
  • Implement delays according to the Retry-After header.

500 Internal Server Error
  Cause: The server encountered an unexpected condition that prevented it from fulfilling the request.
  Solutions:
  • Ensure your requests aren’t malformed.
  • Retry the request.
  • Contact the website’s support.

502 Bad Gateway
  Cause: The server you’re trying to access got a bad response from another server it depends on.
  Solutions:
  • Monitor the target server's status.
  • Ensure your requests aren’t malformed.
  • Increase your request timeout.
  • Send fewer requests.
  • Retry your requests.

503 Service Unavailable
  Cause: The server cannot handle the request, possibly because it is overloaded or undergoing maintenance.
  Solutions:
  • Send fewer requests.
  • Check the Retry-After header.
  • Retry your requests.
  • Cache successful responses.
  • Monitor the target server's status.
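
Several of the solutions above mention the Retry-After header. Here is a minimal sketch of reading it with requests, using the same httpbin.org test endpoint as the examples later in this article; note that the header can also carry an HTTP date rather than seconds, which this sketch does not handle:

import time

import requests

r = requests.get('https://httpbin.org/status/429', timeout=30)
if r.status_code == 429:
    # Retry-After may carry a number of seconds; fall back to 5 if absent.
    wait = int(r.headers.get('Retry-After', 5))
    print(f'Rate limited. Waiting {wait} seconds before retrying.')
    time.sleep(wait)
    r = requests.get('https://httpbin.org/status/429', timeout=30)
print(r.status_code)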

Understanding Retry Strategies

A retry strategy automates rerunning a request when an HTTP error occurs. Instead of retrying immediately, it's best to implement a delay, ideally using a backoff strategy, where the delay increases with each attempt. This method prevents overloading the server.

Exponential Backoff: This strategy, common in web scraping, increases the delay after each failed attempt. A typical formula looks like this:

delay = base_delay * (backoff_factor ** current_retry_count)

Another way to calculate delay:

delay = backoff_factor * (2 ** retry_count)
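
To make the growth concrete, here is a small sketch that prints the delays the second formula produces with backoff_factor = 2, the same value used in the examples below:

# Delays produced by delay = backoff_factor * (2 ** retry_count)
backoff_factor = 2

for retry_count in range(5):
    delay = backoff_factor * (2 ** retry_count)
    print(f'Retry {retry_count}: wait {delay} seconds')
# Prints delays of 2, 4, 8, 16, and 32 seconds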

Setting up Request Retries with HTTPAdapter

The requests library's HTTPAdapter class, paired with urllib3's Retry class, is a good choice for this. Let's set it up with HTTPAdapter.

First, install the Requests library using the command:

pip install requests

Next, import necessary libraries:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util import Retry

The HTTPAdapter manages a connection pool, enhancing control and performance by reusing connections. The Retry class manages how failed requests are retried.
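
If you need finer control over that pool, HTTPAdapter also accepts pool_connections and pool_maxsize parameters; the sizes below are illustrative, not recommendations:

import requests
from requests.adapters import HTTPAdapter

# Illustrative pool sizing: cache pools for 10 hosts, up to 20 connections each.
adapter = HTTPAdapter(pool_connections=10, pool_maxsize=20)
session = requests.Session()
session.mount('https://', adapter)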

Create a try-except block and define your retry logic using the Retry() class:

try:
    retry = Retry(
        total=5,
        backoff_factor=2,
        status_forcelist=[429, 500, 502, 503, 504],
    )
except Exception as e:
    print(e)

The total parameter sets the maximum number of retries, status_forcelist defines the HTTP status codes that trigger a retry, and backoff_factor controls how quickly the delay between attempts grows.

Create an HTTPAdapter instance with the retry object, create a Session() object, and attach the adapter to the https:// URL prefix with the mount() method. Then perform a GET request:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util import Retry

try:
    # Retry up to 5 times on the listed status codes, with exponential backoff.
    retry = Retry(
        total=5,
        backoff_factor=2,
        status_forcelist=[429, 500, 502, 503, 504],
    )
    # Attach the retry policy to every https:// request made through this session.
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount('https://', adapter)
    r = session.get('https://httpbin.org/status/502', timeout=180)
    print(r.status_code)
except Exception as e:
    print(e)

This code retries the request up to five times. If the error persists, requests raises an exception whose message includes “Max retries exceeded”, which the except block prints.
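
If you'd rather handle that case specifically instead of catching every exception, requests raises requests.exceptions.RetryError once the retry budget is exhausted on a retryable status. A minimal sketch:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util import Retry

retry = Retry(total=5, backoff_factor=2, status_forcelist=[429, 500, 502, 503, 504])
session = requests.Session()
session.mount('https://', HTTPAdapter(max_retries=retry))

try:
    r = session.get('https://httpbin.org/status/502', timeout=180)
    print(r.status_code)
except requests.exceptions.RetryError as e:
    # Raised once all retries are exhausted ("Max retries exceeded").
    print(f'Giving up: {e}')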

Implementing a Custom Request Retry Mechanism

You can create a custom retry logic tailored to specific needs using Python's random and time modules.

Here is an example of a delay function using the exponential backoff formula delay = backoff_factor * (2 ** retry_count):

import random
import time

def delay(backoff_factor, max_retries, max_delay, jitter=False):
    for retry in range(max_retries):
        delay_time = backoff_factor * (2 ** retry)
        if jitter:
            delay_time *= random.uniform(1, 1.5)
        effective_delay = min(delay_time, max_delay)
        # time.sleep(effective_delay)  # Uncomment to enable the actual sleep
        print(f"Attempt {retry + 1}: Delay for {effective_delay} seconds.")

delay(2, 5, 180, jitter=True)

Here, backoff_factor determines the increase in delay, max_retries is the number of retry attempts, max_delay caps the delay, and jitter adds randomness.

Use this in a custom get() function that retries based on the delay function defined above:

import random
import time

import requests

def delay(backoff_factor, max_retries, max_delay, jitter=False):
    # Precompute the backoff delay for every retry attempt up front.
    delay_times = []
    for retry in range(max_retries):
        delay_time = backoff_factor * (2 ** retry)
        if jitter:
            delay_time *= random.uniform(1, 1.5)
        effective_delay = min(delay_time, max_delay)
        delay_times.append(effective_delay)
    return delay_times

def get(URL, **kwargs):
    success = False
    for delay_time in backoff:  # backoff is the module-level list built below
        r = requests.get(URL, **kwargs)
        status = r.status_code
        if 200 <= status < 300:
            print(f'Success! Status: {status}')
            success = True
            break
        elif status in [429, 500, 502, 503, 504]:
            # Retryable status: wait out the precomputed delay, then try again.
            print(f'Received status: {status}. Retrying in {delay_time} seconds.')
            time.sleep(delay_time)
        else:
            # Any other status is treated as non-retryable.
            print(f'Received status: {status}.')
            break
    if not success:
        print("Maximum retries reached.")

backoff = delay(2, 5, 180, jitter=True)
get('https://httpbin.org/status/502', timeout=180)

This get() function requests the provided URL and retries on the listed error codes, waiting out the precomputed delays between attempts.
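
One refinement worth considering is honoring the server's Retry-After header inside this custom logic rather than always sleeping for the precomputed delay. Here is a hypothetical helper (the name wait_time is ours, and it only handles the seconds form of the header):

def wait_time(response, fallback):
    """Return the server's Retry-After value in seconds, or the fallback delay."""
    retry_after = response.headers.get('Retry-After')
    try:
        return int(retry_after)
    except (TypeError, ValueError):
        # Header missing, or given as an HTTP date rather than seconds.
        return fallback

Inside the retry branch above, time.sleep(delay_time) would then become time.sleep(wait_time(r, delay_time)).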

Retrying Failed Requests With ProxyTee

Integrating ProxyTee is easy. ProxyTee provides rotating residential proxies with unlimited bandwidth, a global IP pool, and support for multiple protocols, including HTTP and SOCKS5.
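
Since SOCKS5 is among the supported protocols, note that requests can also speak SOCKS once the socks extra is installed (pip install requests[socks]). A minimal sketch; whether the same gateway host and port used in the HTTP examples below also serve SOCKS5 is an assumption here, so check your ProxyTee dashboard:

import requests

USERNAME = 'YOUR_PROXY_USERNAME'
PASSWORD = 'YOUR_PROXY_PASSWORD'

# SOCKS5 proxy URLs; assumes the gateway accepts SOCKS5 on the same port.
proxies = {
    'http': f'socks5://{USERNAME}:{PASSWORD}@gate.proxytee.com:7777',
    'https': f'socks5://{USERNAME}:{PASSWORD}@gate.proxytee.com:7777',
}

r = requests.get('https://ip.proxytee.com/', proxies=proxies, timeout=30)
print(r.text)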

Proxies with HTTPAdapter

To use ProxyTee with HTTPAdapter, use your proxy credentials:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util import Retry

USERNAME = 'YOUR_PROXY_USERNAME'
PASSWORD = 'YOUR_PROXY_PASSWORD'
proxies = {
    'http': f'http://{USERNAME}:{PASSWORD}@gate.proxytee.com:7777',
    'https': f'https://{USERNAME}:{PASSWORD}@gate.proxytee.com:7777'
}

try:
    retry = Retry(
        total=5,
        backoff_factor=2,
        status_forcelist=[403, 429, 500, 502, 503, 504],
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount('https://', adapter)
    r = session.get('https://ip.proxytee.com/', proxies=proxies, timeout=180)
    print(r.status_code)
    print(r.text)
except Exception as e:
    print(e)

All requests in this code go through ProxyTee; you can verify the integration by checking the IP address that https://ip.proxytee.com/ reports back.
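
A quick sanity check is to compare the IP address reported with and without the proxy. This sketch assumes, as the article does, that https://ip.proxytee.com/ echoes the caller's IP address:

import requests

USERNAME = 'YOUR_PROXY_USERNAME'
PASSWORD = 'YOUR_PROXY_PASSWORD'
proxies = {
    'http': f'http://{USERNAME}:{PASSWORD}@gate.proxytee.com:7777',
    'https': f'https://{USERNAME}:{PASSWORD}@gate.proxytee.com:7777',
}

direct = requests.get('https://ip.proxytee.com/', timeout=30)
proxied = requests.get('https://ip.proxytee.com/', proxies=proxies, timeout=30)

# The two addresses should differ when the proxy is in use.
print('Direct IP: ', direct.text.strip())
print('Proxied IP:', proxied.text.strip())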

Proxies with Custom Retry Logic

With custom retry code, you can also enable proxies selectively, for example only when the status code is 403 or 429:

import random
import time

import requests

USERNAME = 'YOUR_PROXY_USERNAME'
PASSWORD = 'YOUR_PROXY_PASSWORD'
proxies = {
    'http': f'http://{USERNAME}:{PASSWORD}@gate.proxytee.com:7777',
    'https': f'https://{USERNAME}:{PASSWORD}@gate.proxytee.com:7777'
}

def delay(backoff_factor, max_retries, max_delay, jitter=False):
    delay_times = []
    for retry in range(max_retries):
        delay_time = backoff_factor * (2 ** retry)
        if jitter:
            delay_time *= random.uniform(1, 1.5)
        effective_delay = min(delay_time, max_delay)
        delay_times.append(effective_delay)
    return delay_times

def get(URL, **kwargs):
    success = False
    enable_proxies = False
    for delay_time in backoff:
        # Switch to ProxyTee only after a 403 or 429 has been seen.
        if enable_proxies:
            r = requests.get(URL, proxies=proxies, **kwargs)
        else:
            r = requests.get(URL, **kwargs)
        status = r.status_code
        if 200 <= status < 300:
            print(f'Success! Status: {status}')
            success = True
            break
        elif status in [500, 502, 503, 504]:
            print(f'Received status: {status}. Retrying in {delay_time} seconds.')
            time.sleep(delay_time)
        elif status in [403, 429]:
            print(f'Received status: {status}. Retrying in {delay_time} seconds with proxies.')
            enable_proxies = True
            time.sleep(delay_time)
        else:
            print(f'Received status: {status}.')
            break
    if not success:
        print("Maximum retries reached.")

backoff = delay(2, 5, 180, jitter=True)
get('https://httpbin.org/status/429', timeout=180)

Here, the enable_proxies flag controls proxy usage: proxies are only switched on after a 403 or 429 status code is received.

Essential Retry Strategy Best Practices

  • Avoid fixed delays and implement a backoff strategy.
  • Create a specific error code list for retry attempts with dedicated strategies for each, like using proxies from ProxyTee for 403 errors.
  • Always honor a server’s Retry-After header (see the sketch after this list).
  • Adjust the request rate if the server’s response time increases.
  • Use ready-made tooling such as HTTPAdapter with urllib3’s Retry for easier integration.
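
Regarding the Retry-After bullet above: when you use the HTTPAdapter approach, urllib3's Retry honors Retry-After by default for statuses such as 429 and 503, controlled by its respect_retry_after_header flag (shown here with its default value):

from urllib3.util import Retry

retry = Retry(
    total=5,
    backoff_factor=2,
    status_forcelist=[429, 500, 502, 503, 504],
    respect_retry_after_header=True,  # default: sleep as long as the server asks
)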

Performance Considerations

  • For high-performance needs, use asynchronous libraries like asyncio with aiohttp, but you will have to code the retry mechanism yourself (a sketch follows this list).
  • Keep the retry count reasonably low to prevent issues like latency or resource exhaustion.
  • Set timeouts to prevent requests from hanging indefinitely.
  • Reuse connections via Keep-Alive (the default with requests.Session) to reduce connection overhead.
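
As the first point above notes, aiohttp has no built-in retry support, so you would hand-roll it. Here is a minimal sketch (requires pip install aiohttp; the endpoint and retry statuses mirror the earlier examples, and fetch_with_retry is our own helper name):

import asyncio

import aiohttp

RETRY_STATUSES = {429, 500, 502, 503, 504}

async def fetch_with_retry(session, url, max_retries=5, backoff_factor=2):
    """Retry with exponential backoff; anything outside RETRY_STATUSES is returned as-is."""
    for retry in range(max_retries):
        async with session.get(url) as response:
            if response.status not in RETRY_STATUSES:
                return await response.text()
        delay = backoff_factor * (2 ** retry)
        print(f'Status {response.status}; retrying in {delay} seconds.')
        await asyncio.sleep(delay)
    raise RuntimeError('Maximum retries reached.')

async def main():
    timeout = aiohttp.ClientTimeout(total=180)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        print(await fetch_with_retry(session, 'https://httpbin.org/status/502'))

asyncio.run(main())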

Conclusion

Implementing a retry mechanism enhances your Python requests workflow, letting you control when and how requests are retried and with what delays. Combining requests.HTTPAdapter with urllib3's Retry simplifies setup, while writing your own retry logic gives you more control. ProxyTee's Unlimited Residential Proxies are a robust companion for handling retries: unlimited bandwidth, auto-rotation, and a global IP pool make it a reliable service for web scraping and other data-gathering tasks. With prices as much as 50% lower than main competitors like Bright Data, Smart Proxy, Oxylabs, or GeoSurf, ProxyTee is a strong choice for users looking for an easy, budget-friendly proxy service. Whether you integrate via HTTPAdapter or use custom code with proxies from ProxyTee, following these guidelines makes request handling smoother.