How to Retry Failed Python Requests Effectively with ProxyTee
The `requests` library in Python simplifies making HTTP requests to web servers, often in just a few lines of code. However, web requests can sometimes fail, particularly during web scraping tasks. To avoid manually rerunning your code after a failure, it pays to build retries into your use of the `requests` module. This article will guide you through setting up a Python requests retry mechanism using `HTTPAdapter`, coding your own custom retry function, and using ProxyTee for failed requests.
Common Error Codes and Solutions
| Error Code | Causes | Solutions |
|---|---|---|
| 403 Forbidden | Access to the requested resource is denied, possibly due to insufficient permissions. | Check credentials and request headers; route the request through a proxy such as ProxyTee. |
| 429 Too Many Requests | You've sent too many requests in a short amount of time, often due to a website's rate-limiting policies. | Slow down with a backoff delay, honor the `Retry-After` header, or rotate IPs with a proxy. |
| 500 Internal Server Error | The server encountered an unexpected condition that prevented it from fulfilling the request. | Retry after a delay; the issue is usually on the server's side. |
| 502 Bad Gateway | The server you're trying to access got a bad response from another server it depends on. | Retry with exponential backoff until the upstream server recovers. |
| 503 Service Unavailable | The server cannot handle the request, possibly due to being overloaded or undergoing maintenance. | Retry with exponential backoff and honor the `Retry-After` header if present. |
Understanding Retry Strategies
A retry strategy automates rerunning a request when an HTTP error occurs. Instead of retrying immediately, it's best to implement a delay, ideally using a backoff strategy, where the delay increases with each attempt. This method prevents overloading the server.
Exponential Backoff: This strategy is common in web scraping; the delay between retries grows with each attempt. A typical formula looks like this:

```
delay = base_delay * (backoff_factor ** current_retry_count)
```

A common simplification fixes the base at 2:

```
delay = backoff_factor * (2 ** retry_count)
```
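As a quick sanity check, the second formula with a backoff factor of 2 and five retries yields the following delays (the function name here is our own, for illustration):

```python
def backoff_delays(backoff_factor, max_retries):
    """Delay (in seconds) before each retry, per the second formula above."""
    return [backoff_factor * (2 ** retry) for retry in range(max_retries)]

print(backoff_delays(2, 5))  # [2, 4, 8, 16, 32]
```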
Setting up Request Retries with HTTPAdapter
The `requests` library pairs well with the `HTTPAdapter` class and `urllib3`'s `Retry` class. Let's use `HTTPAdapter`.
First, install the Requests library using the command:

```
pip install requests
```

Next, import the necessary libraries:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util import Retry
```
The `HTTPAdapter` manages a connection pool, enhancing control and performance by reusing connections. The `Retry` class manages how failed requests are retried.
Create a `try-except` block and define your retry logic using the `Retry()` class:

```python
try:
    retry = Retry(
        total=5,
        backoff_factor=2,
        status_forcelist=[429, 500, 502, 503, 504],
    )
except Exception as e:
    print(e)
```
The `total` parameter sets the maximum number of retries, and `status_forcelist` defines the HTTP status codes that trigger a retry.
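`Retry` accepts more options than the three shown here. This sketch uses additional parameters from urllib3 1.26+ (where `allowed_methods` replaced the older `method_whitelist`) to restrict retries to idempotent methods and to honor the server's `Retry-After` header:

```python
from urllib3.util import Retry

# A stricter retry policy using additional urllib3 Retry options.
retry = Retry(
    total=3,                                   # overall cap on retries
    backoff_factor=1,                          # base for exponential backoff
    status_forcelist=[429, 500, 502, 503, 504],
    allowed_methods=["GET", "HEAD"],           # only retry idempotent methods
    respect_retry_after_header=True,           # obey the server's Retry-After
)
print(retry.total)  # 3
```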
Create an `HTTPAdapter` instance, pass it the `retry` object, then create a `Session()` object and use the `mount()` method to register the adapter for the `https://` URL prefix. Afterward, perform a `GET` request:
```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util import Retry

try:
    retry = Retry(
        total=5,
        backoff_factor=2,
        status_forcelist=[429, 500, 502, 503, 504],
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount('https://', adapter)
    r = session.get('https://httpbin.org/status/502', timeout=180)
    print(r.status_code)
except Exception as e:
    print(e)
```
This code retries the request up to 5 times. If the errors persist, the final failure raises an exception whose message includes "Max retries exceeded", which the `except` block prints.
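When the retry budget is exhausted on a forced status code, `requests` raises `requests.exceptions.RetryError` (wrapping urllib3's `MaxRetryError`), so you can catch that case specifically instead of a bare `Exception`. A small sketch with a reduced retry budget:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util import Retry

session = requests.Session()
session.mount('https://', HTTPAdapter(max_retries=Retry(
    total=2, backoff_factor=0.1, status_forcelist=[502])))

try:
    session.get('https://httpbin.org/status/502', timeout=10)
except requests.exceptions.RetryError as e:
    print(f'Retries exhausted: {e}')
except requests.exceptions.RequestException as e:
    print(f'Request failed for another reason: {e}')
```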
Implementing a Custom Request Retry Mechanism
You can create custom retry logic tailored to specific needs using Python's `random` and `time` modules.

Here is an example of a delay function using the exponential backoff formula `delay = backoff_factor * (2 ** retry_count)`:
```python
import random
import time

def delay(backoff_factor, max_retries, max_delay, jitter=False):
    for retry in range(max_retries):
        delay_time = backoff_factor * (2 ** retry)
        if jitter:
            delay_time *= random.uniform(1, 1.5)
        effective_delay = min(delay_time, max_delay)
        # time.sleep(effective_delay)  # Uncomment to actually wait
        print(f"Attempt {retry + 1}: Delay for {effective_delay} seconds.")

delay(2, 5, 180, jitter=True)
```
Here, `backoff_factor` determines how quickly the delay grows, `max_retries` is the number of retry attempts, `max_delay` caps the delay, and `jitter` adds randomness so that concurrent clients don't all retry at the same moment.
Use this in a custom `get()` function that retries based on the delays computed above:
```python
import random
import time
import requests

def delay(backoff_factor, max_retries, max_delay, jitter=False):
    delay_times = []
    for retry in range(max_retries):
        delay_time = backoff_factor * (2 ** retry)
        if jitter:
            delay_time *= random.uniform(1, 1.5)
        effective_delay = min(delay_time, max_delay)
        delay_times.append(effective_delay)
    return delay_times

def get(URL, **kwargs):
    success = False
    for delay_time in backoff:
        r = requests.get(URL, **kwargs)
        status = r.status_code
        if 200 <= status < 300:
            print(f'Success! Status: {status}')
            success = True
            break
        elif status in [429, 500, 502, 503, 504]:
            print(f'Received status: {status}. Retrying in {delay_time} seconds.')
            time.sleep(delay_time)
        else:
            print(f'Received status: {status}.')
            break
    if not success:
        print("Maximum retries reached.")

backoff = delay(2, 5, 180, jitter=True)
get('https://httpbin.org/status/502', timeout=180)
```
This `get()` function fetches the provided URL, retrying with custom delays on the listed error codes.
Retrying Failed Requests With ProxyTee
Integrating ProxyTee is easy. ProxyTee provides rotating residential proxies with unlimited bandwidth, a global IP pool, and multiple protocol support, including HTTP and SOCKS5.
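Since ProxyTee supports both HTTP and SOCKS5, here is a sketch of the two proxy dictionaries `requests` would accept. The gateway host and port mirror the HTTP examples later in this article; the SOCKS5 endpoint and port are assumptions for illustration, not documented ProxyTee settings:

```python
# Proxy configuration sketches for requests.
USERNAME = 'YOUR_PROXY_USERNAME'
PASSWORD = 'YOUR_PROXY_PASSWORD'

http_proxies = {
    'http': f'http://{USERNAME}:{PASSWORD}@gate.proxytee.com:7777',
    'https': f'https://{USERNAME}:{PASSWORD}@gate.proxytee.com:7777',
}

# SOCKS5 with requests needs the PySocks extra: pip install requests[socks]
socks_proxies = {
    'http': f'socks5://{USERNAME}:{PASSWORD}@gate.proxytee.com:7777',
    'https': f'socks5://{USERNAME}:{PASSWORD}@gate.proxytee.com:7777',
}
print(socks_proxies['https'])
```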
Proxies with HTTPAdapter
To use ProxyTee with `HTTPAdapter`, supply your proxy credentials:
```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util import Retry

USERNAME = 'YOUR_PROXY_USERNAME'
PASSWORD = 'YOUR_PROXY_PASSWORD'

proxies = {
    'http': f'http://{USERNAME}:{PASSWORD}@gate.proxytee.com:7777',
    'https': f'https://{USERNAME}:{PASSWORD}@gate.proxytee.com:7777'
}

try:
    retry = Retry(
        total=5,
        backoff_factor=2,
        status_forcelist=[403, 429, 500, 502, 503, 504],
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount('https://', adapter)
    r = session.get('https://ip.proxytee.com/', proxies=proxies, timeout=180)
    print(r.status_code)
    print(r.text)
except Exception as e:
    print(e)
```
This code makes all requests through ProxyTee; requesting `https://ip.proxytee.com/` lets you verify that the integration succeeded.
Proxies with Custom Retry Logic
With the custom retry code, you can also decide when proxies are used, for example enabling them only when the status code is 403 or 429:
```python
import random
import time
import requests

USERNAME = 'YOUR_PROXY_USERNAME'
PASSWORD = 'YOUR_PROXY_PASSWORD'

proxies = {
    'http': f'http://{USERNAME}:{PASSWORD}@gate.proxytee.com:7777',
    'https': f'https://{USERNAME}:{PASSWORD}@gate.proxytee.com:7777'
}

def delay(backoff_factor, max_retries, max_delay, jitter=False):
    delay_times = []
    for retry in range(max_retries):
        delay_time = backoff_factor * (2 ** retry)
        if jitter:
            delay_time *= random.uniform(1, 1.5)
        effective_delay = min(delay_time, max_delay)
        delay_times.append(effective_delay)
    return delay_times

def get(URL, **kwargs):
    success = False
    enable_proxies = False
    for delay_time in backoff:
        if enable_proxies:
            r = requests.get(URL, proxies=proxies, **kwargs)
        else:
            r = requests.get(URL, **kwargs)
        status = r.status_code
        if 200 <= status < 300:
            print(f'Success! Status: {status}')
            success = True
            break
        elif status in [500, 502, 503, 504]:
            print(f'Received status: {status}. Retrying in {delay_time} seconds.')
            time.sleep(delay_time)
        elif status in [403, 429]:
            print(f'Received status: {status}. Retrying in {delay_time} seconds with proxies.')
            enable_proxies = True
            time.sleep(delay_time)
        else:
            print(f'Received status: {status}.')
            break
    if not success:
        print("Maximum retries reached.")

backoff = delay(2, 5, 180, jitter=True)
get('https://httpbin.org/status/429', timeout=180)
```
Here, the `enable_proxies` flag controls proxy usage: proxies are only switched on after a 403 or 429 response.
Essential Retry Strategy Best Practices
- Avoid fixed delays and implement a backoff strategy.
- Create a specific error code list for retry attempts, with a dedicated strategy for each, like using proxies from ProxyTee for 403 errors.
- Always honor a server's `Retry-After` header.
- Adjust the request rate if the server response time increases.
- Use ready-made libraries such as `HTTPAdapter` for easier integration.
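To make the `Retry-After` point concrete, here is a small sketch (the helper name is our own) that prefers the server's hint, in its seconds form, over the computed backoff delay; it does not handle the HTTP-date form of the header:

```python
def retry_after_delay(headers, default_delay):
    """Prefer the server's Retry-After hint (in seconds) over the
    computed backoff delay; fall back on missing or date-form values."""
    value = headers.get('Retry-After')
    if value is not None:
        try:
            return float(value)
        except ValueError:
            pass  # HTTP-date form; fall back to the computed delay
    return default_delay

print(retry_after_delay({'Retry-After': '3'}, 10))  # 3.0
print(retry_after_delay({}, 10))                    # 10
```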
Performance Considerations
- For high-performance needs, use asynchronous libraries like `asyncio` with `aiohttp`, but you will have to code your own retry mechanism.
- Keep the retry count reasonably low to prevent issues like latency or resource exhaustion.
- Set timeouts to prevent requests from hanging indefinitely.
- Consider using the `Keep-Alive` header so connections are reused, reducing overhead.
Conclusion
Implementing a retry mechanism enhances your Python requests workflow, giving you control over when and how requests are retried, with custom delays. Combining `requests.HTTPAdapter` with `urllib3`'s `Retry` simplifies setup, while writing your own retry logic provides more control. ProxyTee's Unlimited Residential Proxies offer a robust solution for handling retries: unlimited bandwidth, auto-rotation, and a global IP pool make it a reliable service for web scraping and other data-gathering tasks. With prices as low as 50% of main competitors like Bright Data, Smart Proxy, Oxylabs, or GeoSurf, ProxyTee's affordable pricing makes it a preferred choice for users looking for an easy, budget-friendly proxy service. Whether you integrate via `HTTPAdapter` or use custom code with ProxyTee proxies, following these guidelines makes request handling smoother.