Affordable Rotating Residential Proxies with Unlimited Bandwidth
    Serverless Web Scraping with Scrapy and ProxyTee

    January 26, 2025 Mike

    In this guide, we’ll explore how to leverage Scrapy, a robust Python framework for web scraping, in a serverless environment powered by AWS Lambda. Additionally, we’ll show you how ProxyTee’s rotating residential proxies can elevate your scraping projects by ensuring reliability and flexibility. From setup to deployment, this comprehensive guide covers everything you need to get started.


    Enhancing Serverless Scraping with ProxyTee

    ProxyTee offers rotating residential proxies, which help scrapers avoid detection and bans from websites. Known for affordability and efficiency, the service provides unlimited bandwidth and a vast pool of IP addresses. For tasks like web scraping, streaming, or general data collection, its easy-to-integrate platform is a good fit. ProxyTee has the following key features:

    • Unlimited Bandwidth: There are no data caps or overage fees, so you can scrape extensively without additional costs.
    • Global IP Coverage: Gain access to over 20 million IP addresses from 100+ countries, useful for targeting specific regions or conducting location-based scraping.
    • Multiple Protocol Support: Both HTTP and SOCKS5 are supported, which covers a wide range of applications.
    • User-Friendly Interface: The intuitive GUI gets users up and running quickly, with minimal technical expertise needed.
    • Auto Rotation: IP addresses rotate automatically every 3-60 minutes, reducing the chances of detection.
    • API Integration: A straightforward API helps to automate proxy-related tasks, supporting all the features of the service.

    With these advantages, ProxyTee ensures a smooth and reliable scraping experience, making it an excellent choice for professionals and beginners alike.
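
    As a rough sketch of how a rotating proxy could be wired into a Scrapy spider: Scrapy's built-in HttpProxyMiddleware reads the proxy address from request.meta["proxy"]. The helper name, host, port, and credentials below are placeholders for illustration, not actual ProxyTee values; use the gateway details from your own account.

```python
from urllib.parse import quote


def proxy_meta(username, password, host, port, scheme="http"):
    """Build the meta dict that Scrapy's HttpProxyMiddleware understands.

    All arguments are placeholders for illustration; substitute the
    gateway address and credentials issued by your proxy provider.
    """
    # Percent-encode credentials so characters such as "@" or ":" are safe.
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return {"proxy": f"{scheme}://{user}:{secret}@{host}:{port}"}


if __name__ == "__main__":
    # Hypothetical values, for demonstration only.
    meta = proxy_meta("user", "p@ss", "proxy.example.com", 8080)
    print(meta["proxy"])  # http://user:p%40ss@proxy.example.com:8080
    # Inside a spider this would be used as:
    #   yield scrapy.Request(url, meta=meta)
```

    Because the proxy gateway rotates the exit IP on its side, the spider itself would not need any rotation logic under this setup.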


    The Benefits of Serverless Web Scraping

    Serverless platforms such as AWS Lambda let you run web scraping tasks without managing servers. Although Lambda compute can cost more per hour of runtime than an always-on server, you pay only while the function is running, which makes it a good fit for intermittent scraping tasks.
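
    The trade-off can be made concrete with a little arithmetic. The sketch below compares an always-on server with a pay-per-use function; the hourly rates are made-up placeholders chosen only to illustrate the calculation, not real AWS or hosting prices.

```python
def always_on_cost(hourly_rate, hours_per_month=730):
    """Monthly cost of a server that is billed around the clock."""
    return hourly_rate * hours_per_month


def pay_per_use_cost(hourly_rate, runs_per_month, minutes_per_run):
    """Monthly cost when only the active run time is billed."""
    active_hours = runs_per_month * minutes_per_run / 60
    return hourly_rate * active_hours


if __name__ == "__main__":
    # Placeholder rates: the per-hour price of the function is six times higher.
    server = always_on_cost(hourly_rate=0.01)
    function = pay_per_use_cost(hourly_rate=0.06, runs_per_month=30, minutes_per_run=10)
    print(f"always-on: {server:.2f}, pay-per-use: {function:.2f}")
    # always-on: 7.30, pay-per-use: 0.30
```

    With one ten-minute run per day, the function is billed for 5 hours a month instead of 730, so the higher hourly rate barely matters.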

    Advantages of Serverless:

    • Cost-Effective: Pay only for the time your function runs.
    • Scalability: Automatically handle increased workloads without manual intervention.
    • Ease of Management: No server maintenance required.

    Limitations:

    • Execution Time Limits: AWS Lambda has a maximum runtime of 15 minutes per execution.
    • Cold Start Latency: Idle functions might take longer to initialize.
    • Vendor Lock-In: Serverless applications are harder to migrate across cloud providers.
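
    One way to live with the execution time limit is to let Scrapy stop itself before Lambda stops the function. The sketch below assumes the remaining time comes from context.get_remaining_time_in_millis() inside a handler and passes it to Scrapy's CLOSESPIDER_TIMEOUT setting; the function name and the 30-second margin are illustrative choices.

```python
import sys


def build_scrapy_command(remaining_ms, spider_file, output_file, margin_s=30):
    """Return a runspider command that closes the spider before Lambda times out.

    remaining_ms is expected to come from
    context.get_remaining_time_in_millis() inside a Lambda handler.
    """
    # Keep a safety margin for work done after the crawl, such as uploads.
    budget_s = max(1, remaining_ms // 1000 - margin_s)
    return [
        sys.executable, "-m", "scrapy", "runspider", spider_file,
        "-o", output_file,
        # CLOSESPIDER_TIMEOUT asks Scrapy to shut the spider down gracefully
        # once this many seconds have passed.
        "-s", f"CLOSESPIDER_TIMEOUT={budget_s}",
    ]


if __name__ == "__main__":
    # 900,000 ms is the 15-minute maximum mentioned above.
    print(build_scrapy_command(900_000, "aws_spider.py", "/tmp/books.json")[-1])
    # CLOSESPIDER_TIMEOUT=870
```

    The returned list could be handed to subprocess.run() in place of a hard-coded command.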

    Step-by-Step Guide to Serverless Web Scraping with Scrapy

    1️⃣ Prerequisites:

    • An AWS account with an Amazon S3 bucket.
    • Python installed on your local machine.

    2️⃣ Setting Up AWS S3

    • Navigate to the AWS Management Console.
    • Go to All Services > S3 > Create Bucket.
    • Name your bucket and use default settings.
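
    If you prefer to script this step, the bucket can also be created with boto3. This is a minimal sketch, assuming AWS credentials are already configured; the function name is illustrative, and the bucket name and region in the usage comment are examples.

```python
def create_bucket_in_region(s3_client, bucket_name, region):
    """Create bucket_name in region with an S3 client such as boto3.client("s3")."""
    params = {"Bucket": bucket_name}
    # S3 rejects an explicit LocationConstraint for us-east-1,
    # but requires one for every other region.
    if region != "us-east-1":
        params["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return s3_client.create_bucket(**params)


# Usage (requires "pip install boto3" and configured credentials):
#   import boto3
#   client = boto3.client("s3", region_name="eu-west-1")
#   create_bucket_in_region(client, "aws-scrapy-bucket", "eu-west-1")
```

    Bucket names are global across AWS, so the call fails if the name is already taken by another account.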

    3️⃣ Creating the Scrapy Project

      • Create a new project directory:
    mkdir scrapy_aws
    
      • Move into the new directory and set up a Python virtual environment:
    cd scrapy_aws
    python3 -m venv venv
    
      • Activate the environment:
    source venv/bin/activate
    
      • Install Scrapy:
    pip install scrapy
    

    4️⃣ Writing Our Scrapy Spider:

    For this example, we target books.toscrape.com, a sandbox site built for practicing web scraping. The page lists each book as an article element with the class product_pod. Inside each one, the title sits in an a tag and the price in a p tag.

    Create a new Python file named aws_spider.py (the later commands assume this name) and paste in the following:

    import scrapy
    
    class BookSpider(scrapy.Spider):
        name = "books"
        allowed_domains = ["books.toscrape.com"]
        start_urls = ["https://books.toscrape.com"]
    
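        def parse(self, response):
            # Each book on the listing page is an <article class="product_pod">
            for card in response.css("article.product_pod"):
                yield {
                    # The link text is truncated on the page; the title attribute holds the full title
                    "title": card.css("h3 > a::attr(title)").get(),
                    "price": card.css("p.price_color::text").get(),
                }
    
            # Follow the "next" link until the last page is reached
            next_page = response.css("li.next > a::attr(href)").get()
            if next_page:
                yield scrapy.Request(response.urljoin(next_page))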
    

    Test it with the following command:

    python -m scrapy runspider aws_spider.py -o books.json
    

    This should create a books.json file containing the scraped book titles and prices.
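
    A quick sanity check on the export can catch broken selectors early. The sketch below assumes the file is a JSON array of objects with "title" and "price" keys, as produced by the spider above; the helper names are illustrative.

```python
import json
from decimal import Decimal


def parse_price(text):
    """Convert a scraped price such as "£51.77" into a Decimal."""
    # Keep only ASCII digits and the decimal point, dropping the currency symbol.
    digits = "".join(ch for ch in text if ch in "0123456789.")
    return Decimal(digits)


def summarize(path):
    """Return (item_count, incomplete_count) for a Scrapy JSON export."""
    with open(path, encoding="utf-8") as fh:
        items = json.load(fh)
    # An item is incomplete when a selector returned None or an empty string.
    incomplete = [i for i in items if not i.get("title") or not i.get("price")]
    return len(items), len(incomplete)


# Usage after a crawl:
#   total, incomplete = summarize("books.json")
#   print(total, incomplete)
```

    A non-zero incomplete count usually means a selector no longer matches the page.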


    Running Scrapy in AWS Lambda

    1️⃣ Creating a Handler

    Next, we write a handler function that runs the Scrapy spider. We start with a version for local testing; let’s call it lambda_function_local.py. Paste in the following:

    import subprocess
    import sys
    
    def handler(event, context):
        # Output file path for local testing
        output_file = "books.json"
    
        # Run the spider with the interpreter that is running this script, so the
        # virtual environment's Scrapy is used. check=True raises if the crawl fails.
        subprocess.run(
            [sys.executable, "-m", "scrapy", "runspider", "aws_spider.py", "-o", output_file],
            check=True,
        )
    
        # Return success message
        return {
            "statusCode": 200,
            "body": f"Scraping completed! Output saved to {output_file}",
        }
    
    # Add this block for local testing
    if __name__ == "__main__":
        # Simulate an AWS Lambda invocation event and context
        fake_event = {}
        fake_context = {}
    
        # Call the handler and print the result
        result = handler(fake_event, fake_context)
        print(result)
    

    Delete books.json, then test the handler with the command below. A new books.json should appear in the directory.

    python lambda_function_local.py
    

    Next, prepare the handler that will be deployed to AWS Lambda; let’s name it lambda_function.py. It needs one change from the local version: the results must be written to AWS S3.

    import subprocess
    import sys
    import boto3
    
    def handler(event, context):
        # Define the local and S3 output file paths
        local_output_file = "/tmp/books.json"  # Must be in /tmp for Lambda
        bucket_name = "aws-scrapy-bucket"  # Replace with your bucket
        s3_key = "scrapy-output/books.json"  # Path in S3 bucket
    
        # Run the Scrapy spider and save the output locally.
        # sys.executable is the runtime's own Python; -O (recent Scrapy releases)
        # overwrites a file left in /tmp by an earlier warm invocation instead of
        # appending to it. check=True raises if the crawl fails.
        subprocess.run(
            [sys.executable, "-m", "scrapy", "runspider", "aws_spider.py", "-O", local_output_file],
            check=True,
        )
    
        # Upload the file to S3
        s3 = boto3.client("s3")
        s3.upload_file(local_output_file, bucket_name, s3_key)
    
        return {
            "statusCode": 200,
            "body": f"Scraping completed! Output uploaded to s3://{bucket_name}/{s3_key}",
        }
    

    Here, our data gets saved first into a temp file, then it’s uploaded to your AWS S3 bucket.
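
    With a fixed S3 key, every run replaces the previous upload. If you want to keep a history, one option is to build a timestamped key per run. The sketch below is only one possible naming scheme; the function name and the date-based layout are assumptions, not part of the handler above.

```python
from datetime import datetime, timezone


def build_s3_key(prefix="scrapy-output", name="books", now=None):
    """Return a key such as scrapy-output/2025/01/26/books-090500.json."""
    # Use UTC so keys sort consistently regardless of the Lambda region.
    now = now or datetime.now(timezone.utc)
    return f"{prefix}/{now:%Y/%m/%d}/{name}-{now:%H%M%S}.json"


if __name__ == "__main__":
    fixed = datetime(2025, 1, 26, 9, 5, 0, tzinfo=timezone.utc)
    print(build_s3_key(now=fixed))  # scrapy-output/2025/01/26/books-090500.json
```

    The result could be passed to s3.upload_file() in place of a hard-coded key.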

    2️⃣ Deploying To AWS Lambda

    Make a package directory to prepare deployment.

    mkdir package
    

    Copy the Python dependencies from the virtual environment into the package directory.

    cp -r venv/lib/python3.*/site-packages/* package/
    

    Copy the project files into the package directory: the AWS handler lambda_function.py (not the local testing version) and the spider aws_spider.py.

    cp lambda_function.py aws_spider.py package/
    

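    Compress the contents of the package directory into a zip file. Lambda looks for lambda_function.py at the root of the archive, so zip from inside the directory rather than zipping the folder itself.

    cd package
    zip -r ../lambda_function.zip .
    cd ..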
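
    Lambda expects the handler module at the top level of the archive, not inside a subfolder. As an alternative to the zip command, here is a minimal sketch that builds such an archive with Python's zipfile module; the function name is illustrative, and the zip file should be written outside the directory being packed.

```python
import os
import zipfile


def build_lambda_zip(source_dir, zip_path):
    """Zip the contents of source_dir so they sit at the archive root."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for folder, _subdirs, files in os.walk(source_dir):
            for filename in files:
                full_path = os.path.join(folder, filename)
                # Store paths relative to source_dir, e.g. "lambda_function.py"
                # rather than "package/lambda_function.py".
                arcname = os.path.relpath(full_path, source_dir)
                archive.write(full_path, arcname)
    return zip_path


# Usage from the project directory:
#   build_lambda_zip("package", "lambda_function.zip")
```

    Listing the archive afterwards (for example with unzip -l) should show lambda_function.py without a leading folder name.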
    

    Next, in the AWS Management Console, open Lambda and choose “Create function”. Select a Python runtime that matches your local Python version and choose an architecture. Give the function’s execution role access to the S3 bucket you created earlier. In the code source section, choose “Upload from”, pick “.zip file”, and upload the generated archive. Set the handler to lambda_function.handler and raise the timeout above the 3-second default so the crawl has time to finish. Finally, click the Test button and check your S3 bucket for the results.


    Troubleshooting

    • Scrapy Not Found: Make sure the command passed to subprocess.run() uses the full path to the Python executable and invokes Scrapy as a module (-m scrapy), so the packaged copy of Scrapy can be found.
    • General Dependency Issues: Use the same Python version locally and in the Lambda runtime; dependencies installed for one version may fail to import on another.
    • Handler Issues: The handler name in the Lambda configuration must match the code, in the form file_name.function_name. For this guide it is lambda_function.handler.
    • Can’t Write to S3: The function’s execution role may lack permission to upload. Attach a suitable policy to the role in IAM; AmazonS3FullAccess works, though a policy limited to your bucket is safer.
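
    For the S3 permission issue, a narrower alternative to AmazonS3FullAccess is an inline policy that only allows uploads under the output prefix. The sketch below generates such a policy document; the function name is illustrative, and the bucket and prefix are the example values used in the handler.

```python
import json


def s3_upload_policy(bucket_name, key_prefix):
    """Return an IAM policy document allowing uploads under one prefix only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # s3:PutObject is the permission used when uploading objects.
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/{key_prefix}/*",
            }
        ],
    }


if __name__ == "__main__":
    policy = s3_upload_policy("aws-scrapy-bucket", "scrapy-output")
    # Paste the printed JSON into the role's inline policy editor in IAM.
    print(json.dumps(policy, indent=2))
```

    If the function later needs to read or list objects as well, the corresponding actions would have to be added.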

    You have now set up serverless web scraping on AWS with Scrapy and are ready to collect data. If you want a simpler, more reliable way to scrape, explore ProxyTee: its rotating residential proxies come with unlimited bandwidth, and its country-level geo-targeting is more precise than the random or continent-level targeting some competitors offer. ProxyTee is also up to 50% cheaper.

    With tools like ProxyTee, scraping becomes easier, and you can reach your project’s goals without handling low-level settings. Explore our other products, such as Datacenter Proxies, or our pricing plans to learn more about what we offer.

    • Python
    • Scrapy
    • Web Scraping

    Copyright © 2025 ProxyTee