{"id":11057,"date":"2023-01-12T20:42:29","date_gmt":"2023-01-12T20:42:29","guid":{"rendered":"https:\/\/cheesecakelabs.com\/blog\/"},"modified":"2026-06-24T00:09:47","modified_gmt":"2026-06-24T00:09:47","slug":"selenium-scraper-aws-lambda","status":"publish","type":"post","link":"https:\/\/cheesecakelabs.com\/blog\/selenium-scraper-aws-lambda\/","title":{"rendered":"How To Use Selenium To Web-Scrape on AWS Lambda"},"content":{"rendered":"\n<p>Web scraping can save several hours compared with manually collecting data from websites, and AWS Lambda is a good way to set up scripts that run on demand.<\/p>\n\n\n\n<p>In this blog post, we\u2019ll cover the good and bad sides of web scraping and AWS Lambda, and analyze the code of an example project that combines these two technologies, which fit very well together.<\/p>\n\n\n\n<p>If you don&#8217;t have a good way to <strong>retrieve information from a website<\/strong>, data scraping is the way to go. Besides, it&#8217;s awesome to see a browser opening, clicking on buttons, and filling out forms by itself.<\/p>\n\n\n\n<p>Selenium was the default for 15 years. Playwright launched in 2020 and has been steadily taking over. Now, the question is not &#8220;should I learn Selenium?&#8221; \u2014 it is &#8220;when does Selenium still make sense?&#8221;<\/p>\n\n\n\n<p>Let&#8217;s talk about these technologies and go through a Python example that shows how they fit together. Or, if you are not into reading definitions and just want to go straight to the point, skip to &#8220;The codebase&#8221; section of this blog post.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"Data-Scraping\">What is Data Scraping and when is it justified?<\/h2>\n\n\n\n<p>Data scraping is the process of retrieving information generated by another program. Web scraping is the specific case where that information lives on a website.<\/p>\n\n\n\n<p>It should be treated as a last resort. 
APIs exist precisely to give you structured, reliable access to the data you need. <a href=\"https:\/\/cheesecakelabs.com\/blog\/api-design-think-first-code-later\/\" target=\"_blank\" rel=\"noreferrer noopener\">Consuming an API<\/a> is faster, more stable, and carries none of the legal or technical fragility that scraping introduces. If an API is available, use it.<\/p>\n\n\n\n<p>Scraping is justified when no API exists, the data is publicly accessible, and the site&#8217;s structure is stable. The method you choose depends on what you are scraping.<\/p>\n\n\n\n<p><strong>When you do not need a browser:<\/strong> If the data you need is in the initial HTML response, a lightweight HTTP approach is hard to beat. No browser overhead, no JavaScript execution, no ChromeDriver to manage.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-1\" data-shcb-language-name=\"JavaScript\" data-shcb-language-slug=\"javascript\"><span><code class=\"hljs language-javascript\"><span class=\"hljs-keyword\">from<\/span> bs4 <span class=\"hljs-keyword\">import<\/span> BeautifulSoup\n<span class=\"hljs-keyword\">from<\/span> urllib.request <span class=\"hljs-keyword\">import<\/span> urlopen\n\n<span class=\"hljs-keyword\">with<\/span> urlopen(<span class=\"hljs-string\">'https:\/\/en.wikipedia.org\/wiki\/Main_Page'<\/span>) <span class=\"hljs-keyword\">as<\/span> response:\n    soup = BeautifulSoup(response, <span class=\"hljs-string\">'html.parser'<\/span>)\n    <span class=\"hljs-keyword\">for<\/span> anchor <span class=\"hljs-keyword\">in<\/span> soup.find_all(<span class=\"hljs-string\">'a'<\/span>):\n        print(anchor.get(<span class=\"hljs-string\">'href'<\/span>, <span class=\"hljs-string\">'\/'<\/span>))<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-1\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">JavaScript<\/span> <span class=\"shcb-language__paren\">(<\/span><span 
class=\"shcb-language__slug\">javascript<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>This approach \u2014 <code><strong>requests<\/strong><\/code> or <code><strong>urllib<\/strong><\/code> plus<strong> <code>BeautifulSoup<\/code><\/strong> for HTML parsing \u2014 covers the majority of simple scraping use cases and requires no cloud infrastructure beyond a basic function.<\/p>\n\n\n\n<p><strong>When you need a browser:<\/strong> JavaScript-rendered pages, single-page applications, dynamic content loaded after page load, login flows, form interactions \u2014 these require browser automation. This is where Selenium and Playwright enter the picture, and where Lambda becomes the right deployment target.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"List-of-some-web-scraping-techniques:\"><strong>Web scraping techniques: A quick reference<\/strong><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Copy and paste<\/strong>: Manual. Every developer has done this. Not worth automating for one-off tasks.<\/li>\n\n\n\n<li><strong>Text pattern matching<\/strong>: Using tools like <code>curl<\/code> and <code>grep<\/code> to extract patterns from raw HTML. Fast for simple, stable targets with predictable markup.<\/li>\n\n\n\n<li><strong>HTML parsing<\/strong>: Parsing the response into a navigable element tree \u2014 <code>BeautifulSoup<\/code> is the standard Python library. No browser required. Suited for static content.<\/li>\n\n\n\n<li><strong>DOM parsing:<\/strong> A browser controlled by automation software interacts with the page as a real user would \u2014 opening pages, clicking buttons, filling forms, waiting for JavaScript to execute, then extracting data from the rendered DOM. This is Selenium and Playwright&#8217;s domain. 
It is also the most resource-intensive approach and requires the most infrastructure to run reliably in the cloud.<\/li>\n<\/ul>\n\n\n\n<p>One well-known package for this is <a href=\"https:\/\/www.selenium.dev\/\" target=\"_blank\" rel=\"noreferrer noopener\">Selenium<\/a>. This technique requires a driver to communicate with the installed browser (e.g. <a href=\"https:\/\/chromedriver.chromium.org\/downloads\" target=\"_blank\" rel=\"noreferrer noopener\">ChromeDriver<\/a>, <a href=\"https:\/\/github.com\/mozilla\/geckodriver\/releases\" target=\"_blank\" rel=\"noreferrer noopener\">GeckoDriver<\/a>).<\/p>\n\n\n\n<p>This driver provides an API so the Selenium package can manage the browser. The driver version needs to be compatible with the installed browser version.<\/p>\n\n\n\n<p>Here&#8217;s an example of a web scraper that opens a browser, finds the H1 element of a page, and prints its text content:<\/p>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:100%\"><pre class=\"wp-block-code\" aria-describedby=\"shcb-language-2\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-keyword\">from<\/span> selenium <span class=\"hljs-keyword\">import<\/span> webdriver\n<span class=\"hljs-keyword\">from<\/span> selenium.webdriver.common.by <span class=\"hljs-keyword\">import<\/span> By\n\ndriver = webdriver.Chrome() <span class=\"hljs-comment\"># Open browser<\/span>\ndriver.get(<span class=\"hljs-string\">\"http:\/\/example.com\"<\/span>) <span class=\"hljs-comment\"># Access page<\/span>\nheader_element = driver.find_element(By.CSS_SELECTOR, <span class=\"hljs-string\">'h1'<\/span>) <span class=\"hljs-comment\"># Find H1 element<\/span>\nprint(header_element.text) <span class=\"hljs-comment\"># Print found 
element text content<\/span>\ndriver.quit() <span class=\"hljs-comment\"># Close browser<\/span><\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-2\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre><\/div>\n<\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"How-to-use-the-Web-Scraper-script?\"><strong>Why AWS Lambda for web scraping?<\/strong><\/h2>\n\n\n\n<p>In this blog post, we are going to run the web scraper script in the <strong>cloud<\/strong>. Cloud computing makes services such as hosting and storage available over the internet.<\/p>\n\n\n\n<p>Some well-known cloud providers are <a href=\"https:\/\/aws.amazon.com\/?nc2=h_lg\" target=\"_blank\" rel=\"noreferrer noopener\">Amazon AWS<\/a>, <a href=\"https:\/\/cloud.google.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">Google Cloud<\/a>, and <a href=\"https:\/\/azure.microsoft.com\/en-us\/\" target=\"_blank\" rel=\"noreferrer noopener\">Microsoft Azure<\/a>. There are several advantages of using a cloud solution, for example:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>You don&#8217;t need to set up your own server<\/li>\n\n\n\n<li>There are a lot of engineers working to make sure these services don&#8217;t have any security breaches<\/li>\n\n\n\n<li>It&#8217;s usually very easy to scale up the cloud service you are using. 
In many of these services, you can increase the memory and processing capacity just by clicking some buttons<\/li>\n<\/ul>\n\n\n\n<p>Some of these <a href=\"https:\/\/cheesecakelabs.com\/blog\/cloud-services-best-fit-for-your-project\/\" target=\"_blank\" rel=\"noreferrer noopener\">cloud services<\/a> are <strong>serverless<\/strong>, which means they run on demand and the cloud provider takes care of the server infrastructure on behalf of the customer.<\/p>\n\n\n\n<p>Some of these cloud services require the use of <strong>containers<\/strong>, which are running instances of images (templates) that package an operating system environment. <a href=\"https:\/\/www.docker.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">Docker<\/a> is the main platform for working with containers.<\/p>\n\n\n\n<p>Although cloud computing is great, it is not needed for every project. Running the script on your machine might be enough.<\/p>\n\n\n\n<p>Each one of the providers mentioned has several ways to host a web scraper script. <a href=\"https:\/\/aws.amazon.com\/blogs\/architecture\/serverless-architecture-for-a-web-scraping-solution\/\" target=\"_blank\" rel=\"noreferrer noopener\">This AWS blog post<\/a> describes 3 options.<\/p>\n\n\n\n<p><strong>To summarize the blog post, the options are:<\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Create a virtual machine using the EC2 service. This is the most basic option: you&#8217;d need to set up the machine just like a regular one, and it would stay on 24 hours a day. This is also the most expensive solution out of the 3 options.<\/li>\n\n\n\n<li>Containerize the script and run it on the AWS Fargate service. Fargate is a serverless option, which is useful because the web scraper only needs to run on demand. Fargate is also cheaper than EC2.<\/li>\n\n\n\n<li>Use AWS Lambda, which is also a serverless service that supports both raw code and containerized scripts. 
It has more limitations compared to Fargate, but it is enough in most cases. It&#8217;s the cheapest service and your script might even fit the free tier.<\/li>\n<\/ol>\n\n\n\n<p>Here, we will use AWS Lambda. Its main limitations are a 15-minute timeout and a deployment package that can&#8217;t exceed 250 MB (container images can be up to 10 GB).<\/p>\n\n\n\n<p>The example script is very simple and takes less than a minute to execute, but as we previously mentioned, Selenium requires a browser, and the Chrome binary size is around 500 MB, which forces us to use the container approach.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><strong>Read more: <\/strong><a href=\"https:\/\/cheesecakelabs.com\/blog\/aws-finops-best-practices\/\" type=\"post\" id=\"13020\" target=\"_blank\" rel=\"noreferrer noopener\">AWS FinOps Best Practices: How to Cut and Optimize Cloud Costs<\/a><\/p>\n<\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"How-are-we-going-to-deploy-it-to-AWS-Lambda?\"><strong>How are we going to deploy it to AWS Lambda?<\/strong><\/h2>\n\n\n\n<p>There are several ways to set up an AWS Lambda function. One easy way is by using the <a href=\"https:\/\/serverless.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">Serverless Framework<\/a>. 
The Serverless framework helps us to develop and deploy Lambda Functions by using a single YAML file to declare the lambda functions, their infrastructure, and the events that will trigger them.<\/p>\n\n\n\n<p>Using the Serverless Framework also allows us to deploy the lambda functions with a single command, simplifying the process a lot.<\/p>\n\n\n\n<p>The Serverless Framework also provides an optional dashboard which gives us an interface to check the function&#8217;s health, trigger events manually, and check their logs.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"The-codebase:\"><strong>The codebase<\/strong><\/h2>\n\n\n\n<p>In this section, we will analyze some important files of a demo project which can be checked <a href=\"https:\/\/github.com\/CheesecakeLabs\/selenium-serverless-example\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a>. <a href=\"https:\/\/github.com\/CheesecakeLabs\/selenium-serverless-example\/blob\/main\/serverless.yml\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>serverless.yml<\/strong><\/a><\/p>\n\n\n\n<p>This is the file where we set the lambda application infrastructure, the lambda functions, and the events that are going to trigger them.<\/p>\n\n\n\n<p>In the <strong>provider<\/strong> section of this file, we declare that we are going to use a docker image named <em>img<\/em>. 
The <strong>functions<\/strong> section is where we set the lambda functions and their specific configuration like environment variables and handler functions.<\/p>\n\n\n\n<p>Notice that here we add environment variables that will have the browser and its driver path, we state that we will use the image that was previously set, that the command that will be executed when the lambda function is triggered is the <em>example.py <\/em>file handler, and that the event that will trigger it is a <em>cronjob<\/em> that is scheduled for every 6 hours.<\/p>\n\n\n\n<p><a href=\"https:\/\/github.com\/CheesecakeLabs\/selenium-serverless-example\/blob\/main\/Dockerfile\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>Dockerfile<\/strong><\/a><\/p>\n\n\n\n<p>The Dockerfile is the template where we configure our container image. This file creates a Linux instance capable of running the web scraper. The template installs the project requirements (including Chrome and Chrome driver) and copies the required files to the image.<\/p>\n\n\n\n<p><a href=\"https:\/\/github.com\/CheesecakeLabs\/selenium-serverless-example\/tree\/main\/src\/handlers\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>Example.py<\/strong><\/a><\/p>\n\n\n\n<p>This file has a function that will be executed when the event is triggered. 
This file is pretty similar to the Selenium example we showed earlier.<\/p>\n\n\n\n<p>The main difference is that we customize the browser to not display an interface (because the lambda does not have a display) and to use a single process (because the lambda only has 1 CPU).<\/p>\n\n\n\n<p>In this case, the handler function returns a dictionary with a status code and a body just in case we want to change the event from a cronjob to an HTTP request.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"Deploying\"><strong>Deploying<\/strong><\/h3>\n\n\n\n<p>To deploy the lambda function, we just need to run a single command in the terminal.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"1200\" height=\"275\" src=\"https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2023\/01\/web-scrapper-aws-lambda-1200x275.jpg\" alt=\"\" class=\"wp-image-11060\" srcset=\"https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2023\/01\/web-scrapper-aws-lambda-1200x275.jpg 1200w, https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2023\/01\/web-scrapper-aws-lambda-600x138.jpg 600w, https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2023\/01\/web-scrapper-aws-lambda-768x176.jpg 768w, https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2023\/01\/web-scrapper-aws-lambda-1536x352.jpg 1536w, https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2023\/01\/web-scrapper-aws-lambda-760x174.jpg 760w, https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2023\/01\/web-scrapper-aws-lambda.jpg 1858w\" sizes=\"(max-width: 1200px) 100vw, 1200px\" \/><\/figure>\n\n\n\n<p>In this case, the Serverless Framework raises a warning message that explains that the dashboard does not support functions that use container images. 
It means that we will not be able to check the lambda function logs, nor trigger the lambda function manually through the dashboard.<\/p>\n\n\n\n<p>But we can still do it through the AWS console. Below we have a screenshot of a successful log retrieved from AWS CloudWatch:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"1200\" height=\"379\" src=\"https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2023\/01\/web-scrapping-aws-1200x379.jpg\" alt=\"\" class=\"wp-image-11062\" srcset=\"https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2023\/01\/web-scrapping-aws-1200x379.jpg 1200w, https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2023\/01\/web-scrapping-aws-600x190.jpg 600w, https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2023\/01\/web-scrapping-aws-768x243.jpg 768w, https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2023\/01\/web-scrapping-aws-1536x486.jpg 1536w, https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2023\/01\/web-scrapping-aws-760x240.jpg 760w, https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2023\/01\/web-scrapping-aws.jpg 1999w\" sizes=\"(max-width: 1200px) 100vw, 1200px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Selenium on Lambda: Full Setup Updated for 2026<\/h2>\n\n\n\n<p>The architecture is unchanged from the original article. 
The updates are in the Python runtime (3.12), the Chrome headless flag syntax (changed in Chrome 112), and Serverless Framework v4.<\/p>\n\n\n<pre class=\"wp-block-code\"><span><code class=\"hljs\">scraper\/\n\u251c\u2500\u2500 Dockerfile\n\u251c\u2500\u2500 handler.py\n\u251c\u2500\u2500 requirements.txt\n\u2514\u2500\u2500 serverless.yml<\/code><\/span><\/pre>\n\n\n<h4 class=\"wp-block-heading\">Dockerfile:<\/h4>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-3\" data-shcb-language-name=\"PHP\" data-shcb-language-slug=\"php\"><span><code class=\"hljs language-php\">FROM <span class=\"hljs-keyword\">public<\/span>.ecr.aws\/lambda\/python:<span class=\"hljs-number\">3.12<\/span>\n\n<span class=\"hljs-comment\"># Install Chromium and ChromeDriver<\/span>\nRUN dnf install -y chromium chromedriver\n\n<span class=\"hljs-comment\"># Install Python dependencies<\/span>\nCOPY requirements.txt .\nRUN pip install -r requirements.txt --target <span class=\"hljs-string\">\"${LAMBDA_TASK_ROOT}\"<\/span>\n\n<span class=\"hljs-comment\"># Copy handler<\/span>\nCOPY handler.py ${LAMBDA_TASK_ROOT}\n\nCMD &#91;<span class=\"hljs-string\">\"handler.handler\"<\/span>]<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-3\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">PHP<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">php<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p><strong>requirements.txt:<\/strong><\/p>\n\n\n<pre class=\"wp-block-code\"><span><code class=\"hljs\">selenium==4.18.1<\/code><\/span><\/pre>\n\n\n<p><strong>handler.py:<\/strong><\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-4\" data-shcb-language-name=\"PHP\" data-shcb-language-slug=\"php\"><span><code class=\"hljs language-php\">from selenium import webdriver\nfrom 
selenium.webdriver.chrome.options import Options\nfrom selenium.webdriver.common.by import By\n\ndef handler(event, context):\n    options = Options()\n    options.add_argument(<span class=\"hljs-string\">\"--headless=new\"<\/span>)   <span class=\"hljs-comment\"># Updated: Chrome 112+ syntax<\/span>\n    options.add_argument(<span class=\"hljs-string\">\"--no-sandbox\"<\/span>)\n    options.add_argument(<span class=\"hljs-string\">\"--disable-dev-shm-usage\"<\/span>)\n    options.add_argument(<span class=\"hljs-string\">\"--single-process\"<\/span>)\n    options.add_argument(<span class=\"hljs-string\">\"--disable-gpu\"<\/span>)\n    options.add_argument(<span class=\"hljs-string\">\"--window-size=1920,1080\"<\/span>)\n\n    driver = webdriver.Chrome(options=options)\n\n    <span class=\"hljs-keyword\">try<\/span>:\n        driver.get(event.get(<span class=\"hljs-string\">\"url\"<\/span>, <span class=\"hljs-string\">\"https:\/\/example.com\"<\/span>))\n        title = driver.title\n        header = driver.find_element(By.CSS_SELECTOR, <span class=\"hljs-string\">'h1'<\/span>).text\n        <span class=\"hljs-keyword\">return<\/span> {<span class=\"hljs-string\">\"title\"<\/span>: title, <span class=\"hljs-string\">\"header\"<\/span>: header}\n    <span class=\"hljs-keyword\">finally<\/span>:\n        driver.quit()  <span class=\"hljs-comment\"># Always close the browser<\/span><\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-4\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">PHP<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">php<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">Key updates from the original:<\/h3>\n\n\n\n<p>The <code><strong>--headless<\/strong><\/code> flag was deprecated in Chrome 112. The new flag is <code><strong>--headless=new<\/strong><\/code>. 
Using the old flag produces compatibility warnings and may cause rendering issues on modern Chrome versions. Always use <strong><code>--headless=new<\/code><\/strong> in 2026.<\/p>\n\n\n\n<p>The <code><strong>try\/finally<\/strong><\/code> block around <code><strong>driver.quit()<\/strong><\/code> is also important \u2014 without it, a scraping error will leave a Chrome process running inside the Lambda container, consuming memory for the remainder of the invocation.<\/p>\n\n\n\n<p><strong>serverless.yml (Serverless Framework v4):<\/strong><\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-5\" data-shcb-language-name=\"PHP\" data-shcb-language-slug=\"php\"><span><code class=\"hljs language-php\">service: selenium-scraper\n\nprovider:\n  name: aws\n  region: us-east<span class=\"hljs-number\">-1<\/span>\n  timeout: <span class=\"hljs-number\">900<\/span>  <span class=\"hljs-comment\"># 15 minutes maximum<\/span>\n\nfunctions:\n  scraper:\n    image:\n      uri: &lt;your-ecr-image-uri&gt;\n    memorySize: <span class=\"hljs-number\">2048<\/span>  <span class=\"hljs-comment\"># Chrome needs memory \u2014 2GB is a safe starting point<\/span>\n    events:\n      - schedule: rate(<span class=\"hljs-number\">6<\/span> hours)<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-5\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">PHP<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">php<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p><strong>Build, push, and deploy:<\/strong><\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-6\" data-shcb-language-name=\"HTML, XML\" data-shcb-language-slug=\"xml\"><span><code class=\"hljs language-xml\"># Build and push the container to ECR\naws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <span 
class=\"hljs-tag\">&lt;<span class=\"hljs-name\">account-id<\/span>&gt;<\/span>.dkr.ecr.us-east-1.amazonaws.com\ndocker build -t selenium-scraper .\ndocker tag selenium-scraper:latest <span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">account-id<\/span>&gt;<\/span>.dkr.ecr.us-east-1.amazonaws.com\/selenium-scraper:latest\ndocker push <span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">account-id<\/span>&gt;<\/span>.dkr.ecr.us-east-1.amazonaws.com\/selenium-scraper:latest\n\n# Deploy with Serverless Framework\nserverless deploy<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-6\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">HTML, XML<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">xml<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p><strong>Note:<\/strong> Container-based Lambda functions cannot be managed through the Serverless dashboard UI, but logs are accessible via AWS CloudWatch. Monitor your scraper&#8217;s execution time and memory usage there \u2014 Chrome is memory-hungry, and Lambda will terminate functions that exceed the configured memory ceiling.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">When to Use Playwright Instead<\/h3>\n\n\n\n<p>If you are starting a new scraping project in 2026, Playwright is the stronger default. The practical advantages are real:<\/p>\n\n\n\n<p><strong>Auto-waiting eliminates an entire class of bugs.<\/strong> The most common cause of flaky scrapers is timing \u2014 an element is not yet visible, a network request has not completed, a JavaScript animation is still running. Selenium typically needs an explicit <code>WebDriverWait<\/code> call for each interaction that depends on dynamic content. Miss one, and the scraper breaks intermittently, making debugging painful. Playwright waits automatically for elements to be actionable before interacting with them. 
This alone removes the majority of timing-related scraper failures.<\/p>\n\n\n\n<p><strong>Async support enables dramatically higher throughput.<\/strong> Playwright&#8217;s async API makes concurrent scraping straightforward. Scraping 20 URLs concurrently is a handful of lines with <code>asyncio<\/code>. In Selenium, the same task requires thread management and is significantly more complex to debug.<\/p>\n\n\n\n<p><strong>Setup is simpler.<\/strong> No ChromeDriver version matching. No compatibility matrix between Chrome and ChromeDriver versions. <code>playwright install<\/code> downloads the correct browser binaries automatically and keeps them in sync.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Playwright on Lambda:<\/h3>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-7\" data-shcb-language-name=\"JavaScript\" data-shcb-language-slug=\"javascript\"><span><code class=\"hljs language-javascript\"><span class=\"hljs-keyword\">import<\/span> asyncio\n<span class=\"hljs-keyword\">from<\/span> playwright.async_api <span class=\"hljs-keyword\">import<\/span> async_playwright\n\n<span class=\"hljs-keyword\">async<\/span> def scrape(url: str) -&gt; dict:\n    <span class=\"hljs-keyword\">async<\/span> <span class=\"hljs-keyword\">with<\/span> async_playwright() <span class=\"hljs-keyword\">as<\/span> p:\n        browser = <span class=\"hljs-keyword\">await<\/span> p.chromium.launch(\n            headless=True,\n            args=&#91;<span class=\"hljs-string\">\"--no-sandbox\"<\/span>, <span class=\"hljs-string\">\"--disable-dev-shm-usage\"<\/span>]\n        )\n        page = <span class=\"hljs-keyword\">await<\/span> browser.new_page()\n\n        <span class=\"hljs-attr\">try<\/span>:\n            <span class=\"hljs-keyword\">await<\/span> page.goto(url, wait_until=<span class=\"hljs-string\">\"networkidle\"<\/span>)\n            title = <span class=\"hljs-keyword\">await<\/span> page.title()\n            header = <span 
class=\"hljs-keyword\">await<\/span> page.locator(<span class=\"hljs-string\">\"h1\"<\/span>).text_content()\n            <span class=\"hljs-keyword\">return<\/span> {<span class=\"hljs-string\">\"title\"<\/span>: title, <span class=\"hljs-string\">\"header\"<\/span>: header}\n        <span class=\"hljs-attr\">finally<\/span>:\n            <span class=\"hljs-keyword\">await<\/span> browser.close()\n\ndef handler(event, context):\n    url = event.get(<span class=\"hljs-string\">\"url\"<\/span>, <span class=\"hljs-string\">\"https:\/\/example.com\"<\/span>)\n    <span class=\"hljs-keyword\">return<\/span> asyncio.run(scrape(url))<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-7\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">JavaScript<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">javascript<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>The Dockerfile structure for Playwright on Lambda is similar \u2014 containerized function with Playwright and its browser binaries installed. The <code>playwright install chromium<\/code> command during the image build handles all binary management.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">When Selenium still makes sense<\/h3>\n\n\n\n<p>Rewriting working Selenium code is not justified by &#8220;Playwright is newer.&#8221; If you have an existing Selenium codebase that runs reliably, the migration cost is real and the benefit is marginal for stable scrapers.<\/p>\n\n\n\n<p>Stick with Selenium when: the codebase is established and functioning, your team is invested in Selenium Grid for distributed execution, or you need Java, C#, or Ruby \u2014 languages where Playwright&#8217;s bindings are less mature than Python and JavaScript.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Anti-bot detection<\/h3>\n\n\n\n<p>Both Selenium and Playwright are detectable by modern anti-bot systems. 
Cloudflare, Akamai, PerimeterX, and DataDome all fingerprint browser automation tools at the network and browser level \u2014 inspecting headers, JavaScript properties, mouse movement patterns, and timing signatures.<\/p>\n\n\n\n<p>Vanilla Selenium is particularly detectable because it sets <code>navigator.webdriver = true<\/code> by default. Stealth wrappers like <code>undetected-chromedriver<\/code> patch this, but detection systems evolve and stealth patches require maintenance.<\/p>\n\n\n\n<p>For targets with serious<strong> anti-bot protection<\/strong>, managed browser infrastructure is the more pragmatic choice. Browserless, Scrapfly, and Apify manage browser pools, proxy rotation, and anti-detection at the infrastructure level. The cost per request is higher than running your own Lambda; the reliability and maintenance burden are significantly lower.<\/p>\n\n\n\n<p>For unprotected targets \u2014 internal tools, publicly accessible data with no bot detection \u2014 a straightforward Selenium or Playwright Lambda setup works reliably.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><strong>Read more: <\/strong><a href=\"https:\/\/cheesecakelabs.com\/blog\/harness-engineering\/\" type=\"post\" id=\"13828\" target=\"_blank\" rel=\"noreferrer noopener\">Harness Engineering: Why \u201cDone\u201d Isn\u2019t the Agent Saying So<\/a><\/p>\n<\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"Conclusions\"><strong>Conclusions<\/strong><\/h2>\n\n\n\n<p>This is a very specific example of the usage of AWS Lambda and Selenium but I hope it can illustrate the potential of these technologies. Instead of creating a web scraper, we can create functions that run end-to-end tests that, in case of failure, send a Slack message, or we can create an API that calculates the distance between two strings and returns it in the HTTP response. 
It&#8217;s all up to your imagination!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Web scraping might save several hours when compared to manually collecting data from websites, and AWS Lambda is a good way to set up scripts to run by demand. 
In this blog post, we\u2019ll cover everything you need to know, the good and bad things about Web Scraping and AWS Lambda, and also analyze the [&hellip;]<\/p>\n","protected":false},"author":81,"featured_media":11064,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[432],"tags":[305],"class_list":["post-11057","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-engineering","tag-tag-development"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.1.1 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Setting up a Selenium Scraper on AWS Lambda<\/title>\n<meta name=\"description\" content=\"In this article, you&#039;ll discover more about Web Scraping and learn step by step how to use Selenium to Web-Scrape on AWS Lambda.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/cheesecakelabs.com\/blog\/selenium-scraper-aws-lambda\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Setting up a Selenium Scraper on AWS Lambda\" \/>\n<meta property=\"og:description\" content=\"In this article, you&#039;ll discover more about Web Scraping and learn step by step how to use Selenium to Web-Scrape on AWS Lambda.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/cheesecakelabs.com\/blog\/selenium-scraper-aws-lambda\/\" \/>\n<meta property=\"og:site_name\" content=\"Cheesecake Labs\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cheesecakelabs\" \/>\n<meta property=\"article:published_time\" content=\"2023-01-12T20:42:29+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-06-24T00:09:47+00:00\" \/>\n<meta property=\"og:image\" 
content=\"https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2023\/01\/selenium-web-scraping-aws-lambda.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1920\" \/>\n\t<meta property=\"og:image:height\" content=\"860\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Cheesecake Labs\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@cheesecakelabs\" \/>\n<meta name=\"twitter:site\" content=\"@cheesecakelabs\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"11 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/cheesecakelabs.com\/blog\/selenium-scraper-aws-lambda\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/cheesecakelabs.com\/blog\/selenium-scraper-aws-lambda\/\"},\"author\":{\"name\":\"Karran Besen\"},\"headline\":\"How To Use Selenium To Web-Scrape on AWS 
Lambda\",\"datePublished\":\"2023-01-12T20:42:29+00:00\",\"dateModified\":\"2026-06-24T00:09:47+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/cheesecakelabs.com\/blog\/selenium-scraper-aws-lambda\/\"},\"wordCount\":2155,\"publisher\":{\"@id\":\"https:\/\/cheesecakelabs.com\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/cheesecakelabs.com\/blog\/selenium-scraper-aws-lambda\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2023\/01\/selenium-web-scraping-aws-lambda.jpg\",\"keywords\":[\"development\"],\"articleSection\":[\"Engineering\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/cheesecakelabs.com\/blog\/selenium-scraper-aws-lambda\/\",\"url\":\"https:\/\/cheesecakelabs.com\/blog\/selenium-scraper-aws-lambda\/\",\"name\":\"Setting up a Selenium Scraper on AWS Lambda\",\"isPartOf\":{\"@id\":\"https:\/\/cheesecakelabs.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/cheesecakelabs.com\/blog\/selenium-scraper-aws-lambda\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/cheesecakelabs.com\/blog\/selenium-scraper-aws-lambda\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2023\/01\/selenium-web-scraping-aws-lambda.jpg\",\"datePublished\":\"2023-01-12T20:42:29+00:00\",\"dateModified\":\"2026-06-24T00:09:47+00:00\",\"description\":\"In this article, you'll discover more about Web Scraping and learn step by step how to use Selenium to Web-Scrape on AWS 
Lambda.\",\"breadcrumb\":{\"@id\":\"https:\/\/cheesecakelabs.com\/blog\/selenium-scraper-aws-lambda\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/cheesecakelabs.com\/blog\/selenium-scraper-aws-lambda\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/cheesecakelabs.com\/blog\/selenium-scraper-aws-lambda\/#primaryimage\",\"url\":\"https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2023\/01\/selenium-web-scraping-aws-lambda.jpg\",\"contentUrl\":\"https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2023\/01\/selenium-web-scraping-aws-lambda.jpg\",\"width\":1920,\"height\":860,\"caption\":\"woman programming in front of a pc\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/cheesecakelabs.com\/blog\/selenium-scraper-aws-lambda\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/cheesecakelabs.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"How To Use Selenium To Web-Scrape on AWS Lambda\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/cheesecakelabs.com\/blog\/#website\",\"url\":\"https:\/\/cheesecakelabs.com\/blog\/\",\"name\":\"Cheesecake Labs\",\"description\":\"Nearshore outsourcing company for Web and Mobile design and engineering services, and staff augmentation for startups and enterprises..\",\"publisher\":{\"@id\":\"https:\/\/cheesecakelabs.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/cheesecakelabs.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/cheesecakelabs.com\/blog\/#organization\",\"name\":\"Cheesecake Labs\",\"alternateName\":\"Cheesecake Labs 
Inc\",\"url\":\"https:\/\/cheesecakelabs.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/cheesecakelabs.com\/#logo\",\"url\":\"https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2022\/06\/cheesecake-labs-1.png\",\"contentUrl\":\"https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2022\/06\/cheesecake-labs-1.png\",\"caption\":\"Cheesecake Labs\",\"inLanguage\":\"en\"},\"image\":{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/cheesecakelabs.com\/#primary-image\",\"url\":\"https:\/\/ckl-website-v4-strapi-prod.s3.us-east-2.amazonaws.com\/ai_software_development_company_83fb512983.webp\",\"contentUrl\":\"https:\/\/ckl-website-v4-strapi-prod.s3.us-east-2.amazonaws.com\/ai_software_development_company_83fb512983.webp\",\"width\":1920,\"height\":1080,\"caption\":\"Cheesecake Labs \u2014 AI, Data & Blockchain software development services\",\"inLanguage\":\"en\"},\"sameAs\":[\"https:\/\/www.facebook.com\/cheesecakelabs\",\"https:\/\/x.com\/cheesecakelabs\",\"https:\/\/www.instagram.com\/cheesecakelabs\/\",\"https:\/\/www.linkedin.com\/company\/cheesecake-labs\/\",\"https:\/\/www.youtube.com\/channel\/UCdGEQ5AHJcmIlaOaI5fGGVA\",\"https:\/\/clutch.co\/profile\/cheesecake-labs\",\"https:\/\/www.behance.net\/cheesecakelabs\",\"https:\/\/dribbble.com\/cheesecakelabs\",\"https:\/\/www.designrush.com\/agency\/profile\/cheesecake-labs\",\"https:\/\/www.g2.com\/products\/cheesecake-labs\/reviews\"],\"description\":\"Cheesecake Labs is a software development studio that designs and builds custom digital products \u2014 web, mobile, and platforms \u2014 combining product design and high-performance engineering.\",\"foundingDate\":\"2013\"},{\"@type\":\"Person\",\"name\":\"Karran 
Besen\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/cheesecakelabs.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2019\/12\/karran-300x300.png\",\"contentUrl\":\"https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2019\/12\/karran-300x300.png\",\"caption\":\"Karran Besen\"},\"url\":\"https:\/\/cheesecakelabs.com\/blog\/autor\/karran-besen\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","_links":{"self":[{"href":"https:\/\/cheesecakelabs.com\/blog\/wp-json\/wp\/v2\/posts\/11057","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cheesecakelabs.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cheesecakelabs.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cheesecakelabs.com\/blog\/wp-json\/wp\/v2\/users\/81"}],"replies":[{"embeddable":true,"href":"https:\/\/cheesecakelabs.com\/blog\/wp-json\/wp\/v2\/comments?post=11057"}],"version-history":[{"count":7,"href":"https:\/\/cheesecakelabs.com\/blog\/wp-json\/wp\/v2\/posts\/11057\/revisions"}],"predecessor-version":[{"id":13984,"href":"https:\/\/cheesecakelabs.com\/blog\/wp-json\/wp\/v2\/posts\/11057\/revisions\/13984"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cheesecakelabs.com\/blog\/wp-json\/wp\/v2\/media\/11064"}],"wp:attachment":[{"href":"https:\/\/cheesecakelabs.com\/blog\/wp-json\/wp\/v2\/media?parent=11057"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cheesecakelabs.com\/blog\/wp-json\/wp\/v2\/categories?post=11057"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cheesecakelabs.com\/blog\/wp-json\/wp\/v2\/tags?post=11057"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}