
How to Run Puppeteer and Headless Chrome in a Docker Container


Puppeteer is a Node.js library which lets you interact with the Chrome web browser. Recent releases also include Firefox support.

Puppeteer is commonly used to automate testing, archive webpage data, and generate screenshots of live web content. It lets you control Chrome via a clear API, giving you the ability to navigate to pages, click on form controls, and issue browser commands.
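To illustrate that API, here’s a minimal sketch of filling in and submitting a search form. The URL and the #q and #search-button selectors are hypothetical placeholders, and submitSearch is our own name; page is assumed to be a Puppeteer Page object, such as one returned by browser.newPage().

```javascript
// Fills in and submits a search form on an already-open Puppeteer page.
// The URL and both selectors are hypothetical placeholders for illustration.
async function submitSearch(page, query) {
  await page.goto("https://example.com/search");
  // Type the query into the (hypothetical) search input
  await page.type("#q", query);
  // Start waiting for the navigation before clicking, so it isn't missed
  await Promise.all([
    page.waitForNavigation(),
    page.click("#search-button"),
  ]);
  // Return the title of the page the form led to
  return page.title();
}
```

Starting waitForNavigation() alongside click() avoids a race where the navigation completes before Puppeteer begins listening for it.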

Getting Puppeteer running in a Docker container can be complex, as headless Chrome needs many dependencies. Here’s how to get everything installed so you can use Puppeteer in a Kubernetes cluster, in an isolated container on your dev machine, or as part of a CI pipeline.

The Basic Requirements

We’re using a Debian-based image for the purposes of this article. If you’re using a different base, you’ll need to adapt the package manager commands shown here. The official Node.js image is a suitable starting point, as it saves you from installing Node manually.

Puppeteer is distributed via npm, the Node.js package manager. When it’s installed, it downloads a build of Chromium that’s compatible with that Puppeteer release, so in theory an npm install puppeteer would get you running. In practice, a clean Docker environment will lack the libraries Chrome needs to run.

As it’s ordinarily a heavyweight GUI program, Chrome depends on font, graphics, configuration, and window management libraries. These all need to be installed within your Dockerfile.

At the time of writing, the current dependency list looks like this:

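# Pin a Debian release so the package names below stay valid. The
# gconf-service and libappindicator1 packages listed in older guides are no
# longer available in recent Debian releases, so they're omitted here.
FROM node:20-bookworm
WORKDIR /puppeteer
# Refresh the package index first; the base image ships without one
RUN apt-get update && apt-get install -y \
    fonts-liberation \
    libasound2 \
    libatk1.0-0 \
    libcairo2 \
    libcups2 \
    libfontconfig1 \
    libgbm-dev \
    libgdk-pixbuf2.0-0 \
    libgtk-3-0 \
    libicu-dev \
    libjpeg-dev \
    libnspr4 \
    libnss3 \
    libpango-1.0-0 \
    libpangocairo-1.0-0 \
    libpng-dev \
    libx11-6 \
    libx11-xcb1 \
    libxcb1 \
    libxcomposite1 \
    libxcursor1 \
    libxdamage1 \
    libxext6 \
    libxfixes3 \
    libxi6 \
    libxrandr2 \
    libxrender1 \
    libxss1 \
    libxtst6 \
    xdg-utils \
    && rm -rf /var/lib/apt/lists/*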

The dependencies are installed manually so you can use the Chromium binary that’s bundled with Puppeteer. This keeps the browser version consistent with your Puppeteer release and avoids the possibility of a new Chrome release arriving with incompatibilities that break Puppeteer.
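If Chrome still refuses to start because a shared library is missing, running ldd against the browser binary lists any libraries that can’t be resolved. The sketch below wraps that check in Node.js; parseMissingLibraries and listMissingLibraries are hypothetical helper names, and it only works on Linux, where ldd is available.

```javascript
const { execFileSync } = require("child_process");

// Extracts the library names that ldd reports as "not found".
// ldd prints one dependency per line, e.g. "\tlibgbm.so.1 => not found".
function parseMissingLibraries(lddOutput) {
  return lddOutput
    .split("\n")
    .filter((line) => line.includes("=> not found"))
    .map((line) => line.trim().split(/\s+/)[0]);
}

// Runs ldd against a binary and returns any unresolved libraries.
// execFileSync throws if ldd itself exits with a non-zero status.
function listMissingLibraries(binaryPath) {
  const output = execFileSync("ldd", [binaryPath], { encoding: "utf8" });
  return parseMissingLibraries(output);
}
```

Inside the container, listMissingLibraries(puppeteer.executablePath()) returns an empty array when every dependency resolves, or library names (for instance libgbm.so.1) that you can then trace back to the Debian package that provides them.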

Now run npm install puppeteer in your local working directory. This will create a package.json and package-lock.json for you to use. In your Dockerfile, copy these files into the container and use npm ci to install Puppeteer.

# (above section omitted)
COPY package.json .
COPY package-lock.json .
RUN npm ci

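The final step is to make Puppeteer’s bundled Chromium binary properly executable. Otherwise, you’ll run into permission errors whenever Puppeteer tries to start Chrome. The .local-chromium path below is used by Puppeteer releases before v19; newer releases download the browser to ~/.cache/puppeteer by default, so adjust the path to match your version.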

# (above section omitted)
RUN chmod -R o+rwx node_modules/puppeteer/.local-chromium
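To confirm the permissions change worked, you can check whether the current user may execute the browser binary before launching it. This is a minimal sketch using Node’s fs.accessSync with the X_OK flag; isExecutable is a hypothetical helper name.

```javascript
const fs = require("fs");

// Returns true if the current user may execute the file at filePath.
// fs.accessSync throws when the check fails (including when the file is missing).
function isExecutable(filePath) {
  try {
    fs.accessSync(filePath, fs.constants.X_OK);
    return true;
  } catch {
    return false;
  }
}
```

Inside the container, isExecutable(puppeteer.executablePath()) should return true once the chmod step has run.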

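You might want to manually install a specific Chrome version in customized environments. Setting the PUPPETEER_SKIP_CHROMIUM_DOWNLOAD environment variable before you run npm ci will disable Puppeteer’s own browser download during installation (newer Puppeteer releases use PUPPETEER_SKIP_DOWNLOAD instead). This helps slim down your final image, but you’ll then need to point Puppeteer at your own browser binary using the executablePath option of puppeteer.launch().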
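If you skip the bundled download, one way to supply the browser location is to read it from an environment variable when building the launch options. In this sketch, CHROME_PATH and buildLaunchOptions are hypothetical names; Puppeteer can also read a PUPPETEER_EXECUTABLE_PATH environment variable on its own, which may be simpler.

```javascript
// Builds options for puppeteer.launch(). CHROME_PATH is a hypothetical
// variable name pointing at a browser you installed yourself.
function buildLaunchOptions(env = process.env) {
  const options = { headless: true };
  if (env.CHROME_PATH) {
    // Use the manually installed browser instead of Puppeteer's download
    options.executablePath = env.CHROME_PATH;
  }
  // Without executablePath, Puppeteer falls back to its default browser lookup
  return options;
}
```

A script would then call puppeteer.launch(buildLaunchOptions()), with the image setting CHROME_PATH to the browser’s location, for example /usr/bin/chromium if you installed Debian’s chromium package.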

At this point you should be ready to build your image:

docker build . -t puppeteer:latest

This is a fairly large build process which could take several minutes on a slower internet connection.

Using Puppeteer in Docker

Some special considerations apply to launching Chrome when you’re using Puppeteer in a Dockerized environment. Despite installing all the dependencies, the environment still looks different to most regular Chrome installations, so additional launch flags are required.

Here’s a minimal example of using Puppeteer inside your container:

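const puppeteer = require("puppeteer");

async function main() {
    const browser = await puppeteer.launch({
        headless: true,
        args: [
            "--disable-gpu",
            "--disable-dev-shm-usage",
            "--disable-setuid-sandbox",
            "--no-sandbox",
        ],
    });

    try {
        const page = await browser.newPage();
        await page.goto("https://example.com");
        // Save the screenshot to the container's root directory
        await page.screenshot({ path: "/screenshot.png" });
        await page.close();
    } finally {
        // Close Chrome even if navigation or the screenshot fails
        await browser.close();
    }
}

// CommonJS scripts can't use top-level await, so run main() explicitly
main().catch((err) => {
    console.error(err);
    process.exit(1);
});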

This demonstrates a simple script that launches a headless Chrome instance, navigates to a URL, and captures a screenshot of the page. The browser is then closed to avoid wasting system resources.

The important section is the arguments list that’s passed to Chromium as part of the launch() call:

  • disable-gpu – The GPU isn’t usually available inside a Docker container unless you’ve specially configured the host. Setting this flag explicitly tells Chrome not to attempt GPU-based rendering.
  • no-sandbox and disable-setuid-sandbox – These disable Chrome’s sandboxing, a step which is required when running as the root user (the default in a Docker container). Using these flags could allow malicious web content to escape the browser process and compromise the host. It’s vital you ensure your Docker containers are strongly isolated from your host. If you’re uncomfortable with this, you’ll need to manually configure working Chrome sandboxing, which is a more involved process.
  • disable-dev-shm-usage – Docker gives containers only 64MB of shared memory (/dev/shm) by default, which is too little for Chrome and can cause it to crash. This flag makes Chrome write those files into /tmp instead.
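Collecting these flags in one helper keeps them consistent across scripts. This sketch is an assumption rather than part of Puppeteer’s API: dockerChromeArgs is a hypothetical name, and its sandbox option lets you drop the sandbox-disabling flags only if you’ve set up Chrome’s sandbox yourself.

```javascript
// Returns the Chrome launch flags for a Docker container.
// Pass { sandbox: true } only if you have configured Chrome's sandbox yourself
// (for example, as a non-root user with a suitable seccomp profile).
function dockerChromeArgs({ sandbox = false } = {}) {
  const args = [
    "--disable-gpu",           // no GPU is normally exposed to the container
    "--disable-dev-shm-usage", // avoid Docker's small default /dev/shm
  ];
  if (!sandbox) {
    // Required when Chrome runs as root, the default user in a container
    args.push("--no-sandbox", "--disable-setuid-sandbox");
  }
  return args;
}
```

Pass the result straight to puppeteer.launch({ headless: true, args: dockerChromeArgs() }). Defaulting to a disabled sandbox matches a container running as root; enabling it without a working sandbox setup will likely stop Chrome from launching.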

Add your JavaScript to your container with a COPY instruction. Puppeteer should then run successfully, provided you pass the Chrome flags described above.

Conclusion

Running Puppeteer in a Docker container lets you automate webpages as part of your CI pipelines and production infrastructure. It also helps you isolate your environment during development, so you don’t need to install Chrome locally.

Your container needs to have the right dependencies installed. You must also set Chrome launch arguments so the browser operates correctly in your Dockerized environment. Afterwards, you should be able to use the Puppeteer API with no further special considerations.

It is worth paying attention to Chrome’s resource usage. Launching multiple browsers in a single container instance could quickly exhaust Docker memory limits. Either raise the limits on your container or implement a system that restricts script concurrency or reuses running browser instances.
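One way to follow that advice is to reuse a single browser and cap how many pages are open at once. The sketch below, using the hypothetical name createPageQueue, implements a small semaphore: tasks beyond the limit wait until a running task finishes and its page is closed.

```javascript
// Runs tasks against one shared Puppeteer browser while keeping at most
// maxPages pages open at the same time. createPageQueue is a hypothetical name.
function createPageQueue(browser, maxPages) {
  let active = 0;     // slots currently in use
  const waiting = []; // resolve callbacks for tasks waiting for a slot

  async function acquire() {
    if (active < maxPages) {
      active++;
      return;
    }
    // No free slot: wait until release() hands one over
    await new Promise((resolve) => waiting.push(resolve));
  }

  function release() {
    const next = waiting.shift();
    if (next) {
      next(); // pass this slot directly to the next waiting task
    } else {
      active--;
    }
  }

  return async function run(task) {
    await acquire();
    try {
      const page = await browser.newPage();
      try {
        return await task(page);
      } finally {
        await page.close(); // free the tab's memory before the slot is reused
      }
    } finally {
      release();
    }
  };
}
```

With const run = createPageQueue(browser, 2), calling run(async (page) => { ... }) for many URLs keeps at most two tabs open in one browser, rather than launching a browser per URL. The right limit depends on your container’s memory and the pages you load.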
