mirror of https://github.com/jaypyles/Scraperr.git synced 2026-03-02 21:36:59 -05:00
  • TypeScript 67%
  • Python 30.4%
  • CSS 0.8%
  • Dockerfile 0.5%
  • Makefile 0.5%
  • Other 0.8%
2025-10-12 16:55:31 +00:00
.github chore: push for arm64 2025-07-05 10:47:02 -05:00
alembic Feat/swap to sqlalchemy (#99) 2025-07-12 21:12:33 -05:00
api/backend fix: only log if it got a job 2025-10-12 11:55:20 -05:00
cypress feat: edit ui + add return html option (#90) 2025-06-08 18:14:02 -05:00
docker Feat/swap to sqlalchemy (#99) 2025-07-12 21:12:33 -05:00
docs feat: edit ui + add return html option (#90) 2025-06-08 18:14:02 -05:00
helm chore: bump version to 1.1.7 2025-10-12 16:55:31 +00:00
public fix: make general fixes to dev containers and log pages 2024-10-18 17:19:46 -05:00
scripts feat: auto deploy 2025-06-02 19:08:18 -05:00
src Feat/swap to sqlalchemy (#99) 2025-07-12 21:12:33 -05:00
.dockerignore feat: add recording viewer and vnc (#78) 2025-05-16 21:37:09 -05:00
.gitignore feat: add media viewer + other fixes (#79) 2025-05-17 16:31:34 -05:00
.prettierignore Feat/add helm chart (#69) 2025-05-12 21:19:17 -05:00
.python-version feat: fix authentication 2025-04-24 18:24:19 -05:00
alembic.ini Feat/swap to sqlalchemy (#99) 2025-07-12 21:12:33 -05:00
cypress.config.ts feat: add cypress E2E testing 2024-10-21 19:57:12 -05:00
docker-compose.dev.yml feat: add media viewer + other fixes (#79) 2025-05-17 16:31:34 -05:00
docker-compose.yml Chore: app refactor (#88) 2025-06-01 15:56:15 -05:00
FUNDING.yml chore: docs [skip ci] 2025-05-11 11:24:19 -05:00
LICENSE Create LICENSE 2024-07-07 14:06:35 -05:00
Makefile Chore: app refactor (#88) 2025-06-01 15:56:15 -05:00
next-env.d.ts feat: add import/export for job configurations (#91) 2025-06-12 18:00:39 -05:00
next.config.mjs wip: separate frontend from backend 2024-07-23 20:53:15 -05:00
package.json Chore: app refactor (#88) 2025-06-01 15:56:15 -05:00
pdm.lock Feat/swap to sqlalchemy (#99) 2025-07-12 21:12:33 -05:00
postcss.config.js wip: update Dockerfile with next deps 2024-06-26 16:14:43 -05:00
pyproject.toml Feat/swap to sqlalchemy (#99) 2025-07-12 21:12:33 -05:00
README.md feat: add media viewer + other fixes (#79) 2025-05-17 16:31:34 -05:00
start.sh Feat/swap to sqlalchemy (#99) 2025-07-12 21:12:33 -05:00
supervisord.conf feat: add recording viewer and vnc (#78) 2025-05-16 21:37:09 -05:00
tailwind.config.js wip: update UI 2024-07-22 15:57:32 -05:00
tsconfig.json Chore: app refactor (#88) 2025-06-01 15:56:15 -05:00
yarn.lock Chore: app refactor (#88) 2025-06-01 15:56:15 -05:00

Scraperr Logo

A powerful self-hosted web scraping solution

MongoDB FastAPI Next JS TailwindCSS

📋 Overview

Scrape websites without writing a single line of code.

📚 Check out the docs for a comprehensive quickstart guide and detailed information.

Key Features

  • XPath-Based Extraction: Precisely target page elements
  • Queue Management: Submit and manage multiple scraping jobs
  • Domain Spidering: Option to scrape all pages within the same domain
  • Custom Headers: Add JSON headers to your scraping requests
  • Media Downloads: Automatically download images, videos, and other media
  • Results Visualization: View scraped data in a structured table format
  • Data Export: Export your results in markdown and csv formats
  • Notification Channels: Send completion notifications through various channels
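
To make the XPath and custom-header features above more concrete, here is a minimal sketch of what they do conceptually. It is not Scraperr's implementation: it uses only the Python standard library, whose `xml.etree.ElementTree` supports a limited XPath subset and requires well-formed markup. The sample page and the helper names `extract` and `parse_headers` are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample page. ElementTree needs well-formed markup,
# which real-world HTML often is not.
SAMPLE_PAGE = """
<html>
  <body>
    <div class="product"><h2>Widget</h2><span class="price">9.99</span></div>
    <div class="product"><h2>Gadget</h2><span class="price">19.50</span></div>
  </body>
</html>
"""


def extract(page: str, xpath: str) -> list[str]:
    """Return the stripped text of every element matching the XPath."""
    root = ET.fromstring(page.strip())
    return [(el.text or "").strip() for el in root.findall(xpath)]


def parse_headers(raw: str) -> dict[str, str]:
    """Parse a JSON object of request headers into a str -> str mapping."""
    headers = json.loads(raw)
    if not isinstance(headers, dict):
        raise ValueError("headers must be a JSON object")
    return {str(key): str(value) for key, value in headers.items()}


names = extract(SAMPLE_PAGE, ".//div[@class='product']/h2")
prices = extract(SAMPLE_PAGE, ".//span[@class='price']")
headers = parse_headers('{"User-Agent": "example-bot/0.1"}')
```

Each XPath expression yields one column of results, which is roughly how a job with several named selectors could end up as a table of rows.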

🚀 Getting Started

Docker

make up

Helm

Refer to the docs for helm deployment: https://scraperr-docs.pages.dev/guides/helm-deployment

When using Scraperr, please remember to:

  1. Respect robots.txt: Always check a website's robots.txt file to verify which pages permit scraping
  2. Terms of Service: Adhere to each website's Terms of Service regarding data extraction
  3. Rate Limiting: Implement reasonable delays between requests to avoid overloading servers
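
If you want to check these guidelines yourself before submitting a job, the sketch below shows one possible way to do it with Python's standard `urllib.robotparser`. It is an illustration rather than part of Scraperr: the sample robots.txt, the user agent name, and the helpers `build_parser` and `polite_crawl` are all hypothetical, and the `fetch` callable is a stand-in for a real HTTP request.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed offline for the example.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text that was already downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_crawl(urls, parser, user_agent, delay_seconds, fetch):
    """Fetch only URLs that robots.txt allows, pausing after each request."""
    results = []
    for url in urls:
        if not parser.can_fetch(user_agent, url):
            continue  # skip paths the site has disallowed
        results.append(fetch(url))
        time.sleep(delay_seconds)  # fixed delay between requests
    return results


parser = build_parser(SAMPLE_ROBOTS)
urls = ["https://example.com/public", "https://example.com/private/data"]
# The lambda is a placeholder for a real request function.
fetched = polite_crawl(urls, parser, "example-bot", 0.1, fetch=lambda u: u)
```

A fixed delay is the simplest form of rate limiting. A robots.txt file cannot express a site's Terms of Service, so those still need to be read separately.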

Disclaimer: Scraperr is intended for use only on websites that explicitly permit scraping. The creator accepts no responsibility for misuse of this tool.

💬 Join the Community

Get support, report bugs, and chat with other users and contributors.

👉 Join the Scraperr Discord

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

👏 Contributions

Development made easier with the webapp template.

To get started, simply run make build up-dev.