Mirror of https://github.com/jaypyles/Scraperr.git, synced 2026-03-02 22:57:51 -05:00
A powerful self-hosted web scraping solution
📋 Overview
Scrape websites without writing a single line of code.
📚 Check out the docs for a comprehensive quickstart guide and detailed information.
✨ Key Features
- XPath-Based Extraction: Precisely target page elements
- Queue Management: Submit and manage multiple scraping jobs
- Domain Spidering: Option to scrape all pages within the same domain
- Custom Headers: Add JSON headers to your scraping requests
- Media Downloads: Automatically download images, videos, and other media
- Results Visualization: View scraped data in a structured table format
- Data Export: Export your results in Markdown and CSV formats
- Notification Channels: Send job-completion notifications through various channels
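Scraperr is driven from its web interface, so none of this requires code. The short sketch below only illustrates what an XPath selector, as used by XPath-Based Extraction, picks out of a page. It assumes a small, hypothetical, well-formed page fragment and uses Python's standard-library xml.etree.ElementTree, which supports only a limited subset of XPath and needs well-formed markup. Real-world HTML usually needs an HTML-tolerant parser. The extract helper is hypothetical and makes no claim about how Scraperr evaluates XPath internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment standing in for a scraped page.
PAGE = """
<html>
  <body>
    <div class="product">
      <h2>Widget</h2>
      <span class="price">9.99</span>
    </div>
    <div class="product">
      <h2>Gadget</h2>
      <span class="price">19.99</span>
    </div>
  </body>
</html>
"""


def extract(markup: str, xpath: str) -> list[str]:
    """Return the stripped text of every element matching a (limited) XPath.

    Hypothetical helper: ElementTree only understands an XPath subset such as
    .//tag, tag/child and [@attr='value'] predicates.
    """
    root = ET.fromstring(markup.strip())
    return [(el.text or "").strip() for el in root.findall(xpath)]


if __name__ == "__main__":
    # Product names: every <h2> directly inside a <div class="product">.
    print(extract(PAGE, ".//div[@class='product']/h2"))
    # Prices: every <span class="price"> anywhere in the document.
    print(extract(PAGE, ".//span[@class='price']"))
```

Each expression maps to one column of results, which is roughly the shape the Results Visualization table presents.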
🚀 Getting Started
Docker
make up
Helm
Refer to the docs for Helm deployment: https://scraperr-docs.pages.dev/guides/helm-deployment
⚖️ Legal and Ethical Guidelines
When using Scraperr, please remember to:
- Respect robots.txt: Always check a website's robots.txt file to verify which pages permit scraping
- Terms of Service: Adhere to each website's Terms of Service regarding data extraction
- Rate Limiting: Implement reasonable delays between requests to avoid overloading servers
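The robots.txt and rate-limiting guidelines above can be checked in plain Python. This is a minimal sketch using the standard library's urllib.robotparser, not a description of how Scraperr itself handles either concern. The robots.txt content, the example.com URLs, the example-bot user agent, and the polite_fetch_plan helper are all hypothetical. In practice you would load a site's real robots.txt with RobotFileParser.set_url() and read().

```python
import time
import urllib.robotparser
from typing import Callable, Iterable

# Hypothetical robots.txt content used instead of fetching a live file.
ROBOTS_TXT = """
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_txt: str) -> urllib.robotparser.RobotFileParser:
    """Parse robots.txt text that has already been obtained."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_fetch_plan(
    parser: urllib.robotparser.RobotFileParser,
    urls: Iterable[str],
    user_agent: str = "example-bot",
    default_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Hypothetical helper: keep only URLs robots.txt allows, pausing between them.

    Uses the site's Crawl-delay when present, otherwise default_delay.
    """
    delay = parser.crawl_delay(user_agent) or default_delay
    allowed: list[str] = []
    for url in urls:
        if not parser.can_fetch(user_agent, url):
            continue  # skip paths that robots.txt disallows
        if allowed:
            sleep(delay)  # rate limit: wait before each request after the first
        allowed.append(url)  # a real scraper would fetch the page here
    return allowed


if __name__ == "__main__":
    urls = [
        "https://example.com/",
        "https://example.com/private/data",
        "https://example.com/about",
    ]
    print(polite_fetch_plan(build_parser(ROBOTS_TXT), urls))
```

Passing sleep as a parameter keeps the pacing logic testable without real waits; the default still blocks with time.sleep.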
Disclaimer: Scraperr is intended for use only on websites that explicitly permit scraping. The creator accepts no responsibility for misuse of this tool.
💬 Join the Community
Get support, report bugs, and chat with other users and contributors.
📄 License
This project is licensed under the MIT License. See the LICENSE file for details.
👏 Contributions
Development made easier with the webapp template.
To get started, simply run make build up-dev.