# Architecture Overview

This document explains how the different pieces of SecureLens fit together: what each layer does, why it exists, and how data flows through the system.

---

## High-Level Architecture

```
┌──────────────────────────────────────────────────────────────┐
│                            CLIENT                            │
│    (Next.js Frontend / Swagger UI / curl / API consumer)     │
└───────────────────────────────┬──────────────────────────────┘
                                │ HTTP requests
                                ▼
┌──────────────────────────────────────────────────────────────┐
│                     FASTAPI APPLICATION                      │
│                                                              │
│ ┌───────────────┐  ┌───────────────┐   ┌─────────────────┐   │
│ │  Auth Router  │  │  Scan Router  │   │ Code Scan Router│   │
│ │    /auth/*    │  │     /scan     │   │  /code-scan/*   │   │
│ └───────┬───────┘  └───────┬───────┘   └────────┬────────┘   │
│         │                  │                    │            │
│         ▼                  ▼                    ▼            │
│  ┌──────────────┐  ┌─────────────────┐   ┌──────────────┐    │
│  │ Auth Service │  │ Scanner Service │   │ Orchestrator │    │
│  │ JWT + Users  │  │ 5 check layers  │   │ 3-phase agent│    │
│  └──────┬───────┘  └───────┬─────────┘   └──────┬───────┘    │
│         │                  │                    │            │
└─────────┼──────────────────┼────────────────────┼────────────┘
          │                  │                    │
          ▼                  ▼                    ▼
   ┌─────────────┐   ┌──────────────┐      ┌──────────────┐
   │ PostgreSQL  │   │ Target URLs  │      │ GitHub API   │
   │ Database    │   │ (live scans) │      │ + Gemini AI  │
   └─────────────┘   └──────────────┘      └──────────────┘
```

---

## Application Layers

### 1. FastAPI Application (`app/main.py`)

This is the entry point. It creates the FastAPI app, registers all the routers, sets up CORS, and configures the lifespan (startup and shutdown logic, such as creating database tables).

FastAPI is async from top to bottom. Every request handler is an `async def` function, so the server can handle many concurrent requests without blocking on I/O. That matters for a system that makes many external HTTP calls.

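
To make the concurrency claim concrete, here is a minimal sketch that uses only `asyncio`. The `fake_external_call` coroutine is a hypothetical stand-in for an outbound HTTP request and is not part of SecureLens; the point is only that three waits overlap instead of adding up.

```python
import asyncio
import time


async def fake_external_call(name: str, delay: float) -> str:
    # Hypothetical stand-in for an outbound HTTP request.
    # While this coroutine awaits, the event loop is free to run others.
    await asyncio.sleep(delay)
    return f"{name} done"


async def handle_many() -> tuple[list[str], float]:
    start = time.perf_counter()
    # Three 0.2 s waits overlap instead of running back to back.
    results = await asyncio.gather(
        fake_external_call("a", 0.2),
        fake_external_call("b", 0.2),
        fake_external_call("c", 0.2),
    )
    return list(results), time.perf_counter() - start


results, elapsed = asyncio.run(handle_many())
```
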
The app listens on port `8000` and serves:

- A REST API for all functionality
- An interactive Swagger UI at `/docs`
- An OpenAPI schema at `/openapi.json`

---

### 2. Routers (`app/routers/`)

Routers are just groups of related endpoints. FastAPI uses them to keep the codebase organised.

| File | What It Handles |
|---|---|
| `auth.py` | Register, login, get current user |
| `scan.py` | Website URL scanning |
| `history.py` | Reading and deleting past scan results |
| `code_scan.py` | GitHub repo scanning + AI chat |
| `health.py` | Health check endpoints |

Routers don't contain business logic. They receive the request, call the appropriate service, and return the result. They're thin by design.

---

### 3. Services (`app/services/`)

Services contain the actual business logic.

#### `scanner/`: The Website Scanner

A collection of five independent checkers, each responsible for one "layer" of security:

- `transport.py`: Checks if the site uses HTTPS and implements HSTS correctly
- `ssl_checker.py`: Validates the SSL certificate (expiry, chain, TLS version)
- `headers.py`: Checks for the presence and correct configuration of security headers (CSP, X-Frame-Options, etc.)
- `cookies.py`: Checks session cookies for HttpOnly, Secure, and SameSite flags
- `exposure.py`: Probes for exposed sensitive paths like `/admin`, `/.env`, `/phpinfo.php`

Each checker runs independently. The scan router calls all of them, collects their results, passes them to the scoring engine, then sends everything through the AI service for enhancement.

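
The following sketch shows one way independent checkers can be called and their results merged. It assumes the checkers are awaited concurrently, which the text above does not state. The `Issue` dataclass, the three checker functions, and `run_checkers` are hypothetical names for illustration, not the project's actual code.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Issue:
    layer: str
    severity: str
    message: str


# Hypothetical checkers: each inspects one layer and returns its own issues.
async def check_transport(url: str) -> list[Issue]:
    if url.startswith("http://"):
        return [Issue("transport", "high", "Site is served over plain HTTP")]
    return []


async def check_headers(url: str) -> list[Issue]:
    # Pretend the response lacked a Content-Security-Policy header.
    return [Issue("headers", "medium", "Missing Content-Security-Policy")]


async def check_exposure(url: str) -> list[Issue]:
    # Simulate a checker that fails, for example on a timeout.
    raise TimeoutError("probe timed out")


async def run_checkers(url: str) -> list[Issue]:
    checkers = [check_transport, check_headers, check_exposure]
    # return_exceptions=True keeps one failing checker from aborting the rest.
    outcomes = await asyncio.gather(
        *(checker(url) for checker in checkers), return_exceptions=True
    )
    issues: list[Issue] = []
    for outcome in outcomes:
        if isinstance(outcome, BaseException):
            continue  # a real scanner would log the failure here
        issues.extend(outcome)
    return issues


issues = asyncio.run(run_checkers("http://example.com"))
```
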
#### `code_scanner/`: The Code Scanner Agent

Contains the three-phase AI pipeline. See [ai-agent.md](./ai-agent.md) for a full explanation.

- `orchestrator.py`: The main pipeline class (Triage → Analysis → Summary)
- `github_client.py`: Handles all GitHub API communication

#### `ai.py`: Website Scanner AI Layer

Standalone functions that use Gemini to enhance the website scanner's results: `enhance_security_issues()`, `chat_with_scan_context()`, `generate_threat_narrative()`.

#### `scoring.py`: The Scoring Engine

A pure Python function that takes the list of issues from all scanners, applies weights based on severity, and produces a 0–100 score and an A–F letter grade. No AI is involved here, so the same set of issues always yields the same score.

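
A severity-weighted score of this kind can be sketched as follows. The weights, the grade cut-offs, and the name `score_issues` are assumptions chosen for illustration; the values used in `scoring.py` are not given in this document.

```python
# Assumed weights and grade cut-offs, chosen only for illustration.
SEVERITY_WEIGHTS = {"critical": 25, "high": 15, "medium": 8, "low": 3}
GRADE_CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def score_issues(severities: list[str]) -> tuple[int, str]:
    """Return a 0-100 score and a letter grade for a list of issue severities."""
    # Each issue subtracts its weight; unknown severities subtract nothing.
    penalty = sum(SEVERITY_WEIGHTS.get(severity, 0) for severity in severities)
    score = max(0, 100 - penalty)
    for cutoff, grade in GRADE_CUTOFFS:
        if score >= cutoff:
            return score, grade
    return score, "F"
```
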

---

### 4. Schemas (`app/schemas/`)

Pydantic models that define the shape of every request and response. FastAPI uses these for automatic validation, serialisation, and documentation generation.

If a request body doesn't match the schema, FastAPI automatically returns a `422 Unprocessable Entity` response before the handler is called.

Key schemas:

- `auth.py`: `RegisterRequest`, `LoginRequest`, `TokenResponse`, `UserResponse`
- `scan.py`: `ScanRequest`, `ScanResponse`, `IssueDetail`
- `code_scan.py`: `CodeScanRequest`, `CodeScanResponse`, `VulnerabilityIssue`, `CodeChatRequest`, `CodeChatResponse`

---

### 5. Models (`app/models/`)

SQLAlchemy ORM models, the Python representation of database tables.

- `user.py`: The `User` table (id, email, username, hashed_password, created_at)
- `scan.py`: The `ScanResult` table (id, user_id, url, score, grade, full result JSON)

These are what get stored in PostgreSQL. The code scanner's results are *not* stored in the database in the current version. They're kept in an in-memory dict in `code_scan.py`, so they are lost when the process restarts.

---

### 6. Middleware (`app/middleware/`)

- `auth.py`: The `get_current_user` dependency. Any endpoint that requires authentication uses this. It validates the JWT token from the `Authorization` header and returns the user object.
- `rate_limiter.py`: SlowAPI configuration. Limits the number of requests per IP per minute.

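
To illustrate what validating a JWT from the `Authorization` header involves, here is a standard-library sketch of HS256 signing and verification. It is not the project's implementation, which presumably relies on a JWT library. The names `create_token`, `verify_token`, and `bearer_token` are hypothetical, and the sketch assumes HS256 with `sub` and `exp` claims.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that JWT encoding strips.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> str:
    digest = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return _b64url(digest)


def create_token(subject: str, secret: str, ttl_seconds: int = 3600) -> str:
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    claims = {"sub": subject, "exp": int(time.time()) + ttl_seconds}
    payload = _b64url(json.dumps(claims).encode())
    return f"{header}.{payload}.{_sign(f'{header}.{payload}', secret)}"


def verify_token(token: str, secret: str) -> Optional[str]:
    """Return the subject if the token is valid and unexpired, else None."""
    try:
        header, payload, signature = token.split(".")
    except ValueError:
        return None
    # Constant-time comparison avoids leaking signature bytes via timing.
    if not hmac.compare_digest(_sign(f"{header}.{payload}", secret), signature):
        return None
    claims = json.loads(_b64url_decode(payload))
    if claims.get("exp", 0) < time.time():
        return None
    return claims.get("sub")


def bearer_token(authorization_header: str) -> Optional[str]:
    # Extract the token from a header value such as "Bearer <token>".
    scheme, _, token = authorization_header.partition(" ")
    return token if scheme.lower() == "bearer" and token else None
```
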

---

### 7. Utils (`app/utils/`)

- `auth.py`: Low-level JWT functions: creating tokens, verifying tokens, hashing passwords, checking passwords
- `validators.py`: URL validation and SSRF protection. Before scanning any URL, we check that it does not point to a private IP address or localhost; otherwise attackers could use our scanner to probe internal networks

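
The SSRF check described above can be sketched with the standard library alone. This is an illustration under assumptions, not the contents of `validators.py`: the function name `is_safe_scan_target` is hypothetical, and rejecting every non-global address is a stricter rule than the "private IP or localhost" check the text mentions.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_safe_scan_target(url: str) -> bool:
    """Return True only for http(s) URLs whose host resolves to public addresses."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        # Resolve the host; IP literals are parsed without a DNS lookup.
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False
    for info in infos:
        # Drop any IPv6 scope suffix such as "%eth0" before parsing.
        address = ipaddress.ip_address(info[4][0].split("%")[0])
        # is_global is False for private, loopback, and link-local ranges.
        if not address.is_global:
            return False
    return True
```
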

---

## Data Flow: Code Scan Request

This is exactly what happens when you call `POST /code-scan/analyze`:

```
1. Request arrives at FastAPI
        │
2. Pydantic validates the body → CodeScanRequest(repo_url, github_token, branch)
        │
3. Router creates a CodeScanOrchestrator instance
        │
4. GitHubClient.get_repo_tree() → fetches all file paths via GitHub Trees API
        │
        ├── Makes 1-2 GitHub API calls (uses token for auth)
        └── Returns: ["app/page.js", "app/users/page.js", "package.json", ...]
        │
5. orchestrator.triage_files() → sends file list to Gemini
        │
        ├── 1 Gemini API call with all filenames
        └── Returns: ["app/users/page.js", "middleware.ts", ...] (5 files)
        │
6. orchestrator.analyze_files() → fetches and scans each file
        │
        ├── GitHubClient.get_file_content() × 5 (concurrent, async)
        ├── Gemini generate_content() × 5 (concurrent, async, behind Semaphore)
        └── Returns: [VulnerabilityIssue, VulnerabilityIssue, ...]
        │
7. orchestrator.generate_summary() → writes executive summary
        │
        ├── 1 Gemini API call with all vulnerability data
        └── Returns: "The repository presents a moderate risk..."
        │
8. Router creates CodeScanResponse with a UUID scan_id
        │
9. scan_store[scan_id] = response (saved in-memory for chat)
        │
10. Response returned to client (JSON)
```

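Total external API calls in this example, counting the steps above: 6-7 GitHub (1-2 for the tree, 5 for file contents) + 7 Gemini (1 triage, 5 analysis, 1 summary) = 13-14 calls per scan.
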
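
Step 6 runs its model calls concurrently but behind a semaphore. A minimal sketch of that pattern is shown below. The limit of 2, the placeholder file names, and the `asyncio.sleep` stand-in for the model call are all assumptions; only the use of a semaphore to cap concurrency comes from the flow above.

```python
import asyncio

MAX_CONCURRENT_AI_CALLS = 2  # assumed limit, not taken from the project


async def analyze_file(path: str, limiter: asyncio.Semaphore, stats: dict) -> str:
    async with limiter:
        # Track how many analyses are in flight to show that the cap holds.
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.05)  # stand-in for the model call
        stats["active"] -= 1
    return f"analysed {path}"


async def analyze_files(paths: list[str]) -> tuple[list[str], int]:
    limiter = asyncio.Semaphore(MAX_CONCURRENT_AI_CALLS)
    stats = {"active": 0, "peak": 0}
    # All tasks start at once, but at most two pass the semaphore at a time.
    results = await asyncio.gather(
        *(analyze_file(path, limiter, stats) for path in paths)
    )
    return list(results), stats["peak"]


paths = [f"file_{i}.py" for i in range(5)]  # placeholder file names
results, peak = asyncio.run(analyze_files(paths))
```
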

---

## Database

We use PostgreSQL in production (via Docker Compose) and SQLite in local development.

The connection is managed by SQLAlchemy's async engine. All database operations use `async with get_db() as session:`, so they don't block the event loop.

Migrations are managed by Alembic. To run migrations:

```bash
alembic upgrade head
```

The tables are also auto-created on startup in development mode (the `create_all()` call in `main.py`'s lifespan function).

---

## Environment Configuration

All configuration is driven by the `.env` file. The `config.py` file uses Pydantic's `BaseSettings` to read it:

```python
# Assumes Pydantic v2 (pydantic-settings); on v1, import BaseSettings from pydantic.
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    gemini_api_key: str | None = None
    database_url: str = "sqlite+aiosqlite:///./securelens.db"
    jwt_secret: str = "change-me-in-production"
    # ...
```
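
If a required variable (a field declared without a default) is missing, Pydantic raises a validation error at startup rather than failing silently at runtime. The three fields shown above all have defaults, so they never trigger this check. In particular, `jwt_secret` falls back to a placeholder that must be overridden in production.

See `.env.example` for the full list of options, or the Configuration section in [README.md](../README.md).

---
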
## Docker Setup

The `docker-compose.yml` runs two services:

```
backend  ← FastAPI app (port 8000)
db       ← PostgreSQL (port 5432, internal only)
```

The backend container reads `DATABASE_URL` from `.env` and connects to the `db` container over the internal Docker network. PostgreSQL data persists in a Docker volume across restarts.

To rebuild from scratch:

```bash
docker compose down -v   # removes containers AND the data volume
docker compose up --build
```