securelens-backend/docs/architecture.md
2026-04-29 16:22:02 +05:30

Architecture Overview

This document explains how the different pieces of SecureLens fit together — what each layer does, why it exists, and how data flows through the system.


High-Level Architecture

┌──────────────────────────────────────────────────────────────┐
│                        CLIENT                                │
│  (Next.js Frontend / Swagger UI / curl / API consumer)       │
└───────────────────────────────┬──────────────────────────────┘
                                │  HTTP requests
                                ▼
┌──────────────────────────────────────────────────────────────┐
│                    FASTAPI APPLICATION                       │
│                                                              │
│  ┌───────────────┐  ┌───────────────┐  ┌─────────────────┐   │
│  │  Auth Router  │  │  Scan Router  │  │ Code Scan Router│   │
│  │  /auth/*      │  │  /scan        │  │  /code-scan/*   │   │
│  └───────┬───────┘  └───────┬───────┘  └────────┬────────┘   │
│          │                  │                    │           │
│          ▼                  ▼                    ▼           │
│  ┌──────────────┐  ┌─────────────────┐  ┌──────────────┐     │
│  │ Auth Service │  │ Scanner Service │  │  Orchestrator│     │
│  │ JWT + Users  │  │ 5 check layers  │  │ 3-phase agent│     │
│  └──────┬───────┘  └───────┬─────────┘  └──────┬───────┘     │
│         │                  │                    │            │
└─────────┼──────────────────┼────────────────────┼────────────┘
          │                  │                    │
          ▼                  ▼                    ▼
   ┌─────────────┐   ┌──────────────┐    ┌──────────────┐
   │ PostgreSQL  │   │ Target URLs  │    │  GitHub API  │
   │  Database   │   │ (live scans) │    │  + Gemini AI │
   └─────────────┘   └──────────────┘    └──────────────┘

Application Layers

1. FastAPI Application (app/main.py)

This is the entry point. It creates the FastAPI app, registers all the routers, sets up CORS, and configures the lifespan (startup/shutdown logic like creating database tables).

The application is async from top to bottom. Every request handler is an async def function, so the server can handle many concurrent requests without blocking on I/O, provided the I/O inside each handler is awaited. That matters for a system that makes many external HTTP calls.

The app listens on port 8000 and serves:

  • A REST API for all functionality
  • An interactive Swagger UI at /docs
  • An OpenAPI schema at /openapi.json

2. Routers (app/routers/)

Routers are just groups of related endpoints. FastAPI uses them to keep the codebase organised.

File           What It Handles
auth.py        Register, login, get current user
scan.py        Website URL scanning
history.py     Reading and deleting past scan results
code_scan.py   GitHub repo scanning + AI chat
health.py      Health check endpoints

Routers don't contain business logic. They receive the request, call the appropriate service, and return the result. They're thin by design.


3. Services (app/services/)

Services contain the actual business logic.

scanner/ — The Website Scanner

A collection of five independent checkers, each responsible for one "layer" of security:

  • transport.py — Checks if the site uses HTTPS and implements HSTS correctly
  • ssl_checker.py — Validates the SSL certificate (expiry, chain, TLS version)
  • headers.py — Checks for the presence and correct configuration of security headers (CSP, X-Frame-Options, etc.)
  • cookies.py — Checks session cookies for HttpOnly, Secure, and SameSite flags
  • exposure.py — Probes for exposed sensitive paths like /admin, /.env, /phpinfo.php

Each checker runs independently. The scan router calls all of them, collects their results, passes them to the scoring engine, then sends everything through the AI service for enhancement.
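
A minimal sketch of how independent checkers could be collected, using only the standard library. The function names, the issue format, and the use of asyncio.gather are assumptions made for illustration; the real checkers fetch their own data and may be invoked differently.

```python
import asyncio
from http.cookies import SimpleCookie

# Hypothetical list of headers to look for; the real headers.py may check more.
REQUIRED_HEADERS = (
    "content-security-policy",
    "x-frame-options",
    "strict-transport-security",
)


async def check_headers(headers: dict[str, str]) -> list[dict]:
    """Report each expected security header that is absent."""
    present = {name.lower() for name in headers}
    return [
        {"layer": "headers", "issue": f"missing {name}"}
        for name in REQUIRED_HEADERS
        if name not in present
    ]


async def check_cookies(set_cookie: str) -> list[dict]:
    """Report cookies that lack the HttpOnly, Secure, or SameSite flag."""
    jar = SimpleCookie()
    jar.load(set_cookie)
    issues = []
    for name, morsel in jar.items():
        for flag in ("httponly", "secure", "samesite"):
            if not morsel[flag]:  # unset flags are empty strings
                issues.append({"layer": "cookies", "issue": f"{name} lacks {flag}"})
    return issues


async def run_checks(headers: dict[str, str], set_cookie: str) -> list[dict]:
    """Run the checkers concurrently and flatten their results into one list."""
    per_layer = await asyncio.gather(check_headers(headers), check_cookies(set_cookie))
    return [issue for layer in per_layer for issue in layer]


issues = asyncio.run(
    run_checks({"X-Frame-Options": "DENY"}, "session=abc123; HttpOnly; Path=/")
)
```

Because each checker only returns a list of issues, adding a sixth layer would mean adding one more coroutine to the gather call without touching the others.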

code_scanner/ — The Code Scanner Agent

Contains the three-phase AI pipeline. See ai-agent.md for a full explanation.

  • orchestrator.py — The main pipeline class (Triage → Analysis → Summary)
  • github_client.py — Handles all GitHub API communication

ai.py — Website Scanner AI Layer

Standalone functions that use Gemini to enhance the website scanner's results: enhance_security_issues(), chat_with_scan_context(), generate_threat_narrative().

scoring.py — The Scoring Engine

A pure Python function that takes the list of issues from all scanners, applies weights based on severity, and produces a 0-100 score and an A-F letter grade. No AI is involved here; it's deterministic and consistent.
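
As an illustration of weighted, deterministic scoring, here is a small sketch. The weights, the grade cut-offs, and the name score_issues are hypothetical; the actual values in scoring.py are not shown in this document.

```python
# Hypothetical penalty per issue, keyed by severity.
SEVERITY_WEIGHTS = {"critical": 25, "high": 15, "medium": 8, "low": 3}

# Hypothetical cut-offs: the first entry whose threshold the score meets wins.
GRADE_CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def score_issues(issues: list[dict]) -> tuple[int, str]:
    """Start from 100, subtract a weight per issue, clamp at 0, map to a grade."""
    penalty = sum(SEVERITY_WEIGHTS.get(issue["severity"], 0) for issue in issues)
    score = max(0, 100 - penalty)
    for cutoff, grade in GRADE_CUTOFFS:
        if score >= cutoff:
            return score, grade
    return score, "F"


result = score_issues(
    [{"severity": "high"}, {"severity": "medium"}, {"severity": "low"}]
)
# 100 - (15 + 8 + 3) = 74, which falls in the "C" band under these cut-offs
```

The same input always yields the same output, which is the property the scoring engine relies on.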


4. Schemas (app/schemas/)

Pydantic models that define the shape of every request and response. FastAPI uses these for automatic validation, serialisation, and documentation generation.

If a request body doesn't match the schema, FastAPI returns an HTTP 422 response listing the validation errors, and your handler is never called.

Key schemas:

  • auth.py: RegisterRequest, LoginRequest, TokenResponse, UserResponse
  • scan.py: ScanRequest, ScanResponse, IssueDetail
  • code_scan.py: CodeScanRequest, CodeScanResponse, VulnerabilityIssue, CodeChatRequest, CodeChatResponse

5. Models (app/models/)

SQLAlchemy ORM models — the Python representation of database tables.

  • user.py — The User table (id, email, username, hashed_password, created_at)
  • scan.py — The ScanResult table (id, user_id, url, score, grade, full result JSON)

These are what get stored in PostgreSQL. The code scanner's results are not stored in the database in the current version — they're kept in an in-memory dict in code_scan.py.
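
A rough sketch of what an in-memory store of this kind looks like. The helper names save_scan and get_scan are hypothetical, and the stored value is shown as a plain dict rather than the real response model.

```python
import uuid
from typing import Optional

# Module-level dict: lives only as long as the server process.
scan_store: dict[str, dict] = {}


def save_scan(response: dict) -> str:
    """Store a scan response under a fresh UUID and return that ID."""
    scan_id = str(uuid.uuid4())
    scan_store[scan_id] = response
    return scan_id


def get_scan(scan_id: str) -> Optional[dict]:
    """Look up a scan; returns None for an unknown ID or after a restart."""
    return scan_store.get(scan_id)


demo_id = save_scan({"summary": "example"})
```

The trade-off is that entries disappear when the process restarts and are not shared between multiple worker processes.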


6. Middleware (app/middleware/)

  • auth.py — The get_current_user dependency. Any endpoint that requires authentication uses this. It validates the JWT token from the Authorization header and returns the user object.
  • rate_limiter.py — SlowAPI configuration. Limits the number of requests per IP per minute.
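
To make the token check concrete, here is a standard-library sketch of creating and verifying an HS256-signed JWT taken from an Authorization header. It is an illustration only: the function names are hypothetical, and the project most likely uses a JWT library rather than hand-rolled code like this.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(text: str) -> bytes:
    """Restore the stripped padding before decoding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()


def create_token(subject: str, secret: str, ttl_seconds: int = 3600) -> str:
    """Build header.payload.signature with a subject and an expiry claim."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    claims = {"sub": subject, "exp": int(time.time()) + ttl_seconds}
    payload = _b64url(json.dumps(claims).encode())
    signature = _b64url(_sign(f"{header}.{payload}", secret))
    return f"{header}.{payload}.{signature}"


def verify_bearer(authorization: str, secret: str) -> dict:
    """Return the claims from a 'Bearer <token>' value, or raise ValueError."""
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer" or token.count(".") != 2:
        raise ValueError("malformed Authorization header")
    header, payload, signature = token.split(".")
    expected = _sign(f"{header}.{payload}", secret)
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload))
    if claims["exp"] < time.time():
        raise ValueError("token expired")
    return claims


token = create_token("alice", "s3cret")
claims = verify_bearer(f"Bearer {token}", "s3cret")
```

A dependency such as get_current_user would then load the user named in the sub claim and reject the request if verification raises.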

7. Utils (app/utils/)

  • auth.py — Low-level JWT functions: creating tokens, verifying tokens, hashing passwords, checking passwords
  • validators.py — URL validation and SSRF protection. Before scanning any URL, we check that it does not point at a private IP address or localhost; otherwise attackers could use our scanner to probe internal networks.
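
One way such an SSRF check can be written with the standard library is sketched below. The name is_safe_target is hypothetical and the real validators.py may apply different rules; this version rejects any host that resolves to a non-public address.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_safe_target(url: str) -> bool:
    """Allow only http(s) URLs whose host resolves solely to public addresses."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False  # unresolvable hosts are rejected
    for info in infos:
        # info[4][0] is the address string; drop any IPv6 zone suffix.
        address = ipaddress.ip_address(info[4][0].split("%")[0])
        # is_global is False for loopback, private, and link-local ranges.
        if not address.is_global:
            return False
    return True
```

A check like this only covers the moment of validation. If the hostname is resolved again when the scan runs, the answer can differ, so the request should ideally connect to the address that was validated.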

Data Flow — Code Scan Request

This is exactly what happens when you call POST /code-scan/analyze:

1. Request arrives at FastAPI
      │
2. Pydantic validates the body → CodeScanRequest(repo_url, github_token, branch)
      │
3. Router creates a CodeScanOrchestrator instance
      │
4. GitHubClient.get_repo_tree() → fetches all file paths via GitHub Trees API
      │
      ├── Makes 1-2 GitHub API calls (uses token for auth)
      └── Returns: ["app/page.js", "app/users/page.js", "package.json", ...]
      │
5. orchestrator.triage_files() → sends file list to Gemini
      │
      ├── 1 Gemini API call with all filenames
      └── Returns: ["app/users/page.js", "middleware.ts", ...] (5 files)
      │
6. orchestrator.analyze_files() → fetches and scans each file
      │
      ├── GitHubClient.get_file_content() × 5 (concurrent, async)
      ├── Gemini generate_content() × 5 (concurrent, async, behind Semaphore)
      └── Returns: [VulnerabilityIssue, VulnerabilityIssue, ...]
      │
7. orchestrator.generate_summary() → writes executive summary
      │
      ├── 1 Gemini API call with all vulnerability data
      └── Returns: "The repository presents a moderate risk..."
      │
8. Router creates CodeScanResponse with a UUID scan_id
      │
9. scan_store[scan_id] = response  (saved in-memory for chat)
      │
10. Response returned to client (JSON)

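Total external API calls: 6-7 GitHub (1-2 for the tree, 5 for file contents) + 7 Gemini (1 triage, 5 analysis, 1 summary) = 13-14 calls per scan when triage selects 5 files.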
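
The "concurrent, async, behind Semaphore" pattern in step 6 can be sketched as follows. The limit of 2 and the function bodies are placeholders; asyncio.sleep stands in for the GitHub and Gemini calls.

```python
import asyncio

MAX_CONCURRENT_CALLS = 2  # hypothetical limit; the real value is not documented here


async def analyze_file(path: str, limiter: asyncio.Semaphore, stats: dict) -> str:
    """Stand-in for one fetch-and-analyze step, bounded by the semaphore."""
    async with limiter:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # simulates the external API round trip
        stats["active"] -= 1
    return f"analyzed {path}"


async def analyze_files(paths: list[str]) -> tuple[list[str], int]:
    """Start all tasks at once; the semaphore caps how many run the call together."""
    limiter = asyncio.Semaphore(MAX_CONCURRENT_CALLS)
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(*(analyze_file(p, limiter, stats) for p in paths))
    return list(results), stats["peak"]


results, peak = asyncio.run(
    analyze_files(["a.js", "b.js", "c.js", "d.js", "e.js"])
)
```

All five tasks are scheduled immediately, but no more than MAX_CONCURRENT_CALLS are inside the guarded section at any moment, which helps keep the scan under upstream rate limits. asyncio.gather returns results in input order regardless of completion order.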


Database

We use PostgreSQL in production (via Docker Compose) and SQLite in local development.

The connection is managed by SQLAlchemy's async engine. All database operations run inside async with get_db() as session:, so they await the database instead of blocking the event loop.

Migrations are managed by Alembic. To run migrations:

alembic upgrade head

The tables are also auto-created on startup in development mode (the create_all() call in main.py's lifespan function).


Environment Configuration

All configuration is driven by the .env file. The config.py file uses Pydantic's BaseSettings to read it:

class Settings(BaseSettings):
    gemini_api_key: str | None = None
    database_url: str = "sqlite+aiosqlite:///./securelens.db"
    jwt_secret: str = "change-me-in-production"
    # ...

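Any field declared without a default is required: if its variable is missing, Pydantic raises a validation error on startup instead of failing later at runtime. The fields shown above all have defaults, so they are optional.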
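
The fail-fast behaviour can be illustrated without Pydantic. The sketch below is a standard-library stand-in, not the project's config.py: AppSettings and load_settings are hypothetical names, and JWT_SECRET is treated as required purely for the sake of the example.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AppSettings:
    jwt_secret: str  # no default, so it must be supplied (an assumption)
    database_url: str = "sqlite+aiosqlite:///./securelens.db"
    gemini_api_key: Optional[str] = None


def load_settings(env: Mapping[str, str] = os.environ) -> AppSettings:
    """Build settings from an environment mapping, failing fast on missing keys."""
    if "JWT_SECRET" not in env:
        raise RuntimeError("missing required setting: JWT_SECRET")
    return AppSettings(
        jwt_secret=env["JWT_SECRET"],
        database_url=env.get("DATABASE_URL", AppSettings.database_url),
        gemini_api_key=env.get("GEMINI_API_KEY"),
    )


settings = load_settings({"JWT_SECRET": "dev-only-secret"})
```

Calling load_settings once at import time means a misconfigured deployment stops before it serves its first request.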

See .env.example for the full list of options, or the Configuration section in README.md.


Docker Setup

The docker-compose.yml runs two services:

backend   ← FastAPI app (port 8000)
db        ← PostgreSQL (port 5432, internal only)

The backend container reads DATABASE_URL from .env and connects to the db container over the internal Docker network. PostgreSQL data persists in a Docker volume across restarts.

To rebuild from scratch:

docker compose down -v   # removes containers AND the data volume
docker compose up --build