# Encrypted File Storage System – Initial Discovery Document

## 1. Overview

This document outlines the discovery and design considerations for a secure, encrypted file storage system intended for self-hosting enthusiasts. The system is designed not only to securely store and version files with client-side encryption and an immutable manifest but also to extend into a peer-to-peer (P2P) network for redundancy, backup, and collaborative sharing between trusted nodes.

## 2. Goals & High-Level Vision

### 2.1. Security & Privacy

- **End-to-End Encryption:**

  - Files are encrypted on the client before being uploaded, ensuring that the server or any peer only ever handles ciphertext.
  - Decryption occurs solely on the client, preserving data privacy even in a distributed network.

- **Robust Key Management:**
  - Encryption keys are derived client-side using strong key derivation functions (e.g., Argon2, scrypt, or PBKDF2).
  - Integration with OS-level or hardware-based key management solutions is considered for enhanced security.

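
As a minimal sketch of the client-side key derivation mentioned above, the following uses scrypt from Python's standard library. The function name `derive_key`, the passphrase, and the cost parameters are illustrative assumptions, not choices made by this design; a real deployment would tune the parameters and might prefer Argon2.

```python
import hashlib
import os


def derive_key(passphrase: str, salt: bytes, length: int = 32) -> bytes:
    """Derive a symmetric key from a passphrase using scrypt."""
    # n, r, p are illustrative cost parameters (assumption), not tuned values.
    return hashlib.scrypt(
        passphrase.encode("utf-8"),
        salt=salt,
        n=2**14,
        r=8,
        p=1,
        dklen=length,
    )


# The salt is random per user or per vault and is stored in the clear;
# only the passphrase is secret.
salt = os.urandom(16)
key = derive_key("example passphrase", salt)
```

The same passphrase and salt always yield the same key, so only the salt needs to be persisted for the client to re-derive the key on another device.
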
### 2.2. Efficient Versioning & Storage

- **File Partitioning & Chunking:**

  - Files are divided into fixed-size or content-defined chunks (e.g., 1 MB per chunk).
  - Only updated chunks are re-uploaded, which reduces upload times and storage consumption.

- **Content Addressable Storage & Deduplication:**

  - Each chunk is encrypted separately and indexed by its SHA-256 hash.
  - This method allows for deduplication, reducing redundant uploads across versions and even across files.

- **Verifiable Append-Only Manifest:**
  - An append-only log (referred to as the manifest) maintains a complete history of chunk operations and file versions.
  - Periodic snapshots (that include a hash of the log state) allow for pruning of older logs while ensuring verifiability of data integrity.

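
The chunking and content-addressing bullets above can be sketched as follows. This is an illustration under stated assumptions: the helper names are hypothetical, encryption is left out, and the hash is taken over whatever bytes are passed in (the document does not say whether the chunk ID is computed before or after encryption).

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # 1 MB, the example size given above


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into fixed-size chunks; the last one may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def chunk_ids(data: bytes, size: int = CHUNK_SIZE) -> list[str]:
    """Return the SHA-256 hex digest of each chunk, in file order."""
    return [hashlib.sha256(c).hexdigest() for c in split_chunks(data, size)]


def chunks_to_upload(new_ids: list[str], stored: set[str]) -> list[str]:
    """Return the IDs in new_ids that are not stored yet, without repeats."""
    seen = set(stored)
    missing = []
    for cid in new_ids:
        if cid not in seen:
            seen.add(cid)
            missing.append(cid)
    return missing


# Tiny 8-byte chunks keep the example readable.
v1 = b"a" * 8 + b"b" * 8 + b"c" * 8
v2 = b"a" * 8 + b"X" * 8 + b"c" * 8  # only the middle chunk changed

stored = set(chunk_ids(v1, size=8))
pending = chunks_to_upload(chunk_ids(v2, size=8), stored)
```

Here `pending` holds a single ID, the hash of the changed middle chunk, so the second version costs one chunk upload instead of three.
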
### 2.3. P2P Node Network & Redundancy

- **Peer-to-Peer (P2P) Connectivity:**

  - Users can connect their nodes with those of friends using an IP address plus public key combination for secure identification and mutual authentication.
  - Nodes establish encrypted channels (e.g., via TLS or secure Diffie-Hellman exchanges) to share encrypted file blobs.

- **Data Redundancy and Backup:**

  - Shared file blobs are replicated across nodes to provide data redundancy and backup.
  - Users can define replication policies, choosing which peers should store copies.

- **Dynamic Trust and Access Control:**
  - Nodes employ certificate or public-key-based validation mechanisms to grant or restrict access.
  - A decentralized trust framework ensures that only authorized nodes participate in file sharing.

## 3. Architecture & Component Overview

### 3.1. Client Application

- **User Interaction:**

  - Initially, a command-line interface (CLI) is used for file management; a web interface is planned for future development.

- **Processing & Encryption:**

  - Files are partitioned into chunks, each of which is encrypted individually using an authenticated encryption algorithm (e.g., AES-GCM or ChaCha20-Poly1305).
  - The client maintains an immutable, append-only manifest that records all operations such as additions, modifications, and deletions.

- **Synchronization:**
  - The manifest is used for multi-device synchronization, allowing devices to merge their change logs seamlessly.
  - The manifest incorporates periodic snapshots to prune old entries while keeping a cryptographic chain for integrity verification.

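
To illustrate how an append-only manifest of additions, modifications, and deletions yields both the current file state and any earlier version, here is a small sketch. The entry fields (`op`, `path`, `chunks`) and the `replay` helper are hypothetical; the document does not define the manifest format.

```python
def replay(entries: list[dict]) -> dict[str, list[str]]:
    """Fold manifest entries into a mapping of path -> ordered chunk IDs."""
    files: dict[str, list[str]] = {}
    for entry in entries:
        op = entry["op"]
        if op in ("add", "modify"):
            files[entry["path"]] = list(entry["chunks"])
        elif op == "delete":
            files.pop(entry["path"], None)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return files


manifest = [
    {"op": "add", "path": "notes.txt", "chunks": ["c1", "c2"]},
    {"op": "add", "path": "todo.txt", "chunks": ["c3"]},
    {"op": "modify", "path": "notes.txt", "chunks": ["c1", "c4"]},
    {"op": "delete", "path": "todo.txt"},
]

current = replay(manifest)        # state after every entry
earlier = replay(manifest[:2])    # state as of the second entry
```

Because entries are never rewritten, restoring an older version is a replay of a prefix of the log rather than a separate storage mechanism.
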
### 3.2. Server/P2P Node

- **Self-Hosting Friendly:**

  - Designed to run on user-managed servers with simple configuration.
  - Containerized deployment (e.g., Docker) ensures consistent and easy installation.

- **API & Data Handling:**

  - The node provides a secure, stateless API (RESTful or gRPC) for interactions with clients.
  - Encrypted blobs (chunks) are stored and indexed using content addressing to support deduplication and versioning.

- **P2P Functionality & Node Connectivity:**

  - Nodes establish direct secure connections using an IP address and public key identifier.
  - Support for NAT traversal techniques (e.g., UPnP, STUN, TURN) is built in to facilitate peer connections.
  - A decentralized discovery mechanism (potentially via a DHT or bootstrap nodes) helps nodes locate one another.

- **Redundancy & Data Synchronization:**
  - Each node maintains information in the manifest regarding which peers store which file chunks.
  - Health checks and heartbeat signals maintain up-to-date replication and rebalancing across the network.

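
The node-side storage described above, where encrypted blobs are indexed by content, might look roughly like the following in-memory sketch. `BlobStore` and its methods are hypothetical names; a real node would persist blobs to disk and expose them through its API.

```python
import hashlib


class BlobStore:
    """In-memory stand-in for a node's content-addressed blob storage."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, blob: bytes) -> str:
        """Store a blob under its SHA-256 digest; repeated puts are no-ops."""
        blob_id = hashlib.sha256(blob).hexdigest()
        self._blobs.setdefault(blob_id, blob)
        return blob_id

    def get(self, blob_id: str) -> bytes:
        """Return a blob after re-checking that it still matches its ID."""
        blob = self._blobs[blob_id]
        if hashlib.sha256(blob).hexdigest() != blob_id:
            raise ValueError("stored blob does not match its ID")
        return blob

    def __len__(self) -> int:
        return len(self._blobs)


store = BlobStore()
first = store.put(b"ciphertext of some chunk")
second = store.put(b"ciphertext of some chunk")  # duplicate upload
```

Both calls return the same ID and the store holds one blob, which is the deduplication property. The node never needs the plaintext to provide it.
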
## 4. Detailed Design Considerations

### 4.1. File Partitioning and Deduplication

- **Chunking Strategy:**

  - Fixed-size chunks provide predictability, while content-defined chunking (e.g., using Rabin fingerprinting) may help reduce unnecessary re-uploads when file content shifts.

- **Content Addressing:**
  - SHA-256 is used to hash each chunk before storage, allowing duplicate chunks to be recognized and only stored once across nodes.

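
The following sketch shows why content-defined chunking tolerates shifted content. Note that it uses a simple polynomial rolling hash as a stand-in for Rabin fingerprinting, and that the window size, mask, and function name are illustrative assumptions. It also omits the minimum and maximum chunk sizes a practical chunker would enforce.

```python
import hashlib

BASE = 257
MOD = (1 << 61) - 1  # large prime modulus for the rolling hash


def cdc_chunks(data: bytes, window: int = 16, mask: int = 0x3F) -> list[bytes]:
    """Cut data wherever the hash of the last `window` bytes matches `mask`."""
    top = pow(BASE, window, MOD)  # weight of the byte leaving the window
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        h = (h * BASE + byte) % MOD
        if i >= window:
            # Drop the contribution of the byte that slid out of the window.
            h = (h - data[i - window] * top) % MOD
        # Only cut once a full window has been hashed.
        if i + 1 >= window and (h & mask) == mask:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


# Deterministic pseudo-random test data (4096 bytes).
original = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(128))
edited = b"!" + original  # one byte inserted at the front

shared = set(cdc_chunks(original)) & set(cdc_chunks(edited))
```

Because each cut depends only on the bytes inside the window, inserting a byte at the front can change at most the first chunk. Every later chunk of `original` reappears unchanged in `edited`, whereas fixed-size chunking would shift every boundary and invalidate every chunk.
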
### 4.2. Manifest (Append-Only Log)

- **Structure & Integrity:**

  - The manifest is an append-only log where each entry records a discrete change (addition, deletion, modification of chunks), along with metadata such as timestamps, device identifiers, and operation details.
  - Cryptographic chaining (each log entry containing a hash of the previous one) makes the history tamper-evident.

- **Snapshot Mechanism:**
  - Periodic snapshots capture the full state of the manifest up to that point by incorporating an aggregate hash.
  - Future log entries reference the last snapshot, allowing previous data to be pruned while maintaining verifiability.

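
A minimal sketch of the chaining and snapshot ideas above follows. The entry layout, the JSON serialization, and the use of the chain head as the snapshot's aggregate hash are all assumptions made for illustration; the document does not specify them.

```python
import hashlib
import json

GENESIS = "0" * 64  # anchor used before any entry or snapshot exists


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash an entry's payload together with the previous entry's hash."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(log: list[dict], payload: dict, anchor: str = GENESIS) -> None:
    """Append a payload, linking it to the current head (or the anchor)."""
    prev = log[-1]["hash"] if log else anchor
    log.append({"prev": prev, "payload": payload, "hash": entry_hash(prev, payload)})


def verify(log: list[dict], anchor: str = GENESIS) -> bool:
    """Check that every entry links to its predecessor, starting at anchor."""
    prev = anchor
    for entry in log:
        if entry["prev"] != prev:
            return False
        if entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


log: list[dict] = []
append(log, {"op": "add", "path": "notes.txt", "chunks": ["c1"]})
append(log, {"op": "modify", "path": "notes.txt", "chunks": ["c2"]})
append(log, {"op": "delete", "path": "notes.txt"})

# Snapshot after the second entry: keep its hash, prune what came before.
snapshot_hash = log[1]["hash"]
pruned = log[2:]
still_valid = verify(pruned, anchor=snapshot_hash)
```

After pruning, the remaining entries still verify against the snapshot hash, while changing any retained payload or reordering entries makes `verify` return `False`.
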
### 4.3. Multi-Device & P2P Synchronization

- **Manifest Merging:**

  - Devices (including those on separate nodes) merge their manifest logs using ordering methods such as vector clocks or Lamport timestamps to resolve concurrent updates.

- **Node-to-Node Sharing:**
  - Secure node connections based on IP address and public key identifiers facilitate the sharing of encrypted blobs between trusted peers.
  - A decentralized model ensures that file updates, redundancy, and replication policies are distributed and maintained across the network.

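
As one possible reading of the merge step above, the sketch below orders entries by Lamport timestamp and breaks ties by device name, so every device arrives at the same sequence. The `Entry` and `Device` types are hypothetical. Lamport timestamps give a consistent total order but cannot tell whether two updates were truly concurrent; vector clocks would be needed for that.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Entry:
    lamport: int  # logical clock value when the entry was recorded
    device: str   # tie-breaker when two devices use the same clock value
    op: str
    path: str


def merge(a: list[Entry], b: list[Entry]) -> list[Entry]:
    """Union two logs and sort by (lamport, device, ...)."""
    return sorted(set(a) | set(b))


class Device:
    def __init__(self, name: str) -> None:
        self.name = name
        self.clock = 0
        self.log: list[Entry] = []

    def record(self, op: str, path: str) -> Entry:
        """Record a local change with the next clock value."""
        self.clock += 1
        entry = Entry(self.clock, self.name, op, path)
        self.log.append(entry)
        return entry

    def receive(self, entries: list[Entry]) -> None:
        """Merge remote entries and advance the clock past anything seen."""
        for entry in entries:
            self.clock = max(self.clock, entry.lamport)
        self.log = merge(self.log, entries)


laptop, phone = Device("laptop"), Device("phone")
laptop.record("add", "a.txt")     # concurrent with the phone's first change
phone.record("add", "b.txt")
phone.receive(laptop.log)
phone.record("modify", "a.txt")   # happens after seeing the laptop's entry
laptop.receive(phone.log)
```

Both devices end with identical logs, and the phone's later modification sorts after the laptop's addition it had already seen.
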
### 4.4. Security and Access Control

- **Connection Security:**

  - All node-to-node communications are encrypted in transit (e.g., TLS, a channel keyed by a Diffie-Hellman key exchange, or equivalent).

- **Trust & Authentication:**
  - Nodes exchange public keys during the initial handshake for mutual authentication.
  - Certificate-based or signed permission systems safeguard against unauthorized access, ensuring only trusted peers participate.

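
One way to picture the "IP address plus public key" trust check is key pinning, sketched below. The `TrustStore` class, the fingerprint format, and the example address are assumptions. The sketch only checks that a presented key matches the pinned one; proving that the peer holds the matching private key requires a signed challenge or a TLS handshake, which is outside this example.

```python
import hashlib
import hmac


def fingerprint(public_key: bytes) -> str:
    """Return a SHA-256 fingerprint of the raw public key bytes."""
    return hashlib.sha256(public_key).hexdigest()


class TrustStore:
    """Maps peer addresses to the fingerprint of the key pinned for them."""

    def __init__(self) -> None:
        self._pinned: dict[str, str] = {}

    def pin(self, address: str, public_key: bytes) -> None:
        """Record the key a user has chosen to trust for this address."""
        self._pinned[address] = fingerprint(public_key)

    def is_trusted(self, address: str, presented_key: bytes) -> bool:
        """Accept a peer only if it presents the key pinned for its address."""
        expected = self._pinned.get(address)
        if expected is None:
            return False
        # Constant-time comparison of the two hex fingerprints.
        return hmac.compare_digest(expected, fingerprint(presented_key))


trust = TrustStore()
friend_key = b"\x01" * 32  # placeholder bytes, not a real public key
trust.pin("203.0.113.7:4000", friend_key)

accepted = trust.is_trusted("203.0.113.7:4000", friend_key)
rejected = trust.is_trusted("203.0.113.7:4000", b"\x02" * 32)
```

Unknown addresses and mismatched keys are both refused, which matches the goal that only explicitly trusted peers participate.
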
### 4.5. Scalability, Resilience, and Management

- **NAT Traversal & Discovery:**

  - Implementation of NAT traversal techniques (UPnP, STUN, TURN) and decentralized peer discovery ensures reliable connectivity, even behind firewalls.

- **Monitoring & Conflict Resolution:**
  - Systems to monitor node availability (heartbeats, health checks) are essential for maintaining replication and redundancy.
  - Conflict resolution protocols are implemented in the manifest for consistent state management across devices and nodes.

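
The link between heartbeats and replication can be sketched as follows. The `PeerMonitor` class, the timeout, the peer names, and the placement map are hypothetical. Timestamps are passed in explicitly to keep the example deterministic; a running node would read a monotonic clock instead.

```python
class PeerMonitor:
    """Tracks the last heartbeat time reported by each peer."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self._last_seen: dict[str, float] = {}

    def heartbeat(self, peer: str, now: float) -> None:
        self._last_seen[peer] = now

    def alive(self, now: float) -> set[str]:
        """Peers whose last heartbeat is no older than the timeout."""
        return {p for p, t in self._last_seen.items() if now - t <= self.timeout}


def under_replicated(
    placement: dict[str, set[str]], alive: set[str], target: int
) -> dict[str, int]:
    """Map each chunk ID to the number of additional live copies it needs."""
    needs = {}
    for chunk_id, holders in placement.items():
        live = len(holders & alive)
        if live < target:
            needs[chunk_id] = target - live
    return needs


monitor = PeerMonitor(timeout=30.0)
monitor.heartbeat("alice", now=0.0)
monitor.heartbeat("bob", now=0.0)
monitor.heartbeat("alice", now=40.0)  # bob has gone quiet

placement = {"c1": {"alice", "bob"}, "c2": {"bob"}}
needs = under_replicated(placement, monitor.alive(now=50.0), target=1)
```

With only `alice` still reporting, chunk `c2` has no live copy left and is flagged for re-replication, while `c1` still meets the target.
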
## 5. Threat Model & Security Audit

- **Threat Model Perspective:**

  - Protect against potential vulnerabilities including unauthorized access, tampering with the manifest, compromised nodes, and interception of communications.
  - Emphasize that all sensitive operations (encryption, key management, and manifest maintenance) are performed client-side or within a trusted node environment.

- **Audit & Transparency:**
  - An immutable, cryptographically chained manifest offers auditability.
  - The design encourages regular security audits and community reviews to validate the security framework.

## 6. Roadmap & Future Enhancements

- **Initial Development:**

  - Build a robust CLI client for file operations, including adding files, updating versions, and restoring files, backed by an append-only manifest.
  - Develop the server/P2P node with secure API endpoints and replication functionality.

- **P2P Network Expansion:**

  - Implement automated peer discovery, NAT traversal support, and dynamic trust mechanisms.
  - Create user-friendly tools for configuring peer connections and managing replication policies.

- **Advanced Features:**
  - Extend the system with a web interface.
  - Explore integration with existing distributed systems or version control systems.
  - Consider plugins or additional tools to automate backup management and network health monitoring.

## 7. Conclusion

This discovery document establishes the fundamental framework for a secure, self-hosted encrypted file storage system that integrates efficient versioning, a verifiable append-only manifest, and a P2P network for file sharing and redundancy. By combining client-side encryption, content-addressable storage, robust manifest management, and secure node connectivity, the product aims to deliver high security, privacy, and resilience in a distributed self-hosting environment.