File Deduplication
OxiCloud uses SHA-256 content-addressable storage to avoid storing duplicate files. If two users upload the same file, only one copy is stored on disk.
How It Works
- When a file is uploaded, its SHA-256 hash is computed
- The hash is checked against the blob store (.blobs/{prefix}/{hash}.blob)
- If a blob with that hash already exists, the file metadata points to the existing blob (no extra disk usage)
- If not, the content is saved as a new blob
- A reference counter tracks how many files point to each blob
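The upload steps above can be sketched roughly as follows. This is a minimal illustration in Python, not OxiCloud's actual implementation: the `BlobStore` class, its `store` method, and the in-memory `ref_counts` dictionary are hypothetical names chosen for the example, and a real server would persist the reference counts rather than keep them in memory.

```python
import hashlib
import tempfile
from pathlib import Path


class BlobStore:
    """Toy content-addressable store (hypothetical, for illustration only)."""

    def __init__(self, root: Path) -> None:
        self.blob_dir = root / ".blobs"
        self.ref_counts = {}  # hash -> number of files pointing at the blob

    def blob_path(self, digest: str) -> Path:
        # The first two hex characters of the hash form the directory prefix.
        return self.blob_dir / digest[:2] / f"{digest}.blob"

    def store(self, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        path = self.blob_path(digest)
        if not path.exists():
            # New content: write the blob once.
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(content)
        # Either way, one more file now references this blob.
        self.ref_counts[digest] = self.ref_counts.get(digest, 0) + 1
        return digest


with tempfile.TemporaryDirectory() as tmp:
    store = BlobStore(Path(tmp))
    first = store.store(b"quarterly report")
    second = store.store(b"quarterly report")  # same bytes from another user
    blobs_on_disk = len(list(store.blob_dir.rglob("*.blob")))
    refs = store.ref_counts[first]
```

Both uploads return the same hash, a single blob file exists on disk, and its reference count is 2.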
Automatic Cleanup
When a file is permanently deleted:
- The blob's reference count is decremented
- If the reference count reaches zero, the blob is removed from disk
This means disk space is only freed when the last reference to a blob is removed.
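A minimal sketch of this cleanup rule, assuming reference counts are kept in a simple dictionary. The function name `release_blob` and its parameters are hypothetical and not taken from OxiCloud's code.

```python
import hashlib
import tempfile
from pathlib import Path


def release_blob(blob_dir: Path, ref_counts: dict, digest: str) -> bool:
    """Drop one reference; delete the blob when none remain.

    Returns True if the blob file was removed from disk.
    """
    ref_counts[digest] -= 1
    if ref_counts[digest] > 0:
        return False  # other files still point at this blob
    del ref_counts[digest]
    (blob_dir / digest[:2] / f"{digest}.blob").unlink()
    return True


with tempfile.TemporaryDirectory() as tmp:
    blob_dir = Path(tmp) / ".blobs"
    content = b"shared file"
    digest = hashlib.sha256(content).hexdigest()
    path = blob_dir / digest[:2] / f"{digest}.blob"
    path.parent.mkdir(parents=True)
    path.write_bytes(content)
    ref_counts = {digest: 2}  # two files point at the same blob

    deleted_first = release_blob(blob_dir, ref_counts, digest)
    exists_after_first = path.exists()
    deleted_second = release_blob(blob_dir, ref_counts, digest)
    exists_after_second = path.exists()
```

Deleting the first file leaves the blob in place; only the second deletion, which brings the count to zero, removes it from disk.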
Storage Layout
storage/
├── .blobs/
│   ├── a1/
│   │   └── a1b2c3d4...sha256.blob
│   ├── f8/
│   │   └── f8e7d6c5...sha256.blob
│   └── ...
The first two hex characters of the hash are used as a directory prefix to avoid having millions of files in a single directory.
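As an illustration of the prefix scheme, the sketch below derives a blob path from file content. The helper name `blob_path` and the `root` parameter are assumptions made for this example; only the `.blobs/{prefix}/{hash}.blob` pattern comes from the description above.

```python
import hashlib
from pathlib import PurePosixPath


def blob_path(content: bytes, root: str = "storage") -> PurePosixPath:
    """Build the on-disk location for a blob from its SHA-256 hash."""
    digest = hashlib.sha256(content).hexdigest()  # 64 hex characters
    prefix = digest[:2]  # first two hex characters select the subdirectory
    return PurePosixPath(root) / ".blobs" / prefix / f"{digest}.blob"


path = blob_path(b"hello")
prefix_dir = path.parent.name
```

Two hex characters give at most 256 prefix directories, so blobs are spread across them instead of accumulating in one directory.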
Benefits
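- Disk savings: identical files across users consume storage only once
- Instant uploads: if the blob already exists, nothing new is written to disk, so the upload completes as soon as the hash has been checked
- Integrity: a blob's content can be re-hashed and compared against its SHA-256 hash to detect corruption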
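The integrity benefit can be illustrated with a small check that re-hashes a blob and compares the result with the hash in its file name. This is a sketch under the assumption that the file name stem is the full hex digest; `verify_blob` is a hypothetical helper, not a documented OxiCloud function.

```python
import hashlib
import tempfile
from pathlib import Path


def verify_blob(path: Path) -> bool:
    """Re-hash the file in chunks and compare with the hash in its name."""
    hasher = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in 64 KiB chunks so large blobs are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(64 * 1024), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == path.stem


with tempfile.TemporaryDirectory() as tmp:
    data = b"important document"
    blob = Path(tmp) / f"{hashlib.sha256(data).hexdigest()}.blob"
    blob.write_bytes(data)
    intact = verify_blob(blob)

    blob.write_bytes(data + b"\x00")  # simulate on-disk corruption
    still_valid = verify_blob(blob)
```

The untouched blob verifies successfully, while the modified one no longer matches its name.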
Limitations
- Deduplication is based on exact content match (byte-identical files)
- Near-duplicate files (e.g., a JPEG re-saved at slightly different quality) are stored separately
- Encryption at rest would require per-user keys, which breaks deduplication because identical files encrypted under different keys no longer produce identical stored bytes (planned as opt-in)
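The exact-match limitation follows directly from how hashing works: changing a single byte produces a different SHA-256 hash, and therefore a different blob. The byte strings below are made-up stand-ins for two nearly identical files.

```python
import hashlib

# Two made-up payloads that differ in a single byte.
original = b"\xff\xd8 image data, quality=90"
resaved = b"\xff\xd8 image data, quality=89"

original_hash = hashlib.sha256(original).hexdigest()
resaved_hash = hashlib.sha256(resaved).hexdigest()
copy_hash = hashlib.sha256(bytes(original)).hexdigest()

shares_blob_with_resaved = original_hash == resaved_hash
shares_blob_with_copy = original_hash == copy_hash
```

Only the byte-identical copy maps to the same hash; the near-duplicate would be stored as its own blob.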