renAI

Privacy & Data Retention Guide

This document outlines privacy considerations, data handling practices, and recommendations for using renAI securely.


📋 Overview

renAI processes documents locally and sends text content to Large Language Model (LLM) providers for metadata extraction. Understanding how data flows through the system is essential for making informed decisions about your document privacy.

Data Flow

Local Document → Text Extraction → LLM API → Metadata Response
                    ↓
              Local Cache Storage

  1. Text Extraction: Document content is extracted locally using PDF parsers or OCR.
  2. Vision Processing (Optional): If Vision Mode is enabled and text extraction fails (or if renaming images), pages/images are sampled and converted to Base64.
  3. API Transmission: Text or sampled images are sent to LLM providers (cloud or local).
  4. Processing: LLM analyzes content and returns metadata.
  5. Caching: Extracted text, sampled vision data, and final metadata are stored locally for efficiency.
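
Step 2 can be pictured with a short sketch. It makes no claim about renAI's internals: `encode_image_for_llm` is a hypothetical helper name, and the input bytes are a stand-in for a sampled page image. The only firm part is how Base64 itself behaves: every 3 input bytes become 4 output characters, and the result can be decoded by anyone who receives it.

```python
import base64


def encode_image_for_llm(image_bytes: bytes) -> str:
    """Return the Base64 text form of raw image bytes (hypothetical helper)."""
    return base64.b64encode(image_bytes).decode("ascii")


# Stand-in for one sampled page image; real input would be PNG/JPEG bytes.
sample = bytes(range(256)) * 12  # 3072 bytes
encoded = encode_image_for_llm(sample)

# Base64 maps every 3 bytes to 4 characters, so the payload grows by a third.
print(len(sample), len(encoded))  # 3072 4096

# Decoding restores the exact bytes: Base64 is an encoding, not encryption.
assert base64.b64decode(encoded) == sample
```

In other words, a Base64 page sample exposes the same content as the image it was made from.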

🔒 Data Sent to LLM Providers

When using cloud LLM providers (DeepInfra, OpenRouter, OpenAI), the following data is transmitted:

| Data Type | Description | Sent to LLM |
|---|---|---|
| Document Text | Extracted text content (up to 10000 chars) | ✅ Yes |
| Sampled Images | Base64 encoded page samples (Vision Mode only) | ✅ Yes |
| Original Images | Full image files when using --rename-images | ✅ Yes |
| File Metadata | Filename, file size, page count | ❌ No |
| File Content | Full document binary/source | ❌ No |
| Category List | Your custom categories from categories.toml | ✅ Yes (in prompt) |
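
To get a feel for how much of a document fits under the text limit, the sketch below previews the slice that would be transmitted. It assumes the limit is applied to the leading characters of the extracted text, which this guide does not state; `preview_llm_text` is a hypothetical name, and the default of 10000 is taken from the table above.

```python
def preview_llm_text(extracted_text: str, max_chars: int = 10_000) -> str:
    """Return the leading slice of text that fits within the limit.

    Assumption: truncation keeps the start of the document.
    """
    return extracted_text[:max_chars]


# Stand-in for text extracted from a long document.
document_text = "confidential " * 2_000  # 26,000 characters
outgoing = preview_llm_text(document_text)

print(len(document_text), len(outgoing))  # 26000 10000
```

Under that assumption, anything on the first pages of a document (names, account numbers, letterheads) is the content most likely to reach the provider.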

What is NOT Sent

What IS Sent


🏠 Local Processing vs. Cloud Processing

Cloud Providers (Default)

| Provider | Data Location | Privacy Level |
|---|---|---|
| DeepInfra | US-based servers | Standard |
| OpenRouter | Various providers | Standard |
| OpenAI | US-based servers | Standard |

Characteristics:

| Tool | Setup Required | Model Options |
|---|---|---|
| Ollama | Install Ollama, run ollama serve | Llama 3, Mistral, Gemma, etc. |
| LM Studio | Install LM Studio, enable API | Any GGUF model |
| LocalAI | Docker or binary installation | Many options |
| vLLM | Server installation | High-performance |

Characteristics:


πŸ“ Local Cache Storage

Cache Location

renAI uses the platformdirs library for OS-agnostic path management. Cache files are stored in your operating system's standard cache location.

The directory structure is organized as follows:

Cache/
├── text/        # Extracted text files
└── metadata/    # Metadata files

Cache Management

# Clear all cache (text, metadata, and ocr_debug)
renai cache

Cache Security Recommendations

  1. Use built-in tools: Use renai cache to safely clear cached data.
  2. Exclude from backups: Consider excluding the OS-specific cache directory from automated backups.
  3. Regular cleanup: Clear cache periodically, especially after processing sensitive documents.
  4. Encrypted storage: Store cache on encrypted drives if possible.
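
Before and after a cleanup, it can help to see what is actually sitting in the cache. The following is a minimal sketch, not part of renAI: `summarize_cache` is a hypothetical helper, and you must supply the cache root yourself, since the exact path depends on your operating system. The demo builds a temporary directory that mimics the text/ and metadata/ layout shown above.

```python
import tempfile
from pathlib import Path


def summarize_cache(cache_root: Path) -> dict[str, tuple[int, int]]:
    """Map each subdirectory name to (file count, total bytes)."""
    summary = {}
    for sub in sorted(p for p in cache_root.iterdir() if p.is_dir()):
        files = [f for f in sub.rglob("*") if f.is_file()]
        summary[sub.name] = (len(files), sum(f.stat().st_size for f in files))
    return summary


# Demo on a throwaway directory that mimics the documented layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "text").mkdir()
    (root / "metadata").mkdir()
    (root / "text" / "a.txt").write_text("hello", encoding="utf-8")
    print(summarize_cache(root))  # {'metadata': (0, 0), 'text': (1, 5)}
```

Running the same function against the real cache directory after `renai cache` should report empty subdirectories or none at all.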

πŸ” Privacy by Scenario

Scenario 1: Public Domain Books

Risk Level: 🟢 Low

Recommendation: Any provider is suitable

Considerations:

# Example: Using OpenRouter for public books
renai process "C:/Public/Books" --mode rename --provider openrouter

Scenario 2: Personal Documents

Risk Level: 🟡 Medium

Recommendation: Use local LLM or trusted cloud provider

Considerations:

# Example: Using local Ollama for personal documents
# Set these in your OS-specific .env file or environment
RENAI_PROVIDER=custom
CUSTOM_API_BASE_URL=http://localhost:11434/v1
CUSTOM_MODEL=llama3
uv run renai process "C:/Personal/Documents" --mode rename

Scenario 3: Business/Professional Documents

Risk Level: 🔴 High

Recommendation: Use local LLM only

Considerations:

# Example: Using LM Studio for business documents
$env:CUSTOM_API_BASE_URL = "http://localhost:1234/v1"
$env:CUSTOM_MODEL = "your-model"
renai process "C:/Work/Documents" --mode rename --provider custom

Scenario 4: Healthcare/Medical Records

Risk Level: 🔴🔴 Very High

Recommendation: Local LLM with additional safeguards

Considerations:


🛡️ Best Practices for Privacy

1. Use Local Processing for Sensitive Content

export CUSTOM_API_BASE_URL="http://localhost:11434/v1"
export CUSTOM_MODEL="llama3"
renai process "C:/Sensitive/Files" --mode rename --provider custom
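
A typo in CUSTOM_API_BASE_URL can silently route sensitive text to a remote host. One way to guard against that is a pre-flight check like the sketch below. `is_loopback_url` is a hypothetical helper, not a renAI function; it treats only `localhost` and loopback IP addresses as local and does not resolve DNS names.

```python
import ipaddress
from urllib.parse import urlsplit


def is_loopback_url(url: str) -> bool:
    """True if the URL's host is localhost or a loopback IP address."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other DNS name is treated as remote.
        return False


print(is_loopback_url("http://localhost:11434/v1"))   # True
print(is_loopback_url("http://127.0.0.1:1234/v1"))    # True
print(is_loopback_url("https://api.example.com/v1"))  # False
```

A wrapper script could call this on the configured base URL and refuse to start a run when it returns False.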

2. Clear Cache After Processing Sensitive Files

# Immediately clear cache after processing
renai cache

3. Review Provider Terms of Service

Before using any cloud provider, review their:

4. Implement Network Controls

For local inference servers:

# Bind to localhost only (prevent external access)
# Ollama: OLLAMA_HOST=127.0.0.1:11434 ollama serve

# Configure firewall to block external access
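
Whether the binding and firewall settings took effect can be checked by probing the port. The sketch below is a generic TCP probe, with `port_accepts_connections` as a hypothetical helper name; port 11434 is the Ollama port used in the example above.

```python
import socket


def port_accepts_connections(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: nothing is reachable there.
        return False


# On the machine running the server, loopback should be reachable
# while the server is up.
print(port_accepts_connections("127.0.0.1", 11434))
```

To confirm that external access is blocked, run the same call from another machine on the network, passing the server's LAN address in place of 127.0.0.1; with a loopback-only binding the expected result is False.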

5. Use Environment Variables for Sensitive Config

# Instead of command-line arguments (which may be logged)
# Use the .env file in your OS configuration directory

RENAI_PROVIDER=deepinfra
DEEPINFRA_API_KEY=your-api-key-here

uv run renai process "C:/Books" --mode rename
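
Keeping the key out of command-line arguments only helps if it also stays out of logs and console output. The following sketch shows one way a wrapper script might confirm the key is present without printing it. `mask_secret` is a hypothetical helper; DEEPINFRA_API_KEY is the variable name from the example above.

```python
import os


def mask_secret(value: str, visible: int = 4) -> str:
    """Hide all but the last few characters of a secret for safe logging."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


# Read the key from the environment instead of a command-line argument.
api_key = os.environ.get("DEEPINFRA_API_KEY", "")
if api_key:
    print("API key loaded:", mask_secret(api_key))
else:
    print("DEEPINFRA_API_KEY is not set")

print(mask_secret("sk-example-1234"))  # ***********1234
```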

6. Audit Log Review

Enable logging and review regularly:

# Check evaluation.log for processing history
cat evaluation.log

# Review console output for errors

📊 Provider Comparison

| Provider | Data Retention | Training Data | Privacy Rating |
|---|---|---|---|
| Local (Ollama/LM Studio) | You control | N/A | ⭐⭐⭐⭐⭐ |
| DeepInfra | 30 days | Check policy | ⭐⭐⭐ |
| OpenRouter | Varies by provider | Check policy | ⭐⭐⭐ |
| OpenAI | 30 days | May use | ⭐⭐⭐ |

🧹 Data Cleanup Checklist

After processing sensitive documents:


⚖️ Compliance Considerations

GDPR (EU)

HIPAA (Healthcare - US)

SOC 2 / ISO 27001


❓ Frequently Asked Questions

Q: Can I use renAI offline?

A: Yes, by using a local LLM provider (Ollama, LM Studio). The tool requires an LLM API to function, but it can be entirely local.

Q: Does renAI upload my documents to the cloud?

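A: renAI sends a truncated excerpt of the extracted text to the LLM provider (see the character limit under Data Sent to LLM Providers). If Vision Mode is enabled, Base64-encoded page samples are also sent, and --rename-images sends the full image files. Original document binaries are never uploaded.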

Q: How long does the cloud provider keep my data?

A: This varies by provider. Check their terms of service:

Q: Can I delete my data from the cloud provider?

A: For cloud providers, you can request data deletion per their policies. For local processing, you have complete control.

Q: Is the cache encrypted?

A: No, the cache is stored as plain text files. Use renai cache to manage your data or store the OS-specific cache directory on encrypted storage if processing sensitive documents.

Q: What happens to data during retries?

A: If the initial LLM request fails, the system may retry with a modified prompt. Each retry sends the text content again.
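
The practical effect is that a failed request multiplies the amount of text that leaves your machine. The sketch below illustrates the accounting with a generic retry loop. It is not renAI's retry logic: `send_with_retries` and `flaky_send` are hypothetical, and the sketch resends the same text rather than a modified prompt.

```python
def send_with_retries(send, text: str, max_attempts: int = 3):
    """Call send(text) until it succeeds; return (result, chars_transmitted)."""
    transmitted = 0
    last_error = None
    for _ in range(max_attempts):
        transmitted += len(text)  # the full text is sent on every attempt
        try:
            return send(text), transmitted
        except RuntimeError as exc:
            last_error = exc
    raise RuntimeError(f"all {max_attempts} attempts failed") from last_error


calls = []


def flaky_send(text):
    """Simulated provider call that fails twice, then succeeds."""
    calls.append(text)
    if len(calls) < 3:
        raise RuntimeError("simulated provider error")
    return {"title": "example"}


result, sent = send_with_retries(flaky_send, "x" * 500)
print(len(calls), sent)  # 3 1500
```

Here a 500-character excerpt was transmitted three times, so the provider received 1500 characters in total for one document.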


📞 Additional Resources


πŸ“ Changelog

| Date | Change |
|---|---|
| 2025-01-26 | Initial privacy guide |
| 2026-03-14 | Updated to include platformdirs and new CLI commands |
| 2026-04-09 | Updated for Vision Mode data sharing transparency, uv integration, and .env security |