Narayan
RAG Assistant for Medical Research.
Narayan is a production-ready Retrieval-Augmented Generation (RAG) system that lets you ask natural language questions over your document stacks and get answers with full source citations. Built for anyone dealing with large volumes of documents: medical researchers, legal professionals, compliance officers, engineers, or anyone else who lives in PDFs.
Upload documents, ask questions, get answers grounded in the text with exact page references.
Key Features
- Accurate Retrieval: ChromaDB-powered vector search with reranking to find the most relevant passages
- Source-Grounded Answers: Every answer includes citations with filenames, page numbers, and relevance scores
- Hallucination-Resistant: System prompt constraints force the LLM to use only the provided sources or admit when it doesn't know
- Document Deduplication: Content-hash based system prevents duplicate PDFs from inflating the knowledge base
- Scoped Queries: Search across all documents or limit to specific files
- Metadata Tracking: Every chunk carries document ID, filename, page number, and chunk index for full traceability
- Local Inference: Embeddings run locally using sentence-transformers; LLM calls configurable (Ollama, OpenRouter, or any OpenAI-compatible API)
- Privacy by Default: Your documents stay on your machine
What It Does
- Document Ingestion: Upload PDFs. The system extracts text page by page, respecting two-column layouts and paragraph boundaries.
- Chunking and Embedding: PDFs are split into overlapping chunks and converted to vectors using local embedding models.
- Vector Search: When you ask a question, it’s converted to a vector and matched against your stored documents using cosine similarity.
- Reranking: Top candidates are reranked to keep only the most relevant chunks.
- Answer Generation: The LLM receives the question plus the top sources and generates an answer with inline citations.
- Evidence Cards: The frontend shows expandable snippets of source text so you can verify answers yourself.
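The retrieval steps above can be sketched with toy vectors. This is illustrative only: the bag-of-words "embedding" stands in for the real sentence-transformers model, and the data is made up.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; the real system uses sentence-transformers.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

chunks = [
    {"text": "Mild side effects included headache and nausea.", "page": 7},
    {"text": "The study enrolled 250 patients over two years.", "page": 2},
]
question = "What side effects were observed?"

# Embed the question, rank chunks by similarity, keep the best match.
q_vec = embed(question)
ranked = sorted(chunks, key=lambda c: cosine(q_vec, embed(c["text"])), reverse=True)
top = ranked[0]  # carries page metadata for the citation
```

The real pipeline does the same thing at scale: ChromaDB performs the similarity search, and the winning chunks go to the LLM along with their metadata.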
Tech Stack
Backend
- FastAPI: REST API framework
- LangChain: RAG pipeline orchestration
- ChromaDB: Vector store (local, embedded)
- PyMuPDF (fitz): PDF text extraction respecting reading order
- sentence-transformers: Local embedding models (runs on CPU/GPU)
- Pydantic: Data validation
- OpenAI Python SDK: LLM API calls (compatible with Ollama, OpenRouter, and more)
Frontend
- React 19: UI framework
- Vite: Build tool and dev server
- Vanilla CSS: No CSS framework bloat
- Fetch API: Backend communication
Installation
Prerequisites
- Python 3.10+
- Node.js 18+ (for frontend)
- 4GB+ RAM recommended
- Optional: GPU for faster embeddings
Backend Setup
- Clone the repository:
git clone https://github.com/om-wani/narayan
cd narayan/backend
- Create a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Set up environment variables:
cp .env.example .env
Edit .env and fill in:
- OPENROUTER_API_KEY: Get from https://openrouter.ai/ (or leave empty if using local Ollama)
- CORS_ORIGINS: Frontend URL (default: http://localhost:5173)
- Other settings are optional
- Run the backend:
uvicorn app.main:app --reload
The API will be available at http://localhost:8000
Frontend Setup
- Navigate to the frontend directory:
cd narayan/frontend
- Install dependencies:
npm install
- Start the dev server:
npm run dev
The frontend will be available at http://localhost:5173
Using Ollama (Local LLM)
If you want to run the LLM locally without API calls:
- Install Ollama from https://ollama.ai
- Pull a model:
ollama pull llama2
# or: ollama pull neural-chat, mistral, etc.
- Start the Ollama server (it listens on port 11434 by default)
- Update your backend to point to Ollama:
# In config.py or via environment
OPENROUTER_API_KEY="" # Leave empty
LLM_MODEL="llama2" # Whatever model you pulled
# Update the OpenAI client base_url to http://localhost:11434/v1
API Reference
Documents
Upload a PDF
POST /api/documents/upload
Content-Type: multipart/form-data
file: <PDF file>
Response:
{
"doc_id": "medical_paper_a1b2c3d4",
"filename": "medical_paper.pdf",
"page_count": 18,
"chunk_count": 127,
"uploaded_at": "2026-01-20T10:30:45.123456+00:00"
}
List documents
GET /api/documents/
Response:
{
"documents": [
{
"doc_id": "medical_paper_a1b2c3d4",
"filename": "medical_paper.pdf",
"page_count": 18,
"chunk_count": 127
}
],
"total": 1
}
Delete a document
DELETE /api/documents/{doc_id}
Response:
{
"deleted": true,
"doc_id": "medical_paper_a1b2c3d4"
}
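Assuming the backend is running on localhost:8000, the document endpoints can be exercised from Python's standard library. This is a hedged sketch: the paths come from the reference above, but the helper names and usage are illustrative, not project code.

```python
import json
import urllib.request

BASE = "http://localhost:8000"

def list_documents_request() -> urllib.request.Request:
    # GET /api/documents/ returns {"documents": [...], "total": N}
    return urllib.request.Request(f"{BASE}/api/documents/", method="GET")

def delete_document_request(doc_id: str) -> urllib.request.Request:
    # DELETE /api/documents/{doc_id} returns {"deleted": true, "doc_id": ...}
    return urllib.request.Request(f"{BASE}/api/documents/{doc_id}", method="DELETE")

# To actually send a request (requires the backend to be running):
# with urllib.request.urlopen(list_documents_request()) as resp:
#     print(json.load(resp)["total"])
```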
Query
Ask a question
POST /api/query/
Content-Type: application/json
{
"question": "What are the side effects of the treatment?",
"doc_ids": null, // Optional: limit to specific documents
"top_k": null // Optional: number of chunks to retrieve (default: 5)
}
Response:
{
"answer": "The treatment showed mild side effects including headache and nausea in 12% of patients [Source 1]. More severe reactions were rare [Source 2].",
"sources": [
{
"filename": "medical_paper.pdf",
"page": 7,
"doc_id": "medical_paper_a1b2c3d4",
"score": 0.876,
"text": "Patient cohort (n=250) experienced mild adverse events including..."
},
{
"filename": "medical_paper.pdf",
"page": 9,
"doc_id": "medical_paper_a1b2c3d4",
"score": 0.812,
"text": "Severe adverse reactions occurred in less than 1% of cases..."
}
],
"model": "google/gemma-3-27b-it:free",
"tokens_used": 287
}
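A question can be posted the same way with only the standard library. Again a sketch under the same assumptions (endpoint path and request shape from the reference above; everything else illustrative):

```python
import json
import urllib.request

def build_query_request(question: str, doc_ids=None, top_k=None) -> urllib.request.Request:
    # POST /api/query/ with a JSON body; doc_ids and top_k are optional.
    payload = {"question": question, "doc_ids": doc_ids, "top_k": top_k}
    return urllib.request.Request(
        "http://localhost:8000/api/query/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_query_request("What are the side effects of the treatment?", top_k=3)
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
#     print(result["answer"])
#     for src in result["sources"]:
#         print(f'{src["filename"]} p.{src["page"]} (score {src["score"]:.3f})')
```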
Configuration
All settings are in backend/app/core/config.py. Common configurations:
# Chunking (tuned for medical/technical documents)
CHUNK_SIZE = 1000 # Characters per chunk
CHUNK_OVERLAP = 200 # Overlap between chunks
# Retrieval
TOP_K = 5 # Initial chunks to retrieve
RERANK_TOP_K = 3 # Final chunks to use for generation
# Files
MAX_FILE_SIZE_MB = 50 # Max PDF size
VECTOR_STORE_PATH = "./data/chroma_db"
# LLM
LLM_MODEL = "google/gemma-3-27b-it:free" # Via OpenRouter
EMBEDDING_MODEL = "all-MiniLM-L6-v2" # Local, via sentence-transformers
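To see how CHUNK_SIZE and CHUNK_OVERLAP interact, here is a deliberately naive sliding-window splitter. The real pipeline uses LangChain's RecursiveCharacterTextSplitter, which additionally respects paragraph boundaries; this sketch only shows the size/overlap arithmetic.

```python
def naive_chunks(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    # Fixed-width sliding window: each chunk starts (size - overlap)
    # characters after the previous one, so neighbors share `overlap` chars.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = "x" * 2500
parts = naive_chunks(doc)
# Chunks start at 0, 800, 1600; each is at most 1000 characters long
# and shares its last 200 characters with the next chunk's first 200.
```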
How It Works
Document Processing
- Text Extraction: PyMuPDF extracts text block by block, respecting the reading order of multi-column documents
- Deduplication: Document content is hashed; duplicate uploads are detected and skipped
- Chunking: Text is split using RecursiveCharacterTextSplitter with paragraph-aware boundaries
- Embedding: Each chunk is converted to a vector using a local sentence-transformer model
- Storage: Vectors and metadata are stored in ChromaDB with stable chunk IDs
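The deduplication step boils down to hashing raw file content before ingestion. A minimal sketch of the idea (hashlib is real; the `ingest` function, its in-memory `seen_hashes` set, and the sample bytes are illustrative stand-ins for the project's actual storage):

```python
import hashlib

seen_hashes: set[str] = set()

def ingest(pdf_bytes: bytes, filename: str) -> bool:
    # Hash the raw file content; byte-identical re-uploads produce the
    # same digest regardless of filename, so duplicates are skipped.
    digest = hashlib.sha256(pdf_bytes).hexdigest()
    if digest in seen_hashes:
        return False  # duplicate: skip extraction/chunking/embedding
    seen_hashes.add(digest)
    # ... extract text, chunk, embed, store in ChromaDB (omitted) ...
    return True

ingest(b"%PDF-1.7 sample", "paper.pdf")       # ingested
ingest(b"%PDF-1.7 sample", "paper_copy.pdf")  # same bytes: skipped
```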
Query Processing
- Question Embedding: Your question is converted to a vector using the same embedding model
- Retrieval: ChromaDB returns the top-K most similar chunks using cosine similarity
- Reranking: Chunks are sorted by similarity score (currently naive; cross-encoder upgrades planned)
- Context Formatting: Top chunks are formatted with source metadata and sent to the LLM
- Generation: The LLM generates an answer using only the provided sources
- Response: Answer, sources, and metadata are returned to the frontend
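The context-formatting step numbers each chunk and attaches its metadata, so the LLM can cite "[Source N]" inline. The exact template below is illustrative, not the project's actual prompt layout:

```python
def format_context(chunks: list[dict]) -> str:
    # Number each chunk so the LLM can refer back to it as [Source N];
    # filename and page make the citation verifiable by the reader.
    blocks = []
    for i, c in enumerate(chunks, start=1):
        blocks.append(f'[Source {i}] {c["filename"]}, p.{c["page"]}:\n{c["text"]}')
    return "\n\n".join(blocks)

context = format_context([
    {"filename": "medical_paper.pdf", "page": 7, "text": "Mild adverse events..."},
    {"filename": "medical_paper.pdf", "page": 9, "text": "Severe reactions were rare..."},
])
```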
Hallucination Prevention
Three mechanisms prevent incorrect answers:
- System Prompt: Explicitly forbids making up information; forces the LLM to use only provided sources
- Constrained Context: Only top-3 chunks are shown to the LLM (not the entire knowledge base)
- Transparency: Every answer includes source metadata so users can verify the evidence themselves
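The first two mechanisms can be sketched as a message builder: constraints go in the system role, and only the pre-formatted top chunks reach the user message. The prompt wording here is a hypothetical example, not the project's actual system prompt:

```python
SYSTEM_PROMPT = (
    "Answer strictly from the numbered sources provided. "
    "Cite each claim as [Source N]. "
    "If the sources do not contain the answer, say you don't know; "
    "never invent information."
)

def build_messages(question: str, context: str) -> list[dict]:
    # OpenAI-style chat messages: constraints live in the system role,
    # and only the retrieved sources (not the whole knowledge base)
    # are placed before the question in the user role.
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Sources:\n{context}\n\nQuestion: {question}"},
    ]

msgs = build_messages("What are the side effects?", "[Source 1] medical_paper.pdf, p.7: ...")
```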
Development
Project Structure
narayan/
├── backend/
│ ├── app/
│ │ ├── api/ # API endpoints
│ │ │ ├── documents.py # Upload, list, delete
│ │ │ └── query.py # Question answering
│ │ ├── services/ # Business logic
│ │ │ ├── ingestion.py # PDF -> chunks -> vectors
│ │ │ └── rag.py # RAG pipeline
│ │ ├── core/ # Configuration
│ │ │ ├── config.py # Settings
│ │ │ └── vector_store.py # ChromaDB setup
│ │ ├── models/ # Data schemas
│ │ │ └── schemas.py # Pydantic models
│ │ └── main.py # FastAPI app
│ ├── requirements.txt # Python dependencies
│ └── .env.example # Environment template
├── frontend/
│ ├── src/ # React components
│ ├── public/ # Static assets
│ ├── package.json # Node dependencies
│ └── vite.config.js # Build config
└── README.md
Running Tests
Currently no test suite, but the system is validated through:
- Manual API testing via curl/Postman
- Frontend UI testing in the browser
- Real document ingestion and query workflows
Adding Custom Embedding Models
Edit backend/app/core/vector_store.py:
from langchain_huggingface import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(
model_name="your-model-name", # e.g., "nomic-embed-text"
model_kwargs={"trust_remote_code": True}
)
Switching LLM Providers
The system uses OpenAI-compatible APIs. To switch providers:
- OpenAI:
client = OpenAI(api_key="sk-...", base_url="https://api.openai.com/v1")
- Claude (via OpenRouter):
client = OpenAI(api_key="openrouter_key", base_url="https://openrouter.ai/api/v1")
LLM_MODEL = "anthropic/claude-3.5-sonnet"
- Local Ollama:
client = OpenAI(api_key="any_string", base_url="http://localhost:11434/v1")
LLM_MODEL = "llama2" # or whatever model you pulled
Roadmap
Phase 2 features in development:
- Cross-Encoder Reranking: Better relevance for complex queries
- OCR Support: Scanned PDFs and images with text
- Admin Panel: Bulk operations, document organization, analytics
- Query Analytics: Track what questions are asked and where they fail
- Multi-Format Support: HTML, Markdown, Word docs, spreadsheets
- Domain-Specific Embeddings: Separate models trained on medical, legal, compliance documents
- Document Organization: Tagging, categories, folder-like structure
- Batch Query API: Process multiple questions efficiently
Troubleshooting
Issue: “ModuleNotFoundError: No module named ‘app’”
Make sure you’re in the backend/ directory before running the server:
cd backend
uvicorn app.main:app --reload
Issue: “CORS error when frontend calls backend”
Check that CORS_ORIGINS in .env includes your frontend URL:
CORS_ORIGINS=http://localhost:5173,http://localhost:3000
Issue: “No readable text found in PDF”
The PDF is likely a scanned image without an OCR text layer. The current version requires text-based PDFs; OCR support is planned for Phase 2.
Issue: “LLM takes too long or times out”
If using OpenRouter, check your API quota. If using local Ollama, reduce CHUNK_SIZE or TOP_K in config to speed up processing.
Issue: “Embeddings are slow”
- Use a smaller embedding model (e.g., all-MiniLM-L6-v2 instead of larger variants)
- If you have a GPU, make sure PyTorch is using it: check torch.cuda.is_available()
Performance Notes
- First Ingestion: First PDF takes longer as the embedding model and vector store initialize
- Subsequent Uploads: Typically 1-2 seconds per document depending on size
- Query Latency:
- Retrieval: 100-200ms (vector search in ChromaDB)
- LLM Generation: 1-5 seconds (depends on model and context length)
- Total: ~2-6 seconds per question
Privacy and Data
- All document embeddings are stored locally in ./data/chroma_db
- No documents are sent to external services unless you use an external LLM provider
- Using local Ollama means zero data leaves your machine
Contributing
Found a bug? Have an idea? Feel free to open an issue or submit a pull request.
License
MIT License. See LICENSE file for details.
Questions?
- Read the write-up: https://omwani.pages.dev/writings/on-building-narayan/
- Open an issue on GitHub
- Check the API reference above
Built by Om Wani under Upperture Interactive. Get in touch: omwani03@gmail.com