StuLearn AI

Research Workspace

November 2025

Next.js, TypeScript, Shadcn UI, Node.js, Express, MongoDB, Python, FastAPI, HuggingFace

The Problem

Researchers and students struggle to efficiently extract insights from lengthy PDF documents. Traditional search only finds exact keywords, missing conceptual relationships and making it difficult to synthesize information across multiple documents.

My Role

Full-Stack Developer & ML Engineer

The Solution

Developed an AI-powered research workspace that transforms PDFs into queryable knowledge graphs. The system combines semantic search, automatic summarization, and topic clustering, all running on local HuggingFace models, so it acts as a research assistant while keeping documents on the user's own infrastructure.
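As a rough illustration of the semantic search idea, the sketch below ranks document chunks by cosine similarity to a query vector. It is not the project's actual code: the function names, chunk IDs, and the tiny three-dimensional vectors are hypothetical stand-ins for embeddings that a transformer model would normally produce.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def semantic_search(query_vec, chunk_vecs, top_k=3):
    """Return up to top_k (chunk_id, score) pairs, most similar first."""
    scored = [(cid, cosine_similarity(query_vec, vec))
              for cid, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy vectors; real embeddings have hundreds of dimensions.
chunks = {
    "intro": [0.9, 0.1, 0.0],
    "methods": [0.1, 0.8, 0.3],
    "results": [0.0, 0.2, 0.9],
}
query = [0.2, 0.9, 0.2]
top = semantic_search(query, chunks, top_k=2)
```

Because ranking depends on vector direction rather than shared words, a chunk can match a query even when the two have no keywords in common, which is the behavior keyword search lacks.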

Technology Stack

Next.js

Modern React framework for the frontend interface

TypeScript

Type safety across the frontend and the Node.js backend

Shadcn UI

Accessible and customizable component library

Node.js & Express

Backend API for document processing and queries

MongoDB

Document storage for PDFs and extracted content

Python & FastAPI

ML inference server for NLP operations

HuggingFace Models

Locally run models for summarization and embedding generation

Key Features

•PDF upload and automatic text extraction with structure preservation
•Semantic search across documents using transformer-based embeddings
•Automatic summarization of lengthy documents or specific sections
•Topic clustering to identify themes and relationships
•Knowledge graph visualization showing connections between concepts
•Query interface for natural language questions about uploaded documents
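One simple way to picture the knowledge graph feature is a co-occurrence graph: two concepts are linked whenever they appear in the same chunk of text, and the edge weight counts how often that happens. The sketch below assumes concepts have already been extracted per chunk. Co-occurrence is an illustrative technique chosen here, not necessarily how the project builds its graph, and `build_concept_graph` and the sample concepts are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_concept_graph(chunk_concepts, min_weight=1):
    """Map each concept pair to the number of chunks in which both appear.

    Pairs are stored as alphabetically sorted tuples so that (a, b) and
    (b, a) count as the same undirected edge.
    """
    edges = Counter()
    for concepts in chunk_concepts:
        for a, b in combinations(sorted(set(concepts)), 2):
            edges[(a, b)] += 1
    # Drop weak edges to keep the visualization readable.
    return {pair: w for pair, w in edges.items() if w >= min_weight}


# Hypothetical concepts extracted from three chunks of a document.
chunk_concepts = [
    ["attention", "transformer", "embedding"],
    ["transformer", "embedding"],
    ["clustering", "embedding"],
]
graph = build_concept_graph(chunk_concepts, min_weight=2)
```

Raising `min_weight` trades completeness for clarity: fewer edges survive, but the ones that remain reflect relationships seen repeatedly across the text.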

Impact & Results

Reduced research time by replacing keyword lookup with conceptual search and automatic summarization. Users can now query their entire research library by meaning rather than by exact keyword.

Lessons Learned

•Learned efficient PDF parsing techniques that preserve document structure
•Gained experience running local LLMs for privacy-focused applications
•Understood the challenges of knowledge graph construction from unstructured text
•Mastered the balance between accuracy and performance in NLP pipelines