jamwithai / arxiv-paper-curator
- Sunday, November 9, 2025, 00:00:04
Learn to build modern AI systems from the ground up through hands-on implementation
Master the most in-demand AI engineering skills: RAG (Retrieval-Augmented Generation)
This is a learner-focused project where you'll build a complete research assistant system that automatically fetches academic papers, understands their content, and answers your research questions using advanced RAG techniques.
The arXiv Paper Curator will teach you to build a production-grade RAG system using industry best practices. Unlike tutorials that jump straight to vector search, we follow the professional path: master keyword search foundations first, then enhance with vectors for hybrid retrieval.
The Professional Difference: We build RAG systems the way successful companies do - solid search foundations enhanced with AI, not AI-first approaches that ignore search fundamentals.
By the end of this course, you'll have your own AI research assistant and the deep technical skills to build production RAG systems for any domain.
# 1. Clone and setup
git clone <repository-url>
cd arxiv-paper-curator
# 2. Configure environment (IMPORTANT!)
cp .env.example .env
# The .env file contains all necessary configuration for OpenSearch,
# arXiv API, and service connections. Defaults work out of the box.
# For Week 4: Add JINA_API_KEY=your_key_here for hybrid search
# 3. Install dependencies
uv sync
# 4. Start all services
docker compose up --build -d
# 5. Verify everything works
curl http://localhost:8000/health

| Week | Topic | Blog Post | Code Release |
|---|---|---|---|
| Week 0 | The Mother of AI project - 6 phases | The Mother of AI project | - |
| Week 1 | Infrastructure Foundation | The Infrastructure That Powers RAG Systems | week1.0 |
| Week 2 | Data Ingestion Pipeline | Building Data Ingestion Pipelines for RAG | week2.0 |
| Week 3 | OpenSearch ingestion & BM25 retrieval | The Search Foundation Every RAG System Needs | week3.0 |
| Week 4 | Chunking & Hybrid Search | The Chunking Strategy That Makes Hybrid Search Work | week4.0 |
| Week 5 | Complete RAG system | The Complete RAG System | week5.0 |
| Week 6 | Production monitoring & caching | Production-ready RAG: Monitoring & Caching | week6.0 |
Clone a specific week's release:
# Clone a specific week's code
git clone --branch <WEEK_TAG> https://github.com/jamwithai/arxiv-paper-curator
cd arxiv-paper-curator
uv sync
docker compose down -v
docker compose up --build -d
# Replace <WEEK_TAG> with: week1.0, week2.0, etc.

| Service | URL | Purpose |
|---|---|---|
| API Documentation | http://localhost:8000/docs | Interactive API testing |
| Gradio RAG Interface | http://localhost:7861 | User-friendly chat interface |
| Langfuse Dashboard | http://localhost:3000 | RAG pipeline monitoring & tracing |
| Airflow Dashboard | http://localhost:8080 | Workflow management |
| OpenSearch Dashboards | http://localhost:5601 | Hybrid search engine UI |
Start here! Master the infrastructure that powers modern RAG systems.
Infrastructure Components:
# Launch the Week 1 notebook
uv run jupyter notebook notebooks/week1/week1_setup.ipynb

Complete when you can:
- docker compose up -d
- uv run pytest

Blog Post: The Infrastructure That Powers RAG Systems - Detailed walkthrough and production insights
Building on Week 1 infrastructure: Learn to fetch, process, and store academic papers automatically.
Data Pipeline Components:
# Launch the Week 2 notebook
uv run jupyter notebook notebooks/week2/week2_arxiv_integration.ipynb

arXiv API Integration:
# Example: Fetch papers with rate limiting
from src.services.arxiv.factory import make_arxiv_client

async def fetch_recent_papers():
    client = make_arxiv_client()
    papers = await client.search_papers(
        query="cat:cs.AI",
        max_results=10,
        from_date="20240801",
        to_date="20240807"
    )
    return papers

PDF Processing Pipeline:
# Example: Parse PDF with Docling
from src.services.pdf_parser.factory import make_pdf_parser_service

async def process_paper_pdf(pdf_url: str):
    parser = make_pdf_parser_service()
    parsed_content = await parser.parse_pdf_from_url(pdf_url)
    return parsed_content  # Structured content with text, tables, figures

Complete Ingestion Workflow:
# Example: Full paper ingestion pipeline
from src.services.metadata_fetcher import make_metadata_fetcher

async def ingest_papers():
    fetcher = make_metadata_fetcher()
    results = await fetcher.fetch_and_store_papers(
        query="cat:cs.AI",
        max_results=5,
        from_date="20240807"
    )
    return results  # Papers stored in database with full content

Complete when you can:
- arxiv_paper_ingestion executes successfully
- /papers returns stored papers with metadata

Blog Post: Building Data Ingestion Pipelines for RAG - arXiv API integration and PDF processing
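The ARXIV__RATE_LIMIT_DELAY setting (3.0 seconds by default) implies a pause between consecutive arXiv API calls. As a rough illustration of that idea, and not the project's actual client code, the sketch below enforces a minimum interval between sequential async calls. `RateLimiter`, `fetch_all`, and the `fetch` callable are hypothetical names introduced only for this example.

```python
import asyncio
import time
from typing import Optional


class RateLimiter:
    """Enforce a minimum delay between consecutive calls (hypothetical helper).

    Intended for sequential use from a single coroutine.
    """

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last_call: Optional[float] = None

    async def wait(self) -> None:
        """Sleep just long enough that calls are at least min_interval apart."""
        now = time.monotonic()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                await asyncio.sleep(remaining)
        self._last_call = time.monotonic()


async def fetch_all(queries, fetch, limiter):
    """Call the async `fetch` once per query, pausing between calls."""
    results = []
    for query in queries:
        await limiter.wait()
        results.append(await fetch(query))
    return results
```

With `RateLimiter(3.0)`, the first call goes out immediately and each later call waits until three seconds have passed since the previous one.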
The 90% Problem: Most RAG systems jump straight to vector search and miss the foundation that powers the best retrieval systems. We're doing it right!
Building on Weeks 1-2 foundation: Implement the keyword search foundation that professional RAG systems rely on.
The Reality Check: Vector search alone is not enough. The most effective RAG systems use hybrid retrieval - combining keyword search (BM25) with vector search. Here's why we start with keywords:
Complete Week 3 architecture showing the OpenSearch integration flow
Search Infrastructure: Master full-text search with OpenSearch before adding vector complexity.
- src/services/opensearch/: Professional search service implementation
- src/routers/search.py: Search API endpoints with BM25 scoring
- notebooks/week3/: Complete OpenSearch integration guide

Week 3: Master keyword search (BM25) ← YOU ARE HERE
Week 4: Add intelligent chunking and vector embeddings for hybrid retrieval
Week 5: Add LLM generation to complete the RAG pipeline
Week 6: Add production monitoring and caching
This progression mirrors how successful companies build search systems - solid foundation first, then enhance with advanced techniques.
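To make the keyword-search foundation concrete, here is a minimal pure-Python sketch of BM25 scoring. It is an illustration of the classic formula, not what OpenSearch runs internally: tokenization is a plain whitespace split, `bm25_scores` is a hypothetical helper, the document list is assumed to be non-empty, and `k1=1.2`, `b=0.75` are commonly used defaults assumed here.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str], k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Score each document against the query with the classic BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term appears
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            # Rare terms get a higher inverse document frequency
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates and is normalized by document length
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores
```

A document that matches more query terms, or rarer ones, scores higher, and a document with no matching term scores zero.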
# Launch the Week 3 notebook
uv run jupyter notebook notebooks/week3/week3_opensearch.ipynb

BM25 Search Implementation:
# Example: Search papers with BM25 scoring
from src.services.opensearch.factory import make_opensearch_client

async def search_papers():
    client = make_opensearch_client()
    results = await client.search_papers(
        query="transformer attention mechanism",
        max_results=10,
        categories=["cs.AI", "cs.LG"]
    )
    return results  # Papers ranked by BM25 relevance

Search API Usage:
# Example: Use the search endpoint
import httpx

async def query_papers():
    async with httpx.AsyncClient() as client:
        response = await client.post("http://localhost:8000/api/v1/search", json={
            "query": "neural networks optimization",
            "max_results": 5,
            "latest_papers": True
        })
        return response.json()

Complete when you can:
- /search endpoint returns relevant papers with BM25 scoring

Blog Post: The Search Foundation Every RAG System Needs - Complete BM25 implementation with OpenSearch
The Intelligence Upgrade: Now we enhance our solid BM25 foundation with semantic understanding through intelligent chunking and hybrid retrieval.
Building on Week 3 foundation: Add the semantic layer that makes search truly intelligent.
The Next Level: With solid BM25 search proven, we can now intelligently add semantic capabilities:
Complete Week 4 hybrid search architecture with chunking, embeddings, and RRF fusion
Hybrid Search Infrastructure: Production-grade chunking strategies with unified search supporting BM25, vector, and hybrid modes.
- src/services/indexing/text_chunker.py: Section-aware chunking with overlap strategies
- src/services/embeddings/: Production embedding pipeline with Jina AI
- src/routers/hybrid_search.py: Unified search API supporting all modes
- notebooks/week4/: Complete hybrid search implementation guide

# Launch the Week 4 notebook
uv run jupyter notebook notebooks/week4/week4_hybrid_search.ipynb

Section-Based Chunking:
# Example: Intelligent document chunking
from src.services.indexing.text_chunker import TextChunker

chunker = TextChunker(chunk_size=600, overlap_size=100)
chunks = chunker.chunk_paper(
    title="Attention Mechanisms in Neural Networks",
    abstract="Recent advances in attention...",
    full_text=paper_content,
    sections=parsed_sections  # From Docling PDF parsing
)
# Result: Coherent chunks respecting document structure

Hybrid Search Implementation:
# Example: Unified search supporting multiple modes
import httpx

async def search_papers(query: str, use_hybrid: bool = True):
    async with httpx.AsyncClient() as client:
        response = await client.post("http://localhost:8000/api/v1/hybrid-search/", json={
            "query": query,
            "use_hybrid": use_hybrid,  # Auto-generates embeddings
            "size": 10,
            "categories": ["cs.AI"]
        })
        return response.json()

# Top-level await works in a notebook; in a script, wrap these calls in asyncio.run()
# BM25 only: Fast keyword matching (~50ms)
bm25_results = await search_papers("transformer attention", use_hybrid=False)

# Hybrid search: Semantic + keyword understanding (~400ms)
hybrid_results = await search_papers("how to make models more efficient", use_hybrid=True)

Complete when you can:
- /hybrid-search endpoint handling all search types

| Search Mode | Speed | Precision@10 | Recall@10 | Use Case |
|---|---|---|---|---|
| BM25 Only | ~50ms | 0.67 | 0.71 | Exact keywords, author names |
| Hybrid (RRF) | ~400ms | 0.84 | 0.89 | Conceptual queries, synonyms |
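The hybrid mode in the table above is labeled RRF (reciprocal rank fusion). The sketch below shows the general technique for merging a BM25 ranking with a vector ranking. It is not the project's implementation: `reciprocal_rank_fusion` is a hypothetical helper, and `k=60` is a commonly used smoothing constant assumed here.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of document IDs into one fused ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    where rank starts at 1 for the top result.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Usage: fuse a keyword ranking with a semantic ranking
bm25_ranking = ["a", "b", "c"]
vector_ranking = ["c", "a", "d"]
fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
```

Because RRF uses only rank positions, it needs no calibration between BM25 scores and vector similarity scores, which live on different scales.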
Blog Post: The Chunking Strategy That Makes Hybrid Search Work - Production chunking and RRF fusion implementation
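The chunking defaults used in this project are 600 words per chunk with a 100-word overlap. The sketch below shows only the sliding-window part of that idea. It is a simplified stand-in, not the project's `TextChunker`: it ignores section boundaries and the minimum chunk size, and `chunk_words` is a hypothetical name.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 600, overlap_size: int = 100) -> List[str]:
    """Split text into word-based chunks that share overlap_size words."""
    if not 0 <= overlap_size < chunk_size:
        raise ValueError("overlap_size must be non-negative and smaller than chunk_size")

    words = text.split()
    step = chunk_size - overlap_size  # How far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of indexing some words twice.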
The RAG Completion: Transform search results into intelligent answers with local LLM integration and streaming responses.
Building on Week 4 hybrid search: Add the LLM layer that turns search into intelligent conversation.
The Production Advantage: Complete the RAG pipeline with privacy-first, optimized generation:
Complete RAG system with LLM generation layer (Ollama), hybrid retrieval pipeline, and Gradio interface
Complete RAG Infrastructure: Local LLM generation with optimized prompting, dual API endpoints, and interactive web interface.
- src/routers/ask.py: Dual RAG endpoints (/api/v1/ask + /api/v1/stream)
- src/services/ollama/: LLM client with optimized prompts and 300-word response limits
- src/services/ollama/prompts/rag_system.txt: Optimized system prompt for academic papers
- src/gradio_app.py: Interactive web interface with real-time streaming support
- gradio_launcher.py: Easy-launch script for the web UI (runs on port 7861)

# Launch the Week 5 notebook
uv run jupyter notebook notebooks/week5/week5_complete_rag_system.ipynb
# Launch Gradio interface
uv run python gradio_launcher.py
# Open http://localhost:7861

Complete RAG Query:
# Example: Standard RAG endpoint
import httpx

async def ask_question(query: str):
    # Generation can take 15-20 seconds, longer than httpx's 5-second default timeout
    async with httpx.AsyncClient(timeout=60.0) as client:
        response = await client.post("http://localhost:8000/api/v1/ask", json={
            "query": query,
            "top_k": 3,
            "use_hybrid": True,
            "model": "llama3.2:1b"
        })
        result = response.json()
        return result["answer"], result["sources"]

# Ask a question
answer, sources = await ask_question("What are transformers in machine learning?")

Streaming RAG Implementation:
# Example: Real-time streaming responses
import httpx
import json

async def stream_rag_response(query: str):
    async with httpx.AsyncClient() as client:
        async with client.stream("POST", "http://localhost:8000/api/v1/stream", json={
            "query": query,
            "top_k": 3,
            "use_hybrid": True
        }) as response:
            async for line in response.aiter_lines():
                if line.startswith('data: '):
                    data = json.loads(line[6:])
                    if 'chunk' in data:
                        print(data['chunk'], end='', flush=True)
                    elif data.get('done'):
                        break

# Stream an answer in real-time
await stream_rag_response("Explain attention mechanisms")

Standard RAG Endpoint: /api/v1/ask
Streaming RAG Endpoint: /api/v1/stream
Request Format (both endpoints):
{
  "query": "Your question here",
  "top_k": 3,               // Number of chunks (1-10)
  "use_hybrid": true,       // Hybrid vs BM25 search
  "model": "llama3.2:1b",   // LLM model to use
  "categories": ["cs.AI"]   // Optional category filter
}

Complete when you can:
- /api/v1/ask
- /api/v1/stream

| Metric | Before | After (Week 5) | Improvement |
|---|---|---|---|
| Response Time | 120+ seconds | 15-20 seconds | 6x faster |
| Time to First Token | N/A | 2-3 seconds | Streaming enabled |
| Prompt Efficiency | ~10KB | ~2KB | 80% reduction |
| User Experience | API only | Web interface + streaming | Production ready |
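The prompt-size reduction in the table suggests sending the LLM only a trimmed version of each retrieved chunk plus an explicit length limit. The sketch below is one possible way to do that and is not taken from the project's prompt code. `build_rag_prompt`, the `title` and `text` keys, the prompt wording, and the 150-word per-chunk cap are all assumptions made for illustration; only the 300-word answer limit comes from this README.

```python
from typing import Dict, List


def build_rag_prompt(
    question: str,
    chunks: List[Dict[str, str]],
    max_chunk_words: int = 150,
    max_answer_words: int = 300,
) -> str:
    """Assemble a compact prompt from retrieved chunks (hypothetical layout)."""
    context_lines = []
    for index, chunk in enumerate(chunks, start=1):
        # Keep only the first max_chunk_words words of each chunk
        snippet = " ".join(chunk["text"].split()[:max_chunk_words])
        context_lines.append(f"[{index}] {chunk['title']}: {snippet}")
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the context below. "
        f"Limit the answer to {max_answer_words} words.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Numbering the chunks lets the model cite sources as [1], [2], and so on, and capping each chunk keeps the prompt size roughly proportional to `top_k`.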
Key Optimizations:
| Issue | Solution |
|---|---|
| 404 on /stream endpoint | Rebuild API: docker compose build api && docker compose restart api |
| Slow response times | Use smaller model (llama3.2:1b) or reduce top_k parameter |
| Gradio not accessible | Port changed to 7861: http://localhost:7861 |
| Ollama connection errors | Check service: docker exec rag-ollama ollama list |
| No streaming response | Verify SSE format, check browser network tab |
| Out of memory errors | Increase Docker memory limit to 8GB+ |
Quick Health Check:
# Check all services
curl http://localhost:8000/api/v1/health | jq
# Test RAG endpoint
curl -X POST http://localhost:8000/api/v1/ask \
-H "Content-Type: application/json" \
-d '{"query": "test", "top_k": 1}'
# Test streaming endpoint
curl -X POST http://localhost:8000/api/v1/stream \
-H "Content-Type: application/json" \
-d '{"query": "test", "top_k": 1}' --no-buffer

Blog Post: The Complete RAG System - Complete RAG system with local LLM integration and optimization techniques
Production Excellence: Transform your RAG system from functional to production-ready with comprehensive monitoring and intelligent caching.
Building on Week 5 complete RAG system: Add observability, performance optimization, and production-grade monitoring.
The Production Reality: A working RAG system isn't enough - you need visibility and optimization:
Production RAG system with Langfuse tracing and Redis caching layers
Production Infrastructure: Complete observability layer with Langfuse tracking every RAG operation, plus Redis caching for instant response delivery.
- src/services/langfuse/: Complete tracing integration with RAG-specific metrics
- src/services/cache/: Redis client with exact-match caching and graceful fallback
- src/routers/ask.py: Updated with integrated tracing and caching middleware
- docker-compose.yml: Added Redis service and Langfuse local instance
- notebooks/week6/: Complete monitoring and caching implementation guide

# Launch the Week 6 notebook
uv run jupyter notebook notebooks/week6/week6_cache_testing.ipynb

Langfuse Tracing Integration:
# Example: Automatic RAG tracing (already integrated)
# Every request to /api/v1/ask automatically generates:
# - Request-level traces for complete query journey
# - Embedding spans timing query embedding generation
# - Search spans tracking retrieval performance
# - Generation spans monitoring LLM response creation
# Simply configure environment variables and tracing happens automatically
LANGFUSE__PUBLIC_KEY=pk-lf-your-key
LANGFUSE__SECRET_KEY=sk-lf-your-key
LANGFUSE__HOST=http://localhost:3000

Redis Caching Performance:
# Example: Cache performance testing
import httpx
import time

async def test_cache_performance():
    payload = {"query": "What are transformers in machine learning?", "top_k": 3}
    # A cache miss can take 15-20s, longer than httpx's 5-second default timeout
    async with httpx.AsyncClient(timeout=60.0) as client:
        # First request (cache miss ~15-20s)
        start = time.time()
        response = await client.post("http://localhost:8000/api/v1/ask", json=payload)
        first_time = time.time() - start

        # Second identical request (cache hit ~50ms)
        start = time.time()
        response = await client.post("http://localhost:8000/api/v1/ask", json=payload)
        second_time = time.time() - start

    print(f"First request: {first_time:.2f}s")
    print(f"Second request: {second_time:.2f}s")
    print(f"Speedup: {first_time/second_time:.0f}x faster")

Complete when you can:

| Metric | Before | After (Week 6) | Improvement |
|---|---|---|---|
| Average Response Time | 15-20s | 3-5s (mixed workload) | 3-4x faster |
| Cache Hit Responses | N/A | 50-100ms | 150-400x faster |
| LLM Token Usage | 100% | 40% (60% cached) | 60% reduction |
| Daily Cost | $12 | $4.50 | 63% savings |
| System Observability | None | Complete tracing | Full visibility |
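Exact-match caching means a cached answer is reused only when the whole request is identical. The sketch below illustrates that with a hashed cache key and a time-to-live check. It is a simplified in-memory stand-in for Redis, not the project's cache client: `make_cache_key`, `InMemoryTTLCache`, and the `rag:` key prefix are hypothetical, and the request fields mirror the /api/v1/ask request format.

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, Optional, Sequence, Tuple


def make_cache_key(
    query: str,
    top_k: int = 3,
    use_hybrid: bool = True,
    model: str = "llama3.2:1b",
    categories: Optional[Sequence[str]] = None,
) -> str:
    """Derive a deterministic key from every field that affects the answer."""
    payload = {
        "query": query,
        "top_k": top_k,
        "use_hybrid": use_hybrid,
        "model": model,
        "categories": sorted(categories or []),
    }
    raw = json.dumps(payload, sort_keys=True)
    return "rag:" + hashlib.sha256(raw.encode("utf-8")).hexdigest()


class InMemoryTTLCache:
    """Tiny TTL cache used here in place of Redis (illustration only)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Entry is past its time-to-live, so drop it and report a miss
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (self._clock() + self._ttl, value)
```

Because the key covers the model and retrieval parameters as well as the query text, changing `top_k` or the model produces a cache miss rather than a stale answer.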
Cache Hit Rate Analysis:
Environment Variables:
# Langfuse Configuration
LANGFUSE__PUBLIC_KEY=pk-lf-your-public-key
LANGFUSE__SECRET_KEY=sk-lf-your-secret-key
LANGFUSE__HOST=http://localhost:3000
LANGFUSE__ENABLED=true
# Redis Configuration
REDIS__URL=redis://redis:6379/0
REDIS__CACHE_TTL_HOURS=24
REDIS__MAX_CONNECTIONS=10

Docker Services:
# Start all services including Redis and Langfuse
docker compose up --build -d
# Verify Redis connectivity
docker exec rag-redis redis-cli ping
# Should return: PONG
# Check cache statistics
curl "http://localhost:8000/api/v1/health" | jq

| Issue | Solution |
|---|---|
| No Langfuse traces | Verify environment variables and restart API container |
| Cache not working | Check Redis: docker exec rag-redis redis-cli ping |
| Slow responses | Monitor cache hit rate, check system resources |
| Langfuse connection errors | Ensure Langfuse service is running on port 3000 |
| High memory usage | Monitor Redis memory usage, adjust TTL settings |
Quick Health Check:
# Verify all services including monitoring
curl http://localhost:8000/api/v1/health | jq
# Test caching performance
time curl -X POST "http://localhost:8000/api/v1/ask" \
-H "Content-Type: application/json" \
-d '{"query": "test", "top_k": 1}'
# Access monitoring dashboards
# Langfuse: http://localhost:3000
# Gradio: http://localhost:7861

Blog Post: [Link coming soon] - Production-ready RAG with monitoring and caching
The project uses a unified .env file with nested configuration structure to manage settings across all services.
# Application Settings
DEBUG=true
ENVIRONMENT=development
# arXiv API (Week 2)
ARXIV__MAX_RESULTS=15
ARXIV__SEARCH_CATEGORY=cs.AI
ARXIV__RATE_LIMIT_DELAY=3.0
# PDF Parser (Week 2)
PDF_PARSER__MAX_PAGES=30
PDF_PARSER__DO_OCR=false
# OpenSearch (Week 3)
OPENSEARCH__HOST=http://opensearch:9200
OPENSEARCH__INDEX_NAME=arxiv-papers
# Jina AI Embeddings (Week 4)
JINA_API_KEY=your_jina_api_key_here
EMBEDDINGS__MODEL=jina-embeddings-v3
EMBEDDINGS__TASK=retrieval.passage
EMBEDDINGS__DIMENSIONS=1024
# Chunking Configuration (Week 4)
CHUNKING__CHUNK_SIZE=600
CHUNKING__OVERLAP_SIZE=100
CHUNKING__MIN_CHUNK_SIZE=100
# Ollama LLM (Week 5)
OLLAMA_HOST=http://ollama:11434
OLLAMA__DEFAULT_MODEL=llama3.2:1b
OLLAMA__TIMEOUT=120
OLLAMA__MAX_RESPONSE_WORDS=300
# Langfuse Monitoring (Week 6)
LANGFUSE__PUBLIC_KEY=pk-lf-your-public-key
LANGFUSE__SECRET_KEY=sk-lf-your-secret-key
LANGFUSE__HOST=http://localhost:3000
LANGFUSE__ENABLED=true
LANGFUSE__FLUSH_INTERVAL=1.0
# Redis Caching (Week 6)
REDIS__URL=redis://redis:6379/0
REDIS__CACHE_TTL_HOURS=24
REDIS__MAX_CONNECTIONS=10
# Services
OLLAMA_HOST=http://ollama:11434
OLLAMA_MODEL=llama3.2:1b

| Variable | Default | Description |
|---|---|---|
| DEBUG | true | Debug mode for development |
| ARXIV__MAX_RESULTS | 15 | Papers to fetch per API call |
| ARXIV__SEARCH_CATEGORY | cs.AI | arXiv category to search |
| PDF_PARSER__MAX_PAGES | 30 | Max pages to process per PDF |
| OPENSEARCH__INDEX_NAME | arxiv-papers | OpenSearch index name |
| OPENSEARCH__HOST | http://opensearch:9200 | OpenSearch cluster endpoint |
| JINA_API_KEY | Required for Week 4 | Jina AI API key for embeddings |
| CHUNKING__CHUNK_SIZE | 600 | Target words per document chunk |
| CHUNKING__OVERLAP_SIZE | 100 | Overlapping words between chunks |
| EMBEDDINGS__MODEL | jina-embeddings-v3 | Jina embeddings model |
| OLLAMA_MODEL | llama3.2:1b | Local LLM model |
| LANGFUSE__PUBLIC_KEY | Required for Week 6 | Langfuse public API key |
| LANGFUSE__SECRET_KEY | Required for Week 6 | Langfuse secret API key |
| REDIS__CACHE_TTL_HOURS | 24 | Cache expiration time in hours |
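The double underscore in names such as ARXIV__MAX_RESULTS marks a nesting boundary: the part before it names a settings group and the part after it names a field. The sketch below shows how such names could be folded into a nested dictionary using only the standard library. It is an illustration of the naming convention, not the loader in src/config.py, and `parse_nested_env` is a hypothetical helper.

```python
from typing import Any, Dict, Mapping


def parse_nested_env(environ: Mapping[str, str], delimiter: str = "__") -> Dict[str, Any]:
    """Fold delimiter-separated variable names into a nested dictionary."""
    settings: Dict[str, Any] = {}
    for name, value in environ.items():
        parts = name.lower().split(delimiter)
        node = settings
        # Walk (or create) one nested dictionary per group name
        for part in parts[:-1]:
            child = node.get(part)
            if not isinstance(child, dict):
                child = {}
                node[part] = child
            node = child
        node[parts[-1]] = value
    return settings


# Usage with a few of the variables listed above
example = parse_nested_env({
    "DEBUG": "true",
    "ARXIV__MAX_RESULTS": "15",
    "ARXIV__SEARCH_CATEGORY": "cs.AI",
    "OLLAMA_HOST": "http://ollama:11434",
})
```

Values stay as strings in this sketch, and names with a single underscore, such as OLLAMA_HOST, remain top-level keys.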
The configuration system automatically detects the service context:
- localhost for database and service connections
- service hostnames (postgres, opensearch)

# Configuration is automatically loaded based on context
from src.config import get_settings

settings = get_settings()  # Auto-detects API vs Airflow
print(f"ArXiv max results: {settings.arxiv.max_results}")

| Service | Purpose | Status |
|---|---|---|
| FastAPI | REST API with automatic docs | Ready |
| PostgreSQL 16 | Paper metadata and content storage | Ready |
| OpenSearch 2.19 | Hybrid search engine (BM25 + Vector) | Ready |
| Apache Airflow 3.0 | Workflow automation | Ready |
| Jina AI | Embedding generation (Week 4) | Ready |
| Ollama | Local LLM serving (Week 5) | Ready |
| Redis | High-performance caching (Week 6) | Ready |
| Langfuse | RAG pipeline observability (Week 6) | Ready |
Development Tools: UV, Ruff, MyPy, Pytest, Docker Compose
arxiv-paper-curator/
├── src/  # Main application code
│   ├── main.py  # FastAPI application
│   ├── routers/  # API endpoints
│   │   ├── ping.py  # Health check endpoints
│   │   ├── papers.py  # Paper retrieval endpoints
│   │   ├── hybrid_search.py  # Week 4: Hybrid search endpoints
│   │   └── ask.py  # Week 5: RAG question answering endpoints
│   ├── models/  # Database models (SQLAlchemy)
│   ├── repositories/  # Data access layer
│   ├── schemas/  # Pydantic validation schemas
│   │   ├── api/  # API request/response schemas
│   │   │   ├── health.py  # Health check schemas
│   │   │   ├── search.py  # Search request/response schemas
│   │   │   └── ask.py  # Week 5: RAG request/response schemas
│   │   ├── arxiv/  # arXiv data schemas
│   │   ├── pdf_parser/  # PDF parsing schemas
│   │   ├── database/  # Database configuration schemas
│   │   ├── indexing/  # Week 4: Chunking schemas
│   │   ├── embeddings/  # Week 4: Embedding schemas
│   │   ├── cache/  # Week 6: Caching schemas
│   │   └── langfuse/  # Week 6: Monitoring schemas
│   ├── services/  # Business logic
│   │   ├── arxiv/  # arXiv API client
│   │   ├── pdf_parser/  # Docling PDF processing
│   │   ├── opensearch/  # OpenSearch integration
│   │   │   ├── client.py  # Unified search client (BM25 + Vector + Hybrid)
│   │   │   ├── factory.py  # Client factory pattern
│   │   │   ├── index_config_hybrid.py  # Week 4: Hybrid index configuration
│   │   │   └── query_builder.py  # BM25 query construction
│   │   ├── indexing/  # Week 4: Document processing
│   │   │   ├── text_chunker.py  # Section-based chunking strategy
│   │   │   ├── hybrid_indexer.py  # Document indexing with embeddings
│   │   │   └── factory.py  # Indexing service factory
│   │   ├── embeddings/  # Week 4: Embedding services
│   │   │   ├── jina_client.py  # Jina AI embedding service
│   │   │   └── factory.py  # Embedding service factory
│   │   ├── ollama/  # Week 5: LLM services
│   │   │   ├── client.py  # Ollama LLM client
│   │   │   ├── factory.py  # LLM service factory
│   │   │   └── prompts/  # Optimized RAG prompts
│   │   ├── langfuse/  # Week 6: Monitoring services
│   │   │   ├── client.py  # Langfuse tracing client
│   │   │   ├── tracer.py  # RAG-specific tracing utilities
│   │   │   └── factory.py  # Monitoring service factory
│   │   ├── cache/  # Week 6: Caching services
│   │   │   ├── client.py  # Redis cache implementation
│   │   │   └── factory.py  # Cache service factory
│   │   └── metadata_fetcher.py  # Complete ingestion pipeline
│   ├── db/  # Database configuration
│   ├── config.py  # Environment configuration
│   └── dependencies.py  # Dependency injection
│
├── notebooks/  # Learning materials
│   ├── week1/  # Week 1: Infrastructure setup
│   │   └── week1_setup.ipynb  # Complete setup guide
│   ├── week2/  # Week 2: Data ingestion
│   │   └── week2_arxiv_integration.ipynb  # Data pipeline guide
│   ├── week3/  # Week 3: Keyword search
│   │   └── week3_opensearch.ipynb  # OpenSearch & BM25 guide
│   ├── week4/  # Week 4: Chunking & hybrid search
│   │   ├── week4_hybrid_search.ipynb  # Complete hybrid search guide
│   │   └── README.md  # Week 4 implementation documentation
│   ├── week5/  # Week 5: Complete RAG system
│   │   ├── week5_complete_rag_system.ipynb  # Complete RAG implementation guide
│   │   └── README.md  # Week 5 implementation documentation
│   └── week6/  # Week 6: Production monitoring & caching
│       ├── week6_cache_testing.ipynb  # Monitoring and caching guide
│       └── README.md  # Week 6 implementation documentation
│
├── airflow/  # Workflow orchestration
│   ├── dags/  # Workflow definitions
│   │   ├── arxiv_ingestion/  # arXiv ingestion modules
│   │   └── arxiv_paper_ingestion.py  # Main ingestion DAG
│   └── requirements-airflow.txt  # Airflow dependencies
│
├── gradio_app.py  # Week 5: Interactive web interface
├── gradio_launcher.py  # Week 5: Easy-launch script for Gradio UI
├── tests/  # Comprehensive test suite
├── static/  # Assets (images, GIFs)
└── compose.yml  # Service orchestration

| Endpoint | Method | Description | Week |
|---|---|---|---|
| /health | GET | Service health check | Week 1 |
| /api/v1/papers | GET | List stored papers | Week 2 |
| /api/v1/papers/{id} | GET | Get specific paper | Week 2 |
| /api/v1/search | POST | BM25 keyword search | Week 3 |
| /api/v1/hybrid-search/ | POST | Hybrid search (BM25 + Vector) | Week 4 |

API Documentation: Visit http://localhost:8000/docs for interactive API explorer
# View all available commands
make help
# Quick workflow
make start # Start all services
make health # Check all services health
make test # Run tests
make stop # Stop services

| Command | Description |
|---|---|
| make start | Start all services |
| make stop | Stop all services |
| make restart | Restart all services |
| make status | Show service status |
| make logs | Show service logs |
| make health | Check all services health |
| make setup | Install Python dependencies |
| make format | Format code |
| make lint | Lint and type check |
| make test | Run tests |
| make test-cov | Run tests with coverage |
| make clean | Clean up everything |

# If you prefer using commands directly
docker compose up --build -d # Start services
docker compose ps # Check status
docker compose logs # View logs
uv run pytest # Run tests

| Who | Why |
|---|---|
| AI/ML Engineers | Learn production RAG architecture beyond tutorials |
| Software Engineers | Build end-to-end AI applications with best practices |
| Data Scientists | Implement production AI systems using modern tools |
Common Issues:
- docker compose logs

Get Help:
- docker compose logs [service-name]
- docker compose down --volumes && docker compose up --build -d

This course is completely free! Only the optional services carry minimal costs:
Begin with the Week 1 setup notebook and build your first production RAG system!
For learners who want to master modern AI engineering
Built with love by Jam With AI
MIT License - see LICENSE file for details.