Welcome!
Start today with continuous-eval and make your LLM development a science, not an art!
🚀 Getting Started Install the package and learn how to get started quickly.
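To follow the Getting Started guide, a typical first step is installing from PyPI — a minimal sketch, assuming the package is published under the name `continuous-eval`:

```shell
# Install continuous-eval from PyPI (assumed package name)
pip install continuous-eval
```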
Navigate this Documentation
🚰 Pipeline Define your GenAI application pipeline and run evaluation over a tailored dataset.
📊 Metrics Explore the available metrics and learn how to combine multiple metrics effectively.
🔍 Datasets Explore sample datasets and try generating a synthetic evaluation dataset from documents.
💡 Examples Discover code snippets and examples to help you understand and implement different evaluation pipelines.
Other Resources
- Blog Posts:
  - Practical Guide to RAG Pipeline Evaluation: Part 1: Retrieval
  - Practical Guide to RAG Pipeline Evaluation: Part 2: Generation
  - How important is a Golden Dataset for LLM evaluation?
  - How to evaluate complex GenAI Apps: a granular approach
  - How to make the most out of LLM production data: simulated user feedback
  - Generate synthetic data to test LLM applications
- Discord: Join our community of LLM developers
- Reach out to founders: Email or Schedule a chat