Benchmark Report · v2 (industrial)
An industrial-grade head-to-head: 22 real-world documents (financial filings, scientific papers, government reports, whitepapers, RFCs, spreadsheets, images), 80 questions across 11 categories and 7 question types, scored by an independent LLM judge with a fixed rubric.
Section 01
All metrics are computed on the same 80-question eval set, with the same 22 documents loaded into both systems.
| Metric | AIveilix | AnythingLLM | Winner |
|---|---|---|---|
Section 02
Section 03
Average correctness sliced by document category, question type, and file format.
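As a rough illustration of what "sliced by" means here, the sketch below groups per-question scores by one field and averages each group. The record layout, field names (`category`, `question_type`, `file_format`, `score`), and the sample rows are assumptions made up for this example; they are not the report's data or its actual pipeline.

```python
from collections import defaultdict
from statistics import mean


def average_by(records, key):
    """Group records by `key` and return the mean score for each group."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec["score"])
    return {name: mean(scores) for name, scores in groups.items()}


# Made-up rows for illustration only; NOT results from this report.
records = [
    {"category": "finance", "question_type": "lookup", "file_format": "pdf", "score": 1.0},
    {"category": "finance", "question_type": "numeric", "file_format": "xlsx", "score": 0.5},
    {"category": "science", "question_type": "lookup", "file_format": "pdf", "score": 0.0},
]

by_category = average_by(records, "category")
by_question_type = average_by(records, "question_type")
by_file_format = average_by(records, "file_format")
```

The same helper serves all three slices, so every breakdown is computed from one shared set of per-question scores rather than from separately maintained tallies.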
Section 04
An independent LLM judge (temperature=0, to keep scoring as repeatable as possible) scored each answer with a fixed rubric prompt. The same judge and rubric were used for both systems, so any systematic judge bias applies equally to both sides.
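A minimal sketch of this kind of judging loop follows, assuming the judge is any callable that takes a prompt string and returns a numeric score. The rubric wording, the 0/1 scale, the function names, and `stub_judge` (a string-matching stand-in for a real LLM call) are all hypothetical; the report does not publish its actual rubric or judge client.

```python
from typing import Callable, Dict

# Hypothetical rubric template; the report's real rubric is not shown.
RUBRIC_TEMPLATE = (
    "Grade the ANSWER against the REFERENCE. Reply 1 if it matches, else 0.\n\n"
    "QUESTION:\n{question}\n\n"
    "REFERENCE:\n{reference}\n\n"
    "ANSWER:\n{answer}"
)


def build_prompt(question: str, reference: str, answer: str) -> str:
    """Fill the fixed rubric so every answer is graded with identical wording."""
    return RUBRIC_TEMPLATE.format(question=question, reference=reference, answer=answer)


def score_both(
    question: str,
    reference: str,
    answers: Dict[str, str],
    judge: Callable[[str], float],
) -> Dict[str, float]:
    """Score each system's answer with the same judge and the same rubric."""
    return {
        system: judge(build_prompt(question, reference, answer))
        for system, answer in answers.items()
    }


def stub_judge(prompt: str) -> float:
    """Stand-in for an LLM call: 1.0 if the reference text appears in the answer."""
    reference = prompt.split("REFERENCE:\n", 1)[1].split("\n\nANSWER:", 1)[0]
    answer = prompt.split("ANSWER:\n", 1)[1]
    return 1.0 if reference.strip().lower() in answer.lower() else 0.0


scores = score_both(
    question="What port does HTTPS use by default?",
    reference="443",
    answers={
        "system_a": "HTTPS defaults to port 443.",
        "system_b": "It uses port 80.",
    },
    judge=stub_judge,
)
```

Passing the judge in as a parameter keeps the rubric and the scoring loop identical for both systems; swapping `stub_judge` for a real model call (with temperature set to 0) would not change the rest of the code.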
Section 05
Section 06