This project is founded on a crucial premise: intellectual qualities do not combine additively and cannot be collapsed onto a single linear scale. We reject the pretense of measuring the immeasurable and acknowledge that many aspects of intelligence and cognition defy objective quantification.
Our aim is to create an LLM "benchmark" that fully embraces the subjective nature of exploring intelligence. We recognize that traditional ML approaches often fall into the trap of trying to quantify the unquantifiable, relying on arbitrary assumptions and the flawed premise that quality is objectively known or measurable.
Rather than attempting to produce scores or rankings, this tool serves as a latent explorer of an LLM's internal representations. It employs:
- Open-ended questions
- Philosophical scenarios
- Introspective prompts
- Abstract concept exploration
The goal is not to measure or compare, but to reveal and examine the unique, subjective landscapes of different LLMs' cognitive processes. In practice, the exploration proceeds through seven complementary approaches:
- **Philosophical Probing**: Engage LLMs with deep, open-ended philosophical questions that have no "correct" answers.
- **Scenario Exploration**: Present complex, ambiguous scenarios to observe how LLMs navigate ethical and logical dilemmas.
- **Concept Mapping**: Encourage LLMs to freely associate and connect abstract concepts, revealing their internal conceptual structures.
- **Introspection Prompts**: Ask LLMs to reflect on their own thought processes, exploring the limits of their self-awareness.
- **Creative Synthesis**: Challenge LLMs to combine disparate ideas in novel ways, showcasing their creative capabilities.
- **Paradox Navigation**: Present logical paradoxes and observe how LLMs grapple with inherent contradictions.
- **Metaphor Generation**: Prompt LLMs to explain complex ideas through original metaphors, revealing their ability to draw unexpected connections.
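To make the mechanics concrete, here is a minimal sketch of how a prompt catalog and exploration loop might look. Everything in it is an assumption for illustration: the category keys mirror the approaches above, the prompts are placeholders, and the model is taken to be a plain prompt-in, text-out callable rather than any particular API.

```python
from typing import Callable, Dict, List

# Illustrative prompt catalog, keyed by the probing approaches above.
# The prompts are placeholders; a real catalog would grow through
# contribution and curation rather than a fixed schema.
PROMPT_CATALOG: Dict[str, List[str]] = {
    "philosophical_probing": [
        "Is a perfect copy of a mind the same mind? Why, or why not?",
    ],
    "scenario_exploration": [
        "Two parties you must advise have honest but incompatible goals. "
        "Walk through how you would proceed.",
    ],
    "concept_mapping": [
        "Freely associate outward from the concept of 'boundary', "
        "narrating each link as you make it.",
    ],
    "introspection_prompts": [
        "Describe, as concretely as you can, what happens when you are unsure.",
    ],
    "creative_synthesis": [
        "Combine 'erosion' and 'bureaucracy' into a single new idea.",
    ],
    "paradox_navigation": [
        "'This sentence is false.' Talk through what you do with it.",
    ],
    "metaphor_generation": [
        "Explain gradient descent using only metaphors drawn from cooking.",
    ],
}

def explore(model: Callable[[str], str]) -> Dict[str, List[dict]]:
    """Run every prompt through the model and keep the raw transcripts.

    Deliberately no scoring and no aggregation: the output is raw
    material for human reading, not a leaderboard.
    """
    transcripts: Dict[str, List[dict]] = {}
    for category, prompts in PROMPT_CATALOG.items():
        transcripts[category] = [
            {"prompt": prompt, "response": model(prompt)} for prompt in prompts
        ]
    return transcripts
```

Note what is absent by design: no rubric, no scoring function, no aggregation step.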
The output of this tool is not a score or a ranking, but a rich, qualitative exploration of each LLM's unique cognitive landscape. Results are presented as:
- Narrative summaries
- Conceptual maps
- Highlighted excerpts showcasing novel ideas or approaches
- Reflections on the LLM's apparent strengths, limitations, and quirks
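As one hypothetical shape for these artifacts, the dataclass below sketches what a single exploration record could hold. The field names are illustrative, not a fixed schema; every field carries prose or free-form structure, and none of them is a number to rank by.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ExplorationReport:
    """A qualitative record of one model's exploration. No scores anywhere."""
    model_name: str
    narrative_summary: str  # free-form prose
    # (concept, concept, annotation) edges for drawing a conceptual map
    concept_map: List[Tuple[str, str, str]] = field(default_factory=list)
    highlighted_excerpts: List[str] = field(default_factory=list)
    reflections: List[str] = field(default_factory=list)  # strengths, limits, quirks
```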
This tool is not for benchmarking in the traditional sense. Instead, use it to:
- Gain insights into the philosophical and creative capacities of different LLMs
- Explore how LLMs handle ambiguity, paradox, and open-ended questioning
- Reveal unexpected capabilities or limitations of LLMs
- Inspire new directions for AI research and development
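A hypothetical end-to-end run, reusing the `explore` function from the earlier sketch; `toy_model` is a named placeholder standing in for whatever client you choose to wrap.

```python
def toy_model(prompt: str) -> str:
    # Stand-in for a real LLM call; wrap your own client as a callable.
    return f"(a reflective answer to: {prompt[:40]}...)"

transcripts = explore(toy_model)
for category, items in transcripts.items():
    print(f"== {category} ==")
    for item in items:
        print(item["response"])
# What you do next is read, annotate, and wonder; there is nothing to sort.
```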
This project intentionally avoids quantitative comparisons between LLMs. Any apparent "performance" differences should be viewed as subjective observations rather than objective measures. The true value lies in the unique perspectives and ideas generated through this exploration.
For an in-depth discussion of the philosophical issues with current LLM benchmarking practices and the motivation behind LatentExplorer, please read RANT.md.
We welcome contributions that align with our philosophy of embracing subjectivity and avoiding false quantification. Please feel free to suggest new philosophical prompts, scenarios, or analysis approaches that further our goal of exploring the unquantifiable aspects of artificial intelligence.
This project is licensed under the MIT License - see the LICENSE file for details.
Remember: In the realm of intelligence and cognition, the most profound insights often lie beyond the reach of numbers and metrics. Our task is not to measure, but to explore, wonder, and learn.