Nicholas Ventimiglia

Startup-focused AI-Native Architect

Deep Reading for Real Meaning


Deep Reading for Real Meaning is a high-precision knowledge extraction system designed to process large, complex corpora of text, mapping internal structures and verifying definitions against the outside world to produce structured, verified knowledge assets.

The naive approach to processing large amounts of text with Large Language Models (LLMs) is to dump document batches directly into a prompt and ask for a summary.

Instead, I built a system that deep reads documents systematically. I break the documents into overlapping chunks, extract key terms and relationships piece by piece, link them into a cohesive network map, and verify the internal meanings against external definitions. This ensures every extracted fact is anchored directly to source evidence.
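The chunking step can be sketched as follows. This is a minimal illustration that assumes character-based windows; the `Chunk` type, the `chunk_text` name, and the default sizes are hypothetical, not the values the pipeline actually uses. Keeping the start and end offsets on each chunk is what lets a later extraction point back to its exact source span.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    index: int  # position of the chunk in reading order
    start: int  # character offset into the source text (the evidence anchor)
    end: int    # exclusive end offset
    text: str


def chunk_text(text: str, size: int = 1200, overlap: int = 200) -> list[Chunk]:
    """Split text into fixed-size windows that share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # how far each new window advances
    chunks: list[Chunk] = []
    start = 0
    while start < len(text):
        end = min(start + size, len(text))
        chunks.append(Chunk(len(chunks), start, end, text[start:end]))
        if end == len(text):
            break  # the last window reached the end of the text
        start += step
    return chunks
```

The overlap means a definition that straddles a boundary still appears whole in at least one chunk, at the price of some duplicate extractions that the merge step has to reconcile.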


Source-Anchored Knowledge Mapping: Extracting granular chunks into an interconnected, verified network graph.

The hallucination that got lawyers sanctioned

LLMs generate convincing text, but they fabricate. The larger the block of text a model is asked to process, the more likely it is to hallucinate. A confident wrong answer is worse than no answer at all.

Recently, a pair of New York attorneys ran into this when they trusted ChatGPT to do their legal research. The model did not just get it wrong. It fabricated entire cases, complete with docket numbers, judicial quotes, and citations that never existed.

The result was disastrous. The judge found that the lawyers had acted in "subjective bad faith" and fined them thousands of dollars. The story became a global warning about the dangers of blind trust in these tools.

When I rely on AI to parse the noise, I run the same risk: a model that prioritizes sounding right over being right.


AI Hallucination vs. Fact Anchoring: Replacing ungrounded model outputs with strict source citations.

Origin: QuantGreenBook

I built this because I kept running into the same problem across two projects. QuantGreenBook is financial education software. Building it meant synthesizing dozens of textbooks into structured content. GammaCharts tracks market signals, which means reading SEC filings, Treasury speeches, and Fed commentary to distill signal from noise. In both cases, dumping documents into an LLM and asking for a summary wasn't good enough. I needed something that could read carefully.

Emic vs. Etic: Two Dimensions of Meaning

True precision reading requires a system that maps the internal vocabulary and structure of your documents before cross-referencing them with external standards. Not simple summarization, but structural comprehension.

This approach distinguishes between two distinct perspectives of truth: the emic (insider) meaning and the etic (outsider) meaning. The pipeline first maps the emic truth—how the corpus defines terms and concepts strictly on its own terms, from the inside. Only after capturing this internal vocabulary does it layer in the etic truth—the standard definitions and meanings established by the outside world. Contrasting these two perspectives exposes critical conceptual divergences, highlighting exactly where the documents deviate from public consensus.


Emic vs. Etic: Mapping the internal vocabulary of your corpus and cross-referencing it with public definitions to isolate conceptual divergence.
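One way to hold the two perspectives side by side is a record per term with both definitions and a divergence label. The sketch below is an assumption about shape, not the system's actual schema: the `TermRecord` and `Divergence` names are hypothetical, and only the two endpoints of the scale (aligned and contradicted) come from the pipeline description, so the middle levels are invented for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Divergence(IntEnum):
    # Only ALIGNED and CONTRADICTED are named in the text; the two middle
    # levels are hypothetical placeholders for "somewhere in between".
    ALIGNED = 0
    NARROWED = 1
    SHIFTED = 2
    CONTRADICTED = 3


@dataclass
class TermRecord:
    term: str
    emic: str               # definition as the corpus itself uses the term
    etic: str               # definition taken from an external source
    divergence: Divergence  # label assigned during verification


def divergent_terms(records, threshold=Divergence.SHIFTED):
    """Return records at or above `threshold`, most divergent first."""
    flagged = [r for r in records if r.divergence >= threshold]
    return sorted(flagged, key=lambda r: (-r.divergence, r.term))
```

Using an ordered enum rather than free-text labels makes "show me everything at least this divergent" a one-line filter.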


The Processing Pipeline

The pipeline is organized into five layers:

  • Layer 1: Ingestion
    Converts raw inputs (webpages, media, files, transcripts) into clean, standardized files. It relies on deterministic scraping and text processing with off-the-shelf tools.
  • Layer 2: Extraction & Merge
    Extracts technical terms, definitions, and relationships strictly at face value from individual text chunks. Each chunk is passed to a local Ollama model in sequence, with dedicated passes to extract, merge, and flag observations across the full corpus.
  • Layer 3: Graph & Community Detection
    Terms alone don't reveal structure. The merged terms and relationships are fed into a directed graph, and community detection groups the terms that share argumentative context. Ollama then summarizes each community and flags circular assumptions.
  • Layer 4: External Verification
    Internal definitions can drift from accepted meaning. For each term, the system searches the web, fetches the results, and passes both the internal (emic) meaning and the external (etic) sources to Ollama. The model then classifies the divergence on a scale from fully aligned to contradicted.
  • Layer 5: Export & Synthesis
    Reshapes the verified knowledge graph into final formats designed for study and integration, generating structured glossaries, logical axioms representing causal chains, and synthetic question-and-answer pairs grounded strictly in the source text.
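The merge in Layer 2 is where source anchoring either survives or gets lost. A minimal sketch, assuming each chunk yields plain (term, definition) pairs and that lowercasing is enough to match terms; the function names and the input shape are hypothetical:

```python
from collections import defaultdict


def merge_extractions(per_chunk: dict[int, list[tuple[str, str]]]) -> dict[str, list[dict]]:
    """Merge (term, definition) pairs from every chunk into one glossary.

    Each distinct definition keeps the list of chunk ids it was seen in,
    so every merged entry can be traced back to its source text.
    """
    merged: dict[str, dict[str, list[int]]] = defaultdict(dict)
    for chunk_id in sorted(per_chunk):
        for term, definition in per_chunk[chunk_id]:
            key = term.strip().lower()           # naive normalization (assumption)
            text = " ".join(definition.split())  # collapse runs of whitespace
            merged[key].setdefault(text, []).append(chunk_id)
    return {
        term: [{"definition": d, "chunks": ids} for d, ids in defs.items()]
        for term, defs in merged.items()
    }


def conflicting_terms(glossary: dict[str, list[dict]]) -> list[str]:
    """Terms that picked up more than one distinct definition across chunks."""
    return sorted(t for t, defs in glossary.items() if len(defs) > 1)
```

Exact string matching on definitions is deliberately crude. In practice two chunks will phrase the same definition differently, which is one reason a dedicated model pass for merging is useful.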

The Processing Pipeline: From universal ingestion to final structural export and verified Q&A generation.
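For Layer 3, the graph mechanics can be shown without any dependencies. The sketch below makes two simplifications that should be read as assumptions: weakly connected components stand in for a real community detection algorithm, and "circular assumptions" are approximated as directed cycles in the term graph. All function names are hypothetical.

```python
def build_graph(edges):
    """Directed adjacency map from (source, target) pairs."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())  # make sure sink nodes appear as keys
    return graph


def weak_components(graph):
    """Group nodes that are connected when edge direction is ignored."""
    undirected = {node: set(nbrs) for node, nbrs in graph.items()}
    for src, nbrs in graph.items():
        for dst in nbrs:
            undirected[dst].add(src)
    seen, groups = set(), []
    for start in sorted(undirected):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(undirected[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


def find_cycle(graph):
    """Return one directed cycle as a list of nodes, or None if there is none."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}
    path = []

    def visit(node):  # recursive DFS; fine for small graphs
        color[node] = GRAY
        path.append(node)
        for nxt in sorted(graph[node]):
            if color[nxt] == GRAY:  # back edge: nxt is already on the path
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in sorted(graph):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None
```

If an edge means "A is defined in terms of B", a returned cycle is a chain of definitions that ends up resting on itself, which is the kind of structure worth handing to the model for a closer look.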


Zero Variable Cost for Multi-Pass Audits

This pipeline runs a lot of LLM calls. Every document is sliced into overlapping chunks, passed through multiple models independently, checked for contradictions, and verified against external sources.
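Each of those calls is an HTTP request to a model server on the same machine. A minimal sketch of one extraction call against Ollama's `/api/generate` endpoint on its default port, using only the standard library; the prompt wording, the `llama3` model name, and the `{"terms": [...]}` reply shape are assumptions for illustration, not the pipeline's actual prompt or schema.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(chunk_text: str, model: str = "llama3") -> dict:
    """Build a non-streaming generate request that asks for JSON output."""
    prompt = (
        "Extract the technical terms defined in the passage below. "
        "Use only what the passage states. "
        'Reply as JSON: {"terms": [{"term": "...", "definition": "..."}]}\n\n'
        + chunk_text
    )
    return {"model": model, "prompt": prompt, "stream": False, "format": "json"}


def parse_terms(response_body: str) -> list[dict]:
    """Pull the term list out of a response body; return [] if it is malformed."""
    try:
        # The generated text is itself a JSON string inside the "response" field.
        payload = json.loads(json.loads(response_body)["response"])
    except (KeyError, TypeError, ValueError):
        return []
    terms = payload.get("terms") if isinstance(payload, dict) else None
    if not isinstance(terms, list):
        return []
    return [t for t in terms if isinstance(t, dict) and "term" in t and "definition" in t]


def extract_terms(chunk_text: str, model: str = "llama3") -> list[dict]:
    """Send one chunk to the local server (requires Ollama to be running)."""
    data = json.dumps(build_request(chunk_text, model)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return parse_terms(resp.read().decode("utf-8"))
```

Treating a malformed reply as an empty result rather than an error keeps one bad chunk from stopping a long run, though a real pipeline would probably want to log and retry it.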

The query count adds up fast, and that translates directly into cost. Running three models over a thousand pages of text would mean thousands of dollars in API bills per run.
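The scaling is simple multiplication, which is why it grows so quickly. A small estimator makes the terms explicit; the function names are hypothetical, and every input (chunk count, tokens per call, price) is something to fill in from a real run and a real price sheet rather than a figure from this project.

```python
def llm_calls(chunks: int, models: int, passes: int) -> int:
    """Total model invocations: every chunk goes through every model on every pass."""
    return chunks * models * passes


def run_cost(calls: int, tokens_per_call: int, usd_per_million_tokens: float) -> float:
    """Metered cost of one full run, for a price quoted per million tokens."""
    return calls * tokens_per_call * usd_per_million_tokens / 1_000_000
```

Adding a model or a verification pass multiplies the call count instead of adding to it. On local hardware the price term is zero, so the same product only costs time.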

Because everything runs locally on my own hardware, none of that computation carries a per-call cost. I can rerun the full pipeline, swap models, or audit a different corpus without watching a billing meter. The data never leaves my machine either.


Zero Variable Cost: Processing dense verification passes locally bypasses cloud API expenses completely.


Key Learnings

Building a multi-layer deep reading architecture highlighted several crucial insights:

  • Calibrating Confidence: A single model processing a chunk in isolation cannot reliably gauge its own confidence or the severity of a divergence. Multi-model consensus and explicit validation loops give better-calibrated output.
  • The Importance of Rate-Limiting: While local LLM calls can run continuously, external verification requires strict rate-limiting and local caching to respect public search providers.
  • Validating with Targeted Evaluation: Generic QA pairs are insufficient for testing system understanding. High-fidelity verification requires generating targeted questions that focus specifically on the areas where internal documents and external definitions diverge.
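The consensus idea in the first point can be sketched as a majority vote where the share of agreeing models serves as a rough confidence score. This is an illustration under assumptions, not the system's actual calibration logic: the `consensus` name, the two-thirds threshold, and the label strings are all hypothetical.

```python
from collections import Counter


def consensus(labels: dict[str, str], min_agreement: float = 2 / 3):
    """Majority label across models, plus the share of models that agreed.

    `labels` maps a model name to the label that model assigned.
    Returns (label, agreement, needs_review).
    """
    if not labels:
        raise ValueError("need at least one model label")
    counts = Counter(labels.values())
    top_label, top_count = counts.most_common(1)[0]
    agreement = top_count / len(labels)
    needs_review = agreement < min_agreement  # too little agreement to trust
    return top_label, agreement, needs_review
```

Anything flagged for review is a natural candidate for the explicit validation loop, instead of being accepted on a single model's say-so.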

Key Learnings: Calibrating confidence metrics, handling rate limits for external data, and using targeted evaluations.
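The rate-limiting and caching point can be sketched as a small wrapper around whatever function performs the external lookup. The `ThrottledCache` name and the two-second interval are hypothetical, and the cache here lives in memory only; persisting it to disk between runs is left out.

```python
import time


class ThrottledCache:
    """Wrap a lookup function with a minimum delay between calls and a cache."""

    def __init__(self, fetch, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
        self._fetch = fetch                # the function that hits the network
        self._min_interval = min_interval  # seconds between outbound requests
        self._clock = clock                # injectable so it can be tested
        self._sleep = sleep
        self._cache = {}
        self._last_call = None

    def get(self, query):
        key = query.strip().lower()
        if key in self._cache:
            return self._cache[key]  # cached: no outbound request, no waiting
        if self._last_call is not None:
            wait = self._min_interval - (self._clock() - self._last_call)
            if wait > 0:
                self._sleep(wait)
        self._last_call = self._clock()
        result = self._fetch(query)
        self._cache[key] = result
        return result
```

Checking the cache before the throttle matters: repeated terms cost nothing, and only genuinely new queries wait their turn.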