Why the Semantic Web Underperformed Expectations
Tim Berners-Lee's 2001 vision of a machine-readable web powered by RDF, OWL, and SPARQL never produced the promised agentic future. High entry costs, weak publisher incentives, schema.org's lighter SEO-driven alternative, and LLMs that extract structure from prose directly all undercut it. The formal-semantics vision survived mainly in cultural-heritage linked data, life-science integration, and Wikidata, where institutional payoffs justify the ontology work.
In May 2001, Tim Berners-Lee, James Hendler, and Ora Lassila published "The Semantic Web" in Scientific American, sketching a future in which web pages would carry machine-readable assertions so that intelligent software agents could negotiate appointments, reconcile medical records, and reason across vendors without human glue. The technical stack that followed (RDF for triples, OWL for ontologies, and SPARQL for queries) was rigorous, decentralized, and conceptually elegant. A quarter century later, the agentic web that arrived looks almost nothing like the one the paper described.
The vision underperformed for several converging reasons. The entry barrier was steep: publishing data correctly required minting stable URIs, picking namespaces, and either reusing or authoring an ontology, work that demanded skills closer to formal logic than to web development. Most site owners had no commercial incentive to expose clean data to competitors or aggregators, and consumers had no killer app that rewarded the effort. Cory Doctorow's 2001 essay "Metacrap" (Doctorow Essay) anticipated the social failure modes with seven blunt observations, among them that people lie, people are lazy, schemas are not neutral, and there is more than one way to describe anything. Those observations applied as cleanly to RDFa as to Dublin Core.
When schema.org launched in 2011 as a joint effort by Google, Bing, and Yahoo, it conceded the universal-ontology dream and offered a flat, pragmatic vocabulary aimed at a single payoff: richer search snippets. It won the markup war because the SEO incentive was legible and the schema was shallow enough for any webmaster to use.
Large language models then absorbed the remaining use case from the other end. Instead of asking publishers to pre-format their knowledge as triples, modern LLMs extract entities, relations, and JSON directly from prose at inference time, treating the messy HTML web as their substrate. The knowledge graph moved inside the model rather than out into shared RDF stores.
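To make the contrast between the triple-based stack and schema.org's flat markup more concrete, here is a minimal sketch in plain Python. It is not an RDF library and not how any real store works: the `example.org` identifiers, the sample clinic data, and the helper names `match` and `to_flat_markup` are all assumptions made up for illustration. Only the schema.org terms (`name`, `telephone`, `worksFor`, `MedicalClinic`) are taken from the real vocabulary.

```python
import json

# Hypothetical identifiers for illustration only; none come from a real dataset.
EX = "https://example.org/"
SCHEMA = "https://schema.org/"

# RDF models every fact as a (subject, predicate, object) triple.
triples = {
    (EX + "clinic/42", SCHEMA + "name", "Riverside Clinic"),
    (EX + "clinic/42", SCHEMA + "telephone", "+1-555-0100"),
    (EX + "dr/7", SCHEMA + "worksFor", EX + "clinic/42"),
    (EX + "dr/7", SCHEMA + "name", "Dr. Ada Example"),
}


def match(pattern, graph):
    """Return the triples matching a pattern. None acts as a wildcard,
    loosely like a variable in a single SPARQL triple pattern."""
    return sorted(
        t for t in graph
        if all(p is None or p == v for p, v in zip(pattern, t))
    )


def to_flat_markup(subject, graph, type_name):
    """Collapse one subject's schema.org triples into a flat,
    JSON-LD-style object of the kind embedded for search snippets."""
    doc = {"@context": "https://schema.org", "@type": type_name, "@id": subject}
    for s, p, o in sorted(graph):
        if s == subject and p.startswith(SCHEMA):
            doc[p[len(SCHEMA):]] = o
    return doc


# Graph-style question: who works for the clinic? (wildcard subject)
staff = [s for s, _, _ in match((None, SCHEMA + "worksFor", EX + "clinic/42"), triples)]

# Snippet-style output: one flat record about one thing.
clinic_markup = to_flat_markup(EX + "clinic/42", triples, "MedicalClinic")

print(staff)
print(json.dumps(clinic_markup, indent=2))
```

The graph form can answer questions that span resources, such as which people point at the clinic, but only if every publisher agrees on identifiers and predicates. The flat form gives that up and describes a single page's subject, which is roughly the trade schema.org made.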
Where the original vision did land was in domains with patient institutions and high-value integration problems. The Linked Open Data Cloud now stitches together cultural-heritage catalogs through Europeana and national libraries, life-science resources such as UniProt and Bio2RDF, and the cross-domain hub Wikidata, which since its 2012 launch has accumulated over a hundred million items and more than a billion statements. The lesson is not that the Semantic Web failed but that formal interoperability survived exactly where the cost of ontology work was outweighed by the cost of not integrating — and lost everywhere else to lighter markup and statistical extraction.