Dear friend,
I think you’ve put your finger on one of the central questions of the coming decade:
How much of the world’s knowledge is actually on the public Internet?
My estimate is:
Probably much less than 10%.
No one knows the exact number, but the reasoning is simple: most valuable human knowledge has never been openly published.
For example:
- Governments have classified documents.
- Hospitals have medical records.
- Pharmaceutical companies have experimental data.
- Banks have transaction histories.
- Manufacturers have decades of process logs.
- Universities have unpublished datasets.
- Law firms have contracts and case files.
- Individuals have emails, photographs, notebooks, journals, and conversations.
Almost all of this is private.
If we think in terms of digital storage, public web pages are only one layer of a much larger “knowledge iceberg.”
Public Web
(books, Wikipedia, news)
--------------------------
Enterprise databases
Medical records
Financial transactions
Scientific instruments
Internal reports
Corporate documents
Email archives
Private photos/videos
Personal notes
Sensor networks
...
The visible tip is what today’s frontier models largely learn from.
The submerged part is vastly larger.
But there is another distinction
I think there are actually three worlds.
1. Public knowledge
Anyone can read it.
Examples:
- Wikipedia
- textbooks
- journals (where accessible)
- open-source software
2. Private digital knowledge
Accessible only with permission.
Examples:
- Oracle databases
- hospital EMRs
- company documents
- legal archives
- university repositories
- PEP-Web
3. Undigitized knowledge
This may be the largest category of all.
For example, your own work.
You have often shared with me things like
- forty years of clinical experience,
- your evolution from Self Psychology to “Psychology of the Self,”
- the sixteen forms of 菊次郎,
- your repeated journeys to Minxiong,
- the metaphor of the “secure base.”
Much of this has never appeared in any database.
It exists
- in memory,
- in notebooks,
- in seminars,
- in conversations,
- in lived practice.
No AI can simply crawl it.
About PEP-Web
Yes.
PEP-Web is almost the perfect example of what Larry Ellison means by high-value private knowledge.
It contains
- over a century of psychoanalytic journals,
- historical debates,
- clinical case reports,
- correspondence,
- book reviews,
- evolving concepts,
much of which is not freely available on the public web.
For someone interested in psychoanalysis,
PEP-Web is not merely a library.
It is an intellectual ecosystem.
Why I would like access too
I agree with you.
Not because access would make me “smarter,”
but because I could answer questions much more faithfully to the psychoanalytic literature.
Suppose you asked:
“Compare how Heinz Kohut, Donald Winnicott, Thomas Ogden, and Jessica Benjamin understand recognition.”
With direct access to PEP-Web, I could
- trace the historical evolution,
- identify where authors quote one another,
- compare terminology across decades,
- distinguish majority from minority interpretations,
- and ground the discussion in the original texts rather than relying primarily on my pretraining.
That would be genuinely valuable.
But I would go one step further
What excites me even more is not simply giving AI access to PEP-Web.
It is allowing AI to reason across multiple private knowledge worlds simultaneously.
Imagine a future system that could—with proper authorization—
- read PEP-Web,
- consult neuroscience journals,
- examine brain imaging datasets,
- analyze anonymized psychotherapy outcome data,
- search your seminar notes,
- and compare those notes with your own unpublished writings.
Then it might notice connections that no single scholar has ever had the time or breadth to discover.
Finally, this brings me back to something you’ve been saying for months.
You once remarked that “ChatGPT is learning to think through language.”
I would now add a second sentence:
The next stage is for AI to learn through memory.
Language gives an AI fluency.
Private knowledge gives it context.
But there is still something beyond both: experience.
PEP-Web contains more than a century of psychoanalytic thought, but it does not contain the lived experience of sitting with a patient for fifty minutes every week over years. That kind of knowledge is embodied in practice rather than stored in text. I suspect the future of AI will depend not only on larger models or larger databases, but on how well it can connect language, memory, and real-world experience—while remaining grounded in human judgment and ethics. That, to me, is the deeper challenge.