From Knowledge Graphs to GPTs — Navigating Latent Spaces

2024-09-09

Did you ever wonder how ChatGPT can answer complex questions? After all, the basic explanation is that it simply predicts the next word in a sentence. Yet, it provides thoughtful, coherent, and contextually rich responses—even to abstract queries.

The secret is that the GPT algorithm attaches more and more meaning to each word in the sentence, so that the next word can only be one that is coherent with the rich meaning of all the previous words. Once that word is predicted, it becomes part of the context used to predict the next one, and the one after that, and so on until the response is complete.
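In code, this loop is surprisingly small. Here is a minimal sketch in Python, where predict_next_token is a hypothetical stand-in for the whole model:

```python
# Minimal sketch of autoregressive generation (illustrative only).
# `predict_next_token` stands in for the full model: given all tokens seen so
# far, it returns the continuation that is most coherent with them.
def generate(prompt_tokens, predict_next_token, max_new_tokens=50, stop_token="<end>"):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_token = predict_next_token(tokens)  # the prediction depends on *all* previous tokens
        if next_token == stop_token:
            break
        tokens.append(next_token)                # the new word becomes context for the next prediction
    return tokens
```

All the intelligence is hidden inside that single prediction step; the rest is just appending words one by one.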

But how does one attach meaning to a word? If I say Switzerland, you might think of cheese, chocolate, and watches. But if I simply add [+46.004141, +8.956939], you might also think about palm trees. Those two numbers are coordinates that point to Lugano, a city in southern Switzerland that is close to Italy and is known for its mild micro-climate. Switzerland [+46.004141, +8.956939] thus conveys a similar, but more nuanced, meaning compared to Switzerland alone.

ChatGPT uses the same method, but with many more numbers: each word is contextualized by more than 10’000 numbers. The more numbers, the more detailed the meaning (although at a certain point performance and cost tradeoffs kick in). For instance, although Switzerland [+47.377570, +8.541321; –0.254560, +0.774330; –0.496443, –0.477749; –0.622426, +0.119037] also points to Lugano, the longer set of numbers makes the trip more meaningful: an adventure that takes you by the scenic Walensee and over the Lukmanier pass. Additionally, the numbers encode not only the path but also implicit information, such as taking more time or arriving more tired, all of which has an impact on what happens afterwards.
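To make this concrete, here is a toy sketch with made-up four-dimensional vectors (real models use the 10’000+ dimensions mentioned above). Adding a small context vector to the bare word Switzerland moves its meaning measurably closer to Lugano:

```python
import numpy as np

# Toy numbers, invented purely for illustration.
switzerland = np.array([0.9, 0.1, 0.3, 0.0])   # "Switzerland" on its own
lugano      = np.array([0.7, 0.6, 0.4, 0.2])   # a more specific, southern meaning
context     = np.array([-0.2, 0.5, 0.1, 0.2])  # the extra numbers that add nuance

def cosine(a, b):
    """Cosine similarity: how aligned two meanings are."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(switzerland, lugano))             # ~0.83: the bare word is only loosely related
print(cosine(switzerland + context, lugano))   # 1.0: with context added, the meanings coincide
```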

The fact that one can smoothly navigate anywhere in the latent space means potentially ending up in the middle of a mountain. That might be neither realistic nor particularly sensible, but it could be interesting, for instance, for creative purposes.

What now appears increasingly clear is that information is not just a set of points constrained by rigid structures, like mountains, but something more like an open space. A space that, akin to an ocean, can be explored smoothly by following any desired path. In AI terms, this is known as a latent space: an abstract, multi-dimensional space where complex data is represented in a simplified, numerical form. In the context of transformers like GPT, each word (more accurately, a “token”) from the input is mapped to a vector within this high-dimensional space (the 10’000+ dimensions mentioned above). These vectors capture various semantic and syntactic properties, allowing the model to understand relationships and patterns that aren’t explicitly stated in the data.
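A small illustration of that smoothness, using random stand-in vectors rather than real embeddings: between any two points in the latent space, every intermediate point is itself a valid position, even if no actual word sits there (the middle of the mountain, so to speak).

```python
import numpy as np

# Two stand-in "embeddings" (random vectors used for illustration; real token
# embeddings have thousands of dimensions and are learned from data).
zurich = np.random.default_rng(0).normal(size=8)
lugano = np.random.default_rng(1).normal(size=8)

# Walk smoothly from one to the other: the space is continuous, so every step
# along the way is a perfectly usable point, whether or not a word lives there.
for alpha in np.linspace(0.0, 1.0, 5):
    point = (1 - alpha) * zurich + alpha * lugano
    print(f"alpha={alpha:.2f}", np.round(point, 2))
```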

How far can we go with implicit knowledge?

When I wrote about Knowledge Graphs and Machine Learning, I discussed how these graphs represent domains—be it a business, an organization, or a field of study—through data points linked by explicit relations. Following the connecting path between data points allows for the semi-automatic generation of insights by revealing relationships that might not be immediately apparent.
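As a minimal sketch (with invented entities and relations), such a graph is just a handful of labelled edges, and the insight is the path found between two entities:

```python
from collections import deque

# A miniature knowledge graph with explicit, labelled relations.
# Entities and relations are invented for illustration.
edges = {
    "Lugano":      [("located_in", "Switzerland"), ("near", "Italy")],
    "Switzerland": [("famous_for", "chocolate"), ("famous_for", "watches")],
    "Italy":       [("famous_for", "espresso")],
}

def find_path(start, goal):
    """Breadth-first search over explicit relations; the path itself is the insight."""
    queue, seen = deque([(start, [start])]), {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, neighbour in edges.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, path + [f"--{relation}-->", neighbour]))
    return None

print(find_path("Lugano", "espresso"))
# ['Lugano', '--near-->', 'Italy', '--famous_for-->', 'espresso']
```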

GPTs (Generative Pre-trained Transformers), on the other hand, typically don’t use an explicit representation of knowledge, but generate concepts by navigating through the latent space, the open ocean of information. This means that while the connections between concepts aren’t directly observable, they influence how the model generates responses. Like a cartographer on a vessel who updates the map by duly logging the different values of salinity, pH, krill concentration, and so on.

In the case of GPTs, the structure of latent space is shaped by the vast amount of data the model is trained on, enabling it to navigate and interpolate between different concepts seamlessly. In a sense, latent spaces are similar to implicit knowledge—knowledge that isn’t described overtly but is ready to be utilized when needed. These hidden representations allow transformers to generalize from specific examples and handle a wide range of queries by leveraging the underlying patterns captured during training.

While Knowledge Graphs are like cities with roads, transformers operate like ships on the open sea, navigating a continuous landscape where every point is reachable through smooth transitions.

The paths that GPTs navigate when generating answers are not very different from the explicit connections of Knowledge Graphs. Both systems rely on relationships between data points to generate meaningful outputs, whether those relationships are explicitly defined or implicitly learned. However, latent spaces offer a more flexible and dynamic way of representing knowledge, enabling models to adapt and respond to novel situations in ways that structured knowledge graphs might find challenging.

Understanding how navigating creates meaning

The sailing metaphor might give the impression that any point in the ocean means something. “Something” yes, but not necessarily important, valuable, or relevant. It’s not about what a single point means, but how passing through each point adds meaning and nuance. In other words, what matters is the way through the ocean. The path is the goal.

The GPT algorithm passes the sequence of tokens through a block that is repeated many times with the same architecture (“self-attention” followed by a feed-forward neural network) but different parameters. Each block adds context, meaning, and nuance to each word of the input prompt, allowing the GPT algorithm to make a sophisticated prediction of the next word after the prompt.
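To make the structure tangible, here is a toy stack of such blocks with random weights and assumed shapes (nothing here is the real GPT architecture, just its skeleton): every block has the same shape, self-attention followed by a feed-forward network, but its own parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
SEQ, DIM, N_BLOCKS = 6, 16, 4                           # toy sizes; real models use thousands of dimensions

def softmax(s):
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def block(x, p):
    """One processing block: self-attention, then a feed-forward network, with residual connections."""
    q, k, v = x @ p["Wq"], x @ p["Wk"], x @ p["Wv"]
    attn = softmax(q @ k.T / np.sqrt(DIM)) @ v           # self-attention: each token gathers context from the others
    x = x + attn
    ffn = np.maximum(x @ p["W1"], 0) @ p["W2"]           # position-wise feed-forward network
    return x + ffn

params = [{"Wq": rng.normal(scale=0.1, size=(DIM, DIM)),
           "Wk": rng.normal(scale=0.1, size=(DIM, DIM)),
           "Wv": rng.normal(scale=0.1, size=(DIM, DIM)),
           "W1": rng.normal(scale=0.1, size=(DIM, 4 * DIM)),
           "W2": rng.normal(scale=0.1, size=(4 * DIM, DIM))}
          for _ in range(N_BLOCKS)]                      # same architecture, different parameters per block

x = rng.normal(size=(SEQ, DIM))                          # one vector per prompt token
for p in params:
    x = block(x, p)                                      # each pass adds context, meaning, and nuance
print(x.shape)                                           # (6, 16): enriched vectors used for the next-word prediction
```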

The more attention blocks you add, the better the answer generally becomes. Each block performs many calculations, so part of the reason why GPT wasn’t possible until a few years ago is that there simply weren’t sufficiently powerful processing chips (GPUs in particular). That, and algorithmic advances: in traditional models like RNNs and LSTMs, understanding long-range dependencies in text was a significant challenge. Transformers address this by processing all words simultaneously and calculating “attention” scores that determine how much focus to place on each word when generating the next. “Multi-head attention” extends this concept by allowing the model to attend to different representation subspaces at different positions. Each “head” can focus on different types of relationships or patterns within the data. For instance, one head might capture syntactic structures, while another focuses on semantic meaning.

The multi-head attention mechanism enables the model to capture various dimensions of relationships in the data (like syntax, context, and meaning), similar to exploring multiple paths in a Knowledge Graph simultaneously.
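Sticking with toy shapes and random weights (again purely illustrative), multi-head attention simply splits the vector of each token into a few smaller subspaces and computes a separate attention pattern in each one:

```python
import numpy as np

rng = np.random.default_rng(1)
SEQ, DIM, HEADS = 5, 16, 4
HEAD_DIM = DIM // HEADS                                  # each head works in its own small subspace

def softmax(s):
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

x = rng.normal(size=(SEQ, DIM))                          # one vector per token
Wq, Wk, Wv, Wo = (rng.normal(scale=0.1, size=(DIM, DIM)) for _ in range(4))

# Project, then reshape so each head attends within its own subspace.
q = (x @ Wq).reshape(SEQ, HEADS, HEAD_DIM).transpose(1, 0, 2)   # (heads, seq, head_dim)
k = (x @ Wk).reshape(SEQ, HEADS, HEAD_DIM).transpose(1, 0, 2)
v = (x @ Wv).reshape(SEQ, HEADS, HEAD_DIM).transpose(1, 0, 2)

scores = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(HEAD_DIM))  # one attention map per head
heads = scores @ v                                               # each head follows its own "path" through the sequence
out = heads.transpose(1, 0, 2).reshape(SEQ, DIM) @ Wo            # heads recombined into one vector per token
print(scores.shape, out.shape)                                   # (4, 5, 5) (5, 16)
```

One head’s attention map might line up with grammatical structure, another with topical meaning; the model learns that division of labour on its own.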

The Sirens’ Song: Crystallizing Generated Paths

Knowledge Graphs are composed of explicitly defined and often verifiable relationships, making them reliable for applications where accuracy is critical. GPTs, on the other hand, generate responses based on patterns learned from data, which can sometimes include biases or inaccuracies. Moreover, the implicit nature of the connections in transformers means they are not directly observable or interpretable. Extracting and validating these latent paths requires careful analysis to ensure that the relationships are meaningful and accurate.

One intriguing possibility is to crystallize the latent paths that GPTs generate, especially those that consistently lead to accurate or insightful outputs. By identifying and formalizing these paths, we could enhance Knowledge Graphs with dynamically discovered relationships.

For example, if a GPT model frequently associates certain concepts or entities in its responses, these associations could be extracted and added to a Knowledge Graph. This would enrich the graph with new, data-driven relationships that might not have been identified through traditional means.
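Here is a hedged sketch of what that extraction step could look like; the generated answers, entity lists, and threshold below are all invented for illustration, and a real system would need proper entity extraction and validation:

```python
from collections import Counter
from itertools import combinations

# Entities mentioned together in (hypothetical) model responses.
generated_answers = [
    ["Lugano", "Switzerland", "palm trees"],
    ["Lugano", "Switzerland", "Italy"],
    ["Lugano", "Switzerland", "mild climate"],
    ["Zurich", "Switzerland", "banks"],
]

# Count how often each pair of entities co-occurs across answers.
pair_counts = Counter()
for entities in generated_answers:
    pair_counts.update(combinations(sorted(set(entities)), 2))

# Crystallize only the associations that recur consistently into explicit edges.
knowledge_graph_edges = [(a, "associated_with", b)
                         for (a, b), count in pair_counts.items()
                         if count >= 3]
print(knowledge_graph_edges)   # [('Lugano', 'associated_with', 'Switzerland')]
```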

Vice versa, incorporating Knowledge Graphs into the training process of transformers could guide the attention mechanisms, improving the model’s accuracy and interpretability.

Using our brains to navigate towards AGI

“Crystallized intelligence” is also a term for learned knowledge. In living organisms, particularly humans, cognition operates on multiple levels: instinctive, learned, generative, and potentially more (see the quantum theories of consciousness).

This triad mirrors the conceptual interplay between Knowledge Graphs and GPTs. Knowledge Graphs represent explicit, hardcoded relationships akin to instinctive and learned knowledge in biological brains—providing reliable and immediate access to established information. Transformers, with their ability to navigate continuous latent spaces and generate new connections, embody the generative aspect of cognition.

Balancing Instinct, Learning, and Generation

Observing the animal kingdom more broadly reveals a wide range of equilibria between Instinct, Learning, and Generation. Insects and animals lacking complex cerebral structures use instinct and often rely on emergent group behaviors for survival. Higher animals, including humans, exhibit a more intricate balance, combining instinct with extensive learning and the capacity for creative thought.

As with everything in nature, the equilibrium is delicate and only allows for small perturbations. An overreliance on instinct, for instance, may lead to inflexibility, while excessive generative processing without foundational knowledge could result in inefficiency or error-prone behavior. The optimal balance allows for quick, reliable responses when necessary, supplemented by the ability to adapt, learn, and innovate.

This insight indicates that AGI might also require a balance between Instinct, Learning, and Generation. Perhaps achieving true general intelligence isn’t about creating a system without constraints but about finding the right equilibrium between innate structures (instinct and learned knowledge) and the capacity for generative, creative processing. Understanding this balance in biological systems could provide valuable insights into designing more effective and adaptable AI.

If a kind of Knowledge Graph exists within the brain — a network of instinctive and learned connections — it also suggests that intelligence, even in biological systems, isn’t entirely general. There’s always a framework or structure within which cognition operates, imposing certain limits on how general biological intelligence can be.

Final Thoughts

The journey from data to understanding, whether through explicit connections in Knowledge Graphs or the dynamic pathways of transformers, reflects a fundamental aspect of intelligence: the ability to form and utilize relationships. As we strive towards AGI, exploring the synergy and balance between instinctive knowledge, learned information, and generative processing could be key. Recognizing that even biological intelligence operates within certain limits invites us to redefine what we mean by “general” in Artificial General Intelligence.