Building Artificial General Intelligence

In this post, we will be talking about Artificial General Intelligence (AGI): what it is, what it would and should be like, and how to develop one.

1. Defining AGI

We will start by exploring important concepts in AI.

1.1. Intelligence

There have been many attempts to define intelligence. In this doc, we define intelligence as

The ability to acquire knowledge and use that knowledge to achieve goals.

Intelligence can be very difficult to measure directly; it is mostly observed while learning and performing tasks. We can mitigate this difficulty by moving from trying to measure absolute intelligence to trying to measure intelligence per task.

In that case, intelligence (per task) can be defined as

The efficiency of learning and performing a specific task

Efficiency here refers to how well resources are utilized. Resources include but are not limited to: electrical energy required to run hardware, cost of labelling data, the amount of data consumed to learn a task, etc. Ultimately, efficiency can be modelled as the total energy used to learn and perform a task. We want a highly efficient system; the more efficient, the better. The system can be efficient on two fronts, namely learning efficiency and performing efficiency.

  1. Learning Efficiency: We want a system that is able to learn to perform tasks with as few resources as possible.
  2. Performing Efficiency: We want a system that is efficient at performing a task it has learnt, i.e. where possible it should use less energy than it did when it first learned the task.

A good system should reduce the cost of performing a task as it gets better at that task. A better one could even amortize cost across tasks, i.e. getting good at Task A makes it better at Task B.
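The per-task notion of efficiency above can be sketched in a few lines. This is a minimal illustration, not a proposed metric: the function name, the argument names, and the example numbers are all made up here; the only idea taken from the text is "score per unit of total energy spent".

```python
def efficiency(score: float, learn_energy: float, perform_energy: float) -> float:
    """Per-task efficiency: task score per unit of total energy spent.

    `score`, `learn_energy`, and `perform_energy` are hypothetical
    quantities; any consistent units (e.g. joules) work.
    """
    total_energy = learn_energy + perform_energy
    return score / total_energy

# Two agents reach the same score, but one spends less energy
# learning and performing, so it is the more efficient one.
a = efficiency(score=0.9, learn_energy=50.0, perform_energy=10.0)
b = efficiency(score=0.9, learn_energy=80.0, perform_energy=30.0)
assert a > b
```

Under this framing, "getting better at a task" shows up directly as `perform_energy` shrinking over time, which raises the efficiency score.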

1.2. Agency

An Agent is an entity with some form of Agency. Agency is another term that has seen many attempted definitions. In this doc, we define Agency as

The ability to set and/or achieve goals

Agency is usually demonstrated by "acting"; when an Agent takes action, it demonstrates its own Agency. An Agent usually acts upon a world or environment.

The environment contains the Agent and other entities. The environment, in a way, acts upon the Agent through the Agent's sensory perception. When the world acts on the Agent, the Agent receives observations from the world, and these observations communicate the state of the world (Env State). The Agent can use these observations to build an internal state (Agent State), and the Agent can manipulate the world with actions.

graph LR
    Environment -- observation --> Agent -- action --> Environment;
    EnvState --o Environment;
    Agent --o AgentState;
Agent-Environment Diagram

The Agent can use its internal state (Agent State) to predict the state of the world (Env State) using its intelligence. Intelligence enhances Agency as it helps the Agent thrive in the environment it is in.

An Agent is an entity that can set and/or achieve goals. An intelligent Agent is an entity that is good at setting goals and/or achieving them.

For example, let's assume we have a simple organism that lives in a grid world filled with obstacles, food and fire. The organism desires food but can not survive long exposure to fire. The organism has "eyes" to see its environment, "legs" to move around and "hands" to gather food. The organism will use its "eyes" to gather observations and "hands"/"legs" to perform actions in the world. The organism's success at gathering food and avoiding long exposure to fire is directly influenced by its intelligence.
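The grid-world organism can be sketched as a tiny observation/action loop. Everything concrete below (the grid layout, the neighbourhood the "eyes" can see, the movement rule) is an illustrative assumption; the point is only the shape of the loop: observe adjacent cells, then act on what was observed.

```python
import random

# A toy version of the grid-world organism: it observes nearby cells
# ("eyes"), moves ("legs"), and gathers food ("hands"). The grid layout
# and the one-step visibility rule are made up for illustration.
GRID = [
    ["food", "empty", "fire"],
    ["empty", "empty", "food"],
    ["fire", "empty", "empty"],
]

def act(position, grid):
    """Move toward an adjacent food cell if one is visible, else move
    randomly to any adjacent cell that is not fire."""
    r, c = position
    neighbours = [(r + dr, c + dc)
                  for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]
                  if 0 <= r + dr < len(grid) and 0 <= c + dc < len(grid[0])]
    # "Observation": the contents of the adjacent cells.
    for nr, nc in neighbours:
        if grid[nr][nc] == "food":
            return (nr, nc)                       # gather food
    safe = [p for p in neighbours if grid[p[0]][p[1]] != "fire"]
    return random.choice(safe)                    # avoid fire

position = act((1, 1), GRID)
assert GRID[position[0]][position[1]] != "fire"
```

A more intelligent organism would replace the hard-coded rule in `act` with something learned from its observations; the interface (observations in, actions out) stays the same.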

1.3. Reasoning

In this doc, we define reasoning as

The process of exploring hypotheses that explain observations from the world in order to better model the world.

Reasoning is a way an Agent can learn to explain how the world works. When the Agent reasons, it considers different models that could explain how the world works until it finds models that hold up over time; this process is exactly a demonstration of intelligence. There is a leading school of thought proposing that we have two systems of thinking, System 1 (perception) and System 2 (reasoning). I believe this is a false dichotomy: I think reasoning is all there is, and that perception is just crystallized reasoning.

Perception is just crystallized reasoning

I think of reasoning like a path one takes, or a series of actions one performs, that makes perception/system 1 more like muscle memory. Perception seems fast and quick because it is like a cache. When we encounter "familiar" situations, we attempt to reason, but because it is a road that we have taken multiple times, it happens intuitively without much effort, attention or even awareness.
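The "perception is a cache over reasoning" idea maps neatly onto memoization. The sketch below is only an analogy: `slow_reason` is a hypothetical stand-in for deliberate reasoning, and the cache plays the role of perception, answering familiar situations without re-running the slow process.

```python
from functools import lru_cache

# Analogy for "perception is crystallized reasoning": deliberate
# reasoning is a slow search, and perception behaves like a cache
# over its results. `slow_reason` and its output are hypothetical.
calls = {"count": 0}

def slow_reason(situation: str) -> str:
    """Deliberate, effortful reasoning (System 2 in the usual framing)."""
    calls["count"] += 1
    return f"model-of-{situation}"

@lru_cache(maxsize=None)
def perceive(situation: str) -> str:
    """Fast 'perception': a cached road already travelled."""
    return slow_reason(situation)

perceive("dog")   # novel situation: the full reasoning process runs
perceive("dog")   # familiar situation: answered from cache, no reasoning
assert calls["count"] == 1
```

The second call feels "fast and effortless" because the expensive search never re-runs, which is exactly the intuition in the paragraph above.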

1.4. Artificial (General) Intelligence

Artificial Intelligence (AI) is when a man-made system exhibits intelligence, and one way we can tell that it does is by testing it on tasks. One way we can make AI is via Machine Learning: we assume that there is a function that works well for the tasks we plan to use to test the machine's intelligence. Artificial General Intelligence (AGI) is a man-made system that exhibits intelligence not just on a specific set of tasks but on a wide variety of them; the wider the task set, the more general it is. This is an AI that is not "narrow" to one domain but can thrive in any domain.

AGI is basically a way to highlight this kind of AI, because for a long time the kinds we have been developing have thrived mostly in narrow domains. When thinking about AGI, it is better not to think of its intelligence just in terms of tasks but in terms of learning ability. Instead of asking "how many tasks can the system perform?", we ask "how good is its ability to learn a task once presented with it?". Another way to say this is that AGI is a system expected to perform well on tasks it has yet to see. The main problem is that, because we don't know which tasks the machine will be presented with, we need the machine to learn on the go; this is where continual learning comes in.

2. Testing AGI: A simple demonstration of general intelligence

Absolute intelligence is hard to measure, so we test multiple agents on a set of tasks to see which is the most efficient (learning + performing efficiency). An Agent is said to have more general intelligence than another if its overall performance across all tasks is higher. The catch is how the tasks are presented: across all possible orders in which the data for all the tasks can be presented, a generally intelligent Agent is one that scores the highest on average while using the least energy. Basically, the most efficient Agent.

Test Parameters

  • There has to be at least one task with at least two datapoints.
  • There should be at least two Agents; to test a single agent, we should compare it to a random agent.
  • Each Agent is tested on every datapoint and allowed to learn from it after the test.
  • An Agent's performance is proportional to its average score and inversely proportional to its energy use, over all possible arrangements of data and/or tasks.
  • The tasks should be as orthogonal as possible; the more orthogonal the tasks, the more general the test.

The most important thing is that we compare at least two agents in the same environment.
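The test protocol above ("tested on every datapoint, allowed to learn from it after the test") can be sketched as a test-then-train loop. Everything here is illustrative: `MemorizingAgent` is a toy stand-in for a real learner, and scoring by score-per-energy is one simple way to combine the parameters listed above, for a single ordering of the data.

```python
class MemorizingAgent:
    """A toy agent that memorizes every (x, y) pair it has seen.
    `joules_per_step` is a made-up per-operation energy cost."""
    def __init__(self, joules_per_step):
        self.memory = {}
        self.energy = 0.0
        self.cost = joules_per_step

    def predict(self, x):
        self.energy += self.cost
        return self.memory.get(x)

    def learn(self, x, y):
        self.energy += self.cost
        self.memory[x] = y

def evaluate(agents, stream):
    """Test-then-train: each agent is scored on a datapoint first,
    then allowed to learn from it, for one ordering of the data."""
    scores = [0] * len(agents)
    for x, y in stream:
        for i, a in enumerate(agents):
            if a.predict(x) == y:     # test first...
                scores[i] += 1
            a.learn(x, y)             # ...then learn
    # Rank by score per unit of energy: the most efficient agent wins.
    return max(range(len(agents)), key=lambda i: scores[i] / agents[i].energy)

stream = [(1, "a"), (1, "a"), (2, "b"), (2, "b")]
frugal, wasteful = MemorizingAgent(1.0), MemorizingAgent(5.0)
winner = evaluate([frugal, wasteful], stream)
assert winner == 0   # equal scores, but the frugal agent used less energy
```

A full version of the test would average `evaluate` over many orderings of the stream, since the protocol scores Agents across all possible arrangements of the data.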

Notes

  • For tasks that are correlated, we should observe an increase in learning performance on one after the other has been learned.
  • This test is highly dependent on the set of tasks; this means two Agents are only strictly comparable when they have undergone the exact same tests.
  • This test accounts not just for performance of tasks but for learning performance. The Agents are expected to learn from the observations presented during the tests.
  • Learning and performing efficiency should be factored into the performance measure, not just accuracy or other usual score metrics. We don't just want clever Agents; we want efficient ones as well. Efficiency could be as simple as energy efficiency.

3. Building AGI

The goal in machine learning is to build systems that can "learn" to perform tasks without being explicitly instructed on how to perform said tasks. We can formulate the problem as follows: say we have an input space X and an output space Y, and there exists a corresponding Y for every X such that their mapping is not completely uniform. We can assume that there exists a function F* that represents the mapping from every X to its corresponding Y.

Think of it like this: the most fundamental model of computation is that for every state there is a corresponding action you should take, and once you reach a terminal state you have a final result. This is what an algorithm is; an algorithm is basically a function. Usually, algorithms are represented as programs written by humans that run on computer hardware, but in this formulation an algorithm is a function.

Machine learning is the process of finding said function: if we know this function exists, how can we find it? One of the first problems that needs to be solved is function representation. A computer program is one representation of a function, but it is not the only one; functions can be represented in all sorts of ways: finite state machines, counting machines, Turing machines, neural networks, decision trees, etc. Remember that a function is fundamentally an algorithm.
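To make "the same function, different representations" concrete, here is one function (XOR of two bits, chosen purely as an example) written two ways: as a lookup table and as a program. Both encode the same X→Y mapping; the representation only changes how the function is stored and, later, how we would search for it.

```python
# One function, two representations. The mapping is the same;
# only the encoding differs.
xor_as_table = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}  # lookup table

def xor_as_program(a: int, b: int) -> int:
    """The same mapping, written as a program (i.e. an algorithm)."""
    return (a + b) % 2

for a in (0, 1):
    for b in (0, 1):
        assert xor_as_table[(a, b)] == xor_as_program(a, b)
```

A lookup table only works when the input space is small and fully enumerable; a program (or a neural network, or a decision tree) can represent the same mapping compactly over much larger input spaces, which is why the choice of representation matters for search.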

3.1. Representation of Function space

To build on the formulation above, we can assume that there is a space of functions F just as there is an input and output space. A function that represents the mapping from the input to the output space should exist in the function space, in fact, there could be multiple functions in the function space that can represent the mapping from X to Y.

It is important that we use a good representation of the function space because if our representation is limited, we could end up with a space that does not contain a single solution to our X→Y mapping. There are other factors to consider when choosing a representation for the function space like the ease of search.

3.2. Continual Learning

Learning is the process of searching the function space to find functions that fit a specific X→Y mapping. The problem is that in most cases we don't know F*, but we have samples from it; this collection of samples is what we usually call a dataset. In most cases, it is almost impossible to get all the possible samples there are for F*, so we are almost always working with a subset of data. One way we try to find F* in machine learning is by using gradient descent with neural networks as the function representation.

Continual learning is where a system can learn gradually as it sees more data. The main issue with this is that the data can be presented in any order, meaning the distribution of the data at time t1 and at time t2 can be drastically different and may not even reflect the underlying distribution of samples from the true function F*. This is a big problem for offline algorithms, which require a "dataset" with the same distribution as samples from F*; in continual learning, for various reasons, you won't have this dataset. Adapting offline methods like deep learning + stochastic gradient descent to continual learning is extremely difficult.
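Why data order matters can be demonstrated with the same single-weight learner: train it on data drawn only from one "task", then only from another, and the first task is overwritten (catastrophic forgetting). Both targets (y = 2x, then y = -x) and all constants are illustrative.

```python
# Why the order of data matters: a gradient-descent learner that sees
# "task A" data (y = 2x) and then only "task B" data (y = -x)
# overwrites what it learned first. All targets and rates are illustrative.
def sgd(w, samples, lr=0.05, epochs=200):
    """Plain SGD on f(x) = w * x with squared error."""
    for _ in range(epochs):
        for x, y in samples:
            w -= lr * 2 * (w * x - y) * x
    return w

task_a = [(x, 2.0 * x) for x in [1.0, 2.0, 3.0]]
task_b = [(x, -1.0 * x) for x in [1.0, 2.0, 3.0]]

w = sgd(0.0, task_a)
error_on_a_before = abs(w - 2.0)   # near zero: task A has been learned
w = sgd(w, task_b)
error_on_a_after = abs(w - 2.0)    # large: task A has been forgotten

assert error_on_a_before < 1e-3
assert error_on_a_after > 1.0
```

An offline learner avoids this by shuffling both tasks into one dataset; a continual learner does not get that option, which is exactly the problem described above.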

All of the above would be heavily influenced by the learning rule chosen. The system needs to account not just for the current observation but many others when learning continually.

3.3. Long-range dependencies

Another important feature we would need is the ability to model long-range dependencies. The system should be able to draw conclusions or learn from observations sparsely distributed across long time spans. E.g. the system can draw conclusions about the effect of studying or not studying on passing exams, or of the choice of university program on quality of life in the future.

The system would need a good structure or architecture to learn long-range dependencies efficiently.

4. AGI Safety: Motivation and Value Alignment

The system needs a value system that makes it what it is: a set of predispositions the system has, which are where its goals/inclinations come from. The system will most likely not have a single objective that it is trying to maximize. Even though there may be a single objective that can represent the lifetime of the Agent, finding said objective may be too expensive. It will be beneficial to assume there is no ultimate objective. Instead, the system comes up with its own goals and alters them as needed.

Agents have the ability to set goals, but why do they set goals? Agents set goals because of their motivation. Motivation is like an emergent characteristic from other characteristics. An Agent running on electricity will be motivated by activities that help keep its electrical components alive.

The hope for alignment is that we have an Agent that has an innate predisposition to be social (with humans). An Agent with this predisposition will be "kind of" like a human child that we can nurture and teach values to. If true, then the alignment problem can be reduced to a human alignment problem.

Innate predisposition to be social (with humans)

This may be easier than we think. It is safe to assume that a generally intelligent Agent is interested in learning and novelty. Highly dynamic systems generate novelty, and humans and human society form a highly dynamic system in which the Agent can take interest.

(super) alignment problem

We could also get away with not worrying about ASI, as I believe there would be a lot of bottlenecks that would disrupt the so-called "intelligence explosion". The one I often consider is that learning isn't just about learning individual concepts but the relations between concepts; this means learning is fundamentally a combinatorial problem. Having an intelligence explosion in this kind of setup would not do much, as it would take just as many resources, or more, to move the needle significantly beyond previous generations. What would probably help more is the number of intelligent Agents.

Appendix

A.1. Resources

Below are some resources that I have learned from or been inspired by; going through them may give you some insights into what I believe and how I think about the problem of intelligence. This is by no means an exhaustive list, but it is a good starting point.