🌐 This is an English translation.
This article was translated from the Japanese original. Read it here: シンギュラリティは2045年じゃない——5つのスパイク点から特異点を読み直す.
Introduction

Since childhood, I have had a habit of thinking with both wheels at once: deduction and induction. I snag on whatever everyone calls “obviously so.” Most of the time, I arrive at a different answer. But that method was heavy. Running two circuits at once, I took a long time to reach a conclusion. Everyone around me had long moved on, while I alone was still thinking.
At some point, that changed. I see a phenomenon and intuit, “this is where it bites.” That hunch is often right. But I have never once been able to explain how that hunch works. The old heaviness was gone. Without noticing, I was using something. What it was stayed a mystery for a long time.
That mystery was solved recently. The trigger was — AI.
As I kept asking how LLMs work, I noticed something strange. What happens inside the machine has the same shape as what I had been doing, unconsciously, inside my own head.
Before I get to that, I want to start from a bigger question. “The singularity is coming in 2045” — the line everyone says. Does it really arrive as such a “point”?
— This question comes back to me at the very end.
Where the Number “2045” Came From — Auditing the Conceptual History

I start by doubting the word “singularity.”
The first to put “intelligence explosion” into a paper was the mathematician I. J. Good. In 1965, he wrote that an ultraintelligent machine could design even better machines — that this means an intelligence explosion, leaving human intelligence far behind. The first ultraintelligent machine, he said, could be the last invention humans ever need to make. This was 1965.
Twenty-eight years later, in 1993, the computer scientist and science-fiction author Vernor Vinge published a paper declaring that the technological singularity would arrive by 2030. Ray Kurzweil’s The Singularity Is Near came out twelve years after that, in 2005.
The three definitions differ subtly. Good: the self-improvement loop of an ultraintelligent machine. Vinge: the unpredictability caused by the emergence of greater-than-human AI. Kurzweil: the merger of AI and humans.
Kurzweil uses two numbers, “2045” and “2029,” for different things. 2029 is reaching AGI (Artificial General Intelligence). 2045 is the state of brain–AI fusion. The two are often conflated. Reading it as “nothing changes until 2045” is a misreading of the concept.
Candidate A: 2006–2012 — The GPU Revolution as “Ground”

Before discussing spike points, I have to talk about the ground.
Geoffrey Hinton, together with David Rumelhart and Ronald Williams, put backpropagation into a paper in 1986. The theory existed in 1986. Why did it not run at scale until 2012? Above all because CPUs could not keep up with the computation.
Training a neural network is a process of repeating matrix multiply-and-accumulate operations millions of times. A CPU is a device that does complex tasks quickly, in sequence. A GPU, by contrast, evolved for games as a device that does simple tasks in massive parallel — because it has to compute millions of pixels on screen simultaneously. This massive parallelism of simple operations meshed exactly with the matrix operations of neural networks.
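To make the multiply-and-accumulate point concrete, here is a minimal sketch in plain Python. The `matmul` helper and the numbers in `inputs` and `weights` are illustrative assumptions, not taken from any real network. What it shows is that every output cell is its own independent chain of multiply-and-add steps, which is the property a GPU exploits.

```python
def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix stored as nested lists."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            # One output cell = one multiply-and-accumulate chain.
            # No cell reads any other cell, so all n * m cells could,
            # in principle, be computed at the same time.
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            out[i][j] = acc
    return out


# Hypothetical toy layer: two input examples, two features, three output units.
inputs = [[1.0, 2.0], [3.0, 4.0]]
weights = [[0.5, -1.0, 0.0], [0.25, 0.0, 2.0]]
activations = matmul(inputs, weights)
print(activations)
```

Here the loops run one after another on a CPU. On a GPU, the two outer loops are what gets spread across thousands of cores.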
In November 2006, NVIDIA announced CUDA. Short for Compute Unified Device Architecture, it is a foundation that lets a GPU be used as a general-purpose parallel computer rather than graphics-only. The slogan “100× over the conventional” was, for matrix operations at least, no exaggeration.
Without this tool, none of the architectures that followed would have run. Architecture and the compute foundation cannot be discussed in isolation from each other.
Candidate B: 2012 — AlexNet and the Proof That “Deep Learning Works”

In September 2012, AlexNet appeared at the image-recognition competition ILSVRC. This CNN, developed at the University of Toronto by Krizhevsky, Sutskever, and Hinton, won decisively, beating conventional methods by roughly ten points or more in error rate.
Let me organize the terms here. Machine learning is the umbrella term for methods that learn patterns from data without explicit programming. Learning from labeled data is “supervised learning”; finding structure in unlabeled data is “unsupervised learning”; generating pseudo-answers from the data itself to learn from is “self-supervised learning.” Deep learning refers to a machine-learning method that uses multi-layer neural networks. AlexNet conquered image recognition with “deep learning + supervised learning.” The LLMs that appear later are pre-trained with “self-supervised learning,” predicting the next word from vast amounts of text.
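The phrase “generating pseudo-answers from the data itself” can be sketched in a few lines. This is a simplified illustration: the helper name `next_word_pairs` and the `context_size` parameter are hypothetical, and splitting on whitespace stands in for the subword tokenization real LLMs use.

```python
def next_word_pairs(text, context_size=3):
    """Turn raw text into (context, target) training pairs.

    The "label" for each example is simply the word that follows the
    context in the text itself, so no human annotation is involved.
    """
    words = text.split()
    pairs = []
    for i in range(1, len(words)):
        context = words[max(0, i - context_size):i]
        pairs.append((context, words[i]))
    return pairs


pairs = next_word_pairs("the cat sat on the mat")
for context, target in pairs:
    print(context, "->", target)
```

Every sentence becomes several labeled examples at no labeling cost, which is why pre-training can use vast amounts of text.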
More important than the numbers is the “discontinuity in quality.” Until then, in image recognition a human designed the features and handed them to the algorithm. AlexNet was different. Running on GPUs, it discovered features by itself from large amounts of data. Recognition learned by the machine beat recognition designed by hand.
AlexNet split the model across two GTX 580 GPUs to train it. It was too big to fit on one. The conventional wisdom of the time — “known in theory but does not run at scale” — was overturned from that day.
It looks like a sudden mutation. But AlexNet happened when Hinton’s thirty years, the CUDA foundation, the large dataset called ImageNet, and the practical use of ReLU all came together. The person who built ImageNet was Fei-Fei Li. She launched the project in 2007 and released it in 2009, out of a conviction that “the algorithms were not working; for machines to see, a data-driven approach was needed.” She appears once more later.
Candidate C: 2017 — The Transformer Was a Leap of “Subtraction,” Not “Addition”

Google’s 2017 paper “Attention Is All You Need” is a paper whose essence cannot be captured by technical explanation alone.
“Process in order” — a halt of thought
In 2017, processing sequential data meant “process it in order (recurrence)” as a self-evident premise. Text is read front to back. A word’s meaning is determined by the immediately preceding context. So processing, too, must be sequential — this premise was not even a choice. It was a halt of thought, a “that’s just how it is.”
The RNN (Recurrent Neural Network) implemented that premise. Process token 1, then token 2. To reach token 1000 you pass through 999 steps. It is inherently sequential. Even if you want a GPU to parallelize it, you cannot move to the next step until the previous computation finishes. Thousands of cores sit idle.
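The sequential bottleneck can be seen in a toy version of the recurrence. This is a sketch under simplifying assumptions: a single hidden unit, and fixed hypothetical weights `w_in` and `w_rec` where a real RNN would learn whole matrices.

```python
import math


def rnn_forward(tokens, w_in=0.5, w_rec=0.9):
    """Run a one-unit toy recurrent network over a sequence of numbers."""
    hidden = 0.0
    states = []
    for x in tokens:
        # The new state needs the previous state, so step t cannot
        # start until step t - 1 has finished.
        hidden = math.tanh(w_in * x + w_rec * hidden)
        states.append(hidden)
    return states


states = rnn_forward([1.0, 0.0, -1.0, 2.0])
print(len(states), states[-1])
```

The loop body is cheap. The problem is the loop itself: each iteration waits on the previous one, so extra cores have nothing to do.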
The structure that dropped the premise — self-attention
The Transformer saw through that premise as unnecessary and dropped it. The self-attention mechanism references all tokens in the sequence at once. Token 1 and token 999 can be processed simultaneously. This is implemented as large-scale matrix multiplication — and matrix multiplication is exactly what GPUs are best at.
The mechanism works like this. Each token is converted, via learned weight matrices W (Weight), into three vectors — Query (what it is looking for), Key (what it is), and Value (the information it holds). The similarity between one token’s Query and all tokens’ Keys is computed, and the Values of the most similar tokens are aggregated with greater weight. “Which word should attend to which word” — the criteria for that judgment are all carved into the weight matrices W. W is a mass of numbers grown from training data, and it is the substance of an LLM’s “knowledge.”
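The Query, Key, and Value steps above can be sketched as a single attention head in plain Python. This is an illustration under assumptions, not a production implementation. The softmax and the division by the square root of the key dimension follow the scaled dot-product formulation in “Attention Is All You Need”, and the identity weight matrices and toy token vectors are hypothetical placeholders for learned values.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention over all tokens at once (no masking)."""
    q = matmul(x, w_q)  # what each token is looking for
    k = matmul(x, w_k)  # what each token is
    v = matmul(x, w_v)  # the information each token holds
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    # scores[i][j]: how strongly token i's Query matches token j's Key.
    scores = [[s / scale for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]
    # Each output row is a weighted mix of every token's Value.
    return weights, matmul(weights, v)


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy token vectors
identity = [[1.0, 0.0], [0.0, 1.0]]            # placeholder for learned W
weights, outputs = self_attention(tokens, identity, identity, identity)
for row in weights:
    print([round(w, 3) for w in row])
```

No step in `self_attention` loops over positions in order. Everything is expressed as matrix products over the whole sequence.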
Precisely because it discarded (subtracted) sequential processing, all tokens could be processed in parallel, and that meshed with the GPU’s parallelism. This fit between architecture and hardware is the direct structural reason for today’s LLM explosion.
The breakthrough did not happen through addition. It happened through subtraction — seeing through a premise everyone believed was necessary and dropping it. GPT, BERT, ChatGPT, Claude, Gemini — the direct ancestor of every current LLM is the Transformer. On this single point, 2017 becomes the strongest candidate for a spike point.
Candidate D: 2020 — The Discovery That “Intelligence Emerges” When You Scale

In May 2020, OpenAI published the GPT-3 paper, and in June it opened the model to developers through an API (GPT stands for Generative Pre-trained Transformer). It had 175 billion parameters.
More important than the number is the “scaling law” that GPT-3 bore out. The bigger the model, the more data, and the more compute, the more predictably capability improves. And beyond a certain scale, “emergence” occurs: abilities absent in smaller models suddenly appear.
Emergence corresponds to a phase transition in which quantitative change converts into qualitative change. When scale crosses a threshold, abilities in reasoning, coding, and translation appear abruptly. The overall loss curve can be predicted, but which ability appears at which scale cannot. This is why it “looks like a sudden mutation.” After GPT-3, AI development shifted from “research” to a “scaling game.”
There is an important distinction, though. The scaling law is not a vibe that “more makes it smarter.” It is a mathematical relationship in which loss decreases as a power law against compute, parameter count, and data size, each separately. OpenAI’s Kaplan et al. formalized this in a 2020 paper. With this law established, AI development changed from “experiment” to “engineering.” “How much smarter the next model will be” became, to some degree, calculable in advance.
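What “decreases as a power law” means in practice can be sketched as follows. The functional form, loss proportional to (N_c / N) raised to a small exponent, is the general shape used for such laws. The constants `n_c` and `alpha` below are illustrative assumptions and not fitted values from any paper.

```python
def power_law_loss(n, n_c=1.0e13, alpha=0.08):
    """Loss as a power law in one resource n (for example, parameter count).

    n_c and alpha are hypothetical constants chosen only for illustration.
    """
    return (n_c / n) ** alpha


for n in (1e8, 1e9, 1e10, 1e11):
    print(f"{n:.0e} parameters -> loss {power_law_loss(n):.3f}")

# Each tenfold increase multiplies the loss by the same constant factor.
ratio = power_law_loss(1e10) / power_law_loss(1e9)
print(round(ratio, 4))
```

The constant ratio is the point. If the curve holds, the payoff of the next tenfold increase can be estimated before the money is spent.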
Capital flowing in was inevitable. If a predictable return on investment exists, it becomes a business.
Candidate E: November 2022 — The Day It “Became Visible”

On November 30, 2022, OpenAI quietly released ChatGPT. It reached an estimated 100 million monthly active users in about two months, and was reported as among the fastest adoptions in internet history.
Technically it is no more than a fine-tuned model based on GPT-3.5 (fine-tuning means additionally training a pre-trained model to adapt it to a specific use). Specifically, it combines “instruction tuning” — adjusting it to respond to human instructions in natural dialogue — with reinforcement learning from human feedback (RLHF). The model’s own technical novelty is small. But the fact that “a conversational AI ordinary users could use was released to the world” carries a meaning beyond technical novelty. What had been visible to researchers for some time became, on this day, visible to everyone in the world. This was not a leap in technology but a critical point of perception.
The numbers are symbolic. Instagram took 2.5 months to reach one million users. Spotify took five months, Dropbox seven. ChatGPT took five days.
But “a leap in technology” and “penetration into society” are separate events. ChatGPT’s technical essence is the instruction fine-tuning of GPT-3.5, and it had been running inside OpenAI since the previous year. The world did not change. The world “became able to see.” That is the meaning of that day.
What only researchers and engineers could see became, after November 30, visible to anyone. What crossed the threshold was not technology but perception.
Sorting the Five Candidates — the Answer Changes With What You Call a “Spike Point”
| Candidate | Year | Event | What it changed |
|---|---|---|---|
| A | 2006–2012 | CUDA / GPU revolution | The compute foundation. The “ground” that turned theory into reality |
| B | 2012 | AlexNet | Proof that deep learning “works” |
| C | 2017 | Transformer | Discarding the “sequential processing” premise. The structure that made scaling possible |
| D | 2020 | GPT-3 | Proof of scaling laws and emergence. The establishment of capital logic |
| E | 2022 | ChatGPT | The critical point of perception where “everyone could see” |

Which is the spike point? It depends on how the question is defined. If you ask for computational-physical causation, A. If you ask for the shift in technological paradigm, or for the direct origin of today’s LLM explosion, C. If you ask for the starting point of social change, E. Next, I question the very intuition that there is only one spike point.
From this sorting, one thing becomes visible: the question “which is the real spike point” may itself rest on a mistaken premise.
Candidates A through E are each turning points on different axes. The axis of the compute foundation, the axis of architecture, the axis of the logic of scale, the axis of social deployment. These are not on one and the same line. Trying to narrow “where is the trigger” to a single point is like asking which of five separate rivers is “the cause of the sea.”
“Mutation or Inevitability” — Timescale Is the Key

On the surface, each turning point looks like a sudden mutation. AlexNet’s margin was too large to read as ordinary year-over-year progress. ChatGPT, too, was foreseen by few even a week before.
But pause. Hinton put backpropagation into a paper in 1986 and kept researching through the winter years. AlexNet is the crystallization of those thirty years. The attention mechanism existed before the paper. What “Attention Is All You Need” contributed was the shift in recognition that the RNN could be dropped. With scaling laws, too, the sense that bigger models are smarter existed before; Kaplan et al. formalized it as a mathematical relationship, and GPT-3 demonstrated it at scale.
Whether an event reads as a mutation or as an inevitable consequence depends on the timescale. In the short term (1–3 years) it looks like a mutation. In the medium term (10–20 years) it looks like an inevitable consequence. In the long term (50–100 years) it looks like a mutation again: within human history, a mere instant.
The dichotomy of “mutation or inevitability” is a problem of how the question is framed. More precisely: “abrupt in the short term, but structured as an inevitable consequence in the medium term.” Locally it is discontinuous. Seen as a whole, it stays within what the laws predict.
Doubting the Premise That It “Arrives as a Single Threshold”

This is the core of the piece. One is tempted to conclude, “the singularity has already begun; the spike points are 2017 and 2022.” But that is no more than “moving the point called 2045 forward.” It still leaves unquestioned the premise that “the singularity arrives as a single threshold.”
Let me re-ask. Can “intelligence” be measured on a single axis? LLMs already surpass humans in some abilities: memory capacity, speed, fluency, breadth of knowledge. In others they have not: judgment rooted in bodily experience, the metacognition to be aware of one’s own errors, real-time adaptation to the physical world.
Symbol grounding — the fault line between “knowing” and “understanding”
Cognitive science and the philosophy of AI have a concept called the “symbol grounding problem.” The meaning of a symbol is not complete within symbols alone. Unless it is grounded in sensorimotor experience, meaning is no more than circular reference. An LLM knows the word “hot.” But it has never felt heat on skin.
This is not to say LLMs are inferior. It is that the intelligence of LLMs and the intelligence of humans may not be on the same axis. If intelligence is multiple independent axes, then framing the question as “one singularity is coming” does not hold. A picture closer to reality is this: axis by axis, at separate moments, asynchronously, AI surpasses humans.
The axis of computational power was surpassed in the 1960s. Memory and retrieval were surpassed by search engines. Fluent text generation was surpassed in 2022. The axis of “understanding grounded in bodily experience” is — still ahead of us even now. This picture is far closer to reality than “crossing a single threshold in 2045.”
Fei-Fei Li as a “Skewer”

There is a person tackling this current state of the “ungrounded LLM” head-on: Fei-Fei Li. Her trajectory runs through the question of this piece like a single skewer.
In 2007, in an era when the algorithms still did not work, she began building the large visual dataset “ImageNet.” It was released in 2009. AlexNet was trained on it in 2012 and pulled the trigger on the deep-learning revolution. She is the one who made the ground.
And now she says, “the LLM is not the final form of AI.” In early 2024 she founded World Labs. The goal is “spatial intelligence” — realizing AI’s ability to understand, reason about, and interact with the 3D physical world. In November 2025 she released the commercial version, “Marble” — a model that generates 3D worlds from images or text. “Our dream of a truly intelligent machine cannot be completed without spatial intelligence” is where she stands now.
Past (ImageNet, the data revolution), present (the ungrounded LLM), future (going after grounding with world models) — Fei-Fei Li embodies these three points in one person.
An Answer to “Has the Singularity Already Begun?”

What is happening as of 2026? The market caps of AI companies have grown to rival the world’s largest. LLMs are in practical use in medicine, law, coding, and translation. The process of “AI designing AI” has begun, in a limited way. Several AI industry leaders have begun making statements that move the arrival of the singularity earlier than before. Hinton, called the “godfather” of AI, left Google in 2023 and publicly warned of the dangers of AI technology.
It is not only the world of text that changed. Transformer components spread to image generation (Stable Diffusion, DALL-E), music generation (Suno), and video generation (Sora, Runway), often combined with other techniques such as diffusion. Text, image, audio, video: increasingly, every modality is handled as tokens and generated. The name “language model” no longer reflects the reality.
If you define the “singularity” as a single threshold, the question itself does not fit reality. If you define it as “the process by which AI irreversibly accelerates the change of human society,” it has already begun.
Skepticism has its grounds too. An LLM is a probabilistic text generator, and there is still no true self-improvement loop. The symbol grounding problem is unsolved. Between AGI and current LLMs there remains a structural discontinuity. Even so, the claim that “it has not already begun” is the harder one to explain.
The Structure of a Slope — Conclusion

The singularity is not a single point but the structure of a long slope that multiple axes cross in steps — that is this piece’s position. Spike points exist in plurality, axis by axis. The compute foundation in 2006–2012, architecture in 2017, capital logic in 2020, social deployment in 2022. The spike point on the axis of “bodily grounding” has not arrived yet. These are not one event. They are different faces of the same process surfacing at different times. The number “2045” is no more than a point somewhere further up that slope. The slope began long ago.
What the machine’s mirror reflected
Having written this far, I return to the mystery at the start. Why were my hunches right? Only after learning how LLMs work did it finally become words.
I had been carving the world into units of meaning. I was tokenizing. The moment I saw a phenomenon, attention went straight to where it bites. Distant elements connected regardless of distance. Attention was at work. And I took in the structure at a glance, out of order. Rather than stacking “if A then B, if B then,” I saw the whole at once. Without noticing, I had dropped sequential processing.
Just as the Transformer leapt by removing the “process in order” premise, I too had, unknowingly, removed the same premise. I had been doing subtraction unconsciously.
The weights of those unconscious judgments — why attend there, why carve it that way — had been optimized on their own over a long time, without ever becoming language. Just as a machine grows its weights from data, mine too had grown, at some point, on the training data of a life.
Inside the machine, that criterion exists as the weight matrices W — a vast lattice of numbers. Rewritten little by little with each round of training, it decides how much to attend to which input. But what each one of those numbers means, no one can explain. In the world of AI this is called the problem of explainability. An LLM’s judgment is not “unexplainable.” It does not exist in an explainable form. The answer is in the weights. But it is numbers, not words.
My hunches were the same. When asked why I attend there and cannot answer, it is not because I forgot the answer or am hiding it. It is because, from the start, it exists only in the form of weights, not language.
Trying to understand AI, what I understood was myself. Only reflected in the machine’s mirror did I learn what I had been doing.
One thing, though, did not reflect in the mirror: the vantage point looking into that mirror itself. Running several lines of thought and viewing them from one level above — for this overview alone, I found no counterpart inside the machine.
Understood only in hindsight
The answer to the opening mystery — why my hunches are right — did not, in the end, arrive as a “point.” The axis of subtraction, the axis of tokenizing, the axis of attention had each crossed steps at separate moments, unnoticed. None was acquired in a single day. It was not a point but a slope. That I had finished climbing that slope, I did not even notice until I looked into the mirror called AI.
The singularity is probably the same. The day someone shouts “it’s here” will not come. Only in hindsight do we understand, “that was partway up the slope.” That feeling of ChatGPT having “been there before you knew it” is proof you had already set foot on that slope.
References
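- Vaswani et al., “Attention Is All You Need” (2017), arXiv
- Rumelhart, Hinton, Williams, “Learning representations by back-propagating errors” (1986), Nature
- Krizhevsky, Sutskever, Hinton, “ImageNet Classification with Deep Convolutional Neural Networks” (2012), NeurIPS
- Fei-Fei Li, “Building Spatially Intelligent AI”, Radical Ventures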