On Artificial Intelligence
How do you think the brain works? Do you believe human action and thought can one day be explained entirely through physics, or is there some irreducible phenomenon—something beyond material explanation—that gives rise to awareness itself?
Artificial intelligence confronts this question not through metaphysics but through engineering. By building systems that process information as the brain appears to do, it tests whether cognition can emerge from mathematics operating on data.
The purpose of this essay is not to refute one theory or another. It is to weigh how well each theory is currently supported by the evidence.
I lean toward the view that consciousness can ultimately be explained as an emergent property of the biology of the brain: physical, biological processes.
There is simply far more evidence for that theory. The only case for a non-physical theory is negative: the absence, so far, of a complete account of the material basis of consciousness, and the lack of even a plausible theory of how consciousness emerges from the biological brain.
This conclusion does not dismiss the possibility of a non-material explanation.
But absent any concrete basis for discussing one, this essay focuses on the material theory.
So let us start at the beginning.
Shannon’s Information Theory and the Brain
Claude Shannon’s information theory provided the intellectual breakthrough that made the pursuit of a biological basis for all intelligence and human consciousness thinkable.
Shannon proposed that information is the measurable reduction of uncertainty. In his framework, data—symbols and signals—are valuable not because of their content but because of their order and structure.
Communication occurs when a sequence of symbols reduces ambiguity in the receiver’s mind. Information, then, is not an abstract property of thought but a quantifiable entity in physical systems. Every transmission, from neurons to networks, becomes analyzable as a transformation that increases order and encodes pattern.
So Shannon provides a step on which to stand.
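Shannon's central quantity, entropy, is easy to make concrete. The sketch below is illustrative Python (not drawn from any particular source): it computes the entropy of the symbol distribution in a message, showing that a repetitive message reduces no uncertainty while a varied one carries more bits per symbol.

```python
import math
from collections import Counter

def entropy_bits(message: str) -> float:
    """Shannon entropy of the symbol distribution in a message, in bits per symbol."""
    counts = Counter(message)
    total = len(message)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# A perfectly repetitive message resolves no uncertainty at all;
# eight equally likely symbols need log2(8) = 3 bits each.
print(entropy_bits("aaaaaaaa"))  # 0.0
print(entropy_bits("abcdefgh"))  # 3.0
```

This is the sense in which information is "the measurable reduction of uncertainty": the number printed depends only on the statistical structure of the symbols, not on what they mean.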
Oliver Sacks and Localization of Brain Functions
After Shannon, the biological parallel emerges in neuroscience: the brain is an extraordinary information-compressing device. British neurologist Oliver Sacks deepened our understanding of brain function through meticulous study of patients with injuries and neurological disease. By documenting cases of aphasia, visual agnosia, phantom limb, and the mass catatonic states of the post-encephalitic patients he described in Awakenings, Sacks revealed that discrete regions of the brain are devoted to particular high-level functions. Clinical neuropsychology, informed by evidence of single and double dissociations, demonstrates how damage to areas like the fusiform gyrus impairs face recognition while sparing other abilities. These case studies, reinforced by functional MRI, PET, and direct brain recordings, helped establish that consciousness is likely emergent—arising from the integrated activities of specialized modules across the cortex. Sacks’s narrative studies, and work by Alexander Luria and others, strengthened the conviction that mind is fundamentally rooted in material networks and patterns, not in irreducible essence or “magic”.
The Issue of Compression
The brain receives a vast torrent of data from the senses and internal states, filtering, prioritizing, and compressing it into manageable statistical representations. Visual, auditory, and somatic information is mapped onto interconnected high-dimensional networks before being projected onto lower-dimensional manifolds relevant for perception and decision-making. This capacity for dimensionality reduction is indispensable: decision circuits in the cortex and basal ganglia enable complex behaviors by extracting salient features from extensive neural populations.
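One way to make dimensionality reduction concrete is principal component analysis. The Python sketch below is purely illustrative (the simulated "population responses" are random numbers, not neural data): fifty-dimensional signals that actually vary along only two latent directions are projected down to two dimensions with almost no loss of structure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate 200 "population responses" in 50 dimensions that actually
# vary along only 2 latent directions, plus small noise.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 50))
responses = latent @ mixing + 0.05 * rng.normal(size=(200, 50))

# PCA via SVD: project the centered data onto its top two principal components.
centered = responses - responses.mean(axis=0)
_, singular_values, components = np.linalg.svd(centered, full_matrices=False)
reduced = centered @ components[:2].T  # 50-D -> 2-D

variance = singular_values ** 2
explained = variance[:2].sum() / variance.sum()
print(f"{explained:.1%} of the variance captured in 2 of 50 dimensions")
```

The point of the example is the ratio printed at the end: when the underlying structure is low-dimensional, nearly all of the behaviorally relevant variation survives a drastic projection, which is exactly the economy the cortex appears to exploit.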
LLMs: No Built-in Definitions, Only Statistical Patterns
Large language models do not possess built-in definitions of words. Instead, each word or token is mapped to a high-dimensional numeric vector, called an embedding, which stands not for the dictionary meaning but for the statistical relationships and contextual associations accrued during training. LLMs identify patterns in the sequential and spatial associations of tokens—vector substitutes for words—by detecting recurring local arrangements and correlations in massive datasets. Through training, the network learns to represent nuanced linguistic structures solely through manipulations of these vectors, such as dot products and weighted sums. Meaning emerges not from explicit definitions, but from the subtlety of association and pattern recurrence in data.
The dot product of two vectors can be used to compute the angle between them, even for very high-dimensional vectors that cannot be visualized: the closer the angle is to zero, the more similar the vectors. Since the vectors represent tokens, mostly words, matrix algebra is a central tool when an LLM uses mathematics to predict the next word in the inference it generates while processing a user's prompt.
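This geometric idea can be sketched in a few lines. The toy four-dimensional "embeddings" below are hypothetical values chosen for illustration; real models learn vectors with hundreds or thousands of dimensions.

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """cos(theta) between two vectors: near 1.0 means a near-zero angle."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical toy embeddings: "king" and "queen" point in similar
# directions; "apple" points somewhere nearly orthogonal to both.
king  = np.array([0.9, 0.8, 0.1, 0.0])
queen = np.array([0.8, 0.9, 0.1, 0.1])
apple = np.array([0.0, 0.1, 0.9, 0.8])

print(cosine_similarity(king, queen))  # close to 1: small angle, similar tokens
print(cosine_similarity(king, apple))  # near 0: nearly orthogonal, unrelated
```

Dividing the dot product by the vector lengths is what turns it into a pure measure of angle, so that similarity of direction, not magnitude, stands in for similarity of meaning.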
Transformers: Breakthrough Technology in Token Processing
Transformers have revolutionized the usefulness of LLMs by addressing the limits of earlier architectures like recurrent neural networks (RNNs) and convolutional neural networks (CNNs). Unlike those models, which process data sequentially or with fixed local connections, transformers leverage self-attention mechanisms that allow every token to attend to every other token in the input sequence, regardless of position. This enables the model to capture long-range dependencies and global contextual information crucial for complex language understanding.
In terms of sequence, transformer layers are typically stacked after the embedding layer of a neural network. When a text input is received, each token is first mapped into an embedding vector. Next, these vectors are processed by a stack of transformer layers—in most modern LLMs, many dozens of them—in which the core self-attention computation is repeated, interleaved with position-wise feed-forward layers and normalization steps. Within each transformer layer, the self-attention mechanism calculates context-dependent weights for every token pair, generating a new set of context-enriched vectors that encode pattern relationships across the entire input. The network builds up progressively richer representations at each layer, culminating in final vectors that inform token predictions or output generation. The transformer stack is responsible for almost all the high-level composition and reasoning in state-of-the-art LLMs.
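The core self-attention computation can be sketched in a few lines. In the sketch below the weight matrices are random stand-ins for learned parameters, and it shows a single attention head without the feed-forward and normalization steps, so it is a schematic of the mechanism rather than a working transformer layer.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    """Row-wise softmax: turns raw scores into weights that sum to 1."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X: np.ndarray, Wq: np.ndarray, Wk: np.ndarray, Wv: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product self-attention.

    X: (seq_len, d_model) matrix of embedding vectors, one row per token.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # every token scores every other token
    weights = softmax(scores)        # context-dependent weights per token pair
    return weights @ V               # context-enriched vector for each token

rng = np.random.default_rng(0)
d_model, d_head, seq_len = 8, 4, 5
X = rng.normal(size=(seq_len, d_model))            # 5 token embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))

out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 4): one context-enriched vector per token
```

Note that nothing in the computation depends on how far apart two tokens are: every token attends to every other token in one matrix multiplication, which is exactly the property that lets transformers capture long-range dependencies in parallel.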
Transformer architectures are thus critical: they enable models to process information in a massively parallel, non-sequential fashion and facilitate previously unattainable feats in translation, summarization, reasoning, and generation. Their ability to learn context and pattern from the order and association of tokens surpasses previous designs, making them the engine for modern advances in AI.
Neural Compression and Dimensionality Reduction
Both biological brains and deep neural networks exhibit sophisticated dimensionality reduction. The brain’s layered architecture, through filtering sensory and internal signals, reduces the dimensionality of perception, synthesizing abstract, behaviorally relevant models from vast input arrays. Transformer-based LLMs similarly compress high-dimensional token embeddings into tractable, contextual representations, harnessing mathematical constructs like dot products, projections, and normalization to isolate salient, generalizable patterns.
Emergence of Consciousness and Pattern Extraction
Ultimately, Sacks’s clinical documentation, combined with advances in brain imaging and computational neuroscience, reinforces the idea that consciousness is not localized, but emerges from the dynamic organization of distributed neural patterns. Pattern extraction—whether in brains or machines—occurs by filtering noise, compressing data, and generating order. The shared mathematical strategy of dimensionality reduction links biological and artificial intelligence: structured representation from chaos.
Whether this process is sufficient to explain consciousness is still debated. But the convergence of physics, information theory, neuroscience, and advanced machine learning brings us closer to understanding how both brains and machines produce intelligence from patterns in data.