Awni Altabaa

Kline Tower, Office 1117

219 Prospect St

New Haven, CT 06511

Hi! Welcome to my homepage.

My name is Awni. I am a PhD student in the Department of Statistics & Data Science at Yale University studying the foundations of machine learning. My wonderful advisor is Prof. John Lafferty.

My research interests lie broadly in the intersection of machine learning, statistics, and computer science. More specifically, my research aims to study questions of the following flavor:

What are the architectural mechanisms and inductive biases necessary for efficient learning and strong generalization in different domains?
What are the fundamental theoretical limits of what is or is not possible to learn under different learning paradigms?
To what degree can neural networks learn functions and algorithms that can generalize compositionally to out-of-distribution inputs?

Our work tackles these questions through complementary empirical investigation and theoretical analysis. My current research focus is on algorithmic generalization and reasoning in machine learning models.

selected publications

CoT Information: Improved Sample Complexity under Chain-of-Thought Supervision

Awni Altabaa, Omar Montasser, and John Lafferty

Under Review, 2025

Learning complex functions that involve multi-step reasoning poses a significant challenge for standard supervised learning from input-output examples. Chain-of-thought (CoT) supervision, which provides intermediate reasoning steps together with the final output, has emerged as a powerful empirical technique, underpinning much of the recent progress in the reasoning capabilities of large language models. This paper develops a statistical theory of learning under CoT supervision. A key characteristic of the CoT setting, in contrast to standard supervision, is the mismatch between the training objective (CoT risk) and the test objective (end-to-end risk). A central part of our analysis, distinguished from prior work, is explicitly linking those two types of risk to achieve sharper sample complexity bounds. This is achieved via the CoT information measure CoTInfo(ε), which quantifies the additional discriminative power gained from observing the reasoning process. The main theoretical results demonstrate how CoT supervision can yield significantly faster learning rates compared to standard E2E supervision. Specifically, it is shown that the sample complexity required to achieve a target E2E error ε scales as d/CoTInfo(ε), where d is a measure of hypothesis class complexity, which can be much faster than standard d/ε rates. Information-theoretic lower bounds in terms of the CoT information are also obtained. Together, these results suggest that CoT information is a fundamental measure of statistical complexity for learning under chain-of-thought supervision.
@article{altabaa2025cotinformation,
  title = {CoT Information: Improved Sample Complexity under Chain-of-Thought Supervision},
  author = {Altabaa, Awni and Montasser, Omar and Lafferty, John},
  year = {2025},
  eprint = {2505.15927},
  archiveprefix = {arXiv},
  primaryclass = {stat.ML},
  url = {https://arxiv.org/abs/2505.15927},
  journal = {Under Review},
}
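The rates quoted in the CoT Information abstract above can be put side by side. This is only a restatement of the abstract's scaling claims, not a result taken from the paper itself: the symbols m_CoT and m_E2E are notation introduced here for the two sample complexities, and constants and logarithmic factors are assumed to be suppressed.

```latex
\[
m_{\mathrm{CoT}}(\varepsilon) \;\asymp\; \frac{d}{\mathrm{CoTInfo}(\varepsilon)},
\qquad
m_{\mathrm{E2E}}(\varepsilon) \;\asymp\; \frac{d}{\varepsilon},
\qquad
\frac{m_{\mathrm{E2E}}(\varepsilon)}{m_{\mathrm{CoT}}(\varepsilon)}
\;\asymp\; \frac{\mathrm{CoTInfo}(\varepsilon)}{\varepsilon}.
\]
```

Read this way, the gain from CoT supervision is governed by the ratio CoTInfo(ε)/ε: the larger the CoT information is relative to the target error, the fewer samples are needed compared to end-to-end supervision.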
Disentangling and Integrating Relational and Sensory Information in Transformer Architectures

Awni Altabaa and John Lafferty

International Conference on Machine Learning (ICML), 2025

Relational reasoning is a central component of generally intelligent systems, enabling robust and data-efficient inductive generalization. Recent empirical evidence shows that many existing neural architectures, including Transformers, struggle with tasks requiring relational reasoning. In this work, we distinguish between two types of information: sensory information about the properties of individual objects, and relational information about the relationships between objects. While neural attention provides a powerful mechanism for controlling the flow of sensory information between objects, the Transformer lacks an explicit computational mechanism for routing and processing relational information. To address this limitation, we propose an architectural extension of the Transformer framework that we call the Dual Attention Transformer (DAT), featuring two distinct attention mechanisms: sensory attention for directing the flow of sensory information, and a novel relational attention mechanism for directing the flow of relational information. We empirically evaluate DAT on a diverse set of tasks ranging from synthetic relational benchmarks to complex real-world tasks such as language modeling and visual processing. Our results demonstrate that integrating explicit relational computational mechanisms into the Transformer architecture leads to significant performance gains in terms of data efficiency and parameter efficiency.
@article{altabaa2024disentangling,
  title = {Disentangling and Integrating Relational and Sensory Information in Transformer Architectures},
  author = {Altabaa, Awni and Lafferty, John},
  journal = {International Conference on Machine Learning (ICML)},
  year = {2025},
  eprint = {2405.16727},
  archiveprefix = {arXiv},
  primaryclass = {cs.LG},
}
On the Role of Information Structure in Reinforcement Learning for Partially-Observable Sequential Teams and Games

Awni Altabaa and Zhuoran Yang

Neural Information Processing Systems (NeurIPS), 2024

In sequential decision-making problems, the information structure describes the causal dependencies between system variables, encompassing the dynamics of the environment and the agents’ actions. Classical models of reinforcement learning (e.g., MDPs, POMDPs) assume a restricted and highly regular information structure, while more general models like predictive state representations do not explicitly model the information structure. By contrast, real-world sequential decision-making problems typically involve a complex and time-varying interdependence of system variables, requiring a rich and flexible representation of information structure. In this paper, we formalize a novel reinforcement learning model which explicitly represents the information structure. We then use this model to carry out an information-structural analysis of the statistical complexity of general sequential decision-making problems, obtaining a characterization via a graph-theoretic quantity of the DAG representation of the information structure. We prove an upper bound on the sample complexity of learning a general sequential decision-making problem in terms of its information structure by exhibiting an algorithm achieving the upper bound. This recovers known tractability results and gives a novel perspective on reinforcement learning in general sequential decision-making problems, providing a systematic way of identifying new tractable classes of problems.
@article{altabaaRoleInformationStructure2024,
  title = {On the {{Role}} of {{Information Structure}} in {{Reinforcement Learning}} for {{Partially-Observable Sequential Teams}} and {{Games}}},
  author = {Altabaa, Awni and Yang, Zhuoran},
  year = {2024},
  number = {arXiv:2403.00993},
  eprint = {2403.00993},
  primaryclass = {cs, stat},
  publisher = {arXiv},
  doi = {10.48550/arXiv.2403.00993},
  urldate = {2024-03-14},
  journal = {Neural Information Processing Systems (NeurIPS)},
  archiveprefix = {arxiv},
}
Approximation of Relation Functions and Attention Mechanisms

Awni Altabaa and John Lafferty

Under Review, 2024

Inner products of neural network feature maps arise in a wide variety of machine learning frameworks as a method of modeling relations between inputs. This work studies the approximation properties of inner products of neural networks. It is shown that the inner product of a multi-layer perceptron with itself is a universal approximator for symmetric positive-definite relation functions. In the case of asymmetric relation functions, it is shown that the inner product of two different multi-layer perceptrons is a universal approximator. In both cases, a bound is obtained on the number of neurons required to achieve a given accuracy of approximation. In the symmetric case, the function class can be identified with kernels of reproducing kernel Hilbert spaces, whereas in the asymmetric case the function class can be identified with kernels of reproducing kernel Banach spaces. Finally, these approximation results are applied to analyzing the attention mechanism underlying Transformers, showing that any retrieval mechanism defined by an abstract preorder can be approximated by attention through its inner product relations. This result uses the Debreu representation theorem in economics to represent preference relations in terms of utility functions.
@article{altabaaApproximationRelationFunctions2024,
  title = {Approximation of Relation Functions and Attention Mechanisms},
  author = {Altabaa, Awni and Lafferty, John},
  journal = {Under Review},
  year = {2024},
  number = {arXiv:2402.08856},
  eprint = {2402.08856},
  primaryclass = {cs, stat},
  publisher = {arXiv},
  doi = {10.48550/arXiv.2402.08856},
  urldate = {2024-03-14},
  archiveprefix = {arxiv}
}
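The two function classes named in the abstract above, the inner product of a multi-layer perceptron with itself and the inner product of two different perceptrons, can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration rather than the paper's construction: the helper names (`init_mlp`, `mlp`, `relation_matrix`), the two-layer tanh architecture, the layer sizes, and the random untrained weights.

```python
import numpy as np


def init_mlp(rng, d_in, d_hidden, d_out):
    """Random weights for a two-layer perceptron (illustrative, untrained)."""
    return (
        rng.normal(size=(d_in, d_hidden)) / np.sqrt(d_in),
        np.zeros(d_hidden),
        rng.normal(size=(d_hidden, d_out)) / np.sqrt(d_hidden),
        np.zeros(d_out),
    )


def mlp(params, x):
    """Apply the perceptron to a batch x of shape (n, d_in)."""
    w1, b1, w2, b2 = params
    return np.tanh(x @ w1 + b1) @ w2 + b2


def relation_matrix(params_left, params_right, xs):
    """R[i, j] = <f_left(x_i), f_right(x_j)> for every pair of inputs."""
    return mlp(params_left, xs) @ mlp(params_right, xs).T


rng = np.random.default_rng(0)
xs = rng.normal(size=(5, 3))      # five objects with three features each
phi = init_mlp(rng, 3, 16, 8)     # shared feature map
psi = init_mlp(rng, 3, 16, 8)     # second, different feature map

# Symmetric case: one network paired with itself gives a Gram matrix,
# which is symmetric and positive semi-definite by construction.
r_sym = relation_matrix(phi, phi, xs)

# Asymmetric case: two different networks, so R[i, j] != R[j, i] in general.
r_asym = relation_matrix(phi, psi, xs)
```

The sketch only shows the structural property each class has by construction; the approximation guarantees and neuron bounds are the subject of the paper.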
Learning Hierarchical Relational Representations through Relational Convolutions

Awni Altabaa and John Lafferty

Transactions on Machine Learning Research (TMLR), 2024

An evolving area of research in deep learning is the study of architectures and inductive biases that support the learning of relational feature representations. In this paper, we address the challenge of learning representations of hierarchical relations–that is, higher-order relational patterns among groups of objects. We introduce "relational convolutional networks", a neural architecture equipped with computational mechanisms that capture progressively more complex relational features through the composition of simple modules. A key component of this framework is a novel operation that captures relational patterns in groups of objects by convolving graphlet filters–learnable templates of relational patterns–against subsets of the input. Composing relational convolutions gives rise to a deep architecture that learns representations of higher-order, hierarchical relations. We present the motivation and details of the architecture, together with a set of experiments to demonstrate how relational convolutional networks can provide an effective framework for modeling relational tasks that have hierarchical structure.
@article{altabaaRelationalConvolutionalNetworks2023,
  title = {Learning Hierarchical Relational Representations through Relational Convolutions},
  shorttitle = {Relational Convolutional Networks},
  author = {Altabaa, Awni and Lafferty, John},
  year = {2024},
  journal = {Transactions on Machine Learning Research (TMLR)},
  publication = {https://openreview.net/forum?id=vNZlnznmV2},
  eprint = {2310.03240},
  archiveprefix = {arXiv},
  primaryclass = {cs.LG},
}
The Relational Bottleneck as an Inductive Bias for Efficient Abstraction

Taylor W. Webb, Steven M. Frankland, Awni Altabaa, Kamesh Krishnamurthy, Declan Campbell, Jacob Russin, Randall O'Reilly, John Lafferty, and Jonathan D. Cohen

Trends in Cognitive Sciences (TICS), 2024

A central challenge for cognitive science is to explain how abstract concepts are acquired from limited experience. This effort has often been framed in terms of a dichotomy between empiricist and nativist approaches, most recently embodied by debates concerning deep neural networks and symbolic cognitive models. Here, we highlight a recently emerging line of work that suggests a novel reconciliation of these approaches, by exploiting an inductive bias that we term the relational bottleneck. We review a family of models that employ this approach to induce abstractions in a data-efficient manner, emphasizing their potential as candidate models for the acquisition of abstract concepts in the human mind and brain.
@article{webbRelationalBottleneckInductive2023,
  title = {The Relational Bottleneck as an Inductive Bias for Efficient Abstraction},
  author = {Webb, Taylor W. and Frankland, Steven M. and Altabaa, Awni and Krishnamurthy, Kamesh and Campbell, Declan and Russin, Jacob and O'Reilly, Randall and Lafferty, John and Cohen, Jonathan D.},
  year = {2024},
  journal = {Trends in Cognitive Sciences (TICS)},
  publication = {https://www.cell.com/trends/cognitive-sciences/fulltext/S1364-6613(24)00080-9},
}
Abstractors and Relational Cross-Attention: An Inductive Bias for Explicit Relational Reasoning in Transformers

Awni Altabaa, Taylor Webb, Jonathan Cohen, and John Lafferty

International Conference on Learning Representations (ICLR), 2024

An extension of Transformers is proposed that enables explicit relational reasoning through a novel module called the Abstractor. At the core of the Abstractor is a variant of attention called relational cross-attention. The approach is motivated by an architectural inductive bias for relational learning that disentangles relational information from extraneous features about individual objects. This enables explicit relational reasoning, supporting abstraction and generalization from limited data. The Abstractor is first evaluated on simple discriminative relational tasks and compared to existing relational architectures. Next, the Abstractor is evaluated on purely relational sequence-to-sequence tasks, where dramatic improvements are seen in sample efficiency compared to standard Transformers. Finally, Abstractors are evaluated on a collection of tasks based on mathematical problem solving, where modest but consistent improvements in performance and sample efficiency are observed.
@article{altabaaAbstractorsRelationalCrossattention2023,
  publication = {https://openreview.net/forum?id=XNa6r6ZjoB},
  title = {Abstractors and Relational Cross-Attention: An Inductive Bias for Explicit Relational Reasoning in Transformers},
  shorttitle = {Abstractors and Relational Cross-Attention},
  author = {Altabaa, Awni and Webb, Taylor and Cohen, Jonathan and Lafferty, John},
  journal = {International Conference on Learning Representations (ICLR)},
  year = {2024},
  number = {arXiv:2304.00195},
  eprint = {2304.00195},
  primaryclass = {cs, stat},
  publisher = {{arXiv}},
  urldate = {2023-10-30},
  archiveprefix = {arxiv}
}
Decentralized Multi-Agent Reinforcement Learning for Continuous-Space Stochastic Games

Awni Altabaa, Bora Yongacoglu, and Serdar Yüksel

2023 IEEE American Control Conference (ACC), Mar 2023

Stochastic games are a popular framework for studying multi-agent reinforcement learning (MARL). Recent advances in MARL have focused primarily on games with finitely many states. In this work, we study multi-agent learning in stochastic games with general state spaces and an information structure in which agents do not observe each other’s actions. In this context, we propose a decentralized MARL algorithm and we prove the near-optimality of its policy updates. Furthermore, we study the global policy-updating dynamics for a general class of best-reply based algorithms and derive a closed-form characterization of convergence probabilities over the joint policy space.
@article{altabaaDecentralizedMultiAgentReinforcement2023,
  publication = {https://ieeexplore.ieee.org/abstract/document/10155828},
  title = {Decentralized Multi-Agent Reinforcement Learning for Continuous-Space Stochastic Games},
  author = {Altabaa, Awni and Yongacoglu, Bora and Y{\"u}ksel, Serdar},
  year = {2023},
  month = mar,
  journal = {2023 IEEE American Control Conference (ACC)},
}