Dynamical Internal Models

Bayesian Inference


Budapest University of
Technology and Economics

Wigner Research Centre for Physics,
Hungarian Academy of Sciences


Computational Cognitive Science

The general aim of computational cognitive science is to understand human cognition in engineering terms [1]. How can we formally describe what is learned by the human brain? Can we provide a mathematical formalisation of knowledge? In this sense, understanding the human brain means that we can replicate its computations, creating a machine with human-like cognitive abilities, at least in some aspects.

Figure 1. Levels of description by David Marr (1982). We distinguish three levels of understanding when studying the brain. Computational: what is the goal or function? What input-output relations should the brain compute? Algorithmic: how can we algorithmically solve these computations? How can we provide a full mathematical solution to the computational problem? Implementational: how can the neural system implement these algorithms?

In machine learning, recently there has been great progress in achieving human-level performance in various tasks, for example in object recognition or the game of Go [2]. However, humans utilize data at hand much more efficiently than current machine learning approaches. Children as young as three years old can readily learn names of categories (e.g. horses) and generalise these concepts to new objects from very few examples [3]. It is this key observation that renders humans an essential subject for artificial intelligence research as well (at least for now).

Perception and Learning

The world around us is inherently ambiguous: the information reaching our sensors is never sufficient to logically deduce the true state of the world (Figure 2). For example, different objects may look the same from certain angles, and occluded objects may go unnoticed altogether. Unsurprisingly, the number of models that could describe our world is infinite.

Figure 2. Ambiguity. We do not have direct access to the true state of the world, only insufficient evidence supporting multiple possibilities. We can distinguish among the possibilities relying on our previous experiences. From hypotheses that account equally well for our current sensory input, we select by looking at our prior beliefs about the hypotheses. In this case a cube wireframe is a 'more likely' object to appear than the other wireframes, hence the projection is interpreted as a cube by our brains.

How can we then select a model if there are infinitely many? A key to how humans could still learn meaningful models of the world is that they look for parsimonious explanations. Therefore, in order to build machines that can learn like humans, we need to find out what is a good measure for model complexity that enables efficient learning in this universe that we live in (in which we have the physical laws that we do).

Inference, that is, guessing the likely causes of our sensory inputs, can be formalised in probability theory, in particular as Bayesian inference. The basic idea is that, as cognitive beings, we weigh our hypotheses by how probable our observations are conditional on (assuming) each hypothesis. Human perception has been well characterised using ideal observer models that integrate incoming information with previously acquired knowledge according to the rules of Bayesian inference [5,6]. In this framework, the learned structure and regularities of the surrounding world, that is, the internal representations we use to make inferences, are formalised as probabilistic generative models. You can think of generative models as the physics and graphics engines of a video game: they provide a (possibly stochastic) algorithm for how your observations (video game frames) are generated from the underlying causes (scenes, objects, object interactions, etc.). Inference arises from inverting these forward models.
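
As a minimal sketch of this idea, the snippet below applies Bayes' rule to the wireframe example of Figure 2. The hypothesis names and all numbers are made up for illustration: both shapes are assumed to explain the projection equally well, so the prior alone decides the interpretation.

```python
def posterior(prior, likelihood):
    """Bayes' rule over a discrete set of hypotheses.

    prior[h]      : P(h), belief in hypothesis h before the observation
    likelihood[h] : P(observation | h)
    returns       : P(h | observation)
    """
    unnormalised = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalised.values())  # P(observation)
    return {h: p / evidence for h, p in unnormalised.items()}


# Hypothetical numbers: both shapes account for the 2D projection equally well,
# but cubes are assumed to be far more common than irregular wireframes.
prior = {"cube": 0.9, "irregular": 0.1}
likelihood = {"cube": 1.0, "irregular": 1.0}
post = posterior(prior, likelihood)
```

Because the likelihoods are equal here, the posterior simply reproduces the prior, which is the situation the figure describes.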

It has been demonstrated that these internal representations of individuals can be uncovered from behavioural data using newly developed machine learning techniques [7].

Research group

The work presented here was conducted in the Computational Systems Neuroscience Lab at the Wigner Research Centre for Physics, an institute part of the Hungarian Academy of Sciences in Budapest, Hungary. At the lab we focus on understanding neural computations from a normative learning perspective. We test these theories by analysing behavioural and electrophysiological measurements. Our work is a joint project with the Brain, Memory and Language Lab at Eötvös Loránd University. I am currently a PhD student enrolled in the Psychology Doctoral School in the Cognitive Science Department at the Budapest University of Technology and Economics under the supervision of Gergő Orbán and Dezső Németh.

I will be joining the Computation and Cognition Lab at Stanford led by Noah Goodman as a visiting student researcher in January 2019.

History and context of the research

Sequential predictions are ubiquitous in a learning agent’s existence. In order to devise efficient responses in a dynamic environment, one needs to build an internal representation of the latent dynamics of the environment. Humans have been shown to create dynamical models such as intuitive physics [4] that approximate the laws of Newtonian physics, and they are able to reason with their model by formulating new predictions or imagining hypothetical situations (e.g. what would happen to objects on a table if I kicked the table from a certain angle?). However, the structure of their internal model, namely Newtonian physics, is assumed. In another study, humans were shown to make probabilistic inferences about noisy stimuli using a two-state dynamical model in a simple task [8]. It remains elusive how a learning agent can acquire such complex knowledge about the dynamics of the environment.

Research goals, open questions

Subject-by-subject differences in temporal predictions resulting from variations in subjective internal models and individual learning paths have remained unexplored due to the immense difficulty of inferring subjective dynamical representations. Cognitive Tomography [7] has been proposed to discover static internal representations from discrete choices. We extend this method in two critical ways: (1) we aim to infer internal representations from a richer set of behavioural measures, specifically response times; (2) our goal is to infer a dynamical representation. We demonstrate its utility by predicting response times and choices of human participants in a probabilistic learning task on a trial-by-trial basis. Inferred behaviour-based trial-specific subjective predictions can be directly used to test theories of neural underpinnings of computations in physiological and imaging data. A peer-reviewed 4-page conference abstract on this project submitted to the Cognitive Computational Neuroscience Conference is available here.


Sequential prediction in general

In order to study the internal models entertained by human subjects, we first need to look at how we can solve sequential prediction in general. The question is: what is the general form of a model with which we can predict the upcoming observation given our previous observations from the sequence? The key is that we may assume that the system has a latent (not directly observable) state that captures all the effects of past events. This is just like Newtonian physics: when judging the trajectory of a projectile, if we know its position and momentum at any given moment, we can predict its movement (using our intuitive physics model) without requiring its past trajectory. In statistical terms, the future becomes conditionally independent of the history given the current state. Interestingly, this idea generalises to any sequence with temporal dependencies. We need to learn the structure over these latent states and how our observations depend on the latent state.
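
The conditional independence described above can be written compactly. The notation is an assumption introduced here for illustration: s_t is the latent state and y_t the observation at time t, and the latent states are taken to be discrete.

```latex
% Markov property: the current latent state s_t screens off the past.
p(s_{t+1} \mid s_{1:t}, y_{1:t}) = p(s_{t+1} \mid s_t)

% One-step prediction of the next observation from the current belief over states.
p(y_{t+1} \mid y_{1:t}) = \sum_{s_{t+1}} p(y_{t+1} \mid s_{t+1}) \sum_{s_t} p(s_{t+1} \mid s_t)\, p(s_t \mid y_{1:t})
```

The second line makes the two things to be learned explicit: the transition structure p(s_{t+1} | s_t) and the observation model p(y_t | s_t).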

Ideal Observer Model and Cognitive Tomography

The participant assumes a model for the latent dynamics and a model relating their observations to the latent states. They use these components to update their beliefs over the current state of the observed system and then generate predictions for the upcoming stimulus. The median and standard deviation of response times decrease linearly with log subjective probability [9]. Cognitive Tomography is the method of inverting this generative model. We inferred the internal representations of individuals from the stimulus sequences and the response times.
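
The belief update and prediction steps can be sketched as follows. This is a simplified illustration, not the model used in the study: it assumes a finite hidden Markov model with known parameters, and a deterministic linear link between log predicted probability and response time. The function names and the `baseline` and `sensitivity` values are hypothetical.

```python
import math


def predictive_probabilities(sequence, init, trans, emit):
    """Return P(y_t | y_1..y_{t-1}) for each stimulus in `sequence`.

    init[i]     : P(s_1 = i)
    trans[i][j] : P(s_{t+1} = j | s_t = i)
    emit[i][k]  : P(y_t = k | s_t = i)
    Assumes every observed stimulus has non-zero predicted probability.
    """
    n = len(init)
    belief = list(init)  # belief over the state generating the current stimulus
    predictions = []
    for y in sequence:
        # Prediction: probability assigned to the stimulus that actually appeared.
        p_y = sum(belief[i] * emit[i][y] for i in range(n))
        predictions.append(p_y)
        # Update: condition the belief on the observed stimulus (Bayes' rule).
        filtered = [belief[i] * emit[i][y] / p_y for i in range(n)]
        # Propagate the belief one step through the latent dynamics.
        belief = [sum(filtered[i] * trans[i][j] for i in range(n)) for j in range(n)]
    return predictions


def expected_response_time(p, baseline=400.0, sensitivity=50.0):
    """Hypothetical link: response time falls linearly with log predicted probability."""
    return baseline - sensitivity * math.log(p)


# Made-up two-state model that tends to alternate between two stimuli.
init = [0.5, 0.5]
trans = [[0.1, 0.9], [0.9, 0.1]]
emit = [[0.8, 0.2], [0.2, 0.8]]
predictions = predictive_probabilities([0, 1, 0, 1], init, trans, emit)
response_times = [expected_response_time(p) for p in predictions]
```

In this toy example the second stimulus is better predicted than the first, so its expected response time is shorter. Inverting the model would mean going the other way: finding the dynamics and observation model that best explain the recorded response times.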

Figure 3. Ideal Observer Model and Cognitive Tomography. When processing a sequence of stimuli, participants use their internal model of the sequence to formulate predictions. They assume the system is in a latent (not directly observed) state which determines the evolution of the system. Their observations depend on the current state of the system, and the state evolves according to a latent dynamics. They track the current state of the system using these elements. Participants' response times are stochastic but dependent on their predictions. Cognitive Tomography is the method of inverting this generative model of responses.

Experimental design

We tested our model in a probabilistic sequence learning task. In each trial, the face of a dog appeared in one of four possible positions. Participants had to manually respond to where the dog appeared using their middle and index fingers. We conducted 25 blocks of 80 trials each with all participants and recorded their response times.

Figure 4. Experimental design. Example stimulus sequence. Participants formulate predictions about the upcoming stimulus. Response times are stochastic but they depend on the participants' predictions. The more the participant expects the stimulus, the shorter the response time.

Model testing

Since we defined the complete generative model of response times, we can create synthetic data (Figure 5, left) to test whether our method can recover the true internal model. The great advantage here, in contrast to human data, is that we know the ground truth and can check whether our method works correctly. We are also able to explore how well we can recover the model depending on the noisiness and other response time parameters of the participant. Perfect recovery would mean that the inferred log predicted probabilities and the ground truth are equal, i.e. in the middle panel of Figure 5 all points fall on the x=y line. A high correlation between the recovered and the ground truth predicted probabilities without the points falling on this line means that the inferred internal model is mainly correct but the inferred response time parameters are at the wrong scale. Importantly, even if the internal model is correctly recovered, response times cannot be predicted perfectly due to their inherent noisiness (Figure 5, right).
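
The last point can be illustrated with a small simulation. The sketch below does not run the inference itself; it only generates response times from known log predicted probabilities and shows that, even with the true predictions in hand, response time noise limits how well response times can be predicted. The linear link, the Gaussian noise, and all parameter values are assumptions made for this example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def simulate_rts(log_probs, baseline, sensitivity, noise_sd, rng):
    """Hypothetical response time model: linear in log probability plus Gaussian noise."""
    return [baseline - sensitivity * lp + rng.gauss(0.0, noise_sd) for lp in log_probs]


rng = random.Random(0)  # fixed seed so the example is reproducible
# Made-up ground-truth predicted probabilities for 2000 trials.
true_log_probs = [math.log(rng.uniform(0.05, 1.0)) for _ in range(2000)]

low_noise_rts = simulate_rts(true_log_probs, 400.0, 50.0, 10.0, rng)
high_noise_rts = simulate_rts(true_log_probs, 400.0, 50.0, 200.0, rng)

r_low = pearson(true_log_probs, low_noise_rts)
r_high = pearson(true_log_probs, high_noise_rts)
```

With low noise the correlation between log predicted probability and response time is strongly negative; with high noise it moves towards zero even though the underlying predictions are identical.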

[Interactive controls: Internal Model (1, 2, 3); Sensitivity (Low, Med, High); Baseline RT (Low, Med, High); Noise (Low, Med, High).]
Figure 5. (Interactive.) Results on synthetic data for three different internal models and different response time parameters. Recovery of the ground truth model depends on the parameters of the response time model. Sensitivity: the measure of how sensitive response times are to the predictions of the internal model. Baseline RT: a parameter setting the baseline response time. Noise: noisiness of the response times. Increasing noise and decreasing sensitivity prevent accurate recovery of the ground truth internal model. You can see that as noise increases, the correlation in the middle panel drops.


Predicting response times

First, we contrasted our model's (iHMM, see [10]) performance with that of a classical measure developed to assess learning in a probabilistic sequence learning task [11-12] (Figure 6, left: hover on data points to see our model's results). We inferred the internal representation of the participant on one part of the experiment and tested the inferred model on a later part. The performance of the model is measured by the correlation between the log predicted probabilities and the response times: the smaller (more negative) the better (see the response time model in Figure 3). Our model shows substantially better performance for all but one participant than the classical trigram model, which predicts the most likely element based on the previous two elements (the most likely final element of a triplet).
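
To make the baseline concrete, here is a minimal sketch of a trigram predictor as described above: it counts which element most often follows each pair of preceding elements. This is an illustration only; the exact measure used in [11-12] may be defined differently, and the function names and the toy sequence are hypothetical.

```python
from collections import Counter, defaultdict


def fit_trigram(sequence):
    """Count how often each element follows each pair of preceding elements."""
    counts = defaultdict(Counter)
    for a, b, c in zip(sequence, sequence[1:], sequence[2:]):
        counts[(a, b)][c] += 1
    return counts


def most_likely_next(counts, a, b):
    """Most frequent final element of triplets starting with (a, b), or None if unseen."""
    following = counts.get((a, b))
    if not following:
        return None
    return following.most_common(1)[0][0]


# Toy sequence over four positions (0-3).
toy_sequence = [0, 1, 2, 0, 1, 2, 0, 1, 3]
trigram_counts = fit_trigram(toy_sequence)
prediction = most_likely_next(trigram_counts, 0, 1)
```

Unlike a latent state model, this predictor only ever looks two elements back, so it cannot capture dependencies that span a longer history.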

Figure 6. (Interactive.) Individual results. Left: our model's performance vs. the trigram model (smaller correlation is better). Our model captures the variability in the response times substantially better than the earlier model for all but one participant. Right: you can see our model's performance on each individual's data. You can compare these graphs to Figure 5, right.

Predicting mistakes

Next, we aimed to test the inferred dynamical internal model in a substantially different way: our goal was to demonstrate that the inferred model is indeed an internal model used by the participant for predictions rather than a phenomenological model capturing idiosyncratic effects. Importantly, we fitted our model only on response times of correctly executed trials. Hence, testing our model on mistake prediction can be used to verify that we indeed inferred the internal models used for prediction by the participants. As Figure 7 (right) shows, we can predict the mistakenly pressed buttons above chance, and the effect is statistically significant.

Figure 7. Predicting mistakes. Left: the inferred internal model gives lower predicted probabilities on those trials that were eventually missed. Dots are individual participants' averages with two standard errors of the mean. Right: rank of the erroneous response among the incorrect alternatives given by the internal model. Our model predicts the mistakenly pressed button above chance.
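
The rank measure in the right panel can be sketched as follows. With four positions there are three incorrect alternatives on every error trial, so a model with no predictive power would produce each rank equally often. The function name and the example probabilities are hypothetical, and ties are broken by position index here, which is an arbitrary choice.

```python
def error_rank(predicted, correct, pressed):
    """Rank (1 = most expected) of the pressed button among the incorrect alternatives.

    predicted : predicted probability for each position on this trial
    correct   : index of the position where the stimulus appeared
    pressed   : index of the button the participant pressed by mistake
    """
    if pressed == correct:
        raise ValueError("not an error trial")
    incorrect = [k for k in range(len(predicted)) if k != correct]
    # Sort incorrect alternatives from most to least expected (stable for ties).
    ordered = sorted(incorrect, key=lambda k: predicted[k], reverse=True)
    return ordered.index(pressed) + 1


# Made-up prediction over four positions; the stimulus appeared at position 1.
trial_prediction = [0.1, 0.6, 0.25, 0.05]
rank_when_pressing_2 = error_rank(trial_prediction, correct=1, pressed=2)
rank_when_pressing_3 = error_rank(trial_prediction, correct=1, pressed=3)
```

If participants' mistakes tend to land on rank 1 more often than one third of the time, the model captures what they expected to see instead of the actual stimulus.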

Expected impact and future directions

The current work was first presented as a poster at the X. Dubrovnik Conference on Cognitive Science in May. It will also be presented as a poster at the Cognitive Computational Neuroscience conference in Philadelphia in September. The 4-page refereed abstract can be accessed here. It will also be submitted to a peer-reviewed journal in the coming months. The main novelty of our project is providing trial-by-trial predictions for response time measurements in a sequential learning task. Such quantitative measures may help elucidate neurophysiological and imaging data. We hope to further explore the rich internal representations of dynamical models entertained by individuals in future research.


  1. Tenenbaum, J. B., Kemp, C., Griffiths, T. L., & Goodman, N. D. (2011). How to Grow a Mind: Statistics, Structure, and Abstraction. Science, 331(6022), 1279–1285.
  2. Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., ... & Chen, Y. (2017). Mastering the game of Go without human knowledge. Nature, 550(7676), 354.
  3. Xu, F., & Tenenbaum, J. B. (2007). Word learning as Bayesian inference. Psychological Review, 114(2), 245–272.
  4. Hamrick, J., Battaglia, P., & Tenenbaum, J. B. (2011). Internal physics models guide probabilistic judgments about object dynamics. Proceedings of the 33rd Annual Conference of the Cognitive Science Society, 1545–1550.
  5. Orbán, G., Fiser, J., Aslin, R. N., & Lengyel, M. (2008). Bayesian learning of visual chunks by human observers. Proceedings of the National Academy of Sciences, 105(7), 2745–2750.
  6. Ernst, M. O., & Banks, M. S. (2002). Humans integrate visual and haptic information in a statistically optimal fashion. Nature, 415(6870), 429–433.
  7. Houlsby, N. M. T., Huszar, F., Ghassemi, M. M., Orbán, G., Wolpert, D. M., & Lengyel, M. (2013). Cognitive Tomography Reveals Complex, Task-Independent Mental Representations. Current Biology, 2169–2175.
  8. Glaze, C. M., Filipowicz, A. L. S., Kable, J. W., Balasubramanian, V., & Gold, J. I. (2018). A bias-variance trade-off governs individual differences in online learning in an unpredictable environment. Nature Human Behaviour, 2, 213–224.
  9. Carpenter, R., & Williams, M. (1995). Neural computation of log likelihood in control of saccadic eye movements. Nature, 377, 59–62.
  10. Van Gael, J., Saatci, Y., Teh, Y. W., & Ghahramani, Z. (2008). Beam Sampling for the Infinite Hidden Markov Model. Proceedings of the 25th International Conference on Machine Learning, 1088–1095.
  11. Howard, D. V., & Howard, J. H. (2001). When it does hurt to try: Adult age differences in the effects of instructions on implicit pattern learning. Psychonomic Bulletin & Review, 8(4), 798–805.
  12. Janacsek, K., & Nemeth, D. (2012). Predicting the future: From implicit learning to consolidation. International Journal of Psychophysiology, 83(2), 213–221.


An earlier version of this webpage was created for the BMe Research Grant but it was not accepted for review as it does not use the provided template.

I would like to thank my colleagues Dávid G. Nagy, Karolina Janacsek, Mihály Bányai and Marcell Stippinger, and my supervisors Gergő Orbán and Dezső Németh, for their contributions. This work has been supported by the National Brain Research Program (project 2017-1.2.1-NKP-2017-00002, PI: D. N.); Hungarian Scientific Research Fund (OTKA PD 124148, PI: K. J.); János Bolyai Research Fellowship of the Hungarian Academy of Sciences (K. J.); National Research, Development and Innovation Fund of Hungary (Grant No. K125343, B. T., D. G. N., G. O.); and an MTA Lendület Fellowship (G. O.). The image of the head used in Fig. 1 was created by Svelte UX, the monitor in Fig. 2 by Aybige, and the finger pressing the button by andriwidodo, all downloaded from the Noun Project. The following JavaScript libraries were used for the page: Rellax, Animate On Scroll (aos), mpld3, D3.js.

Computational Systems Neuroscience Lab

Balázs Török © 2018