The token parser and manipulator: a next-generation Deep Learning architecture

Neural Reasoning

Alongside scaling up the pattern-recognition side of neural networks, there is also a lot of interest in extending their reasoning capability, mostly in the form of Graph Neural Networks [Battaglia, 2018]. A notable effort in this space is Neural Algorithmic Reasoning [Veličković, 2021], in which traditional algorithms are transformed into their differentiable counterparts.
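To give a flavour of these relational reasoning modules, here is a minimal sketch of a single message-passing step on a fully connected graph, in the spirit of [Battaglia, 2018]; the class name, MLP sizes, and sum aggregation are my own illustrative choices rather than any specific paper's configuration.

```python
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    """One round of message passing on a fully connected graph of node vectors."""
    def __init__(self, dim):
        super().__init__()
        self.edge_mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.node_mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, nodes):                               # nodes: (batch, num_nodes, dim)
        b, n, d = nodes.shape
        senders = nodes.unsqueeze(2).expand(b, n, n, d)     # copy of node i for every target j
        receivers = nodes.unsqueeze(1).expand(b, n, n, d)   # copy of node j for every source i
        messages = self.edge_mlp(torch.cat([senders, receivers], dim=-1))
        incoming = messages.sum(dim=1)                      # aggregate messages arriving at each node
        return self.node_mlp(torch.cat([nodes, incoming], dim=-1))

updated = MessagePassingLayer(dim=32)(torch.randn(4, 6, 32))   # 4 graphs, 6 nodes each
```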

State-of-the-art models are secretly token parsers and manipulators!?

Even though I believe we won’t see this type of architecture in the SOTA spotlight anytime soon, if we squint our eyes hard enough, the current state-of-the-art models kind of already are?!

In 2022, we cannot mention any SOTA without the Transformer architecture [Vaswani, 2017]. Originally designed for text processing, it takes discrete language tokens as input, so we can see it as a pure token manipulator. Unsurprisingly, the Transformer also processes a fully connected graph of its input tokens.
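Concretely, the self-attention layer at the heart of the Transformer lets every token exchange information with every other token, which is exactly message passing on a fully connected token graph. A toy PyTorch illustration (the sizes here are arbitrary):

```python
import torch
import torch.nn as nn

# One self-attention layer: every token attends to every other token, i.e. message
# passing on a fully connected graph of the input tokens. Sizes are arbitrary toy values.
tokens = torch.randn(1, 10, 64)                         # (batch, num_tokens, embed_dim)
attention = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
mixed, weights = attention(tokens, tokens, tokens)      # queries, keys and values are all the tokens
print(mixed.shape, weights.shape)                       # (1, 10, 64) and (1, 10, 10): one weight per token pair
```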

The Vision Transformer (ViT) [Dosovitskiy, 2020], which has challenged CNNs in their own backyard, the image domain, is the same Transformer acting as token manipulator, but with a simple patch-based token parser.
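The patch-based token parser is almost embarrassingly simple: cut the image into fixed-size patches and linearly project each patch into a token embedding. A sketch of this step (the 16x16 patch size and 768-d embedding follow the ViT-Base setting; the strided convolution is just a convenient way to apply the same linear projection to every patch):

```python
import torch
import torch.nn as nn

# Patch-based "token parser": split the image into non-overlapping 16x16 patches and
# project each patch into a 768-d token embedding.
image = torch.randn(1, 3, 224, 224)                       # (batch, channels, height, width)
to_tokens = nn.Conv2d(3, 768, kernel_size=16, stride=16)
tokens = to_tokens(image).flatten(2).transpose(1, 2)      # (1, 196, 768): 14x14 patch tokens
# `tokens` can now be fed to the same Transformer "token manipulator" used for text.
```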

The example of this architecture that I like the most is DETR [Carion, 2020], where the token parser is still a CNN, so as to take advantage of the locality structure of images, while a more complex token manipulator, also based on the Transformer, performs object detection.
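Below is a rough sketch of that pipeline: a CNN backbone parses the image into feature-map tokens, and a Transformer encoder-decoder with learned object queries manipulates them into per-object predictions. It loosely follows DETR's structure, but the specific sizes are illustrative, and positional encodings, bipartite matching, and losses are omitted.

```python
import torch
import torch.nn as nn
import torchvision

# CNN "token parser" + Transformer "token manipulator", loosely following DETR's structure.
backbone = nn.Sequential(*list(torchvision.models.resnet50(weights=None).children())[:-2])
project = nn.Conv2d(2048, 256, kernel_size=1)              # reduce channels to the token dimension
transformer = nn.Transformer(d_model=256, batch_first=True)
queries = nn.Parameter(torch.randn(100, 256))              # learned object queries ("slots")
class_head, box_head = nn.Linear(256, 92), nn.Linear(256, 4)

image = torch.randn(1, 3, 800, 800)
features = project(backbone(image))                        # (1, 256, 25, 25) CNN feature map
tokens = features.flatten(2).transpose(1, 2)               # (1, 625, 256): feature map as tokens
slots = transformer(tokens, queries.expand(1, -1, -1))     # decoder turns queries into object slots
classes, boxes = class_head(slots), box_head(slots)        # per-slot class logits and box parameters
```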

Thoughts

With all the astonishing results of scaling up deep learning models, results that still leave a bitter taste in the mouths of many AI researchers [Sutton, 2019], some can’t help but be repulsed when they hear about adding more “structure” to the model. While it might be possible that GPT-10 will internally represent and manipulate all the structures of the world in its massive vector space, I can’t help but wonder how much less efficient it would be not to incorporate some sort of soft structure into the model itself.

From the perspective of a researcher whose job is to inject inductive biases into models, I believe the weak inductive biases present in the architectures above will leave a sweet aftertaste, not just bitterness. After all, even if our GPT-10 overlord ends up needing none of this “structuredness”, I can only hope the token parser and manipulator above will help speed up the research spiral between inductive biases and scaling, so that we can have a more advanced, and hopefully benevolent, AI sooner.

Overall, given the simple yet elegantly composed components used in this paper, the C-SWM model has really inspired and convinced me, and I hope by now you too, that there is a lot more to come in the future of Deep Learning research.

References

Kipf, Thomas, Elise van der Pol, and Max Welling. “Contrastive Learning of Structured World Models.” International Conference on Learning Representations. 2020.

Greff, Klaus, Sjoerd van Steenkiste, and Jürgen Schmidhuber. “On the binding problem in artificial neural networks.” arXiv preprint arXiv:2012.05208 (2020).

Object-Oriented Learning (OOL): Perception, Representation, and Reasoning. Workshop at the International Conference on Machine Learning (ICML), July 17, 2020, Virtual. https://oolworkshop.github.io/

Object Representations for Learning and Reasoning. Workshop at the Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS), December 11, 2020, Virtual. https://orlrworkshop.github.io

Burgess, Christopher P., et al. “MONet: Unsupervised scene decomposition and representation.” arXiv preprint arXiv:1901.11390 (2019).

Kabra, Rishabh, et al. “SIMONe: View-Invariant, Temporally-Abstracted Object Representations via Unsupervised Video Decomposition.” arXiv preprint arXiv:2106.03849 (2021).

Kipf, Thomas, et al. “Conditional Object-Centric Learning from Video.” arXiv preprint arXiv:2111.12594 (2021).

Locatello, Francesco, et al. “Object-centric learning with slot attention.” arXiv preprint arXiv:2006.15055 (2020).

Battaglia, Peter W., et al. “Relational inductive biases, deep learning, and graph networks.” arXiv preprint arXiv:1806.01261 (2018).

Veličković, Petar, and Charles Blundell. “Neural Algorithmic Reasoning.” arXiv preprint arXiv:2105.02761 (2021).

Vaswani, Ashish, et al. “Attention is all you need.” Advances in neural information processing systems. 2017.

Dosovitskiy, Alexey, et al. “An image is worth 16x16 words: Transformers for image recognition at scale.” arXiv preprint arXiv:2010.11929 (2020).

Carion, Nicolas, et al. “End-to-end object detection with transformers.” European Conference on Computer Vision. Springer, Cham, 2020.

Sutton, Rich. “The Bitter Lesson.” March 13, 2019. http://incompleteideas.net/IncIdeas/BitterLesson.html

Footnotes

  1. Spoiler alert, it’s 42! 

  2. “Object” in this paper (and in the whole subfield of object-centric representation) is understood to be context-dependent, a “you know it when you see it” kind of thing. Let’s not go down the philosophical rabbit hole of asking “What is an object?”.


Deep Learning is an excellently scalable approach for processing unstructured, high-dimensional, raw sensory signals. It is so good at this that these very properties have also become its most popular criticism. At the moment, deep learning is mostly just a giant correlation machine, devouring enormous amounts of data to recognise hidden patterns, but still lacking the human-like systematic generalisation required in many reasoning tasks. Symbolic AI, on the other hand, possesses these abilities by design, but relies on handcrafted symbols that have already been abstracted away from the raw information. Among the many approaches to combining the best of both worlds, I am most excited about end-to-end trainable architectures with a perception module that structurises the raw input and a reasoning module that operates on top of these symbol-like vectors. While there is still a lot of work to do before such a system becomes practically relevant, in this blog post we will take a look at the paper Contrastive Learning of Structured World Models, an early paper that offers a glimpse into such an architecture through a concrete implementation.
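To make the two-module idea concrete, here is a minimal sketch of a perception module that parses raw pixels into a small set of object-like vectors, loosely inspired by C-SWM's object extractor; the layer sizes and the number of object slots are my own illustrative choices, not the paper's exact configuration. A relational module like the message-passing layer sketched earlier would then play the role of the reasoning module on top of these vectors.

```python
import torch
import torch.nn as nn

class PerceptionModule(nn.Module):
    """Parses raw pixels into K object-like state vectors, the symbol-like inputs to reasoning."""
    def __init__(self, num_objects=5, dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, num_objects, 3, stride=2, padding=1),  # one mask-like map per object slot
        )
        self.to_state = nn.Sequential(nn.Linear(16 * 16, 64), nn.ReLU(), nn.Linear(64, dim))

    def forward(self, image):                     # image: (batch, 3, 64, 64)
        maps = self.encoder(image)                # (batch, K, 16, 16)
        return self.to_state(maps.flatten(2))     # (batch, K, dim): one vector per object slot

states = PerceptionModule()(torch.randn(4, 3, 64, 64))    # (4, 5, 32) symbol-like object vectors
```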

