This is the first in a series of posts introducing group-equivariant convolutional neural networks (GCNNs). Today, we keep it short, high-level, and conceptual; examples and implementations will follow. In looking at GCNNs, we are resuming a topic we first wrote about in 2021: Geometric Deep Learning, a principled, math-driven approach to network design that has only grown in scope and impact since then.
From Alchemy to Science: Geometric Deep Learning in Two Minutes
Simply put, geometric deep learning derives network structure from two things: the domain and the task. Both will be discussed in detail below, but here is a quick preview.
- The domain refers to the underlying physical space and how it is represented in the input data. For example, images are typically encoded as a two-dimensional grid, with values representing pixel intensities.
- The task is what we train the network for: classification, say, or segmentation. The task may differ at different stages of the architecture, and at each stage, the task in question will have a say in how the respective layers should be designed.
Take MNIST, for example. The dataset consists of grayscale images of the ten digits, 0 through 9. Naturally, the task is to assign the depicted digit to each image.
First, consider the domain. A \(7\) is a \(7\) wherever it appears on the grid. We thus need an operation that is translation-equivariant: one that flexibly adapts to shifts (translations) in its input. More concretely, in our context, an equivariant operation can detect an object's properties even when that object has been moved, vertically and/or horizontally, to a different position. Convolution, ubiquitous not just in deep learning, is exactly such a shift-equivariant operation.
Special attention is due to the "flexible adaptation" part of equivariance. Translation-equivariant operations do care about an object's new position; they record a feature not abstractly, but at the object's new location. To see why this matters, consider the network as a whole. By composing convolutions, we build a hierarchy of feature detectors. That hierarchy should work no matter where in the image an object appears. In addition, it has to be consistent: location information needs to be preserved between layers.
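To make this concrete, here is a minimal sketch of shift equivariance, assuming NumPy and SciPy are available. The toy image, kernel, and shift amounts are arbitrary choices for illustration; circular shifts and "wrap" boundaries are used so the equality holds exactly, free of border effects.

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))   # a toy "image"
kernel = rng.standard_normal((3, 3))  # a toy filter

def conv(x):
    # Circular ("wrap") boundaries make the check exact.
    return convolve2d(x, kernel, mode="same", boundary="wrap")

def shift(x, dy=2, dx=3):
    # A translation of the input grid.
    return np.roll(x, shift=(dy, dx), axis=(0, 1))

# Equivariance: shifting, then convolving, equals convolving, then
# shifting -- the feature map moves along with the object.
print(np.allclose(conv(shift(image)), shift(conv(image))))  # True
```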
Terminology-wise, then, it is important to distinguish equivariance from invariance. An invariant operation, in our context, would still be able to spot a feature wherever it occurs; however, it would happily forget where that feature happened to be. Clearly, then, to build a hierarchy of features, translation-invariance alone is not enough.
So far, we have derived a requirement from the domain, the input grid. What about the task? If, in the end, all we are supposed to do is name the digit, location suddenly does not matter anymore. In other words, once the hierarchy exists, invariance is enough. In neural networks, pooling is an operation that forgets about (spatial) detail; it only cares about the mean, say, or the maximum value itself. This is what makes it suited to "summarizing" information about a region, or a complete image, if in the end we only care about returning a class label.
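Here is the complementary sketch for invariance, under the same toy assumptions as above: once we pool globally, the summary no longer changes when the input is shifted, even though the feature map itself does.

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
kernel = rng.standard_normal((3, 3))

def conv(x):
    return convolve2d(x, kernel, mode="same", boundary="wrap")

def shift(x):
    return np.roll(x, shift=(2, 3), axis=(0, 1))

# Invariance: global max pooling forgets location; the summary is
# identical whether or not the input was translated.
print(np.isclose(conv(image).max(), conv(shift(image)).max()))  # True
```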
In short, we were able to formulate a design wish list based on (1) what we are given and (2) what we are tasked with.
Having completed this high-level sketch of geometric deep learning, we will zoom in, over this series of posts, on its designated topic: group-equivariant convolutional neural networks.
The reason for "equivariant" should by now pose little mystery. But what about the "group" prefix?
The "Group" in Group Equivariance
As you may have guessed from the preamble of "principled" and "math-driven": here, groups really are groups in the mathematical sense. Depending on your background, the last time you heard about groups may have been in school, without even a hint of why they matter. I am certainly not qualified to give a full summary of what they are good for, but I hope that by the end of this post, their importance in deep learning will make intuitive sense.
Symmetry Groups
Here is a square.
Now close your eyes.
Now look again. Has something happened to the square?
You can't tell. Maybe it was rotated; maybe it wasn't. On the other hand, what if the vertices were numbered?
Now you'd know.
Without the numbering, could I have rotated the square any way I wanted? Evidently not; that would not have gone unnoticed.
There are exactly four ways to rotate a square without arousing suspicion. Those ways can be referred to in various manners; one simple way is by the angle of rotation: 90, 180, or 270 degrees. Why not more? Any further addition of 90 degrees would result in a configuration we have already seen.
The picture above shows four squares, yet only three possible rotations were listed. What about the situation on the left, the one taken as the initial state? It could be reached by rotating 360 degrees (or twice that, or three times that, or…). But the way this is handled in mathematics is to treat it as some sort of "null rotation", analogous to how \(0\) acts in addition, \(1\) in multiplication, or the identity matrix in linear algebra.
Altogether, then, we have four actions that can be performed on the square (the un-numbered square!) that leave it as is, or invariant. These are the symmetries of the square. In math and physics, a symmetry is a quantity that remains the same no matter what happens as time evolves. And this is where groups come in: groups, concretely their elements, effect actions such as rotation.
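As a small illustration (NumPy assumed; representing the rotations as 2×2 matrices is just one possible choice), here are the four symmetries of the square, with composition given by matrix multiplication:

```python
import numpy as np

def rotation(k):
    """Rotation by k * 90 degrees, as a 2x2 matrix."""
    theta = k * np.pi / 2
    return np.round(np.array([[np.cos(theta), -np.sin(theta)],
                              [np.sin(theta),  np.cos(theta)]]))

# The four symmetries: rotations by 0, 90, 180, and 270 degrees.
symmetries = [rotation(k) for k in range(4)]

# Composing 90 and 270 degrees yields the "null rotation" (identity):
print(symmetries[1] @ symmetries[3])  # [[1. 0.] [0. 1.]]
```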
Before explaining how, let me introduce another example. Take this sphere.
How many symmetries does a sphere have? Infinitely many. This implies that whatever group we chose for the square will not be of much use in representing the symmetries of the sphere.
Viewing Groups Through the Lens of Actions
Following these examples, let us generalize. Here is a typical definition:
A group \(G\) is a finite or infinite set of elements together with a binary operation (called the group operation) that together satisfy the four fundamental properties of closure, associativity, the identity property, and the inverse property. The operation with respect to which a group is defined is often called the "group operation," and a set is said to be a group "under" this operation. Elements \(A\), \(B\), \(C\), … with binary operation between \(A\) and \(B\) denoted \(AB\) form a group if:
Closure: If \(A\) and \(B\) are two elements in \(G\), then the product \(AB\) is also in \(G\).
Associativity: The defined multiplication is associative, i.e., for all \(A\), \(B\), \(C\) in \(G\), \((AB)C = A(BC)\).
Identity: There is an identity element \(I\) (a.k.a. \(1\), \(E\), or \(e\)) such that \(IA = AI = A\) for every element \(A\) in \(G\).
Inverse: There must be an inverse (or reciprocal) of each element. Therefore, for each element \(A\) of \(G\), the set contains an element \(B = A^{-1}\) such that \(AA^{-1} = A^{-1}A = I\).
In action-speak, group elements specify allowable actions, or more precisely, ones that are distinguishable from each other. Two actions can be composed; that is the "binary operation". The requirements now make intuitive sense:
- A composition of two actions (two rotations, say) is still an action of the same type (a rotation).
- Of three such actions, it does not matter which two we compose first. (The order of application has to remain the same, though.)
- One possible action is always the "null action". (Just like in life.) As to "doing nothing", it does not matter whether it happens before or after a "something"; that "something" is always the final result.
- Every action needs an "undo button". In the square example, if I rotate by 180 degrees and then rotate by 180 degrees again, I am back in the original state. It is as if I had done nothing.
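To tie the intuition back to the formal definition, here is a minimal sketch checking all four requirements for the quarter-turn rotations of the square, encoded (one choice among several) as rotation counts with composition being addition modulo four:

```python
# Elements of C4: the number of 90-degree turns applied.
elements = [0, 1, 2, 3]
compose = lambda a, b: (a + b) % 4

# Closure: composing any two rotations yields one of the four.
assert all(compose(a, b) in elements for a in elements for b in elements)

# Associativity: (AB)C == A(BC) for all triples.
assert all(compose(compose(a, b), c) == compose(a, compose(b, c))
           for a in elements for b in elements for c in elements)

# Identity: the "null rotation" leaves every element unchanged.
assert all(compose(0, a) == a == compose(a, 0) for a in elements)

# Inverse: every rotation has an "undo" (e.g., 90 undoes 270).
assert all(any(compose(a, b) == 0 for b in elements) for a in elements)

print("C4 satisfies all four group axioms.")
```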
Returning to a more bird's-eye view, what we have seen so far is the definition of a group by how its elements act on each other. But if groups are to matter "in the real world", they need to act on something outside themselves (neural network components, for example). How this works is the topic of the follow-up posts, but I will briefly outline the intuition here.
Outlook: Group-Equivariant CNNs
Above, we saw that in image classification, we need a translation-equivariant operation (such as convolution): a \(1\) is a \(1\) whether it is moved horizontally, vertically, both ways, or not at all. What about rotations, though? Standing on its head, a digit is still what it is. Conventional convolution does not support this type of action.
We can specify such a symmetry group and add it to our architectural wish list. Which group, though? If we wanted to detect axis-aligned rectangles, a suitable group would be \(C_4\), the cyclic group of order four. (Above, we saw that we need four elements, and that we can cycle through them.) If, on the other hand, we did not care about alignment, we would want any position to count. In principle, we would then be in the same situation as with the sphere. However, since images live on discrete grids, there will not be an unlimited number of rotations in practice.
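A quick numerical illustration of the gap (NumPy/SciPy assumed, mirroring the shift-equivariance sketch above): a plain convolution with a fixed kernel is not equivariant to the \(C_4\) action, but rotating the kernel along with the input restores the correspondence. Roughly speaking, group-equivariant convolutions build this kernel bookkeeping into the layer itself.

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
kernel = rng.standard_normal((3, 3))  # a generic, asymmetric filter

def conv(x, k):
    return convolve2d(x, k, mode="same", boundary="wrap")

def rot90(x):
    # One generator of C4: a quarter-turn rotation.
    return np.rot90(x)

# Not rotation-equivariant with a fixed kernel:
print(np.allclose(conv(rot90(image), kernel),
                  rot90(conv(image, kernel))))        # False

# Rotating the kernel as well restores the correspondence:
print(np.allclose(conv(rot90(image), rot90(kernel)),
                  rot90(conv(image, kernel))))        # True
```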
For a more useful application, though, we need to think more carefully. Take digits. When is a digit "the same"? For one, it depends. If this were about a handwritten address on an envelope, would we accept a \(7\) that had been rotated by 90 degrees? Maybe. (Although we might wonder what would make someone change pen position for just a single digit.) What about a \(7\) standing on its head? On top of similar psychological considerations, we cannot seriously be sure about the intended message here; and at least if such a data point were part of the training set, we would want to lower its weight.
It also depends on the digit itself: a \(6\), upside down, is a \(9\).
Zooming in on neural networks, things become a good deal more nuanced. We know that CNNs build up a hierarchy of features, starting from simple ones like edges and corners. Even if we do not want rotation equivariance in later layers, we would still want it in the initial set of layers. (The output layer, as we have already hinted, is to be considered separately in any case, since its requirements follow from the specifics of the task at hand.)
And that's it for today. Hopefully, I have managed to shed some light on why we would want group-equivariant neural networks. The question remains: how do we get them? That is what the follow-up posts in this series will be about.
Until then, thanks for reading!
Photo: Ihor OINUA, Unsplash