We started out learning torch by coding a simple neural network from scratch, making use of just one of torch's features: tensors. Then, we immensely simplified the task, replacing manual backpropagation with autograd. Today, we modularize the network – in both the habitual and a very literal sense: Low-level matrix operations are swapped out for torch modules.
Modules
From other frameworks (Keras, say), you may be used to distinguishing between models and layers. In torch, both are instances of nn_Module(), and thus have some methods in common. For those thinking in terms of “models” and “layers”, I'm artificially splitting this section into two parts. In reality though, there is no dichotomy: New modules may be composed of existing ones, up to arbitrary levels of recursion – as the sketch below illustrates.
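As a quick, hypothetical illustration of such composition (the module name and layer sizes here are made up for this sketch, not part of the running example), a custom module created with nn_module() can itself contain other modules:
library(torch)

# a made-up module that composes two existing nn_linear() modules
two_layer_net <- nn_module(
  "TwoLayerNet",
  initialize = function() {
    self$hidden <- nn_linear(3, 8)
    self$output <- nn_linear(8, 1)
  },
  forward = function(x) {
    x <- nnf_relu(self$hidden(x))
    self$output(x)
  }
)

net <- two_layer_net()
net(torch_randn(5, 3))
An instance of such a module could, in turn, appear inside yet another module.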
Base modules (“layers”)
Instead of writing out the affine operation by hand – x$mm(w1) + b1, say – as we've been doing so far, we can create a linear module. The following snippet instantiates a linear layer that expects three-feature inputs and returns a single output per observation:
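In code, that presumably amounts to a call like this (the object name l is taken from the snippets that follow):
l <- nn_linear(3, 1)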
The module has two parameters, “weight” and “bias”, both of which come pre-initialized:
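One way to display them is through the module's parameters field:
l$parameters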
$weight
torch_tensor
-0.0385 0.1412 -0.5436
[ CPUFloatType{1,3} ]
$bias
torch_tensor
-0.1950
[ CPUFloatType{1} ]
Modules are callable; calling a module executes its forward() method. For a linear layer, that means matrix-multiplying input and weights, and adding the bias.
Let's try this:
data <- torch_randn(10, 3)
out <- l(data)
Unsurprisingly, out now holds some data:
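Printing the tensor shows those values, one per observation:
out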
torch_tensor
0.2711
-1.8151
-0.0073
0.1876
-0.0930
0.7498
-0.2332
-0.0428
0.3849
-0.2618
[ CPUFloatType{10,1} ]
In addition, this tensor knows what needs to be done should it ever be asked to calculate gradients:
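One way to see that is to inspect its grad_fn field:
out$grad_fn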
AddmmBackward
Note the difference between tensors returned by modules and tensors we create ourselves. When creating tensors directly, we need to pass requires_grad = TRUE to trigger gradient calculation. With modules, torch correctly assumes that we'll want to perform backpropagation at some point.
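For instance, a tensor created by hand only tracks gradients if we explicitly ask for it (a toy example, not part of the running code):
t1 <- torch_randn(3, 3, requires_grad = TRUE)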
So far, though, we haven't called backward() yet. Thus, no gradients have been computed yet:
l$weight$grad
l$bias$grad
torch_tensor
[ Tensor (undefined) ]
torch_tensor
[ Tensor (undefined) ]
Let's change this:
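A plain call to backward() on out – the obvious first attempt – produces the error shown below:
out$backward()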
Error in (function (self, gradient, keep_graph, create_graph) :
grad can be implicitly created only for scalar outputs (_make_grads at ../torch/csrc/autograd/autograd.cpp:47)
Why the error? autograd expects the output tensor to be a scalar, while in our example we have a tensor of size (10, 1). This error rarely occurs in practice, where we work with batches of inputs (sometimes, just a single batch). Still, it's interesting to see how to resolve it.
To make the example work, we introduce a virtual final aggregation step – taking the mean, say. Call it avg. If such a mean were taken, its gradient with respect to l$weight would be obtained via the chain rule:
\[
\frac{\partial\, avg}{\partial w} = \frac{\partial\, avg}{\partial\, out} \, \frac{\partial\, out}{\partial w}
\]
Of the quantities on the right side, we're interested in the second. We need to provide the first one, the way it would look if we were really taking the mean:
d_avg_d_out <- torch_tensor(10)$`repeat`(10)$unsqueeze(1)$t()
out$backward(gradient = d_avg_d_out)
Now, l$weight$grad and l$bias$grad do contain gradients:
l$weight$grad
l$bias$grad
torch_tensor
1.3410 6.4343 -30.7135
[ CPUFloatType{1,3} ]
torch_tensor
100
[ CPUFloatType{1} ]
Besides nn_linear(), torch provides pretty much every common layer you might hope for. But few tasks are solved by a single layer. How do you combine them? Or, in the usual lingo: how do you build models?
Container modules (“models”)
Now, a model is just a module that contains other modules. If all inputs are supposed to flow through the same nodes and along the same edges, nn_sequential() can be used to build a simple graph. For example:
model <- nn_sequential(
nn_linear(3, 16),
nn_relu(),
nn_linear(16, 1)
)
Using the same technique as above, we can get an overview of all model parameters (two weight matrices and two bias vectors):
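Again, the parameters field provides that overview:
model$parameters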
$`0.weight`
torch_tensor
-0.1968 -0.1127 -0.0504
0.0083 0.3125 0.0013
0.4784 -0.2757 0.2535
-0.0898 -0.4706 -0.0733
-0.0654 0.5016 0.0242
0.4855 -0.3980 -0.3434
-0.3609 0.1859 -0.4039
0.2851 0.2809 -0.3114
-0.0542 -0.0754 -0.2252
-0.3175 0.2107 -0.2954
-0.3733 0.3931 0.3466
0.5616 -0.3793 -0.4872
0.0062 0.4168 -0.5580
0.3174 -0.4867 0.0904
-0.0981 -0.0084 0.3580
0.3187 -0.2954 -0.5181
[ CPUFloatType{16,3} ]
$`0.bias`
torch_tensor
-0.3714
0.5603
-0.3791
0.4372
-0.1793
-0.3329
0.5588
0.1370
0.4467
0.2937
0.1436
0.1986
0.4967
0.1554
-0.3219
-0.0266
[ CPUFloatType{16} ]
$`2.weight`
torch_tensor
Columns 1 to 10-0.0908 -0.1786 0.0812 -0.0414 -0.0251 -0.1961 0.2326 0.0943 -0.0246 0.0748
Columns 11 to 16 0.2111 -0.1801 -0.0102 -0.0244 0.1223 -0.1958
[ CPUFloatType{1,16} ]
$`2.bias`
torch_tensor
0.2470
[ CPUFloatType{1} ]
To inspect an individual parameter, make use of its position in the sequential model. For example:
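Here, one way to get at the first layer's bias (the `0.bias` entry seen above) is:
model$parameters$`0.bias`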
torch_tensor
-0.3714
0.5603
-0.3791
0.4372
-0.1793
-0.3329
0.5588
0.1370
0.4467
0.2937
0.1436
0.1986
0.4967
0.1554
-0.3219
-0.0266
[ CPUFloatType{16} ]
And just like nn_linear() above, this module can be called directly on data:
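For instance, reusing the data tensor created earlier:
out <- model(data)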
On a composite module like this one, calling backward() will backpropagate through all the layers:
out$backward(gradient = torch_tensor(10)$`repeat`(10)$unsqueeze(1)$t())
# e.g.
model[[1]]$bias$grad
torch_tensor
0.0000
-17.8578
1.6246
-3.7258
-0.2515
-5.8825
23.2624
8.4903
-2.4604
6.7286
14.7760
-14.4064
-1.0206
-1.7058
0.0000
-9.7897
[ CPUFloatType{16} ]
And placing a composite module on the GPU moves all of its tensors there:
model$cuda()
model[[1]]$bias$grad
torch_tensor
0.0000
-17.8578
1.6246
-3.7258
-0.2515
-5.8825
23.2624
8.4903
-2.4604
6.7286
14.7760
-14.4064
-1.0206
-1.7058
0.0000
-9.7897
[ CUDAFloatType{16} ]
Now let's see how using nn_sequential() simplifies our example network.
Simple network using modules
library(torch)

### generate training data -----------------------------------------------------
# input dimensionality (number of input features)
d_in <- 3
# output dimensionality (number of predicted features)
d_out <- 1
# number of observations in training set
n <- 100
# create random data
x <- torch_randn(n, d_in)
y <- x[, 1, NULL] * 0.2 - x[, 2, NULL] * 1.3 - x[, 3, NULL] * 0.5 + torch_randn(n, 1)
### define the network ---------------------------------------------------------
# dimensionality of hidden layer
d_hidden <- 32
model <- nn_sequential(
nn_linear(d_in, d_hidden),
nn_relu(),
nn_linear(d_hidden, d_out)
)
### network parameters ---------------------------------------------------------
learning_rate <- 1e-4
### training loop --------------------------------------------------------------
for (t in 1:200) {
### -------- Forward pass --------
y_pred <- model(x)
### -------- compute loss --------
loss <- (y_pred - y)$pow(2)$sum()
if (t %% 10 == 0)
cat("Epoch: ", t, " Loss: ", loss$item(), "\n")
### -------- Backpropagation --------
# Zero the gradients before running the backward pass.
model$zero_grad()
# compute gradient of the loss w.r.t. all learnable parameters of the model
loss$backward()
### -------- Update weights --------
# Wrap in with_no_grad() because this is a part we DON'T want to record
# for automatic gradient computation
# Update each parameter by its `grad`
with_no_grad({
purrr::walk(model$parameters, function(param) param$sub_(learning_rate * param$grad))
})
}
The forward pass looks a lot better now. However, we still loop through the model's parameters, updating each one by hand. Also, you may already suspect that torch provides abstractions for common loss functions. The next and final installment in this series will address both points, making use of torch losses and optimizers. See you then!