Example 1: Tensors and Operations#

Welcome to Nabla! This example introduces the core building block of the library: the Tensor. Nabla tensors are lazy by default — operations build a computation graph that is evaluated only when you request the result (e.g., by printing or calling .realize()).

Let’s start by importing Nabla and NumPy.

[1]:
import numpy as np

import nabla as nb

print("Nabla imported successfully!")
Nabla imported successfully!

1. Creating Tensors#

There are several ways to create tensors in Nabla:

  1. From NumPy arrays via nb.Tensor.from_dlpack() (works with any DLPack source)

  2. Factory functions like nb.zeros(), nb.ones(), nb.arange(), nb.uniform()

  3. Constants via nb.constant()

[2]:
# From NumPy arrays
np_array = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], dtype=np.float32)
x = nb.Tensor.from_dlpack(np_array)
print("From NumPy:")
print(x)
print(f"  Shape: {x.shape}, Dtype: {x.dtype}\n")
From NumPy:
Tensor(
  [[1. 2. 3.]
   [4. 5. 6.]] : f32[2,3]
)
  Shape: [Dim(2), Dim(3)], Dtype: DType.float32

[3]:
# Factory functions — zeros, ones, full
z = nb.zeros((2, 3))
o = nb.ones((2, 3))
f = nb.full((2, 3), 3.14)
print("Zeros:", z)
print("Ones: ", o)
print("Full: ", f)
Zeros: Tensor(
  [[0. 0. 0.]
   [0. 0. 0.]] : f32[2,3]
)
Ones:  Tensor(
  [[1. 1. 1.]
   [1. 1. 1.]] : f32[2,3]
)
Full:  Tensor(
  [[3.14 3.14 3.14]
   [3.14 3.14 3.14]] : f32[2,3]
)
[4]:
# Ranges and random tensors
r = nb.arange(0, 6, dtype=nb.DType.float32)
u = nb.uniform((2, 3), low=-1.0, high=1.0)
g = nb.gaussian((2, 3), mean=0.0, std=1.0)
print("Arange:  ", r)
print("Uniform: ", u)
print("Gaussian:", g)
Arange:   Tensor([0. 1. 2. 3. 4. 5.] : f32[6])
Uniform:  Tensor(
  [[ 0.5962  0.889  -0.9222]
   [ 0.1495  0.7381  0.8013]] : f32[2,3]
)
Gaussian: Tensor(
  [[ 1.6811  2.3331 -0.2512]
   [ 0.8896  1.6362 -1.9282]] : f32[2,3]
)
[5]:
# Constants from Python lists (via NumPy)
c = nb.constant(np.array([10.0, 20.0, 30.0], dtype=np.float32))
print("Constant:", c)
Constant: Tensor([10. 20. 30.] : f32[3])

2. Tensor Properties#

Every tensor carries metadata about its shape, dtype, and device.

[6]:
x = nb.uniform((3, 4, 5))
print(f"Shape:  {x.shape}")
print(f"Dtype:  {x.dtype}")
print(f"Device: {x.device}")
print(f"Rank:   {x.ndim}")
Shape:  [Dim(3), Dim(4), Dim(5)]
Dtype:  DType.float32
Device: Device(type=cpu,id=0)
Rank:   3

3. Arithmetic Operations#

Nabla supports standard arithmetic via Python operators and named functions. All operations are lazy — they build a graph that is evaluated on demand.

[7]:
a = nb.Tensor.from_dlpack(np.array([1.0, 2.0, 3.0], dtype=np.float32))
b = nb.Tensor.from_dlpack(np.array([4.0, 5.0, 6.0], dtype=np.float32))

print("a:    ", a)
print("b:    ", b)
print("a + b:", a + b)
print("a - b:", a - b)
print("a * b:", a * b)
print("a / b:", a / b)
print("a ** 2:", a ** 2)
a:     Tensor([1. 2. 3.] : f32[3])
b:     Tensor([4. 5. 6.] : f32[3])
a + b: Tensor([5. 7. 9.] : f32[3])
a - b: Tensor([-3. -3. -3.] : f32[3])
a * b: Tensor([ 4. 10. 18.] : f32[3])
a / b: Tensor([0.25 0.4  0.5 ] : f32[3])
a ** 2: Tensor([1. 4. 9.] : f32[3])
[8]:
# Named function equivalents
print("nb.add(a, b):", nb.add(a, b))
print("nb.mul(a, b):", nb.mul(a, b))
print("nb.sub(a, b):", nb.sub(a, b))
print("nb.div(a, b):", nb.div(a, b))
nb.add(a, b): Tensor([5. 7. 9.] : f32[3])
nb.mul(a, b): Tensor([ 4. 10. 18.] : f32[3])
nb.sub(a, b): Tensor([-3. -3. -3.] : f32[3])
nb.div(a, b): Tensor([0.25 0.4  0.5 ] : f32[3])

4. Element-wise Unary Operations#

Nabla provides a rich set of element-wise functions.

[9]:
x = nb.Tensor.from_dlpack(np.array([0.0, 0.5, 1.0, 1.5, 2.0], dtype=np.float32))
print("x:       ", x)
print("exp(x):  ", nb.exp(x))
print("log(x+1):", nb.log(x + 1.0))
print("sqrt(x): ", nb.sqrt(x))
print("sin(x):  ", nb.sin(x))
print("cos(x):  ", nb.cos(x))
print("tanh(x): ", nb.tanh(x))
x:        Tensor([0.  0.5 1.  1.5 2. ] : f32[5])
exp(x):   Tensor([1.     1.6487 2.7183 4.4817 7.3891] : f32[5])
log(x+1): Tensor([0.     0.4055 0.6931 0.9163 1.0986] : f32[5])
sqrt(x):  Tensor([0.     0.7071 1.     1.2247 1.4142] : f32[5])
sin(x):   Tensor([0.     0.4794 0.8415 0.9975 0.9093] : f32[5])
cos(x):   Tensor([ 1.      0.8776  0.5403  0.0707 -0.4161] : f32[5])
tanh(x):  Tensor([0.     0.4621 0.7616 0.9051 0.964 ] : f32[5])
[10]:
# Activation functions
x = nb.Tensor.from_dlpack(np.array([-2.0, -1.0, 0.0, 1.0, 2.0], dtype=np.float32))
print("x:       ", x)
print("relu(x): ", nb.relu(x))
print("sigmoid: ", nb.sigmoid(x))
print("gelu(x): ", nb.gelu(x))
print("silu(x): ", nb.silu(x))
x:        Tensor([-2. -1.  0.  1.  2.] : f32[5])
relu(x):  Tensor([0. 0. 0. 1. 2.] : f32[5])
sigmoid:  Tensor([0.1192 0.2689 0.5    0.7311 0.8808] : f32[5])
gelu(x):  Tensor([-0.0455 -0.1587  0.      0.8413  1.9545] : f32[5])
silu(x):  Tensor([-0.2384 -0.2689  0.      0.7311  1.7616] : f32[5])
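The activation values above follow directly from their defining formulas. As a sanity check (a NumPy sketch, independent of Nabla), sigmoid, SiLU, and the exact erf-based GELU can be computed by hand:

```python
import math

import numpy as np

def sigmoid(x):
    # logistic function: 1 / (1 + e^-x)
    return 1.0 / (1.0 + np.exp(-x))

def silu(x):
    # SiLU (a.k.a. swish): x * sigmoid(x)
    return x * sigmoid(x)

def gelu(x):
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF
    return x * 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0], dtype=np.float32)
print(np.round(sigmoid(x), 4))  # matches the sigmoid row above
print(np.round(silu(x), 4))
print(np.round(gelu(x), 4))
```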

5. Matrix Operations#

Matrix multiplication is a first-class operation in Nabla.

[11]:
A = nb.uniform((3, 4))
B = nb.uniform((4, 5))
C = nb.matmul(A, B)  # or A @ B
print(f"A shape: {A.shape}")
print(f"B shape: {B.shape}")
print(f"A @ B shape: {C.shape}")
print("A @ B:\n", C)
A shape: [Dim(3), Dim(4)]
B shape: [Dim(4), Dim(5)]
A @ B shape: [Dim(3), Dim(5)]
A @ B:
 Tensor(
  [[1.3086 1.1829 0.6232 1.087  1.0454]
   [2.0064 1.206  1.0642 1.251  1.3388]
   [1.5473 0.8017 0.8984 1.0166 1.047 ]] : f32[3,5]
)
[12]:
# Batched matmul
batch_A = nb.uniform((2, 3, 4))
batch_B = nb.uniform((2, 4, 5))
batch_C = batch_A @ batch_B
print(f"Batched matmul: {batch_A.shape} @ {batch_B.shape} = {batch_C.shape}")
Batched matmul: [Dim(2), Dim(3), Dim(4)] @ [Dim(2), Dim(4), Dim(5)] = [Dim(2), Dim(3), Dim(5)]
[13]:
# Outer product via broadcasting: v1[:, None] * v2[None, :]
v1 = nb.Tensor.from_dlpack(np.array([1.0, 2.0, 3.0], dtype=np.float32))
v2 = nb.Tensor.from_dlpack(np.array([4.0, 5.0], dtype=np.float32))
outer = nb.unsqueeze(v1, axis=1) * nb.unsqueeze(v2, axis=0)
print(f"Outer product ({v1.shape} x {v2.shape}):")
print(outer)
Outer product ([Dim(3)] x [Dim(2)]):
Tensor(
  [[ 4.  5.]
   [ 8. 10.]
   [12. 15.]] : f32[3,2]
)
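To confirm the broadcasting trick, here is the same computation in pure NumPy; the result agrees with `np.outer`:

```python
import numpy as np

v1 = np.array([1.0, 2.0, 3.0], dtype=np.float32)
v2 = np.array([4.0, 5.0], dtype=np.float32)

# Insert a trailing axis on v1 and a leading axis on v2, then let
# broadcasting do the rest: (3, 1) * (1, 2) -> (3, 2)
outer = v1[:, None] * v2[None, :]
print(outer)

assert np.array_equal(outer, np.outer(v1, v2))
```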

6. Reduction Operations#

Reduce along one or more axes (or all axes for a scalar result).

[14]:
x = nb.Tensor.from_dlpack(
    np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], dtype=np.float32)
)
print("x:\n", x)
print()

# Full reductions
print("sum(x):  ", nb.reduce_sum(x))
print("mean(x): ", nb.mean(x))
print("max(x):  ", nb.reduce_max(x))
print("min(x):  ", nb.reduce_min(x))
x:
 Tensor(
  [[1. 2. 3.]
   [4. 5. 6.]] : f32[2,3]
)

sum(x):   Tensor(21. : f32[])
mean(x):  Tensor(3.5 : f32[])
max(x):   Tensor(6. : f32[])
min(x):   Tensor(1. : f32[])
[15]:
# Axis-specific reductions
print("sum(axis=0):", nb.reduce_sum(x, axis=0))  # Sum columns
print("sum(axis=1):", nb.reduce_sum(x, axis=1))  # Sum rows
print("mean(axis=1):", nb.mean(x, axis=1))
print("max(axis=0): ", nb.reduce_max(x, axis=0))
sum(axis=0): Tensor([5. 7. 9.] : f32[3])
sum(axis=1): Tensor([ 6. 15.] : f32[2])
mean(axis=1): Tensor([2. 5.] : f32[2])
max(axis=0):  Tensor([4. 5. 6.] : f32[3])
[16]:
# keepdims preserves the reduced dimension
print("sum(axis=1, keepdims=True):", nb.reduce_sum(x, axis=1, keepdims=True))
print(f"  Shape: {nb.reduce_sum(x, axis=1, keepdims=True).shape}")
sum(axis=1, keepdims=True): Tensor(
  [[ 6.]
   [15.]] : f32[2,1]
)
  Shape: [Dim(2), Dim(1)]
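The payoff of `keepdims` is that the `(2, 1)` result still broadcasts against the original `(2, 3)` tensor. A common use is normalizing rows, sketched here in NumPy (the same pattern works with `nb.reduce_sum`):

```python
import numpy as np

x = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], dtype=np.float32)

# keepdims=True gives shape (2, 1), which broadcasts against (2, 3);
# the plain (2,) result would not line up with the rows.
row_sums = x.sum(axis=1, keepdims=True)
normalized = x / row_sums
print(normalized.sum(axis=1))  # each row now sums to 1
```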
[17]:
# Argmax / Argmin
print("argmax(axis=1):", nb.argmax(x, axis=1))
print("argmin(axis=0):", nb.argmin(x, axis=0))
argmax(axis=1): Tensor([2 2] : i64[2])
argmin(axis=0): Tensor([0 0 0] : i64[3])

7. Shape Manipulation#

Nabla supports reshaping, transposing, squeezing, and more — all as lazy ops.

[18]:
x = nb.arange(0, 12, dtype=nb.DType.float32)
print(f"Original: shape={x.shape}")
print(x)

# Reshape
r = nb.reshape(x, (3, 4))
print("\nReshaped to (3, 4):")
print(r)

# Flatten
f = nb.flatten(r)
print(f"\nFlattened back: shape={f.shape}")
Original: shape=[Dim(12)]
Tensor([ 0.  1.  2. ...  9. 10. 11.] : f32[12])

Reshaped to (3, 4):
Tensor(
  [[ 0.  1.  2.  3.]
   [ 4.  5.  6.  7.]
   [ 8.  9. 10. 11.]] : f32[3,4]
)

Flattened back: shape=[Dim(12)]
[19]:
# Transpose and permute
m = nb.uniform((2, 3, 4))
print(f"Original shape:   {m.shape}")
print(f"Swap axes (1,2):  {nb.swap_axes(m, 1, 2).shape}")
print(f"Permute (2,0,1):  {nb.permute(m, (2, 0, 1)).shape}")
print(f"Move axis 2→0:    {nb.moveaxis(m, 2, 0).shape}")
Original shape:   [Dim(2), Dim(3), Dim(4)]
Swap axes (1,2):  [Dim(2), Dim(4), Dim(3)]
Permute (2,0,1):  [Dim(4), Dim(2), Dim(3)]
Move axis 2→0:    [Dim(4), Dim(2), Dim(3)]
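Note that the last two results coincide: for a rank-3 tensor, moving axis 2 to position 0 is exactly the permutation (2, 0, 1). The NumPy equivalents make this easy to verify:

```python
import numpy as np

m = np.zeros((2, 3, 4), dtype=np.float32)

# full axis reordering vs. moving a single axis to the front
p = np.transpose(m, (2, 0, 1))
mv = np.moveaxis(m, 2, 0)
print(p.shape, mv.shape)  # both (4, 2, 3)

# moveaxis keeps the relative order of the remaining axes,
# so here it is the same operation as the explicit permute
assert p.shape == mv.shape == (4, 2, 3)
```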
[20]:
# Squeeze and unsqueeze
x = nb.ones((1, 3, 1, 4))
print(f"Original:       {x.shape}")
print(f"Squeeze(0):     {nb.squeeze(x, axis=0).shape}")
print(f"Squeeze(2):     {nb.squeeze(x, axis=2).shape}")

y = nb.ones((3, 4))
print(f"Unsqueeze(0):   {nb.unsqueeze(y, axis=0).shape}")
print(f"Unsqueeze(1):   {nb.unsqueeze(y, axis=1).shape}")
Original:       [Dim(1), Dim(3), Dim(1), Dim(4)]
Squeeze(0):     [Dim(3), Dim(1), Dim(4)]
Squeeze(2):     [Dim(1), Dim(3), Dim(4)]
Unsqueeze(0):   [Dim(1), Dim(3), Dim(4)]
Unsqueeze(1):   [Dim(3), Dim(1), Dim(4)]

8. Concatenation and Stacking#

[21]:
a = nb.ones((2, 3))
b = nb.zeros((2, 3))
print("Concatenate (axis=0):", nb.concatenate([a, b], axis=0).shape)
print("Concatenate (axis=1):", nb.concatenate([a, b], axis=1).shape)
print("Stack (axis=0):      ", nb.stack([a, b], axis=0).shape)
print("Stack (axis=1):      ", nb.stack([a, b], axis=1).shape)
Concatenate (axis=0): [Dim(4), Dim(3)]
Concatenate (axis=1): [Dim(2), Dim(6)]
Stack (axis=0):       [Dim(2), Dim(2), Dim(3)]
Stack (axis=1):       [Dim(2), Dim(2), Dim(3)]
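The shape arithmetic captures the difference: `concatenate` joins along an existing axis, while `stack` first inserts a new axis. A NumPy sketch of the relationship between the two:

```python
import numpy as np

a = np.ones((2, 3), dtype=np.float32)
b = np.zeros((2, 3), dtype=np.float32)

# concatenate joins along an existing axis: (2,3)+(2,3) -> (4,3) on axis 0
cat0 = np.concatenate([a, b], axis=0)
# stack inserts a brand-new axis: two (2,3) blocks -> (2,2,3)
st0 = np.stack([a, b], axis=0)
print(cat0.shape, st0.shape)

# stack(axis=0) is concatenate of inputs with an unsqueezed leading axis
assert np.array_equal(st0, np.concatenate([a[None], b[None]], axis=0))
```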

9. Broadcasting#

Nabla follows NumPy broadcasting rules.

[22]:
x = nb.uniform((3, 1))
y = nb.uniform((1, 4))
z = x + y  # Broadcasts to (3, 4)
print(f"x: {x.shape} + y: {y.shape} = z: {z.shape}")
print(z)
x: [Dim(3), Dim(1)] + y: [Dim(1), Dim(4)] = z: [Dim(3), Dim(4)]
Tensor(
  [[1.5786 1.5875 0.9695 1.2206]
   [1.725  1.7339 1.1159 1.367 ]
   [0.8194 0.8283 0.2104 0.4614]] : f32[3,4]
)
[23]:
# Explicit broadcast
v = nb.Tensor.from_dlpack(np.array([1.0, 2.0, 3.0], dtype=np.float32))
b = nb.broadcast_to(v, (4, 3))
print(f"Broadcast {v.shape} → {b.shape}:")
print(b)
Broadcast [Dim(3)] → [Dim(4), Dim(3)]:
Tensor(
  [[1. 2. 3.]
   [1. 2. 3.]
   [1. 2. 3.]
   [1. 2. 3.]] : f32[4,3]
)
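The rule behind both cells is NumPy's: shapes are aligned from the trailing dimension, and any dimension of size 1 is stretched to match its partner. A minimal NumPy demonstration:

```python
import numpy as np

# Shapes align from the right; a size-1 dimension stretches to match,
# so (3, 1) + (1, 4) -> (3, 4).
x = np.arange(3, dtype=np.float32).reshape(3, 1)
y = np.arange(4, dtype=np.float32).reshape(1, 4)
z = x + y
print(z.shape)  # (3, 4)

# broadcast_to makes the stretch explicit (a view, no data copied)
b = np.broadcast_to(np.array([1.0, 2.0, 3.0], dtype=np.float32), (4, 3))
print(b.shape)  # (4, 3)
```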

10. Type Casting#

[24]:
x = nb.ones((3,), dtype=nb.DType.float32)
print(f"Original dtype: {x.dtype}")

x_int = nb.cast(x, nb.DType.int32)
print(f"Cast to int32:  {x_int.dtype}")

x_f64 = nb.cast(x, nb.DType.float64)
print(f"Cast to float64: {x_f64.dtype}")
Original dtype: DType.float32
Cast to int32:  DType.int32
Cast to float64: DType.float64
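One detail worth knowing when casting floats to integers: NumPy's `astype` truncates toward zero rather than rounding. Whether `nb.cast` follows the same convention is an assumption worth verifying in your own code; the NumPy behavior looks like this:

```python
import numpy as np

x = np.array([1.9, -1.9, 2.5], dtype=np.float32)

# astype truncates toward zero -- it does not round to nearest
print(x.astype(np.int32))  # [ 1 -1  2]
```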

11. Comparisons and Logical Operations#

[25]:
a = nb.Tensor.from_dlpack(np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32))
b = nb.Tensor.from_dlpack(np.array([2.0, 2.0, 4.0, 3.0], dtype=np.float32))

print("a:", a)
print("b:", b)
print("a == b:", nb.equal(a, b))
print("a > b: ", nb.greater(a, b))
print("a < b: ", nb.less(a, b))
print("a >= b:", nb.greater_equal(a, b))
a: Tensor([1. 2. 3. 4.] : f32[4])
b: Tensor([2. 2. 4. 3.] : f32[4])
a == b: Tensor([False  True False False] : bool[4])
a > b:  Tensor([False False False  True] : bool[4])
a < b:  Tensor([ True False  True False] : bool[4])
a >= b: Tensor([False  True False  True] : bool[4])
[26]:
# Where (conditional select)
mask = nb.greater(a, b)
result = nb.where(mask, a, b)  # Pick a where a > b, else b
print("where(a > b, a, b):", result)
where(a > b, a, b): Tensor([2. 2. 4. 4.] : f32[4])
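With `a > b` as the condition, this select is exactly an element-wise maximum, which the NumPy equivalents confirm:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)
b = np.array([2.0, 2.0, 4.0, 3.0], dtype=np.float32)

# where(cond, x, y) picks x where cond holds, y elsewhere;
# here that coincides with np.maximum
picked = np.where(a > b, a, b)
print(picked)  # [2. 2. 4. 4.]

assert np.array_equal(picked, np.maximum(a, b))
```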

12. Softmax#

[27]:
logits = nb.Tensor.from_dlpack(
    np.array([[2.0, 1.0, 0.1], [0.5, 2.0, 0.3]], dtype=np.float32)
)
probs = nb.softmax(logits, axis=-1)
print("Logits:\n", logits)
print("Softmax:\n", probs)
print("Row sums:", nb.reduce_sum(probs, axis=-1))
Logits:
 Tensor(
  [[2.  1.  0.1]
   [0.5 2.  0.3]] : f32[2,3]
)
Softmax:
 Tensor(
  [[0.659  0.2424 0.0986]
   [0.1587 0.7113 0.1299]] : f32[2,3]
)
Row sums: Tensor([1. 1.] : f32[2])
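Under the hood, softmax exponentiates each logit and normalizes by the row sum. A numerically stable NumPy version subtracts the row maximum first (the shift cancels in the ratio), and reproduces the probabilities above:

```python
import numpy as np

def softmax(x, axis=-1):
    # subtract the max before exponentiating to avoid overflow;
    # the shift cancels out when we normalize
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

logits = np.array([[2.0, 1.0, 0.1], [0.5, 2.0, 0.3]], dtype=np.float32)
probs = softmax(logits)
print(np.round(probs, 4))
print(probs.sum(axis=-1))  # each row sums to 1
```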

Summary#

In this example you learned how to:

  • Create tensors from NumPy arrays, factory functions, and constants

  • Perform arithmetic, element-wise, and matrix operations

  • Reduce tensors along axes (sum, mean, max, min, argmax)

  • Manipulate shapes (reshape, transpose, squeeze, unsqueeze)

  • Use broadcasting, type casting, comparisons, and softmax

All operations are lazy — they build a computation graph that’s evaluated on demand. This enables powerful optimizations when combined with @nb.compile.