Example 1: Tensors and Operations#
Welcome to Nabla! This example introduces the core building block of the library: the Tensor. Nabla tensors are lazy by default — operations build a computation graph that is evaluated only when you request the result (e.g., by printing or calling .realize()).
Let’s start by importing Nabla and NumPy.
[1]:
import numpy as np
import nabla as nb
print("Nabla imported successfully!")
Nabla imported successfully!
1. Creating Tensors#
There are several ways to create tensors in Nabla:
From NumPy arrays via nb.Tensor.from_dlpack() (works with any DLPack source)
Factory functions like nb.zeros(), nb.ones(), nb.arange(), nb.uniform()
Constants via nb.constant()
[2]:
# From NumPy arrays
np_array = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], dtype=np.float32)
x = nb.Tensor.from_dlpack(np_array)
print("From NumPy:")
print(x)
print(f" Shape: {x.shape}, Dtype: {x.dtype}\n")
From NumPy:
Tensor(
[[1. 2. 3.]
[4. 5. 6.]] : f32[2,3]
)
Shape: [Dim(2), Dim(3)], Dtype: DType.float32
[3]:
# Factory functions — zeros, ones, full
z = nb.zeros((2, 3))
o = nb.ones((2, 3))
f = nb.full((2, 3), 3.14)
print("Zeros:", z)
print("Ones: ", o)
print("Full: ", f)
Zeros: Tensor(
[[0. 0. 0.]
[0. 0. 0.]] : f32[2,3]
)
Ones: Tensor(
[[1. 1. 1.]
[1. 1. 1.]] : f32[2,3]
)
Full: Tensor(
[[3.14 3.14 3.14]
[3.14 3.14 3.14]] : f32[2,3]
)
[4]:
# Ranges and random tensors
r = nb.arange(0, 6, dtype=nb.DType.float32)
u = nb.uniform((2, 3), low=-1.0, high=1.0)
g = nb.gaussian((2, 3), mean=0.0, std=1.0)
print("Arange: ", r)
print("Uniform: ", u)
print("Gaussian:", g)
Arange: Tensor([0. 1. 2. 3. 4. 5.] : f32[6])
Uniform: Tensor(
[[ 0.5962 0.889 -0.9222]
[ 0.1495 0.7381 0.8013]] : f32[2,3]
)
Gaussian: Tensor(
[[ 1.6811 2.3331 -0.2512]
[ 0.8896 1.6362 -1.9282]] : f32[2,3]
)
[5]:
# Constants from Python lists (wrapped in a NumPy array)
c = nb.constant(np.array([10.0, 20.0, 30.0], dtype=np.float32))
print("Constant:", c)
Constant: Tensor([10. 20. 30.] : f32[3])
2. Tensor Properties#
Every tensor carries metadata about its shape, dtype, and device.
[6]:
x = nb.uniform((3, 4, 5))
print(f"Shape: {x.shape}")
print(f"Dtype: {x.dtype}")
print(f"Device: {x.device}")
print(f"Rank: {x.ndim}")
Shape: [Dim(3), Dim(4), Dim(5)]
Dtype: DType.float32
Device: Device(type=cpu,id=0)
Rank: 3
3. Arithmetic Operations#
Nabla supports standard arithmetic via Python operators and named functions. All operations are lazy — they build a graph that is evaluated on demand.
[7]:
a = nb.Tensor.from_dlpack(np.array([1.0, 2.0, 3.0], dtype=np.float32))
b = nb.Tensor.from_dlpack(np.array([4.0, 5.0, 6.0], dtype=np.float32))
print("a: ", a)
print("b: ", b)
print("a + b:", a + b)
print("a - b:", a - b)
print("a * b:", a * b)
print("a / b:", a / b)
print("a ** 2:", a ** 2)
a: Tensor([1. 2. 3.] : f32[3])
b: Tensor([4. 5. 6.] : f32[3])
a + b: Tensor([5. 7. 9.] : f32[3])
a - b: Tensor([-3. -3. -3.] : f32[3])
a * b: Tensor([ 4. 10. 18.] : f32[3])
a / b: Tensor([0.25 0.4 0.5 ] : f32[3])
a ** 2: Tensor([1. 4. 9.] : f32[3])
[8]:
# Named function equivalents
print("nb.add(a, b):", nb.add(a, b))
print("nb.mul(a, b):", nb.mul(a, b))
print("nb.sub(a, b):", nb.sub(a, b))
print("nb.div(a, b):", nb.div(a, b))
nb.add(a, b): Tensor([5. 7. 9.] : f32[3])
nb.mul(a, b): Tensor([ 4. 10. 18.] : f32[3])
nb.sub(a, b): Tensor([-3. -3. -3.] : f32[3])
nb.div(a, b): Tensor([0.25 0.4 0.5 ] : f32[3])
4. Element-wise Unary Operations#
Nabla provides a rich set of element-wise functions.
[9]:
x = nb.Tensor.from_dlpack(np.array([0.0, 0.5, 1.0, 1.5, 2.0], dtype=np.float32))
print("x: ", x)
print("exp(x): ", nb.exp(x))
print("log(x+1):", nb.log(x + 1.0))
print("sqrt(x): ", nb.sqrt(x))
print("sin(x): ", nb.sin(x))
print("cos(x): ", nb.cos(x))
print("tanh(x): ", nb.tanh(x))
x: Tensor([0. 0.5 1. 1.5 2. ] : f32[5])
exp(x): Tensor([1. 1.6487 2.7183 4.4817 7.3891] : f32[5])
log(x+1): Tensor([0. 0.4055 0.6931 0.9163 1.0986] : f32[5])
sqrt(x): Tensor([0. 0.7071 1. 1.2247 1.4142] : f32[5])
sin(x): Tensor([0. 0.4794 0.8415 0.9975 0.9093] : f32[5])
cos(x): Tensor([ 1. 0.8776 0.5403 0.0707 -0.4161] : f32[5])
tanh(x): Tensor([0. 0.4621 0.7616 0.9051 0.964 ] : f32[5])
[10]:
# Activation functions
x = nb.Tensor.from_dlpack(np.array([-2.0, -1.0, 0.0, 1.0, 2.0], dtype=np.float32))
print("x: ", x)
print("relu(x): ", nb.relu(x))
print("sigmoid:", nb.sigmoid(x))
print("gelu(x): ", nb.gelu(x))
print("silu(x): ", nb.silu(x))
x: Tensor([-2. -1. 0. 1. 2.] : f32[5])
relu(x): Tensor([0. 0. 0. 1. 2.] : f32[5])
sigmoid: Tensor([0.1192 0.2689 0.5 0.7311 0.8808] : f32[5])
gelu(x): Tensor([-0.0455 -0.1587 0. 0.8413 1.9545] : f32[5])
silu(x): Tensor([-0.2384 -0.2689 0. 0.7311 1.7616] : f32[5])
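The values above follow directly from the closed-form definitions of each activation. As a cross-check (in plain NumPy, since both libraries compute the same math), the sketch below reproduces them; note it uses the exact GELU via the Gaussian CDF (`math.erf`), so if Nabla uses the tanh approximation the last digits could differ slightly:

```python
import math
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0], dtype=np.float32)

relu = np.maximum(x, 0.0)                  # max(x, 0)
sigmoid = 1.0 / (1.0 + np.exp(-x))         # logistic function
silu = x * sigmoid                         # x * sigmoid(x), a.k.a. swish
# Exact GELU: x * Phi(x), where Phi is the standard normal CDF
gelu = np.array([v * 0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in x])

print("relu:   ", relu)
print("sigmoid:", np.round(sigmoid, 4))
print("silu:   ", np.round(silu, 4))
print("gelu:   ", np.round(gelu, 4))
```

The printed values match the Nabla outputs above to display precision.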
5. Matrix Operations#
Matrix multiplication is a first-class operation in Nabla.
[11]:
A = nb.uniform((3, 4))
B = nb.uniform((4, 5))
C = nb.matmul(A, B) # or A @ B
print(f"A shape: {A.shape}")
print(f"B shape: {B.shape}")
print(f"A @ B shape: {C.shape}")
print("A @ B:\n", C)
A shape: [Dim(3), Dim(4)]
B shape: [Dim(4), Dim(5)]
A @ B shape: [Dim(3), Dim(5)]
A @ B:
Tensor(
[[1.3086 1.1829 0.6232 1.087 1.0454]
[2.0064 1.206 1.0642 1.251 1.3388]
[1.5473 0.8017 0.8984 1.0166 1.047 ]] : f32[3,5]
)
[12]:
# Batched matmul
batch_A = nb.uniform((2, 3, 4))
batch_B = nb.uniform((2, 4, 5))
batch_C = batch_A @ batch_B
print(f"Batched matmul: {batch_A.shape} @ {batch_B.shape} = {batch_C.shape}")
Batched matmul: [Dim(2), Dim(3), Dim(4)] @ [Dim(2), Dim(4), Dim(5)] = [Dim(2), Dim(3), Dim(5)]
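The batched shape rule shown here mirrors NumPy's `np.matmul` semantics: the last two axes are the matrix dimensions and any leading axes are batch dimensions. A quick NumPy sketch of the same shapes (in NumPy the batch axes also broadcast, letting one matrix multiply a whole batch — whether Nabla broadcasts batch axes the same way is not shown above):

```python
import numpy as np

# Leading axes are batch dimensions; the last two are contracted
A = np.random.rand(2, 3, 4).astype(np.float32)
B = np.random.rand(2, 4, 5).astype(np.float32)
C = A @ B
print(C.shape)  # (2, 3, 5)

# In NumPy the batch axes broadcast, so a single matrix applies to every batch element
D = np.random.rand(4, 5).astype(np.float32)
print((A @ D).shape)  # (2, 3, 5)
```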
[13]:
# Outer product via broadcasting: v1[:, None] * v2[None, :]
v1 = nb.Tensor.from_dlpack(np.array([1.0, 2.0, 3.0], dtype=np.float32))
v2 = nb.Tensor.from_dlpack(np.array([4.0, 5.0], dtype=np.float32))
outer = nb.unsqueeze(v1, axis=1) * nb.unsqueeze(v2, axis=0)
print(f"Outer product ({v1.shape} x {v2.shape}):")
print(outer)
Outer product ([Dim(3)] x [Dim(2)]):
Tensor(
[[ 4. 5.]
[ 8. 10.]
[12. 15.]] : f32[3,2]
)
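This unsqueeze-and-multiply pattern is exactly the classic outer product; in NumPy the same result comes from `np.outer`, which makes a convenient cross-check:

```python
import numpy as np

v1 = np.array([1.0, 2.0, 3.0], dtype=np.float32)
v2 = np.array([4.0, 5.0], dtype=np.float32)

# (3, 1) * (1, 2) broadcasts to (3, 2) -- the outer product
outer = v1[:, None] * v2[None, :]
print(outer)
print(np.array_equal(outer, np.outer(v1, v2)))  # True
```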
6. Reduction Operations#
Reduce along one or more axes (or all axes for a scalar result).
[14]:
x = nb.Tensor.from_dlpack(
np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], dtype=np.float32)
)
print("x:\n", x)
print()
# Full reductions
print("sum(x): ", nb.reduce_sum(x))
print("mean(x): ", nb.mean(x))
print("max(x): ", nb.reduce_max(x))
print("min(x): ", nb.reduce_min(x))
x:
Tensor(
[[1. 2. 3.]
[4. 5. 6.]] : f32[2,3]
)
sum(x): Tensor(21. : f32[])
mean(x): Tensor(3.5 : f32[])
max(x): Tensor(6. : f32[])
min(x): Tensor(1. : f32[])
[15]:
# Axis-specific reductions
print("sum(axis=0):", nb.reduce_sum(x, axis=0))  # Reduce over rows → per-column sums
print("sum(axis=1):", nb.reduce_sum(x, axis=1))  # Reduce over columns → per-row sums
print("mean(axis=1):", nb.mean(x, axis=1))
print("max(axis=0): ", nb.reduce_max(x, axis=0))
sum(axis=0): Tensor([5. 7. 9.] : f32[3])
sum(axis=1): Tensor([ 6. 15.] : f32[2])
mean(axis=1): Tensor([2. 5.] : f32[2])
max(axis=0): Tensor([4. 5. 6.] : f32[3])
[16]:
# keepdims preserves the reduced dimension
print("sum(axis=1, keepdims=True):", nb.reduce_sum(x, axis=1, keepdims=True))
print(f" Shape: {nb.reduce_sum(x, axis=1, keepdims=True).shape}")
sum(axis=1, keepdims=True): Tensor(
[[ 6.]
[15.]] : f32[2,1]
)
Shape: [Dim(2), Dim(1)]
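The practical payoff of keepdims is that the reduced result keeps a size-1 axis, so it broadcasts straight back against the original tensor. A common use is row normalization, sketched here in NumPy (the same pattern works with the Nabla ops shown above):

```python
import numpy as np

x = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], dtype=np.float32)

# keepdims=True leaves shape (2, 1), which broadcasts against (2, 3)
row_sums = x.sum(axis=1, keepdims=True)
normalized = x / row_sums       # divide each row by its own sum
print(normalized.sum(axis=1))   # each row now sums to 1
```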
[17]:
# Argmax / Argmin
print("argmax(axis=1):", nb.argmax(x, axis=1))
print("argmin(axis=0):", nb.argmin(x, axis=0))
argmax(axis=1): Tensor([2 2] : i64[2])
argmin(axis=0): Tensor([0 0 0] : i64[3])
7. Shape Manipulation#
Nabla supports reshaping, transposing, squeezing, and more — all as lazy ops.
[18]:
x = nb.arange(0, 12, dtype=nb.DType.float32)
print(f"Original: shape={x.shape}")
print(x)
# Reshape
r = nb.reshape(x, (3, 4))
print("\nReshaped to (3, 4):")
print(r)
# Flatten
f = nb.flatten(r)
print(f"\nFlattened back: shape={f.shape}")
Original: shape=[Dim(12)]
Tensor([ 0. 1. 2. ... 9. 10. 11.] : f32[12])
Reshaped to (3, 4):
Tensor(
[[ 0. 1. 2. 3.]
[ 4. 5. 6. 7.]
[ 8. 9. 10. 11.]] : f32[3,4]
)
Flattened back: shape=[Dim(12)]
[19]:
# Transpose and permute
m = nb.uniform((2, 3, 4))
print(f"Original shape: {m.shape}")
print(f"Swap axes (1,2): {nb.swap_axes(m, 1, 2).shape}")
print(f"Permute (2,0,1): {nb.permute(m, (2, 0, 1)).shape}")
print(f"Move axis 2→0: {nb.moveaxis(m, 2, 0).shape}")
Original shape: [Dim(2), Dim(3), Dim(4)]
Swap axes (1,2): [Dim(2), Dim(4), Dim(3)]
Permute (2,0,1): [Dim(4), Dim(2), Dim(3)]
Move axis 2→0: [Dim(4), Dim(2), Dim(3)]
[20]:
# Squeeze and unsqueeze
x = nb.ones((1, 3, 1, 4))
print(f"Original: {x.shape}")
print(f"Squeeze(0): {nb.squeeze(x, axis=0).shape}")
print(f"Squeeze(2): {nb.squeeze(x, axis=2).shape}")
y = nb.ones((3, 4))
print(f"Unsqueeze(0): {nb.unsqueeze(y, axis=0).shape}")
print(f"Unsqueeze(1): {nb.unsqueeze(y, axis=1).shape}")
Original: [Dim(1), Dim(3), Dim(1), Dim(4)]
Squeeze(0): [Dim(3), Dim(1), Dim(4)]
Squeeze(2): [Dim(1), Dim(3), Dim(4)]
Unsqueeze(0): [Dim(1), Dim(3), Dim(4)]
Unsqueeze(1): [Dim(3), Dim(1), Dim(4)]
8. Concatenation and Stacking#
[21]:
a = nb.ones((2, 3))
b = nb.zeros((2, 3))
print("Concatenate (axis=0):", nb.concatenate([a, b], axis=0).shape)
print("Concatenate (axis=1):", nb.concatenate([a, b], axis=1).shape)
print("Stack (axis=0): ", nb.stack([a, b], axis=0).shape)
print("Stack (axis=1): ", nb.stack([a, b], axis=1).shape)
Concatenate (axis=0): [Dim(4), Dim(3)]
Concatenate (axis=1): [Dim(2), Dim(6)]
Stack (axis=0): [Dim(2), Dim(2), Dim(3)]
Stack (axis=1): [Dim(2), Dim(2), Dim(3)]
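The shape difference above captures the key distinction: concatenate joins along an existing axis, while stack inserts a new one. In fact, stacking is equivalent to unsqueezing each input and concatenating, as this NumPy sketch shows:

```python
import numpy as np

a = np.ones((2, 3), dtype=np.float32)
b = np.zeros((2, 3), dtype=np.float32)

# stack creates a new leading axis...
stacked = np.stack([a, b], axis=0)                       # shape (2, 2, 3)
# ...which is the same as unsqueeze-then-concatenate
via_concat = np.concatenate([a[None], b[None]], axis=0)  # shape (2, 2, 3)
print(np.array_equal(stacked, via_concat))  # True
```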
9. Broadcasting#
Nabla follows NumPy broadcasting rules.
[22]:
x = nb.uniform((3, 1))
y = nb.uniform((1, 4))
z = x + y # Broadcasts to (3, 4)
print(f"x: {x.shape} + y: {y.shape} = z: {z.shape}")
print(z)
x: [Dim(3), Dim(1)] + y: [Dim(1), Dim(4)] = z: [Dim(3), Dim(4)]
Tensor(
[[1.5786 1.5875 0.9695 1.2206]
[1.725 1.7339 1.1159 1.367 ]
[0.8194 0.8283 0.2104 0.4614]] : f32[3,4]
)
[23]:
# Explicit broadcast
v = nb.Tensor.from_dlpack(np.array([1.0, 2.0, 3.0], dtype=np.float32))
b = nb.broadcast_to(v, (4, 3))
print(f"Broadcast {v.shape} → {b.shape}:")
print(b)
Broadcast [Dim(3)] → [Dim(4), Dim(3)]:
Tensor(
[[1. 2. 3.]
[1. 2. 3.]
[1. 2. 3.]
[1. 2. 3.]] : f32[4,3]
)
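The rule behind these results: shapes are aligned from the trailing axis, and each axis pair must either match or contain a 1 (which is stretched). NumPy exposes this rule directly via `np.broadcast_shapes`, handy for predicting result shapes before running an op:

```python
import numpy as np

# Align shapes from the right; each pair must be equal or contain a 1
print(np.broadcast_shapes((3, 1), (1, 4)))     # (3, 4)
print(np.broadcast_shapes((2, 3, 4), (3, 4)))  # (2, 3, 4)
print(np.broadcast_shapes((5,), (4, 5)))       # (4, 5)

# Incompatible shapes raise an error instead of silently misaligning
try:
    np.broadcast_shapes((3,), (4,))
except ValueError as e:
    print("Incompatible:", e)
```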
10. Type Casting#
[24]:
x = nb.ones((3,), dtype=nb.DType.float32)
print(f"Original dtype: {x.dtype}")
x_int = nb.cast(x, nb.DType.int32)
print(f"Cast to int32: {x_int.dtype}")
x_f64 = nb.cast(x, nb.DType.float64)
print(f"Cast to float64: {x_f64.dtype}")
Original dtype: DType.float32
Cast to int32: DType.int32
Cast to float64: DType.float64
11. Comparisons and Logical Operations#
[25]:
a = nb.Tensor.from_dlpack(np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32))
b = nb.Tensor.from_dlpack(np.array([2.0, 2.0, 4.0, 3.0], dtype=np.float32))
print("a:", a)
print("b:", b)
print("a == b:", nb.equal(a, b))
print("a > b: ", nb.greater(a, b))
print("a < b: ", nb.less(a, b))
print("a >= b:", nb.greater_equal(a, b))
a: Tensor([1. 2. 3. 4.] : f32[4])
b: Tensor([2. 2. 4. 3.] : f32[4])
a == b: Tensor([False True False False] : bool[4])
a > b: Tensor([False False False True] : bool[4])
a < b: Tensor([ True False True False] : bool[4])
a >= b: Tensor([False True False True] : bool[4])
[26]:
# Where (conditional select)
mask = nb.greater(a, b)
result = nb.where(mask, a, b) # Pick a where a > b, else b
print("where(a > b, a, b):", result)
where(a > b, a, b): Tensor([2. 2. 4. 4.] : f32[4])
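For floating-point inputs, `where` is equivalent to an arithmetic blend of the two operands by the boolean mask — which is one way autodiff systems reason about it. A NumPy sketch of that equivalence:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)
b = np.array([2.0, 2.0, 4.0, 3.0], dtype=np.float32)
mask = a > b

# Select from a where mask is True, from b elsewhere
picked = np.where(mask, a, b)
# Same result as a mask-weighted blend (True acts as 1, False as 0)
blend = mask * a + (~mask) * b
print(picked)                          # elementwise max of a and b here
print(np.array_equal(picked, blend))   # True
```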
12. Softmax#
[27]:
logits = nb.Tensor.from_dlpack(
np.array([[2.0, 1.0, 0.1], [0.5, 2.0, 0.3]], dtype=np.float32)
)
probs = nb.softmax(logits, axis=-1)
print("Logits:\n", logits)
print("Softmax:\n", probs)
print("Row sums:", nb.reduce_sum(probs, axis=-1))
Logits:
Tensor(
[[2. 1. 0.1]
[0.5 2. 0.3]] : f32[2,3]
)
Softmax:
Tensor(
[[0.659 0.2424 0.0986]
[0.1587 0.7113 0.1299]] : f32[2,3]
)
Row sums: Tensor([1. 1.] : f32[2])
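Under the hood, softmax is just exp followed by a keepdims sum — with one wrinkle: implementations subtract the row-wise max first so that `exp` never overflows, a shift that cancels in the ratio. A minimal NumPy version (Nabla's internals may differ, but the math is the same):

```python
import numpy as np

def softmax(z, axis=-1):
    # Subtract the max for numerical stability; the shift cancels in the ratio
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

logits = np.array([[2.0, 1.0, 0.1], [0.5, 2.0, 0.3]], dtype=np.float32)
probs = softmax(logits)
print(np.round(probs, 4))
print(probs.sum(axis=-1))  # each row sums to 1
```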
Summary#
In this example you learned how to:
Create tensors from NumPy arrays, factory functions, and constants
Perform arithmetic, element-wise, and matrix operations
Reduce tensors along axes (sum, mean, max, min, argmax)
Manipulate shapes (reshape, transpose, squeeze, unsqueeze)
Use broadcasting, type casting, comparisons, and softmax
All operations are lazy — they build a computation graph that’s evaluated on demand. This enables powerful optimizations when combined with @nb.compile.