I am trying to understand how the dot product works for arrays with more than 2 dimensions.
For 2-D arrays, dot is ordinary matrix multiplication, which is equivalent to:
dot(a, b)[i,j] = sum(a[i,:] * b[:,j])
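As a quick sanity check of the 2-D formula, here is a minimal sketch assuming dot refers to NumPy's np.dot. The shapes (3, 4) and (4, 5) and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# Arbitrary example shapes (assumption): a is 3x4, b is 4x5.
rng = np.random.default_rng(0)
a = rng.standard_normal((3, 4))
b = rng.standard_normal((4, 5))

out = np.dot(a, b)  # result has shape (3, 5)

# Rebuild each entry directly from the formula:
# dot(a, b)[i, j] = sum(a[i, :] * b[:, j])
manual = np.empty((3, 5))
for i in range(3):
    for j in range(5):
        manual[i, j] = np.sum(a[i, :] * b[:, j])

matches_2d = np.allclose(out, manual)
print(out.shape, matches_2d)
```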
With more dimensions, the sum is still taken over the last axis of a and the second-to-last axis of b.
dot(a, b)[a1,a2,...,i, b1,b2,...,j] = sum(
a[a1,a2,...,i,:] *
b[b1,b2,...,:,j]
)
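The index bookkeeping is easiest to check numerically. The sketch below assumes NumPy's np.dot and uses arbitrary example shapes: a has one leading index a1 and b has one leading index b1. The shape of the result shows where each index lands: all of a's non-contracted axes come first, followed by b's leading axes and finally b's last axis, so the entry is addressed as [a1, i, b1, j].

```python
import numpy as np

# Arbitrary example shapes (assumption):
# a: (a1, i, k) = (2, 3, 4), b: (b1, k, j) = (5, 4, 6)
rng = np.random.default_rng(0)
a = rng.standard_normal((2, 3, 4))
b = rng.standard_normal((5, 4, 6))

out = np.dot(a, b)
# Result shape is a.shape[:-1] + b.shape[:-2] + b.shape[-1:]
expected_shape = a.shape[:-1] + b.shape[:-2] + b.shape[-1:]

# Rebuild every entry by summing over a's last axis
# and b's second-to-last axis.
manual = np.empty(expected_shape)
for a1 in range(2):
    for i in range(3):
        for b1 in range(5):
            for j in range(6):
                manual[a1, i, b1, j] = np.sum(a[a1, i, :] * b[b1, :, j])

matches_nd = np.allclose(out, manual)
print(out.shape, matches_nd)
```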
In some sense, the additional leading dimensions of a and b can each be imagined as a "multi-dimensional array" of many matrices. dot then performs a standard matrix multiplication for every pairing of a matrix from a's array with a matrix from b's array.
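To make the "many matrix multiplications" picture concrete, here is a small sketch, again assuming NumPy's np.dot and arbitrary example shapes. It checks that each slice of the result equals the ordinary 2-D product of one matrix taken from a and one taken from b.

```python
import numpy as np

# Arbitrary example shapes (assumption):
# a holds 2 matrices of shape (3, 4); b holds 5 matrices of shape (4, 6).
rng = np.random.default_rng(1)
a = rng.standard_normal((2, 3, 4))
b = rng.standard_normal((5, 4, 6))

out = np.dot(a, b)  # shape (2, 3, 5, 6)

# out[a1, :, b1, :] should be the plain matrix product a[a1] @ b[b1]
# for every combination of a1 and b1.
all_pairs_match = all(
    np.allclose(out[a1, :, b1, :], a[a1] @ b[b1])
    for a1 in range(a.shape[0])
    for b1 in range(b.shape[0])
)

# Number of 2-D products packed into the result: 2 * 5
n_products = a.shape[0] * b.shape[0]
print(all_pairs_match, n_products)
```

With these shapes, the result packs 2 * 5 = 10 separate (3, 4) by (4, 6) products into a single array.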

