NDArray

Being a data scientist by trade, one of the first things I want to do with Rust is accelerate my numerical code. You would be surprised how well optimized numpy is, especially for large arrays, and I’ll be honest that moving to ndarray in Rust alone is not going to make your code faster; it will probably be the same.

Things change completely when you find something numpy doesn’t already do. If you have to drop into basic Python, the speed is gone. If you use Numba, you may be able to recover some of that performance, and if you use Cython, you might get most of it back. Numba is super convenient, and Cython is not terrible to write in either, but the build process can be tricky. Maybe we’ll cover this in other tutorials.

Enter Rust and NDArray. They are not necessarily better than the best you can do with Numpy and Cython, but they require less code, have a smaller runtime, and overall are easier to think about because there are just fewer opportunities for leaky abstractions.

Install NDArray

Obviously the intended audience here is folks who already know Numpy, so I am papering over some of the things NDArray and Numpy have in common. If you want to see the features common to both NDArray and Numpy, the NDArray documentation covers that.

There is also a Rust package named Numpy. It’s a binding to Numpy, so it actually uses Numpy underneath, and it’s great for Python-Rust interop (which is surprisingly convenient).

It doesn’t really fill the same use cases as NDArray, which is essentially a reimplementation of Numpy for Rust.

Let’s go ahead and install NDArray, like you would any other library in Rust. I’m using the long form for specifying this dependency because there are quite a few features you might want to tweak. For example, I include Rayon, which is useful for data parallelism. We won’t use it in this particular tutorial, but we’ll cover it in another one. We want rand so that we can show off filling an array from it in a moment.

[dependencies]
rand = "*"

[dependencies.ndarray]
version = "0.15"
features = ["rayon"]

What is an NDArray

Arrays in NDArray come in quite a few flavors, which you may find surprising coming from Numpy.

  • They can be any number of dimensions, but you’ll get more convenient syntax and sometimes better performance if you commit to a specific dimensionality at compile time. This is less burdensome in my experience than it sounds, because usually your code wouldn’t work with different dimensionalities anyway.
  • Array views are distinct from actual arrays, so you can define at a type level some restrictions that prevent mutation. This is absolutely fabulous for defensive programming in APIs. You no longer have to choose between a slow defensive copy or potentially undefined behavior.
  • Reference counted arrays are available, which feel very much like Numpy with garbage collection. These are often not a bad choice for large arrays since skipping even one clone may pay for many reference checks. But they’re totally opt-in and you can keep all your small intermediates fast and cheap with no garbage collection at all.
  • I find the types of NDArrays to be similarly flexible to Numpy but there are still opportunities to upgrade the experience in both worlds, such as the batch string operations that Pandas exposes.
  • There aren’t any fancy string packing methods in either world; in Numpy they are fixed length arrays, or pointers to str objects, and in NDArray they are generally pointers to heap allocated Strings. If you want fancier options you have to build them yourself, or perhaps use Arrow, which does have cool methods for such things.

What does an NDArray do

NDArray does a lot of the same things that Numpy does, and often using very similar syntax.

You can use ndarray::array; to avoid prefixing the array macro, which I’ll do in a few of the next code snippets. It’s not required and merely an issue of convenience.

use ndarray::array;

fn main() {
    let arr = array![[0,1],[2,3]];
    println!("Array squared: \n{}", &arr * &arr);
}

This prints out exactly what you would expect, and it works for all the elementwise operations. Notice that we use & to create a reference; that is because we want NDArray to allocate a new array rather than mutate an existing one in place. If you need a handy reference on how to trigger in-place or out-of-place behavior for owned and borrowed data, the ArrayBase documentation will answer your questions much more authoritatively than I can. If you are familiar

You can iterate ndarrays in a number of ways, which look similar to Python but operate at closer to C speeds:

use ndarray::Array;

// This is an example from the ArrayBase documentation
// Fills the whole matrix with ones, in place
let mut a = Array::zeros((10, 10));
for mut row in a.rows_mut() {
    row.fill(1.);
}

Admittedly, that specific example is a little absurd since you could also use Array::ones or call .fill(1.) on the whole matrix. But the point is that scalar operations can be quite speedy because, unlike Numpy, they won’t usually create any intermediate arrays, and unlike basic Python, they don’t require a full-fledged garbage-collected object for every scalar. Consider creating a few of the following matrices:

// Create a full matrix of 2D waves in one shot
let a = Array::from_shape_fn((1000, 1000), |(i, j)| (i as f32 + j as f32).sin());

// Fill a 2D tensor with random numbers (ndarray_rand can make this more convenient too)
let a = Array::from_shape_fn((182, 826), |_| rand::random::<f32>());

// Tile one array into another (note the modulo on both axes)
let tiled = Array::from_shape_fn((1000, 1000), |(i, j)| a[[i % a.nrows(), j % a.ncols()]]);

Mutation

Rust makes a big deal about parallel mutation and it’s important to make sure you understand what you can and can’t mutate at any given time. For that reason, from_shape_fn is a pretty great place to start, because the output of your function will be the new array content and it requires no mutation. You can still do this other ways, though.

// Start with the same source array as before
let a = Array::from_shape_fn((182, 826), |_| rand::random::<f32>());
// But this time start with an empty array
let mut b = Array::zeros((1000, 1000));
// iter() would give us references to the elements,
// indexed_iter() would give us the location and the reference
// and indexed_iter_mut() gives us the location and a reference we can mutate the original with
for ((i, j), elem) in b.indexed_iter_mut() {
    // elem is a mutable reference, so dereference it to assign through it
    *elem = a[[i % a.nrows(), j % a.ncols()]];
}

Slicing

Slices in NDArray are very similar to Numpy, but again with slightly different syntax. You have seen how to select a single scalar from an array, but you can also take arbitrary subsets along multiple axes, which can be super powerful.

This aspect of both Numpy and NDArray is perhaps a bit too detailed for me to cover in the time I have to write this article, but the original documentation does give quite a few examples. I’ll avoid copying them wholesale; you should really check out the docs on this one.

That’s it for this article. Let’s check out how to improve performance in the next one.