Considering Rust for scientific software

This talk is for researchers dissatisfied with the status quo in scientific software.

Based on the criteria of

  1. Correctness
  2. Developer productivity
  3. Performance

I contend that Rust offers a unique value proposition for researchers.

There is undoubtedly a large amount of hype about Rust. This talk is not meant to diminish the enormous accomplishments of people using Python, C++, Julia and Fortran. Instead, the goal is to pitch Rust as a viable alternative to how things are done today.

Presented by

  • Max Orok

    Max is a mechanical engineering master's student at the University of Ottawa. He is researching Monte Carlo radiation modelling. Over the summer, Max completed a Google Summer of Code placement on the ROOT team at CERN.

  • Transcript

    Bard:
    Max Orok shows science, not fiction
    and Rust ain't no contradiction
    it sure won't spill your beans,
    so use it by all means
    if permitted by your jurisdiction

    Max:
    Hello, everyone. My name is Max Orok. This is a talk called Considering Rust for scientific software. This talk is for people who are interested in Rust, or maybe looking for an alternative language for their programming or research. We're going to be talking about the current scientific computing ecosystem, but also Rust's place in this ecosystem and where it can help researchers write good code.

    So, I'm a mechanical engineering master's student at the University of Ottawa. And I'm working as a radiation modeling researcher. My primary tool here is actually C++. But I have been using Rust for about a year and a half. I'm also a contract software developer for a company called Mevex which is a Canadian linear accelerator manufacturer.

    So, scientific software is sort of an interesting case where you have these very strict requirements, but oftentimes it's written by a very small team. Maybe even one person or a couple of people. And they have very limited time and resources, because usually they have other commitments: maybe they're professors or students or researchers. And this is actually a case where correctness of the software is very important. It's possible that scientific papers will be published based on the results, so it's more important than in other fields that we try to make sure our programs are as bug-free as possible.

    Especially when research is on the line based on these results. But it's also an area where the performance of programs is very important. If your program takes forever to run, you're going to have a lot more trouble iterating or running a different kind of analysis. And it's usually a very good thing when your program runs quickly, because that means a lot of people can do their jobs a lot quicker. Especially if requirements change and you have to redo a bunch of analysis work.

    So, the final thing with scientific software is that the developers usually have other jobs. Writing software might just be part of someone's job, and they don't actually consider themselves to be expert software engineers. These might be people who are primarily physicists, chemists, or biologists, or engineers as well. For a lot of people, programs are just a means to an end, and sometimes good software engineering practices are thrown out the window. Sometimes compiling or interpreting a program is the first and last unit test it gets. There's also this idea of "if it works, don't touch it," which I think is a negative idea. We should be able to refactor our programs to add new features or to improve their performance, and this is an idea I think we need to combat, especially when we build our own software for people to use.

    So, working in the radiation field and going to engineering school, there's the standard case study of the Therac-25 and its issues. The Therac-25 was a radiation therapy device manufactured by Atomic Energy of Canada Limited, and it was involved in six major accidents between 1985 and 1987. I don't want to minimize the issues with this project.

    There were a number of complex factors behind the problems the Therac-25 had: management issues, project oversight issues, and allegedly only one developer wrote the entire software for the machine. But investigators also found that race conditions, or concurrency bugs, in the Therac-25 control software contributed to the accidents. I think this shows that software bugs do have real-world consequences. Usually it's not this serious; usually we just have to rerun our code to do another analysis job.

    But it is the case that software affects real people, and we have to be careful to avoid bugs as much as possible. Moving on to the existing scientific landscape: Python is the lingua franca, the language that everybody speaks. I think this is a very good thing, because for a lot of new programmers today, their first language is Python, and it's important that they're able to write software in a language they're comfortable with.

    But this also brings some problems, because Python is usually quite a slow language. So, when people need performance, they start to reach for languages like C and C++, which are the bedrock systems programming languages that support Python. Here I'm skipping over a lot of other languages, like Fortran and Julia. I think all of these languages are very important and they definitely have their place, but I'm not going to talk about them specifically here.

    An issue I have with the current scientific computing landscape is this: moving from Python, which is a lot of people's first language, to something like C++, a more performance-oriented, expert-level programming language, should be a natural step, because many popular Python libraries are actually written mostly in C++ and wrapped up nicely in Python for people to use. Researcher time is usually very precious, so a lot of people want to know how to speed up their code. Sometimes that's very difficult to do in Python, and it becomes necessary to move to another language like C++.

    Unfortunately, right now this is a very difficult transition. There are a lot of factors at play, and the two languages are very different and have different goals. But it is a problem: I've definitely seen people leave projects because they don't feel up to the task, or abandon their efforts and keep using Python. I think this is where Rust really starts to shine as a viable alternative to C++, because you can achieve the same or very similar performance with a kinder, gentler systems programming language that is explicitly designed for non-expert users, and that's how a lot of scientific software developers identify. So, I think a very important use case for Rust is as an alternative backend implementation language to achieve certain performance goals.

    Of course, there may be some important reasons not to use Rust. Compared to the other languages, Rust is relatively young: it's only been 5 years since the 1.0 release, while Python is 30 years old and C++ is around 40, and they're likely going to be around a lot longer. Rust also has a reputation for having a bit of a learning curve.

    But I think it's easier to get up and running with good code in Rust than in other languages. The compiler does a good job of guiding you away from unsafe ways of doing things and toward correct ones, and especially for beginners, I think this is very helpful. I know that in my first few months of writing C++, my code certainly wasn't very good, and I was making all sorts of out-of-bounds errors and other mistakes that just wouldn't happen in Rust.

    Another issue, of course, is that you may already have a large codebase written in another language. The saying goes that a lot of times the right tool for the job is the one you're already using, and I think this is definitely the case. I don't think people should be rewriting their projects completely. But if there's a new component and you're looking for an alternative language, I think Rust is a really good choice. Another concern might be that an important library you depend on is missing on the Rust side of things, and this is definitely a valid concern.

    The Rust ecosystem is smaller than Python's, which is enormous, and smaller than that of C++, simply because Rust is younger. But there are ways to access Python and C++ code from Rust as well. Finally, there are concerns about a single vendor. There really is only one viable Rust compiler right now, even though there is ongoing work to add Rust support to GCC. But the Rust team has done a very good job of supporting the compiler on a lot of platforms: the three major operating systems, of course, but also a variety of others, so you can run it on a lot of systems.

    For me, though, Rust is exciting because it really aligns with my goals as a researcher. I want to write code that is as fast as possible, with as few bugs as possible, and I want both of those things at once. It's a bit of a vague goal, but Rust really helps here, because entire classes of bugs are eliminated compared to other, unsafe systems programming languages. This means I can focus my time on developing a better algorithm or doing other work, and not worry about bugs as much as I would in another language.

    The other thing is that it's a productive, modern programming language with a lot of developer conveniences, but it also has performance competitive with languages like C and C++. Probably the most important point for me is that it's a language explicitly designed for non-expert users. Other languages cater to different audiences. In some regards, C++ really cares about its expert developers, and Rust does too, but Rust also spends a lot of time making sure the language is suitable for non-expert programmers, which is often the case for scientific researchers who may not identify as experts.

    Finally, there's built-in documentation and testing. This is an area where I don't want to spend any time wrangling external tools or fixing issues with them. Having an integrated package manager in Rust is also a game changer. Personally, I consider time spent writing build system code to be a necessary evil, and I want to minimize it as much as possible. So, Rust's first-class dependency management is really important to me, and it lets me focus my time on more important things.

    So, jumping right in, here are some of the Rust features I find very useful for writing numerical code. I want to preface this by saying that, more than any one feature, it's the sum total of these features that is important. You can get analogs of some of these features using different flags for C and C++, but in Rust they're baked into the language and on by default, which is really important. You don't need to know which special flags to pass to your compiler. Right off the bat, we have no implicit conversions between primitive types. At the top, we're dividing two integers and trying to get a floating point number out, and Rust stops us with a mismatched types error: it expected a floating point number, an f64, but found integers.
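    A minimal sketch of the conversion rule described above, using a hypothetical ratio helper rather than the code from the talk's slide. The commented-out line is the kind of expression Rust rejects; the working version converts each operand explicitly. In C or C++, by contrast, assigning 1 / 2 to a double quietly yields 0.0, because the division happens in integer arithmetic first.

```rust
/// Hypothetical helper: divide two integers and get a floating point result.
fn ratio(numerator: i32, denominator: i32) -> f64 {
    // let r: f64 = numerator / denominator; // rejected: mismatched types (expected f64, found i32)

    // Convert each operand explicitly (lossless for i32), then divide in f64.
    f64::from(numerator) / f64::from(denominator)
}

fn main() {
    // Prints 0.5 rather than the 0 that integer division would give.
    println!("{}", ratio(1, 2));
}
```

    f64::from is used instead of an "as" cast because it only exists for conversions that cannot lose information.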

    This is a very common beginner mistake, and it's nice that it's caught right away. It's not the most complex bug, but having it caught and addressed right away is a big deal. This can be a little bit noisy sometimes. Here we're trying to convert a 32-bit unsigned integer into the platform's pointer-sized integer, usize, and Rust is not happy here either, because it wants an explicit conversion: we try to convert the number, and if it wouldn't fit, we stop execution.
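    A short sketch of that explicit conversion, assuming hypothetical helper names to_index and to_index_checked. usize::try_from is the standard-library conversion that returns a Result; calling expect on it is one way to stop execution if the value doesn't fit, and returning an Option is an alternative when the caller should decide what happens.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: turn a u32 count into a usize index.
fn to_index(n: u32) -> usize {
    // let i: usize = n;        // rejected: mismatched types (expected usize, found u32)
    // let i: usize = n.into(); // rejected: usize does not implement From<u32>

    // Try the conversion and panic if the value doesn't fit, which can only
    // happen on targets where usize is narrower than 32 bits.
    usize::try_from(n).expect("u32 value does not fit in usize on this platform")
}

/// Hypothetical non-panicking variant: returns None instead of stopping execution.
fn to_index_checked(n: u32) -> Option<usize> {
    usize::try_from(n).ok()
}

fn main() {
    let count: u32 = 42;
    println!("{}", to_index(count));
    println!("{:?}", to_index_checked(count));
}
```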

    So, this is a little bit noisy up front, but it also catches real bugs, especially if we're running on something like a 16-bit platform, where silently truncating the value would certainly be a bug. Catching these things up front is really important, because the more you catch at compile time, the less you have to worry about at runtime. This is a theme within Rust, and it's something the type system really helps with.

    There's also this notion of safe defaults for a lot of operations, which is part of what makes Rust a memory-safe language. As much as possible, it won't let you do unsafe operations, and often the convenient thing, the thing people default to, is the safe method. There are ways of saying "I know what I want to do here, I want to do this specifically," but for the most part, I think safe defaults are a good choice, especially for beginners.

    Here we have an example with a vector with three elements, and we're going to try to access the tenth element. Of course, this is a bug. The natural way of accessing an element, with square brackets, is also the safe way: we get a panic, which is Rust's way of winding down and stopping everything. The panic message says the length of this vector is 3, but we're trying to get the tenth element. But right now a lot of performance-oriented developers are saying, okay, sometimes I know that my index is correct, and I don't want to pay the cost of bounds checking. Fine. Let's go ahead and do that.
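    A small sketch of both behaviors, assuming hypothetical helper names checked_get and index_panics: square-bracket indexing panics on an out-of-range index, while the standard slice method get returns an Option instead. std::panic::catch_unwind appears only so the demo can observe the panic and keep running; it is not how you would normally handle a bug like this.

```rust
use std::panic;

/// Hypothetical helper: safe, non-panicking lookup that returns None when out of range.
fn checked_get(v: &[f64], i: usize) -> Option<f64> {
    v.get(i).copied()
}

/// Hypothetical helper: returns true if bracket indexing at `i` panics.
fn index_panics(v: &[f64], i: usize) -> bool {
    panic::catch_unwind(|| v[i]).is_err()
}

fn main() {
    let v = vec![1.0, 2.0, 3.0];
    // The tenth element (index 9) does not exist, so get returns None.
    println!("{:?}", checked_get(&v, 9));
    // Bracket indexing panics with a message like
    // "index out of bounds: the len is 3 but the index is 9".
    println!("{}", index_panics(&v, 9));
}
```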

    Rust also has opt-in low-level control features where we can do the quick, performance-oriented thing, but we have to tell people that we're doing it. Rust's way of doing this is with unsafe blocks, whenever there's a potentially memory-unsafe operation going on.

    So, it's the same example: a vector with three elements, and we're trying to get the tenth one. But right away, it's a lot noisier. We have this unsafe block, which says that something unsafe is potentially happening here. It's the programmer's way of saying, "Okay, compiler, get out of my way, I really want to do this." For people reviewing your code, this is very helpful, because you can go straight to the unsafe blocks, and the review shrinks to checking whether those blocks are okay. And here, of course, it's not okay: we're trying to get the tenth element of a vector with only three.

    Of course, this is going to give us a garbage answer; formally, it's undefined behavior. The Rust documentation does a really good job of saying this is not recommended and should be used with caution, and the unsafe block is the visual equivalent of that warning: something potentially dangerous is happening here, so be extra careful. This opt-in low-level control is what sets Rust apart from a lot of other memory-safe languages, because a lot of times you really do know what you're doing, and Rust will say, go ahead, no problem.
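    A sketch of the opt-in pattern described above, assuming a hypothetical sum_unchecked function. It uses the standard slice method get_unchecked inside an unsafe block, in a case where the index is provably in range; the SAFETY comment is the conventional place to record why the block is sound. In a simple loop like this the optimizer can often remove the bounds checks from ordinary indexing anyway, so this is worth measuring before adopting.

```rust
/// Hypothetical example: sums a slice using unchecked indexing.
fn sum_unchecked(values: &[f64]) -> f64 {
    let mut total = 0.0;
    for i in 0..values.len() {
        // SAFETY: `i` runs over 0..values.len(), so it is always in bounds.
        // An out-of-range index here (e.g. 9 for a 3-element slice) would be
        // undefined behavior, not a panic.
        total += unsafe { *values.get_unchecked(i) };
    }
    total
}

fn main() {
    let v = vec![1.0, 2.0, 3.0];
    println!("{}", sum_unchecked(&v));
}
```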

    Like I said, the unsafe block is very helpful here, because it reduces the onus on the code reviewer, or on yourself, to find where potentially dangerous things are happening.

    Another feature of Rust that is really good for numerical programmers especially is that floating point numbers are treated with a lot of caution. There are entire books written on handling floating point numbers correctly, and I think this caution is a good choice in a lot of cases.

    Here we have some potentially surprising code where we're adding 0.1 to itself three times. If that's equal to 0.3, we print "Got 0.3"; otherwise, we print "Got something else." This is a common beginner mistake. You don't really want to trust floating point equality: this prints "Got something else," because 0.1 added to itself three times is slightly different from 0.3. And Rust does a good job here, warning that floating point types cannot be used in patterns, because matching on an exact floating point value is not a very good way of doing things, and there are better ways of achieving the same result. The warning also says this is going to become an error in later versions of the compiler.
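    One of the better ways alluded to above is comparing within a tolerance. A minimal sketch, assuming a hypothetical approx_eq helper and an absolute tolerance of 1e-12 chosen only for illustration; relative tolerances are often more appropriate when values vary widely in magnitude.

```rust
/// Hypothetical helper: absolute-tolerance comparison. The right tolerance
/// depends on the problem's scale; 1e-12 below is only an example value.
fn approx_eq(a: f64, b: f64, tol: f64) -> bool {
    (a - b).abs() < tol
}

fn main() {
    let sum = 0.1_f64 + 0.1 + 0.1;
    // 0.1 has no exact binary representation, so the sum is not exactly 0.3.
    println!("{}", sum == 0.3); // false
    println!("{}", sum); // 0.30000000000000004
    println!("{}", approx_eq(sum, 0.3, 1e-12)); // true
}
```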

    This kind of bug may not be obvious right away, so having it caught at compile time is a big deal. Sometimes this caution can be a little bit annoying: the default way of sorting doesn't work for floating point numbers. If you try to sort this vector of floating point numbers, you get an error saying a trait bound is not satisfied. There's a reason for this: floating point numbers don't have a total ordering, because NaN, the not-a-number value, is not even equal to itself. And, of course, you can sort floating point numbers in Rust; there's a standard way of doing it, and it's in the Rust Cookbook as well. Personally, if writing a little more code up front saves me from bugs later on, that's a tradeoff I'm comfortable making in my code.
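    Two ways of sorting floats that work in practice, sketched with hypothetical helper names. The first uses partial_cmp and panics if a NaN is present, similar to the approach shown in the Rust Cookbook; the second uses f64::total_cmp (stable since Rust 1.62), which defines an ordering for every f64, including NaN.

```rust
/// Hypothetical helper: sorts with partial_cmp, panicking if a NaN is present.
fn sorted_partial(mut v: Vec<f64>) -> Vec<f64> {
    // v.sort(); // rejected: f64 does not implement Ord
    v.sort_by(|a, b| a.partial_cmp(b).expect("cannot sort a NaN value"));
    v
}

/// Hypothetical helper: sorts with f64::total_cmp, which accepts NaN.
/// Where a NaN lands (first or last) depends on its sign bit.
fn sorted_total(mut v: Vec<f64>) -> Vec<f64> {
    v.sort_by(f64::total_cmp);
    v
}

fn main() {
    println!("{:?}", sorted_partial(vec![2.5, -1.0, 10.0, 0.0]));
    println!("{:?}", sorted_total(vec![f64::NAN, 1.0, -2.0]));
}
```

    Which one to prefer is a judgment call: the panicking version surfaces an unexpected NaN immediately, while total_cmp quietly sorts it to one end.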

    Another thing Rust does really well, especially for a low-level programming language, is that it's quite a good prototyping and debugging language. Here we have a custom data structure called CoolData, holding vectors of floating point numbers. When you're writing code and prototyping, you really want to print out the value of your data often and see what's happening to it during execution, and Rust does a really good job here.
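    A sketch of the debug-printing pattern, assuming a hypothetical CoolData struct whose field names are invented for illustration. The single #[derive(Debug)] line is what gives the structure a debug representation; {:?} prints it compactly, {:#?} pretty-prints it, and the standard dbg! macro prints an expression along with its source location.

```rust
/// Hypothetical stand-in for the CoolData structure described in the talk.
#[derive(Debug)]
struct CoolData {
    positions: Vec<f64>,
    energies: Vec<f64>,
}

fn main() {
    let data = CoolData {
        positions: vec![0.0, 0.5],
        energies: vec![1.2, 3.4],
    };
    // Compact form: CoolData { positions: [0.0, 0.5], energies: [1.2, 3.4] }
    println!("{:?}", data);
    // Pretty-printed form, one field per line.
    println!("{:#?}", data);
    // dbg! prints the file, line, expression and value to stderr.
    dbg!(&data.energies);
}
```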

    We can add one line to our code that says, essentially, give me a debug representation of my structure, and then we can print out a really nice representation of our data. This is great for prototyping, because I just want to see what's happening and step through my code. I use this all the time, and it's a very common pattern.

    Another thing that really helps when writing scientific code is that integrated testing in Rust's package manager makes tests much more likely to be written. Here we have a math expression, and we're testing it against a known value.
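    A minimal sketch of such a unit test, assuming a hypothetical kinetic_energy function as the math expression. The #[cfg(test)] module is compiled and run only by cargo test, with no external test framework.

```rust
/// Hypothetical example function: kinetic energy E = 1/2 * m * v^2.
pub fn kinetic_energy(mass: f64, speed: f64) -> f64 {
    0.5 * mass * speed * speed
}

fn main() {
    println!("{}", kinetic_energy(2.0, 3.0));
}

// Compiled and run only by `cargo test`.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn matches_known_value() {
        // 0.5 * 2 * 3^2 = 9, which is exactly representable in f64.
        assert_eq!(kinetic_energy(2.0, 3.0), 9.0);
    }
}
```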

    Without any external tools, we can write a unit test and check it right away. This removes a lot of the friction around testing, especially compared to other languages where you might need an external framework, and removing friction means people are going to write tests a lot more. I find myself writing unit tests much more frequently in Rust than in C++, where it's a bit trickier. In particular, I think documentation tests are a killer feature for scientific code, because scientific code needs a lot of examples, and doc tests make sure your examples still compile even when you change your code. Here we have some documentation tests with the same example as before.
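    A sketch of a documentation test for the same kind of example, again assuming a hypothetical kinetic_energy function. The fenced example inside the /// doc comment is rendered by rustdoc and, in a library crate, compiled and run by cargo test; my_crate is a placeholder for the library's actual crate name.

```rust
/// Computes kinetic energy E = 1/2 * m * v^2 (hypothetical example).
///
/// # Examples
///
/// ```
/// // `my_crate` is a placeholder for your library crate's name.
/// let e = my_crate::kinetic_energy(2.0, 3.0);
/// assert_eq!(e, 9.0);
/// ```
pub fn kinetic_energy(mass: f64, speed: f64) -> f64 {
    0.5 * mass * speed * speed
}

fn main() {
    println!("{}", kinetic_energy(2.0, 3.0));
}
```

    Doc tests are only collected for library crates, so in a real project this function would live in src/lib.rs rather than next to a main function.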

    But this will also be published as part of our documentation. Being able to write example code that doubles as documentation is a really big deal, because you're doing two things at once, and it ensures your example code doesn't go out of date, which matters if you're refactoring your project.

    Taken together, Rust's safety guarantees and the fundamentals of the language have a large qualitative impact on what kind of code we're capable of writing. I'm going to use the example of data races, which are concurrency issues in multithreaded code.

    In safe Rust, you are guaranteed the absence of data races. This is a simple yes-or-no answer. In a language like C++, if we go to the C++ Core Guidelines, the best we can get today is a maybe: maybe your code doesn't have a data race, or maybe it does. For me, as someone who just wants to write the code, that is an order of magnitude worse than the simple yes-or-no answer Rust gives us. It makes a difference in what kind of code we're comfortable writing.
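    A sketch of what that guarantee looks like in practice, assuming a hypothetical parallel_sum function. It uses std::thread::scope (stable since Rust 1.63) so threads can borrow the input slice, and a Mutex for the shared total. Updating a plain f64 from several threads instead would be a data race, and safe Rust rejects that at compile time.

```rust
use std::sync::Mutex;
use std::thread;

/// Hypothetical example: sums `data` across several scoped threads,
/// accumulating partial sums into a shared total guarded by a Mutex.
fn parallel_sum(data: &[f64], n_threads: usize) -> f64 {
    let n = n_threads.max(1);
    // Round up so every element lands in some chunk; chunks() needs a size > 0.
    let chunk_size = ((data.len() + n - 1) / n).max(1);
    let total = Mutex::new(0.0_f64);

    thread::scope(|s| {
        for chunk in data.chunks(chunk_size) {
            let total = &total;
            s.spawn(move || {
                let partial: f64 = chunk.iter().sum();
                // Adding to a plain `let mut total = 0.0` from several threads
                // would be a data race; the borrow checker refuses to compile it.
                // The Mutex makes the shared update explicit and synchronized.
                *total.lock().unwrap() += partial;
            });
        }
    });

    total.into_inner().unwrap()
}

fn main() {
    let data: Vec<f64> = (1..=100_u32).map(f64::from).collect();
    println!("{}", parallel_sum(&data, 4));
}
```

    Because threads finish in an unpredictable order, the partial sums are added in varying order, so general floating point inputs may give results that differ in the last bits between runs; whole-number inputs like these sum exactly.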

    So, what sets Rust apart is that software engineering best practices are built into the language and its core tools. I think choosing Rust is going to have the biggest impact on small, resource-constrained teams who don't identify as expert software developers. Rust's place in scientific computing is as a language with the speed and power of C++, but also a systems language explicitly designed for non-experts and for lowering barriers. It's a companion and complement to C and C++. There are many tradeoffs between these languages, I see myself using them all in the future, and there's no one correct choice here. But Rust's foundational values help us write good software. Thank you so much for listening.

    Yuli:
    Hi. Thank you. That was a great talk.

    Max:
    Hello. Thanks. Thanks for all your help. Thanks for the intro.

    Yuli:
    We have a couple questions. For example, here. What do you think about a natural next step to speed up Python code?

    Max:
    Yes. I think there are definitely a lot of alternatives here. Personally, I haven't done a lot of Cython myself. But having the Rust ecosystem is also a really important thing. Having examples of different ways to do things, and being able to pull a lot of dependencies into your project and experiment with them, is a really important feature that Rust offers as a language. And it's maybe a little bit of a fresh start for some people.

    Yuli:
    Okay. Good. Okay. I have another question. What are your favorite ways to integrate Rust with Python? If any?

    Max:
    Yes, I've definitely played around a bit with the PyO3 project. It's a really nice way to integrate Rust into Python by exposing your Rust code as a Python module, and you can also call Python code from Rust.

    Yuli:
    Okay. Well, we don't have more time, sorry. But we can continue the Q&A in the chat. If anyone has another question, post it in the chat, and you can answer all the questions there. Thank you so much.

    Max:
    Thank you. Thank you very much.