7c0h

I need your help liking Rust

If you're a software developer, you know the rules: new year, new programming language. For 2020 I chose Rust because it has a lot going for it:

  • Memory safe and high performance
  • Designed for low-level tasks
  • Backed by Mozilla, of which I'm a fan
  • Named "most loved programming language" for the fourth year in a row by the Stack Overflow Annual Survey

With all of these advantages in mind, I set out to build something concrete: an implementation of the Aho-Corasick algorithm. At its most basic, this algorithm builds a Trie and then converts it into an automaton, and the end result is an efficient search for substrings (why? I hope to write about that in the near future). It also seemed like the type of problem you'd want to tackle with Rust: implementing a Trie in C requires some liberal use of pointers, a task for which I expected Rust to be the right tool (memory safety!). And since I need to run a lot of text through it, I need it to be as fast as possible.
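For reference, here is a minimal sketch of the Trie half of that plan. It is not my actual implementation and it leaves out the automaton step (the failure links that make Aho-Corasick what it is). It also assumes one particular way around the pointer problem: nodes live in a single Vec and refer to each other by index. The names Node, Trie, insert and contains are made up for this example.

```rust
use std::collections::HashMap;

/// One trie node. Children are indices into `Trie::nodes` rather than
/// pointers, which is an assumption of this sketch, not a requirement.
#[derive(Default)]
struct Node {
    children: HashMap<char, usize>,
    is_end: bool,
}

/// An arena-style trie: every node lives in one Vec, and node 0 is the root.
struct Trie {
    nodes: Vec<Node>,
}

impl Trie {
    fn new() -> Self {
        Trie { nodes: vec![Node::default()] }
    }

    /// Inserts a pattern one `char` (Unicode scalar value) at a time.
    fn insert(&mut self, word: &str) {
        let mut current = 0;
        for ch in word.chars() {
            // Copy the index out first so the lookup borrow ends here.
            let existing = self.nodes[current].children.get(&ch).copied();
            current = match existing {
                Some(next) => next,
                None => {
                    let next = self.nodes.len();
                    self.nodes.push(Node::default());
                    self.nodes[current].children.insert(ch, next);
                    next
                }
            };
        }
        self.nodes[current].is_end = true;
    }

    /// Returns true only if `word` was inserted as a complete pattern.
    fn contains(&self, word: &str) -> bool {
        let mut current = 0;
        for ch in word.chars() {
            match self.nodes[current].children.get(&ch) {
                Some(&next) => current = next,
                None => return false,
            }
        }
        self.nodes[current].is_end
    }
}
```

After inserting "he" and "hers", contains("he") is true and contains("her") is false, because "her" is only a prefix. Indices avoid the pointer types entirely, at the cost of never freeing individual nodes.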

So how did I fare? Two weeks into this project, Rust and I have... issues. More specifically, I'm having real trouble figuring out what Rust is good for.

Part I: Pointers and Strings are too complicated

Dealing with pointers is straight-up painful, because allocating a piece of memory and linking it to something else gets very difficult very fast. I followed this book, titled "Learn Rust With Entirely Too Many Linked Lists", and the opening alone warns me that programming a linked list requires learning "the following pointer types: &, &mut, Box, Rc, Arc, *const, and *mut". A Reddit thread, on the other hand, suggests that a doubly-linked list is straightforward: all you need to do is declare your type as Option<Weak<RefCell<Node<T>>>>. Please note that neither Option, Weak, nor RefCell is mentioned in the previous implementation...
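To show what that type buys you, here is a minimal sketch of a doubly-linked list built around it. This is my reading of the general approach, not the code from the Reddit thread or the book. It assumes strong Rc links going forward and Weak links going backward so that nodes do not keep each other alive in a cycle. The names DoublyLinkedList, push_back, to_vec and to_vec_reversed are made up for this example.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Node<T> {
    value: T,
    // Strong link forward: each node owns its successor.
    next: Option<Rc<RefCell<Node<T>>>>,
    // Weak link backward: avoids a reference cycle between neighbours.
    prev: Option<Weak<RefCell<Node<T>>>>,
}

struct DoublyLinkedList<T> {
    head: Option<Rc<RefCell<Node<T>>>>,
    tail: Option<Rc<RefCell<Node<T>>>>,
}

impl<T: Clone> DoublyLinkedList<T> {
    fn new() -> Self {
        DoublyLinkedList { head: None, tail: None }
    }

    /// Appends a value at the end of the list.
    fn push_back(&mut self, value: T) {
        let node = Rc::new(RefCell::new(Node { value, next: None, prev: None }));
        match self.tail.take() {
            Some(old_tail) => {
                node.borrow_mut().prev = Some(Rc::downgrade(&old_tail));
                old_tail.borrow_mut().next = Some(Rc::clone(&node));
            }
            None => self.head = Some(Rc::clone(&node)),
        }
        self.tail = Some(node);
    }

    /// Walks the strong `next` links from head to tail.
    fn to_vec(&self) -> Vec<T> {
        let mut out = Vec::new();
        let mut cursor = self.head.clone();
        while let Some(node) = cursor {
            out.push(node.borrow().value.clone());
            cursor = node.borrow().next.clone();
        }
        out
    }

    /// Walks the weak `prev` links from tail to head, upgrading each one.
    fn to_vec_reversed(&self) -> Vec<T> {
        let mut out = Vec::new();
        let mut cursor = self.tail.clone();
        while let Some(node) = cursor {
            out.push(node.borrow().value.clone());
            cursor = node.borrow().prev.as_ref().and_then(|weak| weak.upgrade());
        }
        out
    }
}
```

Pushing 1, 2 and 3 and then calling to_vec() gives the values in insertion order, and to_vec_reversed() gives them backwards. Every access goes through borrow() or borrow_mut(), so the borrow rules are checked at run time rather than at compile time.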

So, pointers are out as a killer feature. If optimizing memory usage is not its strong point, then maybe "regular" programming is? Could I do the rest of my text handling with Rust? Sadly, dealing with Strings is not great either. Sure, I get it, Unicode is weird. And I can understand why the difference between characters and graphemes is there. But if the Rust developers thought long and hard about this, why is "get me the first grapheme of this String" so difficult? And why isn't such a common operation part of the standard library?

For the record, this is a rhetorical question - the answer to "how do I iterate over graphemes" (found here) teaches us that...

  • ... the developers don't want to commit to a specific method of doing this, because Unicode is complicated and they don't want to have to support it forever. If you want to do it, you have to pick an external library. But it won't be part of the standard library anytime soon. At the same time, ...
  • ... they don't want to "play favorites" with any specific library over any other, meaning that no trace of a specific method is to be found in the official documentation.

The result, then, is puzzling: the experts who designed the system don't want to take care of it, the official docs won't tell you who is doing it right (or, more critically, who is doing it wrong and should be avoided), and you are essentially on your own.
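To make the gap concrete, here is a small sketch of what the standard library does offer: iteration over bytes and over chars (Unicode scalar values), but not over graphemes. The helper names first_char and char_count are made up for this example, and the input is an "e" followed by a combining acute accent, which a reader sees as a single "é".

```rust
/// Returns the first Unicode scalar value of `text`, if any.
/// Note that this is a `char`, not a grapheme.
fn first_char(text: &str) -> Option<char> {
    text.chars().next()
}

/// Counts Unicode scalar values, the finest unit `str::chars` yields.
fn char_count(text: &str) -> usize {
    text.chars().count()
}
```

For the string "e\u{301}tude", first_char returns the bare 'e' without its accent, char_count reports six scalar values, and len() reports seven bytes, even though a reader sees five characters. As far as I can tell, getting the accented letter back as one unit means pulling in an external crate. The unicode-segmentation crate, with its graphemes(true) iterator, seems to be a common choice, but that is my assumption and not an official recommendation.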

Part II: the community

If we've learned anything from the String case, it is that "just Google it" is a valid development strategy when dealing with Rust. This leads us inevitably to "A sad day for Rust", an event that took place earlier this year and highlighted how bad the Reddit side of the community can be. To quote the previous article,

the Rust subreddit has ~87,000 subscribers (...) while Reddit is not official, and so not linked to by any official resources, it’s still a very large group of people, and so to suggest it’s "not the Rust community" in some way is both true and very not true.

So, why did I bring this up? Because the Reddit thread I mentioned above displays two hallmarks of the type of community I don't want to be a member of:

  • the attitude of "it's very simple, all you need to create a new node is self.last.as_ref().unwrap().borrow().next.as_ref().unwrap().clone()"
  • the other attitude, where the highest rated comment is the one that includes nice bits like "The only surprising thing about this blog post is that even though it's 2018 and there are many Rust resources available for beginners, people are still try to learn the language by implementing a high-performance pointer chasing trie data-structure". The fact that people may come to Rust because that's the type of project a systems language is supposedly good for seems to escape them.

If you're a beginner like me, now you know: there is a good community out there. And it would be unfair for me to ignore that other forums, both official and not, are much more welcoming and understanding. But you need to double check.

Part III: minor annoyances

I really, really wish Rust would stop using new terms for concepts that already exist: abstract methods are "traits", static methods are "associated functions", "variables" are by default not-variable (and not to be confused with constants), and any non-trivial data type is actually a struct with implementation blocks.
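For anyone keeping score, here is how those terms look side by side in a small sketch. The names Shape, Square, total_area and UNIT_SIDE are made up for this example.

```rust
/// A trait: a set of methods that implementing types must provide.
trait Shape {
    fn area(&self) -> f64;
}

/// A "non-trivial data type": a plain struct...
struct Square {
    side: f64,
}

/// ...plus an implementation block.
impl Square {
    /// An associated function: no `self`, called as `Square::new(2.0)`.
    fn new(side: f64) -> Self {
        Square { side }
    }
}

impl Shape for Square {
    fn area(&self) -> f64 {
        self.side * self.side
    }
}

/// A constant: fixed at compile time and always needs a type annotation.
const UNIT_SIDE: f64 = 1.0;

fn total_area(shapes: &[Square]) -> f64 {
    // Bindings are immutable by default; `mut` is needed to update `total`.
    let mut total = 0.0;
    for shape in shapes {
        total += shape.area();
    }
    total
}
```

Calling total_area on a slice holding Square::new(2.0) and Square::new(UNIT_SIDE) adds 4.0 and 1.0. Removing the mut from total turns the += line into a compile error.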

And down at the very, very end of the scale, the trailing commas at the end of match expressions, the 4-space indentation, and the official endorsement of 1TBS instead of K&R (namely, 4 spaces instead of tabs) are just plain ugly. Unlike Python, however, Rust does get extra points for allowing other, less-wrong styles.

Part IV: not all is hopeless

Unlike previous rants, I want to point out something very important: I want to like Rust. I'm sure it's very good at something, and I really, really want to find out what it is. It would be very sad if the answer to "what is Rust good for?" ended up being "writing Rust compilers".

Now, the official documentation (namely, the book) closes with a tutorial on how to build a multi-threaded web server, which is probably the key point: if Rust's claims that error handling and memory safety are its main priorities are true, then multi-threaded code could be the main use case. So there's hope that everything will get easier once I manage to get my strings inside my Trie, and that iterating over gigabytes of text will be amazingly easy.

I'll keep you updated.