What are Rust's exact auto-dereferencing rules?

ghz 21 days ago ⋅ 11 views

I'm learning/experimenting with Rust, and in all the elegance that I find in this language, there is one peculiarity that baffles me and seems totally out of place.

Rust automatically dereferences pointers when making method calls. I made some tests to determine the exact behaviour:

struct X { val: i32 }
impl std::ops::Deref for X {
    type Target = i32;
    fn deref(&self) -> &i32 { &self.val }
}

trait M { fn m(self); }
impl M for i32   { fn m(self) { println!("i32::m()");  } }
impl M for X     { fn m(self) { println!("X::m()");    } }
impl M for &X    { fn m(self) { println!("&X::m()");   } }
impl M for &&X   { fn m(self) { println!("&&X::m()");  } }
impl M for &&&X  { fn m(self) { println!("&&&X::m()"); } }

trait RefM { fn refm(&self); }
impl RefM for i32  { fn refm(&self) { println!("i32::refm()");  } }
impl RefM for X    { fn refm(&self) { println!("X::refm()");    } }
impl RefM for &X   { fn refm(&self) { println!("&X::refm()");   } }
impl RefM for &&X  { fn refm(&self) { println!("&&X::refm()");  } }
impl RefM for &&&X { fn refm(&self) { println!("&&&X::refm()"); } }


struct Y { val: i32 }
impl std::ops::Deref for Y {
    type Target = i32;
    fn deref(&self) -> &i32 { &self.val }
}

struct Z { val: Y }
impl std::ops::Deref for Z {
    type Target = Y;
    fn deref(&self) -> &Y { &self.val }
}


#[derive(Clone, Copy)]
struct A;

impl M for    A { fn m(self) { println!("A::m()");    } }
impl M for &&&A { fn m(self) { println!("&&&A::m()"); } }

impl RefM for    A { fn refm(&self) { println!("A::refm()");    } }
impl RefM for &&&A { fn refm(&self) { println!("&&&A::refm()"); } }


fn main() {
    // I'll use @ to denote left side of the dot operator
    (*X{val:42}).m();        // i32::m()    , Self == @
    X{val:42}.m();           // X::m()      , Self == @
    (&X{val:42}).m();        // &X::m()     , Self == @
    (&&X{val:42}).m();       // &&X::m()    , Self == @
    (&&&X{val:42}).m();      // &&&X::m()   , Self == @
    (&&&&X{val:42}).m();     // &&&X::m()   , Self == *@
    (&&&&&X{val:42}).m();    // &&&X::m()   , Self == **@
    println!("-------------------------");

    (*X{val:42}).refm();     // i32::refm() , Self == @
    X{val:42}.refm();        // X::refm()   , Self == @
    (&X{val:42}).refm();     // X::refm()   , Self == *@
    (&&X{val:42}).refm();    // &X::refm()  , Self == *@
    (&&&X{val:42}).refm();   // &&X::refm() , Self == *@
    (&&&&X{val:42}).refm();  // &&&X::refm(), Self == *@
    (&&&&&X{val:42}).refm(); // &&&X::refm(), Self == **@
    println!("-------------------------");

    Y{val:42}.refm();        // i32::refm() , Self == *@
    Z{val:Y{val:42}}.refm(); // i32::refm() , Self == **@
    println!("-------------------------");

    A.m();                   // A::m()      , Self == @
    // without the Copy trait, (&A).m() would be a compilation error:
    // cannot move out of borrowed content
    (&A).m();                // A::m()      , Self == *@
    (&&A).m();               // &&&A::m()   , Self == &@
    (&&&A).m();              // &&&A::m()   , Self == @
    A.refm();                // A::refm()   , Self == @
    (&A).refm();             // A::refm()   , Self == *@
    (&&A).refm();            // A::refm()   , Self == **@
    (&&&A).refm();           // &&&A::refm(), Self == @
}


So, it seems that, more or less:

  • The compiler will insert as many dereference operators as necessary to invoke a method.

  • The compiler, when resolving methods declared using &self (call-by-reference):

    • First tries calling for a single dereference of self
    • Then tries calling for the exact type of self
    • Then, tries inserting as many dereference operators as necessary for a match
  • Methods declared using self (call-by-value) for type T behave as if they were declared using &self (call-by-reference) for type &T and called on the reference to whatever is on the left side of the dot operator.

  • The above rules are first tried with raw built-in dereferencing, and if there's no match, the overload with Deref trait is used.

What are the exact auto-dereferencing rules? Can anyone give any formal rationale for such a design decision?

Answer

Auto-dereferencing is part of how Rust resolves method calls. When you write receiver.method(), the compiler may insert dereferences (*) and at most one reference (& or &mut) on the receiver so that it matches the receiver type the method expects. This can feel surprising at first because it is implicit, but it follows a fixed algorithm. Below is how the lookup works and the rationale behind the design.

Auto-Dereferencing Rules

When you call a method on a value, the Rust compiler looks for the method starting from the static type of the left-hand side (LHS) of the method call, i.e., the receiver. It may adjust the receiver by dereferencing it any number of times and by taking a reference to it at most once. Here are the key rules that the compiler follows:

  1. Method Resolution Order:

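    • The compiler starts with the static type of the receiver expression and builds a list of candidate types. For each candidate U it tries three receiver types in order: U itself (by value), then &U, then &mut U (auto-referencing).
    • If none of the three matches a method with that name, it dereferences U once and repeats with the result. This continues until a method is found or the type can no longer be dereferenced. As a last step, an unsized coercion is attempted (for example, [i32; 2] to [i32]) and the result is tried the same way. If nothing matches, the call is a compile error.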
  2. Dereferencing Process:

    • Dereferencing a built-in reference (&T or &mut T) needs no user code; for any other type, a dereference step is possible only if the type implements Deref. For example, if the receiver is a &X and the method is defined with &self on X, the expected receiver type is &X, so the call matches immediately. If the method takes self on X, the compiler dereferences once to reach X, which compiles only if X is Copy, since the value is moved out of the reference.
    • Candidates are tried starting from the receiver's own type. For a receiver of type &X where X: Deref<Target = i32>, the order is &X, &&X, &mut &X, then X, &X, &mut X, then i32, &i32, &mut i32.
  3. The Deref Trait in Lookup:

    • When the current candidate is not a built-in reference, the compiler uses its Deref implementation, if any, to take the next dereference step. Smart pointers and wrappers such as Box<T>, Rc<T>, String, and Vec<T> take part in lookup this way. This is related to, but distinct from, deref coercion, which converts &T to &U at coercion sites such as function arguments when T: Deref<Target = U>.
    • For example, if X implements Deref<Target = i32>, then a method call on &X first tries &X (and its auto-referenced forms), then X, and only then i32, &i32, and &mut i32. A method on i32 is reached only if nothing matched earlier.
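  4. Call-by-Value Methods:

    • What matters is the receiver type a method expects. In impl M for T, a method taking self expects T, while a method taking &self expects &T. This is why a self method implemented for &X and a &self method implemented for X are found at the same step: each expects a receiver of type &X. Calling either on a value of type X works through auto-referencing, which passes &X as the receiver.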
  5. Putting It Together:

    • Each level of indirection (T, &T, &&T, and so on) is a distinct type that can have its own impl. The lookup stops at the first candidate that has a matching method, so an impl for a more-referenced type can win over one for the underlying type. In the question's example, (&&A).m() prints &&&A::m() because the auto-referenced candidate &&&A is tried before the dereferenced candidate &A.
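
The order "by value, then auto-reference, then dereference" can be checked with a small experiment. This is only a sketch: the names X, ByValue, and ByRef are hypothetical and not taken from the question's code, and each method returns a label so the selected impl is visible.

```rust
// Hypothetical type and traits, named only for this sketch.
struct X;

trait ByValue {
    fn m(self) -> &'static str;
}
trait ByRef {
    fn refm(&self) -> &'static str;
}

// `m` takes `self`, so the expected receiver type is the impl type itself.
impl ByValue for X {
    fn m(self) -> &'static str { "X::m" }
}
impl ByValue for &X {
    fn m(self) -> &'static str { "&X::m" }
}

// `refm` takes `&self`, so the expected receiver type is a reference
// to the impl type.
impl ByRef for X {
    fn refm(&self) -> &'static str { "X::refm" }
}
impl ByRef for &X {
    fn refm(&self) -> &'static str { "&X::refm" }
}

fn main() {
    // Receiver X: the by-value candidate X matches `impl ByValue for X`.
    assert_eq!(X.m(), "X::m");
    // Receiver &X: the by-value candidate &X matches `impl ByValue for &X`.
    assert_eq!((&X).m(), "&X::m");
    // Receiver &&X: nothing expects &&X, &&&X, or &mut &&X,
    // so one dereference gives &X, which matches.
    assert_eq!((&&X).m(), "&X::m");

    // Receiver X: no `refm` expects X; auto-referencing gives &X.
    assert_eq!(X.refm(), "X::refm");
    // Receiver &X: `X::refm` expects exactly &X.
    assert_eq!((&X).refm(), "X::refm");
    // Receiver &&X: `<&X>::refm` expects exactly &&X.
    assert_eq!((&&X).refm(), "&X::refm");
    println!("all lookups resolved as expected");
}
```

These results agree with the first two groups of output in the question: a self method is selected by the exact receiver type, while a &self method appears to strip one reference because its expected receiver type already carries one.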

The Deref Trait

The Deref trait extends auto-dereferencing beyond built-in references. It defines how a type dereferences to another type, which is how a method that expects &T can be called on a value of type U (or &U) when U implements Deref<Target = T>.

For example, consider the following code:

struct X { val: i32 }
impl std::ops::Deref for X {
    type Target = i32;
    fn deref(&self) -> &i32 { &self.val }
}

In this case, a method call on X that finds no match on X, &X, or &mut X continues with i32, so methods of i32 can be called directly on an X. The same holds for &X, &&X, and so on: the built-in references are peeled off first, then Deref is applied.
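
As a hedged illustration of that chain, the sketch below repeats X and adds two hypothetical items that are not in the original code: a wrapper Z that dereferences to X, and a trait Label implemented only for i32. It relies on the standard inherent methods i32::abs and i32::pow.

```rust
use std::ops::Deref;

// `X` mirrors the struct in the text; `Z` and `Label` are hypothetical
// additions for this sketch.
struct X { val: i32 }
impl Deref for X {
    type Target = i32;
    fn deref(&self) -> &i32 { &self.val }
}

struct Z { inner: X }
impl Deref for Z {
    type Target = X;
    fn deref(&self) -> &X { &self.inner }
}

trait Label {
    fn label(&self) -> &'static str;
}
impl Label for i32 {
    fn label(&self) -> &'static str { "i32::label" }
}

fn main() {
    let x = X { val: -42 };
    // No `abs` expects X, &X, or &mut X, so lookup continues with i32.
    assert_eq!(x.abs(), 42);
    // Built-in references are removed first, then Deref is applied.
    assert_eq!((&&x).abs(), 42);
    // `label` expects &i32: found by auto-referencing the i32 candidate.
    assert_eq!(x.label(), "i32::label");

    // Two Deref steps: Z -> X -> i32.
    let z = Z { inner: X { val: 7 } };
    assert_eq!(z.label(), "i32::label");
    assert_eq!(z.pow(2), 49);
    println!("ok");
}
```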

Rationale Behind Auto-Dereferencing

The design decision behind automatic dereferencing has several motivations, all aimed at improving usability and ergonomics:

  1. Ease of Use:

    • Rust is designed to be a systems programming language that’s safe and efficient, yet ergonomic. By automatically dereferencing when calling methods on references, Rust allows developers to write less verbose code without sacrificing safety.
    • Without this feature, you would often need to manually dereference references before calling methods, which would make the code unnecessarily verbose and harder to read.
  2. Simplicity in Code:

    • Auto-dereferencing simplifies the syntax, as you don’t have to manually dereference a reference to call a method on the underlying type. For example, you don’t have to explicitly write (*x).m() or (&*x).m(). Instead, you just write x.m(), and the compiler inserts the dereference or reference for you.
  3. Consistency:

    • This feature makes calling methods consistent, whether you are working with owned values or references. Built-in references of any depth (e.g., &X, &&X, etc.) need nothing extra, and user-defined pointer types take part by implementing Deref.
  4. Allowing Overloading with Deref:

    • Automatic dereferencing means a method is defined once, on X, and is callable through &X, &&X, Box<X>, or Rc<X> without a separate impl for each pointer type.
    • Likewise, a type that implements Deref exposes the methods of its target, so wrappers and smart pointers can be used almost as if they were the value they hold.
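
To make the ergonomic argument concrete, here is a minimal sketch with a hypothetical Point type (not from the question). One &self method is called through several levels of indirection, followed by the explicit spellings the shorthand replaces.

```rust
use std::rc::Rc;

// Hypothetical type for illustration.
struct Point { x: i32, y: i32 }

impl Point {
    // Defined once, with a `&self` receiver.
    fn norm_sq(&self) -> i32 {
        self.x * self.x + self.y * self.y
    }
}

fn main() {
    let p = Point { x: 3, y: 4 };
    let r = &p;
    let rr = &r;
    let boxed = Box::new(Point { x: 3, y: 4 });
    let shared = Rc::new(Point { x: 3, y: 4 });

    // All of these resolve to Point::norm_sq.
    assert_eq!(p.norm_sq(), 25);      // auto-reference: &p
    assert_eq!(r.norm_sq(), 25);      // exact match: &Point
    assert_eq!(rr.norm_sq(), 25);     // one dereference of &&Point
    assert_eq!(boxed.norm_sq(), 25);  // dereference Box, then auto-reference
    assert_eq!(shared.norm_sq(), 25); // dereference Rc, then auto-reference

    // Explicit spellings that the shorthand replaces.
    assert_eq!((*r).norm_sq(), 25);
    assert_eq!(Point::norm_sq(&**rr), 25);
    assert_eq!(Point::norm_sq(&*boxed), 25);
    println!("ok");
}
```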

A Formal Explanation

The Rust Reference specifies the lookup in its chapter on method-call expressions. Starting from the receiver's type, the compiler builds a list of candidate receiver types by repeatedly dereferencing the type (using Deref where the type is not a built-in reference) and finally attempting an unsized coercion. Each candidate T is immediately followed by &T and &mut T. For every candidate in order, the compiler looks for a visible method whose receiver is exactly that type, checking inherent methods before trait methods, and picks the first match. This makes references and smart pointers such as Box<T>, Rc<T>, and Arc<T> usable without explicit dereferencing, while the borrow checker still verifies the resulting call.
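
The candidate list described in the Rust Reference (dereference repeatedly, and follow each type T with &T and &mut T) can be modeled in a few lines. This is a simplified, hypothetical model and not how the compiler is implemented: the Ty enum and the candidates function are invented here, types are plain strings, and the final unsized coercion, visibility, and trait scoping are ignored.

```rust
// Simplified model of a type, for illustration only.
#[derive(Clone)]
enum Ty {
    // A built-in shared reference `&T`.
    Ref(Box<Ty>),
    // A named type plus its `Deref::Target`, if it implements `Deref`.
    Named(&'static str, Option<Box<Ty>>),
}

// Renders a modeled type as it would be written in source.
fn show(ty: &Ty) -> String {
    match ty {
        Ty::Ref(inner) => format!("&{}", show(inner)),
        Ty::Named(name, _) => (*name).to_string(),
    }
}

// Lists candidate receiver types in the order they would be tried.
fn candidates(receiver: &Ty) -> Vec<String> {
    let mut out = Vec::new();
    let mut current = Some(receiver.clone());
    while let Some(ty) = current {
        let s = show(&ty);
        out.push(s.clone());             // by value
        out.push(format!("&{}", s));     // auto-reference
        out.push(format!("&mut {}", s)); // mutable auto-reference
        // One dereference step: built-in for references, `Deref` otherwise.
        current = match ty {
            Ty::Ref(inner) => Some(*inner),
            Ty::Named(_, target) => target.map(|t| *t),
        };
    }
    out
}

fn main() {
    // Models a receiver of type &X where X: Deref<Target = i32>.
    let int = Ty::Named("i32", None);
    let x = Ty::Named("X", Some(Box::new(int)));
    let ref_x = Ty::Ref(Box::new(x));

    let list = candidates(&ref_x);
    assert_eq!(
        list,
        ["&X", "&&X", "&mut &X", "X", "&X", "&mut X", "i32", "&i32", "&mut i32"]
    );
    println!("{}", list.join(", "));
}
```

The first candidate with a matching method is selected, which is why an impl for a more-referenced type can be picked before one for the underlying type.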

To summarize:

  • Method lookup lets the compiler insert any number of dereferences and at most one reference on the receiver, reducing boilerplate and improving usability.
  • It works with both call-by-value and call-by-reference methods, because lookup matches on the receiver type each method expects.
  • Built-in references are dereferenced directly, and the Deref trait extends the same mechanism to user-defined types.

This feature is a deliberate design choice that enhances both ergonomics and flexibility in Rust, making the language more powerful while still enforcing safety and ownership guarantees.