【 Next 】

Level 4

Video version of Level 4: www.bilibili.com/video/BV1S6…

Starting with this level we deal with generic parameters, which is where things get a little more exciting.

The Debug trait implementation we generated in previous levels carries no generic parameters. So for a struct with generic parameters, such as:

struct GeekKindergarten<T> {
    blog: T,
    ideawand: i32,
    com: bool,
}

Our generated code should look like this:

impl<T> Debug for GeekKindergarten<T> {
    // ...
}

However, since the code template only uses the struct's identifier (GeekKindergarten) and not its generic parameter information (<T>), the generated code currently looks like this:

impl Debug for GeekKindergarten {
//  ^---- the generic parameters are missing ----^
}

The syntax tree node for generic parameters is documented here:

  • docs.rs/syn/1.0/syn…

This node provides a utility function that splits the generics into the three fragments needed to generate an impl block:

  • docs.rs/syn/1.0/syn…

The workshop author also points to a sample project that demonstrates how to handle generic parameters. It is worth a look, but since it is code only with no explanation, you will still want to finish this article (and follow my WeChat official account, Geek Kindergarten). Sample project:

  • github.com/dtolnay/syn…

Let's focus on how to use the split_for_impl() utility function. Say we have a generic struct with parameters T and U, where T is bounded by the traits Blog and IdeaWand, and U is bounded by Com:

struct GeekKindergarten<T, U> where T: Blog + IdeaWand, U: Com {}

The generated Debug trait implementation should have this form:

impl<T, U> Debug for GeekKindergarten<T, U> where T: Blog + IdeaWand + Debug, U: Com + Debug {
//  ^^^^^^                           ^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^        ^^^^^^^^
//  |                                |      |                               |
//  |                                |      +------- the third part --------+
//  |                                +-- the second part
//  +-- the first part
}

The split_for_impl() utility function generates the three fragments marked above. Note that the bounds above also contain Debug. split_for_impl() cannot generate those for us, so we have to add them ourselves, which is why they are left unmarked.
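To make the three fragments concrete, here is a hand-written impl of the shape the macro is aiming for. It is only a sketch: Blog, IdeaWand and Com are placeholder traits, the two fields are hypothetical (the struct in the text has none), and the impls for u32 and bool exist only so the example can be exercised.

```rust
use std::fmt::{self, Debug};

// Placeholder traits standing in for the bounds used in the text.
trait Blog {}
trait IdeaWand {}
trait Com {}

// The two fields are hypothetical; the struct in the text is empty.
struct GeekKindergarten<T, U>
where
    T: Blog + IdeaWand,
    U: Com,
{
    blog: T,
    com: U,
}

// Part 1 (impl generics): `<T, U>` right after `impl`.
// Part 2 (type generics): `<T, U>` right after the struct name.
// Part 3 (where clause): everything from `where` up to the opening brace.
// The two `+ Debug` bounds are the part we have to add ourselves.
impl<T, U> Debug for GeekKindergarten<T, U>
where
    T: Blog + IdeaWand + Debug,
    U: Com + Debug,
{
    fn fmt(&self, fmt: &mut fmt::Formatter) -> fmt::Result {
        fmt.debug_struct("GeekKindergarten")
            .field("blog", &self.blog)
            .field("com", &self.com)
            .finish()
    }
}

// Demo-only trait impls so that concrete values can be plugged in.
impl Blog for u32 {}
impl IdeaWand for u32 {}
impl Com for bool {}
```

Formatting a value such as GeekKindergarten { blog: 1u32, com: true } with {:?} then goes through this impl, exactly as it would through the macro-generated one.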

OK, let's summarize what we need to do in level 4 and then look at some sample code:

  • Get the generic parameter information from the DeriveInput syntax tree node
  • Add a Debug trait bound to each generic parameter
  • Use the split_for_impl() utility function to split out the three fragments needed by the code template
  • Modify the impl block template to use these three fragments, adding the generic parameter information
  • Because generate_debug_trait() was already getting long, we also restructure the code by splitting it into two functions

Here's the code, starting with the newly split-out function:

fn generate_debug_trait_core(st: &syn::DeriveInput) -> syn::Result<proc_macro2::TokenStream> {
    let fields = get_fields_from_derive_input(st)?;
    let struct_name_ident = &st.ident;
    let struct_name_literal = struct_name_ident.to_string();
    let mut fmt_body_stream = proc_macro2::TokenStream::new();

    fmt_body_stream.extend(quote!( fmt.debug_struct(#struct_name_literal) ));
    for field in fields.iter() {
        let field_name_ident = field.ident.as_ref().unwrap();
        let field_name_literal = field_name_ident.to_string();

        // Default format, used when no user-defined format is specified
        let mut format_str = "{:?}".to_string();
        if let Some(format) = get_custom_format_of_field(field)? {
            format_str = format;
        }
        fmt_body_stream.extend(quote!(
            .field(#field_name_literal, &format_args!(#format_str, self.#field_name_ident))
        ));
    }
    fmt_body_stream.extend(quote!( .finish() ));
    Ok(fmt_body_stream)
}

Next is the code that adds the Debug trait bounds to the impl block; read the comments carefully. I did not know how to write this at first, so it follows the heapsize example project referenced in level 4:

fn generate_debug_trait(st: &syn::DeriveInput) -> syn::Result<proc_macro2::TokenStream> {
    let fmt_body_stream = generate_debug_trait_core(st)?;
    let struct_name_ident = &st.ident;

    // Get the generic parameter information of the annotated struct from the DeriveInput syntax tree node
    let mut generics_param_to_modify = st.generics.clone();
    // We need to add a `Debug` trait bound to each generic type parameter
    for g in generics_param_to_modify.params.iter_mut() {
        if let syn::GenericParam::Type(t) = g {
            t.bounds.push(parse_quote!(std::fmt::Debug));
        }
    }

    // Use the utility function to split the generics into three fragments
    let (impl_generics, type_generics, where_clause) = generics_param_to_modify.split_for_impl();

    let ret_stream = quote!(
        // Notice how the following line uses the three fragments related to generic parameters
        impl #impl_generics std::fmt::Debug for #struct_name_ident #type_generics #where_clause {
            fn fmt(&self, fmt: &mut std::fmt::Formatter) -> std::fmt::Result {
                #fmt_body_stream
            }
        }
    );

    Ok(ret_stream)
}

Level 5

Video version of Level 5: www.bilibili.com/video/BV1v4…

The instructions for this level are long and information-dense, so let's go through them carefully.

They begin by raising a problem. Consider the following struct:

pub struct GeekKindergarten<T> {
    ideawand: PhantomData<T>,
}

This struct uses PhantomData, and the standard library already implements the Debug trait for PhantomData itself:

impl<T: ?Sized> Debug for PhantomData<T> { ... }

In this case we do not need to require T: Debug. One way to handle generic types like this, which implement Debug on their own, is to generate bounds on the type of each field in the struct instead of on each generic parameter. Take the following struct as an example:

pub struct GeekKindergarten<T, U> {
    blog: Foo<U>,
    ideawand: PhantomData<T>,
}

The bounds our code originally generated are:

  • T: Debug, U: Debug

Instead, we should now generate:

  • Foo<U>: Debug, PhantomData<T>: Debug
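Written out by hand, an impl with field-type bounds might look like the sketch below. Foo is a hypothetical wrapper (the text does not define it) and NotDebug is an invented type that deliberately lacks a Debug impl, to show that T no longer needs one.

```rust
use std::fmt::{self, Debug};
use std::marker::PhantomData;

// Hypothetical wrapper type; it is Debug whenever U is Debug.
#[derive(Debug)]
pub struct Foo<U>(pub U);

pub struct GeekKindergarten<T, U> {
    pub blog: Foo<U>,
    pub ideawand: PhantomData<T>,
}

// The bounds are placed on the field types rather than on T and U.
impl<T, U> Debug for GeekKindergarten<T, U>
where
    Foo<U>: Debug,
    PhantomData<T>: Debug,
{
    fn fmt(&self, fmt: &mut fmt::Formatter) -> fmt::Result {
        fmt.debug_struct("GeekKindergarten")
            .field("blog", &self.blog)
            .field("ideawand", &self.ideawand)
            .finish()
    }
}

// A type that deliberately does not implement Debug.
pub struct NotDebug;
```

With these bounds, GeekKindergarten<NotDebug, u8> can still be printed with {:?}, because PhantomData<NotDebug> is Debug even though NotDebug is not.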

However, bounds of this kind can have serious side effects, which you will run into in later levels, so the hints for this level point to another approach:

  • Because PhantomData is so common, we treat it as a special case: we check whether a field has the type PhantomData and, if so, whether the generic parameter it uses appears only inside PhantomData. If it does, we do not add a Debug bound for that parameter.
  • In later levels, for cases other than PhantomData, we will provide an escape hatch that lets the user specify the bound for a field by hand.

After these hints, the workshop author also discusses some design trade-offs of Rust procedural macros:

  • In a Rust procedural macro you can never infer trait bounds with complete accuracy, because that would require name resolution while the macro is being expanded, and doing so would sharply increase the complexity of the Rust compiler.
  • The Rust core team considers this trade-off so beneficial that there are no plans to support name resolution during macro expansion.
  • An escape hatch is a common way to work around the problem.
  • Another, more common approach is to defer name resolution to the actual compilation phase through Rust's trait mechanism.
    • Pay special attention to the test case of this level: how can the procedural macro, without knowing that S refers to String, generate code that ends up calling String's Debug implementation?

Before we start writing code, let's look at the main logic of the new code. For example, given the following generic struct, our procedural macro should behave as described after it:

struct GeekKindergarten<T, U, V, W> {
    blog: T,
    ideawand: PhantomData<U>,
    com: U,
    foo: PhantomData<V>,
    bar: Baz<W>,
}

  • For T: it does not appear in PhantomData, so T needs the Debug bound.
  • For U: although it appears in PhantomData, it is also used directly as the type of the com field, so the Debug bound is still needed.
  • For V: it meets the special condition set in this level, so no Debug bound is added.
  • For W: it does not appear as a generic parameter of PhantomData, so the Debug bound is added.
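The four cases above can be modeled without syn at all. The sketch below is a simplified, string-based model: FieldType, needs_debug_bound and example_fields are hypothetical names introduced only for illustration, and each field is reduced to the type name before <> plus the first name inside <>.

```rust
// Simplified, hypothetical description of a field's type: the outer type name
// (the part before `<>`) and, if present, the first name inside `<>`.
struct FieldType<'a> {
    outer: &'a str,
    inner: Option<&'a str>,
}

// Returns true when the generic parameter `param` should receive a Debug bound.
fn needs_debug_bound(param: &str, fields: &[FieldType<'_>]) -> bool {
    let in_phantom = fields
        .iter()
        .any(|f| f.outer == "PhantomData" && f.inner == Some(param));
    let used_directly = fields.iter().any(|f| f.outer == param);
    // Skip the bound only if the parameter appears in PhantomData and is
    // never used directly as a field type.
    !(in_phantom && !used_directly)
}

// The struct from the text, written in the simplified model.
fn example_fields() -> Vec<FieldType<'static>> {
    vec![
        FieldType { outer: "T", inner: None },                // blog: T
        FieldType { outer: "PhantomData", inner: Some("U") }, // ideawand: PhantomData<U>
        FieldType { outer: "U", inner: None },                // com: U
        FieldType { outer: "PhantomData", inner: Some("V") }, // foo: PhantomData<V>
        FieldType { outer: "Baz", inner: Some("W") },         // bar: Baz<W>
    ]
}
```

Under this model only V is skipped; T, U and W all receive the Debug bound, matching the four cases listed above.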

As you can see, to implement this logic we need the type name in front of <> and the type name inside <>. What remains is to check whether the strings for these type names satisfy the combinations of conditions above.

We start by defining a function that extracts the generic parameter of PhantomData: for a field of type PhantomData<X>, it returns X as a string:

fn get_phantomdata_generic_type_name(field: &syn::Field) -> syn::Result<Option<String>> {
    if let syn::Type::Path(syn::TypePath { path: syn::Path { segments, .. }, .. }) = &field.ty {
        if let Some(syn::PathSegment { ident, arguments }) = segments.last() {
            if ident == "PhantomData" {
                if let syn::PathArguments::AngleBracketed(syn::AngleBracketedGenericArguments { args, .. }) = arguments {
                    if let Some(syn::GenericArgument::Type(syn::Type::Path(gp))) = args.first() {
                        if let Some(generic_ident) = gp.path.segments.first() {
                            return Ok(Some(generic_ident.ident.to_string()));
                        }
                    }
                }
            }
        }
    }
    Ok(None)
}

We then define a function that, for a field declared as foo: XXX or foo: XXX<YYY>, returns the type name XXX as a string:

fn get_field_type_name(field: &syn::Field) -> syn::Result<Option<String>> {
    if let syn::Type::Path(syn::TypePath { path: syn::Path { segments, .. }, .. }) = &field.ty {
        if let Some(syn::PathSegment { ident, .. }) = segments.last() {
            return Ok(Some(ident.to_string()));
        }
    }
    Ok(None)
}

Then we’ll modify the code for the generate_debug_trait() function. Read the comments carefully:

fn generate_debug_trait(st: &syn::DeriveInput) -> syn::Result<proc_macro2::TokenStream> {
    let fmt_body_stream = generate_debug_trait_core(st)?;
    let struct_name_ident = &st.ident;

    let mut generics_param_to_modify = st.generics.clone();

    // Build two lists: the generic parameters used inside `PhantomData`, and the type names of all fields of the input struct
    let fields = get_fields_from_derive_input(st)?;
    let mut field_type_names = Vec::new();
    let mut phantomdata_type_param_names = Vec::new();
    for field in fields {
        if let Some(s) = get_field_type_name(field)? {
            field_type_names.push(s);
        }
        if let Some(s) = get_phantomdata_generic_type_name(field)? {
            phantomdata_type_param_names.push(s);
        }
    }

    for g in generics_param_to_modify.params.iter_mut() {
        if let syn::GenericParam::Type(t) = g {
            let type_param_name = t.ident.to_string();
            // This condition is the heart of the logic. Try to work out how the four cases above collapse into it.
            // If `T` is used inside PhantomData, do not add a bound to `T` itself, unless `T` is also used directly as a field type
            if phantomdata_type_param_names.contains(&type_param_name) && !field_type_names.contains(&type_param_name) {
                continue;
            }
            t.bounds.push(parse_quote!(std::fmt::Debug));
        }
    }

    // < unmodified code omitted >............
}

Before finishing level 5, let's return to the question left by the workshop author:

In the level 5 test case, how can a procedural macro generate code that ends up calling String's Debug implementation without knowing that S refers to String?

The answer is simple once you remember one thing: a procedural macro is essentially a game of string substitution and concatenation. Although we parse the input into a syntax tree while the macro runs, syntax only constrains how the pieces of text may be arranged; it carries no notion of types. The macro only needs to emit text that conforms to Rust's syntax. Real symbol resolution, type checking, and so on happen later, in the compilation phases that follow macro expansion.
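The sketch below illustrates the point with a hand-written stand-in for the macro output. It is loosely modeled on the level 5 test case; the exact set of fields is an assumption. The impl never mentions String, yet the compiler resolves S to String after expansion and picks String's Debug implementation.

```rust
use std::fmt::{self, Debug};
use std::marker::PhantomData;

// The macro only ever sees the token `S`; it cannot know what this alias means.
type S = String;

struct Field<T> {
    marker: PhantomData<T>,
    string: S,
}

// Hand-written stand-in for the macro output: String is not mentioned anywhere.
// T gets no Debug bound because it is used only inside PhantomData.
impl<T> Debug for Field<T> {
    fn fmt(&self, fmt: &mut fmt::Formatter) -> fmt::Result {
        fmt.debug_struct("Field")
            .field("marker", &self.marker)
            .field("string", &self.string)
            .finish()
    }
}
```

The call `.field("string", &self.string)` only requires that the field's type implements Debug; which type that is gets decided by the compiler, not by the macro.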

Level 6

This level demonstrates the problem with the approach that was discarded in level 5 (bounding each field type). Our code passes level 6 without modification.
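As a rough illustration, assume the level 6 test uses two mutually recursive structs along the lines of One and Two below (the exact definitions are an assumption). With field-type bounds, the impl for One would require Option<Box<Two<T>>>: Debug and the impl for Two would require Box<One<T>>: Debug, so each impl would depend on the other in a cycle that the compiler refuses to resolve. Bounds on the generic parameter alone, as our macro generates them, avoid this:

```rust
use std::fmt::{self, Debug};

pub struct One<T> {
    pub value: T,
    pub two: Option<Box<Two<T>>>,
}

pub struct Two<T> {
    pub one: Box<One<T>>,
}

// Bounds on the generic parameter only: each impl can be checked
// without first assuming that the other one exists.
impl<T: Debug> Debug for One<T> {
    fn fmt(&self, fmt: &mut fmt::Formatter) -> fmt::Result {
        fmt.debug_struct("One")
            .field("value", &self.value)
            .field("two", &self.two)
            .finish()
    }
}

impl<T: Debug> Debug for Two<T> {
    fn fmt(&self, fmt: &mut fmt::Formatter) -> fmt::Result {
        fmt.debug_struct("Two").field("one", &self.one).finish()
    }
}
```

These are the same impls that our level 5 code would generate for the two structs, which is why no change is needed to pass this level.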

Level 7

Video version of Level 7: www.bilibili.com/video/BV1Gq…

Level 7 deals with associated types. According to the hints for this level, the main task is to find syntax tree nodes of type syn::TypePath that satisfy both of the following:

  • The path has a length of 2 or more
  • The first segment of the path is one of the generic parameters

Given Rust's syntax, the associated types we have to handle can appear in forms like these:

pub trait Trait {
    type Value;    // Define an associated type
}

pub struct GeekKindergarten<T: Trait> {
    blog: T::Value,
    ideawand: PhantomData<T::Value>,
    com: Foo<Bar<Baz<T::Value>>>,
}

In other words, the fragment we are looking for, such as T::Value, may be nested arbitrarily deep. Based on what we have done so far, we might write a recursive function that walks the syntax tree through many layers of nested if conditions. Is there a more elegant way? Fortunately, the syn crate provides the visitor pattern (Visit) for reaching just the syntax tree nodes we are interested in.

Visit is not enabled in syn by default. According to the front page of the syn documentation, the visit feature has to be enabled in Cargo.toml, so we need to update Cargo.toml first.
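A minimal sketch of the change, assuming the project depends on syn 1.0. The version number is illustrative; keep whatever features the project already lists and add visit to them:

```toml
[dependencies]
# Assumed version; the important part is adding "visit" to the feature list.
syn = { version = "1.0", features = ["visit"] }
```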

Documentation for Visit can be found in the official docs: docs.rs/syn/1.0.64/…

The core idea is that syn defines a trait called Visit, which contains a callback for each of the hundreds of syntax tree node types. As the traversal walks the syntax tree, it invokes the corresponding callback every time it reaches a node of a given type. In level 7 we only want to pick out syn::TypePath nodes, so we just implement that one callback and check whether the current node meets the two conditions above. The official documentation has an example worth reading; here I will give the relevant code directly.

First, the definition of the visitor:

use std::collections::HashMap;
use syn::visit::{self, Visit};

// Define a struct that implements the Visit trait. Its fields store the filter criteria and the filter results
struct TypePathVisitor {
    generic_type_names: Vec<String>,  // The filter criteria: the names of all generic parameters, such as `T`, `U`, etc.
    associated_types: HashMap<String, Vec<syn::TypePath>>,  // All syntax tree nodes that meet the criteria are recorded here
}

impl<'ast> Visit<'ast> for TypePathVisitor {
    // visit_type_path is the callback we care about
    fn visit_type_path(&mut self, node: &'ast syn::TypePath) {
        if node.path.segments.len() >= 2 {
            let generic_type_name = node.path.segments[0].ident.to_string();
            if self.generic_type_names.contains(&generic_type_name) {
                // Both filter conditions are met, so record the node
                self.associated_types.entry(generic_type_name).or_default().push(node.clone());
            }
        }
        // After handling the current node, we must call the default implementation so that the traversal
        // continues into the child nodes; otherwise the nodes nested inside this one are never visited
        visit::visit_type_path(self, node);
    }
}

Then there is the function where we initialize the Visitor and perform the traversal call, eventually returning the filter result:

fn get_generic_associated_types(st: &syn::DeriveInput) -> HashMap<String, Vec<syn::TypePath>> {
    // First build the filter criteria: the names of all generic type parameters
    let origin_generic_param_names: Vec<String> = st.generics.params.iter().filter_map(|f| {
        if let syn::GenericParam::Type(ty) = f {
            return Some(ty.ident.to_string())
        }
        return None
    }).collect();

    let mut visitor = TypePathVisitor {
        generic_type_names: origin_generic_param_names,  // Initialize the visitor with the filter criteria
        associated_types: HashMap::new(),
    };

    // Start from the `st` syntax tree node and visit all of its child nodes
    visitor.visit_derive_input(st);
    visitor.associated_types
}

For example, given traits with associated types and a struct like the following:


pub trait TraitA {
    type Value1;
    type Value2;
}

pub trait TraitB {
    type Value3;
    type Value4;
}

pub struct GeekKindergarten<T: TraitA, U: TraitB> {
    blog: T::Value1,
    ideawand: PhantomData<U::Value3>,
    com: Foo<Bar<Baz<T::Value2>>>,
}

The function above returns a structure like the one below. We use a dictionary to make later lookups easy, and each value is a list because a trait may have several associated types:

{
    "T": [T::Value1, T::Value2],
    "U": [U::Value3],
}
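To see why the recursive step in the visitor matters, here is a std-only toy model of the same traversal. Ty, ty, collect and example_field_types are hypothetical names used only for illustration; the real code works on syn::TypePath nodes, not strings.

```rust
use std::collections::HashMap;

// Toy stand-in for a type in the syntax tree: a path such as `T::Value1` or
// `Foo`, plus the types nested inside its angle brackets.
struct Ty {
    segments: Vec<&'static str>,
    args: Vec<Ty>,
}

fn ty(path: &'static str, args: Vec<Ty>) -> Ty {
    Ty { segments: path.split("::").collect(), args }
}

// Plays the role of visit_type_path: check the current node, then recurse.
fn collect(node: &Ty, generics: &[&str], out: &mut HashMap<String, Vec<String>>) {
    if node.segments.len() >= 2 && generics.contains(&node.segments[0]) {
        out.entry(node.segments[0].to_string())
            .or_default()
            .push(node.segments.join("::"));
    }
    // Without this loop, nested types such as Foo<Bar<Baz<T::Value2>>> would be missed.
    for child in &node.args {
        collect(child, generics, out);
    }
}

// The field types of the struct from the text, in field order.
fn example_field_types() -> Vec<Ty> {
    vec![
        ty("T::Value1", vec![]),
        ty("PhantomData", vec![ty("U::Value3", vec![])]),
        ty("Foo", vec![ty("Bar", vec![ty("Baz", vec![ty("T::Value2", vec![])])])]),
    ]
}
```

Running collect over every field type with the generics T and U yields the same two-entry map as shown above, with T::Value2 found three levels deep.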

After collecting all the associated types, we update the code that generates the impl block. The difference from before is that a bound on an associated type can only be placed in the where clause. The code looks like this:

fn generate_debug_trait(st: &syn::DeriveInput) -> syn::Result<proc_macro2::TokenStream> {

    // < unmodified code omitted here >..........

    // The following line is new in level 7: call the function to find the associated type information
    let associated_types_map = get_generic_associated_types(st);
    for g in generics_param_to_modify.params.iter_mut() {
        if let syn::GenericParam::Type(t) = g {
            let type_param_name = t.ident.to_string();

            if phantomdata_type_param_names.contains(&type_param_name) && !field_type_names.contains(&type_param_name) {
                continue;
            }

            // The following three lines are new in level 7. If `T` is used through an associated type, do not add a bound to `T` itself, unless `T` is also used directly as a field type
            if associated_types_map.contains_key(&type_param_name) && !field_type_names.contains(&type_param_name) {
                continue;
            }

            t.bounds.push(parse_quote!(std::fmt::Debug));
        }
    }

    // The following six lines are new in level 7: add the bounds on the associated types to the where clause
    generics_param_to_modify.make_where_clause();
    for (_, associated_types) in associated_types_map {
        for associated_type in associated_types {
            generics_param_to_modify.where_clause.as_mut().unwrap().predicates.push(parse_quote!(#associated_type: std::fmt::Debug));
        }
    }

    // < unmodified code omitted here >..........
}
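For reference, this is roughly the shape of impl the level 7 code is meant to produce, written out by hand for a single field. Id is a hypothetical implementor added so the example can be exercised. The generic parameter list can only declare parameters such as T, so a bound on T::Value has nowhere to go except the where clause.

```rust
use std::fmt::{self, Debug};

pub trait Trait {
    type Value;
}

pub struct GeekKindergarten<T: Trait> {
    pub blog: T::Value,
}

// The bound sits on T::Value in the where clause; T itself gets no Debug bound.
impl<T: Trait> Debug for GeekKindergarten<T>
where
    T::Value: Debug,
{
    fn fmt(&self, fmt: &mut fmt::Formatter) -> fmt::Result {
        fmt.debug_struct("GeekKindergarten")
            .field("blog", &self.blog)
            .finish()
    }
}

// Hypothetical implementor that is not Debug itself.
pub struct Id;

impl Trait for Id {
    type Value = u8;
}
```

GeekKindergarten<Id> can be printed even though Id is not Debug, because only Id's associated type u8 has to be.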

Level 8

Video version of Level 8: www.bilibili.com/video/BV1vV…

The goal of this level is to implement the escape hatch mentioned earlier. Because of the limitations of Rust's procedural macro expansion described above, we cannot correctly infer the trait bounds on generics in some edge cases, so we need to provide a back door for manual intervention. The level has two parts: a required one that provides a struct-wide override, and an optional one that gives precise control over each field. Because this article is already very long, we only do the required part; the optional part is left for you to implement.

First we parse a struct-level attribute. We have parsed attributes many times by now, so here is the code directly:

fn get_struct_escape_hatch(st: &syn::DeriveInput) -> Option<String> {
    if let Some(inert_attr) = st.attrs.last() {
        if let Ok(syn::Meta::List(syn::MetaList { nested, .. })) = inert_attr.parse_meta() {
            if let Some(syn::NestedMeta::Meta(syn::Meta::NameValue(path_value))) = nested.last() {
                if path_value.path.is_ident("bound") {
                    if let syn::Lit::Str(ref lit) = path_value.lit {
                        return Some(lit.value());
                    }
                }
            }
        }
    }
    None
}

Next we take the bound supplied by the user, which is really just a small piece of Rust code, parse it into a syntax tree node, and insert it into the node for the where clause. Parsing the user's input can be done with the syn::parse_str() function. Here is the code:

fn generate_debug_trait(st: &syn::DeriveInput) -> syn::Result<proc_macro2::TokenStream> {

    // < unmodified code omitted here >..........

    // Check whether a bound override is set. If so, skip inference and put the user-supplied bound straight into the where clause
    if let Some(hatch) = get_struct_escape_hatch(st) {
        generics_param_to_modify.make_where_clause();
        generics_param_to_modify
            .where_clause
            .as_mut()
            .unwrap()
            .predicates
            // Report a malformed bound as a compile error instead of panicking
            .push(syn::parse_str(hatch.as_str())?);
    } else {
        // Move all the inference code from before into this else branch; omitted here..........
    }

    let (impl_generics, type_generics, where_clause) = generics_param_to_modify.split_for_impl();

    // < unmodified code omitted here >..........
}
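The sketch below shows, by hand, a case where inference cannot work and the escape hatch is needed. It assumes the user writes an attribute such as #[debug(bound = "T::Value: Debug")] on Wrapper (the attribute name debug is an assumption carried over from the earlier levels), and Id is a hypothetical implementor. T::Value never appears in Wrapper's own fields, so scanning them for T::... paths finds nothing.

```rust
use std::fmt::{self, Debug};

pub trait Trait {
    type Value;
}

// T::Value is hidden inside Field<T>.
pub struct Field<T: Trait> {
    pub values: Vec<T::Value>,
}

// Wrapper's only field has type Field<T>; no `T::...` path is visible here.
pub struct Wrapper<T: Trait> {
    pub field: Field<T>,
}

impl<T: Trait> Debug for Field<T>
where
    T::Value: Debug,
{
    fn fmt(&self, fmt: &mut fmt::Formatter) -> fmt::Result {
        fmt.debug_struct("Field").field("values", &self.values).finish()
    }
}

// What the user-supplied bound "T::Value: Debug" is meant to produce:
// the string is parsed and pushed into the where clause unchanged.
impl<T: Trait> Debug for Wrapper<T>
where
    T::Value: Debug,
{
    fn fmt(&self, fmt: &mut fmt::Formatter) -> fmt::Result {
        fmt.debug_struct("Wrapper").field("field", &self.field).finish()
    }
}

// Hypothetical implementor used to exercise the impls.
pub struct Id;

impl Trait for Id {
    type Value = u8;
}
```

Without the override, our inference would put T: Debug on Wrapper's impl, which is both unnecessary and not enough for Field<T>: Debug to hold.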

Finally, I have to admit that the code above is by no means rigorous and certainly contains bugs. On the one hand, it was written only to pass the test cases and does not fully consider scenarios the tests do not cover. On the other hand, you should be fully aware that Rust's procedural macros are at heart an elaborate form of "string concatenation": there is no type checking, and we associate "types" purely by matching strings, so our code can be fooled by deliberately choosing conflicting names. That is Rust's procedural macros for you: full of tricks.