This is day 17 of my participation in the November post-update challenge. Event details: the last post-update challenge of 2021.

Yuan Xiaobai remembered Jia Dazhi mentioning that an article can be read by following its main thread. Code has a thread of its own too, and one way to find it is to follow where the data flows through the source. So far, starting from buildctl build, the data that has appeared, in order, is the Dockerfile, llb.State, and now Definition. How the Dockerfile is read and converted to llb.State is something we can come back to. How llb.State works has already been covered. So what does this Definition mean?

With this question in mind, Yuan Xiaobai curiously opened the source and started reading.

func (s State) Marshal(ctx context.Context, co ...ConstraintsOpt) (*Definition, error) {
   def := &Definition{
      Metadata: make(map[digest.Digest]pb.OpMetadata, 0),
   }
   ...
}

The function starts with an empty Definition, where it looks like the data will be stored.

// Definition is the LLB definition structure with per-vertex metadata entries
// Corresponds to the Definition structure defined in solver/pb.Definition.
type Definition struct {
   Def      [][]byte
   Metadata map[digest.Digest]pb.OpMetadata
   Source   *pb.Source
}

The structure of Definition doesn't look complicated either. There are only three fields, and the last two use custom types that start with pb. What is this pb? It is the solver/pb package, whose types are defined in ops.proto using Google's Protocol Buffers. Protocol Buffers is a language-neutral serialization format designed for data exchange, with a compact binary encoding (it is an encoding, not a compression algorithm). So we can expect that our Definition will, sooner or later, be sent across the network to be processed remotely.
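Part of what makes the Protocol Buffers wire format compact is its variable-length integer encoding (varint), where small numbers take fewer bytes. The sketch below is only an illustration of that one idea, not of BuildKit itself: it uses Go's standard encoding/binary package, whose varint functions follow the same encoding as the Protocol Buffers specification. The helper name encodeUvarint is made up for this example.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeUvarint is a hypothetical helper that returns the base-128
// varint encoding of v: 7 payload bits per byte, with the high bit set
// on every byte except the last.
func encodeUvarint(v uint64) []byte {
	buf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(buf, v)
	return buf[:n]
}

func main() {
	// Print each value next to its encoded bytes in hex.
	for _, v := range []uint64{1, 300, 1 << 32} {
		fmt.Printf("%d -> % x\n", v, encodeUvarint(v))
	}
}
```

A value such as 300 fits in two bytes instead of a fixed eight, which is one reason a serialized op stays small.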

Look at the next statement:

func (s State) Marshal(ctx context.Context, co ...ConstraintsOpt) (*Definition, error) {
   def := &Definition{
      Metadata: make(map[digest.Digest]pb.OpMetadata, 0),
   }
   ...
   def, err := marshal(ctx, s.Output().Vertex(ctx, c), def, smc, map[digest.Digest]struct{}{}, map[Vertex]struct{}{}, c)
   ...
}

marshal(ctx, s.Output().Vertex(…), …): based on our current understanding, the Vertex passed in here is the ExecState, the second part that we saw last time. def is passed in as well, so the guess seems to be right: it really is used to store the data.

And the lowercase marshal:

func marshal(ctx context.Context, v Vertex, def *Definition, s *sourceMapCollector, cache map[digest.Digest]struct{}, vertexCache map[Vertex]struct{}, c *Constraints) (*Definition, error) {
   if _, ok := vertexCache[v]; ok {
      return def, nil
   }
   for _, inp := range v.Inputs() {
      var err error
      def, err = marshal(ctx, inp.Vertex(ctx, c), def, s, cache, vertexCache, c)
      if err != nil {
         return def, err
      }
   }

   dgst, dt, opMeta, sls, err := v.Marshal(ctx, c)
   if err != nil {
      return def, err
   }
   vertexCache[v] = struct{}{}
   if opMeta != nil {
      def.Metadata[dgst] = mergeMetadata(def.Metadata[dgst], *opMeta)
   }
   if _, ok := cache[dgst]; ok {
      return def, nil
   }
   s.Add(dgst, sls)
   def.Def = append(def.Def, dt)
   cache[dgst] = struct{}{}
   return def, nil
}

You can see that the lowercase marshal calls itself recursively: whenever v.Inputs() is not empty, the recursion is triggered. Jia Dazhi also said that when reading recursion there are two main things to look at. One is to abstract what the recursive function does, that is, the action repeated on every call. The other is to find the exit, that is, the condition under which the recursion stops.

This is a good chance to try that out:

  • The recursive function, the lowercase marshal, is used to organize each Vertex into the Definition.
  • The exit is v.Inputs(): the recursion always ends up at a node without inputs and starts processing there. In this case that is SourceOp, which is the first Vertex and has no inputs.

It walks all the way up the dependency chain, finds a node that depends on nothing else, and starts processing from there, which looks like the well-known depth-first traversal.
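The traversal can be tried out in isolation. The following is a minimal sketch, not BuildKit code: node, definition, walk and flatten are hypothetical stand-ins, and the payload string alone plays the role of the serialized op (in the real code the digest comes from v.Marshal). It keeps the two maps seen in marshal: one keyed by node, so a shared node is visited once, and one keyed by digest, so identical bytes are appended once.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// node is a hypothetical stand-in for a Vertex: a payload plus inputs.
type node struct {
	payload string
	inputs  []*node
}

// definition is a simplified stand-in for Definition: only the ordered ops.
type definition struct {
	def [][]byte
}

// walk handles the inputs of n before n itself (depth-first, post-order),
// so every entry in d.def comes after the entries it depends on.
func walk(n *node, d *definition, seenDigest map[[32]byte]struct{}, seenNode map[*node]struct{}) {
	if _, ok := seenNode[n]; ok {
		return // this exact node was already processed
	}
	for _, in := range n.inputs {
		walk(in, d, seenDigest, seenNode)
	}
	// Assumption: the payload alone is the serialized form of the node.
	dt := []byte(n.payload)
	dgst := sha256.Sum256(dt)
	seenNode[n] = struct{}{}
	if _, ok := seenDigest[dgst]; ok {
		return // a different node already produced identical bytes
	}
	d.def = append(d.def, dt)
	seenDigest[dgst] = struct{}{}
}

// flatten runs walk from root and returns the collected payloads in order.
func flatten(root *node) []string {
	d := &definition{}
	walk(root, d, map[[32]byte]struct{}{}, map[*node]struct{}{})
	out := make([]string, 0, len(d.def))
	for _, dt := range d.def {
		out = append(out, string(dt))
	}
	return out
}

func main() {
	src := &node{payload: "source: alpine"}
	run := &node{payload: "exec: apk add curl", inputs: []*node{src}}
	dup := &node{payload: "source: alpine"} // distinct node, same bytes as src
	root := &node{payload: "exec: merge", inputs: []*node{run, src, dup}}

	for i, p := range flatten(root) {
		fmt.Println(i, p)
	}
}
```

The source node comes out first even though the walk starts from the last node, and the duplicate source contributes nothing, which matches the role the two maps appear to play in marshal.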

So how is each node processed? Look at the next statement:

dgst, dt, opMeta, sls, err := v.Marshal(ctx, c)

Here v is our Vertex. So far we have SourceOp and ExecOp, so let's see how each of them implements Marshal.

SourceOp Marshal()

func (s *SourceOp) Marshal(ctx context.Context, constraints *Constraints) (digest.Digest, []byte, *pb.OpMetadata, []*SourceLocation, error) {
   if s.Cached(constraints) {
      return s.Load()
   }
   if err := s.Validate(ctx, constraints); err != nil {
      return "", nil, nil, nil, err
   }

   if strings.HasPrefix(s.id, "local://") {
      if _, hasSession := s.attrs[pb.AttrLocalSessionID]; !hasSession {
         uid := s.constraints.LocalUniqueID
         if uid == "" {
            uid = constraints.LocalUniqueID
         }
         s.attrs[pb.AttrLocalUniqueID] = uid
         addCap(&s.constraints, pb.CapSourceLocalUnique)
      }
   }
   proto, md := MarshalConstraints(constraints, &s.constraints)

   proto.Op = &pb.Op_Source{
      Source: &pb.SourceOp{Identifier: s.id, Attrs: s.attrs},
   }

   if !platformSpecificSource(s.id) {
      proto.Platform = nil
   }

   dt, err := proto.Marshal()
   if err != nil {
      return "", nil, nil, nil, err
   }

   s.Store(dt, md, s.constraints.SourceLocations, constraints)
   return s.Load()
}

You can see that proto, md := MarshalConstraints(constraints, &s.constraints) generates proto:

return &pb.Op{
   Platform: &pb.Platform{
      OS:           c.Platform.OS,
      Architecture: c.Platform.Architecture,
      Variant:      c.Platform.Variant,
      OSVersion:    c.Platform.OSVersion,
      OSFeatures:   c.Platform.OSFeatures,
   },
   Constraints: &pb.WorkerConstraints{
      Filter: c.WorkerConstraints,
   },
}, &c.Metadata

This is where SourceOp is converted into a pb.Op:

proto.Op = &pb.Op_Source{
   Source: &pb.SourceOp{Identifier: s.id, Attrs: s.attrs},
}

ExecOp Marshal()

The main flow is similar to SourceOp's, though the details are a bit more complicated. For now we focus on the data conversion:

func (e *ExecOp) Marshal(ctx context.Context, c *Constraints) (digest.Digest, []byte, *pb.OpMetadata, []*SourceLocation, error) {
   ...
   pop, md := MarshalConstraints(c, &e.constraints)
   pop.Op = &pb.Op_Exec{
      Exec: peo,
   }
   ...
}

Taken as a whole, the data transformation process does become a little clearer. llb.State is responsible for walking all the Vertexes with depth-first recursion, while each Op (SourceOp, ExecOp) provides its own Marshal method, where the actual data conversion is done. Protocol Buffers takes care of the encoding, so we don't have to worry about serialization details. This completes the transformation from llb.State to Definition, in which every temporary Op, such as SourceOp and ExecOp, is converted to the standard pb.Op.
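The pb.Op_Source and pb.Op_Exec wrappers follow the shape that generated Protocol Buffers code in Go typically uses for a oneof field: an interface-typed field plus one small wrapper struct per case. The sketch below imitates that shape by hand so it can be run without BuildKit. Every name in it (op, isOpKind, opSource, opExec, sourcePayload, execPayload, describe) is hypothetical and is not the real solver/pb API.

```go
package main

import "fmt"

// isOpKind mimics the marker interface used for a oneof field:
// only the wrapper types below implement it.
type isOpKind interface{ isOpKind() }

// Hypothetical payloads for the two kinds of op seen in the article.
type sourcePayload struct{ Identifier string }
type execPayload struct{ Args []string }

// One wrapper type per oneof case.
type opSource struct{ Source *sourcePayload }
type opExec struct{ Exec *execPayload }

func (*opSource) isOpKind() {}
func (*opExec) isOpKind()   {}

// op is the common envelope: shared fields plus exactly one kind.
type op struct {
	Platform string
	Kind     isOpKind
}

// describe tells the cases apart with a type switch, the way a consumer
// of the envelope would.
func describe(o *op) string {
	switch k := o.Kind.(type) {
	case *opSource:
		return "source " + k.Source.Identifier
	case *opExec:
		return fmt.Sprintf("exec %v", k.Exec.Args)
	default:
		return "unknown"
	}
}

func main() {
	ops := []*op{
		{Platform: "linux/amd64", Kind: &opSource{Source: &sourcePayload{Identifier: "local://context"}}},
		{Platform: "linux/amd64", Kind: &opExec{Exec: &execPayload{Args: []string{"ls", "-l"}}}},
	}
	for _, o := range ops {
		fmt.Println(describe(o))
	}
}
```

Because both kinds share one envelope type, the walker only ever has to deal with a single serialized form, whatever kind of op produced it.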

Feeling good, Yuan Xiaobai happily got ready to head off to work 🙂

Moby Buildkit # 18-LLB. State