Glyph Lefkowitz recently wrote an illuminating article detailing the challenges of writing highly concurrent software. If you haven’t read it yet, I suggest you do; it’s well written and full of wisdom that modern software engineers should not be without.

There’s a lot of wisdom to glean from it, but if I may be so bold as to offer a summary of its gist, it would go something like this: the combination of preemptive multitasking and shared state often leads to unmanageable complexity, and developers who want to keep some of their sanity should avoid it as much as possible. Preemptive scheduling is fine for truly parallel tasks, but explicit cooperative multitasking is preferable when mutable state is shared across multiple concurrent threads.

With cooperative multitasking, your code may still be complex, but you have a chance of keeping that complexity manageable. When transfers of control are explicit, the reader of the code at least has some visible indication of where things might go off the rails. Without explicit markers, every statement is a potential landmine: “What if this operation isn’t atomic with the previous one?” The space between each statement becomes an endless dark murk from which horrible Heisenbugs emerge.

For the past year or so, I’ve been programming primarily in Go in my work on Heka, a high-performance data, logging, and metrics processing engine. One of Go’s selling points is the set of very useful concurrency primitives baked into the language. But how does Go’s concurrency model fare from the point of view of encouraging code that supports local reasoning?

Not very well, I’m afraid. Goroutines all have access to the same shared memory space, state is mutable by default, and Go’s scheduler makes no guarantees about exactly when context switches will happen. In a single-core setup, I’d say the Go runtime falls into the “implicit coroutines” category, option 4 in the list of asynchronous programming patterns that Glyph presents. But when goroutines can run in parallel across multiple cores, all bets are off.

Go may not protect you, but that doesn’t mean you can’t take steps to protect yourself. By using some of the primitives Go provides, code can be written that minimizes the unexpected behavior that comes with preemptive scheduling. Consider the following Go port of Glyph’s sample account-transfer code (ignoring that floats are not actually a good choice for storing fixed-point decimal values):

    func Transfer(amount float64, payer, payee *Account,
        server SomeServerType) error {

        if payer.Balance() < amount {
            return errors.New("Insufficient funds")
        }
        log.Printf("%s has sufficient funds", payer)
        payee.Deposit(amount)
        log.Printf("%s received payment", payee)
        payer.Withdraw(amount)
        log.Printf("%s made payment", payer)
        server.UpdateBalances(payer, payee) // Assume this is magic and always works.
        return nil
    }

This is obviously unsafe when called from multiple goroutines, because they might each get the same result from their Balance calls at the same time and then collectively withdraw more than the available balance. It would be better to arrange things so that the dangerous part of the code can’t be executed from more than one goroutine at a time. Here is one way to do that:

    type transfer struct {
        payer  *Account
        payee  *Account
        amount float64
    }

    var xferChan = make(chan *transfer)
    var errChan = make(chan error)

    func init() {
        go transferLoop()
    }

    func transferLoop() {
        for xfer := range xferChan {
            if xfer.payer.Balance() < xfer.amount {
                errChan <- errors.New("Insufficient funds")
                continue
            }
            log.Printf("%s has sufficient funds", xfer.payer)
            xfer.payee.Deposit(xfer.amount)
            log.Printf("%s received payment", xfer.payee)
            xfer.payer.Withdraw(xfer.amount)
            log.Printf("%s made payment", xfer.payer)
            errChan <- nil
        }
    }

    func Transfer(amount float64, payer, payee *Account,
        server SomeServerType) error {

        xfer := &transfer{
            payer:  payer,
            payee:  payee,
            amount: amount,
        }

        xferChan <- xfer
        err := <-errChan
        if err == nil {
            server.UpdateBalances(payer, payee) // Still magic.
        }
        return err
    }

There’s more code here, but we eliminate the concurrency problem by implementing a simple event loop. When the package is first loaded, it launches a goroutine that runs the loop. Transfer requests are passed into the loop through a channel created for that purpose, and the result is passed back out through an error channel. Because the channels are unbuffered, they block, and no matter how many concurrent transfer requests come in through the Transfer function, they will be serviced serially by the single running event loop.

The above code is admittedly a bit awkward. For a case this simple, a mutex would be a better choice, but I wanted to demonstrate the technique of isolating state manipulation in a single goroutine. Even when it’s awkward, its performance far exceeds most requirements, and it works with even the most naive Account struct implementation:

    type Account struct {
        balance float64
    }

    func (a *Account) Balance() float64 {
        return a.balance
    }

    func (a *Account) Deposit(amount float64) {
        log.Printf("depositing: %f", amount)
        a.balance += amount
    }

    func (a *Account) Withdraw(amount float64) {
        log.Printf("withdrawing: %f", amount)
        a.balance -= amount
    }

However, an Account implementation that naive seems a bit foolish. It would arguably make more sense for the Account struct to provide some of its own protection, by refusing any withdrawals greater than the current balance. What if we changed the Withdraw function to the following?

    func (a *Account) Withdraw(amount float64) {
        if amount > a.balance {
            log.Println("Insufficient funds")
            return
        }
        log.Printf("withdrawing: %f", amount)
        a.balance -= amount
    }

Unfortunately, this code suffers from the same problem as our original Transfer implementation. Parallel execution or an ill-timed context switch means we can still end up with a negative balance. Luckily, the idea of an internal event loop applies here as well, perhaps even more neatly, because an event-loop goroutine can be cleanly coupled with each individual Account struct instance. Here’s what that might look like:

    type Account struct {
        balance     float64
        deltaChan   chan float64
        balanceChan chan float64
        errChan     chan error
    }

    func NewAccount(balance float64) (a *Account) {
        a = &Account{
            balance:     balance,
            deltaChan:   make(chan float64),
            balanceChan: make(chan float64),
            errChan:     make(chan error),
        }
        go a.run()
        return
    }

    func (a *Account) Balance() float64 {
        return <-a.balanceChan
    }

    func (a *Account) Deposit(amount float64) error {
        a.deltaChan <- amount
        return <-a.errChan
    }

    func (a *Account) Withdraw(amount float64) error {
        a.deltaChan <- -amount
        return <-a.errChan
    }

    func (a *Account) applyDelta(amount float64) error {
        newBalance := a.balance + amount
        if newBalance < 0 {
            return errors.New("Insufficient funds")
        }
        a.balance = newBalance
        return nil
    }

    func (a *Account) run() {
        var delta float64
        for {
            select {
            case delta = <-a.deltaChan:
                a.errChan <- a.applyDelta(delta)
            case a.balanceChan <- a.balance:
                // Do nothing; we've accomplished our goal with the channel put.
            }
        }
    }

The API here is slightly different; both the Deposit and Withdraw functions now return an error. Rather than processing requests directly, however, each puts an account-balance adjustment onto the deltaChan, which feeds into the event loop running in the run method. Similarly, the Balance method requests data from the event loop, blocking until it receives a value over balanceChan.

The important thing to note in the code above is that all direct access to and mutation of the struct’s internal data happens in code triggered by the event loop. If the public API calls are well behaved and only interact with the data via the provided channels, then no matter how many concurrent calls are made to any of the public methods, we know that only one of them is being acted on at any given time. Our event-loop code is much easier to reason about.

This pattern is at the heart of Heka’s design. When Heka starts, it reads its configuration file and launches each plugin in its own goroutine. Data is fed into the plugins through channels, as are timer ticks, shutdown notices, and other control signals. This encourages plugin authors to implement their functionality with event-loop-style constructs like the examples above.

Again, Go won’t protect you. It’s entirely possible to write a Heka plugin (or any Go struct) with sloppy internal data management that is subject to race conditions. But with a little care, and liberal application of Go’s race detector, you can write code that behaves predictably even in the face of preemptive scheduling.

Rob Miller

Original link: blog.mozilla.org/services/20…
