Getting started with Tezos #4: more complex oracle

July 30th 2018: Tezos betanet has now launched, and this article doesn't reflect the current state of the project. You can still read through it, but the code examples won't work as is, and will need various changes. Up to date documentation for the betanet is available here. You can change the address to access docs for other branches like zeronet.

A word about different testnets: there is usually a zeronet running, which closely follows new developments, and an alphanet, which is more stable. The real network is currently called betanet (its transactions persist to the main network). The alphanet is currently obsolete.

Towards a more useful oracle

In the last article, we looked at writing a simple oracle that provides information about the outside world to contracts living on the blockchain. One issue with oracles that provide a single piece of information is that it's easy to set up a "hostile cache" that extracts all the value the oracle brings its users, at minimal cost. Such a cache would ask our oracle only when it doesn't have an answer; whenever it already has a recent answer cached, it can reply immediately at a discount. This makes it hard for the oracle operator to turn a profit and makes dishonest answers much more attractive.

What if we had an oracle that provides personalized information instead? If each user cares a lot about the answer to their query, but has no interest in any other answer, it's much harder to effectively cache such requests.

Today, we are going to build an oracle that provides information about the weather at a given location. Instead of returning the requested information from the contract, we will include it in transactions sent to the callers.

The system

We will have a contract that receives queries, and an external server that polls its storage and calls a weather API. When the data comes back, the server sends two transactions: one answering the query (to the specified address), and another letting our contract know that the query has been handled, so its parameters can be deleted.

For the weather API, I have decided to use Dark Sky. Their API takes location coordinates and a timestamp, and returns a JSON response. For simplicity, we will only use the probability of precipitation. Improving the oracle to provide more information later on is fairly straightforward.

The server will be built using golang and will be similar to the one from the last article. To periodically poll contract storage, we will use an RPC endpoint.

./alphanet.sh client rpc call /blocks/head/proto/context/contracts/TZ1X8F5SDGYUqdfbsxM7hk96S2fJx2MTbY4P with '{}'

The output is JSON, but will require some processing.

To sign data and send transactions, we will call alphanet.sh from golang code.

Callers will place a query by sending a transaction to the contract, containing an address to send the answer to, coordinates and time of interest.

The contract

We will make the contract as simple as possible, and keep the more complex logic out of the blockchain. The contract will simply act as a repository of recent queries that will be handled by the server and deleted from the contract storage shortly.

We need to store: coordinates (two numbers), a timestamp (one timestamp) and a callback address (one string):

pair string (pair (pair int int) timestamp)

We will store the queries in a map (a hash table).

As a key, we will use a hash of the data (a string):

map string (pair string (pair (pair int int) timestamp))

Finally, to make sure only bona fide transactions can change the data, we will have to sign them with our private key, and store the public key in the contract. Otherwise, the contract would have no idea what the public key is, and couldn't verify transactions.

storage (pair key (map string (pair string (pair (pair int int) timestamp))))

The contract will be used in two modes: to register a query, and to delete one. To register a query, we'll need its parameters as above:

pair string (pair (pair int int) timestamp)

The hash will be calculated by the contract and doesn't need to be included in the supplied data. On the other hand, to delete a query we'll need only the hash and a signature proving the transaction is legitimate.

pair signature string

Altogether, this will be the parameter type:

parameter (or (pair signature string) (pair string (pair (pair int int) timestamp)))

The union type or left right lets the compiler know that either the left side or the right side is present. It can be inspected with the IF_LEFT or IF_RIGHT instructions, and constructed with the LEFT or RIGHT instruction and an argument.
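Go, which we'll use for the server, has no built-in union type, but the same either-or dispatch can be sketched with an interface and a type switch (the type and field names below are illustrative, not part of the contract or server code):

```go
package main

import "fmt"

// Param models Michelson's `or` type: a value is either the left side
// (a deletion request) or the right side (a registration request).
type Param interface{ isParam() }

// Delete is the left branch: a signature and a query hash.
type Delete struct{ Signature, Hash string }

// Register is the right branch: callback address, coordinates, timestamp.
type Register struct {
	Callback  string
	Lat, Lon  int
	Timestamp string
}

func (Delete) isParam()   {}
func (Register) isParam() {}

// handle plays the role of IF_LEFT: it inspects which branch is present.
func handle(p Param) string {
	switch p.(type) {
	case Delete:
		return "delete"
	case Register:
		return "register"
	}
	return "unknown"
}

func main() {
	fmt.Println(handle(Delete{}), handle(Register{})) // delete register
}
```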

For query deletion we won't need any return value, but for query registration the generated hash is returned for the callers' convenience:

return (option string)

Let's see the code itself:

code {DUP;
      CAR;
      IF_LEFT{DIP{CDR;              # delete a key (remove handled query)
                  DUP;
                  CAR;
                  DIP{DUP;
                      CDR;
                      DIP{CAR}}};
              DUP;
              CDR;
              DIP{DUP;              # check authenticity
                  CAR;
                  DIP{CDR;H};
                  PAIR;
                  SWAP;
                  CHECK_SIGNATURE;
                  IF{}{FAIL};
                  NONE (pair string (pair (pair int int) timestamp))};
              UPDATE;               # update the map with nil entry
              SWAP;
              PAIR;
              NONE string;          # return an empty option
              PAIR}
             {/*more code*/}        # add a key (register a query)

We take the supplied query identifier (a hash, as it happens), hash it, and check that the signature was produced from the same hash. This proves that the data used to generate the signature is identical to the supplied data. Then we update the map entry specified by the query identifier: its new value will be an empty option. Michelson map lookups yield options, which have to be inspected when a value is retrieved with GET. A key whose value is None is effectively a key with no associated value, so removing a value is done by updating it to None.

code {DUP;
      CAR;
      IF_LEFT{/*more code*/}              # delete a key (remove handled query)
             {AMOUNT;                     # enforce a fee for using the oracle
              PUSH tez "1.00";
              CMPLE;
              IF{}{FAIL};
              DUP;                        # add a key (register a query)
              DIP{H;                      # hash query parameters
                  DUP;
                  DIP{DIP{CDR;
                          DUP;
                          CDR;
                          DIP{CAR};
                          DUP};
                      GET;                # check for duplicates
                      IF_NONE{}{FAIL}}};
              SWAP;
              DIP{SOME};
              DUP;
              DIP{UPDATE;                 # create a new map entry
                  SWAP;
                  PAIR};
              SOME;                       # return the hash
              PAIR}}

We will be the only ones removing entries, so that branch can be free to use. The other branch will enforce a fee. We check for identical query parameters and stop the execution when a duplicate is detected. Since the query is identical, we don't need to handle it again, and this way the caller doesn't waste money on gas needlessly. There is also a simpler instruction for checking whether a key exists: MEM. It takes the same arguments as GET but returns a bool.

So this part:

GET;                # check for duplicates
IF_NONE{}{FAIL};

Can be simplified to:

MEM;                # check for duplicates
IF{FAIL}{};

This sums up the contract code; you can see the complete version in a snippet here.

Contract set-up and usage

The contract has to be initialized with a public key. Use the same identity that will be used by the server when signing transactions.

./alphanet.sh client originate contract oraclePrecip for my_identity transferring 100 from money running container:oracle_precip.tz -init '(Pair "edpkuErT1u8QhyMp8H3zHYJoGHbbfioBLJUUYZ8dYoYE9CQstvUn7h" Map)'

Callers can register a query like so:

./alphanet.sh client transfer 1 from callbackHandler to oraclePrecip -arg '(Right (Pair "TZ1mnkv3zm1PBx8ymPRM89H3iYy3rYiEFPnW" (Pair (Pair 1234 5678) "2017-11-12T09:35:02Z")))'

Which means that the query asks for information about the location with coordinates 12.34, 56.78 (encoded as the integers 1234 and 5678) at "2017-11-12T09:35:02Z", and that the answer should be sent to "TZ1mnkv3zm1PBx8ymPRM89H3iYy3rYiEFPnW".
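Coordinates are encoded as integers with two implied decimal places, so 12.34 becomes 1234. A pair of helpers for this convention (the function names are mine, not part of the oracle code):

```go
package main

import (
	"fmt"
	"math"
)

// encodeCoord turns a decimal coordinate into the integer form the
// contract stores, with two implied decimal places.
func encodeCoord(c float64) int {
	return int(math.Round(c * 100))
}

// decodeCoord reverses the encoding.
func decodeCoord(i int) float64 {
	return float64(i) / 100
}

func main() {
	fmt.Println(encodeCoord(12.34), decodeCoord(5678)) // 1234 56.78
}
```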

After getting the data, the server will send the information to the provided address:

./alphanet.sh client transfer 0 from oraclePrecip to TZ1mnkv3zm1PBx8ymPRM89H3iYy3rYiEFPnW -arg '(Pair "37a369db7c78fc1461bcf4b3cf0ab3616de69e32a7947bf50ffcaf19ef96a92fe8f10d198cc4b85a013427ee28400f4eab7778f4e58e00e9fc66e22cf1a38f0b" (Pair "expru6aCmyJ2KLXW2pfTzogUY8hXhqw9WicThaKXHaNXqwqFqsSteW" 34))'

Then it will delete the query from the oracle contract:

./alphanet.sh client transfer 0 from money to oraclePrecip -arg '(Left (Pair "6e886448685edda8f01abac52d0555df287fd75d33166f43059fb8273d96f8f8a43e4afb85c3e9c9fa226a2002287466f8283f48414b428f7571173214abb70d" "expru6aCmyJ2KLXW2pfTzogUY8hXhqw9WicThaKXHaNXqwqFqsSteW"))'

The callbackHandler is my alias for a contract that takes the query result and a signature proving that the transaction comes from someone holding the oracle's private key. It doesn't return anything. I've written a simple one for testing the oracle; it just stores results in a list (most recent first). Note that a query can be registered by anyone: it doesn't have to come from the contract specified by the callback address.

The code of callbackHandler is here, or as a snippet:

# (contract (pair (pair signature (pair string int)) unit))
parameter (pair signature (pair string int));
return unit;
storage (pair key (list (pair string int)));
code {DUP;
      DUP;
      DIP{CDR};
      DUP;
      CAAR;
      DIP{CADR;H};
      PAIR;
      DIP{CAR};
      SWAP;
      CHECK_SIGNATURE;
      IF{}{FAIL};
      DUP;
      CADR;
      DIP{CDR;DUP;CAR;DIP{CDR}};
      SWAP;
      DIP{CONS};
      PAIR;
      UNIT;
      PAIR;}

The server

We'll build a golang server similar to the one we used last time. It will fetch the contract storage and process the data to extract the query parameters we need. The data processing is abstracted into an internal package, storage, here. The package exposes only one function, ExtractJobs. This separates the different parts of the program and makes future improvements easier.

func getJobs(contract string) (map[string][]types.Job, error) {
    c := exec.Command("./alphanet.sh", "client", "rpc", "call", "/blocks/prevalidation/proto/context/contracts/"+contract, "with", "{}")
    c.Env = append(c.Env, "ALPHANET_EMACS=true")
    b, err := c.CombinedOutput()
    if err != nil {
        fmt.Println("error:", string(b))
        return nil, fmt.Errorf("running commands to get storage: %v", err)
    }
    if *debug {
        fmt.Println(string(b))
    }
    jobs, err := storage.ExtractJobs(*debug, string(b))
    if err != nil {
        return nil, err
    }
    return jobs, nil
}

To get information about the latest state of each contract (as seen by our node), we can use /blocks/head/proto/context/contracts/ and the contract address. If we also need to see new changes that haven't been propagated through the network yet, such as new transactions originated by our node, we can use prevalidation instead of head.

We define the Job type as follows:

package types

import "github.com/shawntoffel/darksky"

type Job struct {
    Hash    string
    Address string
    darksky.ForecastRequest
}

Because both the main package and storage need to use it, we have to define it in a separate package. The ForecastRequest consists of coordinates and a timestamp. When composing a struct in golang, we can embed other structs without specifying a field name (at most one of each type). This is different from including named fields, because the embedded struct's fields are promoted and can be accessed directly as if they were fields of the parent struct. So Job will have the fields Hash and Address as well as the fields of ForecastRequest.
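Field promotion through embedding, shown in isolation (the types here are simplified stand-ins for the real darksky ones):

```go
package main

import "fmt"

// ForecastRequest stands in for darksky.ForecastRequest.
type ForecastRequest struct {
	Latitude, Longitude float64
	Time                int64
}

// Job embeds ForecastRequest without a field name, so its fields are
// promoted and usable directly on Job.
type Job struct {
	Hash    string
	Address string
	ForecastRequest
}

func main() {
	j := Job{Hash: "h", Address: "a"}
	j.Latitude = 12.34 // same field as j.ForecastRequest.Latitude
	fmt.Println(j.Latitude, j.ForecastRequest.Latitude) // 12.34 12.34
}
```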

To make interaction with the Dark Sky API more convenient, we are using package darksky by Shawn Toffel.

We get the query parameters as a map of slices, with the key being the callback address. We can then handle each job slice in parallel, and the only scenario where we need to handle jobs sequentially is when there are multiple queries with the same callback address.

The following paragraph has been updated to more accurately describe replay attack issues.

You might remember we used a counter to prevent replay attacks last time. Tezos contracts actually have a counter built in, which helps against some possible attacks. However, it doesn't prevent all of them, so we should use an explicit counter anyway. We won't use one in our contract here, but users of the oracle contract need to be aware that their contracts should take precautions against duplicated transactions. For more detailed explanations of various security issues, refer to the Michelson anti-patterns list by Milo Davis.

In general, the counters complicate sending transactions to the same address in parallel: only one of the parallel transactions would be accepted, since they all carry the same counter value. Even if we try to set the counter ourselves, the order in which parallel goroutines run is not guaranteed, so a goroutine with a higher counter could run before one with a lower counter and have its transaction refused.

If we handle transactions to the same address sequentially, we avoid most of the issue, and keep the benefits for scenarios where we need to send to many different addresses.

func dispatchAnswers(client darksky.DarkSky, shutdown <-chan bool, tc <-chan time.Time, ch chan<- error) {
    for {
        select {
        case <-shutdown:
            return
        case <-tc:
            fmt.Println("dispatching a batch")
            jobLists, err := getJobs(*contract)
            fmt.Println(jobLists)
            if err != nil {
                ch <- fmt.Errorf("getting jobs: %v", err)
            }
            var g errgroup.Group
            for _, jobs := range jobLists {
                bound := jobs
                g.Go(func() error {
                    return answer(client, bound)
                })
            }
            if err = g.Wait(); err != nil {
                ch <- fmt.Errorf("handling jobs: %v", err)
            }
        }
    }
}

Instead of using the go keyword directly, we use a package called errgroup to dispatch goroutines. It lets us easily wait for all the goroutines and collect their errors. Since goroutines can access variables from the scope they are created in, we might be tempted to just use the variable jobs here, but that would have undesired consequences: most often, the goroutines would only start after the loop had moved on, calling answer with a value of jobs different from the one at the time the goroutine was created (usually the last value in the range). By binding the value to a new variable local to the loop iteration, we make sure it doesn't change before it's used.
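The binding pattern in isolation, using the standard library's sync.WaitGroup instead of errgroup (which works the same way for this purpose). Note that since Go 1.22 each loop iteration gets its own variable, so the explicit binding is only needed on older versions like the one current at the time of writing:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// collect starts one goroutine per item, binding the loop variable to a
// fresh variable first so every goroutine sees its own value.
func collect(items []string) []string {
	var wg sync.WaitGroup
	ch := make(chan string, len(items))
	for _, v := range items {
		bound := v // per-iteration copy; without it, pre-1.22 Go reuses v
		wg.Add(1)
		go func() {
			defer wg.Done()
			ch <- bound
		}()
	}
	wg.Wait()
	close(ch)
	var got []string
	for s := range ch {
		got = append(got, s)
	}
	sort.Strings(got) // goroutine completion order is not deterministic
	return got
}

func main() {
	fmt.Println(collect([]string{"a", "b", "c"})) // [a b c]
}
```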

The answer function takes care of getting data from Dark Sky, signing data and transferring it to the callback address. Then it deletes the query parameters from the contract storage.

func answer(client darksky.DarkSky, jobs []types.Job) error {
    for _, j := range jobs {
        prob, err := getProbability(client, j.ForecastRequest)
        if err != nil {
            return fmt.Errorf("getting data: %v", err)
        }
        signature, err := signData(prob, j.Hash, *identity)
        if err != nil {
            return fmt.Errorf("signing data: %v", err)
        }
        err = transferData(prob, j.Hash, signature, *source, j.Address)
        if err != nil {
            return fmt.Errorf("transferring data: %v", err)
        }
        signature, err = signDeletion(j.Hash, *identity)
        if err != nil {
            return fmt.Errorf("signing deletion: %v", err)
        }
        err = deleteQuery(j.Hash, signature, *source, *contract)
        if err != nil {
            return fmt.Errorf("deleting query: %v", err)
        }
    }
    return nil
}
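The helpers signData, transferData and deleteQuery aren't shown here (the signing subcommand depends on the client version), but the two transfers follow directly from the manual invocations above. A sketch, with placeholder function names of my own:

```go
package main

import (
	"fmt"
	"os/exec"
)

// answerArg builds the Michelson literal for the callback transfer,
// mirroring the manual `alphanet.sh client transfer` invocation shown earlier.
func answerArg(sig, hash string, prob int) string {
	return fmt.Sprintf(`(Pair %q (Pair %q %d))`, sig, hash, prob)
}

// deleteArg builds the literal for the query-deletion transfer.
func deleteArg(sig, hash string) string {
	return fmt.Sprintf(`(Left (Pair %q %q))`, sig, hash)
}

// transfer shells out to alphanet.sh with the same argument shape used in
// the manual transfers above.
func transfer(amount, source, dest, arg string) error {
	c := exec.Command("./alphanet.sh", "client", "transfer", amount,
		"from", source, "to", dest, "-arg", arg)
	if out, err := c.CombinedOutput(); err != nil {
		return fmt.Errorf("transfer: %v: %s", err, out)
	}
	return nil
}

func main() {
	fmt.Println(answerArg("sig", "hash", 34))
	fmt.Println(deleteArg("sig", "hash"))
}
```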

getProbability simply gets the requested data from the API, extracts the information concerning the moment of interest and multiplies the probability by 100, so that we can use it as an integer in Michelson.

func getProbability(client darksky.DarkSky, request darksky.ForecastRequest) (int, error) {
    resp, err := client.Forecast(request)
    if err != nil {
        return 0, err
    }
    hourly := resp.Hourly.Data
    var d darksky.DataPoint
    for _, h := range hourly {
        if request.Time < h.Time {
            break
        }
        d = h
    }
    prob := int(d.PrecipProbability * 100)
    return prob, nil
}

The main function is almost identical to the one we used last time, the only new thing being the Dark Sky API client. We can adjust the interval between scanning the contract storage by changing the ticker constructor parameter.

const apiKey = "0123456789abcdef9876543210fedcba"

func main() {
    client := darksky.New(apiKey)
    ch := make(chan error)
    shutdown := make(chan bool)
    ticker := time.NewTicker(time.Second * 30)
    go dispatchAnswers(client, shutdown, ticker.C, ch)
    for {
        select {
        case err := <-ch:
            ticker.Stop()
            shutdown <- true
            log.Fatal(err)
        }
    }
}

The whole program is available on my git.

Applications

We have built an oracle that provides weather information. In the next article, we are going to use it to build a proof-of-concept automated insurance provider.

If you have any questions about building things with Tezos, come by the matrix chat room.