Getting started with Tezos #3: a simple oracle

July 30th 2018: Tezos betanet has now launched, and this article doesn't reflect the current state of the project. You can still read through it, but the code examples won't work as is, and will need various changes. Up-to-date documentation for the betanet is available here. You can change the address to access docs for other branches like zeronet.

A word about different testnets: there is usually a zeronet running, which follows new developments closely, and an alphanet, which is more stable. The real network is currently called betanet (transactions persist to the main network). The alphanet is currently obsolete.

Introduction

In this article we will look at writing a simple oracle to help us interact with off-chain systems. If you haven't read them yet, start with the previous articles. No blockchain expertise is required, although general programming knowledge would be useful.

The idea

We would like to utilize some information in our contracts, but it's only available outside the blockchain. To make it easier to use such information, we can use an oracle. Oracles are essentially bridges between the blockchain and anything outside of it, the downside being that we have to trust their honesty or ensure that they are honest somehow (e.g. by paying them enough that it isn't profitable to bamboozle us).

For our example, we will build an oracle that publishes the BTC/USD exchange rate every hour. You can use any exchange with a suitable API, but I will use Bitstamp. The oracle will consist of a server and a contract. The server queries the API of the exchange, processes the received data if necessary, and calls the contract to update its data. Other contracts (users) call the oracle contract to get the latest exchange rate.

The contract

Let's start by preparing the contract. We can adapt the data publisher pattern, which can be seen e.g. in this example by Milo. The input is an option type, with requests for data containing an empty option and requests to update the data containing the new data, a signature to prove authenticity and a counter value to prevent replay attacks.

An option type is an abstraction by which we let the compiler know that a value may or may not be present.

option nat

This can either be an empty value None or a natural number nat. When we encounter it in a contract, we can use the IF_NONE instruction to inspect it -- in case it contains a None, the first branch is executed, otherwise the second one is. To construct the option type, we can use NONE or SOME with some value that we want to wrap.

We could also use a boolean to decide what to do, but using the option is cleaner.
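For readers more at home in Go than in Michelson, the same idea can be sketched with a small struct. The names OptionNat, Some, None and describe are hypothetical and chosen only for this illustration; they are not part of any Tezos tooling.

```go
package main

import "fmt"

// OptionNat mimics Michelson's `option nat`: either no value, or one natural number.
type OptionNat struct {
    value   uint64
    present bool
}

// None builds the empty option.
func None() OptionNat { return OptionNat{} }

// Some wraps a value.
func Some(v uint64) OptionNat { return OptionNat{value: v, present: true} }

// describe plays the role of IF_NONE: one branch for None, another for Some.
func describe(o OptionNat) string {
    if !o.present {
        return "data request"
    }
    return fmt.Sprintf("update with %d", o.value)
}

func main() {
    fmt.Println(describe(None()))
    fmt.Println(describe(Some(730115)))
}
```

The caller cannot reach the wrapped value without first deciding what to do when it is absent, which is the property that makes the option cleaner than a separate boolean.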

The exchange rate is a positive decimal number, with 2 digit precision (at least in case of the Bitstamp API). Since there is no float type in Michelson, we will need to multiply the actual exchange rate by a hundred to get a whole number and then use the nat type for natural numbers. There's also an int type, but since we know the value will never be negative, we don't need it.

To prove that a request to update the data is coming from the oracle server, we will sign the data with a private key, which we keep secret, and let the contract check the signature against the associated public key. If you haven't seen this before, you can read up on public key cryptography on Wikipedia. For now, you only need to know that we can sign data with our private key, and then our public key can be used to check that we really did. If someone else tries to sign data with a different key, checking with our public key will fail. We can publish our public key as a part of a contract, while keeping our private key a secret.

We can use the alphanet node to handle the signing for us:

./alphanet.sh client hash and sign data '(Pair 730115 1)' for my_identity

Note that we are actually signing the pair of exchange rate and the counter here, rather than the exchange rate alone. This means that a transaction can't be reused without recalculating the hash and the signature. The hash depends only on the data, and will be the same even for different accounts (with different private keys). The main reason for using it is to get an input of uniform length for the signing procedure.

The input parameters of the contract will be the option, containing the data and their signature. When retrieving data, the contract will return the exchange rate (a natural number), otherwise it will return 1. The storage will contain the most recent exchange rate, the counter to prevent replay attacks and the public key to check new data against. The whole header will look like this:

parameter (option (pair signature (pair nat nat)));
#                                       ^    ^
#                                       |    |
#                                       data counter
storage (pair (pair key nat) nat);
#                   ^   ^     ^
#                   |   |      \
#                   key counter \
#                               data
return nat;

First, let's write the part that handles data retrieval:

code {DUP;
      CAR;
      IF_NONE{CDR;DUP;CDR;PAIR}     # data retrieval branch
             {/*more code*/}        # data update branch
     }

We simply access the storage, and take out the data, then return it. We use PAIR since we need to pair the return value with storage, which remains unchanged.

On to the tricky part. Let's check whether the counter in the input matches the counter in storage:

code {DUP;
      CAR;
      IF_NONE{CDR;DUP;CDR;PAIR}     # data retrieval branch
             {DUP;                  # data update branch
              DIP{CDDR;
                  DIP{CDR;          # access the counters
                      DUP;
                      CAR;
                      CDR};
                  CMPEQ;            # check if counters are equal
                  IF{}{FAIL};};     # fail if not

              # more code...
             }

If the counters match, we should check for authenticity of the data. The H instruction takes a piece of data and produces a hash for us:

:: 'a : 'S -> string : 'S

The CHECK_SIGNATURE instruction takes a key and a pair of a signature and a hash, and tells us whether the key was used to create the signature:

:: key : pair signature string : 'S -> bool : 'S

Let's use them:

code {DUP;
      CAR;
      IF_NONE{CDR;DUP;CDR;PAIR}     # data retrieval branch
             {DUP;                  # data update branch
              DIP{CDDR;
                  DIP{CDR;          # access the counters
                      DUP;
                      CAR;
                      CDR};
                  CMPEQ;            # check if counters are equal
                  IF{}{FAIL};};     # fail if not
              DUP;
              DUP;
              DIP{CAR;
                  DIP{CDR;
                      H};           # make a hash of the new data
                  PAIR;
                  DIP{DUP;
                      CAAR};
                  SWAP;
                  CHECK_SIGNATURE;  # check the hash and the supplied signature against the key
                  IF{}{FAIL}};      # fail if not authentic

              # more code...
             }

By producing the hash from the supplied data ourselves, we ensure that the signature is valid for the supplied data.

Now we just need to increment the counter and replace the data in storage with the updated value. This is quite straightforward:

code {DUP;
      CAR;
      IF_NONE{CDR;DUP;CDR;PAIR}     # data retrieval branch
             {
              # ...continued

              CDAR;
              DIP{CAR;              # access the counter
                  DUP;
                  CAR;
                  DIP{CDR;          # increment the counter
                      PUSH nat 1;
                      ADD};
                  PAIR};
              SWAP;                 # replace the data in storage with updated value
              PAIR;
              PUSH nat 1;           # return a 1 to satisfy the return type
              PAIR};}

The whole contract is available in this snippet.

Using the contract

Before originating the contract, we need to get a pair of keys - a public key and a private key. Those will be used to sign the data we send to the contract. We can use an existing identity, or generate new ones:

# ./alphanet.sh gen keys (new)

./alphanet.sh client gen keys dataPublisher

New secret key alias 'dataPublisher' saved.
New public key alias 'dataPublisher' saved.
New public key hash alias 'dataPublisher' saved.

Since the keys will be used to publish data in the oracle contract, we called the alias dataPublisher. We can now check the keys:

./alphanet.sh client show identity dataPublisher
Hash: tz1h2gexGSLX2xQhd9cqgmdbzoRGBkrEGH4z
Public Key: edpkuGfBhAReC5h9G4FR4WhPE5XDyCZpxLreJ5hgMxM86mKaJWGcgG

The private key is not shown to avoid accidentally leaking it. To show it, we can use the -show-secret flag:

./alphanet.sh client show identity dataPublisher -show-secret

We have to use the public key when originating the contract:

# ./alphanet.sh client originate contract (new) for (mgr) transferring (qty) from (src) running (prg) [-fee _] [-delegate _] [-force] [-delegatable] [-non-spendable] [-init _]

./alphanet.sh client originate contract oracle for dataPublisher transferring 100 from money running container:oracle.tz -init '(Pair (Pair "edpkuGfBhAReC5h9G4FR4WhPE5XDyCZpxLreJ5hgMxM86mKaJWGcgG" 1) 123456)'

Now that the contract is published, let's see how we are going to call it. We'll need to get the current value of the counter that prevents replay attacks from its storage, then use it to construct a pair with the updated price, sign the pair and finally initiate a transaction with the signature and the pair as an argument.

Here's how we get the counter:

# ./alphanet.sh client get storage for (src)

./alphanet.sh client get storage for oracle

(Pair
    (Pair "edpkuGfBhAReC5h9G4FR4WhPE5XDyCZpxLreJ5hgMxM86mKaJWGcgG" 22)
    755773)

The alphanet script lets us sign data easily:

./alphanet.sh client man hash and sign

Commands for managing the record of known programs:
  hash and sign data (data) for (name)
    ask the node to compute the hash of a data expression using the same
    algorithm as script instruction H, sign it using a given secret key, and
    display it using the format expected by script instruction
    CHECK_SIGNATURE
    (data)
      the data to hash
    (name)
      existing secret key alias

./alphanet.sh client hash and sign data '(Pair 756324 23)' for dataPublisher

Hash: "exprtZL5GgGNWd5U3fT2Xyzgewsv2j1sHnWA2ycw3hLQRenLS9s7AP"
Signature: "2489b9a4bff9e77a04f6c331a850d4d5397ee1e5021bfacc2ef039a249601740d001931fb8eaa45e6e386a0b859734170a0711fb11b791172eca41ff9fc0ea06"

With the signature, we can finally call the oracle contract and update the data. Remember, we are using the option type, because transactions asking for the data won't contain anything, while transactions trying to update the data will:

# ./alphanet.sh client transfer (qty) from (src) to (dst) [-fee _] [-arg _] [-force]

./alphanet.sh client transfer 0 from money to oracle -arg '(Some (Pair "2489b9a4bff9e77a04f6c331a850d4d5397ee1e5021bfacc2ef039a249601740d001931fb8eaa45e6e386a0b859734170a0711fb11b791172eca41ff9fc0ea06" (Pair 756324 23)))'

Check the result:

./alphanet.sh client get storage for oracle

As an exercise, you can make the oracle require a certain amount of tez in exchange for providing the data.

The oracle server

The only thing left to do is to write a server that queries the Bitstamp API and updates the data in our contract. You can do this in any language you are comfortable with; I'll use Go (also known as golang). The syntax is relatively simple and the code should be quite readable even for people who haven't used golang before.

What do we need to do in order to update the data?

First, we need to query the endpoint, and extract the piece of information that we want to use. The endpoint supplies the data as JSON:

{"high": "6351.00", "last": "6331.00", "timestamp": "1509458347", "bid": "6327.03", "vwap": "6277.16", "volume": "1183.21242286", "low": "6193.07", "ask": "6330.98", "open": "6200.00"}

We'll define a type to make things easier:

//TickerHour holds the results of calls to ticker endpoint of Bitstamp API
type TickerHour struct {
    High      float32 `json:"high,string"`
    Last      float32 `json:"last,string"`
    Timestamp int     `json:"timestamp,string"`
    Bid       float32 `json:"bid,string"`
    VWap      float32 `json:"vwap,string"`
    Volume    float32 `json:"volume,string"`
    Low       float32 `json:"low,string"`
    Ask       float32 `json:"ask,string"`
    Open      float32 `json:"open,string"`
}

Now we can just call the endpoint and let the standard library unmarshaler do the parsing. Our struct will be filled out with fresh data:

const endpoint = "https://www.bitstamp.net/api/v2/ticker_hour/btcusd/"

func getData() (TickerHour, error) {
    var th TickerHour
    resp, err := http.Get(endpoint)
    if err != nil {
        return th, fmt.Errorf("querying http endpoint: %v", err)
    }
    // Close the body so the underlying connection is released.
    defer resp.Body.Close()
    b, err := ioutil.ReadAll(resp.Body)
    if err != nil {
        return th, fmt.Errorf("reading response: %v", err)
    }
    err = json.Unmarshal(b, &th)
    if err != nil {
        return th, fmt.Errorf("unmarshaling json: %v", err)
    }
    return th, nil
}

Note that we are using a global constant that holds the endpoint URL here, which is a bit dirty. In a more complex system, we might want to make the function more general to allow querying different endpoints.

Another thing to note, if you aren't familiar with golang, is the error propagation. Errors are treated as regular values, and are usually propagated upwards through the call stack until they are handled, with each function adding a piece of information. This lets us better understand the context of failure, compared to error messages generated by a single function that doesn't know why it's being called. This makes for a nicer debugging experience.

We can access the appropriate struct field to use the data:

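high := int(math.Round(float64(th.High) * 100))

The decimal part is multiplied out, so we can use it as a natural number in Michelson. We round with math.Round before converting, because a float32 product can land slightly below the intended whole number and a plain int conversion would truncate it.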
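The conversion can be tried on its own with a short sketch. The helper name toHundredths is hypothetical; it rounds to the nearest whole number instead of truncating, since a price stored as float32 is rarely exact.

```go
package main

import (
    "fmt"
    "math"
)

// toHundredths converts a price such as 6327.03 into 632703.
// Rounding absorbs the small representation error of float32.
func toHundredths(price float32) int {
    return int(math.Round(float64(price) * 100))
}

func main() {
    fmt.Println(toHundredths(6351.00)) // 635100
    fmt.Println(toHundredths(6327.03)) // 632703
}
```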

We'd like to push the new price to the oracle contract now. We can follow the same steps described in the section about using the contract, but we will automate them this time.

We split the logic into a few smaller functions, and when they return errors, we push them to a channel ch, which we can specify when calling updateData. Channels are a special type that makes communication between goroutines easier to think about. Goroutines are similar to threads in other languages, but are more lightweight. One goroutine can push some data into a channel and another can be waiting to retrieve it. If no one is waiting for the data, the pushing goroutine will block until the data is needed on the other side -- so channels can be used for synchronization. Channels can also have a buffer to avoid blocking the goroutine.

func updateData(ch chan<- error) {
    th, err := getData()
    if err != nil {
        ch <- fmt.Errorf("getting data: %v", err)
        return
    }
    high := int(math.Round(float64(th.High) * 100))
    counter, err := getCounter(*contract)
    if err != nil {
        ch <- fmt.Errorf("getting counter: %v", err)
        return
    }
    signature, err := signData(high, counter, *identity)
    if err != nil {
        ch <- fmt.Errorf("signing data: %v", err)
        return
    }
    err = transferData(high, counter, signature, *source, *contract)
    if err != nil {
        ch <- fmt.Errorf("transferring data: %v", err)
        return
    }
}

If you are curious about concurrency in golang and how goroutines and channels work under the hood, you can start with this article about the golang scheduler, or this talk about channels.

The variables identity, source, and contract hold parsed command line arguments. Working with command line arguments is quite simple thanks to the flag package and can be seen in the complete code of the program.

You might have noticed that the function doesn't return anything. That's because we are going to be calling this function in a new goroutine, so it would have nowhere to return to. We will get to this later. Let's see the individual functions now.

We can execute commands from golang code with the os/exec package. One thing that should be noted is that the ALPHANET_EMACS flag needs to be set when calling alphanet.sh from a program, rather than manually. This lets the script know not to run in interactive mode, otherwise it won't work.

func getCounter(contract string) (int, error) {
    c := exec.Command("./alphanet.sh", "client", "get", "storage", "for", contract)
    c.Env = append(c.Env, "ALPHANET_EMACS=true")
    b, err := c.CombinedOutput()
    if err != nil {
        fmt.Println("error:", string(b))
        return 0, fmt.Errorf("running commands to get storage: %v", err)
    }
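    words := strings.Fields(string(b))
    // Guard against unexpected output before indexing into it.
    if len(words) < 4 {
        return 0, fmt.Errorf("parsing storage: unexpected output %q", string(b))
    }
    w := strings.TrimSuffix(words[3], ")")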
    counter, err := strconv.Atoi(w)
    if err != nil {
        fmt.Println(w)
        return 0, fmt.Errorf("converting output to int: %v", err)
    }
    return counter, nil
}

We set up a command c and then execute it, capturing the output and error streams. If things are going well, we parse the output and extract the counter. Note that because we are relying on knowing the exact format of the reply, if the output changes in a future update to alphanet.sh, we would need to update the parsing logic here to make it work again. A more resilient approach would be to use a package that parses Michelson data for us.
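Short of a full Michelson parser, a stricter check of the expected shape is possible with a regular expression. The sketch below assumes the output looks exactly like the storage printed earlier; parseStorage and storageRe are hypothetical names. It still depends on the output format, but it reports an error instead of picking the wrong word when the format changes.

```go
package main

import (
    "fmt"
    "regexp"
    "strconv"
)

// storageRe matches (Pair (Pair "<key>" <counter>) <data>) with flexible whitespace.
var storageRe = regexp.MustCompile(`^\s*\(Pair\s+\(Pair\s+"([^"]+)"\s+(\d+)\)\s+(\d+)\)\s*$`)

// parseStorage extracts the key, counter and data from the client output.
func parseStorage(out string) (key string, counter, data int, err error) {
    m := storageRe.FindStringSubmatch(out)
    if m == nil {
        return "", 0, 0, fmt.Errorf("unexpected storage format: %q", out)
    }
    if counter, err = strconv.Atoi(m[2]); err != nil {
        return "", 0, 0, fmt.Errorf("converting counter: %v", err)
    }
    if data, err = strconv.Atoi(m[3]); err != nil {
        return "", 0, 0, fmt.Errorf("converting data: %v", err)
    }
    return m[1], counter, data, nil
}

// sampleStorage is the storage output shown earlier in the article.
const sampleStorage = `(Pair
    (Pair "edpkuGfBhAReC5h9G4FR4WhPE5XDyCZpxLreJ5hgMxM86mKaJWGcgG" 22)
    755773)
`

func main() {
    key, counter, data, err := parseStorage(sampleStorage)
    fmt.Println(key, counter, data, err)
}
```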

Signing the data could be done with some golang packages as well, but there's no reason why we couldn't just use alphanet.sh. Since exec.Command passes arguments to the command directly, we don't need to quote the data as we would when calling it through the shell.

func signData(data, counter int, identity string) (string, error) {
    prepared := fmt.Sprintf("(Pair %d %d)", data, counter)
    c := exec.Command("./alphanet.sh", "client", "hash", "and", "sign", "data", prepared, "for", identity)
    c.Env = append(c.Env, "ALPHANET_EMACS=true")
    b, err := c.CombinedOutput()
    if err != nil {
        fmt.Println("error:", string(b))
        return "", fmt.Errorf("running commands to sign: %v", err)
    }
    lines := strings.Split(string(b), "\n")
    for _, l := range lines {
        l := strings.TrimSpace(l)
        if strings.HasPrefix(l, "Signature:") {
            words := strings.Split(l, " ")
            signature := strings.Trim(words[1], "\"")
            return signature, nil
        }
    }
    return "", fmt.Errorf("parsing output: signature not found")
}

After signing the data, we only need to send it:

func transferData(data, counter int, signature, source, contract string) error {
    arg := fmt.Sprintf("(Some (Pair \"%s\" (Pair %d %d)))", signature, data, counter)
    c := exec.Command("./alphanet.sh", "client", "transfer", "0", "from", source, "to", contract, "-arg", arg)
    c.Env = append(c.Env, "ALPHANET_EMACS=true")
    b, err := c.CombinedOutput()
    if err != nil {
        fmt.Println("error:", string(b))
        return fmt.Errorf("running commands to transfer: %v", err)
    }
    return nil
}

This is how we get new data and update the contract's storage. However, we'd like the server to update the contract every hour. We'll use a time.Ticker to let us know when an hour passes, and a dispatcher that reacts to this signal by calling the updateData function we described above. time.Ticker is a struct containing a channel C, where a timestamp gets pushed once per specified interval.

ticker := time.NewTicker(time.Hour)
go dispatchUpdates(shutdown, ticker.C, ch)

The dispatcher runs in its own goroutine and besides creating update goroutines, it also listens for a shutdown signal, to let us exit without leaving any loose ends.

func dispatchUpdates(shutdown <-chan bool, tc <-chan time.Time, ch chan<- error) {
    fmt.Println("dispatching first update")
    go updateData(ch)
    for {
        select {
        case _ = <-shutdown:
            return
        case _ = <-tc:
            fmt.Println("dispatching subsequent update")
            go updateData(ch)
        }
    }
}

When dispatching an update, we pass the channel ch to updateData so that it can report any errors.

We can limit what can be done with a channel -- chan<- is only for pushing data, while <-chan is only for listening. This helps us avoid mistakes when refactoring or fixing bugs. The select statement is analogous to switch, but listens for data on any of the channels instead. Assigning the data to an underscore lets the compiler know we aren't going to need it (in this case because we only need a binary signal).
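Because the dispatcher only sees a receive-only channel of timestamps, it can be driven by hand instead of by a real ticker. The sketch below is a simplified variant written for illustration, not the dispatcher above: dispatch and run are hypothetical names, the work runs inline rather than in a new goroutine, and an extra done channel is added so the caller can wait for the loop to exit.

```go
package main

import (
    "fmt"
    "time"
)

// dispatch calls work once at start and once per tick until shutdown fires.
// done is closed on exit so callers can wait for the loop to finish.
func dispatch(shutdown <-chan bool, tc <-chan time.Time, work func(), done chan<- struct{}) {
    defer close(done)
    work()
    for {
        select {
        case <-shutdown: // the received value can be dropped without an underscore
            return
        case <-tc:
            work()
        }
    }
}

// run feeds the dispatcher a fixed number of ticks and counts the work calls.
func run(ticks int) int {
    shutdown := make(chan bool)
    tc := make(chan time.Time) // hand-driven stand-in for ticker.C
    done := make(chan struct{})
    count := 0
    go dispatch(shutdown, tc, func() { count++ }, done)
    for i := 0; i < ticks; i++ {
        tc <- time.Now()
    }
    shutdown <- true
    <-done // wait for the loop to exit before reading count
    return count
}

func main() {
    fmt.Println(run(3)) // 4: one initial call plus three ticks
}
```

Passing the tick channel in as a parameter is what makes this possible; a dispatcher that created its own ticker could only be exercised by waiting for real time to pass.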

The main goroutine listens for errors, and handles shutdown:

func main() {
    ch := make(chan error)
    shutdown := make(chan bool)
    ticker := time.NewTicker(time.Hour)
    go dispatchUpdates(shutdown, ticker.C, ch)
    for {
        select {
        case err := <-ch:
            ticker.Stop()
            shutdown <- true
            log.Fatal(err)
        }
    }
}

The whole program is available on my git.

That's it, we have an oracle now. If you liked the article, make sure to share it on your favorite social media website. If this got you interested in building things with Tezos, come by the Matrix chat room.