6.6

1 Introduction

The data/collection library attempts to provide a uniform, generic interface for interacting with all of Racket’s collections. In doing so, it departs from how some of the functions in racket/base operate.

Virtually all of the functions provided by data/collection are collection-agnostic, so they operate on any type of collection with consistent behavior. This is in stark contrast to racket/base, which provides separate functions for each kind of collection (e.g. list-ref, vector-ref, string-ref).

As an example, ref can operate on all kinds of sequences with reasonable behavior for all of them:

> (ref '(1 2 3) 1)

2

> (ref #(1 2 3) 1)

2

> (ref (in-naturals) 5)

5

> (ref (hash 'a 'b 'c 'd) 'a)

'b

However, it also means that some of the functions provided by data/collection behave differently from their racket/base equivalents. When calling a function on a collection in racket/base, there is a guarantee on the type of collection received as a result. With generic collections, there is often no such guarantee.

For example, consider reversing a list. This is a simple operation, and it performs as one might expect.

> (reverse '(1 2 3))

'(3 2 1)

But what about reversing a vector? The same strategy would require allocating a new vector, which would be unnecessarily slow. Instead, a different kind of sequence is returned.

> (reverse #(1 2 3))

#<random-access-sequence>

The only guarantee is that the result must be a sequence, but otherwise, it can be almost anything. Fortunately, in the majority of cases, this is irrelevant, which is the point of generic collections: you don’t need to worry about what kind of collection you are dealing with, since the behavior is still what one would expect.

> (first (reverse #(1 2 3)))

3

This also permits one of the other changes from racket/base: a few of the collection operations are lazy, in that they return lazy sequences. In many cases, these lazy sequences are Racket streams, but not always. For example, map is lazy.

> (map add1 '(10 20 30))

#<stream>

Sometimes, of course, it is useful to convert a collection into a particular representation. Usually, this can be done using extend, which takes a collection and a sequence and returns a new collection with the values from the sequence added. For example, we can put the results from the example above into a vector:

> (extend #() (map add1 '(10 20 30)))

'#(11 21 31)

The implementation of extend uses the primitive collection operator, conj. It is much like cons for lists in that it adds a single value to a collection. Unlike cons, however, it makes no guarantees about where the new element is placed. For example, conj prepends to lists but appends to vectors.

> (conj '(1 2 3) 4)

'(4 1 2 3)

> (conj #(1 2 3) 4)

'#(1 2 3 4)

This permits efficient implementation of conj on a per-collection basis. It does mean that using extend on lists will reverse the input sequence, which is probably not desired in the majority of cases. For that purpose, sequence->list is provided, which is equivalent to reverse combined with extend.

> (extend '() (map add1 '(10 20 30)))

'(31 21 11)

> (sequence->list (map add1 '(10 20 30)))

'(11 21 31)

A few other functions, such as filter and append, are also lazy. Functions that do not return sequences, such as foldl, cannot be lazy, so they remain strict.

The existence of a generic interface also allows the various for loop sequence operations, such as in-list, in-vector, and in-stream, to be collected into a single operator, simply called in. When used as an ordinary function, it returns a lazy sequence equivalent to its input. However, when used in a for clause, it expands into a more efficient form which iterates over the sequence directly.
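The difference between lazy and strict operations can be sketched as follows. This is a minimal illustration, assuming data/collection is installed and that its take accepts the count first, as in (take n seq); the names evens, first-evens, and total are made up for this example.

```racket
#lang racket/base
(require data/collection)

;; filter is lazy, so filtering an infinite sequence returns immediately.
(define evens (filter even? (in-naturals)))

;; Only the elements that are actually requested get computed.
(define first-evens (sequence->list (take 3 evens)))

;; foldl returns a plain value rather than a sequence, so it has to
;; consume its entire (finite) input before returning.
(define total (foldl + 0 '(1 2 3)))
```

Under these assumptions, first-evens should be '(0 2 4) and total should be 6. Calling foldl on an infinite sequence such as evens would never return.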

> (in #(1 2 3))

#<stream>

> (for ([e (in #(1 2 3))])
    (displayln e))

1

2

3

Additionally, a for/sequence form is provided, which operates similarly to for/list, but it returns a lazy sequence, so it can even operate on infinite sequences.

> (for/sequence ([e (in-naturals)])
    (* e e))

#<stream>
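To inspect the contents of such a lazy sequence, one option is to force a finite prefix of it. The sketch below assumes data/collection is installed and that take accepts the count first, as in (take n seq); the names squares and first-squares are illustrative only.

```racket
#lang racket/base
(require data/collection)

;; An infinite lazy sequence of squares; no elements are computed yet.
(define squares
  (for/sequence ([e (in-naturals)])
    (* e e)))

;; Force only the first five elements into a list.
(define first-squares (sequence->list (take 5 squares)))
```

Under these assumptions, first-squares should be '(0 1 4 9 16).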

For a full list of all functions which support generic sequences, see the API Documentation.