Chapter 11. Storage Management

This chapter describes aspects of the storage management system and procedures that may be used to control its operation.


Section 11.1. Garbage Collection

Scheme objects such as pairs, strings, and procedures are never explicitly deallocated by a Scheme program. Instead, the storage management system automatically reclaims the storage associated with an object once it proves the object is no longer accessible. In order to reclaim this storage, Chez Scheme employs a garbage collector which runs periodically as a program runs. Starting from a set of known roots, e.g., the machine registers, the garbage collector locates all accessible objects, copies them (in most cases) in order to eliminate fragmentation between accessible objects, and reclaims storage occupied by inaccessible objects.

Chez Scheme's collector is a generation-based collector. It segregates objects based on their age (roughly speaking, the number of collections survived) and collects older objects less frequently than younger objects. Since younger objects tend to become inaccessible more quickly than older objects, the result is that most collections take little time.

Chez Scheme allows the collector to be invoked explicitly, allows the frequency of collections to be controlled, and allows the actions taken when a collection is requested by the system to be customized.

The remainder of this introduction outlines how the system decides when to collect each generation. More information on Chez Scheme's collector can be found in the report "Don't stop the BiBOP: Flexible and efficient storage management for dynamically typed languages" [11].

Generations are numbered from zero (the youngest) through the value of collect-maximum-generation. The maximum generation is currently set to 4. The maximum generation, generation 4, is a static generation from which storage is never reclaimed and into which objects are placed only when saving a heap. Thus, while the Scheme process is running, there are effectively only four generations (0, 1, 2, and 3).

Generation 0 is the generation into which new objects are placed. Generations 1 to 3 are used for objects that survive one or more collections. Every time the system performs a generation 0 collection, objects in generation 0 that survive the collection are placed into generation 1. Every time the system does a generation 1 collection, objects in generations 0 and 1 that survive are placed into generation 2. For both generation 2 and 3 collections, surviving objects are placed into generation 3. With this mechanism, it is possible for an object to skip one or more generations, but if older generation collections are infrequent, this is not likely to happen to many objects, and if the objects become inaccessible, their storage will be reclaimed eventually.

An internal counter, gc-trip, is maintained to control when each generation is collected. Each time collect is called without an argument (as from the default collect-request-handler), gc-trip is incremented by one. With a collect-generation-radix of r, the collected generation is the highest numbered generation g for which gc-trip is a multiple of rg. Thus, with the default collect-generation-radix of 4, the system collects generation 3 every 64 times, generation 2 every 16 times, generation 1 every 4 times, and generation 0 every time.

Each time collect is called with a generation argument g, gc-trip is advanced to the next rg boundary, where r is again the value of collect-generation-radix. This not only causes an immediate collection of generation g but also puts off the next collection of that (or any higher) generation by rg times.

Collections are triggered automatically by the default collect-request handler after approximately n bytes of storage have been allocated, where n is the value of the parameter collect-trip-bytes. The collect-request handler can be redefined by setting the parameter collect-request-handler.


procedure: (collect)
procedure: (collect generation)
returns: unspecified

generation must be a nonnegative fixnum no greater than the value of the parameter collect-maximum-generation.

collect causes the storage manager to perform a garbage collection. collect is normally invoked automatically by the Scheme system often enough to prevent the system from running out of storage, but in some circumstances it may be desirable to force collection at a particular time, e.g., before timing a computation.

The system ordinarily determines which generation to collect. If present, the generation argument forces collection of generation number generation and all younger generations. Specifying a generation to collect does not prevent collection of older generations if they would otherwise have been collected.

Specifying the maximum generation as the argument to collect has the same effect as specifying one less; the maximum generation is actually a static generation into which objects are promoted only when a heap is saved via the -s command line argument.


procedure: (collect-maximum-generation)
returns: the maximum generation

The maximum generation is the oldest generation known to the system. It does not change dynamically, but may be changed from one release of Chez Scheme to another.


parameter: collect-notify

If collect-notify is set to a true value, the collector prints a message whenever a collection is run. collect-notify is set to #f by default.


parameter: collect-request-handler

The value of collect-request-handler is invoked whenever the system determines that a collection should occur, i.e., some time after an amount of storage determined by the parameter collect-trip-bytes has been allocated since the last collection.

By default, collect-request-handler simply invokes collect without arguments.

Automatic collection may be disabled by setting collect-request-handler to a procedure that does nothing, e.g.:

(collect-request-handler void)

In order to reenable automatic collections after this, it is generally not sufficient merely to reset the collect request handler. Collect requests are disabled once a collect request has been made until a collection is actually performed, so if a collect request occurs while collect-request-handler is set to a procedure that does not invoke collect, either directly or indirectly, it is necessary to explicitly invoke collect to reenable collect requests.


parameter: collect-trip-bytes

This parameter determines the approximate amount of storage that is allowed to be allocated between garbage collections.

Chez Scheme allocates memory internally in large chunks and subdivides these chunks via inline operations for efficiency. The storage manager determines whether to request a collection only once per large chunk allocated. Furthermore, some time may elapse between when a collection is requested by the storage manager and when the collect request is honored, especially if interrupts are temporarily disabled via critical-section or disable-interrupts. Thus, collect-trip-bytes is an approximate measure only.


parameter: collect-generation-radix

This parameter determines how often each generation is collected by default. Generations are collected once every rg times a collection occurs, where r is the value of collect-generation-radix and g is the generation number.

Setting collect-generation-radix to one forces all generations to be collected each time a collection occurs. Setting collect-generation-radix to a very large number effectively delays collection of older generations indefinitely.


Section 11.2. Weak Pairs and Guardians

Weak pairs allow programs to maintain weak pointers to objects. A weak pointer to an object does not prevent the object from being reclaimed by the storage management system, but it does remain valid as long as the object is otherwise accessible in the system.

Guardians allow programs to protect objects from deallocation by the garbage collector and to determine when the objects would otherwise have been deallocated.

Weak pairs and guardians allow programs to retain information about objects in separate data structures (such as hash tables) without concern that maintaining this information will cause the objects to remain indefinitely in the system. In addition, guardians allow objects to be saved from deallocation indefinitely so that they can be reused or so that clean-up or other actions can be performed using the data stored within the objects.

The implementation of guardians and weak pairs used by Chez Scheme is described in [10].


procedure: (weak-cons obj1 obj2)
returns: a new weak pair

obj1 becomes the car and obj2 becomes the cdr of the new pair. Weak pairs are indistinguishable from ordinary pairs in all but two ways:

The weak pointer in the car of a weak pair is just like a normal pointer as long as the object to which it points is accessible through a normal (nonweak) pointer somewhere in the system. If at some point the garbage collector recognizes that there are no nonweak pointers to the object, however, it replaces each weak pointer to the object with the "broken weak-pointer" object, #!bwp, and discards the object.

The cdr field of a weak pair is not a weak pointer, so weak pairs may be used to form lists of weakly held objects. These lists may be manipulated using ordinary list-processing operations such as length, map, and assv. (Procedures like map that produce list structure always produce lists formed from nonweak pairs, however, even when their input lists are formed from weak pairs.) Weak pairs may be altered using set-car! and set-cdr!; after a set-car! the car field contains a weak pointer to the new object in place of the old object. Weak pairs are especially useful for building association pairs in association lists or hash tables.

Weak pairs are printed in the same manner as ordinary pairs; there is no reader syntax for weak pairs. As a result, weak pairs become normal pairs when they are written and then read.

(define x (cons 'a 'b))
(define p (weak-cons x '()))
(car p)  (a . b)

(define x (cons 'a 'b))
(define p (weak-cons x '()))
(set! x '*)
(collect)
(car p)  #!bwp

The latter example above may in fact return (a . b) if a garbage collection promoting the pair into an older generation occurs prior to the assignment of x to *. It may be necessary to force an older generation collection to allow the object to be reclaimed. The storage management system guarantees only that the object will be reclaimed eventually once all nonweak pointers to it are dropped, but makes no guarantees about when this will occur.


procedure: (weak-pair? obj)
returns: #t if obj is a weak pair, #f otherwise

(weak-pair? (weak-cons 'a 'b))  #t
(weak-pair? (cons 'a 'b))  #f
(weak-pair? "oops")  #f


procedure: (bwp-object? obj)
returns: #t if obj is the broken weak-pair object, #f otherwise

(bwp-object? #!bwp)  #t
(bwp-object? 'bwp)  #f

(define x (cons 'a 'b))
(define p (weak-cons x '()))
(set! x '*)
(collect (collect-maximum-generation))
(car p)  #!bwp
(bwp-object? (car p))  #t


procedure: (make-guardian)
returns: a new guardian

Guardians are represented by procedures that encapsulate groups of objects registered for preservation. When a guardian is created, the group of registered objects is empty. An object is registered with a guardian by passing the object as an argument to the guardian:

(define G (make-guardian))
(define x (cons 'aaa 'bbb))
 (aaa . bbb)
(G x)

The group of registered objects associated with a guardian is logically subdivided into two disjoint subgroups: a subgroup referred to as "accessible" objects, and one referred to "inaccessible" objects. Inaccessible objects are objects that have been proven to be inaccessible (except through the guardian mechanism itself or through the car field of a weak pair), and accessible objects are objects that have not been proven so. The word "proven" is important here: it may be that some objects in the accessible group are indeed inaccessible but that this has not yet been proven. This proof may not be made in some cases until long after the object actually becomes inaccessible (in the current implementation, until a garbage collection of the generation containing the object occurs).

Objects registered with a guardian are initially placed in the accessible group and are moved into the inaccessible group at some point after they become inaccessible. Objects in the inaccessible group are retrieved by invoking the guardian without arguments. If there are no objects in the inaccessible group, the guardian returns #f. Continuing the above example:

(G)  #f
(set! x #f)
(collect)
(G)  (aaa . bbb)
(G)  #f

The initial call to G returns #f, since the pair bound to x is the only object registered with G, and the pair is still accessible through that binding. When collect is called, the object shifts into the inaccessible group and is therefore returned by the second call to G. (As noted above for weak pairs, the call to collect may not actually be sufficient to prove the object inaccessible, if the object has migrated into an older generation.)

Although an object returned from a guardian has been proven otherwise inaccessible (except possibly via the car field of a weak pair), it has not yet been reclaimed by the storage management system and will not be reclaimed until after the last nonweak pointer to it within or outside of the guardian system has been dropped. In fact, objects that have been retrieved from a guardian have no special status in this or in any other regard. This feature circumvents the problems that might otherwise arise with shared or cyclic structure. A shared or cyclic structure consisting of inaccessible objects is preserved in its entirety and each piece registered for preservation with any guardian is placed in the inaccessible set for that guardian. The programmer then has complete control over the order in which pieces of the structure are processed.

An object may be registered with a guardian more than once, in which case it will be retrievable more than once:

(define G (make-guardian))
(define x (cons 'aaa 'bbb))
(G x)
(G x)
(set! x #f)
(collect)
(G)  (aaa . bbb)
(G)  (aaa . bbb)

It may also be registered with more than one guardian, and guardians themselves can be registered with other guardians.

An object that has been registered with a guardian and placed in the car field of a weak pair remains in the car field of the weak pair until after it has been returned from the guardian and dropped by the program or until the guardian itself is dropped.

(define G (make-guardian))
(define x (cons 'aaa 'bbb))
(define p (weak-cons x '()))
(G x)
(set! x #f)
(collect)
(set! y (G))
 (aaa . bbb)
(car p)  (aaa . bbb)
(set! y #f)
(collect 1)
(car p)  #!bwp

(The first collector call above would promote the object at least into generation 1, requiring the second collector call to be a generation 1 collection. This can also be forced by invoking collect several times.)

The following example illustrates that the object is deallocated and the car field of the weak pointer set to #!bwp when the guardian itself is dropped:

(define G (make-guardian))
(define x (cons 'aaa 'bbb))
(define p (weak-cons x '()))
(G x)
(set! x #f)
(set! G #f)
(collect)
(car p)  #!bwp

The example below demonstrates how guardians might be used to deallocate external storage, such as storage managed by the C library "malloc" and "free" operations.

(define malloc
  (let ([malloc-guardian (make-guardian)])
    (lambda (size)
      ; first free any storage that has been dropped.  to avoid long
      ; delays, it might be better to deallocate no more than, say,
      ; ten objects for each one allocated
      (let f ()
        (let ((x (malloc-guardian)))
          (when x
            (do-free x)
            (f))))
      ; then allocate and register the new storage
      (let ([x (do-malloc size)])
        (malloc-guardian x)
        x))))

do-malloc must return a Scheme object encapsulating a pointer to the external storage (perhaps as an unsigned integer), and all access to the external storage must be made through this pointer. do-free must deallocate the external storage using the encapsulated pointer. Both primitives can easily be defined in terms of the C-library "malloc" and "free" operators, imported as foreign procedures. (See Chapter 3.) Care must of course be taken that no pointers to the external storage exist outside of Scheme after the corresponding Scheme object has been dropped.

If it is undesirable to wait until malloc is called to free dropped storage previously allocated by malloc, a collect-request handler can be used instead to check for and free dropped storage, as shown below.

(define malloc)
(let ([malloc-guardian (make-guardian)])
  (set! malloc
    (lambda (size)
      ; allocate and register the new storage
      (let ([x (do-malloc size)])
        (malloc-guardian x)
        x)))
  (collect-request-handler
    (lambda ()
      ; first, invoke the collector
      (collect)
      ; then free any storage that has been dropped
      (let f ()
        (let ((x (malloc-guardian)))
          (when x
            (do-free x)
            (f)))))))


Section 11.3. Locking Objects

All pointers from C variables or data structures to Scheme objects should generally be discarded before entry (or reentry) into Scheme. When this guideline cannot be followed, the object may be locked via lock-object or via the equivalent C library procedure Slock_object (Section 3.5).


procedure: (lock-object obj)
returns: unspecified

Locking an object prevents the storage manager from reclaiming or relocating the object. Locking should be used sparingly, as it introduces memory fragmentation and increases storage management overhead.

Locking can also lead to accidental retention of storage if objects are not unlocked. Objects may be unlocked via unlock-object or the equivalent C library procedure Sunlock_object.


procedure: (unlock-object obj)
returns: unspecified

An object may be locked more than once by successive calls to lock-object or Slock_object, in which case it must be unlocked by an equal number of calls to unlock-object or Sunlock_object before it is truly unlocked.

An object contained within a locked object, such as an object in the car of a locked pair, need not also be locked unless a separate C pointer to the object exists. That is, if the inner object is accessed only via an indirection of the outer object, it should be left unlocked so that the collector is free to relocate it during collection.


Chez Scheme User's Guide
© 1998 R. Kent Dybvig
Cadence Research Systems
http://www.scheme.com
Illustrations © 1998 Jean-Pierre Hébert
about this book