The ABS Business Register (BR) is a register of all businesses in Australia, from transnationals to corner stores (see note 2). The Register contains details of 900,000 businesses, employing more than seven million people and operating from about 1,000,000 locations. It is the source of a wide range of statistical information about Australian businesses. It can provide information on businesses in particular regions, or on the main regions of activity for a particular industry. Customized reports can provide information such as numbers of businesses classified by industry, employment size range, and region. This information can be combined with other ABS economic information (Census of Population and Housing, specialized industry and labor statistics, etc.) to examine profit margins for particular industries, and to benchmark a business against other businesses of similar industry and size. However, information in the Register that could identify individual businesses remains confidential. The ABS makes available aggregated BR data, which is used by companies all over Australia to analyze business markets, to establish business profiles of a particular area, and as a tool for other market research problems.

Note 2. Strict privacy legislation prevents us from using live ABS data. We were therefore restricted to the use of a synthetic facsimile of the Business Register data. For more information about the ABS Business Register, please refer to [Australian Bureau of Statistics].

Many different sources are used to update information on the ABS Business Register, and updating is done on a continuous basis. Most new employing businesses are identified and included on the Register within one to six months of commencing employment of staff. The BR is currently implemented using OODBMS technology, with each business modeled according to a complex schema that reflects real-world business structure, including concepts such as legal entities, enterprise groups, and locations. Key issues for the
BR include scalability, complexity, reliability, change management, and heterogeneity of computational environment.

Scalability concerns include:

Storage. The database currently has over 100,000,000 objects using more than 10GB of storage space.

Processing capacity. The system must be able to respond to queries in a timely manner and be able to batch-process large quantities of data from external sources.

External IO. A large number of external clients must be supported simultaneously.

The complexity of the database is evidenced by a large schema and the need to preserve information about complex statistical data dependencies. This complexity is exacerbated by the difficulty of maintaining persistent-to-transient data mappings, and the demands of explicit storage management (alloc/free).

As a central resource for the ABS's economic statistics, the reliability of the system is of paramount importance from the perspective of both data integrity and availability.

Change management places considerable demands on the system at a number of levels:

The capacity to deal with long transactions (update transactions on large companies can take many days by virtue of the complexity of the data entry task).

The need for object instance versioning to allow for retrospective queries and changes.

The need for schema versioning in order to support schema change and retrospective use of the data and associated schema.

The BR must do all of this in support of a wide range of users, including system operators, data entry personnel, and statisticians who exist within a diverse heterogeneous computing environment.
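To make the object instance versioning requirement concrete, the following is a minimal sketch of an append-only version history that answers retrospective ("as of") queries. It is an illustration only, not the BR's actual design: the class name VersionedObject, the integer commit times, and the attribute names are all assumptions, and the sketch covers retrospective queries only, not retrospective changes or schema versioning.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class VersionedObject:
    """Append-only version history for one object (hypothetical structure)."""
    times: List[int] = field(default_factory=list)   # commit times, ascending
    states: List[Dict[str, Any]] = field(default_factory=list)

    def commit(self, time: int, state: Dict[str, Any]) -> None:
        # Versions must arrive in time order so that binary search stays valid.
        if self.times and time <= self.times[-1]:
            raise ValueError("commit times must be strictly increasing")
        self.times.append(time)
        self.states.append(dict(state))  # copy so later caller edits do not leak in

    def as_of(self, time: int) -> Optional[Dict[str, Any]]:
        # Retrospective query: latest version committed at or before `time`.
        i = bisect_right(self.times, time)
        return self.states[i - 1] if i else None


# Hypothetical usage: one business whose employee count changes over time.
business = VersionedObject()
business.commit(10, {"name": "Corner Store", "employees": 3})
business.commit(20, {"name": "Corner Store", "employees": 5})
```

Because old versions are never overwritten, a query such as business.as_of(15) sees the state committed at time 10 even after later updates. The price is storage growth, which ties this requirement back to the scalability concerns above.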
Key Issues Raised by the ABS-BR
The challenges of the ABS-BR and the difficulties faced by the ABS in realizing and maintaining a solution indicate a number of issues that we see as being of broad relevance to high-performance object-server technology. It is these issues that have been the focus of our research effort centered on the ABS-BR. Of these, the first four relate to system requirements that led to significant complexity for the ABS-BR application programmers, and the last relates to scalability. The challenges of the ABS-BR can be broadly classified as those of complexity and scalability.

Figure 10.1 illustrates some of the important concepts in the ABS-BR. Historic snapshots are globally consistent snapshots of the BR that define the resolution at which historical queries are made. The common frame is the current stable view of the BR (typically corresponding to the last historic snapshot). The current view is the view of the database that includes all updates since the last historic snapshot. Dependent source feedback refers to updates that in some way are a function of the database itself (e.g., where change in the database triggers the collection of new data items for the database). Such data items constitute a feedback loop and must therefore be excluded from certain statistical analyses. Long and short transactions are depicted with respect to different views of the BR.
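The relationship between historic snapshots, the common frame, the current view, and dependent source feedback can be sketched in a few lines. This is an assumption-laden illustration rather than the ABS implementation: the names Register, Update, and the dependent flag are hypothetical, the common frame is taken to be exactly the last historic snapshot (the text says only that this is typical), and transactions are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Update:
    key: str                 # hypothetical object identifier
    value: int               # hypothetical attribute value, e.g. employee count
    dependent: bool = False  # True if the update is dependent source feedback


class Register:
    def __init__(self) -> None:
        self.snapshots: List[Dict[str, int]] = [{}]  # historic snapshots, oldest first
        self.pending: List[Update] = []              # updates since the last snapshot

    def apply(self, update: Update) -> None:
        self.pending.append(update)

    def common_frame(self) -> Dict[str, int]:
        # Stable view: assumed here to be simply the last historic snapshot.
        return dict(self.snapshots[-1])

    def current_view(self, include_dependent: bool = True) -> Dict[str, int]:
        # Last snapshot plus all later updates, optionally skipping feedback.
        view = self.common_frame()
        for u in self.pending:
            if include_dependent or not u.dependent:
                view[u.key] = u.value
        return view

    def take_snapshot(self) -> None:
        # Freeze the current view as a new globally consistent snapshot.
        self.snapshots.append(self.current_view())
        self.pending.clear()


# Hypothetical usage: one snapshot, then two further updates.
br = Register()
br.apply(Update("store", 3))
br.take_snapshot()
br.apply(Update("store", 5))
br.apply(Update("survey", 1, dependent=True))
```

Here br.common_frame() still reports the state at the last snapshot, while br.current_view() includes both later updates. Passing include_dependent=False gives the view a statistical analysis would use when feedback-driven items must be excluded.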
