Undisplayed Graphic

Star Queries in Oracle8

An Oracle Technical White Paper

June 1997

Undisplayed Graphic

INTRODUCTION

A data warehouse should be designed to satisfy the strategic needs of a corporation. The capabilities of strategic planners and analysts should not be constrained by the functionality or design of their data warehouses. Oracle8 provides a wide breadth of data warehousing features which allows Oracle's customers to build the application that most closely suits their business needs.

This paper describes one feature of the Oracle8 universal data server that addresses a common business need in a data warehouse: high performance for queries on a star schema. This paper defines star schemas and star queries, describes the optimal query processing approaches for star queries, and explains how Oracle8 has been enhanced for star queries.

STAR SCHEMAS

Why use a Star Schema?

Almost all of the data in a data warehouse originated from an operational system. However, even if the data exists in a relational schema in an operational system, a transaction-oriented schema is not necessarily appropriate for a data-warehousing application. A typical operation in on-line transaction process (OLTP) is a "transaction." In most applications, a transaction consists of the retrieval, insertion, or update of only a handful of records. For example, a new purchase order requires information about the customer, and the products being ordered. Operational systems are typically optimized to allow the application to quickly locate these individual records.

The typical data warehousing operation involves a much larger number of records. The end-user of a data warehouse is not interested in a single purchase order. Instead, the end-user will want to examine all of the purchase orders for a given week, or for a given region. Data warehouses are used for a variety of purposes, from simple reporting to more complex analysis and sophisticated forecasting. But each of these data warehousing functions share the same characteristic of analyzing large amounts of data.

A highly normalized schema offers superior performance and efficient storage for OLTP schemas; and star schema provides similar benefits for data warehouses.

What is a Star Schema?

A "star schema" is a natural data representation for many data warehousing applications. A star schema contains one very large table (known as the FACT table), and multiple smaller tables (called DIMENSION tables).

For example, a star schema in a grocery store environment could contain the following tables and attributes (this example is from Chapter 2 of Ralph Kimball's The Data Warehouse Toolkit).

Sales (fact table; 600M rows):

time_key

product_key

store_key

promotion_key

dollar_sales

units_sales

dollar_cost

customer_count

Time (730 rows):

time_key

day_of_week

day_number_in_month

day_number_overall

week_number_in_year

week_number_overall

Month

quarter

fiscal_period

year

holiday_flag

Product (30000 rows):

product_key

description

full_description

SKU_number

package_size

brand

subcategory

category

department

package_type

diet_type

weight

weight_unit_of_measure

units_per_retail_case

units_per_shipping_case

cases_per_pallet

shelf_width_cm

shelf_height_cm

shelf_depth_cm

Store (300 rows):

store_key

store_name

store_number

store_street_address

city

store_county

store_state

store_zip

sales_district

sales_region

store_manager

store_phone

store_FAX

floor_plan_type

photo_processing_type

finance_services_type

first_opened_date

last_remodel_date

store_sqft

grocery_sqft

frozen_sqft

meat_sqft

Promotion (300 rows):

promotion_key

promotion_name

price_reduction_type

ad_type

display_type

coupon_type

ad_media_type

display_provider

promo_cost

promo_begin_date

promo_end_date

Undisplayed Graphic

Example star schema

The SALES table contains information about the sales for each product at each store on a given day (e.g., the date of the transaction and the amount of total sales). The SALES table is very large, since a grocery-store chain could easily have millions of such entries per day.

In contrast, the TIME, PRODUCT, STORE, and PROMOTION tables are very small relative to the
SALES table.

In this schema, SALES is the FACT table; product sales are the basic units of analysis for this data warehouse. The other tables (TIME, PRODUCT, STORE, and PROMOTION) are the DIMENSION tables. These DIMENSION tables provide more details of the sales transactions, describing information about which products were sold, when and where the sales occurred, and what promotions might have influenced the sale.

This star schema is an intuitive representation of multidimensional ‘datacube’ in a relational environment. Each DIMENSION table represents a business dimension while the FACT table contains the contents of the ‘datacube,’ which in this case is the sales information for each product. For example, an end user may want to examine ‘sales by store by week,’ where STORE and TIME are dimensions of the query.

Because star schemas can intuitively model the business issues facing a data warehouse; and because star schemas provide excellent query performance for business analysis in some relational database environments, star schemas are becoming increasingly common.

STAR QUERY PROCESSING: TECHNOLOGY EVOLUTION

A data warehouse implementor could build a star schema in any relational database. After all, a star schema is simply a collection of tables, and the queries on a star schema are standard SQL. However, it is not enough for a relational database to simply store the data in a star schema. The relational database must also provide efficient query access. In particular, it must have a method for supporting the efficient execution of star queries.

First-Generation Solutions

Many relational databases lack the optimized execution methods described above. Version 6 of the Oracle Server, which first went into production in 1988, lacked such methods. Nearly ten years later, at this white paper’s writing in spring of 1997, several major relational database products continue to lack star query support.

Although any relational database can process a star query and provide the correct results, the performance for star queries can be abysmal when using database systems that do not support an efficient access method for star queries. This is because many of these relational databases have simple 'query optimizers' that do not understand how to best process a star query. The query optimizer is the 'brain' of the query-processing engine. For each query, the query optimizer determines the indexes and join methods that will be used to process the query. A simple query optimizer may be more than sufficient for an OLTP application, but because star queries are considerably different than queries that are typically found in OLTP applications, the first-generation query optimizers often choose very poor strategies for star queries.

These first-generation solutions commonly choose to join the fact table to a single dimension tables as the first step of query processing. The second step would be to join the intermediate result from the first step to the second dimension table. This process continues until all tables have been joined. The shortcoming of this approach is that the database is processing large amounts of data from the fact table multiple times (once for each join of a dimension table). Because the fact table is always the largest table, this execution strategy can be very expensive.

Second-Generation Solutions

With the release of Oracle7 in 1992, Oracle introduced an improved method for executing star queries. Oracle's then-new query optimizer, the cost-based optimizer, was much better suited for the demands of complex data warehousing queries.

Oracle’s cost-based optimizer not only has information about the query and the available indexes and join methods, but it also has information about the number of rows and blocks in each referenced table, the size of its indexes, and the distribution of data in each individual column.

When presented with a star query, the cost-based optimizer recognizes that the fact table is very large and expensive to access relative to the cost of accessing the much smaller dimension tables, and it considers this cost when choosing the best execution strategy for the star query.

For many star queries, the best Oracle7 execution strategy is to initially join all of the smaller dimension tables together using ‘Cartesian product’ joins, and then join the fact table in the last phase of the query execution. A Cartesian product join is used for joining two unrelated tables. The SALES table and the PRODUCT table are related (by the product_key column), but the PRODUCT table and the STORE table are not related, so these two dimension tables must be joined using a Cartesian product join. This Cartesian-product-based strategy is often optimal because the expensive fact-table access occurs only once.

However, Oracle7’s cost-based optimizer does not blindly apply this star query optimization to every query on a star schema. The cost-based optimizer always evaluates many possible approaches for executing a given query. In some cases, the star query approach described above is not applicable. For example, the dimension tables may be large, so that building a Cartesian product of the dimension tables may be an expensive operation. In these cases, the cost-based optimizer recognizes that the star query approach is not the best solution for this query, and will choose an alternative execution strategy.

In each release of Oracle7, the query optimizer ability has been enhanced to execute star queries. In Release 7.2, the performance of Cartesian product joins was improved through more intelligent caching techniques. In Release 7.3, the optimizer became more adept at explicitly identifying star queries, particularly in star schemas with large numbers of dimension tables.

This second-generation technique represented a substantial improvement over first-generation databases. By intelligently choosing to access the fact table's data exactly once, the Oracle7 cost-based optimizer consistently chooses star query strategies that are two orders of magnitude faster than first-generation solutions.

Third-Generation Solutions

The second-generation solutions primarily entailed improvements to the query optimizer. That is, Oracle7 chose the best possible method for evaluating a star query using its existing join methods and indexing techniques. A further area for improved star query performance is to utilize new methods for indexing
and joins.

A ‘join index’ is one possible solution for processing star queries. Although there are many variations of join indexes, these indexes all share two common characteristics: the index structure spans more than one table, and the index is used to improve the performance of relational joins.

The concept of a join index can be easily illustrated by a simple example involving two tables: EMP, containing employee records, and DEPT, containing descriptions about each department. These two table share a foreign-key/primary-key relationship on the DEPTNO key.

The following query might be used to find all of the employees in a given department:

select employee_name from emp, dept

where emp.deptno = dept.deptno

and dept.department_name = 'SALES';

This query could be evaluated using a join index. For example, there could be a join index on the dept.department_name column and the emp table. The index structure would list, for each possible value of dept.department_name, the rows of the corresponding employees in the emp table.

In a star schema, a join index can be created across a single fact table and all of its dimension tables. A join index eliminates the need for generation of the Cartesian product of the dimension tables during a query. Instead of computing all of the possible combinations of the dimension tables’ rows, as in the Cartesian product approach, a join index provides immediate access to only the relevant rows of the fact table.
Efficient fact table access is the basis for all approaches to improved performance beyond the second-generation solution.

Although the join index addresses some of the performance deficiencies of the Cartesian product method, it is not without limitations.

Complexity of Join Indexes

A join index is a complex structure, and is necessarily more expensive to build and maintain than a
single-table index.

In some implementations, it is impossible to update any of the indexed columns of a join index. Although a data warehouse is theoretically static, every data warehouse DBA knows that occasional updates are a reality imposed by business requirements. For example, a data warehouse may be updated every quarter to reflect a re-organization of company's sales territory definitions. If a join-index prevents the updating of these tables, then even minor modifications to the data may require rebuilding the entire index or even unloading and reloading the entire star schema.

Multi-Column Join Index

Join indexes are not flexible. Because join indexes contain multiple columns, they have several shortcomings.

  • Join indexes are created on an ordered list of columns. The index structure is dependent on the ordering of the columns. Although the join index will be very efficient when all of the indexed columns are accessed or when the leading columns of the index are accessed, the performance will be suboptimal when other columns are accessed. For example, one might build a join index on the dimension tables (TIME, PRODUCT, STORE, PROMOTION). A query that accessed the TIME and PRODUCT dimensions would efficiently use this index, because TIME and PRODUCT are leading keys. However, a query that only accessed the PRODUCT and STORE dimensions would perform suboptimally, and a query on STORE and PROMOTION (the trailing keys) would perform even worse.
  • The join index solution becomes much less attractive for star schemas with many dimensions. A join index spans all of the dimension tables. If a given star schema has 20 dimensions, then the join index would span all 20 dimension tables. The difficulty is that a typical query on this 20-dimension star schema would only reference a few of the dimensions. A typical query may only reference five dimensions, yet the join index used to evaluate this query is still a twenty-column index. These star queries use an index that is approximately four times larger than necessary, in addition to any performance penalties associated with not using the leading keys of the index.

A data warehouse could attempt to address both of these issues by utilizing multiple, multi-key join indexes over different combinations of dimension tables. However, multiple indexes are often unacceptably costly both in terms of space utilization and machine resources for maintaining multiple indexes.

Join Indexes are not a Total Solution

Join indexes are not the best method for every possible star query. Indexes are used to efficiently identify the specific rows of interest. It is true that most queries will only examine a small percentage of the rows in the fact table, and in these cases, a join-index will be effective. But other queries may examine the entire fact tables (for example, ‘what are the total sales for all products, regions, and time periods by product, region, and month?’). A join index is not the best method for queries that span most of the fact table, and alternative techniques should be available for these queries. Some database products utilize join indexes, but have a very simplistic query optimizer that blindly applies the join-index method to virtually every database query. A better solution is to provide multiple algorithms for evaluating star queries.

A join index is also not well-adapted for hybridized star schema. A join index may span a fact table, and all of its immediate dimension tables. However, a join index does not directly address the performance issues of a ‘snowflake’ schema -- a more complex variation of a star schema in which each dimension may be a collection of related tables rather than a single table.

Fourth-Generation Solutions

Oracle8’s star query algorithm represents a significant improvement over previous star query algorithms. Rather than relying on a join index, Oracle8’s algorithm utilizes single-table bitmap indexes, and integrates this bitmap technology seamlessly into a new star query join method.

This new algorithm provides excellent performance, while utilizing less storage (bitmap indexes are
highly compressed) and offering more flexibility in both the types of schemas and the types of queries
that can be optimized.

STAR QUERY PROCESSING IN ORACLE8

Initial Setup

In order to use the Oracle8 star query algorithm, a data warehouse administrator would build bitmap indexes on each of the foreign-key columns of the fact table. In the example schema, the administrator would build single-column bitmap indexes on sales(time_key), sales(promotion_key), sales(product_key), and sales(store_key).

The data warehouse administrator may choose to build additional indexes in this schema. Any column that commonly appears in the "where-clause" of an end-user’s SQL query is a candidate for an index, particularly those columns in the larger dimension tables, such as product. In this example schema, the administrator may choose to build B-tree indexes on product.sku_number and product.sku_description, and bitmap indexes on each of the other attributes of product.

The store, time, and promotion dimension tables are very small. Since these small tables will often be cached or, even if uncached, will only require one or two I/O operations to scan the entire table, it is unnecessary to index the tables.

How it Works: An Example

Oracle8 processes a star query using two basic phases. The first phase retrieves exactly the necessary rows from the fact table. Because this retrieval utilizes bitmap indexes, it is very efficient.

The second phase joins this "result set" from the fact table to the dimension tables. Below is an example of how an end-user may query this data warehouse:

"What were the sales and profits for the grocery department of stores in the west and southwest sales districts over the last three quarters?"

This is a simple star query. The SQL generated by an end-user tool could look like:

select

store.sales_district,

time.fiscal_period,

sum(sales.dollar_sales) revenue,

sum(dollar_sales) - sum(dollar_cost) income

from

sales, store, time, product

where

sales.store_key = store.store_key and

sales.time_key = time.time_key and

sales.product_key = product.product_key and

time.fiscal_period in ('3Q95', '4Q95', '1Q96') and

product.department = 'Grocery' and

store.sales_district in ('San Francisco', 'Los Angeles')

group by

store.sales_district, time.fiscal_period;

As described above, Oracle8 will process this query in two phases. In the first phase, Oracle8 will use the bitmap indexes on the foreign-key columns of the fact table to identify and retrieve the only the necessary rows from the fact table.

That is, Oracle8 will retrieve the rows from the fact table using essentially the following query:

select ... from sales

where

store_key in (select store_key from store where

sales_district in ('WEST', 'SOUTHWEST')) and

time_key in (select time_key from time where

quarter in ('3Q96', '4Q96', '1Q97')) and

product_key in (select product_key from product where

department = 'GROCERY');

Oracle8’s new patent-pending algorithm is often called a "star transformation," since the original star query has been transformed into this subquery representation.

This method of accessing the fact table leverages the strengths of Oracle’s bitmap indexes. Intuitively, bitmap indexes provide a set-based processing scheme within a relational database. Oracle has implemented very fast methods for doing set operations such as AND (an intersection in standard set-based terminology), OR (a set-based union), MINUS, and COUNT.

In this star query, a bitmap index on store_key is used to identify the set of all rows in the fact table corresponding to sales in the West sales district. This set is represented as a bitmap (a string of 1's and 0's that indicates which rows of the fact table are members of the set).

A similar bitmap is retrieved for the fact-table rows corresponding to the sale in the Southwest sales district. The bitmap OR operation is used to combine this set of Southwest sales with the set of West sales.

Additional set operations will be done for the time dimension and the product dimension. At this point in the star query processing, there are three bitmaps: each bitmap corresponds to a separate dimension table, and each bitmap represents the set of rows of the fact table that satisfy that individual dimension’s constraints.

These three bitmaps are combined into a single bitmap using the bitmap AND operation. This final bitmap represents the set of rows in the fact table that satisfy all of the constraints on the dimension table; this is the "answer set," the exact set of rows from the fact table needed to evaluate the query. Note that none of the actual data in the fact table has been accessed; all of these operations rely solely on the bitmap indexes and the dimension tables. Because of the bitmap indexes’ patented, compressed data representations, the bitmap set-based operations are extremely efficient.

Once the answer set is identified, the bitmap is used to access the actual data from the sales table. Only those rows that are required for the end-user’s query are retrieved from the fact table.

The second phase of this query is to join these rows from the fact table to the dimension tables. Oracle8 will use the most efficient method for accessing and joining the dimension tables. Many dimension are very small, and table scans are typically the most efficient access method for these dimension tables. For large dimension tables, table scans may not be the most efficient access method. In the example above, a bitmap index on product.department may be used to quickly identify all of those products in the grocery department. Oracle8’s cost-based optimizer will automatically determine which access method is most appropriate for a given dimension table, based upon the cost-based optimizer’s knowledge about the sizes and data distributions of each dimension table.

The specific join method (as well as indexing method) for each dimension tables will likewise be intelligently determined by the cost-based optimizer. Hash joins, a sophisticated join method added to Oracle in Release 7.3, is often the most efficient algorithm for joining the dimension tables. The final answer is returned to the user once all of the dimension tables have been joined. The query technique of retrieving only the matching rows from one table and then joining to another table is commonly known as a "semi-join."

One particularly notable trait of the example query is that not every dimension table is referenced. The promotion dimension is entirely omitted. In Oracle7, as well as other database products, this star query would be executed using a multi-column index on the fact table. The typical multi-column index would contain all four dimension columns. For this particular query, the multi-column index would suboptimal, since it contains an entirely extraneous column, promotion_key. The data warehouse administrator could, of course, create a different multi-column index that only contains the fact-table columns store_key, product_key, and time_key. However, space considerations would prevent an administrator from building all possible combinations of multi-column indexes. Thus, there will always be some star queries that perform poorly in a multi-column approach, because these queries lack the optimal index.

With the Oracle8 approach, star queries always use the most appropriate indexes. Since the indexes on the fact table are single-column bitmap indexes, the optimizer will only use indexes for those columns that are specified in the query. No extraneous indexed columns will be used in the Oracle8 star query algorithm.

Another interesting characteristic of this query is that no columns are selected from the product table. That is, we are using the product table to restrict sales to the Grocery department, but none of the columns from the product table appear in the SELECT clause of the SQL query. In this case, Oracle’s cost-based optimizer will recognize that it is unnecessary to join the product table. Oracle applies the constraint "Product.department = Grocery" by using bitmap index on the product_key column of the sales table. However, Oracle will not join the product table in the second phase of the query, since no more data needs to be retrieved from the product table.

Even this simple example query demonstrates the robustness of Oracle8’s star query algorithm. This example illustrated how Oracle efficiently handles cases where not all dimensions are accessed and cases where not all dimensions are selected.

Complex Star Queries

The Oracle8 algorithm can handle other complex cases, such as an "inside-out" star query. Many star queries are "outside-in," that is, the end user places constraints on the dimension tables (the outside of the star schema) and retrieves data from the fact table. The above example, in which the end user was looking for the total sales for a given department, time period, and region, is a typical "outside-in" query. However, an end user may also want to do an "inside-out" query. For example, suppose that an analyst wanted to know all of the products that were sold to more than 10 customers on a single day in a given promotion. Then, the analysts’ query could look like:

select

product.description,

promotion_name,

sum(customer_count)

from

sales, time, product, promotion

where

sales.promotion_key = promotion.promotion_key and

sales.time_key = time.time_key and

sales.product_key = product.product_key and

time.fiscal_period = '4Q95' and

promotion.price_reduction_type = 'Markdown' and

sales.customer_count > 10

group by

product.description, promotion.promotion_name;

The first phase of processing this query is to identify all of the relevant rows of the sales table. This phase will not only take into account the constraints on the time_key and the promotion_key, but will also consider the constraint on the customer_count column. Morever, the data warehouse administrator could choose to create a bitmap index on the customer_count column, and Oracle would utilize this bitmap index in conjunction with the bitmap indexes on time_key and promotion_key. This allows for extremely efficient access to the fact table for "inside-out" queries.

Complex Star Schemas

Oracle8’s algorithm also supports different varieties of star schema designs. For example, some implementors may have a collapsed dimension table. That is, a logical dimension is stored as one or two columns in the fact table, rather than as a separate dimension table. In the above schema, the administrator could choose not to include a time dimension table. Instead, all queries could reference the time_key columns of the Sales table, rather than the columns of the Time dimension table. The Oracle8 algorithm would perform equally well without an explicit time dimension table, since all of the date constraints are applied to the time_key column of the sales table rather than the time dimension table, and a bitmap index on the time_key column of the fact table could continue to be utilized.

The Oracle8 algorithm can also efficiently handle snowflaked dimensions. Suppose that the product dimension was no longer a single table, but two tables:

Product (30,000 rows):

product_key

sku_number

sku_description

brand

category

subcategory

department_key

package_size

Product_department (50 rows):

department_key

department_name

(other department characteristics)

Undisplayed Graphic

Simple snowflake schema

The example query on this modified schema would need to be modified to include both the product and the product_department tables:

select

store.sales_district,

time.quarter,

sum(sales.dollar_sales) revenue,

sum(dollar_sales) - sum(dollar_costs) income

from

sales, store, time, product, product_department

where

sales.store_key = store.store_key and

sales.time_key = time.time_key and

sales.product_key = product.product_key and

time.quarter in ('3Q96', '4Q96', '1Q97') and

product.department_key = product_department.department_key and

product_department.department_name = 'GROCERY' and

store.sales_district in ('WEST', 'SOUTHWEST')

group by

store.sales_district, time.quarter;

Oracle will recognize this snowflake schema, and it will treat each snowflaked dimension as a single logical dimension. That is, when accessing the sales table, Oracle will effectively be doing:

select ... from sales

where store_key in (select store_key from store where

sales_district in ('WEST', 'SOUTHWEST'))

and time_key in (select time_key from time where

quarter in ('3Q96', '4Q96', '1Q97'))

and product_key in (

select product_key from product, product_department where

product.department_key = product_department.department_key and

product_department.department_name = 'GROCERY');

Oracle’s sophisticated query transformation is designed to accommodate a snowflake schema by identifying all of the tables that constitute a single logical dimension.

Intelligent Query Optimizer

For every query, Oracle’s cost-based optimizer will carefully choose the most efficient execution
method. In particular, the optimizer will weigh the Oracle8 star query algorithm against other possible processing techniques.

When the cost-based optimizer determines that the star-query algorithm is the best approach for a given query, the optimizer must still intelligently choose the best way to apply the star query algorithm. As mentioned previously, the optimizer will choose the most efficient methods for accessing and joining each dimension table. Additionally, the optimizer will also weigh the relative benefits of including each dimension table in the first phase of query processing. If a given dimension table does not sufficiently constrain the fact table, then the optimizer may determine that it is more efficient to defer that dimension table's access until the second phase of the algorithm. Unlike a join-index approach, the Oracle8 star query algorithm is flexible so that every component of the star query algorithm can be tuned, and the cost-based optimizer intelligently and transparently applies these additional optimizations.

The cost-based optimizer has been considerably enhanced within Oracle8. In Oracle7, the cost-based optimizer analyzes a star-query by considering all of the possible orders for joining the dimension tables together; this characteristic limits the Oracle7 optimizer's effectiveness for star queries with large numbers of dimension tables. The Oracle8 optimizer does not depend upon generating many possible join orderings. Since the Oracle8 star-query algorithm joins all of the dimension tables to the fact table in a single phase (the first phase of the star query algorithm), the optimizer does not need to consider all possible join-orderings; the optimizer only needs to consider the set of dimension tables that are associated with a given fact table. This approach simplifies the optimizer's algorithm for generating star query strategies and consequentially allows that the optimizer to generate more efficient query strategies.

The cost-based optimizer will also recognize star queries on schema that contain multiple fact tables. An example of a multiple-fact-table schema is a system used to track both inventory and sales. There might be separate fact tables for inventory shipments and for sales, although these two fact table may shared several dimension table (such as product). In such a system, it would not be uncommon for a single query to examine multiple fact tables. For these queries, the optimizer be analyzing mulitple "magic sets" (where each "magic set" consists of a fact table and its associated dimension tables), and will choose to apply the Oracle8 star query optimization to such a set in the query.

Oracle’s implementation of cost-based optimizer has been a key contribution to its query-processing improvements. Not only does the cost-based optimizer vastly improve query performance for the complex queries typically found in data warehouses, but the cost-based optimizer is an important component of Oracle’s extensible infrastructure. New query techniques will continue to be added quickly and seemlessly, since the cost-based optimizer provides a framework for appropriately applying each new technique.

Complete Parallelization

All aspects of this Oracle8 star query algorithm can be parallelized. This includes the retrieval and
combining of the bitmap indexes on the fact table, the retrieval of data from the fact table (phase one of the star query algorithm), and the joining of the fact table’s "answer set" to the dimension tables (phase two of the star query algorithm).

Any star query in Oracle8 can be executed in parallel (this parallelization has been implemented on both partitioned tables and non-partitioned tables). As with all parallelized queries within Oracle, the query optimizer dynamically determines the appropriate degree of parallelism.

BENEFITS OF NEW STAR QUERY ALGORITHM

The Oracle8 star query algorithm provides:

  • Excellent performance without compromises: While the Oracle7 algorithm may be limited by the size of the generated Cartesian product or by the sparsity of the fact table, the performance of the Oracle8 algorithm is proportional to the amount of data required from the fact table. The Oracle8 star query algorithm is often more than ten times faster than the Oracle7 star query algorithm, and addresses a broader ranger of star queries than the Oracle7 algorithm.The Oracle8 star query algorithm is carefully designed to provide scalable performance in parallel hardware environments.
  • Efficient storage and easy maintenance: The fact table is indexed with single column bitmap indexes. Oracle’s bitmap indexes, with their patented compression technique, require relatively little storage. For many star schemas, these bitmap indexes are collectively 30 percent smaller than a single multi-column index. Bitmap indexes can be created very rapidly, in parallel. Because the bitmap indexes can be readily combined with each other, there is no need to build any redundant indexes (as was occasionally required for multi-column index solutions).
  • Ability to handle complex star schema: As discussed above, the Oracle8 algorithm is designed for schema with large numbers of dimension tables, snowflake schemas, and schemas that contain multiple fact tables.

SUMMARY

The Oracle8 star-query algorithm represents the next generation in star query processing. The Oracle8 algorithm is truly advanced technology. While join-index solutions for star queries were implemented in commercial products over five years ago, the concept of applying a semi-join to star queries was only
proposed in academia less than two years ago. Oracle8’s innovative technique of combining this semi-join technique with query transformation technology and bitmap indexes, as well as Oracle’s unprecedented speed in bringing this query technology into its flagship product, is indicative of Oracle’s leadership in providing data warehouse technology.

BIBLIOGRAPHY

"Star Query Processing in the Oracle7 Server," Oracle White Paper, March 1996.

"Bitmap Indexes in the Oracle Server," Oracle White Paper, October 1995.

"Oracle8 Server Tuning," Oracle documentation, June 1997.

"Oracle8 Server Concepts," Oracle documentation, June 1997.

Kimball, Ralph, The Data Warehouse Toolkit, John Wiley & Sons, 1996

O'Neil, P. E., Graefe, G., "Multiple-table joins through bitmapped join indexes," SIGMOD Record, Vol. 24, No. 3, September 1995.

Undisplayed Graphic

Oracle Corporation
World Headquarters
500 Oracle Parkway
Redwood Shores, CA 94065
U.S.A.

Worldwide Inquiries:
+1.415.506.7000
Fax +1.415.506.7200
http://www.oracle.com/

Copyright Oracle Corporation 1997
All Rights Reserved

This document is provided for informational purposes only, and the information herein is subject to change without notice. Please report any errors herein to Oracle Corporation. Oracle Corporation does not provide any warranties covering and specifically disclaims any liability in connection with this document.

Oracle is a registered trademark and Enabling the Information Age, PL/SQL, Oracle 8, and Oracle7 are trademarks of Oracle Corporation.

All other company and product names mentioned are used for identification purposes only and may be trademarks of their respective owners.



This is a copy of an article published @ http://www.oracle.com/