Sunday, December 4, 2011

How aggregation works

Analytical / BI solutions provide a bird's-eye view of the enterprise data. The key focus here is summary data rather than detailed. For the later you would already have transactional or operational systems in place. Aggregation plays a major role in the design of BI solutions. What data you aggregate, at what level and  where( depending on the chosen technology or BI tool) is important to present the accurate results in the Dashboard.

In OBIEE, you can define aggregations in the

  1. Analytics page
  2. RPD

You would define the aggregation in the Analytics page for a one of report or computational need.

You would be better off defining it in the repository for common needs across subject areas.

When you define it in  the RPD, you would do it in the BMM layer with

  1. derived from existing logical columns
  2. derived from physical mappings

The Important thing to understand is that when you define aggregation using the existing logical columns, the calculations are done later than the aggregation itself. Take a look at the below example where we are calculating the discount rate. Here the values are first aggregated (Sum) and then the calculation is carried out.

image image

In the second case, for the measure DiscountRate_Physical we derive it from physical mappings. In this case, the calculation is carried out for every record first and then the aggregation (Avg) takes place.

image image image

we define DiscountRate_Physical as "xe".""."BISAMPLE"."F1 Revenue"."DISCNT_VALUE" / "xe".""."BISAMPLE"."F1 Revenue"."REVENUE"  * 100

I have used the Oracle provided BISAMPLE schema with the XE database for this example.

we can see in the below analysis the difference in valuesimage

If you have write a query to manually  fetch the results against the data warehouse, this is how it would look like.

-- derived from existing logical columns .
select d.per_name_year, (sum(discnt_value)/sum(revenue)*100) "%Discnt"
from samp_revenue_f f, samp_time_day_d d
where
f.bill_day_dt = d.calendar_date
group by d.per_name_year;

-- derived from physical mappings
select d.per_name_year, (avg(discnt_value/revenue)*100) "%Discnt"
from samp_revenue_f f, samp_time_day_d d
where
f.bill_day_dt = d.calendar_date
group by d.per_name_year;

To sum it it up, the aggregated result will vary depending on how you have designed to carry it out. Usually you define all the default aggregations and where it should be defined in your Dimensional model design document. Discuss it with your team and the Analyst at client side locations to arrive at a consensus on your approach.