Wednesday, March 28, 2012

How do I access the lowest level data?

OK, this feels like a dumb question... but how do I access the

lowest-level data in the cube? I.E., before it is aggregated?

What I'm trying to do is find all of the rows where an amount is >

1000. However, when I create a calculated member or constrain in

any way, it compares to the aggregated amount. I have tried a few

ways, but they all seem to act only on the aggregated data.

Thoughts?

BTW, the SQL would be:

Select sum(measure), count(measure)

from fact_table

where measure > 1000

Thanks,

Doug

Just to understand your issue:

For example, you have sales at a daily level, some of which are <1000 some over 1000. When you roll these up to the week, month, etc , they aggregate as expected.For a given higher-level time period, do you want to sum the sales of all the days with sales<1000, and also count the number of these days? (eg in a given week where only two days are <1000 the sum might return 750 and the count 2)

It's not actually trivial:
The way I would do it would be to use DESCENDANTS to return a set of the lowest level members that are underneath your current member, then FILTER that set to pick out those where [measures].[sales] <1000 then finally SUM the values from that set.

This is a bit cumbersome to write, but will work. Check out BOL for the syntax, (or I'll post a code sample when I can check BOL to make sure I get the syntax right!)

Anyone knows a more efficient way of doing this I would appreciate it.

Richard R.

|||Thanks Richard. Your suggestion was definitely helpful, but

ultimately it ended up being way too slow because I need to span

hundreds of thousands of records, and to do that it essentially meant

creating the cartesian that described them! Anyway, what I've

done for now is to create another dimension that has ranges what the

sum of the measures are. I.E. sales range (0-49, 50-99, 100-499,

etc.). and amm using those to constrain. I guess this makes

sense, since it is a constraint, not a measure, but it is definitely

frustrating to have to do it this way. The major drawback is that

I have to now create a dimension attribute for every type of sale, and

combination of sales types, to be able to properly analyze the

data.

If anyone finds an easier way to do this, please let me know!

Thanks,

Doug|||

Hi again Doug,

I don't know if this is will be useful or not in real life, but I've found a feature in AS2005 which automatically clusters continuous variables into groups for use as dimensions - which is pretty much exactly what you want.

Create a new dimension using your fact table as the source, and your sales value as the dimension attribute. On the properties box of the attribute is a wonderfully-named property "DiscretizationBucketCount", and underneath is is "DiscretizationMethod".

The first determines the number of buckets the values will be separated into (eg 7 groups), the second the method (Automatic, Even-spaced, Clustered)

I tried it on a 14 million fact table row, asking it to cluster the value spend attribute into 7 buckets automatically, and the dimension processed in a few minutes (Fast P4 laptop 1GB RAM). The resulting clusters were not that elegant, as they had the decimal points on the range names (eg 0-15.79, 15.89 - 99.34), but they are good enough for the evaluation cubes I'm delivering to the customer tomorrow :-D

I'll dig a bit more and see if the names or clusters can be frigged manually. It's all XML behind the scenes, so I'm sure it can.

Anyone out there want to REALLY show off their expertise?

Regards,

Richard

No comments:

Post a Comment