DAX
Percent of Whole, DAX Style
by Barry on Jul.15, 2010, under DAX, PowerPivot
A common request in business analytics is determining the contribution of a subset to the entire population. For example:
- Which stores comprise the greatest share of sales?
- What products make up the least amount of volume?
This sort of ranking and sorting generated complex, correlated sub-queries in the relational world, but are actually quite elegant when solved using an OLAP implementation. My most recent PowerPivot project involved just such a requirement in a commodity, agri-business. In appreciation of my client’s confidence, I have literally changed the names to protect the innocent, in order to illustrate the problem and solution.
I have noticed, via Google Analytics and Woopra, blog readers from outside the United States who may be more excited about the World Cup than the Bowl Championship Series. Unfortunately, this example will make use of American Football, specifically at the collegiate level. It is my hope the specifics of the game don’t get in the way. I will make every attempt to ensure the relevant background information is in this post, but please reach out via comment or Twitter if a question should arise.
The Example Problem
American collegiate football players are generally afforded four years of eligibility to play. There are exceptions for medical reasons, but the sanctioning body for the college game recognizes football student-athletes in the same class names as their student counterparts: Freshman, Sophomore, Junior and Senior. Because of the speed, power and maturity of the game, Freshman, as a rule are prone to error, slower and not as strong as upper-classmen. Another complicating factor in player development at this level is most of the highly sought after players have spent their high school years (the four years of school immediately before college) dominating less-skilled populations of players and not being challenged on a game-by-game basis. At the collegiate level, the competition is far more even from school to school.
Our ‘customer’ would like a PowerPivot measure that calculates the percentage of the whole that each class represents for a school or conference.
The Solution
As part of my presentation for SQL Saturday #28 in Baton Rouge I have been creating examples of Microsoft Business Intelligence tools using NCAA football data. To illustrate PowerPivot ratio calculations, I am using Roster data from the 2009-2010 season. Below is an image of the Roster table, including a calculated column to add a number (01-04) prefix to the class year in order to sort slicer and axis values by their progression and not alphabetically. In other words, Freshman-Sophomore-Junior-Senior and NOT Freshman-Junior-Senior-Sophomore.
To add another dimension to our player data, I have added a Teams table, with the Conference and Division (where applicable) for each school’s team.
Just the DAX
From our simple, Roster and Team data model, creating the PercentofWhole measure is a matter of calculating the number of players in a given class (FR, SO, JR, SR) and the total number of players for the Conference.
So first, the number of players in the class. The COUNTX function takes two arguments, a Table and an Expression. In our case (as illustrated below) we are iterating through the ROSTER table counting the Player_ID’s (a unique number assigned each player by the NCAA), to obtain the number of Players.
Using this measure I could easily create a cross-tabulation of Player Class by Conference as illustrated below. However, other than telling me there are relatively few players in the Independent schools (Notre Dame, Army and Navy), there is not a lot of insight.
So for our next trick, we need to have a means of using the row grand total in a calculation for each class/conference intersection.
ALL(), the Context Remover
In order to have a calculation that is context sensitive for only certain dimensions, the ALL() function can be very useful. Essentially eliminating a filter context from a measure, it allows us to leverage a total along a given dimension.
For example, our DAX expression for the total of all players below is a combination of two functions. First (and innermost) is a Count of all Player_ID’s in the Roster table. Second, this expression is the object of a CALCULATE function that filters the count by ALL values of Sort Year. It took me a while to get my head wrapped around the idea of a filter that is really every value in the dimension. Once you realize this, ALL() can be very powerful.
Putting the Numerator and Denominator Together
Because I have created two separate measures (PlayerCount and AllPlayerCount) putting them together to create the desired ratio is crazy simple.
Placing new measure into the PowerPivot PivotTable yields the end result my client was looking for. A simple cross tabulation of of class year composition by conference. As I mentioned before, there are some exceptions to the normal Freshman-Senior classifications, and they are accounted for in the column to the left of the 01-FR (Freshman) values.
Because our ALL() is only dealing with the Class Year context, adding Teams to the PivotTable yields the expected result without altering the measure.
Finally, because I have a Division column in the Team table, I can further classify by Division again without altering the measure.
Conclusion
Ratios and composition are a vital part of both business analytics and a feature for which OLAP technologies represent an elegant solution. I hope that I have illustrated how the ALL() and CALCULATE() functions can be used to precisely control calculation by dimension.
Coming soon, how to use this technique as the basis for a Heat Map, on the cheap.
