Power Testing ETL with Power BI – The Process

11 11 2014

PowerTool_1This is a short blog series on using Power BI tools to support testing ETL processes. I have presented on this subject at few SQL Saturdays over the past few years and am finally succumbing to multiple request to turn it into a blog post. Realizing the amount of content is more than I typically would put into a single post, I will be putting together this short series to cover the material. The first post is this one. It will walk through the entire process at a high level. I will follow this post with a deeper look at Power Query’s role in the process. The third post will cover Power Pivot and building out test cases. Finally, we will wrap the series up with some visualization ideas for Excel and Power View. You can find all the posts as they come online here. Let’s get started.

The Problem Area

Why use Power BI to test ETL? While working as the architect on an ETL project for moving data from third party web service to an on-premise financial solution, we needed to put together a testing strategy that could be implemented by non-developers on the project. Our situation was that our project was “too small” to engage our QA team but the requirement for reusable testing needed to be fulfilled. Our project team consisted of a BI architect (that would be me), an ETL developer, and a business analyst (Chuck Whittemore).

NOTE: We are testing the data transformations and data load. This is not intended for auditing or performance. There are other tools for reviewing those including the built in reporting in SSIS and Pragmatic Works’ BI xPress tool. If you are tracking whether a package fails or succeeds, you should use either of these options not this process.

The Big Idea

The BA and I were discussing options for testing and we theorized that we could use a new add-in for Excel (Power Query, still in preview at imagethe time) with Power Pivot to build out tests. The key to success on this project is that we needed to be able to test with non-developer tools, no SQL Server Management Studio or SSIS could be involved in the testing. The primary reason for this is that he would be doing the testing. We also did not want to recreate every step in the ETL process the same way. So, time to put theory into practice. We determined that we would create test cases in Visual Studio then build out tests to match those cases in Excel using the Power BI add-ins. He would do the work in Excel and we, the developer and I, would provide technical support as needed.

The Recommended Tools

Before we dig into the process, I want to lay out the tools used for development and for testing. While this solution can use other tools, it is important to know what we used in practice to create our solution.

ETL Development Tools

imageThe ETL development was done using SQL Server Integration Services (SSIS). At the time, we needed to use Script tasks to consume the web service content. The financial system used a custom load process that we dumped formatted data into a file for the system to pick up and load.

In the examples, I use in the presentations and will lay out here, I will be using a text file to SQL Server implementation. While complex ETL problems are common and hard to test, this simplified version is easier to follow in examples. You should be able to apply the principles used here to test any solution.

Testing Tools

imageThe testing development for the referenced project consisted of Excel with Power Query and Power Pivot. Power Query was in preview at the time, so we had some of the performance issues and early bugs to work through. None of these issues, prevented us from completing the project.

The presentation solution relies on the latest version of Power Query (which changes every month) and Power Pivot in Excel 2013. Most of the examples are easy to follow, but you should be able to solve most transformation tests with the combination of Power Query and Power Pivot. Definitely do not discount the capabilities of Power Query and the fact that new functionality is being added each month.

Team Foundation Server/Visual Studio Online

imageBoth projects use the online version of TFS. If you are currently not using a source control and work tracking solution, I highly recommend you look at the online version of TFS. It will allow you up to 5 users free and give you ability to use source control, create test plans, create test cases, log bugs and track changes. These are key features necessary to complete a good solution that can be managed and tracked.

The Process

image

I am going to walk through my demo to build out the process steps. This will allow you to see examples. I will call out any thing of relevance related to the project here as well.

1. Business Rules

The first part of any project, especially in ETL, is to understand the business rules. If you are working with a data warehouse project, this may be fairly well documented in a dimensional model. In both of our cases here, we are moving data from one system to another. The transformations and business rules are primarily driven by the target system. Here are some examples of business rules in the media library sample project.

  • Author names are stored in separate columns – FirstName and LastName
  • If an author’s name include a middle name or initial or some variation, this combination should be stored in the first name column. For example, J.R.R. Tolkien would be stored as follows:
    — FirstName: J.R.R.
    — LastName: Tolkien
  • Copyright year should be stored as a 4 digit value
  • Page numbers should not exceed 1000

Every project has some type of business rules. It is hard to build out transformations and create test cases without these rules.

2. Source to Target Map

This is the single most important document for the tester. It tells the tester how the developer is getting from source to destination and what type of data massaging needs to be handled. Typically, people use some variation of the example created by the Kimball Group over the years.

3. Developing SSIS

The developer begins the process of creating the SSIS package. He will be using the Source to Target Map as his guide and will update that document to handle special cases in the data as needed. Ideally he is working in a development environment that will allow for test build outs as well.

4. Creating Test Plans and Test Cases

The tester creates test plans and test cases in TFS. These tests are based on business rules and the source to target map. Depending on both the complexity of the solution and the time to develop, some test cases could be did the table move the correct data field for field and row count. This method can be particularly useful when working with large tables or simple data flows. However, you should have a test case for every transformation that massages the data. This will insure that the data is being transformed as expected.

image

Keep in mind, this solution will support test cases for each field in a data load if required. The tester and architect should evaluate what is the appropriate amount of coverage to guarantee the highest level of quality in the data transform. As always, there is a diminishing rate of return if you “test everything” at the lowest level. It will be expensive in terms of cost of development when the chance for error is minimal. It will also take substantially longer to test everything. You need to understand and be able to articulate how the testing was accomplished and your level of confidence in the results.

5. Building the Tests

This is the most extensive part of the process besides the SSIS development. I will not go into all the details here, but will walk through the overall process and principles. I will provide detailed examples in the follow up posts as noted above.

Let’s start with the end result. Chuck and I were able to determine that we could use DAX to create comparative formulas on data that could be brought into Power Pivot from both the source and the destination. Essentially, we wanted to use math to determine the results of the tests. So in our example, we use a formula like “if Source.CopyrightYear = Destination.CopyrightYear, then it passes, else it fails.” Depending on how you want to measure, pass could be 0 or 1. Then we add the values up to determine if data passed or failed the test. We can even tell you failure rates.

In order to get the data in a comparative state, we needed each table in the destination with a table that matched from the source. However, it is very common that sources and destinations are not one-for-one table matches. This is where Power Query comes in. Using Power Query in our example set we bring in the text file and massage or shape the data to look like the destination. Most importantly, we need to apply all business rules and transformations to the source. Once this is done, we do no massaging on the destination data. This allows us to compare what the ETL process did with what our tests say it should have done.

A key part of being able to compare is the ability to relate the two tables in Power Pivot. You need to be able to match natural keys or derived keys between the two sources. The relationship should be from the destination table to the source table. Without this relationship, you will not be able to build the calculations for the tests. Keep in mind the goal is to get our source to look like expected results. Any data in the destination should match the source in our scenario.

image

Once both tables are created and loaded into Power Pivot, we can complete the tests using DAX. In some cases, we create calculations on both tables to be compared. A classic example is row count. We count the number of rows in the source table and the destination table. Then we create a calculation on the destination to compare values. This meets the requirements of a row count test case (e.g. all data was successfully imported).

Another example of a test is to compare the content in a field from source to destination. This is where we use a lot of conditional logic to verify the contents of a field in a row is the same in both tables. Calculated columns (not measures) are used to create the comparison results. The conditional statement should result in a number. This is important in order to create a measure that sums up the results to determine if errors exist or not. If you choose success to be 1, then you will check your results against the row count to determine if there are errors. If you choose failure to be 1, then a nonzero count means you have errors. There is no right or wrong way to handle this, you would choose based on visualization techniques. Most of the time, using 1 for failures is fine. However, if you want to create KPIs, you will likely need success to be one so you have a good target to work with.

6. Testing the Initial Load

Once you have created the tests, you are ready to test the initial load. You will connect to both sources. Ideally, your source will not change so you can redo the test multiple times, but this will work regardless. Refresh the data which may require rerunning the Power Query query. Once you have refreshed the data you should be able to check the calculations in a simple pivot table to determine what tests have succeeded or failed. This is the beauty of this solution. Each subsequent execution of SSIS, you will be able to refresh your data and review your results to determine how successful the ETL is.

image      image     image

A side effect of this work is that the developer can review the test results in Excel and Power Pivot with you to more easily find the discrepancy in the data transform. In some cases, the tests are in error as well. It is important that the developer and tester work together to determine cause as well. A good team will be able to work through issues rather quickly.

7. Recording Bugs and Issues

You will need to go back to Visual Studio to change the pass/fail for each test. If a test fails you can log a bug for the developer and you that information to determine if it was fixed prior to a subsequent run. It is likely that multiple sprints will be required to complete the work so you can work with your team to determine the best ways to communicate what is ready. If you track the work in TFS, you will queries are available to help you see what work has been completed.

You can determine if the fix worked and then set the test results accordingly. This will help show progress on the project as well.

8. Visualizing the Results

You can visualize your results using KPIs, conditional formatting and even Power View. If you have a project that needs to be easily evaluated you can publish your results to SharePoint and use charts and graphs to show how accurate the process is so far.

image  image

We will dig into visualization options more in a following blog post.

Tracking Test History

No solution is perfect and that is true here as well. One of the most common questions is how do we see the historical results? This solution does not easily provide for that. I am looking at options, but for the moment the idea is that the history will be tracked through TFS. However, you could save the workbook after each iteration. This will give you some history, but you would want to make sure that you don’t refresh data on a historical workbook or the results would be overwritten.

Some final thoughts.

Power Query is not an ETL tool. It’s target destination is always the same – Power Pivot. While it’s ease of use makes it appear to be a tool to be used for ETL, it is not there yet. However, it is in its ease of use that we have a place to work with it here.

My plan is to have some deeper technical dives into parts of the solution in the future.





PASS Summit 2012 Wrap Up

13 11 2012

Wow, what a week. Once again, PASS put on a great event that provided much in the way of events and training for the SQL Server community. If you followed my countdown you know some of what I love about PASS. Last year I blogged everyday but I did not do that this week. So, what was different for me? Well, for one I volunteered much more this year than last and I was privileged to speak twice. I spent more time meeting new people and catching up with friends and that was great as well. Enjoy my wrap up from my week.

Tuesday – Leadership Meetings, Welcome Reception, and some Karaoke

Before the event officially kicked off, I joined community leaders from around the world for a series of leadership meetings. First we had a meeting on SQL Saturdays which was an opportunity to see the immense growth of these free training events around the U.S. and throughout the world. What a great opportunity for SQL Server professionals to improve their skills and for those passionate about the community to improve their abilities by leading these events. Many ideas were shared among the team including a panel on how to effectively run a SQL Saturday on a tight budget.

Once that was completed, the Regional Mentors enjoyed a lunch together and an opportunity to share what we do to support the user groups in our regions. I particularly enjoyed the fact that I was able to spend some time with Regional Mentors from Germany, Holland, and Portugal. This highlighted further the international scope and reach of PASS. This was followed by the Chapter Leaders meeting. That meeting was held as a series of round tables that the chapter leaders could move through. I was working at the table focused on leadership with Ryan Adams ( B | T ) from the North Texas SQL Server User Group – NTSSUG. We had a number of good conversations around building leadership teams for user groups and what is needed to have an effectively led user group. Check out the NTSSUG site for the by-laws sample we discussed multiple times.

All of these meetings were followed up with the Welcome Reception, which I made a small portion of as I was trying to drop my backpack at my hotel and work my way back there. After the reception, I headed out to Bush Gardens with a number of others. During that time, Jes Borland ( T ) managed to get a microphone in my hand and I had my first round of karaoke. Yes, I actually did sing and had fun doing it. All-in-all, it was a good time had by all.

Wednesday – SQL Around the World, Microsoft Announcements, Tabular Models, and Magenic Team Dinner

This was the true kick off to the event. For many, they looked at the key note as the kick off. Before that even began, I was working in the Community Zone encouraging people to participate in the SQL Around the World community activity. It was a great game. You needed to find 10 people from 10 different countries and find out something interesting about them or their country. I found a dancer and someone who had ridden a cheetah as a kid. I also surprised someone from the Czech Republic when she mentioned her home town only to have me let her know I had been to her home town many years ago. It was a fun conversation. If you did this and have other cool stories let me know. It was amazing as well over 50 different countries were represented at PASS.

Next, Ted Kummert had the first keynote session of the day. His keynote was filled with announcements concerning SQL Server including the following:

  • Hekaton: the project code name for a new in-memory OLTP engine
  • The Columnstore Index will be updateable
  • Next version of PDW will be out in H1 2013
  • Polybase: allows you to query across multiple types of data sources such as SQL Server and Hadoop with T-SQL
  • DAX Queries will be able to query SSAS Cubes

He also highlighted some recent announcements related to the SQL Server stack:

  • Microsoft HDInsight Server CTP: Hadoop for Windows Server
  • Windows Azure HDInsight Service Preview: Hadoop for Azure
  • Power View and PowerPivot fully implemented in Excel 2013

After the keynote, I hit a session on BigData and Hive which was put on by SQL CAT and very informative. My big takeaway was to use EXTERNAL tables not INTERNAL tables when working with Hive. I then went to do final prep for my Tabular Model session. In this session, “Building a Tabular Model Database”, I present on what tabular and in-memory is, and then proceed to open up a Visual Studio project and create a database. I think it went well and the attendees seemed to enjoy the upbeat nature for an end of day session. The night wrapped up with dinner with the Magenic team (7 of us). Good chance to grow relationships across offices from around the country.

Thursday – Community Zone and DAX as a Query Language

Thursday was a fairly low key day for me. Once again I spent time in the Community Zone. I had the opportunity to talk with a few people on creating a user group in their area. As always, I like to see people interested in growing their local community.

I also attended Alberto Ferrari’s session on DAX. I think the biggest surprise to me was that you can now query DAX directly from SSMS. I am not sure that I am convinced that it is a full query language yet, but it is definitely closer. The key to it all is the EVALUATE expression which allows you to create the DAX query ironically in the MDX window. Here is just a taste of DAX as a query:

EVALUATE
    ‘DimCurrency’
ORDER BY

    ‘DimCurrency’[CurrencyAlternateKey]

What I found interesting is that you can create columns, build measures, and perform many other operations against the tabular model using DAX. In the end, it will not increase the memory used as storage as it is all calculated. Look for some more on this in later blog posts as I delve more into the in-memory storage and usage when working with DAX.

Friday – More Community Zone, HDInsight, Paul White, and Window Functions

Last day. I spent more time in the Zone. I really did enjoy my time there as I continued to meet more people. I was even present when a contract was completed for the Shanghai user group. Very cool indeed. I then attended a session on HDInsight by Mike Flasco from Microsoft. This is very cool stuff as you can create simple Hadoop cluster on your desktop to test the technology. Microsoft and Hortonworks have done a great job of bringing Hadoop data into the Microsoft stack.

On my way to present my final session of the day and the conference, I stopped in for the second half of Allen White’s ( B ) optimization presentation. In a word (or two), mind-blowing! Wow, who knew that the optimizer did all those things? I was highly impressed and think he should look at a precon on the subject next year. Unlike some three hour presentations, he could have went longer as he was not stretching his content out. Nice work Paul. So, I got to follow that with a presentation on Window Functions in T-SQL. For the second time, I had the last slot of the last day. I think this presentation went well even though we were all worn out from a content-filled week. It was fun to try some ideas from the audience in the demos. That always makes for a more interesting demo. I will be doing a follow up post on what I learned from some of the attendees on the subject as well, proving once again this is a user community event. We all have something to contribute! (If you attended this session, you will find links to the blogs on the subject here.)

What’s Next?

Coming in April is the new Business Analytics conference in Chicago followed by the PASS Summit in Charlotte, North Carolina. Of course, your local user groups will continue to meet with regional SQL Saturdays sprinkled throughout the year as well. How will you participate and contribute in 2013? We look forward to seeing you all again, soon.





Oracle Tips for MSBI Devs #4: Executing an Oracle Stored Proc with IN Parameters in SSIS Execute SQL Task

1 05 2012

The first tip I published discussed how to execute an Oracle procedure with no parameters.  In this tip, I will discuss a technique that works with IN parameters in an Oracle stored procedure.

After many unsuccessful attempts at executing a stored procedure with parameters, the following pattern was developed by a one of my peers working in a blended environment, Brian Hanley (T | B).  With his permission, I have documented the solution here for your use.

The Solution:

The solution involves using variables and creating the SQL script to be executed in an Script task.

image

Here is the syntax for the procedure used in the examples:

CREATE OR REPLACE PROCEDURE
SCOTT.spDelete1 (DEPTNUMBER int) IS
BEGIN
DELETE FROM DEPT WHERE DEPTNO=DEPTNUMBER;
END spDelete1;

Create variables

Create variables to hold the name of the procedure, any parameters, and the finished script.  In my example, I only have one parameter in the procedure, so I only use three variables.  If it fits your needs, you can also separate the user/schema name into a separate variable.

image

The variable used for the statement has been set up to use string formatting with C#.

Prepare the Statement Variable

Use the Script task to build the statement variable (SPStatement).  Start by adding the variables to the script task.  Be sure to add the statement variable to the ReadWriteVariables collection.

image

The following image contains the script syntax to use to set up the variable.  As noted above, the C# String.Format function is used to update the statement variable.

public void Main()
{
    // TODO: Add your code here
    Dts.Variables["SPStatement"].Value = String.Format(Dts.Variables["SPStatement"].Value.ToString()
        ,Dts.Variables["StoredProcName"].Value.ToString()
        ,Dts.Variables["SPVar1"].Value.ToString()                
        );

    String msg = "SPStatement: " + Dts.Variables["SPStatement"].Value.ToString();
    Boolean refire = true;

    Dts.Events.FireInformation(0, "SPStatement", msg, String.Empty, 0, ref refire);

    Dts.TaskResult = (int)ScriptResults.Success;
}

Setting up the Execute SQL Task

In the Execute SQL task, you will set the SQLSourceType property to Variable and set the SourceVariable to the name of this statement variable.  In the case of our example, this is the SPStatement variable.

image

Versions:

This tip has been confirmed to work in both SQL Server 2008 R2 Integration Services and SQL Server 2012 Integration Services.  The Oracle version tested with is Oracle 11g.





Oracle Tips for MSBI Devs #3: Choosing Drivers

24 04 2012

When working with Oracle, drivers are truly a pain to get working correctly.  I will discuss my preferred choice and why for the following tools – SSIS, SSAS, and SSRS.

SSIS Drivers

Without much question, you should use the Attunity tools for working with Oracle data in the Data Flow task.  In SSIS 2008, the SSIS Connector is free and can be found here:  http://www.microsoft.com/download/en/search.aspx?q=oracle%20connector.  It includes the connection manager, source component and destination component.  Without a doubt this is the only way to work with Oracle data components in the Data Flow task.  (NOTE: I cannot find the SSIS 2012 equivalent at the moment.  However, Matt Massan’s blog post after PASS Summit 2011 notes more work is being done with Attunity.)  UPDATE: I wrote this prior to a blog post from Matt Massan on support for SSIS 2012 with v2.0 of the Microsoft Connector.  Check out Matt’s update on this: http://blogs.msdn.com/b/mattm/archive/2012/04/04/microsoft-connectors-v2-0-for-oracle-and-teradata-now-available.aspx.

However, this connector does you no good when working with the Execute SQL task.  In SSIS 2008, I use the OLE DB provider from Oracle to create the connection used with Execute SQL Task.  In my work with procedures in my first tip, I used the OLE DB provider with 2012 as well and it worked fine.

SSAS Drivers

When working with SSAS 2008 and, until I know differently, SSAS 2012, I would recommend using the Oracle OLE DB driver from Oracle.  This driver is not the fastest I have seen (third party drivers are marginally faster and the .NET driver is faster as well), but it has provided consistent results for the right price.  Third party drivers will improve the throughput, but not substantially.  The Oracle provided .NET driver is faster as well, but has an unchangeable active query timeout of one hour.  If you have any processing times that exceed this, it will unceremoniously drop the connection.  For these reasons, I have stuck with the OLE DB provider from Oracle which is not necessarily the fastest, but it has been the least painful to work with.

SSRS Drivers

It is with SSRS I have seen mixed results.  Primarily because of the better performance in the Oracle .NET driver.  If you can guarantee that your reports will return their data in under an hour, this seems to be the best option.  However, if you want to manage to a single driver set across all tools, you may find that the management of the OLE DB driver as the only driver makes sense in your organization.

Test, Test, Test

I have given you my experience using the drivers above.  However, you may find value in purchasing a third party driver or you may find a different experience when you implement in your environment.  Be sure to test and understand the implications in maintenance and system cost when choosing different drivers across your solutions.





X on XMLA: iii. Basic DDL Functions in XMLA (Create, Alter, Delete)

10 04 2012

image_thumb_thumb

XMLA can be used to manage the structure of your multidimensional databases.  While many developers use Visual Studio (BIDS), to deploy changes, as systems move to production or need to be more clearly managed, XMLA comes into play.

Some of the most common DDL type uses for XMLA including partition management, deploying changes, and promoting between environments.  In all of these cases, objects within Analysis Services need to be created, altered or deleted.

Before we dig into the details, I wanted to call out that the Execute method will be used and, keep in mind, that the full syntax is not required when using SQL Server Management Studio (SSMS).  (See X on XMLA: ii. Basic Structure of XMLA for more details.)  Furthermore, in SSMS, you can generate Create, Alter, and Delete XMLA by right clicking on the deployed object and choosing the Script To option.  If you have questions about syntax, definitely use this function to discover more about the syntax and the object you are working with.

The following sections are using the Sales Channel dimension from the Adventure Works sample database to illustrate the command syntax.  (This sample database is available on CodePlex).

Create

The Create command is used to create new objects in the database.  To child elements that are required are ParentObject and ObjectDefinition.  The ParentObject specifies the list of objects that are the parent.  In our illustration below, the parent object is the database.  This is true because the dimensions in Analysis Services belong to the database.  However, if we were creating a partition, the parent would be structured, database then cube then measure group.  This is true because the partition belongs to a specific measure group.  The order of the parent objects matter as they are read top to bottom in the XMLA.

image

Alter

The Alter command is used to modify existing objects in the database.  The Alter command uses two child elements, Object and ObjectDefinition.  The Object defines the object that is targeted.  This, like the ParentObject, is structured from top to bottom with the last object in the list as the object being targeted for alteration.  12820442882053136041exclamation_mark-md[1]

The ObjectDefinition specifies the changes to make to the object.  The changes to the structure MUST include all of the parts of the structure you want to keep the same.  I cannot emphasize how important it is that you keep this in mind.  You cannot send a simple change via XMLA.  You must send the new version in its entirety.  This is true at all levels including the database.  Where this commonly creates some issues is in the user objects at the database level and the partitions in the measure groups.  If you have specific users at the database level that are different between environment, which you should have, you need to update the alter for each environment.  If you are modifying a measure group that you have added partitions to, you must make sure these new partitions are in your script as well.

image

Delete

The Delete command is used to remove existing objects in the database.  It only has one child element – Object.  This is the simplest of the commands here and only needs the proper object definition. Like the ParentObject and Object elements above, you need to have the proper order to delete the correct object.  Use caution as this command will contain the database through the targeted object.  If you execute the command at the database level you will delete your database.

image

Results

The execution results are not always clear.  When the query is successful you will see the following:

image

Not really conclusive.  Error messages are also returned in an XML format and often contain some relevant information as to why the script failed.  Be sure to read it closely as some times multiple errors are returned and the root cause may not be plainly evident.





PASS for Today (and Tomorrow)

20 03 2012

I justPASSMNLogo wanted to plug two events that are free SQL Server training.

First, Thomas LaRock (@sqlrockstar) and Jason Strate (@stratesql) are teaming up to bring you “Choose Your Own Adventure – Performance Tuning”. Join us at the Microsoft Technology Center in Edina, MN or online for this great adventure. This adventure starts at 3:00 PM CDT. More details can be found at http://minnesota.sqlpass.org.

Is that all?, you ask. No. Starting tonight at 0:00 GMT (7:00 PM CDT) is 24 Hours of PASS. Check out the awesome roster of speakers24HOP_Speaker including the likes of Denny Lee from SQLCAT, Marco Russo, and Dejan Sarka. These sessions run through the night and will be close captioned in 15 languages, making it a truly international event.  Oh, did I also mention, I will be speaking at this as well.

Take advantage of these free training opportunities by leaders within the SQL Community.  We all look forward to seeing you at both or either of these events.





X on XMLA: i. Using XMLA

26 01 2012

image_thumb

In my previous blog, I introduced the “X on XMLA” series.  In this blog, I will do more of an introduction to XMLA by discussing how I have used XMLA in my experience with SSAS.

What is XMLA? XML for Analysis or XMLA is the API used to interact with Analysis Services and other similar multidimensional database servers.  The goal was to build out a standard API to meet this goal and it has been widely accepted by the leading OLAP vendors.  You can find out more about the standard at XMLAforAnalysis.  This is a good resource for generic XMLA topics including the current specifications document and a newsfeed.

A key point about XMLA is that it is, as its name states, XML.  The API uses SOAP protocols to communicate with the servers.  (Refer to the specification above if you would like more details.)  I like this because XML is easy to read.  While it can be annoyingly verbose at times, you are able to easily identify what you are doing which reduces the learning curve.

Uses of XMLA.  Now that you have the cursory introduction to XMLA, what is it commonly used for?  I have primarily used XMLA in the areas of loading data and deploying objects.  I use SQL Server Management Studio (SSMS) extensively to interact with my SSAS databases so I do not use the interrogation or query functions within XMLA that often.  However, you should be familiar with all of these uses.

Deploy.  I use XMLA to deploy new objects and alter existing objects on a regular basis.  In particular, this is a standard way to test deployment and to script deployment operations for production releases.  XMLA supports Create, Alter and Delete commands for every object in SSAS.  In SSMS you can generate any of these scripts using the short cut menus as noted in the image below.  You should become familiar with using XMLA as a deployment method as it is an simple way to promote changes between environments.

XMLA Create Script Menu

Load.  The Process commands are used to load data in a variety of methods including full replace or reload, incremental load, and updates.  XMLA also supports commands which will drop and recreate indexes and aggregations if that is required.  SSAS exposes additional processing options through XMLA that are not available in SSMS or Business Intelligence Development Studio (BIDS).  The script below is a basic example of processing a dimension using XMLA.  (Note:  ->> designates a line break.)


<Process xmlns=http://schemas.microsoft.com/  ->&gt;
analysisservices/2003/engine
>
      <Object>
        <DatabaseID>Adventure Works DW 2008R2 SE</DatabaseID>
        <DimensionID>Dim Time</DimensionID>
      </Object>
      <Type>ProcessFull</Type>
  </Process>


Interrogate.  While I have not used XMLA for this activity as much as the two activities noted above, I will be discussing this in detail in one of the later blogs.  The Discover method allows you to retrieve information about the SSAS instance and related database objects.  For instance, you can use XMLA to return a list of available databases on the target server.

Query. XMLA can also retrieve data from the cube.  It is able to accomplish this by executing MDX against the target.  I do not usually have a need to query from XMLA, but I will discuss this usage in this series as well.

Hopefully this gives you an idea of how powerful XMLA is and why it is a necessary tool for any BI developer working with multidimensional database servers.








Follow

Get every new post delivered to your Inbox.

Join 812 other followers

%d bloggers like this: