O, There’s the Data: Using OData in SSIS

23 07 2012

image

The Open Data Protocol (OData) is an open specification created Microsoft to enable exposing data in a standard way from a variety of sources.  OData is natively supported in many of Microsoft’s products including PowerPivot, Excel 2013, SQL Server 2012 Analysis Services Tabular Model, Windows Communication Foundation (WCF), and Entity Framework to name a few.  Furthermore, Microsoft uses OData to expose data feeds from the Windows Azure Data Marketplace as well.

I pursued adding an OData source to SSIS as a result of Mark Souza’s presentation at the Minnesota SQL Server User Group in April 2012.  I posed a question about easier interaction with Oracle.  He mentioned that OData would be a good way to solve that issue.  This led me to put together a presentation which I delivered for PASSMN in July 2012 entitled O, There’s My Data: The Open Data Protocol.  At that presentation, I reviewed the “pain and agony” of a data pro putting together a data feed using Entity Framework in C# and WCF to expose it.  For the most part, with the help of .NET pros at Magenic including Dave Stienessen ( B ) and Sergey Barskiy ( B ), I was able to create my first entity model and expose it using WCF.  After that I worked on how to consume the feed without purchasing a 3rd party tool.  Here is the rest of the story.

Using ATOM as Shown in a Channel 9 Exercise

While looking for solutions that allowed me to implement an OData feed into an SSIS package, I came across a Hands on Lab on Channel 9.  While the focus was on Reporting Services, I was able to use the steps to create a package that would read a feed and make the data available to the ETL process.  In a nutshell, this exercise involved three tasks – creating an ATOM file, processing the ATOM file and loading the data using an HTTP connection manager pointed to the OData feed.  While you are creating this package, you should run each step after you have created it in order to use the files created in the following steps.

image

Task 1 – Create ATOM File (Script Task)

In the Main method,  I used the following code which was copied and adapted from the Channel 9 exercise. (NOTE: The code for this script has an error.  The object declaration should be condensed to one line to work properly.)

public void Main()
 {
 // Get the unmanaged connection
 object nativeObject = Dts.Connections["TestSvc"].AcquireConnection(null);
    // Create a new HTTP client connection
 HttpClientConnection connection = new HttpClientConnection(nativeObject);
    // Save the file from the connection manager to the local path specified
 string filename = "C:\\Source\\SSIS 2012 Projects\\ODataIntegration\\Departments.atom";
 connection.DownloadFile(filename, true);
Dts.TaskResult = (int)ScriptResults.Success;

}

This task will create an ATOM file that will be used in the next step to retrieve the data.

Task 2 – Process ATOM File (XML Task)

This task will use the new ATOM file to create an XML file with the data.  It uses the XSLT operation type pointing to the File Connection Manager created in the previous step as the source.  This will result in another File Connection Manager to support the destination XML file with the data.  Finally, in the exercise as second operand set of XML is used to clear unsupported headers.  Admittedly, I just copied this straight from the example and still am not sure of the details of what it does.

Here is a look at the XML Task Editor so you can see the settings I used.

image

Here is the code from the Channel 9 exercise used in the SecondOperand property:

<?xml version="1.0" encoding="utf-8" ?>
  <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="no" />
 <xsl:template match="/|comment()|processing-instruction()">
 <xsl:copy>
 <xsl:apply-templates />
  </xsl:copy>
  </xsl:template>
  <xsl:template match="*">
  <xsl:element name="{local-name()}">
  <xsl:apply-templates select="@*|node()" /> </xsl:element>
  </xsl:template>
  <xsl:template match="@*">
  <xsl:attribute name="{local-name()}">
  <xsl:value-of select="." />
  </xsl:attribute>
  </xsl:template>
  </xsl:stylesheet> 

Task 3 – Load Data (Data Flow Task)

The final task is a straightforward data load using the XML Source Component pointed at the file XML file I created.  Then I created a matching table in a database which I used as the destination.image

Wrap Up on the ATOM Feed Option

This will work with SSIS 2008 and SSIS 2012.  I tested most of the work in 2012, but the code in the illustration supports 2008.  This option does require that the package write at least two files to the server to work correctly.  In some cases, this will not work in enterprise environments as the data will now rest on the server for a period of time or the admins do not want files created on the server.

Using a Custom SSIS Source to Get the Data

NOTE: This is the preferred solution, but is not available in SSIS 2008 which uses the .NET 2.0 Framework.  This solution requires the .NET 3.5 Framework.

This version uses a custom SSIS source to connect to the OData feed and populate the data flow pipeline.  I did not find this option illustrated anywhere and used help from the Dave and Sergey to put this together.  I spent many hours trying to solve this issue and at the end of the day, it is fairly simple.  So, hopefully, this will save you some time as well.

This package only has one workflow task – a data flow task which contains the rest of the code.  In the data flow task, I have a Script Component implemented as a source and a Row Count with a data viewer on the pipeline to check results.

image

This was my first experience creating a custom source.  I used a post from SSIS Talk – SSIS Using a Script Component as a Source as a reference.  If you need help creating your first script source check it out.

Be sure to set your outputs prior to creating the script or you will not have them available to map to in the code.  You also need to add the HTTP Connection Manager you are using to point to your OData feed.

Add References, Using Statements, and Declarations

Once you have the basics set up, you need to add some references including the targeted data service and System.Data.Services.Client.  These are the key references for the code we are implementing.

image

Once you have these references you will need to add the following to the Using statements to the Namespaces region.

using System.Data.Services.Client;
 using SC_68e99fec2dce4cd794450383662f6ac7.TestSvc;

The SC_ reference is the internal name for your script component and will be different from mine, although it will likely be in the same format.

Next, you need to add the following declarations in the ScriptMain class as shown here.

public class ScriptMain : UserComponent
 {
private Uri svcUri = new Uri  (http://localhost/ODataSQL/TestSvc1.svc);
 private AdventureWorksEntities context;

The AdventureWorksEntities is from the service reference I created. You will need to know the context name for the service reference you are using.

The Working Code: Using DataServiceQuery

In the CreateNewOutputRows method in the SSIS script you will add code that runs a DataServiceQuery which adds the data to the data flow pipeline. In my case, my Output was called Departments and created the buffer reference you see in the code.  It has the output fields I defined for my source.  Here is the code I used to implement the solution.

public override void CreateNewOutputRows()
 {
 context = new AdventureWorksEntities(svcUri);
 DataServiceQuery<Department> dept = context.Departments;
    foreach (Department d in dept)
 {
 DepartmentsBuffer.AddRow();
        DepartmentsBuffer.DeptID = d.DepartmentID;
 DepartmentsBuffer.DeptName = d.Name;
 DepartmentsBuffer.GroupName = d.GroupName;
 }

This will query the service and return the rows. Alas, that is all it really took to solve this problem.  While this solution does not work in SSIS 2008, if you are planning to use a lot of OData, I would recommend using this as another reason to upgrade to SQL Server 2012.

SSIS Needs an OData Source Component

What I found interesting is that Microsoft does not have a native method to load OData feeds into the Data Flow Task in SSIS.  I have since created an Connect item to see if we can get this added.  Vote here if you agree.

Resources Used throughout the Process

Connecting to Windows Azure SQL Database Through WCF

Loading Data from an ATOM Data Feed into SQL Server

SSIS – Using a Script Component as a Source

DataServiceContext Class

Chris Woodruff – 31 Days of OData Blog Series

PASSMN Presentation – July 17, 2012

Consuming SharePoint Lists via OData and SSIS – Uses Linq

About these ads

Actions

Information

One response

24 07 2012
Steve

In an ironic twist, Tim Laqua (http://twitter.com/tlaqua and http://timlaqua.com) posted a similar solution with SharePoint and Linq earlier this month. I have added his blog post to my list of references for those looking to use Linq or SharePoint. Thanks Tim.




Follow

Get every new post delivered to your Inbox.

Join 811 other followers

%d bloggers like this: