Oracle Tips for MSBI Devs #5: Working with Oracle on Windows Azure

23 02 2014

As you have likely noticed in my series, Oracle Tips for MSBI Devs, I have done a lot of work with Oracle through the years while delivering BI solutions. One of the pain points of working with Oracle in development is setting up an Oracle development server. Even though I have installed Oracle a number of times, it never seems to be an easy process.

So, since I need to use Oracle as a data source for some SSIS development, I decided to try out the Oracle Virtual Machine template in Windows Azure. I will walk through the setup process here.

Setting Up the VM

From the Windows Azure portal, select the Virtual Machines tab then “Create a Virtual Machine”. This will open up the variety of options available to create the VM. Select the FROM GALLERY option which will open another dialog.


On the next screen, you pick the edition of Oracle you want to use for the VM. (NOTE: At the moment, Oracle images are in preview. Microsoft recently announced that Oracle VMs will be available on March 12. You can find more information here.)


I will be using the Oracle 11g R2 Standard Edition on Windows Server 2008 R2. The next step is to name the VM and pick its size. The minimum size for this image is Small, which is what I used. I then completed the setup, including setting up the endpoints and creating a new user.

I had originally tried to use Oracle 12c, but significant changes have been made to support multitenancy, which makes setup considerably more tedious, with very few good examples available on the web. Most of the advice given by Oracle pros was to “Read the _____ Manual!” While “sensible”, I just needed a simple dev environment. This is one of the significant advantages of working with SQL Server: community help is abundant and usually pleasant. Microsoft did recently publish a document for setting up the Oracle 12c VM, and I used it to work through some of the setup instructions below.

Once the initialization was complete, I used the Connect button on the Azure dashboard to open an RDP connection to the VM. One thing to keep in mind: be sure to keep track of the user name and password you created. This is your admin account, and you will need it to log in to the VM. Now you have a running VM. At this point, I went and found the Oracle tools that I typically use and pinned them to the task bar.

Creating and Loading an Oracle Schema

Because I always forget some of these steps, and I really don’t want to read the manual, I have listed the steps I used to create a simple set of data. This is not good enough for production, but it is a starting point.

Step 1: Create a Listener. This is required before you can create a database. To do this open the Oracle Net Configuration Assistant. From here you can create your first listener. I left the default settings for the setup.

Step 2: Create the database. (This is the equivalent of an instance in SQL Server.) I used the Database Configuration Assistant for Oracle Database to create my first database on the server. This can be found in the Oracle home directory on the Start menu. I chose the General Purpose template for my database. Most of the steps make some sense. I did choose to add the sample schemas, as this is the easiest way to verify I can connect and work with the data. After all of this, the database will be created based on your choices.

Step 3: Connect with SQL*Plus. Using SQL*Plus, I connected to the SYSTEM schema. The user name in this case is “SYSTEM”. Now we are ready to create a user and schema of our own.
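If the database you created is named ORACLEDW (the name I use in the TNS entry later in this post; substitute whatever name you chose in the previous step), connecting from a command prompt on the VM looks something like this:

sqlplus SYSTEM/<<password you set during database creation>>@ORACLEDW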

Step 4: Create a new user and schema. (This is similar to a SQL Server database, not a SQL Server schema.) This will give you a location to create tables for loading data in the next steps. In keeping with typical Oracle support, you can read about how to do this here: http://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_8003.htm#i2065278. Or I can give you a good starting script and save you some time.

CREATE USER someusername
IDENTIFIED BY <<add password here>>
DEFAULT TABLESPACE example
QUOTA 10M ON example
TEMPORARY TABLESPACE temp
QUOTA 5M ON system;

You’re welcome! This will create a user (the Oracle equivalent of a database in this comparison) using existing tablespaces. This is NOT a production-level script, and it is barely good enough for development. But in my case, I am using Oracle as a source and don’t plan to do much development or work on it, so it meets my needs. If you need more insight, I guess you will need to read the documentation.
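One note if you plan to log in as this new user rather than continuing with SYSTEM as I did: the script above does not grant any privileges, so you would also need something along these lines (a minimal sketch, not a hardened set of permissions):

GRANT CREATE SESSION, CREATE TABLE TO someusername;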

Step 5: Create a table and add rows. I continued to use the SYSTEM login and created a couple of simple tables within my new schema. I then used simple INSERT INTO statements to add data.
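As an illustration only (the table name, columns, and values here are placeholders, not the exact tables I created):

CREATE TABLE someusername.DEPT_TEST
(
  DEPT_ID    NUMBER(4)     NOT NULL,
  DEPT_NAME  VARCHAR2(50)
);

INSERT INTO someusername.DEPT_TEST (DEPT_ID, DEPT_NAME) VALUES (10, 'Accounting');
INSERT INTO someusername.DEPT_TEST (DEPT_ID, DEPT_NAME) VALUES (20, 'Research');
COMMIT;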

Now you have some basic data to work with to test connectivity with SSIS or SSAS.

Making the Oracle Database Accessible

In order to access your Oracle database on the VM, you need to open the port. You do this by going to the Azure portal, selecting the Virtual Machines tab, and then going to the Endpoints tab for your VM. You may recall that when you created the VM, you were asked about the Remote Desktop and PowerShell ports. Here are the steps to create the endpoint to support Oracle.

  1. Click Add to open the Add Endpoint dialog.
  2. On the first page, leave the default which will add a stand-alone endpoint.
  3. On the second page you need to add a name (I used “Oracle”), select the TCP protocol, and put port 1521 in both the private and public port textboxes.

Once completed you should see the new endpoint in the list of available endpoints as shown below.

[Screenshot: the new Oracle endpoint on port 1521 listed in the Endpoints tab]

Connecting SSIS to that Oracle Database

Now that we have data in the db, it is time to create the connection to SSIS and load data and run queries.

The first thing I needed to do was load the client tools. I used the newer Oracle Data Access Components (ODTwithODAC12012). Once that was loaded, I added the following entry to my TNSNames.ora file (look for it in a directory like the following: c:\app\<username>\product\12.1.0\client_1\Network\Admin):

ORACLEDW =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = <servername>.cloudapp.net)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = ORACLEDW)
    )
  )

The key parts to get right are the HOST, PORT, and SERVICE_NAME.
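A quick sanity check at this point (assuming the Oracle client utilities are on your path) is to run tnsping ORACLEDW from a command prompt on the SSIS machine; it should resolve the alias and reach the Azure endpoint you opened earlier.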

Once TNS was in place, I was able to create an ODP.NET connection to the database and proceed to load the data.
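For reference, the resulting ODP.NET connection string looks something like the following (a sketch with placeholder credentials, not the exact string from my package): Data Source=ORACLEDW;User Id=someusername;Password=<<your password>>;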

I know that some of this has been simplistic but it is great that I don’t have to install Oracle myself. This functionality makes Azure even more appealing as a hosting solution.





O, There’s the Data: Using OData in SSIS

23 07 2012


The Open Data Protocol (OData) is an open specification created by Microsoft to enable exposing data in a standard way from a variety of sources.  OData is natively supported in many of Microsoft’s products, including PowerPivot, Excel 2013, the SQL Server 2012 Analysis Services Tabular Model, Windows Communication Foundation (WCF), and Entity Framework, to name a few.  Furthermore, Microsoft uses OData to expose data feeds from the Windows Azure Data Marketplace as well.

I pursued adding an OData source to SSIS as a result of Mark Souza’s presentation at the Minnesota SQL Server User Group in April 2012.  I posed a question about easier interaction with Oracle, and he mentioned that OData would be a good way to solve that issue.  This led me to put together a presentation, which I delivered for PASSMN in July 2012, entitled O, There’s My Data: The Open Data Protocol.  At that presentation, I reviewed the “pain and agony” of a data pro putting together a data feed using Entity Framework in C# and WCF to expose it.  For the most part, with the help of .NET pros at Magenic including Dave Stienessen ( B ) and Sergey Barskiy ( B ), I was able to create my first entity model and expose it using WCF.  After that I worked on how to consume the feed without purchasing a 3rd party tool.  Here is the rest of the story.

Using ATOM as Shown in a Channel 9 Exercise

While looking for solutions that would let me bring an OData feed into an SSIS package, I came across a Hands On Lab on Channel 9.  While the focus was on Reporting Services, I was able to use the steps to create a package that would read a feed and make the data available to the ETL process.  In a nutshell, this exercise involved three tasks – creating an ATOM file (using an HTTP connection manager pointed to the OData feed), processing the ATOM file, and loading the data.  While you are creating this package, you should run each step after you have created it so that the files it produces are available to the following steps.


Task 1 – Create ATOM File (Script Task)

In the Main method, I used the following code, which was copied and adapted from the Channel 9 exercise. (NOTE: The code for this script in the exercise has an error. The object declaration should be condensed to one line, as it is below, to work properly.)

public void Main()
{
    // Get the unmanaged connection
    object nativeObject = Dts.Connections["TestSvc"].AcquireConnection(null);

    // Create a new HTTP client connection
    HttpClientConnection connection = new HttpClientConnection(nativeObject);

    // Save the file from the connection manager to the local path specified
    string filename = "C:\\Source\\SSIS 2012 Projects\\ODataIntegration\\Departments.atom";
    connection.DownloadFile(filename, true);

    Dts.TaskResult = (int)ScriptResults.Success;
}

This task will create an ATOM file that will be used in the next step to retrieve the data.

Task 2 – Process ATOM File (XML Task)

This task will use the new ATOM file to create an XML file with the data.  It uses the XSLT operation type, pointing to the File Connection Manager created in the previous step as the source.  This results in another File Connection Manager to support the destination XML file that will hold the data.  Finally, as in the exercise, a second operand set of XML is used to clear unsupported headers.  Admittedly, I just copied this straight from the example and am still not sure of the details of what it does.

Here is a look at the XML Task Editor so you can see the settings I used.

[Screenshot: XML Task Editor settings]

Here is the code from the Channel 9 exercise used in the SecondOperand property:

<?xml version="1.0" encoding="utf-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="no" />
  <xsl:template match="/|comment()|processing-instruction()">
    <xsl:copy>
      <xsl:apply-templates />
    </xsl:copy>
  </xsl:template>
  <xsl:template match="*">
    <xsl:element name="{local-name()}">
      <xsl:apply-templates select="@*|node()" />
    </xsl:element>
  </xsl:template>
  <xsl:template match="@*">
    <xsl:attribute name="{local-name()}">
      <xsl:value-of select="." />
    </xsl:attribute>
  </xsl:template>
</xsl:stylesheet>

Task 3 – Load Data (Data Flow Task)

The final task is a straightforward data load using the XML Source Component pointed at the XML file I created.  Then I created a matching table in a database, which I used as the destination.

Wrap Up on the ATOM Feed Option

This will work with SSIS 2008 and SSIS 2012.  I tested most of the work in 2012, but the code in the illustration supports 2008.  This option does require that the package write at least two files to the server to work correctly.  In some enterprise environments this will not be acceptable, either because the data will now rest on the server for a period of time or because the admins do not want files created on the server.

Using a Custom SSIS Source to Get the Data

NOTE: This is the preferred solution, but it is not available in SSIS 2008, which uses the .NET 2.0 Framework.  This solution requires the .NET 3.5 Framework.

This version uses a custom SSIS source to connect to the OData feed and populate the data flow pipeline.  I did not find this option illustrated anywhere and used help from Dave and Sergey to put this together.  I spent many hours trying to solve this issue and, at the end of the day, it is fairly simple.  So, hopefully, this will save you some time as well.

This package only has one task in the control flow – a data flow task, which contains the rest of the code.  In the data flow task, I have a Script Component implemented as a source and a Row Count with a data viewer on the pipeline to check results.


This was my first experience creating a custom source.  I used a post from SSIS Talk – SSIS Using a Script Component as a Source as a reference.  If you need help creating your first script source check it out.

Be sure to set your outputs prior to creating the script or you will not have them available to map to in the code.  You also need to add the HTTP Connection Manager you are using to point to your OData feed.

Add References, Using Statements, and Declarations

Once you have the basics set up, you need to add some references including the targeted data service and System.Data.Services.Client.  These are the key references for the code we are implementing.


Once you have these references, you will need to add the following Using statements to the Namespaces region.

using System.Data.Services.Client;
using SC_68e99fec2dce4cd794450383662f6ac7.TestSvc;

The SC_ reference is the internal name for your script component and will be different from mine, although it will likely be in the same format.

Next, you need to add the following declarations in the ScriptMain class as shown here.

public class ScriptMain : UserComponent
{
    private Uri svcUri = new Uri("http://localhost/ODataSQL/TestSvc1.svc");
    private AdventureWorksEntities context;

The AdventureWorksEntities is from the service reference I created. You will need to know the context name for the service reference you are using.

The Working Code: Using DataServiceQuery

In the CreateNewOutputRows method in the SSIS script, you will add code that runs a DataServiceQuery, which adds the data to the data flow pipeline. In my case, my output was called Departments, which created the buffer reference you see in the code.  It has the output fields I defined for my source.  Here is the code I used to implement the solution.

public override void CreateNewOutputRows()
{
    context = new AdventureWorksEntities(svcUri);
    DataServiceQuery<Department> dept = context.Departments;

    foreach (Department d in dept)
    {
        DepartmentsBuffer.AddRow();
        DepartmentsBuffer.DeptID = d.DepartmentID;
        DepartmentsBuffer.DeptName = d.Name;
        DepartmentsBuffer.GroupName = d.GroupName;
    }
}

This will query the service and return the rows. And that is all it really took to solve this problem.  While this solution does not work in SSIS 2008, if you are planning to use a lot of OData, I would recommend using this as another reason to upgrade to SQL Server 2012.

SSIS Needs an OData Source Component

What I found interesting is that Microsoft does not have a native method to load OData feeds into the Data Flow Task in SSIS.  I have since created a Connect item to see if we can get this added.  Vote here if you agree.

Resources Used throughout the Process

Connecting to Windows Azure SQL Database Through WCF

Loading Data from an ATOM Data Feed into SQL Server

SSIS – Using a Script Component as a Source

DataServiceContext Class

Chris Woodruff – 31 Days of OData Blog Series

PASSMN Presentation – July 17, 2012

Consuming SharePoint Lists via OData and SSIS – Uses Linq





Oracle Tips for MSBI Devs #4: Executing an Oracle Stored Proc with IN Parameters in SSIS Execute SQL Task

1 05 2012

The first tip I published discussed how to execute an Oracle procedure with no parameters.  In this tip, I will discuss a technique that works with IN parameters in an Oracle stored procedure.

After many unsuccessful attempts at executing a stored procedure with parameters, the following pattern was developed by one of my peers working in a blended environment, Brian Hanley (T | B).  With his permission, I have documented the solution here for your use.

The Solution:

The solution involves using variables and creating the SQL script to be executed in a Script task.


Here is the syntax for the procedure used in the examples:

CREATE OR REPLACE PROCEDURE
SCOTT.spDelete1 (DEPTNUMBER int) IS
BEGIN
DELETE FROM DEPT WHERE DEPTNO=DEPTNUMBER;
END spDelete1;

Create variables

Create variables to hold the name of the procedure, any parameters, and the finished script.  In my example, I only have one parameter in the procedure, so I only use three variables.  If it fits your needs, you can also separate the user/schema name into a separate variable.


The variable used for the statement has been set up to use string formatting with C#.
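To make the pattern concrete, a workable starting value for the SPStatement variable is an anonymous PL/SQL block written as a C# format string, something like the following (this is my own hedged example, not necessarily the exact value used in the original package):

BEGIN {0}({1}); END;

After the Script task below runs String.Format against it, {0} is replaced with the contents of StoredProcName (for example, SCOTT.spDelete1) and {1} with SPVar1, producing a statement such as BEGIN SCOTT.spDelete1(10); END;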

Prepare the Statement Variable

Use the Script task to build the statement variable (SPStatement).  Start by adding the variables to the script task.  Be sure to add the statement variable to the ReadWriteVariables collection.


The following script contains the syntax used to set up the variable.  As noted above, the C# String.Format function is used to update the statement variable.

public void Main()
{
    // TODO: Add your code here
    Dts.Variables["SPStatement"].Value = String.Format(Dts.Variables["SPStatement"].Value.ToString()
        ,Dts.Variables["StoredProcName"].Value.ToString()
        ,Dts.Variables["SPVar1"].Value.ToString()                
        );

    String msg = "SPStatement: " + Dts.Variables["SPStatement"].Value.ToString();
    Boolean refire = true;

    Dts.Events.FireInformation(0, "SPStatement", msg, String.Empty, 0, ref refire);

    Dts.TaskResult = (int)ScriptResults.Success;
}

Setting up the Execute SQL Task

In the Execute SQL task, you will set the SQLSourceType property to Variable and set the SourceVariable to the name of this statement variable.  In the case of our example, this is the SPStatement variable.


Versions:

This tip has been confirmed to work in both SQL Server 2008 R2 Integration Services and SQL Server 2012 Integration Services.  The Oracle version tested with is Oracle 11g.





Why I am excited about SQL Server 2012 (Part 2)

28 03 2012

Earlier this month I published a blog entry on this same subject.  In honor of the local Minneapolis launch event, I decided to expand the list.  You can find five more reasons I am excited out on Magenic’s blog.

Here is the link and enjoy SQL Server 2012.

http://magenic.com/Blog/WhyIAmExcitedaboutSQLServer2012Part2.aspx





Oracle Tips for MSBI Devs #1: Executing an Oracle Stored Proc with No Parameters in SSIS Execute SQL Task

4 03 2012

Originally posted: 3-4-2012

Welcome to a new blog series of tips related to Oracle with a focus on helping those of us who are delivering Microsoft Business Intelligence solutions using Oracle in some way.

The Solution:

In an Execute SQL Task the proper syntax for executing an Oracle stored procedure with no parameters is:

{CALL <schema name>.<stored procedure name>}


This works with the OLEDB driver from Oracle. (Do not use the Microsoft Oracle OLEDB driver as it is being deprecated.)


The Situation:

The following list highlights the most common errors when trying to get an Oracle procedure to execute (a combined example follows the list).

  • EXEC or EXECUTE – as SQL Server pros we often use EXEC to make the procedure call. When working with Oracle, you need to use CALL.
  • Forgetting the schema name – when using Oracle, you should specify the schema name when calling the procedure. In some cases, you may pick up the schema from the user, but for a consistent pattern and result, you should explicitly specify the schema name.
  • No curly braces – this is the one I did not know, and a coworker tracked down some examples of this usage. Enclosing the call in curly braces appears to let the driver know that it is a procedure call and handle it appropriately.
  • Adding a semi-colon – most work with Oracle statements ends with a semi-colon to conclude the statement. With this syntax, you will get an error if you conclude your statement with a semi-colon.
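Putting those points together for a hypothetical procedure (SCOTT.spMyProc is a placeholder name, not a procedure from this tip):

EXEC SCOTT.spMyProc;   -- fails: EXEC and the trailing semi-colon are SQL Server habits
{CALL SCOTT.spMyProc}  -- works with the Oracle OLEDB provider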

Versions:

This tip has been confirmed to work in both SQL Server 2008 R2 Integration Services and SQL Server 2012 Integration Services. The Oracle version tested with is Oracle 11g.

Note on Parameters:

This syntax has not worked consistently with parameters. I will post a followup tip for handling parameters at a later time.





SQL PASS Summit–Day 4 (BI Sessions)

16 10 2011

Here is the summary of the sessions I attended during my 4th day at the Summit. (Technically, this is the second day of the Summit.)  I went to four sessions, all of which were in the BI tracks.

Top 10 Things You Didn’t Know about SSIS by Matt Masson, Microsoft (@mattmasson)

Be sure to check out the Tech Ed Video with CTP 3 enhancements.

Two new concepts in SQL Server 2012 Integration Services: Projects and Parameters.  SSIS projects now compile down to a single ispac file (pronounced “ice pack”).  This will allow you to deploy the packages in a project as a unit.  Parameters have been added to better support configurations, and they are exposed by the package execution UIs.

The SSIS Catalog.  A SQL Server user database called “SSISDB” is where the catalog is stored.  This provides a central place for configuration, including the new parameters and connection managers.  Security is also much improved, as encryption is now managed at the server level.  Logs and reporting will also be recorded in the catalog.

Top 10 List:

1. Change Data Capture and 2. ODBC Support through Attunity partnership.

CDC for Oracle databases will result in better integration, similar to SQL Server CDC.  CDC components in SSIS will work with both SQL Server and Oracle.  Three new tasks have been added to SSIS – CDC Control Task, CDC Source (reads changes), and CDC Splitter (handles insert/update/delete).

ODBC components have also been added to support many of the non-Microsoft data sources including bulk load capabilities with the ODBC destination object.

3. Connection Manager

We now have connection managers that are shared at the project level.  Connection managers get marked as offline if the data source is not available or if you want to take them offline.  The capability to share an in-memory cache across packages at the project level has also been added.

4. Flat File Source Improvements

Varying number of columns and embedded qualifiers are now supported.

5. Package Format Changes

The XML package format has been updated to use attributes.  Layout information can now be ignored.  There is better support for merging.  Lineage IDs have been replaced with refids to better support merges.  Merging packages is still not recommended, but it is better supported.

6. Visual Studio Designer Changes

Parameters appear in a Visual Studio design tab, which allows you to set up defaults for various environments.

7. Script Components

Script components now use compiled IL in .NET for better performance.  Script component debugging is available in 2012, with the ability to debug script components and see the values of the data in the pipeline.  All script components support debugging.

8. Troubleshooting and Logging

Because all packages use the SSIS catalog, all error and performance logging, including task execution timing and row counts, is recorded in the SSISDB.  Much of this information will be available in built-in reports.

9. Data Taps

Data taps allow data in a pipeline to be dumped to a flat file.  This allows for troubleshooting packages with errors in production.  You can create data taps in SSMS using T-SQL.
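Here is a sketch of what that T-SQL looks like. The folder, project, package, and data flow path names are placeholders, and the example is based on the catalog stored procedures as documented for SQL Server 2012, so verify the details against your build:

DECLARE @exec_id BIGINT;

-- Create an execution for the package, attach the data tap, then start the execution
EXEC SSISDB.catalog.create_execution
    @folder_name = N'MyFolder', @project_name = N'MyProject',
    @package_name = N'MyPackage.dtsx', @execution_id = @exec_id OUTPUT;

EXEC SSISDB.catalog.add_data_tap
    @execution_id = @exec_id,
    @task_package_path = N'\Package\Data Flow Task',
    @dataflow_path_id_string = N'Paths[OLE DB Source.OLE DB Source Output]',
    @data_filename = N'DeptRows.csv';

EXEC SSISDB.catalog.start_execution @exec_id;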

10.  Server Management using PowerShell

The whole SSIS catalog can be managed or accessed through PowerShell or TSQL.  This will allow you to automate or script out deployment and frequently used tasks.

You can do a lot of these tasks right in SSMS as well using menu and shortcut functionality.  You can query the SSISDB for performance counters while the package is executing.  You can query SSISDB for package execution progress at the task and component levels.
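For example, a quick query against the catalog views (view and column names as documented for the 2012 catalog; treat it as a sketch) shows recent executions and their status:

SELECT execution_id, folder_name, project_name, package_name, status, start_time
FROM SSISDB.catalog.executions
ORDER BY start_time DESC;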

There is built-in reporting.  You start from the Integration Services Dashboard, which has a bunch of stats on the front page.  It includes a number of built-in reports with performance data and other relevant topics.  Jamie Thomson has a CodePlex report pack (http://sqlblog.com/blogs/jamie_thomson/archive/2011/09/04/ssis-reporting-pack-v0-2-now-available.aspx).  Check out the post from Jamie about the SSIS logging tables as well (http://sqlblog.com/blogs/jamie_thomson/archive/2011/07/16/ssis-logging-in-denali.aspx).

The reports bubble up errors related to connections.  There are drill-down reports available for the error messages in the tool.  Logs have a configurable retention value.  Versions of projects are also maintained, and the number of versions kept is configurable.

11.  Project Barcelona Metadata Discovery

This is a project that is coming soon.  Demos available online.  This is not a metadata repository.  This is zero touch.  Targets get crawled and added to an Index Server.  Check out the project blog at http://blogs.msdn.com/b/project_barcelona_team_blog/.

Tier-1 BI in the World of Big Data – Denny Lee and Thomas Kejser, SQLCAT

There are two models to work with: Dimensional (UDM) and Tabular (formerly BISM).  UDM is the current large-scale engine.  Tabular puts all of the data in memory, so it is not very suitable for large cubes.

New themes for workload scale:

  • Cheap storage at scale
  • Massive query scale (both size and concurrent)
  • Scaling ETL another order of magnitude
  • Scaled and integrated reporting/BI

The NoSQL ecosystem (e.g. the Facebook, Yahoo, Twitter, Google, and Amazon data offerings) is being used frequently.  Not only are people using these systems, they are building solutions on top of them.  This is why Microsoft has announced Hadoop integration.

MapReduce systems (e.g. Hadoop) and RDBMS systems are actually complementary, not competing, in most cases.  Hadoop is about dynamic schemas; getting the data somewhere is the important piece.  Hadoop / NoSQL: move compute to the data.  BASE: Basically Available, Soft state, Eventually consistent.  Each row may have a different schema.

Traditional RDBMSs move data to compute.  They answer the question quickly and are ACID compliant.

The balance between scale and performance is getting more important.  What do you want?  You can have a guaranteed response, but you get it slowly.

The Hive connector is the first step in integration with the BI platform.  Hive is a data warehouse framework for Hadoop.  This is the starting point, not the end game, for adding structure to Big Data for consumption by other tools such as SSAS.  The Hive ODBC driver will allow the cube to be loaded directly from Hadoop.

Yahoo – “Fast” Oracle Load – 24 TB cube. This is the largest known cube in the world.  Yahoo moves data from Hadoop into Oracle and processes the cube from there.

Check out the new SSAS 2008 R2 Performance Guide from SQLCAT at http://sqlcat.com/sqlcat/b/whitepapers/archive/2011/10/10/analysis-services-2008-r2-performance-guide.aspx.

What’s New in StreamInsight?  Isaac Kunen, Microsoft

Stream Data Processing – unbounded streams of data with queries lasting forever.  This engine handles the data coming through.  Temporal Semantics can be added to the StreamInsight data to make it more understandable.

Event processing varies greatly based on business function – latency versus aggregated data rate.  StreamInsight is designed to handle high volumes of low latency data. Common use cases – Fraud Detection, Shipping, Sales, Clickstream Analysis, Security, Building Management.  Common scenario is pattern analysis.

StreamInsight Releases:

  • StreamInsight 1.2 is available today on premise.
  • StreamInsight on Azure currently private CTP, should open up later this year.

Temporal Semantics

  • 3 event types: intervals (begin end), points (“one tick”), edge (only the beginning with an end to be defined later) events.
  • CTI = Current Time Increment.  Up to this point in time, all events are processed.  From one CTI to the next CTI is the period reported.  They have to increment into the future, but do not need to be equal time periods.  After a CTI is issued, the data is gone from memory.
  • You write declarative queries against StreamInsight in LINQ.  It is the actual query language for SI, not just a wrapper.
  • Select and Project Operators return events from a stream.
  • Join Operator returns events that overlap streams.
  • Snapshot Window is any time the event stream changes, start a new window.  This window is completely data driven.

Resiliency through checkpointing has been added to StreamInsight.  This allows you to restart from specific points.  Performance Counters and Admin Logs have also been added to StreamInsight 1.2.  Check out his blog for more information: http://blogs.msdn.com/b/isaac/archive/2011/07/25/introducing-checkpointing-in-streaminsight-1-2.aspx.

Building the Perfect BI Semantic Model for Power View – Carl Rabeler (SQLCAT), Kasper de Jonge – Microsoft SSAS

You can import a PowerPivot workbook into SQL Server Data Tools as a Tabular SSAS project.  This will create a Tabular Model in SSAS.

Changes made in SQL Server Data Tools to a Tabular model are applied to the deployed workspace database in SSAS.  You should use a small set of data when developing the model because the data needs to be loaded in memory.  You can add data mining models to the tabular model by creating a connection to an SSAS Multidimensional data source.  When you add a data source to the tabular model in the designer, it will add the data to the tabular model (the workspace database).  Columns can now be added into the model in any position.

There is an “effective user name” property in the connection string to handle double-hop issues without Kerberos.  The “Set execution context to this account” option sets this property in the Reporting Services Data Services (RSDS) connection in SharePoint.  This functionality is currently available in SQL Server 2008 R2.  In SQL Server 2012, this can be seen in the profiler trace in the Discover events.

To add images to the Tabular model, you can use an embedded image, stored image (e.g. in the database), or link.  If you use a link, be sure to set the ImageURL property to “true” for this column so Power View knows that it is an image.

The tabular model has a wealth of metadata options that will be carried forward to Power View.  For instance, if you set the Default Label and Default Image in the tabular model table, Power View will display these in many visualizations including Card View.

Be aware that in Power View, the refresh button in the tool refreshes data, whereas an IE refresh will reload the metadata.

Time intelligence functions in DAX will allow you to create calculations such as Year To Date.  However, if your tabular model uses surrogate keys to map to the Date table, you need to mark the table as a Date table and specify the date column to use.  This is needed to make sure the Time Intelligence functions work correctly.

The RANKX(Table, Expression, [Value], [Order], [Ties]) function was illustrated to show how to rank items within a table, for example Ranking:=RANKX(all(Products), Sum(Margin)).  Watch out for situations where you are using a complex key.

DirectQuery functionality will allow you to directly query SQL Server relational data.  There is a list of limitations to using DirectQuery.  Model with it carefully.





SQL PASS Summit–Day 4 (Keynote)

13 10 2011

Bill Graziano kicked off the event today with the notice of the official “Kilt Day” at the Summit.  He also recognized the volunteers who make this happen.  Thanks to all the volunteers who contribute to our local, national, and international SQL communities.  If you are interested in volunteering, check out http://sqlpass.org.  Bill updated us on the financials, something I won’t be blogging about here.

Quentin Clark, Corporate VP SQL Server, was the keynote speaker from Microsoft.  He talked about the new release of SQL Server 2012.

  • Data – Managing data at the fundamental level; Knowledge – how to get insights out of information; between Data and Knowledge is Information – data cleansing and quality.
  • SQL Server 2012 is one of the biggest releases Microsoft has ever done.  There are 100s of new features in SQL Server 2012.

Quentin’s Fantastic 12 of SQL Server 2012

  1. Required 9s & Protection – SSIS as a server; HA for StreamInsight; SQL Server AlwaysOn
  2. Blazing Fast Performance – Performance Enhancements in RDBMS, SSAS, and SSIS; ColumnStore Index with Vertipaq
  3. Rapid Data Exploration – Power View + PowerPivot; Administration from SharePoint; Reporting Alerts
  4. Managed Self Service BI – same as 3
  5. Credible, Consistent Data – BI Semantic Model (BISM); Data Quality Services (DQS); Master Data Services (MDS)
  6. Organizational Compliance – Expanded Audit (User Defined, Filtering, HA); User-defined Server Roles
  7. Peace of Mind – Production-simulated Application Testing; System Center Advisor & Management Packs; Expanded  Support (Premier Mission Critical)
  8. Scalable Data Warehousing – SQL Server Appliances (optimized & pre-tuned including Dell PDW, HP Enterprise Data Warehouse Appliance, HP Business Decision Appliance); HW + SW + Support (Just add power); Choice of hardware
  9. Fast Time to Solution – same as 8, lots of appliances
  10. Extend Any Data, Anywhere – Greater Interoperability – new drivers for PHP, Java & Hadoop; ODBC Drivers for Linux & Change Data Capture (CDC) for SSIS & Oracle; Beyond relational: FileTable, 2D Spatial, Statistical Semantic Search (functionally will replace full text search)
  11. Optimized Productivity – SQL Server Data Tools (formerly “Juneau”); Unified across database and BI (no more BIDS); Deployment & Targeting Freedom
  12. Scale on Demand – AlwaysOn; Deployment Across Public & Private; Elastic Scale (using SQL Azure)

Azure Announcements

You will now be able to access Windows Azure storage from SSMS which can be used for SQL Azure backups.  You will also be able to “right-click” deploy from SSMS to SQL Azure.  A lot of integration between SQL Server and SQL Azure is coming in SQL Server 2012.

SQL Azure Federation is being released at the end of the year.  This will allow for larger and elastic databases to be deployed to SQL Azure.  This provides for a management tool to distribute or federate your application data across SQL Azure databases which allows you to manage cost and performance.

The SQL Azure management site is now going Metro (Windows 8), including live tiles.  I like it.  This too will be released at the end of the year.

SQL Azure will now be able to support 150 GB databases.

Available today – CTPs for SQL Azure Reporting and SQL Azure Data Sync.

There really are a plethora of really cool features and tools being released with SQL Server 2012.  Nice work SQL Server project teams.





SQL PASS Summit – Day 3 (the rest)

12 10 2011

After an info-filled keynote, it was on to the rest of the day.  I decided to attend the Bare Metal Instructor Lab and work through a full installation of SQL Server 2012 over the next 3 days.  However, I will likely switch to hands-on labs of my choice for the rest of the week.

The lab started with the installation of SQL Server 2012 on 3 servers using prebuilt scripts.  This included my first time ever working on Windows Server Core, which has no GUI – the seriously lightweight version of Windows Server.  To build a configuration file from setup, you can cancel a GUI-based install right at the end and it will generate this file.  This file can be used with command-line installs.  I had some extra time before lunch, so I fired up a different hands-on lab – Exploring Project “Crescent” (which is now Power View).  It was a quick demo and that tool is still really cool.

My next session was What Does it Take to Add MDS to Your Data Warehouse/Data Mart.  They demonstrated how to manage dimensions within SQL Server Master Data Services (MDS) in Denali.  The presenters were Yair Helman and Tyler Graham from Microsoft.  Here are some of the things I learned in this session:

  • MDS has an Excel add-in.  From Excel, you can do the initial data build in MDS.  The data is managed on the server while you work with it in Excel.
  • All of this metadata and the data are also available in the MDS UI, which is web based.  Not only can it be viewed there, it can be manipulated there as well.
  • You can create and manage business rules about the data in MDS.  Once created, a rule can be applied to a portion or all of the data.  In both cases, you can see what failed the business rule.  Once you make changes to the data, you can annotate the changes in bulk or individually when publishing the changes back into the system.
  • This appears to be a significant upgrade over the 2008 version of MDS.  The MDS Service requires SQL Server 2012, but can use the underlying SQL Server 2008 Data Engine.  If you have an old version or were using Denali CTP 3, you can upgrade to the new version easily.  Kudos to Microsoft on improving this tool rapidly.
  • They recommend using the data warehouse as a starting point for this tool as it is fairly clean.  Start small and continue to build it out over time.  However, I am not sure how significant the impact to cubes would be.

Final session of the day was Introduction to Data Mining in SQL Server Analysis Services by Brian Knight.

  • Data Mining has financial benefits.  Guess I need to get better at this.
  • Data Mining does not require a cube.  A table will suffice.
  • The core two algorithms to know – Decision Tree and Clustering.  Clustering is primarily for fraud detection.  Decision Tree is primarily for answering why a decision was made.
  • Check out http://sqlserverdatamining.com.
  • There is a lot of trial and error when configuring the algorithm and validating your hypothesis about the data.  You can usually prototype a solution in a week and have it ready after trial and error in about a month.
  • SQL Server Data Mining can deliver 80-90% of the data mining capabilities SAS delivers.  The basics and most common usages are there, for a bit less coin.
  • DMX is really not that hard, but very functional. It is actually simpler than MDX.

Besides these sessions and the labs, I spent some time visiting the vendors and got some good information on PDW and other data warehouse appliances.  These are exciting tools for bringing BI solutions to market much faster, both as projects and as solutions.

Thus ends the training from Day 3, on to networking and socializing with my SQL peers.





SQL Azure’s place in the data world (Part 3)

16 01 2011

In the first two parts of this topic, I discussed how data is managed in SQL Azure and what the cost would be to an organization.  In this installment, I wanted to propose some data solutions that might be a fit for a SQL Azure database.

Here is some quick history that is shaping my thoughts on this.  At Magenic, my focus is on business intelligence solutions on the Microsoft SQL Server product stack.  Prior to returning to Magenic, I worked at XATA Corporation as the data architect for a hosted SaaS solution serving the trucking industry.  Out of this background, I base my first suggestion.  In SaaS (Software as a Service), customers often use tools provided by the vendor for specific tasks, and those tasks generate data.  Most SaaS solutions offer some form of reporting or query-based analysis of the data that is collected.  However, some users require a greater level of interaction with the data.  The issue is that the data is stored in the SaaS vendor’s multi-tenant data store, which is usually not conducive to having ad hoc queries run against it.  This has led to one of the most common solutions to this problem – export the data to the customer.  The problem is that the customer must now host the data on premise and is often responsible for merging new data as it comes from the vendor.  In this scenario, SQL Azure could act as the “go-between” for the multi-tenant data store and the customer.  This would allow the vendor to provide a value-added BI service that the customer can leverage in a number of ways, including reports, Excel, and even Access.  The vendor can keep the data synchronized, and the customer can leverage the data as needed.

Beyond the SaaS BI ad hoc solution, SQL Azure can be used to support development of solutions that require a shared data store without putting the data on each desktop.  In keeping with the concept of the cloud being “anywhere”, SQL Azure can also be used to support distributed solutions that require a SQL data store to function. 

Obviously, there are still issues with using SQL Azure as a primary, production data store due to the lower SLAs from Microsoft.  However, it is not too early to investigate creative ways that will help your business leverage a relational database in the cloud.  I would encourage you to check out the free trial options from Microsoft to experiment with the platform. 





SQL Azure’s place in the data world (Part 2)

26 12 2010

In Part 1, I discussed the synchronization of data and availability of data on Microsoft’s SQL Azure platform.  In Part 2, I would like to discuss the next set of questions related to using SQL Azure with your data solution.  Here are the questions I will address in this post:

  • How much data can be supported by SQL Azure?
  • Is it really cost effective?

How much data can be supported by SQL Azure?

The answer to this question depends entirely on how and, in some cases, whom you ask.  SQL Azure currently only supports databases up to 50 GB in size.  Because this is subscription based, you will only pay for what you use or allocate.  I will delve into the subscriptions and their cost in the next section.  For our purposes, it is important to realize that any database up to 50 GB is supported by Microsoft.  This limitation leads to a couple of additional questions.  First, how is the size measured?  Keep in mind that you are managing the database, not the server, so the system databases are not included in this size.  Furthermore, Microsoft does not include the size of the log in this measurement.  So when you are determining how much space you will need, focus on the size of the MDF file and be sure to check how much of it is actually in use.
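If you are sizing up an existing SQL Server database for a possible move, a quick way to see both the allocated size and the space actually in use is sp_spaceused, run in the database you plan to migrate (the database name here is a placeholder):

USE YourDatabase;
EXEC sp_spaceused;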

The second question that comes to mind is: what happens if you need more than 50 GB?  While not supported directly, it is possible to distribute, or “shard”, a larger database across multiple smaller databases.  Currently this is a custom solution, but Federation is coming.  With Federation, SQL Azure will natively support sharding a large database and will be able to present the database as a single unit, which will make development easier.  If you can’t wait for SQL Azure Federation, check out the SQL Server and SQL Azure Shard Library available on CodePlex (http://enzosqlshard.codeplex.com/).

So you can see that if you would like to support or scale out a larger database solution it is possible with SQL Azure.

Is it cost effective?

At the time of this writing, SQL Azure subscriptions break out as follows (http://www.microsoft.com/windowsazure/pricing/#sql):

Edition | Max Size | Cost per Month
Web Edition | 1 GB | $9.99
Web Edition | 5 GB | $49.99
Business Edition | 10 GB | $99.99
Business Edition | 20 GB | $199.98
Business Edition | 30 GB | $299.97
Business Edition | 40 GB | $399.96
Business Edition | 50 GB | $499.95

Microsoft also charges for data transfers.  Currently it is $0.10/GB for data in and $0.15/GB for data out.

For those of you who don’t have the privilege of working with software licensing, it may be hard to appreciate the complexity of determining the cost effectiveness of the solution.  For comparison, let’s review the MSRP cost for SQL Server Standard and Enterprise licenses.  (You can check out other licensing options here: http://www.microsoft.com/sqlserver/2008/en/us/pricing.aspx.)  SQL Server 2008 R2 Enterprise Edition costs $27,595 per processor and SQL Server 2008 R2 Standard Edition costs $7,171 per processor.  These are the listed prices.  Customers can get a variety of discounts or other pricing options depending on their agreements with Microsoft, including Software Assurance, Enterprise Agreements, and Volume Licensing.  In any case, there is a significant license cost when ramping up a production SQL Server.

Besides licensing, you have to consider the cost of the hardware and the personnel to maintain the servers.  By the time you have premium servers and a disk solution in place, you could easily have spent around $100,000 for a simple production environment.  The next piece of the puzzle is the additional manpower required to support the environment.

Now we have a bunch of numbers, but how can you compare solutions?  There are some significant differences between a SQL Azure solution and a SQL Server solution.  First, and most glaring to me, is the fact that SQL Azure is managed at the database level while SQL Server is managed at the server or instance level.  When you are trying to determine the cost-effectiveness for your application in either solution, you need to determine the number of databases required for your solution.

Next, you need to determine the cost of an on premise solution on a monthly basis so you can compare the solutions over an expected period of time.  Generally speaking, we expect hardware and software licenses to last around three to four years before an upgrade will be required for the solution.  In the case of SQL Azure, you have a known monthly cost with hardware and software licensing and related upgrades “included” in the cost.

Finally, when looking at SQL Azure, you might think that the fact it does not include Analysis Services or Integration Services in the license will make it cost-prohibitive in your solution.  However, since best practices dictate that separate servers be used for these services, they really do not factor into the cost-effectiveness of the solution.

So, what does this all mean?  It means you have some work to do to determine how cost-effective a SQL Azure solution will be for your application.  So let’s assume the following hardware/software solution to provide disaster recovery and high availability for your on premise database:

Component | Cost Factor | Total
Dell PowerEdge with PowerVault disk – 2 servers (2 proc/16 GB) | $65,000 with rack | $65,000
SQL Server Enterprise licenses | $25,000 per processor (Active/Passive cluster) | $50,000
Install and setup | $100/hour | $4,000
Total | | $119,000
Cost per month over 3 years | | $3,305

As you can see, a simple on premise solution, without considering the cost of maintenance or the time spent on setup, will cost you around $3,300 a month.  At that point, the roughly $500 a month plus data transfer costs for a 50 GB SQL Azure database is clearly less.  Like me, you might think that dropping from Enterprise to Standard licensing will make up the difference, but that option still costs about $2,300 a month.  That means that you can get Enterprise capabilities with HA built in for less than standing up your own Standard environment.
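To show where that Standard number comes from using the list prices above: $65,000 for hardware, plus two Standard licenses at $7,171 each (about $14,300), plus $4,000 for setup comes to roughly $83,000, or about $2,300 a month over three years.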

Where this breaks down is when your databases are larger than 50 GB.  At that point you will have to scale both solutions and determine which is the more cost-effective solution for you.  You could reasonably use sharding to increase your space to 200 GB and still be cost-effective.  I admittedly did not think the cost savings would be that significant for SQL Azure.  Microsoft appears to be on the right track to provide reasonable pricing for SQL Azure.  I would be interested in hearing additional cost factors for on premise versus off premise database solutions.  (Please comment on this blog with additional factors you see that would impact the cost-effectiveness analysis of the SQL Azure solution.)

In the final blog entry on this topic, I will delve into solutions that make sense on SQL Azure and yes there are some cases for using SQL Azure sooner rather than later.







