Intro to Data Factory–Training on the T’s Follow Up Post

13 01 2015

This is a follow-up blog post based on the Intro to Data Factory session I gave on Training on the T's with Pragmatic Works. You can find more free training, both past and upcoming, here. I did my session on January 13, 2015.

 Intro To Data Factory

In this session, I gave a simple introduction to the new Azure Data Factory service using a CopyActivity pipeline between Azure Blob Storage and an Azure SQL Database. Below is a diagram illustrating the factory that is created in the demo.


I have published my presentation materials here. This includes the sample JSON files, the Movies.csv, and PowerShell scripts.

Q & A

Here are a few questions that were answered during the session.

1. Does Availability refer to when data that has been transferred will be available? Or when the data source is actually available for query?

Availability refers to when the datasets will make a slice available. This is when the dataset can be consumed as an input or be targeted as an output. This means you can consume data hourly but choose to push it to its final destination on a different cadence to prevent issues on the receiving end.
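For example, the availability section of a dataset's JSON definition declares the slice cadence. Here is a minimal sketch (the names are placeholders, not from the demo, and the exact schema has shifted across service versions):

```json
{
  "name": "MoviesSqlTable",
  "properties": {
    "type": "AzureSqlTable",
    "linkedServiceName": "AzureSqlLinkedService",
    "typeProperties": { "tableName": "Movies" },
    "availability": {
      "frequency": "Hour",
      "interval": 1
    }
  }
}
```

Here a slice becomes available every hour; a pipeline consuming this dataset can still be scheduled to push data downstream on a different cadence.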

2. What prerequisites are must-haves? (e.g., Azure account, HDInsight, Blob Storage accounts, etc.)

    • An Azure account is the only real must-have. You could even use two on-premises SQL Server instances as the source and destination.
    • HDInsight, if you want to use the HDInsight activities
    • An Azure Storage account, if you want to use Blob or Table storage

3. How do you decide to use a Factory or Warehouse?

The factory is more of a data movement tool. A warehouse could be a source or target of a factory pipeline.

4. Is this similar to SSIS in SQL Server?

Yes and no. SSIS is definitely more mature and has more tooling available, such as data sources and transformations. SSIS also has a good workflow designer. The initial focus of Data Factory was to load HDInsight tables from a variety of sources with more flexibility. The other note here is that Data Factory is being built from the ground up to support the scale of the cloud in Azure.

5. Can this be used for Big Data?

Absolutely. I would say that it is one of the primary reasons for the tool. In reference to the previous question, it will likely be the tool of choice for big data operations because it will be able to scale with Azure.

Links to Additional Resources on Data Factory or tools that were used in the presentation:

Azure Data Factory on Azure’s Website

Azure Data Factory Documentation

Azure Data Factory Pricing

Azure Storage Explorer

Azure PowerShell Documentation

Thanks for joining me for this presentation. We look forward to seeing you at the next Free Training on the T’s.

Techfuse, a New Laptop, and How Microsoft Azure Helped Save the Day

24 04 2014

On Tuesday, April 22, I had the opportunity to speak at the Techfuse conference in Minneapolis. I was presenting a session on the differences between tabular and multidimensional models with a focus on the developer's experience. My deck has ten slides, including title, references, and bio. The rest of the time is spent in Visual Studio building out equivalent models using SSAS Tabular and SSAS Multidimensional.

The previous week, I was issued a new laptop from my company, a Dell 7440. This is a very nice machine, and I had it all set for the presentation. About 11 AM (I was scheduled to speak at 1:15 PM) it occurred to me that I did not recall seeing a VGA port on it, only HDMI. Next question: did the projectors at the convention center support HDMI? Answer: no. Now I had about an hour and a half to resolve this issue. Simple: I decided to head downtown and get the converter from Office Depot, which was about 8 blocks away. I could do that and get some exercise in.

I took off at about 11:30. First, I stopped at Target; it was closer. No luck. So on to Office Depot. Keep in mind that Office Depot sells laptops like mine with only HDMI support, so it stands to reason that they would have the converter. No such luck. I was able to get HDMI converted to DVI, but that would not help, as I later found out. They directed me to Radio Shack, where I promptly picked up a DVI-to-VGA converter. Now I had three pieces that, when strung together, should support my needs. I headed back to the convention center and arrived with 30 minutes to spare. Working with the AV guy, we got it all plugged in, only to still have it not work. Turns out you need a converter to convert the digital signal to analog for use with the older projectors. Now what?

The moderator for my room offered me her laptop to use for the presentation, which was AWESOME! So now I had a way to give the presentation, all ten slides. However, she did not have Visual Studio with SSDT for BI and SQL Server installed, which was fine, because I didn't expect her to.

Here is where Azure comes in. I had created a VM with SQL Server Tabular installed along with Visual Studio 2012 and the SQL Server Data Tools for BI. So, I fired up the VM right before I gave the presentation. I warned the crowd about what had happened and decided to push the demos to the end of the presentation so everyone could leave if nothing worked and all the material could still be covered.

I was able to get into the VM and fire up Visual Studio. Since the demo was a live build of a tabular model and a multidimensional model, I used a database I had created in SQL Azure as the data source, and we built the models live. Granted, we were not able to do a complete multidimensional model because the database was not formatted as a star schema, but it helped highlight the difference in what needs to be done prior to development. Overall it went very well (I think; surveys are forthcoming…). At the end of the day, without the work I had been doing in Azure I would not have been able to demo, and it would have been a very short presentation.

Some lessons learned –

  • Be sure to have what you need to support presenting in a variety of scenarios. I should have made sure to have a converter prior to the conference, as most convention centers and other facilities haven't upgraded their projectors yet.
  • I will likely set up Azure VMs to support more demos, just in case. It is always good to have a backup plan, though running the demos over a wireless connection would have been painful.
  • Roll with it. Don't give up; try to make the best of a bad situation. People understand things don't always go perfectly. At the end of the day, I came to talk about multidimensional and tabular model development. I could have opened the floor up for discussion and done Q&A. Make the most of every situation.

Logging into Azure and Office365 with Different Accounts

14 03 2013

I have been doing some work on the Modern Apps Live! content that required me to use both an Office365 account and an Azure account.  My Azure work is currently associated with my Microsoft Account (formerly known as my Live ID).  On the other hand, I am working with a Microsoft Office365 account which I have been unable to tie to my current account at this point.

While I was working with Office365, I tried to open my Azure account to get some storage info that I wanted to use.  It told me I was already signed in with a different user ID, and that account did not have access to my Azure portal, as noted in the image below.


Obviously I did not want to sign out, so I started looking for options.  The IT Director at Magenic, Dave Meier, mentioned he was having the same issue with multiple Microsoft accounts. We determined this was somehow related to IE, so I installed Google Chrome.  By using Chrome I was able to work around the issue.  However, Dave came across the following article regarding this change in behavior starting in IE 8.  It turns out they changed how they manage sessions in that version.

The article brings up a couple of options to work around the issue.  One is using Alt-F-I to open a new session.  Kind of annoying, as I actually use my mouse (some of you keyboard junkies will like that solution).  So I used the shortcut option and created a shortcut for IE on my Windows 8 desktop which uses the -noframemerging switch.  I added this to my Taskbar, and voila, I have an easy way to open a new browser session which ignores my other session.  Keep in mind that you are essentially running two different sessions in IE, which means that you will have two sets of credentials active.  Be sure to keep track of which is which.

Also, as a quick sanity test, Google Chrome works the same way.  Even when I open a different Chrome instance, it keeps my session state. So, it appears the IE shortcut is a nice way to handle this issue.  Although, it is nice to know I can use a Chrome instance and an IE instance if that is my preferred method.

Add a comment if you try this in FireFox and want to chime in.  I really only want to use one browser, so Chrome is still one more than I want.

UPDATE:  You can also use InPrivate or Incognito modes to accomplish the same thing.  Thanks to Rocky Lhotka ( T | B ) for pointing this out.

Join Me at Modern Apps Live! Las Vegas

5 02 2013


If you are familiar at all with Visual Studio Live! then you should check out this new conference.  At the Vegas conference, I am participating in Modern Apps Live!, a unique three-day event made up of sessions that build on each other as the conference progresses.  The goal is to do an end-to-end application build, during which the presenters build the application while also focusing on Application Lifecycle Management, best practices, tools, and a variety of technologies.

I will be presenting on the database design and business intelligence components of the solution and would love to see you there.  To make this even sweeter, if you register with the following code, MVSPK3, you will qualify for one of the following offers:

  • The 5 day all-access Best Value Conference Package for just $1,595 – a savings of $500.00 off the standard price of $2,095! (*Includes pre and post conference events.)


  • The 3 day Modern Apps Live! Conference for just $1,295 – a savings of $400.00 off the standard price of $1,695

This is for new registrants only.  Make sure to use the code by February 27, 2013, to take advantage of the full discount.

Also keep in mind that you will be able to take advantage of the Visual Studio Live! sessions as well, since this is a cohosted conference.

I hope to see many of you there!

PASS Summit 2012 Wrap Up

13 11 2012

Wow, what a week. Once again, PASS put on a great event that provided much in the way of events and training for the SQL Server community. If you followed my countdown, you know some of what I love about PASS. Last year I blogged every day, but I did not do that this week. So, what was different for me? Well, for one, I volunteered much more this year than last, and I was privileged to speak twice. I spent more time meeting new people and catching up with friends, and that was great as well. Enjoy my wrap-up from my week.

Tuesday – Leadership Meetings, Welcome Reception, and some Karaoke

Before the event officially kicked off, I joined community leaders from around the world for a series of leadership meetings. First we had a meeting on SQL Saturdays which was an opportunity to see the immense growth of these free training events around the U.S. and throughout the world. What a great opportunity for SQL Server professionals to improve their skills and for those passionate about the community to improve their abilities by leading these events. Many ideas were shared among the team including a panel on how to effectively run a SQL Saturday on a tight budget.

Once that was completed, the Regional Mentors enjoyed a lunch together and an opportunity to share what we do to support the user groups in our regions. I particularly enjoyed the fact that I was able to spend some time with Regional Mentors from Germany, Holland, and Portugal. This highlighted further the international scope and reach of PASS. This was followed by the Chapter Leaders meeting. That meeting was held as a series of round tables that the chapter leaders could move through. I was working at the table focused on leadership with Ryan Adams ( B | T ) from the North Texas SQL Server User Group – NTSSUG. We had a number of good conversations around building leadership teams for user groups and what is needed to have an effectively led user group. Check out the NTSSUG site for the by-laws sample we discussed multiple times.

All of these meetings were followed up with the Welcome Reception, which I only made a small portion of, as I was trying to drop my backpack at my hotel and work my way back. After the reception, I headed out to Bush Gardens with a number of others. During that time, Jes Borland ( T ) managed to get a microphone in my hand, and I had my first round of karaoke. Yes, I actually did sing, and had fun doing it. All in all, it was a good time had by all.

Wednesday – SQL Around the World, Microsoft Announcements, Tabular Models, and Magenic Team Dinner

This was the true kick off to the event, although for many the keynote was the kick off. Before that even began, I was working in the Community Zone encouraging people to participate in the SQL Around the World community activity. It was a great game: you needed to find 10 people from 10 different countries and find out something interesting about them or their country. I found a dancer and someone who had ridden a cheetah as a kid. I also surprised someone from the Czech Republic when she mentioned her home town, only to have me let her know I had been to her home town many years ago. It was a fun conversation. If you did this and have other cool stories, let me know. It was amazing: well over 50 different countries were represented at PASS.

Next, Ted Kummert had the first keynote session of the day. His keynote was filled with announcements concerning SQL Server including the following:

  • Hekaton: the project code name for a new in-memory OLTP engine
  • The Columnstore Index will be updateable
  • Next version of PDW will be out in H1 2013
  • Polybase: allows you to query across multiple types of data sources such as SQL Server and Hadoop with T-SQL
  • DAX Queries will be able to query SSAS Cubes

He also highlighted some recent announcements related to the SQL Server stack:

  • Microsoft HDInsight Server CTP: Hadoop for Windows Server
  • Windows Azure HDInsight Service Preview: Hadoop for Azure
  • Power View and PowerPivot fully implemented in Excel 2013

After the keynote, I hit a session on Big Data and Hive put on by SQLCAT, which was very informative. My big takeaway was to use EXTERNAL tables, not INTERNAL tables, when working with Hive. I then went to do final prep for my Tabular Model session. In this session, "Building a Tabular Model Database", I presented on what tabular and in-memory are, and then proceeded to open up a Visual Studio project and create a database. I think it went well, and the attendees seemed to enjoy the upbeat nature for an end-of-day session. The night wrapped up with dinner with the Magenic team (7 of us), a good chance to grow relationships across offices from around the country.
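To sketch that Hive takeaway (the path and columns here are made up): dropping an EXTERNAL table removes only Hive's metadata and leaves the underlying files in place, while dropping a managed (internal) table deletes the data too.

```sql
-- External table: Hive tracks only metadata; the files at the
-- LOCATION survive a DROP TABLE.
CREATE EXTERNAL TABLE movies (
    title  STRING,
    year   INT,
    genre  STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/data/movies';

-- Removes the table definition but leaves /data/movies intact.
DROP TABLE movies;
```

Had the table been created without EXTERNAL, the DROP would have deleted the data files as well.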

Thursday – Community Zone and DAX as a Query Language

Thursday was a fairly low key day for me. Once again I spent time in the Community Zone. I had the opportunity to talk with a few people on creating a user group in their area. As always, I like to see people interested in growing their local community.

I also attended Alberto Ferrari's session on DAX. I think the biggest surprise to me was that you can now query DAX directly from SSMS. I am not sure that I am convinced that it is a full query language yet, but it is definitely closer. The key to it all is the EVALUATE statement, which allows you to create the DAX query, ironically, in the MDX window. Here is just a taste of DAX as a query:
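A minimal query of that shape (the table and column names here are placeholders, not from the session) might look like:

```
EVALUATE
SUMMARIZE (
    'Sales',
    'Product'[Category],
    "Total Sales", SUM ( 'Sales'[Amount] )
)
ORDER BY [Total Sales] DESC
```

EVALUATE takes any table expression and returns its rows, which is what makes DAX usable as a query language from SSMS rather than only as a formula language inside the model.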



What I found interesting is that you can create columns, build measures, and perform many other operations against the tabular model using DAX. In the end, it will not increase the memory used for storage, as it is all calculated at query time. Look for more on this in later blog posts as I delve into in-memory storage and usage when working with DAX.

Friday – More Community Zone, HDInsight, Paul White, and Window Functions

Last day. I spent more time in the Zone. I really did enjoy my time there as I continued to meet more people. I was even present when a contract was completed for the Shanghai user group. Very cool indeed. I then attended a session on HDInsight by Mike Flasco from Microsoft. This is very cool stuff, as you can create a simple Hadoop cluster on your desktop to test the technology. Microsoft and Hortonworks have done a great job of bringing Hadoop data into the Microsoft stack.

On my way to present my final session of the day and the conference, I stopped in for the second half of Paul White's ( B ) optimization presentation. In a word (or two): mind-blowing! Wow, who knew the optimizer did all those things? I was highly impressed and think he should look at doing a precon on the subject next year. Unlike some three-hour presentations, he could have gone longer, as he was not stretching his content out. Nice work, Paul. So, I got to follow that with a presentation on window functions in T-SQL. For the second time, I had the last slot of the last day. I think this presentation went well even though we were all worn out from a content-filled week. It was fun to try some ideas from the audience in the demos. That always makes for a more interesting demo. I will be doing a follow-up post on what I learned from some of the attendees on the subject as well, proving once again this is a user community event. We all have something to contribute! (If you attended this session, you will find links to the blogs on the subject here.)
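For a flavor of the window functions material, here is a sketch against a hypothetical dbo.Orders table (not from the actual session demos):

```sql
-- Running total and per-customer sequence using T-SQL window functions
SELECT CustomerID,
       OrderDate,
       Amount,
       SUM(Amount) OVER (PARTITION BY CustomerID
                         ORDER BY OrderDate
                         ROWS UNBOUNDED PRECEDING) AS RunningTotal,
       ROW_NUMBER() OVER (PARTITION BY CustomerID
                          ORDER BY OrderDate)      AS OrderSeq
FROM dbo.Orders;
```

The OVER clause, with its PARTITION BY, ORDER BY, and framing options, is what lets a single pass over the rows produce aggregates alongside the detail.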

What’s Next?

Coming in April is the new Business Analytics conference in Chicago followed by the PASS Summit in Charlotte, North Carolina. Of course, your local user groups will continue to meet with regional SQL Saturdays sprinkled throughout the year as well. How will you participate and contribute in 2013? We look forward to seeing you all again, soon.

SQL PASS Summit Final Day (Sessions)

16 10 2011

Here is a summary of the final day of sessions I attended at PASS.

Vertipaq vs OLAP – Change Your Data Modeling Approach by Marco Russo

This session discussed the role of each data model type in your Business Intelligence platform.

The Many-to-Many Revolution 2.0 is being released and will cover BISM Multidimensional and BISM Tabular.  It is available now.

The relational model for the multidimensional cube is a star schema.  It provides the best performance and functionality.

OLAP (Multidimensional)      | Vertipaq (Tabular)
-----------------------------|---------------------------
Dimensional modeling         | Relational modeling
Facts, dimensions            | Tables
Complex relationships        | Basic relationships (1:N)
Script                       | Calculated columns
Powerful, complex            | Measures

OLAP and MDX are much more complex than Vertipaq and DAX.  That makes the barrier to entry, or learning curve, for DAX and Vertipaq much lower.

The dimensional model is designed to be faster and to store aggregations in the relational world.  For solutions with Vertipaq, is the datamart still required?  Can Vertipaq replace the dimensionally modeled datamart?

Marco went through a number of scenarios which showed the difference in performance between DAX, MDX, and SQL for aggregating and analyzing data.  At this point, it is too early to say that OLAP is old and will die.  We have two tools with different purposes and areas of strength.  For instance, Vertipaq can handle distinct count operations much more easily and faster than OLAP.  Marco kept noting that Vertipaq and DAX are "fast enough," which seems to mean that expensive operations are expensive in both models, Vertipaq and OLAP.
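Distinct count illustrates the point well: in DAX it is a single function call, whereas in multidimensional OLAP it typically requires a dedicated measure group. A sketch with placeholder table and column names:

```
EVALUATE
ROW ( "Distinct Customers", DISTINCTCOUNT ( 'Sales'[CustomerKey] ) )
```

Because Vertipaq stores each column compressed in memory, a distinct count is just a scan of one column rather than a pre-aggregated structure.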

Key take away: Adapt the technology to the data model, not the data model to the technology.

PDW Architecture and Internals – Brian Walker

The only place you can put your own software on PDW is the Landing Zone.

Flow of data in PDW:
Source → Landing Zone files → Data Loader → Compute Nodes

You can get up to 654 TB of backup space on the Backup Node.  Yes, that is 654 Terabytes!

The bread and butter of PDW are the Compute Nodes and Storage Nodes. Each compute node runs a SQL Server 2008 EE instance, and SQL is the primary interface to the compute nodes.  TempDB lives on the compute node as well because it serves instance-specific functions.

In the storage node, SQL Server file groups are created by the appliance to support a compute node.  This includes filegroups for distributed data, replicated data, and logs.

The Admin console is available through the web browser with tabs for Alerts, Sessions, Queries, Loads, Backup/Restores, Appliance State, Locks and Performance Monitor.

There are now add ins for System Center management with PDW.

PDW supports a hub and spoke architecture.  You can connect SQL Server and Analysis Services to PDW so you can offload data to different consumers.  You can treat the data warehouse on PDW as the single source of truth that can support analytics and data marts and related functionality in a spoke.

Integration with PDW:

  • SQL Server – Remote table copy from MPP to SMP
  • PDW ADO.NET drivers for SSAS, SSIS, .NET, SSRS

Data layout approaches that support Shared Nothing querying, which is the ability for the appliance to request discrete query subsets from an originating query:

  • Replicated – full copy on each discrete PDW node (good for dimensions)
  • Distributed – uniformly distributed across all nodes (good for fact tables).  Uses hash keys to distribute the data across the storage nodes.
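In PDW's T-SQL dialect, the layout choice is declared in the table DDL. A sketch with made-up tables (HASH spreads rows across nodes by a hash of the key; REPLICATE copies the full table to every node):

```sql
-- Fact table: hash-distributed across all compute nodes
CREATE TABLE dbo.FactSales
(
    SaleKey     BIGINT,
    CustomerKey INT,
    Amount      MONEY
)
WITH ( DISTRIBUTION = HASH (CustomerKey) );

-- Dimension table: a full copy replicated to each node
CREATE TABLE dbo.DimCustomer
(
    CustomerKey INT,
    Name        VARCHAR(100)
)
WITH ( DISTRIBUTION = REPLICATE );
```

With this layout, a join between the fact and dimension tables can be satisfied entirely on each node, which is what enables the shared nothing query plan.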

Data Warehousing: From SMP to MPP using SQL Server Parallel Data Warehouse – Dandy Weyn (Sr. Technical Project Manager) and Brian Mitchell (Sr. Premier Field Engineer)

Simplified Data Warehouse Offerings:

  • Enterprise Edition (software only)
  • Fast Track Data Warehouse (reference architecture) – includes software, configuration guidelines, and hardware;
  • BDW Appliance (integrated architecture)
  • Parallel Data Warehouse (DW appliance)

SQL Server Appliances:

  • HP Enterprise Data Warehouse Appliance (PDW)
  • Dell Parallel Data Warehouse Appliance (PDW)
  • HP Business Data Warehouse Appliance
  • HP Business Application Appliance (incl. SharePoint)

DWLoader or SQL Server Integration Services can be used to load data into the PDW.  DWLoader will usually be the faster method because it has been optimized for PDW.

This is a Shared Nothing Architecture unlike Oracle RAC which is a shared everything architecture.

You need to choose your distribution key very carefully; a good key ensures the table's data is spread evenly across the nodes.

Largest SQL Azure Projects in the World & How Do They Scale – Kevin Cox, Michael Thomassy (SQLCAT)

Top stats:

  • Largest Sharded Database is 20 TB
  • Largest number of databases for 1 app is 500
  • Most users on 1 app is 3 million
  • 2k block Read, 8 node end-point – 34,000 ops/sec

SQL Server is the core DB Engine for SQL Azure.

Case Studies

Flavorous – Ticket Sales Vendor

This is a truly elastic solution.  They only keep databases up for the duration of the ticket sale.  (NOTE: Check out Chaos Monkey.)  All of the applications needed to be built with SQL Azure's capabilities in mind.  They were able to handle 3 million tickets in 6 minutes, including 150,000 tickets in 10 seconds.

Bing Games

Bing Game tracking and scoring system.  398 SQL Azure databases supporting 30 million unique users per month, 200,000 concurrent users with 99.975% uptime.

Why SQL Azure?

  • Faster than Azure Table Storage – for entities with large number of attributes
  • Very low learning curve – prototyped and written in less than 4 days with 1 developer
  • Testability – mostly compatible with SQL Server and easy to prepopulate with millions of records.

Sepia Labs

This is a private social network for mobile phones.  They are now in the process of closing their on-premises data center and moving everything to the cloud.  (Side note from them: "Windows Phone 7 is by far the easiest mobile platform to build on.")  They picked Azure Table storage rather than SQL Azure, except for some analytical data.

Why Table Storage?

  • Less expensive than SQL Azure
  • Speed approximately the same for small entities (very few columns; i.e. < 10)
  • High size limit – 100 TB total between Table, Blob store, Queue per storage account.

MYOB – Account Software ISV

They service small businesses with accounting solutions.  They chose the cloud because of the types of offerings they have.  This allowed them to provide massive scalability and high availability.  They scale by a sharding technique built by the developers.  They are considering Federations in SQL Azure for the future.

Federations in SQL Azure will allow you to do less development to support sharding within SQL Azure.  It is designed to help customers take care of the following:

  • Unlimited Scalability – distribute workload equally to all nodes by scaling out rather than scaling up.  Scale applications horizontally across many nodes and allow applications to dynamically deliver data.
  • Rich Elasticity – expand and contract without downtime.
  • Simplified Tenancy Model – whether you have multiple tenants per database, one tenant per database, or multiple databases per tenant.
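As previewed at the time, the federation syntax looked roughly like this (the names are made up, and the feature was still pre-release, so details were subject to change):

```sql
-- Create a federation keyed on a BIGINT range
CREATE FEDERATION Orders_Federation (CustId BIGINT RANGE)
GO

-- Route the connection to the member holding CustId = 100;
-- FILTERING = ON scopes all queries to that single key value
USE FEDERATION Orders_Federation (CustId = 100)
    WITH RESET, FILTERING = ON
GO
```

The appeal was that the USE FEDERATION statement handled the routing that sharded applications otherwise had to implement themselves.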

Thus concludes my week at PASS Summit 2011.

SQL PASS Summit–Day 4 (Keynote)

13 10 2011

Bill Graziano kicked off the event today with notice of the official "Kilt Day" at the Summit.  He also recognized the volunteers who make this happen.  Thanks to all the volunteers who contribute to our local, national, and international SQL communities.  If you are interested in volunteering, check out the opportunities PASS offers.  Bill also updated us on the financials, something I won't be blogging about here.

Quentin Clark, Corporate VP SQL Server, was the keynote speaker from Microsoft.  He talked about the new release of SQL Server 2012.

  • Data – Managing data at the fundamental level; Knowledge – how to get insights out of information; between Data and Knowledge is Information – data cleansing and quality.
  • SQL Server 2012 is one of the biggest releases Microsoft has ever done, with 100s of new features.

Quentin’s Fantastic 12 of SQL Server 2012

  1. Required 9s & Protection – SSIS as a server; HA for StreamInsight; SQL Server AlwaysOn
  2. Blazing Fast Performance – Performance Enhancements in RDBMS, SSAS, and SSIS; ColumnStore Index with Vertipaq
  3. Rapid Data Exploration – Power View + PowerPivot; Administration from SharePoint; Reporting Alerts
  4. Managed Self Service BI – same as 3
  5. Credible, Consistent Data – BI Semantic Model (BISM); Data Quality Services (DQS); Master Data Services (MDS)
  6. Organizational Compliance – Expanded Audit (User Defined, Filtering, HA); User-defined Server Roles
  7. Peace of Mind – Production-simulated Application Testing; System Center Advisor & Management Packs; Expanded  Support (Premier Mission Critical)
  8. Scalable Data Warehousing – SQL Server Appliances (optimized & pre-tuned including Dell PDW, HP Enterprise Data Warehouse Appliance, HP Business Decision Appliance); HW + SW + Support (Just add power); Choice of hardware
  9. Fast Time to Solution – same as 8, lots of appliances
  10. Extend Any Data, Anywhere – Greater Interoperability – new drivers for PHP, Java & Hadoop; ODBC Drivers for Linux & Change Data Capture (CDC) for SSIS & Oracle; Beyond relational: FileTable, 2D Spatial, Statistical Semantic Search (functionally will replace full text search)
  11. Optimized Productivity – SQL Server Data Tools (formerly “Juneau”); Unified across database and BI (no more BIDS); Deployment & Targeting Freedom
  12. Scale on Demand – AlwaysOn; Deployment Across Public & Private; Elastic Scale (using SQL Azure)

Azure Announcements

You will now be able to access Windows Azure storage from SSMS which can be used for SQL Azure backups.  You will also be able to “right-click” deploy from SSMS to SQL Azure.  A lot of integration between SQL Server and SQL Azure is coming in SQL Server 2012.

SQL Azure Federation is being released at the end of the year.  This will allow for larger and elastic databases to be deployed to SQL Azure.  This provides for a management tool to distribute or federate your application data across SQL Azure databases which allows you to manage cost and performance.

The SQL Azure management site is now going Metro (Windows 8), including live tiles.  I like it.  This too will be released at the end of the year.

SQL Azure will now be able to support 150 GB databases.
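At the time, the larger size was requested through the database options; a sketch with a placeholder database name:

```sql
-- Provision a 150 GB Business Edition database in SQL Azure
CREATE DATABASE SalesDb (EDITION = 'business', MAXSIZE = 150 GB);

-- Or grow an existing database to the new ceiling
ALTER DATABASE SalesDb MODIFY (EDITION = 'business', MAXSIZE = 150 GB);
```

MAXSIZE also acts as a cost and safety cap, since the database refuses writes once the limit is reached.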

Available today – CTPs for SQL Azure Reporting and SQL Azure Data Sync.

There really is a plethora of cool features and tools being released with SQL Server 2012.  Nice work, SQL Server project teams.

