Part 2: Adding a New CheckBox – Create your own SQL tools with PowerShell and Windows forms

14 01 2015

Steve:

This is James’ second post on creating SQL tools with PowerShell and Windows Forms. James is a DBA responsible for managing a large number of SQL Server instances. A quick sketch of the CheckBox idea follows the excerpt below.

Originally posted on JamesDataTechQ:

This blog post is about adding a new CheckBox to the SQL Tool base form from my first blog post. The first post was just an introduction to the SQL Tool; this is where the fun starts, getting creative by adding your own SQL queries and PowerShell scripts. How cool is that? You’re making your own SQL tools tailored to your own needs. The CheckBox I am adding is for a SQL process, but the approach is universal: the CheckBox just starts a PowerShell function, so whatever you dream up, you can start that process from your own SQL Tool. If you have started making your own SQL Tools, or you would like me to cover a topic, please leave me a comment.

Add a new CheckBox

Step 1. In this step we will be adding a new CheckBox object called…

View original 514 more words
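
To make James’ idea concrete without reproducing his post, here is a minimal sketch of a CheckBox wired to a PowerShell function. This is my own illustration, not James’ code; the function name Invoke-MySqlProcess and the control layout are placeholders for whatever SQL or PowerShell process you want to launch.

# Minimal sketch: a CheckBox that gates which process a Run button launches.
# Invoke-MySqlProcess is a hypothetical placeholder for your own script.
Add-Type -AssemblyName System.Windows.Forms

function Invoke-MySqlProcess {
    # Replace this body with your own SQL query or PowerShell process.
    [System.Windows.Forms.MessageBox]::Show("The SQL process would run here.") | Out-Null
}

$form = New-Object System.Windows.Forms.Form
$form.Text = "My SQL Tool"

$checkBox = New-Object System.Windows.Forms.CheckBox
$checkBox.Text = "Run my SQL process"
$checkBox.Left = 20
$checkBox.Top = 20
$checkBox.AutoSize = $true
$form.Controls.Add($checkBox)

$runButton = New-Object System.Windows.Forms.Button
$runButton.Text = "Run"
$runButton.Left = 20
$runButton.Top = 60
$runButton.Add_Click({ if ($checkBox.Checked) { Invoke-MySqlProcess } })
$form.Controls.Add($runButton)

[void]$form.ShowDialog()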





Intro to Data Factory–Training on the T’s Follow Up Post

13 01 2015

This is a follow-up blog post based on the Intro to Data Factory session I gave during Training on the T’s with Pragmatic Works. You can find more free training, both past and upcoming, here. I did my session on January 13, 2015.

Intro To Data Factory

In this session, I gave a simple introduction to the new Azure Data Factory service using a CopyActivity pipeline between Azure Blob Storage and Azure SQL Database. Below is a diagram illustrating the factory that is created in the demo.

[Diagram: the demo factory – a CopyActivity pipeline copying data from Azure Blob Storage to Azure SQL Database]

I have published my presentation materials here. They include the sample JSON files, the Movies.csv source file, and the PowerShell scripts.
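
For context, the PowerShell portion of a deployment like this amounts to creating the factory and then registering the linked services, datasets (called tables in the early cmdlets), and the pipeline from their JSON definitions. The sketch below is a hedged reconstruction rather than the exact script from the materials: the cmdlet names come from the preview-era Azure PowerShell module, the resource group, factory, and file names are placeholders, and parameter names may differ in your module version.

# Hedged sketch of deploying a Blob-to-SQL copy factory (names are placeholders).
Switch-AzureMode AzureResourceManager   # the ADF cmdlets of that era required ARM mode

$rg = "ADFDemoRG"
$df = "IntroToDataFactory"

New-AzureDataFactory -ResourceGroupName $rg -Name $df -Location "West US"

# Register the linked services, datasets (tables), and pipeline from JSON definitions.
New-AzureDataFactoryLinkedService -ResourceGroupName $rg -DataFactoryName $df -File .\StorageLinkedService.json
New-AzureDataFactoryLinkedService -ResourceGroupName $rg -DataFactoryName $df -File .\AzureSqlLinkedService.json
New-AzureDataFactoryTable -ResourceGroupName $rg -DataFactoryName $df -File .\MoviesBlobTable.json
New-AzureDataFactoryTable -ResourceGroupName $rg -DataFactoryName $df -File .\MoviesSqlTable.json
New-AzureDataFactoryPipeline -ResourceGroupName $rg -DataFactoryName $df -File .\CopyMoviesPipeline.json

# Activate the pipeline for the slice window you want processed
# (parameter names may vary slightly by module version).
Set-AzureDataFactoryPipelineActivePeriod -ResourceGroupName $rg -DataFactoryName $df `
    -PipelineName "CopyMoviesPipeline" -StartDateTime "2015-01-13" -EndDateTime "2015-01-14"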

Q & A

Here are a few questions that were answered during the session.

1. Does Availability refer to when data that has been transferred will be available? Or when the data source is actually available for query?

Availability refers to when a dataset will make a slice available – that is, when the slice can be consumed as an input or targeted as an output. This means you can consume data hourly but choose to push it to its final destination on a different cadence to prevent issues on the receiving end.

2. What prerequisites are must-haves? (e.g., an Azure account, HDInsight, Blob Storage accounts, etc.)

    • An Azure account is the only real must-have. You could use two on-premises SQL Server instances.
    • HDInsight if you want to use the HDInsight activities.
    • An Azure Storage account if you want to use blob or table storage (a quick sketch of staging the demo file this way follows the list).
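
If you do use blob storage as the source, staging the demo file only takes the standard Azure Storage cmdlets. This is a hedged sketch – the storage account name, key, and container name are placeholders.

# Hedged sketch: upload the demo's Movies.csv to blob storage so the factory has a source to copy.
$ctx = New-AzureStorageContext -StorageAccountName "mystorageacct" -StorageAccountKey "<storage-key>"
New-AzureStorageContainer -Name "moviesdata" -Context $ctx
Set-AzureStorageBlobContent -File .\Movies.csv -Container "moviesdata" -Blob "Movies.csv" -Context $ctx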

3. How do you decide to use a Factory or Warehouse?

The factory is more of a data movement tool. A warehouse could be a source or target of a factory pipeline.

4. Is this similar to SSIS in SQL Server?

Yes and no. SSIS is definitely more mature and has more tooling available, such as data sources and transformations. SSIS also has a good workflow designer. The initial focus of Data Factory was to load HDInsight tables from a variety of sources with more flexibility. The other note here is that Data Factory is being built from the ground up to support the scale of the cloud, specifically Azure.

5. Can this be used for Big Data?

Absolutely. I would say that it is one of the primary reasons for the tool. In reference to the previous question, it will likely be the tool of choice for big data operations because it will be able to scale with Azure.

Links to Additional Resources on Data Factory or tools that were used in the presentation:

Azure Data Factory on Azure’s Website

Azure Data Factory Documentation

Azure Data Factory Pricing

Azure Storage Explorer

Azure PowerShell Documentation

Thanks for joining me for this presentation. We look forward to seeing you at the next Free Training on the T’s.





She Taught Me to Code–A Tribute to Sheila, My Wife

12 01 2015

A tribute is an expression of gratitude or praise. A couple of years ago, I started a series about individuals who have impacted my career. I do this as a tribute to my father-in-law, Ed Jankowski, who passed away five years ago this past December (2014). Check out my original post about him and his impact on my being in software development today.

She Taught Me to Code

My wife, Sheila, actually did get me started down the programming path. We had been married just over a year when she showed me how to use Microsoft Access to create databases, entry forms, and reports. She knew how to do some of the code behind to solve problems. As I noted in the first paragraph, her dad was a significant influence during the start of my career. His influence was not lost on her either. It was that work she did with me that got me interested in computers. If you ask her today, she is more a user of software and not a builder of applications. However, she was the first to show me the possibilities and joy of creating applications for practical uses.

My Success and My Wife

Over the years, I have worked in the corporate world and in consulting. As my career began to take off, Sheila supported me and the effort required to learn and move up in a career which I started after completing college. From Bethany Press to Magenic to Xata to Magenic and now Pragmatic Works, she has been supportive, even when it made life harder at home. Without her, I would not have been able to do much of what I have accomplished.

I really had this perspective reinforced with the article from Harvard Business Review – The One Thing About Your Spouse’s Personality That Really Affects Your Career. Here are a couple of highlights from that article that speak volumes about Sheila’s influence on my career:

First, conscientious spouses handle a lot of household tasks, freeing employees to concentrate on work (“When you can depend on someone, it takes pressure off of you,” Solomon told me). Second, conscientious spouses make employees feel more satisfied in their marriages (which ties in to the first study I mentioned). Third, employees tend to emulate their conscientious spouses’ diligent habits.

… what isn’t obvious is the extent to which so many people are parts of teams, in a sense — two-person teams that are based outside the office.

Being a data guy, it is really cool to see research reinforce the impact my wife has on my career. Interestingly, I have made job changes to retain that support as well. The companies that have supported my wife have been the most enjoyable to work at. Even when the job demands were rough, the companies whose managers and leaders cared about my wife and family, in both word and action, were the best places I worked. Here is one last quote from the HBR article on this topic:

We can’t and probably don’t want to know the details about these teams, but as Solomon points out, if organizations really understood the workplace effects of strong outside relationships, they might be more receptive to policies like flextime and telecommuting that make it easier for employees to spend time with their significant others.

I have been married over 20 years and have 4 teenage children. Without my wife and her support, I think that we would be in a very different place and I definitely would not be as happy.

So, to the love of my life, thanks for putting up with the long hours, working weekends and travel. I know it has been hard, but with you I am a better person and more successful. Thanks so much.





2014 Year In Review

11 01 2015

As is our wont, we must look back over the past year to see what happened. While I normally focus on work-related items, this year was a crazy one for our family as well as my career. So let’s have a look at what happened.

Traveling Family

2014 was a year that saw our family do a bunch of traveling. Although our trips were not all taken together, they took us all over the world. Here are some of our highlights:

  • My two oldest children, Kristy and Alex, went on a tour of Italy with the Burnsville High School Band. They saw Venice, Rome, and a few other cities. They were able to perform with the band during that trip.
  • Kristy journeyed to Israel with Grace Church right before the missiles started being launched. She was doing a Holy Land tour, which she enjoyed a lot. However, as parents, getting a text that said, “we left before the missiles landed around Bethlehem,” did make us a bit nervous.
  • Alex worked in an orphanage in Romania. He was significantly impacted by the conditions there and is looking for an opportunity to return and serve some more.
  • Andrew and Mikayla went to a town in Indiana for a weeklong trip with Teenserve and our church. They had the opportunity to join teens from around the country and perform repairs and general maintenance for a town in need.
  • Alex visited colleges in LA and Lynchburg.
  • Andrew traveled to Chicago with band and a church group.
  • Our entire family enjoyed a true break in Cancun, Mexico. Truly a lot of fun and great downtime.
  • We followed the Cancun trip up with a cross country trip to Los Angeles to drop my oldest, Kristy, off at Biola College for her freshman year.
  • Andrew and I went to Key West with the Boy Scouts and sailed around the Keys for a week. That was truly enjoyable. I loved being on a boat.
  • Sheila and I enjoyed our company holiday party at the One Ocean Resort in Florida.
  • We wrapped up the year visiting family for the holidays in Kentucky.

Overall, we were all over the country and even the world. We were blessed to have the opportunities to experience so much this year.

Changing Employers

In the middle of all the travel, I celebrated 10 years at Magenic in March and transitioned to Pragmatic Works in October. I loved working at Magenic. During this year, I came to the realization that I wanted to focus more on data and BI solutions, so I made the move to Pragmatic Works. I enjoy my new company as much as my old one, which is a very good thing. Thanks to everyone at both places for supporting me and my career.

More…

This past year, I also contributed to my third book. Hopefully you found it helpful. I also did a first for me this year: I reblogged a post from a friend and fellow Scouter, Jim Larson. His PowerShell work is awesome, and I wanted to share it with my readers as well.

Thanks to My Readers

Finally, I wanted to thank all my readers. I appreciate your support. It has been cool to see my readership increase this year. I hope you find value in the technical content here. I look forward to hearing from you or, even better, seeing you at SQL Saturdays and other events throughout the year.

Here’s to a great year in 2015!





Create your own SQL Tools with PowerShell and Windows Forms

2 12 2014

Steve:

Check out a new blogger who is sharing secrets of the DBA world. In his first post, James shows us how to create your very own SQL tool using PowerShell and Windows Forms. A short note on running the script follows the excerpt.

Originally posted on JamesDataTechQ:

By combining PowerShell and Windows Forms you can create your own SQL Tools. From the Windows Form you can execute your SQL and PowerShell scripts with just a couple of mouse clicks. The added bonus is that your scripts are now in a central location tailored to your needs. I have created a base form below so you can get started adding your own scripts. If you’re new to PowerShell, save the code below as a .ps1 file and then run it from PowerShell.

[Screenshot: the base SQL Tool form]


####################### PowerShell Code ###############################


function CreateForm {
Add-Type -AssemblyName System.Windows.Forms
Add-Type -AssemblyName System.Drawing


$form1 = New-Object System.Windows.Forms.Form
$button1 = New-Object System.Windows.Forms.Button
$button2 = New-Object System.Windows.Forms.Button
$checkBox1 = New-Object System.Windows.Forms.CheckBox
$checkBox2 = New-Object System.Windows.Forms.CheckBox
$InitialFormWindowState = New-Object System.Windows.Forms.FormWindowState

#Form Parameter
$form1.Text = "My PowerShell Form"
$form1.Name = "form1"
$form1.DataBindings.DefaultDataSourceUpdateMode =…
$System_Drawing_Size = New-Object System.Drawing.Size
$System_Drawing_Size.Width = 550
$System_Drawing_Size.Height = 150
$form1.ClientSize = $System_Drawing_Size

#Add…

View original 369 more words
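
As a quick follow-up to James’ note about running the script: the workflow is just save and run. The file name SqlToolForm.ps1 below is a placeholder, and the execution policy line is only needed if local scripts are blocked in your environment.

# Save the full script from the original post as SqlToolForm.ps1 (placeholder name),
# then run it from a PowerShell prompt. Relax the execution policy for this session
# only if your environment blocks local scripts.
Set-ExecutionPolicy -Scope Process -ExecutionPolicy RemoteSigned
.\SqlToolForm.ps1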





Power Testing ETL with Power BI – Creating the Tests with Power Pivot

13 11 2014

This is the second deep dive into Power Testing ETL with Power BI. At this point, we have created the source table which will be used in our testing. The next step is to bring in the destination table and create the tests that will be “run” against the data. In their simplest form, the tests are logical conditions that check whether source data matches destination data and whether calculations applied to those data sets also match. When they don’t match, you have a data load error, which results in a failed test.

How to Calculate Success and Failure

The basic idea of the testing is to turn the results into numbers and calculate whether, and by how much, we succeeded or failed. Typically, every test will result in a 1 or a 0. Whether you assign 1 to success or failure largely depends on how you plan to display your results. If you plan to use KPIs built into the Power Pivot model, you will be comparing the number of successful tests against the number of rows expected to be imported. The primary reason for this is that you cannot target zero when using KPIs. In this scenario, successful tests result in 1 and are therefore easily compared to the number of expected rows, which would be 100% successful if they matched.

The other scenario is to measure failures. In this case, we assign 1 to each failed test and count the number of failed tests. This can easily be handled in visualizations such as conditional formatting, where 0 is displayed as green and an increasing number of failures changes the state from green to yellow and then red. This helps identify the most commonly failed tests.

The method you choose is up to you and how you prefer to see the results. We will cover using both variations in visualizations, but for the sake of brevity here, we will measure success against our row count. Success = 1; Failure = 0.

Creating the Power Pivot Tests

In order to create the tests, you need to open the Power Pivot window and add the destination table to the model. In our case, we have created a table in the HughesMediaLibrary database called Books that is our target. Here is the syntax for the target table.

CREATE TABLE dbo.Books(
BookID int IDENTITY(1,1) NOT NULL
CONSTRAINT pk_Books PRIMARY KEY CLUSTERED,
BookName varchar(100) NOT NULL,
Publisher varchar(100) NULL,
Genre varchar(50) NULL,
CopyrightYear smallint NULL,
AuthorFName1 varchar(100) NULL,
AuthorLName1 varchar(100) NULL,
AuthorFName2 varchar(100) NULL,
AuthorLName2 varchar(100) NULL,
AuthorFName3 varchar(100) NULL,
AuthorLName3 varchar(100) NULL,
AuthorFName4 varchar(100) NULL,
AuthorLName4 varchar(100) NULL,
AuthorFName5 varchar(100) NULL,
AuthorLName5 varchar(100) NULL,
PageCount int NULL
)

While I realize this is not a good normalized table, it serves our purposes well to build out the tests. This table needs to be added to the Power Pivot model before we can do the next steps.

Relating the Source and Destination

The next step is to relate the source and destination. In our case, the only data that will work is the book name. We will use the Source table as the primary table in this relationship. The idea is that all the data in the source table should exist in the target. As this is not always the case, the source is the “source of truth” for the testing scenario.

 

Building the Tests

The tests consist of calculated columns that handle the data analysis and calculated measures that summarize the results.

Validating Data Field by Field, Row by Row

This is the primary reason that we worked with Power BI. One of the most common testing scenarios is whether the data came over correctly. In the previous post, we shaped the data with Power Query. Now we will compare it with the results from our ETL process in SSIS. We will use Book Name as the example. Every field you wish to test can follow this pattern. The test consists of a calculated column and a calculated measure.

We create a column in the destination table called Book Name Matches. (Remember, we are tracking success, not failures.) In each row of the data, we need to determine whether the book name in the destination is an exact match for the book name in our source. We used the following DAX for that calculation:

=IF(RELATED('Booklist Source Fixes'[BookName])='Media Library – Books'[BookName],1,0)

It looks at the related table to determine whether the field values match. If they match, the test is assigned a 1 for that row. If they do not match, a 0 is assigned. (The table names are how I named the source and destination. They may not match your solution if you are following along.) Once we have the rows evaluated, we will sum the values with a Book Name Matches measure:

Book Name Matches (34):=SUM([Book Name Matches])

We will use the Book Name Matches (34) measure to compare with the book count. If they match, all tests passed. If they do not, then some or all rows have failed.

The number after the measure, 34, is the test key from TFS. I added this to the measure to make it easier to identify which test case is being evaluated with this measure. In some cases, you may have multiple measures that are required to complete a test. You can either evaluate them independently or create an additional measure that summarizes them for your use.

Other Validations or Tests

Some other basic validations can be created as well. A common one would be the book count. In my scenario, I return the book count and then evaluate it using a KPI. Another way to do this is to add another measure that checks for equality between the two book count measures in the source and destination. If they match, success. If not, failure.

You can also use measures to validate expected totals the same way we were working with counts. This is particularly helpful in financial data loads where you would want to verify a sum of balances to make sure the results match. The point is that you can add any other measures that you want to compare in order to meet the unique needs of your situation. It is also possible to compare against entered values. If you know that 100 widgets are to be imported, you can have the measure evaluate against 100 instead of a measure in the source.

Recording the Results in TFS

In order to bring the process full circle, we enter test results into TFS or Visual Studio Online. This gives us the ability to track test results, bugs, and fixes in a development lifecycle tool. It is also the best way to track testing history. One caveat here is that the query results from TFS do not permit you to set test results in Excel. Ideally, we should be able to link in the tests with the results. We could then update the results in the query and push them back. This is NOT supported at the moment. As a result, you will need to open the tests in TFS to update your results. This is not a significant issue because you should also create bugs for failed tests. It’s primarily a nuisance.

An added side effect of using this method to test is that we are able to collaborate with developers to determine what the bug actually is. Because all the data is loaded into Excel, reviewing results is fairly simple and may actually be easier than trying to look at the destination system.

Quick Look at SSIS

Up to this point, we have focused on how a non-developer can set up the source and destination and proceed to test. I wanted to call out the author name work done in Power Query to highlight why Power BI is a great choice. When splitting author names in Power Query, the work was done using right-click operations. Here is an example of the SSIS expression code used to split out the second author name column:

(DT_STR,200,1252)TRIM((FINDSTRING(AuthorNames,",",1) == 0 ? NULL(DT_WSTR,200) : TRIM(SUBSTRING(AuthorNames,FINDSTRING(AuthorNames,",",1) + 1,FINDSTRING(AuthorNames,",",2) == 0 ? LEN(AuthorNames) : 1 + LEN(AuthorNames) - FINDSTRING(AuthorNames,",",2)))))

Compared to Power Query, this is complex and not intuitive. While Power Query is not intended for enterprise ETL use, its simplicity helps test complex scenarios such as our author name split without having to create an equally complex SQL statement or expression.

The next post will take a look at some of the visualization options for the test results.





Power Testing ETL with Power BI – Shaping The Data with Power Query and Power Pivot

12 11 2014

This blog post digs into the details of shaping the data with Power Query and Power Pivot in order to build out the test cases. In the previous post, you were able to get a sense of the bigger picture and how the pieces work together. This post will focus entirely on creating the source table that will be used.

One of the most difficult parts of testing the data in an ETL process is that the data needs to be transformed to match the results of the ETL process. Typically this is done using a combination of tools including SQL, Excel, and even Access. The solution I am proposing will use Power Query to do the initial massaging of the data and Power Pivot to put any finishing touches in place.

Understanding the Requirements

The first thing that has to be understood is the requirements. Those requirements are driven by the business rules and the Source to Target Map. Because we are focusing on a non-developer delivering this work, we need to move away from developer-centric tools and into the world of Excel and Power BI.

Building Out the Power Query Query

Power Query is an excellent choice for this work. It allows us to transform or shape the data through a series of steps. What really makes this compelling is that Power Query is a “no code” solution. Once the tester or analyst is familiar with the tool, they understand that most operations can be accomplished using shortcut or right-click menus during the design process. Here is an in-depth look at what it takes to separate the multiple authors in the source into multiple columns using Power Query.

Step 1 – Find the data source

In our case the data source is a CSV file. You can download that file here. The link opens an Excel file with the pipe-delimited values that will be used as the source. I would recommend saving it as a .csv file as it is easier to work with in Power Query. (A quick PowerShell sanity check of the file follows the raw data below.)

Here is the data raw so you can see what we will be working with:

BookName|Publisher|Genre|Copyright|PageCount|AuthorNames
Wish List|John Locke Books|Thriller|0011|215|John Locke
Riders of the Pale Horse|Bethany House Publishers|Christian Fiction|1994|348|T. Davis Bunn
HTML Pocket Reference|O’Reilly|Technology|2000|92000|Jennifer Niederst
Renegade|Thomas Nelson|Juvenile Fiction|2007|245|Ted DekKer
Gutenberg to Google|Billion Soul Publishing|Missions|2009|272|James Davis
Sinner|Thomas Nelson|Suspense|2008|386|Ted DekKer
SQL Server Analysis Services 2012 Cube Development Cookbook|Packt Publishing|Technology|2013|324|Baya Dewald, Steve Hughes, Paul Turley
Troubleshooting SQL Server – A Guide for the Accidental DBA|Redgate Books|Technology|2011|358|Jonathan Kehaysias, Ted Krueger
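
As an aside, if you would like to verify the delimiter and row count outside of Excel before shaping the data, a couple of lines of PowerShell will do it. This assumes the download was saved as booklist.csv in the current folder.

# Quick sanity check of the pipe-delimited source before shaping it in Power Query
# (assumes the file was saved as booklist.csv in the current folder).
$books = Import-Csv -Path .\booklist.csv -Delimiter '|'
$books | Measure-Object | Select-Object Count          # expect 8 data rows
$books | Select-Object BookName, AuthorNames -First 3  # spot-check the columns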

Step 2 – Open Power Query in Excel and Connect to the CSV File

[Screenshot: the Power Query ribbon with the From File > From CSV option]

Select the Power Query tab and select the From File option on the ribbon. Pick the From CSV option. Select the booklist.csv file and click OK. The result will be a preview of the data, which in our case is all the data. You can see it has created the Source, First Row as Header and Changed Type steps. If it did not do this for you automatically, you may need to set the delimiter and specify that the header is the first row.

[Screenshot: the data preview showing the Source, First Row as Header, and Changed Type steps]

Step 3 – Shape the Data in Power Query to Match Our ETL Process

In Power Query, we are going to split the author list and the author names. We will also apply some trimming to the data. In all, we will apply ten (10) steps to the query. Power Query works like an ETL tool as it shapes or transforms the data a step at a time.

Splitting the AuthorNames column

In this step, we will create a column for each author name. Our destination supports up to five authors. Our source has up to three. Right click on the AuthorNames column, select Split Column, then By Delimiter.

[Screenshot: the Split Column > By Delimiter menu option]

You can leave the defaults in the dialog and click OK.

[Screenshot: the Split Column by Delimiter dialog with default settings]

This will result in three columns being created as AuthorNames.1, AuthorNames.2 and AuthorNames.3. Power Query does the next step which changes the data type to match what it sees in the resulting data.

Splitting the Author’s Names into First and Last Name Column

You will need to repeat this three times, once for each AuthorNames column. What is different is that we need to match a couple of business rules:

1. Author names will be stored as AuthorFName and AuthorLName for up to 5 authors (e.g. AuthorFName1).

2. Authors with middle initials or middle names or variations thereof should store these values with the first name. For example, for J.R.R. Tolkien, “J.R.R.” would be stored in the AuthorFName column and his last name, “Tolkien”, in the AuthorLName column.

Understanding these rules clarifies how we should split these columns. As before, we will split the AuthorNames.1 column. However, in the delimiter dialog we will use a space as the delimiter and choose the right-most delimiter. This picks the first space from the right, so essentially the last name is separated from everything else.

[Screenshot: the Split Column by Delimiter dialog using a space delimiter at the right-most occurrence]

We will repeat the process for each column. The last step in this process is to rename the columns to something meaningful that we can reference later, such as the target field names (e.g., AuthorFName1). This will make the later steps simpler to follow.

Trim Author First Names for Authors after First Author

The final step is to apply a trim to the AuthorFName2 and AuthorFName3 columns. When the data was split, leading spaces were retained. In my demos, this is “discovered” as a mismatch in the test scenario. This would be an example of an easy miss for someone not used to some of the nuances of ETL. Keep in mind that we will test the tests as well throughout this process. This is a simple fix in Power Query – right-click the affected columns and select Transform, then Trim. Problem solved.

At this point, we have completed our work in Power Query. Up to this point, you may have seen the results of your query in an Excel spreadsheet. You actually want to load the data to a Power Pivot model. If you right-click on the query in the Workbook Queries panel, you can change the Load To target.

[Screenshot: changing the Load To target from the Workbook Queries panel]

Select Load to Data Model and then we will finish the source data using Power Pivot.

Step 4 – Fix Additional Issues Using Calculated Columns in Power Pivot

Open the Power Pivot model in Excel. You should see the data from your Power Query query as one tab of the data. While we have massaged some of the data, there are still a few data issues that need to be resolved to match the business rules:

3. Copyright years must be stored as 4-digit values.

4. Page counts should not exceed 1000.

If you look at the source data, you will notice that one of the books has a two-digit year for the copyright. This should not be imported as it does not meet the rule. In our case, we will set the value to NULL in the ETL process. The same is true for one of the book page counts: it is 92,000, which greatly exceeds the maximum page count allowed by the business rule. It too will be set to NULL. The idea here is that row value checks are easily handled in Power Pivot with DAX and calculated columns.

To resolve the copyright year issue we are using the following DAX to create a new column called “Copyright Year”:

=IF([Copyright] < 1900, BLANK(), [Copyright])

To resolve the page count issue, we use the following DAX and create a “Pages” column:

=IF([PageCount]>1000, BLANK(),[PageCount])

Now we have fixed the remaining issues that violate business rules in the Power Pivot model.

Step 5 – Add Some Calculated Measures and Columns that Can Be Used for Data Validation

The final step is to add some calculations that will help us do some basic load testing. The first is just the row count. In this case, I created two measures: Source Book Count and Source Distinct Book Count (this handles a business rule that says a title can only be imported once). We can use these measures to verify that the expected data made it from source to destination. Both of these measures were created in the calculation area in Power Pivot using the AutoSum functions from the ribbon. The resulting DAX is noted below.

Source Book Count:

Source Book Count:=COUNTA([BookName])

Source Distinct Book Count:

Source Distinct Book Count:=DISTINCTCOUNT([BookName])

The last calculation we need to create is the Author Count calculated column. This needs to be a column because each row can have a different number of authors. Based on what we know about the data, we will count the AuthorLName columns that are not blank to determine the number of authors.

=IF(ISBLANK([AuthorLName1]),0,1)+ IF(ISBLANK([AuthorLName2]),0,1)+ IF(ISBLANK([AuthorLName3]),0,1)

This calculation would need to be modified if the source had rows with more than three authors.

Shaping Is Complete

The source transformation is now complete in the test scenario. A key point is that no code per se was written. While some DAX was required, it was fairly straightforward and likely the most complicated part of setting up the source table for testing.

Next up, creating the tests with Power Pivot and DAX.







