
Cosmos DB for the Data Professional

Cosmos DB is one of the fastest growing Azure services in 2018. As its popularity grows, data professionals are faced with a changing reality in the world of data. Data is no longer contained in relational databases as a general rule. We saw the start of this with Hadoop data storage, but no one ever referred to Hadoop as a database. Sure, Hive and other Hadoop-based technologies made the data look like a database, but we (data professionals) were able to keep our distance. What’s changed?

The Cloud, Data, and Databases

As the cloud reaches more and more businesses, traditional data stores are being reconsidered. We now have data stored in Azure: Azure Data Lake, Azure Storage, Azure Database Services (SQL, PostgreSQL, MySQL), Azure SQL Data Warehouse, and now Cosmos DB. Cosmos DB is the globalized version of Azure Document DB (more about that later). If we are to grow our skill sets and careers as cloud data professionals, we need to know more about the other ways data is stored and used. I want to summarize some things that we need to be aware of about Cosmos DB. If you are a data pro and your business uses it or plans to, you will need to know this.

Introducing Cosmos DB

Azure Cosmos DB is Microsoft’s globally distributed, multi-model database.

Figure: Cosmos DB overview (April 2018)

Source: https://docs.microsoft.com/en-us/azure/cosmos-db/introduction 

I will break down the key components of Cosmos DB with a data professional in mind. There are a lot of aspects of Cosmos DB that make it very cool, but you will want to understand these when you get the call to fix the database.

Multi-model Database Service

Currently Cosmos DB supports four database models. This is like having four different database servers in one. I liken it to having SQL Server Database Engine and SQL Server Analysis Services use the same underlying engine so that it only “looks different.” Cosmos DB refers to these as APIs. The API is chosen when the database is created. This optimizes the portal and database for use with that API. Other APIs can be used to query the data, but doing so is not optimal. Here are the four models supported and the APIs that support them.

Cosmos DB models

  • Key Value Pair: This is exactly as it sounds. The API is implemented with the Azure Table Storage APIs.
  • Wide Column or Column Family: This stores data similar to relational, but there is no row consistency (each row can look different). Cosmos DB uses the Cassandra API to support this model. (For more information on Cassandra click here.)
  • Documents: This model is based on JSON document storage. Cosmos DB currently supports two APIs for this model: SQL, which is the Document DB API, and Mongo DB. These are the most common models used in Cosmos DB today. Document DB is the “parent” of Cosmos DB; the service was rebranded.
  • Graph: Graph databases are used to map relationships in data and were made popular by Facebook, for instance. Microsoft uses the open source Gremlin API to support the Graph Database Model.

None of these databases are traditional row/column stores. They are all variations of NoSQL databases.

Turnkey Global Distribution

This is a key attribute for Cosmos DB. Cosmos DB can be easily distributed around the world. Click the data center you want to replicate to and Cosmos DB takes care of the rest. Cosmos DB uses a single write node and multiple read nodes. However, because Cosmos DB was built with global distribution in mind, you can easily and safely move the write node as well. This allows you to “chase the sun” and keep write operations happening “locally”.

Data Consistency

Data consistency is a primary concern of any data professional. The following tables compare Cosmos DB Consistency Levels with SQL Server Isolation Levels. These are not a one for one match, but demonstrate the different concerns between the systems.

 

Cosmos DB

| Consistency Level | Guarantees |
|---|---|
| Strong | Reads are guaranteed to return the most recent version of an item. |
| Bounded Staleness | Consistent Prefix or read order. Reads lag behind writes by prefixes (K versions) or time (t) interval. |
| Session | Consistent Prefix. Monotonic reads, monotonic writes, read-your-writes, write-follows-reads. |
| Consistent Prefix | Updates returned are some prefix of all the updates, with no gaps. Reads are not read out of order. |
| Eventual | Out of order reads. |

SQL Server

| Isolation Level | Dirty Read | Non-repeatable Read | Phantom |
|---|---|---|---|
| Serializable | No | No | No |
| Snapshot | No | No | No |
| Repeatable Read | No | No | Yes |
| Read Committed | No | Yes | Yes |
| Read Uncommitted | Yes | Yes | Yes |

As you can see, there are some similarities. These options are important to understand. In Cosmos DB, the more consistent you need the data to be, the higher the latency across the distributed data. As a result, most Cosmos DB solutions start with Session consistency, as this gives a good, consistent user experience while reducing latency in the read replicas.

Throughput

I am not going to dig into this much, but you need to understand that Request Units (RUs) are used to guarantee throughput in Cosmos DB. As a baseline, Microsoft recommends thinking of a read of a 1 KB JSON document as costing 1 RU. The capacity is reserved for each second. You pay for what you reserve, not what you use. If you exceed your capacity in a given second, your requests will be throttled. RUs are provisioned by region and can vary by region as a result, but they are not shared between regions. This requires you to understand usage patterns in each region where you have a replica.
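
To make the reserve-per-second idea concrete, here is a minimal Python sketch. The `RuBudget` class, the 400 RU/s figure, and the one-RU-per-read cost are all hypothetical and only simulate the concept; this is not the Cosmos DB SDK, and the real service signals throttling through its own response codes and retry hints.

```python
class RuBudget:
    """Toy model of a per-second Request Unit (RU) reservation (hypothetical)."""

    def __init__(self, reserved_ru_per_second):
        self.reserved = reserved_ru_per_second
        self.current_second = None
        self.used = 0.0

    def try_request(self, now_seconds, cost_ru):
        """Return True if the request fits in this second's budget, else False (throttled)."""
        second = int(now_seconds)
        if second != self.current_second:
            # Assumption of this toy model: each second starts with the full
            # reservation and unused RUs do not carry over.
            self.current_second = second
            self.used = 0.0
        if self.used + cost_ru > self.reserved:
            return False
        self.used += cost_ru
        return True


budget = RuBudget(reserved_ru_per_second=400)
# 500 reads of roughly 1 RU each, all arriving within the same second.
results = [budget.try_request(now_seconds=10.2, cost_ru=1) for _ in range(500)]
accepted = results.count(True)
throttled = results.count(False)
print(accepted, throttled)
```

In this sketch the reads that do not fit in the second's reservation are rejected, which is why knowing the usage pattern per region matters when you size the reservation.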

Scaling and Partitions

Within Cosmos DB, partitions are used to distribute your data for optimal read and write operations. It is recommended to create a granular key with highly distinct values. The partitions are managed for you. Cosmos DB will split or merge partitions to keep the data properly distributed. Keep in mind your key needs to support distributed writes and distributed reads.
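
Here is a small Python sketch of why a granular key with highly distinct values matters. It assumes a simple "hash the key, then take the remainder" placement across a fixed number of partitions. That is an illustration only, not how Cosmos DB actually assigns or splits partitions, and the device and status values are made up.

```python
import hashlib
from collections import Counter


def partition_for(key, partition_count):
    """Map a partition key value to a partition number with a stable hash."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % partition_count


def distribution(keys, partition_count):
    """Count how many items land in each partition."""
    return Counter(partition_for(k, partition_count) for k in keys)


# High-cardinality key: one distinct value per device (hypothetical data).
by_device = distribution((f"device-{i}" for i in range(10_000)), 8)
# Low-cardinality key: only two distinct values, so at most two partitions get data.
by_status = distribution(("active" if i % 2 else "inactive" for i in range(10_000)), 8)

print("by device:", sorted(by_device.items()))
print("by status:", sorted(by_status.items()))
```

With the per-device key, the items spread across all eight partitions. With the two-value status key, most partitions sit empty while one or two take every read and write.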

Indexing

By default, everything is indexed. It is possible to use index policies to influence the index operations. Index policies can be tuned for storage, write performance, and read or query performance. You need to understand your data very well to make these adjustments. You can include or exclude documents or paths, configure the index type, and configure the index update mode. You do not have the same level of flexibility in indexes found in traditional relational database solutions.
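
As a rough illustration, an index policy is a JSON document attached to the collection. The Python sketch below builds one that indexes everything except a single subtree. The field names follow the policy format as I understand it from Microsoft's documentation, so check the current docs before relying on them; the `/rawPayload` property is made up for the example.

```python
import json

# Hypothetical policy: index everything except a large, rarely queried subtree.
indexing_policy = {
    "indexingMode": "consistent",   # index update mode: keep the index in step with writes
    "automatic": True,              # index documents without a per-document opt-in
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": "/rawPayload/*"}],  # "/rawPayload" is a made-up property
}

policy_json = json.dumps(indexing_policy, indent=2)
print(policy_json)
```

Excluding a path like this trades query flexibility on that subtree for less index storage and cheaper writes, which is the kind of adjustment that requires knowing your data well.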

Security

Cosmos DB is an Azure data storage solution which means that the data at rest is encrypted by default and data is encrypted in transit. If you need RBAC, Azure Active Directory (AAD) is supported in Cosmos DB.

SLAs

I think that the SLAs Microsoft provides with Cosmos DB are a key differentiator for them. Here is the short summary of guarantees Microsoft provides:

  • Latency: 99.99% of P99 Latency Attainment (based on hours over the guarantee)
    • Reads under 10 ms
    • Writes under 15 ms
  • Availability
    • All up – 99.99% by month
    • Read – 99.999% by month
  • Throughput – 99.99% based on reserved RUs (number of failures to meet reserved amount)
  • Consistency – 99.99% based on setting

These are financially backed SLAs from Microsoft. Imagine providing these SLAs for your own databases. This is very impressive.
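
To put those percentages in perspective, here is a quick back-of-the-envelope calculation in Python. It assumes a 30-day month and treats the SLA as a simple uptime percentage, which is a simplification; the actual SLA defines exactly how availability is measured.

```python
def allowed_downtime_minutes(sla_percent, days_in_month=30):
    """Minutes per month a service may be unavailable and still meet the SLA."""
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (100 - sla_percent) / 100


all_up = allowed_downtime_minutes(99.99)   # about 4.3 minutes in a 30-day month
reads = allowed_downtime_minutes(99.999)   # about 0.43 minutes, roughly 26 seconds

print(f"99.99%:  {all_up:.2f} minutes per month")
print(f"99.999%: {reads * 60:.0f} seconds per month")
```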

Wrap Up

For more information, check out Microsoft’s online documentation on Cosmos DB.

I presented this material at the April 2018 PASS MN User Group Meeting. The presentation can be found here.


Five Years, A Quiet Quarter, A Look Ahead to 2016

Five Years of Blogging

(Images: from the original Data on Wheels logo to the current one.)

My blogging story started on December 7, 2010. I have now had a blog for over 5 years. I want to thank all of you who have read my blog and interacted with me through it. You have seen me change the theme once and do a number of series. Here are some highlights from the past five years:

Top 5 Posts All Time

  1. Adding Top 10 Charts to Power View Which Honor Filters
  2. Simple batch script to generate XMLA and deploy SSAS DB
  3. T-SQL Window Functions – Part 1- The OVER() Clause
  4. Exploring Excel 2013 for BI Tip #14- Sparklines and Pivot Tables
  5. O, There’s the Data- Using OData in SSIS

Top Series All Time

The Excel BI Tips series has changed its name a couple of times. However, this tip series still rings true today, even as Microsoft invests in other tools. Look for some more Power BI content this year, but this series will continue to have updates. Also, look for some Excel 2016 topics to be added to the list as that release becomes available. Here are the top ten tips from the series:

Tributes

A tribute is an expression of gratitude or praise. A couple of years ago, I started a series about individuals who have impacted my career. I do this as a tribute to my father-in-law, Ed Jankowski, who passed away in December 2009. Check out my original post about him and his impact on me being in software development today.

Some Stats

I want to thank everyone again for taking time to check out my blog. Here are some stats that I thought were cool and decided to brag about here:

  • 2011 daily average: 9 – 2015 daily average: 162
  • 156 posts
  • 135,000 views
  • Best ever views in a day: 584

Thanks again for checking out my “help” library. As I noted in one of my posts, I blog to not forget and to pass along what I have learned. The key for me is that I do it when I can about topics that interest me.

A Quiet Quarter

The last statement holds true here. I have had a very quiet end of the year. I had blogs which followed up sessions, a practice that I intend to continue, and one BI Tip. November and December were quiet as my job and family took precedence. Pragmatic Works closed out the year strong, and we had holiday activities at home, including getting my two kids in college home. Well, the dust has settled, so I am getting a few more posts published now. Look for the Minnesota SQL Server User Group and Minnesota BI User Group follow-up posts this week.

Looking ahead to 2016

After a busy year last year, I am looking forward to having some new opportunities to write about Azure, SQL Server 2016, and other technologies I have not even seen yet. Are you excited for what is coming? Let’s have a great year working with data and analytics.

How I Got Started in Software Development-A Tribute to Ed

Happy Memorial Weekend everyone. This is a time to remember those who have gone before and in some cases have left us. I started a tribute series that celebrates those who have had an impact on me as a person and on my career. It started with my father-in-law, Ed Jankowski, who did so much to get me started working in technology. I thought this Memorial Weekend, I would reblog my original tribute to him. Still miss you Ed!

Celebrate with family and friends this weekend! Remember those who have left and cherish the time with those who are still here.

Data on Wheels - Steve Hughes

A tribute is an expression of gratitude or praise.  As I head into this holiday season I wanted to express thanks to those individuals who have impacted my career through the years.  What got me thinking about this was the fact that my father-in-law passed away two years ago in mid-December.  I wanted to honor his memory.  I have chosen to do this by starting an annual blog entry where I recognize an individual that has directly impacted what I am doing today.  As a result, this first tribute will recognize my father-in-law, Ed Jankowski’s influence on my career.

Ed Jankowski, My Father-in-Law

I would have to say that Ed was most directly involved with my transition to the field of software development.  I had no prior experience working on computers before I met Ed.  During my employment at Bethany House Publishers, I saw a need to “automate” the book used…


2014 Year In Review

As is our wont, we must look back over the past year to see what happened. While I normally focus on work-related items, this year was a crazy year for our family as well as my career. So let’s have a look at what happened this year.

Traveling Family

2014 was a year that saw our family do a bunch of traveling. Although our trips were not all done together, it was travel all over the world. Here are some of our highlights:

  • My two oldest children, Kristy and Alex, went on a tour of Italy with the Burnsville High School Band. They saw Venice, Rome, and a few other cities. They were able to perform with the band during that trip.
  • Kristy journeyed to Israel with Grace Church right before the missiles started being launched. She was doing a Holy Land tour which she enjoyed a lot. However, as parents, getting a text that said “we left before the missiles landed around Bethlehem” did make us a bit nervous.
  • Alex worked in an orphanage in Romania. He was significantly impacted by the conditions there and is looking for his opportunity to return and serve some more.
  • Andrew and Mikayla went to a town in Indiana for a weeklong trip with Teenserve and our church. They had the opportunity to join teens from around the country and perform repairs and general maintenance for a town in need.
  • Alex visited colleges in LA and Lynchburg.
  • Andrew traveled to Chicago with band and a church group.
  • Our entire family enjoyed a true break in Cancun, Mexico. Truly a lot of fun and great downtime.
  • We followed the Cancun trip up with a cross country trip to Los Angeles to drop my oldest, Kristy, off at Biola College for her freshman year.
  • Andrew and I went to Key West with the Boy Scouts and sailed around the Keys for a week. That was truly enjoyable. I loved being on a boat.
  • Sheila and I enjoyed our company holiday party at the One Ocean Resort in Florida.
  • We wrapped up the year visiting family for the holidays in Kentucky.

Overall, we were all over the country and even the world. We were blessed to have the opportunities to experience so much this year.

Changing Employers

In the middle of all the travel, I celebrated 10 years at Magenic in March and transitioned to Pragmatic Works in October. I loved working at Magenic. During this year, I came to the realization that I wanted to focus more on data and BI solutions, so I made the move to Pragmatic Works. I enjoy my new company as much as my old one, which is very good. Thanks to everyone at both places for supporting me and my career.

More…

This past year, I also contributed to my third book. Hopefully you found it helpful. I also did a first for me this year: I reblogged a post from a friend and fellow Scouter, Jim Larson. His PowerShell work is awesome and I wanted to share it with my readers as well.

Thanks to My Readers

Finally, I wanted to thank all my readers. I appreciate your support. It has been cool to see my readership increase this year. I hope you find value in the technical content here. I look forward to hearing from you or even better, seeing you at SQL Saturdays and other events throughout the year.

Here’s to a great year in 2015!

Lync 2013 Video Issues on Windows 8

Part of the reason I have a blog is to document issues and resolutions I do not want to forget. Yesterday morning I was on two calls using Lync and the video was blank or white. I had great audio and messaging worked fine, so the only part that was not functioning was the video. I have been using Lync for years, and usually the problem was related to connectivity.

I started by leaving and rejoining both calls multiple times. That was primarily just annoying with no change in the video issue. Time to search. So, giving credit to whom credit is due, I found the following blog post by Shay Atik – Lync 2013 Desktop Sharing Shows White Screen. Turns out you need to remove a registry entry related to IE and ActiveX. I am copying the steps that Shay gives here and letting you know it works.

From Shay’s blog:

1. Open the Registry Editor (Start + R -> regedit -> OK).

2. Backup the registry (just in case): File -> Export.

3. Go to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Internet Explorer\ActiveX Compatibility and delete the {00000000-0000-0000-0000-000000000000} expandable folder.

4. Mission completed. Run Lync desktop sharing, and you’re good to go.

Hopefully this helps someone else, and thanks Shay for posting this.
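
For anyone who would rather script steps 2 and 3, here is a hedged Python sketch that only builds the equivalent `reg` command lines and prints them; it does not run anything. It differs from Shay's steps in one respect: it backs up just the one key rather than the whole registry. The backup file name is made up, and the commands would need to be run from an elevated prompt on the affected machine.

```python
KEY = (
    r"HKLM\SOFTWARE\Microsoft\Internet Explorer"
    r"\ActiveX Compatibility\{00000000-0000-0000-0000-000000000000}"
)


def build_commands(backup_file="activex-key-backup.reg"):
    """Return the backup and delete commands as argument lists (nothing is executed)."""
    export_cmd = ["reg", "export", KEY, backup_file, "/y"]  # back up only this key
    delete_cmd = ["reg", "delete", KEY, "/f"]               # delete without prompting
    return [export_cmd, delete_cmd]


for cmd in build_commands():
    # Quote any argument that contains a space so the line can be pasted as-is.
    print(" ".join(f'"{part}"' if " " in part else part for part in cmd))
```

Review the printed lines before running them; deleting the wrong key is exactly why the backup step comes first.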