It’s always imaginable that developers are always flexible to proceed with relational databases while migrating an existing on-premise app to Azure platform while leveraging best possible architectural guidelines on migration to cloud. But, still forth in real-time cases , typical scenarios like suboptimal performance, high expenses, or worse case scenario because, NOSQL db can handle some tasks more efficiently than relational databases can. In few enterprise cases, it’s encountered a critical data storage problem, as because NOSQL solution implementation have been better off before deploying its app to production.
Moreover, there’s no single best data management choice for all data storage tasks, different data management solutions are optimized for different tasks. Let’s have a quick walk-through on various data storage option models supported on Microsoft Azure.
Let’s start first by four types of NOSQL db supported now Azure.
- Key/value pair databases: store a single serialized object for each key value. They’re good for storing large volumes of data in situations where you want to get one item for a given key value and you don’t have to query based on other properties of the item.
- Azure Blob Storage : It’s also a key/value based data storage which is same like as file system in functionality where you could search a file based on it’s folder/file name as key not file content as key. Blob offers read-write storage options (aka Block Blob) for storing large media files as well as for standard streaming purpose facilitates the usage of VHDs as Page Blob(aka Azure Drive).
- Azure Table Storage : A standard key-value pair based NOSQL storage option prevailed from Azure storage inception phase. Each value is called an entity (similar to a row, identified by a partition key and row key) and contains multiple properties (similar to columns, but not all entities in a table have to share the same columns). Querying on columns other than the key is extremely inefficient and should be avoided.
- Document Databases : Popular key/value databases in which the values are documents. “Document” here isn’t used in the sense of a Word or an Excel document but means a collection of named fields and values, any of which could be a child document. For example, in an order history table, an order document might have order number, order date, and customer fields, and the customer field might have name and address fields. The database encodes field data in a format such as XML, YAML, JSON, or BSON, or it can use plain text. One feature that sets document databases apart from other key/value databases is the capability they provide to query on nonkey fields and define secondary indexes, which makes querying more efficient. This capability makes a document database more suitable for applications that need to retrieve data on the basis of criteria more complex than the value of the document key.
Example : Mongo DB.
- Column-family databases : key/value pair based data storage enables to structure data based on collections of columns called ‘Column families‘. For example, a population database consists of one group of column called ‘Persons’ (containing firstname, middlename, lastname) , one group for person’s address & another for profile info. The database can then store each column family in a separate partition while keeping all of the data for one person related to the same key. You can then read all profile information without having to read through all of the name and address information as well.
Example : Cassendra , Apache HBase (in preview supported with HDInsight as NOSQL Blob Storage)
- Graph databases : Stores data in form of objects & relationships.The purpose of a graph database is to enable an application to efficiently perform queries that traverse the network of objects and the relationships between them. For example, the objects might be employees in a human resources database, and you might want to facilitate queries such as “find all engineers who directly or indirectly work for Product Manager.”
Example : Neo4j Graph Database.
Compared with relational databases, the NoSQL options offer far greater scalability and are more cost effective for storage and analysis of unstructured data. The tradeoff is that they don’t provide the rich querying and robust data integrity capabilities of relational databases. NoSQL options would work well for IIS log data, which involves high volume with no need for join queries. NoSQL options would not work so well for banking transactions, which require absolute data integrity and involve many relationships to other account-related data.
- A brief about NewSQL : Combines the scalability features of NOSQL along with distributed querying & transactional integrity of OldSQL.
- The first type of NewSQL systems are completely new database platforms. These are designed to operate in a distributed cluster of shared-nothing nodes, in which each node owns a subset of the data. Though many of the new databases have taken different design approaches, there are two primary categories evolving. The first type of system sends the execution of transactions and queries to the nodes that contain the needed data. SQL queries are split into query fragments and sent to the nodes that own the data. These databases are able to scale linearly as additional nodes are added.
- General-purpose databases
- These maintain the full functionality of traditional databases, handling all types of queries. These databases are often written from scratch with a distributed architecture in mind, and include components such as distributed concurrency control, flow control, and distributed query processing. This includes Google Spanner, Clustrix, FoundationDB, NuoDB,TransLattice, ActorDB,andTrafodion.
- In-memory databases
- The applications targeted by these NewSQL systems are characterized as having a large number of transactions that (1) are short-lived (i.e., no user stalls), (2) touch a small subset of data using index lookups (i.e., no full table scans or large distributed joins), and (3) are repetitive (i.e. executing the same queries with different inputs).
- These NewSQL systems achieve high performance and scalability by eschewing much of the legacy architecture of the original IBM System R design, such as heavyweight recovery or concurrency control algorithms.
- Example systems in this category are:VoltDB, Pivotal‘s SQLFire and GemFire XD, SAP HANA.
- Example : NuoDB is supported in Azure as NewSQL.
- Key Points to Consider while choosing the Data Storage Options :
What is the core data storage and data access semantic (are you storing relational or unstructured data)?
Unstructured data such as media files fits best in Blob storage; a collection of related data such as products, inventories, suppliers, customer orders, etc., fits best in a relational database.
How easy is it to query the data?
What types of questions can be efficiently asked?
Key/value data stores are very good at getting a single row when given a key value, but they are not so good for complex queries. For a user-profile data store in which you are always getting the data for one particular user, a key/value data store could work well. For a product catalog from which you want to get different groupings based on various product attributes, a relational database might work better.
NoSQL databases can store large volumes of data efficiently, but you have to structure the database around how the app queries the data, and this makes ad hoc queries harder to do. With a relational database, you can build almost any kind of query.
Can questions, aggregations, and so on be executed on the server?
If you run SELECT COUNT(*) from a table in SQL, the DBMS will very efficiently do all the work on the server and return the number you’re looking for. If you want the same calculation from a NoSQL data store that doesn’t support aggregation, this operation is an inefficient “unbounded query” and will probably time out. Even if the query succeeds, you have to retrieve all the data from the server and bring it to the client and count the rows on the client.
What languages or types of expressions can be used?
With a relational database, you can use SQL. With some NoSQL databases, such as Azure Table storage.
Ease of scalability
How often and how much will the data need to scale?
Does the platform natively implement scale-out?
How easy is it to add or remove capacity (size and throughput)?
Relational databases and tables aren’t automatically partitioned to make them scalable, so they are difficult to scale beyond certain limitations. NoSQL data stores such as Azure Table storage inherently partition everything, and there is almost no limit to adding partitions. You can readily scale Table storage up to 200 terabytes, but the maximum database size for Azure SQL Database is 500 gigabytes. You can scale relational data by partitioning it into multiple databases, but setting up an application to support that model involves a lot of programming work.
Instrumentation and Manageability
How easy is the platform to instrument, monitor, and manage?
You need to remain informed about the health and performance of your data store, so you need to know up front what metrics a platform gives you for free and what you have to develop yourself.
How easy is the platform to deploy and run on Azure? PaaS? IaaS? Linux?
Azure Table storage and Azure SQL Database are easy to set up on Azure. Platforms that aren’t built-in Azure PaaS solutions require more effort.
Is an API available that makes it easy to work with the platform?
The Azure Table Service has an SDK with a .NET API that supports the .NET 4.5 asynchronous programming model. If you’re writing a .NET app, the work to write and test the code will be much easier for the Azure Table Service than for a key/value column data store platform that has no API or a less comprehensive one.
Transactional integrity and data consistency
Is it critical that the platform support transactions to guarantee data consistency?
For keeping track of bulk emails sent, performance and low data-storage cost might be more important than automatic support for transactions or referential integrity in the data platform, making the Azure Table Service a good choice. For tracking bank account balances or purchase orders, a relational database platform that provides strong transactional guarantees would be a better choice.
How easy are backup, restore, and disaster recovery?
Sooner or later production data will become corrupted and you’ll need an undo function. Relational databases often have more fine-grained restore capabilities, such as the ability to restore to a point in time. Understanding what restore features are available in each platform you’re considering is an important factor to consider.
If more than one platform can support your data workload, how do they compare in cost?
For example, if you use ASP.NET Identity, you can store user profile data in Azure Table Service or Azure SQL Database. If you don’t need the rich querying facilities of SQL Database, you might choose Azure Table storage in part because it costs much less for a given amount of storage.