Data Technologies Archives - Anuj Varma, Hands-On Technology Architect, Clean Air Activist
https://www.anujvarma.com/category/technology/data-technologies/
Production Grade Technical Solutions | Data Encryption and Public Cloud Expert

Physical Database Design and Tuning – Oracle or SQL Server
https://www.anujvarma.com/physical-database-design/
Wed, 24 Mar 2021

The post Physical Database Design and Tuning – Oracle or SQL Server appeared first on Anuj Varma, Hands-On Technology Architect, Clean Air Activist.

Troubleshooting Database Performance – 3 Broad Categories

  1. Physical Database Design
  2. Query Statement Tuning
  3. DB Configuration

Physical Database Design

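  1. INDEXING – Look for index fragmentation and rebuild or reorganize fragmented indexes.
  2. Filegroups (File Placement, Object Placement) – In SQL Server, put the .ldf (log) and .mdf (data) files on separate drives.
  3. Partitioning – Horizontal or vertical (columnar).
  4. Denormalization

  1. RE-INDEXING Notes – Check fragmentation (e.g., avg_fragmentation_in_percent from sys.dm_db_index_physical_stats) rather than rebuilding at any fragmentation above 0%. A common rule of thumb is to reorganize indexes fragmented roughly 5–30% and rebuild those above 30%. DBCC INDEXDEFRAG reorganizes and allows the database to stay online. DBCC DBREINDEX rebuilds the index (which also rebuilds its statistics) and also works on indexes that enforce PRIMARY KEY or UNIQUE constraints. Both DBCC commands are deprecated; ALTER INDEX with REORGANIZE or REBUILD replaces them.
  2. Filegroup Notes – Keep SYSTEM objects in the PRIMARY filegroup and USER objects in a separate filegroup. Put the TRANSACTION log on a separate volume to lessen I/O load. TEXT/IMAGE data is also best placed in a separate filegroup.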

Query Statement Tuning

  1. Look primarily for full table scans and expensive joins.
  2. Find queries that have a high execution count (run frequently), e.g. SELECT TOP (20) qs.execution_count, qs.total_physical_reads, qs.total_logical_reads, st.text FROM sys.dm_exec_query_stats AS qs CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st ORDER BY qs.execution_count DESC
  3. Subqueries vs Joins – While both can return the same result, compare their execution plans for efficiency. A subquery tends to be the better fit when an aggregate is calculated and fed back on the fly; a JOIN is better when columns from different tables are needed in the result.
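The subquery-versus-JOIN comparison can be tried with a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Oracle or SQL Server (an assumption: optimizers differ between engines, so on SQL Server you would compare the actual execution plans instead). The hypothetical customers/orders tables compute a per-customer total both as a correlated subquery (an aggregate fed back on the fly) and as a JOIN with GROUP BY, print each query plan, and return both result sets so you can confirm they match.

```python
import sqlite3

# Aggregate fed back on the fly: one correlated subquery per customer row
SUBQUERY_SQL = """
    SELECT c.name,
           (SELECT COALESCE(SUM(o.amount), 0)
              FROM orders o
             WHERE o.customer_id = c.id) AS total
      FROM customers c
     ORDER BY c.id
"""

# The same result expressed as a JOIN plus GROUP BY
JOIN_SQL = """
    SELECT c.name, COALESCE(SUM(o.amount), 0) AS total
      FROM customers c
      LEFT JOIN orders o ON o.customer_id = c.id
     GROUP BY c.id, c.name
     ORDER BY c.id
"""


def build_demo_db() -> sqlite3.Connection:
    """Create a small in-memory database with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            amount REAL NOT NULL
        );
        CREATE INDEX idx_orders_customer ON orders(customer_id);
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
        INSERT INTO orders VALUES (1, 1, 50.0), (2, 1, 70.0), (3, 2, 20.0);
    """)
    return conn


def compare(conn: sqlite3.Connection):
    """Print each query's plan, then return both result sets."""
    for label, sql in (("subquery", SUBQUERY_SQL), ("join", JOIN_SQL)):
        plan = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
        print(label, [row[-1] for row in plan])  # last column holds the plan detail text
    return (conn.execute(SUBQUERY_SQL).fetchall(),
            conn.execute(JOIN_SQL).fetchall())


if __name__ == "__main__":
    sub_rows, join_rows = compare(build_demo_db())
    print(sub_rows == join_rows, sub_rows)
```

On a data set this small both plans are cheap; the point is the habit of reading the plans rather than assuming one form is always faster.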

Physical Reads vs Logical Reads

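A logical read is served from the buffer cache; a physical read should only happen when the data is not already in the buffer cache and has to be fetched from disk. High physical reads are therefore a symptom worth investigating.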

Truncate vs. Shrink – Reduce Log Sizes

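To reduce the size of a full transaction log, truncate it first and then shrink it. Truncation (which happens on a log backup under the full or bulk-logged recovery model, or at a checkpoint under the simple model) only marks space inside the log as reusable; SHRINK (e.g., DBCC SHRINKFILE) is what actually reduces the file size.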

Indices – Clustered vs. Non Clustered

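  • Clustered index seeks are generally fastest, unless a non-clustered index includes (covers) all the columns the query needs, in which case the non-clustered seek can be faster because it reads narrower rows. INSERTs and UPDATEs are generally faster on a table with a clustered index than on a heap (a table with no clustered index).
  • Clustering (failover clustering, not clustered indexes) – Active/Active vs. Active/Passive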

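Primary Key vs. Unique Index

In SQL Server, a PRIMARY KEY creates a clustered index by default (unless the table already has a clustered index or NONCLUSTERED is specified). A UNIQUE constraint creates a non-clustered index by default, though it can be declared CLUSTERED.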

Buffer Pool vs. Buffer (Data) Cache Hit Ratio

Overall Process Space – the Buffer Pool in SQL Server.

Memory used to cache data pages – the Data Cache. The hit ratio here is important; you can obtain it from the Windows Performance Monitor counter SQLServer:Buffer Manager, Buffer cache hit ratio.
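To connect the hit ratio back to logical and physical reads, here is a minimal sketch that approximates it from read counts (for example, the logical and physical reads reported by SET STATISTICS IO). The function name is hypothetical, and the formula is an assumption: it treats every physical read as a logical read that missed the cache. The Performance Monitor counter remains the authoritative number.

```python
def approx_cache_hit_ratio(logical_reads: int, physical_reads: int) -> float:
    """Approximate buffer (data) cache hit ratio as a percentage.

    Assumption: each physical read is a logical read that missed the cache,
    so hits = logical_reads - physical_reads. SQL Server's own
    "Buffer cache hit ratio" counter is computed internally and may differ.
    """
    if logical_reads <= 0:
        raise ValueError("logical_reads must be positive")
    if not 0 <= physical_reads <= logical_reads:
        raise ValueError("physical_reads must be between 0 and logical_reads")
    return 100.0 * (logical_reads - physical_reads) / logical_reads


if __name__ == "__main__":
    # Hypothetical counts for one query: 1,000 page reads, 50 of them from disk
    print(f"{approx_cache_hit_ratio(1000, 50):.1f}%")  # 95.0%
```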

DB Configuration – Recovery Model

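  • Simple – Can restore only to the most recent backup; the log is not backed up.
  • Full – With regular log backups, can recover up to the point of failure.
  • Bulk Logged – Like Full, but bulk operations are minimally logged; point-in-time recovery is not possible within a log backup that contains bulk-logged changes.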

Potential Data Type Mismatches (Oracle to SQL Server)

BFILE -> no direct equivalent in SQL Server

NCLOB -> NTEXT (deprecated; NVARCHAR(MAX) is the current equivalent)

RAW -> VARBINARY
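A migration script usually needs these mappings as data. Below is a minimal sketch of a lookup table covering only the three types listed here (the dictionary and function names are hypothetical). The list maps NCLOB to NTEXT; because NTEXT is deprecated, this sketch targets NVARCHAR(MAX) instead, and it refuses BFILE outright so that column gets a deliberate manual decision.

```python
import re

# Partial, illustrative Oracle -> SQL Server map covering only the types listed above.
# None means "no direct equivalent": the column needs a manual migration decision.
ORACLE_TO_SQLSERVER = {
    "BFILE": None,
    "NCLOB": "NVARCHAR(MAX)",  # NTEXT also maps, but NTEXT is deprecated
    "RAW": "VARBINARY",        # Oracle RAW always declares a size, carried over below
}

# Matches a type name with an optional single length, e.g. "RAW(16)" or "nclob"
_TYPE_PATTERN = re.compile(r"\s*([A-Za-z]+)\s*(?:\(\s*(\d+)\s*\))?\s*")


def map_column_type(oracle_type: str) -> str:
    match = _TYPE_PATTERN.fullmatch(oracle_type)
    if match is None:
        raise ValueError(f"cannot parse type {oracle_type!r}")
    base, length = match.group(1).upper(), match.group(2)
    if base not in ORACLE_TO_SQLSERVER:
        raise KeyError(f"{base} is not covered by this partial map")
    target = ORACLE_TO_SQLSERVER[base]
    if target is None:
        raise ValueError(f"{base} has no direct SQL Server equivalent; migrate it manually")
    # Carry the Oracle length over only when the target has no fixed size of its own
    if length and "(" not in target:
        return f"{target}({length})"
    return target


if __name__ == "__main__":
    for oracle_type in ("RAW(16)", "nclob", "BFILE"):
        try:
            print(oracle_type, "->", map_column_type(oracle_type))
        except ValueError as err:
            print(oracle_type, "->", err)
```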

Special Data Types – Spatial Data Types

  • Need special treatment, similar to user-defined types (UDTs); in SQL Server they are implemented as .NET CLR types
  • e.g. Geometry and Geography. STGeomFromText(‘LINESTRING(…..)

Summary

This is meant to be a quick recap of the first places to look for tuning your database performance.

Need an expert to help out with your Database Design or Strategy? Set up a time with Anuj Varma.



Couchbase vs DynamoDB
https://www.anujvarma.com/couchbase-vs-dynamodb/
Sun, 28 Jul 2019

Couchbase Advantages

  • Run on almost any cloud platform – including AWS
  • Avoid DynamoDB’s item-size restrictions (400 KB per item)
  • Speed up performance with in-memory processing and built-in caching
  • Use your team’s existing SQL skills for writing complex queries (via N1QL, Couchbase’s SQL-based query language)
  • Cut license and support costs by up to 50% compared to DynamoDB

MariaDB auto update statistics
https://www.anujvarma.com/mariadb-auto-update-statistics/
Tue, 19 Jun 2018

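To check whether automatic statistics updates are enabled, try this command:

SHOW VARIABLES LIKE 'innodb_stats%';

If the output shows innodb_stats_auto_recalc | ON (i.e., innodb_stats_auto_recalc = 1), you're all set.

If you see an output such as:

 innodb_stats_on_metadata | ON

it means that statistics get updated whenever metadata on the table is requested (e.g., by SHOW TABLE STATUS or queries against INFORMATION_SCHEMA). That may be enough in some setups, but it only affects non-persistent statistics, so you may still need to set the first variable (in the server configuration file, or at runtime with SET GLOBAL innodb_stats_auto_recalc = ON):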

innodb_stats_auto_recalc = 1

Here is some more info on this topic – https://mariadb.com/kb/en/library/xtradbinnodb-server-system-variables/#innodb_stats_auto_recalc

For cloud migration projects or cloud consulting on AWS, GCP or Azure, contact Cloud Migration Architect

Types of Non Relational Data
https://www.anujvarma.com/types-of-non-relational-data/
Sun, 04 Feb 2018

The BigData Landscape
https://www.anujvarma.com/the-bigdata-landscape/
Thu, 10 Aug 2017

This is a Work in Progress…

Pre-Processing of Data (Unstructured)

  • Map Reduce
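The Map Reduce model itself can be sketched in plain Python: a map step emits key/value pairs from each raw record, a shuffle groups the pairs by key, and a reduce step aggregates each group. This runs in a single process and only illustrates the data flow; Hadoop distributes the same phases across a cluster. The word-count task, function names, and log lines are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record: str):
    """Map: emit a (word, 1) pair for every word in one unstructured record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle: group intermediate values by key (done by the framework in Hadoop)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: aggregate all values that share a key."""
    return key, sum(values)


def word_count(records):
    intermediate = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


if __name__ == "__main__":
    logs = ["error disk full", "warning disk slow", "error network down"]
    print(word_count(logs))
```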

Pre-Processing of Data (Structured or Semi-Structured)

  • Pig
  • Hive
  • Hadoop (see below)

Statistical Analysis (After Pre-Processing)

  • R is used for statistical analysis, which happens after the data has been pre-processed. However, there is a limit on the size of data R can handle, since it typically works on data held in memory.

Hadoop

  • Covers both data storage (HDFS) and data processing (MapReduce) at massive scale.
  • Pig and Hive are tools in the Hadoop ecosystem.

Running out of disk space? Sharding
https://www.anujvarma.com/running-out-of-disk-space-sharding/
Tue, 20 Jun 2017

What is automatic sharding?

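Sharding is a type of database partitioning that separates very large databases into smaller, faster, more easily managed parts called data shards, typically stored on separate servers. With automatic sharding, the database system itself decides which shard each record goes to and redistributes data as shards are added, rather than leaving that to application code.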
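The routing step at the heart of sharding can be sketched in a few lines of Python. This is a minimal, hypothetical hash-based router, not any particular database's implementation: each shard is modeled as an in-memory dict, and a stable hash of the key picks the shard, so every read for a key goes to the same place it was written.

```python
import hashlib


class HashShardRouter:
    """Route each key to one of N shards by hashing the key (a minimal sketch)."""

    def __init__(self, shard_names):
        if not shard_names:
            raise ValueError("need at least one shard")
        self._names = list(shard_names)
        # Each shard is modeled as an in-memory dict; in practice it is a separate server
        self.shards = {name: {} for name in self._names}

    def shard_for(self, key: str) -> str:
        # Use a stable hash: Python's built-in hash() of str is randomized per process
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self._names[int.from_bytes(digest[:8], "big") % len(self._names)]

    def put(self, key: str, value) -> None:
        self.shards[self.shard_for(key)][key] = value

    def get(self, key: str, default=None):
        return self.shards[self.shard_for(key)].get(key, default)


if __name__ == "__main__":
    router = HashShardRouter(["shard-a", "shard-b", "shard-c"])
    for i in range(1000):
        router.put(f"customer:{i}", {"id": i})
    print({name: len(rows) for name, rows in router.shards.items()})
```

One design caveat: with plain modulo routing, adding a shard remaps most keys. Range-based sharding or consistent hashing, combined with background rebalancing, is the kind of work an automatic sharding feature takes off the application's hands.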

Data warehouse versus data marts
https://www.anujvarma.com/data-warehouse-versus-data-marts/
Tue, 03 Jan 2017

Most data warehousing initiatives fail, mainly because the level of enterprise-wide standardization they require slows an agency or company down enough that the project gets derailed (the ‘boiling the ocean’ phenomenon).

Rather than building a data warehouse right away, approach it in a slightly different manner.

Build individual data marts instead; each department owns its own data mart. These individual data marts still follow a common, standardized technical architecture, and are able to talk to each other.

For example, definitions and metadata in each data mart should follow the same convention.

This paves the way for a final data warehouse – which could simply be a loosely coupled conglomerate of these independent data marts.

Multiple copies of data–and editability
https://www.anujvarma.com/multiple-copies-of-data-and-editability/
Wed, 30 Nov 2016

A book has an author, and an author has multiple books. In NoSQL, a book would be modeled as a document, and so would an author. So we end up with two documents: Book and Author. Each book has a unique title as well as an author associated with it; this book-author association is stored in each Book document.

Case 1 – Changing an Attribute that belongs on multiple documents

Now, suppose the Book title changes; let us say there is a new edition of a textbook. Do you have to update thousands of occurrences of the Book document with the new title? The answer is yes. However, with a little design change, you can avoid the multiple updates. Instead of having Title as an attribute of the book, separate Title into its own document (say, a document called BookMetaData). Each book then just has a BookMetaData ID associated with it. If the title of the book changes, you only need to update it in the BookMetaData document, and all the associated books automatically pick up the change.
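The two designs can be compared with a minimal sketch in which plain Python dicts stand in for a document store (the document keys, field names, and functions are hypothetical). Design A copies the title into every Book document; Design B stores it once in a BookMetaData document that each Book references by ID. The functions return how many documents a title change has to rewrite.

```python
def embedded_design(n_books: int, title: str) -> dict:
    """Design A: the title is copied into every Book document."""
    return {
        f"book::{i}": {"type": "Book", "title": title, "author_id": "author::1"}
        for i in range(n_books)
    }


def rename_embedded(store: dict, old_title: str, new_title: str) -> int:
    """Every Book document carrying the old title must be rewritten."""
    writes = 0
    for doc in store.values():
        if doc["type"] == "Book" and doc["title"] == old_title:
            doc["title"] = new_title
            writes += 1
    return writes


def referenced_design(n_books: int, title: str) -> dict:
    """Design B: Books point at one BookMetaData document that holds the title."""
    store = {"meta::1": {"type": "BookMetaData", "title": title}}
    for i in range(n_books):
        store[f"book::{i}"] = {"type": "Book", "meta_id": "meta::1", "author_id": "author::1"}
    return store


def rename_referenced(store: dict, meta_id: str, new_title: str) -> int:
    """Only the single BookMetaData document changes."""
    store[meta_id]["title"] = new_title
    return 1


def title_of(store: dict, book_key: str) -> str:
    """Resolve a book's title under either design (Design B costs one extra lookup)."""
    doc = store[book_key]
    if "meta_id" in doc:
        return store[doc["meta_id"]]["title"]
    return doc["title"]


if __name__ == "__main__":
    a = embedded_design(5000, "Database Design, 1st Edition")
    b = referenced_design(5000, "Database Design, 1st Edition")
    print(rename_embedded(a, "Database Design, 1st Edition", "Database Design, 2nd Edition"))  # 5000
    print(rename_referenced(b, "meta::1", "Database Design, 2nd Edition"))                      # 1
    print(title_of(a, "book::42"), "|", title_of(b, "book::42"))
```

The trade-off is visible in title_of: Design B turns one write into a single update, but every read of a title now needs an extra lookup, which is effectively a join done in application code.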

Couchbase’s alternative to handle multiple document updates

Couchbase offers something of a shortcut: view collation. With collated views, a single query can span all the documents you might need. With views, Couchbase Server lets you keep a single canonical source for an item of data while having it show up in many different places.

NoSQL’s mantra – Denormalize, Denormalize!

The relational data model rigidly ties one to database schemas. One normalizes the data and uses joins to run complex queries. More recently, though, changes in application characteristics have led application developers to non-relational database technologies. One can view distributed document database technology as a natural successor to relational database technology:

  1. It scales out horizontally across virtual machines or cloud instances.
  2. It doesn’t tie you to a rigid schema before inserting data, nor does it require a schema change when different data must be captured and processed.
  3. Its rich data model and view technology allow for complex data modeling, capture and queries.

Summary

One of the most frequent criticisms of NoSQL is that updates of any document element suck! Essentially, a single change could require thousands of documents to be updated at the same time. However, this limitation can be overcome with a slightly modified design: separate the ‘frequently updated’ info into its own document, at the cost of an extra lookup when reading.

Tracking relationships in NOSQL
https://www.anujvarma.com/tracking-relationships-in-nosql/
Tue, 09 Aug 2016

In NoSQL, there is typically no built-in way to ‘relate’ a post to its comments (no foreign keys or relational joins).

So, what do you do?

Well, you essentially store the postId and the commentId for EACH comment (i.e., you store post1,comment1, post1,comment2, and so on).

This storage will work, but it will be optimized for one type of query (all comments for a given post).
If you have another type of search (say, all users who commented on this article), you are screwed. You did not store the userId along with the commentId, so you will be back to the drawing board.

However, if all you really care about is getting all comments on a post (the first type of query), you are not only set, you can also expect noticeably faster retrieval times than with the relational model, especially as the data set gets larger and larger.
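A minimal sketch of that trade-off, with an in-memory Python class standing in for a key-value store (the class, method names, and IDs are hypothetical). The original design corresponds to comments_by_post alone, keyed by postId; it cannot answer "who commented?" because the userId was never stored. Supporting that second query means deciding up front to write a second access path (users_by_post) every time a comment is inserted.

```python
from collections import defaultdict


class CommentIndex:
    """Hypothetical key-value layout for blog comments, shaped around its queries."""

    def __init__(self):
        # Access path 1: postId -> [commentId, ...] (the original design)
        self.comments_by_post = defaultdict(list)
        # Access path 2: postId -> {userId, ...}. It exists only because we chose
        # to write it at insert time; without it, "who commented?" is unanswerable.
        self.users_by_post = defaultdict(set)

    def add_comment(self, post_id: str, comment_id: str, user_id: str) -> None:
        # One logical write fans out to every access path we maintain
        self.comments_by_post[post_id].append(comment_id)
        self.users_by_post[post_id].add(user_id)

    def comments_for_post(self, post_id: str) -> list:
        # Single key lookup: the query this layout is optimized for
        return list(self.comments_by_post.get(post_id, []))

    def users_for_post(self, post_id: str) -> set:
        return set(self.users_by_post.get(post_id, set()))


if __name__ == "__main__":
    idx = CommentIndex()
    idx.add_comment("post1", "comment1", "alice")
    idx.add_comment("post1", "comment2", "bob")
    idx.add_comment("post2", "comment3", "alice")
    print(idx.comments_for_post("post1"), sorted(idx.users_for_post("post1")))
```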

NoSQL and data integrity
https://www.anujvarma.com/nosql-and-data-integrity/
Tue, 14 Jun 2016

Redundant Data Storage

NoSQL stores many-to-many relationships the same way that denormalized tables do: by storing them redundantly. Since you do not base your NoSQL design on relationships between data, your database design is driven by the types of queries that will run against it.

You would use the same design methodology here that you would use to denormalize a relational database: if query performance is of utmost importance, you would flatten (denormalize) your database to accommodate the query in question.

This optimizes your tables for one type of query at the expense of other types of queries. If your application needs both types of queries to be equally optimized, you would be better off neither denormalizing nor moving to NoSQL.

Integrity Violation with Denormalized Data

There is a risk with denormalization: data (or entire sets of data) can get out of sync with one another. This is called an integrity violation or a data anomaly. A normalized relational database (RDBMS) is DESIGNED to prevent such integrity violations.

How does NoSQL PREVENT integrity violations?

In a denormalized database, and in NoSQL, it becomes the programmer’s responsibility to write application code that prevents integrity violations.
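What that application code can look like, as a minimal sketch with a plain Python dict standing in for a document store (the document layout, keys, and function names are hypothetical): every write to the canonical Author document must also update each Book document that carries a copy of the author's name, and a periodic consistency check can detect anomalies that slipped through. The update is not atomic; a real system would need transactions, retries, or idempotent repair jobs around it.

```python
def rename_author(store: dict, author_id: str, new_name: str) -> int:
    """Update the canonical Author document and every denormalized copy of its name.

    Returns the number of Book documents rewritten. Not atomic: a failure partway
    through leaves some copies stale, which is exactly the anomaly to guard against.
    """
    store[author_id]["name"] = new_name
    rewritten = 0
    for doc in store.values():
        if doc.get("type") == "Book" and doc.get("author_id") == author_id:
            doc["author_name"] = new_name
            rewritten += 1
    return rewritten


def find_anomalies(store: dict) -> list:
    """Return keys of Book documents whose copied author_name disagrees with the Author."""
    return sorted(
        key for key, doc in store.items()
        if doc.get("type") == "Book"
        and doc["author_name"] != store[doc["author_id"]]["name"]
    )


if __name__ == "__main__":
    store = {
        "author::1": {"type": "Author", "name": "A. Writer"},
        "book::1": {"type": "Book", "author_id": "author::1", "author_name": "A. Writer"},
        "book::2": {"type": "Book", "author_id": "author::1", "author_name": "A. Writer"},
    }
    store["author::1"]["name"] = "A. B. Writer"   # careless update of one copy only
    print(find_anomalies(store))                  # ['book::1', 'book::2']
    rename_author(store, "author::1", "A. B. Writer")
    print(find_anomalies(store))                  # []
```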

Summary
