IT Explained

IT Explained:

Database




What is a database?


A database is a searchable collection of usually related information (data) that is electronically stored. The function of a database is not only to store data, but also to store it in a format that enables efficient searching and fast information retrieval, and to ensure data security.

Database functions themselves are procedures that perform particular operations on data in a database. Database functions include basic CRUD operations, CRUD being an acronym for create, read, update, and delete data. In different computer languages, these basic operations may have different names, for example insert instead of create.  

A database is managed by a database management system (DBMS) or a relational database management system (RDBMS).

Database systems are a blend of physical hardware that stores data, complex DBMS or RDBMS software, and different computer languages to access and manipulate data.

Where are databases used?

 

History

Before the first electronic database, filing cabinets that stored and organized information like paper documents, and could efficiently be indexed and searched, were the non-electronic databases of the time.

Charles Bachman designed the first computerized database, the Integrated Data Store (IDS), in the early 1960s. IDS introduced the concept of the navigational database, which had two models, hierarchical or network, both of which were not searchable. Searchable electronic databases were introduced in the 1970s together with structured query language (SQL), which provided a way to perform databases searches. In the 2000s, NoSQL was developed to handle the plethora of unstructured data on the internet. In the 2010s, influenced by increasing amounts of big data and the globalization of the Internet of Things (IoT), a distributed database model was developed to allow the storage of data in multiple physical locations.

   

Database applications

Anywhere that data needs to be stored, databases are used. Common examples of where databases are used include banking systems, industrial applications, government records, retail, eCommerce, personal finance, and most types of online applications. Modern uses of databases include social networking, mobile computing, cloud, and data analytics applications. These types of applications have influenced the growth of new types of databases and the use of blended database systems and frameworks.

Databases are characterized by the type of data that they store, the method used to access data, or the way that they store data. Examples include relational, in-memory, hierarchical, virtualized, columnar, graph, object, distributed, streaming, time series, and cloud databases. Databases may also be categorized by function or industry, like personal, commercial, end-user, blockchain, operational, and network databases.

 

Structured and unstructured data

The kind of data that needs to be stored and accessed determines what kind of database is used in an application. There are two types of data stored by databases, structured and unstructured data. Structured data has a defined length and format, for example numbers, dates, and strings. It is stored in a relational database and retrieved using software called SQL. Unstructured data includes multi-media and document collections. It is stored in non-relational databases and primarily accessed using NoSQL. The main difference between relational and non-relational databases is that the former explicitly defines relationships between data objects.

Relational databases


In relational databases, data is organized into rows and columns in separate tables. The data structure is separate from the physical structure, and data can be located using a search algorithm based on unique relationship identifiers, which are called keys. Relational databases use schemas, which are blueprints that describe the way the data is organized. Relational databases are designed to enable normalization, i.e. that any piece of data is only stored in one place. Almost the polar opposite of a hierarchical database, a relational database uses a many-to-many relationship model. Relational databases are row-oriented while non-relational databases are column-oriented.

 

SQL query language

A relational database is also called an SQL database because it uses the computer query language SQL to find, retrieve, modify, and delete data.

Database languages can be categorized as data definition languages or data manipulation languages. Data manipulation languages include data query languages and procedural languages. Procedural languages include support for coding processes like iteration and recursion.

SQL is a data definition language, not a procedural language. SQL is often embedded in a general-purpose host language, like Cobol, C++, Python, or Java, which can perform iteration and recursion functions. 

Popular examples of relational databases include Microsoft SQL Server, MySQL, Oracle Database, and IBM DB2.

 

Database monitoring with PRTG in 3 minutes

Non-relational databases


The use of high volumes of unstructured data in modern applications triggered the development of non-relational databases. Non-relational databases are designed to address the requirements for increased scalability. Non-relational databases are called NoSQL databases to distinguish them from SQL, or relational, databases, and because they do not use the SQL query language. NoSQL is an acronym for not SQL but can also mean not only SQL.

 

NoSQL languages

Different types of NoSQL databases have their own query languages, many of which are loosely based on SQL. Cassandra Query Language (CQL) is used to query Cassandra databases. MongoDB has drivers that enable it to interact with different languages, like Java or C#, and has its own internal shell language, MongoDB Query Language (MQL), that is based on JavaScript.

 

NoSQL data store models

The term data store refers to the storage model that is being used by a non-relational database, for example data may be stored as documents, images, or graphs. NoSQL databases are based on four main data store models: key-value store, document store, column-oriented store, and graph store. Less common, more specialist models are time-series, object, and external index stores.

Key-value store

A key-value pair, or name-value pair, is a unit of data that is identified by a name and the value of the content, for example key=country and value=bolivia. Using key-value pairs, developers can create open-ended, highly-scalable data structures that are not constrained by size or type, and are easily partitioned to enable faster queries. Databases that are based on the key-value model are regarded as the simplest version of NoSQL databases. Examples of key-value databases are Redis and Amazon DynamoDB. 

A NoSQL database is often referred to generically as a key-value database. Some NoSQL databases may be regarded as hybrid key-value databases. For example, Facebook’s Cassandra is a key-value database and column-oriented database. Oracle NoSQL database is a distributed key-value database.

Document store

A database that is based on the document store model is designed to efficiently store documents as a single instance. Media companies like newspaper houses and knowledge base systems use document-store databases to store articles, synopses, blogs, and posts. A document-store database stores documents in their entirety, making each document easily accessible. If the documents were to be stored in a relational database, a typical article would have to be split into composite, logical parts – for example author information, content, and comments from readers – and then cross-referenced on a key such as author. Examples of document-store databases are MongoDB and Elasticsearch.

Column-oriented store

Relational databases are row-oriented while non-relational databases are column oriented. Databases that are based on the column-oriented model store the multiple columns that are found in a relational database’s rows separately with individual IDs. A column-oriented database is able to scan only the columns it is interested in when searching for data, while row-oriented databases must scan all the columns in a record when filtering data on a specific column. An example of a column-oriented database is Facebook’s Cassandra. 

Graph store

Graph stores refer to collections of relationships. Databases that are based on the graph-store model are used extensively by social media platforms as they are designed to connect nodes with multiple edges, creating clusters of related information. In a graph database, a node is a primary data entity, for example a person or a business. Edges describe the relationships a node has with other nodes and edges, for example a business’ customers, branches, staff, products, etc. Edges in a graph database are themselves nodes, connected to different edges. An example of a graph database is Neo4j.

 

 database sql nosql

 

NoSQL databases

Centralized database

A centralized database operates in a single location, for example a university campus database. A centralized database is not the same as a centralized database manager. Modern enterprises use centralized database management systems that consist of multiple databases to store and manage disparate data. Centralized database management helps organizations to ensure data security, integrity, consistency, and operational efficiency.

Cloud database

In a cloud environment, a cloud database is a database service. An organization may purchase virtual space from a cloud provider to which it can deploy its database. Alternatively, an organization may purchase a subscription for a database as a service (DBaaS) offering, which may include database management, maintenance, and operational services. DBaaS is the database equivalent of software as a service (SaaS). Cloud databases are typically used by organizations where there are high traffic volumes and whose customers are distributed geographically. Examples of cloud databases include MySQL, IBM Db2, and Microsoft SQL Server.

Columnar database

A columnar database is a DBMS that stores data in columns instead of rows. Columnar databases are used mainly in analytical applications as data can be accessed very quickly, and in data warehouses. In row-based databases, rows are stored in contiguous blocks on a disk. In columnar databases, columns are stored in contiguous blocks on a disk. Where a business needs to retrieve and analyze data for a specific column or group of columns only and wants to ignore all other data in a record, columnar databases are more efficient. Examples of column-oriented DBMSs include ClickHouse, MariaDB, and Apache Cassandra.

Distributed database

In a distributed database, data is stored across multiple physical locations or over multiple computers at the same physical location. For users, the database appears to be a single database.

Flat files

Flat files are text files in which data entries are usually separated by delimiters like commas. Flat files are non-relational data storage files but the data can be integrated with relational database data using database management applications like Microsoft Access.  A common example of a flat file is a comma-separated values (CSV) file. 

Hierarchical database

In a hierarchal database, data is stored in a tree structure. Data is retrieved from a hierarchical database by traversing the tree structure from the top down using pointers. 

A hierarchical database uses a one-to-many relationship model, where a parent node may have multiple child nodes. An example of a hierarchal data storage model is the way folders and files are presented in file managers like Windows Explorer. 

In-memory database

An in-memory database (IMDB) stores data on volatile memory devices or in a computer's short-term memory, or RAM, enabling faster data access. Traditionally, if power is lost, the data is lost. In modern systems, non-volatile random-access memory technology allows the maintenance of in-memory data even if there is a power failure. One of the ways this is achieved is through automatic failover.

In the 1970s when relational databases were created, memory was expensive. Today, applications can efficiently run using IMDBs without losing any information in the event of a power failure. Applications that require high performance and low latency – like online gaming, geospatial processing, machine learning processing, medical device analysis – may use non-volatile IMDBs in conjunction with disk-based databases.

Many databases include in-memory options, like memcached and Redis. The Apache Ignite platform creates an in-memory layer over any existing database. Examples of “pure” IMDBs include SQLite, Exasol, SAP HANA, and IBM solidDB.

JSON database

Like XML, JavaScript Object Notation (JSON) is described as a rudimentary database. JSON is an open data interchange format that is used to describe data. It is also a type of document database that may be used to store catalogs on e-commerce sites and to provide live updates on websites.

TaffyDB is a flat file database system that uses JSON to store data.  

Network database

A network database allows records from multiple tables to be associated with a single record from another table. Unlike a traditional relational database that is key based, a network database is pointer based.

A network database model is regarded as an advanced version of a hierarchical data model as it uses a graph structure instead of a tree structure. In network databases, parent nodes are called occupiers and child nodes are called members. In a network database, a member node may have more than one occupier node. The occupier and member nodes form a set. The entities in this model have a many-to-many relationship.

Object-oriented database

An object-oriented database stores data in the form of objects as used in object-oriented programming. In object-orientated systems, objects can be persisted. Persisted data is data that is kept for a longer period than what it was initially created for and stored on non-volatile storage devices. An object-oriented database uses a many-to-many relationship model. Data is accessed using pointers. Examples of object-oriented databases are Magma and Realm’s mobile database.

Object-relational database

Object-relational databases are hybrid databases that combine relational database and object-oriented database characteristics. The distinguishing characteristic of object-relational databases is that they support aggregate types, for example lists and sets. An example of this is a column called address in a relational database that may include a list of related information like residence and holiday homes, etc. Examples of object-relational database management systems (ORDBMS) are PostgreSQL and Oracle.   

Online transaction processing (OLTP) database

An OLTP database system deals with multiple transactions that are simultaneously performed by multiple users, for example in e-commerce applications. An OLTP database usually manages small amounts of data at a time, for example performing an online banking transaction, but must support a large number of users. Databases often used for OLTP include MySQL, InterSystems Caché, and VoltDB, a NewSQL database that is compliant with atomicityconsistencyisolationdurability (ACID).

Personal database

Any physical medium that can store data is a database, including a hard drive on a computer. However, the databases used and discussed in IT are usually complex data storage models.

An example of personal data storage is the data stored on personal devices, like personal computers, cell phones, tablets, and external hard drives. This data is accessed and managed by applications like Microsoft Office, systems applications that store configuration and user data, and functional applications.

Personal devices also make use of IMDBs like RAM to improve device performance.

XML database

Like JSON, an Extensible Markup Language (XML) database is described as a rudimentary database. XML is a type of document database in the sense that XML files are stored with a the extension .xml, they can store data that complies with the XML format, and they can be queried using a language query tool like XQuery.

 

Hybrid database systems

The unique features of relational and non-relational databases in modern applications have merged in some database implementations to create multiple-model database systems. For example, OrientDB is a NoSQL graph database whose nodes are documents.

A database that integrates relational and non-relational database features is another type of hybrid database, for example NewSQL. 

A third type of hybrid database is a composite of an IMDB and an on-disk database. In this type of hybrid database, data can be stored in the main memory, on a disk, or using both storage mechanisms. An example of this type of hybrid database is Altibase.

 

Your database monitor PRTG at a glance
 

DBMSs


DBMS functions are software components that manage groups of related tasks. DBMS functions include dictionary, storage, transformation, presentation, security, access control, backup and recovery, data integrity, access, interface, concurrency, and transaction management modules.

For example, a DBMS stores data elements and metadata in a data dictionary (DD). A DBMS enables the storage of not only data but also data-related elements like validation rules, forms and report definitions, and schemas that describe the database structure. DBMS functions are interrelated, for example an access control module manages access to data; multiple access control enables multiple users to access the database without compromising data integrity.

Data is stored, accessed, and managed differently in RDBMSs and DBMSs. In an RDBMS, data is stored in rows in tables and accessed using a unique row identifier called a primary key. An RDBMS supports normalization and distributed databases, and supports multiple users. In a DBMS, data is either stored in a navigational or a hierarchical form. A DBMS does not support normalization or distributed databases, and only supports single users.

Database core characteristics

 

There is no formal specification for what constitutes a database but there are several de facto standards that provide guidelines for designing databases that comply with international data regulations.

Consistency

One of the core characteristics of databases, namely consistency, is based on one of two theoretical models: the ACID model (atomicity, consistency, isolation und durability) and the BASE model (basically available, soft state, eventual consistency).

A database that uses the ACID model guarantees that data is safely and consistently stored, i.e. when a process stores data and signals that the data has been stored, there is no possibility that data was lost along the way. Relational databases always adopt the ACID model and some NoSQL graph databases are designed to comply with the ACID model.

A database that uses the BASE model focuses on data being highly available for scaling purposes. Non-relational databases are usually designed to comply with the BASE model.

Normalization

Normalization is the process of organizing data in a database. Normalization involves creating tables and mapping relationships between tables according to specific rules designed to protect the data, to eliminate redundancy (duplication), and to ensure that there are no anomalies. An example of a typical anomaly that normalization prevents is creating information about a sale without first creating information about the product to be sold.

Persistence

Data persistence ensures that any information that is saved can be recovered even if the storage media or system that uses the data is disconnected. An example of persistence is the ability for a user to recover their recent browsing windows after their computer crashes.

Scaling

Scalability is the capability of a database to handle growth such as increased volumes of data and more users. For example, social media networks need to be very scalable to manage people constantly networking with new people, posting messages, and uploading data.

Data integrity

Data integrity refers to the accuracy, completeness, consistency, security, and safety of data. Data integrity is essential for regulatory compliance, like the General Data Protection Regulation (GDPR).

Cardinality

Cardinality refers to the relationship of data in a database table with respect to another table and is related to the concept of data integrity. An example of cardinality is the specification of whether the relationship between data values is many to many, one to one, or many to one, for example a person may only have one social security number. Cardinality is important because it impacts how efficiently queries are performed on databases to retrieve information.

Database pros

 

Relational databases

Relational databases provide better consistency and reliability than non-relational databases, and mitigate data redundancy. Relational databases provide easy access to data through a simple, proven query model, high security, data integrity, and normalization.

Relational databases support ACID properties, ensuring reliable database transactions. They support complex queries and offer unlimited indexing.

 

Non-relational databases

Non-relational databases provide better performance and process high volumes of data faster than relational databases. They offer high availability and do not require a schema.

Non-relational databases use horizontal scalability, which means that they expand by adding additional nodes. Scalability in non-relational databases is more cost effective than in relational databases.

Non-relational databases can store and manage disparate types of data – like news feeds, audio and video streams, and data from mobile apps – without modifying the architecture.

Databases cons

 

Relational databases

Compared to other types of databases, relational databases are slow and consume more memory space because of the way they physically store data. The process of normalization may require a lot of joins (mapping tables) which also slows the rate of data retrieval.

Relational databases use vertical scalability, which means that they expand by increasing the capacity of existing nodes. Scalability in relational databases is more costly than in non-relational databases.

Relational databases can become complex when there are many tables and it may become increasingly complicated to interact with other similarly complex systems.

Relational databases are more costly to maintain and do not support unstructured data. They also have limits to field lengths.

 

Non-relational databases

Non-relational databases do not use a standardized query language and custom languages developed by vendors are often not as powerful as SQL. They do not have as many tools available as relational databases.

Non-relational databases are not as secure as relational databases. They neither have built-in data integrity nor do they support ACID, and instead rely on “eventual consistency”.

NoSQL databases practice a lower level of normalization than SQL databases and therefore may contain data redundancies.

There is little uniformity between non-relational databases or standardized interfaces for them to communicate with each other.

Relational + non-relational = NewSQL


Two of the main reasons for using a database are for fast and efficient access to data, and to ensure data persistence. Modern applications – from social networking sites, weather forecasting, and artificial intelligence – require enormous amounts of data processing in real time. While, statistically, relational databases are still used the most, modern database systems often integrate a variety of open source or propriety databases, DBMSs, physical and virtual storage, and database frameworks.

NewSQL is a relational database management system that has the scalable properties of a NoSQL database. NewSQL databases support ACID, OLTP, vertical and horizontal scaling, complex queries, multiple store types, and distributed database systems.

An example of a modern, combination database system is Spanner by Google. Spanner is a distributed SQL database management and storage service but is usually referred to as a cloud-hosted NewSQL database. While relational databases support SQL and are more consistent than NoSQL databases, they do not scale as well as NoSQL databases. Spanner is a relational database but provides the kind of functionality NoSQL databases were designed for, like global scalability, optimized performance effected by automatic sharding, high availability, and minimal latency. Sharding is the process of distributing data from an overloaded database into multiple smaller databases to distribute load.

Other examples of NewSQL database products and services include MemSQL, Infobright, ScaleBase, and TransLattice.

 

Database monitoring


Database monitoring
ensures optimal data performance and use of resources. For example, SQL performance is measured by how quickly and efficiently SQL queries perform, and storage planning helps to estimate the real required capacity before storage limits are reached.

Proactive database monitoring identifies potential issues before they become major security breaches and is mandatory to comply with data privacy and protection legislation, like GDPR.

How a database performs has a direct influence on uptime, speed of web page loading, geographic performance, efficient hardware utilization, and customer experience.