Data Modeling

https://www.guru99.com/introduction-to-database-sql.html

Data Models define how data is connected to each other and how they are processed and stored inside a system.

The data model is a blueprint that visually communicates how information is organized in the biz.

A data model is one of the first steps towards building the foundation for a powerful and flexible data warehouse.

Data Modelling is the analysis of data objects and their relationships to other data objects. This is a process through which we define and analyze the data requirements needed to support the business processes of the corresponding information system. As a part of this process, we define data entities and their relationships in a system to build a strong data base design and architecture for the information system.

This is a very important step in the data warehousing project. Indeed, it is fair to say that the foundation of the data warehousing system is the data model. A good data model will allow the data warehousing system to grow easily, as well as allowing for good performance.

We create 3 models as we progress from requirements to the actual database which is to be used by the information system.

Levels of Data Modeling

Feature Conceptual Logical Physical
Entity Names  
Entity Relationships  
Attributes    
Primary Keys  
Foreign Keys  
Table Names    
Column Names    
Column Data Types    

Types of Data Modeling

The data in the data warehouse can be modeled in two ways:

  • ER Modeling
  • Dimension Modeling

Entity Relationship Modeling

An ER model is a high-level conceptual data model. Entity Relationship Modeling (ER Modeling) is a graphical approach to database design. It uses Entity/Relationship to represent real-world objects.

Entity: Real-world object distinguishable from other objects.

http://www.edugrabs.com/entity-and-its-types/

e.g. Customer, Product, Student, Employee, etc.

Entity names must be short and descriptive.

Attribute: Characteristics of an entity.

e.g. Customer Name, Customer ID, Customer Age, Customer Phone No. etc.

Attribute names must be descriptive, compound words.

Types of Attributes

Attribute Definition Example
     
Composite Can be split into components Date of Joining – day, month & year
Simple Cannot be divided into simpler components Age of an employee
     
Single-valued Can take on only a single value for each entity instance Age of an employee
Multi-valued Can take many values Skillset of an employee
     
Stored Attribute that needs to be stored permanently Phone number of an employee
Derived Attribute that can be calculated based on other attributes Age of an employee can be calculated by DOB & CurrDate
     

Relationship: Association among multiple entities.

  1. Association: the simple relationship between two entities. Represented using a single line.
(1…10)
(1…*)
Student
Course
  • A course may have 1 to 10 students enrolled.
  • A student must be enrolled in at least one course (or many courses)
  1. Generalization: Combining entities into a common group. Eg: Arts students, Commerce students, Science students are all part of a bigger entity STUDENTS.
  2. Specialization: Splitting an entity into specific categories.

     4. Composition: A strong whole-part relationship. That is, deletion of one entity will result in deletion of all the associated entities.

e.g.: Deletion of School will result in deletion of the Department.

Or, the closing of a bank will result in the closing of all Accounts.

  1. Aggregation: Opposite of composition. If an entity can be migrated it will still exist after deletion of the parent entity.

Eg: Department can exist if it is shifted to another branch of the school. 

Or, Accounts from the closing branch are shifted to another branch of the same bank.

Types of Cardinalities

It is the numerical mapping between the entities, this describes ‘how many’ of one entity is related to ‘how many’ of other entity. There are three types of cardinalities

  • One to one
  • One to many
  • Many to many

Normalization

https://www.guru99.com/database-normalization.html

Normalization is done in order to remove the redundancy in the database.

To maintain consistency and integrity. If normalization is not performed anomalies arises.

Types of Normal Form

1NF
  • Each table cell should contain a single value.
  • Each record needs to be unique.
2NF
  • Rule 1- Be in 1NF
  • Rule 2- Single-Column Primary Key
3NF
  • Rule 1- Be in 2NF
  • Rule 2- Has no transitive functional dependencies

What are Anomalies?

 

Types of Anomalies

Updation
Insertion
Deletion

Keys

Used to establish and identify relation between tables. They also ensure that each record within a table can be uniquely identified by combination of one or more fields within a table.

A key is an attribute, or a group of attributes, which assumes a unique value for each entity instance.

Different types of keys

https://www.youtube.com/watch?v=EavNLiTk-eo

EmployeeID

EmployeeName

DOB

DOJ

SSN

DeptID

MgrID

Primary Key – Primary key must contain UNIQUE values, and cannot contain NULL values (EmployeeID)

Foreign Key – A Foreign key is a key used to link two tables together

Alternate Key – Another candidate key column/attribute that you didn’t choose as Primary key (SSN)

 

Candidate Key – Maximum unique identifiers (EmployeeID, SSN)

Super Key – when you add another column/attribute to a candidate key it becomes a Super Key

EmployeeID + EmployeeName + DOB

Composite Key – If a table does have a single column that qualifies for a candidate key, then you have to select 2 or more columns to make a row unique (employee_name + DOB + DOJ + mgr_id)

Dimension Modeling

https://www.1keydata.com/datawarehousing/dimensional.html

Dimensional modeling is a powerful technique for representing the requirements of the business users in the context of the database table.

This modeling mainly uses 3 features

  • Facts
  • Dimensions
  • Measures

Facts

https://www.1keydata.com/datawarehousing/fact-table-types.html

Fact is the collection of all the related data items which the business user needs.

For example, if the user needs the sales information then this fact table consists of the all related info data such as various products, various dates and various locations so that they can accumulate and give the sales details.

Types of facts

Additive: The facts which can be summed up through all the dimensions in the fact table.

Semi Additive: The facts that can be summed up through all the dimensions in the fact table.

Non-Addictive: Facts that cannot be summed up for any of the dimensions present in the fact table.

Types of fact tables

Cumulative table

This type of fact table describes what has happened over a period.

e.g. Total sales of the product A in the store X on the Z day.

Snap Shot

This type of fact table describes the state of things in a particular instance of time.

e.g. sales of product A at 1pm to 3pm of time.

Lookup table

This is the table which gives the detailed information about the dimensions.

e.g. lookup table for product would be product id, manufacture, manufacturing date, etc.

Dimensions

Dimension is a collection of units of the same type. Every data point in the fact table is associated with one and only one member from each of the dimension.

Types of Dimension

https://www.helicaltech.com/types-of-dimensions-in-data-warehouse/

  1. Slow changing dimension
  2. Rapidly changing dimension
  3. Junk dimension
  4. Conformed dimension
  5. Degenerate dimension
  6. Role play dimension
Slow changing dimension

It’s the attributes of the dimension changes slowly over a period of time it is known as slow changing dimension. It depends on the project requirement whether to keep the data which has been changed or not.

Type1 – Overwrite the old value

Type2 – Add a new row

Type3 – Add a new column

Rapidly changing dimension

It’s the attributes of the dimensions changes fast – rapidly changing dimension. Also known as fast changing dimension. The fast-changing dimensions can be divided into mini dimensions. Thus, there can be two or more mini dimensions for one main dimensions. Thus, the fact table consists of two or more foreign keys i.e. few foreign keys for the mini dimension table and one frame main dimension table.

Junk dimension

It is the dimension table with different types of attributes in it. This dimension table is used to reduce the number of foreign keys in the fact table.

Conformed dimension

It’s a dimension table is used by various other fact table then it is known as conformed dimension.

Degenerated dimension

It is the attribute of the dimension are kept as a column in the fact table rather than in a dimensional table, it is known as degenerated dimension.

2 ways – Data Modeling

Thus, in data modeling we do it in two ways. They are either done using

  • Star schema
  • Snow flake schema

Thus, the four main steps included in doing the data modeling are

  • Choose the business process
  • Declare the gain
    • This means identify the lowest level of information in the fact table
  • Identify the dimension
  • Identify the fact table
Star Schema

https://www.1keydata.com/datawarehousing/star-schema.html

Star Schema is a simplest designing technique for OLAP systems. Star Schema consists of Fact table, dimensions and measures.

In this design, the fact table will be sitting in the middle which is surrounded by the various dimensions. The dimensions will be in the form of the tables. Each primary key in the dimensional table will be foreign key of the fact table.

Snowflake Schema

https://www.1keydata.com/datawarehousing/snowflake-schema.html

This is an extended form of the star schema. Wherein star schema the fact tables is connected to various dimensional tables. The dimensional table in this schema is normalized into one or move lookup table.

Difference between Star and Snowflake Schema