https://www.guru99.com/introduction-to-database-sql.html
Data Models define how data is connected to each other and how they are processed and stored inside a system.
The data model is a blueprint that visually communicates how information is organized in the biz.
A data model is one of the first steps towards building the foundation for a powerful and flexible data warehouse.
Data Modelling is the analysis of data objects and their relationships to other data objects. This is a process through which we define and analyze the data requirements needed to support the business processes of the corresponding information system. As a part of this process, we define data entities and their relationships in a system to build a strong data base design and architecture for the information system.
This is a very important step in the data warehousing project. Indeed, it is fair to say that the foundation of the data warehousing system is the data model. A good data model will allow the data warehousing system to grow easily, as well as allowing for good performance.
We create 3 models as we progress from requirements to the actual database which is to be used by the information system.
Levels of Data Modeling
Feature | Conceptual | Logical | Physical |
Entity Names | ✓ | ✓ | |
Entity Relationships | ✓ | ✓ | |
Attributes | ✓ | ||
Primary Keys | ✓ | ✓ | |
Foreign Keys | ✓ | ✓ | |
Table Names | ✓ | ||
Column Names | ✓ | ||
Column Data Types | ✓ |
Types of Data Modeling
The data in the data warehouse can be modeled in two ways:
- ER Modeling
- Dimension Modeling
Entity Relationship Modeling
An ER model is a high-level conceptual data model. Entity Relationship Modeling (ER Modeling) is a graphical approach to database design. It uses Entity/Relationship to represent real-world objects.
Entity: Real-world object distinguishable from other objects.
http://www.edugrabs.com/entity-and-its-types/
e.g. Customer, Product, Student, Employee, etc.
Entity names must be short and descriptive.
Attribute: Characteristics of an entity.
e.g. Customer Name, Customer ID, Customer Age, Customer Phone No. etc.
Attribute names must be descriptive, compound words.
Types of Attributes
Attribute | Definition | Example |
Composite | Can be split into components | Date of Joining – day, month & year |
Simple | Cannot be divided into simpler components | Age of an employee |
Single-valued | Can take on only a single value for each entity instance | Age of an employee |
Multi-valued | Can take many values | Skillset of an employee |
Stored | Attribute that needs to be stored permanently | Phone number of an employee |
Derived | Attribute that can be calculated based on other attributes | Age of an employee can be calculated by DOB & CurrDate |
Relationship: Association among multiple entities.
- Association: the simple relationship between two entities. Represented using a single line.
(1…10) |
(1…*) |
Student |
Course |
- A course may have 1 to 10 students enrolled.
- A student must be enrolled in at least one course (or many courses)
- Generalization: Combining entities into a common group. Eg: Arts students, Commerce students, Science students are all part of a bigger entity STUDENTS.
- Specialization: Splitting an entity into specific categories.
4. Composition: A strong whole-part relationship. That is, deletion of one entity will result in deletion of all the associated entities.
e.g.: Deletion of School will result in deletion of the Department.
Or, the closing of a bank will result in the closing of all Accounts.
- Aggregation: Opposite of composition. If an entity can be migrated it will still exist after deletion of the parent entity.
Eg: Department can exist if it is shifted to another branch of the school.
Or, Accounts from the closing branch are shifted to another branch of the same bank.
Types of Cardinalities
It is the numerical mapping between the entities, this describes ‘how many’ of one entity is related to ‘how many’ of other entity. There are three types of cardinalities
- One to one
- One to many
- Many to many
Normalization
https://www.guru99.com/database-normalization.html
Normalization is done in order to remove the redundancy in the database.
To maintain consistency and integrity. If normalization is not performed anomalies arises.
Types of Normal Form
1NF
- Each table cell should contain a single value.
- Each record needs to be unique.
2NF
- Rule 1- Be in 1NF
- Rule 2- Single-Column Primary Key
3NF
- Rule 1- Be in 2NF
- Rule 2- Has no transitive functional dependencies
What are Anomalies?
Types of Anomalies
Updation
Insertion
Deletion
Keys
Used to establish and identify relation between tables. They also ensure that each record within a table can be uniquely identified by combination of one or more fields within a table.
A key is an attribute, or a group of attributes, which assumes a unique value for each entity instance.
Different types of keys
https://www.youtube.com/watch?v=EavNLiTk-eo
EmployeeID
EmployeeName
DOB
DOJ
SSN
DeptID
MgrID
Primary Key – Primary key must contain UNIQUE values, and cannot contain NULL values (EmployeeID)
Foreign Key – A Foreign key is a key used to link two tables together Alternate Key – Another candidate key column/attribute that you didn’t choose as Primary key (SSN) |
Candidate Key – Maximum unique identifiers (EmployeeID, SSN)
Super Key – when you add another column/attribute to a candidate key it becomes a Super Key EmployeeID + EmployeeName + DOB |
Composite Key – If a table does have a single column that qualifies for a candidate key, then you have to select 2 or more columns to make a row unique (employee_name + DOB + DOJ + mgr_id)
Dimension Modeling
https://www.1keydata.com/datawarehousing/dimensional.html
Dimensional modeling is a powerful technique for representing the requirements of the business users in the context of the database table.
This modeling mainly uses 3 features
- Facts
- Dimensions
- Measures
Facts
https://www.1keydata.com/datawarehousing/fact-table-types.html
Fact is the collection of all the related data items which the business user needs.
For example, if the user needs the sales information then this fact table consists of the all related info data such as various products, various dates and various locations so that they can accumulate and give the sales details.
Types of facts
Additive: The facts which can be summed up through all the dimensions in the fact table.
Semi Additive: The facts that can be summed up through all the dimensions in the fact table.
Non-Addictive: Facts that cannot be summed up for any of the dimensions present in the fact table.
Types of fact tables
Cumulative table
This type of fact table describes what has happened over a period.
e.g. Total sales of the product A in the store X on the Z day.
Snap Shot
This type of fact table describes the state of things in a particular instance of time.
e.g. sales of product A at 1pm to 3pm of time.
Lookup table
This is the table which gives the detailed information about the dimensions.
e.g. lookup table for product would be product id, manufacture, manufacturing date, etc.
Dimensions
Dimension is a collection of units of the same type. Every data point in the fact table is associated with one and only one member from each of the dimension.
Types of Dimension
https://www.helicaltech.com/types-of-dimensions-in-data-warehouse/
- Slow changing dimension
- Rapidly changing dimension
- Junk dimension
- Conformed dimension
- Degenerate dimension
- Role play dimension
Slow changing dimension
It’s the attributes of the dimension changes slowly over a period of time it is known as slow changing dimension. It depends on the project requirement whether to keep the data which has been changed or not.
Type1 – Overwrite the old value
Type2 – Add a new row
Type3 – Add a new column
Rapidly changing dimension
It’s the attributes of the dimensions changes fast – rapidly changing dimension. Also known as fast changing dimension. The fast-changing dimensions can be divided into mini dimensions. Thus, there can be two or more mini dimensions for one main dimensions. Thus, the fact table consists of two or more foreign keys i.e. few foreign keys for the mini dimension table and one frame main dimension table.
Junk dimension
It is the dimension table with different types of attributes in it. This dimension table is used to reduce the number of foreign keys in the fact table.
Conformed dimension
It’s a dimension table is used by various other fact table then it is known as conformed dimension.
Degenerated dimension
It is the attribute of the dimension are kept as a column in the fact table rather than in a dimensional table, it is known as degenerated dimension.
2 ways – Data Modeling
Thus, in data modeling we do it in two ways. They are either done using
- Star schema
- Snow flake schema
Thus, the four main steps included in doing the data modeling are
- Choose the business process
- Declare the gain
- This means identify the lowest level of information in the fact table
- Identify the dimension
- Identify the fact table
Star Schema
https://www.1keydata.com/datawarehousing/star-schema.html
Star Schema is a simplest designing technique for OLAP systems. Star Schema consists of Fact table, dimensions and measures.
In this design, the fact table will be sitting in the middle which is surrounded by the various dimensions. The dimensions will be in the form of the tables. Each primary key in the dimensional table will be foreign key of the fact table.
Snowflake Schema
https://www.1keydata.com/datawarehousing/snowflake-schema.html
This is an extended form of the star schema. Wherein star schema the fact tables is connected to various dimensional tables. The dimensional table in this schema is normalized into one or move lookup table.