Data Design


Data Architecture

Data Architecture is intended to provide a mechanism for stakeholders at the various levels of Government to
identify, discover, describe, manage, protect, and share the data they hold, and to reuse information consistently
within and across ministries, divisions and directorates, or across the entire Government of Bangladesh.
Data Architecture provides standards for accessing data for online analytical processing (OLAP), including executive
information systems (EIS) and decision support systems (DSS).

Whole of Government: The ICT for Government of Bangladesh

To realize the defined business strategy and, in the long term, the Vision for Digital Bangladesh by 2021, the Data
domain is one of the key drivers for ICT, and the most critical and complex. The data architecture needs to be
defined in a manner that addresses the challenges of the Government and remains flexible enough to adapt to a
rapidly changing business environment.

Data Architecture Principles


The principles listed below are key guidelines for designing data for systems across the Government of Bangladesh.
It is imperative that the principles be followed rigorously for all data services (acquisition, storage, retention,
archival and consumption), leading to a secure, collaborative and adaptive ICT for the realization of the Digital
Bangladesh Vision 2021.

These principles are listed to achieve the following objectives:

• Enable architecture review: Any new system development would require an architecture review; the data
architecture principles would provide the necessary review parameters as far as database design is
concerned
• Provide a guidance mechanism: guidance to the database design team or data architect on the criteria that
define the best database design
• Discover gaps in data security and plan for a secure and adaptive data architecture: checking compliance
with the data architecture principles exposes loopholes in data security, data protection and overall data design

Name: DP1: Data access through defined business rules

Description: Data access is to be based on business rules only. All data access is to follow the defined and
approved CRUD (create, read, update, delete) for all roles accessing the system.

Scope: Government hosted systems, portals, reports and mobile apps

Implementation Steps:
1. For new system implementations, define CRUD and obtain approval from the data architect
2. Include CRUD review in the architecture review checklist
3. Prohibit data access through ad-hoc queries by means of DBA-defined rules
4. Review and enhance the CRUD of existing systems

Benefit:
• Security breaches frequently occur when data is accessed through ad-hoc queries; use of CRUD would
help ensure security
• Data governance becomes organized, which eases data management
• Data rights become streamlined

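
The CRUD review described in DP1 can be pictured as a lookup against an approved table of roles and entities. The
following is a minimal sketch only; the role names, entity names and the `is_allowed` helper are hypothetical and
are not taken from the report.

```python
# Hypothetical approved CRUD definition: (role, entity) -> permitted operations.
# "C", "R", "U", "D" stand for create, read, update and delete.
APPROVED_CRUD = {
    ("registrar", "citizen"): frozenset("CRU"),
    ("report_viewer", "citizen"): frozenset("R"),
}


def is_allowed(role: str, entity: str, operation: str) -> bool:
    """Return True only if the approved CRUD definition grants the operation."""
    if operation not in ("C", "R", "U", "D"):
        raise ValueError(f"unknown operation: {operation!r}")
    # Anything not explicitly approved is denied, which rules out ad-hoc access.
    return operation in APPROVED_CRUD.get((role, entity), frozenset())
```

Denying by default is the design choice that matters here: a role or entity missing from the approved definition
gets no access at all.
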
Name: DP2: Data is an asset, shared and governed

Description:
• Data is an asset: Data to be cleaned, synchronized and preserved using central data
management tools. Data to be archived as per the data archival policy.
• Data is shared: Data to follow data sharing rules as per data classification
• Data is governed: Data stewards to maintain data throughout its life cycle

Scope: Data within the purview of Government of Bangladesh

Implementation Steps:
1. Data Cleaning and Synchronization
   • Data to be cleaned and synchronized using data clean-up tools at extraction level in the Master
   Data Management System
   • Implementation of the Master Data Management System along with data clean-up tools
2. Data Archival Policy
   • Draft and publish the data archival policy
   • Implementation of the data archival system
   • Implementation of an ETL tool
   • Architecture review of data archival for all systems
3. Data Classification
   • Finalize data classification rules
   • Publish and enforce data classification adoption across ministries
   • Draft and publish centralized data sharing rules
4. Data Governance
   • See the Data Governance section below for details

Benefit:
• Data is maintained as an asset of the Government of Bangladesh: secured, governed and
shared
• Data is preserved and is easily retrievable
• Data is classified, ensuring access by the right person


Name: DP3: Data design aligned to National Meta Data Standards

Description: Data type, length and uniqueness for key and common data entities are aligned to the published
National Meta Data Standards

Scope:
• Core and Common data entity

Implementation Steps:
1. Draft and publish Meta Data Standards
2. Architecture review of Meta Data for new system implementation

Benefit:
• Sharing of data becomes easy, as there would not be any compatibility issues
• Eases system development and API development effort
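
An architecture review against the Meta Data Standards could check data type, length and uniqueness mechanically.
The sketch below assumes a hypothetical standard with two text fields; the field names, lengths and the
`check_records` helper are illustrative and do not come from any published standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldStandard:
    name: str
    data_type: type
    max_length: int
    unique: bool = False


# Hypothetical entries; a real standard would be published centrally.
STANDARD = [
    FieldStandard("citizen_id", str, 17, unique=True),
    FieldStandard("full_name", str, 100),
]


def check_records(records, standard=STANDARD):
    """Return a list of human-readable violations of type, length and uniqueness."""
    violations = []
    for spec in standard:
        seen = set()
        for row, record in enumerate(records):
            value = record.get(spec.name)
            if not isinstance(value, spec.data_type):
                violations.append(f"row {row}: {spec.name} has the wrong type")
                continue
            if isinstance(value, str) and len(value) > spec.max_length:
                violations.append(f"row {row}: {spec.name} exceeds length {spec.max_length}")
            if spec.unique:
                if value in seen:
                    violations.append(f"row {row}: {spec.name} is duplicated")
                seen.add(value)
    return violations
```
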

Name: DP4: Data synchronized with federated master data management

Description: Core data entities must have relationships as per the data standards, and an established
mechanism for incorporation into the National Master Data Management System

Scope:
• Core and Common data entity

Implementation Steps:
1. Implement the MDM platform
2. Establish a mechanism for data extraction and load to the master data management system
3. Review data cardinality

Benefit:
• Eases data integration for effective consumption in reports and analytical tools
• Checks data cardinality as per the national standard

Name: DP5: Core Data Access

Description: Each core data entity must have an established identifier by which it is accessed, stored and
preserved.

Scope:
• Core data entity

Implementation Steps: Establish core data identifier adoption across all levels of data transaction (access,
store and preserve). The identifiers for core data are:
• Citizen – National ID (NID)
• Business – Business Identification Number (BIN)
• Employee – Government ID
• Things – Asset ID
• GIS – Geo Spatial ID

Benefit:
• Eases data integration for effective consumption in reports and analytical tools
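
The identifier adoption in DP5 amounts to one agreed key per core data entity. A minimal sketch follows; the
entity-to-identifier pairing is the one listed in DP5, while the record field names and the `identifier_of` helper
are hypothetical.

```python
# Identifier per core data entity as listed in DP5; the field names are hypothetical.
CORE_IDENTIFIER_FIELD = {
    "citizen": "nid",              # National ID
    "business": "bin",             # Business Identification Number
    "employee": "government_id",   # Government ID
    "things": "asset_id",          # Asset ID
    "gis": "geo_spatial_id",       # Geo Spatial ID
}


def identifier_of(entity_type: str, record: dict) -> str:
    """Return the core identifier of a record, refusing records that lack one."""
    field = CORE_IDENTIFIER_FIELD[entity_type]
    value = record.get(field)
    if not value:
        raise ValueError(f"{entity_type} record is missing its {field}")
    return value
```
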

Name: DP6: Data security

Description: Data is to be made available to citizens, businesses or other entities who require the information
as part of their role. For secured data, proper encryption and security measures are to be applied for data
protection.

Scope:
• All Data

Implementation Steps:
Establish data governance as illustrated in the report
Establish Open Data Catalogue:
1. Draft the open data catalogue from data classified as public
2. Extract data into the open data repository
3. Publish the open data API and catalogue for public consumption

Benefit:
• Secured and accessible data for all
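
The first open data step, drafting the catalogue from data classified as public, can be sketched as a filter over
classified datasets. The classification labels, dataset names and the `build_open_catalogue` helper below are
assumptions for illustration; the report does not define them.

```python
# Hypothetical dataset register; names and classification labels are illustrative.
DATASETS = [
    {"name": "district_boundaries", "classification": "public"},
    {"name": "citizen_profiles", "classification": "confidential"},
    {"name": "budget_summary", "classification": "public"},
]


def build_open_catalogue(datasets):
    """Return the sorted names of datasets that are classified as public."""
    return sorted(d["name"] for d in datasets if d["classification"] == "public")
```

Only datasets that carry an explicit public classification reach the catalogue, so an unclassified or confidential
dataset is never published by accident.
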

Core Data Entity


Core data entities are the data entities that seldom change or are slow to change. The core data of all such
entities and their associated properties may be created as a Single Source of Truth, to be used by all the
ministries providing services relating to the Government of Bangladesh. A unique ID may be created for each such
core data entity and every property, along the lines of a data identifier.
Following are classified as the core data entities for Government of Bangladesh:
• Citizen
Data related to citizens is classified as a core data entity. Different profiles of a citizen are currently
stored in different ministry-owned systems. Connecting this data would enable a 360-degree view of the
citizen. See the Appendix for the Citizen Data Profile across different ministries
• Employee
Another core data entity that is critical to the Government of Bangladesh. The employee profile is mostly
maintained centrally across ministries, but profile management may differ depending on the type of
employee. Central management of employee data is critical to successful outcomes
• Things
Another crucial data entity, covering all ministry-owned infrastructure such as land, offices, machinery, etc.
Centralized management of assets from procurement to disposal would benefit the Government greatly through
lower cost
• Business
The business entity can be maintained as a single source of truth, avoiding duplication across ministries, to
derive the best outcome both for businesses and for the Government
• GIS
Geo-spatial data, when maintained as a single source of truth, would bring many benefits to decision-making,
logistics and cost

Structured, Un-structured and Semi-Structured Data Repository


A Digital Data Resource is a digital container of information. A Digital Data Resource may correspond to three types
of data: “Structured Data Resource”, “Semi-Structured Data Resource”, and “Unstructured Data Resource”.
1. Structured Data Resource: Structured Data Resource is a type of Digital Data Resource containing only
structured data. A Data Schema is used to define/describe a Structured Data Resource.
2. Semi-Structured Data Resource: A Semi-Structured Data Resource is a Digital Data Resource containing
semi-structured data. A Semi-Structured Data Resource contains partly structured and partly unstructured
data.
3. Un-structured Data Resource: An Unstructured Data Resource is a type of Digital Data Resource that
contains only unstructured data. Unstructured data is a collection of data values that are likely to be
processed only by specialized application programs.

Content Repository
The content repository would comprise easy-to-retrieve, indexed documents, media files, web graphics and templates.

Meta Data Repository


Meta data is data about data: it defines and describes data or information. It is used to manage data, information
and knowledge. Metadata is the structured information that describes, explains, locates or otherwise makes it
easier to retrieve, use or manage an information resource.
Once a National level data standard is established, a Meta data repository would be needed to manage and maintain
this critical data about data.

Data Models
A data model ensures that data is defined accurately so it is used in the manner intended by both end users and
remote applications.
There are three types of data models:
• Conceptual Model
A conceptual data model identifies the highest-level relationships between the different entities. Features
of a conceptual data model include the important entities and the relationships among them.
• Logical Model
A logical data model describes the data in as much detail as possible, without regard to how it will be
physically implemented in the database.
• Physical Model
A physical database model shows all table structures, including column name, column data type, column
constraints, primary key, foreign key, and relationships between tables.
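
To make the physical model concrete, the sketch below creates two related tables in an in-memory SQLite database.
The table and column names are hypothetical and are used only to show column data types, constraints, a primary
key and a foreign key; they are not a proposed national schema.

```python
import sqlite3

# Hypothetical physical model: every column has a type and constraints,
# each table has a primary key, and citizen refers to district by foreign key.
DDL = """
CREATE TABLE district (
    district_code TEXT PRIMARY KEY,
    name          TEXT NOT NULL UNIQUE
);
CREATE TABLE citizen (
    nid           TEXT PRIMARY KEY,
    full_name     TEXT NOT NULL,
    district_code TEXT NOT NULL REFERENCES district(district_code)
);
"""


def create_physical_model() -> sqlite3.Connection:
    """Create the tables in an in-memory database and return the connection."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(DDL)
    return conn
```

With the foreign key enforced, inserting a citizen whose `district_code` does not exist in `district` is rejected
by the database itself rather than by application code.
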

Data modeling tools can evaluate an existing database structure and reverse engineer a data model. The reverse
engineered data model can be used to capture valuable information about the existing database.

Data Governance
Data governance encompasses the strategies and technologies used to make sure Government of Bangladesh’s data
stays in compliance with regulations and policies. It is proposed to be a collection of processes, roles, policies,
standards, and metrics to ensure the effective and efficient use of information in enabling Government of
Bangladesh as a whole to achieve its goals.

The Roles – Data Steward


The data governance strategy would push a cultural movement away from data ownership towards data stewardship.
Since data is an asset of value to the entire enterprise, data stewards are made accountable for properly managing
the data. Data governance would be exercised at the enterprise level, with governance federated to individual
ministries and directorates. It would be proactively exercised whenever a new process, application, repository or
interface is introduced. Existing data is likely to be impacted.

The Process – Data Steward


At each stage of the data life cycle, the data steward would have a critical role to play.

Data Life Cycle Stage: Create
Data Steward Role:
• Define the attributes of the identified core data and the relationships between them
• Identify the systems for data acquisition of core data
• Finalize data classification and security requirements

Data Life Cycle Stage: Store
Data Steward Role:
• Identify the database repository and list the data that is stored
• Define and manage security requirements for data storage
• Draft the data storage and backup policy that needs to be enforced
• DR setup

Data Life Cycle Stage: Use
Data Steward Role:
• Review and manage data models
• Review CRUD for data
• Based on classification of data, transfer to the open data repository

Data Life Cycle Stage: Share
Data Steward Role:
• Define data integration standards
• Approve and finalize CRUD
• Approve API documentation and requests
• Define security requirements for confidential data

Data Life Cycle Stage: Archive & Destroy
Data Steward Role:
• Draft and create the archival policy and disposal policy
• Implement archival using ETL tools or an integration platform
• Monitor the archival process and disposal process

Master Data Management

Master Data refers to commonly required data that is agreed upon and shared across the Government. It may be
reference data, such as the list of values to be used for a data element like sectors in Government. Gartner
defines it as follows: “Master data is the consistent and uniform set of identifiers and extended attributes that
describes the core entities of the enterprise including customers, prospects, citizens, suppliers, sites, hierarchies and
chart of accounts”

It is proposed that Government of Bangladesh follow a Virtual Master Data Management Architecture with the
following capabilities:

Master Data Virtualization Service: The service would enable the same data to be acquired from multiple data
sources while maintaining a single master, following a Federated Architecture. For example, each ministry follows
its own codes and identifiers for similar entities such as districts; the virtualization service would map all
those different codes to the same district. Code 4 in the Finance Ministry might represent Khulna, while Khulna
is represented as Code 6 in the Social Welfare ministry. The virtualization service would enable the mapping of
Khulna across all the ministries
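
The Khulna example can be sketched as a cross-reference table from ministry-specific codes to one master record.
The ministry codes 4 and 6 come from the example in the text; the master ID "KHU", the ministry keys and the
helper functions are hypothetical.

```python
# Single master record per district; the master ID "KHU" is hypothetical.
MASTER_DISTRICTS = {"KHU": "Khulna"}

# Cross-reference: (ministry, local code) -> master ID.
LOCAL_TO_MASTER = {
    ("finance", 4): "KHU",
    ("social_welfare", 6): "KHU",
}


def resolve(ministry: str, local_code: int):
    """Translate a ministry-specific code into the master ID and master name."""
    master_id = LOCAL_TO_MASTER[(ministry, local_code)]
    return master_id, MASTER_DISTRICTS[master_id]


def local_codes(master_id: str) -> dict:
    """List the code each ministry uses for the given master record."""
    return {m: code for (m, code), mid in LOCAL_TO_MASTER.items() if mid == master_id}
```

Each ministry keeps its own code in its own system; only the cross-reference is held centrally, which is what makes
the arrangement federated rather than a forced re-coding.
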

MDM Repository: The repository stores and preserves the master data to enable a single-source-of-truth view

Data Synchronization: Data synchronization and clean-up would enable cleaning of data entities captured from, say,
free-form text entry; this would also enable compliance with the meta data standards
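
A clean-up step for free-form text entry might normalize spacing and letter case before matching against the master
list. The sketch below assumes a hypothetical two-entry canonical list and a hypothetical `clean_district` helper;
a real clean-up tool would do considerably more.

```python
import re

# Hypothetical canonical values, keyed by their case-folded form.
CANONICAL = {"khulna": "Khulna", "dhaka": "Dhaka"}


def clean_district(raw: str):
    """Return the canonical district name, or None when no match is found."""
    # Collapse runs of whitespace, trim the ends and ignore letter case.
    key = re.sub(r"\s+", " ", raw).strip().casefold()
    return CANONICAL.get(key)
```

Values that return None would be routed to a data steward for review rather than guessed at.
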

Data Conflict Resolution: The tool or capability would display data conflicts among the various sources of the same
master data, to help resolve them and preserve the right data
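
Displaying conflicts could start from a comparison of the same master record as held by each source. This is a
minimal sketch under that assumption; the source names, attributes and the `find_conflicts` helper are
hypothetical.

```python
def find_conflicts(sources: dict) -> dict:
    """Compare one master record across sources.

    sources maps a source name to that source's attributes for the record.
    The result maps each conflicting attribute to the value held by each source.
    """
    conflicts = {}
    attributes = set().union(*(record.keys() for record in sources.values()))
    for attribute in sorted(attributes):
        values = {name: rec[attribute] for name, rec in sources.items() if attribute in rec}
        # More than one distinct value means the sources disagree.
        if len(set(values.values())) > 1:
            conflicts[attribute] = values
    return conflicts
```

The function only reports disagreements; deciding which value is right remains a stewardship decision.
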

Data Warehouse

A data warehouse is a collection of data designed to support decision-making and analytical processing. Data
warehouses contain a wide variety of data, usually from multiple data sources, presenting a comprehensive view of
a particular business environment. Due to the nature of the data stored in a data warehouse, the size of the data
warehouse is usually very large, so it requires special design and planning.

A data mart is a subset of a data warehouse. Where data warehouses are designed to support many requirements
for multiple business needs, data marts are designed to support specific requirements for specific decision support
applications (i.e., particular business needs). Although a data mart is a subset of a data warehouse, it is not
necessarily smaller than a data warehouse. Specific decision support needs may still require large amounts of data.
Data marts are typically considered a solution for distributed users who want exclusive control of the information
required for their business need.
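
The warehouse and data mart relationship can be illustrated with an in-memory SQLite database in which the mart is
a view over one slice of the warehouse. The table, view and figures below are invented for illustration; in
practice a data mart is often a separate physical store rather than a view.

```python
import sqlite3


def build_warehouse() -> sqlite3.Connection:
    """Create a tiny warehouse table and a finance data mart defined over it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE service_facts (ministry TEXT, district TEXT, requests INTEGER)"
    )
    conn.executemany(
        "INSERT INTO service_facts VALUES (?, ?, ?)",
        [
            ("finance", "Khulna", 120),
            ("finance", "Dhaka", 300),
            ("social_welfare", "Khulna", 80),
        ],
    )
    # The data mart exposes only the rows one decision support need requires.
    conn.execute(
        "CREATE VIEW finance_mart AS "
        "SELECT district, requests FROM service_facts WHERE ministry = 'finance'"
    )
    return conn
```
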

Data warehouse efforts should begin with a specific requirement for a specific decision support application, similar
to the practices of a data mart design. For scalability, the tools and databases used should be designed to support a
very large data warehouse, instead of using data mart specific products.

Future State Data Architecture Model
