Data Administration
A discipline, set of practices, and/or organizational group that deals with managing databases and their contents.
Data Analysis
A study of information to establish trends or exceptions. The review of facts and application of statistical processes to describe, summarize, and identify data patterns and significance.
Data Analyst / Modeler
An IT professional responsible for capturing and modeling data definitions, business rules and data quality requirements, logical and physical data models.
Data Anomaly
A data value that is different from what is normal or usual.
Data Appliance
A combination of hardware, software, DBMSs and storage, resulting in high performance in both speed and storage.
Data Archaeology
The recovery of information stored in outdated or obsolete computer formats or technologies.
Data Architect
A senior data analyst / modeler responsible for data integration and architecture.
Data Architecture
A discipline, process, and program focusing on integrating sets of information. One of the four Enterprise Architectures (with Application Architecture, Business Architecture, and System Architecture). See also Data Modeling.
Data Audit
A review of sets of information according to regulatory or compliance requirements, employing various Data Quality techniques.
Data Augmentation
The enhancement of a set of information using additional information from internal and/or external data sources.
Data Categorization
see Data Classification
Data Classification
The categorization of data, following various schemas to support various business or technology goals.
Data Cleaning
See Data Cleansing
Data Cleansing
Also referred to as data scrubbing. Data Cleansing is the process of detecting dirty data in a database (data that is incorrect, out-of-date, redundant, incomplete, or formatted incorrectly) and then removing and/or correcting the data. Data cleansing is often necessary to bring consistency to different sets of data that have been merged from separate databases. Cleansing data involves consolidating data within a database by removing inconsistent data, removing duplicates and reindexing existing data in order to achieve the most accurate and concise database. It can involve manual tasks or processes automated by special Data Quality tools. A particular type of Data Cleansing is Address Cleansing, in which street addresses are converted to a standard format as set forth by the U.S. Postal Service master database. For example, standard abbreviations are utilized, typos are corrected and ZIP codes are converted to 9-digit format. Address cleansing is usually done in conjunction with address matching, a process that validates an address against one of the 57 million addresses in the USPS database.
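The standardize-then-deduplicate steps described above can be sketched in a few lines. This is a minimal illustration, not a postal-grade cleanser: the abbreviation table and the function names `cleanse_address` and `cleanse_records` are hypothetical, and no validation against a postal database is attempted.

```python
import re

# Hypothetical abbreviation table; a real one would follow the postal standard.
ABBREVIATIONS = {"street": "ST", "avenue": "AVE", "road": "RD", "suite": "STE"}

def cleanse_address(raw: str) -> str:
    """Drop punctuation, collapse whitespace, uppercase, abbreviate known words."""
    words = re.sub(r"[.,]", " ", raw).split()
    return " ".join(ABBREVIATIONS.get(w.lower(), w.upper()) for w in words)

def cleanse_records(addresses):
    """Standardize each address, then drop duplicates, keeping first-seen order."""
    seen, result = set(), []
    for raw in addresses:
        clean = cleanse_address(raw)
        if clean not in seen:
            seen.add(clean)
            result.append(clean)
    return result

records = ["12 Main Street", "12  main street.", "4 Oak Avenue, Suite 2"]
print(cleanse_records(records))  # ['12 MAIN ST', '4 OAK AVE STE 2']
```

Standardizing before comparing is what lets the first two inputs collapse into one record.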
Data Completeness
see Completeness
Data Consistency
see Consistency
Data Consolidation
The aggregation and summarization of data from heterogeneous (varied, diverse) sources.
Data Conversion
The manipulation of information sets from one format or structure to another. Data Conversion is often required when acquiring sets of information from outside sources.
Data Cube
In a Data Warehouse context, a cube is a three- (or higher) dimensional array of values, commonly used to describe a time series of data in a common subject area. For example, information in a Sales cube might be viewed by different dimensions: over time, by products, by sales location, etc.
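As a rough sketch of viewing a Sales cube by different dimensions, the example below sums a small set of made-up facts along whichever dimensions are requested. The data, the `DIMENSIONS` tuple, and the `view_by` helper are all assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical sales facts: (month, product, location, amount).
facts = [
    ("2008-01", "Widget", "Boston", 100.0),
    ("2008-01", "Gadget", "Boston", 50.0),
    ("2008-02", "Widget", "Denver", 75.0),
    ("2008-02", "Widget", "Boston", 25.0),
]
DIMENSIONS = ("month", "product", "location")

def view_by(rows, *dims):
    """Sum amounts over the cube, keeping only the requested dimensions."""
    idx = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[i] for i in idx)
        totals[key] += row[3]
    return dict(totals)

print(view_by(facts, "product"))
# {('Widget',): 200.0, ('Gadget',): 50.0}
print(view_by(facts, "month", "location"))
# {('2008-01', 'Boston'): 150.0, ('2008-02', 'Denver'): 75.0, ('2008-02', 'Boston'): 25.0}
```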
Data Currency
see Currency
Data Deduplication
see Deduplication
Data Deluge
An overabundance of information, making it difficult to identify significant data.
Data Dictionary
A database about data and database structures. A catalog of all data elements, containing their names, structures, and information about their usage, for the benefit of programmers and others interested in the data elements and their usage.
Data Discovery
see Data Mining
Data Element
The smallest piece of information considered meaningful and usable. A single logical data fact, the basic building block of a Logical Data Model.
Data Element
The most elementary unit of data that can be identified and described in a dictionary or repository. A data element cannot be subdivided.
Data Element Domain
A category of data elements that have similar base meaning, such as "date."
Data Encryption
Encryption is the process of transforming information into an unreadable "ciphertext" form. Only those with the proper key can decrypt and read the information. Encryption has long been used by militaries and governments to facilitate secret communication, but is now used to protect private commercial information, applications, hardware and software.
Data Enrichment
An activity that supplements and/or improves the existing data.
Data Flow Diagram (DFD)
A document depicting the flow of information between external entities, processes and data stores.
Data Forensics
The process of locating and reviewing documents, files, and correspondence, including email. The identification or restoration of stored, deleted, and erased files (including email) from a computer, combined with certifying the authenticity of the files. Often conducted in the context of preparation for litigation or in analyzing potential wrong-doing.
Data Formatting
In a Data Modeling context, ensuring that data fields conform to the proper data type and other constraints. See Data Type.
Data Governance
The organizational bodies, rules, decision rights, and accountabilities of people and information systems as they perform information-related processes. Data Governance determines how an organization makes decisions -- how we "decide how to decide." See also Decision Rights.
Data Governance Framework
A logical structure for organizing how we think about and communicate Data Governance concepts.
Data Governance Methodology
A logical structure providing step-by-step instructions for performing Data Governance processes.
Data Governance Office (DGO)
A centralized organizational entity responsible for facilitating and coordinating Data Governance and/or Stewardship efforts for an organization.
Data Householding
see Householding
Data Hygiene
A characteristic of a database, describing how "clean" the data in it is. See Data Cleansing.
Data Integration
The process of connecting enterprise data, fragmented across disparate systems, to create an accurate and consistent view of core information.
Data Integration Architect
A senior Data Integration developer responsible for designing technology or strategies to connect data stores or to replicate, extract, transform, or load data records.
Data Integrity
The accuracy, consistency, correctness, and soundness of a body of information.
Data Lineage
The history of how a data field moves through IT systems and is transformed along the way.
Data Management
A broader term that encompasses Data Administration and other efforts.
Data Manipulation
A synonym for data transformation.
Data Mapping
The process of assigning a source data element to a target data element.
Data Mart
A repository of data gathered from operational data and other sources. The data may derive from an enterprise-wide database or data warehouse or from more specialized sources. The emphasis of a data mart is on meeting the expectations and needs of a particular group of users, so it may be designed to assist them in performing analysis and understanding the content.
Data Masking
Any process for replacing real data with fake data.
Data Matching
A way to compare data so that similar, but slightly different records can be aligned. Matching may use "fuzzy logic" to find duplicates in the data. For example, Data Matching technologies and processes may recognize that 'Will' and 'William' and 'Bill' may be the same individual.
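One simple way to approximate this kind of matching is to map known nicknames to a canonical form and then compare strings by similarity. The sketch below uses Python's standard `difflib.SequenceMatcher`; the nickname table, the 0.8 threshold, and the helper names are assumptions chosen for illustration, not a description of any particular matching product.

```python
from difflib import SequenceMatcher

# Hypothetical nickname table; real matching tools use much larger ones.
NICKNAMES = {"will": "william", "bill": "william", "bob": "robert"}

def canonical(name: str) -> str:
    """Lowercase the name and replace a known nickname with its full form."""
    name = name.strip().lower()
    return NICKNAMES.get(name, name)

def is_match(a: str, b: str, threshold: float = 0.8) -> bool:
    """Treat two names as the same person if their canonical forms are similar enough."""
    ratio = SequenceMatcher(None, canonical(a), canonical(b)).ratio()
    return ratio >= threshold

print(is_match("Will", "Bill"))       # True: both map to "william"
print(is_match("William", "Willam"))  # True: one dropped letter
print(is_match("Bill", "Bob"))        # False: "william" vs "robert"
```

The threshold trades false matches against missed matches and would normally be tuned on real data.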
Data Merge
The combination of sets of data into a consolidated set. Often part of a Merge/Purge process. See Deduplication.
Data Migration
The process of transferring data from one repository to another.
Data Mining
The analysis of data for relationships not previously discovered. Data Mining (DM) is also known as Knowledge Discovery. It is the process of automatically searching large volumes of data for patterns that may be used to predict future behavior.
Data Model
A method of visualizing the informational needs of a system. A data model typically takes the form of an ERD (Entity Relationship Diagram). A Conceptual Data Model is completely devoid of database-level information, while a Logical Data Model stores generic characteristics (such as indexes and foreign keys) without adding anything specific to a single DBMS. Physical Data Models translate information from a Logical Data Model to designs that are specific to a certain DBMS.
Data Model Administrator
An IT professional responsible for data model version control and change control.
Data Model Notation
see Crow's Foot Notation
Data Modeling
The discipline, process, and organizational group that conducts analysis of data objects used in a business or other context, identifies the relationships among these data objects, and creates models that depict those relationships. See also Data Model.
Data Monitoring
The process of checking and controlling data integrity over time.
Data Owner
A role or group who is empowered to make decisions about how a data entity can be structured, manipulated, or used.
Data Privacy
The assurance that a person's or organization's personal and private information is not inappropriately disclosed. Ensuring Data Privacy requires Access Management, eSecurity, and other data protection efforts.
Data Profiling
The process of examining data in an existing database and collecting statistics and information about that data. The information collected may be used to collect metrics on data quality, assess whether metadata accurately describes the actual values in the source database, determine if existing data can be repurposed, or understand risks and challenges in using the data.
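A minimal sketch of collecting such statistics for a single column follows. The `profile_column` function and the sample values are hypothetical; real profiling tools gather many more measures (patterns, ranges, cross-column dependencies).

```python
from collections import Counter

def profile_column(values):
    """Collect basic statistics about one column of raw values."""
    non_null = [v for v in values if v not in (None, "")]
    counts = Counter(non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0] if counts else None,
    }

states = ["MA", "CO", "MA", "", None, "ma"]
print(profile_column(states))
# {'rows': 6, 'nulls': 2, 'distinct': 3, 'most_common': ('MA', 2)}
```

Here the distinct count of 3 (with "MA" and "ma" counted separately) hints at a standardization problem in the source.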
Data Propagation
The distribution of data from a source to one or more target data stores. More generically, this term refers to a method of moving data from one location (a source) to another location (a target).
Data Purging
The removal of data records. Purging is subject to Record Retention rules.
Data Quality
The practice of correcting, standardizing, and verifying data.
Data Quality Analyst
An IT professional responsible for determining the fitness of data for use.
Data Replication
The process of copying a portion of a database from one environment to another and keeping the subsequent copies of the data in sync with the original source.
Data Scrubbing
The process of cleaning up data in a database that is incorrect, incomplete, or duplicated.
Data Security Administrator
A person responsible for working with tools and technologies that control access to data.
Data Stakeholders
Those who use, affect, or are affected by data. Data Stakeholders may be upstream producers, gatherers, or acquirers of information; downstream consumers of information; those who manage, transform, or store data; or those who set policies, standards, architectures, or other requirements or constraints.
Data Standardization
The transformation of data into consistent formats.
Data Steward
A person with data-related responsibilities as set by a Data Governance or Data Stewardship program. Often, Data Stewards fall into multiple types: Data Quality Stewards, Data Definition Stewards, Data Usage Stewards, etc.
Data Storage
The holding of data in a database or other data structure.
Data Store
A data store is a general term for a place to put information. Databases, files, and spreadsheets are all examples of data stores.
Data Survivorship
Rules that describe which source data element overrides the others in cases of duplicates.
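One common form of survivorship rule ranks the source systems and, field by field, keeps the value from the best-ranked source that actually has one. The sketch below assumes that rule; the source names, the priority table, and the `survive` function are hypothetical.

```python
# Hypothetical source ranking: a lower number wins when duplicates disagree.
SOURCE_PRIORITY = {"crm": 1, "billing": 2, "web_form": 3}

def survive(duplicates):
    """Build one surviving record, field by field, from the best-ranked source with a value."""
    ranked = sorted(duplicates, key=lambda r: SOURCE_PRIORITY[r["source"]])
    survivor = {}
    for record in ranked:
        for field, value in record.items():
            if field != "source" and value and field not in survivor:
                survivor[field] = value
    return survivor

dupes = [
    {"source": "web_form", "name": "W. Smith", "phone": "555-0100"},
    {"source": "crm", "name": "William Smith", "phone": ""},
]
print(survive(dupes))  # {'name': 'William Smith', 'phone': '555-0100'}
```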
Data Synchronization
The discipline, process, or technologies used to allow applications to update data on two systems so that the data sets are identical.
Data Timeliness
see Timeliness
Data Transformation
The process of redefining data based on some predefined rules, specific formulas, or techniques.
Data Type
The kind of data that a data item represents. Examples are:
• Date - Usually Gregorian calendar dates.
• Time - Usually time on the military 24 hour clock.
• Money - Currency.
• Boolean - Logical values (True or False).
• Characters - Alphanumeric text.
• Number - Number with decimal precision and complete decimal accuracy.
• Integer - Number without any decimal precision.
• Short Integer - Same as Integer, but smaller values.
• Long Integer - Same as Integer, but larger values.
• Byte - Usually a small number (less than 256).
• Float - Number with partial decimal accuracy (number times 10 to a power).
• Bitmap - Usually an image stored as a straight bitmap representation.
Data Uniqueness
see Uniqueness
Data Validation
As a broad concept, Data Validation refers to the confirmation of the reliability of data through a checking process. As a set of processes, Data Validation refers to a systematic review of a data set to identify outliers or suspect values. More specifically, data validation refers to the systematic process of independently reviewing a body of analytical data against established criteria to provide assurance that the data are acceptable for their intended use. Within databases, Data Validation refers to procedures built into databases to define and check acceptable input for fields, and to accept or reject the data.
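The last sense, checking field input against defined rules and accepting or rejecting it, might be sketched as follows. The field names and rules in `RULES` are assumptions made up for the example.

```python
import re
from datetime import datetime

def _is_date(text: str) -> bool:
    """True if text is a real calendar date in YYYY-MM-DD form."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        return False

# Hypothetical per-field rules; each returns True when the value is acceptable.
RULES = {
    "zip": lambda v: re.fullmatch(r"[0-9]{5}(-[0-9]{4})?", v) is not None,
    "signup_date": _is_date,
    "age": lambda v: re.fullmatch(r"[0-9]{1,3}", v) is not None and int(v) <= 120,
}

def validate(record):
    """Return the names of fields whose values fail their rule (empty list = accept)."""
    return [field for field, rule in RULES.items() if not rule(record.get(field, ""))]

print(validate({"zip": "02139", "signup_date": "2008-02-29", "age": "41"}))  # []
print(validate({"zip": "2139", "signup_date": "2008-02-30", "age": "41"}))
# ['zip', 'signup_date']
```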
Data Validity
see Validity
Data Vaulting
An approach to data storage and backup, sometimes called RSS (Remote Storage Service), where data is transferred over the Internet to a remote and secure storage location.
Data Verification
Evaluation of data to determine if data obtained from environmental operations are of the right type, quality, and quantity to support their intended use.
Data Visualization
Techniques (often employed in Business Intelligence tools) for turning data into information by using the high capacity of the human brain to visually recognize patterns and trends.
Data Warehouse
A physically separate store of data transformed from the operational environment. The warehouse collects data from transaction systems and operational data stores, then combines that data in an aggregate, summary form suitable for enterprise-wide data analysis and reporting for predefined business needs. Operational update of data does not occur in the data warehouse environment. Gartner says that the five components of a data warehouse are production data sources, data extraction and conversion, the data warehouse database management system, data warehouse administration, and business intelligence (BI) tools.
Data Warehouse Architect
A person responsible for the modeling and design of data warehouse databases and the processes and systems that feed data into them.
Data Webhouse
A Data Warehouse containing web statistics.
Database Administrator
An IT professional responsible for developing physical data models and for maintaining physical, structured data assets.
Database Hardening
see Deduplication
Database Management System (DBMS)
A software system that facilitates the creation and maintenance of a database or databases, and the execution of computer programs using the database or databases. See also RDBMS
DBMS
see Database Management System
Decentralized Database
A centralized database that has been partitioned according to a business or end-user defined subject area. Typically ownership is also moved to the owners of the subject area. (DM Review definition)
Decision Rights
The system of determining who makes a decision, and when, and how, and under what circumstances. Formalizing Decision Rights is a key function of Data Governance.
Decision Support System
A term, no longer widely used, that originally described a computer system designed to collect, store, process, and provide access to information to support managerial decision making.
Dedup
See Deduplication
Deduplication
Finding the same (duplicate) entry in multiple files. Deduplication is used when merging two or more data sets. Deduplication is a useful tool when performing data mining tasks, where the data originated from different sources or different organizations. Synonyms are data matching and record linkage (a term used by statisticians, historians, and epidemiologists). Commercial mail and database applications refer to it as merge/purge processing or list washing. Other names used to describe the same concept include entity resolution, duplicate detection, record matching, instance identification, and database hardening.
De-identification
The process of removing any personally identifying fields from a record. In the case of medical information, HIPAA requires the de-identification of both obvious and remotely connected information. Removal of identifying information can occur through several means, including filtering and the techniques above.
Deliverable
Something that must be provided to meet a commitment in a Service Level Agreement or a contract. Deliverable is also used in a more informal way to mean a planned output of any process. (Baseline ITIL definition)
Deming Cycle
Synonym for Plan-Do-Check-Act. An approach to tasks that builds in quality.
Deployment
The introduction of a program, system, or application.
Derived Data
Data that is the result of a computational step applied to other data.
Descending
Sorting of the selected information from a higher to a lower level.
Development Environment
An environment used to create or modify IT services or applications. Development environments are not typically subjected to the same degree of control as test environments or live/production environments.
DFD
see Data Flow Diagram
DGO
see Data Governance Office
Digital Signature
A core function of a public key infrastructure (PKI). A digital signature can prove identity because it is created with the private key portion (which only the key holder should access) of a public/private key pair. Anyone with the sender's widely published public key can decrypt the signature and, by doing so, receive the assurance that the data must have come from the sender (nonrepudiation of the sender) and that the data has not changed (integrity). The data that is encrypted with the private key is not the entire message, but a short, fixed-length block of data that is computed from the message using a so-called "hash" function. (Gartner definition)
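The hash-then-sign flow in this definition can be illustrated with textbook RSA and a deliberately tiny key. This is a toy sketch only: the key (built from the primes 61 and 53) is trivially breakable, the hash is reduced modulo the key size, and no padding scheme is used, so nothing here should be taken as a usable signature implementation. The function names are hypothetical.

```python
import hashlib

# Toy RSA key pair from the primes 61 and 53. Illustration only, never secure.
N, E, D = 3233, 17, 2753  # modulus, public exponent, private exponent

def digest(message: bytes) -> int:
    """Hash the message, then reduce the hash so it fits under the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    """Transform the digest with the private exponent; only the key holder can do this."""
    return pow(digest(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Undo the signature with the public exponent and compare against a fresh digest."""
    return pow(signature, E, N) == digest(message)

msg = b"Pay 100 to Alice"
sig = sign(msg)
print(verify(msg, sig))            # True
print(verify(msg, (sig + 1) % N))  # False: an altered signature does not verify
```

Signing the short digest rather than the whole message is what keeps the signed block fixed-length, as the definition notes.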
Dimension
High level data grouping which serves as a mechanism for slicing the statistical measures of the warehouse, such as: geography, account, product and time.
Dirty Data
Inconsistent, missing, incomplete, or erroneous data.
Distributed Data Management
A form of client/server computing in which some portion of the application data executes on two or more computers.
Document Imaging
The conversion of paper documents into electronic documents. The process of imaging is also known as digitizing. As a discipline, this involves technology selection (scanners and similar devices), and strategies for the storage and management of electronic images.
Document Management
A function in which applications or middleware perform data management tasks tailored for typical unstructured documents (including compound documents). It may also be used to manage the flow of documents through their life cycles. (Gartner definition)
Document Type Definition (DTD)
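In XML, a set of declarations (often kept in a separate text file) that specifies which elements and attributes a document may contain and how they may be nested.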
Domain
A category of related data elements in a data model. Also known as a Subject Area. Architects talk about working in the "Customer Domain" or the "Product Domain."
DQ
Data Quality
DQA
Data Quality Auditing
DQM
Data Quality Management
DQMS
Data Quality Management System
DQO
Data Quality Objectives
Drill Across
The act of requesting data from two or more fact tables in a value chain in a single report.
Drill Anywhere
Drilling anywhere allows exploration of all data in any direction.
Drill Down
The act of adding a row header or replacing a row header in a report to break down the rows of the answer set in greater detail. Drilling downward allows going deeper into the data, for example going from the Quarter level to the Month level within Time.
Drill Up
Also known as rolling up data, drilling upward summarizes data, such as aggregating individual sales into total sales.
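Drill Down and Drill Up can be seen as the same aggregation run at different levels of a dimension. The sketch below assumes a tiny Time hierarchy (quarter, then month) with made-up figures; `totals_by` and `drill_down` are hypothetical helper names.

```python
from collections import defaultdict

# Hypothetical monthly sales: (quarter, month, amount).
sales = [("Q1", "Jan", 100), ("Q1", "Feb", 120), ("Q1", "Mar", 80), ("Q2", "Apr", 90)]

def totals_by(rows, level):
    """Aggregate amounts at the 'quarter' or 'month' level of the Time dimension."""
    col = {"quarter": 0, "month": 1}[level]
    out = defaultdict(int)
    for row in rows:
        out[row[col]] += row[2]
    return dict(out)

def drill_down(rows, quarter):
    """Break one quarter's total into its months."""
    return totals_by([r for r in rows if r[0] == quarter], "month")

print(totals_by(sales, "quarter"))  # drill up:   {'Q1': 300, 'Q2': 90}
print(drill_down(sales, "Q1"))      # drill down: {'Jan': 100, 'Feb': 120, 'Mar': 80}
```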
Driving Forces
Forces that tend to change a situation in desirable ways.
DSS
Decision Support System
Due Diligence
Performing an appropriate amount of research and analysis to uncover issues of stakeholders.
Duplicate Detection
see Deduplication
Copyright 2004-2008 The Data Governance Institute, LLC. All Rights Reserved.