A discipline, set of practices, and/or organizational group that deals with managing databases and their contents.
A study of information to establish trends or exceptions. The review of facts and application of statistical processes to describe, summarize, and identify data patterns and significance.
Data Analyst / Modeler
An IT professional responsible for capturing and modeling data definitions, business rules, data quality requirements, and logical and physical data models.
A data value that is different from what is normal or usual.
A combination of hardware, software, DBMSs, and storage that delivers high performance in both speed and storage capacity.
The recovery of information stored in outdated or obsolete computer formats or technologies.
A senior data analyst / modeler responsible for data integration and architecture.
A discipline, process, and program focusing on integrating sets of information. One of the four Enterprise Architectures (with Application Architecture, Business Architecture, and System Architecture). See also Data Modeling.
A review of sets of information according to regulatory or compliance requirements, employing various Data Quality techniques.
To enhance a set of information using additional information from internal and/or external data sources.
See Data Classification.
The categorization of data, following various schemas to support various business or technology goals.
See Data Cleansing.
Also referred to as data scrubbing. Data Cleansing is the process of detecting dirty data in a database (data that is incorrect, out-of-date, redundant, incomplete, or formatted incorrectly) and then removing and/or correcting the data. Data cleansing is often necessary to bring consistency to different sets of data that have been merged from separate databases. Cleansing data involves consolidating data within a database by removing inconsistent data, removing duplicates and reindexing existing data in order to achieve the most accurate and concise database. It can involve manual tasks or processes automated by special Data Quality tools. A particular type of Data Cleansing is Address Cleansing, in which street addresses are converted to a standard format as set forth by the U.S. Postal Service master database. For example, standard abbreviations are utilized, typos are corrected and ZIP codes are converted to 9-digit format. Address cleansing is usually done in conjunction with address matching, a process that validates an address against one of the 57 million addresses in the USPS database.
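The standardization step described above can be sketched in a few lines. This is a minimal illustration with made-up rules (it does not implement the actual USPS standard): it collapses whitespace, abbreviates common street suffixes, and pads 5-digit ZIP codes toward the 9-digit form with a placeholder extension.

```python
import re

# Illustrative cleansing rules (hypothetical, not the USPS standard)
SUFFIXES = {"street": "St", "avenue": "Ave", "boulevard": "Blvd", "road": "Rd"}

def cleanse_address(raw: str) -> str:
    addr = " ".join(raw.split()).strip()          # collapse runs of whitespace
    for word, abbrev in SUFFIXES.items():         # standardize suffixes
        addr = re.sub(rf"\b{word}\b", abbrev, addr, flags=re.IGNORECASE)
    # Pad bare 5-digit ZIPs toward ZIP+4 form with a placeholder extension
    addr = re.sub(r"\b(\d{5})\b(?!-)", r"\1-0000", addr)
    return addr

print(cleanse_address("123  Main   street  Springfield 12345"))
# → 123 Main St Springfield 12345-0000
```

Real Data Quality tools apply far richer rule sets and validate each address against the USPS master database rather than applying regex substitutions.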
The aggregation and summarization of data from heterogeneous (varied, diverse) sources.
The manipulation of information sets from one format or structure to another. Data Conversion is often required when acquiring sets of information from outside sources.
In a Data Warehouse context, a cube is a three- (or higher) dimensional array of values, commonly used to describe a time series of data in a common subject area. For example, information in a Sales cube might be viewed by different dimensions: over time, by products, by sales location, etc.
An overabundance of information, making it difficult to identify significant data.
A database about data and database structures. A catalog of all data elements, containing their names, structures, and information about their usage, for the benefit of programmers and others interested in the data elements and their usage.
See Data Mining.
The smallest piece of information considered meaningful and usable. A single logical data fact, the basic building block of a Logical Data Model.
The most elementary unit of data that can be identified and described in a dictionary or repository. A data element cannot be subdivided.
Data Element Domain
A category of data elements that have similar base meaning, such as "date."
Encryption is the process of transforming information into an unreadable "ciphertext" form. Only those with the proper key can decrypt and read the information. Encryption has long been used by militaries and governments to facilitate secret communication, but is now used to protect private commercial information, applications, hardware and software.
An activity that supplements and/or improves the existing data.
Data Flow Diagram (DFD)
A document depicting the flow of information between external entities, processes and data stores.
The process of locating and reviewing documents, files, and correspondence, including email. The identification or restoration of stored, deleted, and erased files (including email) from a computer, combined with certifying the authenticity of the files. Often conducted in the context of preparation for litigation or in analyzing potential wrongdoing.
In a Data Modeling context, ensuring that data fields conform to the proper data type and other constraints. See Data Type.
The organizational bodies, rules, decision rights, and accountabilities of people and information systems as they perform information-related processes. Data Governance determines how an organization makes decisions -- how we "decide how to decide." See also Decision Rights.
Data Governance Framework
A logical structure for organizing how we think about and communicate Data Governance concepts.
Data Governance Methodology
A logical structure providing step-by-step instructions for performing Data Governance processes.
Data Governance Office (DGO)
A centralized organizational entity responsible for facilitating and coordinating Data Governance and/or Stewardship efforts for an organization.
A characteristic of a database, describing how "clean" the data in it is. See Data Cleansing.
The process of connecting enterprise data, fragmented across disparate systems, to create an accurate and consistent view of core information.
Data Integration Architect
A senior Data Integration developer responsible for designing technology or strategies to connect data stores or to replicate, extract, transform, or load data records.
The accuracy, consistency, correctness, and soundness of a body of information.
The history of how a data field moves through IT systems and is transformed along the way.
A broader term that encompasses Data Administration and other efforts.
A synonym for data transformation.
The process of assigning a source data element to a target data element.
A repository of data gathered from operational data and other sources. The data may derive from an enterprise-wide database or data warehouse or from more specialized sources. The emphasis of a data mart is on meeting the expectations and needs of a particular group of users, so it may be designed to assist them in performing analysis and understanding the content.
Any process for replacing real data with fake data.
A way to compare data so that similar but slightly different records can be aligned. Matching may use "fuzzy logic" to find duplicates in the data. For example, Data Matching technologies and processes may recognize that 'Will', 'William', and 'Bill' may be the same individual.
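A minimal fuzzy-matching sketch follows, using the standard library's `difflib.SequenceMatcher` as a stand-in for a real Data Matching engine. The 0.8 threshold is an arbitrary illustration; production tools use tuned scoring models and name-variant dictionaries (which is how they also link 'Will' to 'Bill').

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Character-level similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def is_probable_match(a: str, b: str, threshold: float = 0.8) -> bool:
    # Threshold is illustrative; real matching engines tune this per field
    return similarity(a, b) >= threshold

print(is_probable_match("William Smith", "Will Smith"))  # → True
print(is_probable_match("William Smith", "Jane Doe"))    # → False
```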
The combination of sets of data into a consolidated set. Often part of a Merge/Purge process. See Deduplication.
The process of transferring data from one repository to another.
The analysis of data for relationships not previously discovered. Data Mining (DM) is also known as Knowledge Discovery. It is the process of automatically searching large volumes of data for patterns that may be used to predict future behavior.
A method of visualizing the informational needs of a system. A data model typically takes the form of an ERD (Entity Relationship Diagram). A Conceptual Data Model is completely devoid of database-level information, while a Logical Data Model stores generic characteristics (such as indexes and foreign keys) without adding anything specific to a single DBMS. Physical Data Models translate information from a Logical Data Model to designs that are specific to a certain DBMS.
Data Model Administrator
An IT professional responsible for data model version control and change control.
Data Model Notation
See Crow's Foot Notation.
The discipline, process, and organizational group that conducts analysis of data objects used in a business or other context, identifies the relationships among these data objects, and creates models that depict those relationships. See also Data Model.
The process of checking and controlling data integrity over time.
A role or group who is empowered to make decisions about how a data entity can be structured, manipulated, or used.
The assurance that a person's or organization's personal and private information is not inappropriately disclosed. Ensuring Data Privacy requires Access Management, eSecurity, and other data protection efforts.
The process of examining data in an existing database and collecting statistics and information about that data. The information collected may be used to collect metrics on data quality, assess whether metadata accurately describes the actual values in the source database, determine if existing data can be repurposed, or understand risks and challenges in using the data.
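The statistics a profiling pass collects can be illustrated with a small sketch. The rows and column names below are made up; real profiling tools compute many more metrics (patterns, frequency distributions, cross-column dependencies) directly against the source database.

```python
# Illustrative profiling over rows represented as dicts (hypothetical data)
rows = [
    {"id": 1, "state": "NY", "age": 34},
    {"id": 2, "state": "ny", "age": None},
    {"id": 3, "state": "CA", "age": 41},
]

def profile(rows, column):
    """Collect basic quality metrics for one column: nulls, cardinality, range."""
    values = [r.get(column) for r in rows]
    non_null = [v for v in values if v is not None]
    return {
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
    }

print(profile(rows, "age"))  # → {'nulls': 1, 'distinct': 2, 'min': 34, 'max': 41}
```

Even this simple pass surfaces quality risks: the null `age` and the inconsistent casing of `state` ("NY" vs "ny") would both show up in the metrics.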
The distribution of data from a source to one or more target data stores. More generically, this term refers to a method of moving data from one location (a source) to another location (a target).
The removal of data records. Purging is subject to Record Retention rules.
The practice of correcting, standardizing, and verifying data.
Data Quality Analyst
An IT professional responsible for determining the fitness of data for use.
The process of copying a portion of a database from one environment to another and keeping the subsequent copies of the data in sync with the original source.
The process of cleaning up data in a database that is incorrect, incomplete, or duplicated.
Data Security Administrator
A person responsible for working with tools and technologies that control access to data.
Those who use, affect, or are affected by data. Data Stakeholders may be upstream producers, gatherers, or acquirers of information; downstream consumers of information; those who manage, transform, or store data; or those who set policies, standards, architectures, or other requirements or constraints.
The transformation of data into consistent formats.
A person with data-related responsibilities as set by a Data Governance or Data Stewardship program. Often, Data Stewards fall into multiple types: Data Quality Stewards, Data Definition Stewards, Data Usage Stewards, etc.
The holding of data in a database or other data structure.
A data store is a general term for a place to put information. Databases, files, and spreadsheets are all examples of data stores.
Rules that describe which source data element overrides the others in cases of duplicates.
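Such survivorship rules can be sketched as a simple source-ranking merge. The source names and ranking below are invented for illustration; real implementations rank per field and may also weigh recency and completeness.

```python
# Illustrative trust ranking of sources (lower rank = more trusted)
SOURCE_RANK = {"crm": 1, "billing": 2, "web_form": 3}

def survive(duplicates):
    """Merge duplicate records field by field, letting trusted sources win."""
    merged = {}
    # Apply least-trusted records first so more-trusted values overwrite them
    for rec in sorted(duplicates, key=lambda r: SOURCE_RANK[r["source"]],
                      reverse=True):
        for field, value in rec.items():
            if field != "source" and value is not None:
                merged[field] = value
    return merged

dupes = [
    {"source": "web_form", "phone": "555-0100", "email": "a@old.example"},
    {"source": "crm", "phone": None, "email": "a@new.example"},
]
print(survive(dupes))  # → {'phone': '555-0100', 'email': 'a@new.example'}
```

Note how the trusted source's null `phone` does not override the real value from the less trusted source: survivorship prefers trusted data, not empty fields.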
The discipline, process, or technologies used to allow applications to update data on two systems so that the data sets are identical.
The process of redefining data based on some predefined rules, specific formulas, or techniques.
The kind of data that a data item represents. Examples are:
• Date - Usually Gregorian calendar dates.
• Time - Usually time on the military 24 hour clock.
• Money - Currency.
• Boolean - Logical values (True or False).
• Characters - Alphanumeric text.
• Number - Number with decimal precision and complete decimal accuracy.
• Integer - Number without any decimal precision.
• Short Integer - Same as Integer, but smaller values.
• Long Integer - Same as Integer, but larger values.
• Byte - Usually a small number (less than 256).
• Float - Number with partial decimal accuracy (number times 10 to a power).
• Bitmap - Usually an image stored as a straight bitmap representation.
As a broad concept, Data Validation refers to the confirmation of the reliability of data through a checking process. As a set of processes, Data Validation refers to a systematic review of a data set to identify outliers or suspect values. More specifically, data validation refers to the systematic process of independently reviewing a body of analytical data against established criteria to provide assurance that the data are acceptable for their intended use. Within databases, Data Validation refers to procedures built into databases to define and check acceptable input for fields, and to accept or reject the data.
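The field-level form of validation described above can be sketched as a set of per-field rules that accept or reject input. The field names and rules here are illustrative assumptions, not a standard.

```python
import re
from datetime import date

# Each rule returns True when the input is acceptable (rules are illustrative)
RULES = {
    "zip":   lambda v: bool(re.fullmatch(r"\d{5}(-\d{4})?", v)),
    "email": lambda v: bool(re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v)),
    "dob":   lambda v: date.fromisoformat(v) <= date.today(),
}

def validate(record):
    """Return the names of fields that fail their validation rule."""
    failures = []
    for field, rule in RULES.items():
        try:
            if field in record and not rule(record[field]):
                failures.append(field)
        except (ValueError, TypeError):
            failures.append(field)   # unparsable input counts as a failure
    return failures

print(validate({"zip": "12345-6789", "email": "a@b.com", "dob": "1990-01-01"}))
# → []
print(validate({"zip": "1234", "email": "not-an-email"}))
# → ['zip', 'email']
```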
An approach to data storage and backup, sometimes called RSS (Remote Storage Service), where data is transferred over the Internet to a remote and secure storage location.
Evaluation of data to determine if data obtained from environmental operations are of the right type, quality, and quantity to support their intended use.
Techniques (often employed in Business Intelligence tools) for turning data into information by using the high capacity of the human brain to visually recognize patterns and trends.
A physically separate store of data transformed from the operational environment. The warehouse collects data from transaction systems and operational data stores, then combines that data in an aggregate, summary form suitable for enterprise-wide data analysis and reporting for predefined business needs. Operational updates of data do not occur in the data warehouse environment. Gartner says that the five components of a data warehouse are production data sources, data extraction and conversion, the data warehouse database management system, data warehouse administration and business intelligence (BI) tools.
Data Warehouse Architect
A person responsible for the modeling and design of data warehouse databases and the processes and systems that feed data into them.
A Data Warehouse containing web statistics.
An IT professional responsible for developing physical data models and for maintaining physical, structured data assets.
Database Management System (DBMS)
A software system that facilitates the creation and maintenance of a database or databases, and the execution of computer programs using the database or databases. See also RDBMS.
See Database Management System.
A centralized database that has been partitioned according to a business or end-user defined subject area. Typically ownership is also moved to the owners of the subject area. (DM Review definition)
The system of determining who makes a decision, and when, and how, and under what circumstances. Formalizing Decision Rights is a key function of Data Governance.
Decision Support System
A term no longer widely used, which originally described a computer system designed to collect, store, process, and provide access to information to support managerial decision making.
Finding the same (duplicate) entry in multiple files. Deduplication is used when merging two or more data sets. Deduplication is a useful tool when performing data mining tasks, where the data originated from different sources or different organizations. Synonyms are data matching and record linkage (a term used by statisticians, historians, and epidemiologists). Commercial mail and database applications refer to it as merge/purge processing or list washing. Other names used to describe the same concept include entity resolution, duplicate detection, record matching, instance identification, and database hardening.
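An exact-key deduplication pass, the simplest form of the technique described above, can be sketched as follows. The key fields and sample data are illustrative; fuzzy matching (see Data Matching) is needed when duplicates differ by more than casing or whitespace.

```python
def dedupe(records):
    """Keep the first record for each normalized (name, email) key."""
    seen, unique = set(), []
    for rec in records:
        key = (rec["name"].strip().lower(), rec["email"].strip().lower())
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

people = [
    {"name": "Ann Lee", "email": "ann@example.com"},
    {"name": "ANN LEE ", "email": "Ann@Example.com"},   # duplicate of the first
    {"name": "Bo Chen", "email": "bo@example.com"},
]
print(len(dedupe(people)))  # → 2
```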
The process of removing any personally identifying fields from a record. In the case of medical information, HIPAA requires the deidentification of both obvious and remotely connected information. Removal of identifying information can occur through several means, including filtering and the techniques above.
Something that must be provided to meet a commitment in a Service Level Agreement or a contract. Deliverable is also used in a more informal way to mean a planned output of any process. (Baseline ITIL definition)
Synonym for Plan-Do-Check-Act. An approach to tasks that builds in quality.
The introduction of a program, system, or application.
Data that is the result of a computational step applied to other data.
Sorting of the selected information from a higher to a lower level.
An environment used to create or modify IT services or applications. Development environments are not typically subjected to the same degree of control as test environments or live/production environments.
See Data Flow Diagram.
See Data Governance Office.
A core function of a public key infrastructure (PKI). A digital signature can prove identity because it is created with the private key portion (which only the key holder should access) of a public/private key pair. Anyone with the sender's widely published public key can decrypt the signature and, by doing so, receive the assurance that the data must have come from the sender (nonrepudiation of the sender) and that the data has not changed (integrity). The data that is encrypted with the private key is not the entire message, but a short, fixed-length block of data that is computed from the message using a so-called "hash" function. (Gartner definition)
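The hash step of the definition above can be illustrated with the standard library. This shows only the digest computation; key generation and the private-key encryption of that digest, which produce the actual signature, are omitted here.

```python
import hashlib

# The signature covers a short, fixed-length digest of the message,
# not the full message itself.
message = b"Wire $100 to account 42"   # illustrative message
digest = hashlib.sha256(message).hexdigest()

# SHA-256 digests are always 64 hex characters, regardless of message length
print(len(digest))  # → 64
```

Because any change to the message produces a different digest, a verifier who decrypts the signature and recomputes the hash can detect tampering.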
High level data grouping which serves as a mechanism for slicing the statistical measures of the warehouse, such as: geography, account, product and time.
Inconsistent, missing, incomplete, or erroneous data.
Distributed Data Management
A form of client/server computing in which some portion of the application's data management executes on two or more computers.
The conversion of paper documents into electronic documents. The process of imaging is also known as digitizing. As a discipline, this involves technology selection (scanners and similar device), and strategies for the storage and management of electronic images.
A function in which applications or middleware perform data management tasks tailored for typical unstructured documents (including compound documents). It may also be used to manage the flow of documents through their life cycles. (Gartner definition)
Document Type Definition (DTD)
In XML, a text file that specifies the meaning of each tag.
A category of related data elements in a data model. Also known as a Subject Area. Architects talk about working in the "Customer Domain" or the "Product Domain."
Data Quality Auditing
Data Quality Management
Data Quality Management System
Data Quality Objectives
The act of requesting data from two or more fact tables in a value chain in a single report.
Drilling anywhere allows exploration of all data in any direction.
The act of adding a row header or replacing a row header in a report to break down the rows of the answer set in greater detail.
Drilling downward allows going deeper into the data. Examples of this would be to go to the Month level from the Quarter level within Time.
Also known as rolling up data, drilling upward summarizes data, such as aggregating individual sales to a total sales.
Forces that tend to change a situation in desirable ways.
See Decision Support System.
Performing an appropriate amount of research and analysis to uncover issues of stakeholders.
Copyright 2004-2008 The Data Governance Institute, LLC. All Rights Reserved. The site is brought to you in partnership with the Business Intelligence Network.