Thursday, March 18, 2010

Data Management

  • Data Management
    • is the process of managing data as a valuable resource of an organization or business. DAMA (the Data Management Association), one of the largest organizations concerned with data management, describes it as the process of developing data architectures, practices, and procedures for handling data and then executing them on a regular basis.
  • A Database Management System (DBMS)
  • is a set of computer programs that controls the creation, maintenance, and use of the database of an organization and its end users, with a computer as the platform. It allows organizations to place control of organization-wide database development in the hands of database administrators (DBAs) and other specialists.

  • A DBMS
  • is a system software package that supports the use of an integrated collection of data records and files known as a database. It allows different user application programs to easily access the same database. DBMSs may use any of a variety of database models, such as the network model or the relational model. In large systems, a DBMS allows users and other software to store and retrieve data in a structured way. Instead of having to write computer programs to extract information, users can ask simple questions in a query language.
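The point about asking questions in a query language can be sketched with Python's built-in sqlite3 module. The employees table, its columns, and its rows are hypothetical, made up only for this illustration.

```python
import sqlite3

# Hypothetical schema and rows, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ana", "Sales", 52000.0),
        ("Ben", "Sales", 48000.0),
        ("Cho", "IT", 61000.0),
    ],
)

# "What is the average salary per department?" is one declarative query;
# the DBMS decides how to read and combine the stored records.
avg_by_dept = conn.execute(
    "SELECT department, AVG(salary) FROM employees "
    "GROUP BY department ORDER BY department"
).fetchall()

for department, avg_salary in avg_by_dept:
    print(department, avg_salary)

conn.close()
```

No record layout or file-reading loop appears in the program: the query names the data wanted, and the storage details stay inside the DBMS.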

Traditional File Processing
Traditional File-Based Approach
  • Collection of applications that each define and manage their own files
  • File: collection of records containing logically related data (Pascal’s files of records, C++’s “struct” or “class” declarations, COBOL’s Data Division)
  • Tight integration between program and data (files)
    – physical storage structure visible in application code (e.g., ISAM, VSAM)
    – run-time performance can (and must) be programmed

Limitations of File-Based Approach
  • Tendency to separate and isolate logically related data
    – subsequent data models capture more information: true DB software “knows about” some inter-entity or inter-file relationships
  • Separate files ⇒ redundancy in defining and storing data
    – wasted storage space (only a problem for large applications)
    – redundant efforts to enter replicated data and maintain its consistency (serious problem)
  • Program-data dependence: data definition in application program ⇒ program valid for only one DB with a fixed structure
  • Labor intensive
    – maintenance difficult (duplication, procedurality)
    – inter-file links must be coded in programs
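Program-data dependence can be sketched with Python's struct module. The record layouts below are hypothetical: a reader that hard-codes a 14-byte record works on the file it was written for, and fails as soon as another application adds a field to each record.

```python
import struct

# Hypothetical fixed-width record: 10-byte name, 4-byte unsigned integer id.
OLD_LAYOUT = struct.Struct("<10sI")
# The same file after another application adds a 2-byte department code.
NEW_LAYOUT = struct.Struct("<10sIH")


def read_names(data, layout):
    """Decode every record in `data` using the layout hard-coded by the caller."""
    names = []
    for (name, *_rest) in layout.iter_unpack(data):
        names.append(name.rstrip(b"\x00").decode("ascii"))
    return names


old_file = OLD_LAYOUT.pack(b"Ana", 1) + OLD_LAYOUT.pack(b"Ben", 2)
new_file = NEW_LAYOUT.pack(b"Ana", 1, 7) + NEW_LAYOUT.pack(b"Ben", 2, 9)

# Program and file agree on the record structure, so this works.
print(read_names(old_file, OLD_LAYOUT))

# The file structure changed but the program did not: the stale reader breaks.
try:
    read_names(new_file, OLD_LAYOUT)
    stale_reader_failed = False
except struct.error:
    stale_reader_failed = True

print("stale reader failed:", stale_reader_failed)
```

With a DBMS the data definition lives in the database rather than in each program, so adding a column does not by itself invalidate existing queries that never mention it.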


  • Data Mining
  • Data mining, the extraction of hidden predictive information from large databases, is a powerful technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by the retrospective tools typical of decision support systems. Data mining tools can answer business questions that traditionally were too time-consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.
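As a toy illustration of scouring data for patterns, the sketch below counts which pairs of items occur together across a set of transactions. The baskets and the frequent_pairs helper are hypothetical, and simple co-occurrence counting stands in here for the far more capable algorithms that real data mining tools use.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented for illustration.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]


def frequent_pairs(transactions, min_support):
    """Return item pairs that appear together in at least `min_support` baskets."""
    counts = Counter()
    for items in transactions:
        # Sort so that each pair is always counted under the same key.
        counts.update(combinations(sorted(items), 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}


patterns = frequent_pairs(baskets, min_support=3)
print(patterns)
```

Nobody told the program which pair to look for; the pattern comes out of the data, which is the sense in which mining finds information that lies outside an analyst's expectations.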

  • Data Warehouse
  • Data warehousing has quickly evolved into a unique and popular business application class. Early builders of data warehouses already consider their systems to be key components of their IT strategy and architecture. Numerous examples can be cited of highly successful data warehouses developed and deployed for businesses of all sizes and types. Hardware and software vendors have quickly developed products and services that specifically target the data warehousing market. This paper will introduce key concepts surrounding data warehousing systems.

  • What is a data warehouse? A simple answer could be that a data warehouse is managed data situated after and outside the operational systems. A complete definition requires discussion of many key attributes of a data warehouse system. Later in Section 2, we will identify these key attributes and discuss the definition they provide for a data warehouse. Section 3 briefly reviews the activity against a data warehouse system. Initially in Section 1, however, we will take a brief tour of the traditions of managing data after it passes through the operational systems and the types of analysis generated from this historical data.

Key developments in early years of data warehousing were:
  • 1960s: General Mills and Dartmouth College, in a joint research project, develop the terms dimensions and facts.
  • 1970s: ACNielsen and IRI provide dimensional data marts for retail sales.
  • 1983: Teradata introduces a database management system specifically designed for decision support.
  • 1988: Barry Devlin and Paul Murphy publish the article "An architecture for a business and information system" in IBM Systems Journal, where they introduce the term "business data warehouse".
  • 1990: Red Brick Systems introduces Red Brick Warehouse, a database management system specifically for data warehousing.
  • 1991: Prism Solutions introduces Prism Warehouse Manager, software for developing a data warehouse.
  • 1991: Bill Inmon publishes the book Building the Data Warehouse.
  • 1995: The Data Warehousing Institute, a for-profit organization that promotes data warehousing, is founded.
  • 1996: Ralph Kimball publishes the book The Data Warehouse Toolkit.
  • 1997: Oracle 8, with support for star queries, is released.
