Data quality generally means the fitness of data for its intended use. It concerns how well data meets business rules and supports an organization's tasks. Data integration is a natural partner of data quality, because achieving good data quality usually involves integrating data. Data integration generally means combining data that resides in different sources so that it works as a single, unified whole. This article emphasizes the importance of data quality and data integration for organizations. It also briefly discusses data ownership, which can help organizations understand the categories of people who regularly use data. Researchers and experts around the world have proposed various methods for data quality and integration; this article discusses only some of them.
Keywords: data quality, data integration, data ownership, data quality methodology, data integration methodology, data governance, ISO 8000.
Everyone has a different view of data quality. To an account manager, data quality means accurate calculation of customer activity. To the medical industry, it means the ability to link patient records or medical information. Each definition focuses on an individual's view of what is and is not good for their organization. In general, we define data quality in terms of fitness for use, meaning that the data meets user expectations. In practice, this means identifying a set of data quality objectives for a given data set and then measuring that data set against those objectives. Most organizations today focus on targeting potential users or customers. Unfortunately, many are unaware of, or pay too little attention to, the importance of data quality and integration. Poor data quality leads to inefficient business processing and operations.
Definition of Data Quality and Data Integration
According to Data for Development, Inc (2002), 'data quality' is a metric by which the value of data can be measured. However, data quality is also an actionable philosophy, in that it can be controlled (the value can be increased or decreased).
Data quality consists of both measurement and process; hence, manipulating the quality level of the data can produce a measurable change in the value of the program in which the data is used. Organizations should therefore put a program in place to ensure that data quality is maintained and the data stays reliable.
According to SearchDataManagement.com (2008), data quality can be defined as the reliability and effectiveness of data, particularly in a data warehouse. Maintaining data quality requires continuously reviewing and checking the data, which involves updating, standardizing, removing duplicates, and so on. Data quality is critical from a business perspective; hence most large organizations hire data management experts to take charge of it.
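The maintenance tasks mentioned above (standardizing and removing duplicates) can be illustrated with a minimal Python sketch. The record layout, the field names, and the choice of the email address as the matching key are assumptions made for illustration only; they do not come from any of the cited sources.

```python
def standardize(record):
    """Return a copy with whitespace-trimmed, case-normalized fields."""
    return {
        "name": " ".join(record["name"].split()).title(),
        "email": record["email"].strip().lower(),
    }


def deduplicate(records):
    """Keep the first record seen for each standardized email address."""
    seen = set()
    unique = []
    for record in map(standardize, records):
        if record["email"] not in seen:
            seen.add(record["email"])
            unique.append(record)
    return unique


# Hypothetical customer records with inconsistent formatting.
raw_records = [
    {"name": "  alice   SMITH ", "email": "Alice@Example.com "},
    {"name": "Alice Smith", "email": "alice@example.com"},
    {"name": "bob jones", "email": "bob@example.com"},
]
cleaned_records = deduplicate(raw_records)
```

Standardizing before comparing matters here: the first two records only turn out to be duplicates once case and whitespace differences have been removed.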
According to Wikipedia.org, data can be considered high quality if it meets the requirements of operations, planning, and decision making. In addition, data is considered high quality if it correctly represents the real-world construct of the organization's business rules.
According to the International Association for Information and Data Quality, the term 'data quality' refers to data that meets the expectations of information professionals and customers, including all characteristics of the information products and services needed to accomplish the organization's mission. Furthermore, information quality is the state in which information meets the requirements of the information professionals who need it to perform their jobs. In brief, data quality is about data that is fit for use and fulfills the requirements of its authors, administrators, and users.
2.2 Data Integration
According to Wikipedia, data integration can be defined as combining data that resides in separate sources and giving users a unified view of that data. Data integration is important, especially in the scientific field (to combine research results) and the commercial field (to merge databases). It is needed when the volume of data is growing continuously and many parties need to share that data.
According to SearchCRM.com, integration is about joining separate parts so that they can work together. Integration has a number of common uses, including integration during product development (components that have been produced separately are combined) and integration performed by organizations to bring products from different manufacturers together into a unified working system.
According to the University Of Northern British Columbia GIS & Remote Sensing Lab Glossary, data integration can be defined as combining the data files or databases of separate functional units of an organization, or of separate organizations, that collect information on similar entities.
On the other hand, the Advertising Research Foundation (2003) defines data integration as a formal process of combining data from different data sources and using it accurately within the database to obtain certain values that are not obtainable from a single data source.
Ownership of Data
According to Loshin (2001), there are 11 types of parties (organizations or individuals) that can be considered, or could claim to be, the owner of the data. They are as follows:
The creator, that is, the organization or individual that produces or generates the data, can be considered the owner of the data. The creator makes a speculative investment in creating the data, expecting to derive value from it in the future. For example, weather forecasting institutions analyze and predict the weather. The collection of all the contributions they use produces a useful data set. Because the institution creates the weather data, it can claim ownership of it.
The consumer, that is, the organization or individual that uses the data, can also be considered the owner of the data. The consumer derives value from the data. For example, a sales organization may use information provided by different organizations. That information becomes important to the proper operation of the team, so the sales team will claim ownership of the data it uses.
The compiler selects information from different sources and compiles it. By combining data sets, the compiler adds value and may expect to gain the advantage of ownership. For example, a company may provide a service that collects newspaper articles on particular topics. This service creates a body of information that is more valuable than the scattered items.
Data that has been brought into the organization or created within it can be considered to be owned outright by the organization (the enterprise). The organization uses all input and produced data for its ongoing data processing needs, and the value derived from the data belongs to the organization as a whole. For example, a bank collects information from external data vendors as well as data from within the industry. These data are combined in a single operational data center and then distributed, with some added value, to consumers and individuals outside the enterprise.
Funding involves two parties: the organization that funds the creation of the data and the organization that actually creates it. For example, a ministry or a large organization may provide funds for institutions in the R&D field to carry out their research.
The decoder is the organization or individual that translates data held in a particular format that is difficult to read or access. The decoder can be considered the owner of the data. The cost of the decoding process and its execution is an investment in the value to be obtained from the data. For example, an organization may translate information from videotapes, which are no longer in common use, into a readable format.
The packager is the individual or organization that prepares the information for a particular use. Value is added through the packaging process, whereby the collected information becomes useful. For example, an author who publishes books acts as a packager.
The value of any data that can be read is realized by the reader. The reader gains value by adding that information to his or her own information repository. The important part is the reader's process of selecting and using the data. For example, consulting firms provide expert practices in certain areas. To become an expert, an individual must acquire knowledge of the practice area by reading as much information about it as possible.
Ownership by the subject involves the personal privacy or image copyright of individuals. The individuals who are the subjects of the data can claim ownership of it. For example, data may describe patients and the medicine they have taken. Individual patients may claim ownership of their personal information and argue that the hospital has no right to sell it.
The individual or organization that licenses or purchases the data may be considered the owner of the data. The purchaser believes that the investment made to obtain the data entitles them to ownership.
Some people think that data should be available to everyone without any restriction. This view of ownership applies to a certain extent in scientific communities, where the goal is to extend knowledge of a particular area.
Criteria for Good Data Quality and Good Data Integration
Researchers and authors around the world have conducted studies and suggested criteria or dimensions for good data quality and data integration. Capello et al. (2004) proposed an architecture for data quality assessment. The architecture is composed of three modules: Selector, Quality Assessment, and Profiling. The proposed architecture is shown in the figure below.
Figure 1: Data Quality Architecture
(Source: Capello et al. (2004))
The Selector module determines which data is accessible to the user. The Service conveys the user's request to the Selector module in order to select and retrieve the required data. Data is kept in the Data Repository, and the Quality Repository contains the quality metadata associated with that data. The data retrieved by the Selector module is sent to the Quality Assessment module, which determines the appropriate evaluation for the data before it is sent to the user. The Profiling module contains the characteristics of the users who access the system. If this architecture is implemented, only quality data will be provided to users, and the data will also be protected from unauthorized users.
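The flow of a request through the three modules can be sketched in a few lines of Python. This is only an illustration of the description above, not the implementation proposed by the cited authors: the repository contents, the user profiles, and the single `accuracy` score with a per-user threshold are all hypothetical.

```python
# Hypothetical repositories; the names mirror the components described above.
DATA_REPOSITORY = {"sales_2009": [120, 135, 150], "payroll": [4000, 4200]}
QUALITY_REPOSITORY = {
    "sales_2009": {"accuracy": 0.95},
    "payroll": {"accuracy": 0.60},
}
# Profiling: which data sets each user may see and the quality they require.
PROFILES = {
    "analyst": {"allowed": {"sales_2009"}, "min_accuracy": 0.9},
    "hr": {"allowed": {"payroll"}, "min_accuracy": 0.8},
}


def selector(user, dataset):
    """Return the data only if the user's profile permits access to it."""
    profile = PROFILES.get(user)
    if profile is None or dataset not in profile["allowed"]:
        return None
    return DATA_REPOSITORY.get(dataset)


def quality_assessment(user, dataset):
    """Compare stored quality metadata with the user's required level."""
    score = QUALITY_REPOSITORY[dataset]["accuracy"]
    return score >= PROFILES[user]["min_accuracy"]


def service(user, dataset):
    """Convey a request: select the data, then assess it before delivery."""
    data = selector(user, dataset)
    if data is None:
        return None  # unknown user, unauthorized user, or unknown data set
    return data if quality_assessment(user, dataset) else None
```

In this sketch a request can fail for two separate reasons: the Selector refuses access, or the Quality Assessment step finds that the data does not reach the level the user's profile requires.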
Furthermore, Pipino et al. (2002) suggested a set of 16 dimensions that data should satisfy in order to be considered of good quality and ready for integration. The dimensions are summarized in the table below.
Table 1: Dimensions of Data Quality
Accessibility: Data is accessible, available, or can be retrieved easily.
Appropriate amount of data: The amount of data is appropriate and suitable for the task at hand.
Believability: Data is considered correct and trustworthy.
Completeness: Data is complete and sufficient in breadth and depth for the task.
Concise representation: Data is compactly represented.
Consistent representation: Data is represented in the same format.
Ease of manipulation: Data is easy to manipulate, apply, and integrate for different tasks.
Free-of-error: Data is reliable and correct.
Interpretability: Data is in suitable languages, units, and symbols, and its definitions are clear.
Objectivity: Data is unbiased.
Relevancy: Data is applicable and useful for the task.
Reputation: Data is highly regarded in terms of its content and sources.
Security: Access to data is restricted appropriately to protect the data.
Timeliness: Data is sufficiently up to date for the task.
Understandability: Data is easy to understand.
Value-added: Data provides benefit and advantage.
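Some of these dimensions can be turned into simple numeric scores. The sketch below, in Python, scores completeness and timeliness as the ratio of acceptable values to total values. The ratio form, the field names, the sample records, and the 90-day freshness threshold are assumptions chosen for illustration; an organization would define its own metrics and thresholds.

```python
from datetime import date


def completeness(records, required_fields):
    """Share of required field values that are present (not None or empty)."""
    total = len(records) * len(required_fields)
    if total == 0:
        return 1.0
    filled = sum(
        1
        for record in records
        for field in required_fields
        if record.get(field) not in (None, "")
    )
    return filled / total


def timeliness(records, today, max_age_days):
    """Share of records updated within max_age_days of the given date."""
    if not records:
        return 1.0
    fresh = sum(
        1 for record in records
        if (today - record["updated"]).days <= max_age_days
    )
    return fresh / len(records)


# Hypothetical patient records with some missing and stale values.
patients = [
    {"id": 1, "name": "Ann", "blood_type": "A", "updated": date(2010, 3, 1)},
    {"id": 2, "name": "", "blood_type": "O", "updated": date(2009, 1, 15)},
    {"id": 3, "name": "Raj", "blood_type": None, "updated": date(2010, 2, 20)},
    {"id": 4, "name": "Lee", "blood_type": "B", "updated": date(2010, 3, 10)},
]
completeness_score = completeness(patients, ["name", "blood_type"])
timeliness_score = timeliness(patients, date(2010, 3, 31), 90)
```

Scores like these give a concrete way to measure a data set against stated quality objectives and to track whether quality is improving over time.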
Importance of Data Quality and Integration
Many authors have written about and identified the importance of data quality and integration for organizations. Their ideas are similar in describing how data quality and integration can benefit the organization.
Importance of Data Quality
According to Hall (2005), data quality is important because poor data quality can lead to various problems, such as misdirected marketing promotions and incorrect information being sent to users (wrong spelling of names, titles, companies, etc.). Customers will see the organization as less credible and will hesitate to use its products or services. Today the focus is on customer relationship management (CRM), which means that data keeps being distributed here and there, inside and outside the organization. Data quality and integration are therefore important for the use of web-based applications, customer information systems, and so on.
Similarly, Longbottom (2007) pointed out that poor data quality leads to misspelled names and wrong assumptions by the organization about a customer's gender. In addition, there may be multiple records for the same person. This can damage the company's good image. The worst case is when customers are wrongly charged for services they did not use. Longbottom (2007) therefore recommended that organizations adopt a data cleansing approach by engaging companies that provide such services. For example, a company called Datanomic in the United Kingdom provides an on-site approach to full data cleansing across multiple data sources.
Furthermore, Loshin (2001) described several reasons why data quality is important for organizations. They are as follows:
Good data quality leads to operational efficiency
Information processing has a manufacturing chain similar to that of other products. Information constantly moves in and out of processing stages, and during this process a number of operations are performed on it. If a product does not meet a certain standard, it has to be rejected or fixed. The same applies to information: when information is incorrect, the record needs to be fixed or deleted, which delays processing. Good data quality allows information to be produced on schedule.
Good data quality is critical to support decision making
Information is important for decision making. If the information is of poor quality, managers may rely on conclusions drawn from invalid assumptions.
Good data enhances data warehouse utility
Data warehouses are used for analytical processing. Anything that improves the ability to analyze data increases the value of the data warehouse. If the data quality in the warehouse is poor, a long time is spent tracing and removing errors.
Good data leads to customer loyalty, bad data leads to customer attrition
As other authors have mentioned, there are cases in which customer information is recorded wrongly in the customer record, billing, etc. These cases usually end with the customer ceasing to use the services. The problem is worse if the organization has no capability to detect errors. On the other hand, if a potential customer is presented with high-quality information, the image and credibility of the organization are enhanced. This improves the chance of turning a potential customer into a loyal customer.
Poor data quality leads to breakdown in organizational confidence
If an organization shows a lack of ability to handle information management issues, it raises the question of whether the organization can handle critical processes. If customers lose their trust in the organization, customer attrition follows. If employees do not trust the company's ability to do business, they might leave the company.
Good data enables system and data migration projects
Good data enables the company to obtain the right information about the data and systems being migrated. Bad data can hinder this, often because implementers tend to program first and document later. Over time, systems are modified or fixed without any update to the documentation. This makes work difficult for integrators, who must become 'information archaeologists' and dig out what is wrong with the system.
Good data increases ROI (Return On Investment) on IT investment
If an organization implements the right processes, it can limit downtime and process failures. The organization can process data at greater throughput because it does not have to analyze and fix data quality problems. This can increase processing volume without an increase in resources, and therefore increases the return on information technology investment.
Importance of Data Integration
According to Marco (2004), data integration can reduce IT redundancy, which means that duplication of data can be avoided.
In addition, data integration can prevent the failure of IT applications. When a corporation decides to implement IT initiatives (such as a data warehouse, enterprise resource planning (ERP), customer relationship management (CRM), etc.), the failure rate is between 65% and 80% (Marco, 2004). When Marco examined the causes of failure, he found several. First, the projects did not meet a definable and measurable business need. Second, the projects had difficulty recognizing the organization's existing IT environment and its business rules (data elements, data flows, vendor applications, custom applications, etc.).
According to Claiborne et al. (2003), data integration is important for several reasons. First, data integration enables organizations to analyze current values and trends. In an operational environment, organizations use query and reporting tools together with production reports to determine current status. These generally summarize the data and keep historical values only for a limited time. Although an operational system provides the most current values, these values may not be suitable for tracking and analyzing how something has changed over time. Data integration supports this work because it brings together data that resides in different systems.
Second, data integration enables organizations to treat data as a corporate asset. Data is an asset that can grow and reproduce without limit. Data integration helps to create a single version of reliable data so that companies can treat data as the enormous asset it is. To do this effectively, the lineage of the data, that is, its source and/or derivation, must be readily available.
Furthermore, data integration provides a framework that helps organizations to obtain a complete view of a customer, to relieve the processing burden on operational systems, to standardize business processes and data definitions, and to combine current and past values from different sources in order to see the big picture (Claiborne et al., 2003).
Methodologies for Data Quality and Data Integration
Methodology for Data Quality
A wide range of methodologies and techniques has been proposed by authors and researchers around the world. Batini et al. (2009) summarized the methodologies for data quality in the table below.
Table 2: Data Quality Methodologies
Total Data Quality Management
Datawarehouse Quality Methodology (Jeusfeld et al. 1998)
Total Information Quality Management
A methodology for information quality assessment (Lee et al. 2002)
Canadian Institute for Health Information methodology (Long and Seko 2005)
Data Quality Assessment (Pipino et al. 2002)
Information Quality Measurement (Eppler and Münzenmaier 2002)
ISTAT methodology (Falorsi et al. 2003)
Activity-based Measuring and Evaluating of product information Quality (AMEQ) methodology (Su and Jin 2004)
Loshin Methodology (Cost-effect Of Low Data Quality)
Data Quality in Cooperative Information Systems (Scannapieco et al. 2004)
Methodology for the Quality Assessment of Financial Data (De Amicis and Batini 2004)
Comprehensive methodology for Data Quality management (Batini and Scannapieco)
Table 3: Methodologies and Types of Strategies
Data-driven and process-driven
Data and schema integration
Error localization and correction
Cost optimization; process control
Data and schema integration
Error localization and correction
These methodologies can be classified into three categories: audit methodologies, operational methodologies, and economic methodologies (Batini et al., 2009). Audit methodologies focus on the assessment phase and provide limited support for the improvement phase. Operational methodologies focus on the technical issues of both the assessment and improvement phases but do not address economic issues. Economic methodologies focus on the evaluation of costs (Batini et al., 2009). Batini et al. describe the classification as shown in the figure below.
Figure 2: Classification of Methodologies
(Source: Batini et al., 2009)
Methodology for Data Integration
Kamel & Zviran (1991) suggested a five-step methodology for data integration. The steps are as follows:
Integration policy formulation
This involves deciding the integrated global view for each data site, as well as determining the subschema that each data site will use and share with the other sites. This decision involves the organization's high-level management.
Second, the schema of each data site is translated into an equivalent schema using a common data model. The output schema is known as the common-model local schema. Many types of model can be used, depending on requirements and preferences, such as the attribute-based data model (ABDM), the Entity-Relationship model, etc.
Third, each schema is evaluated and analyzed to identify possible conflicts, including name conflicts, structural conflicts, scale conflicts, and conflicts in application semantics.
Fourth, once conflicts have been identified, efforts are made to resolve them. User feedback is important at this stage to clarify the semantics or relationships of each schema.
Global schema merging
This involves merging the schemas into a single global schema. The result is evaluated and restructured so that it provides quality data for users.
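The translation, conflict identification, conflict resolution, and merging steps can be sketched with a toy example in Python. This is a simplified illustration rather than the authors' procedure: each schema is reduced to a mapping from attribute name to unit, the synonym table stands in for the common data model, only name and scale conflicts are handled, and all attribute names and units are hypothetical.

```python
# Local schemas from two hypothetical sites: attribute name -> unit (or None).
site_a = {"cust_id": None, "weight": "kg", "name": None}
site_b = {"customer_id": None, "weight": "lb", "phone": None}

# Translation: map local attribute names onto a common vocabulary.
# The synonym table is an assumption; it also resolves name conflicts.
SYNONYMS = {"cust_id": "customer_id"}


def translate(schema):
    """Return the schema expressed with common attribute names."""
    return {SYNONYMS.get(attr, attr): unit for attr, unit in schema.items()}


def scale_conflicts(first, second):
    """List shared attributes whose units differ between the two schemas."""
    return sorted(
        attr for attr in first.keys() & second.keys()
        if first[attr] != second[attr]
    )


def resolve(schema, agreed_units):
    """Apply the unit agreed for each conflicting attribute."""
    return {attr: agreed_units.get(attr, unit) for attr, unit in schema.items()}


def merge(first, second):
    """Combine two conflict-free schemas into one global schema."""
    return {**first, **second}


common_a, common_b = translate(site_a), translate(site_b)
conflicts = scale_conflicts(common_a, common_b)
agreed = {"weight": "kg"}  # in practice decided with user feedback
global_schema = merge(resolve(common_a, agreed), resolve(common_b, agreed))
```

The `agreed` mapping is where user feedback would enter: the code can detect that two sites disagree about a unit, but people have to decide which unit the global schema should use.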
Practices of Data Quality and Data Integration
According to Loshin (2001), there are a number of areas in which data quality and integration can be implemented within database management. They are as follows:
Data Quality and Operation
Business rules can be treated as guidelines for data quality rules, because data quality is an essential part of any operational specification. An organization can streamline its operations by implementing data quality and integration methodologies, so that good data from different sources can work together to support business operations. This prevents bad data from slowing the business flow and prevents incorrect information from entering the system.
Data Quality and Databases
Databases are designed with safeguards for data quality, such as null testing, data standardization, referential integrity, etc. In addition, rule-based data quality can be used to guide the entry of information into the database.
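The safeguards described above can be demonstrated with Python's built-in `sqlite3` module. The sketch below assumes a hypothetical two-table schema (`customer` and `invoice`) and uses a `NOT NULL` constraint for null testing, a foreign key for referential integrity, and a `CHECK` constraint as a simple example of a rule that guards data entry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE invoice ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(id),"
    " amount REAL NOT NULL CHECK (amount >= 0))"
)
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ann')")
conn.commit()


def try_insert(sql, params):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


invoice_sql = "INSERT INTO invoice (customer_id, amount) VALUES (?, ?)"
results = {
    "valid": try_insert(invoice_sql, (1, 10.0)),
    "null_name": try_insert("INSERT INTO customer (id, name) VALUES (?, ?)", (2, None)),
    "orphan": try_insert(invoice_sql, (99, 5.0)),
    "negative": try_insert(invoice_sql, (1, -3.0)),
}
```

Only the first insert is accepted. The other three are rejected by the database itself, so the bad rows never reach the tables regardless of which application tried to write them.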
Data Quality and Data Warehouse
Data warehouses and data marts are used for analytical environments. Flawed data produces flawed results and flawed decisions. If an organization can ensure that only high-quality data enters the warehouse, it can reduce the likelihood of flawed decisions.
Data Quality and Electronic Data Interchange
Electronic Data Interchange (EDI) is the term used for standardized formats that represent business information for electronic communication. EDI is implemented through a process of cooperative data standardization within a particular environment. EDI enables straight-through processing (STP), which is the ability to automate business operations. STP and EDI both depend largely on data quality.
Data Quality and World Wide Web
The Internet can be considered the largest database in the world, and the World Wide Web plays the role of the information repository. Web sites can be categorized into several kinds of information service, including data presentation systems, data collection systems, and database query systems. Ensuring data quality at a satisfactory level is therefore critical as the Internet keeps growing day by day.
8.0 Data Quality, Data Governance and Data Integration Trend
8.1 Data Governance
i) ISO 8000
According to Wikipedia.com, ISO 8000 is a standard for data quality that is under development. Like other ISO and IEC standards, ISO 8000 is copyrighted and is not freely available. The parts that have been published are:
Part 100: (ISO/TS 8000-100:2009) – overview.
Part 102: (ISO 8000-102:2009) – vocabulary.
Part 110: (ISO 8000-110:2009) – syntax, semantic encoding, and conformance to data specification.
Part 120: (ISO/TS 8000-120:2009) – provenance.
Part 130: (ISO/TS 8000-130:2009) – accuracy.
Part 140: (ISO/TS 8000-140:2009) – completeness.
The following parts are under development by ISO TC 184/SC 4:
Part 1: (ISO/TS 8000-1) – overview, requirements and principles.
Part 2: (ISO/TS 8000-2) – vocabulary.
Part 150: (ISO/TS 8000-150) – framework for quality management.
8.2 Data Quality and Integration Forecast
Kelly (2010), a news editor at SearchDataManagement.com, wrote an article in which experts made predictions about trends in data quality and data integration for the year 2010. The experts are Aaron Zornes, a San Francisco-based chief research officer, and Rob Karel, a Cambridge-based principal analyst. Rob Karel provided the following predictions for data integration and data quality:
Data governance technology support and practices will lead the data quality discussion.
Real-time data quality services will gain momentum.
Data integration and data quality will continue to explore the flood of information.
Enterprise integration strategies will combine the competencies of application integration and data integration into a shared service organization.
Aaron Zornes, on the other hand, provided the following predictions:
In 2010, most organizations will emphasize the scope of data governance as they focus on product, customer, or vendor governance.
In 2011, most systems integrators and system consultants will focus on producing data governance frameworks.
In 2012, only 25% of organizations will actively include data quality processes and competencies in their data integration projects.
In 2012, organizations that cannot apply data management and integration to their existing data warehouse will not be able to sustain their operational analysis.
By the end of 2012, the separate markets for data integration tools and data quality tools will merge into one.
Most organizations today focus on targeting potential users or customers. Unfortunately, many are unaware of, or pay too little attention to, the importance of data quality and integration. Poor data quality leads to inefficient business processing and operations. Organizations should therefore implement or practice data quality methodologies in order to succeed in their business operations.