The merged company continues to grow. More stores have been added and Internet sales are growing. The company is now considering expanding internationally. Before expansion, the company wants to explore new ways to grow, cut costs, and increase profits. In order to study the company’s sales patterns and determine their needs, the management has decided to establish a data warehouse.
1. What other external feeds should be included? These can be from partners, government services, or other available RSS feeds. Explain.
2. When the data was moved into the merged database, duplicates were eliminated and errors removed. How do we ensure only clean data will be entered into the data warehouse?
3. How frequently should the data in the data warehouse be updated? Remember the cost is inversely proportional to the time, i.e., the shorter the load interval the higher the cost. Should the update interval be the same for all feeds? Justify your answer.
4. Can the updates be done in parallel? Do any require sequential order?
Solution
Answer:
1. A company establishes a data warehouse mainly to make guided, data-driven decisions that give it a competitive and strategic advantage. To serve that purpose, the data warehouse must be rich in data and support analysis from every relevant perspective, so it is populated from as many data sources as practical.
These sources include not only the internal departments, verticals, and units of the business but also external parties that collaborate or partner with it in its various activities. External sources may also include legal or auditing agencies associated with the business, as well as regulatory and other government agencies whose rules the business must comply with.
2. Data in the data warehouse is used to make management decisions that may affect the business in both the short term and the long term, so it is imperative that those decisions benefit the business. For this to happen, the data residing in the warehouse must be appropriate, consistent, integrated, accurate, timely, reliable, free from errors, and in the required format.
Because the data in the data warehouse comes from many sources, it will inevitably have issues and cannot be fed to the warehouse as-is. The specific issues depend on the nature and type of the data source. Common ones are inconsistent data, missing values, incompatible data formats, duplicates, stale data, inaccuracies, and unreliable sources.
Therefore, before data is fed to the data warehouse, it needs to be cleaned of missing values, duplicates, and inherent inconsistencies or errors. It also needs to be converted to a common, required format. Together, these activities are commonly referred to as pre-processing of data.
Hence, only data that has been pre-processed should be fed to the data warehouse.
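The pre-processing steps described above (handling missing values, converting formats, removing duplicates) can be sketched in a few lines of Python. This is only a minimal illustration: the field names, the two accepted date formats, and the rule of rejecting rather than repairing bad records are all assumptions made for the example, not details given in the question.

```python
from datetime import datetime

# Hypothetical schema and source date formats, assumed for illustration only.
REQUIRED_FIELDS = ("order_id", "store", "sale_date", "amount")
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def normalize_date(text):
    """Convert a date string in any accepted format to ISO 8601, else None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def preprocess(records):
    """Return (clean, rejected) after validation, conversion and de-duplication."""
    clean, rejected, seen = [], [], set()
    for rec in records:
        # Missing values: reject records that lack any required field.
        if any(rec.get(field) in (None, "") for field in REQUIRED_FIELDS):
            rejected.append((rec, "missing value"))
            continue
        # Format conversion: one date format and a numeric amount.
        date = normalize_date(str(rec["sale_date"]))
        try:
            amount = round(float(rec["amount"]), 2)
        except (TypeError, ValueError):
            amount = None
        if date is None or amount is None or amount < 0:
            rejected.append((rec, "invalid format"))
            continue
        row = {
            "order_id": str(rec["order_id"]).strip(),
            "store": str(rec["store"]).strip().upper(),
            "sale_date": date,
            "amount": amount,
        }
        # Duplicates: keep only the first record seen for each order_id.
        if row["order_id"] in seen:
            rejected.append((rec, "duplicate"))
            continue
        seen.add(row["order_id"])
        clean.append(row)
    return clean, rejected
```

Only the `clean` list would be loaded into the warehouse; keeping the `rejected` list with a reason for each record makes it possible to trace problems back to the source that produced them.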
3. The data residing in a warehouse is historical and largely static. It is not supposed to be continuously updated, because that may affect the stability and effectiveness of decisions. However, so that it contains relevant data at all times, it has to be updated at regular intervals. The frequency of these updates depends on the requirements of the business and varies from one business to another. Typically, the warehouse is updated daily, weekly, or fortnightly. The interval need not be the same for all feeds: since shorter load intervals cost more, a feed whose data changes quickly can be loaded more often than one whose data changes slowly.
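One way to picture feed-specific update intervals is a small schedule table that records how often each feed is loaded. The sketch below is an illustration only: the feed names and the intervals assigned to them are hypothetical, and the number of loads in a period is used as a rough stand-in for cost, following the question's statement that shorter intervals cost more.

```python
from datetime import datetime, timedelta

# Hypothetical feeds and intervals, chosen only for illustration.
LOAD_INTERVALS = {
    "internet_sales": timedelta(days=1),
    "store_sales": timedelta(days=1),
    "partner_feed": timedelta(weeks=1),
    "government_statistics": timedelta(weeks=2),
}


def feeds_due(last_loaded, now):
    """Return the feeds whose load interval has elapsed since their last load."""
    return sorted(
        feed
        for feed, interval in LOAD_INTERVALS.items()
        if now - last_loaded[feed] >= interval
    )


def loads_per_period(period=timedelta(weeks=4)):
    """Count the loads each feed needs in a period, as a rough proxy for cost."""
    return {feed: period // interval for feed, interval in LOAD_INTERVALS.items()}
```

With these assumed intervals, a daily feed is loaded 28 times in four weeks while a fortnightly feed is loaded twice, which shows why using the shortest interval for every feed would raise cost without adding much value for slow-changing data.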
4. Whether updates can run in parallel depends on the dependencies among the data sources. Updates from independent data sources can run in parallel, whereas an update that depends on another must run after it, in sequential order.
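The distinction between parallel and sequential updates can be illustrated with a dependency graph. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later) to group loads into batches: loads in the same batch do not depend on one another and could run in parallel, while the batches themselves run in order. The load names and their dependencies are hypothetical, invented only for the example.

```python
from graphlib import TopologicalSorter

# Hypothetical loads; each maps to the loads that must finish before it starts.
DEPENDENCIES = {
    "store_dimension": set(),
    "product_dimension": set(),
    "partner_feed": set(),
    "sales_fact": {"store_dimension", "product_dimension"},
    "sales_summary": {"sales_fact"},
}


def parallel_batches(dependencies):
    """Group loads into ordered batches; loads within a batch are independent."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are circular
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every load whose prerequisites are done
        batches.append(ready)
        sorter.done(*ready)
    return batches
```

For the assumed graph, the three independent loads form the first batch, `sales_fact` must wait for the two loads it depends on, and `sales_summary` runs last.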
