
CS614 Current Midterm Paper Fall 2013 File shared by saki78695 File 1



Today CS614 Midterm Term Paper
List four basic tasks of data transformation.
Solution:
Data transformation: basic tasks
  1. Selection
  2. Splitting/Joining
  3. Conversion
  4. Summarization
  5. Enrichment
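The tasks listed above can be illustrated with a minimal Python sketch. The source rows, field names, and the "big_order" rule are all invented for illustration, and each comment gives only one possible reading of the task it labels.

```python
from collections import defaultdict

# Hypothetical source rows; field names and values are invented for illustration.
source_rows = [
    {"cust": "Ali Khan", "city": "lahore", "amount": "1200.50", "status": "A"},
    {"cust": "Sara Malik", "city": "Karachi", "amount": "300", "status": "A"},
    {"cust": "Omar Riaz", "city": "LAHORE", "amount": "99.50", "status": "X"},
]

def transform(rows):
    out = []
    for row in rows:
        # Selection: keep only active records and only the fields we need.
        if row["status"] != "A":
            continue
        # Splitting: break the full name into separate fields.
        first, last = row["cust"].split(" ", 1)
        # Conversion: standardize text case and turn the amount into a number.
        city = row["city"].title()
        amount = float(row["amount"])
        # Enrichment (assumed rule): derive a more useful field from existing ones.
        out.append({"first": first, "last": last, "city": city,
                    "amount": amount, "big_order": amount >= 1000})
    return out

def summarize(rows):
    # Summarization: aggregate amounts to one total per city.
    totals = defaultdict(float)
    for row in rows:
        totals[row["city"]] += row["amount"]
    return dict(totals)

clean = transform(source_rows)
totals = summarize(clean)
```

Here `clean` holds the two active customers with split names and numeric amounts, and `totals` holds one summed amount per city.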
Identify the given statements as correct or incorrect: "The approach of TQM refers to the involvement of only 20% of employees in the continuous improvement process" and "Orr's law says that data quality is a function of its use, not its collection."
Solution: The 1st is incorrect; the 2nd is correct.
Statement 1: The TQM approach advocates the involvement of all employees in the continuous improvement process, the ultimate goal being customer satisfaction.
Statement 2: Orr's Law #2: “Data quality is a function of its use, not its collection!”
Identify the given statements as correct or incorrect: "In MOLAP the time complexity cannot go beyond O(1) in any case" and "Drill-down is a cube operation and its basic purpose is to select and project."
Solution: Both are incorrect.
1st: The only time the time complexity goes beyond O(1) is when the cube is so large that it cannot fit in main memory; in such a case a page or block fault will occur.
2nd: Drill-down is a cube operation, but its basic purpose is to “get more details”.
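Both points can be sketched in a few lines of Python, assuming the cube is stored as a flat in-memory array addressed by dimension indices. The dimension sizes, function names, and sales figures are hypothetical. The sketch shows why a cell lookup takes a constant number of steps while the cube fits in memory, and how drill-down returns finer detail for the same cell.

```python
# Hypothetical 3-D cube: 2 products x 2 regions x 4 months, stored as a flat list.
N_PRODUCTS, N_REGIONS, N_MONTHS = 2, 2, 4
cube = [0] * (N_PRODUCTS * N_REGIONS * N_MONTHS)

def offset(product, region, month):
    # Row-major address arithmetic: a fixed number of multiplications and
    # additions, independent of how many cells the cube holds.
    return (product * N_REGIONS + region) * N_MONTHS + month

def put(product, region, month, value):
    cube[offset(product, region, month)] = value

def get(product, region, month):
    return cube[offset(product, region, month)]

# Load some made-up sales figures for product 0 in region 1.
for month, sales in enumerate([10, 20, 30, 40]):
    put(0, 1, month, sales)

def total_for(product, region):
    # Summary level: one number for the whole period.
    return sum(get(product, region, m) for m in range(N_MONTHS))

def drill_down(product, region):
    # Drill-down: the same product and region at the finer month level.
    return [get(product, region, m) for m in range(N_MONTHS)]
```

The sketch does not model paging; once the array no longer fits in main memory, each `get` may trigger a page or block fault, which is the case the solution describes.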
If dirty data in a DWH is used by the government for decision making, what would be the effects? Explain with an example.
Solution:
Serious problems due to dirty data:
•  Decisions taken at government level using wrong data, resulting in undesirable results.
•  In direct mail marketing, letters sent to wrong addresses cause loss of money and bad reputation.
Administration: The government analyses data collected by population census to decide
which regions of the country require further investments in health, education, clean
drinking water, electricity etc. because of current and expected future trends. If the
birth rate in one region has increased over the last couple of years, the existing health
facilities and doctors employed might not be sufficient to handle the number of current and
expected patients. Thus, additional dispensaries or employment of doctors will be needed.
Inaccuracies in analyzed data can lead to false conclusions and misdirected release of
funds with catastrophic results for a poor country like Pakistan.

Supporting business processes: Erroneous data leads to unnecessary costs and probably
a bad reputation when used to support business processes. Consider a company using a list
of consumer addresses, buying habits, and preferences to advertise a new product by
direct mailing. Invalid addresses cause the letters to be returned as undeliverable. People
duplicated in the mailing list receive multiple letters, leading to unnecessary expenses
and frustration. Inaccurate information about consumer buying habits and preferences
contaminates and falsifies the target group, resulting in advertisement of products that
do not correspond to consumers' needs. Companies trading such data face the possibility
of an additional loss of reputation in case of erroneous data.
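As a rough illustration of the direct-mailing example, the following Python sketch screens a mailing list for missing postal codes and duplicated people. The records, the field names, and the matching rule (normalized name plus postal code) are assumptions made for this sketch; real address matching is considerably harder.

```python
# Hypothetical mailing list; names and postal codes are invented.
mailing_list = [
    {"name": "Ali Khan", "postcode": "54000"},
    {"name": "ali  khan", "postcode": "54000 "},   # same person, typed differently
    {"name": "Sara Malik", "postcode": ""},        # no postal code: undeliverable
    {"name": "Omar Riaz", "postcode": "75500"},
]

def match_key(record):
    # Assumed rule: lower-cased, whitespace-collapsed name plus trimmed postal code.
    name = " ".join(record["name"].lower().split())
    return (name, record["postcode"].strip())

def clean_mailing_list(records):
    seen = set()
    kept, rejected = [], []
    for rec in records:
        if not rec["postcode"].strip():
            rejected.append(rec)      # would be returned as undeliverable
            continue
        key = match_key(rec)
        if key in seen:
            rejected.append(rec)      # duplicate: would cause a second letter
            continue
        seen.add(key)
        kept.append(rec)
    return kept, rejected

kept, rejected = clean_mailing_list(mailing_list)
```

With the sample list, two records are kept and two are set aside for review instead of being mailed.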
Identify the given statement as correct or incorrect: "A transactional fact table always stores the complete records for events that don't occur."
Solution: False statement.
Correct is:
Transactional fact tables don’t have records for events that don’t occur.
•  Example: No records (rows) for products that were not sold.
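A small Python sketch makes the point concrete. The product dimension and sales rows are hypothetical. Because the fact table holds rows only for sales that happened, unsold products have to be found by comparing the fact rows against the product dimension.

```python
# Hypothetical product dimension and transactional sales fact table.
products = {1: "Tea", 2: "Sugar", 3: "Rice"}
sales_fact = [
    {"product_id": 1, "date": "2013-11-01", "qty": 5},
    {"product_id": 1, "date": "2013-11-02", "qty": 2},
    {"product_id": 3, "date": "2013-11-01", "qty": 1},
]

def unsold_products(dimension, fact_rows):
    # Products with no fact row at all were never sold in this period.
    sold_ids = {row["product_id"] for row in fact_rows}
    return sorted(name for pid, name in dimension.items() if pid not in sold_ids)
```

Calling `unsold_products(products, sales_fact)` returns the products that have no row in the fact table; no zero-quantity rows are stored for them.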
