# Data Mining

Basic data analysis

1. Produce descriptive statistics for all variables.

Solution:

Attached in the Excel workbook.

2. How is missing data handled? (Note: score1 = f(score2, score3))

Ans.

3. What are the correlation values of the other scores with the final score? (Hint: use the Correlation tool from Data Analysis in Excel.)

4. What hypotheses can you formulate from these results?
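The descriptive statistics and correlation questions above can also be checked outside Excel. The following is a minimal sketch using only the Python standard library. The column names (`score1`, `score2`, `score3`, `final`) and all values are hypothetical placeholders, not the assignment data, and the sketch assumes missing values have already been handled.

```python
import math
import statistics

# Hypothetical placeholder data; the real values live in the assignment workbook.
scores = {
    "score1": [50.0, 65.0, 68.0, 77.0, 93.0],
    "score2": [55.0, 62.0, 70.0, 81.0, 90.0],
    "score3": [60.0, 58.0, 75.0, 79.0, 88.0],
    "final": [58.0, 61.0, 73.0, 80.0, 91.0],
}


def describe(values):
    """Return basic descriptive statistics for one variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / math.sqrt(var_x * var_y)


# Summary table and correlation of every other score with the final score.
summary = {name: describe(values) for name, values in scores.items()}
correlations = {
    name: pearson(values, scores["final"])
    for name, values in scores.items()
    if name != "final"
}
```

Each entry in `correlations` lies between -1 and 1; values near 1 suggest that the score moves together with the final score, which is one starting point for formulating hypotheses.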

Data Mining – Prediction, Classification, & Rules

5. Use MLR to predict the final score value for each case on the test data worksheet.

6. How good is the MLR model?

In questions 7 to 14, an “acceptable final score” is one that is above the mean. You will need to add another column to the worksheet, called “Acceptable”, where 0 means not acceptable and 1 means acceptable.
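The labeling rule above can be sketched in a few lines of Python. This is only an illustration: the scores are hypothetical, and the sketch assumes the mean is taken over whatever list of final scores is passed in. A score exactly equal to the mean is treated as not acceptable, since the rule says "above the mean".

```python
import statistics


def add_acceptable(final_scores):
    """Return 1 for scores strictly above the mean, otherwise 0."""
    mean_score = statistics.mean(final_scores)
    return [1 if score > mean_score else 0 for score in final_scores]


# Hypothetical final scores, not the assignment data.
final_scores = [58.0, 61.0, 73.0, 80.0, 91.0]
acceptable = add_acceptable(final_scores)
```

In Excel, the same column could be filled with a formula of the form `=IF(cell > AVERAGE(range), 1, 0)`, where `cell` and `range` stand for the actual worksheet references.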

7. Use KNN to classify the test data on the worksheet as “acceptable” or “non-acceptable”.

8. How good is the KNN model?

9. Use KNN to predict the final score value for each case on the test data worksheet.

10. How good is the KNN model?

11. Use CART to classify the test data on the worksheet as “acceptable” or “non-acceptable”.

12. How good is the CART model?

13. Use CART to predict the final score value for each case on the test data worksheet.

14. How good is the CART model?
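The “How good is the model?” questions above call for different measures depending on the task. One common choice, sketched here under that assumption, is root mean squared error (RMSE) for predicted final scores, and accuracy plus confusion-matrix counts for the 0/1 “Acceptable” classification. The function names are illustrative and not part of any particular tool.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for 0/1 labels (1 = acceptable)."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["tp"] += 1
        elif a == 0 and p == 0:
            counts["tn"] += 1
        elif a == 0 and p == 1:
            counts["fp"] += 1
        else:
            counts["fn"] += 1
    return counts


def accuracy(actual, predicted):
    """Fraction of cases whose predicted label matches the actual label."""
    counts = confusion_counts(actual, predicted)
    return (counts["tp"] + counts["tn"]) / len(actual)
```

Comparing these measures on the test data rather than the training data gives a fairer picture of how each model (MLR, KNN, CART) generalizes.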

Data Mining Associations

15. The dataset in the file Cosmetics.xlsx contains data on the purchases of different cosmetic items at a large chain drugstore. The store wants to analyze associations among purchases of these items for purposes of point-of-sale display, guidance to sales personnel in promoting cross sales, and guidance for piloting an eventual time-of-purchase electronic recommender system to boost cross sales. Run the Association Method on Cosmetics.xlsx.

a. Send the updated Cosmetics.xlsx spreadsheet as an attachment.

b. For the first row, explain the “Conf. %” output and how it is calculated.

c. For the first row, explain the “Support (a)” and “Support (c)” outputs and how they are calculated.

d. For the first row, explain the “Lift Ratio” and how it is calculated.

e. For the first row, explain the rule that is represented there in words.

f. Interpret the first three rules in the output in words.

g. Reviewing the first couple of dozen rules, comment on their redundancy.

h. What advice would you give the chain drugstore based on this analysis?
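The quantities asked about above can be illustrated with a small worked sketch. It assumes the usual association-rule definitions: Support (a) and Support (c) are the numbers of transactions containing the antecedent and the consequent, confidence is the share of antecedent transactions that also contain the consequent, and lift is confidence divided by the consequent's overall share of transactions. The item names and transactions are hypothetical, not taken from Cosmetics.xlsx.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Compute support counts, confidence, and lift for antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    support_a = sum(1 for t in transactions if antecedent <= t)
    support_c = sum(1 for t in transactions if consequent <= t)
    support_ac = sum(1 for t in transactions if (antecedent | consequent) <= t)
    if support_a == 0 or support_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    confidence = support_ac / support_a   # P(consequent | antecedent)
    benchmark = support_c / n             # P(consequent) with no rule applied
    return {
        "support_a": support_a,
        "support_c": support_c,
        "support_ac": support_ac,
        "confidence": confidence,
        "lift": confidence / benchmark,
    }


# Hypothetical baskets; each set holds the items bought in one transaction.
baskets = [
    {"blush", "mascara"},
    {"blush", "mascara", "eyeliner"},
    {"mascara"},
    {"blush"},
    {"eyeliner"},
]
metrics = rule_metrics(baskets, {"blush"}, {"mascara"})
```

In this toy data, blush appears in 3 baskets, mascara in 3, and both together in 2, so the confidence of "blush → mascara" is 2/3 and the lift is (2/3) / (3/5), slightly above 1. A lift above 1 indicates that the antecedent makes the consequent more likely than it is overall.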
