
1. Which of the following are the key ideas underlying Classification and Regression Trees?

Trimming

Recursive Partitioning

Pruning

Feedforward Partitioning

None of the Above

2. What does it mean for a data mining method to be data driven, not model driven? Provide two examples of methods that are considered data driven and explain

3. List one example of a model-driven method and explain

4. Name and define the three types of layers in a Neural Network

5. Which of the following are commonly used to measure impurity in a Classification Tree?

ROC Curve

Gini Index

Confusion Matrix

Entropy

None of the Above

6. In a neural network, define and explain what the w and θ values represent and how they are used in the model

7. In Neural Networks, an architecture with a one-way flow and no cycles is described as a Multilayer Feedforward Network

True

False

8. Which data mining method does CHAID (Chi-Squared Automatic Interaction Detection) apply to, and what is its primary purpose?

9. When plotting the probability of passing an exam against the number of hours spent studying, what type of curve is shown in the following illustration?

Probability Curve

Naïve-Bayes Curve

Linear Regression curve

Logistic Regression curve

None of the above

10. In a Neural Network, define and explain the following:

Case Updating-

Batch Updating-

11. Which of the following are true of Naïve Bayes?

Incorporates the concept of conditional probability

Named after the Reverend Thomas Bayes

Can only be used with Categorical Variables

Data-driven not model driven

None of the above

12. Which of the following characterize Classification and Regression Trees?

Considered highly transparent, easy to interpret

Can be used for either Classification or Prediction

Model-Driven (requires the assumptions of statistical models)

Computationally cheap even on large samples

All of the above

13. What is meant by the term “black box” and which data mining model does it generally apply to?

Which of the following is a more effective visualization of the data

Pie

Bar

Both graphs are equally effective

14. For evaluating regression results, is it better to use Adjusted R Squared or R Squared? Explain

15. Define and Explain the following terms, as they apply to variable selection.

16. Why is it important to partition data when we develop a model? List two types of partitions and explain

16b. Explain how lift charts are used to explain model performance

17. In evaluating model performance, which of the following metrics is most useful and why?

18. Oversampling is used when the event of interest is rare

True

False

19. Which of the following are true of the Naïve Rule?

Classify all records as belonging to the most prevalent class

Is another term for Naïve Bayes

Often used as benchmark

Using external predictor info should outperform the Naïve Rule

All of the above

20. Which of the following characterizes k-nearest neighbors:

Used for classification (categorical outcome) or Prediction (numerical outcome)

Highly automated, data driven method

Relies on distance between records to determine neighbors

Uses R-Squared to evaluate performance

All of the Above

21. Correlation Analysis is a key step in Dimension Reduction

True

False

22. A model that fits the training data perfectly, leaving no error (residuals), is likely to perform well with new data

True

False

23. A chart that plots the pairs (Sensitivity, 1 - Specificity) as the cutoff value increases from 0 to 1 is known as a ROC curve

True

False

24. When the event of interest is rare, which method may be appropriate in order to develop a model?

Overfitting

Oversampling

PCA

CHAID

None of the Above

Solution

Three problems are solved below; post the remaining questions separately to get their answers

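Q1) The correct answers are Recursive Partitioning and Pruning

Explanation: Classification and regression trees rest on two key ideas. Recursive partitioning repeatedly splits the records into subgroups that are more homogeneous with respect to the outcome, and pruning cuts the resulting tree back to avoid overfitting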
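As an illustration of recursive partitioning, here is a minimal sketch for a single numeric predictor. It is not taken from the course material: the function names, the toy `hours`/`passed` data, and the use of a `max_depth` limit as a simple stand-in for pruning are all assumptions made for this example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(xs, ys):
    """Return (threshold, weighted_gini) for the best split x <= threshold,
    or None when all x values are identical and no split is possible."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2  # candidate split: midpoint of neighbors
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best

def grow_tree(xs, ys, max_depth=2):
    """Recursively partition the records; returns a nested dict or a leaf label.
    max_depth is a hypothetical stopping rule standing in for real pruning."""
    split = best_split(xs, ys)
    if max_depth == 0 or gini(ys) == 0.0 or split is None:
        return Counter(ys).most_common(1)[0][0]  # leaf: majority class
    threshold = split[0]
    left = [(x, y) for x, y in zip(xs, ys) if x <= threshold]
    right = [(x, y) for x, y in zip(xs, ys) if x > threshold]
    return {
        "threshold": threshold,
        "left": grow_tree([x for x, y in left], [y for x, y in left], max_depth - 1),
        "right": grow_tree([x for x, y in right], [y for x, y in right], max_depth - 1),
    }

# Toy data (made up for illustration): hours studied and exam outcome.
hours = [1, 2, 3, 4, 5, 6]
passed = ["fail", "fail", "fail", "pass", "pass", "pass"]
tree = grow_tree(hours, passed)
print(tree)
```

On this toy data a single split at 3.5 hours separates the two classes, so the recursion stops after one level because both child nodes are pure.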

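Q2) A data-driven method learns the relationship between the predictors and the outcome directly from the data, without assuming a particular functional form in advance. A model-driven method, by contrast, requires the user to specify a structure (for example, the linear form assumed in linear regression) and then estimates its parameters

Examples of data-driven methods - Classification and regression trees, which split the data recursively without assuming a functional form

k-nearest neighbors, which classifies a record from the most similar records in the training data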

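Q5) The correct answers are Gini Index and Entropy

Both measure the impurity of a node. Gini impurity is the probability that a randomly chosen record in the node would be wrongly labelled if it were labelled at random according to the class proportions in that node. Entropy measures the same heterogeneity on a logarithmic scale. Both are 0 for a pure node and largest when the classes are equally represented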
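The two standard impurity measures for a classification tree, the Gini index 1 - Σ p_k² and entropy -Σ p_k log2(p_k), can be computed as in the short sketch below. The function names and the example labels are made up for illustration.

```python
import math
from collections import Counter

def class_proportions(labels):
    """Proportion of records in each class present in the node."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]

def gini_index(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    return 1.0 - sum(p * p for p in class_proportions(labels))

def entropy(labels):
    """Entropy in bits: sum of -p * log2(p) over the classes present."""
    return sum(-p * math.log2(p) for p in class_proportions(labels))

# Hypothetical nodes: one perfectly mixed, one pure.
mixed_node = ["yes"] * 5 + ["no"] * 5
pure_node = ["yes"] * 10

print(gini_index(mixed_node), entropy(mixed_node))
print(gini_index(pure_node), entropy(pure_node))
```

With two classes, the Gini index ranges from 0 (pure node) to 0.5 (evenly mixed) and entropy from 0 to 1, so a tree algorithm prefers splits that push these values toward 0 in the child nodes.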

Tutor: Dr Jack