1- Define the KDD process and describe all its steps.
2- Differentiate between the classification of data and the clustering of data with the help of suitable examples.
3- Explain different approaches to handling the problem of missing attribute values during data cleaning.
4- Why do we need preprocessing of the data? Explain any four data preprocessing techniques with the help of suitable examples.
Solution
KDD process:
Knowledge discovery in databases (KDD) is the process of discovering useful knowledge from a collection of data. This widely used data mining technique includes data preparation and selection, data cleansing, incorporating prior knowledge about the data sets, and interpreting accurate solutions from the observed results.
Major KDD application areas include marketing, fraud detection, telecommunications and manufacturing.
The steps involved in the entire KDD process are:
1- Identify the goal of the KDD process from the customer's perspective.
2- Understand the application domains involved and the knowledge that is required.
3- Select a target data set, or a subset of data samples, on which discovery is to be performed.
4- Cleanse and preprocess the data by deciding on strategies to handle missing fields, and alter the data as per the requirements.
5- Choose data mining algorithms to discover hidden patterns. This includes deciding which models and parameters might be appropriate for the overall KDD process.
6- Search for patterns of interest in a particular representational form, such as classification rules or trees, regression and clustering.
7- Interpret the essential knowledge from the mined patterns.
8- Use the knowledge and incorporate it into another system for further action.
9- Document it and prepare reports for interested parties.
All its steps:
The overall process of finding and interpreting patterns in data involves the repeated application of the following steps:
1- Developing an understanding of:
-> the application domain
-> the relevant prior knowledge
-> the goals of the end user
2- Creating a target data set: selecting a data set, or focusing on a subset of variables or data samples, on which discovery is to be performed.
3- Data cleaning and preprocessing.
-> Removal of noise or outliers.
-> Collecting the information necessary to model or account for noise.
-> Strategies for handling missing data fields.
-> Accounting for time-sequence information and known changes.
4- Data reduction and projection.
-> Finding useful features to represent the data, depending on the goal of the task.
5- Choosing the data mining task.
-> Deciding whether the goal of the KDD process is classification, regression, clustering, etc.
6- Choosing the data mining algorithm(s).
-> Selecting the method(s) to be used for searching for patterns in the data.
-> Deciding which models and parameters may be appropriate.
7- Data mining.
-> Searching for patterns of interest in a particular representational form, or a set of such representations, such as classification rules or trees, regression, clustering, and so forth.
8- Interpreting the mined patterns.
9- Consolidating the discovered knowledge.
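The cleaning and preprocessing step can be illustrated with a small sketch. It is only one possible approach, not a prescribed method: missing fields are filled with the median of the observed values, and outliers are then dropped with a median-absolute-deviation rule. The function names, the threshold `k=3` and the sample `ages` list are assumptions made for illustration.

```python
from statistics import median

def impute_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def remove_outliers(values, k=3):
    """Drop values further than k * MAD from the median.

    MAD is the median absolute deviation; k=3 is an illustrative choice.
    """
    centre = median(values)
    mad = median(abs(v - centre) for v in values)
    if mad == 0:
        # All values (or most of them) are identical; nothing to filter on.
        return list(values)
    return [v for v in values if abs(v - centre) <= k * mad]

# Hypothetical attribute with two missing fields and one implausible value.
ages = [25, 27, None, 30, 26, 95, None, 28]

filled = impute_missing(ages)      # None -> 27.5 (median of observed values)
cleaned = remove_outliers(filled)  # 95 is dropped as an outlier

print(cleaned)  # [25, 27, 27.5, 30, 26, 27.5, 28]
```

The median is used instead of the mean so that the single extreme value does not distort the value used to fill the missing fields.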
2) Answer:
Difference between classification of data and clustering of data:
Classification:
Classification is a supervised learning technique used to assign pre-defined labels to instances on the basis of their features.
-> A classification algorithm requires training data.
-> The classification model is built from pre-defined (already labelled) instances.
-> With classification, the groups (or classes) are specified beforehand, with each training data instance belonging to a particular class.
-> Classification algorithms are supposed to learn the association between the features of an instance and the class it belongs to.
-> Example:
An insurance company trying to assign customers to high-risk and low-risk categories.
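The insurance example can be sketched with a k-nearest-neighbours classifier, which is just one of many classification algorithms and is chosen here for brevity. The features (age, number of claims in the past year), the training records and the value `k=3` are all hypothetical.

```python
from collections import Counter
from math import dist

# Labelled training data: ((age, claims_last_year), risk_class).
# These records are invented for illustration.
training = [
    ((22, 3), "high"),
    ((25, 2), "high"),
    ((30, 2), "high"),
    ((45, 0), "low"),
    ((50, 1), "low"),
    ((38, 0), "low"),
]

def classify(customer, training_data, k=3):
    """Return the majority class among the k nearest labelled instances."""
    nearest = sorted(training_data, key=lambda item: dist(customer, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(classify((24, 3), training))  # high
print(classify((48, 0), training))  # low
```

The class labels ("high", "low") exist before the algorithm runs, and every training instance already carries one of them. That is what makes this a supervised technique.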
Clustering:
Clustering is an unsupervised technique used to group similar instances on the basis of their features.
-> Clustering does not require training data.
-> Clustering does not assign pre-defined labels to the groups.
-> With clustering, the groups (or clusters) are based on the similarity of the data instances to each other.
-> No predefined output class is used in training, and the clustering algorithm is supposed to learn the grouping.
-> Example:
An online movie company recommends a movie to you because other customers who made similar movie choices to yours in the past have rated that movie favourably.
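Grouping customers with similar tastes can be sketched with k-means, one common clustering algorithm among several. In this minimal version each customer is assumed to be described by two hypothetical numbers (average rating given to action films, average rating given to dramas), and the starting centroids are passed in explicitly so that the run is repeatable.

```python
from math import dist

def k_means(points, centroids, max_iter=100):
    """Group points around the nearest centroid until assignments stop changing."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids

# Hypothetical customers: (avg. action rating, avg. drama rating).
customers = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 9), (8.5, 9.5)]

labels, centres = k_means(customers, centroids=[customers[0], customers[-1]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

No labels are supplied here: the cluster numbers 0 and 1 come out of the data and carry no predefined meaning. A recommender could then suggest to a customer the films that others in the same cluster rated highly.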