PCA and LDA for Breast Cancer Detection

Positioning

This page is an educational reading path, not a medical decision tool.

Breast cancer detection background

Medical datasets often contain many measurement features. The challenge is to reduce the number of features while keeping the model accurate and preserving the important patterns in the data.

Wisconsin Breast Cancer Data

This dataset is popular for classification learning because it contains numeric features from cell characteristics. In the learning version, each feature group is explained as a signal for class separation.
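As a minimal sketch, assuming scikit-learn is installed and that its bundled `load_breast_cancer` copy (the Wisconsin Diagnostic variant) is the data discussed here, the features can be loaded and grouped by name pattern. The `groups` dictionary is an illustrative helper, not part of the library.

```python
from sklearn.datasets import load_breast_cancer

# Assumption: scikit-learn's bundled Wisconsin Diagnostic dataset
data = load_breast_cancer()
X, y = data.data, data.target

print(X.shape)                   # (number of samples, number of features)
print(list(data.target_names))   # class names; label 0 is the first name

# Illustrative grouping: each cell characteristic appears three times,
# as a mean, a standard error, and a "worst" (largest) value.
groups = {
    "mean": [n for n in data.feature_names if n.startswith("mean ")],
    "error": [n for n in data.feature_names if n.endswith(" error")],
    "worst": [n for n in data.feature_names if n.startswith("worst ")],
}
for group_name, names in groups.items():
    print(group_name, len(names), list(names[:2]))
```

In this copy of the dataset, the three groups describe the same ten characteristics, which is one reason many of the 30 features are strongly correlated.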

Why high-dimensional data needs reduction

Too many features make visualization difficult and give models more opportunity to fit noise, especially when several features carry overlapping information. Dimensionality reduction helps find a more compact representation.

How PCA simplifies features

PCA finds orthogonal directions of largest variance in the data, called principal components. Projecting onto the first few components makes the data easier to visualize and can provide compact inputs for classification.
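The idea can be sketched with scikit-learn, assuming the bundled `load_breast_cancer` data. Standardizing first is a choice made here because PCA is sensitive to feature scale. The choice of two components is only for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Put every feature on a comparable scale so that large-valued
# measurements do not dominate the variance.
X_scaled = StandardScaler().fit_transform(X)

# Keep the two directions of largest variance (labels y are not used).
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

print(X_pca.shape)                          # one 2D point per sample
print(pca.explained_variance_ratio_)        # share of variance per component
print(pca.explained_variance_ratio_.sum())  # share kept by both together
```

The explained variance ratio shows how much of the total spread each component retains, which is a practical guide when deciding how many components to keep.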

How LDA differs from PCA

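LDA uses label information to find projections that separate classes. That makes it supervised, unlike PCA, which maximizes variance without reading labels. With two classes, LDA yields at most one discriminant axis, because the number of LDA components cannot exceed the number of classes minus one.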
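A short sketch of the difference, again assuming scikit-learn and its bundled dataset: the PCA call receives only the features, while the LDA call also receives the labels. Fitting on the full dataset is done here only for illustration. In an evaluation setting the transform would be fitted on training data alone.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# Unsupervised: PCA sees only the features.
X_pca = PCA(n_components=2).fit_transform(X_scaled)

# Supervised: LDA also needs the class labels y.
# Two classes allow at most one discriminant component.
lda = LinearDiscriminantAnalysis(n_components=1)
X_lda = lda.fit_transform(X_scaled, y)

print(X_pca.shape)  # two variance-based axes
print(X_lda.shape)  # one class-separating axis
```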

Model-result comparison

This section compares how model performance changes when a classifier is trained on the original features, on PCA components, or on the LDA projection. Evaluation metrics are used to weigh compactness against accuracy.
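One way such a comparison could be set up is sketched below. The classifier (logistic regression), the metric (accuracy), the 5-fold cross-validation, and the component counts are all assumptions chosen for illustration, not the settings of the original study. Wrapping each variant in a pipeline keeps the scaler and the reduction step fitted on training folds only.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)


def make_clf():
    # Hypothetical classifier choice; any scikit-learn classifier would fit here.
    return LogisticRegression(max_iter=1000)


pipelines = {
    "original (30 features)": make_pipeline(StandardScaler(), make_clf()),
    "PCA (2 components)": make_pipeline(
        StandardScaler(), PCA(n_components=2), make_clf()
    ),
    "LDA (1 component)": make_pipeline(
        StandardScaler(), LinearDiscriminantAnalysis(n_components=1), make_clf()
    ),
}

# Mean cross-validated accuracy for each representation.
scores = {}
for name, pipe in pipelines.items():
    scores[name] = cross_val_score(pipe, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: {scores[name]:.3f}")
```

Reading the three scores side by side shows how much accuracy, if any, is given up in exchange for a far smaller input.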

Visual interpretation of dimensionality reduction

2D and 3D visuals help reveal whether the two classes become more separated after transformation. This makes dimensionality reduction feel concrete, not just formulaic.
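A 2D view of this kind can be sketched with matplotlib, assuming scikit-learn's bundled dataset. The output filename `pca_2d.png` and the styling values are hypothetical choices.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X, y = data.data, data.target

# Project the standardized features onto the first two principal components.
X_2d = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))

fig, ax = plt.subplots(figsize=(6, 5))
for label, name in enumerate(data.target_names):
    mask = y == label  # select the samples of one class
    ax.scatter(X_2d[mask, 0], X_2d[mask, 1], s=12, alpha=0.6, label=str(name))
ax.set_xlabel("PC 1")
ax.set_ylabel("PC 2")
ax.set_title("Breast cancer data projected onto two principal components")
ax.legend()
fig.savefig("pca_2d.png", dpi=120)  # hypothetical output path
```

Coloring the points by class after the projection is what makes any separation visible, since PCA itself never looked at the labels.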

Model limitations

A structured dataset does not represent every clinical context. The educational model here is not intended for medical diagnosis.

Practice version on machinelearning.co.id

The practice version connects the PCA article, 2D/3D PCA lab, PCA vs LDA comparison, and classification evaluation step by step.