Monday, April 21, 2025

Converting a Pandas DataFrame to Train an ML Model with PyTorch: A Complete Workflow Example

From Pandas data ➜ standardization ➜ train/test split ➜ building a PyTorch Dataset ➜ model training.

I'll walk through the complete workflow step by step, in the order below.


✅ Assume you already have:

python
type0data, type1data, type2data  # NumPy arrays, each with shape (N, 6); the features are b0 to b5
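
The title starts from a Pandas DataFrame, while the steps below start from NumPy arrays. One possible bridge is sketched here, assuming a hypothetical DataFrame `df` with feature columns `b0` to `b5` and an integer class column named `type`. The column name `type` and the synthetic values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame: six feature columns b0..b5 plus a "type" column (0, 1, 2).
# The values are synthetic; replace df with your own data.
rng = np.random.default_rng(0)
feature_cols = [f"b{i}" for i in range(6)]
df = pd.DataFrame(rng.normal(size=(30, 6)), columns=feature_cols)
df["type"] = np.repeat([0, 1, 2], 10)

# Select the rows of each class and convert the feature columns to a NumPy array.
type0data = df.loc[df["type"] == 0, feature_cols].to_numpy()
type1data = df.loc[df["type"] == 1, feature_cols].to_numpy()
type2data = df.loc[df["type"] == 2, feature_cols].to_numpy()

print(type0data.shape, type1data.shape, type2data.shape)
```

Each resulting array has one row per sample of that class and six columns, which is the shape Step 1 expects.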

🧱 Step 1: Combine all classes & build the labels

python
import numpy as np

# Stack all the data together
X = np.vstack([type0data, type1data, type2data])

# Build the labels (0, 1, 2)
y = np.concatenate([
    np.zeros(len(type0data)),
    np.ones(len(type1data)),
    np.full(len(type2data), 2)
])

print(X.shape)  # (total_samples, 6)
print(y.shape)  # (total_samples,)

🧼 Step 2: Standardize the data (zero mean, unit standard deviation)

python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # Note: fit first, then transform (fit_transform does both)
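
To make "zero mean, unit standard deviation" concrete, here is a small sketch that compares `StandardScaler` with the same computation written by hand in NumPy. The array `X_demo` is synthetic stand-in data, not the data from this post.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for X: 100 samples, 6 features, deliberately not centered.
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=5.0, scale=3.0, size=(100, 6))

scaler_demo = StandardScaler()
X_demo_scaled = scaler_demo.fit_transform(X_demo)

# Same computation by hand: subtract the per-column mean, divide by the per-column std.
# StandardScaler uses the population standard deviation (ddof=0), which is also np.std's default.
X_manual = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)

print(np.allclose(X_demo_scaled, X_manual))
print(X_demo_scaled.mean(axis=0).round(6))
print(X_demo_scaled.std(axis=0).round(6))
```

Each feature column is scaled independently, so features with very different ranges end up on a comparable scale.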

✂️ Step 3: Split the data (Train/Test Split)

python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42, stratify=y
)
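
Steps 2 and 3 fit the scaler on all samples before splitting, so the test samples also contribute to the mean and standard deviation. A common alternative is to split first and fit the scaler on the training portion only. The sketch below shows that ordering on synthetic data. The `_alt` names are hypothetical and are not used elsewhere in this post.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 150 samples, 6 features, 3 balanced classes.
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(150, 6))
y_raw = np.repeat([0, 1, 2], 50)

# Split the raw (unscaled) data first.
X_train_alt, X_test_alt, y_train_alt, y_test_alt = train_test_split(
    X_raw, y_raw, test_size=0.2, random_state=42, stratify=y_raw
)

# Fit on the training set only, then apply the same transform to the test set.
scaler_alt = StandardScaler()
X_train_alt = scaler_alt.fit_transform(X_train_alt)
X_test_alt = scaler_alt.transform(X_test_alt)

# stratify=y_raw keeps the class proportions the same in both splits.
print(np.bincount(y_train_alt), np.bincount(y_test_alt))
```

With this ordering the test set stays unseen until evaluation, at the cost of test features not being exactly zero-mean.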

🔄 Step 4: Convert to a PyTorch Dataset

python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Convert to tensors
X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.long)
X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.long)

# Build the PyTorch Datasets
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
test_dataset = TensorDataset(X_test_tensor, y_test_tensor)

# Build the DataLoaders (for batch training)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32)
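
A quick way to confirm the conversion worked is to pull one batch from a loader and check its shapes and dtypes. The sketch below does this on synthetic data. The `demo_` names are hypothetical stand-ins for the tensors built above.

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

# Synthetic stand-in data. NumPy produces float64 by default, and labels built
# with np.zeros / np.ones are floats too, so both need an explicit dtype.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 6))
y_demo = np.repeat([0.0, 1.0], 50)

demo_dataset = TensorDataset(
    torch.tensor(X_demo, dtype=torch.float32),  # features as float32
    torch.tensor(y_demo, dtype=torch.long),     # class indices as int64
)
demo_loader = DataLoader(demo_dataset, batch_size=32, shuffle=True)

# Take a single batch and inspect it.
xb, yb = next(iter(demo_loader))
print(xb.shape, xb.dtype)
print(yb.shape, yb.dtype)
print(len(demo_loader))  # number of batches per epoch
```

Features need to be `float32` to match the default dtype of `nn.Linear` weights, and `nn.CrossEntropyLoss` expects class-index targets as `long` (int64).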

🧠 Step 5: A simple model example

python
import torch.nn as nn
import torch.optim as optim

# Build a simple MLP model
class SimpleNet(nn.Module):
    def __init__(self):
        super(SimpleNet, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(6, 32),
            nn.ReLU(),
            nn.Linear(32, 3)  # 3 classes
        )

    def forward(self, x):
        return self.model(x)

model = SimpleNet()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
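
The network ends in a plain `nn.Linear(32, 3)` with no softmax because `nn.CrossEntropyLoss` takes raw logits and applies log-softmax internally. The sketch below runs one forward pass on a dummy batch to show the shapes involved. The `demo_` names and the dummy tensors are assumptions for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Same layer layout as SimpleNet, written inline to keep the sketch self-contained.
demo_model = nn.Sequential(nn.Linear(6, 32), nn.ReLU(), nn.Linear(32, 3))
demo_criterion = nn.CrossEntropyLoss()

x_dummy = torch.randn(4, 6)             # batch of 4 samples, 6 features each
y_dummy = torch.tensor([0, 1, 2, 1])    # class indices, dtype int64

logits = demo_model(x_dummy)            # raw scores, one column per class
loss = demo_criterion(logits, y_dummy)  # scalar: mean loss over the batch
probs = torch.softmax(logits, dim=1)    # only needed when probabilities are wanted

print(logits.shape)
print(loss.item())
print(probs.sum(dim=1))
```

Softmax is applied here only to turn the logits into probabilities for inspection; it is not part of the model or the loss input.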

🚀 Step 6: Train the model

python
for epoch in range(20):
    model.train()
    total_loss = 0
    for xb, yb in train_loader:
        pred = model(xb)
        loss = criterion(pred, yb)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        total_loss += loss.item()

    print(f"Epoch {epoch+1}, Loss: {total_loss:.4f}")
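
The printed loss above is the sum of per-batch mean losses, so its size depends on how many batches there are. If you want a number that is comparable across batch sizes, one option is a per-sample average, sketched below on synthetic data with hypothetical `demo_` names.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)

# Synthetic stand-in data: 100 samples, 6 features, 3 classes.
X_demo = torch.randn(100, 6)
y_demo = torch.randint(0, 3, (100,))
demo_loader = DataLoader(TensorDataset(X_demo, y_demo), batch_size=32, shuffle=True)

demo_model = nn.Sequential(nn.Linear(6, 32), nn.ReLU(), nn.Linear(32, 3))
demo_criterion = nn.CrossEntropyLoss()
demo_optimizer = optim.Adam(demo_model.parameters(), lr=0.001)

epoch_losses = []
for epoch in range(3):
    demo_model.train()
    running_loss = 0.0
    n_samples = 0
    for xb, yb in demo_loader:
        loss = demo_criterion(demo_model(xb), yb)

        demo_optimizer.zero_grad()
        loss.backward()
        demo_optimizer.step()

        # loss is the mean over the batch, so weight it by the batch size.
        running_loss += loss.item() * xb.size(0)
        n_samples += xb.size(0)

    epoch_losses.append(running_loss / n_samples)
    print(f"Epoch {epoch+1}, Avg loss: {epoch_losses[-1]:.4f}")
```

Weighting by batch size matters because the last batch is usually smaller than the rest.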

🧪 Step 7: Test the model

python
model.eval()
correct = 0
total = 0

with torch.no_grad():
    for xb, yb in test_loader:
        pred = model(xb)
        predicted = torch.argmax(pred, dim=1)
        correct += (predicted == yb).sum().item()
        total += yb.size(0)

print(f"Accuracy: {correct / total:.2%}")
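
After evaluation, a natural next step is predicting on new, unscaled samples. They should go through the same fitted scaler (transform only, no re-fit) before reaching the model. The sketch below uses a scaler fitted on synthetic data and an untrained network as stand-ins, so the printed predictions are meaningless. Only the order of operations is the point.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
torch.manual_seed(0)

# Stand-ins for the fitted scaler and the trained model (assumptions for this sketch).
demo_scaler = StandardScaler().fit(rng.normal(size=(100, 6)))
demo_model = nn.Sequential(nn.Linear(6, 32), nn.ReLU(), nn.Linear(32, 3))

# Two new raw samples with the same six features.
new_samples = rng.normal(size=(2, 6))

# Reuse the training-time statistics: transform only, never fit again.
new_scaled = demo_scaler.transform(new_samples)
new_tensor = torch.tensor(new_scaled, dtype=torch.float32)

demo_model.eval()
with torch.no_grad():
    probs = torch.softmax(demo_model(new_tensor), dim=1)  # class probabilities
    predicted = torch.argmax(probs, dim=1)                # predicted class index

print(probs)
print(predicted)
```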

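✅ Summary:

Step | Description
Standardization | Use StandardScaler to process each feature
Split | Use train_test_split and preserve the class distribution
TensorDataset | Wrap X and y into a dataset that PyTorch can consume
Model | Build a simple classifier with nn.Sequential
Training | Use CrossEntropyLoss with the Adam optimizer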

