Handel Class Note AI and ML , ROS: Standardizing Data for a PyTorch LSTM Model

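Standardize these DataFrames (df, df_type0/1/2) so they can be fed to a PyTorch LSTM model.

Below I lay out this pipeline based on what an LSTM model needs, including:

Data standardization (zero mean, unit standard deviation)
Conversion to NumPy arrays (each sample is the feature vector of one time-series entry)
Optional: conversion to a PyTorch Dataset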

✅ Step 1: Standardize the full df (then apply the same scaling to each class)

✅ Assume each row is an independent time point (e.g., an IMU frame)

python
from sklearn.preprocessing import StandardScaler

# Select the feature columns b1 ~ b9 (excluding type)
feature_cols = ['b1', 'b2', 'b3', 'b4', 'b5', 'b6', 'b7', 'b8', 'b9']

# Create the scaler
scaler = StandardScaler()

# Fit on the full df (fitting on a single class only would bias the statistics)
scaler.fit(df[feature_cols])
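
As a quick sanity check of the fit above, the sketch below builds a small synthetic DataFrame and confirms that the transformed features end up with roughly zero mean and unit standard deviation. The random data, the seed, and the `demo_*` names are assumptions standing in for the real df.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

feature_cols = [f"b{i}" for i in range(1, 10)]  # b1 ... b9

# Synthetic stand-in for the real df (assumption: 300 rows, 3 classes)
rng = np.random.default_rng(0)
demo_df = pd.DataFrame(rng.normal(5.0, 2.0, size=(300, 9)), columns=feature_cols)
demo_df["type"] = rng.integers(0, 3, size=300)

# Fit on all rows, then transform the same rows
demo_scaler = StandardScaler().fit(demo_df[feature_cols])
demo_scaled = demo_scaler.transform(demo_df[feature_cols])

# StandardScaler divides by the population std (ddof=0), which is also NumPy's default
col_means = demo_scaled.mean(axis=0)
col_stds = demo_scaled.std(axis=0)
print(np.allclose(col_means, 0.0, atol=1e-9), np.allclose(col_stds, 1.0))
```

If the data is later split into training and test sets, a common practice is to fit the scaler on the training portion only and reuse it on the test portion.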

✅ Step 2: Standardize each dataset (df / df_type0 / df_type1 / df_type2)

python
# Write the standardized values back into a copy of the DataFrame (other columns such as type are kept)
df_scaled = df.copy()
df_scaled[feature_cols] = scaler.transform(df[feature_cols])

df_type0_scaled = df_type0.copy()
df_type0_scaled[feature_cols] = scaler.transform(df_type0[feature_cols])

df_type1_scaled = df_type1.copy()
df_type1_scaled[feature_cols] = scaler.transform(df_type1[feature_cols])

df_type2_scaled = df_type2.copy()
df_type2_scaled[feature_cols] = scaler.transform(df_type2[feature_cols])
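
The four copy-and-transform blocks above repeat the same pattern, so one possible way to shorten them is a small helper. `scale_frame` is a hypothetical name, and the synthetic `demo_df` is an assumption used only to make the sketch runnable.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

def scale_frame(frame, scaler, cols):
    """Return a copy of frame whose cols are standardized by an already fitted scaler."""
    scaled = frame.copy()
    scaled[cols] = scaler.transform(frame[cols])
    return scaled

feature_cols = [f"b{i}" for i in range(1, 10)]

# Synthetic stand-in for the real df (assumption)
rng = np.random.default_rng(0)
demo_df = pd.DataFrame(rng.normal(5.0, 2.0, size=(300, 9)), columns=feature_cols)
demo_df["type"] = rng.integers(0, 3, size=300)

demo_scaler = StandardScaler().fit(demo_df[feature_cols])

# Scale every per-class subset with the same scaler that was fitted on all rows
demo_scaled_by_type = {
    t: scale_frame(group, demo_scaler, feature_cols)
    for t, group in demo_df.groupby("type")
}
```

Because each subset is copied before being modified, the original `demo_df` stays unscaled.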

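✅ Step 3: Convert to NumPy arrays (ready to feed to the LSTM)

📌 The LSTM input format is `(batch_size, seq_len, feature_dim)` (with `batch_first=True`)

If each row is a single point (no sequence), you can reshape manually to add seq_len=1: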

python
import numpy as np

# Extract features and labels
X_all = df_scaled[feature_cols].to_numpy().reshape(-1, 1, len(feature_cols))  # [N, 1, 9]
y_all = df_scaled['type'].to_numpy().astype(int)

For each class:

python
X0 = df_type0_scaled[feature_cols].to_numpy().reshape(-1, 1, len(feature_cols))
y0 = df_type0_scaled['type'].to_numpy().astype(int)

X1 = df_type1_scaled[feature_cols].to_numpy().reshape(-1, 1, len(feature_cols))
y1 = df_type1_scaled['type'].to_numpy().astype(int)

X2 = df_type2_scaled[feature_cols].to_numpy().reshape(-1, 1, len(feature_cols))
y2 = df_type2_scaled['type'].to_numpy().astype(int)
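
The reshape can also be wrapped in one function so that every subset goes through the same conversion. This is only a sketch: `to_lstm_arrays` is a hypothetical helper and the small `demo_df` is made-up data.

```python
import numpy as np
import pandas as pd

def to_lstm_arrays(frame, cols, label_col="type"):
    """Convert a DataFrame to (X, y) with X shaped [N, 1, len(cols)] (seq_len=1)."""
    X = frame[cols].to_numpy(dtype=np.float32).reshape(-1, 1, len(cols))
    y = frame[label_col].to_numpy().astype(np.int64)
    return X, y

feature_cols = [f"b{i}" for i in range(1, 10)]

# Made-up data: 50 rows, 9 features, labels in {0, 1, 2}
rng = np.random.default_rng(1)
demo_df = pd.DataFrame(rng.normal(size=(50, 9)), columns=feature_cols)
demo_df["type"] = rng.integers(0, 3, size=50)

X_demo, y_demo = to_lstm_arrays(demo_df, feature_cols)
print(X_demo.shape, y_demo.shape)  # (50, 1, 9) (50,)
```

Each row of the DataFrame becomes one sample with a single time step, so `X_demo[i, 0]` is row `i` of the feature columns.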

✅ Step 4: Convert to a PyTorch Dataset (a format the LSTM can use)

python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Convert the full data into a tensor dataset
X_tensor = torch.tensor(X_all, dtype=torch.float32)
y_tensor = torch.tensor(y_all, dtype=torch.long)

dataset_all = TensorDataset(X_tensor, y_tensor)

# Separate datasets can also be built per class
dataset0 = TensorDataset(torch.tensor(X0, dtype=torch.float32), torch.tensor(y0, dtype=torch.long))
dataset1 = TensorDataset(torch.tensor(X1, dtype=torch.float32), torch.tensor(y1, dtype=torch.long))
dataset2 = TensorDataset(torch.tensor(X2, dtype=torch.float32), torch.tensor(y2, dtype=torch.long))

📦 Building the DataLoader (batched training)

python
loader_all = DataLoader(dataset_all, batch_size=32, shuffle=True)
loader0 = DataLoader(dataset0, batch_size=32, shuffle=True)
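
To see what a loader like the ones above yields, the following sketch pulls one batch and lists the batch sizes over a full pass. The random tensors and the `demo_*` names are assumptions replacing the real `X_all` / `y_all`.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Random stand-ins: 100 samples, seq_len=1, 9 features, 3 classes
demo_X = torch.randn(100, 1, 9)
demo_y = torch.randint(0, 3, (100,))
demo_loader = DataLoader(TensorDataset(demo_X, demo_y), batch_size=32, shuffle=True)

# One batch: features [batch, seq_len, feature_dim] and labels [batch]
xb, yb = next(iter(demo_loader))
print(xb.shape, yb.shape)  # torch.Size([32, 1, 9]) torch.Size([32])

# 100 samples with batch_size=32 give three full batches and a last batch of 4
batch_sizes = [len(labels) for _, labels in demo_loader]
print(batch_sizes)  # [32, 32, 32, 4]
```

If the smaller final batch is unwanted, `DataLoader` accepts `drop_last=True` to skip it.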

✅ Example model that takes this LSTM input format

python
class LSTMClassifier(torch.nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.lstm = torch.nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.fc = torch.nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        _, (hn, _) = self.lstm(x)  # hn: [1, batch, hidden]
        out = self.fc(hn.squeeze(0))
        return out

model = LSTMClassifier(input_dim=9, hidden_dim=32, output_dim=3)
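
A minimal sketch of one training step with this model follows. The class is repeated from above so the block runs on its own; the random batch, the choice of `CrossEntropyLoss` and `Adam`, and the learning rate are assumptions, not settings taken from the original note.

```python
import torch
from torch import nn

class LSTMClassifier(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        _, (hn, _) = self.lstm(x)        # hn: [1, batch, hidden]
        return self.fc(hn.squeeze(0))    # logits: [batch, output_dim]

torch.manual_seed(0)
model = LSTMClassifier(input_dim=9, hidden_dim=32, output_dim=3)

# Random stand-in for one batch from the DataLoader
xb = torch.randn(32, 1, 9)
yb = torch.randint(0, 3, (32,))

criterion = nn.CrossEntropyLoss()                       # expects raw logits and long labels
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

logits = model(xb)             # [32, 3]
loss = criterion(logits, yb)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(logits.shape, loss.item())
```

In a real run the same three optimizer calls would sit inside a loop over `loader_all` for several epochs.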

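🔍 A quick reminder:

If you need to handle real time series (several rows per group), first arrange the DataFrame by time into "sequence windows"; the example here treats each row as a single point (seq_len=1).
If you have continuous IMU data and want to turn every 20 rows into one sequence, you can use a sliding window.

Would you like me to add the logic for cutting each type's data into sliding windows of seq_len=100?

That is what truly matches how an LSTM is meant to be used.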
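
The sliding-window idea mentioned above can be sketched as follows. This assumes the rows are already in time order within one continuous recording, and that each window takes the label of its last row; both are assumptions, as are the `make_windows` name and the `stride` parameter.

```python
import numpy as np

def make_windows(features, labels, seq_len, stride=1):
    """Cut features [N, F] into windows X [M, seq_len, F]; y [M] is the label of each window's last row."""
    n = len(features)
    if n < seq_len:
        # Not enough rows for even one window
        return np.empty((0, seq_len, features.shape[1]), dtype=features.dtype), labels[:0]
    starts = np.arange(0, n - seq_len + 1, stride)
    X = np.stack([features[s:s + seq_len] for s in starts])
    y = labels[starts + seq_len - 1]
    return X, y

# Made-up data: 50 time-ordered rows, 9 features, a single class
demo_features = np.arange(50 * 9, dtype=np.float32).reshape(50, 9)
demo_labels = np.zeros(50, dtype=np.int64)

X_win, y_win = make_windows(demo_features, demo_labels, seq_len=20, stride=5)
print(X_win.shape, y_win.shape)  # (7, 20, 9) (7,)
```

Applying the function to each type (or each recording) separately keeps a window from spanning two unrelated segments. The resulting `X_win` already has the `(batch_size, seq_len, feature_dim)` layout, so no extra reshape is needed.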


Tuesday, April 29, 2025
