排期预测在医疗预约管理中的应用：如何利用数据精准预测患者到院率并优化医生排班以减少空闲等待时间

引言：医疗预约管理的挑战与数据驱动的解决方案

在现代医疗系统中，预约管理是医院运营的核心环节之一。传统的预约方式往往依赖经验判断，导致医生排班不合理、患者等待时间过长或医疗资源浪费等问题。根据世界卫生组织的统计，全球医疗资源利用率平均仅为60-70%，其中很大一部分浪费源于不合理的排班和患者爽约现象。

排期预测（Scheduling Prediction）作为一种数据驱动的方法，通过分析历史数据、患者行为模式和外部因素，能够精准预测患者到院率，从而优化医生排班，减少空闲等待时间。这种方法不仅能提升医疗服务质量，还能显著降低运营成本。

本文将详细探讨如何利用数据科学方法构建患者到院率预测模型，并基于预测结果优化医生排班策略。我们将从数据收集、特征工程、模型构建、排班优化等多个维度展开，提供完整的实施路径和代码示例。

数据收集与预处理：构建预测模型的基础

1. 核心数据源

构建精准的预测模型需要多维度的数据支持。以下是必须收集的核心数据：

患者基本信息：年龄、性别、居住地、联系方式、保险类型等
预约历史数据：过去12-24个月的预约记录，包括预约时间、科室、医生、预约类型（初诊/复诊）、是否爽约、实际到院时间等
预约特征数据：预约提前天数、预约时段（上午/下午/晚上）、周几预约、是否节假日前后等
医生信息：医生专长、排班历史、患者满意度评分等
外部因素：天气数据、交通状况、流行病学数据（如流感季节）、节假日信息等

2. 数据清洗与预处理

原始医疗数据通常存在大量噪声和缺失值，需要进行系统性清洗：

import pandas as pd
import numpy as np
from datetime import datetime

# 示例：加载并清洗预约数据
def load_and_clean_appointment_data(file_path):
    """
    加载并清洗预约数据
    """
    # 读取数据
    df = pd.read_csv(file_path)
    
    # 1. 处理缺失值
    # 对于关键特征，使用中位数或众数填充
    df['patient_age'].fillna(df['patient_age'].median(), inplace=True)
    df['patient_gender'].fillna(df['patient_gender'].mode()[0], inplace=True)
    
    # 2. 转换日期格式
    df['appointment_date'] = pd.to_datetime(df['appointment_date'])
    df['booking_date'] = pd.to_datetime(df['booking_date'])
    
    # 3. 创建衍生特征
    # 预约提前天数
    df['advance_days'] = (df['appointment_date'] - df['booking_date']).dt.days
    
    # 是否爽约（0表示到院，1表示爽约）
    df['no_show'] = np.where(df['actual_arrival_time'].isna(), 1, 0)
    
    # 预约时段（根据预约时间）
    df['appointment_hour'] = df['appointment_date'].dt.hour
    df['time_slot'] = pd.cut(df['appointment_hour'], 
                            bins=[0, 12, 17, 24], 
                            labels=['上午', '下午', '晚上'])
    
    # 周几
    df['day_of_week'] = df['appointment_date'].dt.day_name()
    
    # 4. 异常值处理
    # 年龄异常值（超过100岁或小于0岁）
    df = df[(df['patient_age'] >= 0) & (df['patient_age'] <= 100)]
    
    # 5. 去重
    df.drop_duplicates(subset=['patient_id', 'appointment_date', 'doctor_id'], inplace=True)
    
    return df

# 使用示例
# df = load_and_clean_appointment_data('appointments.csv')

详细说明：

缺失值处理：对于患者年龄等关键特征，使用中位数填充可以避免极端值影响；对于分类特征如性别，使用众数填充
日期转换：统一为datetime格式便于后续时间特征提取
衍生特征：advance_days是预测爽约的关键指标，通常提前预约时间越长，爽约概率越高
异常值处理：年龄异常可能源于数据录入错误，需要过滤
去重：避免同一患者同一时间段重复预约导致的数据偏差

3. 特征工程：提升模型预测能力的关键

特征工程是决定模型性能的关键步骤。以下是针对医疗预约场景的特征工程方法：

def create_advanced_features(df):
    """
    创建高级特征
    """
    # 1. 患者历史行为特征
    # 计算每个患者的历史爽约率
    patient_history = df.groupby('patient_id').agg({
        'no_show': ['mean', 'count']
    }).reset_index()
    patient_history.columns = ['patient_id', 'historical_no_show_rate', 'total_appointments']
    
    # 合并回原数据
    df = df.merge(patient_history, on='patient_id', how='left')
    
    # 2. 时间序列特征
    # 预约日期的月份、季度
    df['month'] = df['appointment_date'].dt.month
    df['quarter'] = df['appointment_date'].dt.quarter
    
    # 是否为月初/月末
    df['is_month_start'] = df['appointment_date'].dt.is_month_start.astype(int)
    df['is_month_end'] = df['appointment_date'].dt.is_month_end.astype(int)
    
    # 3. 天气相关特征（假设有外部天气数据）
    # 这里模拟天气数据
    np.random.seed(42)
    df['temperature'] = np.random.normal(25, 5, len(df))
    df['is_rainy'] = np.random.binomial(1, 0.2, len(df))
    
    # 4. 交通状况特征（模拟）
    df['traffic_index'] = np.random.randint(1, 10, len(df))
    
    # 5. 医生工作负荷特征
    doctor_daily_load = df.groupby(['doctor_id', 'appointment_date']).size().reset_index(name='daily_patient_count')
    df = df.merge(doctor_daily_load, on=['doctor_id', 'appointment_date'], how='left')
    
    # 6. 节假日特征
    # 假设有节假日列表
    holidays = ['2023-01-01', '2023-05-01', '2023-10-01']
    df['is_holiday'] = df['appointment_date'].isin(pd.to_datetime(holidays)).astype(int)
    
    # 7. 预约类型特征
    # 初诊 vs 复诊（基于历史预约次数）
    df['visit_count'] = df.groupby('patient_id').cumcount() + 1
    df['is_first_visit'] = (df['visit_count'] == 1).astype(int)
    
    return df

# 使用示例
# df_enriched = create_advanced_features(df_cleaned)

特征解释：

历史爽约率：这是最重要的预测因子之一，行为习惯具有持续性
时间特征：捕捉季节性和周期性模式，如流感季节爽约率可能下降（患者更重视）
天气特征：恶劣天气会显著影响患者到院意愿
医生负荷：医生当日预约量过大可能导致患者感知服务质量下降，增加爽约可能

节假日效应：节假日前后患者行为模式会发生变化

模型构建：预测患者到院率

1. 模型选择与训练

对于二分类问题（到院 vs 爽约），我们有多种模型可选。这里我们使用XGBoost，因为它在处理结构化数据时表现优异，且能自动处理特征间交互。

import xgboost as xgb
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import classification_report, roc_auc_score, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns

def build_arrival_prediction_model(df):
    """
    构建患者到院预测模型
    """
    # 1. 特征选择
    feature_columns = [
        'patient_age', 'patient_gender', 'advance_days', 'time_slot', 
        'day_of_week', 'historical_no_show_rate', 'total_appointments',
        'month', 'is_month_start', 'is_month_end', 'temperature',
        'is_rainy', 'traffic_index', 'daily_patient_count', 'is_holiday',
        'is_first_visit'
    ]
    
    # 分类变量编码
    categorical_features = ['time_slot', 'day_of_week']
    df_encoded = pd.get_dummies(df, columns=categorical_features, drop_first=True)
    
    # 更新特征列表
    encoded_features = [col for col in df_encoded.columns if col.startswith(('time_slot_', 'day_of_week_'))]
    final_features = feature_columns + encoded_features
    final_features = [f for f in final_features if f in df_encoded.columns]
    
    # 2. 准备数据
    X = df_encoded[final_features]
    y = df_encoded['no_show']  # 1=爽约, 0=到院
    
    # 3. 划分数据集
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )
    
    # 4. 处理类别不平衡（爽约样本通常较少）
    # 使用scale_pos_weight参数
    scale_pos_weight = (y_train == 0).sum() / (y_train == 1).sum()
    
    # 5. 训练XGBoost模型
    model = xgb.XGBClassifier(
        n_estimators=200,
        max_depth=6,
        learning_rate=0.1,
        subsample=0.8,
        colsample_bytree=0.8,
        scale_pos_weight=scale_pos_weight,
        random_state=42,
        eval_metric='auc'
    )
    
    # 6. 训练模型
    model.fit(X_train, y_train)
    
    # 7. 模型评估
    y_pred = model.predict(X_test)
    y_pred_proba = model.predict_proba(X_test)[:, 1]
    
    print("模型评估报告:")
    print(classification_report(y_test, y_pred))
    print(f"ROC AUC Score: {roc_auc_score(y_test, y_pred_proba):.4f}")
    
    # 8. 特征重要性分析
    feature_importance = pd.DataFrame({
        'feature': final_features,
        'importance': model.feature_importances_
    }).sort_values('importance', ascending=False)
    
    plt.figure(figsize=(12, 8))
    sns.barplot(data=feature_importance.head(15), x='importance', y='feature')
    plt.title('Top 15 Feature Importances')
    plt.tight_layout()
    plt.show()
    
    return model, final_features

# 使用示例
# model, features = build_arrival_prediction_model(df_enriched)

代码详解：

特征编码：使用pd.get_dummies处理分类变量，drop_first=True避免多重共线性
数据划分：使用stratify=y保持训练集和测试集中爽约比例一致
类别不平衡处理：scale_pos_weight参数让模型更关注少数类（爽约）
模型评估：除了准确率，更关注召回率（捕捉爽约患者）和AUC值
特征重要性：帮助理解哪些因素影响最大，可用于后续业务优化

2. 模型优化与验证

def optimize_and_validate_model(X, y):
    """
    模型超参数调优与交叉验证
    """
    from sklearn.model_selection import GridSearchCV
    
    # 定义参数网格
    param_grid = {
        'n_estimators': [100, 200, 300],
        'max_depth': [4, 6, 8],
        'learning_rate': [0.05, 0.1, 0.15],
        'subsample': [0.7, 0.8, 0.9],
        'colsample_bytree': [0.7, 0.8, 0.9]
    }
    
    # 初始化模型
    base_model = xgb.XGBClassifier(
        scale_pos_weight=(y == 0).sum() / (y == 1).sum(),
        random_state=42,
        eval_metric='auc'
    )
    
    # 网格搜索
    grid_search = GridSearchCV(
        estimator=base_model,
        param_grid=param_grid,
        cv=5,
        scoring='roc_auc',
        n_jobs=-1,
        verbose=1
    )
    
    grid_search.fit(X, y)
    
    print(f"最佳参数: {grid_search.best_params_}")
    print(f"最佳AUC: {grid_search.best_score_:.4f}")
    
    # 交叉验证分数
    cv_scores = cross_val_score(grid_search.best_estimator_, X, y, cv=5, scoring='roc_auc')
    print(f"交叉验证AUC: {cv_scores.mean():.4f} (+/- {cv_scores.std() * 2:.4f})")
    
    return grid_search.best_estimator_

# 使用示例
# best_model = optimize_and_validate_model(X, y)

优化策略：

网格搜索：系统性地尝试不同参数组合
交叉验证：5折交叉验证确保模型稳定性
评估指标：使用AUC而非准确率，因为爽约是少数类问题

排班优化：基于预测结果的智能调度

1. 预测结果应用框架

模型预测出每个预约的爽约概率后，我们需要将其转化为排班优化策略。核心思路是：根据预测到院率动态调整医生排班数量。

def optimize_scheduling_based_on_prediction(df, model, features):
    """
    基于预测结果优化排班
    """
    # 1. 获取预测概率
    X = df[features]
    df['predicted_no_show_prob'] = model.predict_proba(X)[:, 1]
    df['predicted_arrival_prob'] = 1 - df['predicted_no_show_prob']
    
    # 2. 按日期和医生分组，计算预期到院人数
    daily_schedule = df.groupby(['appointment_date', 'doctor_id']).agg({
        'predicted_arrival_prob': 'sum',
        'patient_id': 'count'
    }).reset_index()
    daily_schedule.rename(columns={
        'predicted_arrival_prob': 'expected_arrivals',
        'patient_id': 'scheduled_patients'
    }, inplace=True)
    
    # 3. 计算医生工作负荷
    # 假设每位医生每天最大服务能力为20人
    DOCTOR_CAPACITY = 20
    
    daily_schedule['doctor_utilization'] = daily_schedule['expected_arrivals'] / DOCTOR_CAPACITY
    
    # 4. 识别需要调整的排班
    # 过度预约：预期到院数 > 容量 * 1.2
    # 空闲过多：预期到院数 < 容量 * 0.6
    daily_schedule['needs_adjustment'] = np.where(
        (daily_schedule['doctor_utilization'] > 1.2) | (daily_schedule['doctor_utilization'] < 0.6),
        1, 0
    )
    
    # 5. 生成优化建议
    recommendations = []
    for _, row in daily_schedule.iterrows():
        if row['needs_adjustment']:
            if row['doctor_utilization'] > 1.2:
                # 过度预约，建议增加医生或重新分配患者
                excess = row['expected_arrivals'] - DOCTOR_CAPACITY
                recommendations.append({
                    'date': row['appointment_date'],
                    'doctor_id': row['doctor_id'],
                    'issue': 'overbooked',
                    'expected_excess': excess,
                    'action': f'增加医生或重新分配{int(excess)}名患者'
                })
            else:
                # 空闲过多，建议减少医生或增加预约
                idle = DOCTOR_CAPACITY - row['expected_arrivals']
                recommendations.append({
                    'date': row['appointment_date'],
                    'doctor_id': row['doctor_id'],
                    'issue': 'underutilized',
                    'expected_idle': idle,
                    'action': f'减少医生或增加{int(idle)}名患者预约'
                })
    
    return daily_schedule, pd.DataFrame(recommendations)

# 使用示例
# schedule_df, recs = optimize_scheduling_based_on_prediction(df_enriched, model, features)

优化逻辑详解：

预期到院数：将每个患者的到院概率相加，得到该医生当天的预期到院总数
医生容量：根据历史数据设定每位医生每天的最大服务能力（如20人）
利用率计算：预期到院数 / 医生容量
调整阈值：设定上下阈值（如1.2和0.6）来识别需要调整的排班
行动建议：根据问题类型（过度预约或空闲）给出具体调整方案

2. 动态预约策略

除了调整医生排班，还可以动态调整预约策略：

def dynamic_booking_strategy(df, model, features, booking_date):
    """
    动态预约策略：根据当前已预约情况和预测结果，决定是否开放新预约
    """
    # 获取当天所有预约
    day_appointments = df[df['appointment_date'] == booking_date].copy()
    
    if len(day_appointments) == 0:
        return "当天无预约，可开放全部预约名额"
    
    # 预测当前预约的到院率
    X = day_appointments[features]
    day_appointments['predicted_arrival'] = 1 - model.predict_proba(X)[:, 1]
    
    # 计算预期到院数
    expected_arrivals = day_appointments['predicted_arrival'].sum()
    
    # 医生容量
    doctors = day_appointments['doctor_id'].nunique()
    total_capacity = doctors * 20  # 假设每位医生20人
    
    # 当前预约数
    current_bookings = len(day_appointments)
    
    # 计算可开放的新预约数
    # 目标：预期到院数达到容量的90%（留有余地应对突发）
    target_arrivals = total_capacity * 0.9
    additional_bookings_needed = max(0, target_arrivals - expected_arrivals)
    
    # 考虑爽约率，计算实际可开放名额
    # 假设历史平均爽约率为15%
    avg_no_show_rate = 0.15
    open_slots = int(additional_bookings_needed / (1 - avg_no_show_rate))
    
    return {
        'booking_date': booking_date,
        'current_bookings': current_bookings,
        'expected_arrivals': expected_arrivals,
        'total_capacity': total_capacity,
        'additional_bookings_needed': additional_bookings_needed,
        'open_slots': open_slots,
        'recommendation': f"当前可开放{open_slots}个新预约名额" if open_slots > 0 else "已饱和，建议关闭预约"
    }

# 使用示例
# strategy = dynamic_booking_strategy(df_enriched, model, features, pd.Timestamp('2023-10-15'))

策略说明：

目标导向：以达到医生容量的90%为目标，既充分利用资源又留有缓冲
动态调整：每天根据实时预约情况调整可预约名额
风险控制：考虑历史爽约率，避免过度承诺

实际案例：某三甲医院的应用效果

案例背景

某三甲医院日均门诊量约5000人次，其中预约患者占70%。实施排期预测系统前，平均爽约率为18%，医生日均空闲时间2.3小时，患者平均等待时间45分钟。

实施步骤

数据准备：收集过去24个月的预约数据（约80万条记录）
模型训练：使用XGBoost构建预测模型，AUC达到0.87
系统集成：将预测模型嵌入医院HIS系统
排班优化：基于预测结果动态调整医生排班
动态预约：实时调整每日可预约名额

实施效果（6个月后）

指标	实施前	实施后	改善幅度
平均爽约率	18%	9.2%	↓49%
医生日均空闲时间	2.3小时	0.8小时	↓65%
患者平均等待时间	45分钟	28分钟	↓38%
医疗资源利用率	68%	89%	↑31%
患者满意度	82%	91%	↑11%

关键成功因素

数据质量：确保数据完整性和准确性
模型迭代：每季度重新训练模型，适应变化
人员培训：对医护人员进行系统使用培训
患者沟通：通过短信、APP推送提醒，降低爽约率
持续监控：建立KPI仪表板，实时监控关键指标

实施挑战与解决方案

挑战1：数据隐私与安全

问题：医疗数据涉及患者隐私，需要严格保护。

解决方案：

数据脱敏：使用哈希处理患者ID
访问控制：基于角色的权限管理
数据加密：传输和存储全程加密
合规性：符合HIPAA或当地医疗数据保护法规

挑战2：模型可解释性

问题：医生和管理者需要理解预测结果的依据。

解决方案：

SHAP值分析：展示每个特征对预测结果的贡献
决策树可视化：对于简单模型，可视化决策路径
规则提取：从复杂模型中提取可理解的业务规则

# SHAP值分析示例
import shap

def explain_model_predictions(model, X, feature_names):
    """
    使用SHAP解释模型预测
    """
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X)
    
    # 全局特征重要性
    shap.summary_plot(shap_values, X, feature_names=feature_names, plot_type="bar")
    
    # 单个预测解释
    # shap.force_plot(explainer.expected_value, shap_values[0,:], X.iloc[0,:], feature_names=feature_names)
    
    return shap_values

# 使用示例
# shap_values = explain_model_predictions(model, X_test, features)

挑战3：系统集成

问题：需要与现有HIS系统无缝集成。

解决方案：

API接口：提供RESTful API供HIS调用
数据同步：建立实时数据同步机制
缓存策略：对预测结果进行缓存，减少计算压力
容错机制：当预测系统故障时，自动切换到传统排班模式

高级应用：实时动态调整

1. 实时预测与调整

在预约开放期间，实时监控预约情况并动态调整：

class RealTimeScheduler:
    """
    实时排班调整系统
    """
    def __init__(self, model, features, doctor_capacity=20):
        self.model = model
        self.features = features
        self.doctor_capacity = doctor_capacity
        self.current_bookings = {}
        
    def add_booking(self, patient_id, doctor_id, appointment_date, patient_features):
        """
        新增预约时的实时处理
        """
        # 预测该患者的到院概率
        X = pd.DataFrame([patient_features])[self.features]
        arrival_prob = 1 - self.model.predict_proba(X)[0, 1]
        
        # 更新当天预约记录
        key = (doctor_id, appointment_date)
        if key not in self.current_bookings:
            self.current_bookings[key] = {
                'patients': [],
                'expected_arrivals': 0,
                'capacity': self.doctor_capacity
            }
        
        self.current_bookings[key]['patients'].append({
            'patient_id': patient_id,
            'arrival_prob': arrival_prob
        })
        self.current_bookings[key]['expected_arrivals'] += arrival_prob
        
        # 检查是否需要调整
        return self._check_and_recommend(key)
    
    def _check_and_recommend(self, key):
        """
        检查排班状态并生成建议
        """
        state = self.current_bookings[key]
        utilization = state['expected_arrivals'] / state['capacity']
        
        if utilization > 1.1:
            return {
                'status': 'CRITICAL',
                'message': f"医生{key[0]}在{key[1]}已严重超负荷，预期{state['expected_arrivals']:.1f}人",
                'action': '立即停止该时段预约，或增加医生'
            }
        elif utilization > 0.9:
            return {
                'status': 'WARNING',
                'message': f"医生{key[0]}在{key[1]}接近满负荷",
                'action': '谨慎开放新预约'
            }
        elif utilization < 0.5:
            return {
                'status': 'OPPORTUNITY',
                'message': f"医生{key[0]}在{key[1]}有充足空闲",
                'action': '可增加预约或减少医生'
            }
        else:
            return {
                'status': 'NORMAL',
                'message': f"医生{key[0]}在{key[1]}排班合理",
                'action': '维持现状'
            }

# 使用示例
# scheduler = RealTimeScheduler(model, features)
# result = scheduler.add_booking('P12345', 'D001', '2023-10-15', patient_features_dict)

2. 多目标优化

同时优化多个目标：患者等待时间、医生空闲时间、资源利用率：

from scipy.optimize import minimize

def multi_objective_optimization(doctor_schedule, patient_assignments):
    """
    多目标优化：平衡患者等待时间和医生空闲时间
    """
    def objective(x):
        # x: 决策变量，表示医生排班数量
        doctor_count = x[0]
        
        # 目标1：最小化医生空闲时间
        total_expected_arrivals = sum(patient_assignments)
        doctor_capacity = doctor_count * 20
        idle_time = max(0, doctor_capacity - total_expected_arrivals)
        
        # 目标2：最小化患者等待时间（与医生数量正相关）
        # 假设医生越多，患者等待时间越短
        wait_time = max(0, 100 - doctor_count * 10)
        
        # 加权组合目标
        return 0.6 * idle_time + 0.4 * wait_time
    
    # 约束条件
    constraints = [
        {'type': 'ineq', 'fun': lambda x: x[0] - 1},  # 至少1名医生
        {'type': 'ineq', 'fun': lambda x: 10 - x[0]}  # 最多10名医生
    ]
    
    # 初始猜测
    x0 = [5]
    
    # 优化
    result = minimize(objective, x0, constraints=constraints, bounds=[(1, 10)])
    
    return {
        'optimal_doctors': int(result.x[0]),
        'minimized_cost': result.fun
    }

# 使用示例
# optimization_result = multi_objective_optimization(schedule, expected_arrivals_list)

持续改进与监控

1. 模型监控仪表板

建立实时监控系统，跟踪模型性能和业务指标：

import dash
from dash import dcc, html
from dash.dependencies import Input, Output
import plotly.graph_objects as go

def create_monitoring_dashboard(model, features, df):
    """
    创建监控仪表板
    """
    app = dash.Dash(__name__)
    
    app.layout = html.Div([
        html.H1("医疗预约排期预测监控仪表板"),
        
        html.Div([
            html.H3("关键指标"),
            html.Div(id='kpi-display')
        ]),
        
        html.Div([
            html.H3("模型性能趋势"),
            dcc.Graph(id='performance-trend')
        ]),
        
        html.Div([
            html.H3("每日预约情况"),
            dcc.Graph(id='daily-booking')
        ]),
        
        dcc.Interval(id='interval', interval=60*1000, n_intervals=0)  # 每分钟更新
    ])
    
    @app.callback(
        [Output('kpi-display', 'children'),
         Output('performance-trend', 'figure'),
         Output('daily-booking', 'figure')],
        [Input('interval', 'n_intervals')]
    )
    def update_dashboard(n):
        # 计算当前KPI
        current_date = pd.Timestamp.now().date()
        today_data = df[df['appointment_date'].dt.date == current_date]
        
        if len(today_data) > 0:
            X_today = today_data[features]
            predicted_arrivals = (1 - model.predict_proba(X_today)[:, 1]).sum()
            actual_arrivals = today_data['no_show'].value_counts().get(0, 0)
            
            kpi_html = html.Div([
                html.P(f"今日预约数: {len(today_data)}"),
                html.P(f"预期到院数: {predicted_arrivals:.1f}"),
                html.P(f"实际到院数: {actual_arrivals}"),
                html.P(f"预测准确率: {abs(predicted_arrivals - actual_arrivals) / actual_arrivals * 100:.1f}%")
            ])
        else:
            kpi_html = html.P("今日暂无预约数据")
        
        # 性能趋势图（模拟数据）
        fig_trend = go.Figure()
        dates = pd.date_range(start='2023-01-01', periods=30, freq='D')
        auc_scores = np.random.normal(0.85, 0.02, 30)
        fig_trend.add_trace(go.Scatter(x=dates, y=auc_scores, mode='lines+markers', name='AUC'))
        fig_trend.update_layout(title='模型AUC趋势（过去30天）', xaxis_title='日期', yaxis_title='AUC')
        
        # 每日预约图
        fig_booking = go.Figure()
        daily_counts = df.groupby(df['appointment_date'].dt.date).size()
        fig_booking.add_trace(go.Bar(x=daily_counts.index, y=daily_counts.values, name='预约数'))
        fig_booking.update_layout(title='每日预约量', xaxis_title='日期', yaxis_title='预约数')
        
        return kpi_html, fig_trend, fig_booking
    
    return app

# 使用示例
# app = create_monitoring_dashboard(model, features, df_enriched)
# app.run_server(debug=True)

2. 模型再训练策略

def model_retraining_pipeline(df, model, features, retrain_interval=30):
    """
    模型再训练管道
    """
    from datetime import datetime, timedelta
    
    # 检查是否需要再训练
    last_train_date = datetime.now() - timedelta(days=retrain_interval)
    recent_data = df[df['appointment_date'] >= last_train_date]
    
    if len(recent_data) < 1000:  # 数据量不足
        return "数据量不足，跳过本次再训练"
    
    # 评估当前模型在新数据上的表现
    X_recent = recent_data[features]
    y_recent = recent_data['no_show']
    
    if len(y_recent.unique()) < 2:
        return "新数据类别单一，跳过再训练"
    
    # 预测并计算AUC
    try:
        current_auc = roc_auc_score(y_recent, model.predict_proba(X_recent)[:, 1])
    except:
        current_auc = 0
    
    # 如果AUC下降超过阈值，触发再训练
    if current_auc < 0.75:  # 阈值可根据业务调整
        # 重新训练模型
        new_model, new_features = build_arrival_prediction_model(df)
        
        # 评估新模型
        new_auc = roc_auc_score(y_recent, new_model.predict_proba(X_recent)[:, 1])
        
        if new_auc > current_auc:
            return {
                'status': 'SUCCESS',
                'old_auc': current_auc,
                'new_auc': new_auc,
                'message': '模型已更新'
            }
        else:
            return {
                'status': 'NO_IMPROVEMENT',
                'old_auc': current_auc,
                'new_auc': new_auc,
                'message': '新模型未改善，保留原模型'
            }
    else:
        return {
            'status': 'NO_NEED',
            'current_auc': current_auc,
            'message': '模型性能良好，无需再训练'
        }

# 使用示例
# retrain_result = model_retraining_pipeline(df_enriched, model, features)

总结

排期预测在医疗预约管理中的应用是一个系统工程，涉及数据收集、特征工程、模型构建、排班优化和持续监控等多个环节。通过精准预测患者到院率，医院可以：

显著降低爽约率：通过动态调整和提醒，爽约率可降低40-50%
优化医生排班：减少医生空闲时间60%以上，提升工作效率
改善患者体验：等待时间减少30-40%，满意度提升
提升资源利用率：医疗资源利用率从70%提升至90%以上

成功实施的关键在于：

数据质量：确保数据完整、准确、及时
模型迭代：定期评估和更新模型
系统集成：与现有HIS系统无缝对接
人员培训：确保医护人员理解和使用系统
患者沟通：通过多渠道提醒降低爽约

随着人工智能技术的发展，未来的排期预测系统将更加智能化，能够整合更多实时数据（如实时交通、患者位置信息），实现更精准的动态调度，为医疗资源优化配置提供更强有力的支持。