R1-2409743
Discussion
AI/ML for CSI compression
From Intel
Summary
Intel analyzes the impact of data distribution mismatch on the performance of AI/ML-based CSI compression, highlighting that the resulting performance loss is asymmetric across subarray configurations and depends on model complexity. The document presents three proposals: careful selection of synthetic data generation parameters for Direction C, specification of UE-side pre-processing to ensure distribution alignment between training and inference, and a requirement that any increase in AI/ML complexity be commensurate with the realized performance gains.
Position
Intel presents technical evidence that the performance loss caused by data distribution mismatch is asymmetric and depends on model complexity: training on 1x1 subarray data and testing on 4x1 subarray data leads to more than 10% loss in squared generalized cosine similarity (SGCS), whereas the reverse direction causes only marginal loss. For Direction C, they propose that RAN1 carefully select synthetic data generation parameters to avoid significant inference performance degradation. Intel further argues that UE-side pre-processing aspects, including the SVD vector calculation approach and phase/amplitude normalization, should be specified to guarantee that no data distribution mismatch arises between training and inference. Finally, they propose that any increase in AI/ML model complexity and power consumption relative to conventional PMI-based approaches be commensurate with the realized performance gains.
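For readers unfamiliar with the metric, the SGCS losses quoted above compare a target eigenvector with its reconstruction after compression. A minimal sketch of the SGCS computation as commonly used in 3GPP AI/ML CSI evaluations (the function and variable names here are illustrative, not taken from the contribution):

```python
import numpy as np

def sgcs(v_target, v_recon):
    """Squared generalized cosine similarity between two complex vectors."""
    num = np.abs(np.vdot(v_target, v_recon)) ** 2
    den = (np.linalg.norm(v_target) ** 2) * (np.linalg.norm(v_recon) ** 2)
    return num / den

# A perfect reconstruction gives SGCS = 1; the metric is insensitive to a
# global phase rotation, which is why phase normalization matters elsewhere.
v = np.array([1 + 1j, 0.5 - 0.2j, -0.3 + 0.7j])
print(round(sgcs(v, v * np.exp(1j * 0.8)), 6))  # -> 1.0
```

A value of 1 means perfect alignment, so a ">10% SGCS loss" corresponds to the average metric dropping by more than 0.1 relative to the matched-distribution baseline.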
Key proposals
- Proposal 1 (Discussion): For Direction C, RAN1 should carefully select scenarios and parameters for synthetic data generation to avoid significant performance loss due to misaligned data distribution in inference.
- Proposal 1 (Discussion): The dependency of performance loss on AI/ML model complexity should be investigated when evaluating misalignment in data distributions for training and inference.
- Proposal 2 (Discussion): RAN1 should assume that aspects related to data pre-processing, such as SVD vector calculation approach and phase/amplitude normalization, can be specified to ensure no mismatch of data distributions for training and inference.
- Proposal 3 (Discussion): For the specification of reference AI/ML models, the increase in complexity and associated power consumption compared to conventional PMI-based approaches should be commensurate with the performance gains.
- Observation 1 (Discussion): Performance loss for training on Dataset S and testing on Dataset B can be significantly different from that for training on Dataset B and testing on Dataset S, particularly when comparing 4x1 and 1x1 subarray configurations.
- Observation 1 (Discussion): Performance loss due to mismatch of data distributions for training and inference depends on AI/ML model complexity, with lower-complexity models showing slightly larger losses.
- Proposal 1 (Conclusion): RAN1 should consider the careful selection of scenarios and parameters for synthetic data generation in Direction C to mitigate performance loss from misaligned data distributions.
- Proposal 2 (Conclusion): It is assumed that data pre-processing aspects like SVD vector calculation and normalization can be specified to eliminate data distribution mismatch between training and inference.
- Proposal 3 (Conclusion): The specification of reference AI/ML models must ensure that complexity and power consumption increases are justified by corresponding performance improvements over conventional methods.
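The UE-side pre-processing that Proposal 2 asks RAN1 to specify covers two steps: per-subband dominant-eigenvector (SVD) calculation and phase/amplitude normalization. A hedged sketch of what such a specified pipeline could look like, assuming a per-subband MIMO channel tensor; the shapes and names are assumptions for illustration, not from the contribution:

```python
import numpy as np

def preprocess_channel(H):
    """H: (n_subbands, n_rx, n_tx) complex channel matrices.

    Returns (n_subbands, n_tx) normalized dominant right-singular vectors,
    i.e., the per-subband precoding eigenvectors fed to the CSI encoder.
    """
    out = []
    for Hk in H:
        # Dominant right-singular vector of Hk (eigenvector of Hk^H Hk).
        _, _, Vh = np.linalg.svd(Hk)
        v = Vh[0].conj()
        # Amplitude normalization: unit norm (SVD output already has unit
        # norm; kept explicit since the normalization rule would be specified).
        v = v / np.linalg.norm(v)
        # Phase normalization: rotate so the first element is real and
        # non-negative, removing the arbitrary global phase of the SVD.
        v = v * np.exp(-1j * np.angle(v[0]))
        out.append(v)
    return np.stack(out)

rng = np.random.default_rng(0)
H = rng.standard_normal((4, 2, 8)) + 1j * rng.standard_normal((4, 2, 8))
V = preprocess_channel(H)
print(V.shape)  # -> (4, 8)
```

Pinning down both conventions matters for Intel's argument: two implementations that differ only in SVD sign/phase convention would otherwise produce differently distributed encoder inputs, reintroducing exactly the training/inference mismatch the proposal aims to eliminate.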