Data Curation for AIOps: The Key to Smarter Network Operations

Interest in AIOps is rising across the IT and communications sector. A forecast from Mordor Intelligence shows the AIOps market could grow from 27.24 billion dollars in 2024 to 79.91 billion dollars by 2029. As networks expand, operators face new challenges. Today’s 5G systems generate huge amounts of data. A network with ten million users can create up to nine petabytes of data every day. Without proper structure, this information becomes difficult to analyze in real time.

Historically, operators have stored large volumes of raw data in data lakes. However, this information often arrives in non-standard formats. As a result, teams spend significant time and resources identifying what is relevant. Raw packet data and call detail records (CDRs) are often too detailed and noisy for immediate AI analysis. Although AI tools can handle large datasets, the quality of their output depends heavily on the quality of the data they are given. Processing raw data at scale also becomes costly.

Transforming Data at the Source

To achieve true AI-driven operations, operators need smart monitoring and strong data curation practices. They also require efficient pipelines that prepare data across RAN, Core, Transport and MEC layers. These pipelines should normalize, enrich and label information at the source. This reduces noise and improves the quality of data sent to the AIOps environment.

Pipelines can also enforce data protection rules by removing or anonymizing sensitive fields. Telecom expertise remains crucial. Domain specialists must identify the right data, validate it and label it for model training. Human engineers still design, train and refine AI models. Their feedback corrects errors and ensures accurate results.
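As a rough illustration of the steps described above, the following Python sketch normalizes, enriches, labels and anonymizes a single record. Everything in it is an assumption made for the example: the field names, the NODE_LAYER lookup, the 50 ms labeling threshold and the keyed-hash pseudonymization are hypothetical and do not describe any particular operator's or vendor's pipeline.

```python
import hashlib
import hmac

# Hypothetical key for keyed pseudonymization. In practice it would come
# from a secrets manager, never from source code.
PSEUDONYM_KEY = b"example-key-not-for-production"

# Hypothetical lookup used to enrich a record with its network layer.
NODE_LAYER = {"gnb-101": "RAN", "upf-7": "Core"}


def pseudonymize(value: str) -> str:
    """Replace a subscriber identifier with a stable keyed hash."""
    digest = hmac.new(PSEUDONYM_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def curate(raw: dict) -> dict:
    """Normalize, enrich, label and anonymize one raw event record."""
    latency_ms = float(raw["latency_us"]) / 1000.0   # normalize units
    node = raw["node"].strip().lower()                # normalize naming
    return {
        "subscriber": pseudonymize(raw["imsi"]),       # anonymize
        "node": node,
        "layer": NODE_LAYER.get(node, "unknown"),      # enrich
        "latency_ms": latency_ms,
        # Label with an assumed 50 ms threshold for illustration only.
        "label": "degraded" if latency_ms > 50.0 else "normal",
    }


record = curate({
    "imsi": "001010123456789",
    "node": " GNB-101 ",
    "latency_us": "72500",
    "payload": "...",  # bulky raw field that is dropped by curation
})
print(record)
```

The keyed hash keeps the same subscriber mapped to the same pseudonym, so records can still be correlated downstream without exposing the original identifier. A domain specialist would still need to decide which fields to keep and how to label them.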

Tokenization: Moving From Raw to Curated Data

A single raw event record in a 5G network can contain up to 180 tokens. Through proper curation, this number can be reduced to about 25 tokens, a reduction of roughly 86 percent. Fewer tokens also mean lower GPU use and lower processing costs, especially in cloud environments such as Amazon Bedrock.
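The arithmetic behind these token counts can be checked with a few lines of Python. Going from 180 to 25 tokens works out to a reduction of about 86 percent. The daily event count used below is a hypothetical figure chosen only to show how the saving scales; it does not come from the article.

```python
def reduction_pct(raw_tokens: int, curated_tokens: int) -> float:
    """Percentage of tokens removed by curation."""
    return 100.0 * (raw_tokens - curated_tokens) / raw_tokens


RAW_TOKENS = 180            # tokens in one raw event record
CURATED_TOKENS = 25         # tokens after curation
EVENTS_PER_DAY = 1_000_000  # hypothetical volume, for illustration only

pct = reduction_pct(RAW_TOKENS, CURATED_TOKENS)
tokens_saved = (RAW_TOKENS - CURATED_TOKENS) * EVENTS_PER_DAY

print(f"{pct:.1f}% fewer tokens per record")        # 86.1% fewer tokens per record
print(f"{tokens_saved:,} tokens saved per day")     # 155,000,000 tokens saved per day
```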

Curation not only saves compute resources but also improves result quality. It reduces the need for large data lake storage while maintaining analytical value. After curation, operators can merge the data with subscriber demographics, infrastructure metrics, geospatial information and even social media insights. With the right domain knowledge, AI models can generate a complete view of network conditions and user experience.

Packet-Level Precision for Better Insights

Curated datasets support specific use cases. Deep Packet Inspection (DPI) gives precise visibility into what was sent, when events occurred and how each system responded. By combining this with control plane metadata and identifiers such as IMSI or SUPI, operators can measure performance at cell, slice, device or subscriber levels. This precision helps train AI models with a clear understanding of network behavior.
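A minimal sketch of this kind of correlation, assuming DPI measurements and control plane metadata share a subscriber identifier, might look like the following. The records, field names and values are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical control plane metadata keyed by subscriber identifier (SUPI).
control_plane = {
    "supi-001": {"cell": "cell-A", "slice": "premium"},
    "supi-002": {"cell": "cell-A", "slice": "default"},
    "supi-003": {"cell": "cell-B", "slice": "premium"},
}

# Hypothetical DPI-derived latency measurements.
dpi_events = [
    {"supi": "supi-001", "latency_ms": 20.0},
    {"supi": "supi-001", "latency_ms": 30.0},
    {"supi": "supi-002", "latency_ms": 40.0},
    {"supi": "supi-003", "latency_ms": 70.0},
    {"supi": "supi-999", "latency_ms": 10.0},  # no control plane match
]


def mean_latency_by(level: str) -> dict:
    """Average latency grouped by a control plane attribute such as cell or slice."""
    groups = defaultdict(list)
    for event in dpi_events:
        meta = control_plane.get(event["supi"])
        if meta is None:
            continue  # skip events that cannot be correlated
        groups[meta[level]].append(event["latency_ms"])
    return {key: mean(values) for key, values in groups.items()}


by_cell = mean_latency_by("cell")
by_slice = mean_latency_by("slice")
print(by_cell)   # {'cell-A': 30.0, 'cell-B': 70.0}
print(by_slice)  # {'premium': 40.0, 'default': 40.0}
```

The same grouping function serves both views, which is the point of joining on the identifier first: once each measurement carries its cell and slice, the aggregation level becomes a parameter.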

Curated data also provides high-value insights at low volume. A single curated feed can be only one hundredth the size of the raw data. Despite the smaller size, it retains the information that matters most for analysis. This supports better SLA management, higher Net Promoter Scores (NPS) and improved customer experience.

Reducing the Cost of AI Operations

NETSCOUT’s Omnis AI Streamer was developed to deliver curated, high-fidelity metadata from packet flows. It helps operators detect operational trends, automate analysis and identify risks that may lead to outages or breaches. These curated feeds can be configured for multiple use cases.

In practice, the AI Streamer has reduced data volumes by up to 93 percent. It also decreases GPU memory needs, increases processing speed and delivers more throughput with fewer GPU instances. Operators can create playbooks that set feed schedules, key metrics and filters to ensure that only necessary curated data reaches AIOps engines.
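To make the playbook idea concrete, here is a hedged sketch of what a feed definition with a schedule, key metrics and filters could look like in Python. The Playbook class, its fields and its select method are hypothetical. They are not the product's actual schema or API, which the article does not describe.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Playbook:
    """Hypothetical feed definition, invented for this example."""
    name: str
    interval_seconds: int          # how often the feed is emitted
    metrics: tuple                 # fields that are allowed into the feed
    filters: dict = field(default_factory=dict)  # field -> allowed values

    def select(self, record: dict) -> Optional[dict]:
        """Return the curated record, or None if it fails a filter."""
        for key, allowed in self.filters.items():
            if record.get(key) not in allowed:
                return None
        return {m: record[m] for m in self.metrics if m in record}


playbook = Playbook(
    name="premium-slice-latency",
    interval_seconds=60,
    metrics=("cell", "latency_ms"),
    filters={"slice": {"premium"}},
)

records = [
    {"cell": "cell-A", "slice": "premium", "latency_ms": 22.0, "raw": "..."},
    {"cell": "cell-B", "slice": "default", "latency_ms": 35.0, "raw": "..."},
]

# Only records that pass the filters, reduced to the listed metrics, are kept.
feed = [out for r in records if (out := playbook.select(r)) is not None]
print(feed)  # [{'cell': 'cell-A', 'latency_ms': 22.0}]
```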

For example, QUIC latency metrics can be aggregated to monitor premium 5G slices for YouTube users. Any issue can be traced to specific cells or nodes. This supports precise troubleshooting with minimal data overhead.
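The aggregation in this example could be sketched as follows. The samples, the 50 ms threshold and the choice of a nearest-rank 95th percentile are all assumptions made for illustration. An operator would pick the statistic and threshold that match its own SLA.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


# Hypothetical curated samples: (cell, slice, application, QUIC latency in ms).
samples = [
    ("cell-A", "premium", "youtube", 18.0),
    ("cell-A", "premium", "youtube", 22.0),
    ("cell-A", "premium", "youtube", 25.0),
    ("cell-B", "premium", "youtube", 30.0),
    ("cell-B", "premium", "youtube", 95.0),
    ("cell-B", "default", "youtube", 200.0),  # not on the premium slice
]
THRESHOLD_MS = 50.0  # assumed latency target for the premium slice

per_cell = defaultdict(list)
for cell, slice_name, app, latency in samples:
    if slice_name == "premium" and app == "youtube":
        per_cell[cell].append(latency)

p95 = {cell: percentile(vals, 95) for cell, vals in per_cell.items()}
suspect_cells = sorted(c for c, v in p95.items() if v > THRESHOLD_MS)

print(p95)            # {'cell-A': 25.0, 'cell-B': 95.0}
print(suspect_cells)  # ['cell-B']
```

Because each sample already carries its cell, a breach can be traced to a location without going back to the raw packets.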

User plane data is also useful. Fields such as TEID, QoS Flow ID, IP addresses, latency and application signatures can support SLA breach detection, QoE estimation and application-level monitoring.

Credit for this story goes to Mobile World Live.

Conclusion: Why Curated Data Matters

High-quality, low-volume curated data is essential for AIOps success. When analytics and filtering start at the source, operators can deliver “gold-standard” data into AI pipelines. This approach aligns with the TM Forum’s autonomous network framework and supports new revenue opportunities.

Through improved service quality, early fault detection and better security, curated data unlocks the full value of AIOps and moves operators closer to true automation.

