Beyond GenAI Model Training: Reducing Cost and Latency and Improving Scalability of AI Inferencing Workloads in Production

Published: September 2025 | Publisher: IDC | Length: 18 pages (English)





This IDC Perspective explores the challenges and innovations in scaling generative AI (GenAI) inference workloads in production, emphasizing cost reduction, latency improvement, and scalability. It highlights techniques such as model compression, batching, caching, and parallelization for optimizing inference performance. Vendors such as AWS, DeepSeek, Google, IBM, Microsoft, NVIDIA, Red Hat, Snowflake, and WRITER are driving advancements that improve GenAI inference efficiency and sustainability. The document advises organizations to align inference strategies with their use cases, review costs regularly, and partner with experts to ensure reliable, scalable AI deployment. "Optimizing AI inference isn't just about speed," says Kathy Lange, research director, AI Software, IDC. "It's about engineering the trade-offs between cost, scalability, and sustainability to unlock the potential of generative AI in production, where innovation meets business impact."

Executive Snapshot

Situation Overview

What Is AI Inference, and Why Is It Important?
Growing Demand for Efficient AI Inference
- The GenAI Inference Infrastructure Stack
- Factors That Influence GenAI Inference Performance
  - Model Compression Techniques
  - Data Batching Techniques
  - Caching and Memoization Techniques
  - Efficient Data Loading and Preprocessing
  - Reducing Input and Output Sizes
  - Parallelization
  - Model Routing
  - Which Software Platform Optimization Techniques Are Considered Most Effective?
  - Test-Time Compute (aka Inference-Time Compute)
  - An Emerging Field of Research
- Technology Supplier Innovation

Advice for the Technology Buyer

Learn More

Related Research
Synopsis
