Network Working Group                                        S. Andersen
Request for Comments: 3951                            Aalborg University
Category: Experimental                                          A. Duric
                                                                   Telio
                                                               H. Astrom
                                                                R. Hagen
                                                               W. Kleijn
                                                               J. Linden
                                                         Global IP Sound
                                                           December 2004

                   Internet Low Bit Rate Codec (iLBC)
Status of this Memo
This memo defines an Experimental Protocol for the Internet
community. It does not specify an Internet standard of any kind.
Discussion and suggestions for improvement are requested.
Distribution of this memo is unlimited.
Copyright Notice
Copyright (C) The Internet Society (2004).
Abstract
This document specifies a speech codec suitable for robust voice
communication over IP. The codec is developed by Global IP Sound
(GIPS). It is designed for narrow band speech and results in a
payload bit rate of 13.33 kbit/s for 30 ms frames and 15.20 kbit/s
for 20 ms frames. The codec enables graceful speech quality
degradation in the case of lost frames, which occurs in connection
with lost or delayed IP packets.
Andersen, et al. Experimental [Page 1]
RFC 3951 Internet Low Bit Rate Codec December 2004
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
2. Outline of the Codec . . . . . . . . . . . . . . . . . . . . . 5
2.1. Encoder. . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2. Decoder. . . . . . . . . . . . . . . . . . . . . . . . . 7
3. Encoder Principles . . . . . . . . . . . . . . . . . . . . . . 7
3.1. Pre-processing . . . . . . . . . . . . . . . . . . . . . 9
3.2. LPC Analysis and Quantization. . . . . . . . . . . . . . 9
3.2.1. Computation of Autocorrelation Coefficients. . . 10
3.2.2. Computation of LPC Coefficients. . . . . . . . . 11
3.2.3. Computation of LSF Coefficients from LPC
Coefficients . . . . . . . . . . . . . . . . . . 11
3.2.4. Quantization of LSF Coefficients . . . . . . . . 12
3.2.5. Stability Check of LSF Coefficients. . . . . . . 13
3.2.6. Interpolation of LSF Coefficients. . . . . . . . 13
3.2.7. LPC Analysis and Quantization for 20 ms Frames . 14
3.3. Calculation of the Residual. . . . . . . . . . . . . . . 15
3.4. Perceptual Weighting Filter. . . . . . . . . . . . . . . 15
3.5. Start State Encoder. . . . . . . . . . . . . . . . . . . 15
3.5.1. Start State Estimation . . . . . . . . . . . . . 16
3.5.2. All-Pass Filtering and Scale Quantization. . . . 17
3.5.3. Scalar Quantization. . . . . . . . . . . . . . . 18
3.6. Encoding the Remaining Samples . . . . . . . . . . . . . 19
3.6.1. Codebook Memory. . . . . . . . . . . . . . . . . 20
3.6.2. Perceptual Weighting of Codebook Memory
and Target . . . . . . . . . . . . . . . . . . . 22
3.6.3. Codebook Creation. . . . . . . . . . . . . . . . 23
3.6.3.1. Creation of a Base Codebook . . . . . . 23
3.6.3.2. Codebook Expansion. . . . . . . . . . . 24
3.6.3.3. Codebook Augmentation . . . . . . . . . 24
3.6.4. Codebook Search. . . . . . . . . . . . . . . . . 26
3.6.4.1. Codebook Search at Each Stage . . . . . 26
3.6.4.2. Gain Quantization at Each Stage . . . . 27
3.6.4.3. Preparation of Target for Next Stage. . 28
3.7. Gain Correction Encoding . . . . . . . . . . . . . . . . 28
3.8. Bitstream Definition . . . . . . . . . . . . . . . . . . 29
4. Decoder Principles . . . . . . . . . . . . . . . . . . . . . . 32
4.1. LPC Filter Reconstruction. . . . . . . . . . . . . . . . 33
4.2. Start State Reconstruction . . . . . . . . . . . . . . . 33
4.3. Excitation Decoding Loop . . . . . . . . . . . . . . . . 34
4.4. Multistage Adaptive Codebook Decoding. . . . . . . . . . 35
4.4.1. Construction of the Decoded Excitation Signal. . 35
4.5. Packet Loss Concealment. . . . . . . . . . . . . . . . . 35
4.5.1. Block Received Correctly and Previous Block
Also Received. . . . . . . . . . . . . . . . . . 35
4.5.2. Block Not Received . . . . . . . . . . . . . . . 36
4.5.3. Block Received Correctly When Previous Block
Not Received . . . . . . . . . . . . . . . . . . 36
4.6. Enhancement. . . . . . . . . . . . . . . . . . . . . . . 37
4.6.1. Estimating the Pitch . . . . . . . . . . . . . . 39
4.6.2. Determination of the Pitch-Synchronous
Sequences. . . . . . . . . . . . . . . . . . . . 39
4.6.3. Calculation of the Smoothed Excitation . . . . . 41
4.6.4. Enhancer Criterion . . . . . . . . . . . . . . . 41
4.6.5. Enhancing the Excitation . . . . . . . . . . . . 42
4.7. Synthesis Filtering. . . . . . . . . . . . . . . . . . . 43
4.8. Post Filtering . . . . . . . . . . . . . . . . . . . . . 43
5. Security Considerations. . . . . . . . . . . . . . . . . . . . 43
6. Evaluation of the iLBC Implementations . . . . . . . . . . . . 43
7. References . . . . . . . . . . . . . . . . . . . . . . . . . . 43
7.1. Normative References . . . . . . . . . . . . . . . . . . 43
7.2. Informative References . . . . . . . . . . . . . . . . . 44
8. ACKNOWLEDGEMENTS . . . . . . . . . . . . . . . . . . . . . . . 44
APPENDIX A: Reference Implementation . . . . . . . . . . . . . . . 45
A.1. iLBC_test.c. . . . . . . . . . . . . . . . . . . . . . . 46
A.2. iLBC_encode.h. . . . . . . . . . . . . . . . . . . . . . 52
A.3. iLBC_encode.c. . . . . . . . . . . . . . . . . . . . . . 53
A.4. iLBC_decode.h. . . . . . . . . . . . . . . . . . . . . . 63
A.5. iLBC_decode.c. . . . . . . . . . . . . . . . . . . . . . 64
A.6. iLBC_define.h. . . . . . . . . . . . . . . . . . . . . . 76
A.7. constants.h. . . . . . . . . . . . . . . . . . . . . . . 80
A.8. constants.c. . . . . . . . . . . . . . . . . . . . . . . 82
A.9. anaFilter.h. . . . . . . . . . . . . . . . . . . . . . . 96
A.10. anaFilter.c. . . . . . . . . . . . . . . . . . . . . . . 97
A.11. createCB.h . . . . . . . . . . . . . . . . . . . . . . . 98
A.12. createCB.c . . . . . . . . . . . . . . . . . . . . . . . 99
A.13. doCPLC.h . . . . . . . . . . . . . . . . . . . . . . . .104
A.14. doCPLC.c . . . . . . . . . . . . . . . . . . . . . . . .104
A.15. enhancer.h . . . . . . . . . . . . . . . . . . . . . . .109
A.16. enhancer.c . . . . . . . . . . . . . . . . . . . . . . .110
A.17. filter.h . . . . . . . . . . . . . . . . . . . . . . . .123
A.18. filter.c . . . . . . . . . . . . . . . . . . . . . . . .125
A.19. FrameClassify.h. . . . . . . . . . . . . . . . . . . . .128
A.20. FrameClassify.c. . . . . . . . . . . . . . . . . . . . .129
A.21. gainquant.h. . . . . . . . . . . . . . . . . . . . . . .131
A.22. gainquant.c. . . . . . . . . . . . . . . . . . . . . . .131
A.23. getCBvec.h . . . . . . . . . . . . . . . . . . . . . . .134
A.24. getCBvec.c . . . . . . . . . . . . . . . . . . . . . . .134
A.25. helpfun.h. . . . . . . . . . . . . . . . . . . . . . . .138
A.26. helpfun.c. . . . . . . . . . . . . . . . . . . . . . . .140
A.27. hpInput.h. . . . . . . . . . . . . . . . . . . . . . . .146
A.28. hpInput.c. . . . . . . . . . . . . . . . . . . . . . . .146
A.29. hpOutput.h . . . . . . . . . . . . . . . . . . . . . . .148
A.30. hpOutput.c . . . . . . . . . . . . . . . . . . . . . . .148
A.31. iCBConstruct.h . . . . . . . . . . . . . . . . . . . . .149
A.32. iCBConstruct.c . . . . . . . . . . . . . . . . . . . . .150
A.33. iCBSearch.h. . . . . . . . . . . . . . . . . . . . . . .152
A.34. iCBSearch.c. . . . . . . . . . . . . . . . . . . . . . .153
A.35. LPCdecode.h. . . . . . . . . . . . . . . . . . . . . . .163
A.36. LPCdecode.c. . . . . . . . . . . . . . . . . . . . . . .164
A.37. LPCencode.h. . . . . . . . . . . . . . . . . . . . . . .167
A.38. LPCencode.c. . . . . . . . . . . . . . . . . . . . . . .167
A.39. lsf.h. . . . . . . . . . . . . . . . . . . . . . . . . .172
A.40. lsf.c. . . . . . . . . . . . . . . . . . . . . . . . . .172
A.41. packing.h. . . . . . . . . . . . . . . . . . . . . . . .178
A.42. packing.c. . . . . . . . . . . . . . . . . . . . . . . .179
A.43. StateConstructW.h. . . . . . . . . . . . . . . . . . . .182
A.44. StateConstructW.c. . . . . . . . . . . . . . . . . . . .183
A.45. StateSearchW.h . . . . . . . . . . . . . . . . . . . . .185
A.46. StateSearchW.c . . . . . . . . . . . . . . . . . . . . .186
A.47. syntFilter.h . . . . . . . . . . . . . . . . . . . . . .190
A.48. syntFilter.c . . . . . . . . . . . . . . . . . . . . . .190
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . .192
Full Copyright Statement . . . . . . . . . . . . . . . . . . . . .194
1. Introduction
This document contains the description of an algorithm for the coding
of speech signals sampled at 8 kHz. The algorithm, called iLBC, uses
a block-independent linear-predictive coding (LPC) algorithm and has
support for two basic frame lengths: 20 ms at 15.2 kbit/s and 30 ms
at 13.33 kbit/s. When the codec operates at block lengths of 20 ms,
it produces 304 bits per block, which SHOULD be packetized as in [1].
Similarly, for block lengths of 30 ms it produces 400 bits per block,
which SHOULD be packetized as in [1]. The two modes for the
different frame sizes operate in a very similar way. When they
differ it is explicitly stated in the text, usually with the notation
x/y, where x refers to the 20 ms mode and y refers to the 30 ms mode.
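The bit rates quoted above follow directly from the block sizes and frame lengths. A minimal sketch of the arithmetic is given here; the helper name ilbc_bitrate_bps is illustrative and does not appear in the reference code.

```c
/* Payload bit rate in bit/s for a block of `bits` bits that is
   produced every `frame_ms` milliseconds.  Illustrative helper only,
   not part of Appendix A. */
static double ilbc_bitrate_bps(int bits, int frame_ms)
{
    return (double)bits * 1000.0 / (double)frame_ms;
}
```

With 304 bits per 20 ms block this gives 15200 bit/s, and with 400 bits per 30 ms block it gives approximately 13333 bit/s, matching the 15.2 and 13.33 kbit/s figures in the text.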
The described algorithm results in a speech coding system with a
controlled response to packet losses similar to what is known from
pulse code modulation (PCM) with packet loss concealment (PLC), such
as the ITU-T G.711 standard [4], which operates at a fixed bit rate
of 64 kbit/s. At the same time, the described algorithm enables
fixed bit rate coding with a quality-versus-bit rate tradeoff close
to state-of-the-art. A suitable RTP payload format for the iLBC
codec is specified in [1].
Some of the applications for which this coder is suitable are real
time communications such as telephony and videoconferencing,
streaming audio, archival, and messaging.
Cable Television Laboratories (CableLabs(R)) has adopted iLBC as a
mandatory PacketCable(TM) audio codec standard for VoIP over Cable
applications [3].
This document is organized as follows. Section 2 gives a brief
outline of the codec. The specific encoder and decoder algorithms
are explained in sections 3 and 4, respectively. Appendix A provides
a C-code reference implementation.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in BCP 14, RFC 2119 [2].
2. Outline of the Codec
The codec consists of an encoder and a decoder as described in
sections 2.1 and 2.2, respectively.
The essence of the codec is LPC and block-based coding of the LPC
residual signal. For each 160/240 (20 ms/30 ms) sample block, the
following major steps are performed: A set of LPC filters is
computed, and the speech signal is filtered through them to produce
the residual signal. The codec uses scalar quantization of the
dominant part, in terms of energy, of the residual signal for the
block. The dominant state is of length 57/58 (20 ms/30 ms) samples
and forms a start state for dynamic codebooks constructed from the
already coded parts of the residual signal. These dynamic codebooks
are used to code the remaining parts of the residual signal. By this
method, coding independence between blocks is achieved, resulting in
elimination of propagation of perceptual degradations due to packet
loss. The method facilitates high-quality packet loss concealment
(PLC).
2.1. Encoder
The input to the encoder SHOULD be 16 bit uniform PCM sampled at 8
kHz. It SHOULD be partitioned into blocks of BLOCKL=160/240 samples
for the 20/30 ms frame size. Each block is divided into NSUB=4/6
consecutive sub-blocks of SUBL=40 samples each. For 30 ms frame
size, the encoder performs two LPC_FILTERORDER=10 linear-predictive
coding (LPC) analyses. The first analysis applies a smooth window
centered over the second sub-block and extending to the middle of the
fifth sub-block. The second LPC analysis applies a smooth asymmetric
window centered over the fifth sub-block and extending to the end of
the sixth sub-block. For 20 ms frame size, one LPC_FILTERORDER=10
linear-predictive coding (LPC) analysis is performed with a smooth
window centered over the third sub-frame.
For each of the LPC analyses, a set of line-spectral frequencies
(LSFs) is obtained, quantized, and interpolated to obtain LSF
coefficients for each sub-block. Subsequently, the LPC residual is
computed by using the quantized and interpolated LPC analysis
filters.
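The residual computation can be sketched as a plain FIR filtering of the speech with the analysis filter coefficients of one sub-block. The sketch below assumes the convention A(z) = 1 + sum of a(i)*z^(-i) and a caller-managed history of LPC_FILTERORDER samples; the function name is illustrative and this is not the routine of the reference implementation.

```c
#define LPC_FILTERORDER 10

/* Analysis (FIR) filtering of one sub-block:
     res[n] = sum over i = 0..LPC_FILTERORDER of a[i] * in[n - i],
   with a[0] = 1.0.  `in` must point into a buffer that holds
   LPC_FILTERORDER valid history samples before in[0]. */
static void analysis_filter_sketch(const float *in, const float *a,
                                   int len, float *res)
{
    int n, i;

    for (n = 0; n < len; n++) {
        float acc = 0.0f;
        for (i = 0; i <= LPC_FILTERORDER; i++)
            acc += a[i] * in[n - i];
        res[n] = acc;
    }
}
```

In the codec, a different coefficient set would be passed for each sub-block, since the filters are interpolated per sub-block.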
The two consecutive sub-blocks of the residual exhibiting the maximal
weighted energy are identified. Within these two sub-blocks, the
start state (segment) is selected from two choices: the first 57/58
samples or the last 57/58 samples of the two consecutive sub-blocks.
The selected segment is the one of higher energy. The start state is
encoded with scalar quantization.
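The two-step selection described above can be sketched as follows. The weighting is represented here by one caller-supplied weight per pair of consecutive sub-blocks, which is an assumption of this sketch (the exact weighting is defined in section 3.5.1); all names are illustrative.

```c
#define SUBL 40

/* Sum of squares of len samples. */
static float energy(const float *x, int len)
{
    float e = 0.0f;
    int i;

    for (i = 0; i < len; i++)
        e += x[i] * x[i];
    return e;
}

/* Sketch of the start-state choice.
   residual:    nsub * SUBL residual samples (nsub = 4 or 6)
   pair_weight: nsub - 1 weights, one per pair of consecutive
                sub-blocks (hypothetical parameter; pass all 1.0f
                for plain energy)
   state_len:   57 (20 ms) or 58 (30 ms)
   Returns the sample index at which the start state begins. */
static int select_start_state(const float *residual, int nsub,
                              const float *pair_weight, int state_len)
{
    int best_pair = 0, p;
    float best = -1.0f;
    const float *two;

    for (p = 0; p < nsub - 1; p++) {
        float e = pair_weight[p] * energy(residual + p * SUBL, 2 * SUBL);
        if (e > best) {
            best = e;
            best_pair = p;
        }
    }
    two = residual + best_pair * SUBL;
    /* Compare the first and the last state_len samples of the pair. */
    if (energy(two, state_len) >=
        energy(two + 2 * SUBL - state_len, state_len))
        return best_pair * SUBL;
    return best_pair * SUBL + 2 * SUBL - state_len;
}
```

The 2 * SUBL - state_len samples of the pair that are not covered by the start state are the 23/22 remaining samples mentioned in the text.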
A dynamic codebook encoding procedure is used to encode 1) the 23/22
(20 ms/30 ms) remaining samples in the two sub-blocks containing the
start state; 2) the sub-blocks after the start state in time; and 3)
the sub-blocks before the start state in time. Thus, the encoding
target can be either the 23/22 samples remaining of the two sub-
blocks containing the start state or a 40-sample sub-block. This
target can consist of samples indexed forward in time or backward in
time, depending on the location of the start state.
The codebook coding is based on an adaptive codebook built from a
codebook memory that contains decoded LPC excitation samples from the
already encoded part of the block. These samples are indexed in the
same time direction as the target vector, ending at the sample
instant prior to the first sample instant represented in the target
vector. The codebook is used in CB_NSTAGES=3 stages in a successive
refinement approach, and the resulting three code vector gains are
encoded with 5-, 4-, and 3-bit scalar quantization, respectively.
The codebook search method employs noise shaping derived from the LPC
filters, and the main decision criterion is to minimize the squared
error between the target vector and the code vectors. Each code
vector in this codebook comes from one of CB_EXPAND=2 codebook
sections. The first section is filled with delayed, already encoded
residual vectors. The code vectors of the second codebook section
are constructed by predefined linear combinations of vectors in the
first section of the codebook.
As codebook encoding with squared-error matching is known to produce
a coded signal of less power than does the scalar quantized start
state signal, a gain re-scaling method is implemented by a refined
search for a better set of codebook gains in terms of power matching
after encoding. This is done by searching for a higher value of the
gain factor for the first stage codebook, as the subsequent stage
codebook gains are scaled by the first stage gain.
2.2. Decoder
Typically for packet communications, a jitter buffer placed at the
receiving end decides whether the packet containing an encoded signal
block has been received or lost. This logic is not part of the codec
described here. For each encoded signal block received, the decoder
decodes the block. For each lost signal block, the decoder
performs a PLC operation.
The decoding for each block starts by decoding and interpolating the
LPC coefficients. Subsequently the start state is decoded.
For codebook-encoded segments, each segment is decoded by
constructing the three code vectors given by the received codebook
indices in the same way that the code vectors were constructed in the
encoder. The three gain factors are also decoded, and the resulting
decoded signal is given by the sum of the three codebook vectors,
each scaled by its respective gain.
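The reconstruction of one codebook-encoded segment can be sketched as a gain-weighted sum of the three code vectors. The function below assumes the code vectors and gains have already been decoded; its name and signature are illustrative.

```c
#define CB_NSTAGES 3

/* decoded[n] = sum over the stages s of gain[s] * cbvec[s][n]. */
static void combine_stages(const float *cbvec[CB_NSTAGES],
                           const float gain[CB_NSTAGES],
                           int len, float *decoded)
{
    int s, n;

    for (n = 0; n < len; n++) {
        decoded[n] = 0.0f;
        for (s = 0; s < CB_NSTAGES; s++)
            decoded[n] += gain[s] * cbvec[s][n];
    }
}
```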
An enhancement algorithm is applied to the reconstructed excitation
signal. This enhancement augments the periodicity of voiced speech
regions. The enhancement is optimized under the constraint that the
modification signal (defined as the difference between the enhanced
excitation and the excitation signal prior to enhancement) has a
short-time energy that does not exceed a preset fraction of the
short-time energy of the excitation signal prior to enhancement.
A packet loss concealment (PLC) operation is easily embedded in the
decoder. The PLC operation can, e.g., be based on repeating LPC
filters and obtaining the LPC residual signal by using a long-term
prediction estimate from previous residual blocks.
3. Encoder Principles
The following block diagram is an overview of all the components of
the iLBC encoding procedure. The description of the blocks contains
references to the section where that particular procedure is further
described.
             +-----------+    +---------+    +---------+
   speech -> | 1. Pre P  | -> | 2. LPC  | -> | 3. Ana  | ->
             +-----------+    +---------+    +---------+

             +---------------+   +--------------+
          -> | 4. Start Sel  | ->| 5. Scalar Qu | ->
             +---------------+   +--------------+

             +--------------+    +---------------+
          -> |6. CB Search  | -> | 7. Packetize  | -> payload
          |  +--------------+ |  +---------------+
          ----<---------<------
       sub-frame 0..2/4 (20 ms/30 ms)

Figure 3.1. Flow chart of the iLBC encoder
1. Pre-process speech with a HP filter, if needed (section 3.1).
2. Compute LPC parameters, quantize, and interpolate (section 3.2).
3. Use analysis filters on speech to compute residual (section 3.3).
4. Select position of 57/58-sample start state (section 3.5).
5. Quantize the 57/58-sample start state with scalar quantization
(section 3.5).
6. Search the codebook for each sub-frame. Start with 23/22 sample
block, then encode sub-blocks forward in time, and then encode
sub-blocks backward in time. For each block, the steps in Figure
3.4 are performed (section 3.6).
7. Packetize the bits into the payload specified in Table 3.2.
The input to the encoder SHOULD be 16-bit uniform PCM sampled at 8
kHz. Also it SHOULD be partitioned into blocks of BLOCKL=160/240
samples. Each block input to the encoder is divided into NSUB=4/6
consecutive sub-blocks of SUBL=40 samples each.
   0        39        79       119       159
   +---------------------------------------+
   |    1    |    2    |    3    |    4    |
   +---------------------------------------+
                  20 ms frame

   0        39        79       119       159       199       239
   +-----------------------------------------------------------+
   |    1    |    2    |    3    |    4    |    5    |    6    |
   +-----------------------------------------------------------+
                            30 ms frame

Figure 3.2. One input block to the encoder for 20 ms (with four sub-
frames) and 30 ms (with six sub-frames).
3.1. Pre-processing
In some applications, the recorded speech signal contains DC level
and/or 50/60 Hz noise. If these components have not been removed
prior to the encoder call, they should be removed by a high-pass
filter. A reference implementation of this, using a filter with a
cutoff frequency of 90 Hz, can be found in Appendix A.28.
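For illustration only, the removal of a DC level can be sketched with a generic first-order DC-blocking filter. This is a stand-in technique and NOT the filter of Appendix A.28; the structure, the state type, and the pole radius r are assumptions of this sketch. With r near 0.93 at 8 kHz sampling, the cutoff is roughly in the region of 90 Hz under the usual small-angle approximation.

```c
/* State of the first-order DC blocker (illustrative type). */
typedef struct {
    float x_prev;   /* previous input sample  */
    float y_prev;   /* previous output sample */
} dc_block_state;

/* y[n] = x[n] - x[n-1] + r * y[n-1].
   Generic DC-blocking filter, not the reference high-pass filter;
   r is an assumed pole radius, 0 < r < 1. */
static void dc_block(dc_block_state *st, const float *in, float *out,
                     int len, float r)
{
    int n;

    for (n = 0; n < len; n++) {
        float y = in[n] - st->x_prev + r * st->y_prev;
        st->x_prev = in[n];
        st->y_prev = y;
        out[n] = y;
    }
}
```

The state must be kept between calls so that consecutive blocks are filtered as one continuous signal.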
3.2. LPC Analysis and Quantization
The input to the LPC analysis module is a possibly high-pass filtered
speech buffer, speech_hp, that contains 240/300 (LPC_LOOKBACK +
BLOCKL = 80/60 + 160/240 = 240/300) speech samples, where samples 0
through 79/59 are from the previous block and samples 80/60 through
239/299 are from the current block. No look-ahead into the next
block is used. For the very first block processed, the look-back
samples are assumed to be zeros.
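The buffer handling for the 30 ms mode can be sketched as follows, assuming a buffer of LPC_LOOKBACK + BLOCKL = 300 samples that is zero-initialized before the first block. The macro and function names are illustrative.

```c
#include <string.h>

#define BLOCKL_30MS       240
#define LPC_LOOKBACK_30MS  60

/* speech_hp holds LPC_LOOKBACK + BLOCKL samples: the look-back
   samples first, then the current block.  The tail of the previous
   block is shifted to the front, and the new block is appended. */
static void update_lpc_buffer(float *speech_hp, const float *block)
{
    memmove(speech_hp, speech_hp + BLOCKL_30MS,
            LPC_LOOKBACK_30MS * sizeof(float));
    memcpy(speech_hp + LPC_LOOKBACK_30MS, block,
           BLOCKL_30MS * sizeof(float));
}
```

For the 20 ms mode, the same scheme would apply with a look-back of 80 samples and a block length of 160 samples.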
For each input block, the LPC analysis calculates one/two set(s) of
LPC_FILTERORDER=10 LPC filter coefficients using the autocorrelation
method and the Levinson-Durbin recursion. These coefficients are
converted to the Line Spectral Frequency representation. In the 20
ms case, the single lsf set represents the spectral characteristics
as measured at the center of the third sub-block. For 30 ms frames,
the first set, lsf1, represents the spectral properties of the input
signal at the center of the second sub-block, and the other set,
lsf2, represents the spectral characteristics as measured at the
center of the fifth sub-block. The details of the computation for 30
ms frames are described in sections 3.2.1 through 3.2.6. Section
3.2.7 explains how the LPC Analysis and Quantization differs for 20
ms frames.
3.2.1. Computation of Autocorrelation Coefficients
The first step in the LPC analysis procedure is to calculate
autocorrelation coefficients by using windowed speech samples. This
windowing is the only difference in the LPC analysis procedure for
the two sets of coefficients. For the first set, a 240-sample-long
standard symmetric Hanning window is applied to samples 0 through 239
of the input data. The first window, lpc_winTbl, is defined as
lpc_winTbl[i] = 0.5 * (1.0 - cos((2*PI*(i+1))/(BLOCKL+1)));
i=0,...,119
lpc_winTbl[i] = lpc_winTbl[BLOCKL - i - 1]; i=120,...,239
The windowed speech speech_hp_win1 is then obtained by multiplying
the first 240 samples of the input speech buffer with the window
coefficients:
speech_hp_win1[i] = speech_hp[i] * lpc_winTbl[i];
i=0,...,BLOCKL-1
From these 240 windowed speech samples, 11 (LPC_FILTERORDER + 1)
autocorrelation coefficients, acf1, are calculated:
acf1[lag] += speech_hp_win1[n] * speech_hp_win1[n + lag];
lag=0,...,LPC_FILTERORDER; n=0,...,BLOCKL-lag-1
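The accumulation above can be sketched as a pair of nested loops. The function name and signature are illustrative; x is assumed to hold the already windowed speech.

```c
/* acf[lag] = sum over n = 0..len-lag-1 of x[n] * x[n + lag],
   for lag = 0..order.  x holds the windowed speech samples. */
static void autocorr(const float *x, int len, int order, float *acf)
{
    int lag, n;

    for (lag = 0; lag <= order; lag++) {
        float sum = 0.0f;
        for (n = 0; n < len - lag; n++)
            sum += x[n] * x[n + lag];
        acf[lag] = sum;
    }
}
```

For the LPC analysis described here, the call would use len = BLOCKL and order = LPC_FILTERORDER.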
In order to make the analysis more robust against numerical precision
problems, a spectral smoothing procedure is applied by windowing the
autocorrelation coefficients before the LPC coefficients are
computed. Also, a white noise floor is added to the autocorrelation
function by multiplying coefficient zero by 1.0001 (40 dB below the
energy of the windowed speech signal). These two steps are
implemented by multiplying the autocorrelation coefficients with the
following window:
lpc_lagwinTbl[0] = 1.0001;
lpc_lagwinTbl[i] = exp(-0.5 * ((2 * PI * 60.0 * i) /FS)^2);
i=1,...,LPC_FILTERORDER
where FS=8000 is the sampling frequency
Then, the windowed acf function acf1_win is obtained by
acf1_win[i] = acf1[i] * lpc_lagwinTbl[i];
i=0,...,LPC_FILTERORDER
The second set of autocorrelation coefficients, acf2_win, is
obtained in a similar manner. The window, lpc_asymwinTbl, is applied
to samples 60 through 299, i.e., the entire current block. The
window consists of two segments, the first (samples 0 to 219) being
half a Hanning window with length 440 and the second a quarter of a
cycle of a cosine wave. By using this asymmetric window, an LPC
analysis centered in the fifth sub-block is obtained without the need
for any look-ahead, which would add delay. The asymmetric window is
defined as
lpc_asymwinTbl[i] = (sin(PI * (i + 1) / 441))^2; i=0,...,219
lpc_asymwinTbl[i] = cos((i - 220) * PI / 40); i=220,...,239
and the windowed speech is computed by
speech_hp_win2[i] = speech_hp[i + LPC_LOOKBACK] *
lpc_asymwinTbl[i]; i=0,...,BLOCKL-1
The windowed autocorrelation coefficients are then obtained in
exactly the same way as for the first analysis instance.
The windows lpc_winTbl, lpc_asymwinTbl, and lpc_lagwinTbl are
typically generated in advance, and the arrays are stored in ROM
rather than being recalculated for every block.
3.2.2. Computation of LPC Coefficients
From the 2 x 11 smoothed autocorrelation coefficients, acf1_win and
acf2_win, the 2 x 11 LPC coefficients, lp1 and lp2, are calculated
in the same way for both analysis locations by using the well known
Levinson-Durbin recursion. The first LPC coefficient is always 1.0,
resulting in ten unique coefficients.
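For reference, a textbook form of the Levinson-Durbin recursion is sketched below, using the convention A(z) = 1 + sum of a(i)*z^(-i). This is a generic formulation under that assumption, not a copy of the routine in Appendix A; the function name is illustrative.

```c
#define LPC_FILTERORDER 10

/* Solve for a[0..order] from autocorrelations r[0..order].
   a[0] is always 1.0.  order must not exceed LPC_FILTERORDER. */
static void levinson_durbin(const float *r, int order, float *a)
{
    float tmp[LPC_FILTERORDER + 1];
    float err = r[0];           /* prediction error energy */
    int m, i;

    a[0] = 1.0f;
    for (i = 1; i <= order; i++)
        a[i] = 0.0f;
    if (err <= 0.0f)
        return;                 /* no signal energy: A(z) = 1 */

    for (m = 1; m <= order; m++) {
        float acc = r[m];
        float k;

        for (i = 1; i < m; i++)
            acc += a[i] * r[m - i];
        k = -acc / err;         /* reflection coefficient */
        for (i = 1; i < m; i++)
            tmp[i] = a[i] + k * a[m - i];
        for (i = 1; i < m; i++)
            a[i] = tmp[i];
        a[m] = k;
        err *= 1.0f - k * k;
        if (err <= 0.0f)
            break;              /* stop on numerical breakdown */
    }
}
```

As a check, autocorrelations of the form r[k] = 0.5^k describe a first-order process, and the recursion returns a[1] = -0.5 with all higher coefficients equal to zero.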
After determining the LPC coefficients, a bandwidth expansion
procedure is applied to smooth the spectral peaks in the
short-term spectrum. The bandwidth addition is obtained by the
following modification of the LPC coefficients:
lp1_bw[i] = lp1[i] * chirp^i; i=0,...,LPC_FILTERORDER
lp2_bw[i] = lp2[i] * chirp^i; i=0,...,LPC_FILTERORDER
where "chirp" is a real number between 0 and 1. It is RECOMMENDED to
use a value of 0.9.
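The bandwidth expansion can be sketched with a running power of the chirp factor, which avoids a call to a power function. The function name is illustrative.

```c
/* lp_bw[i] = lp[i] * chirp^i, i = 0..order. */
static void bw_expand(const float *lp, float chirp, int order,
                      float *lp_bw)
{
    float c = 1.0f;     /* holds chirp^i */
    int i;

    for (i = 0; i <= order; i++) {
        lp_bw[i] = lp[i] * c;
        c *= chirp;
    }
}
```

Since chirp^0 = 1, the leading coefficient 1.0 is left unchanged.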
3.2.3. Computation of LSF Coefficients from LPC Coefficients
Thus far, two sets of LPC coefficients that represent the short-term
spectral characteristics of the speech signal for two different time
locations within the current block have been determined. These
coefficients SHOULD be quantized and interpolated. Before this is
done, it is advantageous to convert the LPC parameters into another
type of representation called Line Spectral Frequencies (LSF). The
LSF parameters are used because they are better suited for
quantization and interpolation than the regular LPC coefficients.
Many computationally efficient methods for calculating the LSFs from
the LPC coefficients have been proposed in the literature. The
detailed implementation of one applicable method can be found in
Appendix A.26. The two arrays of LSF coefficients obtained, lsf1 and
lsf2, are of dimension 10 (LPC_FILTERORDER).
3.2.4. Quantization of LSF Coefficients
Because the LPC filters defined by the two sets of LSFs are also
needed in the decoder, the LSF parameters need to be quantized and
transmitted as side information. The total number of bits required
to represent the quantization of the two LSF representations for one
block of speech is 40, with 20 bits used for each of lsf1 and lsf2.
For computational and storage reasons, the LSF vectors are quantized
using three-split vector quantization (VQ). That is, the LSF vectors
are split into three sub-vectors that are each quantized with a
regular VQ. The quantized versions of lsf1 and lsf2, qlsf1 and
qlsf2, are obtained by using the same memoryless split VQ. The
length of each of these two LSF vectors is 10, and they are split
into three sub-vectors containing 3, 3, and 4 values, respectively.
For each of the sub-vectors, a separate codebook of quantized values
has been designed with a standard VQ training method for a large
database containing speech from a large number of speakers recorded
under various conditions. The size of each of the three codebooks
associated with the split definitions above is
int size_lsfCbTbl[LSF_NSPLIT] = {64,128,128};
The actual values of the vector quantization codebook that must be
used can be found in the reference code of Appendix A. Both sets of
LSF coefficients, lsf1 and lsf2, are quantized with a standard
memoryless split vector quantization (VQ) structure using the squared
error criterion in the LSF domain. The split VQ quantization
consists of the following steps:
1) Quantize the first three LSF coefficients (1 - 3) with a VQ
codebook of size 64.
2) Quantize the next three LSF coefficients (4 - 6) with a VQ codebook
of size 128.
3) Quantize the last four LSF coefficients (7 - 10) with a VQ
codebook of size 128.
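The three steps above can be sketched as a nearest-neighbor search per sub-vector. The codebooks are passed in by the caller, since the actual tables are those of Appendix A; the function names and the back-to-back storage layout of the codebook entries are assumptions of this sketch.

```c
#define LSF_NSPLIT 3

/* Index of the codebook entry closest to x in squared error.
   cb holds n_entries vectors of length dim, stored back to back. */
static int vq_nearest(const float *cb, int n_entries, int dim,
                      const float *x)
{
    int best = 0, e, i;
    float best_dist = -1.0f;

    for (e = 0; e < n_entries; e++) {
        float dist = 0.0f;
        for (i = 0; i < dim; i++) {
            float d = x[i] - cb[e * dim + i];
            dist += d * d;
        }
        if (best_dist < 0.0f || dist < best_dist) {
            best_dist = dist;
            best = e;
        }
    }
    return best;
}

/* Split VQ of one 10-dimensional LSF vector into sub-vectors of
   dimension 3, 3, and 4.  cb[s] and cb_size[s] are supplied by the
   caller (64, 128, and 128 entries in the codec). */
static void split_vq(const float *lsf, const float *cb[LSF_NSPLIT],
                     const int cb_size[LSF_NSPLIT],
                     int index[LSF_NSPLIT], float *qlsf)
{
    static const int dim[LSF_NSPLIT] = {3, 3, 4};
    int s, i, pos = 0;

    for (s = 0; s < LSF_NSPLIT; s++) {
        index[s] = vq_nearest(cb[s], cb_size[s], dim[s], lsf + pos);
        for (i = 0; i < dim[s]; i++)
            qlsf[pos + i] = cb[s][index[s] * dim[s] + i];
        pos += dim[s];
    }
}
```

Because each sub-vector is searched independently, the cost is 64 + 128 + 128 distance computations per LSF vector instead of a search over a single joint codebook.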
This procedure, repeated for lsf1 and lsf2, gives six quantization
indices and the quantized sets of LSF coefficients qlsf1 and qlsf2.
Each set of three indices is encoded with 6 + 7 + 7 = 20 bits. The
total number of bits used for LSF quantization in a block is thus 40
bits.
3.2.5. Stability Check of LSF Coefficients
The LSF representation of the LPC filter has the convenient property
that the coefficients are ordered by increasing value, i.e., lsf(n-1)
< lsf(n), 0 < n < 10, if the corresponding synthesis filter is
stable. As we are employing a split VQ scheme, it is possible that
at the split boundaries the LSF coefficients are not ordered
correctly and hence that the corresponding LP filter is unstable. To
ensure that the filter used is stable, a stability check is performed
for the quantized LSF vectors. If it turns out that the coefficients
are not ordered appropriately (with a safety margin of 50 Hz to
ensure that formant peaks are not too narrow), they will be moved
apart. The detailed method for this can be found in Appendix A.40.
The same procedure is performed in the decoder. This ensures that
exactly the same LSF representations are used in both encoder and
decoder.
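The idea of moving badly ordered coefficients apart can be illustrated with the simplified C sketch below. It is not the algorithm of Appendix A.40: the function name, the single pass over the vector, and the symmetric move around the pair's midpoint are assumptions of this sketch. The margin value assumes LSFs expressed in radians at the codec's 8 kHz sampling rate, so that 50 Hz corresponds to 2*pi*50/8000, roughly 0.039.

```c
/* Assumed margin: 50 Hz in radians at 8 kHz sampling. */
#define LSF_MIN_GAP 0.039f

/* Simplified ordering check.  Any adjacent pair closer than the
   margin (or out of order) is moved apart symmetrically around its
   midpoint.  Returns the number of pairs that were changed.
   Note: a single pass does not re-check earlier pairs and does not
   clamp the result to the valid LSF range. */
static int lsf_enforce_order(float *lsf, int dim)
{
    int changes = 0;

    for (int n = 1; n < dim; n++) {
        if (lsf[n] - lsf[n - 1] < LSF_MIN_GAP) {
            float mid = 0.5f * (lsf[n] + lsf[n - 1]);
            lsf[n - 1] = mid - 0.5f * LSF_MIN_GAP;
            lsf[n]     = mid + 0.5f * LSF_MIN_GAP;
            changes++;
        }
    }
    return changes;
}
```

Because the same routine runs on the quantized vectors in both encoder and decoder, any deterministic rule of this kind keeps the two sides in agreement; interoperable implementations need to follow the Appendix A.40 method exactly.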
3.2.6. Interpolation of LSF Coefficients
From the two sets of LSF coefficients that are computed for each
block of speech, different LSFs are obtained for each sub-block by
means of interpolation. This procedure is performed for the original
LSFs (lsf1 and lsf2), as well as the quantized versions qlsf1 and
qlsf2, as both versions are used in the encoder. Here follows a
brief summary of the interpolation scheme; the details are found in
the C code of Appendix A. In the first sub-block, the average of the
second LSF vector from the previous block and the first LSF vector in
the current block is used. For sub-blocks two through five, the LSFs
used are obtained by linear interpolation from lsf1 (and qlsf1) to
lsf2 (and qlsf2), with lsf1 used in sub-block two and lsf2 in sub-
block five. In the last sub-block, lsf2 is used. For the very first
block it is assumed that the last LSF vector of the previous block is
equal to a predefined vector, lsfmeanTbl, obtained by calculating the
mean LSF vector of the LSF design database.
lsfmeanTbl[LPC_FILTERORDER] = {0.281738, 0.445801, 0.663330,
0.962524, 1.251831, 1.533081, 1.850586, 2.137817,
2.481445, 2.777344}
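The interpolation scheme summarized above for 30 ms frames can be sketched in C as follows. The function name, the weight table name, and the array layout are assumptions of this sketch; the weights themselves follow directly from the description (an average in the first sub-block, a linear transition from lsf1 to lsf2 over sub-blocks two through five, and lsf2 in the last sub-block).

```c
#define LPC_FILTERORDER 10
#define NSUB_30MS 6

/* Weight given to the earlier of the two vectors in each sub-block;
   the later vector receives one minus this weight. */
static const float w_earlier_30ms[NSUB_30MS] = {
    0.5f, 1.0f, 2.0f / 3.0f, 1.0f / 3.0f, 0.0f, 0.0f
};

/* lsf2_old: second LSF vector of the previous block (lsfmeanTbl for
             the very first block)
   lsf1, lsf2: the two LSF vectors of the current block
   out[k]:     interpolated LSF vector for sub-block k+1 */
static void interp_lsf_30ms(const float *lsf2_old, const float *lsf1,
                            const float *lsf2,
                            float out[NSUB_30MS][LPC_FILTERORDER])
{
    for (int k = 0; k < NSUB_30MS; k++) {
        /* The first sub-block bridges the previous and current
           block; all others interpolate within the current block. */
        const float *a = (k == 0) ? lsf2_old : lsf1;
        const float *b = (k == 0) ? lsf1 : lsf2;
        float w = w_earlier_30ms[k];

        for (int i = 0; i < LPC_FILTERORDER; i++)
            out[k][i] = w * a[i] + (1.0f - w) * b[i];
    }
}
```

The same routine would be applied twice per block, once to the unquantized vectors and once to qlsf1 and qlsf2.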
The interpolation method is standard linear interpolation in the LSF
domain. The interpolated LSF values are converted to LPC
coefficients for each sub-block. The unquantized and quantized LPC
coefficients form two sets of filters respectively. The unquantized
analysis filter for sub-block k is defined as follows:
             ___
             \
   Ak(z)= 1 + > ak(i)*z^(-i)
             /__
      i=1...LPC_FILTERORDER
The quantized analysis filter for sub-block k is defined as follows:
              ___
              \
   A~k(z)= 1 + > a~k(i)*z^(-i)
              /__
       i=1...LPC_FILTERORDER
A reference implementation of the LSF encoding is given in Appendix
A.38. A reference implementation of the corresponding decoding can
be found in Appendix A.36.
3.2.7. LPC Analysis and Quantization for 20 ms Frames
As previously stated, the codec calculates only one set of LPC
parameters for the 20 ms frame size as opposed to two sets for 30 ms
frames. A single set of autocorrelation coefficients is calculated
on the LPC_LOOKBACK + BLOCKL = 80 + 160 = 240 samples. These samples
are windowed with the asymmetric window lpc_asymwinTbl, centered over
the third sub-frame, to form speech_hp_win. Autocorrelation
coefficients, acf, are calculated on the 240 samples in speech_hp_win
and then windowed exactly as in section 3.2.1 (resulting in
acf_win).
This single set of windowed autocorrelation coefficients is used to
calculate LPC coefficients, LSF coefficients, and quantized LSF
coefficients in exactly the same manner as in sections 3.2.3 through
3.2.4. As for the 30 ms frame size, the ten LSF coefficients are
divided into three sub-vectors of size 3, 3, and 4 and quantized by
using the same scheme and codebook as in section 3.2.4 to finally get
3 quantization indices. The quantized LSF coefficients are
stabilized with the algorithm described in section 3.2.5.
From the set of LSF coefficients computed for this block and those
from the previous block, different LSFs are obtained for each sub-
block by means of interpolation. The interpolation is done linearly
in the LSF domain over the four sub-blocks, so that the n-th sub-frame
uses the weight (4-n)/4 for the LSF from the old frame and the
weight n/4 for the LSF from the current frame. For the very first
block the mean LSF, lsfmeanTbl, is used as the LSF from the previous
block. As in section 3.2.6, both unquantized, A(z), and quantized,
A~(z), analysis filters are calculated for each of the
four sub-blocks.
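A minimal C sketch of the 20 ms interpolation weights described above is given below. The function name and array layout are assumptions of this sketch; the weights (4-n)/4 and n/4 are taken from the text.

```c
#define LPC_FILTERORDER 10
#define NSUB_20MS 4

/* lsf_old: LSF vector of the previous block (lsfmeanTbl for the
            very first block)
   lsf_cur: LSF vector of the current block
   out[n-1]: interpolated LSF vector for sub-frame n, n = 1..4 */
static void interp_lsf_20ms(const float *lsf_old, const float *lsf_cur,
                            float out[NSUB_20MS][LPC_FILTERORDER])
{
    for (int n = 1; n <= NSUB_20MS; n++) {
        float w_cur = (float)n / NSUB_20MS;   /* n/4     */
        float w_old = 1.0f - w_cur;           /* (4-n)/4 */

        for (int i = 0; i < LPC_FILTERORDER; i++)
            out[n - 1][i] = w_old * lsf_old[i] + w_cur * lsf_cur[i];
    }
}
```

With these weights the fourth sub-frame uses the current LSF vector unchanged, and no sub-frame uses the old vector alone.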
3.3. Calculation of the Residual
The block of speech samples is filtered by the quantized and
interpolated LPC analysis filters to yield the residual signal. In
particular, the corresponding LPC analysis filter for each 40 sample
sub-block is used to filter the speech samples for the same sub-
block. The filter memory at the end of each sub-block is carried
over to the LPC filter of the next sub-block. The signal at the
output of each LP analysis filter constitutes the residual signal for
the corresponding sub-block.
A reference implementation of the LPC analysis filters is given in
Appendix A.10.
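The carry-over of filter memory between sub-blocks can be illustrated with the following C sketch, which is not the Appendix A.10 code. The function name, the flat coefficient layout, and the mem buffer convention are assumptions of this sketch.

```c
#define LPC_FILTERORDER 10
#define SUBL 40   /* samples per sub-block */

/* speech:   nsub*SUBL input samples
   a:        nsub coefficient sets of LPC_FILTERORDER+1 values each,
             stored back to back; a[0] of each set is assumed to be 1
   mem:      the LPC_FILTERORDER input samples preceding the block,
             newest last; updated for the next block on return
   residual: nsub*SUBL output samples (must not alias speech) */
static void lpc_residual(const float *speech, const float *a, int nsub,
                         float *mem, float *residual)
{
    for (int k = 0; k < nsub; k++) {
        const float *ak = a + k * (LPC_FILTERORDER + 1);

        for (int n = 0; n < SUBL; n++) {
            int idx = k * SUBL + n;
            float acc = speech[idx];

            for (int i = 1; i <= LPC_FILTERORDER; i++) {
                int j = idx - i;
                /* Past samples come from this block when available,
                   otherwise from the memory of the previous block.
                   Crossing a sub-block boundary needs no special
                   handling: the filter state is simply the input
                   history. */
                float past = (j >= 0) ? speech[j]
                                      : mem[LPC_FILTERORDER + j];
                acc += ak[i] * past;
            }
            residual[idx] = acc;
        }
    }
    for (int i = 0; i < LPC_FILTERORDER; i++)
        mem[i] = speech[nsub * SUBL - LPC_FILTERORDER + i];
}
```

Since the analysis filter is all-zero (FIR), its memory consists only of past input samples, which is why switching coefficient sets at a sub-block boundary is straightforward.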
3.4. Perceptual Weighting Filter
In principle any good design of a perceptual weighting filter can be
applied in the encoder without compromising this codec definition.
However, it is RECOMMENDED to use the perceptual weighting filter Wk
for sub-block k specified below:
Wk(z)=1/Ak(z/LPC_CHIRP_WEIGHTDENUM), where
LPC_CHIRP_WEIGHTDENUM = 0.4222
This is a simple design with low complexity that is applied in the
LPC residual domain. Here Ak(z) is the filter obtained for sub-block
k from unquantized but interpolated LSF coefficients.
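Evaluating Ak at z/LPC_CHIRP_WEIGHTDENUM amounts to scaling the i-th coefficient ak(i) by LPC_CHIRP_WEIGHTDENUM^i. A minimal C sketch of this coefficient scaling follows; the function name and argument order are assumptions of this sketch.

```c
#define LPC_CHIRP_WEIGHTDENUM 0.4222f

/* Computes the coefficients of A(z/gamma) from those of A(z):
   out[i] = a[i] * gamma^i, for i = 0..order.
   The weighting filter Wk(z) is then the all-pole filter 1/out(z). */
static void chirp_expand(const float *a, float gamma, int order,
                         float *out)
{
    float g = 1.0f;

    out[0] = a[0];
    for (int i = 1; i <= order; i++) {
        g *= gamma;           /* g == gamma^i */
        out[i] = a[i] * g;
    }
}
```

A call such as chirp_expand(ak, LPC_CHIRP_WEIGHTDENUM, LPC_FILTERORDER, wk) would be made once per sub-block on the unquantized, interpolated coefficients.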
3.5. Start State Encoder
The start state is quantized by using a common 6-bit scalar quantizer
for the block and a 3-bit scalar quantizer operating on scaled
samples in the weighted speech domain. In the following we describe
the state encoding in greater detail.
3.5.1. Start State Estimation
The two sub-blocks containing the start state are determined by
finding the two consecutive sub-blocks in the block having the
highest power. Advantageously, down-weighting is used in the
beginning and end of the sub-frames, i.e., the following measure is
computed (NSUB=4/6 for 20/30 ms frame size):
   nsub=1,...,NSUB-1
   ssqn[nsub] = 0.0;
   for (i=(nsub-1)*SUBL; i<(nsub-1)*SUBL+5; i++)
      ssqn[nsub] += sampEn_win[i-(nsub-1)*SUBL]*
                        residual[i]*residual[i];
   for (i=(nsub-1)*SUBL+5; i<(nsub+1)*SUBL-5; i++)
      ssqn[nsub] += residual[i]*residual[i];
   for (i=(nsub+1)*SUBL-5; i<(nsub+1)*SUBL; i++)
      ssqn[nsub] += sampEn_win[(nsub+1)*SUBL-i-1]*
                        residual[i]*residual[i];
where sampEn_win[5]={1/6, 2/6, 3/6, 4/6, 5/6} MAY be used. The
sub-frame number corresponding to the maximum value of
ssqEn_win[nsub-1]*ssqn[nsub] is selected as the start state
indicator. A weighting of ssqEn_win[]={0.8,0.9,1.0,0.9,0.8} for 30
ms frames and ssqEn_win[]={0.9,1.0,0.9} for 20 ms frames MAY
advantageously be used to bias the start state towards the middle of
the frame.
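The measure and the selection of its maximum can be combined into one routine, sketched below in C. The function name and signature are assumptions of this sketch, and the three loops of the measure are merged into a single loop with a per-sample weight; the weights are the optional ones given above.

```c
#define SUBL 40   /* samples per sub-frame */

/* residual:  nsub*SUBL residual samples of the block
   nsub:      number of sub-frames (4 or 6)
   ssqEn_win: nsub-1 position weights, e.g. {0.9, 1.0, 0.9}
   Returns the start state indicator in the range 1..nsub-1. */
static int find_start_state(const float *residual, int nsub,
                            const float *ssqEn_win)
{
    static const float sampEn_win[5] = {
        1.0f / 6.0f, 2.0f / 6.0f, 3.0f / 6.0f, 4.0f / 6.0f, 5.0f / 6.0f
    };
    int best = 1;
    float best_val = -1.0f;

    for (int n = 1; n < nsub; n++) {
        const float *p = residual + (n - 1) * SUBL;
        float ssq = 0.0f;

        /* Weighted energy over two consecutive sub-frames, with the
           first and last five samples tapered. */
        for (int i = 0; i < 2 * SUBL; i++) {
            float w = 1.0f;
            if (i < 5)
                w = sampEn_win[i];
            else if (i >= 2 * SUBL - 5)
                w = sampEn_win[2 * SUBL - 1 - i];
            ssq += w * p[i] * p[i];
        }
        if (ssqEn_win[n - 1] * ssq > best_val) {
            best_val = ssqEn_win[n - 1] * ssq;
            best = n;
        }
    }
    return best;
}
```

In case of a tie this sketch keeps the earliest position; that tie-breaking rule is a choice made here, not one stated in the text.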
For 20 ms frames there are three possible positions for the two-sub-
block length maximum power segment; the start state position is
encoded with 2 bits. The start state position, start, MUST be
encoded as
start=1: start state in sub-frame 0 and 1
start=2: start state in sub-frame 1 and 2
start=3: start state in sub-frame 2 and 3
For 30 ms frames there are five possible positions for the two-sub-
block length maximum power segment; the start state position is
encoded with 3 bits. The start state position, start, MUST be
encoded as
start=1: start state in sub-frame 0 and 1
start=2: start state in sub-frame 1 and 2
start=3: start state in sub-frame 2 and 3
start=4: start state in sub-frame 3 and 4
start=5: start state in sub-frame 4 and 5
Hence, in both cases, index 0 is not used. In order to shorten the
start state for bit rate efficiency, the start state is brought down
to STATE_SHORT_LEN=57 samples for 20 ms frames and STATE_SHORT_LEN=58
samples for 30 ms frames. The power of the first 23/22 and last
23/22 samples of the two sub-frame blocks identified above is
computed as the sum of the squared signal sample values, and the
23/22-sample segment with the lowest power is excluded from the start
state. One bit is transmitted to indicate which of the two possible
57/58 sample segments is used. The start state position within the
two sub-frames determined above, state_first, MUST be encoded as
state_first=1: start state is first STATE_SHORT_LEN samples
state_first=0: start state is last STATE_SHORT_LEN samples
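The choice between the two candidate segments can be sketched in C as follows. The function name is an assumption of this sketch, and the handling of equal powers (the first segment is excluded) is a choice made here rather than something stated above.

```c
#define SUBL 40   /* samples per sub-frame */

/* seg:             the 2*SUBL residual samples of the two sub-frames
                    selected by the start state position
   state_short_len: 57 for 20 ms frames, 58 for 30 ms frames
   Returns state_first: 1 if the start state is the first
   state_short_len samples, 0 if it is the last state_short_len. */
static int choose_state_first(const float *seg, int state_short_len)
{
    int diff = 2 * SUBL - state_short_len;   /* 23 or 22 samples */
    float en_first = 0.0f;
    float en_last = 0.0f;

    for (int i = 0; i < diff; i++) {
        float f = seg[i];
        float l = seg[state_short_len + i];
        en_first += f * f;
        en_last += l * l;
    }
    /* The lower-power end is excluded: keep the first samples only
       when the leading segment carries more power. */
    return (en_first > en_last) ? 1 : 0;
}
```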
3.5.2. All-Pass Filtering and Scale Quantization
The block of residual samples in the start state is first filtered by
an all-pass filter with the quantized LPC coefficients as denominator
and reversed quantized LPC coefficients as numerator. The purpose of
this phase-dispersion filter is to get a more even distribution of
the sample values in the residual signal. The filtering is performed
by circular convolution, where the initial filter memory is set to
zero.
res(0..(STATE_SHORT_LEN-1)) = uncoded start state residual
res((STATE_SHORT_LEN)..(2*STATE_SHORT_LEN-1)) = 0
   Pk(z) = A~rk(z)/A~k(z), where
                                ___
                                \
   A~rk(z)= z^(-LPC_FILTERORDER)+>a~k(i+1)*z^(i-(LPC_FILTERORDER-1))
                                /__
                            i=0...(LPC_FILTERORDER-1)
   and A~k(z) is taken from the block where the start state begins
res -> Pk(z) -> filtered
ccres(k) = filtered(k) + filtered(k+STATE_SHORT_LEN),
k=0..(STATE_SHORT_LEN-1)
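The zero-padding, filtering, and folding steps listed above can be sketched in C as follows. The function name, the fixed buffer size, and the return convention are assumptions of this sketch. The numerator of Pk(z) is taken as the denominator coefficients in reverse order, which is what the expression for A~rk(z) expands to when a~k(0) = 1.

```c
#define MAX_STATE_LEN 58   /* STATE_SHORT_LEN for 30 ms frames */

/* res:   len uncoded start state residual samples
   a:     order+1 quantized LPC coefficients, a[0] assumed to be 1
   ccres: len output samples
   Returns 0 on success, -1 if len exceeds the internal buffer. */
static int allpass_circular(const float *res, int len, const float *a,
                            int order, float *ccres)
{
    float x[2 * MAX_STATE_LEN] = {0.0f};   /* zero-padded input */
    float y[2 * MAX_STATE_LEN];

    if (len > MAX_STATE_LEN)
        return -1;
    for (int i = 0; i < len; i++)
        x[i] = res[i];

    /* Direct-form filtering with zero initial memory. */
    for (int n = 0; n < 2 * len; n++) {
        float acc = 0.0f;
        for (int m = 0; m <= order && m <= n; m++)
            acc += a[order - m] * x[n - m];   /* reversed numerator */
        for (int i = 1; i <= order && i <= n; i++)
            acc -= a[i] * y[n - i];           /* denominator */
        y[n] = acc;
    }

    /* Fold the tail back onto the head (circular convolution). */
    for (int k = 0; k < len; k++)
        ccres[k] = y[k] + y[k + len];
    return 0;
}
```

With the trivial coefficient set a = {1, 0, ..., 0} the filter reduces to a pure delay of order samples, so the routine returns the input circularly shifted by that amount, which is a convenient sanity check.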
The all-pass filtered block is searched for its largest magnitude
sample. The base-10 logarithm of this magnitude is quantized with a 6-bit
quantizer, state_frgqTbl, by finding the nearest representation.
This results in an index, idxForMax, corresponding to a quantized
value, qmax. The all-pass filtered residual samples in the block are
then multiplied by a scaling factor scal=4.5/(10^qmax) to yield
normalized samples.
state_frgqTbl[64] = {1.000085, 1.071695, 1.140395, 1.206868,
1.277188, 1.351503, 1.429380, 1.500727, 1.569049,
1.639599, 1.707071, 1.781531, 1.840799, 1.901550,
1.956695, 2.006750, 2.055474, 2.102787, 2.142819,
2.183592, 2.217962, 2.257177, 2.295739, 2.332967,
2.369248, 2.402792, 2.435080, 2.468598, 2.503394,
2.539284, 2.572944, 2.605036, 2.636331, 2.668939,
2.698780, 2.729101, 2.759786, 2.789834, 2.818679,
2.848074, 2.877470, 2.906899, 2.936655, 2.967804,
3.000115, 3.033367, 3.066355, 3.104231, 3.141499,
3.183012, 3.222952, 3.265433, 3.308441, 3.350823,
3.395275, 3.442793, 3.490801, 3.542514, 3.604064,
3.666050, 3.740994, 3.830749, 3.938770, 4.101764}
3.5.3. Scalar Quantization
The normalized samples are quantized in the perceptually weighted
speech domain by a sample-by-sample scalar DPCM quantization as
depicted in Figure 3.3. Each sample in the block is filtered by a
weighting filter Wk(z), specified in section 3.4, to form a weighted
speech sample x[n]. The target sample d[n] is formed by subtracting
a predicted sample y[n], where the prediction filter is given by
Pk(z) = 1 - 1 / Wk(z).
            +-------+ x[n]  +    d[n] +-----------+ u[n]
residual -->| Wk(z) |-------->(+)---->| Quantizer |------> quantized
            +-------+       - /|\     +-----------+   |    residual
                               |                     \|/
                          y[n] +-------------------->(+)
                               |                      |
                               |        +------+      |
                               +--------| Pk(z)|<-----+
                                        +------+
Figure 3.3. Quantization of start state samples by DPCM in weighted
speech domain.
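The loop of Figure 3.3 can be sketched in C as follows. Since Pk(z) = 1 - 1/Wk(z) = 1 - Ak(z/LPC_CHIRP_WEIGHTDENUM), the prediction y[n] depends only on past reconstructed samples u[n-i] + y[n-i]. The function names, the callback type, and the rounding quantizer are assumptions of this sketch; the rounding quantizer merely stands in for the codec's 3-bit scalar quantizer.

```c
#define MAX_ORDER 10

/* Hypothetical quantizer callback: maps a target sample d[n] to its
   quantized value u[n]. */
typedef float (*quantizer_fn)(float d);

/* Stand-in quantizer for illustration: round to nearest integer. */
static float round_to_int(float d)
{
    return (float)(int)(d + (d >= 0.0f ? 0.5f : -0.5f));
}

/* x:  weighted samples x[n] (output of Wk(z))
   wa: order+1 coefficients of Ak(z/LPC_CHIRP_WEIGHTDENUM), wa[0]=1
   u:  receives the quantized target samples */
static void dpcm_quantize(const float *x, int len, const float *wa,
                          int order, quantizer_fn quantize, float *u)
{
    float hist[MAX_ORDER] = {0.0f};  /* hist[i-1]: reconstruction n-i */

    for (int n = 0; n < len; n++) {
        /* y[n]: output of Pk(z), fed by past reconstructions. */
        float y = 0.0f;
        for (int i = 1; i <= order; i++)
            y -= wa[i] * hist[i - 1];

        u[n] = quantize(x[n] - y);   /* d[n] = x[n] - y[n] */

        /* Shift the history and store u[n] + y[n]. */
        for (int i = order - 1; i > 0; i--)
            hist[i] = hist[i - 1];
        if (order > 0)
            hist[0] = u[n] + y;
    }
}
```

Because the predictor is driven by reconstructed rather than original samples, a decoder that receives u[n] can rebuild the same y[n] without drift.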