Audio file formats can get really confusing, really fast. There are a lot of them, with different characteristics, and different qualities, so let’s take a look at them.
We’ll kick things off with the basics – what is an audio file format?
An audio file format is a file format for storing digital audio data on your computer. The bit layout of the audio data is known as the audio coding format, and it can be compressed or uncompressed.
With compression, you have lossy and lossless compression, but we’ll get to those in a minute.
The data contained in the audio file format can be a raw bitstream, but commonly, it’s embedded in an audio data format that has a defined storage layer.
Each audio file you can listen to is stored in one of the many audio file formats – you have WAV, FLAC, MP3, Vorbis, WMA, ALAC, AIFF, PCM, and a lot more.
How does the format impact sound quality?
While a lot of people would think that the sound format itself can in no way impact sound quality, that’s a mistake. To explain this a bit further, let’s take a look at the “flagship” CD audio quality.
When you store audio on a CD, a technique known as Pulse Code Modulation, or PCM, is used. With PCM, you take snapshots of the amplitude of an audio waveform, which are measured at regular, specific time intervals.
The CD format has 44,100 such measurements per second, which makes the sample rate 44.1kHz.
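To make the idea of "snapshots" a bit more concrete, here is a minimal Python sketch of PCM sampling. It is only an illustration, not how any particular recorder works: the function name sample_sine and the 1kHz test tone are assumptions made up for this example.

```python
import math

def sample_sine(freq_hz, sample_rate=44100, n_samples=8, bits=16):
    """Take n_samples snapshots of a full-scale sine wave and
    quantize each one to a signed integer of the given word length."""
    max_amp = 2 ** (bits - 1) - 1  # 32767 for 16-bit audio
    samples = []
    for n in range(n_samples):
        t = n / sample_rate  # time of the n-th snapshot, in seconds
        amplitude = math.sin(2 * math.pi * freq_hz * t)  # between -1.0 and 1.0
        samples.append(round(amplitude * max_amp))
    return samples

# The first eight 16-bit samples of a (hypothetical) 1 kHz test tone.
print(sample_sine(1000))
```

One second of audio at CD quality would simply be 44,100 of these integers in a row, per channel.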
Here is where the Nyquist Theorem comes in, which states that a PCM digital audio system’s sample rate must be at least double the highest frequency you want to capture.
Therefore, the 44.1kHz sample rate we just spoke about can theoretically store frequencies up to 22.05kHz, which is a touch above 20kHz, the upper limit of the best human hearing.
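The arithmetic behind that limit is simple enough to sketch in Python. The helper names nyquist_limit and alias_frequency are made up for this example, and the second function assumes an ideal sampler with no anti-aliasing filter in front of it, which is not how real hardware is built.

```python
def nyquist_limit(sample_rate_hz):
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2

def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency a pure tone shows up at after ideal sampling.
    Tones above the Nyquist limit fold back to a lower frequency."""
    folded = freq_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

print(nyquist_limit(44100))           # 22050.0
print(alias_frequency(20000, 44100))  # 20000, below the limit, so unchanged
print(alias_frequency(30000, 44100))  # 14100, above the limit, so it folds back
```

That folding is why anything above half the sample rate has to be filtered out before sampling.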
Then, we come to the word length, which is the number of binary digits used to represent each sample. The greater the word length, the greater the dynamic range you can capture.
For example, audio CDs use a 16-bit word length, which gets you a theoretical signal-to-noise ratio of about 96dB. Each additional bit adds roughly 6dB, and an increase of around 10dB is commonly described as a perceived doubling of the volume.
Therefore, even though the dynamic range of CD-quality audio is pretty good, it falls short of the 140dB capability the human ear has.
Just to further put this into context, PCM data, which is a raw, uncompressed format, often uses 24-bit word lengths in professional environments, which gets you a theoretical signal-to-noise ratio of about 144dB, with sample rates of up to 192kHz.
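The link between word length and dynamic range can be worked out with a common rule of thumb: every extra bit doubles the number of amplitude levels, and every doubling is worth about 6.02dB. Here is a small Python sketch of that calculation. The function name dynamic_range_db is made up for this example, and the formula is one convention among several: some references add another 1.76dB for a full-scale sine wave, and real converters don’t reach these theoretical figures.

```python
import math

def dynamic_range_db(bits):
    """Theoretical dynamic range of linear PCM: 20 * log10(2 ** bits),
    which works out to roughly 6.02 dB per bit."""
    return 20 * math.log10(2 ** bits)

for bits in (16, 24):
    print(f"{bits}-bit: {dynamic_range_db(bits):.1f} dB")
# 16-bit: 96.3 dB
# 24-bit: 144.5 dB
```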
That PCM data is usually stored as a WAV or AIFF file, two formats we’ll talk about later, but both of them tend to be rather large and they take up a lot of space. And then we also have the other, compressed formats, which can be either lossy or lossless.
The lossy formats, which we will get to in a moment, are perceptual audio encoding formats. Assumptions are made as to what we can and can’t hear, in different contexts, and what we can’t hear is “lost” to reduce the file size.
And that is why lossy formats always come with some level of quality loss: they just don’t keep all the information the original sound comes with.
Uncompressed vs. lossless vs. lossy audio formats
Since we already covered uncompressed audio file formats and how they work, let’s take a look at how lossless compares to uncompressed, and then we’ll discuss the use of psychoacoustics in lossy audio formats.
As we mentioned, uncompressed audio files tend to be rather large in terms of file size. Compressed lossless audio files are smaller, without losing the sound quality. Why would you do that?
Because in some situations, even with Blu-ray movies sometimes, you’re limited in terms of space. Sometimes you just want to use that space for something else, and that compression does matter.
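To get a feel for how much space uncompressed audio actually takes, here is a rough Python sketch that multiplies out the sample rate, word length, and channel count. The function names are made up for this example, the two-channel setting for the 192kHz case is an assumption, and the numbers ignore file headers and metadata.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits, channels):
    """Data rate of uncompressed PCM audio in kbit/s."""
    return sample_rate_hz * bits * channels / 1000

def pcm_size_mb(sample_rate_hz, bits, channels, seconds):
    """Size of uncompressed PCM audio in megabytes (1 MB = 1,000,000 bytes)."""
    return sample_rate_hz * bits * channels * seconds / 8 / 1_000_000

# CD quality: 44.1 kHz, 16-bit, stereo
print(pcm_bitrate_kbps(44100, 16, 2))  # 1411.2 kbit/s
print(pcm_size_mb(44100, 16, 2, 60))   # about 10.6 MB per minute

# Studio quality, assuming stereo: 192 kHz, 24-bit
print(pcm_bitrate_kbps(192000, 24, 2))  # 9216.0 kbit/s
print(pcm_size_mb(192000, 24, 2, 60))   # about 69.1 MB per minute
```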
The lossless compression formats, like FLAC, work similarly to a ZIP file on computers. When you “unpack” the file, you get a copy that’s completely identical to the file you packed. The same thing happens with lossless compression formats.
They do drop data to save space, yes, but only data that is redundant, and they use flags that indicate what was dropped, so it can be added back in when the file is decoded.
A good example, although very simplified, would be a movie that has a lot of silence in it. An uncompressed file will still have data where you have silence in the movie, and that data does take up a lot of space.
A FLAC file, on the other hand, will just have flags where you have silence, but the data is dropped completely. This results in a much smaller file, but when decoded, you have the same audio file.
Again, this is extremely simplified, because a real lossless audio algorithm doesn’t just drop the data, but uses complex statistical models to analyze patterns in said data.
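The silence example can be sketched with run-length encoding, which replaces a long run of identical samples with a single value and a count. To be clear, this is not FLAC’s actual algorithm (FLAC relies on prediction and entropy coding rather than plain run-length encoding), and the function names here are made up for illustration. The point is only that the round trip gives back exactly what went in.

```python
def rle_encode(samples):
    """Collapse runs of identical samples into [value, count] pairs."""
    runs = []
    for s in samples:
        if runs and runs[-1][0] == s:
            runs[-1][1] += 1  # same value as before: extend the current run
        else:
            runs.append([s, 1])  # new value: start a new run
    return runs

def rle_decode(runs):
    """Expand [value, count] pairs back into the original samples."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out

# A toy "recording": silence, three non-zero samples, more silence.
audio = [0] * 1000 + [12, -7, 3] + [0] * 500
packed = rle_encode(audio)

print(len(audio), "samples ->", len(packed), "runs")  # 1503 samples -> 5 runs
print(rle_decode(packed) == audio)                    # True: nothing was lost
```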
And so, we arrive at the lossy file formats, like MP3 or WMA, and back to psychoacoustics.
Psychoacoustics is the study of how humans perceive sound. In the audio world, it’s important because our brains just can’t perceive all data that reaches our ears.
Therefore, people who advocate lossy file formats will claim that it is unnecessary to store and reproduce all that data. The argument is that if the human ear can’t hear it, you don’t have to reproduce it – humans won’t be able to hear the difference.
In this situation, there are a couple of principles that are used. One of them is the absolute threshold of hearing (sometimes called the minimum audibility threshold), which reflects the fact that the human ear is not equally sensitive to all frequencies.
It gets less sensitive at the extremes, which allows some quieter content towards those frequencies to be dropped. Then you have simultaneous masking, as well as temporal masking, two more principles that aim to determine which data is unnecessary.
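To illustrate how sensitivity changes with frequency, here is a Python sketch of a widely cited approximation of the threshold of hearing in quiet. The formula is not from this article, and real encoders use their own, more detailed psychoacoustic models, so treat this as a rough illustration only. The function name threshold_in_quiet_db is made up for the example.

```python
import math

def threshold_in_quiet_db(freq_hz):
    """Approximate level (in dB SPL) below which a pure tone at freq_hz
    is inaudible in a quiet room, using a common textbook approximation."""
    f = freq_hz / 1000.0  # the formula works in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

for freq in (100, 1000, 3300, 10000, 16000):
    print(f"{freq:>6} Hz: {threshold_in_quiet_db(freq):6.1f} dB SPL")
```

The curve dips to its lowest point at a few kilohertz, where hearing is most sensitive, and climbs steeply towards both ends of the spectrum. Quiet content that falls under the curve is a candidate for being dropped.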
At the end of the day, listening tests suggest that you actually can hear the difference between lossy and lossless audio with a pair of decent headphones. Some might argue that you can also hear the difference between lossless and uncompressed, but since a lossless file decodes to an identical copy of the original, there is no difference left to hear.
Audio Formats Comparison
As we mentioned, there is a massive variety of audio formats, but there are a few that you’ll come across far more often than the rest. So, let’s take a look at them and see how they compare.
WAV And AIFF
We will kick things off with the two uncompressed file formats, WAV and AIFF.
The thing to know about them is that they’re both pretty much interchangeable when it comes to sound quality.
WAV is the native uncompressed audio format for Windows, while AIFF is the same thing, but for Apple’s macOS. Whichever one you choose, there won’t be an impact in terms of quality.
They both support 16-bit audio, they’re both supported across various operating systems, and both use PCM encoding. The one difference is that each of them organizes data differently. But that’s all there is. Otherwise, these two are identical, and sound identical, too.
WAV was initially developed by Microsoft and IBM, and it was used to store an audio bitstream on PCs. It applies the RIFF (Resource Interchange File Format) bitstream format that stores data in chunks, and is the main format for Microsoft Windows systems if you want to store raw, uncompressed audio.
The RIFF format is oftentimes used as a “wrapper” for various formats. The most common WAV audio format is pure uncompressed audio with a linear pulse code modulation format. You can edit and manipulate WAV files rather easily with the right software.
Since it uses RIFF, a WAV file can carry metadata in an INFO chunk, and it can also embed XMP data or an ID3 chunk. However, some applications may not expect to see this extra information, and may get confused by it.
Now, WAV files are usually used to have the highest possible quality files in cases where you don’t have a disk space constraint, or in situations where you can’t afford to spend time compressing and uncompressing data.
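The chunk layout described above is easy to inspect yourself. Here is a minimal Python sketch that writes a tiny WAV file into memory with the standard library’s wave module and then walks through its RIFF chunks with struct. The helper names make_wav_bytes and list_chunks are made up for this example, and the parser only handles the simple, well-formed case.

```python
import io
import struct
import wave

def make_wav_bytes(n_frames=100, sample_rate=44100):
    """Write a short mono, 16-bit, silent WAV file into memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit
        w.setframerate(sample_rate)
        w.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()

def list_chunks(data):
    """Return the RIFF id, the form type, and a list of (chunk id, size)."""
    riff_id, _riff_size, form_type = struct.unpack("<4sI4s", data[:12])
    chunks = []
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, chunk_size = struct.unpack("<4sI", data[pos:pos + 8])
        chunks.append((chunk_id.decode("ascii"), chunk_size))
        # Skip the 8-byte chunk header, the payload, and a pad byte if the size is odd.
        pos += 8 + chunk_size + (chunk_size & 1)
    return riff_id, form_type, chunks

riff_id, form_type, chunks = list_chunks(make_wav_bytes())
print(riff_id, form_type)
print(chunks)
```

For a plain PCM file like this one, you should see a "fmt " chunk that describes the audio, followed by a "data" chunk that holds the samples. Metadata, when present, lives in additional chunks alongside them.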
AIFF was made by Apple back in 1988, and it was based on the IFF (Interchange File Format) which was rather common on Amiga systems. The data, like in WAV files, is uncompressed PCM.
There is a compressed variant known as AIFF-C or AIFC, but it’s not all that popular. Alongside audio data, AIFF can also contain the musical note of a sample, as well as loop point data, which are things that are important in musical applications.
The file format can store metadata in Name, Author, Annotation, Comment, and Copyright chunks, and you can even embed an ID3v2 chunk, as well as an Application chunk with XMP data in it.
FLAC, ALAC, And WMA
When it comes to the lossless compression file formats, FLAC and ALAC, things are pretty much the same.
Similarly to the situation between WAV and AIFF, ALAC is the FLAC alternative that has been designed by Apple. Most websites that offer Hi-Res music will give you a choice between FLAC and ALAC.
The thing is, FLAC is an open-source option, and one that can be played on most major platforms, except for iTunes. Apple designed ALAC as its own alternative for iTunes.
Again, audibly, there is no discernible difference – they’re both lossless compression audio file formats that sound identical.
FLAC stands for Free Lossless Audio Codec and is developed by the Xiph.Org Foundation. When you have a digital audio file that has been compressed by FLAC’s algorithm, the size is typically reduced to anywhere from 50 to 70 percent of its original size.
Then, it’s decompressed to an identical copy. FLAC, as we mentioned, is open source and has royalty-free licensing. There’s support for album cover art, fast seeking, and metadata tagging. There’s a bit of history to the format, with development starting in 2000 by Josh Coalson.
The Xiph.Org Foundation took over in 2003, and you might recognize the foundation because they’re also the guys behind Vorbis.
Since FLAC is a lossless format, it’s great if you have multiple CDs and you’d like to maintain the audio collection’s quality. A FLAC copy can recover the original data exactly, unlike an MP3 copy, for example.
If you’re ripping a CD, you can even create a CUE file, which allows you to burn an audio CD from the FLAC files which is identical to the original CD, including pregap and track order.
ALAC is known as Apple Lossless, and it has been completely developed by Apple. In the beginning, when the format was initially made in 2004, Apple kept it proprietary. Later on, in late 2011, not only was it made open source, but it was also made royalty-free.
At that point, there was already an independent, reverse-engineered encoder and decoder, both open source, that had been available before the release.
Even though this format was initially made for iTunes, since it didn’t support FLAC, at this point there are a lot of encoders/decoders based on the open-source work.
For example, libavcodec, which is an open-source library, has both an encoder and decoder, so any media player based on it (MPlayer and VLC come to mind) can play ALAC files. Windows 10 has also supported ALAC since 2015, which is great.
There is also WMA, or more specifically WMA Lossless (because the original WMA format isn’t lossless), whose full name is Windows Media Audio 9 Lossless.
It was released in 2003, and it’s capable of compressing an audio CD to a range from 206 to 411MB, with bitrates from 470 to 940 kbit/s.
In the end, you’re left with a bit-for-bit duplicate with absolutely identical quality to the original. The file extension is .wma, like the other Windows Media Audio formats, and there is support for 6 discrete channels and up to 24-bit, 96kHz lossless audio.
Even though the format hasn’t been documented publicly, some projects have reverse engineered it for non-Microsoft platforms.
MP3, AAC, And OGG
Last but not least, we arrive at the lossy compressed audio file formats.
The big one is MP3, but there are also AAC and OGG, and there are some notable differences between them.
MP3 is the one you’ve almost certainly come across. It was originally defined as the third audio format of the MPEG-1 standard, but afterward, it was retained and extended to support more bit rates and more audio channels.
As a file format, it designates files that have an elementary stream of data, without other complexities of the MP3 standard.
When it comes to compression, it uses lossy compression that relies on inexact approximations, as well as partial discarding of data.
The result is a massive file size reduction, especially when you factor in that the remaining data is further reduced in size with FFT and MDCT algorithms.
However, this also means that you’re looking at a significant loss in audio quality, one you can observe yourself.
It is still the most popular file format, simply because there is near-universal support for it, and because many people who listen to MP3 files don’t care about the difference in sound quality.
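To put the file size reduction into numbers, here is a quick Python sketch that compares the CD data rate with two MP3 bit rates. The 128 and 320 kbit/s settings and the four-minute track length are assumptions picked for illustration, not figures from any specific encoder, and the function names are made up for the example.

```python
# CD audio: 44.1 kHz, 16-bit, stereo, in kbit/s
CD_BITRATE_KBPS = 44100 * 16 * 2 / 1000  # 1411.2

def compression_ratio(lossy_kbps):
    """How many times smaller the lossy stream is than CD-quality PCM."""
    return CD_BITRATE_KBPS / lossy_kbps

def size_mb(bitrate_kbps, seconds):
    """File size in megabytes (1 MB = 1,000,000 bytes), ignoring headers and tags."""
    return bitrate_kbps * 1000 * seconds / 8 / 1_000_000

track_seconds = 4 * 60  # a hypothetical four-minute track

print(f"CD-quality PCM: {size_mb(CD_BITRATE_KBPS, track_seconds):.1f} MB")
for kbps in (128, 320):
    print(f"MP3 at {kbps} kbit/s: {size_mb(kbps, track_seconds):.1f} MB, "
          f"about {compression_ratio(kbps):.1f}x smaller")
```

Under these assumptions, the four-minute track drops from roughly 42MB to under 4MB at 128 kbit/s, which goes a long way toward explaining why the format caught on.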
Next up we have AAC, which stands for Advanced Audio Coding. It was initially meant to be the successor of MP3 because, at the same bit rate, it achieves higher sound quality.
However, MP3 isn’t going anywhere anytime soon, so they coexist side by side. This format is standard for many iOS devices, as well as game consoles and some smartphones.
AAC is the result of the cooperation and contributions of some truly massive companies, such as Bell Labs, Dolby Laboratories, LG Electronics, Sony Corporation, Microsoft, and NTT, to name a few.
In terms of improvements over MP3, you have more sample rates, higher efficiency and coding accuracy, and much better handling of frequencies above 16kHz, among others. And even though we spoke about the near-universal support for MP3, AAC has some rather strong industry support, which means things might get interesting soon.
Last but not least, we have OGG, a free, open container format that is developed by the Xiph.Org Foundation, the same guys behind FLAC. The format is made to provide streaming and manipulation of digital multimedia in an efficient manner, with the Vorbis format providing the audio layer.
Even though before 2007 the .ogg extension was used for just about any file that used the Ogg container format, now it’s only used for Ogg Vorbis audio files.
Even though the format doesn’t offer too much when compared to MP3 or AAC, there is still rather good support for it incorporated in various media players, as well as portable media players and GPS receivers.