Thursday, June 3, 2010

How Shazam works


There is a cool service called Shazam, which take a short sample of music, and identifies the song.  There are couple ways to use it, but one of the more convenient is to install their free app onto an iPhone.  Just hit the “tag now” button, hold the phone’s mic up to a speaker, and it will usually identify the song and provide artist information, as well as a link to purchase the album. What is so remarkable about the service, is that it works on very obscure songs and will do so even with extraneous background noise.  I’ve gotten it to work sitting down in a crowded coffee shop and pizzeria.


So I was curious how it worked, and luckily there is a paper written by one of the developers explaining just that.  Of course they leave out some of the details, but the basic idea is exactly what you would expect:  it relies on fingerprinting music based on the spectrogram.
Here are the basic steps:
  1. Beforehand, Shazam fingerprints a comprehensive catalog of music, and stores the fingerprints in a database.
  2. A user “tags” a song they hear, which fingerprints a 10 second sample of audio.
  3. The Shazam app uploads the fingerprint to Shazam’s service, which runs a search for a matching fingerprint in their database.
  4. If a match is found, the song info is returned to the user, otherwise an error is returned.
Spectrogram of a song sample with peak intensities marked in red. Wang, Avery Li-Chun. An Industrial-Strength Audio Search Algorithm. Shazam Entertainment, 2003. Fig. 1A,B.
Here’s how the fingerprinting works:
You can think of any piece of music as a time-frequency graph called a spectrogram.  On one axis is time, on another is frequency, and on the 3rd is intensity.  Each point on the graph represents the intensity of a given frequency at a specific point in time. Assuming time is on the x-axis and frequency is on the y-axis, a horizontal line would represent a continuous pure tone and a vertical line would represent an instantaneous burst of white noise.  Here’s one example of how a song might look:


The Shazam algorithm fingerprints a song by generating this 3d graph, and identifying frequencies of “peak intensity.”  For each of these peak points it keeps track of the frequency and the amount of time from the beginning of the track.  Based on the paper’s examples, I’m guessing they find about 3 of these points per second. [Update: A commenter below notes that in his own implementation he needed more like 30 points/sec.]  So an example of a fingerprint  for a 10 seconds sample might be:


Shazam builds their fingerprint catalog out as a hash table, where the key is the frequency.  When Shazam receives a fingerprint like the one above, it uses the first key (in this case 823.44), and it searches for all matching songs.  Their hash table might look like the following:
Top graph: Songs and sample have many frequency matches, but they do not align in time, so there is no match. Bottom Graph: frequency matches occur at the same time, so the song and sample are a match. Wang, Avery Li-Chun. An Industrial-Strength Audio Search Algorithm. Shazam Entertainment, 2003. Fig. 2B.
[Some extra detail: They do not just mark a single point in the spectrogram, rather they mark a pair of points: the "peak intensity" plus a second "anchor point".  So their key is not just a single frequency, it is a hash of the frequencies of both points.  This leads to less hash collisions which in turn speeds up catalog searching by several orders of magnitude by allowing them to take greater advantage of the table's constant (O(1)) look-up time.  There's many interesting things to say about hashing, but I'm not going to go into them here, so just read around the links in this paragraph if you're interested.]

If a specific song is hit multiple times (based on examples in the paper I think it needs about 1 frequency hit per second), it then checks to see if these frequencies correspond in time.  They actually have a clever way of doing this  They create a 2d plot of frequency hits, on one axis is the time from the beginning of the track those frequencies appear in the song, on the other axis is the time those frequencies appear in the sample.  If there is a temporal relation between the sets of points, then the points will align along a diagonal.  They use another signal processing method to find this line, and if it exists with some certainty, then they label the song a match.

20 comments:

  1. 1. Do they have a database of each and every song or they rely on multiple (external) databases?

    ReplyDelete
  2. Don't mind so much when they can't ID some of my relatively obscure stuff cos they haven't catalogued the CD, but several times they have returned the wrong track details, which is worrying.

    ReplyDelete
  3. Hi,
    Do they go through or listen to all the songs before inserting into database or They connect to another repository of Songs and Music.

    ReplyDelete
  4. Probably not constant time tho; the lookups are O(1 + n/k).

    ReplyDelete
  5. Can some rational man please explain how they update their database and how the hell have they built the database of nearly all the songs in the world!??

    ReplyDelete
  6. Any algoritham by which we can create hash table of our song?????
    plz help!!!!!!!!!!

    ReplyDelete
  7. Pretty! This was an incredibly wonderful article. Thanks for providing this info.
    Also visit my site : proform treadmill reviews

    ReplyDelete
  8. Nice blog here! Also your website loads up very fast!
    What web host are you using? Can I get your affiliate link to your host?
    I wish my web site loaded up as fast as yours lol
    Feel free to visit my web blog :: This Web site

    ReplyDelete
  9. Greetings! This is my first visit to your blog!
    We are a collection of volunteers and starting a new initiative
    in a community in the same niche. Your blog provided us
    useful information to work on. You have done a outstanding job!
    Feel free to visit my website :: Viagra

    ReplyDelete
  10. Тhese electгons are benefіcial for seνeral physiologicаl
    processes they have been reνealed to have a іmmedіate impression on the manufaсturing of hoгmones in thе
    human bodу this kind of аs serotοnіn.
    Οne of its moѕt effесtiνe νarіations is thе Bosch Pro Εlectrіcal
    power Ϲylindeг Vacuum Сleаner, ωhісh has ѕtandarԁ, high ѕtоp pieсes to sеriouѕly gіve уou a excellent clean.
    Anothеr bіg аdvantage for these decentгalised аrсhіtecture is that there will
    be nο one choκe lеvel in the nеtwork,
    аs a гesult guaranteeing thаt еνen in thе cіrcumstanсе of a disasteг,
    some elеment of the netwοrk will oftеn be
    up anԁ functiοning.

    Feel free to visit my ѕite - http://www.wikicookrecipe.com/
    my webpage - mouse click the following internet site

    ReplyDelete
  11. The Lіeutеnant swore than іn memorу of the miхеd drink, it would foгever be rеcognіsed іn the aгmy as a 'cock's tail'. Earning pizza is fantastic, not that troublesome, and a splendidly inventive way to impress friends and relatives with your cooking abilities - as well as make tasty meals that never crack the funds. Cloth markers have even been used and can be practical to contact-up the sections on the sneakers whereby the coloration did not take (the seams notably).

    My website - pizza stone recipes cook

    ReplyDelete
  12. Asking questions are truly pleasant thing if you
    are not understanding anything entirely, however this piece of writing presents fastidious understanding yet.


    My homepage :: small bathroom design

    ReplyDelete
  13. Reducе heat; simmer, unсovered, for 35 to 40minutes or tο desired consіstenсy,
    stirrіng occasionally. Αftеr you've got the crust rolled out, transfer it to your pan or pizza stone. Now you'rе геaԁy to bе crеativе wіth the outѕide of the
    cake.

    Herе is mу web site - pizza pan akron Market

    ReplyDelete
  14. It is the best time to make a few plans for the long run and it is time to be happy.
    I have read this submit and if I could I wish to recommend you few fascinating things or advice.
    Maybe you could write next articles regarding
    this article. I want to read more issues approximately it!

    Feel free to surf to my webpage ... Iqracollege.Com
    my page > http://www.giida.cnr.it/GIIDAwiki/PattyK3

    ReplyDelete
  15. Gently pеrmit go аnԁ then establіshed thе
    spiԁer upsidе down (wіth іts legs in the aiг) to dry.
    The publican рurchased Daіsy, his baгmаid, to
    ρгονide some cеlеbratoгy miхed drinks.
    "In modern society, I imply 50 % the time nearly all women and men have don't been alone, actually by yourself, with out some type of distraction.

    Also visit my page: old stone oven baking stone review

    ReplyDelete
  16. Dip remainіng 4 tortillаs іnto sauсe anԁ ρreρare about nеxt lаyer.

    - Centre tunnel chіmney duct to maximiѕe effectіve very hot airflοw.

    Theу havе liѵed lіves аkin to
    that of marvelous ѕаintѕ, sagеs аnd Rіshis who have been reсognised for their penance and austerіtiеs.


    Feel free to surf to my web site - farberware pizza stone set

    ReplyDelete
  17. Pretty! This has been an incredibly wonderful post.

    Thank you for supplying this info.

    Also visit my web page ... Kenny Nordlund

    ReplyDelete
  18. This is a really nice article, but dude, please install some spam plugin on your wordpress, most comments are spam.

    ReplyDelete
  19. What is the point of storing the intensity? They don't seem to use it in the algorithm.

    ReplyDelete
  20. What is the point of tracking the intensity? They don't seem to use it in their algorithm.

    ReplyDelete