Audio Search Engine
Last updated
Algorithm:
1. Graph the frequency and amplitude of each point in time in the song.
2. Find the points in this graph with high amplitude and frequency variation.
3. Group every k consecutive points to get a set of chunks.
4. Take the combinatorial hash of each chunk.
5. Store these hashes in the DB.
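The steps above can be sketched in Python. The window size, the one-peak-per-frame picking, and the fan-out of 5 pairs per point are illustrative assumptions, not the actual Shazam parameters:

```python
import hashlib
import numpy as np

def fingerprint(samples, fan_out=5):
    """Hash a signal by pairing spectrogram peaks (illustrative sketch)."""
    win = 1024  # assumed window size for the short-time FFT
    frames = [samples[i:i + win] for i in range(0, len(samples) - win, win)]
    # Frequency/amplitude content of each point in time.
    spectra = [np.abs(np.fft.rfft(f)) for f in frames]
    # One "interesting point" per frame: the highest-amplitude frequency bin.
    peaks = [(int(np.argmax(s)), t) for t, s in enumerate(spectra)]
    # Pair each peak with the next few peaks and hash (f1, f2, delta_time).
    hashes = []
    for i, (f1, t1) in enumerate(peaks):
        for f2, t2 in peaks[i + 1:i + 1 + fan_out]:
            key = f"{f1}|{f2}|{t2 - t1}".encode()
            hashes.append((hashlib.sha1(key).hexdigest()[:10], t1))
    return hashes
```

Because the hash depends only on frequencies and the time delta between points, the same clip always produces the same hashes, which is what makes the DB lookup possible.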
Resources:
- https://medium.com/@treycoopermusic/how-shazam-works-d97135fb4582
- http://coding-geek.com/how-shazam-works/
- https://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf
Storage Estimation:

Total storage = NUMBER_OF_POINT_PAIRS * STORAGE_COST_PER_PAIR

STORAGE_COST_PER_PAIR = cost of storing a pair ((f1, t1), (f2, t2)) = (f1 : f2 : delta_time) = 3 integers = 3 * 4 bytes = 12 bytes.

Total storage = number of point pairs per song * total number of songs * 12 bytes

Assume the total number of songs in the database to be 1 million.

Total storage = number of point pairs per song * 1M * 12 bytes
Total storage = NUMBER_OF_PAIRS_PER_CHUNK * AVERAGE_NUMBER_OF_CHUNKS_PER_SONG * 1M * 12 bytes

AVERAGE_NUMBER_OF_CHUNKS_PER_SONG = average length of song / length of chunk

Assume the average song length to be around 3 minutes (~200 seconds) and each chunk to be around 10 seconds, giving about 20 chunks per song.
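Plugging the assumed numbers in gives a concrete figure. PAIRS_PER_CHUNK is a placeholder assumption here, since the estimate above leaves it unspecified:

```python
BYTES_PER_PAIR = 12        # (f1, f2, delta_time) as three 4-byte integers
SONGS = 1_000_000          # assumed database size
SONG_LENGTH_S = 200        # ~3 minutes, rounded up
CHUNK_LENGTH_S = 10
PAIRS_PER_CHUNK = 10       # assumed; the estimate leaves this value open

chunks_per_song = SONG_LENGTH_S // CHUNK_LENGTH_S        # 20 chunks
pairs_per_song = chunks_per_song * PAIRS_PER_CHUNK       # 200 pairs
total_bytes = pairs_per_song * SONGS * BYTES_PER_PAIR
print(total_bytes / 1e9, "GB")  # 2.4 GB under these assumptions
```

Even if PAIRS_PER_CHUNK were 10x larger, the index stays in the tens of gigabytes, which fits comfortably on a single database node.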
2. Why does this video only talk about the sound search algorithm? Where is an explanation of the architecture? This video is about analysing an algorithm in depth. The considerations of server load, fault tolerance, etc. are discussed in detail in the other videos.
3. Is this algorithm capable of handling millions of requests searching amongst millions of songs? The database has only insert and select operations. We do not require any transactional guarantees during querying. The requests will be processed by the mobile clients and sent as a set of interesting points to the server. This means there is little left to do on the server except matching a point map to the database. Even for millions of requests a day, this system should be scalable.
4. What if two chunks have very similar signatures? As long as the hash function is chosen appropriately, this is highly unlikely.
5. What if the song has no interesting points in it? Similarly, what happens if the clip has no interesting points? The first scenario is highly unlikely, to the point of being assumed impossible. If there are no interesting points in the clip, it usually means one or more of the following: a) The clip is too short b) The audio is too faint c) There is too much background noise.
6. Is it possible that two servers give different results for the same audio clip? If the process is deterministic, then no. If the algorithm changes, a new deployment will be needed. This might lead to multiple versions of code in the production environment. When this happens, the same song clip can give different song match results.
7. What happens if the API has to return the most likely matches? We can modify the response to return the songs having the most hash matches.
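A minimal sketch of that modification, using a toy in-memory index in place of the real DB (the hash values and song ids are made up for illustration):

```python
from collections import Counter

# Toy inverted index: hash -> song ids containing that hash.
song_index = {
    "h1": ["songA", "songB"],
    "h2": ["songA"],
    "h3": ["songA", "songC"],
}

def top_matches(clip_hashes, index, k=3):
    """Return the k songs sharing the most hashes with the clip."""
    counts = Counter()
    for h in clip_hashes:
        counts.update(index.get(h, []))
    return counts.most_common(k)

print(top_matches(["h1", "h2", "h3", "h_missing"], song_index))
# songA shares three hashes; songB and songC share one each
```

Returning a ranked list instead of a single best match also gives the client a natural way to handle ties or low-confidence results.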