📖
System Design
  • Home
  • Learn System Design Daily
  • System Design Steps
  • List of design questions:
  • My System Design Interview Checklist
  • CheatSheet
  • Programming Language Jargons
  • Scaleable Design
  • Agile Scrum
  • Uber
  • Gmail System Design
  • Distributed Rate Limiting
  • Audio Search Engine
  • Code Style and Review
  • Calling App Design
  • Low Level Design: Payment Tracking App
  • Machine Coding :Cache
  • Interview Advice
  • URL Shortner
  • Unique ID in Distributed Systems
  • Load Balancing Algorithm
  • API Architecture
  • Desgin Practise for Rest API
  • Performance Practise for API
  • API Gateway
  • API Security Hacks
  • Distributed Design Patterns
  • Fault Tolerance in Distributed Systems
  • Microservice Communication Design Patterns
  • Zipping /Compression
  • Database
  • Mongo DB
  • SQL
  • PostgreSQL
  • Database Designs
    • Designing a location based database
  • Building Database
  • Design Patterns
    • Microservice Architecture 10 Design Patterns
    • Interaction Patterns
  • Locale.ai
  • Version Control
  • Caches / Caching
  • High Level Design
  • Low Level Design
  • Containers, Docker
  • Docker
  • Linux Directories
  • Design Pattern for Software Architect
  • S.O.L.I.D Principles
  • Monitoring and Telemetry for Production System
  • C4 model
  • LRU Cache
  • VSCode
  • Chatbot Architecture
  • Streaming API Repsonse
  • Latency in System Design
  • Cloud
    • Azure
    • AWS
  • Builds
    • Jenkins
Powered by GitBook
On this page

Was this helpful?

Audio Search Engine

PreviousDistributed Rate LimitingNextCode Style and Review

Last updated 4 years ago

Was this helpful?

Algorithm:

  1. Graph the frequency and amplitude of each point in time in the song.

  2. Find the points having high amplitude and frequency variation in this graph.

  3. Take all k consecutive points to get a set of chunks.

  4. Take the combinatorial hash of each chunk.

  5. Store these hashes in the DB.

Resources: - https://medium.com/@treycoopermusic/how-shazam-works-d97135fb4582 - http://coding-geek.com/how-shazam-works/ - https://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf

Storage Estimation: Total storage => Number of point_pairs * STORAGE_COST_PER_PAIR

STORAGE_COST_PER_PAIR = cost of storing a pair((f1, t1), (f2, t2))​ = (f1 : f2 : delta_time) = 3 integers = 3 * 4 bytes = 12 bytes.

Total storage => Number of point_pairs per song total number of songs 12Bytes Assume the total number of songs in the database to be 1 million.

Total storage => Number of point_pairs per song 1M 12Bytes

Total storage => NUMBER_OF_PAIRS_PER_CHUNK AVERAGE_NUMBER_OF_CHUNKS_PER_SONG 1M * 12Bytes

AVERAGE_NUMBER_OF_CHUNKS_PER_SONG length of chunk =avg length of song / divided by length of chunk

Assume average length to be around 3 minutes = 200, and each chunk to be around 10 seconds.

2. Why does this video only talk about the sound search algorithm? Where is an explanation of the architecture? This video is about analysing an algorithm in depth. The considerations of server load, fault tolerance, etc… are discussed in detail in the other videos.

3. Is this algorithm capable of handling millions of requests searching amongst millions of songs? The database has only insert and select operations. We do not require any transactional guarantees during querying. The requests will be processed by the mobile clients and sent as a set of interesting points to the server. This means there is little left to do on the server except matching a point map to the database. Even for millions of requests a day, this system should be scalable.

4. What if two chunks have very similar signatures? As long as the hash function is chosen appropriately, this is highly unlikely.

5. What if the song has no interesting points in it? Similarly, what happens if the clipping has no interesting points? The first scenario is highly unlikely, to the point of being assumed impossible. If there are no interesting points in the clip, it usually means one or more of the following: a) The clip is too short b) The audio is too faint c) The noise is too much.

6. Is it possible that two servers give different results for the same audio clip? If the process is deterministic, then no. If the algorithm changes, a new deployment will be needed. This might lead to multiple versions of code in the production environment. When this happens, the same song clip can give different song match results.

7. What happens if the API has to return the most likely matches? We can modify the response to return the songs having the most hash matches.