MapReduce: Simplified Data Processing on Large Clusters - Google, 2004

“MapReduce exploits a restricted programming model to parallelize the user program automatically and to provide transparent fault-tolerance”

Restricted programming model: a model whose designers omit features because they're "too complex" or because some programmers have used them to write bad code. In short, restricting the API to give greater guarantees.

  • Intermediate keys and values must be the same type as the output keys and values. This is not the case with the current Hadoop implementation.
  • The authors also mention that MapReduce could be applied to a shared memory system or NUMA multi-processor system
  • Notes on Execution
    • Intermediate values are written to local disk periodically, partitioned by key
      • It sends these locations to the master
    • Reducer learns about the locations from the master and reads the appropriate partitions via RPC directly from the Mapper
    • Once all of the data is received, the reducer sorts the data, possibly with an external sort
      • I wonder if some speculative partial reduction work could be done or some sort of heap style sort that could start immediately
      • However, it’s probably better use of the machine to have it running other work instead
    • Reducer output goes to GFS
      • Once the reducer finishes, it atomically renames the output file
  • Master does not currently checkpoint, the user must rerun the job
    • The authors say it would be easy
  • At the time, network bandwidth was scarce
    • They had the cluster scheduler assign map tasks on the same machines that held the data in GFS or very close to the same machine
    • Wonder how it change now with better network traffic from Google
  • ability to skip bad records
    • When an error occurs (i.e. segfault), they have a handler that sends a quick UDP packet to the master with the offset
    • If the master receives this message from two workers, when it reschedules the task, it will notify the worker to skip the record
    • https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/SkipBadRecords.html
  • Local execution: an alternative im- plementation of the MapReduce library that sequentially executes all of the work for a MapReduce operation on the local machine, where the user can debug with GDB
  • Counter values are piggybacked on the heartbeat response