An Overview of MapReduce and Its Impact on Distributed Data Processing
Organizations collect many types of data about the processes they support: marketing data, operational data, activity logs, and so on. For example, “click stream” and log data provide a record of the end user’s activity during past visits to a web site. Shopping cart data provides information on what items a customer intends to purchase, and checkout data records the items eventually purchased. Companies like Amazon.com and Netflix provide examples of how organizations use this information to enhance the end user’s experience. In the Amazon.com retail web portal, a person can see product recommendations based on users with similar purchasing habits, items that were eventually purchased after a product was viewed, and additional items typically purchased with the product being viewed. In the Netflix portal, movie recommendations are provided to users based on sophisticated analytical models such as k-nearest-neighbor, which attempt to find titles of interest based on other customers with similar interests. The ability to find customer insights in data provides an advantage to organizations. As a result, more organizations are capturing and storing large amounts of data. Organizations need an effective method to analyze this data, which in many cases may be distributed across multiple compute nodes.