Tuesday, March 15, 2016

Query di MongoDB : aggregation

Dikutip dari situsnya mongodb :
Aggregations operations process data records and return computed results. Aggregation operations group values from multiple documents together, and can perform a variety of operations on the grouped data to return a single result.
Singkatnya, proses agregasi dalam mongodb adalah memproses data (record) untuk dikumpulkan menjadi satu atau dipisah-pisah. 
Sebagai contoh terdapat data harga barang (nama, harga, kategori). Ingin mendapatkan harga rata-rata untuk tiap kategori barang, proses ini dinamakan agregasi.
Dalam tulisan kali ini saya akan langsung memberikan contoh kasus :
MongoDB menyinpan data dalam format JSON, dalam kasus kali ini data disimpan dengan format mempunyai format seperti dibawah ini. Data rating untuk masing-masing email (email == unique) dimana 1 email mempunyai child order, dan masing-masing order mempunyai child parameter dan nilai rating di setiap order.
/* 1 */
{
    "_id" : ObjectId("56d7c3d23004aa334327e109"),
    "_class" : "com.rating.SellerRate",
    "email" : "bibin@me.co",
    "orders" : [ 
        {
            "orderId" : "O1GT34QWKI",
            "rates" : [ 
                {
                    "param" : "QUALITY",
                    "rate" : 2.7999999523162842
                }, 
                {
                    "param" : "SERVICE",
                    "rate" : 1.0000000000000000
                }
            ]
        }, 
        {
            "orderId" : "OGHYJUKL",
            "rates" : [ 
                {
                    "param" : "QUALITY",
                    "rate" : 1.7000000476837158
                }, 
                {
                    "param" : "SERVICE",
                    "rate" : 3.2000000476837158
                }
            ]
        }
    ],
    "date" : "20160303",
    "createdBy" : "cs@nana.co",
    "modifiedBy" : "cs@nana.co",
    "creationTime" : NumberLong(1456980946865),
    "modificationTime" : NumberLong(1457326319159)
}

/* 2 */
{
    "_id" : ObjectId("56e6e6ecb5f1fbfb92892d20"),
     "_class" : "com.rating.SellerRate",
    "email" : "mimi@me.co",
    "orders" : [ 
        {
            "orderId" : "O1C45HWKI",
            "rates" : [ 
                {
                    "param" : "QUALITY",
                    "rate" : 2.7999999999999998
                }, 
                {
                    "param" : "SERVICE",
                    "rate" : 1.0000000000000000
                }
            ]
        }, 
        {
            "orderId" : "OF46JKJUKL",
            "rates" : [ 
                {
                    "param" : "QUALITY",
                    "rate" : 3.3999999999999999
                }, 
                {
                    "param" : "SERVICE",
                    "rate" : 4.2999999999999998
                }
            ]
        }
    ],
    "date" : "20160303",
    "createdBy" : "cs@nana.co",
    "modifiedBy" : "cs@nana.co",
    "creationTime" : NumberLong(1456980946865),
    "modificationTime" : NumberLong(1457326319159)
}
Expected Result :
[
   {
      "email":"bibin@me.co",
      "rates":[
         {
            "param":"SERVICE",
            "rate":2.100000023841858
         },
         {
            "param":"QUALITY",
            "rate":2.25
         }
      ]
   },
   {
      "email":"mimi@me.co",
      "rates":[
         {
            "param":"SERVICE",
            "rate":2.7
         },
         {
            "param":"QUALITY",
            "rate":1.5
         }
      ]
   }
]
Berikut, kurang lebih Query yang harus dijalankan agar mendapat hasil seperti yang di inginkan.
db.seller_rates.aggregate([
    {
        $unwind : "$orders"
    },
    {
        $unwind : "$orders.rates"
    },
    {
        $project : {
            email : "$email",
            param : "$orders.rates.param",
            rate : "$orders.rates.rate"
        }
    },
    {
        $group : {
            _id : {
                email : "$email", param : "$param"
            },
            rate : {
                $avg : "$rate"
            }
        }
    },
    {
        $project : {
            email : "$_id.email",
            param : "$_id.param",
            rate : "$rate"
        }
    },
    {
        $group : {
            _id : "$email",
            rates : { $push : "$$ROOT" }
        }
    }

])

$unwind

$unwind merupakan perintah untuk mengeluarkan child menjadi row. Dalam kasus ini 1 email mempunyai 2 orders, maka jika operasi { "$unwind" : "$orders"} gambaran singkatnya kurang lebih digambarkan dibawah ini, data awal :
{
   "email":"bibin@me.co",
   "orders":[
      {
         "orderId":"O1GT34QWKI",
         "rates":[
            {
               "param":"QUALITY",
               "rate":2.7999999523162842
            },
            {
               "param":"SERVICE",
               "rate":1.0000000000000000
            }
         ]
      },
      {
         "orderId":"OGHYJUKL",
         "rates":[
            {
               "param":"QUALITY",
               "rate":1.7000000476837158
            },
            {
               "param":"SERVICE",
               "rate":3.2000000476837158
            }
         ]
      }
   ]
}
Hasil $unwind :
{
   "result":[
      {
         "email":"bibin@me.co",
         "orders":{
            "orderId":"O1GTLAQWKI",
            "rates":[
               {
                  "param":"QUALITY",
                  "rate":2.7999999523162842
               },
               {
                  "param":"SERVICE",
                  "rate":1.0000000000000000
               }
            ]
         }
      },
      {
         "email":"bibin@me.co",
         "orders":{
            "orderId":"OGHYJUKL",
            "rates":[
               {
                  "param":"QUALITY",
                  "rate":1.7000000476837158
               },
               {
                  "param":"SERVICE",
                  "rate":3.2000000476837158
               }
            ]
         }
      }
   ]
}

$project

Operasi $project memungkinkan kita hanya memilih spesifik field yang akan ditampilkan. Contoh jika Query nya demikian (dengan menggunakan data existing).
db.seller_rates.aggregate([
    {
        $unwind : "$orders"
    },
    {
        $unwind : "$orders.rates"
    },
    {
        $project : {
            email : "$email",
            param : "$orders.rates.param",
            rate : "$orders.rates.rate"
        }
    }
])
Hasil $project :
/* 1 */
{
    "result" : [ 
        {
            "_id" : ObjectId("56d7c3d23004aa334327e109"),
            "email" : "bibin@me.co",
            "param" : "QUALITY",
            "rate" : 2.7999999523162842
        }, 
        {
            "_id" : ObjectId("56d7c3d23004aa334327e109"),
            "email" : "bibin@me.co",
            "param" : "SERVICE",
            "rate" : 1.0000000000000000
        }, 
        {
            "_id" : ObjectId("56d7c3d23004aa334327e109"),
            "email" : "bibin@me.co",
            "param" : "QUALITY",
            "rate" : 1.7000000476837158
        }, 
        {
            "_id" : ObjectId("56d7c3d23004aa334327e109"),
            "email" : "bibin@me.co",
            "param" : "SERVICE",
            "rate" : 3.2000000476837158
        }, 
        {
            "_id" : ObjectId("56e6e6ecb5f1fbfb92892d20"),
            "email" : "mimi@me.co",
            "param" : "QUALITY",
            "rate" : 2.7999999999999998
        }, 
        {
            "_id" : ObjectId("56e6e6ecb5f1fbfb92892d20"),
            "email" : "mimi@me.co",
            "param" : "SERVICE",
            "rate" : 1.0000000000000000
        }, 
        {
            "_id" : ObjectId("56e6e6ecb5f1fbfb92892d20"),
            "email" : "mimi@me.co",
            "param" : "QUALITY",
            "rate" : 3.3999999999999999
        }, 
        {
            "_id" : ObjectId("56e6e6ecb5f1fbfb92892d20"),
            "email" : "mimi@me.co",
            "param" : "SERVICE",
            "rate" : 4.2999999999999998
        }
    ],
    "ok" : 1.0000000000000000

}


$group

Jika dianalogikan dengan MySQL $group sama dengan operasi GROUP BY. Dalam kasus kita kali ini, bahwa ingin menghitung rata-rata rating berdasarkan email dan parameter. Berikut merupakan contoh query nya :
db.seller_rates.aggregate([
    {
        $unwind : "$orders"
    },
    {
        $unwind : "$orders.rates"
    },
    {
        $project : {
            email : "$email",
            param : "$orders.rates.param",
            rate : "$orders.rates.rate"
        }
    },
    {
        $group : {
            _id : {
                email : "$email", param : "$param"
            },
            rate : {
                $avg : "$rate"
            }
        }
    }

])
Hasil $group :
/* 1 */
{
    "result" : [ 
        {
            "_id" : {
                "email" : "mimi@me.co",
                "param" : "SERVICE"
            },
            "rate" : 2.6499999999999999
        }, 
        {
            "_id" : {
                "email" : "bibin@me.co",
                "param" : "SERVICE"
            },
            "rate" : 2.1000000238418579
        }, 
        {
            "_id" : {
                "email" : "mimi@me.co",
                "param" : "QUALITY"
            },
            "rate" : 3.0999999999999996
        }, 
        {
            "_id" : {
                "email" : "bibin@me.co",
                "param" : "QUALITY"
            },
            "rate" : 2.2500000000000000
        }
    ],
    "ok" : 1.0000000000000000
}
Pada hasil Query di atas, terlihat bahwa ada attribute “_id” selalu muncul, atibut ini yang digunakan sebagai key dalam operasi GROUP BY dan field “_id” merupakan mandatory ketika melakukan operasi GROUP BY. 

$push

$push merupakan operasi di dalam $group, satu level dengan $avg, $max, $sum, dll. Operasi $push akan mengembalikan array dari semua value dengan key yang sama.
Dari contoh query di bawah ini, dari hasil query sebelum nya, bahwa email dijadikan sebagai key, dan di push sebagai root dan attribut yang lain otomatis sebagai child (list of array).
db.seller_rates.aggregate([
    {
        $unwind : "$orders"
    },
    {
        $unwind : "$orders.rates"
    },
    {
        $project : {
            email : "$email",
            param : "$orders.rates.param",
            rate : "$orders.rates.rate"
        }
    },
    {
        $group : {
            _id : {
                email : "$email", param : "$param"
            },
            rate : {
                $avg : "$rate"
            }
        }
    },
    {
        $project : {
            email : "$_id.email",
            param : "$_id.param",
            rate : "$rate"
        }
    },
    {
        $group : {
            _id : "$email",
            rates : { $push : "$$ROOT" }
        }
    }
])
Hasil akhir :
/* 1 */
{
    "result" : [ 
        {
            "_id" : "bibin@me.co",
            "rates" : [ 
                {
                    "_id" : {
                        "email" : "bibin@me.co",
                        "param" : "SERVICE"
                    },
                    "rate" : 2.1000000238418579,
                    "email" : "bibin@me.co",
                    "param" : "SERVICE"
                }, 
                {
                    "_id" : {
                        "email" : "bibin@me.co",
                        "param" : "QUALITY"
                    },
                    "rate" : 2.2500000000000000,
                    "email" : "bibin@me.co",
                    "param" : "QUALITY"
                }
            ]
        }, 
        {
            "_id" : "mimi@me.co",
            "rates" : [ 
                {
                    "_id" : {
                        "email" : "mimi@me.co",
                        "param" : "SERVICE"
                    },
                    "rate" : 2.6499999999999999,
                    "email" : "mimi@me.co",
                    "param" : "SERVICE"
                }, 
                {
                    "_id" : {
                        "email" : "mimi@me.co",
                        "param" : "QUALITY"
                    },
                    "rate" : 3.0999999999999996,
                    "email" : "mimi@me.co",
                    "param" : "QUALITY"
                }
            ]
        }
    ],
    "ok" : 1.0000000000000000
}
selamat mencoba ^^v

Referensi :
[ 1 ] https://docs.mongodb.org/manual/aggregation/
[ 2 ] https://docs.mongodb.org/manual/reference/operator/aggregation/group/
[ 3 ] https://docs.mongodb.org/manual/reference/operator/aggregation/project/
[ 4 ] https://docs.mongodb.org/manual/reference/operator/aggregation/unwind/
[ 5 ] https://docs.mongodb.org/manual/reference/operator/aggregation/push/#grp._S_push

1 comment: