Question
Given a 10TB table Student (name, year, gpa, gender) in HDFS, write a MapReduce program to compute the following query.
SELECT year, AVG(gpa)
FROM Student
WHERE gender = 'Male'
GROUP BY year
Answers
Solution:
map (k, student) {
    // apply the WHERE clause condition here; the literal matches the query's 'Male'
    if (student.gender == 'Male') {
        // keep only the fields the query needs
        stud1 = <student.year, student.gpa>
        // emit the record keyed by year so the shuffle can group on it
        collect (student.year, stud1)
    }
}
// Each yr that reduce receives corresponds to one distinct value of the student.year field.
// The GROUP BY is done implicitly by the shuffle phase between the map and reduce functions.
reduce (yr, studentList<stud1>) {
    sum := 0
    // iterate over every record in studentList
    for each student in studentList {
        // accumulate the gpa of each student
        sum += student.gpa
    }
    // the average gpa is the sum divided by the number of students
    avg := sum / size(studentList)
    // pair the year with its average gpa (use the key yr; student is out of scope here)
    stud2 = <yr, avg>
    // store the result
    store (yr, stud2)
}
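
The map and reduce steps above can be tried on a small in-memory sample before running anything on a cluster. The following is only a minimal Python sketch, not a Hadoop job: the function names (map_student, shuffle, reduce_year, run_job) and the sample records are made up for illustration, and shuffle() is a stand-in for the grouping that the framework does between the two phases.

```python
from collections import defaultdict


def map_student(student):
    """Map step: apply the WHERE filter and emit (year, gpa) pairs."""
    if student["gender"] == "Male":
        yield student["year"], student["gpa"]


def shuffle(pairs):
    """Stand-in for the framework's shuffle: group emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_year(year, gpas):
    """Reduce step: average the GPAs collected for a single year."""
    return year, sum(gpas) / len(gpas)


def run_job(students):
    """Run map, shuffle, and reduce in sequence over an in-memory list."""
    pairs = (pair for s in students for pair in map_student(s))
    return dict(reduce_year(y, g) for y, g in shuffle(pairs).items())


# Hypothetical sample rows, not taken from the question.
students = [
    {"name": "A", "year": 1, "gpa": 3.0, "gender": "Male"},
    {"name": "B", "year": 1, "gpa": 4.0, "gender": "Male"},
    {"name": "C", "year": 1, "gpa": 2.0, "gender": "Female"},
    {"name": "D", "year": 2, "gpa": 3.5, "gender": "Male"},
]

result = run_job(students)
print(result)  # {1: 3.5, 2: 3.5}
```

The female student in year 1 is dropped in the map step, so the year 1 average covers only the two male students. A reducer is only called for keys that received at least one value, which is why the division needs no empty-list check.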