Jan 19, 2016

Apache Pig Exercises: 14. List employees, along with their experience, whose daily salary is more than $100


In this post, the sample Apache Pig script displays the employees whose daily salary is more than $100, along with their years of experience.

The examples and exercise scripts were created using Apache Pig r0.14.0, the current version at the time of writing.

@ Test data structure:
Please refer to the Apache Pig learning series intro... post for the file structures; visit the reference section shown at the bottom of the post for more.

@ Sample data:

Employees data table:


Department data table:


@ Apache Pig Script:

List employees along with their experience where the daily salary is more than $100:

grunt> 
/* load the employee data file into a Pig relation */
data = LOAD '/ALLApplnDvlpmt/All_DataSets/tbl_EMP.txt' USING PigStorage(',') as (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:float, comm:float, deptno:int);
/* derive experience (current year minus the hire year taken from the first 4 characters of hiredate) and daily salary (sal / 30, truncated to an integer) */
all_recs = foreach data generate empno,ename,job,mgr,hiredate, (int)GetYear(CurrentTime())-(int)SUBSTRING(hiredate,0,4) as expn ,sal,(int)(sal/30) as dailysal, comm,deptno;
/* keep only the employees whose daily salary is more than 100 */
recs_fltr = filter all_recs by (dailysal>100);
/* print the filtered records on the console */
dump recs_fltr;
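
The experience calculation above relies on the first four characters of hiredate being the year. As a hedged alternative, the sketch below parses the whole date with Pig's built-in ToDate and YearsBetween functions, so experience is counted in completed years rather than as a difference of calendar years. It assumes hiredate is stored as yyyy-MM-dd, which this post does not confirm, and the relation names emp, emp_expn and emp_fltr are made up for illustration.

```pig
/* Assumption: hiredate looks like 1981-02-20 (yyyy-MM-dd); adjust the pattern to match your file */
emp = LOAD '/ALLApplnDvlpmt/All_DataSets/tbl_EMP.txt' USING PigStorage(',')
      AS (empno:int, ename:chararray, job:chararray, mgr:int,
          hiredate:chararray, sal:float, comm:float, deptno:int);

/* YearsBetween returns the number of whole years between the two datetimes */
emp_expn = FOREACH emp GENERATE empno, ename, job,
           YearsBetween(CurrentTime(), ToDate(hiredate, 'yyyy-MM-dd')) AS expn,
           sal, (int)(sal/30) AS dailysal, deptno;

/* Same filter as in the script above */
emp_fltr = FILTER emp_expn BY dailysal > 100;
DUMP emp_fltr;
```

If the dates in your file use another layout, only the pattern string passed to ToDate should need to change.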

@ Apache Pig Output on Grunt Shell:




@ Store output in a file:
To store the output in a file on the disk, issue the following Apache Pig store command on the grunt shell with an appropriate folder location.

grunt> store recs_fltr into '/pig_output/final_output.txt';

You can review the grunt shell console to check whether the command was successful. Also, if you want to reuse the same output location again and again, make sure to delete the existing output folder before running the store command again; otherwise Apache Pig displays a failure message on the grunt shell/console.
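
As a minimal sketch of that cleanup step, the grunt shell's rmf command can remove the previous output folder before the store is issued again. The path below assumes the output location is /pig_output/final_output.txt; substitute your own folder.

```pig
/* rmf deletes the folder and its contents; it does not fail if the folder is missing */
rmf /pig_output/final_output.txt
/* with the old folder gone, the store can write to the same location again */
store recs_fltr into '/pig_output/final_output.txt';
```

Because rmf deletes without asking, double-check the path before running it.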

@ Review results from the stored file:
Please note that the above command creates a sub-folder named final_output.txt inside the folder pig_output, and the actual output is stored in a part file inside that sub-folder (part-r-00000, or part-m-00000 when the job runs without a reduce phase), not in a file named final_output.txt.
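
One way to check this from the grunt shell is sketched below, assuming the output location is /pig_output/final_output.txt. The fs command passes its arguments to the Hadoop file system shell, and cat on a folder prints the files it contains.

```pig
/* list the sub-folder to see the part file(s) Pig created */
fs -ls /pig_output/final_output.txt
/* print the stored records; pointing cat at the folder prints the files inside it */
cat /pig_output/final_output.txt
```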

@ Apache Pig Reference/s:
  • https://pig.apache.org
  • http://pig.apache.org/docs/r0.14.0/
_________________
Thank you!
