Mar 14, 2016

Apache Pig Exercises: 24 List Odd or Even Employee records based on empno



In this post the sample Apache Pig script will List Odd or Even Employee records based on empno.

Using Apache Pig version r0.15.0.


@ Test data structure:
Please refer to APACHE PIG ~ ALL SAMPLE TABLES and STRUCTURES post for the file structures, visit the reference section shown at the bottom of the post for more. 


@ Sample data:

Employees data table:


@ Apache Pig Script:

a) List ODD Employee records based on empno:

grunt> 
data = LOAD 'Documents/tbl_Emp.txt' using PigStorage(',') as (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:float, comm:float, deptno:int);
/* Handling all nulls in the commission column */
all_recs = FOREACH data GENERATE empno,ename,job,mgr,hiredate, sal,(comm is not null ? comm:0), deptno;
rnk_all_recs = FOREACH all_recs GENERATE (CASE empno % 2 WHEN 0 THEN 0 WHEN 1 THEN 1 END ) as odd_even,empno,ename,job,mgr,hiredate, sal, comm, deptno;
all_recs_fltr_odd = FILTER rnk_all_recs by (odd_even == 1);
DUMP all_recs_fltr_odd;

@Apache Pig Output on Grunt Shell:  

(1,7369,SMITH,CLERK,7902,1980-12-17,800.0,0.0,20)
(1,7499,ALLEN,SALESMAN,7698,1981-02-20,1600.0,300.0,30)
(1,7521,WARD,SALESMAN,7698,1981-02-22,1250.0,500.0,30)
(1,7839,KING,PRESIDENT,,1981-11-17,5000.0,0.0,10)

----------------------------------------------------------------------------------------------------------------------------------------------------------

bList EVEN Employee records based on empno:

grunt> 
data = LOAD 'Documents/tbl_Emp.txt' using PigStorage(',') as (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:float, comm:float, deptno:int);
/* Handling all nulls in the commission column */
all_recs = FOREACH data GENERATE empno,ename,job,mgr,hiredate, sal,(comm is not null ? comm:0), deptno;
rnk_all_recs = FOREACH all_recs GENERATE (CASE empno % 2 WHEN 0 THEN 0 WHEN 1 THEN 1 END ) as odd_even,empno,ename,job,mgr,hiredate, sal, comm, deptno;
all_recs_fltr_even = FILTER rnk_all_recs by (odd_even == 0);
DUMP all_recs_fltr_even;

@Apache Pig Output on Grunt Shell: 

(0,7566,JONES,MANAGER,7839,1981-04-02,2975.0,0.0,20)
(0,7654,MARTIN,SALESMAN,7698,1981-09-28,1250.0,1400.0,30)
(0,7698,BLAKE,MANAGER,7839,1981-05-01,2850.0,0.0,30)
(0,7782,CLARK,MANAGER,7839,1981-06-09,2450.0,0.0,10)
(0,7788,SCOTT,ANALYST,7566,1982-12-09,3000.0,0.0,20)
(0,7844,TURNER,SALESMAN,7698,1981-09-08,1500.0,0.0,30)
(0,7876,ADAMS,CLERK,7788,1983-01-12,1100.0,0.0,20)
(0,7900,JAMES,CLERK,7698,1981-12-03,950.0,0.0,30)
(0,7902,FORD,ANALYST,7566,1981-12-03,3000.0,0.0,20)
(0,7934,MILLER,CLERK,7782,1982-01-23,1300.0,0.0,10)



@ Apache Pig Reference/s:
  • https://pig.apache.org
  • http://pig.apache.org/docs/r0.15.0/
_________________
Thank you!

0 comments:

Post a Comment