This post shows a sample Apache Pig script that lists the Employee records with odd or even employee numbers (empno).
The examples use Apache Pig version 0.15.0.
@ Test data structure:
For the file structures, please refer to the post APACHE PIG ~ ALL SAMPLE TABLES and STRUCTURES. The reference section at the bottom of this post lists further reading.
@ Sample data:
Employees data table:
@ Apache Pig Script:
a) List ODD Employee records based on empno:
grunt>
/* Load the comma-separated employee file with an explicit schema */
data = LOAD 'Documents/tbl_Emp.txt' USING PigStorage(',') AS (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:float, comm:float, deptno:int);
/* Replace nulls in the commission column with 0; the AS clause keeps the field name comm for later use */
all_recs = FOREACH data GENERATE empno, ename, job, mgr, hiredate, sal, (comm IS NOT NULL ? comm : 0.0f) AS comm, deptno;
/* Tag each record: odd_even is 1 when empno is odd and 0 when it is even */
rnk_all_recs = FOREACH all_recs GENERATE (CASE empno % 2 WHEN 0 THEN 0 WHEN 1 THEN 1 END) AS odd_even, empno, ename, job, mgr, hiredate, sal, comm, deptno;
/* Keep only the odd records */
all_recs_fltr_odd = FILTER rnk_all_recs BY (odd_even == 1);
DUMP all_recs_fltr_odd;
@ Apache Pig Output on Grunt Shell:
(1,7369,SMITH,CLERK,7902,1980-12-17,800.0,0.0,20)
(1,7499,ALLEN,SALESMAN,7698,1981-02-20,1600.0,300.0,30)
(1,7521,WARD,SALESMAN,7698,1981-02-22,1250.0,500.0,30)
(1,7839,KING,PRESIDENT,,1981-11-17,5000.0,0.0,10)
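The odd_even helper column is not strictly required for this task. As a minimal sketch, assuming the same file path and schema as above, the modulo test can go straight into the FILTER condition. The alias odd_recs is a name chosen here for illustration and is not part of the original script.

```pig
-- Sketch only: same input file and schema as the script above
data = LOAD 'Documents/tbl_Emp.txt' USING PigStorage(',') AS (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:float, comm:float, deptno:int);
-- odd_recs is a hypothetical alias; it keeps rows whose empno leaves a remainder of 1
odd_recs = FILTER data BY (empno % 2) == 1;
DUMP odd_recs;
```

In this variant the output has no leading odd_even field, and null commissions are left as they are because the null-handling step is skipped. Records with a null empno are dropped by the filter, since the comparison evaluates to null.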
b) List EVEN Employee records based on empno:
/* Load the comma-separated employee file with an explicit schema */
data = LOAD 'Documents/tbl_Emp.txt' USING PigStorage(',') AS (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:float, comm:float, deptno:int);
/* Replace nulls in the commission column with 0; the AS clause keeps the field name comm for later use */
all_recs = FOREACH data GENERATE empno, ename, job, mgr, hiredate, sal, (comm IS NOT NULL ? comm : 0.0f) AS comm, deptno;
/* Tag each record: odd_even is 1 when empno is odd and 0 when it is even */
rnk_all_recs = FOREACH all_recs GENERATE (CASE empno % 2 WHEN 0 THEN 0 WHEN 1 THEN 1 END) AS odd_even, empno, ename, job, mgr, hiredate, sal, comm, deptno;
/* Keep only the even records */
all_recs_fltr_even = FILTER rnk_all_recs BY (odd_even == 0);
DUMP all_recs_fltr_even;
@ Apache Pig Output on Grunt Shell:
(0,7566,JONES,MANAGER,7839,1981-04-02,2975.0,0.0,20)
(0,7654,MARTIN,SALESMAN,7698,1981-09-28,1250.0,1400.0,30)
(0,7698,BLAKE,MANAGER,7839,1981-05-01,2850.0,0.0,30)
(0,7782,CLARK,MANAGER,7839,1981-06-09,2450.0,0.0,10)
(0,7788,SCOTT,ANALYST,7566,1982-12-09,3000.0,0.0,20)
(0,7844,TURNER,SALESMAN,7698,1981-09-08,1500.0,0.0,30)
(0,7876,ADAMS,CLERK,7788,1983-01-12,1100.0,0.0,20)
(0,7900,JAMES,CLERK,7698,1981-12-03,950.0,0.0,30)
(0,7902,FORD,ANALYST,7566,1981-12-03,3000.0,0.0,20)
(0,7934,MILLER,CLERK,7782,1982-01-23,1300.0,0.0,10)
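Both scripts load and tag the same data and differ only in the final filter. One possible way to get both lists from a single load is Pig's SPLIT operator. The following is a sketch under the same file path and schema assumptions. The aliases odd_recs and even_recs are illustrative names, not taken from the original scripts.

```pig
-- Sketch only: same input file and schema as the scripts above
data = LOAD 'Documents/tbl_Emp.txt' USING PigStorage(',') AS (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:float, comm:float, deptno:int);
-- Route each record to one of two relations based on empno modulo 2
-- (odd_recs and even_recs are hypothetical aliases)
SPLIT data INTO odd_recs IF (empno % 2) == 1, even_recs IF (empno % 2) == 0;
DUMP odd_recs;
DUMP even_recs;
```

A record whose empno is null matches neither condition and therefore appears in neither relation. As in the direct FILTER approach, the output carries no odd_even field and leaves null commissions unchanged.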
@ Apache Pig References:
- https://pig.apache.org
- http://pig.apache.org/docs/r0.15.0/
Thank you!