Mar 14, 2016

Apache Pig Exercises: 24 List Odd or Even Employee records based on empno



In this post the sample Apache Pig script will List Odd or Even Employee records based on empno.

Using Apache Pig version r0.15.0.


@ Test data structure:
Please refer to APACHE PIG ~ ALL SAMPLE TABLES and STRUCTURES post for the file structures, visit the reference section shown at the bottom of the post for more. 


@ Sample data:

Employees data table:


@ Apache Pig Script:

a) List ODD Employee records based on empno:

grunt> 
data = LOAD 'Documents/tbl_Emp.txt' using PigStorage(',') as (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:float, comm:float, deptno:int);
/* Handling all nulls in the commission column */
all_recs = FOREACH data GENERATE empno,ename,job,mgr,hiredate, sal,(comm is not null ? comm:0), deptno;
rnk_all_recs = FOREACH all_recs GENERATE (CASE empno % 2 WHEN 0 THEN 0 WHEN 1 THEN 1 END ) as odd_even,empno,ename,job,mgr,hiredate, sal, comm, deptno;
all_recs_fltr_odd = FILTER rnk_all_recs by (odd_even == 1);
DUMP all_recs_fltr_odd;

@Apache Pig Output on Grunt Shell:  

(1,7369,SMITH,CLERK,7902,1980-12-17,800.0,0.0,20)
(1,7499,ALLEN,SALESMAN,7698,1981-02-20,1600.0,300.0,30)
(1,7521,WARD,SALESMAN,7698,1981-02-22,1250.0,500.0,30)
(1,7839,KING,PRESIDENT,,1981-11-17,5000.0,0.0,10)

----------------------------------------------------------------------------------------------------------------------------------------------------------

bList EVEN Employee records based on empno:

grunt> 
data = LOAD 'Documents/tbl_Emp.txt' using PigStorage(',') as (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:float, comm:float, deptno:int);
/* Handling all nulls in the commission column */
all_recs = FOREACH data GENERATE empno,ename,job,mgr,hiredate, sal,(comm is not null ? comm:0), deptno;
rnk_all_recs = FOREACH all_recs GENERATE (CASE empno % 2 WHEN 0 THEN 0 WHEN 1 THEN 1 END ) as odd_even,empno,ename,job,mgr,hiredate, sal, comm, deptno;
all_recs_fltr_even = FILTER rnk_all_recs by (odd_even == 0);
DUMP all_recs_fltr_even;

@Apache Pig Output on Grunt Shell: 

(0,7566,JONES,MANAGER,7839,1981-04-02,2975.0,0.0,20)
(0,7654,MARTIN,SALESMAN,7698,1981-09-28,1250.0,1400.0,30)
(0,7698,BLAKE,MANAGER,7839,1981-05-01,2850.0,0.0,30)
(0,7782,CLARK,MANAGER,7839,1981-06-09,2450.0,0.0,10)
(0,7788,SCOTT,ANALYST,7566,1982-12-09,3000.0,0.0,20)
(0,7844,TURNER,SALESMAN,7698,1981-09-08,1500.0,0.0,30)
(0,7876,ADAMS,CLERK,7788,1983-01-12,1100.0,0.0,20)
(0,7900,JAMES,CLERK,7698,1981-12-03,950.0,0.0,30)
(0,7902,FORD,ANALYST,7566,1981-12-03,3000.0,0.0,20)
(0,7934,MILLER,CLERK,7782,1982-01-23,1300.0,0.0,10)



@ Apache Pig Reference/s:
  • https://pig.apache.org
  • http://pig.apache.org/docs/r0.15.0/
_________________
Thank you!

Apache Pig Exercises: 23 How to handle Nulls in data (workaround)



In this post the sample Apache Pig script will handle nulls (in commission column) and display all Employee records with 0.0 for commission when it is null

Using Apache Pig version r0.15.0.


@ Test data structure:
Please refer to APACHE PIG ~ ALL SAMPLE TABLES and STRUCTURES post for the file structures, visit the reference section shown at the bottom of the post for more. 


@ Sample data:

Employees data table:


@ Apache Pig Script:

a) List 0.0 value for commission when it has nulls: *** Note in above Emp table, the commission column has all nulls when there is no value available ***

grunt> 
data = LOAD 'Documents/store_transactions.txt' using PigStorage(',') as (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:float, comm:float, deptno:int);
/* Handling all nulls in the commission column */
all_recs = FOREACH data GENERATE empno,ename,job,mgr,hiredate, sal,(comm is not null ? comm:0 ), deptno;
dump all_recs;

@Apache Pig Output on Grunt Shell: The  output has managed nulls from commission column replaced with 0.0 value

(7369,SMITH,CLERK,7902,1980-12-17,800.0,0.0,20)
(7499,ALLEN,SALESMAN,7698,1981-02-20,1600.0,300.0,30)
(7521,WARD,SALESMAN,7698,1981-02-22,1250.0,500.0,30)
(7566,JONES,MANAGER,7839,1981-04-02,2975.0,0.0,20)
(7654,MARTIN,SALESMAN,7698,1981-09-28,1250.0,1400.0,30)
(7698,BLAKE,MANAGER,7839,1981-05-01,2850.0,0.0,30)
(7782,CLARK,MANAGER,7839,1981-06-09,2450.0,0.0,10)
(7788,SCOTT,ANALYST,7566,1982-12-09,3000.0,0.0,20)
(7839,KING,PRESIDENT,,1981-11-17,5000.0,0.0,10)
(7844,TURNER,SALESMAN,7698,1981-09-08,1500.0,0.0,30)
(7876,ADAMS,CLERK,7788,1983-01-12,1100.0,0.0,20)
(7900,JAMES,CLERK,7698,1981-12-03,950.0,0.0,30)
(7902,FORD,ANALYST,7566,1981-12-03,3000.0,0.0,20)
(7934,MILLER,CLERK,7782,1982-01-23,1300.0,0.0,10)



@ Apache Pig Reference/s:
  • https://pig.apache.org
  • http://pig.apache.org/docs/r0.15.0/
_________________
Thank you!