Mar 14, 2016

Apache Pig Exercises: 23 How to handle Nulls in data (workaround)



In this post the sample Apache Pig script will handle nulls (in commission column) and display all Employee records with 0.0 for commission when it is null

Using Apache Pig version r0.15.0.


@ Test data structure:
Please refer to APACHE PIG ~ ALL SAMPLE TABLES and STRUCTURES post for the file structures, visit the reference section shown at the bottom of the post for more. 


@ Sample data:

Employees data table:


@ Apache Pig Script:

a) List 0.0 value for commission when it has nulls: *** Note in above Emp table, the commission column has all nulls when there is no value available ***

grunt> 
data = LOAD 'Documents/store_transactions.txt' using PigStorage(',') as (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:float, comm:float, deptno:int);
/* Handling all nulls in the commission column */
all_recs = FOREACH data GENERATE empno,ename,job,mgr,hiredate, sal,(comm is not null ? comm:0 ), deptno;
dump all_recs;

@Apache Pig Output on Grunt Shell: The  output has managed nulls from commission column replaced with 0.0 value

(7369,SMITH,CLERK,7902,1980-12-17,800.0,0.0,20)
(7499,ALLEN,SALESMAN,7698,1981-02-20,1600.0,300.0,30)
(7521,WARD,SALESMAN,7698,1981-02-22,1250.0,500.0,30)
(7566,JONES,MANAGER,7839,1981-04-02,2975.0,0.0,20)
(7654,MARTIN,SALESMAN,7698,1981-09-28,1250.0,1400.0,30)
(7698,BLAKE,MANAGER,7839,1981-05-01,2850.0,0.0,30)
(7782,CLARK,MANAGER,7839,1981-06-09,2450.0,0.0,10)
(7788,SCOTT,ANALYST,7566,1982-12-09,3000.0,0.0,20)
(7839,KING,PRESIDENT,,1981-11-17,5000.0,0.0,10)
(7844,TURNER,SALESMAN,7698,1981-09-08,1500.0,0.0,30)
(7876,ADAMS,CLERK,7788,1983-01-12,1100.0,0.0,20)
(7900,JAMES,CLERK,7698,1981-12-03,950.0,0.0,30)
(7902,FORD,ANALYST,7566,1981-12-03,3000.0,0.0,20)
(7934,MILLER,CLERK,7782,1982-01-23,1300.0,0.0,10)



@ Apache Pig Reference/s:
  • https://pig.apache.org
  • http://pig.apache.org/docs/r0.15.0/
_________________
Thank you!

0 comments:

Post a Comment