Jan 17, 2016

Apache Pig Exercises: 2. Only unique job titles from EMP data


In this post, a sample Apache Pig script displays all unique job positions from the Employee data set.

This post uses Apache Pig version r0.15.0, the current release at the time of writing.

Please refer to the Apache Pig learning series intro... post for the file structures, and see the reference section at the bottom of this post for more.


@ Sample data: APACHE PIG ~ ALL SAMPLE TABLES and STRUCTURES


Employees data table (comma-separated fields: empno, ename, job, mgr, hiredate, sal, comm, deptno):



@ Apache Pig Script:

2. Fetch all unique job values from the Employee dataset:

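grunt>
/* Load the comma-separated employee file and declare its schema */
data = LOAD '/Documents/tbl_EMP.txt' USING PigStorage(',') AS (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:float, comm:float, deptno:int);
/* Project only the job column */
all_jobs = FOREACH data GENERATE job;
/* Remove duplicate job values from the extracted column */
uniq_jobs = DISTINCT all_jobs;
/* Print the result to the Grunt shell */
DUMP uniq_jobs;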

@ Apache Pig Output on Grunt Shell:


(CLERK)
(ANALYST)
(MANAGER)
(SALESMAN)
(PRESIDENT)
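Pig relations are unordered, so DISTINCT does not promise the order shown above; another run or cluster may list the jobs differently. A minimal sketch, assuming the same file path and schema as the script above, that adds an ORDER BY step to list the jobs alphabetically (the relation name sorted_jobs is just an illustrative choice). It can be typed at the grunt> prompt or saved as a script:

```pig
-- Same load and projection as the exercise script (path and schema assumed unchanged)
data = LOAD '/Documents/tbl_EMP.txt' USING PigStorage(',')
       AS (empno:int, ename:chararray, job:chararray, mgr:int,
           hiredate:chararray, sal:float, comm:float, deptno:int);
all_jobs = FOREACH data GENERATE job;
-- Remove duplicate job values
uniq_jobs = DISTINCT all_jobs;
-- DISTINCT gives no ordering guarantee, so sort explicitly by the job field
sorted_jobs = ORDER uniq_jobs BY job ASC;
DUMP sorted_jobs;
```

Given the five jobs shown above, this should print (ANALYST), (CLERK), (MANAGER), (PRESIDENT) and (SALESMAN), in that order.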

@ Apache Pig Reference/s:
  • https://pig.apache.org
  • http://pig.apache.org/docs/r0.14.0/
Thank you!
