Apr 4, 2016

Apache Pig Exercises: 28 List all employees who joined in January

In this post, the sample Apache Pig script lists the employees who joined in January.

Using Apache Pig version r0.15.0.

@ Test data structure:
For the file structures, please refer to the post APACHE PIG ~ ALL SAMPLE TABLES and STRUCTURES; see also the reference section at the bottom of this post.

@ Sample data:

Employees data table:

@ Apache Pig Script:

a) List employees who joined in January:


data = LOAD 'Documents/tbl_EMP.txt' USING PigStorage(',') as (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:float, comm:float, deptno:int);
all_recs = foreach data generate empno,ename, job,mgr,hiredate, sal, comm,deptno;
-- Assumes hiredate is stored as yyyy-MM-dd (the pattern used in the reader comment below); adjust the pattern to match your data.
rec_fltr = filter all_recs by GetMonth(ToDate(hiredate, 'yyyy-MM-dd')) == 1;
rec_ordr = order rec_fltr by sal;
-- Dump the sorted relation so the ORDER step is actually executed.
dump rec_ordr;

@ Apache Pig Output on Grunt Shell:




b) List employee names that are four characters long and have ‘R’ as the 3rd character:

data = LOAD 'Documents/tbl_EMP.txt' USING PigStorage(',') as (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:float, comm:float, deptno:int);
all_recs = foreach data generate empno,ename, job,mgr,hiredate, sal, comm,deptno;
rec_fltr = filter all_recs by (SIZE(ename) == 4) and (SUBSTRING(ename,2, 3) == 'R');
rec_ordr = order rec_fltr by sal;
dump rec_ordr;
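
To see what the two conditions in the name filter evaluate to, the sketch below prints each name next to its third character and its length. It assumes the same file path and schema as the script above; the aliases third_char and name_len are made up for illustration. SUBSTRING uses 0-based indexes and an exclusive stop index, so (2, 3) selects the third character.

```pig
-- Hypothetical inspection script: same input file and schema as the script above.
data = LOAD 'Documents/tbl_EMP.txt' USING PigStorage(',') as (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:float, comm:float, deptno:int);
-- third_char and name_len are illustrative aliases, not part of the original script.
name_parts = foreach data generate ename, SUBSTRING(ename, 2, 3) as third_char, SIZE(ename) as name_len;
dump name_parts;
```

Rows where third_char is 'R' and name_len is 4 are the ones the filter keeps.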

@ Apache Pig Output on Grunt Shell:


@ Apache Pig Reference/s:
  • https://pig.apache.org
  • http://pig.apache.org/docs/r0.15.0/

1 comment:

  1. A = load '/user/hdfs/employee.txt' using PigStorage(',') as (empno:int, ename:chararray, job:chararray, mgr:int,hiredate:chararray,sal:float,comm:float,deptno:int);
    B = foreach A generate *, ToDate(hiredate,'yyyy-MM-dd') as date;
    C = filter B by GetMonth(date)==1;
    D = foreach C generate ..deptno;
    dump D;
