Apr 4, 2016

Apache Pig Exercises: 34. List all employees who joined in Before or After 1981



In this post the sample Apache Pig script will List employees who joined in Before or After 1981

Using Apache Pig version r0.15.0.


@ Test data structure:
Please refer to APACHE PIG ~ ALL SAMPLE TABLES and STRUCTURES post for the file structures, visit the reference section shown at the bottom of the post for more. 


@ Sample data:

Employees data table:


@ Apache Pig Script:

a) List employees who joined in Before or After 1981:

grunt> 
data = LOAD 'Documents/tbl_EMP.txt' USING PigStorage(',') as (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:float, comm:float, deptno:int);

all_recs = foreach data generate empno,ename,job,mgr,hiredate,(int)GetYear(ToDate(hiredate, 'yyyy-M-dd')) as yrs, sal, comm,deptno;
rec_fltr = filter all_recs by (yrs < 1981 OR yrs > 1981) ;
rec_ordr = order rec_fltr by yrs;
dump rec_ordr;


@Apache Pig Output on Grunt Shell:  

(7369,SMITH,CLERK,7902,1980-12-17,1980,800.0,,20)
(7934,MILLER,CLERK,7782,1982-01-23,1982,1300.0,,10)
(7788,SCOTT,ANALYST,7566,1982-12-09,1982,3000.0,,20)
(7876,ADAMS,CLERK,7788,1983-01-12,1983,1100.0,,20)

----------------------------------------------------------------------------------------------------------------------------------------------------------



@ Apache Pig Reference/s:
  • https://pig.apache.org
  • http://pig.apache.org/docs/r0.15.0/

Apache Pig Exercises: 33 List all employees except ‘President’ & ‘Manager’ in asc order of salaries



In this post the sample Apache Pig script will List all employees except ‘President’ & ‘Manager’ in asc order of salaries

Using Apache Pig version r0.15.0.


@ Test data structure:
Please refer to APACHE PIG ~ ALL SAMPLE TABLES and STRUCTURES post for the file structures, visit the reference section shown at the bottom of the post for more. 


@ Sample data:

Employees data table:


@ Apache Pig Script:

a) List all employees except ‘President’ & ‘Manager’ in asc order of salaries:

WIP

grunt> 
data = LOAD 'Documents/tbl_EMP.txt' USING PigStorage(',') as (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:float, comm:float, deptno:int);

all_recs = foreach data generate empno,ename, job,mgr,hiredate, sal, comm,deptno;
rec_fltr = filter all_recs by ( job != 'MANAGER' AND job != 'PRESIDENT') ;
rec_ordr = order rec_fltr by sal;
dump rec_ordr;


@Apache Pig Output on Grunt Shell:  

(7369,SMITH,5,CLERK,7902,1980-12-17,800.0,,20)
(7900,JAMES,5,CLERK,7698,1981-12-03,950.0,,30)
(7876,ADAMS,5,CLERK,7788,1983-01-12,1100.0,,20)
(7654,MARTIN,6,SALESMAN,7698,1981-09-28,1250.0,1400.0,30)
(7521,WARD,4,SALESMAN,7698,1981-02-22,1250.0,500.0,30)
(7934,MILLER,6,CLERK,7782,1982-01-23,1300.0,,10)
(7844,TURNER,6,SALESMAN,7698,1981-09-08,1500.0,0.0,30)
(7499,ALLEN,5,SALESMAN,7698,1981-02-20,1600.0,300.0,30)
(7902,FORD,4,ANALYST,7566,1981-12-03,3000.0,,20)
(7788,SCOTT,5,ANALYST,7566,1982-12-09,3000.0,,20)

----------------------------------------------------------------------------------------------------------------------------------------------------------



@ Apache Pig Reference/s:
  • https://pig.apache.org
  • http://pig.apache.org/docs/r0.15.0/

Apache Pig Exercises: 32 List employees who does not belong to deptno 20



In this post the sample Apache Pig script will List employees who does not belong to deptno 20 

Using Apache Pig version r0.15.0.


@ Test data structure:
Please refer to APACHE PIG ~ ALL SAMPLE TABLES and STRUCTURES post for the file structures, visit the reference section shown at the bottom of the post for more. 


@ Sample data:

Employees data table:


@ Apache Pig Script:

a) List employees who does not belong to deptno 20:

WIP

grunt> 
data = LOAD 'Documents/tbl_EMP.txt' USING PigStorage(',') as (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:float, comm:float, deptno:int);

all_recs = foreach data generate empno,ename, job,mgr,hiredate, sal, comm,deptno;
rec_fltr = filter all_recs by ( deptno != 20) ;
rec_ordr = order rec_fltr by sal;
dump rec_ordr;


@Apache Pig Output on Grunt Shell:  

(7934,MILLER,CLERK,7782,1982-01-23,1300.0,,10)
(7839,KING,PRESIDENT,,1981-11-17,5000.0,,10)
(7782,CLARK,MANAGER,7839,1981-06-09,2450.0,,10)
(7900,JAMES,CLERK,7698,1981-12-03,950.0,,30)
(7844,TURNER,SALESMAN,7698,1981-09-08,1500.0,0.0,30)
(7698,BLAKE,MANAGER,7839,1981-05-01,2850.0,,30)
(7654,MARTIN,SALESMAN,7698,1981-09-28,1250.0,1400.0,30)
(7521,WARD,SALESMAN,7698,1981-02-22,1250.0,500.0,30)
(7499,ALLEN,SALESMAN,7698,1981-02-20,1600.0,300.0,30)

----------------------------------------------------------------------------------------------------------------------------------------------------------



@ Apache Pig Reference/s:
  • https://pig.apache.org
  • http://pig.apache.org/docs/r0.15.0/

Apache Pig Exercises: 31 List employees whose names have a ‘LL’ characters together



In this post the sample Apache Pig script will List employees whose names have a ‘LL’ characters together

Using Apache Pig version r0.15.0.


@ Test data structure:
Please refer to APACHE PIG ~ ALL SAMPLE TABLES and STRUCTURES post for the file structures, visit the reference section shown at the bottom of the post for more. 


@ Sample data:

Employees data table:


@ Apache Pig Script:

a) List employees whose names have a ‘LL’ characters together:

WIP

grunt> 
data = LOAD 'Documents/tbl_EMP.txt' USING PigStorage(',') as (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:float, comm:float, deptno:int);
all_recs = foreach data generate empno,ename, job,mgr,hiredate, sal, comm,deptno;
rec_fltr = filter all_recs by  (SIZE(ename) == 4) and (INDEXOF(ename,'R', 0) == 2);
rec_ordr = order rec_fltr by sal;
dump rec_fltr;



@Apache Pig Output on Grunt Shell:  

(7521,WARD,2,SALESMAN,7698,1981-02-22,1250.0,500.0,30)
(7902,FORD,2,ANALYST,7566,1981-12-03,3000.0,,20)

----------------------------------------------------------------------------------------------------------------------------------------------------------

OR

b)  List Employee names with 3rd character as ‘R’ and four characters length:

grunt> 
data = LOAD 'Documents/tbl_EMP.txt' USING PigStorage(',') as (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:float, comm:float, deptno:int);
all_recs = foreach data generate empno,ename, job,mgr,hiredate, sal, comm,deptno;
rec_fltr = filter all_recs by (SIZE(ename) == 4) and (SUBSTRING(ename,2, 3) == 'R');
rec_ordr = order rec_fltr by sal;
dump rec_fltr;

@Apache Pig Output on Grunt Shell: 


(7521,WARD,2,SALESMAN,7698,1981-02-22,1250.0,500.0,30)
(7902,FORD,2,ANALYST,7566,1981-12-03,3000.0,,20)



@ Apache Pig Reference/s:
  • https://pig.apache.org
  • http://pig.apache.org/docs/r0.15.0/

Apache Pig Exercises: 30 List employees whose salary is 4 digit number ending with zero



In this post the sample Apache Pig script will List employees whose salary is 4 digit number ending with zero

Using Apache Pig version r0.15.0.


@ Test data structure:
Please refer to APACHE PIG ~ ALL SAMPLE TABLES and STRUCTURES post for the file structures, visit the reference section shown at the bottom of the post for more. 


@ Sample data:

Employees data table:


@ Apache Pig Script:

a) List employees who were joined in January:

WIP

grunt> 
data = LOAD 'Documents/tbl_EMP.txt' USING PigStorage(',') as (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:float, comm:float, deptno:int);
all_recs = foreach data generate empno,ename, job,mgr,hiredate, sal, comm,deptno;
rec_fltr = filter all_recs by  (SIZE(ename) == 4) and (INDEXOF(ename,'R', 0) == 2);
rec_ordr = order rec_fltr by sal;
dump rec_fltr;



@Apache Pig Output on Grunt Shell:  

(7521,WARD,2,SALESMAN,7698,1981-02-22,1250.0,500.0,30)
(7902,FORD,2,ANALYST,7566,1981-12-03,3000.0,,20)

----------------------------------------------------------------------------------------------------------------------------------------------------------

OR

b)  List Employee names with 3rd character as ‘R’ and four characters length:

grunt> 
data = LOAD 'Documents/tbl_EMP.txt' USING PigStorage(',') as (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:float, comm:float, deptno:int);
all_recs = foreach data generate empno,ename, job,mgr,hiredate, sal, comm,deptno;
rec_fltr = filter all_recs by (SIZE(ename) == 4) and (SUBSTRING(ename,2, 3) == 'R');
rec_ordr = order rec_fltr by sal;
dump rec_fltr;

@Apache Pig Output on Grunt Shell: 


(7521,WARD,2,SALESMAN,7698,1981-02-22,1250.0,500.0,30)
(7902,FORD,2,ANALYST,7566,1981-12-03,3000.0,,20)



@ Apache Pig Reference/s:
  • https://pig.apache.org
  • http://pig.apache.org/docs/r0.15.0/

Apache Pig Exercises: 29 List employees who joined in a month of which second character is ‘a’



In this post the sample Apache Pig script will List employees who joined in a month of which second character is ‘a’

Using Apache Pig version r0.15.0.


@ Test data structure:
Please refer to APACHE PIG ~ ALL SAMPLE TABLES and STRUCTURES post for the file structures, visit the reference section shown at the bottom of the post for more. 


@ Sample data:

Employees data table:


@ Apache Pig Script:

a) List employees who were joined in January:

WIP

grunt> 
data = LOAD 'Documents/tbl_EMP.txt' USING PigStorage(',') as (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:float, comm:float, deptno:int);
all_recs = foreach data generate empno,ename, job,mgr,hiredate, sal, comm,deptno;
rec_fltr = filter all_recs by  (SIZE(ename) == 4) and (INDEXOF(ename,'R', 0) == 2);
rec_ordr = order rec_fltr by sal;
dump rec_fltr;



@Apache Pig Output on Grunt Shell:  

(7521,WARD,2,SALESMAN,7698,1981-02-22,1250.0,500.0,30)
(7902,FORD,2,ANALYST,7566,1981-12-03,3000.0,,20)

----------------------------------------------------------------------------------------------------------------------------------------------------------

OR

b)  List Employee names with 3rd character as ‘R’ and four characters length:

grunt> 
data = LOAD 'Documents/tbl_EMP.txt' USING PigStorage(',') as (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:float, comm:float, deptno:int);
all_recs = foreach data generate empno,ename, job,mgr,hiredate, sal, comm,deptno;
rec_fltr = filter all_recs by (SIZE(ename) == 4) and (SUBSTRING(ename,2, 3) == 'R');
rec_ordr = order rec_fltr by sal;
dump rec_fltr;

@Apache Pig Output on Grunt Shell: 


(7521,WARD,2,SALESMAN,7698,1981-02-22,1250.0,500.0,30)
(7902,FORD,2,ANALYST,7566,1981-12-03,3000.0,,20)



@ Apache Pig Reference/s:
  • https://pig.apache.org
  • http://pig.apache.org/docs/r0.15.0/