This post shows a sample Apache Pig script that lists employee names that are four characters long and have ‘R’ as the third character.
The examples use Apache Pig version r0.15.0.
@ Test data structure:
For the file structures, please refer to the post APACHE PIG ~ ALL SAMPLE TABLES and STRUCTURES. The reference section at the bottom of this post has further links.
@ Sample data:
Employees data table:
@ Apache Pig Script:
a) List employee names that are four characters long with ‘R’ as the third character, using INDEXOF:
grunt>
data = LOAD 'Documents/tbl_EMP.txt' USING PigStorage(',') as (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:float, comm:float, deptno:int);
all_recs = foreach data generate empno, ename, job, mgr, hiredate, sal, comm, deptno;
rec_fltr = filter all_recs by (SIZE(ename) == 4) and (INDEXOF(ename, 'R', 2) == 2);
rec_ordr = order rec_fltr by sal;
dump rec_ordr;
@ Apache Pig Output on Grunt Shell:
(7521,WARD,SALESMAN,7698,1981-02-22,1250.0,500.0,30)
(7902,FORD,ANALYST,7566,1981-12-03,3000.0,,20)
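To see what the filter is comparing, it can help to project the intermediate values next to each name. The lines below are a minimal sketch, not part of the original script: the relation name chk and the aliases len, first_r and third_ch are made up for illustration. Pig string positions are zero-based, so the third character sits at index 2. INDEXOF returns the position of the first match at or after the given start index, and SUBSTRING takes an inclusive start index and an exclusive stop index.

```pig
-- Same LOAD as in the script above.
data = LOAD 'Documents/tbl_EMP.txt' USING PigStorage(',') as (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:float, comm:float, deptno:int);

-- chk, len, first_r and third_ch are illustrative names, not from the post.
chk = foreach data generate
        ename,
        SIZE(ename)            as len,       -- number of characters in the name
        INDEXOF(ename, 'R', 0) as first_r,   -- index of the first 'R', or -1 if none
        SUBSTRING(ename, 2, 3) as third_ch;  -- the single character at index 2

-- For a name such as WARD this should give len 4, first_r 2, third_ch R.
dump chk;
```

Note that first_r reports only the first 'R' in the name. If a name had an 'R' before the third position, first_r would be 0 or 1 even when the third character is also 'R', which is why checking the character at index 2 directly is the safer test.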
b) The same listing using SUBSTRING instead of INDEXOF:
data = LOAD 'Documents/tbl_EMP.txt' USING PigStorage(',') as (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:float, comm:float, deptno:int);
all_recs = foreach data generate empno, ename, job, mgr, hiredate, sal, comm, deptno;
rec_fltr = filter all_recs by (SIZE(ename) == 4) and (SUBSTRING(ename, 2, 3) == 'R');
rec_ordr = order rec_fltr by sal;
dump rec_ordr;
@ Apache Pig Output on Grunt Shell:
(7521,WARD,SALESMAN,7698,1981-02-22,1250.0,500.0,30)
(7902,FORD,ANALYST,7566,1981-12-03,3000.0,,20)
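A third option, offered here as a sketch rather than as part of the original script, is Pig's matches operator, which tests the whole string against a Java regular expression. The pattern '..R.' reads as any two characters, then R, then any one character, so it covers the length check and the position check in a single condition. The relation names rec_rgx and rec_rgx_ordr are assumptions for illustration.

```pig
-- Same LOAD as in the scripts above.
data = LOAD 'Documents/tbl_EMP.txt' USING PigStorage(',') as (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:float, comm:float, deptno:int);

-- matches must match the entire value, so '..R.' accepts only
-- four-character names whose third character is 'R'.
rec_rgx = filter data by ename matches '..R.';

-- Order by salary, as in the other variants, and print the result.
rec_rgx_ordr = order rec_rgx by sal;
dump rec_rgx_ordr;
```

Like the INDEXOF and SUBSTRING forms, this comparison is case-sensitive, so a lowercase 'r' in the data would not match.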
@ Apache Pig Reference/s:
- https://pig.apache.org
- http://pig.apache.org/docs/r0.15.0/
Thank you!