In this post, the sample Apache Pig script displays employees whose job is either ‘CLERK’ or ‘ANALYST’, sorted in descending order of employee number.
The examples and exercise scripts were created using Apache Pig version r0.14.0, the current release when this post was written.
@ Test data structure:
Please refer to the Apache Pig learning series intro... post for the file structures, and see the reference section at the bottom of this post for more.
@ Sample data:
Employees data table:
Department data table:
@ Apache Pig Script:
List CLERK and ANALYST employee records in descending order of employee number:
grunt>
/* load the data file into a Pig relation (alias) */
data = LOAD '/ALLApplnDvlpmt/All_DataSets/tbl_EMP.txt' USING PigStorage(',') as (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:float, comm:float, deptno:int);
/* project all columns from the source data into a new relation */
all_recs = foreach data generate empno, ename, job, mgr, hiredate, sal, comm, deptno;
/* keep only the CLERK and ANALYST records */
rec_fltr = filter all_recs by (job == 'CLERK') or (job == 'ANALYST');
/* sort the filtered records by employee number, highest first */
rec_ordr = order rec_fltr by empno desc;
/* print the result on the grunt console */
dump rec_ordr;
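Pig's ORDER operator accepts several sort keys, each with its own direction. The sketch below assumes the `all_recs` and `rec_fltr` relations from the script above. It shows how the records could instead be listed in ascending order of department number and, within each department, descending order of job title. It also shows an equivalent way to write the filter with the `matches` regular-expression operator. The aliases `rec_fltr_re` and `rec_dept_job` are hypothetical names chosen for illustration.

```pig
/* hypothetical alternative filter: matches uses Java regex semantics,
   so the whole job value must equal CLERK or ANALYST */
rec_fltr_re = filter all_recs by job matches 'CLERK|ANALYST';
/* hypothetical alternative sort: department number ascending,
   then job title descending within each department */
rec_dept_job = order rec_fltr by deptno asc, job desc;
dump rec_dept_job;
```

Both filter forms drop records whose job is null, because a null comparison is never true in a FILTER condition.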
@ Store output in a file:
To store the output in a file on disk, issue the following STORE command on the grunt shell with an appropriate folder location:
grunt> store rec_ordr into '/pig_output/final_output.txt';
Check the grunt shell console to see whether the command succeeded. If you plan to rerun the script with the same output location, delete the existing output folder first. Apache Pig does not overwrite an existing output location and displays a failure message on the grunt shell/console instead.
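A minimal sketch of the clean-up step follows, assuming the `/pig_output/final_output.txt` output location and the `rec_ordr` relation from this post. The grunt shell's `rmf` command forcibly removes a file or folder and does not fail if the path is absent, so it is safe to run before every STORE. The `using PigStorage(',')` clause is an optional addition: without it, PigStorage writes tab-separated fields by default.

```pig
/* remove the previous output folder; rmf does not complain if it is missing */
rmf /pig_output/final_output.txt
/* store again, writing comma-separated fields instead of the default tab */
store rec_ordr into '/pig_output/final_output.txt' using PigStorage(',');
```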
@ Review results from the stored file:
Please note that the above command creates a sub-folder named final_output.txt inside the pig_output folder. The actual output is stored in a file called part-r-00000 within that sub-folder, not in a file named final_output.txt.
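The stored results can be reviewed without leaving grunt by using its `fs` command, which passes Hadoop file-system shell commands through. This sketch assumes the output path from this post. Listing the folder first is worthwhile because a job that runs with more than one reducer writes several part files (part-r-00000, part-r-00001, and so on) rather than a single one.

```pig
/* list the files that STORE wrote into the output sub-folder */
fs -ls /pig_output/final_output.txt
/* print the reducer output file on the console */
fs -cat /pig_output/final_output.txt/part-r-00000
```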
@ Apache Pig Reference/s:
- https://pig.apache.org
- http://pig.apache.org/docs/r0.14.0/
Thank you!