Jan 17, 2016

Apache Pig Exercises: 1. List All employee records


This post shows a sample Apache Pig script that displays all the records from the Employee table/file. The examples and exercise scripts were created using Apache Pig r0.14.0, the current version at the time of writing.

For the file structures, please refer to the Apache Pig learning series intro... post. For more information, see the reference section at the bottom of this post.

@ Sample data:

APACHE PIG ~ ALL SAMPLE TABLES and STRUCTURES

Employees data table: [table not shown]

Department data table: [table not shown]

@ Apache Pig Script:

1. Fetch all Employee records

data = LOAD '/Documents/tbl_EMP.txt' USING PigStorage(',')
       AS (empno:int, ename:chararray, job:chararray, mgr:int,
           hiredate:chararray, sal:float, comm:float, deptno:int);
/* display final output on console */
DUMP data;
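
Before dumping a larger file, it can help to check the schema Pig has attached to the relation and to preview only a few rows. The following is a minimal sketch, assuming the same file path and schema as the script above; the alias names emp and emp_preview are illustrative and not part of the original exercise.

```pig
-- Load the employee file with the same schema as the script above.
emp = LOAD '/Documents/tbl_EMP.txt' USING PigStorage(',')
      AS (empno:int, ename:chararray, job:chararray, mgr:int,
          hiredate:chararray, sal:float, comm:float, deptno:int);

-- Print the schema attached to the relation.
DESCRIBE emp;

-- Keep only five rows for a quick preview instead of dumping everything.
-- Without an ORDER BY, Pig does not guarantee which five rows are returned.
emp_preview = LIMIT emp 5;
DUMP emp_preview;
```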

@ Apache Pig Output on Grunt Shell:


(7369,SMITH,CLERK,7902,1980-12-17,800.0,,20)
(7499,ALLEN,SALESMAN,7698,1981-02-20,1600.0,300.0,30)
(7521,WARD,SALESMAN,7698,1981-02-22,1250.0,500.0,30)
(7566,JONES,MANAGER,7839,1981-04-02,2975.0,,20)
(7654,MARTIN,SALESMAN,7698,1981-09-28,1250.0,1400.0,30)
(7698,BLAKE,MANAGER,7839,1981-05-01,2850.0,,30)
(7782,CLARK,MANAGER,7839,1981-06-09,2450.0,,10)
(7788,SCOTT,ANALYST,7566,1982-12-09,3000.0,,20)
(7839,KING,PRESIDENT,,1981-11-17,5000.0,,10)
(7844,TURNER,SALESMAN,7698,1981-09-08,1500.0,0.0,30)
(7876,ADAMS,CLERK,7788,1983-01-12,1100.0,,20)
(7900,JAMES,CLERK,7698,1981-12-03,950.0,,30)
(7902,FORD,ANALYST,7566,1981-12-03,3000.0,,20)
(7934,MILLER,CLERK,7782,1982-01-23,1300.0,,10)

@ Store output in a file:
To store the output in a file on disk, issue the store command on the grunt shell with the appropriate folder location:

grunt> store data into '/pig_output/final_output.txt';

Check the grunt shell console to see whether the command succeeded. If you want to reuse the same output location on later runs, delete the output folder before running the store command again; otherwise Apache Pig reports a failure on the grunt shell/console because the output location already exists.
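
One way to make the store step repeatable is to remove the old output folder right before storing. This is a sketch only, assuming data is the relation loaded earlier and that the output location is /pig_output/final_output.txt; adjust the path to whatever you pass to store.

```pig
-- rmf removes the path if it exists and does not complain if it is missing,
-- so the script can be rerun without a failure on the output location.
rmf /pig_output/final_output.txt

-- Write the relation to the (now absent) output folder.
STORE data INTO '/pig_output/final_output.txt';
```

Because rmf deletes the folder without asking, double-check the path before running it.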

@ Review results from the stored file:
Please note that the above command creates a sub-folder named final_output.txt inside the folder pig_output. The actual output is stored there in a part file (part-r-00000, or part-m-00000 when the job runs without a reduce phase), not in a file named final_output.txt.
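
The stored result can be inspected without leaving the grunt shell. The sketch below assumes the output folder is /pig_output/final_output.txt and that the data was stored with the default PigStorage, which writes tab-separated fields; the alias stored is illustrative.

```pig
-- List the files inside the output folder (part file plus any marker files).
fs -ls /pig_output/final_output.txt

-- Print the contents of the files in the output folder to the console.
cat /pig_output/final_output.txt

-- Load the stored result back, pointing at the folder rather than a part file.
-- The default store format is tab-delimited, hence PigStorage('\t').
stored = LOAD '/pig_output/final_output.txt' USING PigStorage('\t')
         AS (empno:int, ename:chararray, job:chararray, mgr:int,
             hiredate:chararray, sal:float, comm:float, deptno:int);
DUMP stored;
```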

@ Apache Pig Reference/s:
  • https://pig.apache.org
  • http://pig.apache.org/docs/r0.14.0/
Thank you!

1 comment:

  1. Hi-

    Please share the data set/sample data file for all exercises. All exercises are based on data set which is missing. Request you to please share the same.