Jan 26, 2016

Apache Pig Exercises: 21. List Employees whose names have five characters



In this post the sample Apache Pig script will list employees whose names have five characters.

The examples and exercise scripts are created using Apache Pig current version r0.14.0.

@ Test file data structure:

Please refer to the Apache Pig learning series intro... post for the file structures, and visit the reference section at the bottom of the post for more.


@ Sample data:

Employees data table:


Department data table:



@ Apache Pig Script:

a) List employees whose names have five characters.

grunt>
/* load the employee data file into a Pig relation */
data = LOAD '/Documents/tbl_EMP.txt' USING PigStorage(',') as (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:float, comm:float, deptno:int);
/* project all columns */
all_recs = foreach data generate empno, ename, job, mgr, hiredate, sal, comm, deptno;
/* keep only employees whose name is exactly five characters long */
rec_fltr = filter all_recs by (SIZE(ename) == 5);
/* sort the result by salary in ascending order */
rec_ordr = order rec_fltr by sal;
dump rec_ordr;

@ Apache Pig Output on Grunt Shell:

(7369,SMITH,CLERK,7902,1980-12-17,800.0,,20)
(7900,JAMES,CLERK,7698,1981-12-03,950.0,,30)
(7876,ADAMS,CLERK,7788,1983-01-12,1100.0,,20)
(7499,ALLEN,SALESMAN,7698,1981-02-20,1600.0,300.0,30)
(7782,CLARK,MANAGER,7839,1981-06-09,2450.0,,10)
(7698,BLAKE,MANAGER,7839,1981-05-01,2850.0,,30)
(7566,JONES,MANAGER,7839,1981-04-02,2975.0,,20)
(7788,SCOTT,ANALYST,7566,1982-12-09,3000.0,,20)

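As an alternative sketch, the same five-character filter can be written with the matches operator instead of SIZE. This assumes the same file path and schema as the script above. Since matches has to match the whole string, the pattern '.{5}' accepts only names of exactly five characters.

```pig
-- Assumption: same comma-separated file and schema as the script above.
data = LOAD '/Documents/tbl_EMP.txt' USING PigStorage(',') as (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:float, comm:float, deptno:int);
-- '.{5}' is a Java regular expression: any five characters, matched against the whole name
rec_fltr = filter data by ename matches '.{5}';
-- sort by salary in ascending order, as in the original script
rec_ordr = order rec_fltr by sal;
dump rec_ordr;
```

Either form should give the same rows for this data; SIZE is arguably easier to read, while matches is handy when the condition grows beyond a plain length check.
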
@ Apache Pig Reference/s:
  • https://pig.apache.org
  • http://pig.apache.org/docs/r0.14.0/

_________________
Thank you!

Jan 19, 2016

List of Apache Pig v0.15 Built-in functions

Apache Pig Exercises: 20. List Employees whose annual salary ranges from 22000 to 45000


In this post the sample Apache Pig script will display employees whose annual salary ranges from 22000 to 45000.

The examples and exercise scripts are created using Apache Pig current version r0.14.0.

@ Test data structure:
Please refer to the Apache Pig learning series intro... post for the file structures, and visit the reference section at the bottom of the post for more.

@ Sample data:

Employees data table:


Department data table:


@ Apache Pig Script:

List employees whose annual salary ranges from 22000 to 45000:

grunt> 
/* load data file in pig reference variable */
data = LOAD '/ALLApplnDvlpmt/All_DataSets/tbl_EMP.txt' USING PigStorage(',') as (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:float, comm:float, deptno:int);
/* project all columns and add the annual salary (sal * 12) as annsal */
all_recs = foreach data generate empno, ename, job, mgr, hiredate, sal, (int)(sal*12) as annsal, comm, deptno;

/* keep employees whose annual salary lies strictly between 22000 and 45000 */
rec_fltr = filter all_recs by (annsal > 22000) and (annsal < 45000);

/* sort the result by annual salary in ascending order */
rec_ordr = order rec_fltr by annsal;

dump rec_ordr;

@ Apache Pig Output on Grunt Shell:




@ Store output in a file:
To store the output in a file on disk, issue the following Apache Pig store command on the grunt shell with an appropriate folder location.

grunt> store rec_ordr into '/pig_output/final_output.txt';

Check the grunt shell console to see whether the command succeeded. If you want to reuse the same output location again and again, delete the existing output folder before each run; otherwise Apache Pig displays a failure message on the grunt shell/console.

@ Review results from the stored file:
Please note that the above command creates a sub-folder named final_output.txt in the folder pig_output, and the actual output is stored in a part file inside it (part-r-00000, or part-m-00000 for a map-only job), not in a file named final_output.txt.

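The store-and-review steps above can be sketched as one grunt session. This is a minimal sketch, assuming the relation rec_ordr from the script above and the pig_output folder named in the note; the PigStorage(',') clause is an optional addition that writes comma-separated fields instead of the default tab-separated ones.

```pig
-- rmf removes the output folder if it exists and does not fail when it is missing
rmf /pig_output/final_output.txt
-- write the relation; Pig creates final_output.txt as a folder, not a single file
store rec_ordr into '/pig_output/final_output.txt' using PigStorage(',');
-- list the folder to see the part file(s), then print their contents
ls /pig_output/final_output.txt
cat /pig_output/final_output.txt
```

Running rmf before every store is what makes the same output location reusable across runs.
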
@ Apache Pig Reference/s:
  • https://pig.apache.org
  • http://pig.apache.org/docs/r0.14.0/
_________________
Thank you!

Apache Pig Exercises: 19. List Employees who joined in the month of Dec 1981


In this post the sample Apache Pig script will display employees who joined in the month of Dec 1981.

The examples and exercise scripts are created using Apache Pig current version r0.14.0.

@ Test data structure:
Please refer to the Apache Pig learning series intro... post for the file structures, and visit the reference section at the bottom of the post for more.

@ Sample data:

Employees data table:


Department data table:


@ Apache Pig Script:

List employees who joined in the month of Dec 1981:

grunt> 
/* load data file in pig reference variable */
data = LOAD '/ALLApplnDvlpmt/All_DataSets/tbl_EMP.txt' USING PigStorage(',') as (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:float, comm:float, deptno:int);
/* project all columns and add the year-month part of hiredate (first 7 characters, yyyy-MM) as hryear */
all_recs = foreach data generate empno, ename, job, mgr, hiredate, SUBSTRING(hiredate,0,7) as hryear, sal, comm, deptno;

/* keep employees hired in December 1981 */
rec_fltr = filter all_recs by (hryear == '1981-12');

dump rec_fltr;


@ Apache Pig Output on Grunt Shell:



@ Store output in a file:
To store the output in a file on disk, issue the following Apache Pig store command on the grunt shell with an appropriate folder location.

grunt> store rec_fltr into '/pig_output/final_output.txt';

Check the grunt shell console to see whether the command succeeded. If you want to reuse the same output location again and again, delete the existing output folder before each run; otherwise Apache Pig displays a failure message on the grunt shell/console.

@ Review results from the stored file:
Please note that the above command creates a sub-folder named final_output.txt in the folder pig_output, and the actual output is stored in a part file inside it (part-r-00000, or part-m-00000 for a map-only job), not in a file named final_output.txt.

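As an alternative to the SUBSTRING comparison in the script above, the hire date can be converted to a datetime and tested with Pig's built-in date functions. This is only a sketch: it assumes the same file and schema, and that hiredate is always written as yyyy-MM-dd, as in the sample output of this series. The names dated and hired are introduced here for illustration.

```pig
-- Assumption: same file and schema as the script above; hiredate text is yyyy-MM-dd.
data = LOAD '/ALLApplnDvlpmt/All_DataSets/tbl_EMP.txt' USING PigStorage(',') as (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:float, comm:float, deptno:int);
-- convert the text column to a datetime once and keep it as 'hired'
dated = foreach data generate empno, ename, job, mgr, hiredate, ToDate(hiredate, 'yyyy-MM-dd') as hired, sal, comm, deptno;
-- keep employees hired in December 1981
rec_fltr = filter dated by (GetYear(hired) == 1981) and (GetMonth(hired) == 12);
dump rec_fltr;
```

Dropping the GetMonth condition gives a filter on the year alone.
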
@ Apache Pig Reference/s:
  • https://pig.apache.org
  • http://pig.apache.org/docs/r0.14.0/
_________________
Thank you!

Apache Pig Exercises: 18. List Employees who joined in the year 1981


In this post the sample Apache Pig script will display employees who joined in the year 1981.

The examples and exercise scripts are created using Apache Pig current version r0.14.0.

@ Test data structure:
Please refer to the Apache Pig learning series intro... post for the file structures, and visit the reference section at the bottom of the post for more.

@ Sample data:

Employees data table:


Department data table:


@ Apache Pig Script:

List employees who joined in the year 1981:

grunt> 
/* load data file in pig reference variable */
data = LOAD '/ALLApplnDvlpmt/All_DataSets/tbl_EMP.txt' USING PigStorage(',') as (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:float, comm:float, deptno:int);
/* project all columns and add the year part of hiredate (first 4 characters) as hryear */
all_recs = foreach data generate empno, ename, job, mgr, hiredate, SUBSTRING(hiredate,0,4) as hryear, sal, comm, deptno;

/* keep employees hired in 1981 */
rec_fltr = filter all_recs by (hryear == '1981');

dump rec_fltr;


@ Apache Pig Output on Grunt Shell:




@ Store output in a file:
To store the output in a file on disk, issue the following Apache Pig store command on the grunt shell with an appropriate folder location.

grunt> store rec_fltr into '/pig_output/final_output.txt';

Check the grunt shell console to see whether the command succeeded. If you want to reuse the same output location again and again, delete the existing output folder before each run; otherwise Apache Pig displays a failure message on the grunt shell/console.

@ Review results from the stored file:
Please note that the above command creates a sub-folder named final_output.txt in the folder pig_output, and the actual output is stored in a part file inside it (part-r-00000, or part-m-00000 for a map-only job), not in a file named final_output.txt.

@ Apache Pig Reference/s:
  • https://pig.apache.org
  • http://pig.apache.org/docs/r0.14.0/
_________________
Thank you!

Apache Pig Exercises: 17. List Employees who work in deptno 10 or 20


In this post the sample Apache Pig script will display employees who work in deptno 10 or 20.

The examples and exercise scripts are created using Apache Pig current version r0.14.0.

@ Test data structure:
Please refer to the Apache Pig learning series intro... post for the file structures, and visit the reference section at the bottom of the post for more.

@ Sample data:

Employees data table:


Department data table:


@ Apache Pig Script:

List employees who work in deptno 10 or 20, in asc order of deptno:

grunt> 
/* load data file in pig reference variable */
data = LOAD '/ALLApplnDvlpmt/All_DataSets/tbl_EMP.txt' USING PigStorage(',') as (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:float, comm:float, deptno:int);
/* project all columns from the source data */
all_recs = foreach data generate empno, ename, job, mgr, hiredate, sal, comm, deptno;

/* keep employees of department 10 or 20 */
rec_fltr = filter all_recs by (deptno == 10) or (deptno == 20);

/* sort the result by department number in ascending order */
rec_ordr = order rec_fltr by deptno;

dump rec_ordr;


@ Apache Pig Output on Grunt Shell:




@ Store output in a file:
To store the output in a file on disk, issue the following Apache Pig store command on the grunt shell with an appropriate folder location.

grunt> store rec_ordr into '/pig_output/final_output.txt';

Check the grunt shell console to see whether the command succeeded. If you want to reuse the same output location again and again, delete the existing output folder before each run; otherwise Apache Pig displays a failure message on the grunt shell/console.

@ Review results from the stored file:
Please note that the above command creates a sub-folder named final_output.txt in the folder pig_output, and the actual output is stored in a part file inside it (part-r-00000, or part-m-00000 for a map-only job), not in a file named final_output.txt.

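The chained "or" conditions in the filter above can also be written with the IN operator, which to my knowledge is available from Pig 0.12 onward and so should work on r0.14.0. Treat this as a sketch under that assumption; it reuses the file path and schema from the script above.

```pig
-- Assumption: same file and schema as the script above; IN requires Pig 0.12 or later.
data = LOAD '/ALLApplnDvlpmt/All_DataSets/tbl_EMP.txt' USING PigStorage(',') as (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:float, comm:float, deptno:int);
-- keep employees whose department number is in the given list
rec_fltr = filter data by deptno IN (10, 20);
-- sort by department number in ascending order, as in the original script
rec_ordr = order rec_fltr by deptno;
dump rec_ordr;
```

The IN form stays readable as the list of departments grows, whereas the "or" chain needs one more comparison per value.
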
@ Apache Pig Reference/s:
  • https://pig.apache.org
  • http://pig.apache.org/docs/r0.14.0/
_________________
Thank you!

Apache Pig Exercises: 16. List Employees who joined on 1-May-81, 31-Dec-81, 17-Dec-81, 19-Jan-80 in asc order of seniority


In this post the sample Apache Pig script will display employees who joined on 1-May-81, 31-Dec-81, 17-Dec-81, 19-Jan-80 in asc order of seniority.

The examples and exercise scripts are created using Apache Pig current version r0.14.0.

@ Test data structure:
Please refer to the Apache Pig learning series intro... post for the file structures, and visit the reference section at the bottom of the post for more.

@ Sample data:

Employees data table:


Department data table:


@ Apache Pig Script:

List employees who joined on 1-May-81, 31-Dec-81, 17-Dec-81, 19-Jan-80 in order of seniority:

grunt> 
/* load data file in pig reference variable */
data = LOAD '/ALLApplnDvlpmt/All_DataSets/tbl_EMP.txt' USING PigStorage(',') as (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:float, comm:float, deptno:int);
/* project all columns and add the experience in years (current year minus hire year) as expn and the daily salary (sal / 30) as dailysal */
all_recs = foreach data generate empno, ename, job, mgr, hiredate, (int)GetYear(CurrentTime())-(int)SUBSTRING(hiredate,0,4) as expn, sal, (int)(sal/30) as dailysal, comm, deptno;

/* keep employees hired on one of the four listed dates */
rec_fltr = filter all_recs by (hiredate=='1981-05-01') or (hiredate=='1981-12-31') or (hiredate=='1981-12-17') or (hiredate=='1980-01-19');

/* sort with the most experienced employees first */
rec_ordr = order rec_fltr by expn desc;

dump rec_ordr;

@ Apache Pig Output on Grunt Shell:




@ Store output in a file:
To store the output in a file on disk, issue the following Apache Pig store command on the grunt shell with an appropriate folder location.

grunt> store rec_ordr into '/pig_output/final_output.txt';

Check the grunt shell console to see whether the command succeeded. If you want to reuse the same output location again and again, delete the existing output folder before each run; otherwise Apache Pig displays a failure message on the grunt shell/console.

@ Review results from the stored file:
Please note that the above command creates a sub-folder named final_output.txt in the folder pig_output, and the actual output is stored in a part file inside it (part-r-00000, or part-m-00000 for a map-only job), not in a file named final_output.txt.

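Because expn in the script above counts whole years, employees hired in the same year tie on it, and three of the four listed dates fall in 1981. One possible refinement, shown here only as a sketch, is to sort on hiredate itself: yyyy-MM-dd text sorts chronologically, so the earliest hire (the most senior employee) comes first. The file path and schema are the ones from the script above.

```pig
-- Assumption: same file and schema as the script above; hiredate text is yyyy-MM-dd.
data = LOAD '/ALLApplnDvlpmt/All_DataSets/tbl_EMP.txt' USING PigStorage(',') as (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:float, comm:float, deptno:int);
-- keep employees hired on one of the four listed dates
rec_fltr = filter data by (hiredate=='1981-05-01') or (hiredate=='1981-12-31') or (hiredate=='1981-12-17') or (hiredate=='1980-01-19');
-- ISO-formatted dates sort chronologically as text: earliest hire first
rec_ordr = order rec_fltr by hiredate;
dump rec_ordr;
```
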
@ Apache Pig Reference/s:
  • https://pig.apache.org
  • http://pig.apache.org/docs/r0.14.0/
_________________
Thank you!

Apache Pig Exercises: 15. List Employees who are either ‘CLERK’ or ‘ANALYST’ in descending order


In this post the sample Apache Pig script will display employees who are either ‘CLERK’ or ‘ANALYST’ in descending order.

The examples and exercise scripts are created using Apache Pig current version r0.14.0.

@ Test data structure:
Please refer to the Apache Pig learning series intro... post for the file structures, and visit the reference section at the bottom of the post for more.

@ Sample data:

Employees data table:


Department data table:


@ Apache Pig Script:

List employees who are either ‘CLERK’ or ‘ANALYST’ in desc order of empno:

grunt> 
/* load data file in pig reference variable */
data = LOAD '/ALLApplnDvlpmt/All_DataSets/tbl_EMP.txt' USING PigStorage(',') as (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:float, comm:float, deptno:int);
/* project all columns from the source data */
all_recs = foreach data generate empno, ename, job, mgr, hiredate, sal, comm, deptno;

/* keep employees whose job is CLERK or ANALYST */
rec_fltr = filter all_recs by (job == 'CLERK') or (job == 'ANALYST');

/* sort the result by employee number in descending order */
rec_ordr = order rec_fltr by empno desc;

dump rec_ordr;

@ Apache Pig Output on Grunt Shell:

@ Store output in a file:
To store the output in a file on disk, issue the following Apache Pig store command on the grunt shell with an appropriate folder location.

grunt> store rec_ordr into '/pig_output/final_output.txt';

Check the grunt shell console to see whether the command succeeded. If you want to reuse the same output location again and again, delete the existing output folder before each run; otherwise Apache Pig displays a failure message on the grunt shell/console.

@ Review results from the stored file:
Please note that the above command creates a sub-folder named final_output.txt in the folder pig_output, and the actual output is stored in a part file inside it (part-r-00000, or part-m-00000 for a map-only job), not in a file named final_output.txt.

@ Apache Pig Reference/s:
  • https://pig.apache.org
  • http://pig.apache.org/docs/r0.14.0/
_________________
Thank you!

Apache Pig Exercises: 14. List Employees along with their experience whose daily salary is more than $100


In this post the sample Apache Pig script will display the employees, along with their experience, whose daily salary is more than $100.

The examples and exercise scripts are created using Apache Pig current version r0.14.0.

@ Test data structure:
Please refer to the Apache Pig learning series intro... post for the file structures, and visit the reference section at the bottom of the post for more.

@ Sample data:

Employees data table:


Department data table:


@ Apache Pig Script:

List employees along with their experience whose daily salary is more than $100:

grunt> 
/* load data file in pig reference variable */
data = LOAD '/ALLApplnDvlpmt/All_DataSets/tbl_EMP.txt' USING PigStorage(',') as (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:float, comm:float, deptno:int);
/* project all columns and add the experience in years (current year minus hire year) as expn and the daily salary (sal / 30) as dailysal */
all_recs = foreach data generate empno, ename, job, mgr, hiredate, (int)GetYear(CurrentTime())-(int)SUBSTRING(hiredate,0,4) as expn, sal, (int)(sal/30) as dailysal, comm, deptno;

/* keep employees whose daily salary is more than 100 */
recs_fltr = filter all_recs by (dailysal > 100);

dump recs_fltr;

@ Apache Pig Output on Grunt Shell:




@ Store output in a file:
To store the output in a file on disk, issue the following Apache Pig store command on the grunt shell with an appropriate folder location.

grunt> store recs_fltr into '/pig_output/final_output.txt';

Check the grunt shell console to see whether the command succeeded. If you want to reuse the same output location again and again, delete the existing output folder before each run; otherwise Apache Pig displays a failure message on the grunt shell/console.

@ Review results from the stored file:
Please note that the above command creates a sub-folder named final_output.txt in the folder pig_output, and the actual output is stored in a part file inside it (part-r-00000, or part-m-00000 for a map-only job), not in a file named final_output.txt.

@ Apache Pig Reference/s:
  • https://pig.apache.org
  • http://pig.apache.org/docs/r0.14.0/
_________________
Thank you!