What is a Dereference?
Many times it is necessary to reference a field inside a tuple or a bag that lies outside the current operator's scope. Here is the Pig script used throughout this discussion of dereferencing (it assumes a relation named data, with fields f1, f2 and f3, has already been loaded):
aaa = group data by f1;
bbb = FOREACH aaa GENERATE group, data.f2, data.f3;
dump bbb;
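To try the script above end to end, here is a minimal sketch that puts a LOAD step in front of it. The file name 'data.txt', the comma delimiter and the int types are assumptions for illustration only; the original script does not show how data is loaded.

```pig
-- Hypothetical input: 'data.txt' with comma-separated lines such as 1,2,3
-- (file name, delimiter and field types are assumptions, not from the original script)
data = LOAD 'data.txt' USING PigStorage(',') AS (f1:int, f2:int, f3:int);

-- After grouping, each tuple of aaa has exactly two fields:
--   group : the f1 value shared by the grouped tuples
--   data  : a bag holding the original (f1, f2, f3) tuples
aaa = GROUP data BY f1;
DESCRIBE aaa;

-- data.f2 and data.f3 reach inside the bag; each yields a bag of single-field tuples
bbb = FOREACH aaa GENERATE group, data.f2, data.f3;
DUMP bbb;
```

DESCRIBE prints the schema of aaa, which is a quick way to confirm which fields have to be qualified with data.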
Dereferencing can be done in the following ways.
a) Dereferencing fields contained in a tuple or bag:
This form of dereferencing can be seen with Pig's FOREACH operator:
bbb = FOREACH aaa GENERATE group, data.f2, data.f3;
As you may have noticed in the line above, f2 and f3 are not top-level fields of the
relation aaa (please refer to the complete Pig example script shown above). After the GROUP,
aaa contains only two fields: group, and a bag named data that holds the original tuples.
Thus, in order to reference f2 and f3, they have to be qualified with the tuple or bag that
contains them, as in data.f2 and data.f3. Because these fields are defined in the relation
data, we can use them this way to create subsequent relations.
b) Dereferencing fields by their positions:
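We can use the same example to dereference the fields by their positions in the relation where
they were created. Positions start at zero, so assuming the fields of data are ordered f1, f2, f3,
$1 refers to f2 and $2 refers to f3. The following line dereferences the same fields as the
example in (a). Please refer to the complete Pig example script shown above.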
bbb = FOREACH aaa GENERATE group, data.$1, data.$2;
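As a small sketch of the two styles side by side, the statements below should select the same fields, assuming data is loaded with its fields in the order f1, f2, f3. The LOAD line (file name, delimiter and types) and the relation names by_name, by_position and both_fields are made up for this illustration and do not come from the original script.

```pig
-- Hypothetical LOAD step; file name, delimiter and types are assumptions
data = LOAD 'data.txt' USING PigStorage(',') AS (f1:int, f2:int, f3:int);
aaa = GROUP data BY f1;

-- Dereference by name
by_name = FOREACH aaa GENERATE group, data.f2, data.f3;

-- Dereference by position; positions are zero-based: $0 is f1, $1 is f2, $2 is f3
by_position = FOREACH aaa GENERATE group, data.$1, data.$2;

-- Several fields can be dereferenced at once; this yields one bag of (f2, f3)
-- tuples instead of two separate bags
both_fields = FOREACH aaa GENERATE group, data.(f2, f3);
```

Dereferencing by name is usually easier to read, while positions are handy when the relation was loaded without a schema.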
Thanks!