University of Washington Open Course: Introduction to Data Science 037_pig_evalua

Source: collected from the web. Date: 2025-05-01
Example

    A = LOAD 'traffic.dat' AS (ip, time, url);
    B = GROUP A BY ip;
    C = FOREACH B GENERATE group AS ip, COUNT(A);
    D = FILTER C BY ip == '192.168.0.1' OR ip == '192.168.0.0';
    STORE D INTO 'local_traffic.dat';

Plan: LOAD

5/18/2013

Bill Howe, UW. Adapted from slides by Oliver Kennedy, U of Buffalo
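To make the intent of the example script concrete, here is a minimal Python sketch of what it computes: group the rows by ip, count each group, and keep only the two listed addresses. The sample rows and the function name `run_script` are hypothetical stand-ins for traffic.dat and are not part of the original slides.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for traffic.dat: (ip, time, url)
traffic = [
    ("192.168.0.1", 1, "/index.html"),
    ("10.0.0.7", 2, "/about.html"),
    ("192.168.0.1", 3, "/news.html"),
    ("192.168.0.0", 4, "/index.html"),
]

def run_script(rows):
    # B = GROUP A BY ip: collect the rows for each ip into a bag
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)
    # C = FOREACH B GENERATE group AS ip, COUNT(A)
    counts = [(ip, len(bag)) for ip, bag in groups.items()]
    # D = FILTER C BY ip: keep only the two listed addresses
    wanted = {"192.168.0.1", "192.168.0.0"}
    return sorted(row for row in counts if row[0] in wanted)

result = run_script(traffic)
print(result)
```

The sort is only there to make the printed order deterministic; Pig itself gives no ordering guarantee without ORDER.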

Plan: LOAD → GROUP

Plan: LOAD → GROUP → FOREACH

Plan: LOAD → GROUP → FOREACH → FILTER

Plan: LOAD → GROUP → FOREACH → FILTER → STORE

Algebraic Optimization!

Plan: LOAD → FILTER → GROUP → FOREACH → STORE
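The reordering is safe because the filter tests only ip, which is the grouping key, so dropping rows before grouping cannot change the counts of the groups that survive. The sketch below checks that equivalence on a few hypothetical rows; the function names and data are illustrative assumptions, not Pig's optimizer.

```python
from collections import Counter

WANTED = {"192.168.0.1", "192.168.0.0"}

def group_then_filter(rows):
    # Plan as written: GROUP and count every ip, then FILTER on ip
    counts = Counter(row[0] for row in rows)
    return {ip: n for ip, n in counts.items() if ip in WANTED}

def filter_then_group(rows):
    # Rewritten plan: FILTER first, then GROUP and count what is left
    kept = [row for row in rows if row[0] in WANTED]
    return dict(Counter(row[0] for row in kept))

# Hypothetical (ip, time, url) rows
rows = [
    ("192.168.0.1", 1, "/a"),
    ("10.0.0.7", 2, "/b"),
    ("192.168.0.1", 3, "/c"),
    ("10.0.0.9", 4, "/d"),
    ("192.168.0.0", 5, "/e"),
]

same_answer = group_then_filter(rows) == filter_then_group(rows)
print(same_answer)
```

Both plans return the same counts, but the rewritten one groups three rows instead of five, which is the saving the optimization is after.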

Lazy Evaluation: No work is done until STORE
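Lazy evaluation can be sketched as operators that only record plan steps, with execution deferred to the store call. The `Relation` class, its method names, and the `log` list below are hypothetical, chosen for illustration; this is not how Pig is implemented internally.

```python
class Relation:
    """A hypothetical lazy relation: each operator only records a plan step."""

    def __init__(self, plan, log):
        self.plan = plan
        self.log = log  # shared list recording when real work happens

    @classmethod
    def load(cls, rows, log):
        return cls([("LOAD", rows)], log)

    def filter(self, predicate):
        # No rows are touched here; the step is appended to the plan.
        return Relation(self.plan + [("FILTER", predicate)], self.log)

    def store(self):
        # Only here is the recorded plan actually executed.
        rows = []
        for op, arg in self.plan:
            self.log.append(op)
            if op == "LOAD":
                rows = list(arg)
            elif op == "FILTER":
                rows = [r for r in rows if arg(r)]
        return rows

log = []
a = Relation.load([("192.168.0.1", 1, "/a"), ("10.0.0.7", 2, "/b")], log)
d = a.filter(lambda r: r[0] == "192.168.0.1")
work_before_store = list(log)  # snapshot taken before store() runs
stored = d.store()
```

Because the whole plan is known before anything runs, a planner has the chance to reorder it first, which is what makes the algebraic optimization possible.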

Example

Create a MR job for each COGROUP

Plan: LOAD → FILTER → GROUP → FOREACH → STORE
Phases: Map, Reduce

Example

1) Create a MR job for each COGROUP

Plan: LOAD → FILTER → GROUP → FOREACH → STORE
Phases: Map, Reduce

2) Add other commands where possible

Certain commands require their own MR job (e.g., ORDER)
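The two steps above can be sketched as a pass over a linear plan. This is a deliberately simplified model under stated assumptions: the operator sets, the function `assign_jobs`, and the convention of placing GROUP itself on the reduce side are all choices made for this illustration, and Pig's real compiler handles many more cases.

```python
SHUFFLE_OPS = {"GROUP", "COGROUP"}  # each one anchors a MapReduce job
OWN_JOB_OPS = {"ORDER"}             # operators modeled as needing a separate job

def assign_jobs(plan):
    """Split a linear operator list into jobs shaped {"map": [...], "reduce": [...]}."""
    jobs = []
    current = {"map": [], "reduce": []}
    phase = "map"
    for op in plan:
        if op in OWN_JOB_OPS:
            # Close the job in progress and isolate this operator in its own job.
            if current["map"] or current["reduce"]:
                jobs.append(current)
            jobs.append({"map": [], "reduce": [op]})
            current = {"map": [], "reduce": []}
            phase = "map"
        elif op in SHUFFLE_OPS:
            if phase == "reduce":
                # A second shuffle cannot fit in the same job.
                jobs.append(current)
                current = {"map": [], "reduce": []}
            current["reduce"].append(op)
            phase = "reduce"
        else:
            # Step 2: attach other commands to the current phase where possible.
            current[phase].append(op)
    if current["map"] or current["reduce"]:
        jobs.append(current)
    return jobs

example_jobs = assign_jobs(["LOAD", "FILTER", "GROUP", "FOREACH", "STORE"])
ordered_jobs = assign_jobs(["LOAD", "GROUP", "FOREACH", "ORDER", "STORE"])
print(example_jobs)
print(len(ordered_jobs))
```

For the example plan this yields a single job, with LOAD and FILTER before the shuffle and GROUP, FOREACH, and STORE after it. Adding ORDER splits the work into more jobs.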

Review

NoSQL
– “NoSchema”, “NoTransactions”, “NoLanguage”
– A “reboot” of data systems focusing on just high-throughput reads and writes
– But: A clear trend towards re-introducing schemas, languages, transactions at full scale
  Google’s Spanner system, for example

Pig
– An RA-like language layer on Hadoop
– But not a pure relational data model
– “Schema-on-Read” rather than “Schema-on-Write”
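Schema-on-read means the stored data carries no schema of its own; field names and types are supplied when the data is loaded, much as the AS clause does in the example script. A minimal sketch follows, assuming comma-separated text. The `RAW` contents and the `load` helper are hypothetical.

```python
import csv
import io

# Hypothetical file contents: plain delimited text with no schema attached
RAW = "192.168.0.1,1,/index.html\n10.0.0.7,2,/about.html\n"

def load(raw_text, schema):
    """Apply field names and types while reading, in the spirit of LOAD ... AS (...)."""
    reader = csv.reader(io.StringIO(raw_text))
    return [
        {name: cast(value) for (name, cast), value in zip(schema, record)}
        for record in reader
    ]

# The same bytes read under two different schemas
as_traffic = load(RAW, [("ip", str), ("time", int), ("url", str)])
as_strings = load(RAW, [("a", str), ("b", str), ("c", str)])
print(as_traffic[0])
```

Under schema-on-write, by contrast, the types would be checked once when the data is stored, and every reader would see the same schema.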

Bill Howe, UW
