Hive：复杂查询不运行Mapreducejob而启用Fetchtask

原创

小哥 3年前 (2022-11-02) 阅读数 118 #大杂烩

1.背景：

如果在hive只查询表的一列，Hive默认情况下也会启用MapReduce Job来完成这项任务。我们都知道，使MapReduce Job它将消耗开销。对于这个问题，从Hive0.10.0版本启动，类似于简单的查询语句(无函数、排序、不需要聚合的查询语句)。SELECT from

LIMIT n语句，当开启Fetch Task功能，就执行一个简单的查询语句不会生成MapRreduce作业，而是直接使用Fetch Task，从hdfs文件系统中进行查询输出数据，从而提高效率。

二、配置Fetch Task的方法

1、在hive提示符

hive> set hive.fetch.task.conversion=more;

2、启动hive时，加入参数

bin/hive --hiveconf hive.fetch.task.conversion=more

3、修改 hive-site.xml文件，加入属性，保存退出。
上面的两种方法都可以开启了Fetch任务，但是都是临时起作用的；如果你想一直启用这个功能，可以在${HIVE_HOME}/conf/hive-site.xml里面加入以下配置：

<property>

<name>hive.fetch.task.conversion</name>

<value>more</value>

<description>

Some select queries can be converted to single FETCH task

minimizing latency.Currently the query should be single

sourced not having any subquery and should not have

any aggregations or distincts (which incurrs RS),

lateral views and joins.

1. minimal : SELECT STAR, FILTER on partition columns, LIMIT only

2. more : SELECT, FILTER, LIMIT only (+TABLESAMPLE, virtual columns)

</description>

</property>

三、举例说明：

1、没有配置Fetch Task，默认启用MapReduce job完成这项任务。

hive> select id from t ;                 
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there is no reduce operator
Starting Job = job_1402248601715_0004, Tracking URL = http://cdh1:8088/proxy/application_1402248601715_0004/
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1402248601715_0004
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-06-09 11:12:54,817 Stage-1 map = 0%,  reduce = 0%
2014-06-09 11:13:15,790 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.96 sec
2014-06-09 11:13:16,982 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.96 sec
MapReduce Total cumulative CPU time: 2 seconds 960 msec
Ended Job = job_1402248601715_0004
MapReduce Jobs Launched: 
Job 0: Map: 1   Cumulative CPU: 2.96 sec   HDFS Read: 257 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 960 msec
OK
Time taken: 51.496 seconds

查看上面的运行日志，您可以看到查询已启动mapreduce任务，mapper数为1，没有reducer任务。

2、配置fetch task，用到 hive.fetch.task.conversion 参数：


  hive.fetch.task.conversion
  minimal
  
    Some select queries can be converted to single FETCH task 
    minimizing latency.Currently the query should be single 
    sourced not having any subquery and should not have
    any aggregations or distincts (which incurrs RS), 
    lateral views and joins.
    1. minimal : SELECT STAR, FILTER on partition columns, LIMIT only
    2. more    : SELECT, FILTER, LIMIT only (+TABLESAMPLE, virtual columns)

此参数的缺省值为minimal，也就是奔跑的意思select * ”并带有limit查询时，会对其进行转换。FetchTask；如果参数值为more，则select一些栏目limit条件，则还会将其转换为FetchTask任务。当然，这是有前提条件的：一个单一的数据源，即您从中输入源的表或分区；distinct；不能用于视图和join。

测试，首先设置其参数值more，再次运行：

hive> set hive.fetch.task.conversion=more;
hive> select id from t limit 1;           
OK
Time taken: 0.242 seconds
hive> select id from t ;                  
OK
Time taken: 0.496 seconds

t该表是一个没有数据的空表。

版权声明

所有资源都来源于爬虫采集,如有侵权请联系我们,我们将立即删除

上一篇：Python的听课笔记案例5--判别第几天3.0 下一篇：Python的听课笔记案例4--52周储蓄挑战5.0

解决SolidWorks2019许可证错误-85440

解决SolidWorks 2019许可证错误(-8, 544, 0) 简介本资源文件旨在帮助用户解决SolidWo...

原创 7个月前 (02-11) 25阅读 #大杂烩
解决sklearn.datasets.fetch_20newsgroups下载速度慢的问题

解决sklearn.datasets.fetch_20newsgroups下载速度慢的问题简介在使用Python...

原创 7个月前 (02-11) 25阅读 #大杂烩
解决sklearn.datasets.fetch_20newsgroups下载报错问题分享

解决sklearn.datasets.fetch_20newsgroups下载报错问题在使用Python的机器学习库...

原创 7个月前 (02-11) 25阅读 #大杂烩
解决Scratch3.0scratch-hex文件下载失败问题

解决Scratch 3.0 scratch-hex文件下载失败问题描述本资源文件提供了一个解决方案，用于解决在n...

原创 7个月前 (02-11) 25阅读 #大杂烩
解决RHEL7无法使用YUM源的问题

解决RHEL7无法使用YUM源的问题当您在红帽企业版Linux 7（RHEL7）上遇到无法使用YUM源的问题时，通常...

原创 7个月前 (02-11) 25阅读 #大杂烩
解决Qt应用中遇到的libpngwarningiCCPknownincorrectsRGBprofile问题

解决Qt应用中遇到的libpng warning: iCCP: known incorrect sRGB profile...

原创 7个月前 (02-11) 25阅读 #大杂烩
解决QT启动时找不到python36.dll问题

解决QT启动时“找不到python36.dll”问题介绍本仓库提供了一个资源文件，旨在解决在使用QT启动时遇到的...

原创 7个月前 (02-11) 25阅读 #大杂烩
解决Qt5在麒麟Linux下不能输入中文问题

解决Qt5在麒麟Linux下不能输入中文问题资源描述本资源文件旨在解决在麒麟Linux操作系统下，Qt5应用程序...

原创 7个月前 (02-11) 25阅读 #大杂烩
解决QT5.12.6使用32位MinGW编译器无法使用SSL协议问题

解决QT5.12.6使用32位MinGW编译器无法使用SSL协议问题简介在开发基于QT 5.12.6的应用程序时...

原创 7个月前 (02-11) 25阅读 #大杂烩
解决PyTorchCUDA编译问题指南

解决PyTorch CUDA编译问题指南资源文件介绍文件标题解决AssertionError: Torch...

原创 7个月前 (02-11) 25阅读 #大杂烩