Creating a Hive table on top of Parquet files

Creating a Hive table on top of Parquet files failed with: ERROR: Failed with exception java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.IntWritable.
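This cast failure means the Hive table declares a column as int (32-bit) while the underlying Parquet data stores it as a 64-bit long, so Hive's Parquet reader fails when it tries to cast LongWritable to IntWritable. A minimal sketch of the kind of mismatch that triggers it, using a hypothetical table name and path:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("repro").enableHiveSupport().getOrCreate()
import spark.implicits._

// ColumnB is written as a Scala Long, i.e. a 64-bit Parquet INT64...
Seq((1L, "a")).toDF("ColumnB", "ColumnA")
  .write.parquet("s3://bucket_name/repro/")

// ...but the DDL declares it as int (32-bit); querying this table from
// Hive then fails with the LongWritable -> IntWritable cast error.
spark.sql("""
  CREATE EXTERNAL TABLE repro_table (ColumnB int, ColumnA string)
  STORED AS PARQUET
  LOCATION 's3://bucket_name/repro/'
""")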

The fix is to print the Parquet files' schema and make each column type in the Hive DDL match it one-to-one.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("test").getOrCreate()
val s3Path = "s3://bucket_name/*/*/*/"
// Read the Parquet files and print the inferred schema
val parquetDF = spark.read.parquet(s3Path)
println(parquetDF.schema)

StructType(StructField(ColumnA,StringType,true), StructField(ColumnB,LongType,true), StructField(ColumnC,DateType,true), StructField(ColumnD,DoubleType,true))
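(For a more readable tree rendering of the same schema, printSchema() can be used instead:)

// Tree-formatted view of the same schema
parquetDF.printSchema()
// root
//  |-- ColumnA: string (nullable = true)
//  |-- ColumnB: long (nullable = true)
//  |-- ColumnC: date (nullable = true)
//  |-- ColumnD: double (nullable = true)

Each Spark type then maps to a Hive type one-to-one: StringType → string, LongType → bigint (not int, which is what triggers the cast error), DateType → date, DoubleType → double. The DDL below applies that mapping: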

CREATE EXTERNAL TABLE IF NOT EXISTS table_name(
  ColumnB bigint,  -- LongType
  ColumnC date,    -- DateType
  ColumnD double   -- DoubleType
)
PARTITIONED BY (ColumnA string)  -- StringType; a partition column must not be repeated in the column list
STORED AS PARQUET                -- ROW FORMAT DELIMITED is for text files and is ignored for Parquet
LOCATION
  's3://bucket_name/';           -- LOCATION takes the base directory; wildcards are not allowed
  
Finally, register the existing partition directories with the metastore (MSCK REPAIR TABLE only discovers directories that follow the ColumnA=value naming convention):

MSCK REPAIR TABLE table_name;
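As a quick sanity check, the table can be queried back through a Hive-enabled SparkSession (a sketch, reusing the hypothetical setup from above):

// Confirm the partitions were registered and the types line up
spark.sql("SHOW PARTITIONS table_name").show()
spark.sql("SELECT * FROM table_name LIMIT 10").show(truncate = false)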