Creating a Hive table on top of Parquet files fails with:

ERROR: Failed with exception java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.IntWritable
The ClassCastException means at least one column is stored as a 64-bit long in the Parquet files but was declared as int in the Hive DDL. The fix is to print the Parquet schema and make every column type in the CREATE TABLE statement match it one to one (in particular, LongType must map to bigint, not int).
import org.apache.spark.sql.SparkSession

val sqlc = SparkSession.builder().appName("test").getOrCreate()
// Wildcards cover the partition directories under the bucket
val s3Path = "s3://bucket_name/*/*/*/"
val parquetDF = sqlc.read.parquet(s3Path)
// Print the schema so the Hive column types can be matched against it
println(parquetDF.schema)
StructType(StructField(ColumnA,StringType,true), StructField(ColumnB,LongType,true), StructField(ColumnC,DateType,true), StructField(ColumnD,DoubleType,true))
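Given that schema, the Hive types follow mechanically: StringType maps to string, LongType to bigint, DateType to date, and DoubleType to double. A minimal sketch of that mapping in Scala follows; the hiveType helper is hypothetical (not a library function) and only covers the types seen above:

import org.apache.spark.sql.types._

// Hypothetical helper: map Spark SQL types to their Hive DDL equivalents.
// LongType must become bigint; declaring it as int is what triggers the
// LongWritable -> IntWritable ClassCastException above.
def hiveType(dt: DataType): String = dt match {
  case StringType  => "string"
  case IntegerType => "int"
  case LongType    => "bigint"
  case DateType    => "date"
  case DoubleType  => "double"
  case other       => sys.error(s"unhandled type: $other")
}

// Generate the column list for the CREATE TABLE statement
parquetDF.schema.fields
  .foreach(f => println(s"${f.name} ${hiveType(f.dataType)}"))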
CREATE EXTERNAL TABLE IF NOT EXISTS table_name (
  ColumnB bigint,
  ColumnC date,
  ColumnD double
)
PARTITIONED BY (ColumnA string)
STORED AS PARQUET
LOCATION 's3://bucket_name/';

MSCK REPAIR TABLE table_name;

Note three corrections to the DDL: ColumnA appears only in PARTITIONED BY, since Hive rejects a column listed both in the column list and the partition spec; the ROW FORMAT DELIMITED clause is dropped because it does not apply to Parquet storage; and LOCATION must be the base path without wildcards. MSCK REPAIR TABLE then registers the partitions, which requires the directories to be laid out as ColumnA=value/.
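As a quick check (a sketch, assuming the SparkSession above was built with .enableHiveSupport(); otherwise run the equivalent statements in the Hive CLI), the repaired table should now read without the ClassCastException:

// Re-run the repair and sample the table through the same session
sqlc.sql("MSCK REPAIR TABLE table_name")
sqlc.sql("SHOW PARTITIONS table_name").show(false)
sqlc.sql("SELECT ColumnB, ColumnC, ColumnD FROM table_name LIMIT 10").show(false)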