Remote database not found while connecting to remote Hive from Spark using JDBC in Python?

I am using a PySpark script to read data from a remote Hive instance through the JDBC driver. I have tried other methods, using enableHiveSupport and hive-site.xml, but those are not possible for me due to some limitations (access to launch YARN jobs from outside the cluster is blocked). Below is the only way I can connect to Hive.

from pyspark.sql import SparkSession

spark = SparkSession.builder \
        .appName("hive") \
        .config("spark.sql.hive.metastorePartitionPruning", "true") \
        .config("hadoop.security.authentication", "kerberos") \
        .getOrCreate()

jdbcdf = spark.read.format("jdbc") \
        .option("url", "urlname") \
        .option("driver", "com.cloudera.hive.jdbc41.HS2Driver") \
        .option("user", "username") \
        .option("dbtable", "dbname.tablename") \
        .load()
spark.sql("show tables from dbname").show()

This gives me the error below:

py4j.protocol.Py4JJavaError: An error occurred while calling o31.sql.
: org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'vqaa' not found;

Could someone please help me with how I can access the remote database/tables using this method? Thanks.

1 answer

  • answered 2019-12-14 19:12 Girish501

    Add .enableHiveSupport() to your SparkSession builder in order to access the Hive catalog.
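
    A minimal sketch of that fix, reusing the builder from the question (the asker notes enableHiveSupport is blocked in their environment, so this assumes the cluster actually permits it):

    from pyspark.sql import SparkSession

    # Same builder as in the question, with Hive support enabled so that
    # spark.sql() resolves databases and tables from the Hive metastore
    # rather than Spark's default in-memory catalog.
    spark = SparkSession.builder \
            .appName("hive") \
            .config("spark.sql.hive.metastorePartitionPruning", "true") \
            .config("hadoop.security.authentication", "kerberos") \
            .enableHiveSupport() \
            .getOrCreate()

    spark.sql("show tables from dbname").show()

    Without Hive support, spark.sql() only consults Spark's built-in catalog, which is why dbname is not found even though the JDBC read itself succeeds. The JDBC DataFrame can still be queried with SQL by registering it first, e.g. jdbcdf.createOrReplaceTempView("tablename").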