Kerberos Integration with MapR for user authentication with the AD (Active Directory)

How to connect to hdfs using pyarrow in python

Hadoop cluster runs inconsistently

Unable to cast field creation to bigint

Kerberos error while connecting to Cloudera Impala environment throws port 22: connection timeout error

How to fix this fatal error while running Spark jobs on an HDInsight cluster? Session 681 unexpectedly reached final status 'dead'. See logs:

Because the file has blank lines and a header, the code fails with a NullException error

Do avro and parquet formatted data have to be written within a hadoop infrastructure?

Cannot create directory on hadoop through hadoop web console

Hadoop MapReduce Java

Issue while passing a parameter to HQL

Kerberos java to impala keytab authentication with JAAS Configuration

In my Hadoop project, I have set the number of reduce tasks to 0 with "job.setNumReduceTasks(0)", but there is still a reduce task on the job tracker page

Writing Map-Reduce output with custom file name prefix to Amazon S3

How does a data read happen in HBase?

Ambari Agent Registration failed due to unsupported OS type

Why did my throughput and average I/O rate get slower when I added a node to my Hadoop cluster?

How do you get the driver and executors to load and recognize the postgres driver in EMR with spark-submit?

Required executor memory is above the max threshold of this cluster

Unable to import data into Hive from SQL Server

pandas cumcount in pyspark

How to fix "Version information is not found in metastore" in Sqoop

How to split a dataframe based on column value with identifier in same order

Can a hadoop slave node be made hadoop master node without incurring data loss

Does Hive preserve file order when selecting data

Copying text file from download

authentication error when trying to access WebHDFS

Why does a map task write its output to disk in MapReduce?

How to determine the number of requests/connections going to Hive Metastore Database from HMS?

Can not create a Path from a null string with copyFromLocal command

Spark Connect Hive to HDFS vs Spark connect HDFS directly and Hive on the top of it?

What is the advantage of using External tables in Hive?

How can we limit the usage of vCores during spark-submit?

How to encrypt ak and sk in core-site.xml when linking to s3a using the Livy REST API

How to check if HDFS directory is empty in Spark

I am trying to print just the size and basename

Hadoop services not starting, attempting to connect to

How to aggregate and show top n item with a mapreduce job

Why is it that SUM(a + b) != SUM(a) + SUM(b) in Hive?

What does "moveToLocal: Option '-moveToLocal' is not implemented yet." mean?

How to access hdfs from a container on kubernetes

How to identify disk space consumed for a particular directory pattern using hdfs command without listing all files under that directory?

How to identify disk usage of a particular directory pattern using hdfs command without listing all files?

How to restart spark job when it fails with non-zero exit status

MapReduce with 2 values

Which dependency should I add to get a txt file in S3 with scala-spark using IntelliJ?

Use hyphen in impala database name

Is there a way to share/access the HDFS among developers?

Where can I find the directory I created using hadoop fs -mkdir in my Ubuntu file system?

Login to hadoop from java program

Regular expression - only include 0 if in 2nd position of x.x.x

Migrating existing metadata from metastore(derby) and data from Hive 1.2 to Hive 2.4.3

Hive remote postgres metastore

pyspark parquet read Error on reading parquet files stored in hdfs: Block Missing Exception

How to use filter conditions on SHOW PARTITIONS clause on hive?

Why doesn't my Hadoop MapReduce job run faster even when I add nodes to the cluster?

Is it possible to pass a parameter to an oozie workflow to control it?

Is it safe to remove the /tmp/hive/hive folder?

How to check version of Spark and Hadoop in AWS glue?

Python script to run 5 Hadoop programs using the yarn command and, if any service goes down, put the system in safe mode

Unable to connect to s3 buckets from pyspark

Hadoop3: worker node error connecting to ResourceManager

How to read files from HDFS using Spark?

self-serve data capability stack?

Hadoop: Installation Problems and environment setup

If I want to use the data only with Spark, which file format is best for Hive?

Does standalone metastore 3.0 need Hadoop?

column deletion in HIVE without code change?

Calculating Rolling Weekly Spend in Hive using Window Functions

Hadoop datanode is down after power outage

Integration of spark and kafka, exception in Spark-submit a jar

Load parquet file and keep the same number of HDFS partitions

Access HDFS or WebHDFS through Knox Using Java

What does 'pool_name' mean in CREATE TABLE-statement?

Why is there a reduce phase during I/O operations in Hadoop MapReduce?

Why are hdfs dfs commands stuck?

How to connect php and hadoop together and call data in hadoop

How to wrangle unstructured log data streamed from twitter through Flume?

Could the security risk of several users logging in with the same key in Kerberos be managed?

spark Join optimization on huge dataframes

Hadoop : Yarn and local memory usage

Is there a way to provide multiple paths when working with MultipleInputs

Error while streaming data from Twitter using Apache Flume

Install Druid on AWS EMR

Hive configuration hive.stats.fetch.partition.stats does not exist

ERROR yarn.ApplicationMaster: User class threw exception: java.lang.NoClassDefFoundError: scala/Function0$class

How to delete fields from a partitioned table in Hive stored as parquet?

how to fix "hadoop is not recognized as an internal or external command, operable program or batch file"

How to replace groupBy with more efficient method

FAILED: HiveException java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

How to get text bytes used by a string in Hive?

Hadoop distcp error - java.lang.IllegalArgumentException: 'key@1' not found

Is there a way to view list of tables and columns in hue

Hive SQL Distinct Column Syntax Error when calling multiple columns

Spark, use local hard disk instead of hadoop

How to change hadoop temporary working directory /tmp to other folder

Add some lines at the top of hive table

JPS results and hdfs admin report are different

How to check cumulative size of an hdfs directory as part of oozie action?