Can't kill YARN apps using ResourceManager UI after HDP 3.1.0.0-78 upgrade

Get the first two files from HDFS

Unable to connect to Hive from Python using impyla/dbapi.py

Unable to read HBase table data in HBase standalone mode

Anyone know how to fix hadoop-functions.sh "syntax error near unexpected token `<'"?

How to create a log table in Hive to record job success/failure?

Convert zip file to gzip and write to hdfs

pentaho component Sqoop Import error Retrying connect to server: quickstart.cloudera/:8032. Already tried

Hadoop MapReduce wrong result

Unable to map the data properly from a CSV file to a Hive table on HDFS

zip function with 3 parameters

How to fill missing matrix values with Hadoop MapReduce

The ranger interface is configured with HDFS service, which does not take effect

load data from HDFS to Druid in real time

Is it possible to configure a gateway node from private network to public cluster?

sqoop free form query error when importing sql server --> hbase

Which one is faster in Hive? "in" or "or"?

Proper way of reading in files from a directory using Python 2.6 in bash shell

Convert sql schema to avro

special character "#" in column name in Hive select query

Apache Hive is very flakey on Ubuntu VM

Hadoop - Struggling with First Time Setup

Unable to import Tensorflow on Spark

MinMax algorithm implementation in map-reduce paradigm

how to find total number of bugs in any apache project

Flink Temp Jar Upload Directory Deleted

How to write TIMESTAMP logical type (INT96) to parquet, using ParquetWriter?

How to fetch next n rows in hive on hue cloudera

Create table in Hue after many with statements

PIG: Multiple records to be arranged in particular set of columns

Typo in word "hdfs" gives me: "java.io.IOException: No FileSystem for scheme: hdfs". Using FileSystem lib over hadoop 2.7.7

Cannot read (read_csv) from HDFS using Dask (FileNotFoundError: [Errno 2])

AWS EMR Spark usercache filecache errors

How to fix "Cannot use null as map key!" error in Spark.SQL with Python 3 using Group_Map

Hive - Rolling up the amount balance from leaf node to top parent

Exit status: -100. Diagnostics: Container released on a *lost* node

Unable to query/select data inserted through Spark SQL

What's the benefit to compress ORC or parquet

Configuring Nutch to write to Apache Kudu

I have 3 slave nodes plus hadoop master but only 2 nodes appear

Hadoop Library is imported but cannot set the "get" method in FileSystem

Hive External Table Schema Reconnection

Hadoop-3.1.2: Datanode and Nodemanager shuts down

How to automate multiple hive table creation using shell script

Apache Nutch 2.3.1, increase reducer memory

I want to skip/drop the first n rows of a text file with PySpark

I have done hive work through oozie but have no results

Not able to create tables in hbase

Hadoop Sqoop Export to MS-SQL database

How/Where can I write time series data? As Parquet format to Hadoop, or HBase, Cassandra?

Stop Word Elimination in Mapreduce Java

Having multiple reduce tasks assemble a single HDFS as output

What is this data analytic using spark?

What is the compatible datatype for bigint in Spark and how can we cast bigint into a spark compatible datatype?

Data type conversion issue

Measure Total Runtime of Hadoop Mapreduce Job

how to run a single query each day by scheduling jobs

Where can we see spark output console when we run in yarn cluster

How to migrate On Prem Hadoop to GCP

All the slaves in the Hadoop cluster should be of the same configuration

How do I make MapR-FS' disk balancer work?

How to use the dfs-datastores libraries by Nathan Marz in a lambda architecture

Hive: Find top 20 percent records

Impact of reducing HDFS replication factor to 2 (or just one) on HBase map/reduce performance

Installing Python modules on multiple servers (cluster)

YARN is allocating only 1 executor even though dynamic memory allocation is disabled

Hive - Flatten Hierarchy Table into Levels

Could not load the URI for stack HDP-2.1.GlusterFS from hortonworks.com

How do I run a JAR file on an EC2 instance?

HMaster process not running on hadoop multi-node cluster after HBase installation

Why is Hive so late to adopt a compaction strategy?

Why do we use the Hive service principal when using beeline to connect to Hive on a Kerberos enabled EMR cluster?

Write data incrementally to a parquet file

I want to add an extra column in my existing hive table so that I can have a current time stamp for that day

submit local spark job to emr

Scala-script to remove all files in a Hadoop folder

Extracting schema from Union Avro

Lily Indexer stops all indexers after HBase restart

HBase shell slow put in a few rows table (standalone mode)

Presto "Failed to list directory" when connecting to hive

How to Identify total number of jobs required to execute hive query

reduce the execution time of large query

Gradle unable to load maven meta-data (hadoop-common, hadoop-core)

Soundex function returning different values in Spark SQL and Hive

why mapred java processes not exiting after successful task completion (hadoop)

Druid parquet poor ingestion performance

[Cloudbreak][EC2] unable to launch cluster on AWS when LDAP configuration is specified

Get rid of inner join, but without losing structure

How do I enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy?

clickstream analysis in spark

How to read table from Hbase using scala spark

Can we list out tables in hive pointing to a particular location in hdfs?

Mac compiled Hadoop source to support local libraries such as Snappy

how to pass hive query output in email body in oozie jobs hue

Multiple Hive Applications for Hue

How to tune mapred.tasktracker.reduce.tasks.maximum

How do I get the actual data from Hadoop cluster (after map reducing) using the python API Pydoop?

Custom Dynamic Partitions in MapReduce

How do I fix "File could only be replicated to 0 nodes instead of minReplication (=1)."?

How to compile my java program (WordCount) for Hadoop