Hadoop

Great Reads

by Bert O Neill
Query Hadoop using Microsoft oriented technologies (C#, SSIS, SQL Server, Excel etc.)
by Fazlur Rahman
Step by step procedure to install Hadoop 2.7.3 version on Ubuntu 16.04 operating system
by Suffyan Asad
How to implement Joins in Hadoop Map-Reduce applications during Reduce and Map phases
by Vladimir Dorokhov
Design and development of a simple analytics system using Lambda Architecture principles and the Microsoft Azure cloud

Latest Articles

by Fazlur Rahman
Step by step procedure to install Hadoop 2.7.3 version on Ubuntu 16.04 operating system
by YegorDovganich
Following 'Infrastructure as Code' principles, this article builds a real sample project from scratch that deploys an EMR cluster and runs a Hive script on it. It covers the 'Analyze Big Data with Hadoop' project from the AWS 'Learn to Build' section.
by Mahsa Hassankashi
It is almost everything about big data.
by Michael_Churchman
Alibaba Cloud offers a range of Big Data solutions. This article outlines them and explains which types of Big Data services on the Alibaba Cloud align with various workloads.

All Articles

11 Jul 2021 by Abhijit Dare
p="foo foo quux labs foo barquux".split() d={} s=[] count=1 for x in p: if x not in s: d.update({x:count}) s.append(x) else: d[x]+=1 print(d) What I have tried: Hello In this program, I intend to count the occurrence of each...
10 May 2015 by Afzaal Ahmad Zeeshan
Hello Rajasekhar, I would give you an overview of two paths that you are looking at. First one is the new one that you want to move yourself into. Second path is the one you are already on. So, coming to the first one. If you seriously want to switch your career field, from one to...
3 Oct 2015 by Afzaal Ahmad Zeeshan
Setting the JAVA_HOME variable is a first step for any application or program that requires the JDK. There are many tutorials already available, but I will try to provide ones that meet your needs and are standards-based. Installing the JDK Software and Setting JAVA_HOME[^] (From...
20 Jul 2017 by Ailsa Harvey
Hi Buddy, how do we choose the right Hadoop distribution from the numerous options to serve our purpose? Not all Hadoop distributions have the same components (but they all consist of Hadoop’s core capabilities). Thanks. What I have tried: I have tried to choose the...
3 Aug 2021 by Amel Hadfi
I've been able to use Sqoop & Flume import commands perfectly fine on Ubuntu terminal. But right now, I'm trying to do so on Jupyter notebook. 1) How can I import from MySQL to HDFS using Sqoop command on Jupyter notebook? 2) what is Flume...
26 Mar 2016 by Amit Kumar Tiwari
.NET to Hadoop connection using Keytab file
25 Jul 2018 by anjitaa
Loaded library lib-native-libhadoop.so.1.0.0 might have disabled stack guard. How to resolve it? What I have tried: I have tried loaded library lib-native-libhadoop I think 1.0.0 might have disabled stack guard
27 Jul 2018 by anjitaa
"MapReduce sorts the intermediate data (between the mapper and reducer phases) by key by default. If we want the data sorted by value, we need secondary sorting. There are two approaches to achieve this. 1. If the reducers get all the values for a particular key and buffer...
28 Jul 2018 by anjitaa
What do you understand by Word Count implementation via Hadoop framework? Explain in detail What I have tried: I am not able to implement the Word Count implementation via the Hadoop framework?
31 Jul 2018 by anjitaa
" No, it is not feasible given the distributed architecture of HDFS. If ‘n’ clients process read/write requests simultaneously, the overhead on the NameNode increases. To avoid these bottlenecks, a distributed computing architecture in master-slave fashion is proposed. "
16 Aug 2018 by anjitaa
"To enable the trash feature and to set the time delay for the trash removal in Hadoop, we have to edit the fs.trash.interval property in core-site.xml to the delay (and this has to be in minutes). Ex: if you want users to have 10 hours (600 minutes) to restore a deleted file, you should specify...
20 Aug 2018 by anjitaa
"To configure Hadoop to reuse JVMs for mappers, we just need to add an entry to the configuration file $HADOOP_HOME/conf/mapred-site.xml: mapred.job.reuse.jvm.num.tasks = -1. We need to specify a number for how many times the JVM is to be reused...
22 Sep 2015 by anto_bernad
It's urgent, guys. I need to know how to configure Eclipse for Hadoop on Linux. Can anyone suggest a link to download an Eclipse plugin?
27 Jul 2018 by Bansal himani
How to sort intermediate output based on values in MapReduce ? What I have tried: How to sort intermediate output based on values in MapReduce?
28 Jul 2018 by Bansal himani
"The Word Count implementation is as follows. For example: Input File 1 contains: “This is December Month.” Input File 2 contains: “December is the last month of the year.” Step 1: The mapper will generate the following output: Input File 1 output ...
16 Aug 2018 by Bansal himani
How can I enable Trash/Recycle Bin in Hadoop? What I have tried: I was not able to enable Trash/Recycle Bin in Hadoop
28 May 2018 by Bata Omou
I am trying to compile and run this Java MapReduce code locally in Eclipse, but this problem shows up. Please help: where is the issue? This is the error shown: WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where...
28 May 2018 by Bata Omou
Yes, thank you, the problem really was that I had not set an outputPath. But it now shows me another error, again about the native Hadoop library, and another one: 2018-05-28 16:27:24,687 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:(62)) - Unable to load native-hadoop...
12 Dec 2020 by BedantBiswal
Below is my query which takes around 5k mappers and 1k reducers and time taken is around 2.2 hours to finish. Any scope of optimization in here? What I have tried: SELECT sum(B.item_net_amount) net_amount, sum(B.item_gross_amount) gross_amount,...
29 Dec 2015 by Bert O Neill
Query Hadoop using Microsoft oriented technologies (C#, SSIS, SQL Server, Excel etc.)
6 Mar 2015 by BillWoodruff
http://azure.microsoft.com/en-...
4 Mar 2016 by Chendur Srinivasan
I'm self-learning Hadoop and started off by installing Cloudera QuickStart on a VMware Workstation running CentOS. I was under the impression that the QuickStart VM has most of the configurations predefined. Do I need to set up any other configurations to set up the data and name nodes? Reason being...
15 Jul 2021 by Dasisqo
format your code here Pythoniter - Pretty Python Online Formatter[python code formatter]
20 Jul 2017 by Eshika Roy
It all depends on your work and working environment. There are 3 most-used distributions. Cloudera - you can choose it when you need support from Cloudera. They will charge for the service -- partially open source. Hortonworks - fully open source and user friendly (processing speed slow if you...
20 Jul 2017 by Eshika Roy
First you need to follow some steps to enable WASB on Hadoop: • We need to create an account on Windows Azure. • Then take service • Then we need to implement Hadoop. Follow this for a better understanding:...
21 Jan 2017 by Fazlur Rahman
What is Big Data and how Hadoop been introduced to overcome the problems associated with Big Data?
13 Feb 2017 by Fazlur Rahman
Step by step procedure to install NetBeans on Ubuntu 16.04 operating system with Hadoop 2.7.3 version. This may work for any other versions of Hadoop and Ubuntu.
22 May 2022 by Fazlur Rahman
Step by step procedure to install Hadoop 2.7.3 version on Ubuntu 16.04 operating system
17 Apr 2015 by Flowra white
I want to open hadoop source code as a project in Eclipse for the purpose of developing and studying.
5 May 2016 by George Jonsson
Here you can find information about different distributions: Welcome to Apache™ Hadoop®![^] Here you have a discussion forum for Hadoop: Discuss Hadoop[^] I guess your specific choice depends on your requirements.
12 Apr 2017 by Intel
BigDL is a distributed deep learning library for Apache Spark. With BigDL, users can write their deep learning applications as standard Spark programs, which can run directly on top of existing Spark or Hadoop clusters.
5 May 2019 by Jackie Lloyd
Could somebody please help me with this query :). We use Impala to query data, with Sentry to restrict access to data at column level. We use Spark to write code to query data stored in files. My understanding is that Sentry roles cannot control access at column level when used with Spark....
29 May 2017 by Jayaprakash Manchi
For example, Let me explain it in detail. https://i.stack.imgur.com/DIlIT.png Like this data will be there in excel sheet as shown above with n number of rows typically huge data. Now we need to filter the column status with output as in different excel sheets or in same workbook as given...
28 May 2018 by Jochen Arndt
Quote: the line error 63 is about the output format: FileOutputFormat.setOutputPath(conf, new Path(args[1])); and the error message is java.lang.ArrayIndexOutOfBoundsException So there is no second command line argument present when executing the application. You have to execute the...
23 Sep 2015 by Justin Zh.
Hi, all! Here is some information: Windows 10 with VMware 12; Ubuntu 14.04.3 LTS with VMware tools; JDK1.8.0_60; HADOOP-2.7.1. It works perfectly when I try to process the job on HDFS of the Pseudo-Distributed Hadoop (without Yarn, and the job is done in several seconds). Once I have set...
31 Dec 2014 by kadriu
If you have JDK 8.x installed, uninstall and install JDK 7.x. This worked for me.
25 Jul 2018 by kasliwal aayush
"This error could be due to the wrong JDK package. Hadoop runs on 64-bit, so try uninstalling the 32-bit JDK and installing the 64-bit JDK 8. Please add the following variables to the .bashrc environment file: export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_ROOT/lib/native export...
5 May 2019 by Kornfeld Eliyahu Peter
Impala and Spark are two separate SQL engines for use with Hadoop... One can not use features from the other!!! So, no if you use Impala there is no Spark, if you use Spark there is no Impala...
16 Jun 2018 by LearningSpark
Hi All, I am new to the Big Data world and need your help. Here is my question: I am reading data from a txt file (1,2,3,4,4,4,4) var file=sc.textFile("file:///home/cloudera/MyData/Lab1/numbers.txt") var number=file.flatMap(line=>line.split(",")) var...
15 Sep 2017 by Leviya bl
hi here is the step by step process [^]WASB is automatically enabled in HDInsight clusters. But you can also mount a blob storage account manually to a [^]Hadoop Administration instance that lives anywhere as long as it has Internet access to the blob storage. Here are the steps: I assume...
12 Sep 2017 by Mahsa Hassankashi
This article is the most complete essay about big data from scratch to practical.
3 Apr 2019 by Mahsa Hassankashi
27 Dec 2015 by Mallanagouda Patil
This article helps to setup debug environment for hadoop framework on Linux Ubuntu using IntelliJ IDEA
6 May 2016 by Mankuji87
Hi Ailsa, here are some helpful links. I hope they help you: Spoilt for Choice – How to choose the right Big Data / Hadoop Platform?[^] How to Choose a Hadoop Distribution - For Dummies[^] How to Choose the Right Hadoop Distribution?[^] Top 3 Hadoop distributions, which is right for...
30 Dec 2014 by Mansoor Alikhan K
Microsoft HDInsight Emulator for Windows Azure installation via WPI 5.0 reports that the installation was not successful: fatal error. Error logs are here: === Verbose logging started: 06/Dec/14 19:16:34 Build type: SHIP UNICODE 5.00.7601.00 Calling process: C:\Program Files\Microsoft\Web Platform...
3 Oct 2015 by Mehdi Gholam
Start here : http://harishshan.blogspot.co.uk/2014/10/install-hadoop-251-on-windows-7-64bit.html[^]
6 Mar 2015 by Mehdi_S
Hi, I have been trying to install HDInsight on a Windows platform but without success. I'm wondering if there is a clear procedure to install it, which versions of Windows it is compatible with, and if there is a direct link to download it (without using the Web Platform Installer). Thank you...
21 Sep 2014 by Member 11097824
While installing Hadoop on Windows 8.1 Pro, when I was ready to run MapReduce, I got this error message: Unable to make directory. Further errors are mentioned below. -mkdir: java.net.URISyntaxException: Illegal character in hostname at index...
3 Jul 2015 by Member 11402033
I am trying to manage the database of an Android app. Would it be good to use Hadoop with a MySQL database for the Android app?
16 Feb 2015 by Member 11456117
We are getting some warnings in our MapReduce job while reading and writing data from the datanode, though they do not abort the job. This error comes up at several places in the job. It looks like an issue with timeout variables in the hdfs-site.xml and hbase-site.xml files. What timeout values should...
24 Jan 2023 by Member 11622664
Hi, I am unable to set JAVA_HOME for Hadoop during the installation of Hadoop.
15 May 2015 by Member 11694565
I am trying to set up Hadoop as a single-node cluster, and I need to create tables in Hive and HBase in order to handle the tables using C#. I have cygwin, hadoop-1.2.1 and hive-1.1.0 on Windows 7 32-bit. Running Hadoop gives "Warning: $HADOOP_HOME is deprecated." but it still works! But when...
3 Feb 2016 by Member 11726267
Hi, I am trying to install Hadoop on Windows. I am looking for the correct path for downloading the google-gson-2.2.4-release.zip file. I downloaded the file from a couple of sites but am not able to see the JAR files in the zip folder. I have only html, java, class files when extracted the...
25 Apr 2016 by Member 11842305
I have installed the Windows SDK on Windows 10 from here: https://developer.microsoft.com/en-us/windows/downloads/windows-10-sdk But I am unable to open the Windows SDK command prompt to run my Maven commands to install Hadoop. I have searched online but didn't find anything useful. Please...
3 Oct 2015 by Member 12029885
I can't run the Hadoop exe file; the error says JAVA_HOME is incorrectly set.
14 Oct 2015 by Member 12059854
public class MaxMinReducer extends Reducer {int max_sum=0; int mean=0;int count=0;Text max_occured_key=new Text();Text mean_key=new Text("Mean : ");Text count_key=new Text("Count : ");int min_sum=Integer.MAX_VALUE; Text min_occured_key=new Text(); public void reduce(Text key,...
15 Sep 2017 by Member 13258163
I have set up a single-node Hadoop cluster (2.71.1) on Windows 7 and am now trying to establish its connection with Azure storage (wasb), with no success. I am getting the error: No FileSystem for scheme: wasb I have been following several blogs but focused on: articles/hadoopAndWasb.md at...
5 Jan 2018 by Member 13609332
here is the solution for the above problem select d.department, case when (d.maxJan>=d.maxFeb) and (d.maxJan>=d.maxMarch) then 'Jan' when (d.maxFeb>=d.maxJan) and (d.maxFeb>=d.maxMarch) then 'Feb' when (d.maxMarch>=d.maxJan) ...
26 Sep 2014 by Member 8899038
While running Recipe.java, getting error that Mapper, Job package is not there.
4 Apr 2018 by Member Hemal
You have to give all of your source files to javac Example: javac -classpath /usr/local/hadoop/hadoop-core-1.0.4.jar -sourcepath src/ -d build/ MyMain.java
25 Jul 2021 by mgjsa
Hi, I have written a hive query language as below. It is giving me error as written in title. the query is : select clnt_nbr, case when clnt_nbr in (select clnt_NBR from crd_master where crd_typ = '198 or crd_typ = '199' ) then 1 else 0 end) as...
7 Mar 2015 by mibetty
I'm trying to install a Hadoop single node. When I run start-all.sh, the name node and job tracker don't start. Do you see anything in my files that could be wrong and cause this result? Result of hadoop jps command: 14878 Jps 14823 TaskTracker 14605 SecondaryNameNode 14456...
13 Dec 2017 by Michael_Churchman
Alibaba Cloud offers a range of Big Data solutions. This article outlines them and explains which types of Big Data services on the Alibaba Cloud align with various workloads.
21 Nov 2014 by midhun3600
Hi, I am new to Hadoop. I have managed to install and use Hadoop HDFS and Hive. I am able to fetch data and insert data into Hive using Talend. My problem is that whenever we create a table from Talend (distribution: apache) it is created in Hive, but I am unable to see the same in hive...
9 Jan 2015 by midhun3600
Hi, I am very new to Hadoop and somehow we managed to install it with the Apache distribution and a Derby database. My requirement is that multiple users need to access Hive at the same time, but right now only a single user can work at a time. I searched some blogs but haven't found the...
14 Jan 2015 by midhun3600
Hi, I am trying to create a Hadoop table and load data into it using Talend. I have successfully created the table but was unable to load data into it. When I execute the Talend job I get the following error: ======================================================== FAILED: Error in semantic...
5 Dec 2015 by mohitjain012
I was learning Hadoop and came to a doubt: every slave node consists of a data node and a task tracker, and every data node consists of data blocks. Suppose we have a data node which has 10 data blocks of 64 MB each. How is the data of a data node processed inside a slave node?...
1 Aug 2018 by patelsandeep
During execution of MapReduce jobs how to overwrite an existing output file/dir ? What I have tried: I am working on a MapReduce project and need to overwrite an existing output. I'm unaware of the procedure?
20 Aug 2018 by patelsandeep
How we can configure Hadoop to reuse JVM for mappers? What I have tried: I am not able to configure Hadoop to reuse JVM for mappers
27 Feb 2016 by Patrice T
The problem can be anywhere. You have to find where the bottleneck is. Your network can be in degraded mode because of bad wiring or a bad switch. Computers can be slowed down by lack of memory. Your programs can be artificially complicated or not optimized. It can be...
11 Jul 2021 by Patrice T
Quote: I dont understand why. Please explain Simple, you have mixed spaces and tab differently from previous line. p="foo foo quux labs foo barquux".split() d={} s=[] count=1 for x in p: if x not in s: d.update({x:count}) # this line...
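For reference, the word-count loop quoted above can be written with consistent four-space indentation as in the sketch below. The original indentation is lost in this listing, so the layout here is an assumption, and the function name `count_words` is hypothetical. The separate list `s` from the question is dropped because the dictionary already records which words have been seen.

```python
def count_words(text):
    """Count how often each whitespace-separated word occurs."""
    counts = {}
    for word in text.split():
        if word not in counts:
            counts[word] = 1       # first time this word is seen
        else:
            counts[word] += 1      # seen before: increment its count
    return counts


print(count_words("foo foo quux labs foo barquux"))
# → {'foo': 3, 'quux': 1, 'labs': 1, 'barquux': 1}
```

Using only spaces (or only tabs) for every level of indentation avoids the inconsistent-indentation error discussed in the answer.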
20 Mar 2015 by Rabbits Foot
I am struggling to get my HBase shell running. It throws the exception in the subject line. I have checked that hbase-site.xml matches the Hadoop one perfectly. Please help. I have been struggling for 2 days and have a project due. I am attaching the two xml files of hadoop and...
25 Sep 2015 by ravi30713
Problem replicating config (bundle) to search peer 'myserver.com:8089',Reading reply to upload: rv=-2, Receive from=https://myserver.com:8089 timed out; exceeded 60sec, as per=distsearch.conf/[replicationSettings]/sendRcvTimeout
27 Feb 2016 by rehabrish
I have a topology running with parallelism (1,8,1) (spout, logic bolt, write bolt) with the number of ackers set to 12 (12 slots are available in my cluster). The max spout pending is 200 and timeout.secs is 200. I have to process 14 lac inputs. My cluster consists of 1 nimbus & 3 supervisors (...
25 Apr 2016 by Richard MacCutchan
See Visual Studio and Windows SDK Command Prompts[^].
16 Jun 2018 by Richard MacCutchan
The data contains an item that is not a number, so you need to strip that out of your list before trying to convert.
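The advice above, dropping anything that is not a number before converting, can be sketched as follows. The question uses Scala on Spark; this Python version only illustrates the filtering idea, and the function name `parse_numbers` is hypothetical.

```python
def parse_numbers(line):
    """Split a comma-separated line and keep only tokens that parse as integers."""
    numbers = []
    for token in line.split(","):
        token = token.strip()          # remove stray spaces and newlines
        try:
            numbers.append(int(token))
        except ValueError:
            continue                   # skip empty or non-numeric items
    return numbers


print(parse_numbers("1,2,3,4,4,4,4"))
print(parse_numbers("1,2,x,,4\n"))
```

The same filter-then-convert pattern applies to each line of a text file read in a distributed job.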
10 Aug 2018 by Richard MacCutchan
enable Trash/Recycle Bin in Hadoop - Google Search[^]
12 Mar 2015 by RkRkRkRkk
(CAQuietExec: WINPKG: Unzip of C:\HadoopInstallFiles\HadoopPackages\hdp-2.1.3.0-winpkg.zip to C:\HadoopInstallFiles\HadoopPackages succeededCAQuietExec: WINPKG: UnzipRoot: C:\HadoopInstallFiles\HadoopPackages\hdp-2.1.3.0-winpkgCAQuietExec: WINPKG:...
18 Oct 2015 by Saman With You
Hello, We are going to start a research about data mining in our company. We've chosen Cassandra as our data store. I've heard that R tool is used for data mining too. But I don't know how I can relate these to each other? Would Cassandra be enough to do data mining or we have to use R or any...
10 May 2015 by Sergey Alexandrovich Kryukov
There is no definition of "good technology". Only you can decide what's good for you. If you only want to choose something, no matter what, just to be on top of things, I'm afraid you are at the wrong forum. This is the forum primarily oriented to professionals (even though some are students at...
9 Apr 2015 by shivendrapandey
Actually, I want to write code that uses a hash table to store the data just before we process it. I have the Mapper output, but before we process this data I want to store it in a hash table (in the Reducer), but I am not able to write the,
18 Feb 2016 by Simon Elliston Ball
How to use NiFi to write to HDFS on the Hortonworks Sandbox
20 Aug 2015 by Sofia Panagiotidi
I have a cluster made up of two slaves and one master set up, and I submit a jar (Scala) to the Spark master (192.168.1.64): spark-submit --master spark://spark-master:7077 --class tests.elements target/scala-2.10/zzz-project_2.10-1.0.jar After quite some time running just fine it stops...
7 Jan 2017 by SrikantSahu
This tip gives basic commands to import a table from MySQL to the Hadoop file system and to import the files from HDFS back to MySQL.
29 Jan 2015 by Suffyan Asad
How to implement Joins in Hadoop Map-Reduce applications during Reduce and Map phases
16 Mar 2015 by Suffyan Asad
Implementing joins in Hadoop Map-Reduce applications during Map-phase using MapFiles
12 Apr 2017 by Sukanya Karri
My input: Department Jan_sal Feb_sal Mar_sal civil 1 5 5 mech 2 7 2 civil 3 8 9 mech 6 4 4 mech 5...
3 Nov 2015 by sunny_sharma123
Hello, I am trying to set up a multi-node Hadoop cluster using two systems. Whenever I try to format HDFS, a NullPointerException occurs. I am not happy to see this again and again. If anyone has a solution, please reply...
3 Dec 2014 by Syncfusion
With the Syncfusion Big Data Platform, you have complete access to the Hadoop environment. By adopting our platform, you are using an industry-tested solution currently employed by companies such as Microsoft, Facebook, Amazon, Adobe, Hulu, LinkedIn, and Yahoo.
24 Jan 2023 by User 2753469
It's 2023 now and if you have linux and a program called 'alternatives' you can use the cmd $> alternatives --config java to find path to java versions on your machine and this program lets you choose which version you want to use if you...
25 Mar 2022 by Viswanath Sitaraman
I'm trying to convert a piece of SQL code to HiveQL, and it's not working as expected. Please find below the code snippet in SQL that I'm attempting to convert: SQL Code: UPDATE C SET C.prod_l = P.prod_l, C.numprod = P.numprod, C.prod_cng...
23 Mar 2017 by Vladimir Dorokhov
Design and development of a simple analytics system using Lambda Architecture principles and the Microsoft Azure cloud
5 Jun 2019 by YegorDovganich
Following 'Infrastructure as Code' principles, this article builds a real sample project from scratch that deploys an EMR cluster and runs a Hive script on it. It covers the 'Analyze Big Data with Hadoop' project from the AWS 'Learn to Build' section.
12 Dec 2014 by ZurdoDev
The way this site works is we volunteer our time to help people that have gotten stuck on a specific code issue. In this case, you seem to be asking for someone to do everything for you and we don't do that.