This is the 20th day of my participation in the August More Text Challenge
Homework 1
Hello everyone, I am [Bean dried peanut]. This time I bring an introduction to big data programming with HDFS and HBase. It is quite detailed, covering the concrete operations, explanations of the code, and screenshots of each step.
I. Homework content
Docker is used here to set up the environment.
The processing task assigned to group 0 is implemented here (hence the class name Hw1Grp0 below).
II. Preparation
The environment is set up with Docker:
1. Enter the container
```
docker run -itd dingms/ucas-bdms-hw-u64-2019:16.04 /bin/bash
docker ps
docker exec -it <CONTAINER ID> /bin/bash
```
2. Start SSH (HDFS and HBase appear to communicate over SSH)
```
service ssh start
```
(SSH needs to be started again each time the container is restarted.)
3. Read the description file (it explains the HDFS and HBase operations):
```
--------------------------------------------------------------------------------
README
--------------------------------------------------------------------------------

PLEASE save your code and data to your drive!
WARNING: this VM will be cleaned without notice after you log out.
Your code and data on the VM will be lost!!!

## Directory Layout

* example: example codes for HDFS and HBase
* input: input test data for homework 1

Please enter example, in order to follow the guide.

$ cd example

## HDFS Usage:

### Start and Stop

$ start-dfs.sh

then, run 'jps' to check whether following processes have been started:

* NameNode
* DataNode
* SecondaryNameNode

To stop HDFS, run

$ stop-dfs.sh

### HDFS Command List

$ hadoop fs

hdfs directory layout:

$ hadoop fs -ls /

### Run Example

Description:
put a file into HDFS by HDFS commands, and then write a Java program to
read the file from HDFS

1. put file to HDFS

$ hadoop fs -mkdir /hw1-input
$ hadoop fs -put README.md /hw1-input
$ hadoop fs -ls -R /hw1-input

2. write a Java program @see ./HDFSTest.java

3. compile and run Java program

$ javac HDFSTest.java
$ java HDFSTest hdfs://localhost:9000/hw1-input/README.md

## HBase Usage:

### Start and Stop

Start HDFS at first, then HBase.

$ start-dfs.sh
$ start-hbase.sh

then, run 'jps' to check whether following processes have been started:

* NameNode
* DataNode
* SecondaryNameNode
* HMaster
* HRegionServer
* HQuorumPeer

To stop, run (HBase first, then HDFS)

$ stop-hbase.sh
$ stop-dfs.sh

### Run Example

Description:
put records into HBase

1. write a Java program @see ./HBaseTest.java

2. compile and run Java program

$ javac HBaseTest.java
$ java HBaseTest

3. check

$ hbase shell

hbase(main):001:0> scan 'mytable'
ROW                 COLUMN+CELL
 abc                column=mycf:a, timestamp=1428459927307, value=789
1 row(s) in 1.8950 seconds

hbase(main):002:0> disable 'mytable'
0 row(s) in 1.9050 seconds

hbase(main):003:0> drop 'mytable'
0 row(s) in 1.2320 seconds

hbase(main):004:0> exit

--------------------------------------------------------------------------------
version: 2019-spring
```
4. Start HDFS and HBase.
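Following the README above, HDFS is started first and then HBase, and `jps` confirms that the daemons are up:
```
start-dfs.sh
start-hbase.sh

# jps should now list the HDFS and HBase daemons:
# NameNode, DataNode, SecondaryNameNode, HMaster, HRegionServer, HQuorumPeer
jps
```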
5. For how to program against HDFS and HBase, see the two example programs:
HBaseTest.java
```java
/*
 * Make sure that the classpath contains all the hbase libraries
 *
 * Compile:
 *   javac HBaseTest.java
 *
 * Run:
 *   java HBaseTest
 */

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.MasterNotRunningException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.ZooKeeperConnectionException;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;

import org.apache.log4j.*;

public class HBaseTest {

    public static void main(String[] args) throws MasterNotRunningException, ZooKeeperConnectionException, IOException {

        Logger.getRootLogger().setLevel(Level.WARN);

        // create table descriptor
        String tableName = "mytable";
        HTableDescriptor htd = new HTableDescriptor(TableName.valueOf(tableName));

        // create column descriptor
        HColumnDescriptor cf = new HColumnDescriptor("mycf");
        htd.addFamily(cf);

        // configure HBase
        Configuration configuration = HBaseConfiguration.create();
        HBaseAdmin hAdmin = new HBaseAdmin(configuration);

        if (hAdmin.tableExists(tableName)) {
            System.out.println("Table already exists");
        }
        else {
            hAdmin.createTable(htd);
            System.out.println("table " + tableName + " created successfully");
        }
        hAdmin.close();

        // put "mytable","abc","mycf:a","789"
        HTable table = new HTable(configuration, tableName);
        Put put = new Put("abc".getBytes());
        put.add("mycf".getBytes(), "a".getBytes(), "789".getBytes());
        table.put(put);
        table.close();
        System.out.println("put successfully");
    }
}
```
HDFSTest.java
```java
import java.io.*;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

/**
 * compile HDFSTest.java
 *
 *   javac HDFSTest.java
 *
 * execute HDFSTest.java
 *
 *   java HDFSTest
 */

public class HDFSTest {

    public static void main(String[] args) throws IOException, URISyntaxException {
        if (args.length <= 0) {
            System.out.println("Usage: HDFSTest <hdfs-file-path>");
            System.exit(1);
        }

        String file = args[0];

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(file), conf);
        Path path = new Path(file);
        FSDataInputStream in_stream = fs.open(path);

        BufferedReader in = new BufferedReader(new InputStreamReader(in_stream));
        String s;
        while ((s = in.readLine()) != null) {
            System.out.println(s);
        }

        in.close();
        fs.close();
    }
}
```
III. Write the program
The requirements are as follows:
The program needs three parts: reading the input files from HDFS, processing them as required (for group 0, an equi-join of the two files), and writing the result into HBase.
Based on the two example programs above and the course slides, the program is written as follows:
```java
// import the required packages
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import java.io.*;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map.Entry;
import java.util.AbstractMap.SimpleEntry;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.MasterNotRunningException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.ZooKeeperConnectionException;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.log4j.*;

// Hw1Grp0: homework 1, group 0 - hash-join two HDFS files and write the result into HBase
public class Hw1Grp0 {

    private static String fileR;                                    // file R name
    private static String fileS;                                    // file S name
    private static int joinR;                                       // R join key column
    private static int joinS;                                       // S join key column
    private static ArrayList<Integer> resRs = new ArrayList<>();    // R result column numbers
    private static ArrayList<Integer> resSs = new ArrayList<>();    // S result column numbers
    private static ArrayList<String> resRStrs = new ArrayList<>();  // R result column names, e.g. "R1"
    private static ArrayList<String> resSStrs = new ArrayList<>();  // S result column names, e.g. "S2"
    private static boolean isREmpty = true;                         // true if res contains no R column
    private static boolean isSEmpty = true;                         // true if res contains no S column

    /**
     * hash map used for the join
     * the String key is the join key;
     * the Entry's key is the LinkedList of R's result columns,
     * the Entry's value is the LinkedList of S's result columns
     */
    // Java's built-in HashMap is used as the hash table
    private static HashMap<String, Entry<LinkedList<LinkedList<String>>, LinkedList<LinkedList<String>>>> joinMap = new HashMap<>();

    private static HTable table;

    /**
     * process the arguments
     * extracts the file names, join keys, and result columns from args into the member variables
     * @param args input arguments
     * @return void
     */
    // parse the command-line arguments
    private static void processArgs(String[] args)
    {
        int index = 0;
        String resStr;
        if (args.length != 4)
        {
            // wrong number of arguments: print usage and exit
            System.out.println("Usage: Hw1Grp0 R=<file-1> S=<file-2> join:R*=S* res=R*,S*");
            System.exit(1);
        }
        // parse file names, join keys, and result columns
        fileR = args[0].substring(2);
        fileS = args[1].substring(2);
        index = args[2].indexOf('=');
        joinR = Integer.valueOf(args[2].substring(6, index));
        joinS = Integer.valueOf(args[2].substring(index + 2));
        index = args[3].indexOf(',');
        resStr = args[3].substring(4);
        String[] resStrs = resStr.split(",");
        for (String s : resStrs)
        {
            System.out.println(s);
            if (s.startsWith("R"))
            {
                resRs.add(Integer.valueOf(s.substring(1)));
                resRStrs.add(s);
                isREmpty = false;
            }
            else
            {
                resSs.add(Integer.valueOf(s.substring(1)));
                resSStrs.add(s);
                isSEmpty = false;
            }
        }
    }

    /**
     * Read file R and file S from HDFS line by line and map them into the joinMap.
     * @param void
     * @return void
     */
    private static void readFileFromHDFS() throws IOException, URISyntaxException
    {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path pathR = new Path(fileR);
        Path pathS = new Path(fileS);
        FSDataInputStream in_streamR = fs.open(pathR);
        FSDataInputStream in_streamS = fs.open(pathS);
        BufferedReader inR = new BufferedReader(new InputStreamReader(in_streamR));
        BufferedReader inS = new BufferedReader(new InputStreamReader(in_streamS));
        String r, s;
        // build phase: insert every record of R into the hash table, keyed by the join key
        while ((r = inR.readLine()) != null)
        {
            String[] tmp = r.split("\\|");
            String joinKey = tmp[joinR];
            LinkedList<String> joinValues = new LinkedList<String>();
            if (joinMap.containsKey(joinKey))
            {
                for (int i : resRs)
                {
                    joinValues.add(tmp[i]);
                }
                if (isREmpty)
                {
                    joinValues.add("");
                }
                joinMap.get(joinKey).getKey().add(joinValues);
            }
            else
            {
                for (int i : resRs)
                {
                    joinValues.add(tmp[i]);
                }
                if (isREmpty)
                {
                    joinValues.add("");
                }
                LinkedList<LinkedList<String>> rValues = new LinkedList<>();
                LinkedList<LinkedList<String>> sValues = new LinkedList<>();
                rValues.add(joinValues);
                Entry<LinkedList<LinkedList<String>>, LinkedList<LinkedList<String>>> pair = new SimpleEntry<>(rValues, sValues);
                joinMap.put(joinKey, pair);
            }
        }
        // records of S are kept only if their join key already appears in R
        while ((s = inS.readLine()) != null)
        {
            String[] tmp = s.split("\\|");
            String joinKey = tmp[joinS];
            LinkedList<String> joinValues = new LinkedList<String>();
            if (joinMap.containsKey(joinKey))
            {
                for (int i : resSs)
                {
                    joinValues.add(tmp[i]);
                }
                if (isSEmpty)
                    joinValues.add("");
                joinMap.get(joinKey).getValue().add(joinValues);
            }
        }
        inR.close();
        inS.close();
        fs.close();
    }

    /**
     * create the Result table in HBase
     * @param void
     * @return void
     */
    private static void createHBaseTable() throws IOException, URISyntaxException
    {
        Logger.getRootLogger().setLevel(Level.WARN);
        // create table descriptor
        String tableName = "Result";
        HTableDescriptor htd = new HTableDescriptor(TableName.valueOf(tableName));
        // create column descriptor
        HColumnDescriptor cf = new HColumnDescriptor("res");
        htd.addFamily(cf);
        // configure HBase
        Configuration configuration = HBaseConfiguration.create();
        HBaseAdmin hAdmin = new HBaseAdmin(configuration);
        if (hAdmin.tableExists(tableName)) {
            System.out.println("Table already exists");
            hAdmin.disableTable(tableName);
            hAdmin.deleteTable(tableName);
            System.out.println("Table has been deleted");
        }
        hAdmin.createTable(htd);
        System.out.println("table " + tableName + " created successfully");
        hAdmin.close();
        table = new HTable(configuration, tableName);
        // buffer puts on the client side to speed up loading
        table.setAutoFlush(false);
        table.setWriteBufferSize(64 * 1024 * 1024);
    }

    /**
     * Use the joinMap to decide which records to put into the Result table.
     * @param void
     * @return void
     */
    private static void hashJoin() throws IOException, URISyntaxException
    {
        for (String joinKey : joinMap.keySet())
        {
            int count = 0;
            Entry<LinkedList<LinkedList<String>>, LinkedList<LinkedList<String>>> entry = joinMap.get(joinKey);
            LinkedList<LinkedList<String>> rValues = entry.getKey();
            LinkedList<LinkedList<String>> sValues = entry.getValue();
            // a key with no matching S record produces no output
            if (sValues.size() == 0)
                continue;
            for (LinkedList<String> rValue : rValues)
            {
                for (LinkedList<String> sValue : sValues)
                {
                    // when a key joins more than once, suffix the column name with ".1", ".2", ...
                    String countStr = "";
                    if (count != 0)
                    {
                        countStr = "." + Integer.toString(count);
                    }
                    if (!isREmpty) {
                        for (int i = 0; i < rValue.size(); i++)
                        {
                            Put put = new Put(joinKey.getBytes());
                            put.add("res".getBytes(), (resRStrs.get(i) + countStr).getBytes(), rValue.get(i).getBytes());
                            table.put(put);
                        }
                    }
                    if (!isSEmpty) {
                        for (int i = 0; i < sValue.size(); i++)
                        {
                            Put put = new Put(joinKey.getBytes());
                            put.add("res".getBytes(), (resSStrs.get(i) + countStr).getBytes(), sValue.get(i).getBytes());
                            table.put(put);
                        }
                    }
                    count++;
                }
            }
        }
        table.flushCommits();
        table.close();
    }

    public static void main(String[] args) throws IOException, URISyntaxException
    {
        // parse the command-line arguments
        processArgs(args);
        // read the input files from HDFS and build the hash table
        readFileFromHDFS();
        // create the Result table in HBase
        createHBaseTable();
        // probe the hash table and write the join result into HBase
        hashJoin();
        return;
    }
}
```
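For reference, a compile-and-run session might look like the following. This is only a sketch: the input file names, key columns, and result columns (R.tbl, S.tbl, R0=S0, R1,S2) are made-up placeholders rather than the actual test data; the argument format follows the usage string in processArgs, and the Hadoop/HBase jars are already on the classpath in the course container, as in the README examples.
```
# compile the program
javac Hw1Grp0.java

# put two hypothetical input files into HDFS
hadoop fs -mkdir /hw1-input
hadoop fs -put R.tbl S.tbl /hw1-input

# join column 0 of R with column 0 of S; output columns R1 and S2
java Hw1Grp0 R=/hw1-input/R.tbl S=/hw1-input/S.tbl join:R0=S0 res=R1,S2
```
The joined records end up in the HBase table Result under the column family res, as created in createHBaseTable().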
IV. Run the program
Here we use the check files provided by the teacher.
First, copy the check folder onto the Ubuntu host:
Then copy in the corresponding Java program as well:
Copy the files into the Docker container with docker cp:
```
sudo docker cp /home/abc/bigdata/hw1-check/0_202028018629028_hw1.java 9d44ee5dfe3b:/home/bdms
sudo docker cp /home/abc/bigdata/hw1-check-v1.1.tar.gz 9d44:/home/bdms/homework/hw1
```
Note: only the first few characters of the container ID need to be written here.
Inside the container, you can find the corresponding Java file and the check folder:
Follow the instructions in readme.md inside the check folder:
```
//readme.md

0. set language to POSIX
$ export LC_ALL="POSIX"

1. make sure ssh is running
$ service ssh status
if not, then run sshd (note that this is necessary in a docker container)
$ service ssh start

2. make sure HDFS and HBase are successfully started
$ start-dfs.sh
$ start-hbase.sh
check if hadoop and hbase are running correctly
$ jps
5824 Jps
5029 HMaster
5190 HRegionServer
4950 HQuorumPeer
4507 SecondaryNameNode
4173 NameNode
4317 DataNode

3. put input files into HDFS
$ ./myprepare

4. check file name format
$ ./check-group.pl <your-java-file>

5. check if the file can be compiled
$ ./check-compile.pl <your-java-file>

6. run test
$ ./run-test.pl ./score <your-java-file>

Your score will be in ./score.
The run-test.pl tests 3 input cases, you will get one score for each case.
So the output full score is 3.

To run the test again, you need to first remove ./score
$ rm ./score
$ ./run-test.pl ./score <your-java-file>
```
The final result output is 3, indicating that all three tests have passed.
However, the test script does not show intermediate results; only the final score is saved.
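If you want to see what the program actually wrote, you can scan the Result table in the HBase shell, the same way the README checks 'mytable' (the rows shown depend on the test input):
```
$ hbase shell
hbase(main):001:0> scan 'Result'
```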
V. Save
Work done inside the container is not saved automatically.
See the Docker documentation on backing up data: docs.docker.com/desktop/bac…
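One simple way to keep the work is to commit the container to a new image, or to copy the files back to the host with docker cp. The container ID, image name, and host path below are only placeholders for illustration:
```
# snapshot the whole container as a new image
docker commit 9d44ee5dfe3b my-bdms-hw1:saved

# or copy individual files/folders back to the host
docker cp 9d44ee5dfe3b:/home/bdms/homework/hw1 /home/abc/bigdata/hw1-backup
```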
I am [Bean dried peanut]. Your likes, bookmarks, and follows are my biggest motivation to keep going.