First, make sure you have the driver you need to actually connect to the database; in my case that was the JDBC driver for Microsoft SQL Server. Hortonworks provides documentation on how to set this up (http://hortonworks.com/hadoop-tutorial/import-microsoft-sql-server-hortonworks-sandbox-using-sqoop/). Since that covers the sandbox, I will write from the perspective of a production cluster with multiple nodes.
Confirm in Ambari which nodes have the Sqoop client installed. For each server with a client, you will need to copy the JDBC jar to a location on the server so it can be used. Using curl, download the JDBC driver from Microsoft:
curl -L 'http://download.microsoft.com/download/0/2/A/02AAE597-3865-456C-AE7F-613F99F850A8/sqljdbc_4.0.2206.100_enu.tar.gz' | tar xz
That will download the archive and extract it for you.
Now, with HDP 2.3 the location where you need to put this jar has changed, so from the Ambari host (since you should have passwordless SSH set up) perform the following for each node with a Sqoop client:
scp sqljdbc4.jar root@<hostname>:/usr/hdp/<hdp version>/sqoop/lib
That will place the jar in the needed folder and allow you to use the driver. Hortonworks states you should restart afterward; I did not, and I had no issues.
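If you have more than a couple of nodes, the copy can be scripted from the Ambari host. A minimal sketch, assuming a hypothetical file sqoop_nodes.txt listing one Sqoop-client hostname per line:

```shell
# Copy the SQL Server JDBC jar to every node with a Sqoop client.
# sqoop_nodes.txt is a hypothetical file with one hostname per line;
# <hdp version> is your cluster's HDP build (check /usr/hdp on a node).
while read -r host; do
  scp sqljdbc4.jar "root@${host}:/usr/hdp/<hdp version>/sqoop/lib/"
done < sqoop_nodes.txt
```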
With the driver in place you can now attempt to connect to the database server and see what databases are available:
sqoop list-databases --connect jdbc:sqlserver://<server address>:<port> --username <username of database user> -P
Your user only needs read-only access in order to pull the data. The -P option will prompt you for the password, but you can also use the --password option if you would like to pass the password on the command line. You can also pass a file containing the password, and I'd go with that once you verify that everything works.
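A sketch of the password-file approach (the file name and location are my assumptions, not requirements): write the password to a file with no trailing newline, lock down the permissions, and point Sqoop at it with --password-file. Sqoop resolves that path through Hadoop's filesystem layer, so prefix it with file:// to read from local disk instead of HDFS.

```shell
# Write the password with no trailing newline (printf, unlike echo, adds none);
# a trailing newline would become part of the password Sqoop sends.
# ~/.sqoop.password is a hypothetical location.
printf '%s' 'YourPasswordHere' > "$HOME/.sqoop.password"
chmod 400 "$HOME/.sqoop.password"   # readable only by you

# Use it in place of -P; file:// forces a local (non-HDFS) path.
sqoop list-databases \
  --connect "jdbc:sqlserver://<server address>:<port>" \
  --username <username> \
  --password-file "file://$HOME/.sqoop.password"
```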
If everything is correct you should get a list of all the databases on the machine.
For this next part I am going to cover importing an entire database (all of its tables) into Hive. In my case there were a lot of tables, and doing them one by one was not an option. The sticking point is that you need to be using Sqoop 1.4.4 or above (HDP 2.3 ships 1.4.6).
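A quick way to confirm you meet that minimum is sqoop version. A sketch that automates the comparison, assuming the output contains a line like "Sqoop 1.4.6" (which is what the HDP build prints):

```shell
# Pull the version number out of `sqoop version` and compare against 1.4.4.
installed=$(sqoop version 2>/dev/null | awk '/^Sqoop [0-9]/ {print $2; exit}')
required="1.4.4"
# sort -V sorts version strings; if the required version sorts first (or ties),
# the installed version is new enough.
if [ "$(printf '%s\n' "$required" "$installed" | sort -V | head -n1)" = "$required" ]; then
  echo "Sqoop $installed is new enough for import-all-tables"
else
  echo "Sqoop $installed is older than $required" >&2
fi
```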
The first thing you will want to do is create the database in Hive so you have somewhere to import it. So do the following:
Type hive on one of your nodes (confirm the node has a Hive client)
hive>create database <name>;
Run show databases; and you should see default plus the newly created database.
Now you will run the following command to sqoop the entire database into your newly created Hive database:
sqoop import-all-tables --connect "jdbc:sqlserver://<address>:<port>;database=<name>;" --username <username> -P --hive-import --hive-database <database name>
You'll enter the password and the job will begin to run. It will let you know of any issues it runs into and you can troubleshoot from there.
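Once the interactive run checks out, the job can be run without a prompt. A sketch combining the password-file approach with a couple of optional flags: -m caps the parallel map tasks per table, and --exclude-tables skips tables you don't need (the password-file path and table names are placeholders of my own):

```shell
# Non-interactive version of the import; assumes a password file was
# created earlier (the ~/.sqoop.password path is hypothetical).
sqoop import-all-tables \
  --connect "jdbc:sqlserver://<address>:<port>;database=<name>;" \
  --username <username> \
  --password-file "file://$HOME/.sqoop.password" \
  --hive-import \
  --hive-database <database name> \
  -m 4 \
  --exclude-tables <table1>,<table2>
```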
Once the job has completed, go back into Hive and do the following:
hive>show databases; <this will show the available databases>
hive>use <database name>; <this will be the new one you created>
hive>show tables; <this should show you the tables available to you>
hive>describe <table name>; <this will show you the schema that was imported with the tables>
hive>select <column name(s) you got from the describe command> from <table name>; <this will output to the screen any data within the table>
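The same checks can be scripted with hive -e rather than the interactive shell; the names below are placeholders, and adding a LIMIT is a good idea so you don't dump an entire table to the screen:

```shell
# Non-interactive versions of the verification queries above.
hive -e "SHOW TABLES IN <database name>;"
hive -e "DESCRIBE <database name>.<table name>;"
hive -e "SELECT <column name> FROM <database name>.<table name> LIMIT 10;"
```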
And that is a quick run-through of sqooping data into Hive and then performing a simple query on that data.