Hadoop Time

It’s finally time for me to look into Hadoop a little more closely.

I cloned the Hadoop 2.7.3 repo and built it without too much difficulty.  I even got the daemon running locally in single-node mode.  After running through a few of the examples, I wondered what Amazon Web Services had to offer for running a Hadoop cluster.  That’s when I discovered Amazon EMR.  After reading the documentation (which is as excellent as I expected from the Amazon folks), I decided to run through their example and incur the $1.05 or so it would cost to bring up a cluster and play with it.  Hopefully another post on the experience will be forthcoming…

Building s2n

s2n is the new TLS/SSL implementation by Amazon.

I retrieved the git repository and tried to build it, only to have the build crash while running a unit test.  I put some debugging statements in the test, but they never showed up.  I kept getting “error 139”, and initial searches suggested it might be a gcc problem.

Turns out, my stack was blowing up during initialization of an array, and I needed to raise my stack size limit using “ulimit -s 16284” (the value is in kilobytes).  An interesting little problem for me, since I haven’t worked in the C world for quite some time.
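Why a stack overflow shows up as “error 139”: when a process is killed by a signal, bash reports an exit status of 128 plus the signal number, and signal 11 on Linux is SIGSEGV (a segmentation fault), which is what blowing past the stack limit produces.  Here is a tiny decoder as an illustration; the class and method names are hypothetical, and the signal numbers listed are the Linux ones.

```java
public class ExitStatus {

    // bash reports a status of 128 + N when the child was killed by signal N.
    // Only a few common signals are named here, using Linux signal numbers.
    static String describe(int status) {
        if (status <= 128) {
            return "exited with code " + status;
        }
        int signal = status - 128;
        String name;
        switch (signal) {
            case 6:  name = "SIGABRT"; break;
            case 9:  name = "SIGKILL"; break;
            case 11: name = "SIGSEGV (segmentation fault)"; break;
            case 15: name = "SIGTERM"; break;
            default: name = "signal " + signal;
        }
        return "killed by " + name;
    }

    public static void main(String[] args) {
        // prints "139 -> killed by SIGSEGV (segmentation fault)"
        System.out.println("139 -> " + describe(139));
    }
}
```

That also hints at why the debugging statements never appeared: the process died from the signal rather than exiting through the test’s normal path.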

Playing with Some Java Concurrency Classes

So I used Spring Boot to bang out a REST service I call a “delay service”, which simply sleeps for the number of seconds given as a path parameter, like this:  http://<host>/delay/{timeInSeconds}.  I created it so I could watch my web services client call the service many times in parallel and see an ExecutorService work as advertised.  The ExecutorService implementation I chose was a simple fixed-size thread pool.
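For anyone who wants to try this without a Spring Boot project, here is a dependency-free sketch of the same endpoint.  It is not the Spring Boot service described above: it swaps in the JDK’s built-in com.sun.net.httpserver.HttpServer, and the class name, port, and response text are assumptions made up for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;

// Hypothetical stand-in for the delay service: GET /delay/{timeInSeconds}
// sleeps for that many seconds, then reports how long it slept.
public class DelayServer {

    // Extracts {timeInSeconds} from a path such as "/delay/3".
    static int parseDelaySeconds(String path) {
        String prefix = "/delay/";
        if (!path.startsWith(prefix)) {
            throw new IllegalArgumentException("unexpected path: " + path);
        }
        return Integer.parseInt(path.substring(prefix.length()));
    }

    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/delay/", exchange -> {
            int status;
            String body;
            try {
                int seconds = parseDelaySeconds(exchange.getRequestURI().getPath());
                Thread.sleep(seconds * 1000L);
                status = 200;
                body = "slept " + seconds + " second(s)";
            } catch (IllegalArgumentException e) { // also covers NumberFormatException
                status = 400;
                body = "bad delay: " + e.getMessage();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                status = 503;
                body = "interrupted";
            }
            byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(status, bytes.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(bytes);
            }
        });
        // Without an executor, HttpServer handles requests on a single thread,
        // so parallel calls to the delay endpoint would be served one by one.
        server.setExecutor(Executors.newCachedThreadPool());
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        start(8080);
        System.out.println("Try: curl http://localhost:8080/delay/2");
    }
}
```

The executor line matters for this experiment: if the server itself serialized requests, the client’s parallelism would be hidden.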

As expected, when I used the client to call the delay service several times in parallel with varying delay times, the total elapsed time was roughly the same as the longest single delay, rather than the sum of all the delays.
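The client side of that experiment can be sketched with a fixed-size pool and invokeAll.  To keep it self-contained, Thread.sleep stands in for the HTTP call to the delay service (an assumption, not the original client code), delays are in milliseconds, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelDelays {

    // Stand-in for one call to the delay service: just sleep for the given time.
    static Callable<Long> simulatedCall(long delayMillis) {
        return () -> {
            Thread.sleep(delayMillis);
            return delayMillis;
        };
    }

    // Runs every call on a fixed-size pool and returns the wall-clock time in ms.
    static long runInParallel(long[] delaysMillis, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Callable<Long>> calls = new ArrayList<>();
            for (long d : delaysMillis) {
                calls.add(simulatedCall(d));
            }
            long start = System.nanoTime();
            // invokeAll blocks until every task has completed.
            List<Future<Long>> results = pool.invokeAll(calls);
            for (Future<Long> f : results) {
                f.get(); // rethrows any exception a task threw
            }
            return (System.nanoTime() - start) / 1_000_000;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] delays = {1000, 2000, 3000};
        long sum = 0;
        for (long d : delays) {
            sum += d;
        }
        long elapsed = runInParallel(delays, delays.length);
        System.out.println("sum of delays: " + sum + " ms, elapsed: " + elapsed + " ms");
    }
}
```

With one thread per call, the elapsed time lands near the longest delay.  If the pool is smaller than the number of calls, some calls wait for a free thread, and with a pool size of 1 the total approaches the sum.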

Kinda neat.  Later I ran into the RxJava project, which seems to let you do the same thing in a “reactive” manner (and much, much more); I’ll be looking at that soon.

Spring Boot and Docker

I went through this Spring Framework page describing how to run a Spring Boot application in a Docker container.  Pretty cool.  I will dig into Spring Boot some more, and also into the Dockerfile used to build the image.  It uses a couple of Dockerfile instructions I hadn’t seen before, including “ENTRYPOINT” and “VOLUME”.

What I’d like to do is add a bit of database access to this skeletal project, maybe by pairing the MySQL Docker image I downloaded a few days ago with this new Spring Boot image to make a CRUD-type application out of the two.

I got a Spring Data JPA project going and working against my Docker MySQL image.  One difficulty I ran into was the port MySQL listens on.  I’m already running MySQL on this machine, so host port 3306 is taken, and when I fire up the image, it just grabs a random port, which makes it difficult to configure the Spring app’s datasource.  Not sure how much I’ll work on that for now…
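One possible way around the random port, sketched under assumptions: if the MySQL container was started with -P (publish exposed ports to random host ports), `docker port <container> 3306` prints the host port Docker picked, and the datasource URL can be built from that.  The class and method names and the database name are hypothetical.  Spring Boot’s relaxed binding would then let the resulting URL be passed in through the SPRING_DATASOURCE_URL environment variable.  Publishing a fixed host port instead, for example `-p 3307:3306`, would avoid the lookup entirely.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class DockerMysqlUrl {

    // Turns `docker port <container> 3306` output such as "0.0.0.0:32768" into a
    // JDBC URL. Some Docker versions add an IPv6 line like "[::]:32768"; only the
    // first non-blank line is used.
    static String jdbcUrlFromDockerPort(String dockerPortOutput, String database) {
        for (String line : dockerPortOutput.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            // The host port follows the last ':' ("[::]:32768" contains several).
            int port = Integer.parseInt(trimmed.substring(trimmed.lastIndexOf(':') + 1));
            return "jdbc:mysql://localhost:" + port + "/" + database;
        }
        throw new IllegalArgumentException("no port mapping in: " + dockerPortOutput);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length != 2) {
            System.err.println("usage: java DockerMysqlUrl <container-name> <database>");
            return;
        }
        Process p = new ProcessBuilder("docker", "port", args[0], "3306")
                .redirectErrorStream(true)
                .start();
        String output = new String(p.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        if (p.waitFor() != 0) {
            throw new IllegalStateException("docker port failed: " + output);
        }
        System.out.println(jdbcUrlFromDockerPort(output, args[1]));
    }
}
```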

Continuing with Docker

Today I’m continuing to look at Docker.  I am going to look into creating my own images and playing with some existing ones.  I’ve pulled the MySQL image and I’m going to poke around on it.  I might also look at securing a container or the daemon with some keys.

Hold on!  It seems there’s something I don’t quite get about images and containers.  I tried to remove one of the images I downloaded during the tutorial, but I was told it was in use by a container.  I ran “docker ps” and didn’t see any running.  Why are these containers hanging around?  I’ve gotta investigate that.

I found a couple relevant pages.  Here and here.
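For the record, one likely culprit: plain docker ps lists only running containers, while docker ps -a also shows stopped ones, and a stopped container still counts as using its image until it is removed (for example with docker rm <id>), after which docker rmi can delete the image.  A small Java sketch that shells out to the Docker CLI to list exited containers and the images they still reference; the class name is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class StoppedContainers {

    // Parses lines of the form "<container id> <image>" into an id -> image map.
    static Map<String, String> parse(String output) {
        Map<String, String> containers = new LinkedHashMap<>();
        for (String line : output.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] parts = trimmed.split("\\s+", 2);
            containers.put(parts[0], parts.length > 1 ? parts[1] : "");
        }
        return containers;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // "-a" includes stopped containers; the filter keeps only exited ones.
        Process p = new ProcessBuilder(
                "docker", "ps", "-a",
                "--filter", "status=exited",
                "--format", "{{.ID}} {{.Image}}")
                .redirectErrorStream(true)
                .start();
        String output = new String(p.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        if (p.waitFor() != 0) {
            throw new IllegalStateException("docker ps failed: " + output);
        }
        parse(output).forEach((id, image) ->
                System.out.println(id + " still references image " + image));
    }
}
```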