Monday, March 24, 2014

REST service in 5 minutes (using Java)

I remember these days when building something similar in Java required much more body movements, and maybe this was the reason to why some start-ups have chosen other weak typing languages with all their fancy Web frameworks for rapid bootstrapping. This isn't the case anymore, see how easy is creating a REST service that supports all CRUD operations in Java:

1. Define your task model:

/**
 * Task model
 */
@Entity
public class Task {
  @Id
  @GeneratedValue(strategy = GenerationType.AUTO)
  private long id;
  private String text;
  private Date created = new Date();
  private Date completed;

  public String getText() {
    return text;
  }

  public void setText(String text) {
    this.text = text;
  }

  public Date getCreated() {
    return created;
  }

  public void setCreated(Date created) {
    this.created = created;
  }

  public Date getCompleted() {
    return completed;
  }

  public void setCompleted(Date completed) {
    this.completed = completed;
  }
}

2. Tell what operations on tasks you're going to support:

/**
 * This class defines DB operations on Task entity
 */
public interface TaskRepository extends PagingAndSortingRepository<Task> {
  // Magic method name automatically generates needed query
  public List<Task> findByCompletedIsNull();
}

3. Configure your application:

/**
 * This class is responsible for:
 *  - Setting up DB connection and ORM
 *  - Initializing REST service for all found entities
 *  - Starting Spring application (main entry point)
 */
@ComponentScan
@Configuration
@EnableAutoConfiguration
@EnableJpaRepositories
@EnableTransactionManagement
public class Application extends RepositoryRestMvcConfiguration {

  @Bean
  public DataSource dataSource() throws PropertyVetoException {
    MySQLDataSource dataSource = new MySQLDataSource();
    dataSource.setDatabaseName("taskdb");
    dataSource.setUserName("user");
    dataSource.setPassword("pass");
    return dataSource;
  }

  @Bean
  public LocalContainerEntityManagerFactoryBean entityManagerFactory(DataSource dataSource) {
    HibernateJpaVendorAdapter jpaVendorAdapter = new HibernateJpaVendorAdapter();
    // Database tables will be created/updated automatically due to this:
    jpaVendorAdapter.setGenerateDdl(true);
    jpaVendorAdapter.setDatabase(Database.MYSQL);

    LocalContainerEntityManagerFactoryBean entityManagerFactoryBean = new LocalContainerEntityManagerFactoryBean();
    entityManagerFactoryBean.setDataSource(dataSource);
    entityManagerFactoryBean.setJpaVendorAdapter(jpaVendorAdapter);
    entityManagerFactoryBean.setPackagesToScan(getClass().getPackage().getName());
    return entityManagerFactoryBean;
  }

  @Bean
  public PlatformTransactionManager transactionManager() {
    return new JpaTransactionManager();
  }

  public static void main(String[] args) {
    SpringApplication.run(Application.class, args);
  }
}

That's all! After invoking this application, you'll get a task complete REST service for free. Let's test it:

Create a new task:

~$ curl -X POST -H "Content-Type: application/json" -d '{"text":"Implement simplest REST Java application"}' http://localhost:8080/tasks

See the task contents:

~$ curl  http://localhost:8080/tasks/1
{
  "text" : "Implement simplest REST Java application",
  "created" : 1395665199000,
  "completed" : null,
  "_links" : {
    "self" : {
      "href" : "http://localhost:8080/tasks/1"
    }
  }
}

Create another task:

~$ curl -X POST -H "Content-Type: application/json" -d '{"text":"Go home"}' http://localhost:8080/tasks

Find all tasks:

~$ curl  http://localhost:8080/tasks
{
  "_links" : {
    "self" : {
      "href" : "http://localhost:8080/tasks{?page,size,sort}",
      "templated" : true
    },
    "search" : {
      "href" : "http://localhost:8080/tasks/search"
    }
  },
  "_embedded" : {
    "tasks" : [ {
      "text" : "Implement simplest REST Java application",
      "created" : 1395665199000,
      "completed" : null,
      "_links" : {
        "self" : {
          "href" : "http://localhost:8080/tasks/1"
        }
      }
    }, {
      "text" : "Go home",
      "created" : 1395665359000,
      "completed" : null,
      "_links" : {
        "self" : {
          "href" : "http://localhost:8080/tasks/2"
        }
      }
    } ]
  },
  "page" : {
    "size" : 20,
    "totalElements" : 2,
    "totalPages" : 1,
    "number" : 0
  }
}
(pay an attention to how easy is it implementing pagination using this REST service!)

Mark the first task as complete:

~$ curl -X PATCH -H "Content-Type: application/json" -d "{\"completed\":$(($(date +%s)*1000))}" http://localhost:8080/tasks/1

Find incomplete tasks:

~$ curl  http://localhost:8080/tasks/search/findByCompletedIsNull
{
  "_embedded" : {
    "tasks" : [ {
      "text" : "Go home",
      "created" : 1395665359000,
      "completed" : null,
      "_links" : {
        "self" : {
          "href" : "http://localhost:8080/tasks/2"
        }
      }
    } ]
  }
}

Pretty easy and yet powerful, huh?

For more information and instructions for how to compile and run this application, see source code on GitHub.

Tuesday, February 04, 2014

Parallelization monster framework for Pentaho Kettle

We always end up with ROFL in our team, when trying to find a name for strange looking ETL processes diagrams. This monster has no name yet:


This is a parallelization framework for Pentaho Kettle 4.x. As you probably know in the upcoming version of Kettle (5.0) there's native ability to launch job entries in parallel, but we haven't got there yet.

In order to run a job in parallel, you have to call this abstract monster job, and provide it with 3 parameters:

  • Path to your job (which is supposed to run in parallel).
  • Number of threads (concurrency level).
  • Optional flag that says whether to wait for completion of all jobs or not.
Regarding the number of threads, as you can see the framework supports up to 8 threads, but it can be easily extended.

How this stuff works. "Thread #N" transformations are executed in parallel on all rows copies. Rows are split then, and filtered in these transformations by the given number of threads, so only a relevant portion of rows is passed to the needed job (Job - Thread #N). For example, if the original row set was:

           ["Apple", "Banana", "Orange", "Lemon", "Cucumber"]

and the concurrency level was 2, then the first job (Job - Thread #1) will get the ["Apple", "Banana", "Orange"] and the second job will get the rest: ["Lemon", "Cucumber"]. All the other jobs will get an empty row set.

Finally, there's a flag which tells whether we should wait until all jobs are completed.

I hope one will find attached transformations useful. And if not, at least help me find a name for the ETL diagram. Fish, maybe? :)

Sunday, January 26, 2014

"Be careful when using or accessing WiFi connection"

Last week my Skype account was hacked during my weekend holidays in Budapest. I don't know how this has happened - I only know that I was logged into Skype from iPhone, and I used a lot of free public WiFi, which are abundant in Budapest. The last day of my journey I tried to call out from Skype, and the call was finished too quickly, which should not have happened, since I remembered there was a ~30 bucks deposit on my account. I checked my account, and I've found a lot of calls to Belarus, which I didn't make of course:


There were more (tens) of entries like this.

The next thing I did was logging out from Skype iPhone app, and changing my password. Then I contacted Skype support, and I've got a Web chat with support engineer. I must say, their support reacted immediately to my request, which looked really professional from their side. I chatted about half an hour from my mobile phone's browser, but finally I've got a refund for all the calls I never did.

The incident is over now (actually, it was over the hour after I realized that my account was hijacked), but it raises the question: "How is that possible that my account was hacked? Is there some insecure part in Skype connection from the iPhone app, like sending credentials over non encrypted channel?". Unfortunately, I've got no answer from the support engineer, except for some funny comments/advises (Postfactum, I've read Skype security evaluation, but I haven't find anything that explains this incident either). Below are selected parts from the chat transcript:

Donald M: Michael, we understand that you would like to have your Skype account secured while using the application. 
Donald M: We’d be more than happy to assist you and provide you the best practice to keep you secured.
Donald M: To help you stay secure, we would like to share with you some useful tips and information about online security:
You: Ok, what can I do to keep the account secure?
Donald M: Please visit this link: http://www.skype.com/en/security/

I've read everything on that page, but I didn't find anything useful except for choosing strong enough password (which was strong enough).

Donald M: We strongly advise that every customer installs sufficient security software, such as an antivirus and a firewall on all their devices that use Skype and to keep them enabled and up to date.
You: Antivirus on iPhone?
Donald M: Skype does its best to keep your communication and personal information secure. 
Donald M: Yes!
Donald M: However, please be aware that Skype users should also take precautions against security threats by not sharing their private data and should install adequate security software on all their devices that use Skype.
You: There's no antivirus software on for iPhone mobile phone

Previously, I explained to the support engineer that I use Skype solely from my mobile phone.

Donald M: Yes and be careful when using or accessing Wifi connection. 

This last sentence simply killed me. What can I do when using public WiFi? Maybe wrap my iPhone into a condom?


Thursday, January 16, 2014

Python code indentation nightmare

After numerous hesitations, overcoming my intuitive distaste of Python as programming language, I finally decided to learn it. This is not just for getting familiar with the core coolness of Python and getting myself into Python mainstream, but more for not finding myself becoming a Java-mastodon (a-la COBOL-mastodon or FORTRAN-mastodon) in the next 10 years. So, I created a little project, and started to push lines of code into it.

My first experience with Python has not changed anything about my bad attitude towards Python syntax:
  • Code structures simply don't look fine without opening and closing curly braces, everything looks like a big unstructured mess.
  • Doc-strings written as a comment in the body(!), where I can write any crap I want simply don't feel like an API documentation to a class/method/function.
  • """, which can be used either as a multi-line comment separator or ... as a multi-line string separator when assigned to a variable. Isn't it weird? One of my university professors used to say: "If you wan't to confuse a man, either call two different things with the same name, or call two equal things with different names", and I totally agree with him.
  • Two forms: "import thing" and "from thing import another_thing". Why do I need both of them? Why can't I just use: "import thing.another_thing" or "import thing [another_thing]" like in Perl?
  • Indentation...

Indentation worth another post. I've spent about an hour trying to understand why Unit test, which I added to a forked project's test suite doesn't run:

class TestSchemes(TestCase):
    ...
    def test_find(self):
        work = Scheme.find('wlan0', 'work')
        assert work.options['wpa-ssid'] == 'workwifi'

 # Added my test:
 def test_mytest(self):
  work = Scheme.find('wlan0', 'work')
  work.delete()
  work = Scheme.find('wlan0', 'work')
  assert work == None

Trying to run - my test doesn't run, and there's no single error. I thought, may be test methods are registered somewhere (unlikely, but who knows..), but this wasn't the case. Maybe test_mytest is some registered name in Python? Tried another method names, but with no luck. Finally, I tried one more thing: copied and pasted one of the existing method's declarations using Vim's 'yy' and 'p' shortcuts, then renamed the pasted method name, and voila! That worked! Hm... What's the difference between:

    def test_mytest(self):
        work = Scheme.find('wlan0', 'work')
        work.delete()
        work = Scheme.find('wlan0', 'work')
        assert work == None

and:

 def test_mytest(self):
  work = Scheme.find('wlan0', 'work')
  work.delete()
  work = Scheme.find('wlan0', 'work')
  assert work == None

Right.. this is white-space against tab. I always indent my code using tabs, so the method I added was also indented using tabs. As it turns out, the original code was indented using spaces, so the new method simply wasn't recognized until I replaced all tabs with spaces. This is really really weird situation, when indentation characters have an effect on execution flow. Don't you agree?

To complete, I'd like to write a couple of warm words about Python. There are plenty of frameworks and libraries written in Python. GitHub is teeming with lots and lots of interesting and fun projects, from which you can learn. If you want to build a fast prototype, it's awesome. Not as awesome as Java + Maven, but still :-) I'm mastodon...

May the source be with you.