Sunday, July 19, 2009

New DLTK indexing is promising

For the last couple of weeks I've been working on improving the DLTK indexer, which was derived from JDT as is. The original bug report reads: "Indexing must be adapted for dynamic languages". This point needs a little explanation. In Java, every element reference is strongly bound to its original declaration. This is why one can calculate the binding while parsing the source code and hold it in memory (probably updating it when the referenced or referencing elements change). This is not the case for dynamic languages. Consider this example (PHP):

<?php

function __autoload($class_name) {
    require_once $class_name . '.php';
}

$obj = new MyClass();

?>

In this example, the PHP file is included just before the class is loaded, and there's no way for the IDE to determine which file that is. To offer all the JDT-like features in a DLTK-based IDE, element bindings are therefore resolved from scratch each time. This results in a lot of queries and updates to the index file, which are very I/O-intensive operations.
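To make the cost concrete, here is a minimal sketch of a name-based index, written in Java since DLTK itself is Java. It is not DLTK's actual data structure: `DeclarationIndex`, its methods, and the file paths are hypothetical names chosen for illustration. It only shows that a reference such as `new MyClass()` can be resolved by name alone, and that the lookup may return several candidate files.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical name-based index: maps a class name to every file that declares it. */
class DeclarationIndex {
    private final Map<String, List<String>> filesByClassName = new HashMap<>();

    /** Records that the given file declares a class with the given name. */
    void addDeclaration(String className, String file) {
        filesByClassName.computeIfAbsent(className, k -> new ArrayList<>()).add(file);
    }

    /** Drops everything known about a file, e.g. before re-indexing it after an edit. */
    void removeFile(String file) {
        filesByClassName.values().forEach(files -> files.remove(file));
        filesByClassName.values().removeIf(List::isEmpty);
    }

    /** Resolves a reference by name only; several candidates may come back. */
    List<String> resolve(String className) {
        return List.copyOf(filesByClassName.getOrDefault(className, List.of()));
    }
}

class DeclarationIndexDemo {
    public static void main(String[] args) {
        DeclarationIndex index = new DeclarationIndex();
        index.addDeclaration("MyClass", "lib/MyClass.php");
        index.addDeclaration("MyClass", "tests/mocks/MyClass.php");
        // `new MyClass()` cannot be bound statically: both files are candidates.
        System.out.println(index.resolve("MyClass"));
    }
}
```

Every feature that needs a binding has to repeat a lookup like `resolve`, and every edit triggers something like `removeFile` followed by re-adding declarations, which is where the query and update traffic comes from once the index lives on disk.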

We've tried implementing the index on top of the H2 database, and the results are really amazing! Here's a screen-cast showing how fast the full hierarchy for the 'Exception' class is built with the H2-based index. Compared to the older implementation, I must admit that it's 10 times faster. Thanks to the fast access to the model, most other features will perform well too: Code Assist, Source Navigation, Source Editing, Mark Occurrences, etc.
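As a rough illustration of what building a full hierarchy asks of the index, the sketch below walks "class extends superclass" records breadth-first. It is an assumption-laden stand-in, not the H2-based implementation: `HierarchyIndex` and its methods are hypothetical, and an in-memory map plays the role that a database table queried by superclass name might play.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical index of "class extends superclass" records, queried by superclass name. */
class HierarchyIndex {
    private final Map<String, List<String>> subclassesBySuperclass = new HashMap<>();

    /** Records one "className extends superclassName" relation. */
    void addClass(String className, String superclassName) {
        subclassesBySuperclass.computeIfAbsent(superclassName, k -> new ArrayList<>()).add(className);
    }

    /** Collects all direct and indirect subclasses, one index lookup per visited class. */
    Set<String> allSubclasses(String rootClass) {
        Set<String> found = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.add(rootClass);
        while (!pending.isEmpty()) {
            String current = pending.remove();
            for (String sub : subclassesBySuperclass.getOrDefault(current, List.of())) {
                if (found.add(sub)) { // skip classes already seen (duplicates, cycles)
                    pending.add(sub);
                }
            }
        }
        return found;
    }
}

class HierarchyIndexDemo {
    public static void main(String[] args) {
        HierarchyIndex index = new HierarchyIndex();
        index.addClass("LogicException", "Exception");
        index.addClass("RuntimeException", "Exception");
        index.addClass("InvalidArgumentException", "LogicException");
        System.out.println(index.allSubclasses("Exception"));
    }
}
```

The number of lookups grows with the size of the hierarchy, so the speed of a single query against the index largely determines how long the whole hierarchy takes to build.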

I hope this will be included in DLTK 2.0.0 and PDT 2.2.0.

11 comments:

Anonymous said...

Is there any way I can test these improvements?

Seva (Wsevolod) Lapsha said...

As I already mentioned, you would probably like to try Berkeley DB. I assume you do not use non-trivial table joins and filtering in the queries, so the seamless Map interface provided by Berkeley DB would be easier to use.
Also, statement parsing, compilation and optimization are said to add some overhead.
Anyway, it would be interesting to run a clean performance comparison between H2 and Berkeley DB, since I have not found an existing one on the Internet.

Chris Aniszczyk (zx) said...

Cool stuff!

Is this going to be something generic in DLTK that applies to everything built on top of DLTK?

Zviki Cohen said...

Nice work. Indeed, H2 is a very powerful database.

@Seva: before even testing BDB, there's a bigger issue and that's the license. The BDB license is not as permissive as EPL. Especially, it is less commercial-friendly since you must open source any derived work. There is a commercial license option, but that is not free.

Michael said...

Anonymous: there's no way to test it right now. I'm working on a CVS branch that will be merged into HEAD later. The work is still in progress; what remains is to adapt all DLTK features that use JDT-like indexing to the new one. But if you're still interested, look for the CVS branch called 'INDEX2' in DLTK and PDT. The affected plug-ins are:

DLTK:
------
org.h2.core
org.eclipse.dltk.core
org.eclipse.dltk.core.h2

PDT:
-----
org.eclipse.php.core
org.eclipse.php.core.h2
org.eclipse.php.ui

zx: It's going to be generic in DLTK, but it won't apply automatically to all extenders. Each one will have to "enable" the new indexing.

Seva: Actually, H2 can be replaced by another database quite easily thanks to the DAO pattern, but I don't see any reason for that right now :)

Greg Martyn said...

I'd like to test this out, but I'm having trouble satisfying the DLTK H2-based Indexer dependency.

Has the code been merged into HEAD yet?

FWIW, I have H2 installed in the Data Management > Connectivity > Driver Definitions page

I'm trying with:
http://download.eclipse.org/technology/dltk/updates-dev/2.0/

and

http://download.eclipse.org/tools/pdt/updates/2.0/interim/


Thanks.

Michael said...

Greg> Please find PDT 2.2 integration/nightly builds on this page: http://www.eclipse.org/pdt/downloads/

Also, look for the needed dependencies on the same download page.

Thanks!

alex said...

DLTK indexing is something fucking great

alex@linux-sus3:/opt/Zend/ZendStudio-7.0.0> ./ZendStudio
Exception in thread "DLTK indexing" java.lang.OutOfMemoryError: Java heap space
at org.eclipse.dltk.compiler.util.HashtableOfObject.<init>(HashtableOfObject.java:37)
at org.eclipse.dltk.compiler.util.HashtableOfObject.rehash(HashtableOfObject.java:138)
at org.eclipse.dltk.compiler.util.HashtableOfObject.put(HashtableOfObject.java:111)
at org.eclipse.dltk.core.search.index.DiskIndex.mergeCategory(DiskIndex.java:601)
at org.eclipse.dltk.core.search.index.DiskIndex.mergeCategories(DiskIndex.java:561)
at org.eclipse.dltk.core.search.index.DiskIndex.mergeWith(DiskIndex.java:682)
at org.eclipse.dltk.core.search.index.Index.save(Index.java:225)
at org.eclipse.dltk.core.search.indexing.IndexManager.saveIndex(IndexManager.java:768)
at org.eclipse.dltk.core.search.indexing.IndexManager.saveIndexes(IndexManager.java:830)
at org.eclipse.dltk.core.search.indexing.IndexManager.notifyIdle(IndexManager.java:512)
at org.eclipse.dltk.internal.core.search.processing.JobManager.run(JobManager.java:433)
at java.lang.Thread.run(Thread.java:619)
Error while logging event loop exception:
java.lang.OutOfMemoryError: PermGen space

Michael said...

@alex, the new DLTK indexer is not part of Zend Studio 7.0.0.

Derek said...

Since it's I/O-intensive, I'd love to see you benchmark it with a 7200 rpm HDD versus a high-performance SSD, such as this one: http://eshop.macsales.com/item/Other%20World%20Computing/SSDMXE200/

Barry Steele said...

I have been using ZDE (Zend Studio) since before it was an Eclipse plugin, and I can say that DLTK is the single biggest pain in the neck we have EVER seen.

It makes the IDE almost unusable, consumes a ridiculous amount of resources and adds no value.

I have been trying to work out how to turn the stupid thing off.