Tuesday, February 12, 2013

Basho Riak Enterprise Pricing

Basho has put their pricing model for Riak Enterprise online. Riak Enterprise is Basho's commercial product based on the free, open-source Riak database. Riak Enterprise's main selling points are multi-data-center support and product support.

With five nodes, Riak Enterprise costs USD 30,000 per year. (Why five nodes is your minimum.)

Tuesday, September 25, 2012

Riak Predictable Latency

Antonio Ye to lists.basho.com:
Is there any way to tune Riak to produce predictable latency? I have
been experimenting with a three node cluster and tuning the
frag_merge_trigger and frag_threshold bitcask parameters but no matter
what I set them to, I get very inconsistente latency numbers. Latency
seems to increase quite significant as soon as a merge starts.
It will be interesting to see the answers to this post. Can it be achieved? Then again, you have DynamoDB, which promises predictable latency, but note that this is done using SSDs:
Amazon DynamoDB offers low, predictable latencies at any scale. Customers can typically achieve average service-side latencies in the single-digit milliseconds. Amazon DynamoDB stores data on Solid State Drives (SSDs) (...)
An important point is that DynamoDB databases can be deployed to only one region, so clients far away might get predictable, but high latency. A Riak database can run in multiple regions and therefore clients in different regions should enjoy the same latency. It remains to be seen how predictable that "same latency" can be...

Thursday, March 8, 2012

VisualWorks DTangler Interface - Next Tasks

Here is the plan for changes in the VisualWorks Smalltalk DTangler interface:
  • The current version (1.15) is only tested on Windows. Would be great if someone tested other operating systems.
  • Ability to "follow dependencies": Currently, only the selected packages are included in the analysis. Add an option that also includes the prerequisites of these initial selected packages. This could go one level deep or until no more prerequisites are found.
  • Consider how bundles marked as "individual functional" should be handled. Currently, this setting is ignored.
  • Look at whether the number of references (between packages) should be counted. This will perform slower, and must be an option.
  • Strings are for humans, not computers: Some of the internal methods depend on packages represented by their name (String). Look at whether this can be changed to use PundleModel.
  • Look at how bundles should be treated. The current version can analyse bundles, but it only fetches the packages within the bundle and uses these as input. I must add that I think it is a mistake to have prerequisites on bundles. Bundles should only line up packages.
  • Look at performance. Currently, DTangler takes over 2 minutes to open my large example of 300 packages. I will discuss with the authors of DTangler whether they could make improvements. My test data contains a lot of circularity, and this could be the reason for the poor performance.
  • Provide visual feedback that DTangler is starting. Currently, you hit the "Analyze" button and DTangler starts in the background without any feedback.
  • Handle errors when starting DTangler, and report back to user.
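The "follow dependencies" item in the plan above could work roughly like this. It is a minimal sketch in Python, not the VisualWorks implementation: the `prerequisites` mapping, the function name `follow_dependencies` and the package names are all hypothetical.

```python
from collections import deque


def follow_dependencies(selected, prerequisites, max_depth=None):
    """Return the selected packages plus their prerequisites.

    selected      -- names of the initially selected packages
    prerequisites -- hypothetical mapping: package name -> list of prerequisite names
    max_depth     -- 1 follows one level only; None follows until nothing new is found
    """
    included = set(selected)
    queue = deque((name, 0) for name in selected)
    while queue:
        name, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue
        for prerequisite in prerequisites.get(name, []):
            if prerequisite not in included:  # also guards against circular prerequisites
                included.add(prerequisite)
                queue.append((prerequisite, depth + 1))
    return included


# Made-up package names, for illustration only.
example = {"App": ["Domain"], "Domain": ["Base"], "Base": ["Domain"]}
print(sorted(follow_dependencies(["App"], example, max_depth=1)))
print(sorted(follow_dependencies(["App"], example)))
```

The breadth-first walk means a package is first reached at its shallowest level, so the depth limit behaves as "one level deep" when `max_depth=1`.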

Wednesday, March 7, 2012

Analyzing Dependencies in Seaside

The image below shows a dependency analysis using DTangler on Seaside for VisualWorks Smalltalk 7.8.1:

Seaside dependencies
From the DSM it looks like Seaside is layered properly. My initial reaction is that this looks well engineered.

There is one circular dependency ("Seaside-Core" and "Seaside-VisualWorks-Core"), but this is probably introduced in the porting to VisualWorks. From what I can see, removing the dependency should be trivial.

Using DSM to Understand Code

Patrick Smacchia has a good write-up on using Dependency Structure Matrix (DSM): Identify Code Structure Patterns at a Glance.

"Layer", "Cycle", "High Cohesion / Low-Coupling", "Hungry caller" and "Popular Callee" are some of the uses he explains.

Why not try DTangler with VisualWorks to use these techniques?

DTangler's DSM Algorithm

You should note that DTangler, unlike many other Dependency Structure Matrix (DSM) tools, can have entries above the diagonal, even for non-circular graphs:
(...) a DSM may contain entries above the diagonal even when there are no cycles between the analysed items. With dtangler, this is rather an indication that the items are not properly layered.

Non-circular DSM
According to the DTangler authors, the DSM arranging principle was a conscious decision: You can be cycle-free and still have non-optimal layering of modules. If you have items in the upper-right triangle it is not necessarily a problem, but it sort of gives you a nagging feeling: "Why do we have stable components depending on less stable components?"
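To make the point concrete, here is a small Python sketch of a cycle-free system that still gets an entry above the diagonal. The ordering rule (most-used items first) is only a stand-in I chose for illustration; it is not DTangler's actual algorithm, and the convention that cell (row, column) means "row depends on column" is likewise an assumption.

```python
def build_dsm(order, depends_on):
    """Return a square matrix where cell [r][c] is 1 when order[r] depends on order[c]."""
    return [[1 if col in depends_on.get(row, ()) else 0 for col in order] for row in order]


def above_diagonal(order, dsm):
    """List (dependent, dependency) pairs that fall in the upper-right triangle."""
    size = len(order)
    return [(order[r], order[c]) for r in range(size) for c in range(r + 1, size) if dsm[r][c]]


# Made-up, cycle-free example: three clients use Util, and Util uses Helper.
depends_on = {"X": {"Util"}, "Y": {"Util"}, "Z": {"Util"}, "Util": {"Helper"}}
items = ["X", "Y", "Z", "Util", "Helper"]

# Stand-in ordering (an assumption, not dtangler's algorithm): most-used items first.
dependents = {item: sum(item in deps for deps in depends_on.values()) for item in items}
order = sorted(items, key=lambda item: (-dependents[item], item))

dsm = build_dsm(order, depends_on)
for name, row in zip(order, dsm):
    print(f"{name:<7}", " ".join(str(cell) for cell in row))
print(above_diagonal(order, dsm))
```

Here the heavily used `Util` depends on the less used `Helper`, which lands in the upper-right triangle even though the graph has no cycle.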

The philosophy originates from Robert C. Martin's (Uncle Bob) articles on object-oriented software. In particular, the article about "Stability" is relevant.

To fully understand how DTangler's DSM is created, please read the full thesis. You should concentrate on section 5.5.4 “Generating DSMs”.

Why DTangler?

Why did I choose to use DTangler to visualize VisualWorks package dependencies? There are two main reasons:

DTangler uses a Dependency Structure Matrix (DSM) -- not a graph -- to visualize dependencies. DSMs are initially harder to understand, but are more efficient for large graphs:
(...) there is a trade-off:
  • Graph is more intuitive but can be totally not understandable when the numbers of nodes and edges grow (a few dozens boxes can be enough to produce a graph too complex)
  • DSM is less intuitive but can be very efficient to represent large and complex graph. We say that DSM scales compare to graph.

DTangler is open source and available for the platforms VisualWorks runs on. Read the license information here.

Analyze Dependencies in VisualWorks using DTangler

I have published package “Epigent DTangler Dependency Analysis” (version 1.14) to the Cincom Store Public Repository. This package allows using DTangler in VisualWorks Smalltalk to analyze dependencies on packages and bundles. I want to thank the authors of DTangler for assisting me with this work.

DTangler is an open source tool for analyzing component dependencies. It provides a Dependency Structure Matrix (DSM) UI to visualize large, complex systems:

DTangler on bundle “Tools-IDE”
The tool adds menu item “DTangler Analyze Dependencies” to the package and bundle popup menu in the System Browser. The dialog below is opened when selecting the menu:

DTangler Launch Screen
Follow these steps to test the tool:
  1. Install DTangler from this page.
  2. Load package “Epigent DTangler Dependency Analysis”.
  3. Set the installed location of DTangler using the class method dTanglerFolderName: of Epigent.DTanglerDependencyAnalysis.ExecutableInterface. Alternatively, set the location from the “DTangler Launch Screen”.
  4. Select a bundle or a set of packages in the System Browser. Right click and select “DTangler Analyze Dependencies”.

To load “Epigent DTangler Dependency Analysis” from a Cincom base image, evaluate the script below:

Compiler evaluate: 'Parcel loadParcelByName: ''StoreForPostgreSQL''.
Store.StoreDevelopmentSystem reconnectAction: #reconnect.
Store.RepositoryManager addRepository: (Store.ConnectionProfile new
        name: ''Public Store Repository'';
        driverClassName: #PostgreSQLEXDIConnection;
        environment: ''store.cincomsmalltalk.com:5432_store_public'';
        userName: ''guest'';
        password: ''guest'';
        tableOwner: ''BERN'';
        yourself).
Store.DbRegistry connectTo: (Store.RepositoryManager repositories
        detect: [:each | each name = ''Public Store Repository'']).'.
Compiler evaluate: '(Store.Package mostRecentVersionOfPundleWithName: ''Epigent DTangler Dependency Analysis'') loadSrc.'.

Monday, February 27, 2012

Pharo Smalltalk Nautilus Browser

Pharo Smalltalk Nautilus Browser. Finally something that possibly can match the VisualWorks browser.

Build Time and Interaction

This article by Greg Young has some explanations of why many .NET projects suffer from long build times:
The rational is using refactoring tools like reshaper and coderush. They want refactoring support across the whole codebase.
Another reason (...) is for debugging support (...)
The same writer reports results from a poll of different projects' build times:
Over 50% of people had builds ranging over 1 minute! This is a real pain point for development teams (...)
OK, maybe build time can be reduced to a few seconds, but fulfilling Bret Victor's vision in "Inventing on Principle" will not be possible using tools like this.

Wednesday, January 25, 2012

Basho Riak vs Amazon DynamoDB

Up to this point Amazon had two non-relational data-store offerings, but neither really matched the needs when storing large amounts of structured data:

Amazon S3 - Has unlimited storage of key-value data, but no indexing and no conflict resolution mechanism.
SimpleDB - Good indexing and query capabilities, but has limits on the amount of data that can be stored.

Now Amazon has announced DynamoDB: A scalable, structured database, fully served from the cloud.

With DynamoDB, Basho gets competition from the source that inspired them to make Riak. Still, it does not look like Basho is too concerned. I agree with Basho: DynamoDB might turn out good for Basho, since DynamoDB does not directly copy the model of Riak.

DynamoDB has some advantages over Riak, like its "scale-by-pay": With DynamoDB, you just make an API call to scale up your database. No new hardware needs to be purchased and deployed. Still, I think Riak has a lot of features that usually will make it a better choice than DynamoDB:
  • Riak allows defining multiple indices per object. DynamoDB has a more limited index model. 
  • Riak allows storing objects larger than DynamoDB's 64 KB limit. This is particularly important when you want to serialize a business object graph and store it as a BLOB in a single object/record in the database. With DynamoDB this approach (usually) won't work; 64 KB is too little to fit a serialized graph. You will be forced to give up serializing, break up your entire graph and map it to multiple records. This means manual mapping between objects and the database. If the domain is complex, I fear we enter another Vietnam of Software Development.
  • DynamoDB only replicates between "availability zones", not "regions". Riak Enterprise allows for replicating between data centers in different regions, e.g. Europe and the US. With Riak you can have your data in both Europe and the US to reduce latency for both regions.
  • You can run Riak on your own hardware, with full control of the data. DynamoDB runs only in Amazon's data centers.
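To get a feel for how quickly a serialized object graph outgrows 64 KB, here is a rough Python sketch. It only measures the byte length of a JSON encoding; how DynamoDB actually counts item size is not modelled, and the order data is made up.

```python
import json

ITEM_LIMIT = 64 * 1024  # the 64 KB limit mentioned above, in bytes


def fits_in_one_item(object_graph):
    """Serialize the graph to JSON and compare its byte length with the limit."""
    payload = json.dumps(object_graph).encode("utf-8")
    return len(payload) <= ITEM_LIMIT, len(payload)


# Made-up business object: an order with many order lines.
small_order = {"id": 1, "lines": []}
large_order = {"id": 2, "lines": [{"sku": f"item-{n}", "quantity": n} for n in range(5000)]}

print(fits_in_one_item(small_order))
print(fits_in_one_item(large_order))
```

A few thousand small child objects are enough to pass the limit, which is why the single-BLOB approach rarely fits.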

Friday, October 21, 2011

Source Code Search for VisualWorks Smalltalk

I have published a tool for searching through the image for a specific string: Epigent Source Code Search.

Once loaded, you get the option "Source Code Search" in the VisualWorks tools menu. You can configure the tool to (optionally) limit the search to packages/bundles in your own project.

The tool has been tested with VisualWorks 7.8. It is released under the MIT License.

Tuesday, October 11, 2011

Google's Dart Language and Smalltalk

The Google Dart language failed to deliver what the Smalltalk community hoped for. The community wanted a Smalltalk variant with only optional typing added, but that was too optimistic.

Gilad Bracha, one of the creators of Dart, writes this about the Smalltalk community and its hopes for Dart:

As I watched the pre-launch speculation about Dart in the Smalltalk community, I knew that disappointment would follow. That's inevitable given the amount of wishful thinking involved. And the wishful thinking is natural too, but it is very much divorced from what we can do in reality.
Read the full post here.

The designers of Dart try too hard to please people using C-style languages. Dart has a C-like syntax, it does not use keyword messages, and it lacks class extensions. They also include the switch statement; not a good sign when you design a new language.
But Dart has many elements borrowed from Smalltalk. Therefore I expect it will be easier to learn Dart than JavaScript for the average Smalltalk programmer.

Right now my main interest is whether the Dart virtual machine (VM) can host other languages better than JavaScript does. Could this new VM make it easier to run Amber Smalltalk in the browser?

Monday, September 26, 2011

RAM over LAN Faster Than Disk

Interesting fact: Fetching objects over a (fast) LAN from memory in another computer is faster than fetching the same objects from an SSD. Read more...

Tuesday, September 6, 2011

"Server-Centric" and "Client-Centric" Web Frameworks

“Server-Centric” Web Frameworks
These frameworks generate the user interface on the server, and use third-party JavaScript libraries to provide a rich user interface experience at the client. They use a combination of HTML, CSS and AJAX to create the user interface. Code you write runs on the server, and you can assume low-latency access to your database and domain objects.

Aida/Web, Iliad and Seaside are the main Smalltalk server-centric web frameworks. Aida/Web and Seaside are used in production by several projects, while Iliad is new and experimental.

“Client-Centric” Web Frameworks
With these frameworks you have your code executed in the browser on top of JavaScript. You manipulate the web UI directly in Smalltalk. 

Quicksilver and Jtalk are well-known Smalltalk client-centric web frameworks. These frameworks are experimental, and not ready for production yet.

It will be interesting to see how client-centric web frameworks evolve. Remember, having Smalltalk running in the browser does not solve all problems: If the user interface acts on domain objects stored on the server, how do you go about transferring the data? Will your application transfer objects to the client, or will you have to choose a more light-weight approach? Maybe the ability to run Smalltalk on the client will only be used to decorate the user interface? Who knows?

Integration between server-centric and client-centric frameworks will be important in the coming years. Therefore it is great to see that Aida/Web recently announced plans for integrating Jtalk.

Tuesday, August 23, 2011

Scrollbars - A New Take

When your finger moves the mouse's scroll-wheel down, the document moves up. This is the opposite of touch interfaces' notion of direct manipulation. Then, why does the scroll wheel work like that?

The answer is of course that the scroll wheel manipulates the scroll-bar's "thumb", not the document.

But what happens when the scroll-bar becomes less visible, or even invisible? Read what Apple did to scroll-bars in OS X Lion. Ubuntu has also hidden scroll-bars, but I think Apple's design is a lot cleaner: Apple simply copied their design from iOS.

Thursday, July 14, 2011

Fuel Hook Methods

Mariano Martinez Peck explains that Fuel will get «hook» methods like StOMP has:
StOMP's hook methods are awesome and we want to have the same in Fuel. In fact, check the issue tracker and you will see several open issues regarding this :)

I think StOMP’s API for hook methods looks good, but the ability to (optionally) pass in an argument for the hook methods would be great. The argument passed would be (optionally) specified when invoking serialize / deserialize operations.

Why would you want to pass down an argument to be used in the hook methods? Well, not all serialize operations are simply about copying an object; sometimes you can use a serializer to copy only relevant parts of your model. Which parts you want to copy can vary with the context of the copy operation.

As an example, we have a large, complex, tree-like model containing hundreds of arrays with floats, where each position in the arrays represents the result of a Monte Carlo simulation. Some functions in the system require that we extract the results at a certain position from the arrays, or maybe a set of positions. Currently we implement the copy as variants of #copy* methods. This is hard to maintain, and error prone. If we could serialize and deserialize to create the customized copy, a lot of code could be removed.

Let’s say StOMP’s method Object stompWriteValue was called from a new method:

Object stompWriteValue: argument 
^self stompWriteValue

By default clients do not specify an argument. Object stompWriteValue: is called with nil as argument, and it calls #stompWriteValue. Now, if I wanted to pass an argument, I would override Object stompWriteValue: and use the argument to determine which write value to answer.

What you basically do is pass down information about the context of the serialize operation. This context could be something as simple as a symbol, or a more complex object that you dispatch to.
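The idea can be sketched outside Smalltalk as well. In the Python sketch below, `write_value` plays the role of #stompWriteValue and `write_value_with` the role of the proposed one-argument variant; all names are hypothetical and this is neither StOMP's nor Fuel's API.

```python
class Serializable:
    def write_value(self):
        """Default hook: the value that represents this object when serialized."""
        return dict(vars(self))

    def write_value_with(self, context):
        """Context-aware hook; by default the context is ignored."""
        return self.write_value()


class SimulationResult(Serializable):
    def __init__(self, name, samples):
        self.name = name
        self.samples = samples

    def write_value_with(self, context):
        # context is None for a plain copy, or a list of sample positions to keep
        if context is None:
            return self.write_value()
        return {"name": self.name, "samples": [self.samples[i] for i in context]}


def serialize(obj, context=None):
    """Toy serializer: ask the object for its write value, passing the context along."""
    return obj.write_value_with(context)


result = SimulationResult("npv", [1.0, 2.0, 3.0, 4.0])
print(serialize(result))                  # full copy
print(serialize(result, context=[0, 2]))  # only the requested positions
```

Clients that never pass a context get the old behaviour, which is the property the proposal depends on.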

I have not decided whether I really like this idea. :-) It might even be a bad idea: It complicates the API a bit, and complicates the internal implementation even more. Also, I like the idea of separating functions: A serializer should simply create a (binary) representation of an object. If you want to modify the copy, you should serialize the original, deserialize to a copy, and then modify the deserialized copy.

In one way you could argue that I want to tap into the serializer’s ability to traverse the object graph, visiting each object exactly one time, accepting circular structures. If I had access to these functions, having the argument in the hook methods might be less relevant. Such a function can be nice to have for a lot of operations.

But I am unsure how traversing should be implemented. How would a traverse function for example work when hitting #stompWriteValue (or its equivalent Fuel method)? It would not traverse the actual object, but risk getting a constructed object from that method.  Should the answer from this method be traversed? Or should a TraversingConstructedWriteValue exception be raised?
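A traversal with those properties (each object visited once, circular structures accepted) can be sketched in a few lines of Python. The `children_of` callback is a hypothetical extension point: it is exactly where one would have to decide between following the actual object and following its write value.

```python
def traverse(root, children_of, visit):
    """Visit every object reachable from root exactly once, tolerating cycles."""
    seen = set()  # identities of objects already visited
    stack = [root]
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue
        seen.add(id(obj))
        visit(obj)
        stack.extend(children_of(obj))


class Node:
    def __init__(self, name):
        self.name = name
        self.children = []


a, b, c = Node("a"), Node("b"), Node("c")
a.children = [b, c]
b.children = [c]
c.children = [a]  # circular reference back to the root

visited = []
traverse(a, lambda node: node.children, lambda node: visited.append(node.name))
print(visited)
```

Objects are tracked by identity rather than equality, so two equal but distinct objects are both visited.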

Goals of Fuel

Mariano Martinez Peck wrote a comment on my posts about Fuel and StOMP, explaining the goals of Fuel.

Tuesday, July 12, 2011

Fuel and StOMP - Best of Both

Web sites that need low latency and are dominated by read requests would benefit from Fuel's deserialization performance. On the other hand, serialization can also be heavily used in operations like saving data to a database, storing data to allow undo/rollback, copying data, etc. So a framework that performs like StOMP for serialization would be good to have.

After seeing how StOMP is faster for serialization and Fuel excels in deserialization, it is tempting to hope for the two projects to learn from each other. Could we get StOMP's fast serialization and Fuel's fast deserialization in a single serializer?

Without knowing the internals of the frameworks, my guess is that the format of the serialized data makes it hard to combine the need for speed in both serialization and deserialization. So the initial answer is "no, sorry!"

But... A partial solution would be to use an intermediate format for serialization. This format would support fast serialization, and therefore be similar to StOMP's format. The intermediate format would for example be used when a user needs to quickly commit changes to a database. Later, the intermediate format could be transformed to a format that is better optimized for deserialization. This transformation would typically be done as a background task. The deserializer would be able to (transparently) understand both the intermediate format and the optimized format.

Even if you do not get the best combined performance of StOMP and Fuel, systems with the need for low latency for both writes and reads, and a period of time between these operations, could benefit.
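The two-format idea can be sketched as follows. This is a Python illustration under loud assumptions: JSON and pickle are arbitrary stand-ins for "cheap to write" and "cheap to read", chosen only so the two formats differ, and no claim is made about their relative speed. The one-byte format tag is also my own invention.

```python
import json
import pickle

INTERMEDIATE = b"I"  # stand-in for a format that is cheap to write
OPTIMIZED = b"O"     # stand-in for a format that is cheap to read


def serialize_fast(value):
    """Write path: produce the intermediate format."""
    return INTERMEDIATE + json.dumps(value).encode("utf-8")


def deserialize(data):
    """Read either format; callers never need to know which one is stored."""
    tag, body = data[:1], data[1:]
    if tag == INTERMEDIATE:
        return json.loads(body.decode("utf-8"))
    if tag == OPTIMIZED:
        return pickle.loads(body)
    raise ValueError(f"unknown format tag: {tag!r}")


def optimize(data):
    """Background task: rewrite intermediate data into the optimized format."""
    if data[:1] == OPTIMIZED:
        return data
    return OPTIMIZED + pickle.dumps(deserialize(data))


stored = serialize_fast({"a": [1, 2]})  # quick commit
later = optimize(stored)                # converted some time afterwards
print(deserialize(stored) == deserialize(later))
```

Because the reader dispatches on the tag, the background conversion can happen at any time without coordinating with clients.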

Comparing Speed of StOMP and Fuel

Based on the data StOMP provides, I have published a spreadsheet and graphs comparing the speed of Fuel and StOMP.

Monday, July 11, 2011

New Smalltalk Object Serializers: Fuel and StOMP

There are two new Smalltalk serialization frameworks that get attention these days:
  • Fuel
    "Fuel is an open-source general purpose framework to serialize and deserialize objects based on the pickling algorithm of Parcels, a popular tool for loading packages in VisualWorks. Fuel is implemented in Pharo Smalltalk Environment and we demonstrate that we can build a really fast serializer without specific VM support, with a clean object-oriented design and providing most possible required features for a serializer"
  • StOMP
    "[StOMP is a] multi-dialect object serializer built on MessagePack for Smalltalk. The aim is to provide portable, fast, compact serializer for major Smalltalk dialects. StOMP is optimized for small/medium sized data. It is especially suitable for KVS or RPC."
One could wish that forces were joined to create a single project, but right now it seems like the projects learn from each other: Just read what the StOMP team has to say about Fuel:
Fuel's materialization(deserialization) speed is superb. Fuel uses an optimized format for speeding up deserialization. On the other hand, StOMP materialization speed is high because it uses simple one-pass recursion.
Mentioning the other major "competitor" like this is really nice of the team!

The conclusion is that Fuel is faster for deserialization, but StOMP is faster for serialization. I will post more about this later.

To me it appears like StOMP has better support for serialization/deserialization "hook" methods, class renaming and shape changing. Hopefully, Fuel can learn from StOMP in this field.

Thanks to both teams for their effort!

Wednesday, July 6, 2011

Comparing Smalltalk Web Frameworks

Author of Aida/Web, Janko Mivšek, created a comparison between three Smalltalk web frameworks:


Janko's post started a heated discussion: Read many of the mails here.

As I understand Janko, Seaside was created in an age where pure HTML ruled the web. These days JavaScript is driving most web pages, so support for JavaScript is the most important aspect of a web framework. Having said that, not everyone seems to agree that Seaside is behind. And I think it is fair to mention that Seaside has strong support for JavaScript frameworks.

I have only tried Seaside, choosing that framework since it is the “default” one to use for Smalltalk web development. At least the comparison made me aware of Iliad, and had me rediscover Aida/Web. I will try Aida/Web again.

Everyone agrees on the importance of JavaScript. It seems like Aida/Web and Seaside take different routes to support JavaScript, and I really want to understand these differences.

Saturday, June 4, 2011

Windows 8, Silverlight and HTML5

Earlier, Scott Barnes reported that Microsoft debates Silverlight's future:
Right now there's a faction war inside Microsoft over HTML5 vs Silverlight. oh and WPF is dead.. i mean..it kind of was..but now.. funeral.
An interesting fact is that Jensen Harris is now the director of "Windows User Experience". He came from the Office team and headed the transformation to the Office 2007 UI. The Microsoft Office Team has a long tradition of ignoring the Microsoft "native" UI, and inventing their own technology.

So with the announcement of Windows 8 using HTML5, did the HTML5 camp win? At least Microsoft's Silverlight forum has a few worried developers.

Thursday, May 26, 2011

I am on Twitter too...

You can follow me on Twitter at @runarj

Friday, May 6, 2011

Why Riak?

I got my first understanding of NoSQL-databases reading the excellent Dynamo-paper. This is a well-written paper which explains a lot of the reasons for using key-value databases, and their design. It also contains technical information that is good to know for users of Riak, since Riak builds on the ideas found in Dynamo.

Looking into NoSQL databases I started with CouchDB which looks neat. Still, it seems to not match everything I want, especially in terms of scalability and fault tolerance. Like Riak, CouchDB is built using Erlang, and its API is RESTful.

I briefly looked at HBase, but found the technology too complicated. I have limited knowledge about Linux, and want to be able to set up the datastore without too much hassle.

I moved over to testing Cassandra, which, like HBase, seems to be able to scale well. Cassandra is built in Java and uses Thrift as its interface. I managed to install Cassandra and compile the Thrift classes in Smalltalk, but lost interest when I discovered Riak.

Riak is similar to Cassandra but everything seems a bit simpler; the installation, the data model and the API:
  • The installation is dead simple. Installing Riak took me only a few minutes.
  • The data model is easy to understand. You could say that Cassandra has a richer model, but my overall impression is that Riak is able to support most use-cases. 
  • Basho's choice of using a RESTful API for Riak makes sense to me. I could relatively easily create a Smalltalk interface to do some basic operations.

Soon in Riak: Secondary Indices

Basho is presenting Secondary Indices for Riak later this summer at OSCON. This discussion thread contains more details, and outlines the main functionality:
You tell the system how to index an object by "tagging" it with field/value pairs. The tags are passed to Riak via object metadata, currently sent via HTTP headers. (...) We're initially targeting a SQL-ish type language for querying, with support for exact match and range queries.
This looks promising!
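The tagging and the two query types from the quote can be illustrated with a toy in-memory index. This Python sketch says nothing about how Riak implements or exposes the feature; the class and method names are hypothetical.

```python
from collections import defaultdict


class TinyIndex:
    """Toy in-memory index: field -> value -> set of object keys."""

    def __init__(self):
        self._fields = defaultdict(lambda: defaultdict(set))

    def tag(self, key, **field_values):
        """Index the object stored at key under each field/value pair."""
        for field, value in field_values.items():
            self._fields[field][value].add(key)

    def exact(self, field, value):
        """Exact-match query."""
        return sorted(self._fields[field].get(value, ()))

    def between(self, field, low, high):
        """Range query, inclusive at both ends."""
        matches = set()
        for value, keys in self._fields[field].items():
            if low <= value <= high:
                matches |= keys
        return sorted(matches)


index = TinyIndex()
index.tag("k1", city="Oslo", age=30)
index.tag("k2", city="Oslo", age=41)
index.tag("k3", city="Bergen", age=35)
print(index.exact("city", "Oslo"))
print(index.between("age", 30, 36))
```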

Friday, March 18, 2011

Pharo Smalltalk Riak Interface - Why Pharo Smalltalk?

I have been asked why my Smalltalk Riak Interface was developed for Pharo Smalltalk, and not another Smalltalk dialect. I guess the answer is that I wanted to get into Pharo and test its tools. If I am not mistaken, the client will run without modification on Squeak too.

To port the client to other Smalltalk dialects, only one class currently needs to be changed. This class does the interfacing to the Pharo HTTP library.

Zinc HTTP Components and a JSON implementation for Squeak/Pharo are the only external dependencies for the client. These components also need to be replaced if the client is ported.

Having said that, it is best to first enhance the client with needed features, then look at porting it.

When moving forward with the client, I will keep it easy to port the client to other dialects.

Wednesday, March 16, 2011

Use of VisualWorks’ Polycephaly at GeoKnowledge

At GeoKnowledge we use Cincom VisualWorks Smalltalk to develop GeoX, a decision support solution for play, prospect and field assessment in the upstream petroleum industry.

We have around 150 unit tests that check the static quality of our code. Examples of tests include text spelling, verification of correctly defined class hierarchies, correct use of pragmas, etc. We also run a subset of the “Code Critics” rules included in VisualWorks. If there is a problem in code, we will see if it can be statically checked and then write a code quality test.

We have made our own tool to execute the tests, write the result to a window, load new code and re-execute:

GeoX “Test Runner”

The process of running the 150 tests takes about an hour in a single image. We wanted to run the tests faster to detect errors quicker. To do this we used Polycephaly.

We made a small extension to Polycephaly to let it process a set of operations (a job) using a pool of virtual machines. (The two methods we added are found below.) The extension allows starting a hard-coded number of virtual machines, and lets those execute a set of tasks. When a virtual machine finishes a task, it is given the next task in the job. All machines are kept busy during the execution of the job.

By using a fixed-size pool of Polycephaly worker images, we limit the number of virtual machines started. This is important; if we started 150 images to run our unit tests in parallel, we would use a lot of resources (memory in particular) without being able to execute the job faster. This approach is similar to how Erlang does its thread handling.
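The same scheduling pattern, a fixed number of workers that each pick up the next task as soon as they finish one, looks like this in Python. Threads stand in for the Polycephaly worker images here, which is an assumption made purely to keep the sketch self-contained; the 150 "tests" are made-up functions answering a Boolean.

```python
from concurrent.futures import ThreadPoolExecutor


def run_job(tasks, pool_size=4):
    """Run all tasks on a fixed-size pool and answer the results in task order."""
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        futures = [pool.submit(task) for task in tasks]
        return [future.result() for future in futures]


# 150 made-up tests; each answers True (success) or False (failure).
tasks = [lambda n=n: n % 7 != 0 for n in range(1, 151)]
results = run_job(tasks, pool_size=4)
print(f"{results.count(True)} passed, {results.count(False)} failed")
```

Submitting all tasks up front and letting the pool hand them out keeps every worker busy without ever starting more than `pool_size` of them.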

Using our extensions to Polycephaly, the tool does the following:
  1. Load newest code.
  2. Save the image.
  3. Set up a pool of, for example, 4 Polycephaly virtual machines.
  4. Using the pool, send tasks to the virtual machines. Each task is the instruction to perform a single test. The answer is a Boolean indicating failure/success.
  5. When the job is finished, report the result to the user interface.
  6. Wait for new code to be published, then restart the process.
Below are the results of running the 150 tests, using 4 Polycephaly virtual machines. “Original time” refers to executing the job using a single image:

Intel Quad CPU Q8200 (4 cores, 4 threads)
Total time went down to 28% of original time.
This is near linear scaling.

Running virtualized under Microsoft Hyper-V on an Intel Core i5 750 (4 cores, 4 threads)
Total time went down to 32% of original time.
We do not understand why the tests do not scale as well using this setup, but we suspect virtualization hurts performance.

Intel Core i5 661 (2 cores, 4 threads)
Total time went down to 43% of original time.
This shows how raw execution of Smalltalk code benefits more from “true” cores than from Intel's Hyper-Threading.
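The percentages above translate into speedup figures with simple arithmetic: speedup is the inverse of the remaining time fraction, and dividing by the number of workers shows how close the result is to linear scaling. A small Python sketch, using only the numbers reported above:

```python
def speedup_and_efficiency(fraction_of_original_time, workers):
    """Answer (speedup, share of ideal linear speedup) for a measured time fraction."""
    speedup = 1.0 / fraction_of_original_time
    return speedup, speedup / workers


measurements = [("Q8200", 0.28), ("i5 750 under Hyper-V", 0.32), ("i5 661", 0.43)]
for label, fraction in measurements:
    speedup, efficiency = speedup_and_efficiency(fraction, workers=4)
    print(f"{label}: {speedup:.2f}x speedup, {efficiency:.0%} of linear")
```

For the Q8200, 28% of the original time corresponds to a speedup of roughly 3.6 on 4 workers, which is why it counts as near linear.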

Instance-side code extensions to Polycephaly.VirtualMachines
doActions: actions
"Do all actions using the receiver’s machine pool.
To reduce the total execution time, smaller tasks should be at the end of the action collection.
Answer an array with the result of each action."

^self doActionsAndArguments: (actions collect: [:each | each -> Array new])

doActionsAndArguments: actionsAndArguments
"Do all actions using the receiver’s machine pool. Argument actionsAndArguments is a dictionary where each association holds the action block and its argument collection.
To reduce the total execution time, smaller tasks should be at the end of the action collection.
Answer an array with the result of each action."

| machinesReady answerSemaphore dronesSemaphore answer |

machinesReady := Array new: self machines size withAll: true.
answer := Array new: actionsAndArguments size.
dronesSemaphore := Semaphore new.
answerSemaphore := Semaphore new.
dronesSemaphore initSignalsTo: self machines size.
actionsAndArguments doWithIndex: [:each :index | | indexOfReadyMachine |
    dronesSemaphore wait.
    indexOfReadyMachine := machinesReady indexOf: true.
    machinesReady at: indexOfReadyMachine put: false.
    "Run the action on the reserved machine in a separate process."
    [answer
        at: index
        put: ((self machines at: indexOfReadyMachine) do: each key withArguments: each value).
    machinesReady at: indexOfReadyMachine put: true.
    dronesSemaphore signal.
    answerSemaphore signal] fork].
"Wait until every action has stored its result."
actionsAndArguments size timesRepeat: [answerSemaphore wait].
^answer

Monday, March 14, 2011

Riak Interface for Pharo Smalltalk

I have published the first beta version of a Pharo Smalltalk interface to Basho’s key-value database “Riak”. The interface uses the REST API of Riak.

The current version (0.2) supports:
  • Storing an object (JSON / text / blobs) at a key (PUT operation)
  • Getting object at a key (GET operation)
  • Deleting a key
  • List all buckets
  • List all keys in a bucket
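For readers unfamiliar with the REST API, the operations above map to plain HTTP requests. The Python sketch below only builds the request descriptions and sends nothing. The URL layout (`/riak/<bucket>/<key>`, `?buckets=true`, `?keys=true`) and port 8098 are written from my recollection of Riak's HTTP interface at the time and should be treated as assumptions; check them against the Riak documentation for your version.

```python
from urllib.parse import quote

BASE_URL = "http://riaktest:8098/riak"  # assumed host, port and path prefix


def object_url(bucket, key):
    """URL of a single object; bucket and key are percent-encoded."""
    return f"{BASE_URL}/{quote(bucket, safe='')}/{quote(key, safe='')}"


def requests_for(bucket, key):
    """Describe the HTTP request behind each supported operation (nothing is sent)."""
    return {
        "store": ("PUT", object_url(bucket, key)),
        "get": ("GET", object_url(bucket, key)),
        "delete": ("DELETE", object_url(bucket, key)),
        "list buckets": ("GET", f"{BASE_URL}?buckets=true"),
        "list keys": ("GET", f"{BASE_URL}/{quote(bucket, safe='')}?keys=true"),
    }


for operation, (method, url) in requests_for("people", "runar").items():
    print(f"{operation}: {method} {url}")
```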

Here are the planned features that will be added:
  • Avoid the need to specify data type on put operation
  • Error handling
  • Class that represents a key
  • Handling of “sibling” objects
  • Get meta-information about database
  • Map-Reduce support
  • Streaming support
In a Pharo 1.1.1 image, use the following script to load the Riak interface:

MCHttpRepository
  location: 'http://www.squeaksource.com/EpigentRiakInterface'
  user: ''
  password: ''.

Gofer new
    squeaksource: 'EpigentRiakInterface';
    package: 'ConfigurationOfEpigentRiakInterface';
    load.

((Smalltalk at: #ConfigurationOfEpigentRiakInterface) project version: '0.2') load.

To run tests, set up a Riak database at a host named riaktest and execute tests in class EpigentRiakRestConnectionTest.

Tuesday, February 22, 2011

Storing Part of Riak Object Value in Memory

One lesson I learned from SQL is that using surrogate (primary) keys has a lot of advantages. I know that not all SQL wisdom holds true for Riak, but with Riak, too, I prefer to avoid natural keys.

For “Key Filters”, Basho lists the following natural keys as an example:


Here, the keys contain two pieces of domain data to enable Key Filters: Company name and date. Key Filters’ main advantage is that they work on the keys only, which – at least for Bitcask – are stored in memory. So querying can be done faster than if the values were to be loaded from disk.

The disadvantage of Riak’s Key Filter approach is that you end up with highly domain-specific keys, which can be hard to reference, especially if you need to change keys to allow querying new aspects of the data: if you change your existing keys, all references to these keys need to be updated too. This is hard to do atomically in a key-value store like Riak. Even worse, if the data changes you need to update the key and – again – any pointers to the key.

Riak’s natural keys also demand the use of transform functions, which get more complex as the amount of data stored in the key increases. In the example of filtering data for “3rd of June” (look at the bottom of this page), the predicate function “ends_with” is used. If the key is extended with more data, that query will fail.
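
A small Pharo workspace sketch illustrates the problem. The keys are made up for illustration (they are not Basho's examples), and String>>endsWith: only stands in for Riak's “ends_with” predicate:

```smalltalk
"Hypothetical keys: company name and date, then the same key extended with a status."
| originalKey extendedKey |
originalKey := 'acme-20100603'.
extendedKey := 'acme-20100603-paid'.

"The suffix test finds June 3rd only while the date is the last part of the key."
Transcript
    show: (originalKey endsWith: '0603') printString; cr;
    show: (extendedKey endsWith: '0603') printString; cr.

"After the extension, the query has to split the key and test the date part instead."
Transcript
    show: (((extendedKey subStrings: '-') at: 2) endsWith: '0603') printString; cr.
```

This prints true, false and true: the original query silently stops matching once the key grows, and every query has to be rewritten to pick the date out of the key first.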

Using natural keys the way Riak currently does is a cumbersome way to store part of the object’s value in memory, forced into a single string.

You could of course ease the situation by using Key Filter-friendly natural keys only for objects that act as indexes. But wouldn't it be good to have the advantage of Key Filters while still being able to use surrogate keys?

What if… not only the key, but also part of the object’s value could be stored in memory? Then you could write queries that use only the in-memory part of the object and get good performance. For the REST API, maybe an X-Riak-Memory header could be supported. Its content could be JSON, and the Key Filter could work on this memory data. Enabling such functionality would let the application developer tune memory/disk storage and keep keys stable as the application evolves.

I fully understand that such a change would be complex. Riak uses multiple backends, and maybe this idea does not fit all of them. Still, I think having part of the object in memory has advantages that cannot be ignored: Key Filters could be replaced by simply using the memory part of the value. And maybe secondary indices would become less important? Using memory could potentially enable Riak to do range scans on the data too.

Saturday, January 29, 2011

Pharo Updates User Interface

Pharo gets an updated user interface in version 1.2. I like the removal of elements that only looked like poor copies of OS X controls. Gone are the round buttons and the "traffic light" window buttons. Command buttons no longer have rounded edges:

Pharo 1.2 RC
I view Pharo as primarily a web development environment, so the look and feel is not that important. But a face-lift is welcome nonetheless.

Wednesday, January 26, 2011

Drop Your Mouse Buttons!

Personally I think the right mouse button only confuses "normal" users. Same goes for the concept of double-clicking.

Monday, January 10, 2011

Smalltalk API for Riak

Göran Krampe is working on an API from Squeak/Pharo for Riak. I am looking forward to the result!

Monday, December 20, 2010

VisualWorks 7.7.1 Memory Policy and Shrinking Memory Usage

VisualWorks 7.7.1 has LargeGrainMemoryPolicy as its default memory policy. If you allocate a large amount of memory, let’s say 300 MB, and then drop the referenced objects, the memory allocated from the operating system does not drop.

Even when the image is left unused for a long time, the allocation stays. Running additional allocations that consume less memory than the first allocation does not free any memory allocated from the operating system. However, explicitly running a garbage collect frees up the used memory. Earlier versions of VisualWorks did not have this behavior; they would free memory when additional allocations were made.

I used the following script on Windows 7 64 bit to test this behavior:

| arrays |
arrays := List new.
1 to: 300 do: [:each |
       arrays addLast: (ByteArray new: 1024 * 1024 withAll: 1)].
arrays := List new.
1 to: 100 do: [:each |
       arrays addLast: (ByteArray new: 1024 * 1024 withAll: 1)].

After the script finished, all initial 300 MB of data are still allocated from the OS, when using VisualWorks 7.7.1. This might not represent a problem; the memory is not “lost”. It is still there, available for new allocations made by the image. But if you have multiple images on one server (Citrix hosting a desktop application, or a server application with multiple images), this memory usage can cause problems.

To fix this problem, I created my own memory policy, added an instance variable #lastGarbageCollectTimestamp with a getter and setter, and overrode #idleLoopAction to perform a garbage collection every 60 seconds:

                super idleLoopAction.
                (self lastGarbageCollectTimestamp isNil or: [
                               (self lastGarbageCollectTimestamp differenceFromTimestamp: Timestamp now)
                                               > (Duration fromSeconds: 60)]) ifTrue: [
                               ObjectMemory globalCompactingGC.
                               self lastGarbageCollectTimestamp: Timestamp now]

Note that a global, compacting garbage collect can be CPU intensive. However, we prefer this over using too much memory.

Friday, December 10, 2010

Riak Considers Secondary Indices

Just like Cassandra, Riak is considering secondary indices. Read the comments at the bottom of this post.

Tuesday, December 7, 2010

VisualWorks 7.7.1 Store Using Too Much Memory and Too Many Open Cursors During Load (and Fix)

We have experienced two types of load problems when loading bundles with many changes in Store using VisualWorks 7.7.1:
  • 'ORA-01000: maximum open cursors exceeded'
  • Unhandled exception: a primitive has failed
    The primitive failure is caused by too little memory available. Here is the stack trace:
optimized [] in [] in Glorp.DatabaseAccessor>>executeCommand:returnCursor:
Error class(GenericException class)>>raiseErrorString:
optimized [] in OracleBuffer>>mallocForRowBuffer
optimized [] in OracleSession>>acquireBuffers

If you get either of these problems, follow the advice Cincom’s Alan Knight gave to the VMNC mailing list:
In StoreLoginFactory>>currentStoreSession, modify it to send "reusePreparedStatements: false". This will remove one of the caching optimizations it uses for these resources. It may make things a bit slower. It also may not fix the problem, I haven't tried it, but I suspect that's the critical resource.
I have now tested the suggested fix, and it works. I would recommend that Cincom include the modification in the next version of VisualWorks.

Monday, December 6, 2010

VisualWorks Memory Policy for 3 GB Memory Usage

As I wrote earlier, VisualWorks 7.7.1 can use close to 3 GB of memory when running on Windows 7 64 bit. My initial attempt to override #defaultMemoryUpperBound is, however, broken. It turns out that other parameters are derived from this number, so they change when it is changed. Some of these changes cause problems when allocating memory. They also cause the memory policy unit tests included with VisualWorks to fail.

A better solution is to continue subclassing LargeGrainMemoryPolicy, but add #initialize to do the following:

                super initialize.
                self memoryUpperBound: 1024 * 1024 * (1024 * 3 - 128)

This should work OK. It will use the default values of LargeGrainMemoryPolicy, and allow for growth beyond 512 MB. I have not used this new memory policy long enough to actually confirm that it does not give any problems. Also, it might need modifications to deal with what happens when too much memory is consumed.
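
As a quick check of the number used above: Smalltalk evaluates binary operators strictly left to right, so the parenthesized part is 1024 * 3 - 128 = 2944, and the bound comes out at 2944 MB. The workspace snippet below only prints the values; the variable names are mine.

```smalltalk
"Recompute the upper bound in bytes and in megabytes."
| megabyte upperBound |
megabyte := 1024 * 1024.
upperBound := megabyte * (1024 * 3 - 128).
Transcript
    show: upperBound printString; cr;                  "3087007744 bytes"
    show: (upperBound // megabyte) printString; cr.    "2944 MB, i.e. 3 GB minus 128 MB"
```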

You do not need to subclass LargeGrainMemoryPolicy. You could also simply modify memoryUpperBound through the setter on LargeGrainMemoryPolicy. I chose subclassing because I think it makes it clear that a system is actually creating its own policy. It also makes it easier to add other modifications. More about this later...


Welcome to my new blog! This blog will focus on the programming language Smalltalk, and surrounding technologies. Whatever that means…

I will stop posting on my old blog, which covered the same topics as this one.