Last week I was assigned to build a small Java library for some frequently used portfolio calculations. A portfolio is a collection of holdings; each holding has a weight and several associated attributes. All values are of type Double. The calculations involved are just sums and averages, which are easy with Java 8 streams. So I said there was nothing worth re-using, and besides, a very generic routine is hard to understand and use. To make the point, I hand-crafted a "classify" method that accepts a Function<H,C> to find a holding's classification, uses a Map<C,Double> as its internal state, and takes one method to update that state and another to export the result. Internally that method calls a GenericAccumulator that simply iterates over the input. Long/Short handling was added as part of the iteration loop, while sum/averaging was part of the "classify" method body. This successfully confused everyone, including myself.
The next step was to manually create sample data for a sample portfolio and write the calculations the "old" way, with plain Java 8 streams. Nobody knew what functions the library was supposed to include; people just said "developers are not re-using common functions" and "the calculations are the same for all data points", so I had to try all the combinations. I ended up with some cases that compute an average, some that compute a sum within classifications, and some that compute a sum under a threshold or a filter. With the experience gained from these hand-made cases, I was able to replace the "classify" method with 3 pairs of methods and drop the GenericAccumulator. The methods are: 1. sum or weight of sum with classification; 2. sum or weight of sum with classification, where each holding can be either a Long or a Short position; 3. sum or weight of sum with classification, but only taking care of the sum of weights.
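To make the three kinds of hand-made cases concrete, here is a minimal sketch of what they might look like with plain streams. The Holding class, its "sector" and "yield" fields and the sample numbers are all hypothetical; they are not the actual test data.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PlainStreamCases {
    // Hypothetical holding: a classification, a weight and one Double attribute.
    static final class Holding {
        final String sector;
        final double weight;
        final double yield;

        Holding(String sector, double weight, double yield) {
            this.sector = sector;
            this.weight = weight;
            this.yield = yield;
        }
    }

    // Made-up sample portfolio, not real data.
    static List<Holding> sample() {
        return Arrays.asList(
            new Holding("Tech", 0.5, 2.0),
            new Holding("Tech", 0.25, 4.0),
            new Holding("Energy", 0.25, 8.0));
    }

    // Case 1: plain average of an attribute over all holdings.
    static double averageYield(List<Holding> portfolio) {
        return portfolio.stream().mapToDouble(h -> h.yield).average().orElse(0.0);
    }

    // Case 2: sum of weights within each classification,
    // written with toMap and a merge function.
    static Map<String, Double> weightBySector(List<Holding> portfolio) {
        return portfolio.stream()
            .collect(Collectors.toMap(h -> h.sector, h -> h.weight, Double::sum));
    }

    // Case 3: sum of weights of the holdings whose attribute is below a threshold.
    static double weightBelow(List<Holding> portfolio, double threshold) {
        return portfolio.stream()
            .filter(h -> h.yield < threshold)
            .mapToDouble(h -> h.weight)
            .sum();
    }

    public static void main(String[] args) {
        List<Holding> p = sample();
        System.out.println(averageYield(p));
        System.out.println(weightBySector(p));
        System.out.println(weightBelow(p, 5.0));
    }
}
```

Each case is only a few lines, which is why writing them out first was a cheap way to discover what the library actually had to cover.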
And it turns out that Java's Collectors.* can replace every line of code I wrote. To implement my "classify" method family, I first created two versions of "classify": one handles the simple case (without Long/Short), and the other handles Long/Short positions and then calls the first one to further accumulate the result. Since both are implemented with an internal state object plus methods to accumulate into and export from that object, they are easy to convert to a Collector. At that point I did not know that "groupingBy" can take a downstream Collector. That is exactly the right tool for the "classify" method family! Now every hand-crafted Collector has become a one-line call to "groupingBy", "collectingAndThen" or "mapping", or a composition of them.
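As an illustration of "groupingBy" with a downstream Collector, here is a small sketch. The Holding class and the sample numbers are hypothetical, and two things are my assumptions for the example only: a "contribution" is weight times attribute value, and a negative weight marks a Short position.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupingSketch {
    // Hypothetical holding; negative weight is assumed to mean a Short position.
    static final class Holding {
        private final String sector;
        private final double weight;
        private final double value;

        Holding(String sector, double weight, double value) {
            this.sector = sector;
            this.weight = weight;
            this.value = value;
        }

        String sector() { return sector; }
        double weight() { return weight; }
        // Assumption: contribution = weight * attribute value.
        double contribution() { return weight * value; }
        boolean isLong() { return weight >= 0; }
    }

    // Made-up sample portfolio.
    static List<Holding> sample() {
        return Arrays.asList(
            new Holding("Tech", 0.5, 2.0),
            new Holding("Tech", -0.25, 2.0),
            new Holding("Energy", 0.75, 2.0));
    }

    // groupingBy + summingDouble: sum of weights per classification.
    static Map<String, Double> weightBySector(List<Holding> p) {
        return p.stream().collect(
            Collectors.groupingBy(Holding::sector,
                Collectors.summingDouble(Holding::weight)));
    }

    // Long/Short as one more downstream level: true = Long, false = Short.
    static Map<String, Map<Boolean, Double>> weightBySectorAndSide(List<Holding> p) {
        return p.stream().collect(
            Collectors.groupingBy(Holding::sector,
                Collectors.partitioningBy(Holding::isLong,
                    Collectors.summingDouble(Holding::weight))));
    }

    // collectingAndThen: post-process each group's sum into a share of the total.
    static Map<String, Double> contributionShareBySector(List<Holding> p) {
        final double total = p.stream().mapToDouble(Holding::contribution).sum();
        return p.stream().collect(
            Collectors.groupingBy(Holding::sector,
                Collectors.collectingAndThen(
                    Collectors.summingDouble(Holding::contribution),
                    (Double sum) -> sum / total)));
    }

    public static void main(String[] args) {
        List<Holding> p = sample();
        System.out.println(weightBySector(p));
        System.out.println(weightBySectorAndSide(p));
        System.out.println(contributionShareBySector(p));
    }
}
```

The point of the sketch is that classification, Long/Short handling and the final arithmetic each live in their own Collector, so they no longer get tangled in one loop body.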
Working on a project like this is very good practice. Collectors.* forces one to think in terms of where one is in the .collect process. The details of function composition are hidden, so one only needs to focus on data type conversions. I used to think functional programming in Java meant using java.util.function.* types as method arguments, which makes a method so specialized that it is hard to use, hard to explain and hard to refactor (type erasure makes it worse). But it seems a walled garden can grow better flowers.
It is easier said than done. Since I relied on type inference, reciprocal or similar method signatures and IDE syntax checks to write everything, I often cannot explain why the code has to be the way it is. Most of the time I did redundant work: whenever I wrote a new method without chaining a Collector, it turned out to be wrong. Looking back, I still wonder whether some code is redundant, whether some helper methods are best left to the library user, or whether I should add an overload that takes a downstream Collector. One thing is certain: if a method is going to be shared between functional and non-functional code, give it the simplest argument list (no functions). That way it is easiest to compose a new function out of it without too many headaches. The downside is that all arguments are evaluated before they are needed. A related downside of the Collector interface is that null values are not handled well: a null is not noticed until it is actually used, so a simple "groupingBy", which does not accept null keys, forces the stream to filter before collecting.
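The null-key problem can be reproduced in a few lines. This is a sketch with a hypothetical Holding class whose "sector" may be null; Collectors.groupingBy throws a NullPointerException when the classifier returns null, so the caller either filters first or maps null to a placeholder key (the "Unclassified" label is my own invention).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class NullKeySketch {
    // Hypothetical holding whose classification may be missing (null).
    static final class Holding {
        private final String sector;
        private final double weight;

        Holding(String sector, double weight) {
            this.sector = sector;
            this.weight = weight;
        }

        String sector() { return sector; }
        double weight() { return weight; }
    }

    // Made-up sample with one unclassified holding.
    static List<Holding> sample() {
        return Arrays.asList(
            new Holding("Tech", 0.5),
            new Holding(null, 0.25),
            new Holding("Tech", 0.25));
    }

    // Returns true if grouping directly on a possibly-null key fails.
    static boolean failsOnNullKey(List<Holding> p) {
        try {
            p.stream().collect(
                Collectors.groupingBy(Holding::sector,
                    Collectors.summingDouble(Holding::weight)));
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Option 1: filter out holdings without a classification before collecting.
    static Map<String, Double> weightBySectorFiltered(List<Holding> p) {
        return p.stream()
            .filter(h -> h.sector() != null)
            .collect(Collectors.groupingBy(Holding::sector,
                Collectors.summingDouble(Holding::weight)));
    }

    // Hypothetical helper: replace a missing classification with a placeholder.
    static String sectorOrPlaceholder(Holding h) {
        return h.sector() == null ? "Unclassified" : h.sector();
    }

    // Option 2: keep the holdings but group them under the placeholder key.
    static Map<String, Double> weightBySectorWithPlaceholder(List<Holding> p) {
        return p.stream().collect(
            Collectors.groupingBy(NullKeySketch::sectorOrPlaceholder,
                Collectors.summingDouble(Holding::weight)));
    }

    public static void main(String[] args) {
        List<Holding> p = sample();
        System.out.println(failsOnNullKey(p));
        System.out.println(weightBySectorFiltered(p));
        System.out.println(weightBySectorWithPlaceholder(p));
    }
}
```

Which option is right depends on whether unclassified holdings should count toward the totals, and that is a decision for the caller rather than the library.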
Later I created another method family that does averaging. This family has 2 extra variations: one averages contributions with classification, and the other further rescales the result. There is also a third method family that does sums but attributes values to Long/Short positions. Its extra variations are sum or weight of sum of the contributions, plus a simplified version that does not involve classifications (but still does all the rest). Even though I knew Collectors.* could help me implement them all, I did not know what the variations were until I had manually created some sample test cases. And after they were committed, merged and renamed multiple times, I still find it difficult to explain which family of methods suits which situation. Each method can compute a sum of contributions, or one could call that a weighted average, couldn't one?
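One way to see why the two names blur together is a sketch like the following. It assumes that a contribution is weight times attribute value and that "rescaling" means dividing by the total weight of the group; both are my reading, not a definition from the library. The weightedAverage collector and the Holding class are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class WeightedAverageSketch {
    // Hypothetical holding with a weight and one Double attribute.
    static final class Holding {
        private final String sector;
        private final double weight;
        private final double value;

        Holding(String sector, double weight, double value) {
            this.sector = sector;
            this.weight = weight;
            this.value = value;
        }

        String sector() { return sector; }
        double weight() { return weight; }
        double value() { return value; }
    }

    // Made-up sample whose weights add up to 1.
    static List<Holding> sample() {
        return Arrays.asList(
            new Holding("Tech", 0.5, 2.0),
            new Holding("Tech", 0.25, 4.0),
            new Holding("Energy", 0.25, 8.0));
    }

    // Hypothetical collector: state[0] = sum of weight * value, state[1] = sum of weights.
    // The finisher rescales by the total weight (0.0 if the total weight is zero).
    static <T> Collector<T, double[], Double> weightedAverage(
            ToDoubleFunction<? super T> weight, ToDoubleFunction<? super T> value) {
        return Collector.of(
            () -> new double[2],
            (state, t) -> {
                double w = weight.applyAsDouble(t);
                state[0] += w * value.applyAsDouble(t);
                state[1] += w;
            },
            (a, b) -> { a[0] += b[0]; a[1] += b[1]; return a; },
            state -> state[1] == 0.0 ? 0.0 : state[0] / state[1]);
    }

    // Plain sum of contributions over the whole portfolio.
    static double sumOfContributions(List<Holding> p) {
        return p.stream().mapToDouble(h -> h.weight() * h.value()).sum();
    }

    // Weighted average over the whole portfolio.
    static double portfolioAverage(List<Holding> p) {
        return p.stream().collect(weightedAverage(Holding::weight, Holding::value));
    }

    // Weighted average within each classification, via a downstream collector.
    static Map<String, Double> averageBySector(List<Holding> p) {
        return p.stream().collect(
            Collectors.groupingBy(Holding::sector,
                weightedAverage(Holding::weight, Holding::value)));
    }

    public static void main(String[] args) {
        List<Holding> p = sample();
        System.out.println(sumOfContributions(p));
        System.out.println(portfolioAverage(p));
        System.out.println(averageBySector(p));
    }
}
```

Under these assumptions the two quantities coincide whenever the weights in the group add up to 1, as they do for the whole sample portfolio, and differ only by the rescaling factor otherwise, as they do within a single sector.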