Fluent interface in Delphi

What are the pros and cons in using fluent interfaces in Delphi?
Fluent interfaces are supposed to increase readability, but I'm a bit skeptical about having one long LOC that contains a lot of chained methods.
Are there any compiler issues?
Are there any debugging issues?
Are there any runtime/error handling issues?
Fluent interfaces are used in e.g. TStringBuilder, THTMLWriter and TGpFluentXMLBuilder.
David Heffernan asked which issues I was concerned about. I've given this some thought, and the overall issue is the difference between "explicitly specifying how it's done" and "letting the compiler decide how it's done".
AFAICS, there is no documentation on how chained methods are actually handled by the compiler, nor any specification of how the compiler should treat them.
In this article we can read about the hidden parameters the compiler adds to methods (Self for every method, plus a var parameter for a function result of a managed type such as an interface), and that the default register calling convention passes the first three parameters in registers and the rest on the stack. A "fluent function method" with 2 parameters will therefore use the stack, whereas an "ordinary procedure method" with 2 parameters only uses registers.
We also know that the compiler does some magic to optimize the binary (e.g. string as function result, evaluation order, ref to local proc), but sometimes with surprising side effects for the programmer.
So the fact that the memory/stack/register management is more complex, and that the compiler could do some magic with unintended side effects, seems pretty smelly to me. Hence the question.
After I've read the answers (very good ones), my concern is strongly reduced but my preference is still the same :)

Everybody is writing only about negative issues, so let's stress some positive ones. Errr, the only positive issue: less (in some cases much less) typing.
I wrote GpFluentXMLBuilder just because I hate typing tons of code while creating XML documents. Nothing more and nothing less.
The good point with fluent interfaces is that you don't have to use them in the fluent manner if you hate the idiom. They are completely usable in a traditional way.
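TStringBuilder, one of the examples in the question, shows this nicely: its Append overloads return the builder itself, so the same calls can be chained or written one per statement. A minimal sketch, assuming Delphi XE2 or later (for the unit-scoped `System.SysUtils` name); `BuildFluent` and `BuildTraditional` are made-up names for the demo:

```pascal
program StringBuilderStyles;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Fluent style: each Append returns the same TStringBuilder instance,
// so the calls can be chained into a single expression.
function BuildFluent(Severity: Integer): string;
var
  SB: TStringBuilder;
begin
  SB := TStringBuilder.Create;
  try
    Result := SB.Append('Severity=').Append(Severity).Append(';').ToString;
  finally
    SB.Free;
  end;
end;

// Traditional style: the same methods, one statement each;
// the returned reference is simply ignored.
function BuildTraditional(Severity: Integer): string;
var
  SB: TStringBuilder;
begin
  SB := TStringBuilder.Create;
  try
    SB.Append('Severity=');
    SB.Append(Severity);
    SB.Append(';');
    Result := SB.ToString;
  finally
    SB.Free;
  end;
end;

begin
  Writeln(BuildFluent(3));
  Writeln(BuildTraditional(3));
end.
```

The only difference between the two is whether the returned reference is used to make the next call or discarded.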
EDIT: A point for the "shortness and readability" view.
I was debugging some old code and stumbled across this:
.AddChild('Time', Now)
.AddSibling('Severity', msg.MsgID)
.AddSibling('Message', msg.MsgData.AsString)
I knew immediately what the code does. If, however, the code looked like this (and I'm not claiming that this even compiles; I just threw it together for the demo):
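var
  xmlData: IXMLNode;
  xmlDoc : IXMLDocument;
  xmlKey : IXMLNode;
  xmlRoot: IXMLNode;
begin
  xmlDoc := CreateXMLDoc;
  xmlDoc.AppendChild(xmlDoc.CreateProcessingInstruction('xml',
    'version="1.0" encoding="UTF-8"'));
  xmlRoot := xmlDoc.CreateElement('LogEntry');
  xmlDoc.AppendChild(xmlRoot);
  xmlKey := xmlDoc.CreateElement('Time');
  xmlData := xmlDoc.CreateTextNode(FormatDateTime(
    'yyyy-mm-dd"T"hh":"mm":"ss.zzz', Now));
  xmlKey.AppendChild(xmlData);
  xmlRoot.AppendChild(xmlKey);
  xmlKey := xmlDoc.CreateElement('Severity');
  xmlData := xmlDoc.CreateTextNode(IntToStr(msg.MsgID));
  xmlKey.AppendChild(xmlData);
  xmlRoot.AppendChild(xmlKey);
  xmlKey := xmlDoc.CreateElement('Message');
  xmlData := xmlDoc.CreateTextNode(msg.MsgData.AsString);
  xmlKey.AppendChild(xmlData);
  xmlRoot.AppendChild(xmlKey);
end;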
I would need quite some time (and a cup of coffee) to understand what it does.
Eric Grange made a perfectly valid point in the comments. In reality, one would use some XML wrapper and not the DOM directly. For example, using OmniXMLUtils from the OmniXML package, the code would look like this:
var
  xmlDoc: IXMLDocument;
  xmlLog: IXMLNode;
begin
  xmlDoc := CreateXMLDoc;
  xmlDoc.AppendChild(xmlDoc.CreateProcessingInstruction(
    'xml', 'version="1.0" encoding="UTF-8"'));
  xmlLog := EnsureNode(xmlDoc, 'LogEntry');
  SetNodeTextDateTime(xmlLog, 'Time', Now);
  SetNodeTextInt(xmlLog, 'Severity', msg.MsgID);
  SetNodeText(xmlLog, 'Message', msg.MsgData.AsString);
end;
Still, I prefer the fluent version. [And I never ever use code formatters.]

Compiler issues:
If you're using interfaces (rather than objects), each call in the chain results in reference-counting overhead: even if the exact same interface is returned every time, the compiler has no way of knowing it. You'll thus generate larger code, with a more complex stack.
Debugging issues:
Since the call chain is seen as a single statement, you can't step through or set breakpoints on the intermediate steps. You also can't evaluate state at the intermediate steps.
The only way to debug intermediate steps is to trace in the asm view.
The call stack in the debugger will also be unclear if the same method appears multiple times in the fluent chain.
Runtime issues:
When using interfaces for the chain (rather than objects), you have to pay for the reference counting overhead, as well as a more complex exception frame.
You can't have try..finally constructs in a chain, so there is no guarantee of closing what was opened in a fluent chain, for instance.
Debug/Error logging issues:
Exceptions and their stack traces will see the chain as a single statement, so if you crashed in .DoSomething and the call chain has several .DoSomething calls, you won't know which one caused the issue.
Code Formatting issues:
AFAICT, none of the existing code formatters will properly lay out a fluent call chain, so only manual formatting can keep a fluent call chain readable. If an automated formatter is run, it'll typically turn a chain into a readability mess.

Are there any compiler issues?
Are there any debugging issues?
Yes. Since all the chained method calls are seen as one expression, even if you write them on multiple lines as in the Wikipedia example you linked, it's a problem when debugging because you can't single-step through them.
Are there any runtime/error handling issues?
Edited: Here's a test console application I wrote to measure the actual runtime overhead of using fluent interfaces. I assigned 6 properties in each iteration (actually the same 2 values 3 times each). The conclusions are:
With interfaces: a 70% increase in runtime, depending on the number of properties set. Setting only two properties, the overhead was smaller.
With objects: using fluent interfaces was faster.
Didn't test records. It can't work well with records!
I personally don't mind those "fluent interfaces". Never heard of the name before, but I've been using them, especially in code that populates a list from code. (Somewhat like the XML example you posted). I don't think they're difficult to read, especially if one's familiar with this kind of coding AND the method names are meaningful. As for the one long line of code, look at the Wikipedia example: You don't need to put it all on one line of code.
I clearly remember using them with Turbo Pascal to initialize Screens, which is probably why I don't mind them and also use them at times. Unfortunately Google fails me, I can't find any code sample for the old TP code.

I would question the benefit of using "fluent interfaces".
From what I see, the point is to let you avoid declaring a variable: the same benefit the dreaded With statement brings, with a different set of problems (see other answers).
Honestly, I never understood the motivation to use the With statement, and I don't understand the motivation to use fluent interfaces either. I mean, is it that hard to define a variable?
All this mumbo jumbo just to allow laziness.
I would argue that rather than increasing readability, which at first glance it seems to do because there is less to type and read, it actually obfuscates the code.
So again, I ask: why would you want to use fluent interfaces in the first place?
It was coined by Martin Fowler so it must be cool? Nah, I ain't buying it.

This is a kind of write-once-read-never notation that is not easy to understand without going through the documentation for all the involved methods. Also, such notation is not compatible with Delphi and C# properties: if you need to set properties, you have to roll back to the common notation, as you can't chain property assignments.
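To chain "property-like" settings you have to add a method per property that stores the value and returns the instance, which is exactly the duplicated API surface this answer objects to. A sketch with a hypothetical `TPerson` class (names and values are made up) showing both forms side by side, assuming Delphi XE2 or later:

```pascal
program PropertyVsFluent;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  // Hypothetical class offering both styles.
  TPerson = class
  private
    FName: string;
    FAge: Integer;
  public
    // Property assignment is a statement, so it cannot be chained.
    property Name: string read FName write FName;
    property Age: Integer read FAge write FAge;
    // Fluent counterparts: each stores a value and returns Self.
    function WithName(const AName: string): TPerson;
    function WithAge(AAge: Integer): TPerson;
    function Describe: string;
  end;

function TPerson.WithName(const AName: string): TPerson;
begin
  FName := AName;
  Result := Self;
end;

function TPerson.WithAge(AAge: Integer): TPerson;
begin
  FAge := AAge;
  Result := Self;
end;

function TPerson.Describe: string;
begin
  Result := Format('%s (%d)', [FName, FAge]);
end;

function ViaProperties: string;
var
  P: TPerson;
begin
  P := TPerson.Create;
  try
    P.Name := 'Ann';  // one statement per property
    P.Age := 42;
    Result := P.Describe;
  finally
    P.Free;
  end;
end;

function ViaFluentSetters: string;
var
  P: TPerson;
begin
  P := TPerson.Create;
  try
    Result := P.WithName('Ann').WithAge(42).Describe;
  finally
    P.Free;
  end;
end;

begin
  Writeln(ViaProperties);
  Writeln(ViaFluentSetters);
end.
```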


Why is the destructor in Delphi named?

Destructors in Delphi are usually named "Destroy"; however, as far as I understand, you can also
name destructors differently
have multiple destructors
Is there any reason why this was implemented this way? What are the possible use cases for differently named / multiple destructors?
In theory you can manually call different destructors to free different external resources, e.g. breaking reference-counting loops, deleting or just closing a file, etc.
Also, since Object Pascal does not have the magical new/delete operators, there has to be some identifier to call to dispose of the object.
I'd prefer to look at that in retrospect.
"Turbo Pascal with Objects" style objects have both: you call a "magical" Dispose procedure but explicitly specify the destructor to call, since the language itself did not know which one to choose. Similarly, the "magic" procedure New had to be supplied with a manually selected constructor.
This, however, violates the DRY principle: the compiler knows that we are calling a destructor or constructor, yet we additionally have to call those "New" and "Dispose" procedures. In theory this decoupled memory allocation from initialization, letting you combine them any way you liked, but I don't think this feature was ever used widely.
Interestingly, the same design is used in Apple's Objective-C: you first allocate memory for the object, and after that you call a constructor for the new instance: http://en.wikipedia.org/wiki/Objective-C#Instantiation
When that model was streamlined for Delphi, a few decisions were made to simplify and unify things. The memory [de]allocation strategy was moved to the class level, rather than the call site. That made the redundancy of calling both "New" and a named constructor very apparent. One of them had to be dropped.
C++, C# and Java chose to retain special language-level keywords for it, using overloaded functions to provide different constructors. Perhaps that reflects the USA style of computer languages.
However, Pascal at its core has two ideas: verbosity and a small vocabulary. Arguably they can be traced in other European-school languages like Scala. Where possible, keywords should be removed from the language itself and moved to external modules, i.e. libraries that you can add to or remove from a project. Also, overloaded functions were introduced to the language much later, and the early preference was clearly for two differently named (self-documenting) functions.
Both ideas probably led Delphi to remove the "magic" procedures and to deduce object creation/destruction at the call site just from the function names used. If you call MyVar.Destroy, the compiler looks at the declaration of .Destroy and knows we are deleting the object. Similarly, it knows TMyType.CreateXXX(YYY, ZZZ) is an object instantiation because of the way CreateXXX was declared.
To make constructors and destructors nameless like in C++, Delphi would have to introduce two more language-level keywords, like C++'s new and delete. And there seems to be no clear advantage in that. Personally, at least, I prefer the Delphi way.
PS. I have to add one assumption here: we are talking about the real C++ and Delphi languages as they were around 1995. They only featured manual memory management for heap-allocated objects, with no garbage collection and no automatic reference counting. You could not trigger object destruction by assigning nil/NULL to a variable.
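One concrete use case: `TObject.Free` only ever calls the virtual `Destroy`, so a differently named destructor becomes an explicit, self-documenting choice at the call site. A sketch with a hypothetical `TTempFile` class; real file handling is replaced by a log string so the example stays self-contained (assumes Delphi XE2 or later):

```pascal
program NamedDestructors;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  Log: string; // records which clean-up steps ran

type
  // Hypothetical wrapper around an external file.
  TTempFile = class
  private
    FFileName: string;
    procedure ReleaseHandle;
  public
    constructor Create(const AFileName: string);
    destructor Destroy; override;  // the one TObject.Free calls
    destructor DestroyAndDelete;   // alternative, called explicitly by name
  end;

constructor TTempFile.Create(const AFileName: string);
begin
  inherited Create;
  FFileName := AFileName;
end;

procedure TTempFile.ReleaseHandle;
begin
  // A real class would close its file handle here.
  Log := Log + 'closed ' + FFileName + ';';
end;

destructor TTempFile.Destroy;
begin
  ReleaseHandle;
  inherited Destroy;
end;

destructor TTempFile.DestroyAndDelete;
begin
  ReleaseHandle;
  // A real class might call DeleteFile(FFileName) here.
  Log := Log + 'deleted ' + FFileName + ';';
  inherited Destroy;
end;

function Run: string;
var
  A, B: TTempFile;
begin
  Log := '';
  A := TTempFile.Create('a.tmp');
  B := TTempFile.Create('b.tmp');
  A.Free;              // Free always goes through the virtual Destroy
  B.DestroyAndDelete;  // the caller picks the other destructor by name
  Result := Log;
end;

begin
  Writeln(Run);
end.
```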

How to effectively use interfaces for memory management in Delphi

I'm fairly new to Delphi and have been doing all my memory management manually, but have heard references to Delphi being able to use interfaces to do reference counting and providing some memory management that way. I want to get started with that, but have a few questions.
Just generally, how do I use it? Create the interface and the class implementing it. Then any time I need that object, have the variable actually be of the interface type, but instantiate the object and presto? No need to think about freeing it? No more try-finallys?
It seems very cumbersome to create a bunch of interfaces for classes that really don't need them. Any tips on auto generating those? How do I best organize that? Interface and class in the same file?
What are common pitfalls that might cause me grief? E.g., does casting the interfaced object to an object of its class break my reference counting? Or are there any non-obvious ways Delphi would create reference loops (meaning besides A uses B uses C uses A)?
If there are tutorials that cover any of this, that would be great, but I didn't come up with anything in my searches. Thanks.
I am currently working on a very large project that takes advantage of the "side effect" of interface reference counting for memory management.
My own personal conclusion is that you end up with a lot of code that is overly complex for no better reason than "I don't have to worry about calling Free".
I would strongly advise against this course of action for some very basic reasons:
1) You are using a side effect that exists for the purpose of COM compatibility.
2) You are making your objects heavier and less efficient. Interfaces are pointers to lists of pointers... or something along those lines.
3) Like you stated, you now have to make piles of interfaces for the sole purpose of avoiding freeing memory yourself. In my opinion this causes more trouble than it's worth.
4) The most common bug, and a HUGE pain to debug, is an object getting freed before all its references are released. We have special code in our own reference counting to try to catch this problem before software goes out the door.
Now to answer your questions.
1) Given TFoo and interface IFoo you can have a method like the following
function GetFoo: IFoo;
begin
  Result := TFoo.Create as IFoo; // "as" needs a GUID on IFoo; "Result := TFoo.Create;" also works
end;
...and presto, you don't need the finally to free it.
2) Yes like I said, you think it's a great idea, but it turns into a huge pain in the bupkis
3) 2 problems.
A) you have Object1.Interface2 and Object2.Interface1... these objects will never be freed due to the circular reference
B) Freeing the object before all the references are released. I cannot stress enough how difficult these bugs are to track down...
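Problem A can be reproduced in a few lines. The sketch below (hypothetical `INode`/`TNode` types, with a global counter standing in for a leak detector) builds a cycle of counted references, then the same pair with the back-reference stored as an untyped `Pointer`, the classic "weak reference" workaround. Newer Delphi versions also offer a `[weak]` attribute for interface fields; the pointer idiom is used here so the sketch does not depend on that. Assumes Delphi XE2 or later (for `TProc` and unit-scoped names):

```pascal
program InterfaceCycles;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  INode = interface
    procedure SetStrongPeer(const APeer: INode);
    procedure SetWeakPeer(const APeer: INode);
  end;

  TNode = class(TInterfacedObject, INode)
  private
    FStrongPeer: INode;  // counted: keeps the peer alive
    FWeakPeer: Pointer;  // not counted: does not keep the peer alive
  public
    constructor Create;
    destructor Destroy; override;
    procedure SetStrongPeer(const APeer: INode);
    procedure SetWeakPeer(const APeer: INode);
  end;

var
  LiveNodes: Integer = 0; // TNode instances created but not yet destroyed

constructor TNode.Create;
begin
  inherited Create;
  Inc(LiveNodes);
end;

destructor TNode.Destroy;
begin
  Dec(LiveNodes);
  inherited Destroy;
end;

procedure TNode.SetStrongPeer(const APeer: INode);
begin
  FStrongPeer := APeer;
end;

procedure TNode.SetWeakPeer(const APeer: INode);
begin
  FWeakPeer := Pointer(APeer);
end;

// A and B hold counted references to each other: neither is ever freed.
procedure MakeStrongCycle;
var
  A, B: INode;
begin
  A := TNode.Create;
  B := TNode.Create;
  A.SetStrongPeer(B);
  B.SetStrongPeer(A);
end;

// Same pair, but the back-reference is not counted, so both are freed.
procedure MakeWeakBackReference;
var
  A, B: INode;
begin
  A := TNode.Create;
  B := TNode.Create;
  A.SetStrongPeer(B);
  B.SetWeakPeer(A);
end;

// Number of nodes still alive after Proc has returned.
function LeakedBy(const Proc: TProc): Integer;
var
  Before: Integer;
begin
  Before := LiveNodes;
  Proc();
  Result := LiveNodes - Before;
end;

begin
  Writeln('strong cycle leaked: ', LeakedBy(MakeStrongCycle));
  Writeln('weak back-reference leaked: ', LeakedBy(MakeWeakBackReference));
end.
```

The price of the uncounted back-reference is that it dangles if its target dies first, which is exactly the kind of bug described in B.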
The most common complaint leading to the desire for "automatic garbage collection" in Delphi is the way that even short-lived temporary objects have to be disposed of manually and that you have to write a fair amount of "boiler-plate" code to ensure that this takes place when exceptions occur.
For example, creating a TStringList for some temporary sorting or other algorithmic purpose within a procedure:
procedure SomeStringsOperation(const aStrings: TStrings);
var
  list: TStringList;
begin
  list := TStringList.Create;
  try
    // do some work with "list"
  finally
    list.Free;
  end;
end;
As you mentioned, objects that implement the COM protocol of reference counted lifetime management avoid this by cleaning themselves up when all references to them have been released.
But since TStringList isn't a COM object, you cannot enjoy the convenience this offers.
Fortunately there is a way to use COM reference counting to take care of these things without having to create all-new COM versions of the classes you wish to use. You don't even need to switch to an entirely COM-based model.
I created a very simple utility class to allow me to "wrap" ANY object inside a lightweight COM container specifically for the purpose of getting this automatic cleanup behaviour. Using this technique you can replace the above example with:
procedure SomeStringsOperation(const aStrings: TStrings);
var
  list: TStringList;
begin
  AutoFree(@list);
  list := TStringList.Create;
  // do some work with "list"
end;
The AutoFree() function call creates an "anonymous" interfaced object that is Release()'d in the exit code generated by the compiler for the procedure. This autofree object is passed a pointer to the variable that references the object you wish to be freed. Among other things, this allows us to use the AutoFree() function as a pseudo-"declaration", placing any and ALL AutoFree() calls at the top of the method, as close as possible to the variable declarations they reference, before we have even created any objects.
Full details of the implementation, including source code and further examples, are on my blog in this post.
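The real implementation is in that blog post; what follows is only an independent minimal sketch of the same idea, with hypothetical names (`TAutoFree`, `PObjectRef`, `TTrackedList`). It relies on the compiler keeping the discarded interface result of `AutoFree(@list)` in a hidden temporary that is released when the routine exits, and it nils the referenced variable up front so an early exception never frees an uninitialised pointer (assumes Delphi XE2 or later):

```pascal
program AutoFreeSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

type
  PObjectRef = ^TObject;

  // Hypothetical guard: remembers the address of an object variable and
  // frees whatever that variable references when the guard is released.
  TAutoFree = class(TInterfacedObject)
  private
    FRef: PObjectRef;
  public
    constructor Create(ARef: PObjectRef);
    destructor Destroy; override;
  end;

constructor TAutoFree.Create(ARef: PObjectRef);
begin
  inherited Create;
  FRef := ARef;
  FRef^ := nil; // an early exit must never free an uninitialised pointer
end;

destructor TAutoFree.Destroy;
begin
  FRef^.Free;
  FRef^ := nil;
  inherited Destroy;
end;

// ARef is the address of an object variable, e.g. AutoFree(@list).
function AutoFree(ARef: Pointer): IInterface;
begin
  Result := TAutoFree.Create(PObjectRef(ARef));
end;

type
  // Test helper: counts how many instances have been destroyed.
  TTrackedList = class(TStringList)
  public
    destructor Destroy; override;
  end;

var
  ListsFreed: Integer = 0;

destructor TTrackedList.Destroy;
begin
  Inc(ListsFreed);
  inherited Destroy;
end;

procedure SomeStringsOperation(const aStrings: TStrings);
var
  list: TStringList;
begin
  AutoFree(@list); // result lives in a hidden temporary until the routine exits
  list := TTrackedList.Create;
  list.AddStrings(aStrings);
  list.Sort;
  // ... more work with "list"; no try..finally needed
end;

// Calls SomeStringsOperation once and reports how many temporary lists died.
function FreedByOneCall: Integer;
var
  Source: TStringList;
  Before: Integer;
begin
  Before := ListsFreed;
  Source := TStringList.Create;
  try
    Source.Add('b');
    Source.Add('a');
    SomeStringsOperation(Source);
  finally
    Source.Free;
  end;
  Result := ListsFreed - Before;
end;

begin
  Writeln('temporary lists freed: ', FreedByOneCall);
end.
```

The guard frees whatever the variable points to at exit, so reassigning `list` halfway through the routine would leak the first object.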
The memory management of interfaces is done through implementation of _AddRef and _Release which are implemented by TInterfacedObject.
In general using interfaces to make memory management less cumbersome can be a nice idea, but you need to take care of these things:
Make sure the classes that implement interfaces are derived from TInterfacedObject or roll your own ancestor class that provides good implementations for _AddRef and _Release
Use either/or: either use interface references or use object instance references; don't mix them. That can be problematic when implementing interfaces in components (as those derive from TComponent, not TInterfacedObject)
Don't go the TInterfacedComponent way as that mixes Owner based memory management and _AddRef/_Release based memory management
Watch circular interface references (you can go around implementing "weak interface references" mentioned here and implemented here)
You need to maintain extra code, as you need to define interfaces for the parts of your classes that you want to expose, and keep those two in sync (you could use ModelMaker Code Explorer for this; it allows you to extract interfaces and in general boosts your development because it manages the interface/implementation parts of code in single actions)
You need some extra plumbing to create instances of the underlying classes. You can use the factory pattern for that.
That is not always effective, but it does answer a few of your underlying questions.
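A minimal sketch of the interface-plus-factory arrangement, with hypothetical names (`IGreeter`, `TGreeter`, `CreateGreeter`). It is collapsed into one program here; in practice the interface would sit in its own unit and the class in the implementation section of another, so callers only ever see the interface and never mix object and interface references (assumes Delphi XE2 or later):

```pascal
program FactoryDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  // In a real project: declared alone in its own unit.
  IGreeter = interface
    function Greet(const Name: string): string;
  end;

  // In a real project: hidden in the implementation section of another unit.
  TGreeter = class(TInterfacedObject, IGreeter)
  private
    FPrefix: string;
  public
    constructor Create(const APrefix: string);
    function Greet(const Name: string): string;
  end;

constructor TGreeter.Create(const APrefix: string);
begin
  inherited Create;
  FPrefix := APrefix;
end;

function TGreeter.Greet(const Name: string): string;
begin
  Result := FPrefix + ', ' + Name;
end;

// Hypothetical factory: the only public way to obtain an IGreeter. The object
// is reached solely through interface references, so reference counting
// frees it and callers never call Free.
function CreateGreeter(const Prefix: string): IGreeter;
begin
  Result := TGreeter.Create(Prefix);
end;

var
  Greeter: IGreeter;
begin
  Greeter := CreateGreeter('Hello');
  Writeln(Greeter.Greet('World')); // Hello, World
end.
```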
Shortest possible answer: The default Delphi memory model is that owners free the objects they own. All other references are weak references and must let go before the owner does. "Sharing" an object that has a lifetime shorter than the entire lifetime of the app is rarely done. Reference counting is rarely done, and when it is done, it is only done by experts, or else it adds more bugs and crashes than it solves.
Learn idiomatic Delphi style and try to imitate it; don't fight the grain. Sadly, people think that "program against interfaces, not implementations" means "use IUnknown everywhere". That's not true. I recommend you don't use COM IUnknown interfaces, and use abstract base classes instead. The only thing you can't do is implement two abstract base classes in a single class, and the need for that is rare.
Update: I've recently found it helpful to use COM Interfaces (IUnknown based) to help me separate out my model and controller implementations from my UI classes. So I do find using IUnknown based interfaces useful. But there is not a lot of documentation and prior art out there to base your efforts on. I'd like to see a "cookbook" style recipe that lays all this out for people, so they can work without the usual problem of combining interface and non-interface based lifetime management, and all the trouble that comes while you get used to that extra complexity.
Switching to interfaces only to avoid manual Frees is senseless. The small savings in Free/try-finally lines will hardly compensate for having to declare both getters/setters and properties in the interface, not to mention keeping the interface and class declarations in sync. Interfaces also bring a performance loss due to implicit finalization code and reference counting. If performance is not the main point and all you want to achieve is automatic freeing, I'd recommend using a universal interface wrapper like the one Deltics suggested.

What are the pros and cons of using interfaces in Delphi?

I have used Delphi classes for a while now but never really got into using interfaces. I already have read a bit about them but want to learn more.
I would like to hear which pros and cons you have encountered when using interfaces in Delphi regarding coding, performance, maintainability, code clearness, layer separation and generally speaking any regard you can think of.
All I can think of for now:
Clear separation between interface and implementation
Reduced unit dependencies
Multiple inheritance
Reference counting (if desired, can be disabled)
Class and interface references cannot be mixed (at least with reference counting)
Getter and setter functions required for all properties
Reference counting does not work with circular references
Debugging difficulties (thanks to gabr and Warren for pointing that out)
Adding to the answers few more advantages:
Use interfaces to represent the behavior and each implementation of a behavior will implement the interface.
API publishing: interfaces are great to use when publishing APIs. You can publish an interface without giving out the actual implementation, so you are free to make internal structural changes without causing problems for the clients.
All I'll say is that interfaces WITHOUT reference counting are VERY HIGH on my wish list for Delphi!!!
--> The real use of interfaces is the declaration of an interface. Not the ability for reference counting!
There are some SUBTLE downsides to interfaces that I don't know if people consider when using them:
Debugging becomes more difficult. I have seen a lot of strange difficulties stepping into interfaced method calls, in the debugger.
Interfaces in Delphi come with IUnknown semantics: like it or not, you're stuck with reference counting being a supported interface. Thus, with any interfaces created in Delphi's world, you have to be sure you handle reference counting correctly, and if you don't, you'll end up with leaks. When you want to avoid reference counting, your only choice is to override _AddRef/_Release and not actually free anything, but this is not without its own problems. I find that the more heavily interface-laden codebases have some of the hardest-to-find access violations and memory leaks, and this is, I think, because it is very difficult to combine the refcount semantics with the default Delphi semantics (the owner frees objects and nobody else does, and most objects live for the entire life of their parents).
Badly-done implementations using Interfaces can contribute some nasty code-smells. For example, Interfaces defined in the same unit that defines the initial concrete implementation of a class, add all the weight of interfaces, without really providing proper separation between the users of the interfaces and the implementors. I know this isn't a problem with interfaces themselves, but more of a quibble with those who write interface-based code. Please put your interface declarations in units that only have those interface declarations in them, and avoid unit-to-unit dependency hell caused by glomming your interface declarations into the same units as your implementor classes.
I mostly use interfaces when I want objects with different ancestry to offer a common service. The best example I can think of from my own experience is an interface called IClipboard:
IClipboard = interface
  function CopyAvailable: Boolean;
  function PasteAvailable(const Value: string): Boolean;
  function CutAvailable: Boolean;
  function SelectAllAvailable: Boolean;
  procedure Copy;
  procedure Paste(const Value: string);
  procedure Cut;
  procedure SelectAll;
end;
I have a bunch of custom controls derived from standard VCL controls. They each implement this interface. When a clipboard operation reaches one of my forms it looks to see if the active control supports this interface and, if so, dispatches the appropriate method.
For a very simple interface you can do this with an "of object" event handler, but once it gets sufficiently complex an interface works well. In fact, I think that is a very good analogy: use an interface where a single "of object" event won't fit the functionality.
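A compact sketch of that dispatch, using a trimmed-down version of the IClipboard interface above. `Supports` needs a GUID, so a placeholder one is added; `TEditLike` and `TListLike` are hypothetical stand-ins for controls with unrelated ancestors (`TComponent` and `TInterfacedPersistent`, neither of which frees itself through reference counting by default). Assumes Delphi XE2 or later:

```pascal
program ClipboardDispatch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

type
  // Trimmed-down IClipboard; the GUID is a placeholder made up for this sketch.
  IClipboard = interface
    ['{3F6C2A91-8B4E-4D1A-9C7F-2E5B8A0D6C14}']
    function CopyAvailable: Boolean;
    procedure Copy;
  end;

  // Hypothetical control-like classes with unrelated ancestors.
  TEditLike = class(TComponent, IClipboard)
  public
    Text: string;
    function CopyAvailable: Boolean;
    procedure Copy;
  end;

  TListLike = class(TInterfacedPersistent, IClipboard)
  public
    Items: string;
    function CopyAvailable: Boolean;
    procedure Copy;
  end;

var
  Copied: string; // stands in for the real clipboard

function TEditLike.CopyAvailable: Boolean;
begin
  Result := Text <> '';
end;

procedure TEditLike.Copy;
begin
  Copied := Text;
end;

function TListLike.CopyAvailable: Boolean;
begin
  Result := Items <> '';
end;

procedure TListLike.Copy;
begin
  Copied := Items;
end;

// Ask the active control whether it supports IClipboard; if so, dispatch.
function DispatchCopy(ActiveControl: TObject): Boolean;
var
  Clip: IClipboard;
begin
  Result := Supports(ActiveControl, IClipboard, Clip) and Clip.CopyAvailable;
  if Result then
    Clip.Copy;
end;

function Demo: string;
var
  Edit: TEditLike;
  List: TListLike;
  Plain: TObject;
begin
  Result := '';
  Edit := TEditLike.Create(nil);
  List := TListLike.Create;
  Plain := TObject.Create;
  try
    Edit.Text := 'from edit';
    List.Items := 'from list';
    if DispatchCopy(Edit) then
      Result := Result + Copied + ';';
    if DispatchCopy(List) then
      Result := Result + Copied + ';';
    if not DispatchCopy(Plain) then
      Result := Result + 'plain object skipped';
  finally
    Plain.Free;
    List.Free;
    Edit.Free;
  end;
end;

begin
  Writeln(Demo);
end.
```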
Interfaces solves a certain kind of issues. The primary function is to... well, ...define interfaces. To distinguish between definition and implementation.
When you want to specify or check if a class supports a set of methods - use interfaces.
You cannot do that in any other way.
(If all classes inherit from the same base class, then an abstract class will define the interface. But when you are dealing with different class hierarchies, you need interfaces to define the methods they have in common.)
Extra note on
Cons: Performance
I think many people are too blithely dismissing the performance penalty of interfaces. (Not that I don't like and use interfaces, but you should be aware of what you are getting into.) Interfaces can be expensive not just because of the _AddRef / _Release hit (even if you are just returning -1) but also because properties are REQUIRED to have a Get method. In my experience, most properties in a class have direct field access for the read accessor (e.g., property Prop1: Integer read FProp1 write SetProp1). Changing that direct, no-penalty access to a function call can be a significant hit on your speed (especially when you start adding tens of property calls inside a loop).
For example, a simple loop using a class
for i := 0 to 99 do
begin
  // j must be a floating-point variable: "/" is real division
  j := (MyClass.Prop1 + MyClass.Prop2 + MyClass.Prop3) / MyClass.Prop4;
  // do something with j
end;
goes from 0 function calls to 400 function calls when the class becomes an interface. Add more properties in that loop and it quickly gets worse.
The _AddRef / _Release penalty you can ameliorate with some tips (I am sure there are other tips. This is off the top of my head):
Use WITH or assign to a temp variable to only incur the penalty of one _AddRef / _Release per code block
Always pass interfaces into a function using the const keyword (otherwise an extra _AddRef / _Release pair occurs every time that function is called).
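The effect of `const` can be observed with an object whose `_AddRef` counts its calls. A sketch with hypothetical names; `TCounted` implements `IInterface` by hand and does not free itself, so its lifetime is still managed with `Free` (assumes Delphi XE2 or later on Windows):

```pascal
program ConstInterfaceParams;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  AddRefCalls: Integer = 0; // incremented by every _AddRef below
  Calls: Integer = 0;

type
  // Minimal IInterface implementation that counts _AddRef calls and
  // never frees itself through reference counting.
  TCounted = class(TObject, IInterface)
  protected
    function QueryInterface(const IID: TGUID; out Obj): HResult; stdcall;
    function _AddRef: Integer; stdcall;
    function _Release: Integer; stdcall;
  end;

function TCounted.QueryInterface(const IID: TGUID; out Obj): HResult;
begin
  if GetInterface(IID, Obj) then
    Result := 0
  else
    Result := E_NOINTERFACE;
end;

function TCounted._AddRef: Integer;
begin
  Inc(AddRefCalls);
  Result := -1;
end;

function TCounted._Release: Integer;
begin
  Result := -1;
end;

procedure UseByValue(Intf: IInterface);
begin
  if Intf <> nil then
    Inc(Calls);
end;

procedure UseByConst(const Intf: IInterface);
begin
  if Intf <> nil then
    Inc(Calls);
end;

// How many _AddRef calls one call of the chosen routine causes.
function AddRefsPerCall(ByConst: Boolean): Integer;
var
  Obj: TCounted;
  Intf: IInterface;
  Before: Integer;
begin
  Obj := TCounted.Create;
  try
    Intf := Obj;              // this assignment itself calls _AddRef once
    Before := AddRefCalls;
    if ByConst then
      UseByConst(Intf)
    else
      UseByValue(Intf);
    Result := AddRefCalls - Before;
  finally
    Intf := nil;              // release before the object is freed
    Obj.Free;
  end;
end;

begin
  Writeln('by value: ', AddRefsPerCall(False));
  Writeln('by const: ', AddRefsPerCall(True));
end.
```

The by-value call should report at least one `_AddRef`; the `const` call avoids it entirely.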
The only case when we had to use interfaces (besides COM/ActiveX stuff) was when we needed multiple inheritance and interfaces were the only way to get it. In several other cases when we attempted to use interfaces, we had various kinds of problems, mainly with reference counting (when the object was accessed both as a class instance and via interface).
So my advice would be to use them only when you know that you need them, not when you think that it can make your life easier in some aspect.
Update: As David reminded, with interfaces you get multiple inheritance of interfaces only, not of implementation. But that was fine for our needs.
Beyond what others already listed, a big pro of interfaces is the ability of aggregating them.
I wrote a blog post on that topic a while ago which can be found here: http://www.nexusdb.com/support/index.php?q=intf-aggregation (tl;dr: you can have multiple objects each implementing an interface and then assemble them into an aggregate which to the outside world looks like a single object implementing all these interfaces)
You might also want to have a look at the "Interface Fundamentals" and "Advanced Interface Usage and Patterns" posts linked there.
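Delphi's built-in support for this is the `implements` directive combined with `TAggregatedObject`, which forwards reference counting and `QueryInterface` to the outer (controller) object. A minimal sketch with hypothetical interfaces and placeholder GUIDs; it is not code from the linked post (assumes Delphi 2009 or later for `Exit` with a value, XE2 or later for unit-scoped names):

```pascal
program AggregationDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  // Placeholder GUIDs made up for this sketch; Supports() needs them.
  INamed = interface
    ['{0D8E4B7A-5C21-4F3E-A6B9-7C2D1E0F9A38}']
    function Name: string;
  end;

  ICounter = interface
    ['{9B41E6C3-2A7D-4E58-8F10-3D6C5B4A2E97}']
    function Next: Integer;
  end;

  // Part object: implements ICounter, but forwards _AddRef/_Release and
  // QueryInterface to its controller, so it never lives on its own.
  TCounterPart = class(TAggregatedObject, ICounter)
  private
    FValue: Integer;
  public
    function Next: Integer;
  end;

  // Aggregate: to callers it looks like one object with both interfaces.
  TWidget = class(TInterfacedObject, INamed, ICounter)
  private
    FCounter: TCounterPart;
  public
    constructor Create;
    destructor Destroy; override;
    function Name: string;
    property Counter: TCounterPart read FCounter implements ICounter;
  end;

function TCounterPart.Next: Integer;
begin
  Inc(FValue);
  Result := FValue;
end;

constructor TWidget.Create;
begin
  inherited Create;
  FCounter := TCounterPart.Create(Self); // Self becomes the controller
end;

destructor TWidget.Destroy;
begin
  FCounter.Free;
  inherited Destroy;
end;

function TWidget.Name: string;
begin
  Result := 'widget';
end;

function Demo: string;
var
  Named: INamed;
  Counter: ICounter;
  First, Second: Integer;
begin
  Named := TWidget.Create;
  if not Supports(Named, ICounter, Counter) then
    Exit('ICounter not found');
  First := Counter.Next;
  Second := Counter.Next;
  Result := Format('%s %d %d', [Named.Name, First, Second]);
end;

begin
  Writeln(Demo);
end.
```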

Delphi : Handling the fact that Strings are not Objects

I am trying to write a function that takes any TList and returns a String representation of all the elements of the TList.
I tried a function like so
function ListToString(list:TList<TObject>):String;
This works fine, except you can't pass a TList<String> to it.
E2010 Incompatible types: 'TList<System.TObject>' and 'TList<System.string>'
In Delphi, a String is not an Object. To solve this, I've written a second function:
function StringListToString(list:TList<string>):String;
Is this the only solution? Are there other ways to treat a String as more 'object-like'?
In a similar vein, I also wanted to write an 'equals' function to compare two TLists. Again I run into the same problem
function AreListsEqual(list1:TList<TObject>; list2:TList<TObject>):boolean;
Is there any way to write this function (perhaps using generics?) so it can also handle a TList<String>? Are there any other tricks or 'best practises' I should know about when trying to create code that handles both Strings and Objects? Or do I just create two versions of every function? Can generics help?
I am from a Java background but now work in Delphi. It seems they are lately adding a lot of things to Delphi from the Java world (or perhaps the C# world, which copied them from Java). Like adding equals() and hashcode() to TObject, and creating a generic Collections framework etc. I'm wondering if these additions are very practical if you can't use Strings with them.
[edit: Someone mentioned TStringList. I've used that up till now, but I'm asking about TList. I'm trying to work out if using TList for everything (including Strings) is a cleaner way to go.]
Your problem isn't that string and TObject are incompatible types (though they are); it's that TList<x> and TList<y> are incompatible types, even if x and y themselves are compatible. The reasons why are complicated, but basically, it goes like this.
Imagine your function accepted a TList<TObject>, and you passed in a TList<TMyObject> and it worked. But then in your function you added a TIncompatibleObject to the list. Since the function signature only knows it's working with a list of TObjects, then that works, and suddenly you've violated an invariant, and when you try to enumerate over that list and use the TMyObject instances inside, something's likely to explode.
If the Delphi team added support for covariance and contravariance on generic types then you'd be able to do something like this safely, but unfortunately they haven't gotten around to it yet. Hopefully we'll see it soon.
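Until variance arrives, one way around the problem is to make the routine itself generic, so `T` is fixed per call and a `TList<string>` is passed as exactly what it is. Delphi has no stand-alone generic functions, so this sketch uses generic class methods on a hypothetical `TListUtils`; it uses RTTI (`TValue`) to turn any element into text and `TEqualityComparer<T>.Default` for comparisons (assumes Delphi XE2 or later):

```pascal
program GenericListHelpers;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Rtti, System.Generics.Collections,
  System.Generics.Defaults;

type
  // Hypothetical helper; works for TList<string>, TList<Integer>, TList<TFoo>...
  TListUtils = class
  public
    class function Join<T>(List: TList<T>): string;
    class function AreEqual<T>(List1, List2: TList<T>): Boolean;
  end;

class function TListUtils.Join<T>(List: TList<T>): string;
var
  I: Integer;
begin
  Result := '';
  for I := 0 to List.Count - 1 do
  begin
    if I > 0 then
      Result := Result + ', ';
    // TValue converts any T to text via RTTI.
    Result := Result + TValue.From<T>(List[I]).ToString;
  end;
end;

class function TListUtils.AreEqual<T>(List1, List2: TList<T>): Boolean;
var
  Comparer: IEqualityComparer<T>;
  I: Integer;
begin
  if List1.Count <> List2.Count then
    Exit(False);
  Comparer := TEqualityComparer<T>.Default;
  for I := 0 to List1.Count - 1 do
    if not Comparer.Equals(List1[I], List2[I]) then
      Exit(False);
  Result := True;
end;

function DemoJoin: string;
var
  Names: TList<string>;
begin
  Names := TList<string>.Create;
  try
    Names.Add('alpha');
    Names.Add('beta');
    Result := TListUtils.Join<string>(Names);
  finally
    Names.Free;
  end;
end;

function DemoEqual: Boolean;
var
  A, B: TList<string>;
begin
  A := TList<string>.Create;
  B := TList<string>.Create;
  try
    A.AddRange(['x', 'y']);
    B.AddRange(['x', 'y']);
    Result := TListUtils.AreEqual<string>(A, B);
  finally
    B.Free;
    A.Free;
  end;
end;

begin
  Writeln(DemoJoin);
  Writeln(DemoEqual);
end.
```

As far as I know, `TValue.ToString` describes an object instance generically rather than calling your class's `ToString`, so for object lists a converter parameter (for example a `TFunc<T, string>`) may be the better design.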
But to get back to your original question, if you want to compare a list of strings, there's no need to use generics; Delphi has a specific list-of-strings class called TStringList, found in the Classes unit, which you can use. It has a lot of built-in functionality for string handling, including three ways to concatenate all the strings into a single string: the Text, CommaText and DelimitedText properties.
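A quick sketch of those three properties (the list contents are arbitrary). `Text` ends every item with `sLineBreak`; `DelimitedText` uses whatever `Delimiter` is set and quotes items containing spaces, the delimiter or the quote character unless `StrictDelimiter` is on. Assumes Delphi XE2 or later:

```pascal
program StringListJoin;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

// Caller owns the returned list.
function MakeList: TStringList;
begin
  Result := TStringList.Create;
  Result.Add('alpha');
  Result.Add('beta');
end;

function JoinedWithComma: string;
var
  L: TStringList;
begin
  L := MakeList;
  try
    Result := L.CommaText;
  finally
    L.Free;
  end;
end;

function JoinedWithSemicolon: string;
var
  L: TStringList;
begin
  L := MakeList;
  try
    L.Delimiter := ';';
    Result := L.DelimitedText;
  finally
    L.Free;
  end;
end;

function JoinedAsLines: string;
var
  L: TStringList;
begin
  L := MakeList;
  try
    Result := L.Text;
  finally
    L.Free;
  end;
end;

begin
  Writeln(JoinedWithComma);      // alpha,beta
  Writeln(JoinedWithSemicolon);  // alpha;beta
  Write(JoinedAsLines);          // alpha and beta on separate lines
end.
```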
Although it is far from optimal, you can create a string wrapper class, possibly containing some additional useful functions that operate on strings. Here is an example class, which could be enhanced to make memory management easier, for example by using these methods.
I am only suggesting a solution to your problem, I don't agree that consistency for the sake of consistency will make the code better. If you need it, Delphi object pascal might not be the language of choice.
It's not cleaner. It's worse. It's a fundamentally BAD idea to use a TList<String> instead of TStringList.
It's not cleaner to say "I use generics everywhere". In fact, if you want to be consistent, use them nowhere. Avoid them like the plague, as most Delphi developers do.
All "lists" of strings in the VCL are of type TStringList. Most collections of objects in most delphi apps use TObjectList, instead of templated types.
It is not cleaner and more consistent to be LESS consistent with the entire Delphi platform, and to pick on some odd thing, and standardize on it, when it will be you, and you alone, doing this oddball thing.
In fact, I'm still not sure that generics are safe to use heavily.
If you start using TList you won't be able to copy it cleanly to your Memo.Lines, which is a TStrings, and you will have created a type incompatibility for nothing, plus you will have lost the extra functionality in TStringList. Instead of using TStringList.Text, you have to invent that for yourself. You also lose LoadFromFile and SaveToFile, and lots more. Arrays of strings are a ubiquitous thing in Delphi, and they are almost always a TStringList. TList<String> is lame.

How do programmers practice code reuse

I've been a bad programmer because I copy and paste. For example, every time I connect to a database and retrieve a recordset, I copy the previous code and edit it, and I copy the code that sets up the DataGridView and edit it. I am aware of the phrase "code reuse", but I have not actually used it. How can I utilize code reuse so that I don't have to copy and paste the database code and the DataGridView code?
The essence of code reuse is to take a common operation and parameterize it so it can accept a variety of inputs.
Take the humble printf, for example. Imagine if you did not have printf, and only had write or something similar:
// convert theInt to a string and write it out
// (itoa is non-standard, e.g. provided by MSVC; write() is POSIX)
char c[24];
itoa(theInt, c, 10);
write(1, c, strlen(c));
Now this sucks to have to write every time, and is actually kind of buggy. So some smart programmer decided he was tired of this and wrote a better function that, in one fell swoop, prints stuff to stdout:
printf("%d", theInt);
You don't need to get as fancy as printf with its variadic arguments and format string. Even just a simple routine such as:
void print_int(int theInt)
{
    char c[24];
    itoa(theInt, c, 10);
    write(1, c, strlen(c));
}
would do the trick nicely. This way, if you want to change print_int to always print to stderr, you could update it to be:
void print_int(int theInt)
{
    fprintf(stderr, "%d", theInt);
}
and all your integers would now magically be printed to standard error.
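Going one step further, the stream itself can be made a parameter, which is the "parameterize it" idea from the top of this answer. Here is a minimal C sketch; `print_int_to` is a hypothetical name, not a standard function:

```c
#include <stdio.h>

/* Hypothetical helper: the output stream is a parameter, so callers can
   choose stdout, stderr, or an open file without duplicating code. */
int print_int_to(FILE *out, int value)
{
    /* fprintf returns the number of characters written, or a negative value on error. */
    return fprintf(out, "%d", value);
}

/* print_int keeps its name and delegates; changing the default
   destination is now a one-line edit here. */
int print_int(int value)
{
    return print_int_to(stdout, value);
}
```

With this, `print_int_to(stderr, 42)` sends the number to standard error, while existing calls to `print_int` keep working unchanged.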
You could then even bundle that function and others you write into a library, which is just a collection of code you can load into your program.
Following the practice of code reuse is why you even have a database to connect to: someone created some code to store records on disk, reworked it until it was usable by others, and decided to call it a database.
Libraries do not magically appear. They are created by programmers to make their lives easier and to allow them to work faster.
Put the code into a routine and call the routine whenever you want that code to be executed.
Check out Martin Fowler's book on refactoring, or some of the numerous refactoring-related resources on the internet (including Stack Overflow), to find out how to improve code that smells of duplication.
First, create a library of reusable functions. It can be linked into different applications, which saves a lot of time and encourages reuse.
Also make sure the library is unit tested and documented, so that it is easy to find the right class/function/variable/constant.
A good rule of thumb: if you use the same piece of code three times, and it's obviously possible to generalize it, then make it a procedure/function/library.
However, as I am getting older, and also more experienced as a professional developer, I am more inclined to see code reuse as not always the best idea, for two reasons:
It's difficult to anticipate future needs, so it's very hard to define APIs that you would really use next time. It can cost you twice as much time: once to make the code more general, and again the second time, when you end up rewriting it anyway. It seems to me that Java projects of late are especially prone to this; they always seem to be rewritten in the framework du jour, just to be "easier to integrate" or whatever in the future.
In a larger organization (I am a member of one), relying on some external team (either in-house or third-party) can be a problem: your future then depends on their funding and resources, so using foreign code or libraries can be a big burden. Similarly, if you share a piece of code with some other team, they may then expect you to maintain it.
Note, however, that these are mostly business reasons; in open source, being reusable is almost invariably a good thing.
To get code reuse, you need to become a master of:
Giving things names that capture their essence. This is really, really important.
Making sure that each thing does only one thing. This really comes back to the first point: if you can't name it by its essence, it's often doing too much.
Locating the thing somewhere logical. Again this comes back to being able to name things well and capturing its essence...
Grouping it with things that build on a central concept. Same as above, but said differently :-)
The first thing to note is that by using copy-and-paste, you are reusing code - albeit not in the most efficient way.
You have recognised a situation where you have come up with a solution previously.
There are two main scopes that you need to be aware of when thinking about code reuse. Firstly, code reuse within a project and, secondly, code reuse between projects.
The fact that you have a piece of code that you can copy and paste within a project should be a cue that the piece of code that you're looking at is useful elsewhere. That is the time to make it into a function, and make it available within the project.
Ideally you should replace all occurrences of that code with your new function, so that it (a) reduces redundant code and (b) ensures that any bugs in that chunk of code only need to be fixed in one function instead of many.
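A tiny illustration of point (b), assuming the pasted snippet was a port-number range check; `is_valid_port` is a hypothetical name. Once every copy is replaced by a call, correcting the bounds becomes a single edit:

```c
#include <stdbool.h>

/* Hypothetical replacement for a check that used to be pasted inline
   (e.g. `if (port > 0 && port < 65536)`) at every call site.
   Accepts port numbers 1..65535. */
bool is_valid_port(long port)
{
    return port >= 1 && port <= 65535;
}
```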
The second scope, code reuse across projects, requires more organisation to get the maximum benefit. This issue has been addressed in a couple of other SO questions, e.g. here and here.
A good start is to organise code that is likely to be reused across projects into source files that are as self-contained as possible. Minimise the amount of supporting, project-specific code they require, as this makes it easier to reuse entire files in a new project. This means minimising the use of project-specific data types, minimising the use of project-specific global variables, etc.
This may mean creating utility files containing functions that you know will be useful in your environment, e.g. common database functions if you often develop projects that depend on databases.
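As a sketch of what "self-contained" can look like in practice: a utility function that depends only on the C standard library, with no project-specific types or globals, can be copied into any project unchanged. `trim_in_place` is a hypothetical example, not part of any existing library:

```c
#include <ctype.h>
#include <string.h>

/* Removes leading and trailing whitespace from s, modifying it in place.
   Depends only on the standard library, so the file can be reused as-is.
   Returns s for convenience. */
char *trim_in_place(char *s)
{
    size_t len = strlen(s);

    /* Drop trailing whitespace. */
    while (len > 0 && isspace((unsigned char)s[len - 1]))
        s[--len] = '\0';

    /* Find the first non-whitespace character. */
    size_t start = 0;
    while (s[start] != '\0' && isspace((unsigned char)s[start]))
        start++;

    /* Shift the remaining text, including the terminator, to the front. */
    memmove(s, s + start, len - start + 1);
    return s;
}
```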
I think the best solution to your problem is to create a separate assembly for your important functions. That way you can create extension methods or modify the helper assembly itself. Think of this function:
ExportToExcel(List data, string filename)
This method can be used for your future Excel export functions, so why not store it in your own helper assembly? Then you just add a reference to that assembly.
The answer can change depending on the size of the project.
For a smaller project I would recommend setting up a DatabaseHelper class that does all your DB access. It would just be a wrapper around opening/closing connections and execution of the DB code. Then at a higher level you can just write the DBCommands that will be executed.
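The same "wrapper around opening/closing" idea can be sketched in C, with a `FILE *` standing in for a database connection since the question doesn't name a database API. `with_file`, `file_op`, and the two example operations are hypothetical names; the point is that the open/close boilerplate lives in one place and only the operation varies:

```c
#include <stdio.h>

/* The part that varies between call sites: what to do with the open resource. */
typedef int (*file_op)(FILE *f, void *ctx);

/* Hypothetical helper: opens the file, runs op, and always closes the file.
   Returns op's result, or -1 if the file could not be opened or closed. */
int with_file(const char *path, const char *mode, file_op op, void *ctx)
{
    FILE *f = fopen(path, mode);
    if (f == NULL)
        return -1;
    int result = op(f, ctx);
    if (fclose(f) != 0 && result >= 0)
        result = -1;   /* treat a failed close (e.g. a failed flush) as an error */
    return result;
}

/* Example operations a caller might supply. */
static int write_greeting(FILE *f, void *ctx)
{
    return fputs((const char *)ctx, f) >= 0 ? 0 : -1;
}

static int count_chars(FILE *f, void *ctx)
{
    int n = 0;
    while (fgetc(f) != EOF)
        n++;
    *(int *)ctx = n;
    return 0;
}
```

Each new "query" is then just another small operation function; none of them repeats the open/close code.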
A similar technique could be used for a larger project, but it would need some additional work: adding interfaces and dependency injection (DI), as well as abstracting out what you need to know about the database.
You might also look into an ORM, the DAAB (Data Access Application Block), or the Microsoft Patterns & Practices group.
As for how to prevent the ol' copy and paste: as you write your code, review it periodically. Similar blocks of code that vary only by a parameter or two are always good candidates for refactoring into their own method.
Now for my pseudocode example:
Function GetCustomer(ID) As Customer
    Dim CMD As New DBCmd("SQL or Stored Proc")
    CMD.Parameters.Add("CustID", DBType, Length).Value = ID
    Dim DHelper As New DatabaseHelper
    Dim DR As DataReader = DHelper.GetDataReader(CMD)
    Dim RtnCust As New Customer(DR)
    Return RtnCust
End Function

Class DatabaseHelper
    Public Function GetDataTable(cmd As DBCmd) As DataTable
        ' Write the DB access code here:
        ' open the connection, do the DB operation, close the connection
    End Function

    Public Function GetDataReader(cmd As DBCmd) As DataReader
        ' Same idea, but the connection must stay open until the reader is closed
    End Function

    Public Function GetDataSet(cmd As DBCmd) As DataSet
    End Function

    ' ... and so on ...
End Class
For the example you give, the appropriate solution is to write a function that takes as parameters whatever it is that you edit whenever you paste the block, then call that function with the appropriate data as parameters.
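A sketch of that advice in C, using an in-memory array as a hypothetical stand-in for a recordset (the question's real code talks to a database and a DataGridView, which plain C can't show portably). The filter and the per-row action, the bits that used to be edited in each pasted copy, become parameters; all names here are illustrative:

```c
#include <stddef.h>

/* Hypothetical stand-in for a database row. */
struct row { int id; int balance; };

/* What used to be edited in each copy: which rows to keep,
   and what to do with each kept row. */
typedef int  (*row_filter)(const struct row *r, int arg);
typedef void (*row_action)(const struct row *r, void *ctx);

/* Returns the number of rows that passed the filter. */
size_t process_rows(const struct row *rows, size_t n,
                    row_filter keep, int filter_arg,
                    row_action act, void *ctx)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (keep(&rows[i], filter_arg)) {
            act(&rows[i], ctx);
            kept++;
        }
    }
    return kept;
}

/* Example filter and action a caller might pass in. */
static int balance_at_least(const struct row *r, int min) { return r->balance >= min; }
static void sum_balance(const struct row *r, void *ctx) { *(long *)ctx += r->balance; }
```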
Try and get into the habit of using other people's functions and libraries.
You'll usually find that your particular problem has a well-tested, elegant solution.
Even if the solutions you find aren't a perfect fit, you'll probably gain a lot of insight into the problem by seeing how other people have tackled it.
I'd do this at two levels. First, within a class or namespace, put the piece of code that is reused in that scope into a separate method and make sure it is called everywhere it's needed.
Second is something similar to the case that you are describing. That is a good candidate to be put in a library or a helper/utility class that can be reused more broadly.
It is important to evaluate everything you are doing from the perspective of whether it can be made available to others for reuse. This should be a fundamental approach to programming, yet most of us don't realize it.
Note that anything that is to be reused needs to be documented in more detail. Its naming should be distinctive, and all the parameters, return values, and any constraints/limitations/prerequisites should be clearly documented (in code or help files).
It depends somewhat on what programming language you're using. In most languages you can:
Write a function, parameterize it to allow variations
Write a function object, with members to hold the varying data
Develop a hierarchy of (function object?) classes that implement even more complicated variations
In C++ you could also develop templates to generate the various functions or classes at compile time
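In plain C, the second option (a function object) has to be emulated; one common idiom is a struct holding a function pointer plus the data it needs. A minimal sketch with hypothetical names `int_op` and `map_in_place`:

```c
#include <stddef.h>

/* A "function object" in C: a function pointer bundled with its data. */
struct int_op {
    int (*apply)(const struct int_op *self, int x);
    int operand;   /* the varying data held by the object */
};

static int add_op(const struct int_op *self, int x) { return x + self->operand; }
static int mul_op(const struct int_op *self, int x) { return x * self->operand; }

/* One reusable routine works with any int_op, so adding a new
   variation requires no change here. */
void map_in_place(int *xs, size_t n, const struct int_op *op)
{
    for (size_t i = 0; i < n; i++)
        xs[i] = op->apply(op, xs[i]);
}
```

For example, `struct int_op add3 = { add_op, 3 };` followed by `map_in_place(a, 3, &add3);` adds 3 to each element of `a`.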
Easy: whenever you catch yourself copy-pasting code, extract it into a new function immediately (i.e., don't wait until you've already copy-pasted it several times).
