Things you look for when trying to understand new software - analysis

I wonder what sort of things you look for when you start working on an existing, but new to you, system? Let's say that the system is quite big (whatever it means to you).
Some of the things that were identified are:
Where is a particular subroutine or procedure invoked?
What are the arguments, results and predicates of a particular function?
How does the flow of control reach a particular location?
Where is a particular variable set, used or queried?
Where is a particular variable declared?
Where is a particular data object accessed, i.e. created, read, updated or deleted?
What are the inputs and outputs of a particular module?
But if you look for something more specific or any of the above questions is particularly important to you please share it with us :)
I'm particularly interested in something that could be extracted in dynamic analysis/execution.

I like to use a "use case" approach:
First, I ask myself "what's this software's purpose?": I try to identify how users are going to interact with the application;
Once I have some "use case", I try to understand what are the objects that are more involved and how they interact with other objects.
Once I did this, I draw a UML-type diagram that describe what I've just learned for further reference. What happens after depends on the task I've been assigned, i.e. modify the code, document the code etc.

There is the question of what motivation do I have for learning the new system:
Bug fix/minor enhancement - In this case, I may focus solely on that portion of the system that performs a specific function that needs to be altered. This is a way to break down a huge system but also is a way to identify if the issue is something I can fix or if it is something that I have to hand to the off-the-shelf company whose software we are using,e.g. a CRM, CMS, or ERP system can be a customized off-the-shelf system so there are many pieces to it.
Project work - This would be the other case and is where I'd probably try to build myself a view from 30,000 feet or so to know what are the high-level components and which areas of the system does the project impact. An example of this is where I'd join a company and work off of an existing code base but I don't have the luxury of having the small focus like in the previous case. Part of that view is to look for any patterns in the code in terms of naming conventions, project structure, etc. as this may be useful once I start changing some code in the system. I'd probably do some tracing through the system and try to see where are the uglier parts of the code. By uglier I mean those parts that are kludge-like and may have some spaghetti code as this was rushed when first written and is now being reworked heavily.
To my mind another way to view this is the question of whether I'm going to be spending days or weeks wrapping my head around a system like in the second case or should this be a case where it hopefully takes only a few hours, optimistically that is, to get my footing to make the necessary changes.

Related

File Path Name or URL analysis

I am looking for information on tools, methods, techniques for analysis of file path names. I am not talking file size, read/write times, or file types, but analysis of the path or URL it self.
I am only aware of basic word frequency text tools or methods, but I am wondering if there is something more advanced that people use/apply to this to try and mine extra information out of them.
Thanks!
UPDATE:
Here is the most narrow example of what I would want. OK, so I have some full path names as strings like this:
F:\Task_Order_Projects\TO_01_NYS\Models\MapShedMaps\Random_File1.doc
F:\Task_Order_Projects\TO_01_NYS\Models\MapShedMaps\Random_File2.doc
F:\Task_Order_Projects\TO_01_NYS\Models\MapShedMaps\Random_File3.doc
F:\Task_Order_Projects\TO_01_NYS\Models\MapShedMaps\Random_File4.doc
F:\Task_Order_Projects\TO_01_NYS\Models\MapShedMaps\Random_File5.doc
F:\Task_Order_Projects\TO_02_NYS\Models\MapShedMaps\Random_File1.doc
F:\Task_Order_Projects\TO_02_NYS\Models\MapShedMaps\Random_File2.doc
F:\Task_Order_Projects\TO_02_NYS\Models\MapShedMaps\Random_File3.doc
F:\Task_Order_Projects\TO_02_NYS\Models\MapShedMaps\Random_File4.doc
F:\Task_Order_Projects\TO_02_NYS\Models\MapShedMaps\Random_File5.doc
What I want to know is that the folder MapShedMaps appears "uniquely" 2 times. If I do frequency on the strings I would get 10 appearances. The issues is that I don’t know what level in the directory this is important, so I would like a unique count at each level of the directory based on what I am describing.
This is an extremely broad question so it is difficult for me to give you a per say "Answer" but I will give you my first thoughts on this.
First,
the Regular expression class of .NET is extremely useful for parsing large amounts of information. It is so powerful that it will easily confuse the impatient, however once mastered it can be used across text editors, .NET and pretty much any other respectable language I believe. This would allow you to search strings and separate it into directories. This could be overkill depending on how you use it, but its a thought. Here is a favorite link of mine to try out some regular expressions.
Second,
You will need a database, I prefer to use SQL. Look into how to connect to databases and create databases. With this database you can store all the fields abstracted from your original path entered. Such as a parent directory, child directory, common file types accessed. Just have a field for each one of these and through queries you can form a hypothesis as to redundancy.
Third,
I don't know if its easily accessible but you might look into whether windows stores accessed file history. It seems to have some inkling as to which files have been opened in the past. So there may be a resource in windows which already stores much of the information you would be storing in your database. If you could find a way to access this information. Parse it with regular expressions and resubmit it to the database of your application. You could control the WORLD! j/k... You could get a pretty good prediction as to user access patterns though.
Fourth,
I always try to stick with what I have available. If .NET is sitting in front of you, hammer away at what your trying to do. If you reach a wall. At least your making forward progress. In today's motion towards object orientated programming, you can usually change data collected by one program into an acceptable format for another. You just gotta dig a little.
Oh and btw, Coursera.com is actually doing a free class on machine learning and algorithms. You might want to check it out or reference it for prediction formulas.
Good Luck.
I wanted to post this as a comment but SO kept editing the double \ to \ and it is important there are two because \ is a key character, without another \ to escape it, regex will interpret it as a command.
Hey I just wanted to let you know I've been playing with some regex... I know a pretty easy way to code this up in VB.net and I'll post that as my second answer but I wanted you to check out back-references. If the part between parenthesis matches it captures that text and moves on to the second query for instance....
F:\\(directory1)?(directory2)?(directory3)?
You could use these matches to find out how many directories each parent directory has under it. Are you following me? Here is a reference.

Programming for and by yourself

If you're writing something by yourself, whether to practice, solve a personal problem, or just for entertainment, is it ok, once in a while, to have a public field? Maybe?
Let me give you an analogy.
I come from a part of the world where English is not the primary language. But it’s necessary for all things in life.
During one of those usual days during my pre-teen years I said something very funny in English. Then my Dad said, “Son, think in English. Then you’ll get fluent”
I think it applies perfectly to this situation.
Think,try and question best practices in your playground. You will soon realize what’s best for what.Why are properties needed in the first place. Why should this be public? Why should I not call a virtual member from the constructor? Let me try using "new" modifier for a method call. What happens when I write 10 nested levels of if-then-else and try reading it again after 10 days. Why the heck should I use a factory pattern for a simple project. Et cetera.
And then you’ll realize without shooting at your foot, why design patterns are patterns...
I think it's reasonable if you're consciously throwing the code away afterwards. In particular, if you're experimenting with something completely different, taking shortcuts makes sense. Just don't let it lead to habits which cross over into "real" code.
Violating general principles is always "ok"! It is not an error to violate a principle but it is a trade off. The cost of not writing clean code will be higher the longer your software will survive. My take on this is: If in doubt make it clean!
Of course it's OK. It's your code, you can do whatever you want with it. Personally, I try to stick to good practice also in my private code, just to make it a natural habit so I don't have to think about it.
The short answer is yes, if you believe that you're gaining a lot by making things public instead of private with accessors you are welcome to do so. Consistency, I think, is the biggest thing to keep in mind. For instance, don't make some variables straight public, and some not. Do the same across the board if you break with best practices. It comes back to a trade-off. Almost no-one follows many of the IEEE specs for how Software Engineering should be executed and documented because the overhead is far too great, and it can get unmanageable. The same is true for personal, light-weight programming. It's okay to do something quick and dirty, just do not get used to it.
Public members are acceptable in the Data Transfer Object design patter:
Typically, the members in the Transfer Object are defined as public, thus eliminating the need for get and set methods.
One of the key advantages of OOP is for scaling and maintainability. By encapsulating code, one can hide the implementation. This means other programmers don't have to know the implementation, and can't change your object's internal state. If you language doesn't support properties, you end up with a lot of code which obfuscates and bloats your project. If the code doesn't need to be worked on by multiple programmers, you aren't producing a reusable component, and YOU are the maintenance programmer, then code in whatever manner allows you to get things done.
Does a maid need to make his/her own bed in the morning in order to practice properly making a bed?
Side note: it also depends on the language:
In Scala, according to the Uniform Access Principle, clients read and write field values as if they are publicly accessible, even though in some case they are actually calling methods. The maintainer of the class has the freedom to change the implementation without forcing users to make code changes.
Scala keeps field and method names in the same namespace.
Many languages, like Java, keep field and method names in separate namespaces.
However, these languages can’t support the uniform access principle as a result, unless they build in ad hoc support in their grammars or compilers.
So the real question is:
What service are you exposing (here by having a public field)?.
If the service (get/set a given type value) makes sense for your API, then the "shortcut" is legitimate.
As long as you encapsulate that field eventually, is it ok because you made the shortcut for the "right" reason (API and service exposure), versus the "wrong" reason (quick ad-hoc access).
A good unit test (thinking like the user of your API) can help you check if that field should be accessed directly or if it is only useful for internal development of other classes within your program.
Here's my take on it:
I'd advise avoiding public fields. They have a nasty habit of biting you later on because you can't control them. (The word you're looking for here is volatility.) Further, if you decide to change their internal implementation, you have to touch a lot more code.
Then again, that's what refactoring tools are for. If you have a decent refactoring tool, that's not nearly so difficult.
There is no silver bullet. I can't repeat this enough. If you have work to do, and you need to get it done in a hurry, writing one line of code instead of eight (as is the case in Visual Basic) is certainly faster.
Rules were meant to be broken. If a rule doesn't necessarily apply in your case, don't use it. Design patterns, coding guidelines, laws and best practices should not be treated as a straightjacket that requires you to needlessly complicate your code to the point where it is enormously complex and difficult to understand and maintain. Don't let someone force you into a practice just because it's popular or "standard" when it doesn't fit your requirements.
Again, this is a subjective opinion, and your mileage may vary.

Difference between BPM and App. workflow?

I know there is a lot of talk about BPM these days and I am conscious that some may see it to be a craze rather than a fundamentally important piece of software.
As someone from what most would call 'The Business', I have been doing my best to learn about BPM to ensure we continue to make decisions that not only make sense to the business, but IT as well.
I have noticed while reading that mention is made to application workflow when sometimes discussing BPM. I hadn't given this much thought until recently.
Therefore, what is the difference? When would you use one and not the other?
BPM is about the process and improving it, which takes into account users and potentially more than one application,e.g. an ERP system may have more than one application to it, though there may be other uses of the term. Note that the process could be viewed without what applications or technologies are used.
Application workflow is how an application is used to go from a to b. Here it is a specific set of code that is used and what happens over the course of an application getting from a to b. In this case, the application is front and center rather than the process.
Does that provide an answer? Another way to think of it is that multiple application workflows can make up a system which is used in a process that can have BPM applied to it.
Late to the game, but workflow is to database as BPMS is to DBMS. (Convenient how the letters line up, huh?)
IOW, BPM(S) is traditionally meant to refer to a particular framework/application that allows you to manage business processes: defining them, storing them, versioning them, measuring them, etc. This is similar to how a DBMS manages databases.
Now, a workflow is a definition, much like a database is a definition. In the former case, it is a definition of operations/work (Fufill Order), steps thereof (Send Invoice) and rules/constraints on the work (If no stock, send notice). In the latter, similar case, it is a definition of data structure (CREATE TABLE) and constraints (InvoiceTotal must be > $0.00).
I think this is a potentially confusing subject, particular as some development environments use a type of process flow model to generate user facing applications (I'm thinking about Outsystems here, for example).
But, for me, the distinction is crystal clear. Application workflow, as people talk about it, refers to a user's path through an application, i.e. the pages they complete/visit, the data they enter, etc. on their way to completing a transaction of some sort. Application orkflow is a poor term for this though, I think application flow would be more meaningful.
BPM on other hand, is about modelling and executing a workflow process. By workflow, in this context, I mean a series of discrete steps (or tasks) that have to be completed (either programmatically or via human interaction) in a certain order to complete a process. These tasks can be implemented as individual application modules (each with their own "application workflow", see above). The job of the workflow engine is to make sure that these separate steps are assigned to the right people (of groups of people) in the right sequence, and that overall the process completes in an orderly way.
I don't think there's a clear answer to this at all. These are words, as opposed to theoretical concepts. If you add the word "checklist" into the mix - that just turns out to be a linear version of a process (but you can have conditionals in checklists - making them a workflow).
I am not sure how to help in reframing this question, but it's almost as if no answer can ever be possible. My own thoughts are at https://tallyfy.com/improving-efficiency-workflow-vs-business-process-management/

Does anyone have a good analogy for dependency injection? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 2 years ago.
Improve this question
I have read a lot of articles on Dependency Injection as well as watched a lot of videos, but I still can't get my head around it. Does anyone have a good analogy to explain it?
I watched the first part of the Autumn of Agile screencast and still was a little confused.
Analogy? I'll give it a whack... Your CD Player stereo is useless without a CD with music on it... (It's dependent on the CD). If they built CD Players with the CD already in it, it would get boring very quickly...
So they build them so you can "inject" the CD, (on which it is dependent) into the player. That way you can inject a different one each time, and get "different" behavior (music) dependent on which one you inject.
The only requirement is that the CD must be compatible with the interface defined by the player. (You can't play a blue-ray disk in a 1992 CD player.)
The best analogy I can think of is that of hiring a mechanic.
Without dependency injection, you hire a mechanic and the mechanic brings his own tools. He may have lousy tools, he may have great tools, he may be using a pipe wrench when he should be using a socket. You don't know, and may not care, so long as he gets the work done.
With dependency injection, you hire a mechanic and you provide him with the tools that you want him to do his work with. You get to choose what you consider to be the best or most appropriate tools for the work you are hiring him to do.
Think of it as a realisation of the "Inversion of Control" pattern. I guess, your problem is, you are so used to it, you don't realize it's that simple.
Let's start at the beginning.
In the early days programs followed a given path through the code. The order of the called functions was given by the programmer.
In interactive programs, e.g. mostly ANY program, you can not say, which function is called at what time. Just look at a GUI or website. You can not say, at what time what button or link is clicked. So the "control" of what's happening is no longer at the program, it's at an outer source. The "control" has been inverted. The function is no longer "acting" it is instead "listening". Think of the hollywood principle: "Don't call us, we call you". A listener is a good example for a realisation of this pattern.
IoC is realized by functions or "methods" in the "object oriented world" of today.
"Dependency Injection" now means the same, but not for "methods", which do something, but for "objects", which hold data.
The data is no longer part of the object holding it. It is "injected" into the object at runtime. To stay in hollywood, think of a film star, playing golf to talk about the business, but to keep in shape, she hungers herself down, minimizing her muscle weight and therefore she is only able to carry one club at a time.
So, on the golf course her game would heavily depend on the one club, she is carrying.
Lucky for her, there are caddies, carrying a whole lot of clubs at one time, and also having the knowledge what club to use at what time. Now she is independent of her limited possibility to carry golf clubs. "Don't think about a concrete club to wear, we know them all and give you the right one at the right time".
The film star is the object and the golf clubs are the members of the object. That's dependency injection.
Maybe focus on the "injection" part? When I see that term, I think of syringes. The process of pushing the dependencies of a component to the component can be thought of as injecting into the component.
Just like with the body, when there is something that it needs in the way of medicine (a component that it needs) you can inject it into the body.
In their 2003 JavaPolis presentation (slides), Jon Tirsén & Aslak Hellesøy had an amusing analogy with a Girl object that needs a Boy to kiss. I seem to remember that the BoyFactory is sometimes known as a 'nightclub', but that's not in the slides.
Another analogy: let us say you are a developer and whenever you like you order computer science books from the market directly - you know the sellers and their prices. In fact your company might have a preferred seller and you contact them directly. All this works fine but may be a new seller is now offering better prices and your company wants to change the 'preferred' seller.
At this point you have to make the following changes - update the contact details (and other stuff) so as to use the new seller. You still place the order directly.
Now consider we introduce a new step in between, there is a 'library' officer in the company and you have to go through him to get the books. While there is a new dependency, you are now immune to any changes to the seller: either the seller changes mode of payment or the seller himself is changed, you now simply put an order to the librarian and he gets the books for you.
From Head First Design Patterns:
Remember, code should be closed (to change) like the lotus flower in the evening, yet open (to extension) like the lotus flower in the morning
A DI-enabled object can be configured by injecting behaviors defined in other classes. The original object structure doesn't have change in order to create many variations. The injection can be made explicit by having a class request other worker-classes in its constructor, or it can be less obvious when using monkeypatching in dynamic languages like Python.
Using an analogy of a Person class, you can take a basic human framework, pass it a set of organs, and watch it evolve. The Person doesn't directly know how the organs work, but their behaviors confirm to an expected interface and influence the owner's physical and mental manifestation.
A magician's sleight of hand! What you may think you see may be secretly manipulated or replaced.
Life is full of dependency injection analogies:
printer - cartridge
digital device - battery
letter - stamp
musician - instrument
bus - driver
sickness - pill
The essence of Inversion of Control (of which Dependency Injection is an implementation) is the separation of the use of an object from the management thereof.
The analogy/example I use is an engine. An engine requires fuel to run, i.e. it is depdendent on fuel. However, the engine cannot be responsible for the fuel it needs. It just 'asks' for fuel, and it is provided (typically by a fuel pump in a car).
The analogy starts breaking down when you look too deep, in that an engine doesn't ask for fuel, it is given it by some kind of management element, like an ECU. One might be able to compare the ECU to a container but I'm not certain how valid this is.
Your project manager asks you to write an app.
You could just write some code based on your career experience so far, but it's unlikely to be what your PM wants.
Better would be if your PM dependency injected you with say a spec for the app. Now your code is going to be related to the spec he gives you.
Better if you were told where the source repository was.
Better if you were told what the tech platform was.
Better if you were told when this needed to be done by.
Etc..
I think a great analogy is a six-year-old with a lego set.
You want your objects to be like the lego bricks. Each one is independent of all the others, and yet offers a clear interface for connecting them to the others. When connecting them together, it doesn't really matter exactly which two bricks you hook together so long as they have a matching interface.
Your dependency injection framework is like the six-year-old. He follows the instructions (i.e., your config file, annotations, etc.) to connect specific bricks together in certain ways to make a particular model.
Of course, since the bricks' interfaces are pretty generalized, they can go together in lots of different ways, so it's easy to come up with new sets of instructions which the six-year-old can use to make a completely different model out of the same bricks.

How do you make code reusable? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
Any code can be reused in a way or an other, at least if you modify the code. Random code is not very reusable as such. When I read some books, they usually say that you should explicitly make the code reusable by taking into account other situations of code usage too. But certain code should not be an omnipotent all doing class either.
I would like to have reusable code that I don't have to change later. How do you make code reusable? What are the requirements for code being reusable? What are the things that reusable code should definitely have and what things are optional?
See 10 tips on writing reusable code for some help.
Keep the code DRY. Dry means "Don't Repeat Yourself".
Make a class/method do just one thing.
Write unit tests for your classes AND make it easy to test classes.
Remove the business logic or main code away from any framework code
Try to think more abstractly and use Interfaces and Abstract classes.
Code for extension. Write code that can easily be extended in the future.
Don't write code that isn't needed.
Try to reduce coupling.
Be more Modular
Write code like your code is an External API
If you take the Test-Driven Development approach, then your code only becomes re-usable as your refactor based on forthcoming scenarios.
Personally I find constantly refactoring produces cleaner code than trying to second-guess what scenarios I need to code a particular class for.
More than anything else, maintainability makes code reusable.
Reusability is rarely a worthwhile goal in itself. Rather, it is a by-product of writing code that is well structured, easily maintainable and useful.
If you set out to make reusable code, you often find yourself trying to take into account requirements for behaviour that might be required in future projects. No matter how good you become at this, you'll find that you get these future-proofing requirements wrong.
On the other hand, if you start with the bare requirements of the current project, you will find that your code can be clean and tight and elegant. When you're working on another project that needs similar functionality, you will naturally adapt your original code.
I suggest looking at the best-practices for your chosen programming language / paradigm (eg. Patterns and SOLID for Java / C# types), the Lean / Agile programming literature, and (of course) the book "Code Complete". Understanding the advantages and disadvantages of these approaches will improve your coding practice no end. All your code will then become reausable - but 'by accident', rather than by design.
Also, see here: Writing Maintainable Code
You'll write various modules (parts) when writing a relatively big project. Reusable code in practice means you'll have create libraries that other projects needing that same functionality can use.
So, you have to identify modules that can be reused, for that
Identify the core competence of each module. For instance, if your project has to compress files, you'll have a module that will handle file compression. Do NOT make it do more than ONE THING. One thing only.
Write a library (or class) that will handle file compression, without needing anything more than the file to be compressed, the output and the compression format. This will decouple the module from the rest of the project, enabling it to be (re)used in a different setting.
You don't have to get it perfect the first time, when you actually reuse the library you will probably find out flaws in the design (for instance, you didn't make it modular enough to be able to add new compression formats easily) and you can fix them the second time around and improve the reusability of your module. The more you reuse it (and fix the flaws), the easier it'll become to reuse.
The most important thing to consider is decoupling, if you write tightly coupled code reusability is the first casualty.
Leave all the needed state or context outside the library. Add methods to specify the state to the library.
For most definitions of "reuse", reuse of code is a myth, at least in my experience. Can you tell I have some scars from this? :-)
By reuse, I don't mean taking existing source files and beating them into submission until a new component or service falls out. I mean taking a specific component or service and reusing it without alteration.
I think the first step is to get yourself into a mindset that it's going to take at least 3 iterations to create a reusable component. Why 3? Because the first time you try to reuse a component, you always discover something that it can't handle. So then you have to change it. This happens a couple of times, until finally you have a component that at least appears to be reusable.
The other approach is to do an expensive forward-looking design. But then the cost is all up-front, and the benefits (may) appear some time down the road. If your boss insists that the current project schedule always dominates, then this approach won't work.
Object-orientation allows you to refactor code into superclasses. This is perhaps the easiest, cheapest and most effective kind of reuse. Ordinary class inheritance doesn't require a lot of thinking about "other situations"; you don't have to build "omnipotent" code.
Beyond simple inheritance, reuse is something you find more than you invent. You find reuse situations when you want to reuse one of your own packages to solve a slightly different problem. When you want to reuse a package that doesn't precisely fit the new situation, you have two choices.
Copy it and fix it. You now have to nearly similar packages -- a costly mistake.
Make the original package reusable in two situations.
Just do that for reuse. Nothing more. Too much thinking about "potential" reuse and undefined "other situations" can become a waste of time.
Others have mentioned these tactics, but here they are formally. These three will get you very far:
Adhere to the Single Responsibility
Principle - it ensures your class only "does one thing", which means it's more likely it will be reusable for another application which includes that same thing.
Adhere to the Liskov
Substitution Principle - it ensures your code "does what it's supposed without surprises", which means it's more likely it will be reusable for another application that needs the same thing done.
Adhere to the Open/Closed Principle - it ensures your code can be made to behave differently without modifying its source, which means it's more likely to be reusable without direct modification.
To add to the above mentioned items, I'd say:
Make those functions generic which you need to reuse
Use configuration files and make the code use the properties defined in files/db
Clearly factor your code into such functions/classes that those provide independent functionality and can be used in different scenarios and define those scenarios using the config files
I would add the concept of "Class composition over class inheritance" (which is derived from other answers here).
That way the "composed" object doesn't care about the internal structure of the object it depends on - only its behavior, which leads to better encapsulation and easier maintainability (testing, less details to care about).
In languages such as C# and Java it is often crucial since there is no multiple inheritance so it helps avoiding inheritance graph hell u might have.
As mentioned, modular code is more reusable than non-modular code.
One way to help towards modular code is to use encapsulation, see encapsulation theory here:
http://www.edmundkirwan.com/
Ed.
Avoid reinventing the wheel. That's it. And that by itself has many benefits mentioned above. If you do need to change something, then you just create another piece of code, another class, another constant, library, etc... it helps you and the rest of the developers working in the same application.
Comment, in detail, everything that seems like it might be confusing when you come back to the code next time. Excessively verbose comments can be slightly annoying, but they're far better than sparse comments, and can save hours of trying to figure out WTF you were doing last time.

Resources