I am currently a single BI developer over a corporate data warehouse and cube. I use SQL Server 2008, SSAS, and SSIS as my basic toolkit, and Visual Studio + BIDS and TFS for my IDE and source control. I am about to take on multiple projects with an offshore vendor, and I am worried about managing change. My major concern is managing merges and changes between me and the offshore team. Merging and managing changes to SQL and XML is bad enough for just one person, but with multiple developers it seems like a nightmare. Any thoughts on how best to structure development, knowing that sometimes there is no way to avoid multiple individuals making changes to the same file?
SSIS, SSAS and SSRS files are not merge-friendly. They are stored as XML files that change drastically even with minor edits (such as changing a property), so merging them is practically impossible.
So stop thinking about parallel development on one file. Instead, think about how you can ensure people never need to develop the same file in parallel. Start by disabling multiple checkout of a file. You might even want to enable the option to get the latest version on checkout.
Then think about how people can work independently. This is more about how you structure the work and the files:
Give people their own area to work on. One SSIS package is developed by only one person at any given moment in time.
Make smaller files, so the chance that two people need to work in the same file is small.
I have given feedback to the product team about the inability of BIDS files to merge. It is a known issue, but it will be hard to tackle. They don't know when it will be possible to really do parallel development on these files. Until then, keep away from parallel development.
As Ewald Hofman mentioned, SSAS and SSIS are not merge-friendly.
One environment I worked in solved the problem as follows:
only use SSIS when you have to (a fuzzy-matching algorithm or something similar). Replace SSIS packages with SQL code wherever you can (for instance, linked servers for data synchronization and the MERGE command for building dimension and fact tables; see the sketch after this list)
build your data warehouse structure as follows:
build 2 databases, one for the "raw source data" from the source systems and one (the "stage" database) for the dimension and fact views and tables
use procedures that can deploy the whole "stage" database
put the structure for the "stage" database into your Repository
build a C# application that builds your dimensions and cubes via the AMO API (I know, that's a tough job at the beginning, but it is worth it - think of what you gain, and look at the Pros below)
add the stage database and the C# application to your Repository (TFS/Git etc.)
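To illustrate the MERGE-instead-of-SSIS idea from the first step, here is a minimal sketch: a C# snippet that runs a T-SQL MERGE to upsert a dimension table from the raw source data (MERGE is available from SQL Server 2008 on). The connection string and the DimCustomer/src.Customer names are assumptions for the example.

    using System.Data.SqlClient;

    class LoadDimensions
    {
        static void Main()
        {
            // Hypothetical upsert of a customer dimension from the raw-source database.
            const string mergeSql = @"
                MERGE dbo.DimCustomer AS target
                USING src.Customer AS source
                    ON target.CustomerKey = source.CustomerKey
                WHEN MATCHED THEN
                    UPDATE SET target.Name = source.Name
                WHEN NOT MATCHED BY TARGET THEN
                    INSERT (CustomerKey, Name)
                    VALUES (source.CustomerKey, source.Name);";

            // Connection string is an assumption; point it at your stage database.
            using (var conn = new SqlConnection(
                "Data Source=.;Initial Catalog=Stage;Integrated Security=true"))
            using (var cmd = new SqlCommand(mergeSql, conn))
            {
                conn.Open();
                cmd.ExecuteNonQuery();
            }
        }
    }

Because the statement lives in plain text (a .sql or .cs file), it diffs and merges cleanly in your repository, unlike a dataflow buried in package XML.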
Pros of that structure:
you have a merge-able structure you can put in your Repository
you are using the AMO API, which means:
you can automate the generation of new partitions (see the sketch after this list)
you can use procedures to automate and clone measure groups across different cubes (which I think is sometimes a big benefit!)
you could outsource your translation and import it easily (the cube designer is probably not the best translator)
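As a taste of the partition automation mentioned above, here is a minimal AMO sketch (namespace Microsoft.AnalysisServices). The server, database, cube, measure-group, and fact-table names are hypothetical, and the MOLAP query binding is just one reasonable configuration.

    using Microsoft.AnalysisServices;

    class AddPartition
    {
        static void Main()
        {
            var server = new Server();
            server.Connect("Data Source=localhost"); // hypothetical SSAS instance

            // Navigate to the measure group that should get a new yearly partition.
            Database db = server.Databases.FindByName("SalesDW");
            Cube cube = db.Cubes.FindByName("Sales");
            MeasureGroup mg = cube.MeasureGroups.FindByName("Fact Sales");

            // Bind the new partition to a one-year slice of the fact table.
            Partition part = mg.Partitions.Add("Fact Sales 2013");
            part.StorageMode = StorageMode.Molap;
            part.Source = new QueryBinding(
                db.DataSources[0].ID,
                "SELECT * FROM FactSales WHERE SalesYear = 2013");

            part.Update();                         // push the definition to the server
            part.Process(ProcessType.ProcessFull); // load the new partition

            server.Disconnect();
        }
    }

Run something like this from a scheduled job (or loop it over years), and new partitions appear without anyone opening BIDS.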
Cons:
the vendor would probably not adopt that structure
you have to pay more (because of either higher skill requirements or teaching them your individual structure)
you probably need knowledge of a new language, C#, if you don't have it already
Conclusion:
there are possibilities to get a merge-friendly environment
you will lose nice click-and-run tools such as BIDS, but you will gain highly automated processes
outsourcing may be unprofitable because of the high degree of customization
As long as both teams are using BIDS and TFS, this should not be a problem.
Assuming that your T-SQL code is checked in to source control as a single file per object, merging T-SQL code is straightforward since it is text-based. I have found that VSTS database projects help with this.
Merging the XML-based source files of SSIS and SSAS can be cumbersome, as you indicate. To alleviate some of the pain, I find that keeping each package limited to a single dataflow or logical unit of work helps reduce developer contention on packages. I then call these packages from one or more master packages. I also try to externalize all of my T-SQL source queries using sprocs, views, or UDFs so that the need to edit the package is further reduced. Using configuration files and variables also helps to a smaller extent.
SSAS cubes are a little bit tougher. My best suggestion is to look into a third-party XML differencing tool. I have been able to successfully merge small changes using the standard text-based tools, but it can be a daunting task.
My questions are listed at the end of this post, but reading the background info first will help you understand the context of the questions.
Background
I've had some limited experience working with TFS (on-premise) as a development team member in the past, and I'm generally familiar with the basic concepts. But now I'm also tasked with administering a VSTS instance (cloud). Most of my experience using TFS was centered around the project management features (backlogs, sprints, work items, etc.).
I'm rolling out VSTS in two phases. The first phase focuses on the non-technical project management aspects of the configuration. The second phase will focus on technical aspects such as managing source code and other development artifacts. I have some experience working with source control using a TFVC repository, but no experience with Git repositories.
I completed the preliminary design of the basic configuration (Teams, Area Path, Iteration Path, etc.) for VSTS. I suspect that my Area Path will need adjustment at some point in the near future. From what I've read, VSTS does a decent job of automatically updating Work Items (e.g. Features, Product Backlog Items, Tasks, etc.) when changes to the Area Path are made.
Questions
What I'd like to know is how other technical areas of the product (e.g. code repositories, build definitions, etc.) are affected when changes are made to Area Path. For example, if I inserted a completely new node somewhere within the Area Path, how badly will that "break" existing VSTS functionality (and what type of functionality will be broken and require manual repair)?
This is an easy one:
Area paths only affect work item tracking. Period.
We have a project with multiple customers. Suppose the application name is Ali-Systems. Foo company will have Ali-Systems Foo, Bar company will have Ali-Systems Bar, and so on. All versions have almost the same structure, but their logic differs. For example, there's a BillIssue method in all versions, but its logic may be different - either how it calculates, or even differences between their DB models (we are using ASP.NET MVC with Entity Framework).
As I understand it, there are some branching strategies categorized by Microsoft, like Development Isolation (which enables concurrent development of the next release, experimentation, or bug fixes), Release Isolation (which enables concurrent release management), etc.
Can I take advantage of branching for this purpose, and how? Or should I create a separate repository for each customer's project?
NOTE: In fact, I want to clone the base of a project into several derived projects that will never be merged back together. But we must be able to add features to all derived projects (probably with the help of another branch).
Microsoft has the Visual Studio Team Foundation Server Branching and Merging Guide, which can be used to choose a branching structure, including Main Only, Development isolation, Feature isolation, Release isolation, Servicing and Release isolation, and Servicing/Hotfix/Release isolation.
In your case, the core of the project is similar, but each company has its own personalization of the product (logic / DB models) and may evolve independently in the future.
With a branching structure, the main limitation is that it becomes a maintenance nightmare for bug fixes and global changes.
For example, merging changes from one company to another is very complex and requires retesting each version separately. Moreover, for bug tracking, one team will fix a bug in their version, and another team will have to merge that change from a version they don't fully understand.
To support such a development scenario, I suggest sharing an extensible core and data model that can be configured from the upper, customizable application layers. This core should be used as the base for each customer/company-specific customization and be maintained by a separate team (for each company). It will add some management complication, because multiple project managers may need resources from the same team, but it is a good way to keep the architecture consistent, control the whole process, and keep the versions in sync.
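As a minimal sketch of that shared-core idea (BillIssue comes from the question; every other name here is hypothetical): the core owns the shared pipeline and exposes extension points, and each company's layer overrides only what differs.

    public class Order { public decimal Amount { get; set; } }
    public class Invoice { public decimal Total { get; set; } }

    // The shared core: one pipeline, with the company-specific step abstracted out.
    public abstract class BillingCore
    {
        public Invoice BillIssue(Order order)
        {
            Validate(order);                       // shared rules
            var invoice = CalculateInvoice(order); // company-specific logic
            Save(invoice);                         // shared persistence (e.g. EF)
            return invoice;
        }

        protected abstract Invoice CalculateInvoice(Order order);

        private void Validate(Order order) { /* shared validation */ }
        private void Save(Invoice invoice) { /* shared persistence */ }
    }

    // Ali-Systems Foo: its own calculation, everything else inherited from the core.
    public class FooBilling : BillingCore
    {
        protected override Invoice CalculateInvoice(Order order)
        {
            return new Invoice { Total = order.Amount * 1.10m }; // Foo's rules
        }
    }

    // Ali-Systems Bar: a different calculation on the same core.
    public class BarBilling : BillingCore
    {
        protected override Invoice CalculateInvoice(Order order)
        {
            return new Invoice { Total = order.Amount + 50m }; // Bar's rules
        }
    }

The core lives in one place (a main branch or shared repository) that every derived project references, so a bug fix in the core reaches all versions without cross-branch merges.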
Besides, if a person or team needs to work across multiple companies on a local machine, one way to keep the local environment clean is using multiple workspaces with Visual Studio.
Update
On a completely unrelated search, I have found this: Lightweight SQL database which doesn't require installation which lists a few possible options with about the same goals.
Original post
We have a desktop .NET/WPF app with large (for a desktop app) amounts of stored data: it has layouts, templates, a product list with technical specs, and much more.
Today it's stored in an Access DB, but we have hit Access's limitations pretty hard: it's not very fast, the DB weighs 44 MB (which results in a large setup package), and more importantly, it's a pain to use with version control, because we can't merge the data from one branch to another. For instance, we might create a branch to add a few products, but then we have to add them manually in the trunk when we merge. We could use SQL scripts, but writing advanced SQL scripts for Access is a pain.
Basically, I want to replace the MS Access DB with another storage format, because Access is not well adapted.
I had thought of using JSON files that would be unzipped during or after install, but I'm a bit afraid of performance problems.
I'm also thinking of splitting the data into multiple files with multiple formats, depending on its usage, but using different formats might get complicated or annoying to develop.
Performance
Some parts of the DB are accessed pretty often and should be performance-optimized, whereas others are accessed maybe 1 or 2 times per work session, and using a poor-performance but high-compression format could be OK.
Size
We want the installer to be the smallest possible, so the library should be small, and the format should use small files. Using a library that adds 5 Mb to the installer file is out of the question.
Compatibility
The software must be able to run on .Net 4 (not 4.5), and it would be great if it ran on Windows XP (even though we're thinking more and more of just abandoning it going forward, it's still more than 7% of our market share).
Moreover, it should not require installing a server (it should be embedded, like MS Access or SQLite), because it will be installed on end users' computers, and we don't want to bloat them.
Versioning
It should be easy to version the data and the DB structure. The file should either be a text file (like JSON), or scripts should be easy to run in the continuous integration platform (like SQL server).
So, which technology would you use that meets all these constraints?
Thanks!
As for your version control pains: from your description, if I were you I'd keep the raw data in version-controlled text files and have the build process produce the database from them. That way, you should be able to use SQLite.
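A minimal sketch of that build step, using the System.Data.SQLite ADO.NET provider (which targets .NET 4 and needs no server install); the CSV layout, file names, and Products schema are assumptions for the example.

    using System.Data;
    using System.Data.SQLite;
    using System.IO;

    // Build-time tool: regenerates products.db from version-controlled CSV files.
    class BuildDatabase
    {
        static void Main()
        {
            SQLiteConnection.CreateFile("products.db"); // overwrites any stale copy

            using (var conn = new SQLiteConnection("Data Source=products.db"))
            {
                conn.Open();

                using (var cmd = conn.CreateCommand())
                {
                    cmd.CommandText =
                        "CREATE TABLE Products (Id INTEGER PRIMARY KEY, Name TEXT, Spec TEXT)";
                    cmd.ExecuteNonQuery();
                }

                // One transaction for the whole import keeps the build fast.
                using (var tx = conn.BeginTransaction())
                using (var cmd = conn.CreateCommand())
                {
                    cmd.CommandText =
                        "INSERT INTO Products (Id, Name, Spec) VALUES (@id, @name, @spec)";
                    var id = cmd.Parameters.Add("@id", DbType.Int32);
                    var name = cmd.Parameters.Add("@name", DbType.String);
                    var spec = cmd.Parameters.Add("@spec", DbType.String);

                    // products.csv (id;name;spec) is the merge-friendly source of truth.
                    foreach (var line in File.ReadLines("data/products.csv"))
                    {
                        var parts = line.Split(';');
                        id.Value = int.Parse(parts[0]);
                        name.Value = parts[1];
                        spec.Value = parts[2];
                        cmd.ExecuteNonQuery();
                    }
                    tx.Commit();
                }
            }
        }
    }

The CSV files then merge like any other text across branches, and the database itself never needs to be checked in.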
I would go for SQLite in your case, since the files are self-contained and easy to locate (hence easy to manage in a version control system), the installer stays small, and performance is good.
http://www.sqlite.org/
I have a problem: I have several versions of the same application, but the process of duplicating and managing several duplicate applications is becoming very complex, as each copy gets unique features on client demand.
What methods are used to simplify this process?
Do I need to have detailed documentation about every App?
I'm trying to separate the code into modules and add them according to each client's demands - am I on the right path?
Sorry for the bad English; if anything is unclear, just ask - I'm always online.
This can be managed in your code revision system. Git and Mercurial allow you to manage code as "change sets". You could have a branch for each client, and have a main branch (trunk) where you add features for everybody. In the client branches, you add feature sets for individual clients. If you want to merge them back to the trunk, you can. You can also merge from the trunk to branches.
Of course, it's important to develop in a modular way in order to facilitate this approach. Also, unit tests speed things along when you have to merge.
I am pretty new to TFS and source control. I am unable to understand the advantage of branching, since I can do the same thing by creating two folders, main and development; when I am done with development, I can merge the code into the main folder using any diff tool.
So what's the point of having branches? I know there must be a huge advantage, but I am unable to see it.
(UPDATE: TFS now supports git for version control so the rest of this answer no longer applies)
I would google branch-per-feature.
The main advantage of branching is that you can work on a feature and not be interrupted by anyone else's work. When you are ready, you can merge and see whether the features work well together or not. This is usually done as the feature is developed, but for small features it can be done once the feature is complete.
The advantage is that you have a clear history of what you did to implement something. Without branches, you would have a whole lot of commits mixed together with other features' commits. If QA does not pass a certain feature, you have your work cut out for you to put together another build using just the commits for the other features. The other alternative is to try and fix your feature so that QA passes. This may not be doable on a Friday afternoon.
Feature toggles are another way to omit unfinished work from a build, but they increase the complexity of the code, and the toggles themselves may have bugs in them. This is something to be very wary of; question how this became an "acceptable" workaround.
Branches are also used to track changes to multiple versions of releases. Products that are consumed by multiple customers may be in a situation that one set of customers is using 1.0 of the product while others are already on 2.0. If you support both, you should track changes to each by branches that are designated to them. The previous points still apply to developing for these branches.
Having said that, TFS is not ideal at branch-per-feature for a number of reasons. The biggest is that it does not support 3-way merges - it only has what is called a baseless merge. The way history is tracked, TFS cannot show you a common ancestor between the feature branch and where you are trying to merge it to. This leaves you potentially solving a lot of conflicts. In general, a lot of people that use TFS shy away from branching for this reason.
3-way merges are great because they will show you what the common ancestor is, what your changes are and what the changes in the other branch are. This will allow you to make a very educated decision on how to resolve a conflict.
If you have to use TFS, I would suggest using git-tfs to be able to take advantage of 3-way merges and many other features. Some of them include: rerere, rebasing, disconnected model, local history, bisect, and many many more.
Rebase is very useful as it allows you to re-base a feature onto another starting point, omit commits, squash commits together, split commits, etc. Once ready, you can then merge into an integration or release branch, depending on the workflow you decide upon.
Mercurial is another option that may be easier to use, but it will not be as powerful in the long run.
If you have the opportunity, I would highly recommend moving away from TFS for source control due to a lot of limitations when compared to modern day DVCS.
Here is a nice set of guidelines to follow if you want to effectively manage branching/merging:
http://dymitruk.com/blog/2012/02/05/branch-per-feature/
Hope this helps.
There is a lot of information to read through, but there is TFS Branching Guidance located here if it helps at all - http://tfsbranchingguideiii.codeplex.com/