What is the difference between GetItems and GetExtendedItems in TFS SDK - tfs

I'm at my first tries with the TFS SDK (Microsoft.TeamFoundation.VersionControl.Client), and when it came time to retrieve objects, I got confused about why and when I should use VersionControlServer.GetItems vs VersionControlServer.GetExtendedItems. What are the differences? Performance? Features?
Thank you! :)

Yes, you have a tradeoff between performance and features. You can imagine that GetItems is a simple query, whereas GetExtendedItems is a join on another table (or tables), and is therefore less efficient.
An Item, for example, contains information about an item at a particular version. An ExtendedItem adds information about your version of that file as it exists in the workspace that you've specified in the query. If you have done a Get on that file, then those fields will be populated with the version that exists on your local disk and any pending changes that you've made to it.
ExtendedItems largely exist for the Source Control Explorer view; it can display information about both the items on the server and their status in your local repository in a single query. This reduces the number of round-trips that view makes, but the ExtendedItems query is more expensive than a query for simple Items.
If GetItems will give you the data that you need, you should prefer that. If not, use GetExtendedItems.
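For reference, here's a minimal sketch of the two calls; the server path is made up, and the exact overloads and return types are from memory, so check the ones available in your SDK version:

using Microsoft.TeamFoundation.VersionControl.Client;

class ItemQueries
{
    // vcs is an already-connected VersionControlServer.
    static void Query(VersionControlServer vcs)
    {
        // Server-side view only: items as they exist on the server at the requested version.
        ItemSet items = vcs.GetItems("$/MyProject/Src", RecursionType.Full);

        // Server view plus workspace state (local version, pending changes), hence the extra cost.
        ExtendedItem[] extended = vcs.GetExtendedItems(
            "$/MyProject/Src", DeletedState.NonDeleted, ItemType.Any);
    }
}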

Related

Managing Schema/Data In Static/Fixed-Content Dimensions with Lakehouse

In the absence of DML (we're not leveraging Delta Lake as of yet), I'm looking for ways to manage Static/Fixed-Content Dimensions in a Data Lakehouse (e.g. Gender, OrderType, Country).
Ideally the schema and data within these dimensions would be managed by non-technical staff, but at this point I'm just looking for development patterns to support the concept technically without being able to use DML. Preferably with history on the source (who added 10 rows to this dimension?).
Thank you in advance for any assistance.
The Lake/Lakehouse/Warehouse should not be the system-of-record for any of your data. So you have another system where that data is "mastered" and then you copy it. Just like anything else.
The "system-of-record" can be a SharePoint List, an Excel workbook on OneDrive, a Dataverse table, or whatever you find convenient and is accessible to the people managing the data.

Load Entire Neo4j Database into Linkurious's SigmaJS

How can I load an entire Neo4j database into Linkurious's SigmaJS Graph API? On that page, I don't see any methods that describe how to import a database in its entirety -- only how to build out a graph manually by adding nodes and edges. I suspect that the read() function almost does what I want (reading in an object), but it is unclear in what format I must supply this object.
It would be great to be able to simply pass in the graph.db folder within my Neo4j folder.
I think you've got the idea of the library correct. It's a general-purpose library for displaying graph visualizations, not specific to any graph database. I also suspect that it's not going to hold your entire database effectively (it depends on the size). The idea is to load in the required subset of the data and make it easy to display and work with that data.
The Linkurious team could correct me if I'm wrong here, though ;)

Handling Huge data from TAdoQuery in delphi

Programming Language: Delphi 6
SQL Server on the back end.
Problem:
The application used to hit the DB each time we needed something; it ended up hitting it more than 2,000 times to get certain things, which made the application slow. This happened for a lot of tables, each with a different structure and a different number of columns, so I'm trying to reduce the number of calls.
We can have about 4,000 records at a time from each table.
Proposed Solution:
Let's get all the data from the DB at once and use it when we need it, so we don't have to keep hitting the DB.
How Solution is turning out so far:
This version of Delphi doesn't have a dictionary type, so we already have a dictionary implementation built on a string list (let us assume that implementation is good).
Solution 1:
Store this in the dictionary that we created, with:
A unique field as the key.
The rest of the data added as strings in a string list, separated like this:
FieldName1:FieldValue1,FieldName2:FieldValue2,…
We might have to create about 2,000 string lists to map data to keys.
I took a look at the following link:
How Should I Implement a Huge but Simple Indexed StringList in Delphi?
It looks like they could move to a different DB, which isn't possible for me.
Is this a sane solution?
Solution 2:
Store this in a dictionary of lists.
Each list would hold Delphi records.
Records can't be added to a TList directly, so I took a look at this link:
Delphi TList of records
Solution 3:
Or, given that I'm using TAdoQuery, should I use Seek or Locate to find my records?
Please advise on the best way to do this.
Requirements:
Need random access to this data.
Data is inserted only once, when we fetch everything for a table as we need it.
Only need to read the data; we don't have to modify it.
Constantly need to search by primary key.
In addition to changing the application, we have already done good indexing on the DB to take care of things on the DB side. This is more about making things run well in the application.
This sounds like a perfect use case for TClientDataSet. It's an in-memory dataset that can be indexed, filtered, and searched easily, can hold any information you can retrieve from the database using a SQL statement, and has pretty good performance over a few thousand reasonably sized rows of data. (The link above is to the current documentation, as I don't have the Delphi 6 docs available. They should be very similar, although I don't recall which specific version added the ability to directly include MidasLib in your uses clause to eliminate distributing Midas.dll with your app.)
Carey Jensen wrote a series of articles about it a few years back that you might find useful. The first one can be found in A ClientDataset in Every Database Application - the others in the series are linked from it.

Being able to save (multiple) changes without affecting database - structural issue

I'm developing a business web application which will maintain a database with several tables. One of the requirements is that one should be able to "save your work" without affecting the database and then later pick it up and continue working. Multiple "saves" must be supported.
This administration tool will be developed in ASP.NET MVC4 or Microsoft's LightSwitch, I haven't decided yet.
The problem I have is that I don't know how to solve this structurally. Are there any known techniques for this problem? I need someone to point me in the right direction; I'm stuck here.
EDIT: I'll try to explain further with a scenario
I make a change to one row and save, but the change should only be visible to me (not affect the main database).
I realize the change in 1 is bad and choose to start over, changing the data in the same row; I also make a change to another row. I save these changes (but, again, only for me).
Now I have two saves (from steps 1 and 2). I change my mind and decide the changes made in 1 are correct, so I open that "save file" and commit the changes to the main database. I'll then delete the "save file" from step 2.
Hope that makes the situation more clear.
Thanks
The easiest way that I can think of is to let the database do the work for you.
You could:
Just add some type of "status" column ("committed", "uncommitted", etc.) to the table, and filter out any "uncommitted" records in any grid that displays "real" data. You can then filter differently in your editing grid so that it shows only "uncommitted" records, or, if you save an ID instead of a status, only your own records (sketched below).
Alternatively, add another table to hold the uncommitted records, rather than "pollute" the actual table with extra columns.
Does that make sense?
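As a rough illustration of the status-column option (the WorkItem entity, the OwnerId field, and the IQueryable source are invented for the sketch, not taken from the question):

using System.Linq;

enum RecordStatus { Committed = 0, Uncommitted = 1 }

class WorkItem
{
    public int Id { get; set; }
    public string Data { get; set; }
    public RecordStatus Status { get; set; }
    public int OwnerId { get; set; }   // who owns the uncommitted change
}

static class WorkItemQueries
{
    // Grid that shows "real" data: hide anything not yet committed.
    public static IQueryable<WorkItem> ForDisplay(IQueryable<WorkItem> items)
    {
        return items.Where(i => i.Status == RecordStatus.Committed);
    }

    // Editing grid: show only the current user's uncommitted work.
    public static IQueryable<WorkItem> ForEditing(IQueryable<WorkItem> items, int userId)
    {
        return items.Where(i => i.Status == RecordStatus.Uncommitted && i.OwnerId == userId);
    }
}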
If you are really going to build a transactional version-control system, then you have a huge job ahead.
Look at some popular tools like SVN and see the level of complexity and the features they support.
Rolling back a partial transaction inside a database - especially one with constraints and triggers - will be very difficult; almost everything would run into an issue somewhere.
You may also consider storing uncommitted transactions outside the database - like in some local XML structure - then committing the components only once when desired.
Either way, the interface for finding all the records and determining which ones to do what with will be a challenge, never mind the original application.
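A minimal sketch of that outside-the-database idea, assuming a made-up PendingChange shape and plain XmlSerializer files (none of these names come from the question):

using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical shape for one staged edit; adapt it to your real tables/columns.
public class PendingChange
{
    public string Table { get; set; }
    public int RowId { get; set; }
    public string Column { get; set; }
    public string NewValue { get; set; }
}

public static class ChangeSet
{
    static readonly XmlSerializer Serializer = new XmlSerializer(typeof(List<PendingChange>));

    // "Save my work" without touching the main tables: write the edits to a local XML file.
    public static void Save(string path, List<PendingChange> changes)
    {
        using (var stream = File.Create(path))
            Serializer.Serialize(stream, changes);
    }

    // Later, load a save file back so the user can review it and commit it for real (or delete it).
    public static List<PendingChange> Load(string path)
    {
        using (var stream = File.OpenRead(path))
            return (List<PendingChange>)Serializer.Deserialize(stream);
    }
}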

Entity, dealing with a large number of records (> 35 million)

We have a rather large set of related tables with over 35 million related records each. I need to create a couple of WCF methods that would query the database with some parameters (data ranges, type codes, etc.) and return the related result sets (from 10 to 10,000 records).
The company is standardized on EF 4.0 but is open to 4.X. I might be able to make an argument to migrate to 5.0, but that's less likely.
What's the best approach to dealing with such a large number of records using Entity? Should I create a set of stored procs and call them from Entity, or is there something I can do within Entity?
I do not have any control over the databases so I cannot split the tables or create some materialized views or partitioned tables.
Any input/idea/suggestion is greatly appreciated.
At my work I faced a similar situation. We had a database with many tables, most of which contained around 7 to 10 million records each. We used Entity Framework to display the data, but the pages displayed very slowly (90 to 100 seconds); even sorting the grid took time. I was given the task of seeing whether it could be optimized, and after profiling it (ANTS profiler) I was able to get it under 7 seconds.
So the answer is yes, Entity Framework can handle loads of records (in the millions), but some care must be taken:
Understand that the call to the database is made only when the actual records are required; all the other operations just build up the query (SQL). So try to fetch only the piece of data you need rather than requesting a large number of records, and trim the fetch size as much as possible.
Yes, you not only should, you must use stored procedures: import them into your model and create function imports for them. You can also call them directly via ExecuteStoreCommand() and ExecuteStoreQuery<>(). The same goes for functions and views, but EF has a really odd way of calling functions: "SELECT dbo.blah(@id)".
EF performs more slowly when it has to populate an entity with a deep hierarchy, so be extremely careful with entities that have deep hierarchies.
Sometimes, when you are requesting records and are not required to modify them, you should tell EF not to watch for property changes (AutoDetectChanges); that way record retrieval is much faster.
Indexing your database is always good, but with EF it becomes very important: the columns you use for retrieval and sorting should be properly indexed.
When your model is large, the VS2010/VS2012 model designer gets really unwieldy, so break your model into medium-sized models. There is a limitation that entities from different models cannot be shared, even though they may point to the same table in the database.
When you have to make changes to the same entity in different places, try to pass the same entity around and send the changes only once, rather than having each place fetch a fresh copy, make its changes, and store it (a real performance-gain tip).
When you need the info from only one or two columns, try not to fetch the full entity; you can either execute your SQL directly or project into a mini entity (see the sketch after this list). You may also need to cache some frequently used data in your application.
Transactions are slow, so be careful with them.
If you keep these things in mind, EF should give you almost the same performance as plain ADO.NET, if not identical.
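A rough sketch of the "trim the fetch" and stored-procedure points above, written against an EF4-style ObjectContext; the OrdersContext, Orders, OrderDate, TotalAmount, and dbo.GetOrdersByRange names are all invented for illustration:

using System;
using System.Data.SqlClient;
using System.Linq;

// Plain DTO for the stored-procedure results (property names must match the returned columns).
class OrderSummary
{
    public int OrderId { get; set; }
    public decimal TotalAmount { get; set; }
}

class QueryExamples
{
    static void Run(DateTime from, DateTime to)
    {
        // OrdersContext is assumed to be a generated EF4 ObjectContext with an Orders entity set.
        using (var context = new OrdersContext())
        {
            // Projection: only the listed columns are fetched, not full entities.
            var summaries = context.Orders
                .Where(o => o.OrderDate >= from && o.OrderDate <= to)
                .Select(o => new { o.OrderId, o.TotalAmount })
                .Take(10000)
                .ToList();

            // Calling a stored procedure directly and mapping the rows onto a small DTO.
            var fromProc = context.ExecuteStoreQuery<OrderSummary>(
                "EXEC dbo.GetOrdersByRange @from, @to",
                new SqlParameter("@from", from),
                new SqlParameter("@to", to)).ToList();
        }
    }
}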
My experience with EF4.1, code first: if you only need to read the records (i.e. you won't write them back), you will gain a performance boost by turning off change tracking for your context:
yourDbContext.Configuration.AutoDetectChangesEnabled = false;
Do this before loading any entities. If you need to update the loaded records, you can always call
yourDbContext.ChangeTracker.DetectChanges();
before calling SaveChanges().
The moment I hear statements like "The company is standardized on EF4, or EF5, or whatever", it sends cold shivers down my spine.
It is the equivalent of a car rental saying "We have standardized on a single car model for our entire fleet".
Or a carpenter saying "I have standardized on chisels as my entire toolkit. I will not have saws, drills, etc."
There is something called the right tool for the right job.
This statement only highlights that the person in charge of making key software architecture decisions has no clue about software architecture.
If you are dealing with over 100K records and the data models are complex (i.e. non-trivial), maybe EF6 is not the best option.
EF6 is based on the concepts of dynamic reflection and has design patterns similar to Castle Project Active Record.
Do you need to load all 100K records into memory and perform operations on them? If yes, ask yourself whether you really need to do that, and why executing a stored procedure across the 100K records wouldn't achieve the same thing. Do some analysis and see what the actual data usage pattern is. Maybe the user performs a search that returns 100K records but only navigates through the first 200. Take a Google search as an example: hardly anyone goes past page 3 of the millions of search results.
If the answer is still yes and you need to load all 100K records into memory and perform operations, then maybe you need to consider something else, like a custom-built write-through cache with lightweight objects, lazy-loaded pointers for nested objects, etc. One instance where I use something like this is large product catalogs for eCommerce sites, where very large numbers of searches get executed against the catalog. The reason is to provide custom behavior such as early-exit search, regex wildcard search using pre-compiled regexes, or custom hashtable indexes into the product catalog.
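For a sense of what such a custom in-memory structure might look like, here's a minimal sketch; Product, ProductCatalog, and the fields are all invented for illustration, and a real catalog would carry far more:

using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative only: a read-mostly, in-memory catalog with a hashtable-style index,
// along the lines of the custom cache described above.
class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
}

class ProductCatalog
{
    private readonly Dictionary<int, Product> byId;

    public ProductCatalog(IEnumerable<Product> products)
    {
        // Build the index once at load time; key lookups are then O(1).
        byId = products.ToDictionary(p => p.Id);
    }

    public Product Find(int id)
    {
        Product p;
        return byId.TryGetValue(id, out p) ? p : null;
    }

    // Early-exit search: Where/Take are lazy, so scanning stops once maxResults matches are found.
    public List<Product> Search(Func<Product, bool> predicate, int maxResults)
    {
        return byId.Values.Where(predicate).Take(maxResults).ToList();
    }
}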
There is no one-size-fits-all answer to this question. It all depends on the data usage scenarios and how the application works with the data. Consider gorilla vs. shark: who would win? It all depends on the environment and the context.
Maybe EF6 is perfect for one piece that would benefit from dynamic reflection, while NetTiers is better for another that needs static reflection and an extensible ORM, and low-level ADO is perhaps best for extreme high-performance pieces.

Resources