I have a rather large Azure Table (30 million rows, 5–100 KB each).
Each RowKey is a Guid, and the PartitionKey is the first segment of that Guid, for example:
PartitionKey = "1bbe3d4b"
RowKey = "1bbe3d4b-2230-4b4f-8f5f-fe5fe1d4d006"
The table handles about 600 reads and 600 writes (updates) per second, with an average latency of 60 ms. All queries specify both PartitionKey and RowKey.
BUT some reads take up to 3000 ms (!). On average, more than 1% of all reads take longer than 500 ms, and there is no correlation with entity size (a 100 KB row may be returned in 25 ms and a 10 KB one in 1500 ms).
My application is an ASP.NET MVC 4 website running on 4–5 Large instances.
I have read all the MSDN articles on Azure Table Storage performance targets and have already done the following:
UseNagle is turned off
Expect100Continue is disabled
MaxConnections for the table client is set to 250 (raising it to 1000–5000 makes no difference)
I also checked that:
The storage account monitoring counters show no throttling errors
There are some kind of "waves" in performance, though they do not depend on load
What could be causing these performance issues, and how can I fix them?
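An average hides exactly the slow reads described above, so it can help to compute tail statistics from raw per-request timings instead. The following is a minimal sketch in Java, used here only for illustration. It assumes you have already collected read latencies in milliseconds. `LatencyStats`, `fractionAbove`, and `percentile` are hypothetical names and are not part of any Azure SDK. The percentile uses the nearest-rank definition.

```java
import java.util.Arrays;

// Hypothetical helper: tail-latency statistics over raw per-read timings (ms).
final class LatencyStats {
    private LatencyStats() {}

    // Fraction of samples strictly greater than thresholdMs (e.g. 500 ms).
    static double fractionAbove(long[] samplesMs, long thresholdMs) {
        if (samplesMs.length == 0) return 0.0;
        long slow = Arrays.stream(samplesMs).filter(ms -> ms > thresholdMs).count();
        return (double) slow / samplesMs.length;
    }

    // Nearest-rank percentile, p in (0, 100]: the smallest sample such that
    // at least p% of all samples are <= it.
    static long percentile(long[] samplesMs, double p) {
        if (samplesMs.length == 0) throw new IllegalArgumentException("no samples");
        if (p <= 0 || p > 100) throw new IllegalArgumentException("p must be in (0, 100]");
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

Reporting the 99th percentile alongside the fraction of reads above 500 ms makes it easier to tell whether a change (connection settings, partitioning, instance count) actually moves the tail rather than just the mean.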
I use the MergeOption.NoTracking setting on the DataServiceContext.MergeOption property for extra performance if I have no intention of updating the entity anytime soon. Here is an example:
var account = CloudStorageAccount.Parse(RoleEnvironment.GetConfigurationSettingValue("DataConnectionString"));
var tableStorageServiceContext = new AzureTableStorageServiceContext(account.TableEndpoint.ToString(), account.Credentials);
tableStorageServiceContext.RetryPolicy = RetryPolicies.Retry(3, TimeSpan.FromSeconds(1));
tableStorageServiceContext.MergeOption = MergeOption.NoTracking;
tableStorageServiceContext.AddObject(AzureTableStorageServiceContext.CloudLogEntityName, newItem);
tableStorageServiceContext.SaveChangesWithRetries();
Another problem might be that you are retrieving the entire entity with all its properties even though you intend to use only one or two of them. This is of course wasteful, but it can't easily be avoided. However, if you use Slazure, you can use query projections to retrieve only the entity properties you are interested in from table storage and nothing more, which should give you better query performance. Here is an example:
using SysSurge.Slazure;
using SysSurge.Slazure.Linq;
using SysSurge.Slazure.Linq.QueryParser;
namespace TableOperations
{
public class MemberInfo
{
public string GetRichMembers()
{
// Get a reference to the table storage
dynamic storage = new QueryableStorage<DynEntity>("UseDevelopmentStorage=true");
// Build the table query and make sure it only returns members who earn more than $60k/yr
// by using a "Where" query filter, and make sure that only the "Name" and
// "Salary" entity properties are retrieved from the table storage to make the
// query quicker.
QueryableTable<DynEntity> membersTable = storage.WebsiteMembers;
var memberQuery = membersTable.Where("Salary > 60000").Select("new(Name, Salary)");
var result = "";
// Cast the query result to a dynamic so that we can access its dynamic properties
foreach (dynamic member in memberQuery)
{
// Show some information about the member
result += "LINQ query result: Name=" + member.Name + ", Salary=" + member.Salary + "<br>";
}
return result;
}
}
}
Full disclosure: I coded Slazure.
You could also consider pagination if you are retrieving large data sets, for example:
// Skip the first 50 members, then retrieve the next 50
var memberQuery = membersTable.Where("Salary > 60000").Skip(50).Take(50);
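For paging, the order of the two operators matters: `Take(50).Skip(50)` would first keep 50 members and then discard all of them, returning nothing. The same semantics can be shown with Java streams, where `skip` and `limit` play the roles of LINQ's `Skip` and `Take`. This is an in-memory illustration only. It says nothing about how Slazure translates these operators into table queries, and the class and method names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// In-memory illustration: skip/limit correspond to LINQ Skip/Take.
final class SkipTakeOrder {
    private SkipTakeOrder() {}

    // Second page of 50: skip the first 50, then take the next 50.
    static List<Integer> skipThenTake(List<Integer> items) {
        return items.stream().skip(50).limit(50).collect(Collectors.toList());
    }

    // Take 50, then skip 50 of those same 50: always empty.
    static List<Integer> takeThenSkip(List<Integer> items) {
        return items.stream().limit(50).skip(50).collect(Collectors.toList());
    }

    // 1..200 stand in for the filtered members.
    static List<Integer> sample() {
        return IntStream.rangeClosed(1, 200).boxed().collect(Collectors.toList());
    }
}
```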
Typically, a query that has to scan a large number of rows takes longer. Is the behavior you are seeing specific to a particular query or data, or does performance vary for the same data and query?
I am working on an application built on ASP.NET MVC 3.0 that displays data in an MVC WebGrid.
I am using LINQ to get the records from the entities into an EntityViewModel, which means converting each record from the entity to an EntityViewModel.
I have 30K records to display in the grid. Each record has 3 flags; for each flag the application has to check whether the record exists in one of 3 other tables and show true or false in the grid.
I display 10 records at a time, but it is very slow because I am fetching all the records and holding them in my application.
Paging is in place (only 10 records are displayed in the WebGrid), but all the records are loaded into the application, which takes 15–20 seconds. I have checked where the processor spends this time: it's in the "painting" step, where every record is compared against the 3 other tables.
I have converted the LINQ query to SQL, and the SQL query executes in under 2 seconds. So I am confident that I don't need to spend time on SQL indexing, as the query itself is fast enough.
I have two options:
1) Caching in MVC
2) Paging (fetching only the first ten records)
I want to go with paging to improve performance.
Now my question is: how do I pass the number 10 (the number of records) to the service method so that it returns only ten records? And how do I get the next 10 records when the user clicks the next page?
I would post the code, but I can't because it contains sensitive data.
Any example of how to tackle this situation would be appreciated. Many thanks.
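Since the time goes into computing the three flags for all 30K records, the key change is ordering: select the page first, then compute the flags only for the rows on it. The following sketch is in Java, for illustration only. `RowViewModel`, `PagedFlags`, and the three id sets are hypothetical stand-ins for the EntityViewModel and the three lookup tables. In a real application the existence checks would be a database query restricted to the ids on the current page, rather than in-memory sets.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical view model: one grid row plus its three existence flags.
final class RowViewModel {
    final int id;
    final boolean inA, inB, inC;

    RowViewModel(int id, boolean inA, boolean inB, boolean inC) {
        this.id = id;
        this.inA = inA;
        this.inB = inB;
        this.inC = inC;
    }
}

final class PagedFlags {
    private PagedFlags() {}

    // Slice out one page (1-based) of ids first, then compute the flags for
    // just those rows; HashSet.contains is O(1) on average.
    static List<RowViewModel> page(List<Integer> allIds, int page, int pageSize,
                                   Set<Integer> tableA, Set<Integer> tableB, Set<Integer> tableC) {
        if (page < 1 || pageSize < 1) throw new IllegalArgumentException("page and pageSize must be >= 1");
        return allIds.stream()
                .skip((long) (page - 1) * pageSize)
                .limit(pageSize)
                .map(id -> new RowViewModel(id, tableA.contains(id), tableB.contains(id), tableC.contains(id)))
                .collect(Collectors.toList());
    }
}
```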
If you're using SQL Server 2005 or later, you could use ROW_NUMBER() in your stored procedure:
http://msdn.microsoft.com/en-us/library/ms186734(v=SQL.90).aspx
Or, if you just want to do it in LINQ, try the Skip() and Take() methods.
As simple as:
int page = 2;
int pageSize = 10;
var pagedStuff = query.Skip((page - 1) * pageSize).Take(pageSize);
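To render "next page" links you also need the total page count, not just the offset. The following Java sketch (illustration only; `PageMath` and its methods are hypothetical names) assumes 1-based page numbers and a fixed page size. It shows the same `(page - 1) * pageSize` offset plus a ceiling division for the number of pages.

```java
// Hypothetical helpers for 1-based paging with a fixed page size.
final class PageMath {
    private PageMath() {}

    // Rows to skip before the requested page: (page - 1) * pageSize.
    static long offset(int page, int pageSize) {
        if (page < 1 || pageSize < 1) throw new IllegalArgumentException("page and pageSize must be >= 1");
        return (long) (page - 1) * pageSize;
    }

    // Pages needed to show totalRows rows: ceiling(totalRows / pageSize).
    static long pageCount(long totalRows, int pageSize) {
        return (totalRows + pageSize - 1) / pageSize;
    }

    // Whether a "next page" link should be rendered after the given page.
    static boolean hasNext(int page, long totalRows, int pageSize) {
        return page < pageCount(totalRows, pageSize);
    }
}
```

For 30K records and 10 per page, this gives 3000 pages. Page 2 skips 10 rows, and the last page has no "next" link.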
You should always, always, always limit the number of rows you get from the database. Unbounded reads kill applications: 30K turns into 300K, and then you are just destroying your SQL Server.
Jfar is on the right track with .Skip and .Take. The LINQ to SQL engine (and most entity frameworks) will convert this to SQL that returns a limited result set. However, this doesn't preclude caching the results as well, and I recommend doing both. The fastest trip to SQL Server is the one you don't have to take. :) I do something like this, where my controller method handles paged or unpaged results and caches whatever comes back from SQL:
[AcceptVerbs("GET")]
[OutputCache(Duration = 360, VaryByParam = "*")]
public ActionResult GetRecords(int? page, int? items)
{
int limit = items ?? defaultItemsPerPage;
int pageNum = page ?? 0;
if (pageNum <= 0) { pageNum = 1; }
ViewBag.Paged = (page != null);
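// MyEntity is a placeholder for your entity type ("var records = null;" does not compile)
List<MyEntity> records;
if (page != null)
{
// Order by a stable key before Skip (Id is hypothetical); LINQ to Entities requires sorted input for Skip
records = myEntities.OrderBy(e => e.Id).Skip((pageNum - 1) * limit).Take(limit).ToList();
}
else
{
records = myEntities.ToList();
}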
return View("GetRecords", records);
}
If you call it with no parameters (/GetRecords), you get the entire result set. Calling it with parameters gets you the restricted set (/GetRecords?page=3&items=25).
You could extend this method further by adding .Contains and .StartsWith functionality.
If you do decide to go the custom stored procedure route, I'd recommend using "TOP" and "ROW_NUMBER" to restrict results rather than a temp table.
Personally I would create a custom stored procedure to do this and then call it through Linq to SQL. e.g.
CREATE PROCEDURE [dbo].[SearchData]
(
@SearchStr NVARCHAR(50),
@Page int = 1,
@RecsPerPage int = 50,
@rc int OUTPUT
)
AS
SET NOCOUNT ON
SET FMTONLY OFF
DECLARE @TempFound TABLE
(
UID int IDENTITY NOT NULL,
PersonId UNIQUEIDENTIFIER
)
INSERT INTO @TempFound
(
PersonId
)
SELECT PersonId FROM People WHERE Surname LIKE '%' + @SearchStr + '%'
ORDER BY Surname, PersonId -- gives UID (and therefore each page) a deterministic order
SET @rc = @@ROWCOUNT
-- Calculate the final offset for paging --
DECLARE @FirstRec int, @LastRec int
SELECT @FirstRec = (@Page - 1) * @RecsPerPage
SELECT @LastRec = (@Page * @RecsPerPage + 1)
-- Final select --
SELECT p.* FROM People p INNER JOIN @TempFound tf
ON p.PersonId = tf.PersonId
WHERE (tf.UID > @FirstRec) AND (tf.UID < @LastRec)
ORDER BY tf.UID
The @rc parameter is the total number of records found.
You will obviously have to adapt it to your own table, but it should run fast.
To bind it to an object in LINQ to SQL, you just have to make sure that the columns returned by the final SELECT match the properties of the object it is bound to.
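The two boundary variables in the procedure select exactly one page. In the derivation below, p stands for @Page and n for @RecsPerPage; these symbols are introduced here only for the derivation. UID is an integer IDENTITY starting at 1, so the strict inequalities translate into an inclusive range:

```latex
\begin{aligned}
\mathit{FirstRec} &= (p-1)\,n, \qquad \mathit{LastRec} = p\,n + 1,\\
\mathit{FirstRec} < \mathit{UID} < \mathit{LastRec}
  &\iff (p-1)\,n + 1 \le \mathit{UID} \le p\,n .
\end{aligned}
```

For example, with p = 3 and n = 50, the filter is 100 < UID < 151, i.e. UIDs 101 to 150: exactly 50 rows. The "+ 1" in @LastRec is there because the comparison is strict.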
Is it possible, using Entity Framework and/or LINQ, to select a certain range of rows? For example, I want to select rows 0–500000 and assign these records to the List VariableAList object, then select rows 500001–1000000 and assign them to the List VariableBList object, and so on.
The Numbers entity has fields like ID, Number, DateCreated, DateAssigned, etc.
Sounds like you're looking for the .Take(int) and .Skip(int) methods
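using (YourEntities db = new YourEntities())
{
// LINQ to Entities needs a sort before Skip; ToList() runs each query while db is still open
var VariableAList = db.Numbers
.OrderBy(n => n.ID)
.Take(500000)
.ToList();
var VariableBList = db.Numbers
.OrderBy(n => n.ID)
.Skip(500000)
.Take(500000)
.ToList();
}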
You may want to be wary of the size of these lists in memory.
Note: you also need an .OrderBy clause before .Skip; LINQ to Entities throws an exception if Skip is applied to unsorted input.
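To avoid holding several half-million-row lists in memory at once, one option is to process one ordered batch at a time. The following Java sketch is for illustration only. `Batches.forEachBatch` is a hypothetical helper, and the `fetch` function stands in for an ordered `Skip(offset).Take(limit)` query. The loop stops when a batch comes back empty or shorter than requested.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Consumer;

final class Batches {
    private Batches() {}

    // fetch.apply(offset, limit) stands in for an ordered Skip(offset).Take(limit) query.
    // Each batch is handed to `handle` and can be discarded before the next fetch.
    // Returns the number of non-empty batches processed.
    static <T> int forEachBatch(BiFunction<Long, Integer, List<T>> fetch,
                                int batchSize, Consumer<List<T>> handle) {
        if (batchSize < 1) throw new IllegalArgumentException("batchSize must be >= 1");
        long offset = 0;
        int batches = 0;
        while (true) {
            List<T> batch = fetch.apply(offset, batchSize);
            if (batch.isEmpty()) break;
            handle.accept(batch);
            batches++;
            if (batch.size() < batchSize) break; // last, partial batch
            offset += batch.size();
        }
        return batches;
    }
}
```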
I'm trying to figure out the best approach to display combined tables based on matching logic and input search criteria.
Here is the situation:
We have a table of customers stored locally. The fields of interest are ssn, first name, last name and date of birth.
We also have a web service that provides the same information. Some of the customers from the web service are the same as those in the local table; some are different.
SSN is optional in both sources.
I need to combine this data to be viewed on a Grails display.
The criteria for combining are: 1) match on SSN; 2) for any remaining records, exact match on first name, last name, and date of birth.
There's no need at this point for soundex or approximate matching.
It looks like what I should do is extract all the records from both inputs into a single collection, somehow making it a set keyed on SSN, and then remove the records with a blank SSN.
This handles the SSN matching (once I figure out how to make that a set).
Then I need to go back to the two original input sources (cached in a collection to avoid a re-read) and remove any records that already exist in the SSN set.
Then create another set based on first name, last name, and date of birth (again, if I can figure out how to make a set).
Then combine the two derived collections into a single collection, sorted for display.
Does this make sense? I think the search criteria will limit the number of records pulled in, so I can do this in memory.
Essentially, I'm looking for ideas on what the Grails code for this logic would look like (assuming this is a good approach). The local customer table is a domain object, while what I get from the web service is an ArrayList of objects.
Also, I'm not entirely clear on how the maxresults, firstResult, and order used for the display would be affected. I think I need to read in all the records that match the search criteria first, do the combining, and then display from the derived collection.
The traditional Java way of doing this would be to copy both the local and remote objects into TreeSet containers with a custom comparator, first for SSN, second for name/birthdate.
This might look something like:
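def localCustomers = Customer.list()
def remoteCustomers = RemoteService.get()
def allCustomers = localCustomers + remoteCustomers
// Only customers with an SSN take part in SSN matching; with a null SSN the
// comparator would treat every blank-SSN customer as a duplicate of the others
TreeSet ssnFilter = new TreeSet(new ClosureComparator({c1, c2 -> c1.ssn <=> c2.ssn}))
ssnFilter.addAll(allCustomers.findAll { it.ssn })
TreeSet nameDobFilter = new TreeSet(new ClosureComparator({c1, c2 -> c1.firstName + c1.lastName + c1.dob <=> c2.firstName + c2.lastName + c2.dob}))
nameDobFilter.addAll(ssnFilter)
nameDobFilter.addAll(allCustomers.findAll { !it.ssn })
def filteredCustomers = nameDobFilter as List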
At this point, filteredCustomers has all the records, except those that are duplicates by your two criteria.
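The same two-pass TreeSet idea can be written in plain Java. This sketch is for illustration; `Customer` and `CustomerMerge` are hypothetical. It assumes names and dates of birth are non-null, with dob held as an ISO `yyyy-MM-dd` string. It also differs slightly from the Groovy version in one design choice. Records that have an SSN are never merged with each other by name. A blank-SSN record is dropped only if some already-kept record has the same first name, last name, and date of birth.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical customer shape; ssn may be null or blank.
final class Customer {
    final String ssn, firstName, lastName, dob; // dob as ISO yyyy-MM-dd

    Customer(String ssn, String firstName, String lastName, String dob) {
        this.ssn = ssn;
        this.firstName = firstName;
        this.lastName = lastName;
        this.dob = dob;
    }
}

final class CustomerMerge {
    private CustomerMerge() {}

    static List<Customer> combine(List<Customer> local, List<Customer> remote) {
        List<Customer> all = new ArrayList<>(local);
        all.addAll(remote);

        // Pass 1: keep one customer per non-blank SSN (TreeSet.add ignores later duplicates).
        TreeSet<Customer> bySsn = new TreeSet<>(Comparator.comparing((Customer c) -> c.ssn));
        List<Customer> noSsn = new ArrayList<>();
        for (Customer c : all) {
            if (c.ssn != null && !c.ssn.trim().isEmpty()) {
                bySsn.add(c);
            } else {
                noSsn.add(c);
            }
        }

        // Pass 2: a blank-SSN customer is kept only if no kept customer shares
        // its first name, last name and date of birth.
        Comparator<Customer> byNameDob = Comparator.comparing((Customer c) -> c.firstName)
                .thenComparing(c -> c.lastName)
                .thenComparing(c -> c.dob);
        TreeSet<Customer> seenNameDob = new TreeSet<>(byNameDob);
        seenNameDob.addAll(bySsn);
        List<Customer> result = new ArrayList<>(bySsn);
        for (Customer c : noSsn) {
            if (seenNameDob.add(c)) {
                result.add(c);
            }
        }

        result.sort(byNameDob); // sorted for display; List.sort is stable
        return result;
    }
}
```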
Another approach is to filter the lists by sorting them and then folding over them (Groovy's inject, which is a left fold), combining adjacent elements if they match. This way, you have an opportunity to combine the data from both sources.
For example:
def combineByNameAndDob(customers) {
customers.sort() {
c1, c2 -> (c1.firstName + c1.lastName + c1.dob) <=>
(c2.firstName + c2.lastName + c2.dob)
}.inject([]) { cs, c ->
if (cs && c.equalsByNameAndDob(cs[-1])) {
cs[-1].combine(c) //combine the attributes of both records
cs
} else {
cs << c
}
}
}
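The sort-then-fold pattern translates directly to Java: sort by the match key, then walk the list once, merging each element into the previous kept one when the keys are equal. The following is a minimal sketch with a hypothetical `Rec` type. Its `key` would be built from first name, last name, and date of birth, ideally with a separator such as `|` so that different name splits cannot collide. Its `sources` list stands in for whatever attributes you combine.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical record: a match key (e.g. "lastName|firstName|dob") and the sources it came from.
final class Rec {
    final String key;
    final List<String> sources = new ArrayList<>();

    Rec(String key, String source) {
        this.key = key;
        this.sources.add(source);
    }
}

final class AdjacentMerge {
    private AdjacentMerge() {}

    // Sort by key, then left-fold: equal neighbours are merged into the first
    // one in place (like cs[-1].combine(c)), everything else is appended.
    static List<Rec> merge(List<Rec> input) {
        List<Rec> sorted = new ArrayList<>(input);
        sorted.sort(Comparator.comparing((Rec r) -> r.key)); // stable sort
        List<Rec> out = new ArrayList<>();
        for (Rec r : sorted) {
            Rec last = out.isEmpty() ? null : out.get(out.size() - 1);
            if (last != null && last.key.equals(r.key)) {
                last.sources.addAll(r.sources); // combine the attributes of both records
            } else {
                out.add(r);
            }
        }
        return out;
    }
}
```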
I'm attempting to implement complete search functionality in my ASP.NET MVC (C#, Linq-to-Sql) website.
The site consists of about 3-4 tables that have about 1-2 columns that I want to search.
This is what I have so far:
public List<SearchResult> Search(string Keywords)
{
string[] split = Keywords.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
List<SearchResult> ret = new List<SearchResult>();
foreach (string s in split)
{
IEnumerable<BlogPost> results = db.BlogPosts.Where(x => x.Text.Contains(s) || x.Title.Contains(s));
foreach (BlogPost p in results)
{
if (ret.Exists(x => x.BlogPostID == p.BlogPostID))
continue;
ret.Add(new SearchResult
{
PostTitle = p.Title,
BlogPostID = p.BlogPostID,
Text = p.Text
});
}
}
return ret;
}
As you can see, I have a foreach for the keywords and an inner foreach that runs over a table (I would repeat it for each table).
This seems inefficient, and I wanted to know if there's a better way to create a search method for a database.
Also, what can I do to the columns in the database so that they can be searched faster? I read something about indexing them, is that just the "Full-text indexing" True/False field I see in SQL Management Studio?
Yes, enabling full-text indexing will normally go a long way towards improving performance for this scenario. But unfortunately it doesn't work automatically with the LIKE operator (and that's what your LINQ query is generating). So you'll have to use one of the built-in full-text searching functions like FREETEXT, FREETEXTTABLE, CONTAINS, or CONTAINSTABLE.
Just to explain: your original code will be substantially slower than full-text searching, because it will typically result in a table scan. For example, if you're searching a varchar column named title with LIKE '%ABC%', SQL Server has no choice but to scan every single record to see whether it contains those characters.
However, the built-in full-text searching will actually index the text of every column you specify to include in the full-text index. And it's that index that drastically speeds up your queries.
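The reason an index helps can be seen in a toy inverted index. Each word maps to the ids of the rows that contain it, so a query word costs one lookup instead of a scan over every row. The following Java sketch is a simplified illustration, not how SQL Server implements full-text search; the real engine adds word breakers, stemming, ranking, and on-disk storage. `InvertedIndex` is a hypothetical class. It also shows a trade-off: it matches whole words, not arbitrary substrings the way LIKE '%ABC%' does.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: word -> ids of the rows whose text contains that word.
final class InvertedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Split on anything that is not a letter or digit, lowercase, and record the row id.
    void add(int rowId, String text) {
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!word.isEmpty()) {
                postings.computeIfAbsent(word, w -> new TreeSet<>()).add(rowId);
            }
        }
    }

    // One hash lookup per query word instead of scanning every row.
    Set<Integer> lookup(String word) {
        return Collections.unmodifiableSet(
                postings.getOrDefault(word.toLowerCase(Locale.ROOT), Collections.emptySet()));
    }
}
```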
Not only that, but full-text searching provides some cool features that the LIKE operator can't give you. It's not as sophisticated as Google, but it can search for alternate forms of a root word. One of my favorite features is ranking: it can return an extra value indicating relevance, which you can then use to sort your results. To use that, look into the FREETEXTTABLE or CONTAINSTABLE functions.
Some more resources:
Full-Text Search (SQL Server)
Pro Full-Text Search in SQL Server 2008
The following should do the trick. I can't say off the top of my head whether the let kwa = ... part will actually work, but something similar will be required to make the array of keywords available within the context of SQL Server. I haven't used LINQ to SQL for a while (I've been using LINQ to Entities 4.0 and NHibernate for some time now, which have a different set of capabilities). You might need to tweak that part to get it working, but the general principle is sound:
public List<SearchResult> Search(string keywords)
{
var searchResults = from bp in db.BlogPosts
let kwa = keywords.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
where kwa.Any(kw => bp.Text.Contains(kw) || bp.Title.Contains(kw))
select new SearchResult
{
PostTitle = bp.Title,
BlogPostID = bp.BlogPostID,
Text = bp.Text
};
return searchResults.ToList();
}
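The point of the query above is its semantics. It makes one pass over the posts with an "any keyword matches" predicate, so each post appears at most once and the ret.Exists duplicate check from the question is unnecessary. Here is an in-memory Java sketch of the same semantics, for illustration only. `Post` and `KeywordSearch` are hypothetical. Java's `String.contains` is case-sensitive, whereas in SQL Server the case sensitivity of the generated LIKE depends on the column's collation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical post shape mirroring BlogPost (id, title, text).
final class Post {
    final int id;
    final String title, text;

    Post(int id, String title, String text) {
        this.id = id;
        this.title = title;
        this.text = text;
    }
}

final class KeywordSearch {
    private KeywordSearch() {}

    // Split on spaces (dropping empty entries, like RemoveEmptyEntries), then keep
    // each post once if its title or text contains any keyword.
    static List<Post> search(List<Post> posts, String keywords) {
        List<String> kwa = Arrays.stream(keywords.split(" "))
                .filter(k -> !k.isEmpty())
                .collect(Collectors.toList());
        return posts.stream()
                .filter(p -> kwa.stream().anyMatch(kw -> p.title.contains(kw) || p.text.contains(kw)))
                .collect(Collectors.toList());
    }
}
```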
I'm a little stumped on this one. Does anyone have any ideas? I'll try to lay out the example as briefly as possible.
I'm creating a Silverlight 3.0 application against a SQL Server 2005 database, using RIA Services and Entity Framework for data access.
I need to populate a grid from a table. However, my grid UI and my table structure are different: my grid needs to turn rows into columns (like a PIVOT table). Here are my challenges/assumptions:
I don't know until runtime which columns the grid will have.
Silverlight 3 only supports binding to properties.
Silverlight 3 does not allow you to add a row to the grid and populate it manually.
As we all know, Silverlight does not have the System.Data namespace (mainly DataTable).
So how do I create an object with dynamic properties that I can bind to the grid? Every idea I've had (multi-dimensional arrays, hash tables, etc.) falls apart because SL needs a property to bind to, I can't manually add/fill a data row, and I can't figure out a way to add dynamic properties. I've seen an article on a solution involving a linked list, but I'm looking for a better alternative. It may come down to making a special "Cody Grid" out of a bunch of text boxes/labels. That's doable for sure, but I'll lose some grid functionality that users expect.
The ONLY solution I have been able to come up with is to create a PIVOT query in SQL 2005 and use an entity based on that query/view. SQL 2008 would help me with that. I would prefer to do it in Silverlight, but if that is the last resort, so be it. If I go the PIVOT route, how do I handle a changing data structure in Entity Framework?
Data sample:
Table
Name   Date     Value
Cody   1/1/09   15
Cody   1/2/09   18
Mike   1/1/09   20
Mike   1/8/09   77
The grid UI should look like this:
Name   1/1/09   1/2/09   1/3/09   ...   1/8/09
Cody   15       18       NULL     ...   NULL
Mike   20       NULL     NULL     ...   77
Cody
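The pivot itself is independent of how Silverlight ends up binding the result. It can be sketched as two steps: discover the column headers (the distinct dates) at run time, then build one row per name as a date-to-value map, where a missing date is the NULL cell. The following Java sketch is for illustration only. `Cell` and `Pivot` are hypothetical names. Dates are `LocalDate` values rather than strings because, as strings, "1/10/09" would sort before "1/2/09". The sketch only produces columns for dates that occur in the data; the empty in-between dates shown in the grid above would need an explicit date range.

```java
import java.time.LocalDate;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical source row: (Name, Date, Value) as in the sample table.
final class Cell {
    final String name;
    final LocalDate date;
    final int value;

    Cell(String name, LocalDate date, int value) {
        this.name = name;
        this.date = date;
        this.value = value;
    }
}

final class Pivot {
    private Pivot() {}

    // Column headers, discovered at run time: every distinct date, in order.
    static SortedSet<LocalDate> columns(List<Cell> cells) {
        SortedSet<LocalDate> dates = new TreeSet<>();
        for (Cell c : cells) dates.add(c.date);
        return dates;
    }

    // One row per name, as a date -> value map; an absent date is the NULL cell.
    static Map<String, Map<LocalDate, Integer>> rows(List<Cell> cells) {
        Map<String, Map<LocalDate, Integer>> rows = new LinkedHashMap<>();
        for (Cell c : cells) {
            rows.computeIfAbsent(c.name, n -> new LinkedHashMap<>()).put(c.date, c.value);
        }
        return rows;
    }
}
```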
My team came up with a good solution. I'm not sure who deserves the credit, but it came from somewhere out in Google land. So far it works pretty well.
Essentially, the solution comes down to using reflection to build a dynamic object based on this dynamic data. The function takes in a 2-dimensional array and turns it into a List of objects with properties that can be bound. We put this process in a WCF service, and it seems to do exactly what we need so far.
Here is some of the code that builds the object using Reflection
AppDomain myDomain = AppDomain.CurrentDomain;
AssemblyName myAsmName = new AssemblyName("MyAssembly");
AssemblyBuilder myAssembly = myDomain.DefineDynamicAssembly(myAsmName, AssemblyBuilderAccess.Run);
ModuleBuilder myModule = myAssembly.DefineDynamicModule(myAsmName.Name);
TypeBuilder myType = myModule.DefineType("DataSource", TypeAttributes.Public);
string columnName = "whatever";
int counter = 0;
// Accessor attributes matching what the C# compiler emits for property getters/setters
MethodAttributes accessorAttributes = MethodAttributes.Public | MethodAttributes.SpecialName | MethodAttributes.HideBySig;
for (int j = 0; j <= array.GetUpperBound(1); j++)
{
Type propertyType = typeof(T);
// Use the columnName variable (not the literal "columnName") so field, property, and accessor names agree
FieldBuilder exField = myType.DefineField("_" + columnName + counter, propertyType, FieldAttributes.Private);
//The following line is where I’m passing columnName + counter and getting errors with some strings but not others.
PropertyBuilder exProperty = myType.DefineProperty(columnName + counter.ToString(), PropertyAttributes.None, propertyType, Type.EmptyTypes);
//Get
MethodBuilder exGetMethod = myType.DefineMethod("get_" + columnName + counter, accessorAttributes, propertyType, Type.EmptyTypes);
ILGenerator getIlgen = exGetMethod.GetILGenerator();
//IL for a simple getter:
//ldarg.0
//ldfld int32 SilverlightClassLibrary1.Class1::_Age
//ret
getIlgen.Emit(OpCodes.Ldarg_0);
getIlgen.Emit(OpCodes.Ldfld, exField);
getIlgen.Emit(OpCodes.Ret);
exProperty.SetGetMethod(exGetMethod);
//Set
MethodBuilder exSetMethod = myType.DefineMethod("set_" + columnName + counter, accessorAttributes, null, new Type[] { propertyType });
ILGenerator setIlgen = exSetMethod.GetILGenerator();
//IL for a simple setter:
//ldarg.0
//ldarg.1
//stfld int32 SilverlightClassLibrary1.Class1::_Age
//ret
setIlgen.Emit(OpCodes.Ldarg_0);
setIlgen.Emit(OpCodes.Ldarg_1);
setIlgen.Emit(OpCodes.Stfld, exField);
setIlgen.Emit(OpCodes.Ret);
exProperty.SetSetMethod(exSetMethod);
counter++;
}
Type finished = myType.CreateType();
You can dynamically set columns with their associated bindings (ensuring that AutoGenerateColumns is off):
For instance, the name column:
DataGridTextColumn textColumn = new DataGridTextColumn();
textColumn.Header = "Name";
textColumn.Binding = new Binding("FirstName");
myDataGrid.Columns.Add(textColumn);
The ObservableCollection you use to store the queried data could possibly be overridden to support pivoting, making sure to change the bindings of the DataGrid columns as shown above.
Note: This is a fair amount of hand-waving, I'm sure (I haven't touched Silverlight for over a year), but I hope it's enough to formulate another strategy.
If you are working with a two-dimensional array, then adding columns dynamically as shown above will not work.
The problem is that Silverlight cannot bind columns to a list.
So you have to create a list of rows, with a row converter, that represents the two-dimensional array.
This one worked for me:
http://www.scottlogic.co.uk/blog/colin/2010/03/binding-a-silverlight-3-datagrid-to-dynamic-data-via-idictionary-updated/