I have an Activity function that reads child elements of a parent in an organization as follows:
[FunctionName("ChildReaderFunction")]
public async Task<List<User>> GetChildren([ActivityTrigger] User parent)
{
var children = await GetChildrenAsync(parent);
return children;
}
public async Task<List<User>> GetChildrenAsync(User parent)
{
var allUsers = new List<User> { parent };
List<User> children = null;
children = await ExecuteQueryAsync("tableName", $"Parent eq '{parent.Id}'");
var taskIndex = 0;
var readTasks = new Task<List<User>>[children.Count(x => x.Childcount > 0)];
foreach (var child in children)
{
if (child.Childcount > 0)
{
readTasks[taskIndex++] = GetChildrenAsync(child);
}
else
{
allUsers.Add(child);
}
}
var validTasks = readTasks.Where(task => task != null).ToList();
if (validTasks.Count > 0)
{
foreach (var result in await Task.WhenAll(validTasks))
{
allUsers.AddRange(result);
}
}
Console.WriteLine($"Got {allUsers.Count} children for {parent.Id}");
return allUsers;
}
This works perfectly when I use the Premium plan with a timeout of 2 hours. I'm trying to convert this to a Consumption plan with a timeout of 10 minutes. When testing it, I get a timeout exception. Is there a way to break down this durable function so that execution completes within 10 minutes?
I tried to update this logic by using a queue as follows:
[FunctionName("ChildReaderFunction")]
public async Task<List<User>> GetChildren([ActivityTrigger] User parent)
{
var allUsers = new List<User>();
var directReportEntities = new List<User>();
Queue<User> myQueue = new Queue<User>();
myQueue.Enqueue(parent);
while (myQueue.Any())
{
var current = myQueue.Dequeue();
if (current.Childcount > 0)
{
var children = await GetChildrenAsync(current);
foreach (var child in children)
{
myQueue.Enqueue(child);
}
}
allUsers.Add(current);
}
Console.WriteLine($"Got {allUsers.Count} children for {parent.Id}");
return allUsers;
}
public async Task<List<User>> GetChildrenAsync(User parent)
{
return await ExecuteQueryAsync("tableName", $"Parent eq '{parent.Id}'");
}
This also gives a timeout exception. Any suggestions on what other approach I could try?
You might think about trying to figure out which parts of this method are slow. Perhaps it isn't the method itself that is slow but the query to the database. How many rows are you trying to download?
Also, you have a recursive call in your method. That may lead to many queries being executed. Can you think of a different way to grab the data all at once instead of a little bit at a time?
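To illustrate the "all at once" idea, here is a rough sketch: pull the whole table down in one query and build the hierarchy in memory. This assumes ExecuteQueryAsync accepts an empty filter and that User exposes the Parent column as a property; adjust it to your actual schema.
public async Task<List<User>> GetAllDescendantsAsync(User parent)
{
    // One round trip instead of one query per parent.
    // Passing an empty filter to fetch every row is an assumption here.
    var allRows = await ExecuteQueryAsync("tableName", string.Empty);

    // Group rows by their Parent value (assumed to be exposed as a property on User).
    var byParent = allRows.ToLookup(u => u.Parent);

    var result = new List<User> { parent };
    var pending = new Queue<User>();
    pending.Enqueue(parent);
    while (pending.Count > 0)
    {
        var current = pending.Dequeue();
        foreach (var child in byParent[current.Id])
        {
            result.Add(child);
            pending.Enqueue(child);
        }
    }
    return result;
}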
I have an Azure Function on the Premium plan where users from multiple AzureAD groups are read and put into a queue. Currently, I'm looking into converting this function to a Durable Function and using the Consumption plan. Here is my code:
Orchestrator:
public async Task RunOrchestrator([OrchestrationTrigger] IDurableOrchestrationContext context)
{
var groups = await context.CallActivityAsync<List<AzureADGroup>>("GroupsReaderFunction", null);
if (groups != null && groups.Count > 0)
{
var processingTasks = new List<Task>();
foreach (var group in groups)
{
var processTask = context.CallSubOrchestratorAsync("SubOrchestratorFunction", group);
processingTasks.Add(processTask);
}
await Task.WhenAll(processingTasks);
}
}
SubOrchestrator:
public async Task<List<AzureADUser>> RunSubOrchestrator([OrchestrationTrigger] IDurableOrchestrationContext context)
{
var request = context.GetInput<Group>();
var users = await context.CallActivityAsync<List<AzureADUser>>("UsersReaderFunction", request.objectId);
return users;
}
Here is the function that gets users from AzureAD group:
public async Task<List<AzureADUser>> GetUsersInGroup(Guid objectId)
{
IGroupTransitiveMembersCollectionWithReferencesPage members;
members = await GraphServiceClient.Groups[objectId.ToString()].TransitiveMembers.Request().Select("id").GetAsync();
var toReturn = new List<AzureADUser>(ToUsers(members.CurrentPage));
while (members.NextPageRequest != null)
{
members = await members.NextPageRequest.GetAsync();
toReturn.AddRange(ToUsers(members.CurrentPage));
}
return toReturn;
}
private IEnumerable<AzureADUser> ToUsers(IEnumerable<DirectoryObject> fromGraph)
{
    foreach (var user in fromGraph)
    {
        yield return new AzureADUser { ObjectId = Guid.Parse(user.Id) };
    }
}
The number of users per group varies: one group contains 10 users, while another contains ~500k users. A timeout occurs when reading users from the larger groups (> 10 minutes). Is there a faster way to get the users of an AzureAD group (for example, in batches) so that I can use the Consumption plan? Or is there a different way to use Durable Functions (fan-out/fan-in or some other pattern) to get better performance?
UPDATE:
List<AzureADUser> users = new List<AzureADUser>();
public async Task<List<AzureADUser>> RunSubOrchestrator([OrchestrationTrigger] IDurableOrchestrationContext context)
{
var request = context.GetInput<Group>();
var response = await context.CallActivityAsync<(List<AzureADUser> users, nextPageLink link)>("UsersReaderFunction", request.objectId);
users.AddRange(response.users);
return users;
}
Here the response contains two values: the users from the current page and the link to the next page. I need to keep calling the "UsersReaderFunction" activity function until the link to the next page is null.
List<AzureADUser> users = new List<AzureADUser>();
public async Task<List<AzureADUser>> RunSubOrchestrator([OrchestrationTrigger] IDurableOrchestrationContext context)
{
var request = context.GetInput<Group>();
var response = await context.CallActivityAsync<(List<AzureADUser> users, nextPageLink link)>("UsersReaderFunction", request.objectId);
users.AddRange(response.users);
while (response.link != null) {
var response = await context.CallActivityAsync<(List<AzureADUser> users, nextPageLink link)>("UsersReaderFunction", request.objectId);
users.AddRange(response.users);
}
return users;
}
But this is not working. What am I missing?
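(For reference, a minimal sketch of what the paging loop could look like; it assumes "UsersReaderFunction" is changed to accept both the group id and the next-page link as its input, and that the link is a string:)
public async Task<List<AzureADUser>> RunSubOrchestrator([OrchestrationTrigger] IDurableOrchestrationContext context)
{
    var request = context.GetInput<Group>();
    var users = new List<AzureADUser>();

    // First page: no next-page link yet.
    var response = await context.CallActivityAsync<(List<AzureADUser> users, string link)>(
        "UsersReaderFunction", (request.objectId, (string)null));
    users.AddRange(response.users);

    // Keep calling the activity, passing the link back, until there are no more pages.
    while (response.link != null)
    {
        response = await context.CallActivityAsync<(List<AzureADUser> users, string link)>(
            "UsersReaderFunction", (request.objectId, response.link));
        users.AddRange(response.users);
    }

    return users;
}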
I created a function that returns the outcomes of a submission using the Microsoft Graph Education API. I want to print the feedback and the points, but I can't: those properties don't exist on the returned objects. Here is the code:
public static async Task<IEnumerable<EducationOutcome>> GetOutcomesForSubmission()
{
var outcomes = await graphClient
.Education
.Classes["8557483b-a233-4710-82de-e1bdb03bb9a9"]
.Assignments["1b09cd43-cf87-4cef-a043-ae3d6160c200"]
.Submissions["d4486e20-1b47-4b5b-720c-0fe0038d4882"]
.Outcomes
.Request()
.GetAsync();
return outcomes;
}
public static void ListOutcomes()
{
var outcomes = GetOutcomesForSubmission().Result;
Console.WriteLine("Outcomes:\n");
foreach (var v in outcomes)
{
Console.WriteLine($"User id: {v.LastModifiedBy.User.Id}, Submission id: {v.Id}");
}
Console.WriteLine("\n");
}
Your issue is that GetAsync() isn't returning a collection of EducationOutcome objects. It is returning an EducationSubmissionOutcomesCollectionPage instead. To get the actual results, you need to return the CurrentPage property.
public static async Task<IEnumerable<EducationOutcome>> GetOutcomesForSubmission()
{
var response = await graphClient
.Education
.Classes["8557483b-a233-4710-82de-e1bdb03bb9a9"]
.Assignments["1b09cd43-cf87-4cef-a043-ae3d6160c200"]
.Submissions["d4486e20-1b47-4b5b-720c-0fe0038d4882"]
.Outcomes
.Request()
.GetAsync();
return response.CurrentPage.ToList();
}
Note that this will only return the first page of data. If you want to grab all of the data, you'll need to page through using the response.NextPageRequest property:
var outcomes = response.CurrentPage.ToList();
while (response.NextPageRequest != null)
{
response = await response.NextPageRequest.GetAsync();
outcomes.AddRange(response.CurrentPage);
}
return outcomes;
Keep in mind that EducationOutcome is a base class, so it will only contain properties common across all "Outcome" types (which in this case isn't much). If you want a specific type of Outcome, you'll need to cast it to the specific type first:
foreach (var v in outcomes)
{
if (v is EducationRubricOutcome)
{
var outcome = v as EducationRubricOutcome;
Console.WriteLine($"User id: {outcome.LastModifiedBy.User.Id}, Submission id: {outcome.Id}, Feedback Count: {outcome.RubricQualityFeedback.Count}");
}
else if (v is EducationPointsOutcome)
{
var outcome = v as EducationPointsOutcome;
Console.WriteLine($"User id: {outcome.LastModifiedBy.User.Id}, Submission id: {outcome.Id}, Points: {outcome.Points.Points}");
}
else if (v is EducationFeedbackOutcome)
{
var outcome = v as EducationFeedbackOutcome;
Console.WriteLine($"User id: {outcome.LastModifiedBy.User.Id}, Submission id: {outcome.Id}, Feedback: {outcome.Feedback.Text}");
}
}
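If you're on a recent C# version, the same checks can also be written with pattern matching; this is just an alternative form of the loop above, using the same properties:
foreach (var v in outcomes)
{
    switch (v)
    {
        case EducationRubricOutcome rubric:
            Console.WriteLine($"User id: {rubric.LastModifiedBy.User.Id}, Submission id: {rubric.Id}, Feedback Count: {rubric.RubricQualityFeedback.Count}");
            break;
        case EducationPointsOutcome points:
            Console.WriteLine($"User id: {points.LastModifiedBy.User.Id}, Submission id: {points.Id}, Points: {points.Points.Points}");
            break;
        case EducationFeedbackOutcome feedback:
            Console.WriteLine($"User id: {feedback.LastModifiedBy.User.Id}, Submission id: {feedback.Id}, Feedback: {feedback.Feedback.Text}");
            break;
    }
}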
I've built a crawler that reads a website and fetches some values from it, using Quartz.NET and ASP.NET MVC. My problem is this: suppose a user starts scraping "Stackoverflow.com", which runs for about 5 hours, and then decides to stop that job and start scraping a new website. How can I do that?
[HttpPost]
public ActionResult Index(string keyword, string url)
{
IScheduler scheduler = StdSchedulerFactory.GetDefaultScheduler();
scheduler.Start();
IJobDetail job = JobBuilder.Create<ScrapJob>()
.WithIdentity("MyScrapJob")
.UsingJobData("url", url)
.UsingJobData("keyword", keyword)
.Build();
ITrigger trigger = TriggerBuilder.Create().WithDailyTimeIntervalSchedule(
s => s.WithIntervalInSeconds(20).OnEveryDay().StartingDailyAt(TimeOfDay.HourAndMinuteOfDay(0, 0))
).Build();
scheduler.ScheduleJob(job, trigger);
return View(db.Scraps.ToList());
}
public List<ScrapJob> Scraping(string url, string keyword)
{
int count = 0;
List<ScrapJob> scraps = new List<ScrapJob>();
ScrapJob scrap = null;
HtmlDocument doc = new HtmlDocument();
try
{
var request = (HttpWebRequest)WebRequest.Create(url);
request.Method = "GET";
using (var response = (HttpWebResponse)request.GetResponse())
{
using (var stream = response.GetResponseStream())
{
doc.Load(stream, Encoding.GetEncoding("UTF-8"));
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//text()"))
{
if (node.InnerText.ToString().Contains(keyword))
{
count++;
scrap = new ScrapJob { Keyword = keyword, DateTime = System.DateTime.Now.ToString(), Count = count, Url = url };
}
}
}
}
}
catch (WebException ex)
{
Console.WriteLine(ex.Message);
}
// scraps.Add(scrap);
var isExist = db.Scraps.Where(s => s.Keyword == keyword && s.Count == scrap.Count).Max(s => (int?)s.Id) ?? 0;
if (isExist == 0)
{
db.Scraps.Add(scrap);
db.SaveChanges();
}
return scraps;
}
public void Execute(IJobExecutionContext context)
{
//ScrapJob scraps = null;
using (var scrap = new ScrapJob())
{
JobKey key = context.JobDetail.Key;
JobDataMap dataMap = context.JobDetail.JobDataMap;
string url = dataMap.GetString("url");
string keyword = dataMap.GetString("keyword");
scrap.Scraping(url, keyword);
}
}
I'm not sure why you picked Quartz.NET, but here is something that I think will help you.
This is a code sample that interrupts and deletes a job by its unique identifier:
public void DeleteJob(JobKey jobKey)
{
var scheduler = StdSchedulerFactory.GetDefaultScheduler();
var executingJobs = scheduler.GetCurrentlyExecutingJobs();
if (executingJobs.Any(x => x.JobDetail.Key.Equals(jobKey)))
{
scheduler.Interrupt(jobKey);
}
scheduler.DeleteJob(jobKey);
}
But I believe you need to define what behavior you expect, because it can be a bit more complex. For example:
If you'd like to just pause the job and resume it once you're finished with the other website (persisting some state and progress), or just log the progress.
If you want the jobs to run in parallel and process multiple sites simultaneously. (You just need to give each job a different name instead of the hardcoded .WithIdentity("MyScrapJob") - see the sketch below.)
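For example, scheduling each scrape under its own identity could look roughly like this (the key format is just an illustration):
// Build a unique key per job, e.g. from the URL.
var jobKey = new JobKey($"ScrapJob-{url}", "scraping");
IJobDetail job = JobBuilder.Create<ScrapJob>()
    .WithIdentity(jobKey)
    .UsingJobData("url", url)
    .UsingJobData("keyword", keyword)
    .Build();
scheduler.ScheduleJob(job, trigger);

// Later, that exact job can be interrupted and removed:
// DeleteJob(jobKey);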
Also, with scheduler.GetCurrentlyExecutingJobs() you can get the currently executing jobs, show them to the user, and let them decide what to do.
Looking at your action method, I'm also not sure whether that trigger gives you the behavior you expect. Another thing that bothers me is db.Scraps.ToList(): it materializes the whole table. Consider adding pagination as well; in your case it isn't strictly necessary because you only show a count, but it becomes mandatory once you have a lot of records in the grid.
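A paging query might look something like this (just a sketch; page and pageSize are hypothetical parameters):
var pageOfScraps = db.Scraps
    .OrderByDescending(s => s.Id)
    .Skip(page * pageSize)
    .Take(pageSize)
    .ToList();
return View(pageOfScraps);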
About the scraping method
Instead of
var isExist = db.Scraps.Where(s => s.Keyword == keyword && s.Count == scrap.Count).Max(s => (int?)s.Id) ?? 0;
you can use .Any
var exists = db.Scraps.Any(s => s.Keyword == keyword && s.Count == scrap.Count);
This returns a boolean, so you can check if (!exists).
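Applied to your existing insert logic, that would look something like this (a sketch using the variables from your Scraping method):
var exists = db.Scraps.Any(s => s.Keyword == keyword && s.Count == scrap.Count);
if (!exists)
{
    db.Scraps.Add(scrap);
    db.SaveChanges();
}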
You can also check out https://github.com/AngleSharp/AngleSharp, a high-performance web parsing library. It's super easy to use as well.
I see a possibility of duplicated records per keyword if you check them by keyword and count. I'm not sure whether you want that, or whether you just want to update the existing record with its counter.
Good luck! I hope this answer helps you :)
I created a news site with MVC5, but I have a problem.
In the Model I created a Repository folder, and in it a Rep_Setting class that connects to Tbl_Setting in the database.
public class Rep_Setting
{
DataBase db = new DataBase();
public Tbl_Setting Tools()
{
try
{
var qGetSetting = (from a in db.Tbl_Setting
select a).FirstOrDefault();
return qGetSetting;
}
catch (Exception)
{
return null;
}
}
}
And I created a Rep_News for the main page.
DataBase db = new DataBase();
Rep_Setting RSetting = new Rep_Setting();
public List<Tbl_News> GetNews()
{
try
{
List<Tbl_News> qGetNews = (from a in db.Tbl_News
where a.Type.Equals("News")
select a).OrderByDescending(s => s.ID).Skip(0).Take(RSetting.Tools().CountNewsInPage).ToList();
return qGetNews;
}
catch (Exception ex)
{
return null;
}
}
But this code gives me an error on the line:
OrderByDescending(s=>s.ID).Skip(0).Take(RSetting.Tools().CountNewsInPage).ToList();
Error :
Error 18 'System.Linq.IQueryable<NewsSite.Models.Domain.Tbl_News>' does
not contain a definition for 'Take' and the best extension method overload
'System.Linq.Queryable.Take<TSource>(System.Linq.IQueryable<TSource>, int)' has
some invalid arguments
E:\MyProject\NewsSite\NewsSite\Models\Repository\Rep_News.cs 50 52 NewsSite
How do I resolve it?
Try it this way. The error happens because CountNewsInPage is a nullable int (int?), while Queryable.Take expects a plain int. The plan for debugging is to split up your execution; this also makes for a more reusable method in many cases. It's also a good idea to avoid nulls and nullables if you can; if you do use them on purpose, you must have a plan for them.
DataBase db = new DataBase();
Rep_Setting RSetting = new Rep_Setting();
public List<Tbl_News> GetNews()
{
int skip = 0;
Tbl_Setting tools = RSetting.Tools();
if(tools == null){ throw new Exception("Found no rows in the database table Tbl_Setting"); }
int? take = tools.CountNewsInPage;//Nullable
if(!take.HasValue)
{
// Do you want to do something if its null maybe set it to 0 and not null
take = 0;
}
string typeStr = "News";
var qGetNews = (from a in db.Tbl_News
where a.Type.Equals(typeStr)
select a).OrderByDescending(s => s.ID).Skip(skip).Take(take.Value);
return qGetNews.ToList();
}
If qGetNews is an empty list, you now don't break everything when you iterate over it, the way your return null would. Instead of returning null for a list, return a new List<>(); that gives you a more resilient result.
I mentioned a reusable method; it's really more of a single action. Rework it like this, and you have something genuinely reusable:
public List<Tbl_News> GetNews(string typeStr, int take, int skip = 0)
{
var qGetNews = (from a in db.Tbl_News
where a.Type.Equals(typeStr)
select a).OrderByDescending(s => s.ID).Skip(skip).Take(take);
return qGetNews.ToList();
}
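A possible call site for the reusable version (just a sketch; it reuses the Tools() lookup from above):
Tbl_Setting tools = RSetting.Tools();
List<Tbl_News> news = GetNews("News", tools.CountNewsInPage ?? 0);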
In fact, you should always try to avoid returning null if you can:
public class Rep_Setting
{
DataBase db = new DataBase();
public Tbl_Setting Tools()
{
var qGetSetting = (from a in db.Tbl_Setting
select a).FirstOrDefault();
if(qGetSetting == null){ throw new Exception("Found no rows in the database table Tbl_Setting"); }
return qGetSetting;
}
}
I have the following controller:
public async Task<ActionResult> ImageAsync(int id)
{
var img = await _repository.GetImageAsync(id);
if (img != null)
{
return File(img, "image/jpg"); //View(img);
}
byte[] res = new byte[0];
return File(res, "image/jpg");
}
and these methods in the repository:
public async Task<byte[]> GetImage(int imageId)
{
try
{
var dbCtx = new smartbags_storeEntities();
var res = await dbCtx.GoodImages.SingleAsync(d => d.ImageId == imageId);
return res != null ? res.ImageData : null;
}
catch (Exception ex)
{
throw ex;
}
}
public async Task<byte[]> GetImageAsync(int imageId)
{
byte[] img = await Task.Run(() =>
{
var res = GetImage(imageId).Result;
if (res != null)
{
var wi = new System.Web.Helpers.WebImage(res);
wi.AddTextWatermark("info");
return wi.GetBytes();
}
return null;
});
return img;
}
but execution of the image read freezes on the line
var res = await dbCtx.GoodImages.SingleAsync(d => d.ImageId == imageId);
What am I doing wrong when trying to read data from the database asynchronously?
Calling the Result property of a Task is a blocking call, so the continuation of the await can't be posted to run.
Since you already have a Task-returning method, why not just use await?
public async Task<byte[]> GetImageAsync(int imageId)
{
var res = await GetImage(imageId);
if (res != null)
{
var wi = new System.Web.Helpers.WebImage(res);
wi.AddTextWatermark("info");
return wi.GetBytes();
}
return null;
}
The funny thing about that line is that it calls SingleAsync, which is a TAP extension method for observables.
I have never used a data repository that exposed its collections as observables, though I suppose it is possible. My first guess is that [the task returned by] SingleAsync isn't completing because the GoodImages observable isn't completing. Note that SingleAsync must continue scanning after it sees a match to ensure that it is the only match; FirstAsync is more forgiving and will complete as soon as it sees the first match.
On a side note, I do recommend using await instead of Result and not using Task.Run on the server. So Paulo's answer is good in that regard, though in this case Result is not causing a deadlock.
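Putting both suggestions together, the repository read could look roughly like this (a sketch assuming Entity Framework's async query extensions are available on GoodImages; FirstOrDefaultAsync is used so that a missing row simply yields null, matching the original null check):
public async Task<byte[]> GetImage(int imageId)
{
    using (var dbCtx = new smartbags_storeEntities())
    {
        // Completes at the first match; returns null when no row is found.
        var res = await dbCtx.GoodImages.FirstOrDefaultAsync(d => d.ImageId == imageId);
        return res?.ImageData;
    }
}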