I have an Azure Function on a Premium plan that reads users from multiple Azure AD groups and puts them on a queue. I'm looking into converting it to a Durable Function that runs on the Consumption plan. Here is my code:
Orchestrator:
public async Task RunOrchestrator([OrchestrationTrigger] IDurableOrchestrationContext context)
{
var groups = await context.CallActivityAsync<List<AzureADGroup>>("GroupsReaderFunction", null);
if (groups != null && groups.Count > 0)
{
var processingTasks = new List<Task>();
foreach (var group in groups)
{
var processTask = context.CallSubOrchestratorAsync("SubOrchestratorFunction", group);
processingTasks.Add(processTask);
}
await Task.WhenAll(processingTasks);
}
}
SubOrchestrator:
public async Task<List<AzureADUser>> RunSubOrchestrator([OrchestrationTrigger] IDurableOrchestrationContext context)
{
var request = context.GetInput<Group>();
var users = await context.CallActivityAsync<List<AzureADUser>>("UsersReaderFunction", request.objectId);
return users;
}
Here is the function that gets users from AzureAD group:
public async Task<List<AzureADUser>> GetUsersInGroup(Guid objectId)
{
IGroupTransitiveMembersCollectionWithReferencesPage members;
members = await GraphServiceClient.Groups[objectId.ToString()].TransitiveMembers.Request().Select("id").GetAsync();
var toReturn = new List<AzureADUser>(ToUsers(members.CurrentPage));
while (members.NextPageRequest != null)
{
members = await members.NextPageRequest.GetAsync();
toReturn.AddRange(ToUsers(members.CurrentPage));
}
return toReturn;
}
private IEnumerable<AzureADUser> ToUsers(IEnumerable<DirectoryObject> fromGraph)
{
    foreach (var user in fromGraph)
    {
        // yield return so every member is converted, not just the first one.
        yield return new AzureADUser { ObjectId = Guid.Parse(user.Id) };
    }
}
The number of users per group varies widely: one group contains 10 users and another contains ~500k users. Reading users from the larger groups times out (it takes more than 10 minutes). Is there a faster way to get users from an Azure AD group (for example, in batches) so that I can use the Consumption plan? Or is there a different way to use Durable Functions (the fan-out/fan-in pattern or some other pattern) for better performance?
UPDATE:
var users = new List<AzureADUser>();
public async Task RunSubOrchestrator([OrchestrationTrigger] IDurableOrchestrationContext context)
{
var request = context.GetInput<Group>();
var response = await context.CallActivityAsync<(List<AzureADUser> users, nextPageLink link)>("UsersReaderFunction", request.objectId);
users.AddRange(response.users);
return users;
}
Here response contains 2 values - users from current page and link to next page. I need to keep calling "UsersReaderFunction" activity function until link to next page is null.
var users = new List<AzureADUser>();
public async Task RunSubOrchestrator([OrchestrationTrigger] IDurableOrchestrationContext context)
{
var request = context.GetInput<Group>();
var response = await context.CallActivityAsync<(List<AzureADUser> users, nextPageLink link)>("UsersReaderFunction", request.objectId);
users.AddRange(response.users);
while (response.link != null) {
var response = await context.CallActivityAsync<(List<AzureADUser> users, nextPageLink link)>("UsersReaderFunction", request.objectId);
users.AddRange(response.users);
}
return users;
}
But this is not working. What am I missing?
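One possible reading of the loop above: response is redeclared inside the while block, and the next-page link is never passed back to the activity, so the same first page would be requested on every iteration. Below is a minimal, self-contained sketch of the intended loop. The activity call is replaced by a hypothetical fetchPage delegate, and Page, PagingSketch, ReadAllAsync and FakeFetch are illustrative names only, not part of Durable Functions or the Graph SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical page shape: the users on one page plus the link to the next page (null on the last page).
public sealed record Page(List<string> Users, string NextLink);

public static class PagingSketch
{
    // Keeps requesting pages until the returned link is null.
    // fetchPage stands in for the activity call; it receives the link returned by the previous call.
    public static async Task<List<string>> ReadAllAsync(Func<string, Task<Page>> fetchPage)
    {
        var users = new List<string>();
        string link = null; // null means "first page"
        do
        {
            Page page = await fetchPage(link);
            users.AddRange(page.Users);
            link = page.NextLink; // reassign the existing variable, do not redeclare it
        } while (link != null);
        return users;
    }

    // Fake three-page source used only for the demo.
    public static Task<Page> FakeFetch(string link) => Task.FromResult(link switch
    {
        null => new Page(new List<string> { "a", "b" }, "page2"),
        "page2" => new Page(new List<string> { "c", "d" }, "page3"),
        _ => new Page(new List<string> { "e" }, null),
    });

    public static async Task Main()
    {
        List<string> all = await ReadAllAsync(FakeFetch);
        Console.WriteLine(string.Join(",", all)); // prints a,b,c,d,e
    }
}
```

In the sub-orchestrator, fetchPage would correspond to a context.CallActivityAsync call whose input carries both the group id and the link from the previous page. That assumes UsersReaderFunction is changed to accept the link, which the code in the question does not show.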
Related
I have an Activity function that reads child elements of a parent in an organization as follows:
[FunctionName("ChildReaderFunction")]
public async Task<List<User>> GetChildren([ActivityTrigger] User parent)
{
var children = await GetChildrenAsync(parent);
return children;
}
public async Task<List<User>> GetChildrenAsync(User parent)
{
var allUsers = new List<User> { parent };
List<User> children = null;
children = await ExecuteQueryAsync("tableName", $"Parent eq '{parent.Id}'");
var taskIndex = 0;
var readTasks = new Task<List<User>>[children.Count(x => x.Childcount > 0)];
foreach (var child in children)
{
if (child.Childcount > 0)
{
readTasks[taskIndex++] = GetChildrenAsync(child);
}
else
{
allUsers.Add(child);
}
}
var validTasks = readTasks.Where(task => task != null).ToList();
if (validTasks.Count > 0)
{
foreach (var result in await Task.WhenAll(validTasks))
{
allUsers.AddRange(result);
}
}
Console.WriteLine($"Got {allUsers.Count} children for {parent.Id}");
return allUsers;
}
This works perfectly on a Premium plan with a timeout of 2 hours. I'm trying to move it to a Consumption plan with a timeout of 10 minutes, and in testing I get a timeout exception. Is there a way to break this durable function down so that it completes within 10 minutes?
I tried to update this logic by using a queue as follows:
[FunctionName("ChildReaderFunction")]
public async Task<List<User>> GetChildren([ActivityTrigger] User parent)
{
    var allUsers = new List<User>();
    var myQueue = new Queue<User>();
    myQueue.Enqueue(parent);
    while (myQueue.Any())
    {
        var current = myQueue.Dequeue();
        if (current.Childcount > 0)
        {
            // Query the children of the node just dequeued, not of the root.
            var children = await GetChildrenAsync(current);
            foreach (var child in children)
            {
                myQueue.Enqueue(child);
            }
        }
        allUsers.Add(current);
    }
    Console.WriteLine($"Got {allUsers.Count} children for {parent.Id}");
    return allUsers;
}
public async Task<List<User>> GetChildrenAsync(User parent)
{
return await ExecuteQueryAsync("tableName", $"Parent eq '{parent.Id}'");
}
This also gives a timeout exception. Any suggestions on what other approach I could try?
You might think about trying to figure out which parts of this method are slow. Perhaps it isn't the method itself that is slow but the query to the database. How many rows are you trying to download?
Also, you have a recursive call in your method. That may lead to many queries being executed. Can you think of a different way to grab the data all at once instead of a little bit at a time?
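To illustrate the "grab the data all at once" idea, here is a minimal sketch, assuming all rows can be fetched with a single query, fit in memory, and contain no cycles. Row, Id, ParentId, HierarchySketch and Descendants are hypothetical names, and the table query itself is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shape: each row knows its own id and its parent's id.
public sealed record Row(string Id, string ParentId);

public static class HierarchySketch
{
    // Returns the root id followed by all descendant ids, walking the tree
    // in memory instead of issuing one query per node. Assumes no cycles.
    public static List<string> Descendants(IEnumerable<Row> allRows, string rootId)
    {
        // Index children by parent once.
        ILookup<string, string> childrenOf = allRows.ToLookup(r => r.ParentId, r => r.Id);

        var result = new List<string>();
        var pending = new Queue<string>();
        pending.Enqueue(rootId);
        while (pending.Count > 0)
        {
            string current = pending.Dequeue();
            result.Add(current);
            // The lookup yields an empty sequence for ids that have no children.
            foreach (string child in childrenOf[current])
            {
                pending.Enqueue(child);
            }
        }
        return result;
    }

    public static void Main()
    {
        var rows = new[]
        {
            new Row("b", "a"), new Row("c", "a"), new Row("d", "b"), new Row("x", "y"),
        };
        Console.WriteLine(string.Join(",", Descendants(rows, "a"))); // prints a,b,c,d
    }
}
```

Whether this helps depends on the answer to the question above: if the table is too large to download in one go, the single query would need to be narrowed first.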
I created a function to return the outcomes using Microsoft Graph Education API. I want to print feedback and points but I can’t, they don’t exist. Here is the code:
public static async Task<IEnumerable<EducationOutcome>> GetOutcomesForSubmission()
{
var outcomes = await graphClient
.Education
.Classes["8557483b-a233-4710-82de-e1bdb03bb9a9"]
.Assignments["1b09cd43-cf87-4cef-a043-ae3d6160c200"]
.Submissions["d4486e20-1b47-4b5b-720c-0fe0038d4882"]
.Outcomes
.Request()
.GetAsync();
return outcomes;
}
public static void ListOutcomes()
{
var outcomes = GetOutcomesForSubmission().Result;
Console.WriteLine("Outcomes:\n");
foreach (var v in outcomes)
{
Console.WriteLine($"User id: {v.LastModifiedBy.User.Id}, Submission id: {v.Id}");
}
Console.WriteLine("\n");
}
Your issue is that GetAsync() isn't returning a collection of EducationOutcome objects. It is returning an EducationSubmissionOutcomesCollectionPage instead. To get the actual results, you need to return the CurrentPage property.
public static async Task<IEnumerable<EducationOutcome>> GetOutcomesForSubmission()
{
var response = await graphClient
.Education
.Classes["8557483b-a233-4710-82de-e1bdb03bb9a9"]
.Assignments["1b09cd43-cf87-4cef-a043-ae3d6160c200"]
.Submissions["d4486e20-1b47-4b5b-720c-0fe0038d4882"]
.Outcomes
.Request()
.GetAsync();
return response.CurrentPage.ToList();
}
Note that this will only return the first page of data. If you want to grab all of the data, you'll need to page through using the response.NextPageRequest property:
var outcomes = response.CurrentPage.ToList();
while (response.NextPageRequest != null)
{
response = await response.NextPageRequest.GetAsync();
outcomes.AddRange(response.CurrentPage);
}
return outcomes;
Keep in mind that EducationOutcome is a base class, so it will only contain properties common across all "Outcome" types (which in this case is pretty little). If you want a specific type of Outcome, you'll need to cast it to the specific type first:
foreach (var v in outcomes)
{
    if (v is EducationRubricOutcome rubric)
    {
        Console.WriteLine($"User id: {rubric.LastModifiedBy.User.Id}, Submission id: {rubric.Id}, Feedback Count: {rubric.RubricQualityFeedback.Count}");
    }
    else if (v is EducationPointsOutcome points)
    {
        Console.WriteLine($"User id: {points.LastModifiedBy.User.Id}, Submission id: {points.Id}, Points: {points.Points.Points}");
    }
    else if (v is EducationFeedbackOutcome feedback)
    {
        Console.WriteLine($"User id: {feedback.LastModifiedBy.User.Id}, Submission id: {feedback.Id}, Feedback: {feedback.Feedback.Text}");
    }
}
I have a background service that reads from and sends through a mailbox. The schedule and mailbox are configured in a web UI, but once they are set the service should run automatically, without further user prompts.
I have tried various combinations of MSAL with both public and confidential clients (either would be acceptable, as the server can hold the client secret).
I have used the EWS client and got that working, but there is a note that the client_credentials flow won't work for IMAP/POP/SMTP.
I have a small console app working, but each time it starts it needs an interactive login. As long as I don't restart the application it keeps authenticating, and I can call AcquireTokenSilent.
The Question
How can I make the MSAL save the tokens/data such that when it next runs, I can authenticate without user interaction again? I can store whatever is needed to make this work when the user authenticates, but I don't know what that should be nor how to reinstate it to make a new request, if the console app is restarted.
The Code
internal async Task<string> Test()
{
PublicClientApplication =
PublicClientApplicationBuilder.Create( "5896de31-e251-460c-9dc2-xxxxxxxxxxxx" )
.WithRedirectUri( "https://login.microsoftonline.com/common/oauth2/nativeclient" )
.WithAuthority( AzureCloudInstance.AzurePublic, ConfigurationManager.AppSettings["tenantId"] )
.Build();
//var scopes = new string[] { "email", "offline_access", "profile", "User.Read", "Mail.Read" };
var scopes = new string[] { "https://outlook.office.com/IMAP.AccessAsUser.All" };
var accounts = await PublicClientApplication.GetAccountsAsync();
var firstAccount = accounts.FirstOrDefault();
AuthenticationResult authResult;
if (firstAccount == null )
{
authResult = await PublicClientApplication.AcquireTokenInteractive( scopes ).ExecuteAsync();
}
else
{
//The firstAccount is null when the console app is run again
authResult = await PublicClientApplication.AcquireTokenSilent( scopes, firstAccount ).ExecuteAsync();
}
if(authResult == null)
{
authResult = await PublicClientApplication.AcquireTokenInteractive( scopes ).ExecuteAsync();
}
MailBee.Global.LicenseKey = "MN120-569E9E8D9E5B9E8D9EC8C4BC83D3-D428"; // (demo licence only)
MailBee.ImapMail.Imap imap = new MailBee.ImapMail.Imap();
var xOAuthkey = MailBee.OAuth2.GetXOAuthKeyStatic( authResult.Account.Username, authResult.AccessToken );
imap.Connect( "imap.outlook.com", 993 );
imap.Login( null, xOAuthkey, AuthenticationMethods.SaslOAuth2, AuthenticationOptions.None, null );
imap.SelectFolder( "INBOX" );
var count = imap.MessageCount.ToString();
return authResult.AccessToken;
}
It feels like I have missed a step that stores the information needed for subsequent requests, and I would love a pointer in the right direction.
When you create your PublicClientApplication, it exposes a UserTokenCache.
UserTokenCache implements the ITokenCache interface, which defines notification callbacks for token cache serialization requests as well as methods to serialize or deserialize the cache in various formats.
You should create your own token cache provider, which can store the tokens in a file, in memory, in a database, etc., and register it through those callbacks so that it is invoked whenever the token cache is accessed.
An example of a base class for such providers:
public abstract class MsalTokenCacheProviderBase
{
private Microsoft.Identity.Client.ITokenCache cache;
private bool initialized = false;
public MsalTokenCacheProviderBase()
{
}
public void InitializeCache(Microsoft.Identity.Client.ITokenCache tokenCache)
{
if (initialized)
return;
cache = tokenCache;
cache.SetBeforeAccessAsync(OnBeforeAccessAsync);
cache.SetAfterAccessAsync(OnAfterAccessAsync);
initialized = true;
}
private async Task OnAfterAccessAsync(TokenCacheNotificationArgs args)
{
if (args.HasStateChanged)
{
if (args.HasTokens)
{
await StoreAsync(args.Account.HomeAccountId.Identifier,
args.TokenCache.SerializeMsalV3()).ConfigureAwait(false);
}
else
{
// No token in the cache. we can remove the cache entry
await DeleteAsync<bool>(args.SuggestedCacheKey).ConfigureAwait(false);
}
}
}
private async Task OnBeforeAccessAsync(TokenCacheNotificationArgs args)
{
if (!string.IsNullOrEmpty(args.SuggestedCacheKey))
{
byte[] tokenCacheBytes = await GetAsync<byte[]>(args.SuggestedCacheKey).ConfigureAwait(false);
args.TokenCache.DeserializeMsalV3(tokenCacheBytes, shouldClearExistingCache: true);
}
}
protected virtual Task OnBeforeWriteAsync(TokenCacheNotificationArgs args)
{
return Task.CompletedTask;
}
public abstract Task StoreAsync<T>(string key, T value);
public abstract Task DeleteAsync<T>(string key);
public abstract Task<T> GetAsync<T>(string key);
public abstract Task ClearAsync();
}
And the MsalFileTokenCacheProvider:
public sealed class MsalFileTokenCacheProvider : MsalTokenCacheProviderBase
{
private string basePath;
public MsalFileTokenCacheProvider(string basePath)
{
this.basePath = basePath;
}
public override Task ClearAsync()
{
throw new NotImplementedException();
}
public override Task DeleteAsync<T>(string key)
{
if (string.IsNullOrEmpty(key))
{
throw new ArgumentException("Key MUST have a value");
}
string path = Path.Combine(basePath, key + ".json");
if (File.Exists(path))
File.Delete(path);
return Task.FromResult(true);
}
public override Task<T> GetAsync<T>(string key)
{
if (string.IsNullOrEmpty(key))
{
throw new ArgumentException("Key MUST have a value");
}
string path = Path.Combine(basePath, key + ".json");
if (File.Exists(path))
{
T value = JsonConvert.DeserializeObject<T>(File.ReadAllText(path));
return Task.FromResult(value);
}
else
return Task.FromResult(default(T));
}
public override Task StoreAsync<T>(string key, T value)
{
string contents = JsonConvert.SerializeObject(value);
string path = Path.Combine(basePath, key + ".json");
File.WriteAllText(path, contents);
return Task.FromResult(value);
}
}
So based on your code you will have:
PublicClientApplication =
PublicClientApplicationBuilder.Create( "5896de31-e251-460c-9dc2-xxxxxxxxxxxx" )
.WithRedirectUri( "https://login.microsoftonline.com/common/oauth2/nativeclient" )
.WithAuthority( AzureCloudInstance.AzurePublic, ConfigurationManager.AppSettings["tenantId"] )
.Build();
MsalFileTokenCacheProvider cacheProvider = new MsalFileTokenCacheProvider("TokensFolder");
cacheProvider.InitializeCache(PublicClientApplication.UserTokenCache);
//var scopes = new string[] { "email", "offline_access", "profile", "User.Read", "Mail.Read" };
var scopes = new string[] { "https://outlook.office.com/IMAP.AccessAsUser.All" };
// When you call the code below, the PublicClientApplication uses your token cache
// provider to look up the required account. You can also call
// PublicClientApplication.GetAccountAsync(key), which uses the token cache provider for
// the specific account whose token you want. If an account is found, you can
// just call AcquireTokenSilent, which takes care of token expiration and refreshes the token if needed.
// Bear in mind that in some circumstances AcquireTokenSilent will fail and you will have to
// call AcquireTokenInteractive again, for example when the user changes their password
// or has revoked your application's access via their account.
var accounts = await PublicClientApplication.GetAccountsAsync();
var firstAccount = accounts.FirstOrDefault();
Please refer to the following documentation from Microsoft.
https://learn.microsoft.com/en-us/azure/active-directory/develop/msal-net-token-cache-serialization
I generated a Microsoft Graph app for ASP.NET MVC, downloaded from the Microsoft Graph site. I need to access a shared mail folder, but I'm not sure how to do that. With the following code I can access my own mail folders but not the shared one:
public static async Task<IEnumerable<MailFolder>> GetMailFolderAsync()
{
var graphClient = GetAuthenticatedClient();
var mailFolder = await graphClient.Me.MailFolders.Request().GetAsync();
var sharedMailFolder = await graphClient.Users.Request().GetAsync();
return mailFolder;
}
Also, where in the above code can I pass a parameter to access the next page, or all pages?
private static GraphServiceClient GetAuthenticatedClient()
{
return new GraphServiceClient(
new DelegateAuthenticationProvider(
async (requestMessage) =>
{
string signedInUserId = ClaimsPrincipal.Current.FindFirst(ClaimTypes.NameIdentifier).Value;
SessionTokenStore tokenStore = new SessionTokenStore(signedInUserId,
new HttpContextWrapper(HttpContext.Current));
var idClient = new ConfidentialClientApplication(
appId, redirectUri, new ClientCredential(appSecret),
tokenStore.GetMsalCacheInstance(), null);
var accounts = await idClient.GetAccountsAsync();
var result = await idClient.AcquireTokenSilentAsync(
graphScopes.Split(' '), accounts.FirstOrDefault());
requestMessage.Headers.Authorization =
new AuthenticationHeaderValue("Bearer", result.AccessToken);
}));
}
I don't think it is possible to access shared folders, though I am still investigating. As for paging, start from the first request:
public static async Task<IEnumerable<MailFolder>> GetMailFolderAsync()
{
var graphClient = GetAuthenticatedClient();
var mailFolder = await graphClient.Me.MailFolders.Request().GetAsync();
var sharedMailFolder = await graphClient.Users.Request().GetAsync();
return mailFolder;
}
Then check mailFolder.NextPageRequest. If it is not null, call mailFolder.NextPageRequest.GetAsync() to fetch the next page, and use the null check as the loop condition:
var mailFoldersCollection = mailFolder;
while (mailFoldersCollection != null)
{
    foreach (var folder in mailFoldersCollection.CurrentPage)
    {
        // Do your stuff with each folder on the current page.
    }
    // Move to the next page, or stop when there is none.
    mailFoldersCollection = mailFoldersCollection.NextPageRequest != null
        ? await mailFoldersCollection.NextPageRequest.GetAsync()
        : null;
}
Hope it works for you!
I've built a crawler that reads a website and fetches some values from it, using Quartz.NET and ASP.NET MVC. My problem is this: suppose a user starts scraping "Stackoverflow.com", and about 5 hours later decides to stop that job and start scraping a new website. How can I do that?
[HttpPost]
public ActionResult Index(string keyword, string url)
{
IScheduler scheduler = StdSchedulerFactory.GetDefaultScheduler();
scheduler.Start();
IJobDetail job = JobBuilder.Create<ScrapJob>()
.WithIdentity("MyScrapJob")
.UsingJobData("url", url)
.UsingJobData("keyword", keyword)
.Build();
ITrigger trigger = TriggerBuilder.Create().WithDailyTimeIntervalSchedule(
s => s.WithIntervalInSeconds(20).OnEveryDay().StartingDailyAt(TimeOfDay.HourAndMinuteOfDay(0, 0))
).Build();
scheduler.ScheduleJob(job, trigger);
return View(db.Scraps.ToList());
}
public List<ScrapJob> Scraping(string url, string keyword)
{
int count = 0;
List<ScrapJob> scraps = new List<ScrapJob>();
ScrapJob scrap = null;
HtmlDocument doc = new HtmlDocument();
try
{
var request = (HttpWebRequest)WebRequest.Create(url);
request.Method = "GET";
using (var response = (HttpWebResponse)request.GetResponse())
{
using (var stream = response.GetResponseStream())
{
doc.Load(stream, Encoding.GetEncoding("UTF-8"));
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//text()"))
{
if (node.InnerText.ToString().Contains(keyword))
{
count++;
scrap = new ScrapJob { Keyword = keyword, DateTime = System.DateTime.Now.ToString(), Count = count, Url = url };
}
}
}
}
}
catch (WebException ex)
{
Console.WriteLine(ex.Message);
}
// scraps.Add(scrap);
var isExist = db.Scraps.Where(s => s.Keyword == keyword && s.Count == scrap.Count).Max(s => (int?)s.Id) ?? 0;
if (isExist == 0)
{
db.Scraps.Add(scrap);
db.SaveChanges();
}
return scraps;
}
public void Execute(IJobExecutionContext context)
{
//ScrapJob scraps = null;
using (var scrap = new ScrapJob())
{
JobKey key = context.JobDetail.Key;
JobDataMap dataMap = context.JobDetail.JobDataMap;
string url = dataMap.GetString("url");
string keyword = dataMap.GetString("keyword");
scrap.Scraping(url, keyword);
}
}
I'm not sure why you picked Quartz, but here is something that I think will help you.
This code sample interrupts and deletes a job by its unique identifier:
public void DeleteJob(JobKey jobKey)
{
var scheduler = StdSchedulerFactory.GetDefaultScheduler();
var executingJobs = scheduler.GetCurrentlyExecutingJobs();
if (executingJobs.Any(x => x.JobDetail.Key.Equals(jobKey)))
{
scheduler.Interrupt(jobKey);
}
scheduler.DeleteJob(jobKey);
}
But I believe you need to define what behavior you expect, because it can be a bit more complex. For example:
You might want to just pause the job and resume it after finishing with the other website (persisting some state and progress), or just log the progress.
You might want the jobs to run in parallel and process multiple sites simultaneously. (You just need to give each job a different name instead of the hardcoded .WithIdentity("MyScrapJob").)
Also, scheduler.GetCurrentlyExecutingJobs() gives you the currently executing jobs, so you can show them to the user and let them decide what to do.
Looking at your action method, I'm not sure the trigger does what you expect. What also bothers me is db.Scraps.ToList(), which materializes the whole table. Consider adding pagination: it may not be necessary if you only show a count, but it is essential if the grid can hold a lot of records.
About the scraping method
Instead of
var isExist = db.Scraps.Where(s => s.Keyword == keyword && s.Count == scrap.Count).Max(s => (int?)s.Id) ?? 0;
you can use .Any
var exists = db.Scraps.Any(s => s.Keyword == keyword && s.Count == scrap.Count);
This returns a boolean, so you can check if (!exists).
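To make the difference concrete, here is a small in-memory comparison using LINQ to Objects rather than Entity Framework. Scrap here is a hypothetical stand-in for the entity in the question, and AnySketch, ExistsViaMax and ExistsViaAny are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Scrap entity.
public sealed record Scrap(int Id, string Keyword, int Count);

public static class AnySketch
{
    // Original style: take the max matching id and treat 0 as "not found".
    public static bool ExistsViaMax(IEnumerable<Scrap> scraps, string keyword, int count) =>
        (scraps.Where(s => s.Keyword == keyword && s.Count == count).Max(s => (int?)s.Id) ?? 0) != 0;

    // Suggested style: ask directly whether any element matches.
    public static bool ExistsViaAny(IEnumerable<Scrap> scraps, string keyword, int count) =>
        scraps.Any(s => s.Keyword == keyword && s.Count == count);

    public static void Main()
    {
        var scraps = new List<Scrap> { new Scrap(1, "azure", 3), new Scrap(2, "graph", 7) };
        Console.WriteLine(ExistsViaAny(scraps, "azure", 3)); // prints True
        Console.WriteLine(ExistsViaAny(scraps, "azure", 4)); // prints False
        Console.WriteLine(ExistsViaMax(scraps, "azure", 3) == ExistsViaAny(scraps, "azure", 3)); // prints True
    }
}
```

Both forms agree as long as ids are never 0, but Any states the intent directly and does not depend on a sentinel value.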
You can check out https://github.com/AngleSharp/AngleSharp, a high-performance web parsing library. It is super easy to use as well.
I see a possibility of duplicate records per keyword if you check them by keyword and count. I'm not sure whether you want this or would rather update the existing record's counter.
Good luck! I hope this answer helps you :)