How to process an excel file more efficiently? - asp.net-mvc

I have an Excel file that comes into my MVC web app and that I have to process and do things with. So I receive the file in the controller:
public class StripExcelDocument
{
    public DataSet Convert(HttpPostedFileBase file)
    {
        return GetDataFromExcel(file.InputStream);
    }

    private DataSet GetDataFromExcel(Stream target)
    {
        var excelReader = ExcelReaderFactory.CreateOpenXmlReader(target);
        excelReader.IsFirstRowAsColumnNames = true;
        return excelReader.AsDataSet();
    }
}
and then I send it through a processor I have created, which is just a large conditional statement; based on the outcome, the data gets sent to a specific table in a database:
public class Processor
{
    private Result _result; // populated by SetResult

    public Result Process(string foo, int bar)
    {
        if (FirstCondition(foo, bar))
        {
            SetResult(foo, bar);
        }
        if (SecondCondition(foo, bar))
        {
            SetResult(foo, bar);
        }
        if (ThirdCondition(foo, bar))
        {
            SetResult(foo, bar);
        }
        // etc...
        return _result;
    }
}
Obviously this works great when the user wants to enter a single record, but when processing large Excel files it either:
A: Times out on the server.
B: Leaves the user staring at a screen for a while.
What is a more effective way to deal with bulk processing large amounts of data from an Excel file, where the records need to become their own entities in the database?

Do a bulk import of all the records from the Excel sheet into a staging table; you can use SqlBulkCopy for that.
Then create a stored proc that, based upon your conditions, does the inserts/updates in one shot.
This stored-proc approach will be faster than doing the same through LINQ operations in the code-behind.
That said, try to keep it as a last option: SqlBulkCopy comes from older versions of .NET, and there may be better things available now.
"A: Times out on the server. B: Leaves the user staring at a screen for a while."

Do it asynchronously.
Example code:

class ThreadTest
{
    public ActionResult Main()
    {
        // Start the long-running processing on a background thread
        // and return the view to the user immediately.
        Thread t = new Thread(WriteY);
        t.Start();
        return View();
    }

    void WriteY()
    {
        // Do the heavy Excel processing here.
    }
}
For the timeout: setting sqlCommand.CommandTimeout = 0 will make it infinite.

Related

Run a long running job using the fire and forget strategy with Thymeleaf in Reactor and r2dbc

I am trying to achieve a fire-and-forget type of effect with webflux, thymeleaf and r2dbc. I have two endpoints, one to add an employee and another to list all employees. I want to simulate slow database access, so I have a Thread.sleep of several seconds before I call the DB.
Now, the effect I expect to see when I call /add is that my controller returns immediately and the add page is rendered at once. However, I'm not sure how to achieve this: with the current code, nap() happens before I can return a Mono. In other words, I'm trying to run a long-running job in the background without blocking the controller.
I have the following model:
@Data
public class Employee {
    @Id
    private Long id;
    private String name;
}
The annotated controller has the following methods:
#GetMapping(value = "/")
public String home(Model model) {
model.addAttribute("employees", repo.findAll());
return "home";
}
#GetMapping(value = "/add")
public Mono<String> add() {
return Mono
.defer(this::getEmployee)
.doOnNext(e -> repo.save(e).subscribe())
.thenReturn("add");
}
private Mono<Employee> getEmployee() {
final var e = new Employee();
e.setName("John");
nap(); // calls thread sleep for a few sec
return Mono.just(e);
}
My question is: how can I wrap the long-running job, but at the same time preserve the annotated-controller style (instead of the functional one), and also render the add page immediately? I am aware of some similar questions like this and this, but I don't seem to be able to achieve the behaviour I need.
Edit:
lkatiforis' suggestion and this SO question were a push in the right direction. I had to adjust their example a bit because the employee didn't persist. The change is in add():
public String add() {
    Mono.just(employee)
        .delayElement(Duration.ofSeconds(5))
        .doOnNext(e -> repo.save(e).subscribe())
        .subscribe();
    return "add";
}
employee is just an instance of Employee with a populated name. The delayElement operator pauses for 5 seconds without blocking. Finally, I had to call subscribe() both on repo.save() and at the end of the chain in order for it to work. I assume that if subscribe() is only called inside doOnNext(), then the main chain that starts with Mono.just() is never executed.
I guess the nap() method executes Thread.sleep or something similar, right? Thread.sleep blocks the main thread, making the application unresponsive. You can use the delayElement operator to simulate a long-running operation:
private Mono<Employee> getEmployee() {
    final var e = new Employee();
    e.setName("John");
    return Mono.just(e).delayElement(Duration.ofSeconds(5));
}
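Note that delayElement only simulates latency; if the real job genuinely blocks (Thread.sleep, a blocking driver, heavy CPU work), it should be moved off the event loop. Here is a minimal sketch of that fire-and-forget shape, assuming a hypothetical EmployeeRepo standing in for the question's r2dbc repository:

import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

class Employee {
    String name;
}

// Hypothetical stand-in for the r2dbc repository in the question.
interface EmployeeRepo {
    Mono<Employee> save(Employee e);
}

class EmployeeController {
    private final EmployeeRepo repo;

    EmployeeController(EmployeeRepo repo) {
        this.repo = repo;
    }

    public String add() {
        Mono.fromCallable(this::slowBuild)            // defer the blocking call
            .subscribeOn(Schedulers.boundedElastic()) // run it off the event loop
            .flatMap(repo::save)                      // persist in the same chain, no nested subscribe()
            .subscribe();                             // fire and forget
        return "add";                                 // the view renders immediately
    }

    // Stands in for the question's nap() plus employee construction.
    private Employee slowBuild() {
        try {
            Thread.sleep(5000);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
        Employee e = new Employee();
        e.name = "John";
        return e;
    }
}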

How to pass data down the reactive chain

Whenever I need to pass data down the reactive chain I end up doing something like this:
public Mono<String> doFooAndPassDtoAsMono(Dto dto) {
    return Mono.just(dto)
        .flatMap(d -> {
            Mono<String> result = // remote call returning a Mono
            return Mono.zip(Mono.just(d), result);
        })
        .flatMap(tup2 -> {
            // do something that requires foo and result and returns a Mono
            return doSomething(tup2.getT1().getFoo(), tup2.getT2());
        });
}
Given the below sample Dto class:

class Dto {
    private String foo;

    public String getFoo() {
        return this.foo;
    }
}
Because it often gets tedious to zip the data all the time to pass it down the chain (especially a few levels down), I was wondering if it's OK to simply reference the dto directly, like so:
public Mono<String> doFooAndReferenceParam(Dto dto) {
    Mono<String> result = // remote call returning a Mono
    return result.flatMap(r -> {
        // do something that requires foo and result and returns a Mono
        return doSomething(dto.getFoo(), r);
    });
}
My concern about the second approach is that, assuming a subscriber subscribes to this Mono on a thread pool, I would need to guarantee that Dto is thread-safe (the above example is simple because it just carries a String, but what if it's not?).
Also, which one is considered "best practice"?
Based on what you have shared, you can simply do the following:

public Mono<String> doFooAndPassDtoAsMono(Dto dto) {
    return Mono.just(dto.getFoo());
}
The way you are using zip in the first option serves no purpose. Similarly, the second option will not work either: once the mono is empty, the next flatMap will not be triggered.
The case is simple if
The reference data is available from the beginning (i.e. before the creation of the chain), and
The chain is created for processing at most one event (i.e. starts with a Mono), and
The reference data is immutable.
Then you can simply refer to the reference data in a parameter or local variable – just like in your second solution. This is completely okay, and there are no concurrency issues.
Using mutable data in reactive flows is strongly discouraged. If you had a mutable Dto class, you might still be able to use it (assuming proper synchronization) – but this will be very surprising to readers of your code.
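To make the immutable case concrete, here is a minimal sketch (remoteCall and doSomething are hypothetical placeholders) where capturing the dto in the lambda is safe on any thread, because the class is final and immutable:

import reactor.core.publisher.Mono;

final class Dto {
    private final String foo;

    Dto(String foo) {
        this.foo = foo;
    }

    String getFoo() {
        return foo;
    }
}

class FooService {
    Mono<String> doFooAndReferenceParam(Dto dto) {
        Mono<String> result = remoteCall();
        // dto is effectively final and immutable, so the lambda capture is thread-safe
        return result.flatMap(r -> doSomething(dto.getFoo(), r));
    }

    // Hypothetical placeholder for the remote call.
    private Mono<String> remoteCall() {
        return Mono.just("remote-result");
    }

    // Hypothetical placeholder for the downstream work.
    private Mono<String> doSomething(String foo, String r) {
        return Mono.just(foo + ":" + r);
    }
}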

Amazon SWF queries

Over the last couple of years, I have done a fair amount of work on Amazon SWF, but the following points are still unclear to me, and I have not been able to find any straightforward answers on any forums yet.
These are pretty basic requirements, I suppose, that others will surely have come across too. It would be great if someone could clarify these:
Is there a simple way to return a workflow execution result (maybe just something as simple as a boolean) back to the workflow starter?
Is there a way to catch an Activity timeout exception, so that we can run customised actions in such scenarios?
Why doesn't WorkflowExecutionHistory contain Activities, only Events?
Why is there no simple way of restarting a workflow from the point where it failed?
I am considering using SWF for more business processes at my workplace, but these limitations/doubts are holding me back!
FINAL WORKING SOLUTION
public class ReturnResultActivityImpl implements ReturnResultActivity {
    SettableFuture<WorkflowResult> future;

    public ReturnResultActivityImpl() {
    }

    public ReturnResultActivityImpl(SettableFuture<WorkflowResult> future) {
        this.future = future;
    }

    public void returnResult(WorkflowResult workflowResult) {
        System.out.print("Marking future as Completed");
        future.set(workflowResult);
    }
}
public class WorkflowResult {
    private boolean success;
    private String note;

    public WorkflowResult(boolean s, String n) {
        this.success = s;
        this.note = n;
    }

    public String getNote() {
        return note;
    }
}
public class WorkflowStarter {
    @Autowired
    ReturnResultActivityClient returnResultActivityClient;
    @Autowired
    DummyWorkflowClientExternalFactory dummyWorkflowClientExternalFactory;
    @Autowired
    AmazonSimpleWorkflowClient swfClient;

    String domain = "test-domain";
    boolean isRegister = true;
    int days = 7;
    int terminationTimeoutSeconds = 5000;
    int threadPollCount = 2;
    int taskExecutorThreadCount = 4;

    public String testWorkflow() throws Exception {
        SettableFuture<WorkflowResult> workflowResultFuture = SettableFuture.create();
        String taskListName = "testTaskList-" + RandomStringUtils.randomAlphabetic(8);
        ReturnResultActivity activity = new ReturnResultActivityImpl(workflowResultFuture);
        SpringActivityWorker activityWorker = buildReturnResultActivityWorker(taskListName, Arrays.asList(activity));
        DummyWorkflowClientExternalFactory factory = new DummyWorkflowClientExternalFactoryImpl(swfClient, domain);
        factory.getClient().doSomething(taskListName);
        WorkflowResult result = workflowResultFuture.get(20, TimeUnit.SECONDS);
        return "Call result note - " + result.getNote();
    }

    public SpringActivityWorker buildReturnResultActivityWorker(String taskListName, List activityImplementations)
            throws Exception {
        return setupActivityWorker(swfClient, domain, taskListName, isRegister, days, activityImplementations,
                terminationTimeoutSeconds, threadPollCount, taskExecutorThreadCount);
    }
}
public class Workflow {
    @Autowired
    private DummyActivityClient dummyActivityClient;
    @Autowired
    private ReturnResultActivityClient returnResultActivityClient;

    @Override
    public void doSomething(final String resultActivityTaskListName) {
        Promise<Void> activityPromise = dummyActivityClient.dummyActivity();
        returnResult(resultActivityTaskListName, activityPromise);
    }

    @Asynchronous
    private void returnResult(final String taskListname, Promise waitFor) {
        ActivitySchedulingOptions schedulingOptions = new ActivitySchedulingOptions();
        schedulingOptions.setTaskList(taskListname);
        WorkflowResult result = new WorkflowResult(true, "All successful");
        returnResultActivityClient.returnResult(result, schedulingOptions);
    }
}
The standard pattern is to host a special activity in the workflow starter process that is used to deliver the result. Use a process-specific task list to make sure that it is routed to the correct instance of the starter. Here are the steps to implement it:
1. Define an activity to receive the result, for example "returnResultActivity". Make this activity implementation complete the Future passed to its constructor upon execution.
2. When the workflow is started, it receives "resultActivityTaskList" as an input argument. At the end, the workflow calls this activity with the workflow result. The activity is scheduled on the passed task list.
3. The workflow starter creates an ActivityWorker and an instance of a Future. Then it creates an instance of "returnResultActivity" with that future as a constructor parameter.
4. Then it registers the activity instance with the activity worker and configures it to poll on a randomly generated task list name. Then it calls "start workflow execution", passing the generated task list name as an input argument.
5. Then it waits on the Future to complete. The future.get() call is going to return the workflow result.
Yes, if you are using the AWS Flow Framework, a timeout exception is thrown when an activity times out. If you are not using the Flow Framework, you are making your life 100 times harder. BTW, a workflow timeout is thrown into a parent workflow as a timeout exception as well. It is not possible to catch a workflow timeout exception from within the timing-out instance itself. In that case it is recommended not to rely on the workflow timeout, but to create a timer that fires and notifies the workflow logic that some business event has timed out.
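For illustration, here is a minimal sketch of catching an activity timeout inside the workflow implementation, assuming the AWS Flow Framework for Java (its TryCatch class and ActivityTaskTimedOutException) and the dummyActivityClient from the code above:

import com.amazonaws.services.simpleworkflow.flow.ActivityTaskTimedOutException;
import com.amazonaws.services.simpleworkflow.flow.core.TryCatch;

// Inside the workflow implementation, wrap the asynchronous activity call:
new TryCatch() {
    @Override
    protected void doTry() throws Throwable {
        dummyActivityClient.dummyActivity(); // may time out
    }

    @Override
    protected void doCatch(Throwable e) throws Throwable {
        if (e instanceof ActivityTaskTimedOutException) {
            // run customised compensation/retry logic here
        } else {
            throw e; // rethrow anything we don't handle
        }
    }
};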
Because a single activity execution has multiple events associated with it. It should be pretty easy to write code that converts the history into whatever representation of activities you like. Such code would just match up the events that relate to each activity. Each event always has a reference to its related events, so it is easy to roll them up into a higher-level representation.
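For example, here is a rough sketch of such a roll-up, assuming the SWF model classes from the AWS SDK for Java v1 (HistoryEvent and its per-event-type attribute accessors), that groups activity events by the eventId of their ActivityTaskScheduled event:

import com.amazonaws.services.simpleworkflow.model.HistoryEvent;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class HistoryRollup {
    // Groups all activity-related events under the activity that produced them.
    static Map<Long, List<HistoryEvent>> byActivity(List<HistoryEvent> events) {
        Map<Long, List<HistoryEvent>> grouped = new HashMap<>();
        for (HistoryEvent event : events) {
            Long key = null;
            switch (event.getEventType()) {
                case "ActivityTaskScheduled":
                    key = event.getEventId();
                    break;
                case "ActivityTaskStarted":
                    key = event.getActivityTaskStartedEventAttributes().getScheduledEventId();
                    break;
                case "ActivityTaskCompleted":
                    key = event.getActivityTaskCompletedEventAttributes().getScheduledEventId();
                    break;
                case "ActivityTaskFailed":
                    key = event.getActivityTaskFailedEventAttributes().getScheduledEventId();
                    break;
                case "ActivityTaskTimedOut":
                    key = event.getActivityTaskTimedOutEventAttributes().getScheduledEventId();
                    break;
                default:
                    break; // non-activity events are ignored in this sketch
            }
            if (key != null) {
                grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(event);
            }
        }
        return grouped;
    }
}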
Unfortunately there is no easy answer to this one. Ideally SWF would support restarting a workflow by copying its history up to the failure point, but that is not supported. I personally believe that a workflow should be written in a way that it never fails, but always deals with failures without failing. Obviously that doesn't work for failures due to unexpected conditions. In that case, writing the workflow so that it can be restarted from the beginning is the simplest approach.

Grails - Command object, service method

I'm not a programming-savvy person, so please bear with me.
I've read blog entries and docs about command objects. I've never used one and was wondering if I should. (I probably should...)
My project requires parsing, sorting, calculating, and saving results into the database when users upload files.
So, according to one of the blog entries I read and its corresponding GitHub code:
1) the SERVICE should receive file uploads, parse the uploaded files (mainly docs and pdfs), sort the parsed data using RegEx, and calculate the data;
2) the COMMAND OBJECT should call the SERVICE, collect the results, send the results back to the controller, and save the results into the database;
3) the CONTROLLER should receive the request from the VIEW, get the results from the COMMAND OBJECT, and send the results back to the VIEW.
Did I understand correctly?
Thanks.
I found this to be the best setup. Here is an example that I use in production:

Command Object (to carry data and ensure its validity):

@grails.validation.Validateable
class SearchCommand implements Serializable {
    // search query
    String s
    // page
    Integer page

    static constraints = {
        s nullable: true
        page nullable: true
    }
}
Controller (directs a request to a Service and then gets a response back from the Service and directs this response to a view):
class SomeController {
    // inject service
    def someService

    def search(SearchCommand cmd) {
        def result = someService.search(cmd)
        // can access result in .gsp as ${result} or in other forms
        render(view: "someView", model: [result: result])
    }
}
Service (handles business logic and grabs data from Domain(s)):
class SomeService {
    def search(SearchCommand cmd) {
        if (cmd.hasErrors()) {
            // errors found in cmd.errors
            return
        }
        // do some logic, for example calc offset from cmd.page
        def result = Stuff.searchAll(cmd.s, offset, max)
        return result
    }
}
Domain (all database queries are handled here):
class Stuff {
    String name

    static constraints = {
        name nullable: false, blank: false, size: 1..30
    }

    static searchAll(String searchQuery, int offset, int max) {
        return Stuff.executeQuery("select s.name from Stuff s where s.name = :searchQuery", [searchQuery: searchQuery, offset: offset, max: max])
    }
}
Yes, you understood it correctly, except for one thing: the command object shouldn't save the data to the DB - let the service do that. The other advantage of a command object is data binding and validation of the data coming from the client. Read more in the grails command object docs.
You can also find helpful information regarding your question in this article: grails best practices
I guess not. It's not really related to whether the save is done in a service - you should always try to carry out complex work, and specifically DB work, in a service, so that holds regardless. I tend not to use command objects, but I have got hooked on helper classes, aka beans, that sit in src/main/groovy and do all of the validation and formatting. I just did a form that has feedback and reason fields.
Initially I thought I would get away with:

def someAction(String feedback, String reason) {
    someService.doSomething(feedback, reason)
}
But then I looked closer: my form was firstly a textarea, and then the selection objects were bytes, so the above was not working. To fix it simply, without adding complexity to my controller/service, I did this:
package some.package

import grails.validation.Validateable

class SomeBean implements Validateable {
    User user
    byte reason
    String feedback

    static constraints = {
        user(nullable: true)
        reason(nullable: true, inList: UsersRemoved.REASONS)
        feedback(nullable: true)
    }

    void setReason(String t) {
        reason = t as byte
    }

    void setFeedback(String t) {
        feedback = t?.trim()
    }
}
Now my controller:

class SomeController {
    def userService
    def someService

    def doSomething(SomeBean bean) {
        bean.user = userService.currentUser
        if (!bean.validate()) {
            flash.message = bean.errors.allErrors.collect { g.message([error: it]) }
            render view: '/someTemplate', model: [instance: bean, template: '/some/template']
            return
        }
        someService.doSomeThing(bean)
    }
}
Now my service:

class SomeService {
    def doSomeThing(SomeBean bean) {
        if (bean.user == 'A') {
            .....
        }
    }
}
All of that validation would still have had to be done somewhere. You say no validation, but in a good model you should do validation and set things up to be stored in proper structures, to reduce overloading your DB over time. It is difficult to explain, but in short I am talking about your domain class objects: don't just declare String something, String somethingelse without even defining their lengths etc. Be strict, and validate.
If you have a textarea, it will be stored in the back end, so you will need to trim it as above, and you will need to ensure the input does not exceed the maximum length of the actual DB column, which, if not defined, will probably be 255.
And by doing:

static constraints = {
    user(nullable: true)
    reason(min: 1, max: 255, nullable: true, inList: UsersRemoved.REASONS)
}

the bean.validate() call in the controller has already invalidated the bean if the user somehow got past my front-end checks and put in more than 255.
This stuff takes time - be patient.
Edited to finally add that the example byte is one to be careful of. When adding any String or whatever, I have started to define the specifics like this; in the case of a byte, if it is a boolean true/false, fine - if not, then define it as a tinyint:
static mapping = {
    // since there is more than 1 type in this case
    reason(sqlType: 'tinyint(1)')
    feedback(sqlType: 'varchar(1000)')
    // name(sqlType: 'varchar(70)')
}
If you then look at the tables created in the DB, you should find they have been created as per the definition rather than as standard 255 varchar, which I think is the default for a declared String.

breeze: creating inheritance in client-side model

I'm having a weird issue with the configureMetadataStore.
My model:
class SourceMaterial {
    public List<Job> Jobs { get; set; }
}

class Job {
    public SourceMaterial SourceMaterial { get; set; }
}

class JobEditing : Job { }

class JobTranslation : Job { }
Module for configuring Job entities:
angular.module('cdt.request.model').factory('jobModel', ['breeze', 'dataService', 'entityService', modelFunc]);

function modelFunc(breeze, dataService, entityService) {

    function Ctor() {
    }

    Ctor.extend = function (modelCtor) {
        modelCtor.prototype = new Ctor();
        modelCtor.prototype.constructor = modelCtor;
    };

    Ctor.prototype._configureMetadataStore = _configureMetadataStore;

    return Ctor;

    // constructor
    function jobCtor() {
        this.isScreenDeleted = null;
    }

    function _configureMetadataStore(entityName, metadataStore) {
        metadataStore.registerEntityTypeCtor(entityName, jobCtor, jobInitializer);
    }

    function jobInitializer(job) { /* do stuff here */ }
}
Module for configuring JobEditing entities:
angular.module('cdt.request.model').factory('jobEditingModel', ['jobModel', modelFunc]);

function modelFunc(jobModel) {

    function Ctor() {
        this.configureMetadataStore = configureMetadataStore;
    }

    jobModel.extend(Ctor);

    return Ctor;

    function configureMetadataStore(metadataStore) {
        return this._configureMetadataStore('JobEditing', metadataStore);
    }
}
Module for configuring JobTranslation entities:
angular.module('cdt.request.model').factory('jobTranslationModel', ['jobModel', modelFunc]);

function modelFunc(jobModel) {

    function Ctor() {
        this.configureMetadataStore = configureMetadataStore;
    }

    jobModel.extend(Ctor);

    return Ctor;

    function configureMetadataStore(metadataStore) {
        return this._configureMetadataStore('JobTranslation', metadataStore);
    }
}
Then the models are configured like this:
JobEditingModel.configureMetadataStore(dataService.manager.metadataStore);
JobTranslationModel.configureMetadataStore(dataService.manager.metadataStore);
Now when I call createEntity for a JobEditing, the instance is created and at some point breeze calls setNpValue and adds the newly created Job to the navigation property SourceMaterial.
That's all fine, except that it is added twice!
It happens when rawAccessorFn(newValue); is called; in fact it is called twice.
And if I add a new type of job (and hence register a new type with the metadataStore), then the new Job is added three times to the navigation property.
I can't see what I'm doing wrong. Can anyone help?
EDIT
I've noticed that if I change:
metadataStore.registerEntityTypeCtor(entityName, jobCtor, jobInitializer);
to
metadataStore.registerEntityTypeCtor(entityName, null, jobInitializer);
Then everything works fine again! So the problem is registering the same jobCtor function. Should that not be possible?
Our Bad
Let's start with a Breeze bug, recently discovered, in the Breeze "backingStore" model library adapter.
There's a part of that adapter that is responsible for rewriting the data properties of the entity constructor so that they become observable and self-validating; it kicks in when you register a type with registerEntityTypeCtor.
It tries to keep track of which properties it has rewritten. The bug is that it records the fact of the rewrite on the EntityType rather than on the constructor function. Consequently, every time you registered a new type, it failed to realize that it had already rewritten the properties of the base Job type, and re-wrapped the properties.
This was happening to you. Every derived type that you registered re-wrapped/re-wrote the properties of the base type (and of its base type, etc).
In your example, a base class Job property would be re-written 3 times and its inner logic executed 3 times if you registered three of its sub-types. And the problem disappeared when you stopped registering constructors of sub-types.
We're working on a revised Breeze "backingStore" model library adapter that won't have this problem and, coincidentally, will behave better in test scenarios (that's how we found the bug in the first place).
Your Bad?
Wow that's some hairy code you've got there. Why so complicated? In particular, why are you adding a one-time MetadataStore configuration to the prototypes of entity constructor functions?
I must be missing something. The code to register types is usually much smaller and simpler. I get that you want to put each type in its own file and have it self-register. The cost of that (as you've written it) is enormous bulk and complexity. Please reconsider your approach. Take a look at other Breeze samples, Zza-Node-Mongo for example.
Thanks for reporting the issue. Hang in there with us. A fix should be arriving soon ... I hope in the next release.
