Definition of integration point in Abaqus - abaqus

I need to know the definition of "integration points" in Abaqus subroutines.
I'm new to the Abaqus software and would appreciate your help.

It is now 2.5 years after the OP asked this question, so my answer is probably more for anyone who has followed a link here, hoping for some insight. On the grounds that FEM programming is special (see note 0 below), I will try to answer this question rather than flag it as off-topic. Some of my answer is applicable to FEM in general, and some is specific to Abaqus.
Quick check:
If you're only asking for the specific numerical value to use for the (usual or standard) location of integration points, then the answer is that it depends. Luckily, standard values are widely available for a variety of elements (see resources below).
However, I assume you're asking about writing a User-Element (UEL) subroutine but are not yet familiar with how elements are formulated, or what an integration point is.
The answer: In the standard displacement-based FEM the constitutive response of an individual finite element is usually obtained by numerical integration (aka quadrature) at one or more points on or within the element. How many and where these points are located depends on the element type, certain performance tradeoffs, etc, and the particular integration technique being used. Integration techniques that I have seen used for continuum (solid) finite elements include:
More Common: Gauss integration -- the number & position of sampling points are determined by the Gauss quadrature rule used; the sampling points lie strictly inside the domain [-1,1], so the endpoints (nodes) are not sampled.
Less Common: Newton-Cotes integration -- evenly spaced sampling points; the endpoints (nodes) of the domain [-1,1] are included as sampling points.
In my experience, the standard practice by far is to use Gauss quadrature or reduced integration methods (which are often variations of Gauss quadrature). In Gauss quadrature, the integration points are placed at special ("optimal") locations within the element known as Gauss points, which have been shown to provide reliably accurate solutions for a given level of computational expense, at least for the polynomial functions typically used in isoparametric finite elements. Other integration techniques have been found to be competitive in some cases (see reference 1 below), but Gauss quadrature is certainly the gold standard. There are other techniques that I'm not familiar with.
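To make the idea of Gauss points concrete, here is a minimal one-dimensional sketch in Python (not Abaqus code). It hard-codes the standard 1-, 2- and 3-point Gauss-Legendre rules on [-1,1]; the names `GAUSS_RULES` and `integrate` are made up for this illustration. An n-point rule integrates polynomials up to degree 2n-1 exactly.

```python
import math

# Standard Gauss-Legendre points and weights on [-1, 1].
# Keys are the number of integration points; values are (location, weight) pairs.
GAUSS_RULES = {
    1: [(0.0, 2.0)],
    2: [(-1.0 / math.sqrt(3.0), 1.0), (1.0 / math.sqrt(3.0), 1.0)],
    3: [(-math.sqrt(0.6), 5.0 / 9.0), (0.0, 8.0 / 9.0), (math.sqrt(0.6), 5.0 / 9.0)],
}


def integrate(f, n_points):
    """Approximate the integral of f over [-1, 1] by sampling f at the Gauss points."""
    return sum(weight * f(xi) for xi, weight in GAUSS_RULES[n_points])


def cubic(x):
    return x**3 + x**2  # exact integral over [-1, 1] is 2/3


def quartic(x):
    return x**4  # exact integral over [-1, 1] is 2/5


two_point_cubic = integrate(cubic, 2)      # exact: degree 3 <= 2*2 - 1
two_point_quartic = integrate(quartic, 2)  # not exact: degree 4 > 3
three_point_quartic = integrate(quartic, 3)  # exact: degree 4 <= 2*3 - 1
```

In an element routine the same pattern appears in two or three dimensions: the integrand (for example the stiffness contribution) is evaluated at each Gauss point and the weighted values are summed.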
Practical advice: Assuming an isoparametric formulation, in the UEL you use "element shape functions" and the primary field variables defined by the nodal degrees of freedom (with a solid mechanics focus, these are typically the displacements) to calculate the element strains, stresses, etc. at each integration point. If this doesn't make sense to you, see resources below.
Note that if you need the stresses at the nodes (or at any other point) you must extrapolate them from the integration points, again using the shape functions, or calculate/integrate directly at the nodes.
Suggested resources:
Please: If you're writing a user subroutine you should already know what an integration point is. I'm sorry, but that's just how it is. You have to know at least the basics before you attempt to write a UEL.
That said, I think it's great that you're interested in programming for FEA/FEM. If you're motivated but not at university where you can enroll in an FEM course or two, then there are a number of resources available, from Massive Open Online Courses (MOOCs), to a plethora of textbooks - I generally recommend anything written by Zienkiewicz. For a readable yet "solid" introduction with an emphasis on solid mechanics, I like Concepts and Applications of Finite Element Analysis, 4th Edition, by Cook et al (aka the "Cook Book"). Good luck!
0 You typically need a lot of background before you even ask the right questions.
1 Trefethen, 2008, "Is Gauss Quadrature Better than Clenshaw-Curtis?", DOI 10.1137/060659831

Your question is not really clear.
Do you mean in the Python environment? For shell elements you have section points through the thickness, which you set through your shell section. The number of integration points depends on your element type.
You can find a lot of info in the Abaqus scripting manual. For example

An integration point in FEM is where the primary variables are solved. Just keep that in mind. In user subroutines in Abaqus, the calculation takes place at each integration point. Remember that and go forward. If you are unsatisfied, take a look at any FEM book for the definition/explanation of the integration point. It is not dependent on subroutines.

An integration point is a sampling point inside an element at which quantities such as stresses and strains are evaluated; it is not a nodal value. For example, a fully integrated eight-node C3D8 continuum brick element has eight integration points (a 2 x 2 x 2 arrangement inside the element, not at the corners), whereas the reduced-integration C3D8R has only one.
Also, within a subroutine, other variables such as state variables (SVARS) are stored at the integration points, so if you need to keep track of, say, 4 SVARS per integration point, there will be 8 * 4 = 32 SVARS in the entire fully integrated 8-node element.
I hope this answers your question.


Optimize deep Q network with long episode

I am working on a problem that we aim to solve with deep Q-learning. However, the problem is that training just takes too long for each episode, roughly 83 hours. We are envisioning solving the problem within, say, 100 episodes.
We are gradually learning a matrix (100 * 10), and within each episode we need to perform 100*10 iterations of certain operations. Basically, we select a candidate from a pool of 1000 candidates, put this candidate in the matrix, and compute a reward function by feeding the whole matrix as the input.
The central hurdle is that the reward function computation at each step is costly, roughly 2 minutes, and each time we update one entry in the matrix.
All the elements in the matrix depend on each other in the long term, so the whole procedure seems not suitable for some "distributed" system, if I understood correctly.
Could anyone shed some light on potential optimization opportunities here, such as extra engineering effort? Any suggestions and comments would be very much appreciated. Thanks.
======================= update of some definitions =================
0. initial stage:
a 100 * 10 matrix, with every element as empty
1. action space:
each step I will select one element from a candidate pool of 1000 elements. Then insert the element into the matrix one by one.
2. environment:
each step I will have an updated matrix to learn.
An oracle function F returns a quantitative value ranging from 5000 to 30000, the higher the better (one computation of F takes roughly 120 seconds).
This function F takes the matrix as the input and performs a very costly computation, and it returns a quantitative value indicating the quality of the matrix synthesized so far.
This function essentially measures some performance of the system, so it does take a while to compute a reward value at each step.
3. episode:
By saying "we are envisioning to solve it within 100 episodes", I mean that this is just an empirical estimate. But it shouldn't be fewer than 100 episodes, at least.
4. constraints
Ideally, like I mentioned, "All the elements in the matrix depend on each other in the long term", and that's why the reward function F computes the reward by taking the whole matrix as the input rather than the latest selected element.
Indeed by appending more and more elements in the matrix, the reward could increase, or it could decrease as well.
5. goal
The synthesized matrix should make the oracle function F return a value greater than 25000. Whenever it reaches this goal, I will terminate the learning step.
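The definitions above can be summarized as a small environment sketch in Python. This is only an illustration under stated assumptions: the class name `MatrixEnv`, the row-by-row fill order, and the stand-in oracle are hypothetical, and the real F (the costly 120-second computation) is not known here.

```python
class MatrixEnv:
    """Sketch of the episode structure described above (names are hypothetical)."""

    def __init__(self, oracle, rows=100, cols=10, pool_size=1000, goal=25000):
        self.oracle = oracle        # stands in for the costly function F
        self.rows = rows
        self.cols = cols
        self.pool_size = pool_size  # size of the candidate pool (action space)
        self.goal = goal            # terminate once F exceeds this value
        self.reset()

    def reset(self):
        # Initial stage: every element of the matrix is empty.
        self.matrix = [[None] * self.cols for _ in range(self.rows)]
        self.filled = 0
        return self.matrix

    def step(self, action):
        if not 0 <= action < self.pool_size:
            raise ValueError("action must index a candidate in the pool")
        # Assumption: entries are filled one by one in row-major order.
        r, c = divmod(self.filled, self.cols)
        self.matrix[r][c] = action
        self.filled += 1
        # The reward depends on the whole matrix, not just the latest entry.
        reward = self.oracle(self.matrix)
        done = reward > self.goal or self.filled == self.rows * self.cols
        return self.matrix, reward, done


def toy_oracle(matrix):
    # Placeholder only: counts filled entries. The real F is far more expensive.
    return sum(1 for row in matrix for v in row if v is not None)


env = MatrixEnv(toy_oracle, rows=2, cols=2, pool_size=5, goal=25000)
steps = 0
done = False
while not done:
    _, reward, done = env.step(steps % 5)
    steps += 1
```

Writing the loop this way makes it clear that one call to the oracle happens per step, so the cost of an episode is dominated by the number of steps times the cost of F.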
Honestly, there is no effective way to know how to optimize this system without knowing specifics such as which computations are in the reward function or which programming design decisions you have made that we can help with.
You are probably right that the episodes are not suitable for distributed calculation, meaning we cannot parallelize this, as they depend on previous search steps. However, it might be possible to throw more computing power at the reward function evaluation, reducing the total time required to run.
I would encourage you to share more details on the problem, for example by profiling the code to see which component takes up the most time, by sharing a code excerpt or, as the standard for doing science gets higher, by sharing a reproducible code base.
Not a solution to your question, just some general thoughts that maybe are relevant:
One of the biggest obstacles to applying Reinforcement Learning to "real world" problems is the astoundingly large amount of data/experience required to achieve acceptable results. For example, OpenAI's Dota 2 agent collected the equivalent of 900 years of experience per day. In the original Deep Q-network paper, hundreds of millions of game frames were required, depending on the specific game, to achieve a performance close to that of a typical human. In other benchmarks where the inputs are not raw pixels, such as MuJoCo, the situation isn't a lot better. So, if you don't have a simulator that can generate samples (state, action, next state, reward) cheaply, maybe RL is not a good choice. On the other hand, if you have a ground-truth model, maybe other approaches can easily outperform RL, such as Monte Carlo Tree Search (e.g., Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning or Simple random search provides a competitive approach to reinforcement learning). All these ideas and much more are discussed in this great blog post.
The previous point is especially true for deep RL. Approximating value functions or policies using a deep neural network with millions of parameters usually implies that you'll need a huge quantity of data, or experience.
And regarding to your specific question:
In the comments, I've asked a few questions about the specific features of your problem. I was trying to figure out if you really need RL to solve the problem, since it's not the easiest technique to apply. On the other hand, if you really need RL, it's not clear whether you should use a deep neural network as the approximator or whether you can use a shallow model (e.g., random trees). However, these questions and other potential optimizations require more domain knowledge. Here, it seems you are not able to share the domain of the problem, which could be due to numerous reasons, and I perfectly understand.
You have estimated the number of episodes required to solve the problem based on some empirical studies using a smaller version with a 20*10 matrix. Just a note of caution: due to the curse of dimensionality, the complexity of the problem (or the experience needed) could grow exponentially as the state space dimensionality grows, although maybe that is not your case.
That said, I'm looking forward to seeing an answer that really helps you solve your problem.

Cutting down on Stanford parser's time-to-parse by pruning the sentence

We are already aware that the parsing time of the Stanford Parser increases as the length of a sentence increases. I am interested in finding creative ways to prune the sentence such that the parsing time decreases without compromising accuracy. For example, we can replace known noun phrases with one-word nouns. Similarly, are there other smart ways of guessing a subtree beforehand, say, using the POS tag information? We have a huge corpus of unstructured text at our disposal, so we wish to learn some common patterns that can ultimately reduce the parsing time. References to publicly available literature in this regard would also be highly appreciated.
P.S. We already are aware of how to multi-thread using Stanford Parser, so we are not looking for answers from that point of view.
You asked for 'creative' approaches - the Cell Closure pruning method might be worth a look. See the series of publications by Brian Roark, Kristy Hollingshead, and Nathan Bodenstab. Papers: 1 2 3. The basic intuition is:
Each cell in the CYK parse chart 'covers' a certain span (e.g. the first 4 words of the sentence, or words 13-18, etc.)
Some words - particularly in certain contexts - are very unlikely to begin a multi-word syntactic constituent; others are similarly unlikely to end a constituent. For example, the word 'the' almost always precedes a noun phrase, and it's almost inconceivable that it would end a constituent.
If we can train a machine-learned classifier to identify such words with very high precision, we can thereby identify cells which would only participate in parses placing said words in highly improbable syntactic positions. (Note that this classifier might make use of a linear-time POS tagger, or other high-speed preprocessing steps.)
By 'closing' these cells, we can reduce both the asymptotic and average-case complexities considerably - in theory, from cubic complexity all the way to linear; practically, we can achieve approximately n^1.5 without loss of accuracy.
In many cases, this pruning actually increases accuracy slightly vs. an exhaustive search, because the classifier can incorporate information that isn't available to the PCFG. Note that this is a simple, but very effective form of coarse-to-fine pruning, with a single coarse stage (as compared to the 7-stage CTF approach in the Berkeley Parser).
To my knowledge, the Stanford Parser doesn't currently implement this pruning technique; I suspect you'd find it quite effective.
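The cell-closing intuition above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the method from the cited papers or the BUBS implementation: the per-word flags `can_begin` and `can_end` are assumed to come from some classifier (here they are hand-written), and the function name `open_cells` is made up.

```python
def open_cells(can_begin, can_end):
    """Return the chart cells (start, end) that remain open after pruning.

    can_begin[i] is True if word i may begin a multi-word constituent.
    can_end[i] is True if word i may end a multi-word constituent.
    A cell (start, end) covers words[start:end].
    """
    n = len(can_begin)
    cells = []
    for start in range(n):
        for end in range(start + 1, n + 1):
            if end - start == 1:
                # Single-word cells are always kept.
                cells.append((start, end))
            elif can_begin[start] and can_end[end - 1]:
                # Multi-word cell whose first and last words are both plausible.
                cells.append((start, end))
    return cells


# Hand-written flags for the toy sentence "the cat sat down".
can_begin = [True, False, True, False]
can_end = [False, True, False, True]
kept = open_cells(can_begin, can_end)
total = len(can_begin) * (len(can_begin) + 1) // 2
```

For this toy sentence, 7 of the 10 chart cells stay open. The savings grow with sentence length, since every closed cell also removes the work of combining its sub-spans.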
Shameless plug
The BUBS Parser implements this approach, as well as a few other optimizations, and thus achieves throughput of around 2500-5000 words per second, usually with accuracy at least equal to that I've measured with the Stanford Parser. Obviously, if you're using the rest of the Stanford pipeline, the built-in parser is already well integrated and convenient. But if you need improved speed, BUBS might be worth a look, and it does include some example code to aid in embedding the engine in a larger system.
Memoizing Common Substrings
Regarding your thoughts on pre-analyzing known noun phrases or other frequently observed sequences with consistent structure: I did some evaluation of a similar idea a few years ago (in the context of sharing common substructures across a large corpus, when parsing on a massively parallel architecture). The preliminary results weren't encouraging. In the corpora we looked at, there just weren't enough repeated substrings of substantial length to make it worthwhile. And the aforementioned cell closure methods usually make those substrings really cheap to parse anyway.
However, if your target domains involved a lot of repetition, you might come to a different conclusion (maybe it would be effective on legal documents with lots of copy-and-paste boilerplate? Or news stories that are repeated from various sources or re-published with edits?)

The options for the first step of document clustering

I checked several document clustering algorithms, such as LSA, pLSA, LDA, etc. It seems they all require to represent the documents to be clustered as a document-word matrix, where the rows stand for document and the columns stand for words appearing in the document. And the matrix is often very sparse.
I am wondering, is there any other options to represent documents besides using the document-word matrix? Because I believe the way we express a problem has a significant influence on how well we can solve it.
As @ffriend pointed out, you cannot really avoid using the term-document-matrix (TDM) paradigm. Clustering methods operate on points in a vector space, and this is exactly what the TDM encodes. However, within that conceptual framework there are many things you can do to improve the quality of the TDM:
feature selection and re-weighting attempt to remove or weight down features (words) that do not contribute useful information (in the sense that your chosen algorithm does just as well or better without these features, or if their counts are decremented). You might want to read more about Mutual Information (and its many variants) and TF-IDF.
dimensionality reduction is about encoding the information in the TDM as accurately as possible using fewer columns. Singular Value Decomposition (the basis of LSA) and Non-Negative Tensor Factorisation are popular in the NLP community. A desirable side effect is that the TDM becomes considerably less sparse.
feature engineering attempts to build a TDM where the choice of columns is motivated by linguistic knowledge. For instance, you may want to use bigrams instead of words, or only use nouns (requires a part-of-speech tagger), or only use nouns with their associated adjectival modifier (e.g. big cat, requires a dependency parser). This is a very empirical line of work and involves a lot of experimentation, but it often yields improved results.
the distributional hypothesis makes it possible to get a vector representing the meaning of each word in a document. There has been work on trying to build up a representation of an entire document from the representations of the words it contains (composition). Here is a shameless link to my own post describing the idea.
There is a massive body of work on formal and logical semantics that I am not intimately familiar with. A document can be encoded as a set of predicates instead of a set of words, i.e. the columns of the TDM can be predicates. In that framework you can do inference and composition, but lexical semantics (the meaning of individual words) is hard to deal with.
For a really detailed overview, I recommend Turney and Pantel's "From Frequency to Meaning : Vector Space Models of Semantics".
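As a small illustration of the re-weighting idea in the list above, here is a minimal TF-IDF sketch in plain Python. It assumes whitespace tokenization and the basic weighting tf * log(N / df); real libraries use variants of this formula, and the function name `tfidf_matrix` is made up for this example.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Build a document-term matrix with basic TF-IDF weights.

    Rows are documents, columns follow the sorted vocabulary.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    n_docs = len(docs)
    # Document frequency: in how many documents each word occurs.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([counts[w] * math.log(n_docs / df[w]) for w in vocab])
    return vocab, rows


docs = ["the cat sat", "the dog sat", "the cat ran"]
vocab, rows = tfidf_matrix(docs)
```

A word such as "the", which occurs in every document, gets weight zero in every row, while a word that occurs in only one document gets the largest weight.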
Your question says you want document clustering, not term clustering or dimensionality reduction. Therefore I'd suggest you steer clear of the LSA family of methods, since they're a preprocessing step.
Define a feature-based representation of your documents (which can be, or include, term counts but needn't be), and then apply a standard clustering method. I'd suggest starting with k-means as it's extremely easy and there are many, many implementations of it.
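To show how little is needed for the k-means step, here is a bare-bones sketch in plain Python operating on feature vectors (for instance rows of term counts). It uses a naive initialization from the first k points and a fixed number of iterations, both simplifying assumptions; in practice you would use an existing implementation.

```python
def kmeans(points, k, iterations=20):
    """Very small k-means: returns (labels, centroids)."""
    centroids = [list(p) for p in points[:k]]  # naive init: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy feature vectors: two obvious groups.
points = [[0, 0], [0, 1], [10, 10], [10, 11]]
labels, centroids = kmeans(points, k=2)
```

The feature vectors can come from any representation you choose, which is the point of the advice above: the clustering step itself does not care how the columns were built.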
OK, this is a very general question, and many answers are possible; none is definitive, because it's an ongoing research area. So far, the answers I have read mainly concern so-called "Vector-Space models", and your question is phrased in a way that suggests such "statistical" approaches. Yet, if you want to avoid manipulating explicit term-document matrices, you might want to have a closer look at the Bayesian paradigm, which relies on the same distributional hypothesis but exploits a different theoretical framework: you no longer manipulate raw distances, but rather probability distributions and, most importantly, you can do inference based on them.
You mentioned LDA, I guess you mean Latent Dirichlet Allocation, which is the most well-known such Bayesian model to do document clustering. It is an alternative paradigm to vector space models, and a winning one: it has been proven to give very good results, which justifies its current success. Of course, one can argue that you still use kinds of term-document matrices through the multinomial parameters, but it's clearly not the most important aspect, and Bayesian researchers do rarely (if ever) use this term.
Because of its success, there are many software packages that implement LDA on the net. Here is one, but there are many others:

Explicitly Mapping of observations from a general set S into an inner product space

I am learning about the "kernel trick" for SVMs. While searching, I read the following passage from Wikipedia:
"For machine learning algorithms, the kernel trick is a way of mapping observations
from a general set S into an inner product space V (equipped with its natural norm),
without ever having to compute the mapping explicitly, in the hope that the
observations will gain meaningful linear structure in V"
My Question from above passage is:
What is meant by "compute the mapping explicitly"?
Can anyone please explain it with a real-world example or point me to some reference websites? That would help me understand kernels.
The answer is right there in the same article:
The trick to avoid the explicit mapping is to use learning algorithms
that only require dot products between the vectors in V, and choose
the mapping such that these high-dimensional dot products can be
computed within the original space, by means of a kernel function.
That means that one can avoid computing the images of the data points in the [multidimensional] kernel space and instead only calculate the pairwise dot products of these images, which often turns out to be cheaper. There's an example here, as well as in nearly every book on SVMs.
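A small worked example may help. For 2-D inputs, the degree-2 polynomial kernel k(x, y) = (x . y)^2 corresponds to the explicit mapping phi(x) = (x1^2, sqrt(2) x1 x2, x2^2) into a 3-D space. The Python sketch below computes the same number both ways; the function names are chosen for this illustration only.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def phi(x):
    """Explicit mapping from 2-D input space into the 3-D feature space V."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def poly_kernel(x, y):
    """Kernel function: works entirely in the original 2-D space."""
    return dot(x, y) ** 2


x = (1.0, 2.0)
y = (3.0, 4.0)

explicit = dot(phi(x), phi(y))  # "computing the mapping explicitly"
implicit = poly_kernel(x, y)    # kernel trick: no mapping computed
```

Both values equal 121 here. "Computing the mapping explicitly" means calling `phi` on every data point; the kernel trick skips that step, which matters when V has very many (or infinitely many) dimensions.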

Where can I get a Delphi/Pascal implementation of Excel-style polynomial regression curve fitting?

I have a set of X-Y values (i.e. a scatter plot) and I want a Pascal routine to generate the coefficients of a Nth order polynomial that fits those points, in the same way that Excel does.
I used David J Taylor's Polyfit example, which implements a least squares curve fitting algorithm (also known as linear regression). David's site is here, but keep reading, because my version is better (see below).
The origin of the algorithm David is using is a book on scientific math for Pascal programmers: Allen Miller's curve fitting routine from the book "Pascal Programs For Scientists And Engineers", typed and submitted to MTPUG in Oct. 1982 by Juergen Loewner, and corrected and adapted for Turbo Pascal by Jeff Weiss.
You can grab directly from bitbucket here. (You can clone the sourcecode with Mercurial/TortoiseHG, or download a ZIP from bitbucket)
hg clone curvefit
It runs in any Delphi version from 5 up, Unicode or not, even Delphi 10 Berlin. It has a little chart in the demo, added by me. I also added a way to force the result through the origin. This is a common technique where you want a best fit on all terms other than the constant term, which is forced either to zero or to some experimentally derived average. A forced "blank subtraction", which is set equal to the average of a series of analytical "zero samples", is common in certain types of analytical chemistry with certain types of instrumentation, and in other scientific cases. There it can be more useful than a best fit, because you may wish to minimize error around the origin more than error across the part of the curve that is farthest from the origin.
I should also clarify that for purposes of linear regression, a "curve" may also be a line, which is the case I needed for analytical chemistry purposes, and that equation for any straight line (y=mx+b) is also called the "calibration curve". A first order curve fit is a line (y = mx +b), a second order curve fit (shown in the picture) is a parabola (y= nX^2 + mX + b). As you might guess, this algorithm scales from first order up to any level you might wish. I haven't tested it above 8 terms though.
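For readers who want to see the underlying calculation, here is a minimal least-squares polynomial fit sketched in Python. It is not the Pascal routine discussed above and not necessarily the algorithm Excel uses; it simply solves the normal equations with Gaussian elimination. The function name `polyfit` and the sample data are made up for this illustration.

```python
def polyfit(xs, ys, order):
    """Least-squares polynomial fit.

    Returns coefficients [c0, c1, ..., c_order] of
    y = c0 + c1*x + ... + c_order*x**order.
    """
    n = order + 1
    # Normal equations A c = b, with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        partial = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - partial) / a[i][i]
    return coeffs


# Points taken exactly from y = 2x^2 + 3x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 6.0, 15.0, 28.0, 45.0]
coeffs = polyfit(xs, ys, order=2)  # approximately [1.0, 3.0, 2.0]
```

The normal equations become badly conditioned as the order grows, so for high orders a more robust solver (for example one based on QR decomposition) is usually preferred.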
Here's a screenshot:
Bitbucket project link:
Try TPMath - I've been using this for years for fitting a hill regression and can recommend it.
Check the functions in TurboPower's SysTools library, which is now open source; it includes math functions in the unit StStat.
Even though you've already awarded an answer, for completeness, I thought I'd add this:
We use SDL Components' Math pack and have been very happy with it.
It's well thought out, and does exactly what we need.
He's got a variety of other interesting tools on his site.
XlXtrFun is the best curve fitting I know and use, but it is for Excel:
