AWS CDK: increase memory limit on the OpenSearch lambda

I deployed an AWS OpenSearch domain using a construct from @aws-cdk/aws-opensearchservice.
In the same stack as the OpenSearch domain there is a lambda (presumably provisioned by the same construct).
This lambda started triggering alarms with an 'Out of memory' error.
I want to increase the maximum allocated memory (as I typically do for normal lambdas; see the sketch below), but I haven't found any property in DomainProps related to a lambda memory limit.
Here's my DomainProps:
{
  version: EngineVersion.OPENSEARCH_1_1,
  ebs: {
    enabled: true,
    volumeSize: 10,
    volumeType: EbsDeviceVolumeType.GP2,
  },
  logging: {
    slowSearchLogEnabled: true,
    appLogEnabled: true,
    slowIndexLogEnabled: true,
  },
  capacity: {
    masterNodes: 0,
    dataNodes: 1,
    dataNodeInstanceType: "t3.small.search",
  },
}
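For comparison, here is the kind of setting I mean on a normal function (a minimal sketch; the handler and asset path are placeholders, and memorySize is the standard property from @aws-cdk/aws-lambda, which DomainProps doesn't seem to expose for its internal lambda):
import * as lambda from '@aws-cdk/aws-lambda';

// Inside the same stack: a regular function exposes memorySize directly (in MB).
const fn = new lambda.Function(this, 'NormalFunction', {
  runtime: lambda.Runtime.NODEJS_14_X,
  handler: 'index.handler',              // placeholder handler
  code: lambda.Code.fromAsset('lambda'), // placeholder asset path
  memorySize: 512,                       // the knob I can't find for the OpenSearch-provisioned lambda
});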
Has anyone had the same problem?
And if so, what solution did you choose?

Related

AWS CDK, running code on first deploy but not after

Goal
I want to be able to create a lambda function with CDK, but then manage the Docker image that the lambda uses with a CI/CD pipeline (GitHub Actions).
What I have done
I have the following code:
this.repository =
  this.config.repository ??
  new ecr.Repository(this, 'Repository', {
    repositoryName: this.config.repositoryName,
  });

this.lambda = new lambda.DockerImageFunction(this, 'DockerLambda', {
  code: lambda.DockerImageCode.fromImageAsset(
    path.join(__dirname, '../docker/minimal'),
    { cmd: this.config.cmd, entrypoint: this.config.entrypoint },
  ),
  functionName: config.functionName ?? this.node.id,
  environment: config.environment,
  timeout: Duration.seconds(config.timeout ?? 600),
  memorySize: config.memorySize ?? 1024,
  vpc: config.vpc,
  vpcSubnets: config.vpcSubnets ?? {
    subnets: config.vpc?.privateSubnets,
  },
});
I am doing it this way because there doesn't appear to be a way to create a lambda without specifying where the code will come from. The 'minimal' Docker image is just a generic placeholder; it will eventually be replaced by the real code. That code does not live in the repository that holds our CDK code, so CDK cannot build the real Docker image.
So, the steps that we follow are:
1. Use this generic DockerImageLambda construct to create both an ECR repository and a lambda with a placeholder Docker image. This ECR repository is where GitHub will upload the real images, but until then it will be empty (since it was just created).
2. Use GitHub Actions to upload a real Docker image to the ECR repository created in step #1.
3. Use GitHub Actions to update the lambda function with the new image from step #2.
The Problem
This method works until you change something in the lambda CDK code. At that point, CDK will try to reconfigure the lambda to use the placeholder Docker image, which essentially "breaks" what was working there.
The question
How can I make it use the placeholder Docker image only the first time the lambda is created? Or is there a better way to do this?
You can decouple uploading the asset to ECR from the lambda definition.
To upload to the repository you created, use the cdk-ecr-deployment construct. Then create the lambda with the correct ECR repository from the beginning. You will not need to edit the lambda to change the source ECR repository.
You also need to make your Lambda construct depend on the deployment, so that when the lambda is created, the repository contains your dummy image.
It would look like this:
this.repository =
  this.config.repository ??
  new ecr.Repository(this, 'Repository', {
    repositoryName: this.config.repositoryName,
  });

// Build the placeholder image as an asset (an asset needs a scope/id and a directory).
const dummyImage = new DockerImageAsset(this, 'DummyImageAsset', {
  directory: path.join(__dirname, '../docker/minimal'),
});

// Copy the placeholder image into the repository the lambda will point at.
const dummyDeployment = new ECRDeployment(this, 'DummyImage', {
  src: new DockerImageName(dummyImage.imageUri),
  dest: new DockerImageName(this.repository.repositoryUriForTagOrDigest('latest')),
});

this.lambda = new lambda.DockerImageFunction(this, 'DockerLambda', {
  code: lambda.DockerImageCode.fromEcr(
    this.repository,
    { cmd: this.config.cmd, entrypoint: this.config.entrypoint },
  ),
  functionName: config.functionName ?? this.node.id,
  environment: config.environment,
  timeout: Duration.seconds(config.timeout ?? 600),
  memorySize: config.memorySize ?? 1024,
  vpc: config.vpc,
  vpcSubnets: config.vpcSubnets ?? {
    subnets: config.vpc?.privateSubnets,
  },
});

this.lambda.node.addDependency(dummyDeployment);
You could also import the real ECR repository into your CDK stack with the fromXxx helper methods.
https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_ecr.Repository.html#static-fromwbrrepositorywbrarnscope-id-repositoryarn
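For example, a minimal sketch of importing an existing repository (the ARN and names below are placeholders):
import * as ecr from 'aws-cdk-lib/aws-ecr';

// Import a repository managed outside this stack, by ARN...
const importedByArn = ecr.Repository.fromRepositoryArn(
  this,
  'ImportedRepository',
  'arn:aws:ecr:us-east-1:123456789012:repository/my-app', // placeholder ARN
);

// ...or by name, if it lives in the same account and region.
const importedByName = ecr.Repository.fromRepositoryName(this, 'ImportedByName', 'my-app');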

How do I get AWS CDK stack env values for bootstrapping an environment?

AWS and other sources consider explicitly specifying the AWS account and region for each stack to be best practice. I'm trying to write a CI pipeline that will bootstrap my environments. However, I'm not seeing any straightforward way to retrieve the stack's explicit env values from here:
regions.forEach((region) =>
  new DbUpdateStack(app, `${stackBaseName}-prd-${region}`, {
    env: {
      account: prdAccount,
      region: region,
    },
    environment_instance: 'prd',
    vpc_id: undefined,
  })
);
E.g., base-name-prd-us-east-1 knows the region and account as defined in the code, but how do I access this from the command line without doing something hacky?
I need to run cdk bootstrap with those values, and I don't want to duplicate them.
The Cloud Assembly module can introspect an App's stack environments. Synth the app, then instantiate a CloudAssembly class, pointing it at the cdk output directory:
import * as cx_api from '@aws-cdk/cx-api';

(() => {
  const cloudAssembly = new cx_api.CloudAssembly('cdk.out');
  const appEnvironments = cloudAssembly.stacks.map(stack => stack.environment);
  console.log(appEnvironments);
})();
Result:
[
  {
    account: '123456789012',
    region: 'us-east-1',
    name: 'aws://123456789012/us-east-1',
  },
];
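Since each environment's name is already in the aws://ACCOUNT/REGION form that cdk bootstrap accepts, one option (just a sketch building on the snippet above) is to turn the environments straight into bootstrap commands:
import * as cx_api from '@aws-cdk/cx-api';

// Run `cdk synth` first so cdk.out exists, then emit one bootstrap command per unique environment.
const cloudAssembly = new cx_api.CloudAssembly('cdk.out');
const environments = new Set(cloudAssembly.stacks.map((stack) => stack.environment.name));

for (const env of environments) {
  // env is already formatted as aws://123456789012/us-east-1
  console.log(`cdk bootstrap ${env}`);
}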

How to force Discrete GPU in electron.js?

Update: I also saw documentation and discussions saying that it should always use the discrete GPU, but it does not; at the moment it always uses the integrated one.
I need to use the discrete GPU in an Electron.js app when both an integrated and a discrete GPU are present. How can I force this in Electron?
In C++ it can be done like this:
extern "C"
{
__declspec(dllexport) unsigned long NvOptimusEnablement = 0x00000001;
__declspec(dllexport) int AmdPowerXpressRequestHighPerformance = 1;
}
How to do that in electron.js?
With current Electron.js/WebGL, there is no mechanism to enforce this. However, you shouldn't need to, because running on the discrete GPU is the default.
I figured out that you can silently restart the app while setting a special Windows environment variable that forces the process to use the dedicated GPU.
const { spawn } = require('child_process');

// Restart with force using the dedicated GPU
if (process.env.GPUSET !== 'true') {
  spawn(process.execPath, process.argv, {
    env: {
      ...process.env,
      SHIM_MCCOMPAT: '0x800000001', // this forces windows to use the dedicated GPU for the process
      GPUSET: 'true',
    },
    detached: true,
  });
  process.exit(0);
}
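To check whether the restarted process actually ended up on the dedicated GPU, here is a small sketch using Electron's app.getGPUInfo (the shape of the returned object varies by platform and Electron version):
import { app } from 'electron';

app.whenReady().then(async () => {
  // 'basic' returns a summary of the GPU(s) Chromium is using; 'complete' adds more detail on some platforms.
  const gpuInfo = await app.getGPUInfo('basic');
  // Inspect the reported adapters to confirm the discrete GPU is active.
  console.log(JSON.stringify(gpuInfo, null, 2));
});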

ASP.NET Core 2.0 application running in Linux container suffers from locking issues when there are many calls to Trace.WriteLine()

Our application is an ASP.NET Core 2.0 WebAPI deployed in Linux Docker containers and running in Kubernetes.
During load testing, we discovered intermittent spikes in CPU usage that our application would never recover from.
We used perfcollect to collect traces from a container so that we could compare a successful test with a test that had CPU spikes. We discovered that around 75% of the CPU time in the failing test was spent in JIT_MonReliableEnter_Portable, a helper for lock (monitor) operations. The caller was System.Diagnostics.TraceSource.dll.
Our application was ported from .NET Framework and contained a lot of calls to System.Diagnostics.Trace.WriteLine(). When we removed all of these, our CPU/memory usage dropped by more than 50% and we no longer see the CPU spikes.
I want to understand the cause of this issue.
In the corefx repo, I can see that a default trace listener is set up in TraceInternal.cs:
public static TraceListenerCollection Listeners
{
    get
    {
        InitializeSettings();
        if (s_listeners == null)
        {
            lock (critSec)
            {
                if (s_listeners == null)
                {
                    // In the absence of config support, the listeners by default add
                    // DefaultTraceListener to the listener collection.
                    s_listeners = new TraceListenerCollection();
                    TraceListener defaultListener = new DefaultTraceListener();
                    defaultListener.IndentLevel = t_indentLevel;
                    defaultListener.IndentSize = s_indentSize;
                    s_listeners.Add(defaultListener);
                }
            }
        }
        return s_listeners;
    }
}
I can see that DefaultTraceListener.cs calls Debug.Write():
private void Write(string message, bool useLogFile)
{
    if (NeedIndent)
        WriteIndent();

    // really huge messages mess up both VS and dbmon, so we chop it up into
    // reasonable chunks if it's too big
    if (message == null || message.Length <= InternalWriteSize)
    {
        Debug.Write(message);
    }
    else
    {
        int offset;
        for (offset = 0; offset < message.Length - InternalWriteSize; offset += InternalWriteSize)
        {
            Debug.Write(message.Substring(offset, InternalWriteSize));
        }
        Debug.Write(message.Substring(offset));
    }

    if (useLogFile && !string.IsNullOrEmpty(LogFileName))
        WriteToLogFile(message);
}
In Debug.Unix.cs, I can see that there is a call to SysLog:
private static void WriteToDebugger(string message)
{
    if (Debugger.IsLogging())
    {
        Debugger.Log(0, null, message);
    }
    else
    {
        Interop.Sys.SysLog(Interop.Sys.SysLogPriority.LOG_USER | Interop.Sys.SysLogPriority.LOG_DEBUG, "%s", message);
    }
}
I don't have a lot of experience working with Linux but I believe that I can simulate the call to SysLog by running the following command in the container:
logger --socket-errors=on 'SysLog test'
When I run that command, I get the following response:
socket /dev/log: No such file or directory
So it looks like I can't successfully make calls to SysLog from the container. If this is indeed what is going on when I call Trace.WriteLine(), why is it causing locking issues in my application?
As far as I can tell, EnvVar_DebugWriteToStdErr is not set in my container so it should not be attempting to write to StdErr.
It may be that rsyslog is not running. Is it installed in your container? Use a base image that has rsyslog built in.

Huge performance drop in cassandra-orm after million records

I'm using the cassandra-orm plugin (cassandra-orm:0.4.5) to migrate clicks from a Postgres DB to Cassandra. (I know I could use a raw data import, but I want to make use of groupBy and the explicit indexes maintained by the plugin.)
The migration procedure is simple: I select a bunch of clicks from Postgres (via GORM) and then flush them to Cassandra. Every click is a new record; a new object is created in Grails and saved in Cassandra. With 20 threads I was able to reach a throughput of 2000 clicks/sec. After importing 5 million clicks, the performance degraded dramatically to 50 clicks/sec.
I did some profiling and found that 19 threads were waiting (parked) while one thread was performing a rehash in Groovy's AbstractConcurrentMapBase.
Stack trace for the waiting threads:
Name: pool-4-thread-2
State: WAITING on org.codehaus.groovy.util.ManagedConcurrentMap$Segment@5387f7af
Total blocked: 45,027 Total waited: 55,891
Stack trace:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
org.codehaus.groovy.util.LockableObject.lock(LockableObject.java:34)
org.codehaus.groovy.util.AbstractConcurrentMap$Segment.put(AbstractConcurrentMap.java:101)
org.codehaus.groovy.util.AbstractConcurrentMap$Segment.getOrPut(AbstractConcurrentMap.java:97)
org.codehaus.groovy.util.AbstractConcurrentMap.getOrPut(AbstractConcurrentMap.java:35)
org.codehaus.groovy.runtime.metaclass.ThreadManagedMetaBeanProperty$ThreadBoundGetter.invoke(ThreadManagedMetaBeanProperty.java:180)
groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233)
groovy.lang.MetaClassImpl.getProperty(MetaClassImpl.java:1604)
groovy.lang.ExpandoMetaClass.getProperty(ExpandoMetaClass.java:1140)
groovy.lang.MetaClassImpl.getProperty(MetaClassImpl.java:3332)
groovy.lang.ExpandoMetaClass.getProperty(ExpandoMetaClass.java:1152)
com.nosql.Click.getProperty(Click.groovy)
stack trace for rehash thread:
Name: pool-4-thread-11
State: RUNNABLE
Total blocked: 46,544 Total waited: 57,433
Stack trace:
org.codehaus.groovy.util.AbstractConcurrentMapBase$Segment.rehash(AbstractConcurrentMapBase.java:217)
org.codehaus.groovy.util.AbstractConcurrentMap$Segment.put(AbstractConcurrentMap.java:105)
org.codehaus.groovy.util.AbstractConcurrentMap$Segment.getOrPut(AbstractConcurrentMap.java:97)
org.codehaus.groovy.util.AbstractConcurrentMap.getOrPut(AbstractConcurrentMap.java:35)
org.codehaus.groovy.runtime.metaclass.ThreadManagedMetaBeanProperty$ThreadBoundGetter.invoke(ThreadManagedMetaBeanProperty.java:180)
groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233)
groovy.lang.MetaClassImpl.getProperty(MetaClassImpl.java:1604)
groovy.lang.ExpandoMetaClass.getProperty(ExpandoMetaClass.java:1140)
groovy.lang.MetaClassImpl.getProperty(MetaClassImpl.java:3332)
groovy.lang.ExpandoMetaClass.getProperty(ExpandoMetaClass.java:1152)
com.fma.nosql.Click.getProperty(Click.groovy)
After hours of debugging I found that the issue is in the dynamic property "_cassandra_cluster_", which is added to all plugin-managed objects:
// cluster property (_cassandra_cluster_)
clazz.metaClass."${CLUSTER_PROP}" = null
This property is then stored internally in the ThreadManagedMetaBeanProperty instance2Prop map. When the dynamic property is accessed (def cluster = click._cassandra_cluster_), the click instance is saved to the instance2Prop map with a soft reference. So far so good; soft references can be garbage collected. However, there seems to be a bug in the ManagedConcurrentMap implementation which disregards the garbage-collected elements and keeps rehashing and expanding the map (described here and here).
Workaround
Since the map is held internally at the class level, the only working solution was to restart the server. Eventually I developed a dirty workaround which clears the zombie elements from the internal map. The following code runs in a separate thread:
public void rehashClickSegmentsIfNecessary() {
    ManagedConcurrentMap instanceMap = lookupInstanceMap(Click.class, "_cassandra_cluster_")
    if (instanceMap.fullSize() - instanceMap.size() > 50000) {
        // we have more than 50,000 zombie references in the map
        rehashSegments(instanceMap)
    }
}

private void rehashSegments(ManagedConcurrentMap instanceMap) {
    org.codehaus.groovy.util.ManagedConcurrentMap.Segment[] segments = instanceMap.segments
    for (int i = 0; i < segments.length; i++) {
        segments[i].lock()
        try {
            segments[i].rehash()
        } finally {
            segments[i].unlock()
        }
    }
}

private ManagedConcurrentMap lookupInstanceMap(Class clazz, String prop) {
    MetaClassRegistry registry = GroovySystem.metaClassRegistry
    MetaClassImpl metaClass = registry.getMetaClass(clazz)
    return metaClass.getMetaProperty(prop, false).instance2Prop
}
Do you have any production experience with cassandra-orm or any other Grails plugin connecting to Cassandra?
