Background
My team is attempting to track down a memory leak in our Rails 6.1 application. We used the technique described here of taking three consecutive heap dumps and diffing them. We used rbtrace to get the dumps and rbheap to do the diffing. We tried this several times with different intervals between the samples.
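For reference, each dump was pulled from the live process with rbtrace; a rough sketch of the invocation (the PID and output paths here are placeholders, and allocation tracing must already be enabled in the app for the file/line fields to show up) looks like this:
bundle exec rbtrace -p $APP_PID -e '
  Thread.new {
    require "objspace"
    GC.start
    File.open("/tmp/heap-1.json", "w") { |f| ObjectSpace.dump_all(output: f) }
  }.join
'
# repeat after an interval for heap-2.json and heap-3.json, then feed the three files to rbheap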
Versions:
Rails 6.1.6.1
Ruby 3.0.3
Results
About 85% of the results in the diff look like the examples shown below. They are related to ActiveRecord's numericality validation, which we use in one of our models. This is the validation's source code. The strange thing is that these allocations are IMEMO objects, which, according to this article, store information about the compiled code.
Validation in our model
validates :msrp, numericality: { less_than_or_equal_to: MAX_INT }, allow_nil: true
Example IMEMO object allocations
{
  "address": "0x5632f3df7588",
  "type": "IMEMO",
  "class": "0x5632f654de48",
  "imemo_type": "callcache",
  "references": ["0x5632f654dbc8"],
  "file": "/app/vendor/bundle/ruby/3.0.0/gems/activerecord-6.1.6.1/lib/active_record/validations/numericality.rb",
  "line": 10,
  "method": "validate_each",
  "generation": 9233,
  "memsize": 40,
  "flags": {
    "wb_protected": true,
    "old": true,
    "uncollectible": true,
    "marked": true
  }
}
{
  "address": "0x5632f3e0f070",
  "type": "IMEMO",
  "class": "0x5632f7dc23d0",
  "imemo_type": "callinfo",
  "file": "/app/vendor/bundle/ruby/3.0.0/gems/activerecord-6.1.6.1/lib/active_record/validations/numericality.rb",
  "line": 10,
  "method": "validate_each",
  "generation": 6225,
  "memsize": 40,
  "flags": {
    "wb_protected": true,
    "old": true,
    "uncollectible": true,
    "marked": true
  }
}
Questions
Has anyone seen similar memory leaks related to ActiveRecord validations?
Does anyone have a theory as to why so many IMEMO objects are allocated and leaked for the same line of code?
Update: this looks like a red herring. We removed the validation and still saw the same memory growth, and a subsequent heap diff showed many more IMEMO objects from other methods.
Related
void main() {
  List Users = [
    {"Name": "Mohamed", "Age": 21, "Phone": 3303304, "password": "nerkl"},
    {"Name": "Ahmed", "Age": 34, "Phone": 2299833, "password": "bftjss"}
  ];
}
It's just a warning, not an error. The compiler treats your program as if it were completely finished. Since you don't actually use Users yet, it gives you the warning, because it means you could delete those lines without changing the behaviour of your program at all. It's normal to see this a lot while you are still developing. Once you do something with the Users variable, the warning will go away. For example, if you write a simple
print(Users);
you will notice the warning goes away. Also, a small piece of advice: the usual coding convention is that you don't start variable names with a capital letter; that is usually reserved for class names. So I would change it to users, but that is entirely up to you.
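Putting both suggestions together, a minimal sketch (same data, just renamed and actually used) could look like this:
void main() {
  // lower-case name per Dart style; printing it uses the variable, so the warning disappears
  List users = [
    {"Name": "Mohamed", "Age": 21, "Phone": 3303304, "password": "nerkl"},
    {"Name": "Ahmed", "Age": 34, "Phone": 2299833, "password": "bftjss"}
  ];
  print(users);
}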
The variables you declared are never used, only assigned to; that is why you're being warned.
I am trying to add a property to a node using
n.item = apoc.convert.toJson(itemObject)
Where
itemArrayObjects = {"source":"Blogspot.com","author":"noreply#blogger.com (Unknown)","title":"Elon Musk reveals who bitcoin's creator Satoshi Nakamoto might be","content":"Musk.MARK RALSTON/AFP via Getty Images\r\nElon Musk seems to agree with many that hyper-secret cryptocurrency expert Nick Szabo could be Satoshi Nakamoto, the mysterious creator of the digital currency… [+1467 chars]","publishedAt":"2021-12-29T20:41:00Z","url":"https://techncruncher.blogspot.com/2021/12/elon-musk-reveals-who-bitcoins-creator.html"}
this results in
Neo4jError: Failed to invoke function `apoc.convert.toJson`: Caused by: java.lang.NullPointerException
In the Neo4j Browser this works:
RETURN apoc.convert.toJson({d:"ddddd", e:"eeee"})
but this doesn't work:
RETURN apoc.convert.toJson({"a": "aaaaaa", "b": "bbbbbb"})
If I assign the values to a cypher :param like this:
:param items =>[{source:"Blogspot.com",author:"noreply#blogger.com (Unknown)",title:"Elon Musk reveals who bitcoin's creator Satoshi Nakamoto might be",content:"Musk.MARK RALSTON/AFP via Getty Images\r\nElon Musk seems to agree with many that hyper-secret cryptocurrency expert Nick Szabo could be Satoshi Nakamoto, the mysterious creator of the digital currency… [+1467 chars]",publishedAt:"2021-12-29T20:41:00Z",url:"https://techncruncher.blogspot.com/2021/12/elon-musk-reveals-who-bitcoins-creator.html"},{d:"xxddddd",e:"eeee"},{d:"ddddd",e:"eeee"}]
I get this as :params
{
  "items": [
    {
      "publishedAt": "2021-12-29T20:41:00Z",
      "author": "noreply#blogger.com (Unknown)",
      "source": "Blogspot.com",
      "title": "Elon Musk reveals who bitcoin's creator Satoshi Nakamoto might be",
      "url": "https://techncruncher.blogspot.com/2021/12/elon-musk-reveals-who-bitcoins-creator.html",
      "content": "Musk.MARK RALSTON/AFP via Getty Images
Elon Musk seems to agree with many that hyper-secret cryptocurrency expert Nick Szabo could be Satoshi Nakamoto, the mysterious creator of the digital currency… [+1467 chars]"
    },
    {
      "d": "xxddddd",
      "e": "eeee"
    },
    {
      "d": "ddddd",
      "e": "eeee"
    }
  ]
}
Notice the keys are double quoted, as they rightly should be,
and this works:
return apoc.convert.toJson($items)
So it appears there is some behind-the-scenes conversion going on. There also appears to be some inconsistency, as it sometimes works without changes.
Can anyone shed some light on this?
EDIT: I am actually using Neo4j Desktop 4.2.1 and APOC 4.2.0 locally, and a neo4j 4.4.2 Docker image with APOC 4.4.0.1 on DigitalOcean. The inconsistency is that, for the most part, this works locally.
Apparently there was a bug in APOC v4.4.0.1 related to apoc.convert.toJson(); a fix was made in v4.4.0.2.
The log4j2 PatternLayout offers a %notEmpty conversion pattern that allows you to skip sections of the pattern that refer to empty variables.
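For reference, the PatternLayout version of this looks roughly like the following (using one of the MDC keys from the template below purely as an illustration); the bracketed section is emitted only when x-app-context is non-empty:
%d %p %notEmpty{[%X{x-app-context}] }%m%n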
Is there any way to do something similar for JsonTemplateLayout, specifically for thread context data (MDC)? It correctly (IMO) suppresses null fields, but it doesn't do the same with empty ones.
E.g., given the following in my JSON template:
"application": {
"name": { "key": "x-app", "$resolver": "mdc" },
"context": { "key": "x-app-context", "$resolver": "mdc" },
"instance": {
"name": { "key": "x-appinst", "$resolver": "mdc" },
"context": { "key": "x-appinst-context", "$resolver": "mdc" }
}
}
is there a way to prevent blocks like this from being logged, where the only data in the subtree is the empty string values for context?
"application":{"context":"","instance":{"context":""}}
(Yes, ideally I'd prevent those empty strings being put into the context in the first place, but this isn't my app, I'm just configuring it.)
JsonTemplateLayout author speaking here. Currently, JsonTemplateLayout doesn't support blank property exclusion for the following reasons:
The definition of empty/blank is ambiguous. One might have null, {}, "\s*", [], [[]], [{}], etc. as valid JSON values. Which of these are empty/blank? Let's assume we have agreed on a certain behavior. Will it apply to the rest of the layout's users?
Checking if a value is empty/blank incurs an extra runtime cost.
Most of the time you don't care. You persist logs in a storage system, e.g., ELK stack, and there blank value elimination is provided out of the box by the storage engine in the most efficient way.
Would you mind sharing your use case, please? Why do you want to prevent the emission of "context": "" properties? If you deliver your logs to Elasticsearch, there you can easily exclude such fields via appropriate index mappings.
As near as I can tell, no. I would suggest you create a Jira issue to get that addressed.
In my Data Factory I've got a stored procedure which manipulates two tables for a single output. I need to pass two sqlWriterTableType values, but I can't see how this is possible. Has anyone had experience of doing this?
"sink": {
"type": "SqlSink",
"sqlWriterStoredProcedureName": "spDashStat",
"sqlWriterTableType": "UserType",
"sqlWriterTableType": "StatsType",
"writeBatchSize": 0,
"writeBatchTimeout": "00:00:00"
}
},
"inputs": [
{
"name": "InputDataset-kpx"
},
{
"name": "InputDataset-kpx"
},
I don't think you explained it correctly; you want to call a stored procedure and give parameters to it? That is explained here: https://learn.microsoft.com/en-us/azure/data-factory/transform-data-using-stored-procedure
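As a rough sketch (the activity name, parameter name and value here are invented for illustration, and the exact JSON schema depends on your Data Factory version), a stored procedure activity with parameters looks something like:
{
  "name": "RunDashStat",
  "type": "SqlServerStoredProcedure",
  "typeProperties": {
    "storedProcedureName": "spDashStat",
    "storedProcedureParameters": {
      "StatDate": { "value": "2017-01-01", "type": "DateTime" }
    }
  }
}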
Remember that it's not mandatory for a pipeline to have an input dataset. Use an output dataset to save the output of your stored procedure.
Please give more info for a more detailed answer :)
Cheers!
Disclaimer: I'm a novice.
I want to simulate a join for my mongodb embedded document. If I have an embedded list:
{
  _id: ObjectId("5320f6c34b6576d373000000"),
  user_id: "52f581096b657612fe020000",
  list: "52f4fd9f52e39bc0c15674ea"
  {
    player_1: "52f4fd9f52e39bc0c15674ex",
    player_2: "52f4fd9f52e39bc0c15674ey",
    player_3: "52f4fd9f52e39bc0c15674ez"
  }
}
And a player collection with each player being something like:
{
  _id: ObjectId("52f4fd9f52e39bc0c15674ex"),
  college: "Louisville",
  headshot: "player.png",
  height: "6'2",
  name: "Wayne Brady",
  position: "QB",
  weight: 205
}
I want to end up with:
{
  _id: ObjectId("5320f6c34b6576d373000000"),
  user_id: "52f581096b657612fe020000",
  list: "52f4fd9f52e39bc0c15674ea"
  {
    player_1:
    {
      _id: ObjectId("52f4fd9f52e39bc0c15674ex"),
      college: "Louisville",
      headshot: "player.png",
      height: "6'2",
      name: "Wayne Brady",
      position: "QB",
      weight: 205
    },
    etc...
  }
}
So I can call User.lists.first.player_1.name.
This is what makes sense in my mind since I'm new to Rails... and I don't want to embed players in each user's list because I'd have so many redundancies...
Advice? Is this possible, if so how? Is it a good idea, or is there a better way?
So you have a typical relational model, let's call it "one to many", in which you have users or "user teams" and a whole pool of players. And in typical modelling form you want to normalize this to avoid duplication.
But here's the thing: MongoDB does not do joins. Joins are not "webscale" in the current parlance. So it leaves you wondering what to do. Or does MongoDB do joins?
db.eval(function() {
  var user = db.user.findOne({ "user_id": "52f581096b657612fe020000" });
  for ( var k in user.list ) {
    var player = db.player.findOne({ "_id": user.list[k] });
    user.list[k] = player;
  }
  return user;
});
Which "arguably" is "kind of a join". And it was all done on the server, right?
But DO NOT DO THAT. While db.eval() has uses, something that you are going to query regularly is not one of the practical uses. Read the documentation, which shows the warnings. In particular, all JavaScript is running in a single thread so that will lock things up very quickly.
Now client side, you are more or less doing the same thing. And your ODM of choice is likely again doing "the same thing", though it usually hides it away in some manner so you don't see that part. Likewise the same could be said of your SQL ORM, which was also "sneaking off behind your back" and querying the database while you just accessed the objects in your code.
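As a rough sketch of that client-side version (mongo shell syntax, with collection and field names assumed from the question), it is essentially the same loop issued from the application instead of via db.eval():
var user = db.user.findOne({ "user_id": "52f581096b657612fe020000" });
for ( var k in user.list ) {
  // one extra round trip per referenced player
  user.list[k] = db.player.findOne({ "_id": user.list[k] });
}
// user now holds the player documents in place of their ids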
As for mapReduce: well, the problem with the data you present is that there is nothing to "reduce". There is a technique known as "incremental mapReduce", but it would not be well suited to this type of data. A topic in itself, but you would basically need all the "users" associated with the "players" as well, stored in the "player data", to make that at all viable. And it's ultimately just another way of "cheating" joins.
This is the space in which MongoDB exists.
So rather than going and doing all this fetching or joining, it allows the concept of being able to "pre-join" your data as it were. And the point of this is to allow faster, and more atomic reads and writes. And this is known as embedding.
Looking at your data, there should not be a problem with embedding at all. Consider the points:
Presumably you are modelling "fantasy teams" for a given user. It would be fair to say that a "team" does not consist of an infinite number of players.
Aside from other things, your primary usage is likely to be "displaying" the players associated with that "user team", and in so far as you want to "display" as much information as possible, you want to keep that to a single read operation. You also want to easily add "players" to the "user team".
While a "player" may have "extended information", and possibly even some global statistics or scores, that information may well be not what you want to access, while associated to the "user team" that often. It can probably be written independently, and only read when looking at the "player detail".
Those are three good cases to support embedding. Sure, you would be duplicating information stored against each user team, as opposed to just a small "key" reference. And sure, that information is likely to exist elsewhere in the full "player detail", and that would be duplication as well.
But the point of the "duplication" here is to optimize. So here it would seem valid to embed "some of the data", not all of it, but what you regularly use in your main operations. Considering that the "player's" name, position, height and weight are not likely to change on a regular basis, or even at all in this context, that seems a reasonable trade-off.
{
  "_id": ObjectId("5320f6c34b6576d373000000"),
  "user_id": ObjectId("52f581096b657612fe020000"),
  "list": [
    {
      "_id": ObjectId("52f4fd9f52e39bc0c15674ex"),
      "label": "Player1",
      "college": "Louisville",
      "headshot": "player.png",
      "height": "6'2",
      "name": "Wayne Brady",
      "position": "QB",
      "weight": 205
    },
    {
      "label": "Player2",
      (...)
    }
  ]
}
That's not that bad. And it would take a lot to break the 16MB limit. And considering this seems to be a "user team" then it could probably do with some information from the "user" as well.
You also get a lot of power out of this when data is kept together like this, to find the top "player" picked by each user:
db.userteams.aggregate([
  // Unwind the array
  { "$unwind": "$list" },
  // Group and count by user and player name
  { "$group": {
    "_id": {
      "user_id": "$user_id",
      "player": "$list.name"
    },
    "count": { "$sum": 1 }
  }},
  // Sort the results descending by popularity
  { "$sort": { "_id.user_id": 1, "count": -1 } },
  // Group to keep only the first (most picked) player per user
  { "$group": {
    "_id": "$_id.user_id",
    "player": { "$first": "$_id.player" },
    "picks": { "$first": "$count" }
  }}
])
Which admittedly is a reasonably trivial use of a name in this case, but it is an example of using information that has become available by the use of some embedding.
Of course, if you really believe that you need everything to be properly normalized, then do it that way, and live with the access patterns you would need to use. But this offers a perspective on doing it another way.
So don't over-concern yourself with embedding everything, and lose a little of the fear of embedding some things. There is no "get out of jail free" card for using something not suited to relational modelling in a standard relational way. Choose something that suits your needs.