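Testing shows that GC latency often lets a large number of still-valid pointers pile up. According to heap.cc, V8's GC settings on desktop machines are:
reserved_semispace_size_(16*MB),
max_semispace_size_(16*MB), // maximum GC threshold is 16 MB
initial_semispace_size_(1*MB), // initial GC threshold is 1 MB
max_old_generation_size_(1*GB), // maximum memory consumption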
max_executable_size_(256*MB),
code_range_size_(512*MB),
1. Test code 1
Consider the following C++0x test code:
#include <v8.h>
#include <stdio.h>
#include <sys/time.h>
using namespace v8;

class Foo
{
public:
    static int total;
    static int number;
    static int last;
    Foo();
    ~Foo();
};

int Foo::total = 1;
int Foo::number = 1;
int Foo::last = 1;

Foo::Foo()
{
    if (last > number) {
        printf("creating from id %d\n", number);
    }
    last = number;
    ++number;
    ++total;
    // Adding the line below seems more reasonable, although for a class
    // with no member variables it makes no noticeable difference.
    // Added 2011-08-13: using sizeof(Foo), which is 1 byte, is not correct;
    // see the addendum at the end of the article.
    V8::AdjustAmountOfExternalAllocatedMemory(sizeof(Foo));
}

Foo::~Foo()
{
    if (last < number) {
        printf("sweeping from id %d\n", number);
    }
    last = number;
    --number;
    // Cast before negating: -sizeof(Foo) on its own is an unsigned expression.
    V8::AdjustAmountOfExternalAllocatedMemory(-static_cast<int>(sizeof(Foo)));
}

// Returns the wall-clock time in milliseconds (despite the name).
long long microtime()
{
    timeval tv;
    gettimeofday(&tv, NULL);
    return (long long)tv.tv_sec * 1000 + tv.tv_usec / 1000;
}

int main(int argc, char *argv[])
{
    V8::SetFlagsFromCommandLine(&argc, argv, true);
    HandleScope scope;
    auto context = Context::New();
    Context::Scope context_scope(context);
    auto object_template = ObjectTemplate::New();
    object_template->SetInternalFieldCount(1);
    auto start = microtime();
    int times = 2e8;
    for (int i = 0; i < times; ++i) {
        HandleScope scope;
        auto object = object_template->NewInstance();
        auto foo = new Foo();
        // Store foo in object so that foo's pointer can be retrieved
        // through object at any time.
        // This could also be shortened to object->SetPointerInInternalField(0, foo);
        object->SetInternalField(0, External::New(foo));
        // Create a Persistent handle to watch how object is referenced;
        // once nothing references object any more, delete foo and the handle.
        Persistent<Object>::New(object).MakeWeak(foo, [](Persistent<Value> value, void *data) {
            delete (Foo*)data;
            value.Dispose();
        });
        if (i % (int)1e6 == 0) {
            // Objects created per millisecond; guard against a zero elapsed time.
            long long elapsed = microtime() - start;
            int speed = elapsed > 0 ? i / elapsed : 0;
            printf("Speed: %d\n", speed);
        }
    }
    context.Dispose();
    return 0;
}
$ g++ -std=c++0x -O3 gc_test.cc -lv8
2. Using the default policy
max-semispace-size 16M
max-old-generation-size 1G
With the default policy, the output is:
$ ./a.out --trace-gc
Speed: 0
Scavenge 0.9 -> 0.8 MB, 5 ms.
Scavenge 1.2 -> 1.2 MB, 6 ms.
Scavenge 1.6 -> 1.6 MB, 6 ms.
Scavenge 2.3 -> 2.3 MB, 13 ms.
sweeping from id 114428
Mark-sweep 3.1 -> 3.0 MB, 35 / 74 ms.
creating from id 1
Scavenge 5.0 -> 5.0 MB, 25 ms.
... ...
Scavenge 46.4 -> 46.4 MB, 89 ms.
Speed: 186
Scavenge 52.4 -> 52.4 MB, 103 ms.
Scavenge 58.4 -> 58.4 MB, 112 ms.
... ...
Scavenge 106.4 -> 106.4 MB, 170 ms.
sweeping from id 3233110
Mark-sweep 112.4 -> 74.4 MB, 998 / 1999 ms.
creating from id 1
Speed: 182
Scavenge 82.4 -> 82.4 MB, 89 ms.
Scavenge 88.4 -> 88.4 MB, 94 ms.
Scavenge 94.4 -> 94.4 MB, 104 ms.
3. Limiting max-old-generation-size to 20 MB
On the command line, max-old-generation-size corresponds to the flag max-old-space-size, given in MB. Adding --max-old-space-size=20 limits the old space to 20 MB.
$ ./a.out --trace-gc --max-old-space-size=20
Speed: 0
Scavenge 0.9 -> 0.8 MB, 5 ms.
Scavenge 1.2 -> 1.2 MB, 6 ms.
Scavenge 1.6 -> 1.6 MB, 6 ms.
Scavenge 2.3 -> 2.3 MB, 13 ms.
sweeping from id 114428
Mark-sweep 3.1 -> 3.0 MB, 35 / 75 ms.
creating from id 1
Scavenge 5.0 -> 5.0 MB, 26 ms.
Scavenge 6.5 -> 6.5 MB, 24 ms.
Scavenge 9.5 -> 9.5 MB, 52 ms.
Scavenge 12.5 -> 12.5 MB, 54 ms.
sweeping from id 677206
Mark-sweep 18.5 -> 15.9 MB, 213 / 458 ms.
creating from id 1
Speed: 190
sweeping from id 349526
Mark-sweep 23.9 -> 8.4 MB, 111 / 304 ms.
creating from id 1
sweeping from id 349526
Mark-sweep 16.4 -> 8.4 MB, 116 / 284 ms.
creating from id 1
sweeping from id 349526
Mark-sweep 16.4 -> 8.4 MB, 111 / 277 ms.
creating from id 1
Speed: 190
sweeping from id 349526
Mark-sweep 16.4 -> 8.4 MB, 108 / 273 ms.
creating from id 1
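4. Setting --gc-global to force Mark-sweep, with max-semispace-size at 2048 KB
On the command line, max-semispace-size corresponds to the flag --max-new-space-size, given in KB. This test passes --gc-global to force a Mark-sweep on every collection and --max-new-space-size to lower the maximum threshold.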
$ ./a.out --trace-gc --max-new-space-size=2048 --gc-global
Speed: 0
sweeping from id 16124
Mark-sweep 0.7 -> 0.7 MB, 6 / 15 ms.
creating from id 1
sweeping from id 21846
Mark-sweep 1.2 -> 0.8 MB, 8 / 20 ms.
creating from id 1
sweeping from id 21846
Mark-sweep 1.3 -> 0.8 MB, 8 / 20 ms.
creating from id 1
sweeping from id 43692
Mark-sweep 1.8 -> 1.3 MB, 17 / 39 ms.
creating from id 2
sweeping from id 43692
Mark-sweep 2.3 -> 1.3 MB, 17 / 39 ms.
creating from id 1
sweeping from id 43692
Speed: 186
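For C/C++ programmers, though, the best choice is to release Persistent objects manually and promptly instead of leaving them to the GC, which sidesteps all of these GC problems. Unfortunately nodejs lacks a standard here, and sloppy native modules rely too heavily on the GC, so ObjectWrap instances tied to C++ objects are not released well. Holding on to memory alone would be tolerable, but if they also hold system resources such as descriptors, waiting until hundreds of thousands have piled up before the GC releases them is too late.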
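The point about releasing resources by hand can be sketched without V8 at all. The class below is a hypothetical stand-in (`NativeResource`, `Close`, and `open_count` are names made up for illustration, not part of nodejs or V8): the explicit `Close()` is the primary release path, and the destructor, which in a native module might only run from a weak callback much later, is just a safety net.

```cpp
#include <cassert>

// Hypothetical stand-in for a native resource such as a socket or file descriptor.
class NativeResource {
public:
    static int open_count;  // number of resources currently held

    NativeResource() : open_(true) { ++open_count; }

    // Safety net only: runs whenever the owner is finally destroyed.
    ~NativeResource() { Close(); }

    NativeResource(const NativeResource&) = delete;
    NativeResource& operator=(const NativeResource&) = delete;

    // Releases the resource immediately.
    // Returns false if it had already been released.
    bool Close() {
        if (!open_) return false;
        open_ = false;
        --open_count;
        return true;
    }

private:
    bool open_;
};

int NativeResource::open_count = 0;

// Usage: the resource is gone as soon as Close() returns, however late
// the wrapper object itself is destroyed.
inline void ExampleExplicitRelease() {
    NativeResource* res = new NativeResource();
    assert(NativeResource::open_count == 1);
    res->Close();
    assert(NativeResource::open_count == 0);
    delete res;  // late destruction does not release anything twice
    assert(NativeResource::open_count == 0);
}
```

With this shape, the number of descriptors held at any moment depends on when the program calls `Close()`, not on when a collection happens to run.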
5. Other tests
I also wanted to test the md5 code from the article 小窥nodejs, only to find that it stalls on 0.4.9. Upgrading to the latest 0.5.3-pre from git gives the same result:
var crypto = require('crypto');
var mymd5 = function(str, encoding){
  return crypto
    .createHash('md5')
    .update(str)
    .digest(encoding || 'hex');
};
while(1){
  console.info(mymd5("" + parseInt(Math.random() * 100000)));
}
For the code:
if(typeof(print)=='undefined') {
  print = console.info
}
i = 0;
while(true){
  ++i;
  print(i);
}
change it to:
if(typeof(print)=='undefined') {
  print = console.info
}
i = 0;
while(true){
  ++i;
  if (i%100 ==0)
    print(i);
}
The original md5 code then becomes:
if(typeof(print)=='undefined') {
  print = console.info
}
var crypto = require('crypto');
var mymd5 = function(str, encoding){
  return crypto
    .createHash('md5')
    .update(str)
    .digest(encoding || 'hex');
};
i=0;
while(1){
  mymd5("" + parseInt(Math.random() * 100000));
  ++i;
  if (i%100000 == 0) {
    print(i);
  }
}
$ node --trace-gc --max-new-space-size=2048 --gc-global md5.js
Mark-sweep 1.5 -> 1.2 MB, 7 ms.
Mark-sweep 1.9 -> 1.5 MB, 1 / 11 ms.
... ...
Mark-sweep 2.9 -> 1.9 MB, 4 / 20 ms.
100000
Mark-sweep 2.9 -> 1.9 MB, 4 / 20 ms.
... ...
Mark-sweep 2.9 -> 1.9 MB, 4 / 19 ms.
200000
Mark-sweep 2.9 -> 1.9 MB, 4 / 19 ms.
Mark-sweep 2.9 -> 1.9 MB, 4 / 18 ms.
Mark-sweep 2.9 -> 1.9 MB, 4 / 18 ms.
Mark-sweep 2.9 -> 1.9 MB, 4 / 20 ms.
Mark-sweep 2.9 -> 1.9 MB, 4 / 19 ms.
if(typeof(print)=='undefined') {
  print = console.info
}
i=0;
while(1){
  str = "hello world" + i
  ++i;
  if (i%1000000 == 0) {
    print(i);
  }
}
$ node --trace-gc md5.js
Scavenge 1.5 -> 1.3 MB, 2 ms.
Scavenge 1.8 -> 1.6 MB, 2 ms.
Scavenge 2.0 -> 1.8 MB, 2 ms.
... ...
Scavenge 5.8 -> 2.1 MB, 2 ms.
Scavenge 5.8 -> 2.1 MB, 2 ms.
Scavenge 5.8 -> 2.1 MB, 2 ms.
1000000
Scavenge 5.8 -> 2.1 MB, 2 ms.
Scavenge 5.8 -> 2.1 MB, 2 ms.
... ...
Scavenge 5.8 -> 2.1 MB, 2 ms.
Scavenge 9.8 -> 2.1 MB, 2 ms.
2000000
Scavenge 9.8 -> 2.1 MB, 2 ms.
Scavenge 9.8 -> 2.1 MB, 2 ms.
... ...
Scavenge 9.8 -> 2.1 MB, 2 ms.
3000000
Scavenge 9.8 -> 2.1 MB, 2 ms.
... ...
Scavenge 9.8 -> 2.1 MB, 2 ms.
5000000
Scavenge 9.8 -> 2.1 MB, 2 ms.
Scavenge 9.8 -> 2.1 MB, 2 ms.
Scavenge 9.8 -> 2.1 MB, 2 ms.
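Also, the newer crypto seems to have fixed the problem of steadily eating up memory.
6. Summary
There are three ways to deal with the Persistent problem:
1. As described in this article, limit the thresholds and force a global GC.
2. Standardize nodejs modules so that Persistent-related objects are released promptly.
3. Improve V8's GC policy.
Methods 1 and 2 are the more practical ones.
If you run into a problematic module that keeps eating memory, method 1 is the only option.
7. Other questions:
A question whose cause I do not yet know, left here for any passing expert to answer: with the default policy and test code 1, why does physical memory usage stay above 950 MB even after a Mark-sweep? Is this V8's memory policy, or is MakeWeak being called on the Persistent incorrectly?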
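== 2011-08-13 addendum ==
Figured it out: even an empty object has a size, which includes allocation bookkeeping and the like, so using sizeof(Foo), which is 1 byte, as the example in this article does is not correct. There is no need to work out exactly how much memory a new'd object takes; it is certainly not large. Capping it at 128 bytes, the most extra memory it could plausibly take, keeps memory usage within a reasonable range.
// Adding the line below seems more reasonable, although for a class with no member variables it makes no noticeable difference.
- V8::AdjustAmountOfExternalAllocatedMemory(sizeof(Foo));
+ V8::AdjustAmountOfExternalAllocatedMemory(128);
- V8::AdjustAmountOfExternalAllocatedMemory(-sizeof(Foo));
+ V8::AdjustAmountOfExternalAllocatedMemory(-128);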
sweeping from id 124978
Mark-sweep 11.2 -> 3.2 MB, 85 / 149 ms.
creating from id 2
Speed: 167
sweeping from id 349436
Mark-sweep 11.2 -> 8.4 MB, 235 / 474 ms.
creating from id 1
sweeping from id 124985
Mark-sweep 11.2 -> 3.2 MB, 84 / 147 ms.
creating from id 2
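The sizeof observation in this addendum can be checked in plain C++, independent of V8. The sketch below uses a hypothetical class `Empty` shaped like Foo (static members only) and a hypothetical helper `ExternalDelta`; it assumes the adjustment call takes a signed byte count, which is why the negative case is built from a signed value rather than by negating a `sizeof` expression directly.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <type_traits>

// Same shape as Foo in the test program: static members only,
// so instances carry no per-object data.
class Empty {
public:
    static int total;
};
int Empty::total = 0;

// sizeof never yields 0, even for a class with no data members.
static_assert(sizeof(Empty) >= 1, "every object occupies at least one byte");

// Negating a sizeof expression stays unsigned on mainstream platforms,
// so -sizeof(T) is a huge positive number, not a small negative one.
static_assert(std::is_unsigned<decltype(-sizeof(Empty))>::value,
              "-sizeof(T) is an unsigned expression");

// Hypothetical helper: turns an estimated per-object footprint into a
// signed delta suitable for an "adjust external memory" style call.
inline int ExternalDelta(std::size_t estimated_bytes, bool allocating) {
    assert(estimated_bytes <= static_cast<std::size_t>(INT_MAX));
    const int bytes = static_cast<int>(estimated_bytes);
    return allocating ? bytes : -bytes;
}
```

Passing the 128-byte estimate from the addendum, `ExternalDelta(128, true)` and `ExternalDelta(128, false)` give the two values used in the diff above.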
== 2011-08-16 addendum ==
A GC can be forced from within the program by calling V8::LowMemoryNotification.
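I'm really glad to see your tests. The trouble caused by GC's Stop-The-World pauses has been nagging at me all along. I will run further tests building on your work.
A handshake to you.
Mark-sweep is generally fine in non-glue languages such as Java, because there is no host layer;
but in a glue language it can be a big problem.
For example: the host program may actually be very short of memory, yet inside the VM a newly created host object (perhaps 100 KB) is only a reference to the host object's pointer (a few dozen bytes, perhaps?), so the VM reclaims memory far too late;
which is why many VMs have to combine reference counting with mark-sweep to avoid this problem (Flash, for example).
nodejs wrapping C++ objects in Persistent handles is probably a necessity; otherwise there would be no way to notify the host before the GC reclaims the object.
Wrapping objects the way the Point example in the Google V8 documentation does will leak memory.
I have also been considering V8 for scripting recently; the above may not be accurate and is offered for reference only.
Both problems raised in this reply have corresponding solutions in V8: V8::AdjustAmountOfExternalAllocatedMemory tells the GC how much memory a C++ object really occupies, MakeWeak lets you do cleanup before the GC reclaims the object, and LowMemoryNotification forces a GC. More important than memory are system resources such as descriptors, for example sockets and file fds. Releasing all of these manually is the wise course; if they can only be released once the GC runs, the problem lies in the design and the code.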
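To make the accounting argument concrete, here is a toy model, not V8's actual collector: a collection is assumed to trigger once the bytes the collector knows about reach a threshold. The function name `ObjectsBeforeCollection` and all of its parameters are hypothetical, and the figures (a 1 MB threshold, a 32-byte wrapper, a 100 KB host object) are illustrative values in the spirit of the comment above.

```cpp
#include <cassert>
#include <cstddef>

// Toy model: counts how many wrapper objects get created before the bytes
// tracked by the collector reach the collection threshold.
// reported_external_bytes is what the host tells the collector about each
// object's native footprint (0 means "nothing reported").
inline std::size_t ObjectsBeforeCollection(std::size_t threshold_bytes,
                                           std::size_t wrapper_bytes,
                                           std::size_t reported_external_bytes) {
    const std::size_t per_object = wrapper_bytes + reported_external_bytes;
    assert(per_object > 0);  // otherwise the threshold is never reached

    std::size_t tracked = 0;
    std::size_t count = 0;
    while (tracked < threshold_bytes) {
        tracked += per_object;
        ++count;
    }
    return count;
}

// Usage with a 1 MB threshold and 32-byte wrappers around 100 KB host objects.
inline void ExampleAccounting() {
    const std::size_t kThreshold = 1024 * 1024;
    const std::size_t kWrapper = 32;
    const std::size_t kHostObject = 100 * 1024;

    // Nothing reported: tens of thousands of host objects pile up first.
    assert(ObjectsBeforeCollection(kThreshold, kWrapper, 0) == 32768);
    // Real size reported: a collection is due after about a dozen objects.
    assert(ObjectsBeforeCollection(kThreshold, kWrapper, kHostObject) == 11);
}
```

In this model, reporting the host object's size changes nothing about how memory is freed; it only moves the trigger point, which is the role the reply attributes to V8::AdjustAmountOfExternalAllocatedMemory.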