I'm getting the runtime error "double free or corruption" in my C++ program, which calls a reliable library, ANN, and uses OpenMP to parallelize a for loop.
- *** glibc detected *** /home/tim/test/debug/test: double free or corruption (!prev): 0x0000000002527260 ***
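Does this mean that the memory at address 0x0000000002527260 was freed more than once?
The error occurs at "_search_struct->annkSearch(queryPt, k_max, nnIdx, dists, _eps);" inside the function classify_various_k(), which is called from the OpenMP for loop inside the function tune_complexity().
Note that the error occurs when OpenMP runs with more than one thread and does not occur in the single-threaded case. I don't know why.
My code is below. If it is not enough for a diagnosis, please let me know. Thanks for your help!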
- void KNNClassifier::train(int nb_examples,int dim,double **features,int * labels) {
- _nPts = nb_examples;
- _labels = labels;
- _dataPts = features;
- setting_ANN(_dist_type,1);
- delete _search_struct;
- if(strcmp(_search_neighbors,"brutal") == 0) {
- _search_struct = new ANNbruteForce(_dataPts,_nPts,dim);
- }else if(strcmp(_search_neighbors,"kdtree") == 0) {
- _search_struct = new ANNkd_tree(_dataPts,_nPts,dim);
- }
- }
- void KNNClassifier::classify_various_k(int dim,double *feature,int label,int *ks,double * errors,int nb_ks,int k_max) {
- ANNpoint queryPt = 0;
- ANNidxArray nnIdx = 0;
- ANNdistArray dists = 0;
- queryPt = feature;
- nnIdx = new ANNidx[k_max];
- dists = new ANNdist[k_max];
- if(strcmp(_search_neighbors,"brutal") == 0) {
- _search_struct->annkSearch(queryPt,k_max,nnIdx,dists,_eps);
- }else if(strcmp(_search_neighbors,"kdtree") == 0) {
- _search_struct->annkSearch(queryPt,k_max,nnIdx,dists,_eps); // where error occurs
- }
- for (int j = 0; j < nb_ks; j++)
- {
- scalar_t result = 0.0;
- for (int i = 0; i < ks[j]; i++) {
- result+=_labels[ nnIdx[i] ];
- }
- if (result*label<0) errors[j]++;
- }
- delete [] nnIdx;
- delete [] dists;
- }
- void KNNClassifier::tune_complexity(int nb_examples,int dim,double **features,int *labels,int fold,char *method,int nb_examples_test,double **features_test,int *labels_test) {
- int nb_try = (_k_max - _k_min) / scalar_t(_k_step);
- scalar_t *error_validation = new scalar_t [nb_try];
- int *ks = new int [nb_try];
- for(int i=0; i < nb_try; i ++){
- ks[i] = _k_min + _k_step * i;
- }
- if (strcmp(method,"ct")==0)
- {
- train(nb_examples,dim,features,labels );// train once for all nb of nbs in ks
- for(int i=0; i < nb_try; i ++){
- if (ks[i] > nb_examples){nb_try=i; break;}
- error_validation[i] = 0;
- }
- int i = 0;
- #pragma omp parallel shared(nb_examples_test,error_validation,features_test,labels_test,nb_try,ks) private(i)
- {
- #pragma omp for schedule(dynamic) nowait
- for (i=0; i < nb_examples_test; i++)
- {
- classify_various_k(dim,features_test[i],labels_test[i],ks,error_validation,nb_try,ks[nb_try - 1]); // where error occurs
- }
- }
- for (i=0; i < nb_try; i++)
- {
- error_validation[i]/=nb_examples_test;
- }
- }
- ......
- }
Update:
Thanks! I'm now trying to fix the conflicting writes to the same memory in classify_various_k() by using "#pragma omp critical":
- void KNNClassifier::classify_various_k(int dim,double *feature,int label,int *ks,double * errors,int nb_ks,int k_max) {
- ANNpoint queryPt = 0;
- ANNidxArray nnIdx = 0;
- ANNdistArray dists = 0;
- queryPt = feature; //for (int i = 0; i < Vignette::size; i++){ queryPt[i] = vignette->content[i];}
- nnIdx = new ANNidx[k_max];
- dists = new ANNdist[k_max];
- if(strcmp(_search_neighbors,"brutal") == 0) {// search
- _search_struct->annkSearch(queryPt,k_max,nnIdx,dists,_eps);
- }else if(strcmp(_search_neighbors,"kdtree") == 0) {
- _search_struct->annkSearch(queryPt,k_max,nnIdx,dists,_eps);
- }
- for (int j = 0; j < nb_ks; j++)
- {
- scalar_t result = 0.0;
- for (int i = 0; i < ks[j]; i++) {
- result+=_labels[ nnIdx[i] ]; // Program received signal SIGSEGV,Segmentation fault
- }
- if (result*label<0)
- {
- #pragma omp critical
- {
- errors[j]++;
- }
- }
- }
- delete [] nnIdx;
- delete [] dists;
- }
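Solution
Okay, since you've stated that it works correctly in the single-threaded case, the "normal" debugging methods won't help. You need to do the following: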
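>Find all variables that are accessed in parallel
>Look especially at the ones that are modified
>Don't call delete on a shared resource
>Look at all library functions that operate on shared resources, and check that they don't allocate or free memory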
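One way to act on the last two points is sketched below: if a library call touches shared internal state, let only one thread at a time enter it. This is a minimal sketch under stated assumptions. `SharedScratchSearcher`, `nearest` and `query_all` are hypothetical names made up for illustration and are not part of ANN; the class only stands in for any search structure that reuses internal buffers between calls.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a search structure whose query method is NOT
// thread-safe: every call overwrites the same internal scratch buffer.
class SharedScratchSearcher {
public:
    explicit SharedScratchSearcher(std::vector<double> points)
        : points_(std::move(points)), scratch_(points_.size()) {}

    // Returns the index of the stored point closest to q (1-D for brevity).
    int nearest(double q) {
        assert(!points_.empty());
        for (std::size_t i = 0; i < points_.size(); ++i) {
            const double d = points_[i] - q;
            scratch_[i] = d * d;  // shared state: concurrent calls would race here
        }
        return static_cast<int>(
            std::min_element(scratch_.begin(), scratch_.end()) - scratch_.begin());
    }

private:
    std::vector<double> points_;
    std::vector<double> scratch_;
};

// Runs all queries in an OpenMP loop, serializing only the unsafe call.
std::vector<int> query_all(SharedScratchSearcher& searcher,
                           const std::vector<double>& queries) {
    std::vector<int> result(queries.size());
    const int n = static_cast<int>(queries.size());
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < n; ++i) {
        int idx = -1;
        #pragma omp critical(searcher_guard)
        {
            idx = searcher.nearest(queries[i]);  // one thread at a time
        }
        result[i] = idx;  // each iteration owns its own slot, no lock needed
    }
    return result;
}
```

Serializing the call removes the race but also most of the speed-up if the search dominates the loop body. Where the library allows it, giving each thread its own search structure may be the better trade-off.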
These are the candidates for a double delete:
- shared(nb_examples_test,error_validation,features_test,labels_test,nb_try,ks)
Also, this code might not be thread-safe:
- for (int i = 0; i < ks[j]; i++) {
- result+=_labels[ nnIdx[i] ];
- }
- if (result*label<0) errors[j]++;
because two or more threads might try to write to the same element of the errors array at the same time.
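A common way to avoid that kind of race is to let each thread count into its own private buffer and merge the buffers once at the end. The sketch below illustrates the idea in isolation. `count_errors` and its parameters are hypothetical names for illustration, and `neighbor_labels[i]` is assumed to already hold the labels of the nearest neighbors of sample i, so no search library is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// For each k in ks, counts the samples whose summed neighbor labels have the
// opposite sign to the sample's own label.
// Assumption: every neighbor_labels[i] holds at least max(ks) entries.
std::vector<double> count_errors(
        const std::vector<std::vector<int>>& neighbor_labels,
        const std::vector<int>& labels,
        const std::vector<int>& ks) {
    assert(neighbor_labels.size() == labels.size());
    std::vector<double> errors(ks.size(), 0.0);
    const int n = static_cast<int>(labels.size());

    #pragma omp parallel
    {
        // Private to each thread, so increments need no synchronization.
        std::vector<double> local(ks.size(), 0.0);

        #pragma omp for schedule(dynamic) nowait
        for (int i = 0; i < n; ++i) {
            for (std::size_t j = 0; j < ks.size(); ++j) {
                double result = 0.0;
                for (int t = 0; t < ks[j]; ++t) {
                    result += neighbor_labels[i][t];
                }
                if (result * labels[i] < 0) {
                    local[j] += 1.0;
                }
            }
        }

        // Merge once per thread; only this short section is serialized.
        #pragma omp critical(merge_errors)
        {
            for (std::size_t j = 0; j < ks.size(); ++j) {
                errors[j] += local[j];
            }
        }
    }
    return errors;
}
```

Compared with a critical section around every `errors[j]++`, this enters the lock only once per thread. The pragmas are ignored when the file is compiled without OpenMP support, in which case the function runs serially and gives the same counts.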