c# – Optimizing millions of char* to string conversions

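I have an application that has to accept millions of char* as input parameters (typically strings of fewer than 512 characters, in Unicode) and convert and store them as .NET strings.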

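This has turned out to be a real performance bottleneck in my application. I'm wondering whether there is a design pattern or idea that would make it more efficient.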

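One property makes me think this can be improved: there is a lot of repetition. If, say, 1 million objects come in, there may be only 50 unique char* patterns.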
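
To illustrate how that repetition could be exploited, here is a minimal sketch of a simpler technique than the trie suggested in the answer: a hash map keyed by the string's bytes, so each distinct input is converted only once. It is standard C++ rather than C++/CLI. `std::wstring` stands in for `System::String`. The names `slow_convert`, `g_conversions` and `ConversionCache` are hypothetical. `slow_convert` only widens ASCII bytes and is a placeholder for `MbCharToStr`.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>

// Counts how often the slow conversion path runs (for illustration only).
std::size_t g_conversions = 0;

// Hypothetical placeholder for MbCharToStr: widens each byte to a wchar_t.
// Correct only for ASCII input; a real version would decode UTF-8.
std::wstring slow_convert(const char* source) {
    ++g_conversions;
    std::wstring out;
    for (const char* p = source; *p != '\0'; ++p) {
        out.push_back(static_cast<wchar_t>(static_cast<unsigned char>(*p)));
    }
    return out;
}

// Converts each distinct byte sequence once and reuses the result afterwards.
class ConversionCache {
public:
    const std::wstring& get(const char* source) {
        if (source == nullptr) source = "";
        // Key on the contents, not the pointer: equal text can arrive at different addresses.
        std::string key(source);
        auto it = cache_.find(key);
        if (it != cache_.end()) return it->second;  // hit: no conversion
        // Miss: convert once and remember the result.
        return cache_.emplace(std::move(key), slow_convert(source)).first->second;
    }

    std::size_t size() const { return cache_.size(); }

private:
    // References to mapped values stay valid when the table rehashes.
    std::unordered_map<std::string, std::wstring> cache_;
};
```

Each lookup still copies the bytes into a `std::string` key, and the class is not thread-safe. In C++/CLI the cached values would need to be managed handles, for example held through `gcroot`.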

For the record, here is the routine I use to convert a char* to a String. It is written in C++/CLI, while the rest of the project is in C#:

```cpp
String ^StringTools::MbCharToStr ( const char *Source )
{
   String ^str;

   if( (Source == NULL) || (Source[0] == '\0') )
   {
      str = gcnew String("");
   }
   else
   {
      // Find the number of UTF-16 characters needed to hold the
      // converted UTF-8 string (including the terminating NUL),
      // and allocate a buffer for them.
      const size_t max_strsize = 2048;

      int wstr_size = MultiByteToWideChar (CP_UTF8, 0L, Source, -1, NULL, 0);
      if ((size_t) wstr_size < max_strsize)
      {
         // Save the malloc/free overhead if it's a reasonable size.
         // Plus, KJN was having fits with exceptions within exception logging due
         // to a corrupted heap.

         wchar_t wstr[max_strsize];

         (void) MultiByteToWideChar (CP_UTF8, 0L, Source, -1, wstr, (int) wstr_size);
         str = gcnew String (wstr);
      }
      else
      {
         wchar_t *wstr = (wchar_t *)calloc (wstr_size, sizeof(wchar_t));
         if (wstr == NULL)
            throw gcnew PCSException (__FILE__, __LINE__, PCS_INSUF_MEMORY, MSG_SEVERE);

         // Convert the UTF-8 string into the UTF-16 buffer, construct the
         // result String from the UTF-16 buffer, and then free the buffer.

         (void) MultiByteToWideChar (CP_UTF8, 0L, Source, -1, wstr, (int) wstr_size);
         str = gcnew String ( wstr );
         free (wstr);
      }
   }
   return str;
}
```
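
The first `MultiByteToWideChar` call in this routine passes `-1` as the input length and a NULL output buffer. It therefore returns the number of UTF-16 code units required, including the terminating NUL. The portable sketch below uses a hypothetical helper, `utf16_units_needed`, to compute the same count for well-formed UTF-8. It shows why the count can differ from the byte length. It assumes valid input and does not reproduce the Windows function's handling of invalid sequences.

```cpp
// Returns the number of UTF-16 code units needed to hold a NUL-terminated
// UTF-8 string, counting the terminating NUL. For well-formed input this is
// the value MultiByteToWideChar(CP_UTF8, 0, src, -1, NULL, 0) reports.
// Assumption: the input is valid UTF-8; nothing is validated here.
int utf16_units_needed(const char* src) {
    int units = 1;  // the terminating NUL
    for (const unsigned char* p = reinterpret_cast<const unsigned char*>(src); *p != 0; ++p) {
        if ((*p & 0xC0) == 0x80) continue;  // continuation byte: part of the previous character
        units += (*p >= 0xF0) ? 2 : 1;      // 4-byte sequences need a surrogate pair
    }
    return units;
}
```

This count is what the `wstr_size < max_strsize` test compares against. On that branch the 2048-element stack buffer is therefore always large enough.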

Solution

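You could feed each character of the input string into a trie structure. At the leaves, keep a .NET String object. Then, when a char* you have seen before comes in, you can quickly find the existing .NET version without allocating any memory.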

Pseudocode:

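1. Start with an empty trie.
2. Process a char* by walking the trie until you can go no further.
3. Add nodes until the entire char* has been encoded as nodes.
4. At the leaf, attach the actual .NET string.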
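
The four steps above can be sketched in standard C++ as follows. This is a minimal illustration under assumptions, not the answer's own code. `std::wstring` stands in for the .NET `String`, and the names `StringTrie` and `widen_ascii` are hypothetical. `widen_ascii` only widens ASCII bytes as a placeholder for `MbCharToStr`. Each node keeps its children in a `std::map`, so memory grows with the bytes actually seen. One input can be a prefix of another, so the converted value is stored on whichever node the input ends at, not only on leaves.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical placeholder for MbCharToStr: widens each byte to a wchar_t.
// Correct only for ASCII input; a real version would decode UTF-8.
std::wstring widen_ascii(const char* source) {
    std::wstring out;
    for (const char* p = source; *p != '\0'; ++p) {
        out.push_back(static_cast<wchar_t>(static_cast<unsigned char>(*p)));
    }
    return out;
}

// Byte-level trie: each edge is one input byte. The node reached after the
// last byte of an input stores that input's converted string.
class StringTrie {
public:
    const std::wstring& lookup(const char* source) {
        if (source == nullptr) source = "";
        Node* node = &root_;
        // Steps 2 and 3: follow existing nodes; create any that are missing.
        for (const char* p = source; *p != '\0'; ++p) {
            std::unique_ptr<Node>& child = node->children[static_cast<unsigned char>(*p)];
            if (!child) child = std::make_unique<Node>();
            node = child.get();
        }
        // Step 4: convert and attach the string the first time this node is reached.
        if (!node->value) {
            node->value = std::make_unique<std::wstring>(widen_ascii(source));
            ++conversions_;
        }
        return *node->value;
    }

    std::size_t conversions() const { return conversions_; }

private:
    struct Node {
        std::map<unsigned char, std::unique_ptr<Node>> children;
        std::unique_ptr<std::wstring> value;  // empty until an input ends here
    };
    Node root_;  // Step 1: start with an empty trie
    std::size_t conversions_ = 0;
};
```

A repeated string walks only existing nodes and returns the cached value without allocating. Only the first occurrence pays for node creation and conversion. The walk is still linear in the string length. The saving therefore comes from skipping the UTF-8 decoding and the managed allocation, not from reading fewer bytes.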

The answer to this other SO question should get you started: How to create a trie in c#
