Background

A long time ago I wrote a tool for GacUI that packs a large number of .h and .cpp files into a few large files, for example GacUI's output files:

  • GacUI.h / GacUI.cpp
  • GacUIWindows.h / GacUIWindows.cpp
  • GacUIReflection.h / GacUIReflection.cpp
  • GacUICompiler.h / GacUICompiler.cpp

CodePack is the tool that generates them, and I originally wrote it in C# with Visual Studio 2010. Because I was too lazy to mess around with .NET Core on Ubuntu (yes, it would have been easy, but I didn't want to add it to my one-click environment setup script), I spent about a whole day rewriting CodePack in C++, so that it can be compiled on Ubuntu with clang++.

Before the rewrite: CodePack (C#)

After the rewrite: CodePack (C++)

So today I'm going to compare how C++ and C# cope with this kind of messy task.

First, you need a basic idea of what the tool does. How do you pack a large number of files into a few groups of files? The general idea is:

  • Concatenate the .h files one after another, and the .cpp files one after another
  • When concatenating .h files, respect the order in which they include each other
  • After concatenation, make the large files #include each other where needed

This involves a topological sort. Don't take mine as an example: I only have a few hundred files, so I wrote the topological sort as quickly as I could, and its worst case is O(n³ log n). It probably doesn't have to be that bad, but doing better looked a bit too complicated. Here is the most expensive fragment, with the cost of each line in a comment:

FOREACH_INDEXER(T, category, index, unsorted)                // n
{
	if (!deps.Keys().Contains(category))                     // log n
	{
		sorted->Add(category);                               // n
		unsorted.RemoveAt(index);                            // n
		for (vint i = deps.Count() - 1; i >= 0; i--)         // n iterations
		{
			deps.Remove(deps.Keys()[i], category);           // n log n
		}
		break;
	}
}
// Worst case: n * (log n + n + n + n * n log n) = n³ log n, but the average is much better

In practice, in a Release build, even if you feed it hundreds or thousands of files together with their include lists, the sort finishes too quickly to notice, so I left it as it is.
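For comparison, here is a minimal sketch of a linear-style topological sort (Kahn's algorithm) using standard containers. It is not the author's code: the name TopologicalSort and the input shape (each file mapped to the files it includes) are assumptions made for illustration. It repeatedly emits a file once everything it includes has been emitted, so each header lands after the headers it depends on.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <stdexcept>
#include <string>
#include <vector>

// Kahn's algorithm. deps[f] lists the files that f includes.
// Returns all files ordered so that each file comes after everything it includes.
// Throws std::runtime_error if the include graph has a cycle.
std::vector<std::string> TopologicalSort(
    const std::map<std::string, std::vector<std::string>>& deps)
{
    std::map<std::string, int> pending;                          // includes not yet emitted, per file
    std::map<std::string, std::vector<std::string>> includedBy;  // reverse edges

    for (const auto& [file, includes] : deps)
    {
        pending[file];  // register the file even if it includes nothing
        for (const auto& inc : includes)
        {
            pending[inc];  // an included file may not appear as a key of deps
            pending[file]++;
            includedBy[inc].push_back(file);
        }
    }

    std::queue<std::string> ready;  // files whose includes have all been emitted
    for (const auto& [file, count] : pending)
        if (count == 0) ready.push(file);

    std::vector<std::string> sorted;
    while (!ready.empty())
    {
        std::string file = ready.front();
        ready.pop();
        sorted.push_back(file);
        for (const auto& user : includedBy[file])
            if (--pending[user] == 0) ready.push(user);
    }

    if (sorted.size() != pending.size())
        throw std::runtime_error("include cycle detected");
    return sorted;
}
```

Every file and every include edge is handled a constant number of times, each with a std::map lookup, so this runs in O((V + E) log V) instead of O(n³ log n).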

Now let's compare them side by side. I don't use the STL; instead I use my own strings, regular expressions, containers, foreach and Linq (the ranges library that is about to join the C++ standard does almost the same thing), all of which have counterparts in the STL. So the comparison seems fair.

1: Scan the disk for files

The C# version is easy to write, because .NET already has a built-in function for searching files:

static string[] GetCppFiles(string folder)
{
    return Directory
        .GetFiles(folder, "*.cpp", SearchOption.AllDirectories)
        .Select(s => s.ToUpper())
        .ToArray()
        ;
}
static string[] GetHeaderFiles(string folder)
{
    return Directory
        .GetFiles(folder, "*.h", SearchOption.AllDirectories)
        .Select(s => s.ToUpper())
        .ToArray()
        ;
}

My library only has functions that list a single directory, so I have to use recursion. In C#'s case the functionality comes from the Windows API, so it can simply be called directly. I just didn't feel like wrapping it; I'll do that later if I ever need it more.

// Recursively collects the files under folder whose names end with extension
LazyList<FilePath> SearchFiles(const Folder& folder, const WString& extension)
{
	auto files = MakePtr<List<File>>();
	auto folders = MakePtr<List<Folder>>();
	folder.GetFiles(*files.Obj());
	folder.GetFolders(*folders.Obj());

	// Matching files in this folder, followed by the matches from every subfolder
	return LazyList<File>(files)
		.Select([] (const File& file) { return file.GetFilePath(); })
		.Where([=] (const FilePath& path) { return INVLOC.EndsWith(path.GetName(), extension, Locale::IgnoreCase); })
		.Concat(
			LazyList<Folder>(folders)
			.SelectMany([=] (const Folder& folder) { return SearchFiles(folder, extension); })
			);
}

LazyList<FilePath> GetCppFiles(const FilePath& folder)
{
	return SearchFiles(folder, L".cpp");
}

LazyList<FilePath> GetHeaderFiles(const FilePath& folder)
{
	return SearchFiles(folder, L".h");
}

2: Categorize files

Categorizing means grouping files according to preset patterns, so I need to show the configuration file here:

<codegen>
  <folders>
    <folder path=".. \Source" />
    <folder path=".. \Import" />
  </folders>
  <categories>
    <category name="vlpp" pattern="\Import\Vlpp."/>
    <category name="wfruntime" pattern="\Import\VlppWorkflow."/>
    <category name="wfcompiler" pattern="\Import\VlppWorkflowCompiler."/>
    <category name="gacui" pattern="\Source\">
      <except pattern="\Windows\" />
      <except pattern="\WindowsDirect2D\" />
      <except pattern="\WindowsGDI\" />
      <except pattern="\Reflection\" />
      <except pattern="\Compiler\" />
    </category>
    <category name="windows" pattern="\Source\GraphicsElement\WindowsDirect2D\" />
    <category name="windows" pattern="\Source\GraphicsElement\WindowsGDI\" />
    <category name="windows" pattern="\Source\NativeWindow\Windows\" />
    <category name="reflection" pattern="\Source\Reflection\" />
    <category name="compiler" pattern="\Source\Compiler\" />
  </categories>
  <output path=".">
    <codepair category="vlpp" filename="Vlpp" generate="false"/>
    <codepair category="wfruntime" filename="VlppWorkflow" generate="false"/>
    <codepair category="wfcompiler" filename="VlppWorkflowCompiler" generate="false"/>
    <codepair category="gacui" filename="GacUI" generate="true"/>
    <codepair category="windows" filename="GacUIWindows" generate="true"/>
    <codepair category="reflection" filename="GacUIReflection" generate="true"/>
    <codepair category="compiler" filename="GacUICompiler" generate="true"/>
  </output>
</codegen>

<categories> lists the categories, i.e. which files are grouped together, and <output> specifies the output file name for each group. Let's look at the C# code first. This function is longer, but what it does is very simple: we collected all the files in the previous step, so now we just keep the files that match //category/@pattern but none of the //category/except/@pattern entries, and finally check for duplicate files.

static Dictionary<string, string[]> CategorizeCodeFiles(XDocument config, string[] files)
{
    Dictionary<string, string[]> categorizedFiles = new Dictionary<string, string[]>();
    foreach (var e in config.Root.Element("categories").Elements("category"))
    {
        string name = e.Attribute("name").Value;
        string pattern = e.Attribute("pattern").Value.ToUpper();
        string[] exceptions = e.Elements("except").Select(x => x.Attribute("pattern").Value.ToUpper()).ToArray();
        string[] filteredFiles = files
                .Where(f =>
                    {
                        string path = f.ToUpper();
                        return path.Contains(pattern) && exceptions.All(ex => !path.Contains(ex));
                    })
                .ToArray();
        string[] previousFiles = null;
        if (categorizedFiles.TryGetValue(name, out previousFiles))
        {
            filteredFiles = filteredFiles.Concat(previousFiles).ToArray();
            categorizedFiles.Remove(name);
        }
        categorizedFiles.Add(name, filteredFiles);
    }
    foreach (var a in categorizedFiles.Keys)
    {
        foreach (var b in categorizedFiles.Keys)
        {
            if (a != b)
            {
                if (categorizedFiles[a].Intersect(categorizedFiles[b]).Count() != 0)
                {
                    throw new ArgumentException();
                }
            }
        }
    }
    return categorizedFiles;
}

The last step is purely defensive: if the same file ends up in more than one group, the program crashes. I was too lazy to report a proper error; when it crashes, I just run it under the debugger and see the cause immediately (runs away).
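The filtering rule above (keep a file if its path contains the category pattern and none of its except patterns, then refuse any file that lands in two categories) can be sketched with standard containers. This is an illustration, not CodePack's code: the Category struct and the function names are hypothetical, the config is assumed to be already parsed, and matching is a case-insensitive ASCII substring test, like the ToUpper calls in the C# version.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// One <category> element: name, pattern and <except> patterns (hypothetical type).
struct Category
{
    std::string name;
    std::string pattern;
    std::vector<std::string> exceptions;
};

std::string ToUpper(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
        [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return s;
}

// Case-insensitive substring test (ASCII only).
bool ContainsIgnoreCase(const std::string& text, const std::string& part)
{
    return ToUpper(text).find(ToUpper(part)) != std::string::npos;
}

// Several <category> elements may share a name; their files are merged.
// Throws if one file ends up in two different categories.
std::map<std::string, std::set<std::string>> CategorizeCodeFiles(
    const std::vector<Category>& categories, const std::vector<std::string>& files)
{
    std::map<std::string, std::set<std::string>> result;
    for (const auto& c : categories)
    {
        auto& group = result[c.name];
        for (const auto& f : files)
        {
            bool excluded = std::any_of(c.exceptions.begin(), c.exceptions.end(),
                [&](const std::string& ex) { return ContainsIgnoreCase(f, ex); });
            if (ContainsIgnoreCase(f, c.pattern) && !excluded)
                group.insert(f);
        }
    }

    // Defensive check, like the original: remember which category owns each file.
    std::map<std::string, std::string> owner;
    for (const auto& [name, group] : result)
        for (const auto& f : group)
            if (!owner.emplace(f, name).second)
                throw std::runtime_error("file in multiple categories: " + f);
    return result;
}
```

Using a std::set per category drops duplicates within a category, which plays the same role as the Contains check before Add in the C++ version.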

Here is the C++ code, and you can see there's basically no difference. But from here on, you'll notice that lambda expressions are much more verbose in C++ than in C#. Of course, the parameter types of a lambda don't have to be written out (you could use auto instead), but my Linq is old: when I wrote it, C++ lambda expressions couldn't be templates, so it doesn't support that newer feature. I'll change it some time.

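As for the specific reason: in my Linq, for any lambda expression F, I pass

decltype(&F::operator())

to a template class that extracts the return type of a function pointer, and use that return type to build the result type of the Linq function. For a generic lambda, operator() is a template, so &F::operator() does not name a single function and this approach breaks. It should be changed to something like decltype(declval<F>()(declval<T>())), which asks what calling F with an argument of type T returns.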

void CategorizeCodeFiles(Ptr<XmlDocument> config, LazyList<FilePath> files, Group<WString, FilePath>& categorizedFiles)
{
	FOREACH(Ptr<XmlElement>, e, XmlGetElements(XmlGetElement(config->rootElement, L"categories"), L"category"))
	{
		auto name = XmlGetAttribute(e, L"name")->value.value;
		auto pattern = wupper(XmlGetAttribute(e, L"pattern")->value.value);

		List<WString> exceptions;
		CopyFrom(
			exceptions,
			XmlGetElements(e, L"except")
				.Select([] (const Ptr<XmlElement> x)
				{
					return XmlGetAttribute(x, L"pattern")->value.value;
				})
			);

		// Keep files that contain the pattern but none of the exceptions
		List<FilePath> filterFiles;
		CopyFrom(
			filterFiles,
			From(files).Where([&] (const FilePath& f)
				{
					auto path = f.GetFullPath();
					return INVLOC.FindFirst(path, pattern, Locale::IgnoreCase).key != -1
						&& From(exceptions).All([&] (const WString& ex)
						{
							return INVLOC.FindFirst(path, ex, Locale::IgnoreCase).key == -1;
						});
				})
			);

		FOREACH(FilePath, file, filterFiles)
		{
			if (!categorizedFiles.Contains(name, file))
			{
				categorizedFiles.Add(name, file);
			}
		}
	}

	// Defensive check: no file may appear in two different categories
	FOREACH(WString, a, categorizedFiles.Keys())
	{
		FOREACH(WString, b, categorizedFiles.Keys())
		{
			if (a != b)
			{
				const auto& as = categorizedFiles.Get(a);
				const auto& bs = categorizedFiles.Get(b);
				CHECK_ERROR(From(as).Intersect(bs).IsEmpty(), L"A file should not appear in multiple categories.");
			}
		}
	}
}
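The return-type problem described just before this code can be reproduced in isolation. Below is a minimal sketch, not the author's Linq: MemberReturnType, OldResultType and NewResultType are hypothetical names. The first alias reads the return type off &F::operator(), which only works when operator() is not a template; the second asks what calling F with a T yields, which also works for generic lambdas.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Extracts R from the type of a const member function pointer R (C::*)(Args...) const.
template<typename T>
struct MemberReturnType;

template<typename R, typename C, typename... Args>
struct MemberReturnType<R (C::*)(Args...) const>
{
    using Type = R;
};

// Old style: only compiles when F::operator() is a single, non-template function.
template<typename F>
using OldResultType = typename MemberReturnType<decltype(&F::operator())>::Type;

// New style: the type of calling an F with a T, which also works for generic
// lambdas because the call itself selects the template instantiation.
template<typename F, typename T>
using NewResultType = decltype(std::declval<F>()(std::declval<T>()));
```

With a generic lambda such as [](auto x) { return x + x; }, OldResultType fails to compile because &F::operator() cannot be taken without choosing template arguments. In C++17 the standard library spells the second form as std::invoke_result_t<F, T>.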

3: Recursively enumerate all files that a file #includes, directly or indirectly

From here on, the C# and C++ programs differ slightly. I mentioned above that you have to pay attention to the order when concatenating the header files, and in the C# version that is what GetIncludedFiles takes care of. In C++, however, my containers differ somewhat from the C# ones, so I couldn't port this code as-is. Instead, I turned the topological sort from step 4 into a template function and do the final sorting in the Combine function from step 6.

Thus, in the C# version, sorting is done by GetIncludedFiles, whereas the C++ version does it in the Combine function.

static Dictionary<string, string[]> ScannedFiles = new Dictionary<string, string[]>();
static Regex IncludeRegex = new Regex(@"^\s*\#include\s*""(?<path>[^""]+)""\s*$");
static Regex IncludeSystemRegex = new Regex(@"^\s*\#include\s*\<(?<path>[^""]+)\>\s*$");

static string[] GetIncludedFiles(string codeFile)
{
    codeFile = Path.GetFullPath(codeFile).ToUpper();
    string[] result = null;
    if (!ScannedFiles.TryGetValue(codeFile, out result))
    {
        List<string> directIncludeFiles = new List<string>();
        foreach (var line in File.ReadAllLines(codeFile))
        {
            Match match = IncludeRegex.Match(line);
            if (match.Success)
            {
                string path = match.Groups["path"].Value;
                path = Path.GetFullPath(Path.GetDirectoryName(codeFile) + @"\" + path).ToUpper();
                if (!directIncludeFiles.Contains(path))
                {
                    directIncludeFiles.Add(path);
                }
            }
        }

        // Insert each file's own includes in front of it, so dependencies come first
        for (int i = directIncludeFiles.Count - 1; i >= 0; i--)
        {
            directIncludeFiles.InsertRange(i, GetIncludedFiles(directIncludeFiles[i]));
        }
        result = directIncludeFiles.Distinct().ToArray();
        ScannedFiles.Add(codeFile, result);
    }
    return result;
}

GetIncludedFiles is also very simple: it matches each line of the file against the regular expression for #include "...", then opens every included file and does the same, recursively, until everything has been scanned. The results are cached in the ScannedFiles variable, so no file is scanned twice. The C# version puts the included files in front of the file that includes them, because obviously, if a.h #includes b.h, b.h has to be pasted before a.h. But the containers I wrote for C++ lack the crucial InsertRange function, so I could only append chunks at the end, and even adding a Reverse doesn't make that equivalent. So in the C++ version, I moved the ordering to a later step.

Dictionary<FilePath, LazyList<FilePath>> scannedFiles;
Regex regexInclude(LR"/(^\s*#include\s*"(<path>[^"]+)"\s*$)/");
Regex regexSystemInclude(LR"/(^\s*#include\s*<(<path>[^"]+)>\s*$)/");

LazyList<FilePath> GetIncludedFiles(const FilePath& codeFile)
{
	{
		// Return the cached result if this file has already been scanned
		vint index = scannedFiles.Keys().IndexOf(codeFile);
		if (index != -1)
		{
			return scannedFiles.Values()[index];
		}
	}

	List<FilePath> includes;
	StringReader reader(ReadFile(codeFile));
	while (!reader.IsEnd())
	{
		auto line = reader.ReadLine();
		if (auto match = regexInclude.MatchHead(line))
		{
			auto path = codeFile.GetFolder() / match->Groups()[L"path"][0].Value();
			if (!includes.Contains(path))
			{
				includes.Add(path);
			}
		}
	}

	// Direct includes first, then everything they include; the order is fixed later in Combine
	auto result = MakePtr<List<FilePath>>();
	CopyFrom(
		*result.Obj(),
		From(includes)
			.Concat(From(includes).SelectMany(GetIncludedFiles))
			.Distinct()
		);

	scannedFiles.Add(codeFile, result);
	return result;
}

As you can see, the two versions of the code are still fairly close. When it comes to strings, the only difference between C++ and C# is syntax.
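Here is a minimal standard-library sketch of the same scan with the C# ordering built in. It is an illustration under stated assumptions: file contents come from an in-memory map (sourceFiles) instead of the disk, include paths are used verbatim instead of being resolved relative to the including file, and, like the original, it assumes there are no include cycles. Appending each include's own includes before the include itself, then keeping only first occurrences, has the same effect as the C# InsertRange loop, so no Reverse is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// In-memory stand-in for the disk: file name -> contents (assumption for this sketch).
std::map<std::string, std::string> sourceFiles;

// Cache of already scanned files, like ScannedFiles in the C# version.
std::map<std::string, std::vector<std::string>> scannedFiles;

// Matches #include "..." lines. The custom delimiter "re" is needed because
// the pattern itself contains )" which would end a plain R"(...)" literal.
const std::regex includeRegex(R"re(^\s*#include\s*"([^"]+)"\s*$)re");

// Every file that `file` includes directly or indirectly, ordered so that each
// file comes after the files it includes, without duplicates.
std::vector<std::string> GetIncludedFiles(const std::string& file)
{
    auto cached = scannedFiles.find(file);
    if (cached != scannedFiles.end()) return cached->second;

    // Direct includes, in order of appearance, without duplicates.
    std::vector<std::string> direct;
    std::istringstream reader(sourceFiles[file]);
    std::string line;
    std::smatch match;
    while (std::getline(reader, line))
    {
        if (std::regex_match(line, match, includeRegex))
        {
            std::string path = match[1].str();
            if (std::find(direct.begin(), direct.end(), path) == direct.end())
                direct.push_back(path);
        }
    }

    // Same effect as the C# InsertRange loop: each include's own includes go in front of it.
    std::vector<std::string> expanded;
    for (const auto& inc : direct)
    {
        std::vector<std::string> nested = GetIncludedFiles(inc);
        expanded.insert(expanded.end(), nested.begin(), nested.end());
        expanded.push_back(inc);
    }

    // Distinct(): keep the first occurrence of every file.
    std::vector<std::string> result;
    for (const auto& f : expanded)
        if (std::find(result.begin(), result.end(), f) == result.end())
            result.push_back(f);

    scannedFiles[file] = result;
    return result;
}
```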

4: Topological sort

The core fragment was already posted above, so I'll skip it here. Comparing the C++ and C# versions, the main differences are:

  • C++ favours value-typed containers, so my Linq has no equivalent of C#'s ToArray, ToList and ToDictionary, and there is no Dictionary with array values like the ones the C# version uses
  • I use Group instead, but a Group cannot express a key that exists with no values

These two differences make the logic of the two versions slightly different.
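A small illustration of the second point, using std::multimap as a stand-in for Group (an assumption: like Group, a multimap only stores key/value pairs). A file with no dependencies survives in a dictionary of lists as a key with an empty list, but disappears entirely once converted to a multimap, so a topological sort built on a group has to keep track of such files separately. DependencyTable, DependencyGroup and ToGroup are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Dictionary of lists, as in the C# version: a file with no dependencies
// is still a key, mapped to an empty list.
using DependencyTable = std::map<std::string, std::vector<std::string>>;

// Multimap, standing in for Group: it only stores (key, value) pairs,
// so a key with zero values cannot exist.
using DependencyGroup = std::multimap<std::string, std::string>;

DependencyGroup ToGroup(const DependencyTable& table)
{
    DependencyGroup group;
    for (const auto& [file, deps] : table)
        for (const auto& dep : deps)
            group.emplace(file, dep);
    return group;
}
```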

5: Longest common prefix

When computing the longest common prefix, the C# version works on raw strings while the C++ version works on paths. As a result, in the C# version, if every file shares not only the same folder but also the same file name prefix, part of the file name gets chopped off too. However, these paths only appear in comments in the output, so the mistake didn't matter and I never fixed it. I took the opportunity of the rewrite to fix the bug.

C#:

static string GetLongestCommonPrefix(string[] strings)
{
    if (strings.Length == 0) return "";
    int shortestLength = strings.Select(s => s.Length).Min();
    // Try prefix lengths from longest to shortest and take the first one shared by all strings
    return Enumerable.Range(0, shortestLength + 1)
        .Reverse()
        .Select(i => strings[0].Substring(0, i))
        .Where(s => strings.Skip(1).All(t => t.StartsWith(s)))
        .First();
}

C++:

FilePath GetCommonFolder(const List<FilePath>& paths)
{
	// Walk up from the first file's folder until every path lies inside it
	auto folder = paths[0].GetFolder();
	while (true)
	{
		if (From(paths).All([&] (const FilePath& path)
			{
				return INVLOC.StartsWith(path.GetFullPath(), folder.GetFullPath() + WString(folder.Delimiter), Locale::IgnoreCase);
			}))
		{
			return folder;
		}
		folder = folder.GetFolder();
	}
	CHECK_FAIL(L"Cannot process files across multiple drives.");
}
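A minimal C++17 sketch of the component-wise approach, using std::filesystem::path instead of the Vlpp FilePath above, so it is not CodePack's code. Because it compares whole path components rather than characters, a shared file-name prefix such as "GacUI" can never leak into the result. Assumptions: at least one input path, and case-sensitive comparison, which suits Linux but would need case folding for Windows paths.

```cpp
#include <cassert>
#include <filesystem>
#include <vector>

namespace fs = std::filesystem;

// Longest common folder of the given file paths, compared component by
// component. Case-sensitive: Windows paths would need case folding first.
fs::path GetCommonFolder(const std::vector<fs::path>& files)
{
    assert(!files.empty());
    fs::path common = files[0].parent_path();
    for (const auto& file : files)
    {
        fs::path folder = file.parent_path();
        fs::path prefix;
        auto a = common.begin();
        auto b = folder.begin();
        // Keep only the leading components that both folders share.
        for (; a != common.end() && b != folder.end() && *a == *b; ++a, ++b)
            prefix /= *a;
        common = prefix;
    }
    return common;
}
```

For "/src/GacUIA.h" and "/src/GacUIB.h" this returns "/src", whereas a character-wise prefix would be "/src/GacUI".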

6: Glue a pile of files together

The C# and C++ versions are written exactly the same way, except that the C++ version also calls SortDependencies.
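Since the Combine code isn't shown, here is only a rough sketch of what gluing files together involves, under assumptions: the input files are already sorted, every other generated file this one depends on gets an #include at the top, each file's path is recorded in a comment, and quoted #include lines inside the pasted files are dropped because their targets are either pasted in or covered by the top-level includes. Combine here is a hypothetical standalone function.

```cpp
#include <cassert>
#include <regex>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Glues already-sorted files into the text of one output file (hypothetical helper).
// sortedFiles: (path, contents) pairs, dependencies first.
// dependencies: other generated headers this one needs, e.g. "Vlpp.h".
std::string Combine(
    const std::vector<std::pair<std::string, std::string>>& sortedFiles,
    const std::vector<std::string>& dependencies)
{
    // Assumption: quoted #include lines are dropped, since the files they name
    // are either pasted into this output or pulled in via `dependencies`.
    static const std::regex localInclude(R"re(^\s*#include\s*"[^"]+"\s*$)re");

    std::ostringstream out;
    for (const auto& dep : dependencies)
        out << "#include \"" << dep << "\"\n";

    for (const auto& [path, contents] : sortedFiles)
    {
        out << "\n// " << path << "\n";  // record where this chunk came from
        std::istringstream reader(contents);
        std::string line;
        while (std::getline(reader, line))
            if (!std::regex_match(line, localInclude))
                out << line << "\n";
    }
    return out.str();
}
```

System includes such as #include <vector> are left untouched in this sketch.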

7: Main function

This one is also too long, so I'll only post the fragment where the difference really stands out. It computes the dependencies between groups, which end up in the output as lines like #include "Vlpp.h". Let's look at the C# version first:

var categoryDependencies = categorizedCppFiles
    .Keys
    .Select(k =>
    {
        var headerFiles = categorizedCppFiles[k]
            .SelectMany(GetIncludedFiles)
            .Distinct()
            .ToArray();
        var keys = categorizedHeaderFiles
            .Where(p => p.Value.Any(h => headerFiles.Contains(h)))
            .Select(p => p.Key)
            .Except(new string[] { k })
            .ToArray();
        return Tuple.Create(k, keys);
    })
    .ToDictionary(t => t.Item1, t => t.Item2);

How refreshing! For each category, find the .h files that its .cpp files include directly or indirectly, then look up which categories those headers belong to. Now the C++ version:

	Group<WString, WString> categoryDepedencies;
	CopyFrom(
		categoryDepedencies,
		From(categorizedCppFiles.Keys())
			.SelectMany([&] (const WString& key)
			{
				SortedList<FilePath> headerFiles;
				CopyFrom(
					headerFiles,
					From(categorizedCppFiles[key])
						.SelectMany(GetIncludedFiles)
						.Distinct()
					);

				auto keys = MakePtr<SortedList<WString>>();
				CopyFrom(
					*keys.Obj(),
					From(categorizedHeaderFiles.Keys())
						.Where([&] (const WString& key)
						{
							return From(categorizedHeaderFiles[key])
								.Any([&] (const FilePath& h)
								{
									return headerFiles.Contains(h);
								});
						})
					);
				keys->Remove(key);

				return LazyList<WString>(keys).Select([=] (const WString& k)->Pair<WString, WString>{ return {key, k}; });
			})
		);

Smelly and long (runs away). The logic is actually the same; it's all thanks to C++'s verbose lambda syntax, which forces me to split single lines into several to get a layout I like.
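The same computation with standard containers, as a sketch rather than a port: the function and type names are hypothetical, the transitive include lists are assumed to be precomputed (for example by a GetIncludedFiles-style scan), and instead of the Any/Contains scan it builds a reverse map from each header to its category, which relies on the step 2 guarantee that every file belongs to only one category.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// category -> files, and file -> files it includes (hypothetical shapes)
using FileGroups = std::map<std::string, std::vector<std::string>>;
using IncludeTable = std::map<std::string, std::vector<std::string>>;

// For each category, the other categories whose headers its .cpp files include.
std::map<std::string, std::set<std::string>> GetCategoryDependencies(
    const FileGroups& cppFiles, const FileGroups& headerFiles, const IncludeTable& transitiveIncludes)
{
    // Reverse lookup: header -> the (single) category that owns it.
    std::map<std::string, std::string> headerOwner;
    for (const auto& [category, headers] : headerFiles)
        for (const auto& h : headers)
            headerOwner[h] = category;

    std::map<std::string, std::set<std::string>> result;
    for (const auto& [category, cpps] : cppFiles)
    {
        std::set<std::string>& deps = result[category];  // present even if empty
        for (const auto& cpp : cpps)
        {
            auto includes = transitiveIncludes.find(cpp);
            if (includes == transitiveIncludes.end()) continue;
            for (const auto& h : includes->second)
            {
                auto owner = headerOwner.find(h);
                if (owner != headerOwner.end() && owner->second != category)
                    deps.insert(owner->second);  // like .Except(new string[] { k })
            }
        }
    }
    return result;
}
```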

Conclusion

In general, if C++ lambda expressions could be written as concisely as C# ones, the remaining difference between the two languages would be GC versus shared_ptr. For the many tasks that aren't particularly performance-sensitive, you can write things whichever way you like, and development efficiency should be pretty close.

You'll also notice that in C++ I sometimes write List<int> directly and sometimes MakePtr<List<int>>. The main difference is lifetime: if a lambda expression is executed after the function has returned (as happens with lazily evaluated Linq queries), any reference it holds to a local List<int> blows up, because that list has already been freed. When I hold the list through a smart pointer instead, there is no problem.
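A minimal standard C++ sketch of this lifetime issue, with std::function standing in for a lazy LazyList and std::shared_ptr standing in for Vlpp's Ptr/MakePtr (both are substitutions for illustration). The unsafe variant is left as a comment, because actually running it would be undefined behavior; MakeLazySum is a hypothetical name.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <numeric>
#include <vector>

// Returns a "lazy query": nothing is summed until the returned function is
// called, which happens after MakeLazySum has already returned.
std::function<int()> MakeLazySum()
{
    // Broken variant, kept as a comment because calling it is undefined behavior:
    // the local vector is destroyed when this function returns, so the lambda
    // would later read freed memory.
    //     std::vector<int> numbers{1, 2, 3};
    //     return [&numbers] { return std::accumulate(numbers.begin(), numbers.end(), 0); };

    // Working variant: the lambda owns a shared_ptr copy, which keeps the
    // vector alive as long as the lambda exists (the MakePtr pattern).
    auto numbers = std::make_shared<std::vector<int>>(std::vector<int>{1, 2, 3});
    return [numbers] { return std::accumulate(numbers->begin(), numbers->end(), 0); };
}
```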

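For writing regular expressions, C++'s LR"FuckShitBitch(...)FuckShitBitch" raw strings are 10,000 times better than C#'s @"abcd": you pick the delimiter yourself, so quotes inside the pattern never have to be doubled the way they are in the C# regexes above.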
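To show what this buys in practice, here is a small sketch comparing an ordinary C++ string literal with a raw one for the #include pattern used earlier. In C# the same pattern needs @"^\s*\#include\s*""(?<path>[^""]+)""\s*$", with every quote doubled. The capture group is unnamed here because std::regex has no named groups, and IsLocalInclude is a hypothetical helper. Note the custom delimiter: the pattern contains the sequence )" which would otherwise end the raw string early, which is exactly the problem a freely chosen delimiter solves.

```cpp
#include <cassert>
#include <regex>
#include <string>

// 1) Ordinary literal: every backslash and every quote must be escaped.
const std::string escaped = "^\\s*#include\\s*\"([^\"]+)\"\\s*$";
// 2) Raw literal with the custom delimiter "re": the pattern is written as-is.
//    A plain R"(...)" would end early, because the pattern contains )"
const std::string raw = R"re(^\s*#include\s*"([^"]+)"\s*$)re";

// True for lines like #include "GacUI.h" (hypothetical helper).
bool IsLocalInclude(const std::string& line)
{
    static const std::regex pattern(raw);
    return std::regex_match(line, pattern);
}
```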

If you are doing something very performance-sensitive, consider profiling the project first and then rewriting a small portion of the C# code in C++. C++ can reach unbeatable performance at the cost of making the code super ugly and impossible for low-paid programmers to maintain; C# can't do that, and it was never designed to.