[c2cpg] Recognize more source file extensions #5173

Open: wants to merge 4 commits into base: master


@max-leuthaeuser (Contributor) commented Dec 10, 2024

To be consistent with the latest CMake and CDT (eclipse-cdt/cdt#422).

With this PR we parse .h header files with the CDT C++ parser. While this is theoretically not correct (one should choose either the C or the C++ parser depending on the source file that includes the header, but we can't know this in all cases), we achieve a higher successful parse ratio for these files. This is also the default in Eclipse CDT (i.e., the IDE).
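A minimal sketch of extension-based parser selection as described above, assuming a simple lookup by file suffix. The object name `ParserChoice` and the extension sets are hypothetical and illustrative only; they are not c2cpg's real code, nor the exact lists used by this PR, CMake or CDT. The one point taken from the PR is that the ambiguous `.h` goes to the C++ parser.

```scala
// Hypothetical sketch of extension-based parser selection; not c2cpg's real code.
object ParserChoice {
  sealed trait Parser
  case object CParser extends Parser
  case object CppParser extends Parser

  // Illustrative lists only; the PR's actual extension set may differ.
  // Matching is case-sensitive because ".C" conventionally denotes C++.
  private val cExts = Set(".c")
  private val cppExts = Set(".C", ".cc", ".cpp", ".cxx", ".c++")
  // Header extensions, including the ambiguous ".h", go to the C++ parser as in this PR.
  private val headerExts = Set(".h", ".hh", ".hpp", ".hxx", ".h++")

  def parserFor(fileName: String): Option[Parser] = {
    val dot = fileName.lastIndexOf('.')
    val ext = if (dot < 0) "" else fileName.substring(dot)
    if (cExts.contains(ext)) Some(CParser)
    else if (cppExts.contains(ext) || headerExts.contains(ext)) Some(CppParser)
    else None // not a recognized C/C++ source or header
  }

  def main(args: Array[String]): Unit = {
    println(parserFor("main.c"))    // prints Some(CParser)
    println(parserFor("main.h"))    // prints Some(CppParser)
    println(parserFor("notes.txt")) // prints None
  }
}
```

Keeping headers in their own set makes the trade-off explicit: moving `.h` back to the C side would be a one-line change if the ambiguity is ever resolved differently.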
Method de-duplication now works by fullName (C++) or by fullName + signature (C). With this change we are always able to de-duplicate:
1.) C++: the fullName already contains the signature. Hence, this is safe and the same as before.
2.) C: there is no method overloading in C anyway. Methods from .h files are now parsed as C++ methods (see 1.), so their fullName contains the signature. Hence, we need to compare them with the fullName + signature of their C counterparts.
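A rough sketch of the de-duplication key described in the list above. `ParsedMethod`, `MethodDedup` and the `parsedAsCpp` flag are hypothetical names, not c2cpg's API, and joining C fullName and signature with `:` is an assumption chosen to match the `bar:void()` fullName in the test diff below.

```scala
// Hypothetical model of a parsed method; these names are not c2cpg's real API.
final case class ParsedMethod(fullName: String, signature: String, parsedAsCpp: Boolean)

object MethodDedup {
  // C++: the fullName already embeds the signature (e.g. "bar:void()"), so it is the key.
  // C: the fullName is the bare name (e.g. "bar"); appending ":" + signature (assumed
  // separator) makes it comparable with the C++ form.
  def dedupKey(m: ParsedMethod): String =
    if (m.parsedAsCpp) m.fullName else s"${m.fullName}:${m.signature}"

  // Keep only the first method seen for each key.
  def dedup(methods: Seq[ParsedMethod]): Seq[ParsedMethod] =
    methods.distinctBy(dedupKey(_))

  def main(args: Array[String]): Unit = {
    val fromHeader = ParsedMethod("bar:void()", "void()", parsedAsCpp = true) // main.h via C++ parser
    val fromSource = ParsedMethod("bar", "void()", parsedAsCpp = false)       // main.c via C parser
    println(dedup(Seq(fromHeader, fromSource)).map(_.fullName)) // prints List(bar:void())
  }
}
```

Because C has no overloading, the appended signature never splits two C methods that should be one; it only lines the C key up with the C++ key for the same header declaration.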

@@ -40,9 +40,9 @@ class HeaderAstCreationPassTests extends C2CpgSuite {
   case Seq(bar, foo, m, printf) =>
     // note that we don't see bar twice even so it is contained
     // in main.h and included in main.c and we do scan both
-    bar.fullName shouldBe "bar"
+    bar.fullName shouldBe "bar:void()"
Contributor


This looks like it would be a regression for C code in exchange for improving C++ code. I guess there is just an inherent ambiguity with .h files as to whether they contain C or C++ code. But I don't think we can get away with this right now.

Maybe we could process header files in a second pass after the regular source files, so we know whether they were included from C or C++ files? Or maybe CDT has some guess-the-file-type magic, since the IDE runs into the same problem?

@max-leuthaeuser (Contributor, Author), Dec 10, 2024

> Or maybe CDT has some guess-the-file-type magic since the IDE runs into the same problem?

Sadly no, they also simply use the C++ parser in all cases.

The two-pass approach also won't work in all cases, since one could, e.g., include a C header file in both a C and a C++ source file.
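A rough sketch of the suggested second pass and the case that defeats it. Everything here is hypothetical (`HeaderClassifier`, the `includers` input, the extension sets); it assumes the first pass over regular sources has already recorded which files include each header. The `Ambiguous` result is exactly the situation above: a C header included from both a C and a C++ source, where no single parser choice is right.

```scala
// Hypothetical sketch of the proposed second pass; not part of c2cpg.
object HeaderClassifier {
  sealed trait HeaderLang
  case object CHeader extends HeaderLang
  case object CppHeader extends HeaderLang
  case object Ambiguous extends HeaderLang // included from both C and C++ sources
  case object Unknown extends HeaderLang   // not included by any scanned source

  // Illustrative extension sets only.
  private val cExts = Set("c")
  private val cppExts = Set("cc", "cpp", "cxx")

  private def ext(path: String): String = {
    val dot = path.lastIndexOf('.')
    if (dot < 0) "" else path.substring(dot + 1)
  }

  // `includers`: the source files that include a given header, assumed to be
  // collected during a first pass over the regular source files.
  def classify(includers: Set[String]): HeaderLang = {
    val fromC = includers.exists(p => cExts.contains(ext(p)))
    val fromCpp = includers.exists(p => cppExts.contains(ext(p)))
    (fromC, fromCpp) match {
      case (true, false)  => CHeader
      case (false, true)  => CppHeader
      case (true, true)   => Ambiguous
      case (false, false) => Unknown
    }
  }

  def main(args: Array[String]): Unit = {
    println(classify(Set("src/main.c")))                // prints CHeader
    println(classify(Set("src/main.c", "src/app.cpp"))) // prints Ambiguous
  }
}
```

Even with this extra pass, `Ambiguous` and `Unknown` headers still need a default parser, which is the gap the C++-by-default choice in this PR fills.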

@max-leuthaeuser (Contributor, Author), Dec 10, 2024

Or to phrase it differently:
The behaviour without this PR is definitely wrong, as it makes parsing C++ code in .h files impossible.
With this PR we are able to parse such code. "Wrong" fullNames remain only for C method declarations that are never implemented in any source file (for implemented ones, the source file gives us the correct fullName and we de-duplicate correctly). That should not be an issue, or am I missing something there?


3 participants