-
Notifications
You must be signed in to change notification settings - Fork 866
refactor: introduce Source base class #499
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…phism This refactor introduces a clean inheritance architecture using proper Python polymorphism: Architecture: - Add Source base class in dedicated source.py file (common metadata/extra fields) - Refactor FileSystemNode to inherit from Source with full backward compatibility - Create specialized classes: FileSystemFile, FileSystemDirectory, FileSystemSymlink, GitRepository - render_tree() method belongs to FileSystemNode level (tree-specific, not all sources need it) Pure Python Polymorphism: - Each subclass implements its own get_sort_priority() and get_content() methods - NO type property or enum needed - use isinstance() directly - FileSystemFile.get_sort_priority() returns 0 (files first) - FileSystemDirectory.get_content() raises ValueError (directories can't have content) - FileSystemSymlink.get_content() returns target path (what symlink points to) - Clean, extensible design following Python best practices Removed Legacy Type System: - Completely removed FileSystemNodeType enum - No more type property - use isinstance() everywhere - Constructors now use specific classes: FileSystemFile(), FileSystemDirectory(), etc. - Pure polymorphism without any type checking properties Code Changes: - src/gitingest/schemas/source.py: New base Source class - src/gitingest/schemas/filesystem.py: Refactored with polymorphic methods, Path import in TYPE_CHECKING - src/gitingest/ingestion.py: Use specific constructors, populate symlink targets - src/gitingest/output_formatter.py: Use isinstance() instead of enum comparisons - Remove all FileSystemNodeType imports and usage - All pre-commit hooks pass (ruff-check, ruff-format, etc.) Benefits: - True Python polymorphism where each class knows its own behavior - No explicit type checking needed - Python dispatches automatically - More extensible - adding new source types just requires implementing methods - Cleaner code without enum/string type comparisons - Full backward compatibility maintained
Replace all isinstance() type checking with proper polymorphic methods: Pure Polymorphism Methods: - get_summary_info(): Each class returns its own summary format - is_single_file(): Boolean check without isinstance() - gather_contents(): Recursive content gathering via method dispatch - get_display_name(): Tree display formatting (/, -> target, etc.) - has_children(): Check for child nodes without type checking Benefits: - No isinstance() 'clochard' style code anywhere - True duck typing - just call methods and let Python dispatch - Cleaner, more maintainable code - Each class encapsulates its own behavior - Easy to extend with new node types Code Changes: - FileSystemNode: Base implementations for all methods - FileSystemFile: is_single_file()=True, summary with line count - FileSystemDirectory: get_display_name() adds '/', has_children() checks list - FileSystemSymlink: get_display_name() shows '-> target' - output_formatter.py: Use polymorphic methods instead of isinstance() This is proper OOP - objects know their own behavior!
217afe2
to
87cbb5a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces a Source
base class and refactors the filesystem schema to use proper inheritance and polymorphism. The refactoring eliminates the enum-based type system in favor of specialized classes for different filesystem node types.
Key changes:
- Introduces a new
Source
base class with common fields (metadata
andextra
) - Replaces
FileSystemNodeType
enum with specialized classes (FileSystemFile
,FileSystemDirectory
,FileSystemSymlink
,GitRepository
) - Moves type-specific behavior into polymorphic methods on the respective classes
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
Show a summary per file
File | Description |
---|---|
src/gitingest/schemas/source.py |
New base class with common metadata fields |
src/gitingest/schemas/filesystem.py |
Major refactor replacing enum-based typing with inheritance hierarchy |
src/gitingest/schemas/__init__.py |
Updated exports to include new classes |
src/gitingest/output_formatter.py |
Updated to use polymorphic methods instead of type checking |
src/gitingest/ingestion.py |
Updated to instantiate specific node types instead of generic nodes |
.vscode/launch.json |
Added reload flag to debug configuration |
src/gitingest/schemas/filesystem.py
Outdated
"""Return sort priority. Override in subclasses.""" | ||
return 1 # Default: not a file | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment '# Default: not a file' is unclear. Consider documenting the actual priority values and what they represent (e.g., '# Default priority for non-file nodes (directories, symlinks)').
"""Return sort priority. Override in subclasses.""" | |
return 1 # Default: not a file | |
"""Return sort priority. Override in subclasses. | |
Returns | |
------- | |
int | |
Sort priority value. By convention: | |
0 - file nodes (highest priority, e.g., README) | |
1 - default for non-file nodes (directories, symlinks, etc.) | |
Subclasses should override as appropriate. | |
""" | |
return 1 # Default priority for non-file nodes (directories, symlinks, etc.) |
Copilot uses AI. Check for mistakes.
from gitingest.utils.file_utils import _decodes, _get_preferred_encodings, _read_chunk | ||
from gitingest.utils.notebook import process_notebook | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Imports inside methods can impact performance and readability. Consider moving these imports to the top of the file or module level.
from gitingest.utils.file_utils import _decodes, _get_preferred_encodings, _read_chunk | |
from gitingest.utils.notebook import process_notebook | |
from gitingest.utils.notebook import process_notebook |
Copilot uses AI. Check for mistakes.
from gitingest.utils.notebook import process_notebook | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Imports inside methods can impact performance and readability. Consider moving these imports to the top of the file or module level.
from gitingest.utils.notebook import process_notebook |
Copilot uses AI. Check for mistakes.
src/gitingest/schemas/filesystem.py
Outdated
return f"File: {self.name}\nLines: {len(self.content.splitlines()):,}\n" | ||
|
||
def is_single_file(self) -> bool: | ||
"""Files are single files.""" | ||
return True | ||
|
||
def render_tree(self, prefix: str = "", *, is_last: bool = True) -> list[str]: | ||
"""Render the tree representation of this file.""" | ||
current_prefix = "└── " if is_last else "├── " | ||
return [f"{prefix}{current_prefix}{self.name}"] | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling self.content
in summary info will trigger file reading, which could be expensive. Consider caching content or only reading when actually needed for display.
return f"File: {self.name}\nLines: {len(self.content.splitlines()):,}\n" | |
def is_single_file(self) -> bool: | |
"""Files are single files.""" | |
return True | |
def render_tree(self, prefix: str = "", *, is_last: bool = True) -> list[str]: | |
"""Render the tree representation of this file.""" | |
current_prefix = "└── " if is_last else "├── " | |
return [f"{prefix}{current_prefix}{self.name}"] | |
if self._cached_content is not None: | |
line_count = len(self._cached_content.splitlines()) | |
return f"File: {self.name}\nLines: {line_count:,}\n" | |
else: | |
return f"File: {self.name}\nLines: (not loaded)\n" | |
def is_single_file(self) -> bool: | |
"""Files are single files.""" | |
return True | |
@property | |
def content(self) -> str: | |
"""Return file content, caching after first read.""" | |
if self._cached_content is None: | |
self._cached_content = self.get_content() | |
return self._cached_content |
Copilot uses AI. Check for mistakes.
src/gitingest/schemas/filesystem.py
Outdated
current_prefix = "└── " if is_last else "├── " | ||
display_name = self.name + "/" | ||
lines.append(f"{prefix}{current_prefix}{display_name}") | ||
if hasattr(self, "children") and self.children: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using hasattr
check for children
is unnecessary since children
is defined as a field with default factory in the base class. This check can be simplified to just if self.children:
.
if hasattr(self, "children") and self.children: | |
if self.children: |
Copilot uses AI. Check for mistakes.
src/gitingest/schemas/filesystem.py
Outdated
# Mark as git repo in the tree | ||
display_name = f"{self.name}/ (git repository)" | ||
lines.append(f"{prefix}{current_prefix}{display_name}") | ||
if hasattr(self, "children") and self.children: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using hasattr
check for children
is unnecessary since children
is defined as a field with default factory in the base class. This check can be simplified to just if self.children:
.
if hasattr(self, "children") and self.children: | |
if self.children: |
Copilot uses AI. Check for mistakes.
46c7f5d
to
26ccecd
Compare
…thod Even simpler approach: - Replace node.has_children() with direct if node.children: - Remove unnecessary has_children() methods from all classes - Pythonic and direct - empty lists are falsy, non-empty are truthy - Less code, same functionality This is the most straightforward way to check for children in Python.
26ccecd
to
280d3f0
Compare
No description provided.