Skip to content

refactor: introduce Source base class #499

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

NicolasIRAGNE
Copy link
Contributor

No description provided.

…phism

This refactor introduces a clean inheritance architecture using proper Python polymorphism:

Architecture:
- Add Source base class in dedicated source.py file (common metadata/extra fields)
- Refactor FileSystemNode to inherit from Source with full backward compatibility
- Create specialized classes: FileSystemFile, FileSystemDirectory, FileSystemSymlink, GitRepository
- render_tree() method belongs to FileSystemNode level (tree-specific, not all sources need it)

Pure Python Polymorphism:
- Each subclass implements its own get_sort_priority() and get_content() methods
- NO type property or enum needed - use isinstance() directly
- FileSystemFile.get_sort_priority() returns 0 (files first)
- FileSystemDirectory.get_content() raises ValueError (directories can't have content)
- FileSystemSymlink.get_content() returns target path (what symlink points to)
- Clean, extensible design following Python best practices

Removed Legacy Type System:
- Completely removed FileSystemNodeType enum
- No more type property - use isinstance() everywhere
- Constructors now use specific classes: FileSystemFile(), FileSystemDirectory(), etc.
- Pure polymorphism without any type checking properties

Code Changes:
- src/gitingest/schemas/source.py: New base Source class
- src/gitingest/schemas/filesystem.py: Refactored with polymorphic methods, Path import in TYPE_CHECKING
- src/gitingest/ingestion.py: Use specific constructors, populate symlink targets
- src/gitingest/output_formatter.py: Use isinstance() instead of enum comparisons
- Remove all FileSystemNodeType imports and usage
- All pre-commit hooks pass (ruff-check, ruff-format, etc.)

Benefits:
- True Python polymorphism where each class knows its own behavior
- No explicit type checking needed - Python dispatches automatically
- More extensible - adding new source types just requires implementing methods
- Cleaner code without enum/string type comparisons
- Full backward compatibility maintained
Replace all isinstance() type checking with proper polymorphic methods:

Pure Polymorphism Methods:
- get_summary_info(): Each class returns its own summary format
- is_single_file(): Boolean check without isinstance()
- gather_contents(): Recursive content gathering via method dispatch
- get_display_name(): Tree display formatting (/, -> target, etc.)
- has_children(): Check for child nodes without type checking

Benefits:
- No isinstance() 'clochard' style code anywhere
- True duck typing - just call methods and let Python dispatch
- Cleaner, more maintainable code
- Each class encapsulates its own behavior
- Easy to extend with new node types

Code Changes:
- FileSystemNode: Base implementations for all methods
- FileSystemFile: is_single_file()=True, summary with line count
- FileSystemDirectory: get_display_name() adds '/', has_children() checks list
- FileSystemSymlink: get_display_name() shows '-> target'
- output_formatter.py: Use polymorphic methods instead of isinstance()

This is proper OOP - objects know their own behavior!
@NicolasIRAGNE NicolasIRAGNE requested a review from Copilot August 7, 2025 18:04
@NicolasIRAGNE NicolasIRAGNE changed the title Refactor/source inheritance refactor: introduce Source base class Aug 7, 2025
Copilot

This comment was marked as outdated.

@cursor cursor bot force-pushed the refactor/source-inheritance branch from 217afe2 to 87cbb5a Compare August 7, 2025 21:50
@NicolasIRAGNE NicolasIRAGNE requested a review from Copilot August 7, 2025 21:54
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a Source base class and refactors the filesystem schema to use proper inheritance and polymorphism. The refactoring eliminates the enum-based type system in favor of specialized classes for different filesystem node types.

Key changes:

  • Introduces a new Source base class with common fields (metadata and extra)
  • Replaces FileSystemNodeType enum with specialized classes (FileSystemFile, FileSystemDirectory, FileSystemSymlink, GitRepository)
  • Moves type-specific behavior into polymorphic methods on the respective classes

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
src/gitingest/schemas/source.py New base class with common metadata fields
src/gitingest/schemas/filesystem.py Major refactor replacing enum-based typing with inheritance hierarchy
src/gitingest/schemas/__init__.py Updated exports to include new classes
src/gitingest/output_formatter.py Updated to use polymorphic methods instead of type checking
src/gitingest/ingestion.py Updated to instantiate specific node types instead of generic nodes
.vscode/launch.json Added reload flag to debug configuration

Comment on lines 71 to 82
"""Return sort priority. Override in subclasses."""
return 1 # Default: not a file

Copy link
Preview

Copilot AI Aug 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment '# Default: not a file' is unclear. Consider documenting the actual priority values and what they represent (e.g., '# Default priority for non-file nodes (directories, symlinks)').

Suggested change
"""Return sort priority. Override in subclasses."""
return 1 # Default: not a file
"""Return sort priority. Override in subclasses.
Returns
-------
int
Sort priority value. By convention:
0 - file nodes (highest priority, e.g., README)
1 - default for non-file nodes (directories, symlinks, etc.)
Subclasses should override as appropriate.
"""
return 1 # Default priority for non-file nodes (directories, symlinks, etc.)

Copilot uses AI. Check for mistakes.

Comment on lines +98 to 104
from gitingest.utils.file_utils import _decodes, _get_preferred_encodings, _read_chunk
from gitingest.utils.notebook import process_notebook

Copy link
Preview

Copilot AI Aug 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imports inside methods can impact performance and readability. Consider moving these imports to the top of the file or module level.

Suggested change
from gitingest.utils.file_utils import _decodes, _get_preferred_encodings, _read_chunk
from gitingest.utils.notebook import process_notebook
from gitingest.utils.notebook import process_notebook

Copilot uses AI. Check for mistakes.

Comment on lines +99 to 104
from gitingest.utils.notebook import process_notebook

Copy link
Preview

Copilot AI Aug 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imports inside methods can impact performance and readability. Consider moving these imports to the top of the file or module level.

Suggested change
from gitingest.utils.notebook import process_notebook

Copilot uses AI. Check for mistakes.

Comment on lines 170 to 167
return f"File: {self.name}\nLines: {len(self.content.splitlines()):,}\n"

def is_single_file(self) -> bool:
"""Files are single files."""
return True

def render_tree(self, prefix: str = "", *, is_last: bool = True) -> list[str]:
"""Render the tree representation of this file."""
current_prefix = "└── " if is_last else "├── "
return [f"{prefix}{current_prefix}{self.name}"]


Copy link
Preview

Copilot AI Aug 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling self.content in summary info will trigger file reading, which could be expensive. Consider caching content or only reading when actually needed for display.

Suggested change
return f"File: {self.name}\nLines: {len(self.content.splitlines()):,}\n"
def is_single_file(self) -> bool:
"""Files are single files."""
return True
def render_tree(self, prefix: str = "", *, is_last: bool = True) -> list[str]:
"""Render the tree representation of this file."""
current_prefix = "└── " if is_last else "├── "
return [f"{prefix}{current_prefix}{self.name}"]
if self._cached_content is not None:
line_count = len(self._cached_content.splitlines())
return f"File: {self.name}\nLines: {line_count:,}\n"
else:
return f"File: {self.name}\nLines: (not loaded)\n"
def is_single_file(self) -> bool:
"""Files are single files."""
return True
@property
def content(self) -> str:
"""Return file content, caching after first read."""
if self._cached_content is None:
self._cached_content = self.get_content()
return self._cached_content

Copilot uses AI. Check for mistakes.

current_prefix = "└── " if is_last else "├── "
display_name = self.name + "/"
lines.append(f"{prefix}{current_prefix}{display_name}")
if hasattr(self, "children") and self.children:
Copy link
Preview

Copilot AI Aug 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using hasattr check for children is unnecessary since children is defined as a field with default factory in the base class. This check can be simplified to just if self.children:.

Suggested change
if hasattr(self, "children") and self.children:
if self.children:

Copilot uses AI. Check for mistakes.

# Mark as git repo in the tree
display_name = f"{self.name}/ (git repository)"
lines.append(f"{prefix}{current_prefix}{display_name}")
if hasattr(self, "children") and self.children:
Copy link
Preview

Copilot AI Aug 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using hasattr check for children is unnecessary since children is defined as a field with default factory in the base class. This check can be simplified to just if self.children:.

Suggested change
if hasattr(self, "children") and self.children:
if self.children:

Copilot uses AI. Check for mistakes.

@cursor cursor bot force-pushed the refactor/source-inheritance branch 2 times, most recently from 46c7f5d to 26ccecd Compare August 7, 2025 22:01
…thod

Even simpler approach:
- Replace node.has_children() with direct if node.children:
- Remove unnecessary has_children() methods from all classes
- Pythonic and direct - empty lists are falsy, non-empty are truthy
- Less code, same functionality

This is the most straightforward way to check for children in Python.
@cursor cursor bot force-pushed the refactor/source-inheritance branch from 26ccecd to 280d3f0 Compare August 7, 2025 22:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants