MayaFlux 0.1.0
Digital-First Multimedia Processing Framework
UniversalExtractor.hpp
#pragma once

/**
 * @file UniversalExtractor.hpp
 * @brief Modern, digital-first universal extractor framework for MayaFlux
 *
 * The UniversalExtractor system provides a clean, extensible foundation for data extraction
 * in the MayaFlux ecosystem. Unlike traditional feature extractors, it focuses on
 * digital-first paradigms: data-driven workflows, composability, and type safety.
 *
 * ## Core Philosophy
 * An extractor **gives the user a specified ComputeData** through two main pathways:
 * 1. **Direct extraction:** Using convert_data/extract from DataUtils
 * 2. **Copy extraction:** Copy data via extract in DataUtils
 *
 * Concrete extractors can optionally integrate with analyzers when they need to extract
 * regions/features identified by analysis.
 *
 * ## Key Features
 * - **Universal input/output:** Template-based I/O types defined at instantiation
 * - **Type-safe extraction:** C++20 concepts and compile-time guarantees
 * - **Extraction strategies:** Direct, region-based, feature-guided, recursive
 * - **Composable operations:** Integrates with ComputeMatrix execution modes
 * - **Digital-first design:** Embraces computational possibilities beyond analog metaphors
 *
 * ## Usage Examples
 * ```cpp
 * // Extract specific data type from DataVariant
 * auto extractor = std::make_shared<MyExtractor<Kakshya::DataVariant, std::vector<double>>>();
 *
 * // Extract matrix from container
 * auto matrix_extractor = std::make_shared<MyExtractor<
 *     std::shared_ptr<Kakshya::SignalSourceContainer>,
 *     Eigen::MatrixXd>>();
 * ```
 */

namespace MayaFlux::Yantra {

/**
 * @enum ExtractionType
 * @brief Categories of extraction operations for discovery and organization
 */
enum class ExtractionType : uint8_t {
    DIRECT, ///< Direct data type conversion/extraction
    REGION_BASED, ///< Extract from spatial/temporal regions
    FEATURE_GUIDED, ///< Extract based on feature analysis
    PATTERN_BASED, ///< Extract based on pattern recognition
    TRANSFORM, ///< Mathematical transformation during extraction
    RECURSIVE, ///< Recursive/nested extraction
    CUSTOM ///< User-defined extraction types
};

/**
 * @enum ExtractionScope
 * @brief Scope control for extraction operations
 */
enum class ExtractionScope : uint8_t {
    FULL_DATA, ///< Extract all available data
    TARGETED_REGIONS, ///< Extract only specific regions
    FILTERED_CONTENT, ///< Extract content meeting criteria
    SAMPLED_DATA ///< Extract sampled/downsampled data
};

/**
 * @class UniversalExtractor
 * @brief Template-flexible extractor base with instance-defined I/O types
 *
 * The UniversalExtractor provides a clean, concept-based foundation for all extraction
 * operations. I/O types are defined at instantiation time, providing maximum flexibility
 * while maintaining type safety through C++20 concepts.
 *
 * Key Features:
 * - Instance-defined I/O types via template parameters
 * - Concept-constrained data types for compile-time safety
 * - Extraction type categorization for discovery
 * - Scope control for targeted extraction
 * - Parameter management with type safety
 * - Integration with ComputeMatrix execution modes
 */
template <ComputeData InputType = std::vector<Kakshya::DataVariant>, ComputeData OutputType = InputType>
class MAYAFLUX_API UniversalExtractor : public ComputeOperation<InputType, OutputType> {
public:
    virtual ~UniversalExtractor() = default;

    /**
     * @brief Gets the extraction type category for this extractor
     * @return ExtractionType enum value
     */
    [[nodiscard]] virtual ExtractionType get_extraction_type() const = 0;

    /**
     * @brief Gets human-readable name for this extractor
     * @return String identifier for the extractor
     */
    [[nodiscard]] std::string get_name() const override
    {
        return get_extractor_name();
    }

    /**
     * @brief Type-safe parameter management with extraction-specific defaults
     */
    void set_parameter(const std::string& name, std::any value) override
    {
        if (name == "scope") {
            if (auto* scope = std::any_cast<ExtractionScope>(&value)) {
                m_scope = *scope;
                return;
            }
        }
        set_extraction_parameter(name, std::move(value));
    }

    [[nodiscard]] std::any get_parameter(const std::string& name) const override
    {
        if (name == "scope") {
            return m_scope;
        }
        return get_extraction_parameter(name);
    }

    [[nodiscard]] std::map<std::string, std::any> get_all_parameters() const override
    {
        auto params = get_all_extraction_parameters();
        params["scope"] = m_scope;
        return params;
    }

    /**
     * @brief Type-safe extraction method
     * @param data Input data
     * @return Extracted data directly (no IO wrapper)
     */
    OutputType extract_data(const InputType& data)
    {
        input_type wrapped_input(data);
        auto result = operation_function(wrapped_input);
        return result.data;
    }

    /**
     * @brief Extract with specific scope
     * @param data Input data
     * @param scope Extraction scope to use
     * @return Extracted output data
     */
    OutputType extract_with_scope(const InputType& data, ExtractionScope scope)
    {
        auto original_scope = m_scope;
        m_scope = scope;
        auto result = extract_data(data);
        m_scope = original_scope;
        return result;
    }

    /**
     * @brief Batch extraction for multiple inputs
     * @param inputs Vector of input data
     * @return Vector of extracted results
     */
    std::vector<OutputType> extract_batch(const std::vector<InputType>& inputs)
    {
        std::vector<OutputType> results;
        results.reserve(inputs.size());

        for (const auto& input : inputs) {
            results.push_back(extract_data(input));
        }

        return results;
    }

    /**
     * @brief Get available extraction methods for this extractor
     * @return Vector of method names
     */
    [[nodiscard]] virtual std::vector<std::string> get_available_methods() const = 0;

    /**
     * @brief Helper to get typed parameter with default value
     * @tparam T Parameter type
     * @param name Parameter name
     * @param default_value Default value if parameter not found
     * @return Parameter value or default
     */
    template <typename T>
    T get_parameter_or_default(const std::string& name, const T& default_value) const
    {
        auto param = get_extraction_parameter(name);
        if (param.has_value()) {
            try {
                return std::any_cast<T>(param);
            } catch (const std::bad_any_cast&) {
                return default_value;
            }
        }
        return default_value;
    }

protected:
    /**
     * @brief Core operation implementation - called by ComputeOperation interface
     * @param input Input data with metadata
     * @return Output data with metadata
     */
    output_type operation_function(const input_type& input) override
    {
        auto raw_result = extract_implementation(input);
        return apply_scope_filtering(raw_result);
    }

    /**
     * @brief Pure virtual extraction implementation - derived classes implement this
     * @param input Input data with metadata
     * @return Raw extraction output before scope processing
     */
    virtual output_type extract_implementation(const input_type& input) = 0;

    /**
     * @brief Get extractor-specific name (derived classes override this)
     * @return Extractor name string
     */
    [[nodiscard]] virtual std::string get_extractor_name() const { return "UniversalExtractor"; }

    /**
     * @brief Extraction-specific parameter handling (override for custom parameters)
     */
    virtual void set_extraction_parameter(const std::string& name, std::any value)
    {
        m_parameters[name] = std::move(value);
    }

    [[nodiscard]] virtual std::any get_extraction_parameter(const std::string& name) const
    {
        auto it = m_parameters.find(name);
        return (it != m_parameters.end()) ? it->second : std::any {};
    }

    [[nodiscard]] virtual std::map<std::string, std::any> get_all_extraction_parameters() const
    {
        return m_parameters;
    }

    /**
     * @brief Input validation (override for custom validation logic)
     */
    virtual bool validate_extraction_input(const input_type& /*input*/) const
    {
        // Default: accept any input that satisfies ComputeData concept
        return true;
    }

    /**
     * @brief Apply scope filtering to results
     * @param raw_output Raw extraction results
     * @return Filtered output based on scope setting
     */
    virtual output_type apply_scope_filtering(const output_type& raw_output)
    {
        switch (m_scope) {
        case ExtractionScope::FULL_DATA:
            return raw_output;

        case ExtractionScope::TARGETED_REGIONS:
            return filter_to_target_regions(raw_output);

        case ExtractionScope::FILTERED_CONTENT:
            return apply_content_filtering(raw_output);

        case ExtractionScope::SAMPLED_DATA:
            return apply_data_sampling(raw_output);

        default:
            return raw_output;
        }
    }

    /**
     * @brief Filter results to target regions (override for custom filtering)
     * @param raw_output Raw extraction output
     * @return Filtered output
     */
    virtual output_type filter_to_target_regions(const output_type& raw_output)
    {
        // Default: return as-is with metadata
        auto result = raw_output;
        result.template set_metadata<bool>("region_filtered", true);
        return result;
    }

    /**
     * @brief Apply content-based filtering (override for custom filtering)
     * @param raw_output Raw extraction output
     * @return Content-filtered output
     */
    virtual output_type apply_content_filtering(const output_type& raw_output)
    {
        // Default: return as-is with metadata
        auto result = raw_output;
        result.template set_metadata<bool>("content_filtered", true);
        return result;
    }

    /**
     * @brief Apply data sampling (override for custom sampling)
     * @param raw_output Raw extraction output
     * @return Sampled output
     */
    virtual output_type apply_data_sampling(const output_type& raw_output)
    {
        // Default: return as-is with metadata
        auto result = raw_output;
        result.template set_metadata<bool>("sampled", true);
        return result;
    }

private:
    ExtractionScope m_scope = ExtractionScope::FULL_DATA;
    std::map<std::string, std::any> m_parameters;
};

/// Extractor that takes DataVariant and produces any ComputeData type
template <ComputeData OutputType = Kakshya::DataVariant>

/// Extractor for signal container processing
template <ComputeData OutputType = std::shared_ptr<Kakshya::SignalSourceContainer>>

/// Extractor for region-based extraction
template <ComputeData OutputType = Kakshya::Region>

/// Extractor for region group processing
template <ComputeData OutputType = Kakshya::RegionGroup>

/// Extractor for segment processing
template <ComputeData OutputType = std::vector<Kakshya::RegionSegment>>

/// Extractor that produces Eigen matrices
template <ComputeData InputType = std::vector<Kakshya::DataVariant>>

/// Extractor that produces Eigen vectors
template <ComputeData InputType = std::vector<Kakshya::DataVariant>>

/// Extractor that produces numeric vectors
template <ComputeData InputType = std::vector<Kakshya::DataVariant>>

} // namespace MayaFlux::Yantra
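
The `get_parameter_or_default` helper in the listing falls back to the default both when a parameter is missing and when the stored `std::any` holds a different type. The sketch below reproduces that behaviour outside the framework so it can be tried in isolation. `parameter_or_default` and `demo_parameter_lookup` are hypothetical names that are not part of MayaFlux, and the sketch uses the non-throwing pointer form of `std::any_cast` in place of the try/catch in the header.

```cpp
#include <any>
#include <cassert>
#include <map>
#include <string>

// Hypothetical free-function version of the lookup; the header implements it
// as a member template on top of get_extraction_parameter().
template <typename T>
T parameter_or_default(const std::map<std::string, std::any>& params,
    const std::string& name, const T& fallback)
{
    auto it = params.find(name);
    if (it == params.end()) {
        return fallback; // parameter was never set
    }
    // The pointer overload of any_cast yields nullptr on a type mismatch
    // instead of throwing std::bad_any_cast.
    if (const T* value = std::any_cast<T>(&it->second)) {
        return *value;
    }
    return fallback; // parameter exists but holds another type
}

// Small usage demo with illustrative parameter names.
inline void demo_parameter_lookup()
{
    std::map<std::string, std::any> params;
    params["window"] = 512; // stored as int
    params["label"] = std::string("onsets");

    // Exact type match: the stored value is returned.
    assert(parameter_or_default<int>(params, "window", 0) == 512);
    // std::any does no conversions: an int cannot be read back as double.
    assert(parameter_or_default<double>(params, "window", 1.5) == 1.5);
    // Missing key: the fallback is returned.
    assert(parameter_or_default<int>(params, "missing", 7) == 7);
    assert(parameter_or_default<std::string>(params, "label", "none") == "onsets");
}
```

Because `std::any` matches on the exact stored type, a value set as `int` is invisible to a `double` lookup, and a string literal is stored as `const char*` unless it is wrapped in `std::string` first.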
The input/output container for computation pipeline data flow (with structure preservation) is defined at DataIO.hpp:24.
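
The call chain in the listing (`extract_data` wraps the input, `operation_function` calls `extract_implementation`, and the scope hook post-processes the result) can be sketched without the MayaFlux headers. Everything below is a stand-in written for illustration: `MiniIO`, `MiniScope`, `MiniExtractor`, `Decimator`, and `demo_decimator` are hypothetical names, only two scopes are modelled, and the metadata store is reduced to a map of flags. The default hooks in the header only tag metadata, so the sketch also shows a subclass overriding the sampling hook to actually change the payload.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the framework's I/O wrapper: payload plus flags.
template <typename T>
struct MiniIO {
    T data {};
    std::map<std::string, bool> metadata;
};

// Reduced scope enum; only two of the four scopes are modelled here.
enum class MiniScope { FULL_DATA, SAMPLED_DATA };

// Hypothetical base class mirroring the flow of the listing:
// extract_data -> operation_function -> extract_implementation -> scope hook.
template <typename In, typename Out>
class MiniExtractor {
public:
    virtual ~MiniExtractor() = default;

    void set_scope(MiniScope scope) { m_scope = scope; }

    // Wrap the raw input, run the pipeline, and return only the payload.
    Out extract_data(const In& data)
    {
        MiniIO<In> wrapped { data, {} };
        return operation_function(wrapped).data;
    }

protected:
    // Derived classes supply the actual extraction.
    virtual MiniIO<Out> extract_implementation(const MiniIO<In>& input) = 0;

    // Default hook: tag the result and leave the payload untouched.
    virtual MiniIO<Out> apply_data_sampling(const MiniIO<Out>& raw_output)
    {
        auto result = raw_output;
        result.metadata["sampled"] = true;
        return result;
    }

private:
    MiniIO<Out> operation_function(const MiniIO<In>& input)
    {
        auto raw = extract_implementation(input);
        return (m_scope == MiniScope::SAMPLED_DATA) ? apply_data_sampling(raw) : raw;
    }

    MiniScope m_scope = MiniScope::FULL_DATA;
};

// Example subclass: copies the samples and keeps every second one when
// the sampled scope is active.
class Decimator : public MiniExtractor<std::vector<double>, std::vector<double>> {
protected:
    MiniIO<std::vector<double>> extract_implementation(
        const MiniIO<std::vector<double>>& input) override
    {
        return { input.data, {} };
    }

    MiniIO<std::vector<double>> apply_data_sampling(
        const MiniIO<std::vector<double>>& raw_output) override
    {
        MiniIO<std::vector<double>> result;
        for (std::size_t i = 0; i < raw_output.data.size(); i += 2) {
            result.data.push_back(raw_output.data[i]);
        }
        result.metadata["sampled"] = true;
        return result;
    }
};

// Usage: the same object yields different payloads depending on the scope.
inline void demo_decimator()
{
    Decimator extractor;
    const std::vector<double> samples { 1.0, 2.0, 3.0, 4.0, 5.0 };
    assert(extractor.extract_data(samples).size() == 5);
    extractor.set_scope(MiniScope::SAMPLED_DATA);
    assert(extractor.extract_data(samples).size() == 3);
}
```

Keeping the scope dispatch in the base class means a concrete extractor only has to implement the extraction itself and override the hooks whose default tag-only behaviour is not enough.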