MayaFlux 0.4.0
Digital-First Multimedia Processing Framework
UniversalExtractor.hpp
#pragma once

/**
 * @file UniversalExtractor.hpp
 * @brief Modern, digital-first universal extractor framework for Maya Flux
 *
 * The UniversalExtractor system provides a clean, extensible foundation for data extraction
 * in the Maya Flux ecosystem. Unlike traditional feature extractors, it focuses on
 * digital-first paradigms: data-driven workflows, composability, and type safety.
 *
 * ## Core Philosophy
 * An extractor **delivers a user-specified ComputeData type** through two main pathways:
 * 1. **Direct extraction:** obtaining the requested type via `convert_data`/`extract` from DataUtils
 * 2. **Copy extraction:** copying the data out via `extract` from DataUtils
 *
 * Concrete extractors can optionally integrate with analyzers when they need to extract
 * regions/features identified by analysis.
 *
 * ## Key Features
 * - **Universal input/output:** Template-based I/O types defined at instantiation
 * - **Type-safe extraction:** C++20 concepts and compile-time guarantees
 * - **Extraction strategies:** Direct, region-based, feature-guided, recursive
 * - **Composable operations:** Integrates with ComputeMatrix execution modes
 * - **Digital-first design:** Embraces computational possibilities beyond analog metaphors
 *
 * ## Usage Examples
 * ```cpp
 * // MyExtractor stands for a user-defined extractor template derived from UniversalExtractor
 *
 * // Extract specific data type from DataVariant
 * auto extractor = std::make_shared<MyExtractor<Kakshya::DataVariant, std::vector<double>>>();
 *
 * // Extract matrix from container
 * auto matrix_extractor = std::make_shared<MyExtractor<
 *     std::shared_ptr<Kakshya::SignalSourceContainer>,
 *     Eigen::MatrixXd>>();
 * ```
 */

namespace MayaFlux::Yantra {

/**
 * @enum ExtractionType
 * @brief Categories of extraction operations for discovery and organization
 */
enum class ExtractionType : uint8_t {
    DIRECT, ///< Direct data type conversion/extraction
    REGION_BASED, ///< Extract from spatial/temporal regions
    FEATURE_GUIDED, ///< Extract based on feature analysis
    PATTERN_BASED, ///< Extract based on pattern recognition
    TRANSFORM, ///< Mathematical transformation during extraction
    RECURSIVE, ///< Recursive/nested extraction
    CUSTOM ///< User-defined extraction types
};

/**
 * @enum ExtractionScope
 * @brief Scope control for extraction operations
 */
enum class ExtractionScope : uint8_t {
    FULL_DATA, ///< Extract all available data
    TARGETED_REGIONS, ///< Extract only specific regions
    FILTERED_CONTENT, ///< Extract content meeting criteria
    SAMPLED_DATA ///< Extract sampled/downsampled data
};

/**
 * @class UniversalExtractor
 * @brief Template-flexible extractor base with instance-defined I/O types
 *
 * The UniversalExtractor provides a clean, concept-based foundation for all extraction
 * operations. I/O types are defined at instantiation time, providing maximum flexibility
 * while maintaining type safety through C++20 concepts.
 *
 * Key Features:
 * - Instance-defined I/O types via template parameters
 * - Concept-constrained data types for compile-time safety
 * - Extraction type categorization for discovery
 * - Scope control for targeted extraction
 * - Parameter management with type safety
 * - Integration with ComputeMatrix execution modes
 */
template <ComputeData InputType = std::vector<Kakshya::DataVariant>, ComputeData OutputType = InputType>
class MAYAFLUX_API UniversalExtractor : public ComputeOperation<InputType, OutputType> {
public:

    virtual ~UniversalExtractor() = default;

    /**
     * @brief Gets the extraction type category for this extractor
     * @return ExtractionType enum value
     */
    [[nodiscard]] virtual ExtractionType get_extraction_type() const = 0;

    /**
     * @brief Gets human-readable name for this extractor
     * @return String identifier for the extractor
     */
    [[nodiscard]] std::string get_name() const override
    {
        return get_extractor_name();
    }

    [[nodiscard]] OperationType get_operation_type() const override
    {
        return OperationType::EXTRACTOR;
    }

    /**
     * @brief Type-safe parameter management with a reserved "scope" key
     *
     * A "scope" value holding an ExtractionScope updates this extractor's scope; every
     * other parameter, including a "scope" value of another type, is forwarded to
     * set_extraction_parameter().
     */
    void set_parameter(const std::string& name, std::any value) override
    {
        if (name == "scope") {
            if (auto result = safe_any_cast<ExtractionScope>(value)) {
                m_scope = *result.value;
                return;
            }
        }
        set_extraction_parameter(name, std::move(value));
    }

    [[nodiscard]] std::any get_parameter(const std::string& name) const override
    {
        if (name == "scope") {
            return m_scope;
        }
        return get_extraction_parameter(name);
    }

    [[nodiscard]] std::map<std::string, std::any> get_all_parameters() const override
    {
        auto params = get_all_extraction_parameters();
        params["scope"] = m_scope;
        return params;
    }

    /**
     * @brief Type-safe extraction method
     * @param data Input data
     * @return Extracted data directly (no Datum wrapper)
     */
    OutputType extract_data(const input_type& data)
    {
        auto result = operation_function(data);
        return result.data;
    }

    OutputType extract_data(const InputType& data)
    {
        return this->extract_data(input_type { data });
    }

    /**
     * @brief Extract with specific scope
     * @param data Input data
     * @param scope Extraction scope to use for this call only
     * @return Extracted output data
     *
     * The previously configured scope is restored afterwards, even if extraction throws.
     */
    OutputType extract_with_scope(const input_type& data, ExtractionScope scope)
    {
        const auto original_scope = m_scope;
        m_scope = scope;
        try {
            auto result = extract_data(data);
            m_scope = original_scope;
            return result;
        } catch (...) {
            // Restore the caller's scope before propagating the exception
            m_scope = original_scope;
            throw;
        }
    }

    OutputType extract_with_scope(const InputType& data, ExtractionScope scope)
    {
        return this->extract_with_scope(input_type { data }, scope);
    }

    /**
     * @brief Batch extraction for multiple inputs
     * @param inputs Vector of input data
     * @return Vector of extracted results
     */
    std::vector<OutputType> extract_batch(const std::vector<input_type>& inputs)
    {
        std::vector<OutputType> results;
        results.reserve(inputs.size());

        for (const auto& input : inputs) {
            results.push_back(extract_data(input));
        }

        return results;
    }

    std::vector<OutputType> extract_batch(const std::vector<InputType>& inputs)
    {
        return this->extract_batch(as_io_batch(inputs));
    }

    /**
     * @brief Get available extraction methods for this extractor
     * @return Vector of method names
     */
    [[nodiscard]] virtual std::vector<std::string> get_available_methods() const = 0;

    /**
     * @brief Helper to get typed parameter with default value
     * @tparam T Parameter type
     * @param name Parameter name
     * @param default_value Default value if parameter not found
     * @return Parameter value or default
     */
    template <typename T>
    T get_parameter_or_default(const std::string& name, const T& default_value) const
    {
        return safe_any_cast_or_default<T>(get_extraction_parameter(name), default_value);
    }

protected:
    /**
     * @brief Core operation implementation - called by ComputeOperation interface
     * @param input Input data with metadata
     * @return Output data with metadata
     */
    output_type operation_function(const input_type& input) override
    {
        auto raw_result = extract_implementation(input);
        return apply_scope_filtering(raw_result);
    }

    /**
     * @brief Pure virtual extraction implementation - derived classes implement this
     * @param input Input data with metadata
     * @return Raw extraction output before scope processing
     */
    virtual output_type extract_implementation(const input_type& input) = 0;

    /**
     * @brief Get extractor-specific name (derived classes override this)
     * @return Extractor name string
     */
    [[nodiscard]] virtual std::string get_extractor_name() const { return "UniversalExtractor"; }

    /**
     * @brief Extraction-specific parameter handling (override for custom parameters)
     */
    virtual void set_extraction_parameter(const std::string& name, std::any value)
    {
        m_parameters[name] = std::move(value);
    }

    [[nodiscard]] virtual std::any get_extraction_parameter(const std::string& name) const
    {
        auto it = m_parameters.find(name);
        return (it != m_parameters.end()) ? it->second : std::any {};
    }

    [[nodiscard]] virtual std::map<std::string, std::any> get_all_extraction_parameters() const
    {
        return m_parameters;
    }

    /**
     * @brief Input validation (override for custom validation logic)
     */
    virtual bool validate_extraction_input(const input_type& /*input*/) const
    {
        // Default: accept any input that satisfies ComputeData concept
        return true;
    }

    /**
     * @brief Apply scope filtering to results
     * @param raw_output Raw extraction results
     * @return Filtered output based on scope setting
     */
    virtual output_type apply_scope_filtering(const output_type& raw_output)
    {
        switch (m_scope) {
        case ExtractionScope::FULL_DATA:
            return raw_output;

        case ExtractionScope::TARGETED_REGIONS:
            return filter_to_target_regions(raw_output);

        case ExtractionScope::FILTERED_CONTENT:
            return apply_content_filtering(raw_output);

        case ExtractionScope::SAMPLED_DATA:
            return apply_data_sampling(raw_output);

        default:
            return raw_output;
        }
    }

    /**
     * @brief Filter results to target regions (override for custom filtering)
     * @param raw_output Raw extraction output
     * @return Filtered output
     */
    virtual output_type filter_to_target_regions(const output_type& raw_output)
    {
        // Default: return as-is with metadata
        auto result = raw_output;
        result.template set_metadata<bool>("region_filtered", true);
        return result;
    }

    /**
     * @brief Apply content-based filtering (override for custom filtering)
     * @param raw_output Raw extraction output
     * @return Content-filtered output
     */
    virtual output_type apply_content_filtering(const output_type& raw_output)
    {
        // Default: return as-is with metadata
        auto result = raw_output;
        result.template set_metadata<bool>("content_filtered", true);
        return result;
    }

    /**
     * @brief Apply data sampling (override for custom sampling)
     * @param raw_output Raw extraction output
     * @return Sampled output
     */
    virtual output_type apply_data_sampling(const output_type& raw_output)
    {
        // Default: return as-is with metadata
        auto result = raw_output;
        result.template set_metadata<bool>("sampled", true);
        return result;
    }

private:
    ExtractionScope m_scope = ExtractionScope::FULL_DATA;
    std::map<std::string, std::any> m_parameters;
};

/// Extractor that takes DataVariant and produces any ComputeData type
template <ComputeData OutputType = Kakshya::DataVariant>

/// Extractor for signal container processing
template <ComputeData OutputType = std::shared_ptr<Kakshya::SignalSourceContainer>>

/// Extractor for region-based extraction
template <ComputeData OutputType = Kakshya::Region>

/// Extractor for region group processing
template <ComputeData OutputType = Kakshya::RegionGroup>

/// Extractor for segment processing
template <ComputeData OutputType = std::vector<Kakshya::RegionSegment>>

/// Extractor that produces Eigen matrices
template <ComputeData InputType = std::vector<Kakshya::DataVariant>>

/// Extractor that produces Eigen vectors
template <ComputeData InputType = std::vector<Kakshya::DataVariant>>

/// Extractor that produces numeric vectors
template <ComputeData InputType = std::vector<Kakshya::DataVariant>>

} // namespace MayaFlux::Yantra
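The protected pipeline in this header follows a template-method pattern: `operation_function()` calls the derived class's `extract_implementation()` and then routes the result through `apply_scope_filtering()`, whose default hooks only tag the output with a metadata flag. The sketch below mimics that flow so it compiles on its own. `MockDatum`, `MockExtractor` and `EvenSampleExtractor` are hypothetical stand-ins, not MayaFlux types, and a plain flag map takes the place of `set_metadata`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical mirror of the header's ExtractionScope enum
enum class ExtractionScope { FULL_DATA, TARGETED_REGIONS, FILTERED_CONTENT, SAMPLED_DATA };

// Hypothetical stand-in for a data-plus-metadata wrapper (not Yantra's Datum)
template <typename T>
struct MockDatum {
    T data;
    std::map<std::string, bool> flags; // replaces set_metadata<bool>(...)
};

template <typename In, typename Out>
class MockExtractor {
public:
    virtual ~MockExtractor() = default;

    void set_scope(ExtractionScope scope) { m_scope = scope; }

    // Mirrors operation_function(): extract first, then apply the scope step
    MockDatum<Out> run(const MockDatum<In>& input)
    {
        return apply_scope_filtering(extract_implementation(input));
    }

protected:
    // Derived classes supply the actual extraction
    virtual MockDatum<Out> extract_implementation(const MockDatum<In>& input) = 0;

    // Default hook: pass data through unchanged, only tagging it
    virtual MockDatum<Out> apply_data_sampling(MockDatum<Out> out)
    {
        out.flags["sampled"] = true;
        return out;
    }

private:
    MockDatum<Out> apply_scope_filtering(MockDatum<Out> out)
    {
        switch (m_scope) {
        case ExtractionScope::SAMPLED_DATA:
            return apply_data_sampling(std::move(out));
        case ExtractionScope::TARGETED_REGIONS:
            out.flags["region_filtered"] = true; // region hook omitted for brevity
            return out;
        case ExtractionScope::FILTERED_CONTENT:
            out.flags["content_filtered"] = true; // content hook omitted for brevity
            return out;
        case ExtractionScope::FULL_DATA:
        default:
            return out;
        }
    }

    ExtractionScope m_scope = ExtractionScope::FULL_DATA;
};

// Hypothetical concrete extractor that keeps every other sample when sampling
class EvenSampleExtractor : public MockExtractor<std::vector<double>, std::vector<double>> {
protected:
    MockDatum<std::vector<double>> extract_implementation(
        const MockDatum<std::vector<double>>& input) override
    {
        return { input.data, {} }; // identity extraction with fresh metadata
    }

    MockDatum<std::vector<double>> apply_data_sampling(MockDatum<std::vector<double>> out) override
    {
        std::vector<double> kept;
        for (std::size_t i = 0; i < out.data.size(); i += 2) {
            kept.push_back(out.data[i]);
        }
        out.data = std::move(kept);
        out.flags["sampled"] = true;
        return out;
    }
};
```

Overriding only `apply_data_sampling()` gives `SAMPLED_DATA` real behaviour while `FULL_DATA` still passes the extraction through untouched, in line with the header's "(override for custom sampling)" note.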
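`set_parameter()` treats `"scope"` as a reserved key and forwards everything else to the extraction parameter map. Assuming `safe_any_cast` reports a type mismatch as a failed result, a `"scope"` value of the wrong type is stored as an ordinary parameter while `get_parameter("scope")` keeps reporting the current scope. The minimal sketch below reproduces that routing with plain `std::any_cast` in place of MayaFlux's `safe_any_cast`/`safe_any_cast_or_default` helpers; `ParameterStore` is a hypothetical name.

```cpp
#include <any>
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical mirror of the header's ExtractionScope enum
enum class ExtractionScope { FULL_DATA, TARGETED_REGIONS, FILTERED_CONTENT, SAMPLED_DATA };

// Hypothetical stand-in for the extractor's parameter plumbing
class ParameterStore {
public:
    void set_parameter(const std::string& name, std::any value)
    {
        if (name == "scope") {
            // Only an actual ExtractionScope updates the scope
            if (const auto* scope = std::any_cast<ExtractionScope>(&value)) {
                m_scope = *scope;
                return;
            }
        }
        // Everything else, including a mistyped "scope", lands in the map
        m_parameters[name] = std::move(value);
    }

    std::any get_parameter(const std::string& name) const
    {
        if (name == "scope") {
            return m_scope; // a map entry named "scope" is shadowed here
        }
        auto it = m_parameters.find(name);
        return (it != m_parameters.end()) ? it->second : std::any {};
    }

    template <typename T>
    T get_parameter_or_default(const std::string& name, const T& default_value) const
    {
        auto it = m_parameters.find(name);
        if (it == m_parameters.end()) {
            return default_value;
        }
        // Pointer form of any_cast yields nullptr on a type mismatch instead of throwing
        if (const auto* typed = std::any_cast<T>(&it->second)) {
            return *typed;
        }
        return default_value;
    }

private:
    ExtractionScope m_scope = ExtractionScope::FULL_DATA;
    std::map<std::string, std::any> m_parameters;
};
```

Using the pointer form of `std::any_cast` keeps a mistyped parameter from throwing `std::bad_any_cast`, which matches the fall-back-to-default intent of `get_parameter_or_default()`.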