As a technical expert in AI data infrastructure, the person in this position designs, builds, and optimizes data architectures that support AI systems such as chatbots, callbots, AutoCall, NLP pipelines, and machine learning systems.

The role ensures that data is managed effectively across its entire lifecycle and remains scalable, reusable, and well structured to support AI model training, evaluation, and deployment in production environments.

Job Description

As a Senior Data Architect, you will work closely with AI Engineers, AI Researchers, and Software Engineers to build a robust data foundation for AI systems. Key responsibilities include:

1. Design & Build AI Data Infrastructure

Design data architecture for NLP/LLM, Callbot, and Chatbot systems.
Build data processing pipelines: ingestion → cleaning → transformation → storage.
Organize and manage data lakes / object storage systems.
Design data schemas to support training, evaluation, and inference.

2. Build Pipelines for AI Systems

Design and optimize embedding and indexing pipelines (vector databases).
Manage data and embedding versioning.
Develop mechanisms for building reproducible datasets for training and fine-tuning.
Optimize data retrieval performance for real-time inference.

3. Support RAG / LLM Systems

Design storage structures for Retrieval-Augmented Generation (RAG) systems.
Manage chunking, indexing, and re-indexing strategies.
Monitor and optimize retrieval performance.

4. Logging, Monitoring & Data Governance

Design log storage systems to support analysis and model improvement.
Ensure data lineage, tracking, and metadata management.
Collaborate with AI Engineers to build pipelines supporting MLOps.
Ensure data security and integrity.

5. Advisory & Data Standardization

Define data standards for the AI team.
Review pipelines and support data system optimization.
Contribute to establishing best practices for AI product development.

Requirements

Bachelor’s degree or higher in Computer Science, Information Systems, Data Science, or a related field.
Minimum 3 years of experience in Data Engineering.
Proficient in Python and SQL.
Experience building ETL/ELT pipelines.
Strong understanding of data architecture concepts: Data Lake, Data Warehouse.
Experience working with storage systems such as MinIO, GCS, or equivalent.
Experience with relational databases (PostgreSQL, MariaDB) and NoSQL databases.
Understanding of distributed systems and data performance optimization.
Ability to collaborate closely with AI/ML teams.
Strong systems thinking, a proactive mindset, and a strong sense of responsibility.
English proficiency: TOEIC ≥ 550 or equivalent.

Preferred Qualifications

Experience with Vector Databases (Milvus, Pinecone, Weaviate, pgvector, etc.).
Experience building pipelines for NLP/LLM or AI systems.
Experience with workflow orchestration tools (Airflow, Prefect, etc.).
Experience with containerization technologies (Docker, Kubernetes).
Understanding of MLOps: MLflow, experiment tracking, dataset versioning.
Prior experience building production-grade AI systems.