Position Title: Lead/Architect – GPU Programming & Infrastructure
Location: Bronx, NY
Contract Duration: 6 Months
Employment Type: Contract
The Lead/Architect will play a critical role in deploying and optimizing large language models (LLMs) such as LLaMA, managing GPU-based inference infrastructure, and streamlining production-grade machine learning pipelines. This is a highly collaborative role requiring strong engineering skills, cloud expertise, and a deep understanding of model performance tuning.
Key Responsibilities:
Deploy and optimize LLMs using Hugging Face Transformers (a brief illustrative sketch follows this list)
Configure and manage GPU-based inference workflows with NVIDIA CUDA
Automate infrastructure and development tasks using Python, PyTorch, and shell scripting
Administer and maintain Linux (Ubuntu) environments for development and deployment
Provision and manage GPU resources in cloud environments (AWS EC2, GCP, Hugging Face Spaces)
Implement robust environment isolation and package management using Conda and pip
Lead the end-to-end design and deployment of scalable ML model pipelines
Partner with cross-functional engineering teams to ensure optimal model performance and reliability
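For illustration only, the snippet below is a minimal sketch of the kind of GPU-backed LLM inference described above, using Hugging Face Transformers and PyTorch. The model checkpoint, prompt, and generation settings are placeholders chosen for the example, not project specifics.

```python
# Hypothetical example: load an open LLM and run inference on a CUDA GPU.
# The checkpoint name, prompt, and generation settings are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to reduce GPU memory use
    device_map="auto",          # place layers on available CUDA devices (requires accelerate)
)

inputs = tokenizer("Summarize the benefits of GPU inference:", return_tensors="pt").to(model.device)
with torch.inference_mode():
    output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```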
Required Skills & Experience:
Strong experience deploying and fine-tuning LLMs in production environments
Proficiency with Hugging Face Transformers and PyTorch
Deep knowledge of GPU performance tuning and parallel processing (see the benchmarking sketch after this list)
Fluency in Python and experience with Linux systems administration
Hands-on expertise with cloud GPU orchestration and containerized environments
Proven leadership in designing scalable AI/ML infrastructure
Ability to work in a fast-paced, performance-driven environment
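As a rough, hypothetical illustration of the GPU performance tuning expected in this role, the sketch below times a batched half-precision forward pass on a CUDA device using PyTorch events. The layer size, batch size, and iteration counts are arbitrary stand-ins, and an NVIDIA GPU is assumed to be available.

```python
# Hypothetical benchmark: measure average latency of a batched forward pass on GPU.
import torch

device = torch.device("cuda")  # assumes an NVIDIA GPU with CUDA is available
model = torch.nn.Linear(4096, 4096).to(device).half()  # stand-in for a real model layer
batch = torch.randn(32, 4096, device=device, dtype=torch.float16)

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
with torch.inference_mode():
    for _ in range(10):   # warm-up so timings exclude one-time CUDA setup costs
        model(batch)
    start.record()
    for _ in range(100):
        model(batch)
    end.record()
torch.cuda.synchronize()  # ensure all queued kernels finish before reading the timer
print(f"average forward-pass latency: {start.elapsed_time(end) / 100:.3f} ms")
```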