ECS
Mount a data disk on Alibaba Cloud
yum update
yum install -y e2fsprogs
mkfs -t ext4 /dev/vdb
mkdir /data
mount /dev/vdb /data
echo "UUID=$(blkid -s UUID -o value /dev/vdb) /data ext4 defaults 0 0" >> /etc/fstab
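Rerunning the append to /etc/fstab adds a duplicate entry each time. A minimal sketch of an idempotent variant follows, assuming the same device and mount point as above. The helper name `add_fstab_entry` and the `FSTAB` override variable are illustrative only and not part of any Alibaba Cloud tooling.

```shell
#!/bin/sh
# Hypothetical helper: append an fstab entry only if the mount point is not
# already listed. FSTAB defaults to /etc/fstab and can be pointed at a scratch
# file for a dry run.
add_fstab_entry() {
    uuid=$1; mount_point=$2; fs_type=$3
    fstab=${FSTAB:-/etc/fstab}
    # Skip if any non-comment line already uses this mount point (second field).
    if awk -v mp="$mount_point" \
        '$1 !~ /^#/ && $2 == mp { found = 1 } END { exit !found }' "$fstab"; then
        echo "entry for $mount_point already present in $fstab"
        return 0
    fi
    printf 'UUID=%s %s %s defaults 0 0\n' "$uuid" "$mount_point" "$fs_type" >> "$fstab"
}

# Usage on the instance (run as root):
#   add_fstab_entry "$(blkid -s UUID -o value /dev/vdb)" /data ext4
#   mount -a    # re-reads /etc/fstab; an error here points at a bad entry
```

Running `mount -a` right after editing is a cheap way to catch a malformed line before the next reboot does.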
EMR on ECS
Components
- Hadoop-Common
- Yarn
  - port: 8088
- OSS-HDFS
- Hive
  - port: 10000
- Hudi
- Iceberg
- Paimon
- Spark3
  - port: 18080
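A quick reachability check for the ports noted above can be sketched as follows. It assumes 8088 is the YARN ResourceManager web UI, 10000 is HiveServer2, and 18080 is the Spark History Server, which are the usual upstream defaults. The function names and the `emr-master` hostname are placeholders, not EMR conventions.

```shell
#!/usr/bin/env bash
# Map a component name to the port noted for it in the list above.
component_port() {
    case "$1" in
        YARN)   echo 8088 ;;
        HIVE)   echo 10000 ;;
        SPARK3) echo 18080 ;;
        *)      echo "unknown component: $1" >&2; return 1 ;;
    esac
}

# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# Relies on bash's /dev/tcp pseudo-device, so no extra packages are needed.
port_open() {
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Usage (emr-master is a placeholder hostname):
#   for c in YARN HIVE SPARK3; do
#       p=$(component_port "$c")
#       if port_open emr-master "$p"; then echo "$c:$p open"; else echo "$c:$p closed"; fi
#   done
```

A closed port here usually means either the service is down or the security group does not allow the source address.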
Gateway
- Deploy Gateway: Use EMR-CLI to deploy a custom Gateway environment
- Submit jobs through the cluster's Gateway node
emrcli gateway deploy \
--clusterId c-b2a8a74c4d44c537 \
--appNames YARN,HIVE,HUDI,ICEBERG,SPARK3
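Once the Gateway is deployed with the YARN and SPARK3 clients, a job submission from it might look like the sketch below. It assumes `spark-submit` is on the PATH of the Gateway node. The wrapper name `submit_spark_job`, the `RUN` switch, and the jar and class shown in the usage lines are hypothetical.

```shell
#!/bin/sh
# Print (or, with RUN=1, execute) a spark-submit command that targets YARN
# in cluster mode. Arguments: application jar, main class, then app arguments.
submit_spark_job() {
    app_jar=$1; main_class=$2; shift 2
    # Rebuild the positional parameters as the full command line.
    set -- spark-submit --master yarn --deploy-mode cluster \
        --class "$main_class" "$app_jar" "$@"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Usage (placeholder jar and class):
#   submit_spark_job /path/to/app.jar com.example.Main 100          # dry run
#   RUN=1 submit_spark_job /path/to/app.jar com.example.Main 100    # submit
```

The dry-run default makes it easy to review the exact command before it reaches the cluster.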
Serverless EMR
DLF
Connect VVP Flink SQL to a DLF Paimon Catalog
Connect Serverless Spark to a DLF Paimon Catalog
Connect Serverless StarRocks to a DLF Paimon Catalog
ACK
Schedule virtual nodes in an ACK cluster: Comparison and overview of virtual node scheduling solutions for ACK clusters
Run a shared GPU scheduling example
ACS
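Increase ephemeral disk capacity: Increase the size of ephemeral storage
Mount CPFS: Statically mount a CPFS volume
Mount OSS with ossfs 2.0: Mount a statically provisioned OSS volume with ossfs 2.0 in ACK
Use performance-type instances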
apiVersion: v1
kind: Pod
metadata:
  namespace: namespace
  name: pod-name
  labels:
    app: app-name
    alibabacloud.com/compute-class: performance
    alibabacloud.com/compute-qos: default
spec:
  containers:
  - name: container-name
    image: image:tag
    imagePullPolicy: Always
    command: ["python", "src/main.py"]
    resources:
      requests:
        memory: "1Gi"
        cpu: "1"
  restartPolicy: Never
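Because the compute class is selected purely by a label, a misspelled or missing label is easy to overlook. The sketch below checks a manifest before applying it, assuming the manifest above is saved as `pod.yaml`. The helper name `manifest_compute_class` is hypothetical, and it is a plain text scan rather than a YAML parser.

```shell
#!/bin/sh
# Print the value of the alibabacloud.com/compute-class label found in a
# manifest, or nothing if the label is absent. Assumes one "key: value" pair
# per line with an unquoted value.
manifest_compute_class() {
    sed -n 's/^[[:space:]]*alibabacloud\.com\/compute-class:[[:space:]]*//p' "$1" | head -n 1
}

# Usage (pod.yaml is an assumed file name; names match the manifest above):
#   [ "$(manifest_compute_class pod.yaml)" = "performance" ] && kubectl apply -f pod.yaml
#   kubectl get pod pod-name -n namespace -L alibabacloud.com/compute-class
```

The `-L` flag of `kubectl get` adds the label as an output column, which confirms what the running Pod actually carries.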
Split costs by tag: Use tags for cost allocation management
Spark on ACS
Ray on ACS
CPFS
Delete files in parallel: GitHub - multi-thread-posix