# Read XML with PySpark

I'm trying to read XML with PySpark, but repeated child elements are not being parsed the way I expect.

I'm running this code:

```
# Requires the spark-xml package (com.databricks:spark-xml) to be available to Spark
df_cli = sqlContext.read.format('com.databricks.spark.xml').options(rowTag='Cli').load('MyFile.txt', schema=schema_xml)
```

and MyFile.txt:

```
<Cli Tp="1" Cd="8338" Autorzc="S">
  <Op Contrt="1" NatuOp="01" Mod="1304">
    <Venc v110=" 123" v120=" 123"/>
  </Op>
  <Op Contrt="2" NatuOp="01" Mod="1304">
    <Venc v110=" 123" v120=" 123"/>
  </Op>
</Cli>
<Cli Tp="2" Cd="8568" Autorzc="N">
  <Op Contrt="3" NatuOp="01" Mod="1304">
    <Venc v110=" 123" v120=" 123"/>
  </Op>
  <Op Contrt="4" NatuOp="01" Mod="1304">
    <Venc v110=" 123" v120=" 123"/>
  </Op>
</Cli>
```

Schema:

```
from pyspark.sql.types import StructType, StructField, StringType

schema_xml = StructType([
    # XML attributes of <Cli> are exposed with an "@" prefix
    StructField("@Autorzc", StringType(), True),
    StructField("@Cd", StringType(), True),
    StructField("@Tp", StringType(), True),
    # Op is declared here as a single struct (not an array)
    StructField("Op",
        StructType([
            StructField("@Contrt", StringType(), True),
            StructField("@Mod", StringType(), True),
            StructField("@NatuOp", StringType(), True),
            StructField("Venc",
                StructType([
                    StructField("@v110", StringType(), True),
                    StructField("@v120", StringType(), True)
                ])
            )
        ])
    )
])
```

When I run this code, I get the following output:

```
Row(@Autorzc=u'S', @Cd=u'8338', @Tp=u'1', Op=Row(@Contrt=u'2', @Mod=u'1304', @NatuOp=u'01', Venc=Row(@v110=u' 123', @v120=u' 123')))
Row(@Autorzc=u'N', @Cd=u'8568', @Tp=u'2', Op=Row(@Contrt=u'4', @Mod=u'1304', @NatuOp=u'01', Venc=Row(@v110=u' 123', @v120=u' 123')))
```

The result contains only the last `Op` in each `Cli`, but I expected to get all `Op` elements inside each `Cli` as a list, like this:

```
Row(@Autorzc=u'S', @Cd=u'8338', @Tp=u'1', Op=[Row(@Contrt=u'1', @Mod=u'1304', @NatuOp=u'01', Venc=Row(@v110=u' 123', @v120=u' 123')), Row(@Contrt=u'2', @Mod=u'1304', @NatuOp=u'01', Venc=Row(@v110=u' 123', @v120=u' 123'))])
Row(@Autorzc=u'N', @Cd=u'8568', @Tp=u'2', Op=[Row(@Contrt=u'3', @Mod=u'1304', @NatuOp=u'01', Venc=Row(@v110=u' 123', @v120=u' 123')), Row(@Contrt=u'4', @Mod=u'1304', @NatuOp=u'01', Venc=Row(@v110=u' 123', @v120=u' 123'))])
```

Note: in my original file, each `Cli` can contain any number of `Op` elements (1, 2, 3, ..., N), and each `Op` has exactly one `Venc`.
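
To make the shape I am after concrete without involving Spark, here is a small sketch that parses the sample with Python's standard `xml.etree.ElementTree`. It is only a sanity check of the expected structure, not a replacement for the Spark reader. The helper names `attrs` and `parse_clis` are hypothetical, and wrapping the text in a synthetic `<root>` element is my own workaround for the file having several top-level `Cli` elements.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<Cli Tp="1" Cd="8338" Autorzc="S">
  <Op Contrt="1" NatuOp="01" Mod="1304"><Venc v110=" 123" v120=" 123"/></Op>
  <Op Contrt="2" NatuOp="01" Mod="1304"><Venc v110=" 123" v120=" 123"/></Op>
</Cli>
<Cli Tp="2" Cd="8568" Autorzc="N">
  <Op Contrt="3" NatuOp="01" Mod="1304"><Venc v110=" 123" v120=" 123"/></Op>
  <Op Contrt="4" NatuOp="01" Mod="1304"><Venc v110=" 123" v120=" 123"/></Op>
</Cli>
"""


def attrs(elem):
    """Return the element's attributes with an "@" prefix on each name."""
    return {"@" + name: value for name, value in elem.attrib.items()}


def parse_clis(xml_text):
    """Return one dict per Cli, with all of its Op children in a list."""
    # Several top-level elements are not a valid XML document on their own,
    # so wrap them in a synthetic root before parsing.
    root = ET.fromstring("<root>" + xml_text + "</root>")
    rows = []
    for cli in root.findall("Cli"):
        row = attrs(cli)
        ops = []
        for op in cli.findall("Op"):
            op_row = attrs(op)
            venc = op.find("Venc")  # each Op has exactly one Venc
            op_row["Venc"] = attrs(venc) if venc is not None else None
            ops.append(op_row)
        row["Op"] = ops  # a list, however many Op elements there are
        rows.append(row)
    return rows


rows = parse_clis(SAMPLE)
for row in rows:
    print(row["@Cd"], [op["@Contrt"] for op in row["Op"]])
```

Each `Cli` yields one record whose `Op` value is a list with one entry per `Op` element, which is the structure I would like the DataFrame rows to have.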

I'm using this documentation.